Professor Coruja - Business and Open Source Technology: March 2013

Saturday, March 30, 2013

It is time to remember about the Pentaho Community Events

Thursday, March 21, 2013

Resultado do Curso de Pentaho no TJSC já pode ser visto na internet.

Amigo leitor,

Um tempo atrás ministrei pela IT4biz um treinamento para o TJSC (Tribunal de Justiça de Santa Catarina) e recentemente fui informado de um Dashboard muito bacana criado por eles que encontra-se disponível na internet.

Link do dashboard:
https://malotedigital.tjsc.jus.br/portal_hermes/numeros.jsp

Abaixo algumas imagens.

Também aproveito para agradecer ao Pedro Alves pela criação do CTools, como podemos ver o Dashboard foi criado baseado em suas ferramentas.

Parabéns ao TJSC, ficou excelente!!!

Wednesday, March 20, 2013

Dados Abertos: Dados da Copa 2014

Amigo leitor,

Foi divulgado no site dados.gov.br os dados da copa 2014 no formato open data.

Segue o link:

http://dados.gov.br/dataset/copa-2014

O texto abaixo foi extraído do site dados.gov.br

Com o objetivo de garantir o maior número possível de finalidades aos dados da copa, foi decidido que eles seriam disponibilizados em sua forma bruta. Sendo assim, fica a cargo do usuário desses dados fazer as agregações e derivações que julgar mais interessante. O importante é que, com a disponibilização desses dados da forma como estão hoje em nossa base de dados, a CGU não cria nenhum tipo de impedimento quanto ao seu uso.

Por isso, os dados estão separados por tabelas, inclusive com as chaves estrangeiras para outras tabelas, possibilitando, assim, a recuperação de seus relacionamentos.

Os dados são disponibilizados diariamente para o mês corrente e mensalmente para os meses anteriores. Por exemplo, no dia 5 de novembro de 2012, estavam disponíveis os dados de cada tabela para download dos dias 1o. a 5 de novembro, separadamente, bem como os dados do dia 30 de setembro e 31 de outubro, representando assim os meses de setembro e outubro, respectivamente.

O download pode ser feito diretamente pela página, como já mencionado, por um cidadão comum, ou diretamente pelo link padronizado, por um robô, por exemplo. Para acessar os dados de todas as tabelas do mês de setembro, basta acessar o link http://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=201209_BaseDados.zip. Para acessar um dia específico, por exemplo, o dia 04/09/2012, basta acessar o link http://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=20120904_BaseDados.zip. Já para acessar uma tabela específica de um dia específico, por exemplo, a tabela Empreendimento do dia 04/09/2012, basta acessar o linkhttp://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=20120904_Empreendimento.zip.

Como pode ser visto nos exemplos apresentados, há um padrão para a URL de download das tabelas do portal da copa, definido a seguir:

Para acessar os dados de todas as tabelas do mês MM/yyyy, onde MM representa o mês (incluindo o 0 a esquerda, se for mês de 1 a 9) e yyyy o ano, basta acessar o seguinte link:
http://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=yyyyMM_BaseDados.zip
Para acessar os dados de todas as tabelas do dia dd/MM/yyyy, onde dd representa o dia do mês (incluindo o 0 a esquerda, se for dia de 1 a 9), MM representa o mês (incluindo o 0 a esquerda, se for mês de 1 a 9) e yyyy o ano, basta acessar o seguinte link:
http://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=yyyyMMdd_BaseDados.zip
Para acessar os dados da tabela NOMETABELA do dia dd/MM/yyyy, onde dd representa o dia do mês (incluindo o 0 a esquerda, se for dia de 1 a 9), MM representa o mês (incluindo o 0 a esquerda, se for mês de 1 a 9) e yyyy o ano, basta acessar o seguinte link:
http://www.portaldatransparencia.gov.br/copa2014/gestor/download?nomeArquivo=yyyyMMdd_NOMETABELA.zip

Note que está destacado em negrito a parte da URL que varia de acordo com a informação que se deseja recuperar. Todo o resto da URL é invariável. Dessa forma, pode-se elaborar um código ou robô capaz de recuperar diariamente a informação mais atual do site Copa 2014, sem a necessidade de interferência humana.

Por fim, é importante ressaltar que os dados disponibilizados no site não são incrementais. Ou seja, ao baixar os dados do dia 01/10/2012, você obterá todas as informações disponíveis no site até aquele dia, incluindo também informações que foram inseridas tanto no dia anterior, quanto no mês anterior.

Tuesday, March 19, 2013

Ideas for Solving the 'Data' Problem First, the 'Big' Problem Second: The Pentaho Way

Imagine a word cloud to represent discussions about how to make data useful in business right now. Based on what I saw at Strata in late February in Santa Clara, you would see “big” in letters about 4 inches high and the word “data” in regular 12 point type. As my pals at Gawker say, “Thatz Not Okay.”

The analytical nerds over at Cheezburger recommend solving the “data” problem before tackling the challenge of “big.” To me this seems like good advice, but carrying it out may be a bit tricky because of a disconnect that occurs in the way that data is analyzed in different sizes. Ask yourself this: if you took all the money you were going to spend on “big” and instead spent it on “data” would you be better off?

First, some basic assumptions:

No dataset comes ready to provide answers, no matter how small or big it is.
Most of the work in analysis involves massaging data and getting it ready so that you can ask questions.
The quality of analysis improves with better data.
The quality of analysis can improve with more datasets, but not always.
The quality of analysis can improve with higher volumes of data, but not always.
The more people involved – from business, IT, and development – at all stages the better.
Unstructured data is becoming more important.
Real-time analysis is becoming more important.

What does all of this mean?

Assumptions 1 and 2 mean that we will gain productivity if we can focus on compressing the early stages of the analysis pipeline, meaning the data preparation, manipulation and transformation needed to get data ready before analyzing it.
Assumptions 3, 4, and 6 suggest that we should arm as many people as possible to evaluate new datasets to see if they can help.
Assumptions 5 and 7 suggest that sometimes this new data will be big data.

I’m going to leave assumption 8 out of this discussion for the most part and deal with it in detail in another article.

The problem with the current focus on “big” is that it addresses only two of these assumptions, 5 and 7. (Remember that the way “big” is used most often means both voluminous and unstructured.)

The second problem is that most of the time the methods used to handle “big” are specialized and can only be done by high priests of programming, which creates a bottleneck.

My goal for this series of articles is to present a few ways that various types of technology could be used together to create an infrastructure that provides the most possible value for a group of workers. The collections of technology I’m going to propose seek to meet the following criteria:

Increase the number of people involved at all stages.
As much as possible, use the same techniques for massaging and analyzing data at all stages.
As much as possible, use the same techniques for processing both small and big data sets.

The Pentaho Way

Pentaho is an open source business intelligence and big data analytics company that was founded by five deeply nerdy people. Three of the original founders, Richard Daley, Doug Moran, and James Dixon are focused on Pentaho’s big data technology and strategy.

Unlike many other companies in the realm of data science, Pentaho is focused on what users do to get value from data and how to make it easier, and it shows in their product.

“Pentaho is taking the responsible, adult approach to tackling big data head-on,” says Dave Henry, Senior Vice President, Enterprise Solutions at Pentaho. “We offer great connectivity, easy-to-use data development apps and, by putting data integration and visualization so close together, deliver a productive experience that lets more people work together.”

Pentaho’s secret sauce is a system known as Pentaho Data Integration, which is essentially a visually-oriented toolkit for massaging data that has the following characteristics:

A multitude of connectors bring data in from a huge variety of sources, and you can build your own if a connector is missing.
Transformations can be applied to the data by dragging and dropping functions of various sorts to perform the transformation in a visual interface.
The functions range from simple transformations to those for more complex techniques like regular expressions and machine-learning algorithms.
Pentaho Data Integration can be applied to a single file, to data from any number of sources, from spreadsheets to MPP databases, and also to MapReduce programming.

So, how well does Pentaho meet my criteria?

One powerful aspect of Pentaho Data Integration is that because of its simple, drag-and-drop visual interface, it can be used by analysts in all areas of the organization. Pentaho can serve the business analyst who needs to quickly grab some information, add some context and then analyze and share it using Pentaho Instaview. It can also be used by a developer who is using the full Pentaho Business Analytics platform to build a formal data warehouse.

Data integration processes (often called ETL when building a data warehouse or data prep when an analyst is massaging data) can take place in Pentaho so that the massive work of massaging and cleaning data does not have to be tightly bound to a specific data warehouse technology or to complicated programming methods. This will be increasingly important for solution building in the cloud where developers cannot pre-integrate all of the data. Instead, data integration must occur on demand (i.e., combining data fromSalesforce.com with data from Amazon Redshift), and Pentaho makes it easy for this to happen as part of an orchestrated process.

Someone exploring external data sets to see what they offer can do the same thing. Pentaho offers native connectivity to popular structured and semi-structured data sources including:

Native connectivity to Hadoop (e.g., Apache Hadoop, Cloudera, Hortonworks, MapR)
Native connectivity to NoSQL databases (e.g., MongoDB, Cassandra, HBase)
Native connectivity to analytic databases (e.g., Vertica, Greenplum, Teradata)
Connectivity to enterprise applications (e.g., SAP)
Connectivity to cloud-based and SaaS applications (e.g., Salesforce, Amazon Web Services)

In addition, Pentaho Data Integration (PDI) can access unstructured, raw data such as tweets, do pattern matching, find the structure, and perform sentiment analysis. The Pentaho Instaview template for Twitter provides an environment to play around with the data. Bringing more people in direct contact with data is vital to solving the data problem first.

The most distinctive thing about PDI’s power for big data processing is its integration with MapReduce programming in Hadoop. Both Map and Reduce programs can be created using PDI; these work within the MapReduce 1.0 framework (MRv1) with plans to support MRv2 later this year. This opens the power of Hadoop to a much wider audience than if traditional MapReduce programming in Java or other such languages is required.

PDI can be an embedded data transformation engine that is part of a real time application. As a result, PDI can be inserted into a business process. For instance, PDI could be embedded in a storage appliance used on demand for device data analysis, such as capacity forecasting or failure prediction.

PDI reduces the time necessary to create a new analytic tool from two days to an hour or two. It makes it quick and easy to access big data sources and enrich them with Pentaho’s analytic and visualization tools. As a result, PDI encourages more experimentation and increases the likelihood that something useful will be discovered.

Essentially, PDI allows one technique to be used over and over in a large number of contexts. A company can build expertise in many different departments and allow people to help each other out and have users train other users.

The ideal result is that more and more people can meet their own needs and perform the all-important function of playing with the data, but when it comes time to get serious or to scale, use the same techniques.

The challenge that many companies have when attempting to create a data driven culture is that the glamorous part, the chart and graph at the end of the process, is really about 10 percent or less of the work. The analysis may be only 20 percent of the work. The other 70 percent of the work is massaging data from wherever it comes from and getting it into shape to go on stage as it were. That’s what Pentaho helps with. It may not be glamorous, but it is the sort of work that is essential to solving the data problem first and getting as many people as possible involved.

Follow Dan Woods on Twitter

Dan Woods is CTO and editor of CITO Research, a publication that seeks to advance the craft of technology leadership. For more stories like this one visitwww.CITOResearch.com.

Source:

http://www.forbes.com/sites/danwoods/2013/03/14/solving-the-data-first-big-second-pentaho/

Thursday, March 14, 2013

It is time to share about the event Pentaho Day 2013!!!

Download all the images and start to share about the event Pentaho Day 2013!!!

Pentaho Day - April, 20 2013 - Fortaleza - Brazil

Download all the images and start to share about the event Pentaho Day 2013!!!

Pentaho Day 2013
April, 20 2013
Place: Teatro 550 at FA7 - Av. Almirante Maximiniano da Fonseca, 1395, Fortaleza, CE, Brazil.
Free event.

Tuesday, March 12, 2013

Regalo para emprendedores

Saiku Chart Plus 2.4 RC1 Out Now!

What is Saiku Chart Plus?

It is an open source project that helps Pentaho BI users to create other types of charts and maps based on Saiku Project, Highcharts and Google Maps.

Learn more about it:
http://it4biz.github.com/SaikuChartPlus/

Saiku Chart Plus by IT4biz

Saturday, March 09, 2013

Saiku + IT4biz

Saiku + Maps

native2ascii - Native-to-ASCII Converter

Converts a file with native-encoded characters (characters which are non-Latin 1 and non-Unicode) to one with Unicode-encoded characters.

SYNOPSIS

native2ascii [options] [inputfile [outputfile]]

DESCRIPTION

The Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. native2ascii converts files which contain other character encodings into files containing Latin-1 and/or Unicode-encoded charaters.If outputfile is omitted, standard output is used for output. If, in addition, inputfile is omitted, standard input is used for input.

OPTIONS

-reverse

Perform the reverse operation: convert a file with Latin-1 and/or Unicode encoded characters to one with native-encoded characters.

-encoding encoding_name

Specify the encoding name which is used by the conversion procedure. The default encoding is taken from System property file.encoding. The encoding_name string must be taken from the first column of the table of supported encodings in the Supported Encodings document.

-Joption

Pass option to the Java virtual machine, where option is one of the options described on the reference page for the java application launcher. For example, -J-Xms48m sets the startup memory to 48 megabytes.

Fonte:
http://docs.oracle.com/javase/1.4.2/docs/tooldocs/windows/native2ascii.html

Translation of Pivot4j from English to Brazilian Portuguese

Hi all,

It is hard to find time to write here in my blog, but I have good news.

After working with Xavier, Fernando and Maiara, we finished the translation of Pivot4J.

More infos:

https://github.com/caiomsouza/it4biz-pivot4j-pt-BR

http://mysticfall.github.com/pivot4j/

https://github.com/mysticfall

Saturday, March 02, 2013

SourceTree is a free Mac client for Git and Mercurial version control systems

SourceTree is a free Mac client for Git and Mercurial version control systems.

http://www.sourcetreeapp.com/

Friday, March 01, 2013

j2objc, A Java to iOS Objective-C translation tool and runtime.

Hi guys,

Do you need to translate Java to iOS Objective-C?

If you said "yes", take a look at the Google project below:

https://code.google.com/p/j2objc/

What J2ObjC Is

J2ObjC is an open-source command-line tool from Google that translates Java code to Objective-C for the iOS (iPhone/iPad) platform. This tool enables Java code to be part of an iOS application's build, as no editing of the generated files is necessary. The goal is to write an app's non-UI code (such as data access, or application logic) in Java, which is then shared by web apps (using GWT), Android apps, and iOS apps.

J2ObjC supports most Java language and runtime features required by client-side application developers, including exceptions, inner and anonymous classes, generic types, threads and reflection. JUnit test translation and execution is also supported.

J2ObjC is currently between alpha and beta quality. Several Google projects rely on it, but when new projects first start working with it, they usually find new bugs to be fixed. Apparently every Java developer has a slightly different way of using Java, and the tool hasn't translated all possible paths yet. It's initial version number is 0.5, which hopefully represents its release status correctly.

What J2ObjC isn't

J2ObjC does not provide any sort of platform-independent UI toolkit, nor are there any plans to do so in the future. iOS UI code needs to be written in Objective-C or Objective-C++ using Apple's iOS SDK (Android UIs using Android's API, web app UIs using GWT, etc.).

UnRARX, a free alternative for WinRAR for Mac OS X

Hi guys,

If you need a good and free alternative for WinRAR to use in your Mac OS X, you can install UnRARX.

http://www.unrarx.com/

Professor Coruja - Business and Open Source Technology

Pages

Google Ads

Saturday, March 30, 2013

It is time to remember about the Pentaho Community Events

Thursday, March 21, 2013

Resultado do Curso de Pentaho no TJSC já pode ser visto na internet.

Wednesday, March 20, 2013

Dados Abertos: Dados da Copa 2014

Tuesday, March 19, 2013

Ideas for Solving the 'Data' Problem First, the 'Big' Problem Second: The Pentaho Way

Thursday, March 14, 2013

It is time to share about the event Pentaho Day 2013!!!

Pentaho Day - April, 20 2013 - Fortaleza - Brazil

Tuesday, March 12, 2013

Regalo para emprendedores

Saiku Chart Plus 2.4 RC1 Out Now!

Saiku Chart Plus by IT4biz

Saturday, March 09, 2013

Saiku + IT4biz

Saiku + Maps

native2ascii - Native-to-ASCII Converter

native2ascii - Native-to-ASCII Converter

SYNOPSIS

DESCRIPTION

OPTIONS

Translation of Pivot4j from English to Brazilian Portuguese

Saturday, March 02, 2013

SourceTree is a free Mac client for Git and Mercurial version control systems

Friday, March 01, 2013

j2objc, A Java to iOS Objective-C translation tool and runtime.

What J2ObjC Is

What J2ObjC isn't

UnRARX, a free alternative for WinRAR for Mac OS X