Thursday, December 10, 2015

Want to learn a lot about BI with Pentaho and get paid for it?

Getting started in BI is a big challenge for many people; after all, few companies are willing to help you take your first steps.

Besides delivering projects, IT4biz Global is a reference in training open source Business Intelligence professionals, and was a pioneer in Brazil in the use of the Pentaho BI Suite.

If you would like to do a Business Intelligence internship in 2016 and learn all about the subject, send an e-mail to

We have open positions for 2016, expected to start in January or February 2016.

Besides learning a lot, some interns have been taken on by IT4biz as consultants, and many of our former interns have been hired by companies such as Oracle, Catho, Locaweb, etc.

We are sad to see them go, but happy to know that we were part of such a difficult start to a career.

IT4biz Global currently has projects in Brazil, Spain and the United States, and has already delivered projects in several other places.

We have offices in São Paulo, Brazil and Madrid, Spain.

We are very active in the Pentaho Community in Brazil and worldwide; some of our open source projects are used in more than 160 countries, and we have more than 300 clients around the world.

Come join us!!!

Thursday, November 26, 2015

#Tips: How to delete backups from my Mac

Hi Folks,

I had a lot of space on my MacBook Pro dedicated to backups.

I want to share how to get this space back.

  1. Simply "turn off" Time Machine in its preference pane, under System Preferences.
  2. Turn it back on, or select "Back Up Now" from the Time Machine (TM) menu bar icon, when you wish it to sync with your TC.

Alternatively, you can disable Local Snapshots as well, if you don't want to turn off Time Machine.

Enter this in Terminal:

sudo tmutil disablelocal


Friday, November 20, 2015

#PCM2015 Facebook Pics - London, UK - Nov, 7, 2015

Pentaho Community Meetup - Nov, 7, 2015 - London - #PCM2015 Link:

Posted by Pentaho Brasil on Friday, November 20, 2015

Let's Help the Saiku Team #crowdfunding

Hi friends,

I still remember the day when Tom Barber presented PAT (Pentaho Analysis Tool) in Barcelona, Spain, in 2009 at #PCM09.

Before PAT we had to use jPivot, and PAT was a very nice option to replace it.
Some years ago they changed the name to Saiku.

The reason for this post is to help our friends from Saiku raise money to improve this amazing open source project.

Let's help them!!!

Link to send them money:

Thursday, November 05, 2015

#PCM15 - Pentaho Community Meeting on November 7th, 2015 in London, UK.

Hi Folks,
I am very happy that I will go to #PCM2015 in London this Saturday, where I will talk about some of our Pentaho plugins.

I hope to see all my friends from the Pentaho Community all over the world.
Below is a copy of Pedro's post about PCM.

It's almost time. #PCM continues its cruise around Europe. Next stop, for its 8th year in a row, London. 
Here's all the information, taken from the project's Github page (geeks!!)

Pentaho Community Meeting 2015

This page holds all the essential info about the Pentaho Community Meeting 2015.
Most important bits first:
  • Location: London
  • Date: November 7th - there will also be a Hackathon on Friday evening and social activities on Sunday.
  • Venue: W12 Conferences, West London


Attending the presentations and hackathon is completely free!
Please register for the main event here for a free ticket.
For the hackathon on Friday evening please register here.


  • As with previous PCM meetings there will be a nominal charge to cover lunch on Saturday.


Rough outline, details TBC:
  • Friday: Evening Hackathon in the city with fancy prizes (courtesy of IVY-IS) to be won, then drinks!
  • Saturday: 2 streams of talks - Business and Tech focused
  • Saturday Evening: Dinner/drinks around Piccadilly Circus
  • Sunday: The sightseeing tour all sightseeing tours should be like. Not to be missed.

How to submit the talks

Please send details to
Provide the following:
  • Your full name
  • Links to your profile and company
  • Title of the talk and synopsis

Friday: Venue Info, Agenda Etc

Venue Location

Skillsmatter "CodeNode" How to get there
Time - TBC - evening, with the hackathon starting by 7pm; possibly other events occurring before.

Saturday: Venue Info, Agenda Etc

Venue Location

W12 Conferences, West London.
W12 Conferences, Artillery Lane, 150 Du Cane Road, London, W12 0HS

How to Get There

Please visit Transport for London for best directions from your London location to W12 Conferences.
Nearest tube stations
  • Central Line: White City and East Acton both within a 10 minute walk.
  • Hammersmith & City and Circle Lines: Wood Lane within a 12 min walk

Buses: routes 7, 70, 72, 272 and 283 stop directly outside on Du Cane Road. When getting off the bus, look for Queen Charlotte's Hospital, to your left when looking at Hammersmith Hospital. Artillery Lane is the road running past this towards the car park. Follow this road and turn right at the mini roundabout, then turn left directly into the W12 courtyard; our reception is here.
Nearest mainline train stations
  • Paddington Station - 10 mins: take the Hammersmith & City or Circle line to Wood Lane Tube station
  • Liverpool Street - 25 mins: take the Central Line to White City station
  • Victoria Station - 22 mins: take the Victoria Line to Oxford Circus, then change to the Central Line to White City station
  • Kings Cross/St Pancras - 25 mins: take the Hammersmith & City or Circle line to Wood Lane Tube station
IMPORTANT: The conference centre is part of the hospital. YOU MUST NOT ENTER via the main hospital entrance. There is a road between the prison and the hospital which will take you to the conference centre.

Social Media

Twitter hashtag: #pcm15

Talks so far (List will evolve!)


Tech Stream

  • Matt Casters, Pentaho Chief of Data Integration / Kettle project founder: Data Sets and Unit Tests PDI plugin
  • Tom Barber of Meteorite BI will talk about Saiku and managing metadata in a NoSQL world
  • Caio Moreno de Souza: Monitoring the BI Pentaho Server using Pentaho CE Audit and Performance Monitoring Plugin / Creating Maps with Saiku Chart Plus
  • Will Gorman - Pentaho Chief Architect will present something exciting!
  • Antonio García-Domínguez and Inmaculada Medina-Bulo: ArtifactCatalog: Better Descriptions and Hierarchical Tagging for Pentaho Resources
  • Roland Bouman: Will talk about PHASE (Pentaho Analysis Editor) and PASH (Pentaho Analysis Shell).
  • Miguel Cunhal: Will present 15-20 top tips and tricks for PDI
  • Sébastian Jelsch: Bigdata MDX with Mondrian and Apache Kylin
  • Pedro Vale (Webdetails): All the secrets behind Pentaho 6.0
  • Jens Bleuel: Everything you wanted to know about PDI 6.0
  • Know BI: Metadata injection driven map-reduce
  • Diogo Mariano (Webdetails): How to easily embed your cTools dashboard in your web application
  • Julio Costa (Webdetails): Responsive design with cTools
  • Francesco Corti and Alberto Mercati: transparent and trusted authentication between an external application and Pentaho. (If you've not seen Francesco's presentations before, you're in for a treat!)
  • Marc Batchelor (founder and VP of Engineering Pentaho): Community Contribution Guidelines and Process – How can the community contribute to Pentaho Development
  • Andre Simoes (XpandIT): Taming Big Data - big data is getting mature; is your company ready to handle, capture and orchestrate all the processes running within a cluster?

Business Stream

  • Nelson Sousa of Ubiquis Consulting: will talk about Mapping and the benefits it can provide to the business
  • Owen Bowden: Beating blood cancer with help from the Pentaho community
  • Mark Stubbs, Pentaho Solutions Architect: will talk about some of the most exciting Pentaho Big Data projects currently running in London
  • Juanjo Ortilles: Web Adhoc Query Executor (WAQE), a successor to WAQR
  • Emilio Arias: Analyzing Ashley Madison data with Pentaho
  • Marcello Pontes from Oncase: Customizing the BI Server with Tapa and some more plugin goodies - how to take advantage of what's reusable in the plugin, and some more hot news for dashboarders

Sunday: Venue info, agenda, etc.

Location: central London

Agenda: the London sightseeing tour you wish all sightseeing tours would be like.

A guided tour of London not to be missed. Only 30 spots available, pre-registration required when you sign-in to the conference on Saturday.
Will last 2 hours approximately, covering the City of London and surrounding areas.


Below are a few hotel suggestions.
Please note London is a very large city and commuting may be time consuming. We suggest you choose a hotel either close to the conference venue (Acton, White City, Shepherd's Bush) or closer to the social event on Saturday night (Covent Garden, Soho, Charing Cross).
London Underground is expected to start a 24-hour weekend service during the fall. The announced date was September 12, but the recent Tube strikes have pushed this back and there is no official start date for the 24-hour service yet.

Central London

These hotels are close to the social events on Saturday night, but expect around 30 minutes to get to the conference venue on Saturday:

Acton/White City

Closer to the conference venue. These hotels are located in Zone 2 of the London transport system, about 15-20 minutes from Central London.
See you there!

40% of the Top 5 Data Scientists on Kaggle are Brazilian.

Hi Folks,

I took a snapshot today of the Kaggle Rankings, and 40% of the Top 5 data scientists are Brazilian.

Kaggle Rankings Stats:

Top 5
2 from Brazil
1 from USA
1 from Greece
1 from Russia

Top 10
4 from Russia
2 from Brazil
1 from Spain
1 from Germany
1 from Greece
1 from USA

Thursday, October 29, 2015

Pentaho + Docker Course

Based on more than 2 years of experience using Docker with Pentaho at large clients, we have launched a unique, pioneering course in Brazil called the Pentaho + Docker Course.

During the course, students will learn how to use Docker to build a fully automated BI project with Pentaho.

The course is 12 hours long.


  • Introduction to Docker;
  • Introduction to Docker Compose;
  • Introduction to Docker Hub;
  • Introduction to GitHub;
  • Introduction to Amazon AWS and the services needed for an automated BI project;
  • Creating a first automated project using Docker;
  • Step by step: how to automate a real Pentaho project to run on Docker;
  • Deploying a real project on Amazon AWS using Docker;
  • Detailed explanation of a real open source project - the EDW CENIPA project.
Join the first class and automate your entire Pentaho project!!!

For more information, send an e-mail to

Saturday, October 24, 2015

Norse – Superior Attack Intelligence

Norse maintains the world’s largest dedicated threat intelligence network. With over eight million sensors that emulate over six thousand applications – from Apple laptops, to ATM machines, to critical infrastructure systems, to closed-circuit TV cameras - the Norse Intelligence Network gathers data on who the attackers are and what they’re after. Norse delivers that data through the Norse Appliance, which pre-emptively blocks attacks and improves your overall security ROI, and the Norse Intelligence Service, which provides professional continuous threat monitoring for large networks.


Thursday, October 15, 2015

Origin of Kettle (PDI - Pentaho Data Integration)

In this video you will have the opportunity to learn about the origin of Kettle (PDI - Pentaho Data Integration), told by Kettle creator Matt Casters at Pentaho Day 2014, on May 16, 2014, at FEA/USP (School of Economics, Business and Accounting of the University of São Paulo).

Link to the video:  

Monday, September 21, 2015

Microsoft has Built its own Linux Operating System

Steve Jobs: The Man In The Machine - Official Trailer

How to go to Google Campus Madrid by car, metro or bike

Hi Geek folks from Madrid or anywhere in the world,

Do you want to go to Google Campus Madrid?

Google Campus Madrid
Address: Calle Moreno Nieto, 2 28005 Madrid, Spain
Hours: Monday-Friday 9am-7pm

By car
At first I was going to Google Campus Madrid by car, but it is probably the worst way to get there, because you cannot park nearby for more than 4 hours.

You will have to park your car in the streets close to Google Campus and pay about 7 euros per 4 hours; after 4 hours you have to stay 2 hours away from the district before you can come back and park for another 4 hours.

So, by car you will pay about 14 euros to park, with all the inconvenience of having to move your car every 4 hours.

To come by car, just enter the street name Calle Moreno Nieto, 2, Madrid in Waze or your favorite GPS.

There are always places to park, and it is very easy to park anywhere.

By Metro

I do not know why, but Google is in a place "very" far away from a metro station by Madrid standards.

It means you will have to walk about 1 km from the metro station to Google Campus.

In fact, there are a lot of options to get here by metro, but they are all far away.

I have tried two options so far; see them below:

Puerta del Ángel
The best way to go home if you do not want to walk up the hill to Opera.

Opera Metro Station
You can walk from Opera Station to Google Campus.
It is a good option on the way to Google, because it is downhill all the way from the Madrid Palace to Google. I would not try it on the way back home unless I wanted some exercise.

By Bike
I came once from my house to Google Campus; it was 10 km by bike and it was pretty good, but the bad thing is that there is no place to park bikes inside. They said that they will let everybody park their bike inside, even if you are not a resident.
I liked this way a lot, and I hope they let us park there without having to pay for it as a resident.

Why did I write this post?

I did it for myself, for all my friends who are asking me about Google Campus, and maybe for you who are reading it.

I have a bad memory, so I decided to write this post to remember. I do not come here every day, or even every week or every month, so it is easy to forget how to get here.

Why Google Campus Madrid?

Google Campus Madrid is an excellent place to work, meet friends, make new friends, learn, make business contacts, study, eat, or anything else. It is kind of a Starbucks, but dedicated to geeks, entrepreneurs and tech people.

If you come to Google Campus Madrid you will enjoy the international environment, the startup spirit, the amazing internet and the amazing facilities; in fact, if you are a geek you will feel at home.

The problem is that after you start working at Google Campus Madrid you will find it difficult to work in other places. It is just amazing here, and I wish I could come every day.

Working anywhere

I usually work anywhere. When I say anywhere, I mean it: I work anywhere in the world. I travel a lot, to many countries and cities, and it is very cool to be able to work anywhere.

I work in my clients' offices, at the university, on the metro, at the mall, on the bus, in the car, in the taxi, on the airplane, at home, in my company's offices in São Paulo or Madrid, in the park, at Starbucks, and since Google Campus Madrid opened I sometimes also come here to work, and I love it.

If you want to have freedom, you have to learn how to work anywhere and make it productive.

A rich life with less stuff | The Minimalists | TEDxWhitefish

P.W. Singer: Military robots and the future of war

Vijay Kumar: Robots that fly ... and cooperate

Robin Murphy: These robots come to the rescue after a disaster

The wonderful and terrifying implications of computers that can learn

Larry Page: Where’s Google going next?

Friday, September 11, 2015

Big Data A to Z: A glossary of Big Data terminology

This is a fairly complete glossary of Big Data terminology in wide use today. Let us know if you would like to add any big data terminology missing from this list.

ACID test

A test applied to data for atomicity, consistency, isolation, and durability.
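The atomicity part of the test can be sketched with Python's built-in sqlite3 module (the table and account names are invented for the example): a transaction that fails midway rolls back completely, so the data never ends up half-updated.

```python
# Minimal sketch of atomicity using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transfer")
        # the matching credit to bob is never reached
except RuntimeError:
    pass

# The debit was rolled back: no money was lost or created.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 0}
```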

Aggregation
A process of searching, gathering and presenting data.

Algorithm
A mathematical formula placed in software that performs an analysis on a set of data.

Anonymization
The severing of links between people in a database and their records to prevent the discovery of the source of the records.

Artificial Intelligence
Developing intelligence machines and software that are capable of perceiving the environment and take corresponding action when required and even learn from those actions.

Automatic identification and capture (AIDC)
Any method of automatically identifying and collecting data on items, and then storing the data in a computer system. For example, a scanner might collect data about a product being shipped via an RFID chip.

Avro
Avro is a data serialization system that allows for encoding the schema of Hadoop files. It is adept at parsing data and performing remote procedure calls.

Behavioral analytics
Using data about people’s behavior to understand intent and predict future actions.

Big Data Scientist
Someone who is able to develop the algorithms to make sense out of big data.

Business Intelligence (BI)
The general term used for the identification, extraction, and analysis of data.

Cascading
Cascading provides a higher level of abstraction for Hadoop, allowing developers to create complex jobs quickly, easily, and in several different languages that run in the JVM, including Ruby, Scala, and more. In effect, this has shattered the skills barrier, enabling Twitter to use Hadoop more broadly.

Call Detail Record (CDR) analysis
CDRs contain data that a telecommunications company collects about phone calls, such as time and length of call. This data can be used in any number of analytical applications.

Cassandra
Cassandra is a distributed, open source database designed to handle large amounts of data across commodity servers while providing a highly available service. It is a NoSQL solution that was initially developed by Facebook and is structured in key-value form.

Cell phone data
Cell phones generate a tremendous amount of data, and much of it is available for use with analytical applications.

Clickstream Analytics
The analysis of users’ Web activity through the items they click on a page.

Classification analysis
A systematic process for obtaining important and relevant information about data; also called metadata: data about data.

Cloud computing
A distributed computing system over a network used for storing data off-premises

Clustering analysis
The process of identifying objects that are similar to each other and clustering them in order to understand the differences as well as the similarities within the data.

Cold data storage
Storing old data that is hardly used on low-power servers. Retrieving the data will take longer.

Comparative analysis
A step-by-step procedure of comparisons and calculations to detect patterns within very large data sets.

Chukwa
Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop distributed file system (HDFS) and MapReduce framework and inherits Hadoop's scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analyzing results, in order to make the best use of this collected data.

Clojure
Clojure is a dynamic programming language based on LISP that uses the Java Virtual Machine (JVM). It is well suited for parallel data processing.

Cloud
A broad term that refers to any Internet-based application or service that is hosted remotely.

Columnar database or column-oriented database
A database that stores data by column rather than by row. In a row-based database, a row might contain a name, address, and phone number. In a column-oriented database,  all names are in one column, addresses in another, and so on. A key advantage of a columnar database is faster hard disk access.
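The difference between the two layouts can be illustrated with plain Python lists (a toy example, not a real database; the names are invented): reading one attribute touches every record in the row layout, but is a single sequential read of one list in the columnar layout.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"name": "Ana",  "city": "Madrid",    "phone": "111"},
    {"name": "Bob",  "city": "London",    "phone": "222"},
    {"name": "Caio", "city": "Sao Paulo", "phone": "333"},
]

# Column-oriented: one contiguous list per column.
columns = {
    "name":  ["Ana", "Bob", "Caio"],
    "city":  ["Madrid", "London", "Sao Paulo"],
    "phone": ["111", "222", "333"],
}

# Scanning one attribute visits every record in the row layout...
names_from_rows = [r["name"] for r in rows]
# ...but is just one sequential list in the columnar layout.
names_from_columns = columns["name"]

assert names_from_rows == names_from_columns == ["Ana", "Bob", "Caio"]
```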

Comparators
There are two ways you can compare your keys: by implementing the WritableComparable interface, or by implementing the RawComparator interface. In the former approach you compare (deserialized) objects, while in the latter you compare the keys using their corresponding raw bytes.

Complex event processing (CEP)
CEP is the process of monitoring and analyzing all events across an organization’s systems and acting on them when necessary in real time.

Confabulation
The act of making an intuition-based decision appear to be data-based.

Cross-channel analytics
Analysis that can attribute sales, show average order value, or the lifetime value.

Data access
The act or method of viewing or retrieving stored data.

Dashboard
A graphical representation of the analyses performed by the algorithms.

Data aggregation
The act of collecting data from multiple sources for the purpose of reporting or analysis.

Data architecture and design
How enterprise data is structured. The actual structure or design varies depending on the eventual end result required. Data architecture has three stages or processes: the conceptual representation of business entities; the logical representation of the relationships among those entities; and the physical construction of the system to support the functionality.


Database

A digital collection of data and the structure around which the data is organized. The data is typically entered into and accessed via a database management system (DBMS).

Database administrator (DBA)

A person, often certified, who is responsible for supporting and maintaining the integrity of the structure and content of a database.

Database as a service (DaaS)

A database hosted in the cloud and sold on a metered basis. Examples include Heroku Postgres and Amazon Relational Database Service.

Database management system (DBMS)

Software that collects and provides access to data in a structured format.

Data center

A physical facility that houses a large number of servers and data storage devices. Data centers might belong to a single organization or sell their services to many organizations.

Data cleansing

The act of reviewing and revising data to remove duplicate entries, correct misspellings, add missing data, and provide more consistency.

Data collection

Any process that captures any type of data.

Data custodian

A person responsible for the database structure and the technical environment, including the storage of data.

Data-directed decision making

Using data to support making crucial decisions.

Data exhaust

The data that a person creates as a byproduct of a common activity–for example, a cell call log or web search history.

Data feed

A means for a person to receive a stream of data. Examples of data feed mechanisms include RSS or Twitter.

Data governance

A set of processes or rules that ensure the integrity of the data and that data management best practices are met.

Data integration

The process of combining data from different sources and presenting it in a single view.

Data integrity

The measure of trust an organization has in the accuracy, completeness, timeliness, and validity of the data.

Data mart

The access layer of a data warehouse used to provide data to users.

Data migration

The process of moving data between different storage types or formats, or between different computer systems.

Data mining

The process of deriving patterns or knowledge from large data sets.

Data model, data modeling

A data model defines the structure of the data for the purpose of communicating between functional and technical people to show data needed for business processes, or for communicating a plan to develop how data is stored and accessed among application development team members.

Data point

An individual item on a graph or a chart.

Data profiling

The process of collecting statistics and information about data in an existing source.

Data quality

The measure of data to determine its worthiness for decision making, planning, or operations.

Data replication

The process of sharing information to ensure consistency between redundant sources.

Data repository

The location of permanently stored data.

Data science

A recent term that has multiple definitions, but generally accepted as a discipline that incorporates statistics, data visualization, computer programming, data mining, machine learning, and database engineering to solve complex problems.

Data scientist

A practitioner of data science.

Data security

The practice of protecting data from destruction or unauthorized access.

Data set

A collection of data, typically in tabular form.

Data source

Any provider of data–for example, a database or a data stream.

Data steward

A person responsible for data stored in a data field.

Data structure

A specific way of storing and organizing data.

Data visualization

A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.

Data warehouse

A place to store data for the purpose of reporting and analysis.


De-identification

The act of removing all data that links a person to a particular piece of information.

Demographic data

Data relating to the characteristics of a human population.

Deep Thunder

IBM’s weather prediction service that provides weather data to organizations such as utilities, which use the data to optimize energy distribution.

Distributed cache

A data cache that is spread across multiple systems but works as one. It is used to improve performance.

Distributed object

A software module designed to work with other distributed objects stored on other computers.

Distributed processing

The execution of a process across multiple computers connected by a computer network.

Distributed File System

Systems that offer simplified, highly available access to storing, analysing and processing data.

Document Store Databases

A document-oriented database that is especially designed to store, manage and retrieve documents, also known as semi structured data.

Document management

The practice of tracking and storing electronic documents and scanned images of paper documents.


Drill

An open source distributed system for performing interactive analysis on large-scale datasets. It is similar to Google's Dremel, and is managed by Apache.


Elasticsearch

An open source search engine built on Apache Lucene.

Event analytics

Shows the series of steps that led to an action.


Exabyte

One million terabytes, or 1 billion gigabytes of information.

External data

Data that exists outside of a system.

Extract, transform, and load (ETL)

A process used in data warehousing to prepare data for use in reporting or analytics.
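The three steps can be sketched in a few lines of plain Python (the CSV content, table name and cleanup rules are invented for the example): extract rows from a CSV source, transform them into a clean shape, and load them into a SQLite table.

```python
# Minimal ETL sketch: extract from CSV, transform, load into SQLite.
import csv, io, sqlite3

raw = "customer,amount\n alice ,10.5\nBOB,3\n"

# Extract: parse the CSV source into dict rows
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: normalize names, convert amounts to numbers
clean = [(r["customer"].strip().lower(), float(r["amount"])) for r in rows]

# Load: insert the cleaned rows into the warehouse table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)
db.commit()

print(list(db.execute("SELECT customer, amount FROM sales")))
# [('alice', 10.5), ('bob', 3.0)]
```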

Exploratory analysis

Finding patterns within data without standard procedures or methods. It is a means of discovering the data and finding the data set's main characteristics.


Failover

The automatic switching to another computer or node should one fail.


Flume

Flume is a framework for populating Hadoop with data. Agents are populated throughout one's IT infrastructure – inside web servers, application servers and mobile devices, for example – to collect data and integrate it into Hadoop.

Grid computing

The performing of computing functions using resources from multiple distributed systems. Grid computing typically involves large files and is most often used for multiple applications. The systems that comprise a grid computing network do not have to be similar in design or in the same geographic location.

Graph Databases

Graph databases use graph structures (a finite set of ordered pairs or certain entities), with edges, properties and nodes for data storage. They provide index-free adjacency, meaning that every element is directly linked to its neighbouring elements.


Hadoop

An open source software library project administered by the Apache Software Foundation. Apache defines Hadoop as “a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.”


Hama

Hama is a distributed computing framework based on Bulk Synchronous Parallel computing techniques for massive scientific computations, e.g. matrix, graph and network algorithms. It is a Top Level Project under the Apache Software Foundation.


HANA

A software/hardware in-memory computing platform from SAP designed for high-volume transactions and real-time analytics.


HBase

HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop. It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts and deletes. eBay and Facebook use HBase heavily.


HCatalog

HCatalog is a centralized metadata management and sharing service for Apache Hadoop. It allows for a unified view of all data in Hadoop clusters and allows diverse tools, including Pig and Hive, to process any data elements without needing to know physically where in the cluster the data is stored.

HDFS (Hadoop Distributed File System)

HDFS (Hadoop Distributed File System), the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.


Hive

Hive is a Hadoop-based data warehousing-like framework originally developed by Facebook. It allows users to write queries in a SQL-like language called HiveQL, which are then converted to MapReduce. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.


Hue

Hue (Hadoop User Experience) is an open source web-based interface for making it easier to use Apache Hadoop. It features a file browser for HDFS, an Oozie application for creating workflows and coordinators, a job designer/browser for MapReduce, Hive and Impala UIs, a shell, a collection of Hadoop APIs and more.


Impala

Impala (by Cloudera) provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase, using the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries.

In-database analytics

The integration of data analytics into the data warehouse.

In-memory database

Any database system that relies on memory for data storage.

In-memory data grid (IMDG)

The storage of data in memory across multiple servers for the purpose of greater scalability and faster access or analytics.

Internet of Things

Ordinary devices that are connected to the internet anytime and anywhere via sensors.


Kafka

Kafka (developed by LinkedIn) is a distributed publish-subscribe messaging system that offers a solution capable of handling all data flow activity and processing this data on a consumer website. This type of data (page views, searches, and other user actions) is a key ingredient in the current social web.

Key Value Stores

Key value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model.

KeyValue Databases

They store data with a primary key, a uniquely identifiable record, which makes lookups easy and fast. The data stored in a key-value database is normally some kind of primitive of the programming language.
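
The two entries above can be sketched in a few lines of Python. The class below is a hypothetical in-memory key-value store, not any real database: values are looked up by a unique key and can be any object, so no fixed data model is required.

```python
# A minimal in-memory key-value store sketch (illustrative only, not a
# real database): records are looked up by a unique primary key, and
# values can be any object, so the store is schema-less.
class KeyValueStore:
    def __init__(self):
        self._data = {}  # key -> value, no fixed schema

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", {"name": "Alice", "tags": ["bi", "pentaho"]})
print(store.get("user:42")["name"])  # -> Alice
```

Real key-value systems (Redis, Riak, etc.) add persistence, replication and distribution on top of this same lookup model.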


Latency

Any delay in a response or delivery of data from one point to another.

Linked data

As described by World Wide Web inventor Tim Berners-Lee, “Cherry-picking common attributes or languages to identify connections or relationships between disparate sources of data.”

Load balancing

The process of distributing workload across a computer network or computer cluster to optimize performance.
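
As an illustration of the idea, here is a hypothetical round-robin balancer sketch in Python: requests are spread evenly across a pool of servers. Real load balancers also weigh current load, health checks and latency; this only shows the distribution step.

```python
from itertools import cycle

# A round-robin load-balancing sketch: each call hands the next
# request to the next server in the pool, cycling forever.
def round_robin(servers):
    pool = cycle(servers)
    def next_server():
        return next(pool)
    return next_server

pick = round_robin(["node-a", "node-b", "node-c"])
assignments = [pick() for _ in range(6)]
print(assignments)
# -> ['node-a', 'node-b', 'node-c', 'node-a', 'node-b', 'node-c']
```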

Location analytics

Location analytics brings mapping and map-driven analytics to enterprise business systems and data warehouses. It allows you to associate geospatial information with datasets.

Location data

Data that describes a geographic location.

Log file

A file that a computer, network, or application creates automatically to record events that occur during operation–for example, the time a file is accessed.

Machine-generated data

Any data that is automatically created from a computer process, application, or other non-human source.

Machine2Machine data

Two or more machines that are communicating with each other.

Machine learning

The use of algorithms to allow a computer to analyze data for the purpose of “learning” what action to take when a specific pattern or event occurs.
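
A tiny nearest-neighbour sketch makes this concrete: the program “learns” a few labelled examples and predicts the label of whichever known point is closest when a new pattern arrives. The data and labels here are invented for illustration.

```python
# A minimal 1-nearest-neighbour classifier: "training" is just
# remembering labelled points, and prediction picks the label of
# the closest remembered point.
def predict(examples, point):
    # examples: list of ((x, y), label) pairs
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(examples, key=lambda e: dist(e[0], point))[1]

training = [((0, 0), "normal"), ((10, 10), "anomaly")]
print(predict(training, (1, 2)))  # -> normal
print(predict(training, (9, 8)))  # -> anomaly
```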


MapReduce

MapReduce is a software framework that serves as the compute layer of Hadoop. MapReduce jobs are divided into two (obviously named) parts. The “Map” function divides a query into multiple parts and processes data at the node level. The “Reduce” function aggregates the results of the “Map” function to determine the “answer” to the query.
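
The classic word-count example shows the two phases in plain Python. This is only a single-machine sketch of the programming model; a real Hadoop job distributes the map and reduce steps across many nodes.

```python
from collections import defaultdict

# Word count in the MapReduce style:
# the "Map" phase emits (word, 1) pairs for every word in the input,
# and the "Reduce" phase aggregates the counts per word.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

data = ["big data", "big hadoop"]
print(reduce_phase(map_phase(data)))
# -> {'big': 2, 'data': 1, 'hadoop': 1}
```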


Mashup

The process of combining different datasets within a single application to enhance output–for example, combining demographic data with real estate listings.


Mahout

Mahout is a data mining library. It takes the most popular data mining algorithms for performing clustering, regression testing and statistical modeling and implements them using the MapReduce model.


Metadata

Data about data; it gives information about what the data is about.


MongoDB

MongoDB is an open source, document-oriented NoSQL database. It stores data structures as JSON-like documents with a dynamic schema (a format called BSON), making the integration of data in certain applications easier and faster.

MPP database

A database optimized to work in a massively parallel processing environment.

Multi-Dimensional Databases

A database optimized for data online analytical processing (OLAP) applications and for data warehousing.

MultiValue Databases

A type of NoSQL, multidimensional database that understands three-dimensional data directly. They store data primarily as giant strings, which makes them well suited to manipulating HTML and XML strings directly.

Network analysis

Viewing relationships among the nodes in terms of the network or graph theory, meaning analysing connections between nodes in a network and the strength of the ties.


NewSQL

An elegant, well-defined database system that is easier to learn and better than SQL. It is even newer than NoSQL.


NoSQL (commonly interpreted as “not only SQL“) is a broad class of database management systems identified by non-adherence to the widely used relational database management system model. NoSQL databases are not built primarily on tables, and generally do not use SQL for data manipulation.

Object Databases

They store data in the form of objects, as used by object-oriented programming. They are different from relational or graph databases, and most of them offer a query language that allows objects to be found with a declarative programming approach.

Object-based Image Analysis

Analysing digital images can be performed with data from individual pixels, whereas object-based image analysis uses data from a selection of related pixels, called objects or image objects.

Online analytical processing (OLAP)

The process of analyzing multidimensional data using three operations: consolidation (the aggregation of available data), drill-down (the ability for users to see the underlying details), and slice and dice (the ability for users to select subsets and view them from different perspectives).
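
The three operations can be sketched over a tiny in-memory “cube” of invented (region, product, sales) records. Real OLAP engines (such as Mondrian, used by Pentaho) run these operations over large warehouses rather than Python lists.

```python
# Illustrative OLAP operations on a tiny record set.
records = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]

# Consolidation: aggregate sales upward across all regions/products.
total = sum(r["sales"] for r in records)

# Drill-down: see the underlying detail for one region.
eu_detail = [r for r in records if r["region"] == "EU"]

# Slice: fix one dimension (product == "A") and view that subset.
slice_a = [r for r in records if r["product"] == "A"]

print(total)                          # -> 22
print(len(eu_detail), len(slice_a))   # -> 2 2
```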

Online transactional processing (OLTP)

The process of providing users with access to large amounts of transactional data in a way that they can derive meaning from it.


Open Dremel

The open source version of Google’s BigQuery Java code. It is being integrated with Apache Drill.

Open Data Center Alliance (ODCA)

A consortium of global IT organizations whose goal is to speed the migration of cloud computing.

Operational data store (ODS)

A location to gather and store data from multiple sources so that more operations can be performed on it before sending to the data warehouse for reporting.


Oozie

Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages – such as MapReduce, Pig and Hive – and then intelligently link them to one another. Oozie allows users to specify, for example, that a particular query is only to be initiated after specified previous jobs on which it relies for data are completed.

Parallel data analysis

Breaking up an analytical problem into smaller components and running algorithms on each of those components at the same time. Parallel data analysis can occur within the same system or across multiple systems.

Parallel method invocation (PMI)

Allows programming code to call multiple functions in parallel.

Parallel processing

The ability to execute multiple tasks at the same time.

Parallel query

A query that is executed over multiple system threads for faster performance.

Pattern recognition

The classification or labeling of an identified pattern in the machine learning process.


Pentaho

Pentaho offers a suite of open source Business Intelligence (BI) products called Pentaho Business Analytics, providing data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities.


Petabyte

One million gigabytes or 1,024 terabytes.


Pig

Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).

Predictive analytics

Using statistical functions on one or more datasets to predict trends or future events.

Predictive modeling

The process of developing a model that will most likely predict a trend or outcome.

Public data

Public information or data sets that were created with public funding


Query

Asking for information to answer a certain question.

Query analysis

The process of analyzing a search query for the purpose of optimizing it for the best possible result.


R

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.


Re-identification

Combining several data sets to find a certain person within anonymized data.

Real-time data

Data that is created, processed, stored, analysed and visualized within milliseconds

Recommendation engine

An algorithm that analyzes a customer’s purchases and actions on an e-commerce site and then uses that data to recommend complementary products.
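
A very small co-occurrence sketch illustrates the idea: products bought together in past orders are counted, and items frequently paired with something the customer already has are recommended. The orders below are invented; real engines add ratings, weighting and far larger models.

```python
from collections import Counter
from itertools import combinations

# Count how often each pair of products appears in the same order.
orders = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "soda"},
]

pairs = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pairs[(a, b)] += 1
        pairs[(b, a)] += 1

def recommend(item, n=2):
    # Recommend the items most often bought together with `item`.
    scored = [(other, c) for (i, other), c in pairs.items() if i == item]
    return [other for other, _ in sorted(scored, key=lambda x: -x[1])[:n]]

print(recommend("beer"))  # -> ['chips', 'salsa']
```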

Reference data

Data that describes an object and its properties. The object may be physical or virtual.

Risk analysis

The application of statistical methods on one or more datasets to determine the likely risk of a project, action, or decision.

Root-cause analysis

The process of determining the main cause of an event or problem.

Routing analysis

Finding the optimized routing using many different variables for a certain means of transport in order to decrease fuel costs and increase efficiency.


Scalability

The ability of a system or process to maintain acceptable performance levels as workload or scope increases.


Schema

The structure that defines the organization of data in a database system.

Search data

Aggregated data about search terms used over time.

Semi-structured data

Data that is not structured by a formal data model, but provides other means of describing the data and hierarchies.

Sentiment analysis

The application of statistical functions on comments people make on the web and through social networks to determine how they feel about a product or company.


Server

A physical or virtual computer that serves requests for a software application and delivers those requests over a network.

Spatial analysis

It refers to analysing spatial data such as geographic data or topological data to identify and understand patterns and regularities within data distributed in geographic space.


SQL

A programming language for retrieving data from a relational database.


Sqoop

Sqoop is a connectivity tool for moving data from non-Hadoop data stores – such as relational databases and data warehouses – into Hadoop. It allows users to specify the target location inside of Hadoop and instruct Sqoop to move data from Oracle, Teradata or other relational databases to the target.


Storm

Storm is a free, open source system for real-time distributed computation, born at Twitter. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing.

Software as a service (SaaS)

Application software that is used over the web by a thin client or web browser. Salesforce is a well-known example of SaaS.


Storage

Any means of storing data persistently.


An open-source distributed computation system designed for processing multiple data streams in real time.

Structured data

Data that is organized by a predetermined structure.

Structured Query Language (SQL)

A programming language designed specifically to manage and retrieve data from a relational database system.

Text analytics

The application of statistical, linguistic, and machine learning techniques on text-based sources to derive meaning or insight.

Transactional data

Data that changes unpredictably. Examples include accounts payable and receivable data, or data about product shipments.


Thrift

“Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.”

Unstructured data

Data that has no identifiable structure–for example, the text of email messages.


Value

All that available data will create a lot of value for organizations, societies and consumers. Big data means big business, and every industry will reap the benefits of big data.


Volume

The amount of data, ranging from megabytes to brontobytes.


Visualization

A visual abstraction of data designed for the purpose of deriving meaning or communicating information more effectively.

WebHDFS

Apache Hadoop provides native libraries for accessing HDFS. However, many users prefer to access HDFS remotely, without the heavy client-side native libraries. For example, some applications need to load data in and out of the cluster, or to interact with the HDFS data externally. WebHDFS addresses these issues by providing a fully functional HTTP REST API to access HDFS.

Weather data

Real-time weather data is now widely available for organizations to use in a variety of ways. For example, a logistics company can monitor local weather conditions to optimize the transport of goods. A utility company can adjust energy distribution in real time.

XML Databases

XML Databases allow data to be stored in XML format. XML databases are often linked to document-oriented databases. The data stored in an XML database can be queried, exported and serialized into any format needed.
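
Python’s standard `xml.etree.ElementTree` module can sketch the store/query/serialize cycle described above; a real XML database adds indexing, transactions and a query language such as XQuery. The document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Store: parse an XML document into a queryable tree.
doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>Big Data</title></book>"
    "<book lang='pt'><title>BI com Pentaho</title></book>"
    "</library>"
)

# Query: find titles of English-language books via an XPath predicate.
titles = [b.find("title").text for b in doc.findall("book[@lang='en']")]
print(titles)  # -> ['Big Data']

# Serialize: export the stored data back out as an XML string.
xml_out = ET.tostring(doc, encoding="unicode")
print(xml_out.startswith("<library>"))  # -> True
```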


ZooKeeper is a software project of the Apache Software Foundation that provides an open-source centralized configuration service and naming registry for large distributed systems. ZooKeeper is a subproject of Hadoop.


Overview of the Mormons

This video shows an overview of The Church of Jesus Christ of Latter-day Saints. Learn about some of the basic beliefs and programs of the Mormons.

Posted by A Igreja de Jesus Cristo dos Santos dos Últimos Dias (Oficial) - Notícias on Thursday, June 11, 2015

Thursday, September 10, 2015

Pentaho Community Meetup in Madrid, Spain on 17/09/15

Hello everyone,

On September 17, 2015 at 19:00, at calle Salvatierra, 4, Madrid, 28034 (Open Sistemas), we will hold a Pentaho Community Meetup in Madrid, Spain.

A catch-up on Pentaho (#pcm), in-house developments, architecture and more.


•  Introduction to the session

• Pentaho architecture in HA

• Development and operation of OS-User-Actions

• Working with Git symbolic links and Pentaho

• Monitoring Pentaho BI Server with Pentaho CE Audit and a performance monitoring plugin

Link to register for the event:

Saturday, August 29, 2015

Beautiful video made by Toddynho about the family

How to install Android Studio on Mac OS X 10.11 (Beta)

Visit the link

Installing Android Studio

Android Studio provides everything you need to start developing apps for Android, including the Android Studio IDE and the Android SDK tools.

If you didn't download Android Studio, go download Android Studio now, or switch to the stand-alone SDK Tools install instructions.

Before you set up Android Studio, be sure you have installed JDK 6 or higher (the JRE alone is not sufficient)—JDK 7 is required when developing for Android 5.0 and higher. To check if you have JDK installed (and which version), open a terminal and type javac -version. If the JDK is not available or the version is lower than 6, go download JDK.

[ Show instructions for all platforms ]
To set up Android Studio on Mac OS X:

Launch the .dmg file you just downloaded.
Drag and drop Android Studio into the Applications folder.
Open Android Studio and follow the setup wizard to install any necessary SDK tools.
Depending on your security settings, when you attempt to open Android Studio, you might see a warning that says the package is damaged and should be moved to the trash. If this happens, go to System Preferences > Security & Privacy and under Allow applications downloaded from, select Anywhere. Then open Android Studio again.
If you need to use the Android SDK tools from a command line, you can access them at:


Android Studio is now ready and loaded with the Android developer tools, but there are still a couple packages you should add to make your Android SDK complete.

See the screens below to help you understand better the whole process.

Saturday, August 22, 2015

Free and Open Source Easy Reimbursement Platform

If your travels take up much of your agenda, and controlling your travel expenses consumes most of the time you have left, do not waste any more time and try Easy Reimbursement now!
Using a cell phone, users record their expenses.

Visit our project webpage at

The first free and open source reimbursement management platform in the World
It is the first free and open source reimbursement management platform in the World, launched officially on Google Play on March 8, 2011.

All components of the Easy Reimbursement Platform are free and open source.

Easy Reimbursement Free
Easy Reimbursement Free is an Android app that helps people manage their travel expenses.

Download it from Google Play. The link is:

Learn more about it:
Visit the link:

Reembolso Facil at | Tech Tudo

Reembolso Fácil Twitter:

Caio Moreno de Souza (twitter: @caiomsouza)
Fausto Koga

Monday, August 10, 2015

Slidify Demo

Tuesday, July 14, 2015

Logstash | Collect, Enrich & Transport Data

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (like, for searching). If you store them in Elasticsearch, you can view and analyze them with Kibana.

It is fully free and fully open source. The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.

For more info, see

Monday, July 13, 2015

Telefonica Smart Steps

Smart Steps is an Insights solution that uses anonymous and aggregated mobile data to help organizations make better business decisions based on actual behaviour.

Saturday, July 04, 2015

Dunbar's number x Our Facebook "Friends/connections"

How can we have so many "Friends" on Facebook?

Are they really our "friends", or just connections like the ones we have on LinkedIn, Twitter or other social networks?

Read the text below and think about it.

Dunbar's number

Dunbar's number is a suggested cognitive limit to the number of people with whom one can maintain stable social relationships. These are relationships in which an individual knows who each person is and how each person relates to every other person.[1][2][3][4][5][6] This number was first proposed in the 1990s by British anthropologist Robin Dunbar, who found a correlation between primate brain size and average social group size.[7] By using the average human brain size and extrapolating from the results of primates, he proposed that humans can only comfortably maintain 150 stable relationships.[8] Proponents assert that numbers larger than this generally require more restrictive rules, laws, and enforced norms to maintain a stable, cohesive group. It has been proposed to lie between 100 and 250, with a commonly used value of 150.[9][10] Dunbar's number states the number of people one knows and keeps social contact with, and it does not include the number of people known personally with a ceased social relationship, nor people just generally known with a lack of persistent social relationship, a number which might be much higher and likely depends on long-term memory size.

Dunbar theorized that "this limit is a direct function of relative neocortex size, and that this in turn limits group size ... the limit imposed by neocortical processing capacity is simply on the number of individuals with whom a stable inter-personal relationship can be maintained." On the periphery, the number also includes past colleagues, such as high school friends, with whom a person would want to reacquaint themself if they met again.[11]


Tuesday, June 30, 2015

Hi folks,

Do you need to work with PDF files?

You need to use this website called

Saturday, June 27, 2015

How to install GitBook using NPM

Visit the website:

Type on your terminal:

npm install gitbook-cli -g


Caios-MacBook-Pro:thedatasciencenotebook caiomsouza$ npm install gitbook-cli -g
/usr/local/bin/gitbook -> /usr/local/lib/node_modules/gitbook-cli/bin/gitbook.js
gitbook-cli@0.3.4 /usr/local/lib/node_modules/gitbook-cli
├── bash-color@0.0.3
├── user-home@1.1.1
├── commander@2.6.0
├── tmp@0.0.23
├── q@1.0.1
├── semver@2.2.1
├── lodash@2.4.1
├── npmi@0.1.1 (semver@4.3.6)
├── optimist@0.6.1 (wordwrap@0.0.3, minimist@0.0.10)
├── npm@2.4.1
└── fs-extra@0.15.0 (jsonfile@2.2.1, graceful-fs@3.0.8, rimraf@2.4.0)

How to Install Node.js and NPM on a Mac

Visit the website:

Type on your terminal:

brew install node


Caios-MacBook-Pro:thedatasciencenotebook caiomsouza$ brew install node
==> Downloading
######################################################################## 100.0%
==> Pouring node-0.12.5.yosemite.bottle.tar.gz
==> Caveats

Bash completion has been installed to:
==> Summary
🍺  /usr/local/Cellar/node/0.12.5: 2681 files, 29M
Caios-MacBook-Pro:thedatasciencenotebook caiomsouza$ node -v
Caios-MacBook-Pro:thedatasciencenotebook caiomsouza$ npm -v

How to Install Homebrew on Mac OS

Read the website

Type on your terminal:

ruby -e "$(curl -fsSL"


Caios-MacBook-Pro:thedatasciencenotebook caiomsouza$ ruby -e "$(curl -fsSL"
==> This script will install:
==> The following directories will be made group writable:
==> The following directories will have their group set to admin:

Press RETURN to continue or any other key to abort
==> /usr/bin/sudo /bin/chmod g+rwx /usr/local/. /usr/local/include /usr/local/lib /usr/local/lib/pkgconfig /usr/local/share /usr/local/share/man /usr/local/share/man/man1
==> /usr/bin/sudo /usr/bin/chgrp admin /usr/local/. /usr/local/include /usr/local/lib /usr/local/lib/pkgconfig /usr/local/share /usr/local/share/man /usr/local/share/man/man1
==> /usr/bin/sudo /bin/mkdir /Library/Caches/Homebrew
==> /usr/bin/sudo /bin/chmod g+rwx /Library/Caches/Homebrew
==> Downloading and installing Homebrew...
remote: Counting objects: 3641, done.
remote: Compressing objects: 100% (3474/3474), done.
remote: Total 3641 (delta 36), reused 726 (delta 26), pack-reused 0
Receiving objects: 100% (3641/3641), 2.94 MiB | 0 bytes/s, done.
Resolving deltas: 100% (36/36), done.
 * [new branch]      master     -> origin/master
HEAD is now at a1ad7ee dynamips: update homepage
==> Installation successful!
==> Next steps
Run `brew help` to get started

Friday, June 26, 2015

Atom 1.0 - A hackable text editor for the 21st Century

Atom is a text editor that's modern, approachable, yet hackable to the core—a tool you can customize to do anything but also use productively without ever touching a config file.

Monday, June 22, 2015

adminpackage4r - Admin Package For R is an easy way to manage your packages in R

Hi Folks,

This weekend I created Admin Package for R, it is still version 0.1 but maybe it will help you.

# Load adminpackage4r
# Specify the list of required packages to be installed and load    
Required_Packages=c("ggplot2", "Rcpp", "plyr", "sqldf");

# Call the Function

Big Data ¿Navegar o naufragar en un mar de datos?

Friday, June 19, 2015

Kylin integration into Pentaho's Business Analytics Platform


Pre-installation Requirements

  • Pentaho's Business Analytics Platform (Community Edition):
  • Installed Saiku from Marketplace
  • Cloudera 5.3 VM with TCP Port-Forwarding from 7071 (Host) to 7070 (Guest)
  • Kylin 0.6.4 installed on your VM with at least one successfully built Cube. Kylin has to run on Port 7070. For more information see

See the link below to more details:

Sunday, June 14, 2015

Raffaello D'Andrea: The astounding athletic power of quadcopters

Build Your Own Drone

How To: Set Up Your DJI Drone | Phantom - F450 - NAZA - Installing Software

Setting Up Your DJI Drone with Naza Assistant Software | Phantom, Naza Flight Controller

How To: Make A Drone (Quadcopter)

Here are the specs of the build:
HJ450 Frame - Black and White Arms (DJI F450 Look Alike) -
Naza M-V2 w/ GPS and PMU -
Spektrum DX7s 2.4 GHz TX -
Spekrtum AR8000 2.4 GHz RX - Came with DX7s TX
Hobby King 30 Amp ESCs -
Cheetah 2217-08 Motors (1100kV, 200W) -

Saturday, June 13, 2015

How to install python-louvain 0.3 (Louvain algorithm for community detection) on Mac OS

Install community library:

Louvain algorithm for community detection

1) download from:

2) Unzip python-louvain-0.3.tar.gz

3) Run in the terminal the command sudo python install inside the unzipped folder called python-louvain-0.3

4) Restart ipython notebook

5) Try it.

Last login: Sat Jun 13 10:25:49 on ttys000
Caios-MacBook-Pro:u-tad caiomsouza$ sudo python install
python: can't open file '': [Errno 2] No such file or directory
Caios-MacBook-Pro:u-tad caiomsouza$ ls
Mod1            Mod15            Mod5            contributors.txt
Mod10            Mod16            Mod6            material-internet
Mod11            Mod17            Mod7            planning_EDS_2ED.pdf
Mod12            Mod2            Mod8
Mod13            Mod3            Mod9
Mod14            Mod4            actividades-utad.xlsx
Caios-MacBook-Pro:u-tad caiomsouza$ cd Mod9/
Caios-MacBook-Pro:Mod9 caiomsouza$ ls
GD_M09_Grafos_SNA.pdf    datasets        python-lib        slides
Caios-MacBook-Pro:Mod9 caiomsouza$ cd python-lib/
Caios-MacBook-Pro:python-lib caiomsouza$ ls
Caios-MacBook-Pro:python-lib caiomsouza$ ls
python-louvain-0.3        python-louvain-0.3.tar.gz
Caios-MacBook-Pro:python-lib caiomsouza$ cd python-louvain-0.3
Caios-MacBook-Pro:python-louvain-0.3 caiomsouza$ ls
PKG-INFO        community        setup.cfg
README            python_louvain.egg-info
Caios-MacBook-Pro:python-louvain-0.3 caiomsouza$ sudo python install


Caios-MacBook-Pro:python-louvain-0.3 caiomsouza$ sudo python install
running install
/Users/caiomsouza/anaconda/lib/python2.7/site-packages/setuptools-14.3-py2.7.egg/pkg_resources/ PEP440Warning: 'llvmlite (0.2.2-1-gbcb15be)' is being parsed as a legacy, non PEP 440, version. You may find odd behavior and sort order. In particular it will be sorted as less than 0.0. It is recommend to migrate to PEP 440 compatible versions.
running bdist_egg
running egg_info
writing requirements to python_louvain.egg-info/requires.txt
writing python_louvain.egg-info/PKG-INFO
writing top-level names to python_louvain.egg-info/top_level.txt
writing dependency_links to python_louvain.egg-info/dependency_links.txt
writing entry points to python_louvain.egg-info/entry_points.txt
reading manifest file 'python_louvain.egg-info/SOURCES.txt'
writing manifest file 'python_louvain.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.5-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/community
copying community/ -> build/lib/community
creating build/bdist.macosx-10.5-x86_64
creating build/bdist.macosx-10.5-x86_64/egg
creating build/bdist.macosx-10.5-x86_64/egg/community
copying build/lib/community/ -> build/bdist.macosx-10.5-x86_64/egg/community
byte-compiling build/bdist.macosx-10.5-x86_64/egg/community/ to __init__.pyc
creating build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/PKG-INFO -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/SOURCES.txt -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/dependency_links.txt -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/entry_points.txt -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/requires.txt -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
copying python_louvain.egg-info/top_level.txt -> build/bdist.macosx-10.5-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/python_louvain-0.3-py2.7.egg' and adding 'build/bdist.macosx-10.5-x86_64/egg' to it
removing 'build/bdist.macosx-10.5-x86_64/egg' (and everything under it)
Processing python_louvain-0.3-py2.7.egg
Copying python_louvain-0.3-py2.7.egg to /Users/caiomsouza/anaconda/lib/python2.7/site-packages
Adding python-louvain 0.3 to easy-install.pth file
Installing community script to /Users/caiomsouza/anaconda/bin

Installed /Users/caiomsouza/anaconda/lib/python2.7/site-packages/python_louvain-0.3-py2.7.egg
Processing dependencies for python-louvain==0.3
Searching for networkx==1.9.1
Best match: networkx 1.9.1
Adding networkx 1.9.1 to easy-install.pth file

Using /Users/caiomsouza/anaconda/lib/python2.7/site-packages
Finished processing dependencies for python-louvain==0.3

Note for Linux Ubuntu Users:
If you are using Ubuntu Linux, you may need to install setuptools.

export PATH=/opt/anaconda/bin/:$PATH;
ipython notebook