Improve the performance of your OLAP cubes by creating aggregate tables in Pentaho with PAD (Pentaho Aggregation Designer)

Dear reader,

A while ago I wrote the document "Improve the performance of your OLAP cubes by creating aggregate tables in Pentaho", and I would like to share it.

This document is also available on Google Docs.

What are aggregate tables?

Aggregate tables are summary tables that store data at a higher level of aggregation than the level at which it was originally captured and stored.
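
To make the idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory database. The table and column names (sales_fact, agg_sales_by_year, sale_year, amount_sum) are hypothetical and chosen only for illustration; they are not names that PAD or Mondrian generates. The sketch builds a small fact table, summarizes it by year into an aggregate table, and answers the same yearly question from both.

```python
import sqlite3


def build_demo():
    """Create a tiny fact table and a per-year aggregate table in memory."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical fact table: one row per individual sale.
    conn.execute(
        "CREATE TABLE sales_fact (sale_date TEXT, product_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [
            ("2009-01-15", 1, 10.0),
            ("2009-06-30", 2, 20.0),
            ("2010-02-01", 1, 5.0),
            ("2010-03-12", 2, 7.5),
        ],
    )
    # Hypothetical aggregate table: one row per year, measures pre-summed.
    conn.execute(
        """
        CREATE TABLE agg_sales_by_year AS
        SELECT substr(sale_date, 1, 4) AS sale_year,
               SUM(amount)             AS amount_sum
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 4)
        """
    )
    return conn


def totals_from_fact(conn):
    """Yearly totals computed by aggregating every fact row at query time."""
    return conn.execute(
        """
        SELECT substr(sale_date, 1, 4) AS sale_year, SUM(amount)
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 4)
        ORDER BY sale_year
        """
    ).fetchall()


def totals_from_aggregate(conn):
    """The same yearly totals, read directly from the smaller aggregate table."""
    return conn.execute(
        "SELECT sale_year, amount_sum FROM agg_sales_by_year ORDER BY sale_year"
    ).fetchall()
```

Both functions return the same rows; the difference is that the second one reads one row per year instead of scanning every sale, which is where the performance gain comes from on large fact tables.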

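Why do I need to create aggregate tables?

Aggregate tables are created to improve the performance of an OLAP cube: summary-level queries can be answered from a smaller, pre-aggregated table instead of aggregating the whole fact table every time.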

How do I create aggregate tables in Pentaho?

The quickest and easiest way to create aggregate tables in Pentaho is to use PAD (Pentaho Aggregation Designer).

What is PAD (Pentaho Aggregation Designer)?

A graphical tool, written in Java, for creating aggregate tables.

Where do I download PAD (Pentaho Aggregation Designer)?

At the time of writing, the latest stable version of PAD is 1.2.0.

To download it, use the link below:

http://sourceforge.net/projects/mondrian/files/aggregation%20designer/1.2.0-stable/pad-ce-1.2.0-stable.tar.gz/download

Other versions of PAD are available at the link below:

http://sourceforge.net/projects/mondrian/files/

How do I configure Mondrian to recognize the aggregate tables?

You need to tell the Mondrian OLAP server that the aggregate tables exist. To do so, add the lines below to the mondrian.properties file located in pentaho-solutions\system\mondrian (BI Server 3.5):

mondrian.rolap.aggregates.Use=true
mondrian.rolap.aggregates.Read=true

Once that is done, restart the BI Server.
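
Before restarting, it can be worth double-checking that both flags really are set. Below is a minimal Python sketch, not part of Pentaho, that parses simple key=value lines and reports which of the two flags are missing or not set to true. The function names are hypothetical, and the parser deliberately ignores advanced .properties features such as line continuations and escape sequences.

```python
# The two properties this post asks you to add to mondrian.properties.
REQUIRED_FLAGS = (
    "mondrian.rolap.aggregates.Use",
    "mondrian.rolap.aggregates.Read",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def missing_aggregate_flags(text):
    """Return the required flags that are absent or not set to 'true'.

    The comparison is case-insensitive, which is a choice made for this
    sketch rather than a statement about how Mondrian reads the value.
    """
    props = parse_properties(text)
    return [
        name
        for name in REQUIRED_FLAGS
        if props.get(name, "").lower() != "true"
    ]
```

Pass it the contents of your mondrian.properties file; an empty list means both flags are present and set to true.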

How do I enable MDX and SQL logging in BI Server 3.5?

When an MDX query is executed, Mondrian translates it into SQL. In some cases you need more detail, for example to find out whether Mondrian is actually using the aggregate tables.

To get that detail:

Edit the log4j.xml file located in the \tomcat\webapps\pentaho\WEB-INF\classes folder.

Uncomment the lines below:

<!-- ========================================================= -->
<!-- Special Log File specifically for Mondrian -->
<!-- ========================================================= -->

<appender name="MONDRIAN" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="mondrian.log"/>
  <param name="Append" value="false"/>
  <param name="MaxFileSize" value="500KB"/>
  <param name="MaxBackupIndex" value="1"/>

  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
  </layout>
</appender>

<category name="mondrian">
  <priority value="DEBUG"/>
  <appender-ref ref="MONDRIAN"/>
</category>

<!-- ========================================================= -->
<!-- Special Log File specifically for Mondrian MDX Statements -->
<!-- ========================================================= -->

<appender name="MDXLOG" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="mondrian_mdx.log"/>
  <param name="Append" value="false"/>
  <param name="MaxFileSize" value="500KB"/>
  <param name="MaxBackupIndex" value="1"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
  </layout>
</appender>

<category name="mondrian.mdx">
  <priority value="DEBUG"/>
  <appender-ref ref="MDXLOG"/>
</category>

<!-- ========================================================= -->
<!-- Special Log File specifically for Mondrian SQL Statements -->
<!-- ========================================================= -->

<appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="mondrian_sql.log"/>
  <param name="Append" value="false"/>
  <param name="MaxFileSize" value="500KB"/>
  <param name="MaxBackupIndex" value="1"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
  </layout>
</appender>

<category name="mondrian.sql">
  <priority value="DEBUG"/>
  <appender-ref ref="SQLLOG"/>
</category>

If you also want the generated SQL to be shown in a formatted, easier-to-read layout, add the line below to the mondrian.properties file located in the pentaho-solutions\system\mondrian folder:

mondrian.rolap.generate.formatted.sql=true

Restart the server and look for the files mondrian.log, mondrian_sql.log and mondrian_mdx.log in the /tomcat/bin folder.
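
Once the log files exist, one rough way to see whether the aggregate tables are being used is to search mondrian_sql.log for their names. The Python sketch below does just that. The function name and the table names used in the usage note are made up for illustration, and the sketch makes no assumption about Mondrian's log layout beyond it being plain text with one or more SQL fragments per line.

```python
import re


def count_table_mentions(log_text, table_names):
    """Count, for each table name, how many log lines mention it.

    A name only matches as a whole word, so 'agg_sales' does not match
    inside 'agg_sales_by_year'. Matching is case-insensitive.
    """
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in table_names
    }
    counts = {name: 0 for name in table_names}
    for line in log_text.splitlines():
        for name, pattern in patterns.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

For example, after reading mondrian_sql.log into a string, calling count_table_mentions(log_text, ["agg_sales_by_year", "sales_fact"]) (both names hypothetical) returns a dictionary of line counts per table. A count of zero for every aggregate table after running a summary-level query suggests that Mondrian is still reading from the fact table.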

How do I check whether the query is actually being executed in PostgreSQL?

If your DW (data warehouse) runs on PostgreSQL, one option is to enable PostgreSQL statement logging.

Add the line below to the postgresql.conf file (it is probably around line 354):
log_statement = 'all'

Restart PostgreSQL.

Related links

The links I found are listed below:

http://mondrian.pentaho.com/documentation/schema.php

http://sourceforge.net/projects/mondrian/files/

http://julianhyde.blogspot.com/2008/10/pentaho-20-brings-good-things.html

http://www.willgorman.com/?p=30

http://diethardsteiner.blogspot.com/2009/07/tutorial-aggregated-tables-for-mondrian_6998.html

The English text below was taken from the file Pentaho_ce_aggregation_designer_UG_v1.0.pdf (Pentaho's documentation on PAD).

Pentaho Aggregation Designer Overview

The Pentaho Aggregation Designer simplifies the creation and deployment of aggregate tables that improve the performance of your Pentaho Analysis (Mondrian) OLAP cubes. Pentaho Analysis is a pure, relational OLAP engine that works solely with the data stored in your relational database rather than providing its own multidimensional data storage model. This simplifies deployment and data management, but places limitations on performance when working with very large data sets (fact tables with more than 10 million records and/or cubes with a high cardinality of levels and members). To improve performance in these scenarios, Pentaho Analysis supports aggregate tables. Aggregate tables coexist with the base fact table and contain pre-aggregated measures built from the fact table. This improves performance by enabling the Mondrian engine to fulfill certain summary level queries from the smaller aggregate table versus aggregating a large number of individual facts from the base fact table.

The Pentaho Aggregation Designer provides you with a simple interface that allows you to create aggregate tables from levels within the dimensions you specify. Based on these selections, the Aggregation Designer generates the Data Definition Language (DDL) for creating the aggregate tables, the Data Manipulation Language (DML) for populating them, and an updated Mondrian schema which references the new aggregate tables. If you are unfamiliar with aggregate table design concepts, the Aggregation Designer also includes an intelligent adviser that evaluates the structure and cardinality of your OLAP cube and recommends some initial aggregate tables to create for improving performance.

PAD - Installation Instructions

The pad-open-1.0-xx.zip file contains all the libraries and script files necessary to run Pentaho Aggregation Designer. To install the Pentaho Aggregation Designer, unzip this file into a directory of your choice.

To launch the Aggregation Designer on Windows...

Run the startaggregationdesigner.bat script located in the root of your installation directory.

To launch the Aggregation Designer on Linux...

Run the startaggregationdesigner.sh script located in the root of your installation directory.

CAUTION: Place your JDBC driver JARs in the Drivers directory. Once in this directory, the drivers are added to the classpath automatically when the Pentaho Aggregation Designer starts.