Performance tuning

This page collects tips for improving the performance of the Pentaho BI server so that your reports run faster.

Before you proceed, you might want to consider some words of wisdom from Matt Casters, Chief of Data Integration at Pentaho:

"On a higher level please remember that reports that need that much memory are not providing any information anyway."

Select appropriate components

Pentaho works with other components to deliver a full BI infrastructure. Some things to consider are:

  • Hardware
    • Have you got adequate RAM, CPU, disc types, disc space, bandwidth, etc.?
    • Is the box dedicated to Pentaho, or shared with other services (virus scanning, network services, etc.)?
    • Are the databases on the same machine as Pentaho?
  • Architecture
    • Use software optimised for the architecture you are using, e.g. a 64-bit OS and application stack on 64-bit CPUs
  • Operating system
    • What is optimal for your environment and skills?
    • We have had the best results with Linux
  • Database
    • The Hypersonic version of Pentaho is intended for evaluation and is not suitable for production use
    • MySQL works well for most loads, but for larger volumes you might wish to consider something else, like PostgreSQL
    • We strongly recommend that your reports run from a data warehouse (not your transactional databases) optimised for the kinds of reports that you wish to run
  • Types of reports and queries to run
    • Some kinds of reports (especially those which use Mondrian) are more intensive than others

Database — MySQL

These tips apply to MySQL, but equivalents exist for most other major databases.

The settings should go into your MySQL configuration file, normally at /etc/my.cnf or /etc/mysql/my.cnf

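  1. Enable the slow query log
    slow_query_log=1
    slow_query_log_file=/path/to/logfile
    log_queries_not_using_indexes=1
    long_query_time=10
    (On MySQL versions before 5.1.29, use log_slow_queries=/path/to/logfile in place of the first two lines.)
  2. Enable the query cache
    query_cache_size=1000M
  3. Raise the maximum size of a single result set that may be cached
    query_cache_limit=100M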
The amount of memory you give to the cache should be related to how large your result sets are. For best results, make sure the cache is large enough to hold the results of the majority of your queries.
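
One way to judge whether the cache is large enough is to look at its hit rate. The following is a minimal shell sketch and not part of the original tips. The function name qcache_hit_ratio is made up for illustration, the counter values in the sample run are invented, and the sketch assumes a MySQL version that still includes the query cache (the feature was removed in MySQL 8.0). It reads tab-separated "name value" lines, as produced by SHOW GLOBAL STATUS, and prints the percentage of SELECT statements that were answered from the cache.

```shell
# Hypothetical helper: print the query cache hit rate as a percentage.
# Expects lines of the form "Variable_name<TAB>Value" on standard input.
qcache_hit_ratio() {
    awk '
        $1 == "Qcache_hits" { hits = $2 }      # SELECTs served from the cache
        $1 == "Com_select"  { selects = $2 }   # SELECTs that were actually executed
        END {
            total = hits + selects
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hits / total
        }'
}

# Sample run with invented counters, so the sketch works without a server.
sample_ratio=$(printf 'Qcache_hits\t750\nCom_select\t250\n' | qcache_hit_ratio)
echo "query cache hit rate: ${sample_ratio}%"

# Against a live server the call would look like this (credentials omitted):
#   mysql -N -B -e "SHOW GLOBAL STATUS" | qcache_hit_ratio
```

A persistently low hit rate suggests that the cache is too small, that result sets exceed query_cache_limit, or that the workload simply does not repeat queries often enough to benefit.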

Java

See Java.

Mondrian

See Mondrian.

Scheduler settings

Sometimes it can help to tune the schedulers in the operating system itself. The following instructions are for Linux.

You can set the CPU priority with the 'nice' and 'renice' commands. You might want to decrease Pentaho's priority to prevent it from stealing CPU time from other processes. Alternatively, you might want to lower the priority of another process to give Pentaho a bigger share, or raise Pentaho's priority to increase its responsiveness. Note that raising a priority might slow down other important processes, so use it with care.
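
As a rough illustration of both commands, here is a minimal shell sketch. It does not start Pentaho: 'nice' with no arguments simply prints the niceness it runs at, and a throwaway 'sleep' stands in for the server process, so substitute your own start command and PID. The niceness values 10 and 5 are arbitrary choices for the example.

```shell
# Niceness of the current shell (0 unless it was changed earlier).
base=$(nice)

# Start a command at lower priority: a higher niceness means a lower priority.
# The inner "nice" is a stand-in for the real server start command.
lowered=$(nice -n 10 nice)
echo "shell niceness: $base, child niceness: $lowered"

# Lower the priority of a process that is already running.
# The background "sleep" is a stand-in for the server's PID.
sleep 30 &
pid=$!
renice -n 5 -p "$pid"
kill "$pid"
```

Lowering a priority needs no special rights, but raising one (a negative niceness) normally requires root.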

'ionice' is the equivalent for I/O scheduling. If disk I/O is the bottleneck, some tuning might be in order.
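
The sketch below shows the general shape of an 'ionice' call, under a few assumptions: the inner 'ionice' (which reports its own I/O class when run without arguments) stands in for a real report or ETL job, and the "unsupported" fallback is an addition for environments where the kernel or container refuses the request. I/O classes only take effect with I/O schedulers that honour them, such as CFQ or BFQ.

```shell
# Run a command in the "idle" class (-c 3): it only gets disk time when no
# other process is asking for it.
io_class=$(ionice -c 3 ionice 2>/dev/null || echo unsupported)
echo "I/O class of the child: $io_class"

# For a process that is already running, pass its PID instead, for example
# best-effort class (-c 2) at the lowest priority level (-n 7):
#   ionice -c 2 -n 7 -p <pid>
```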

Using 'taskset', you can restrict a process to specific CPUs/cores.
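
A minimal sketch of pinning a command to one CPU follows. It assumes a Linux system with /proc mounted, and it uses 'nproc' (which prints how many CPUs the calling process may use) as a stand-in for the real server command. The variable names are illustrative only.

```shell
# Pick the first CPU this shell is allowed to use; inside containers or
# cpusets that is not necessarily CPU 0.
first_cpu=$(awk '/^Cpus_allowed_list:/ { print $2 }' /proc/self/status \
    | cut -d, -f1 | cut -d- -f1)

# Run a command restricted to that single CPU.
usable=$(taskset -c "$first_cpu" nproc)
echo "CPUs usable when pinned to CPU $first_cpu: $usable"

# To change the affinity of a process that is already running:
#   taskset -cp <cpu-list> <pid>
```

Restricting the server to a subset of cores leaves the remaining cores free for other services, at the cost of limiting how much parallel work the server can do.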

Another useful scenario for these utilities is on a development workstation. You can lower the priority of Kettle or the Pentaho server to prevent intense ETL or reporting operations from reducing responsiveness of desktop applications.

last modified by sd on 04/04/2009 at 16:07

Creator: sd on 2008/08/08 14:49