Saturday, April 10, 2010
Agile without fast tools ain't agile: Tuning our performance
As a general rule, I treat such requests as a nice way to test and optimize the performance of our reporting engine. I focus primarily on making small, well, human-readable reports fast. Making the huge ones faster is OK, and when I get the chance, I happily optimize that as well. But if I have to choose between making smaller reports slower and making the insane ones slightly faster, then I happily resist the change. After all, whether your CPU burns 5 hours or 4 hours does not matter if you are not going to look at the report until 9 hours later. But waiting 10 seconds instead of 20 seconds for a report during your work day surely makes all the difference.
During the last few weeks, I once again had such a case. The customer needed to produce a large-scale report, probably just to fulfil some ill-thought-out government regulations. But for some reason, the report constantly failed with OutOfMemoryErrors.
(Yes, this is the moment where a support contract comes in really handy. ;) )
Memory management is usually a rather critical issue. For our reporting engine, it is even more critical, as this engine is based on the idea that all reporting problems can be solved in the available memory, without making a mess in your temp directory. Actually, I'm way too old to believe in the "throw more memory/CPU/disk-space/nodes" myth. If you can solve the problem efficiently in an embedded-systems scenario, you can always scale up. But if you assume everyone has a high-end system, your code probably won't scale down that nicely.
So OK, we are an all-in-memory engine, and I want to keep things like that for a while. Therefore I work with an assumed limit of 128MB for normal reports (<3000 pages) and 512MB for anything else.* Lower memory consumption per report run means you can run more reports on your server at the same time. People seem to like that idea.
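To make those budgets concrete, here is a minimal sketch of how one could check a JVM's heap against them before starting a report run. The class `HeapBudget` and its methods are hypothetical names made up for this illustration and are not part of the Pentaho Reporting API; only the 128MB/512MB figures and the 3000-page threshold come from the text above.

```java
public class HeapBudget {
    // Budgets quoted in the post: 128 MB for normal reports (< 3000 pages),
    // 512 MB for anything else. Everything else here is an assumption.
    static final long NORMAL_BUDGET_MB = 128;
    static final long LARGE_BUDGET_MB = 512;
    static final int NORMAL_PAGE_LIMIT = 3000;

    // Returns the assumed memory budget in megabytes for a report of the given size.
    static long budgetMegabytes(int pageCount) {
        return pageCount < NORMAL_PAGE_LIMIT ? NORMAL_BUDGET_MB : LARGE_BUDGET_MB;
    }

    // True if the given heap size (in bytes) is at least as large as the budget.
    static boolean fitsInHeap(int pageCount, long maxHeapBytes) {
        return budgetMegabytes(pageCount) * 1024L * 1024L <= maxHeapBytes;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most heap the JVM will try to use (see -Xmx).
        long maxHeap = Runtime.getRuntime().maxMemory();
        int pages = 2500; // example value, not from the post
        System.out.println("Budget for " + pages + " pages: "
                + budgetMegabytes(pages) + " MB, fits in current heap: "
                + fitsInHeap(pages, maxHeap));
    }
}
```

Starting the JVM with `-Xmx128m` or `-Xmx512m` is the usual way to reproduce such a limit while profiling.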
After digging through the case and running a sample report, I discovered a couple of conditions where we started to pile up memory during report processing at a rather unhappy rate. During profiling, I also discovered a bunch of non-optimal (polite for: purely crappy) data structures I introduced years and years ago, which made the problem even worse. Oh, and the customer uses engine version 0.8.9, not my favourite place to spend my time either.
After loads of tests, loads of profiling, and loads of just waiting for results (ye olde MacBook ain't that fast), we are now at the happy spot of reporting success.
In 0.8.9 and 3.6.1, this report now runs within the 512MB barrier. It is not lightning fast, but it completes within 90 minutes here, and thus it is fast enough for a nightly batch processing run. (In 0.8.9, the table exports (HTML, CSV etc.) need a lot more memory and thus require access to a full 2GB heap. Luckily that condition had been fixed in the 3.5 codeline.)
In the 3.7 codeline, I eliminated the last few memory hogs and there the same report runs within the 128MB corset. As these changes required some non-trivial API changes, this is nothing I could sanely add to a bug-fix release.
An updated build of the 0.8.9 reporting engine can be found in our Hudson system. Be aware that you also have to replace libfonts with the version supplied there, as it contains other performance fixes (plus some API changes) we made earlier on for a different customer.
Hudson job: LEGACY_classic_engine_core_089_bugfix
While working on that issue, PRD-2579 came up. This case reports that report processing has been slower in the 3.5 versions than it was in the 0.8.9 versions. A bit of investigation showed that this is indeed the case and that we had better fix it before the higher CPU utilization causes more global warming.
The initial tests showed that PDF generation and print(-preview) were about 4 times faster in 3.5 than in 0.8.9-10. But HTML export was slow: 10 seconds in 0.8.9 vs 30 seconds in 3.5. As I tend to work primarily with the Swing preview or the PDF exports, I never noticed that part. BI-Server users tend to see more HTML exports than anything else, and there the slowdown matters.
Adding smarter caches solved the slowdown, which was originally caused by the fix for the table-export memory consumption problem from 0.8.9. In combination with some other performance fixes, our table export rendering speed is (nearly) back to where it was in the old days, while the PDF speed is faster than ever. (And ya can't complain about a 4x speedup!)
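The post does not say how those caches are built, so the following is only a generic illustration of the trade-off involved: a cache that wins back speed without reopening the memory problem has to be bounded. This sketch uses a plain least-recently-used cache on top of `java.util.LinkedHashMap`; the class name `BoundedCache` is made up for this example and is not taken from the engine's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: keeps at most maxEntries items and drops the
// least recently accessed one when a new entry would exceed that limit.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedCache(int maxEntries) {
        // Third argument 'true' switches the map to access order,
        // so get() moves an entry to the "most recently used" end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" is now the most recently used entry
        cache.put("c", 3);  // exceeds the limit, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The fixed upper bound is the point of the design: memory use stays predictable no matter how large the report gets, at the cost of recomputing entries that fall out of the cache.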
Right now, I'm busy making more bug-fixes for the 3.6.1 release, which is at the moment scheduled for April 22 (.. this year).
Pentaho Reporting 3.7 with the new drill-linking API should be out in the wild within Q2-2010.
As the 3.7-codeline is currently a bit "funny", you might want to check out the 3.6-branch CI-builds instead.
* Subject to change if I ever get access to the BORG cluster. You will be assimilated, but I will have all the CPU time in the world. :)
This blog is brought to you by
I am the software designer and lead developer for the Pentaho Reporting Engine and the Pentaho Report Designer. I started writing the reporting engine 10 years ago and with the help of a great community we formed it into a product that is used in large and small companies around the world.