The Pentaho Reporting Camp

Project news and updates directly from the source


What is Pentaho Reporting

Pentaho Reporting is a suite of open-source reporting tools which allows you to create relational and analytical reports from a wide range of data-sources.

 

The Pentaho Reporting Engine can create PDF, Excel, HTML, plain-text, Rich Text Format (RTF), XML and CSV outputs of your data. Our OpenFormula/Excel-formula expressions help you create more dynamic reports exactly the way you want them. Our open architecture, powerful API and extension points make sure this system can grow with your requirements.


Tuesday, July 21, 2009

Charting in Citrus - hardened and extended Chart-Expressions

As we will reach the RC stage within the next week or so, I think it's time to shed some light on the new stuff coming to our charting.



Charting in Pentaho Reporting was always a very delicate matter. Charting was (and is) implemented via a dual system of Chart-Expressions and Data-Collector functions. The data-collectors are responsible for producing the datasets, while the chart-expressions consume the datasets and produce a JFreeChart object. As JFreeChart objects are Drawable implementations, the engine can render them as vector images for high-quality output.
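To make the dual system a bit more tangible, here is a minimal, self-contained sketch of the two stages. It is not the engine's code: the names `ChartPipelineSketch`, `CategoryCollector`, `advance` and `renderBars` are made up for illustration, and a plain text bar chart stands in for the JFreeChart object a real chart-expression would produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the collector/expression split; all names are assumptions.
final class ChartPipelineSketch {

    /** Stage 1: the "data-collector" builds up a dataset while rows are processed. */
    static final class CategoryCollector {
        private final Map<String, Double> dataset = new LinkedHashMap<>();

        /** Called once per report row; sums the values of each category. */
        void advance(String category, double value) {
            dataset.merge(category, value, Double::sum);
        }

        Map<String, Double> getDataset() {
            return dataset;
        }
    }

    /**
     * Stage 2: the "chart-expression" consumes the dataset and produces the chart.
     * A text rendering stands in for the JFreeChart object here.
     */
    static String renderBars(Map<String, Double> dataset) {
        StringBuilder chart = new StringBuilder();
        for (Map.Entry<String, Double> entry : dataset.entrySet()) {
            chart.append(entry.getKey()).append(' ');
            int length = (int) Math.round(entry.getValue());
            for (int i = 0; i < length; i++) {
                chart.append('#');
            }
            chart.append('\n');
        }
        return chart.toString();
    }

    public static void main(String[] args) {
        CategoryCollector collector = new CategoryCollector();
        collector.advance("North", 3);
        collector.advance("South", 2);
        collector.advance("North", 1);
        System.out.print(renderBars(collector.getDataset()));
    }
}
```

The point of the split is that the collector knows nothing about rendering and the expression knows nothing about where the rows came from.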



But until recently, there was a creepy pattern when using our chart-related expressions. Make an error in the data-collector configuration, and you are killed. Define an invalid value - killed. Null values - killed. An invalid property in the chart-expression? Kaboom. Killed big time. An invalid combination of otherwise valid properties? Bloody nightmare. You are dead. Over time, we caused more dead developers than the browser wars at the beginning of the century. Many of these bloody incidents would have been avoidable by just applying defensive programming patterns. The old code was generally unfriendly and rough. Wild casts and missing checks for invalid values or null pointers caused all kinds of funny exceptions. When asked what was wrong with a certain report-definition, I usually fired up the debugger to understand why a chart-definition that looked perfectly sane from the outside failed to produce a chart. (Yeah, I could have resorted to guessing, but you can't come up with the illogical and arcane rules needed to understand what's going on.)



As the Citrus release is our big overhaul, everything is under revision and no back-pocket of insanity is left untouched. So it was not surprising to find charting on the list of things to work on.



Now when creating new charts in Citrus, the first thing you will notice is that the beast (yes, beast!) no longer bites. The majority of properties have been hardened against invalid or missing values, and the once arcane "free-form-text" properties are now equipped with real property-editors to guide you towards the few acceptable inputs.



When you create a chart, you first have to set up the data-collector. There are currently five types available:





  • a Categorical-DataSet-Collector, for bar charts,

  • a Pivoting Categorical-DataSet-Collector, which reads series data by row instead of by column,

  • an XY-DataSet-Collector, for numeric data charts,

  • a TimeSeries-Collector, for plotting date values on the X-axis,

  • an XYZ-DataSet-Collector, for 3-dimensional charts.




Thanks to the backward-compatibility promise, we cannot go in and change the old collectors without breaking existing reports. So we decided that creating new collectors was the smartest option. May the old code rot in hell. In most cases, the new collectors produce the same results as the old ones. As part of the hardening, they react differently to invalid types and null values: they now ignore the invalid values instead of crashing the whole chart-creation process. The Pivoting-Collector was built from scratch, based on how the original one was expected to work. The collectors now have greatly reduced complexity and a smaller set of properties needed to configure them. And while we were at it, we removed the Number One troublemaker, the "summary-only" flag, as this can be safely deduced from other properties.
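The difference between "crash on bad input" and "ignore bad input" can be sketched in a few lines. This is only an illustration of the defensive pattern, not the actual collector code; the class and method names (`DefensiveCollectorSketch`, `collectUnchecked`, `collectDefensively`) are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the hardening idea; not the engine's collectors.
final class DefensiveCollectorSketch {

    /** Old style: wild casts and no null checks, so one bad value kills the chart. */
    static List<Number> collectUnchecked(List<Object> rawValues) {
        List<Number> result = new ArrayList<>();
        for (Object raw : rawValues) {
            Number number = (Number) raw; // ClassCastException for a String
            number.doubleValue();         // NullPointerException for a null value
            result.add(number);
        }
        return result;
    }

    /** Hardened style: invalid or missing values are skipped, the rest is kept. */
    static List<Number> collectDefensively(List<Object> rawValues) {
        List<Number> result = new ArrayList<>();
        if (rawValues == null) {
            return result;
        }
        for (Object raw : rawValues) {
            if (raw instanceof Number) { // false for null and for non-numeric types
                result.add((Number) raw);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> raw = Arrays.asList((Object) 1, null, "oops", 2.5);
        System.out.println(collectDefensively(raw).size() + " usable values");
    }
}
```

With the defensive variant, a dataset containing a stray null or text value still yields a chart from the remaining values instead of an exception.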



The Chart-Expressions themselves have seen a bit of change as well. WebDetails, the creators of the Community Dashboard Project, contributed logarithmic axes and a smarter way to generate human-readable labels for them. Roman Wild contributed a Radar-Chart implementation. And finally we went through our own support-cases bucket and addressed many of the issues reported there. As a result, Citrus-PRD now lets you configure the tick-mark generation on the axes, the Bar-Line chart can have a shared y-axis, and the percentage-mode on stacked-area-charts finally works too.



Oh, and the chart-editor looks a lot better now, thanks to the hard work of our UI-designer Brett.

Saturday, July 4, 2009

Citrus PRD and XActions: Redefining an out-worn relationship

From the early days of Pentaho's Platform, XActions were needed to get meaningful input out of the system. XActions started as sort-of workflow processes. But very soon (thanks to crappy work-flow engines and an ever-present need for always more flexibility) they evolved into a full-blown programming language with loops and conditions.



At the time Pentaho Reporting (back then still known as JFreeReport) was integrated, our reporting engine did not bother to provide datasources. As we started as a reporting engine for desktop applications, one of our assumptions was that these applications already have a working data-model and don't need yet another one forced down their throat. So we took a free ride on the application's table models rather than waste time reinventing the wheel.



With neither built-in datasources nor parameters, XActions had to do all of the data-preparation work.



When the first report-designer came into play, things started to get weird. The old report-designer shipped with its own report-definition format, which incorporated datasources, but was understood by neither the reporting engine nor the BI-Platform. To make reports run, they had to be exported (or as we called it: published) into the engine's native XML format and an XAction describing the datasources. This export was a one-way road: the resulting artefacts could not be safely edited by the report-designer.



Ever since those days, there has been a growing disparity between the capabilities of the report-designer's datasources, the datasources the reporting-engine supports natively, and the various data-supplying components the Pentaho-Platform can utilize. Once we started to integrate parameters into the reporting-engine, things drifted apart even more.




Problem zones



  1. Storing information in XActions is a one-way street


    The biggest problem we faced was the XActions themselves. XActions are insanely flexible (for a good reason) and highly expressive. But thanks to that power and flexibility, we cannot safely parse XActions back into datasources the report-designer could use. It is an interesting research area to interpret source code (and it really doesn't matter whether you write it in C-Syntax or XML) and to map all inputs to an output.



    So our report-definitions always have to contain datasource information so that we can edit that information later. No one wants to alternate between Report-Designer and Design-Studio all the time while editing reports.


  2. XActions duplicate information and XActions are user-editable


    As a result of the problems we have extracting information out of XActions and our need to keep our own datasource-information in the report-definitions, a few problems arise:



    Both the report-definitions and the XActions can be edited independently. As long as only the report-definition *or* only the XAction is edited, everything is fine. But that case is not very likely. The report-designer can always generate a new XAction, but this will erase all other changes made to the XAction. In return, the design-studio cannot update the report-definition when the data-components have been edited. Doing so would require full knowledge of what the XAction is going to do when being executed.



    So for most cases, the report-definition contains exactly the same information as XActions, but in a declarative style instead of a procedural programming language.



    This leads us to:


  3. Plain, auto-generated XActions have no added-value over report-definitions


    In Platform versions prior to Citrus, the majority of XActions are the auto-generated ones for reports that do not use parameters. For reports that need parameters, the XActions contain additional components to get the Platform's parameter UI in place and to validate and pass the parameters to the engine.



    The lack of built-in parameters in our reporting engine kept our support-department very busy. A non-technical person seldom has any love for programming in XML in a tool separate from the one they created their report in. They probably also don't like to create fake-queries to get the report-designer to work, nor do they like to battle with the data-components in the XAction later to make the queries there parametrizable.



    As with datasources, we need to store the parameter-information in the report-definition to make it editable later. So with the new capabilities of the Citrus-Engine, the built-in features of the reporting-engine cover a lot more of what previously required a second editing step.




  4. Only two major cases remain where a separate XAction proves valuable


    The first case arises when complex pre-processing of reports is needed. The reporting-engine's datasources are tailored to the simple "give me query, I give you data" case. Anything that cannot be expressed in a single script or requires multiple processing steps is better handled by a rich language like the one XActions provide. The second case comes up whenever the report itself is not the end-result of the processing, as happens regularly in bursting scenarios.


How the reporting engine integrates into the platform








Looking back




In the Pre-Citrus releases, all Pentaho-Reporting activities in the platform were channeled through the "JFreeReportComponent". Aside from the obsolete naming, this component has some severe problems to start with.



When running the report, this component happily discards all datasources that may have been defined in the report-definition. If it's not defined in the XAction, it is not true. Likewise, it replaces the resource-bundle localization mechanism and performs some odd attempts to parametrize the report.



Subreport-datasources are defined via sub-components inside the component definition of the JFreeReportComponent. When they get executed, we do some magic hacks to make them work as if they were part of the XAction, but I wouldn't bet my life that components other than the few we tested would be so generous as not to crash.



Thanks to a Microsoft style "Stay backward compatible no matter the cost"-policy, we cannot go in and fix the component, as this may break existing XActions. And forcing an administrator of 3000+ reports to edit each one of them to cope with our changes somehow doesn't sound nice either.




Heading Into the Future




So for the sake of old, pre-Citrus reports, we leave the JFreeReportComponent behind, so that it is free to rot in a corner, and concentrate on a new lightweight component instead.



This component duplicates the functionality of the PRPT-content generator, which is used to execute our PRPT-report-definitions when there is no XAction. The component publishes the report's parameter information to the BI-Server in the same way the Secure-Filter-Component triggers a parameter prompt. It accepts parameter values from the outside and validates them against the report-definition's data, and finally, it executes the report, letting the engine use the report-definition's datasources to query the data itself.



If data is pre-processed by the BI-Platform, then the engine's "External DataSource" provides a controllable and well-defined way to inject that data into the engine's processing. The External-DataSource interprets the value of a parameter as a TableModel and returns that model when queried by the engine during report-processing. This is a much more reliable way to feed the engine than scrapping all datasources.
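The idea behind the External-DataSource fits in a few lines. The following is a sketch under assumptions, not the engine's implementation: `ExternalDataSketch`, `queryData` and the parameter name `salesData` are hypothetical, and only the standard `javax.swing.table.TableModel` and `DefaultTableModel` classes are real.

```java
import java.util.HashMap;
import java.util.Map;
import javax.swing.table.DefaultTableModel;
import javax.swing.table.TableModel;

// Hypothetical sketch: look up a parameter and hand it out as the query result.
final class ExternalDataSketch {

    /**
     * Interprets the value of the named parameter as a TableModel.
     * Returns null if the parameter is missing or holds something else.
     */
    static TableModel queryData(Map<String, Object> parameters, String queryName) {
        Object value = parameters.get(queryName);
        if (value instanceof TableModel) {
            return (TableModel) value;
        }
        return null;
    }

    public static void main(String[] args) {
        // Data pre-processed outside the engine, e.g. by an XAction.
        TableModel preprocessed = new DefaultTableModel(
            new Object[][] {{"North", 120}, {"South", 95}},
            new Object[] {"Region", "Sales"});

        Map<String, Object> parameters = new HashMap<>();
        parameters.put("salesData", preprocessed);

        TableModel model = queryData(parameters, "salesData");
        System.out.println(model.getRowCount() + " rows, "
            + model.getColumnCount() + " columns");
    }
}
```

The report-definition keeps its own datasource declaration; only the data behind one named query comes from the outside.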



By separating the information for the report-processing and the optional pre- and post-processing that happens in the engine, we no longer have to duplicate information in the XAction. The XAction, if needed, can concentrate on its own responsibilities, and editing either the XAction or the report-definition no longer wreaks havoc on the other.



The engine itself also adds a capability or two to eliminate the need for custom steps in the XAction. Beginning with Citrus, we provide a Scriptable-DataSource, which lets you construct TableModels at runtime via any of the languages supported by Apache's Bean Scripting Framework. And to solve the cases where a Report- or Wizard-Specification needs to be created or altered before the report-processing starts, we provide ReportPreProcessors to make this task more efficient than before.