The Pentaho Reporting Camp

Project news and updates directly from the source

Current Version: 3.6.1

Download Release Notes ChangeLog

Previous Version: 3.6.0

Download ChangeLog

Development Version: 3.7.0

Download Release Notes ChangeLog

What is Pentaho Reporting

Pentaho Reporting is a suite of open-source reporting tools which allows you to create relational and analytical reports from a wide range of data-sources.

 

The Pentaho Reporting Engine is able to create PDF, Excel, HTML, Text, Rich-Text-File and XML and CSV outputs of your data. Our OpenFormula/Excel-formula expressions help you to create more dynamic reports exactly the way you want them. Our open architecture and our powerful API and extension points make sure this system can grow with your requirements.

.. more ..

 

Subprojects and Project Structure

Thursday, September 2, 2010

Fixed rules: More on Parameter processing

In the famous "lets do it right this time" release of Pentaho Reporting 3.5, we introduced the ability to have parameter on a report. Well, it wasn't quite right, with parameters you need to pre and post process the data to make it sound. That was release 3.6. At that point, so our theory, you, dear user, should be happy. But apparently, the beast we created wasn't all pretty.

Well, its PRD-3.7 now and guess what we are improving: Parameters.

So far, the date parameter processing was not quite right. I still wonder why after 5 years of XAction no one complained about that. Sure, XActions only have strings, and any processing or parsing is up to you - and so is the blame if it does not work.

The various system level options on the parameter UIs was .. sub-optimal. (I'm getting better at phrasing it more positively, don't you think?) The list of supported parameter (aka "the list of parameters you shall not use in your report") grew with every release. The Swing UI and the server UI never quite agreed on what setting to accept and how to behave in border cases. Thus creating formulas that worked in both settings was a chore.

And last but not least: Validating parameters and getting them run in a consistent way was difficult. Give a Integer where a Long was expected and you are screwed. Without a error message. Thus even for me working with the parameters was more a easter-egg search than sane designing.

And last but not least: Even the parameter processing order was a bit funny. It works for simple cases, but behaves rather funny for the not so simple ones.

How Pentaho Reporting Processes Parameters


Each parameter in Pentaho Reporting carries at least two formulas that eventually need to be evaluated.

The default-value-formula is used to produce a valid value if the user provided no value.

The post-processing-formula is used to transform the user's input into something more usable or simply to validate arbitrary business rules (a deposit cannot be negative, for instance).

And last but not least, if you reference an other parameter, you expect it to contain the proper post-processed value.

In PRD-3.6, the order of the validation was largely out of sync with those expectations. In fact, post processing was done in blocks, so that parameter were not able to use other post-processed parameter values in their queries. Now that's bad, and I guess Gretchen will be able to share a few unhappy tales in Lisbon about that.

In PRD-3.7, each parameter is now fully processed on its own before the next defined parameter in the chain is processed.

Lets be more formal for a while:

For each parameter defined:
If the value for the parameter is <null>, we compute the parameter's default value and use that as untrusted parameter value. The default-value formula only sees the previously validated parameters.

In a second step, we post process the parameter to get a trusted value. The post processing formula sees the previously validated parameters and the untrusted value. So be careful how you use the untrusted value here, as you cannot trust users and SQL-injections or cross-site-scripting troubles are never to far away.

If the post-processing formula fails with an error the trusted value of the parameter will be null, a warning message will be issued and last but not least we refuse to execute the report. The parameter processing continues with this value set to <null>.

And finally we check the type of the parameter and compare the parameter value against the list of valid key-values. If the value passes this test it becomes a trusted value and will be used in the further parameter processing and ultimately it will be used in the report.

If the parameter fails the test, we report an error, prevent any report processing and continue to validate the remaining parameters using the parameter's default value.

Beginning with this version, the parameter validation also creates the set of validated values after the validation is complete. For a report without any parameter values set, this will yield the default values for all parameters.

So what does this mean for you?


The new schema brings a couple of changes to the way the system behaves. Default values are now context sensitive and can change when the selection for the previously declared parameters changes. Our parameter UIs do not directly use that feature for usability reasons.Automatically changing the user's input is not very nice and confuses and/or upsets people. A lot.

The post processing formulas are now executed in a timely manor and before the default-value or selection for a parameter is computed. This way, you are now able to compute the mondrian-role array in a hidden parameter's post-processing formula and be sure that your datasource sees it.

And last but not least, your formulas wont be able to use values that have not been validated, nor would the report ever include them. Especially with the SINGLEVALUEQUERY and MULTIVALUEQUERY formula functions, this is mandatory. Your database is yours and we all want to keep it that way.

Wednesday, August 25, 2010

New Formula Functions - power to the parameters!

You may have heard rumours that the next release is all about parameter and drill-down. Rest assured, its true. During the last months I started reading a bit about the inquisition and their reliable methods of customer surveys ("Are you a happy catholic? - Yes, ouch!, burn! - choose one") I did keep an eye on what problems come up most.

Guess what. Parameters range pretty high on the scale. And more specifically, dealing with multi-selection parameters is problem zone number one. But hey, we all know that there is no problem that cannot be solved by simply adding more code.

So lets start with a cook-book:

0. Multi-select parameter return their values as arrays and the engine expects arrays when we get parameter values from the outside.

1. Count the number of selected values:

=COUNTA([mparam])

Based on that, check whether multi-selection parameters have a selection:
=IF(COUNTA([mparam]) == 0; "do something if empty"; "do something if not empty")

2. Array Manipulation:


Merge two arrays
=ARRAYCONCATENATE([mparam]; [mparam2])

or merge with static array values:
=ARRAYCONCATENATE([mparam]; {"value 1" | "value2"})

Extract four values from a array: (Leading, Trailing, in between)
=ARRAYLEFT([mparam]; 4);
=ARRAYRIGHT([mparam]; 4);
=ARRAYMID([mparam]; 2; 4);

and the highlight of the show:

3. Query a single value from a database


=SINGLEVALUEQUERY("query"; "result-column"; [query-timeout]) 

4. Query multiple values from a database


=MULTIVALUEQUERY("query"; "result-column"; [query-timeout]; [query-limit]) 


In combination with the hidden parameters with a post-processing function, this can be as powerful as a laser weapon in a medieval village.

On a side note: Post-processing and formula validation is a bit quirky in the current versions (PRD-3.5 to PRD-3.6). Stay tune for the next article to see how a well-defined version (like the one shipping in PRD-3.7) will clear up the fog of confusion.

Thursday, August 5, 2010

Its about time .. a better date parameter handling in PRD-3.7

For a long time date-parameters were probably the most creepy new feature that we had since finishing the Citrus release a year ago. Unlike a text or a number, the absolute value of dates is dependent on the runtime environment.

Yes, you may have guessed it: Timezones - I just hate them. Not that they give you jetlag when traveling, they also mess up date parsing when you are not very very careful.

First, Java does not have a sane date system. It basically just inherits the low-tech mess of C/C++ and tries to disguise that by calling the milliseconds since 1.1.1970 a "Date" object. But at the end of the day you don't deal with dates (the day-month-year part, not the time within a day) - you deal with timestamps.

And thus the date "2010-05-30" parsed with the UTC timezone yields something different than parsing the date "2010-05-30" in "Eastern Time". And if you're not careful your database will happily jump on the wagon and will return a few hours more or less of your data. I can tell you: the IRS loves randomized balance sheets.

Up until now, the parameter handling for date parameters was as random as it could be. In theory, date parameters are supposed to be passed in as standardized ISO formats. The ISO format yyyy-MM-dd'T'HH:mm:ss,SSSZ is used to transport timestamps in a locale and timezone independent fashion. On the BI-Server, however, dates were always passed around using the format 'yyyy-MM-dd' ignoring all time or timezone information. This made it impossible to use time or timestamp parameters there. Ultimately this led to case PRD-2624. Cutting off user input is bad - but it is worse when such bugs stay for a while, as at that point people develop workarounds which (due to the magic of "stay backward compatible and do not break existing reports") block us from implementing a clean fix. So we create a workaround for the workarounds. Bah! Complexity is the enemy of a good design.

Beginning with PRD-3.7 and BI-Server 3.7 date parameters for PRPTs follow a better schema. Each date-parameter can define whether the date given is a client-side, a server-side date or whether it should be interpreted in a fixed timezone (UTC, for instance) regardless of the client's or server's location.

(XActions and all other components work around that problem by declaring everything as string and thus offload the problem into the user space. Use JavaScript to parse your date. The target audience there are sys-admins and technically skilled users anyway - they are used to pain.)

Existing reports now parse the time-stamps correctly and thus at least enable the use of time and timestamp parameters. The timezone for those reports always defaults to server-side processing - and the parameter UI correctly takes that into account. Of course URLs that have the old short dates specified are still accepted - but you may see a deprecation warning in the logs now. And last but not least, the date picker now also allows you to enter time information into the text field if your data-format pattern allows it.

The Pentaho Wiki contains the complete story on how dates should be passed around.

Saturday, July 24, 2010

Java: the holy language of God

After years and years of studying various holy sources I am finally able to present my research results. Looking at the Qur'an (yes, Ministry of Homeland Security, I'm one of them apparently), the Popol Vuh, the Bhagavad-Gita and even a page or two of the Bible clearly proves that God (or Gods, as with omnipresent omniscient entities you can never be sure with whom you are talking right now) speaks Java (and not more than Java Language Specification 1.2).

Every reader of the aforementioned books first notices the convoluted and verbose use of syntactical elements. Every reader of Java source code first notices the convoluted and verbose use of syntactical elements. Of course, this is good, as it gives you time to think while reading the text, without overloading your mind with to many facts in to little time. Linguists found that redundancy in a language improves its error resistance and its understandability.

Along with that, there is a lack of shorthand constructs and cryptic sequences of unspeakable characters. Gods language is clean and simple so that every creature can hear it without getting knots in their brain. It is notable that some print media uses text known from lesser languages like *(x+=y++%8) to signal profanity.

The products and rules of God expressed in the holy books apply to the whole universe. Most of the texts make explicit statements about this "write once, run everywhere" principle. The creator of the universe made his rules .. well .. universal. Being the omnipotent entity he is he could have created a universe where he has no power. But to avoid imploding the brains of those who study the universe he obviously refrained from creating that paradox.

Holy (and maybe not so holy) men (and woman, but less in numbers and usually burning rather quickly if being overly verbose) pointed out that one can find the answer to any of life's problems in those holy books, if you study the text long enough. Smart (and sometimes not so smart) men (and woman, but less in numbers but not burning these days due to worries about the CO2 footprint) pointed out that one can find the answer to any of the core programming problems in the fairly complete Java runtime library, if you study the JavaDoc long enough.


Sadly, even tho the Gods are Java and Java therefore is God, there is evidence that humans are probably created in C or C++.

The first thing you notice when you look at humans is their obvious need for manual resource management. Breathing is a classical example of allocating a resource (air) and freeing it after use (breathing out). There are numerous examples where failure of either allocation or release of the resource caused unrecoverable errors in the program execution.

Looking around the planet, it is also obvious that humans in general do not do a good job in freeing their resources properly. Garbage collection is virtually unknown even tho viable garbage collectors exist. The various resource allocation strategies and conflicts give cause to much grief and pain.

Looking at humans you can see a clear platform dependency. Although it is claimed that the same source or genetic code could be adapted to various environments, humans tend to die rather quickly (and sometimes in fairly interesting ways) if transplanted to a different runtime environment.

And last but not least: A freshly created human is void of any usable strategies and has to spend years and years to build up a reasonable runtime library (which then is over-adapted to the specific environment and fails easily when ported to a different platform (see the section on resource management above). Likewise the C languages themselves come with only the minimum standard library. Extra functionality has to be manually coded or brought in from outside. Humans call this process education, programmers "importing a library". Different educations/libraries usually come with complex and contradictory requirements causing more vendor dependence and thus again pain and grief.


So humans: abandon your erroneous ways, repent and join the forces of god and stick to 100% pure Java.

Wednesday, July 14, 2010

Charting the way YOU want it ..

How would you feel if you knew you could each of the million features of JFreeChart to work within the Pentaho Reporting System? After all, it is a ugly thing for a consultant or system developer to say "no, sorry, can't do that" to a customer or client. Especially when you have that shiny new JFreeChart Developer Guide on your desk telling you that there is a way.

Imagine a world without sentences like "Uh .. sure JFreeChart supports this feature, but it is not exposed in the Report Designer".

Well, with Pentaho Reporting 3.7 you can live in that world right now.

All chart expressions now have the "Chart Post-Processing Script Language" and "Chart Post-Processing Script" properties. The "Script Language" property defines the scripting language in which the actual script is written (so that the more masochistic ones among our users can continue to use JavaScript all day long) and the "Script" property itself then contains the script.

Like the well-known BeanShell- and BSH-Expressions, a set of predefined variables grants access to the outside world.

"chart" is the fully initialized JFreeChart object. "dataRow" grants access to the values of the other named expressions and the data-table "runtime" grants access to the current ExpressionRuntime instance. The runtime itself allows the use of the ResourceBundles (for internationalization) and grants access to the report's defined DataFactories (to get more data from the outside world).

A simple post processing script could look like this (Language = BeanShell):
import java.util.Date;

chart.setTitle("My Title: " + new Date());


But with the ability to fire queries, you can easily add your own line or item marker to explain significant details or enrich the plots with your own styling.

Your imagination (and coding abilities) are the limit now!

Friday, July 9, 2010

Drill-Down Update: Charting and Images now with ImageMap support

Our main focus for Pentaho Reporting 3.7 is locked on improving the user experience when adding links to reports. The DRILLDOWN function and its made-to-measure UI bring us a long way towards the holy grail of being 'easy to use'(tm).

Putting links on text is nice. But since Dashboards became the standard, everyone seems to want to click on hot areas in images and charts as well. In HTML, this problem is solved by providing a image-map. This structure holds the outline of the hot-spots as polygons (or circles or rectangles) along with the tooltip and hyperlink target.

I love to 'borrow' a good idea when I see it.

All content-elements now ship with a new attribute called "image-map", that allows you to provide a image-map for your images. The image-map must be a valid XML fragment, something like this:

<?xml version="1.0">
<map>
<area type="rect" coords="10,10,40,40" 
      href="http://www.google.com" title="A search engine"/>
</map>

Usually such a image-map will be computed at runtime by using an attribute-expression. But computing such maps manually is less fun than it sounds. So if no user-provided image-map is given, our Drawables can compute these maps on the fly as well.

All of our JFreeChart generator expressions now produce a JFreeChartReportDrawable objects instead of a plain JFreeChart objects. All ReportDrawables provide the "getImageMap(Rectangle2D)" method so that output targets that support image-maps can annotate their generated images with links and tooltips.

You can use the new "tooltip-formula" and "url-formula" properties on the chart-expressions to provide a formula for the tooltip and URL generators. Within the formula you have access to several automatically generated fields (in addition to all values of the regular fields at the time the chart is rendered).

Categorical-Charts:

  • "chart::series-key": The current series-value
  • "chart::category-key": The current category-value
  • "chart::series-index": The number of the series
  • "chart::category-index": The number of the category
  • "chart::series-keys": All known series-keys as array
  • "chart::category-keys": All known category-keys as array
  • "chart::value": The current value for the current series and category.

XY-Charts:

  • "chart::x-value": The current x-value
  • "chart::y-value": The current y-value
  • "chart::z-value": The current z-value
  • "chart::series-index": The number of the current series
  • "chart::series-count": The total number of series in the chart
  • "chart::item-index": The current item within the series
  • "chart::item-count": The total number of items within the series.

Pie-, Ring- and Multi-Pie Charts:

  • "chart::key": The current key for the pie-slice
  • "chart::keys": All known keys
  • "chart::item": The current item value for the pie-slice
  • "chart::items": All items
  • "chart::pie-index": The number of the current pie-chart (used in Multi-Pie charts)

At the moment, only the HTML export fully supports image maps. The PDF export already supports tooltips, but the links are not yet working. The work on the full image-map link support into PDF and to enable that stuff in the Swing preview as well should be finished next week.

And remember: The formula-properties of the chart-expressions can use the DRILLDOWN function to compute the links.

Friday, June 25, 2010

How to work with Advanced SQL/MDX Datasources efficiently

When creating complex reports or even a Guided AdHoc Report you usually need to modify the structure of the report's query. And this inevitably means that you use a Advanced SQL or Mondrian/OLAP4J data-source along with a calculated query.

What is a Advanced Data-Source


An Advanced data-source is a data-source that only configures the connection. Instead of configuring the query at design-time, it tries to interpret the report's query as valid SQL or MDX statement. The value for the query can then be computed at runtime using either a formula or any other expression.

Parameter values are specified via the ${PARAMETERNAME} syntax within the query. Ordinary parametrization in SQL data-sources uses JDBC-PreparedStatements for filling in the parameter values. The JDBC-standard limits this kind of parametrization to pure value replacement, and makes it impossible to replace column names or any other structural element, like adding or changing a ORDER BY clause.

When using a formula to compute the query, those limitations are lifted. Parametrization now happens by doing string concatenation to calculate the SQL statement. This now allows you to inject any SQL fragments into your query at any place and in any fashion you like.

="SELECT * FROM Customers ORDER BY " & [Sorting]


Warning - SQL-Injection: With great powers comes great responsibilities


While this technique is powerful, it is also dangerous: Injecting user-supplied values is commonly known as SQL-Injection and a great way for hackers to enter your server. So make sure that you only use properly validated SQL-fragments.

If you do not validate the input, your users can supply dangerous values for Sorting, like this one:
COLUMN; DELETE FROM TABLE;

which would produce this scary SQL statement
SELECT * FROM Customers ORDER BY COLUMN; DELETE FROM TABLE;

which can delete all rows from your table. (Whether this is successful also depends on your database and JDBC driver. But do you want to take the risk?)

Both the Advanced SQL and Advanced Mondrian/OLAP4J data-sources still allow the Prepared-Statement parametrization in addition to the formula computation. So if you have to parametrize user-supplied values, do it via the classical ${PARAMETERNAME} syntax and only use the formula parametrization with validated parameters.

Designing Reports with Advanced Data-Sources


Inside the Pentaho Report Designer, Advanced SQL-Data-sources do not show field names in the various field pickers. The query is only calculated when the report is run, and thus the design-time environment has not enough information to retrieve data from the data-source. This makes it hard to work with such reports - you always have to remember how your columns are named, which is no fun at all.

But there is a trick! There is always a trick ..

A report can have definitions for both the static value and a formula for the query-attribute at the same time. When the report is executed the formula will be evaluated and the static value will be ignored. At design-time, the Report-Designer only uses the static value and ignores the formula.

Add the Advanced SQL datasource to the report and add the formula for the query-attribute to configure the report's behaviour at runtime. Then add a standard SQL-Datasource (or just a plain Table-Datasource) to the report, configure it with a suitable query that returns columns with the same names and types as the computed query and set the report's static query-attribute to the name of that query. You should now see the columns of the standard data-source at design-time and see the data from the calculated query via the Advanced Datasource when the report runs.

Warning: You have to make sure that the Advanced data-source is positioned after the standard data-source or this trick will fail.