The Pentaho Reporting Camp

Project news and updates directly from the source

Current Version: 3.6.1

Previous Version: 3.6.0

Development Version: 3.7.0

What is Pentaho Reporting

Pentaho Reporting is a suite of open-source reporting tools which allows you to create relational and analytical reports from a wide range of data-sources.

 

The Pentaho Reporting Engine can create PDF, Excel, HTML, text, Rich Text Format (RTF), XML and CSV output from your data. Our OpenFormula/Excel-formula expressions help you to create more dynamic reports exactly the way you want them. Our open architecture, powerful API and extension points make sure this system can grow with your requirements.

Thursday, November 19, 2009

Keeping the promise .. ReportPreProcessor documentation

The reporting Wiki now contains a new page explaining the uses and characteristics of the Report-Pre-Processors of Pentaho-Reporting.

http://wiki.pentaho.com/display/Reporting/PreProcessorIntroduction

As the pre-processors in PRD-3.5 do not contain usable property-translations, use the Lemonade-PRD to work with them.

Tuesday, November 17, 2009

Formula-Functions for simplified parametrization in Pentaho Reporting Lemonade

Creating parametrized reports or combining several of these reports via drill-down-links is not necessarily the most enjoyable of all activities when it comes to writing reports.

A drill-down report is simply a report that has a clickable link somewhere. The link points to another report and carries all the parameters needed to run that report.

Within the Pentaho Platform, there are two URLs that are responsible for showing a report:

1. For XAction driven reports: Call the XAction handler with a suitable XAction and all the parameters

http://localhost:8080/pentaho/ViewAction?solution=samples&path=getting-started&action=HelloWorld.xaction



2. For PRPT-Reports: Call the Reporting-Plugin's content handler with the report file and the parameters.

http://localhost:8080/pentaho/content/reporting/reportviewer/report.html?solution=samples&path=getting-started&name=HelloWorld.prpt


URLs are usually added via a style-expression on the links::url style-property with a formula similar to this one:

="http://www.google.com/search?q=" & URLENCODE([field])



This works reasonably well for Strings and Numbers. Complex types like Arrays or Date objects, however, need to be specially formatted. The new PARAMETERTEXT function provides an easy way to get raw data objects into the right format for parametrization in drill-down reports.

="http://localhost:8080/pentaho/content/reporting/reportviewer/report.html?solution=samples&path=getting-started&name=HelloWorld.prpt&parameter=" & URLENCODE(PARAMETERTEXT([field]))
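For readers who build such links outside of a formula, here is a minimal Java sketch of the same idea: turn the raw value into a stable text form first, then URL-encode it. The class name DrillDownUrl, the helper names and the date pattern are my assumptions for illustration; the text that PARAMETERTEXT really produces may differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Pentaho Reporting.
final class DrillDownUrl {
  static final String VIEWER =
      "http://localhost:8080/pentaho/content/reporting/reportviewer/report.html";

  // Same job as URLENCODE in the formula above.
  static String encode(String value) {
    try {
      return URLEncoder.encode(value, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e); // UTF-8 is always available
    }
  }

  // Assumption: an ISO-like pattern stands in for whatever PARAMETERTEXT emits.
  static String dateAsText(Date date) {
    SimpleDateFormat format =
        new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.US);
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format(date);
  }

  // Builds the PRPT viewer link with one extra report parameter.
  static String reportUrl(String solution, String path, String name,
                          String parameterName, String parameterText) {
    return VIEWER
        + "?solution=" + encode(solution)
        + "&path=" + encode(path)
        + "&name=" + encode(name)
        + "&" + encode(parameterName) + "=" + encode(parameterText);
  }

  public static void main(String[] args) {
    System.out.println(reportUrl("samples", "getting-started", "HelloWorld.prpt",
        "parameter", dateAsText(new Date())));
  }
}
```

The point of the two-step approach is that the encoding step never sees a Date or an array, only text in a format the receiving report can parse again.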


To format values inside a formula via a format-string, we now offer the MESSAGE function and have extended the TEXT function to allow more control when creating strings.

The MESSAGE function uses a java.text.MessageFormat to format values into text.

=MESSAGE("{0,number} chickens crossed the road on {1,date,short} to {2}"; (100 * 50 + 10); DATE(2005; 5; 20); "visit grandma")
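Since MESSAGE sits on top of java.text.MessageFormat, the pattern syntax can be tried out in plain Java. The following is a small sketch with a pattern like the one above; the class name MessageFormatDemo and the fixed locale are my choices for the example, and the exact date text depends on the locale and JDK.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class, not part of Pentaho Reporting.
final class MessageFormatDemo {
  static String chickenMessage(Locale locale) {
    // 20 May 2005; Calendar months are zero-based, hence the constant.
    Date day = new GregorianCalendar(2005, Calendar.MAY, 20).getTime();
    MessageFormat format = new MessageFormat(
        "{0,number} chickens crossed the road on {1,date,short} to {2}", locale);
    // {0} is formatted as a number, {1} as a short date, {2} is inserted as-is.
    return format.format(new Object[] { 100 * 50 + 10, day, "visit grandma" });
  }

  public static void main(String[] args) {
    System.out.println(chickenMessage(Locale.US));
  }
}
```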

The TEXT function simply converts a value into text. The optional second parameter specifies the number or date format used for the conversion.

=TEXT(NOW(); "dd-MM-yyyy")
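The format string in that example looks like a java.text.SimpleDateFormat pattern. Assuming that TEXT hands date patterns to SimpleDateFormat and number patterns to DecimalFormat (an assumption on my side, judging only from the pattern syntax), a rough Java equivalent would be the sketch below. The class name TextFunctionDemo is made up for the example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical stand-in for the TEXT function, for illustration only.
final class TextFunctionDemo {
  // Date plus pattern, e.g. "dd-MM-yyyy".
  static String text(Date value, String pattern) {
    return new SimpleDateFormat(pattern, Locale.US).format(value);
  }

  // Number plus pattern, e.g. "#,##0.00".
  static String text(Number value, String pattern) {
    return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US))
        .format(value);
  }

  public static void main(String[] args) {
    System.out.println(text(new Date(), "dd-MM-yyyy"));
    System.out.println(text(1234.5, "#,##0.00"));
  }
}
```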

Thursday, November 12, 2009

Firewalling a Java-Application

PRD-2078 landed on my desk. 

The problem


There are some weird administrators out there who turn their internal company firewalls into a black hole. Requests don't get answered with a clear "no-route-to-host" ICMP message, nope, they just get swallowed, leaving the application hanging in limbo while it waits for the timeout.

For the Pentaho Report-Designer, this is a bit ugly. Open a file that has a resource from a non-existing host, and you instantly run into problems. In our system, there are many possible sources of network communication: URLs get read, JDBC connections get opened, and depending on what datasource you use, we may throw in an RMI or raw-socket connection as well. In short: If the network does not play nice, we are hosed.

Phase 1: Avoidance


Now, from an application developer's perspective, trying to prevent network connections by tiptoeing around all the places that may trigger network use is a bit futile. Heck, in the same way you could prevent an economic meltdown by tiptoeing around naming who is broke and who is not.

Preventing the reading of images from URLs may work, as there are only one or two places where we do that. But even just finding out whether a JDBC driver uses the network, or which host it may communicate with, is futile. JDBC URLs are not standardized at all. The standard says: let 'em start with the string "jdbc:", followed by a vendor-specific string. It recommends that the JDBC URL follows the URL schema. But then, there are tinker-shop companies that just feel that random, undocumented schemas are so much more fun, and present you with URLs like this:


jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS= (PROTOCOL=TCP)(HOST=lcqsol24)(PORT=1521))(ADDRESS=(PROTOCOL=TCP) (HOST=lcqsol25)(PORT=1521))(FAILOVER=on)(LOAD_BALANCE=off)) (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=snrac)))


Yeah, I always wanted to implement custom parsers for all the JDBC drivers in the world. 
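To make the problem concrete, here is a small sketch of the naive approach: strip the "jdbc:" prefix and let java.net.URI find the host. The class name JdbcHostGuess and the MySQL-style URL are made up for the example, and the Oracle URL is a shortened version of the one above. The guess works for URL-like driver strings and simply comes back empty for the descriptor style.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that shows why guessing hosts from JDBC URLs is unreliable.
final class JdbcHostGuess {
  // Returns the host if the vendor part happens to look like a URL, else null.
  static String guessHost(String jdbcUrl) {
    if (!jdbcUrl.startsWith("jdbc:")) {
      return null;
    }
    try {
      // Only "scheme://host:port/..." style strings yield a host here.
      return new URI(jdbcUrl.substring("jdbc:".length())).getHost();
    } catch (URISyntaxException e) {
      return null; // not even parseable as a URI
    }
  }

  public static void main(String[] args) {
    System.out.println(guessHost("jdbc:mysql://dbhost:3306/sample"));
    System.out.println(guessHost("jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS="
        + "(PROTOCOL=TCP)(HOST=lcqsol24)(PORT=1521))"
        + "(CONNECT_DATA=(SERVICE_NAME=snrac)))"));
  }
}
```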

And just for the fun of it, let's assume that I go that way, and cover 90% of the cases. Now all it needs is
* a JNDI defined datasource: JNDI datasources are black-boxes, and we have no way to look into them to see the JDBC-URL
* a Kettle transformation: Hundreds of components able to talk over the network. Some of them don't even know with whom they communicate until they run.
* a JavaScript or BeanShell Scripting expression: Heck, who knows what's going on in there?

So slapping workarounds into the code does not help our case. Damn!

Phase 2: Remember thy past


Back in the old ages, when the seas were full of fish and the lion slept next to a sheep without thinking about a tasty meal, we all learned that Java has a unique feature that prevents Applets from connecting to other hosts. That feature was called a sandbox, enforced by a SecurityManager.

Cool, sounds like an option to me. Slap on a properly configured security manager and we should be ready to go. Implementing a simple security manager that only checks for SocketPermissions and grants everything else was easy. Adding it was easy as well.
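A minimal sketch of such a security manager could look like the code below. The class name, the set of blocked hosts and the way the host is cut out of the permission name are my assumptions for illustration, not the code that went into the Report-Designer. Note that SecurityManager is deprecated for removal in recent JDKs, so treat this as a historical illustration.

```java
import java.net.SocketPermission;
import java.security.Permission;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical example: deny socket access to listed hosts, grant everything else.
@SuppressWarnings("removal") // SecurityManager is deprecated in recent JDKs
final class SocketOnlySecurityManager extends SecurityManager {
  private final Set<String> blockedHosts = new HashSet<String>();

  SocketOnlySecurityManager(String... hosts) {
    for (String host : hosts) {
      blockedHosts.add(host.toLowerCase(Locale.ROOT));
    }
  }

  @Override
  public void checkPermission(Permission perm) {
    if (perm instanceof SocketPermission
        && blockedHosts.contains(hostOf(perm.getName()))) {
      throw new SecurityException("Blocked network access to " + perm.getName());
    }
    // Any other permission is granted by simply returning.
  }

  @Override
  public void checkPermission(Permission perm, Object context) {
    checkPermission(perm);
  }

  // SocketPermission names look like "host" or "host:port".
  // Simplification: IPv6 literals without a port are not handled.
  static String hostOf(String name) {
    int colon = name.lastIndexOf(':');
    String host = colon >= 0 ? name.substring(0, colon) : name;
    return host.toLowerCase(Locale.ROOT);
  }
}
```

On the JDKs of that era it would be installed with System.setSecurityManager(new SocketOnlySecurityManager("some-host")); current JDKs restrict or reject that call.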

But: Once you add a security manager, performance slows to a crawl. Even if the security manager is an empty implementation, performance is cut in half. When monitoring the process, you will also suddenly notice that the process spends half of its time in the kernel, and only half of its time working in the user code.

Why?

'Cause checking permissions is expensive. Code to check permissions looks like this:


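import java.security.Permission;

// The same pattern shows up all over the JDK: look up the manager, then ask it.
SecurityManager m = System.getSecurityManager();
if (m != null)
{
  // A fresh Permission object is created for every single check.
  Permission permission = new RuntimePermission("*");
  m.checkPermission(permission);
}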



Looks innocent. But nonetheless, due to the massive amount of security checks in the JDK, this starts to sum up to a huge performance drag. So no matter what we do: Slowing down that much is not an option, as far more people will complain about bad performance than about bad networking code.

Phase 3: Going low-level: Sockets, SocketImpl and SocketImplFactory


Since the good old days of JDK 1.0, the Socket class was separated from its native implementation, so that the socket-backend can be replaced by a vendor easily. Smells good. Now, if we can go in and provide our own SocketImplFactory, that then creates a filtering wrapper around the existing sockets, we should be ready to go. After all, to talk to the net, you need to create a Socket first.

Half an hour later, after reading through the source code of java.net.Socket and its related classes, I'm sober again. The classes that are there are insufficient to create our own Socket wrapper in a reasonable amount of time. Internally, Socket uses SocksSocketImpl, a package-private class in the java.net package. This class provides the ability to talk to the net via proxy servers. As this class is package-private, it is not accessible from the outside world. So if we use our own SocketImplFactory, we either have to drop the ability to use proxy servers, or we have to reimplement the various proxy-server protocols. Yeah, let's implement an insane amount of code just for the filtering.


So we have a theoretical ability to provide a SocketImplFactory, but due to the state of the implementation of the JDK, we cannot use it without reinventing the wheel. Thanks, Sun, for not helping.

Phase 4: Enlightenment through reading code ('cause there is no other high-level documentation)


While reading the socket code, I encountered a class called ProxySelector, which is called most of the time when a socket is created. This class is intended to select the correct proxy-server for a given address. Well, at that point I'm not choosy any more - if a flying goose with pink elves riding on it came along and promised me a solution, I would not hesitate to follow that lead.

Implementing a ProxySelector is easy, installing it is easy, and although it is not designed for filtering, throwing RuntimeExceptions from within that selector is easy too. It feels like the most devilish hacking, with no sense for cleanliness at all - but (to quote Marc Batchelor) "..., but it works!".
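A sketch of that hack, under my own assumptions: the class name FirewallProxySelector, the list of blocked hosts and the choice of IllegalStateException are made up for this example and are not the code that went into the Report-Designer. Anything that is not blocked is handed to the previously installed selector, so normal proxy handling keeps working.

```java
import java.io.IOException;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical example: abuse the ProxySelector as a connection filter.
final class FirewallProxySelector extends ProxySelector {
  private final ProxySelector parent; // may be null
  private final Set<String> blockedHosts = new HashSet<String>();

  FirewallProxySelector(ProxySelector parent, String... hosts) {
    this.parent = parent;
    for (String host : hosts) {
      blockedHosts.add(host.toLowerCase(Locale.ROOT));
    }
  }

  @Override
  public List<Proxy> select(URI uri) {
    if (uri == null) {
      throw new IllegalArgumentException("uri must not be null");
    }
    // Plain sockets show up here as "socket://host:port".
    String host = uri.getHost();
    if (host != null && blockedHosts.contains(host.toLowerCase(Locale.ROOT))) {
      // Not what the API was designed for, but it stops the connection attempt.
      throw new IllegalStateException("Blocked network access to " + host);
    }
    if (parent != null) {
      return parent.select(uri);
    }
    return Collections.singletonList(Proxy.NO_PROXY);
  }

  @Override
  public void connectFailed(URI uri, SocketAddress address, IOException error) {
    if (parent != null) {
      parent.connectFailed(uri, address, error);
    }
  }

  public static void main(String[] args) {
    // Wrap whatever selector is currently active.
    ProxySelector.setDefault(
        new FirewallProxySelector(ProxySelector.getDefault(), "lcqsol24"));
  }
}
```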

Aftermath


The Proxy-selector is not used for UDP-communication, so the final solution will include the SecurityManager as well. The security-manager part will be disabled by default - there is no need to punish the masses for the misbehaviours of a few sinners. If you have a JDBC-driver that talks via UDP, and you have a paranoid firewall administrator as well, then you will have to tweak a few settings to get it working. But that's the price you pay for surrounding yourself with sinners.

Tuesday, November 10, 2009

Teasing the masses: PRD-3.6

The next release of the reporting-package is in the forge and a milestone release seems to be unavoidable now. It's amazing how many small bugs and issues add up once you make the decision to make the release. The minute after the commit-ban is in place to let the release build, bugs start to float in. Luckily, we are no evil mega vendor, so we can be agile about it and just make sure the next release hits the market faster than the bugs hit us. So far, we already squashed 68 cases, half of them bugs, and half of them new features that bridge gaps we identified or were just easy to implement. (Yes, laziness is one of the deciding factors in what new feature should go in first.)


Surely the most needed improvement was the page-setup dialog. For some foolish reason, we made the assumption that when the JDK comes with a Page-Setup dialog, then it would be usable. But JDK 1.5 is badly broken and the fixes in JDK 1.6 just made this functionality broken in other interesting ways. So far, Sun hasn't managed to deal with missing attributes in responses from a CUPS server, which breaks printing on Linux, and likewise wasn't able to test whether a printer-driver is installed on Windows (which gives a NullPointerException if you don't have one installed). Well, I can't fix the JDK, and printing with Java might be an adventure I don't want to get back to any time soon.


But Pentaho-Reporting has some nice features that have been hidden in the past. One of them is our ability to print on a large virtual page (also known as poster-printing). So if you ever wanted to see your company presentation on a 10 by 10 meter paper but only have an ordinary desktop printer available, we can make that dream come true.
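To get a feeling for what that dream costs in paper, here is a back-of-the-envelope sketch. It assumes A4 sheets (210 by 297 mm) and ignores margins and overlap; the class name PosterTiles is made up, and this is plain arithmetic, not how the reporting engine lays out its pages.

```java
// Hypothetical helper: how many physical pages cover a large virtual page?
final class PosterTiles {
  // Number of sheets needed along one axis, rounded up.
  static long tiles(long posterMm, long pageMm) {
    return (posterMm + pageMm - 1) / pageMm;
  }

  static long pagesNeeded(long posterWidthMm, long posterHeightMm,
                          long pageWidthMm, long pageHeightMm) {
    return tiles(posterWidthMm, pageWidthMm) * tiles(posterHeightMm, pageHeightMm);
  }

  public static void main(String[] args) {
    // 10 m x 10 m poster on A4: 48 columns x 34 rows = 1632 sheets.
    System.out.println(pagesNeeded(10000, 10000, 210, 297));
  }
}
```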