Thursday, November 8, 2012

Big Data News Updates

Pentaho recently obtained Series C funding of $23 million. They have a nice 5-Minute Marketing Video available that explains their product. Their workbench has four aspects:
  • Data sources: They connect to a variety of data sources and have a data integration platform that can perform joins and column mappings across data sources, i.e., it seems that one can create SPJ (select-project-join) queries across data sources.
  • Reports: Once you have created an integrated data source, you can create a report simply by dragging and dropping columns and adding filters. It seems that the expressive power is equal to a SELECT-FROM-WHERE query over the integrated data source.
  • Analysis: The capabilities seem to be a subset of Excel's, with some visual OLAP functionality similar to Excel PowerPivot.
  • Dashboard: This enables the creation of a panel of various linked reports and analyses, including mapping functionality.
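To make the SPJ idea concrete, here is a minimal sketch of what a report over an integrated data source might boil down to. This is not Pentaho's API: the two tables (`crm_customers`, `erp_orders`), their columns, and the helper names are all hypothetical, and an in-memory SQLite database stands in for the integration layer.

```python
import sqlite3


def build_integrated_source():
    """Create two hypothetical 'sources' as tables in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE crm_customers (customer_id INTEGER, name TEXT, region TEXT);
        CREATE TABLE erp_orders (order_id INTEGER, cust INTEGER, amount REAL);
        INSERT INTO crm_customers VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
        INSERT INTO erp_orders VALUES (10, 1, 250.0), (11, 2, 75.0), (12, 1, 40.0);
        """
    )
    return conn


def report(conn, min_amount):
    """Select-project-join: join the sources, keep two columns, apply a filter."""
    query = """
        SELECT c.name, o.amount                        -- project
        FROM crm_customers AS c
        JOIN erp_orders AS o ON o.cust = c.customer_id -- join (column mapping)
        WHERE o.amount >= ?                            -- select (filter)
        ORDER BY o.amount DESC
    """
    return conn.execute(query, (min_amount,)).fetchall()


rows = report(build_integrated_source(), 50.0)
print(rows)  # [('Acme', 250.0), ('Globex', 75.0)]
```

The join condition maps `cust` in one source to `customer_id` in the other, which is the kind of column mapping the data integration step would presumably set up; dragging columns and adding filters in the report builder would then correspond to the SELECT list and the WHERE clause.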
It seems that their preferred interaction pattern is for a user to load the relevant subset of their data into main memory and then interact with it. They also seem to be able to run the same analysis at truly large scale over Hadoop. Their memory scale-out story is based on a distributed caching layer such as Infinispan/JBoss Data Grid or Memcached.
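As a rough illustration of what a caching layer buys you in this interaction pattern, here is a minimal cache-aside sketch. It is an assumption on my part that the pattern looks roughly like this; I have not seen how Pentaho wires it up. A plain Python dict stands in for a distributed cache such as Memcached, and `CacheAside` and `slow_aggregate` are hypothetical names.

```python
class CacheAside:
    """Cache-aside lookup: check the cache first, fall back to the loader on a miss."""

    def __init__(self, loader):
        self._loader = loader  # expensive computation or backend query
        self._cache = {}       # stand-in for a remote cache (assumption)
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            # Miss: compute once and keep the result for later requests.
            self.misses += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]


def slow_aggregate(region):
    """Hypothetical expensive aggregation over a backend data set."""
    data = {"EU": [250.0, 40.0], "US": [75.0]}
    return sum(data[region])


cache = CacheAside(slow_aggregate)
first = cache.get("EU")   # computed by the loader
second = cache.get("EU")  # served from the cache
print(first, second, cache.misses)  # 290.0 290.0 1
```

With a real distributed cache the dict would be replaced by network calls to the cache nodes, so that several analysis servers can share the same intermediate results.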

A competitor in this space is Jaspersoft, which I will discuss in a later post.

