Archive for February 2009

Green Data Warehouse Top 10

I just published the second in the series of “Green Data Warehouse” articles in BeyeNetwork.  This article, “Top 10 Things You Can Do to Improve Energy Efficiency”, provides a pragmatic list of 10 things you can do immediately to reduce energy consumption at the system level.

Joe Foley (CTO, illuminate) left a comment on the benefits of “overloading processors rather than I/O”.  The EPA study cited in my article found that CPUs consumed about 31% of the energy in an average system, more than any other component and more than 6 times the amount consumed by disks.  So I think making more efficient use of CPUs (and therefore reducing the number of CPUs) would be more beneficial than reducing the number of disks.
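To put rough numbers on that comparison, here is a back-of-the-envelope sketch.  The 31% CPU share and the "more than 6 times" ratio come from the EPA figures above; the 20% reduction is a hypothetical value chosen only for illustration, and the disk share is an upper bound derived from the ratio, not a figure from the study.

```python
# Share of system energy used by CPUs, per the EPA study cited above.
CPU_SHARE = 0.31

# CPUs use "more than 6 times" what disks use, so the disk share is
# below CPU_SHARE / 6. This is a derived upper bound, not a study figure.
DISK_SHARE_UPPER_BOUND = CPU_SHARE / 6

# Hypothetical: assume either component's energy draw could be cut by 20%.
HYPOTHETICAL_REDUCTION = 0.20


def system_savings(component_share: float, component_reduction: float) -> float:
    """Fraction of total system energy saved when one component's draw drops."""
    return component_share * component_reduction


cpu_savings = system_savings(CPU_SHARE, HYPOTHETICAL_REDUCTION)
disk_savings = system_savings(DISK_SHARE_UPPER_BOUND, HYPOTHETICAL_REDUCTION)

print(f"CPU:  {cpu_savings:.1%} of total system energy")
print(f"Disk: at most {disk_savings:.1%} of total system energy")
```

Whatever reduction you assume, the same percentage improvement applied to CPUs saves at least six times as much system energy as it would on disks.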

Also, I want to thank Scott Humphrey (Humphrey Strategic Communications) for helping pitch my article series to BeyeNetwork.

New to the list - Infobright & Aster Data

I’ve added two new (to me) database appliance vendors this past week, Infobright and Aster Data.  The revised vendor matrix is attached.

I spoke with Infobright CEO Miriam Tuerk about their product, and my initial reaction was that this was just another run-of-the-mill column-oriented database built on an open source platform.  After the discussion, three key differentiators stood out:

  1. Infobright is a true open source product - their Infobright Community Edition (ICE) contains 90% of the product functionality and is provided as a free download
  2. MySQL extension - turns the ubiquitous open source database into a high-powered data warehouse engine
  3. “Knowledge Grid” - this component of their architecture is unique in this space and gives Infobright a huge competitive advantage in scalability, performance, and maintainability

Their business model is structured to mirror that of MySQL, with a revenue stream tied to support, training, and some consulting.  But the secret sauce is the combination of open source availability with innovations in the architecture.  The Knowledge Grid, combined with their Data Pack storage method, provides linear scalability, massive compression, and query acceleration.  Miriam provided several case studies that showed both rapid deployment (under 24 hours in one case) and extreme compression (over 30x).  Under the covers it’s a column store database built on Red Hat Linux.  They currently run on Intel or AMD platforms, but are planning Windows and Solaris versions this calendar year.  I’m looking forward to continuing discussions with Miriam this week, and may co-author an article with her.

I also spoke with Steve Wooledge from Aster Data this week.  He gave me an overview of their nCluster database.  Built on top of the PostgreSQL database, this MPP platform offers extreme scalability through a clustered architecture.  MySpace uses nCluster to collect large amounts of information every hour for analytics purposes, requiring only half a resource to maintain the system - a testament to their claim of “hands-off” system management.  They also run on standard Intel x86 machines, and have recently launched a “green” initiative whereby they give customers credits for each piece of existing hardware they reuse.  They have also launched a cloud version of their software, nCluster Cloud Edition, that runs on Amazon Web Services.  The only concern I have is that Steve didn’t have a good answer to my question about long-term management of hot spots in the MPP environment, although the MySpace example suggests they have a solution in place.

illuminate

I’ve been getting a lot of interest recently in the data warehouse appliance chart I’ve been maintaining.  I just spoke with Joe Foley, CTO of illuminate, and added them to the spreadsheet.  Their flagship product, iLuminate, stores data using a “value based storage” methodology that is neither row nor column oriented.  Essentially, each unique data element in the database is stored once, with all relationships (forward and reverse) captured as pointers.  According to Joe, this enables the database to realize significant compression of greater than 50%, while being able to scale linearly without bound (except for the 64-bit addressing limit in the hardware).  They only run on a Windows platform, but are planning on rolling out a Linux-based system in 2009.  illuminate also has an analytical package called iCorrelate, which provides ad hoc reporting and analysis capability.
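To make the “value based storage” idea more concrete, here is a minimal sketch of how I picture it from Joe’s description.  This is my own illustration, not illuminate’s actual implementation: the class `ValueStore` and all of its method names are hypothetical.  Each unique value is kept once in a pool, records hold forward pointers (value ids) into that pool, and a reverse index maps each value back to every place it occurs.

```python
from collections import defaultdict


class ValueStore:
    """Illustrative value-based store; all names here are hypothetical."""

    def __init__(self):
        self._id_of = {}    # value -> value id (each unique value stored once)
        self._values = []   # value id -> value
        self._rows = []     # row id -> {column: value id} (forward pointers)
        self._occurrences = defaultdict(set)  # value id -> {(row id, column)}

    def _intern(self, value):
        """Return the id for a value, adding it to the pool if it is new."""
        if value not in self._id_of:
            self._id_of[value] = len(self._values)
            self._values.append(value)
        return self._id_of[value]

    def insert(self, record):
        """Store a record as pointers to pooled values; return its row id."""
        row_id = len(self._rows)
        pointers = {}
        for column, value in record.items():
            value_id = self._intern(value)
            pointers[column] = value_id
            self._occurrences[value_id].add((row_id, column))  # reverse pointer
        self._rows.append(pointers)
        return row_id

    def get(self, row_id):
        """Rebuild a record by following its forward pointers."""
        return {col: self._values[vid] for col, vid in self._rows[row_id].items()}

    def find(self, value):
        """Follow reverse pointers: every (row id, column) holding this value."""
        value_id = self._id_of.get(value)
        if value_id is None:
            return set()
        return set(self._occurrences[value_id])

    def unique_value_count(self):
        return len(self._values)


store = ValueStore()
store.insert({"city": "Boston", "state": "MA"})
store.insert({"city": "Boston", "state": "MA"})
store.insert({"city": "Springfield", "state": "MA"})
```

Three records with six fields end up as only three stored values, which is where the compression would come from in data with many repeated values, and `store.find("MA")` answers “where does this value appear?” without scanning any rows.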

I have calls scheduled this week with Infobright’s CEO Miriam Tuerk and with Aster Data.  I will post a new sheet at the end of the week.
