Archive for 22. May 2008

Analytics in the Cloud

I attended a webinar today sponsored by Amazon and Vertica called “Data Analytics in the Cloud”.  The Vertica portion was mostly a duplicate of a prior webcast, but the Amazon portion on the cloud concept was very interesting.  The key points of the cloud concept, as I see it, are:

  • Pay-as-you-go model - you only pay for the disk space and processing you consume. There are no start-up costs, but you have to sign a contract. (They claimed the cost was half that of an in-house solution.)
  • Time to market - hours instead of weeks to turn up a terabyte-sized system, including hardware, OS, and the Vertica column-based database
  • On-demand scalability - seamlessly scales to meet your demand
  • Proven platform - hosted by Amazon on the same platform that hosts the Amazon.com site

I think the benefits of this approach are obvious, especially for a small but rapidly growing operation.  For such a company, in-house infrastructure and software license costs alone could be prohibitive, and time to market is critical, especially when launching a new idea.

The downsides include:

  • Security concerns, especially for highly sensitive customer data
  • Performance - both in terms of loading large amounts of data and in real-time queries
  • Long-term cost - as with any usage-based cost model, the upfront savings could be surpassed by subsequent usage fees
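
The long-term cost concern comes down to simple break-even arithmetic. Here is a minimal Python sketch of that calculation. The function name and all dollar figures are hypothetical and purely illustrative; none of them come from the webinar.

```python
def breakeven_month(upfront_in_house, monthly_in_house, monthly_cloud):
    """Return the first month in which cumulative cloud spend exceeds
    cumulative in-house spend, or None if it never does.

    All inputs are hypothetical costs in the same currency unit.
    The cloud option is assumed to have no upfront cost.
    """
    if monthly_cloud <= monthly_in_house:
        # Cloud is never more expensive per month, so it never catches up.
        return None
    month = 0
    cloud_total = 0
    in_house_total = upfront_in_house
    while cloud_total <= in_house_total:
        month += 1
        cloud_total += monthly_cloud
        in_house_total += monthly_in_house
    return month


# Illustrative numbers only: 120,000 upfront for in-house hardware and
# licenses, 2,000/month to run it, versus 7,000/month in usage fees.
example = breakeven_month(120_000, 2_000, 7_000)
```

With these made-up inputs the usage fees overtake the in-house total partway through the third year, which is the kind of horizon worth checking before signing a contract.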

Buzzword: “ODS”

The Operational Data Store (ODS) is classically defined as a physically integrated view of all or part of the transactional data environment.  The term is generally used in conjunction with a Data Warehouse (DW) and Data Marts (DM) to form the analytical data architecture triumvirate.

The ODS typically distinguishes itself from the DW and DM in two ways:

  1. Latency - the ODS is generally populated more frequently than a DW or DM, and newer systems offer near-real-time access to underlying transactional data, either via virtual data integration or trickle feeds that populate the ODS on a continuous basis.
  2. Data structure - the ODS is typically a more normalized model than a DW or DM.  This facilitates lower-latency refreshes, as the model more closely matches the transactional system.  It also supports the type of reporting and data distribution methods typically seen with an ODS, e.g., spreadsheet-like operational reports, or data feeds to a customer care application.
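
To make the trickle-feed idea concrete, here is a minimal Python sketch using the standard `sqlite3` module. It assumes a single in-memory database holding both a source table and an ODS table with the same normalized shape. The table names (`src_orders`, `ods_orders`), the columns, and the `trickle_load` helper are all hypothetical. A real ODS would normally sit in a separate database and be fed by change data capture or a message queue rather than by polling on an ID.

```python
import sqlite3


def trickle_load(conn, last_seen_id):
    """Copy source rows with id > last_seen_id into the ODS table.

    Returns the new high-water mark (the largest id copied so far).
    """
    rows = conn.execute(
        "SELECT id, customer, amount FROM src_orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    conn.executemany(
        "INSERT INTO ods_orders (id, customer, amount) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return rows[-1][0] if rows else last_seen_id


# Hypothetical setup: the ODS table mirrors the transactional table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src_orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.execute(
    "CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "bob", 25.5)],
)

mark = trickle_load(conn, 0)      # first pass copies the two existing rows
conn.execute("INSERT INTO src_orders VALUES (3, 'carol', 7.25)")
mark = trickle_load(conn, mark)   # second pass copies only the new row

ods_count = conn.execute("SELECT COUNT(*) FROM ods_orders").fetchone()[0]
```

Because the ODS table has the same shape as the source, each pass is a plain copy with no transformation step, which is why a more normalized model keeps refresh latency low.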

In addition to providing the DW/DM with a clean, integrated view of the transactional environment, the ODS directly supports business groups such as Customer Care.  Integrating care applications with the ODS allows for richer customer data for screen pops, real-time insight into multiple communication channels, and access to all products and services for a customer.

That being said, the ODS as it was defined 10 years ago is dying, and is being replaced by Enterprise Information Integration (EII) technology that combines virtual and physical data integration with a metadata layer, providing end users with a deeper understanding of the data.
