Buzzword: “OLAP” & “OLTP”

On-line Analytical Processing, or OLAP, is a term not used much anymore, probably for the same reasons that you don’t hear “data warehouse” that much.  The term in its broadest sense is meant to be the yin to OLTP’s (On-line Transaction Processing) yang.  Ten years ago it was insightful to distinguish between transaction and analytical processing, and it was sufficient to lump processing into those two broad categories.  Now, of course, not only have the distinctions blurred but the two processing types have become much more granular.

Within the business intelligence community, OLAP has taken on a narrower definition, one centered on structuring data to optimize reporting queries.  The structure is built around a cube, either physically or conceptually.  There are several flavors:

  • MOLAP (Multi-dimensional On-line Analytical Processing) - just another term for OLAP
  • ROLAP (Relational On-line Analytical Processing) - structuring relational tables to mimic a cube or in-memory analytical structure
  • HOLAP (Hybrid On-line Analytical Processing) - a combination of MOLAP & ROLAP

So when you do hear the term OLAP now, it’s usually in reference to a very niche reporting tool that stores data in a multidimensional in-memory structure (either read from a database or files) and supports random walking through the data.
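To make the cube idea a little more concrete, here is a minimal Python sketch of a multidimensional in-memory structure and two ways of walking through it.  The fact rows, the dimension names, and the helper functions (build_cube, roll_up, slice_cube) are all made up for illustration; a real OLAP tool does far more than this.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
FACTS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 150),
    ("West", "Widget", "Q1", 200),
    ("West", "Widget", "Q2", 250),
]
DIMENSIONS = ("region", "product", "quarter")


def build_cube(facts):
    """Aggregate the measure into cells keyed by one value per dimension."""
    cube = defaultdict(int)
    for *coords, amount in facts:
        cube[tuple(coords)] += amount
    return dict(cube)


def roll_up(cube, keep):
    """Sum the cube down to the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, amount in cube.items():
        totals[tuple(coords[i] for i in positions)] += amount
    return dict(totals)


def slice_cube(cube, dimension, value):
    """Keep only the cells where one dimension equals a fixed value."""
    position = DIMENSIONS.index(dimension)
    return {c: a for c, a in cube.items() if c[position] == value}


cube = build_cube(FACTS)
by_region = roll_up(cube, ["region"])           # total sales per region
west_only = slice_cube(cube, "region", "West")  # drill into one region
```

Rolling up and slicing the same structure in different orders is roughly what the tools mean by walking through the data.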

Widgets

I’m new to the widget world.  As part of my build-out of www.360degreevendor.com, I’ve been experimenting with the Clearspring widget set.  I’ve tested out both of the LaunchPad options: the In-Widget and On-Page methods.  The In-Widget method seems to be a better fit for my site, as I have a number of areas within the home page that I’d like to share, such as the vendor search box.  That said, I can see putting an On-Page button on the home page to allow people to share the entire site.

I floated a question about  Clearspring on the LinkedIn Q&A board, and received some good feedback.  The majority of the respondents recommended using WidgetBox, so I think I’ll do a trial with both and see which works out best.  I don’t think this is necessarily an either/or situation, so I might end up using both depending on their relative strengths.

Data Warehouse Appliances - Apples to Apples (update)

I’ve updated the data warehouse appliance spreadsheet, adding two new solutions (SAND & Calpont).  I’ve also updated information on several existing vendor solutions.

Buzzword: “Single Version of the Truth”

I was once a firm believer in the SVT concept, and I still believe in the fundamental principles.  But it’s not as cut and dried as people make it out to be.

On its face, an SVT seems to be a noble goal.  Who doesn’t support the truth, right?  The problem is, who defines what the truth really is?  In this case “the truth” may be different depending on your audience.  I’ll give you a simplistic example: who is your customer, where are they located, and how valuable are they to your company?  If you’re in sales, the customer is the one who made the decision to buy your company’s product and pays the bill.  If you’re in engineering or customer support, the customer is the person using your product and requesting service enhancements or technical support.  That’s the easy part.  Determining the value of the customer is more difficult, and involves a number of variables that may or may not apply depending on where you sit within the organization.

These are certainly not insurmountable issues by any stretch, but they underscore the nuance required in getting to the “truth”.

Analytics in the Cloud

I attended a webinar today sponsored by Amazon and Vertica called “Data Analytics in the Cloud”.  The Vertica portion was mostly a duplicate of a prior webcast, but the Amazon portion on the Cloud concept was very interesting.  The key points of the cloud concept as I see it are:

  • Pay as you go model - you only pay for the disk space and processing you consume. No start-up costs, but you have to sign a contract. (They claimed the cost was 1/2 that of an in-house solution)
  • Time to market - hours instead of weeks to turn up a terabyte-sized system, including hardware, OS, and the Vertica column-based database
  • On-demand scalability - seamlessly scales to meet your demand
  • Proven platform - hosted by Amazon on the same platform that hosts the Amazon.com site

I think the benefits of this approach are obvious, especially for a small but rapidly growing operation.  For a company like that, the infrastructure and software license costs of an in-house system alone would be prohibitive, and time to market is critical, especially when launching a new idea.

The downsides include:

  • Security concerns, especially for highly sensitive customer data
  • Performance - both in terms of loading large amounts of data and in real-time queries
  • Long-term cost - as with any usage-based cost model, the upfront savings could be surpassed by subsequent usage fees
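The long-term cost point comes down to simple break-even arithmetic.  The Python sketch below is only an illustration: the function name and all of the dollar figures are my own assumptions, not numbers from the webinar, and it ignores things like contract terms and growth in usage.

```python
def break_even_month(upfront_cost, in_house_monthly, cloud_monthly):
    """Return the first month in which cumulative cloud spend exceeds
    cumulative in-house spend, or None if it never does.

    All three arguments are hypothetical whole-dollar amounts.
    """
    monthly_gap = cloud_monthly - in_house_monthly
    if monthly_gap <= 0:
        return None  # usage fees never catch up with the upfront spend
    # Cloud is more expensive once monthly_gap * months > upfront_cost.
    return int(upfront_cost // monthly_gap) + 1


# Assumed figures: $120,000 to stand up an in-house system that then costs
# $2,000 a month to run, versus $6,000 a month in usage fees.
month = break_even_month(120_000, 2_000, 6_000)
```

With those assumed figures the two options cost the same after 30 months and the cloud option is the more expensive one from month 31 on.  Change the inputs and the answer moves a lot, which is the whole point of running the numbers.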

Buzzword: “ODS”

The Operational Data Store (or ODS) is classically defined as a physically integrated view of all or part of the transactional data environment.  The term is generally used in conjunction with a Data Warehouse (DW) and Data Marts (DM) to form the analytical data architecture triumvirate.

The ODS typically distinguishes itself from the DW and DM in two ways:

  1. Latency - the ODS is generally populated more frequently than a DW or DM, and newer systems offer near-real-time access to underlying transactional data, either via virtual data integration or trickle feeds that populate the ODS on a continuous basis.
  2. Data structure - the ODS is typically a more normalized model than a DW or DM.  This facilitates lower latency refreshes as the model more closely matches the transactional system.  This also supports the type of reporting and data distribution methods typically seen with an ODS, e.g., spreadsheet-like operational reports, or data feeds to a customer care application.
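As a rough illustration of the trickle feed idea, the Python sketch below copies only the rows that changed since the last run.  It assumes the source rows carry an updated_at value and an id key, which are names I made up for the example; real feeds are more often driven by change data capture or message queues.

```python
def trickle_load(source_rows, ods, watermark):
    """Copy rows changed after `watermark` into the ODS.

    `ods` is a dict keyed on the source primary key, so each copy is an
    upsert.  Returns the new watermark to use on the next run.
    """
    new_watermark = watermark
    for row in source_rows:
        if row["updated_at"] > watermark:
            ods[row["id"]] = dict(row)
            new_watermark = max(new_watermark, row["updated_at"])
    return new_watermark


# Hypothetical transactional rows; updated_at is a simple counter here.
source = [
    {"id": 1, "name": "Acme", "updated_at": 5},
    {"id": 2, "name": "Globex", "updated_at": 12},
]
ods = {}
watermark = trickle_load(source, ods, 0)          # first run copies both rows

source[0] = {"id": 1, "name": "Acme Corp", "updated_at": 15}
watermark = trickle_load(source, ods, watermark)  # second run copies only row 1
```

Because the ODS model stays close to the transactional model, the copy is little more than an upsert, which is part of why the refreshes can run so often.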

In addition to providing the DW/DM with a clean integrated view of the transactional environment, the ODS directly supports business groups such as Customer Care.  Integrating care applications with the ODS allows for richer customer data for screen pops, real-time insight into multiple communication channels, and access to all products and services for a customer.

That being said, the ODS as it was defined 10 years ago is dying, and is being replaced by EII technology that combines virtual and physical data integration with a meta data layer providing end users with a deeper understanding of the data.

Buzzword: “EII”

Enterprise Information Integration is a hybrid of service oriented architecture (SOA), enterprise application integration (EAI), virtual data integration, and physical data integration, with a little meta-data management thrown in for kicks.  The concept is to throw a layer on top of physical data storage, in order to provide a single interface to end users or other applications. (Wikipedia entry)

This layer generally includes interfaces into physical data stores, message buses, and other sources of data, with a meta data component to tie it all together.  A conceptual data layer is then defined, modeled on the consumer’s desired view of the world.  The final piece is the set of interface methods that end users or applications use to access the conceptual information view.

That’s the technical view of EII, but the real business benefit is in providing real or near-real-time access to business information without having to navigate the underlying data stores.  The value is realized on two levels:

  1. Integration of data elements across disparate systems - this is the grunt work associated with mapping a customer name between two systems when one stores the name in a single field and the other stores it in three separate fields (First, Middle, Last)
  2. Providing contextual understanding of information - this is where the meta-data comes into play, by providing the end user with background and additional meaning to the information
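Both levels can be sketched in a few lines of Python.  Everything here is hypothetical: the two source systems (billing and support), the field names, and the meta-data dictionary are assumptions made up for the example, and matching customers on name alone is far too naive for real use.

```python
# Meta-data describing the conceptual field exposed to consumers.
FIELD_METADATA = {
    "full_name": {
        "description": "Customer name as shown in the integrated view",
        "sources": ["billing.customer_name", "support.first/middle/last"],
    }
}


def from_billing(record):
    """Billing stores the name in a single field; normalize the spacing."""
    return {"full_name": " ".join(record["customer_name"].split())}


def from_support(record):
    """Support stores the name in three fields; join the ones that exist."""
    parts = [record.get("first"), record.get("middle"), record.get("last")]
    return {"full_name": " ".join(p.strip() for p in parts if p and p.strip())}


def same_customer(a, b):
    """Naive match: compare the conceptual names, ignoring case."""
    return a["full_name"].casefold() == b["full_name"].casefold()


billing = from_billing({"customer_name": "Mary  Ann Smith"})
support = from_support({"first": "mary", "middle": "Ann", "last": "Smith"})
matched = same_customer(billing, support)
```

The two adapter functions are the grunt work, and FIELD_METADATA is the piece that tells an end user where a value came from and what it means.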

Of course there are a number of companies claiming to provide a complete EII solution, but in my mind a true EII solution is too broad for any one product.  It should be treated as a business solution by starting with the benefits and working back to the appropriate technologies required to deliver those benefits in the most cost effective manner possible.

Buzzword: “Data Warehouse”

I almost didn’t bother with this one, since it’s almost too generic now to be useful, but it does deserve a few sentences if nothing more than to pay respects.  (Wikipedia has a good definition and some of the history around the term if you want background.)

Nowadays, I find that the term is hardly used anymore, probably because of the proliferation of more specific terms that describe the individual components (e.g., ETL, data quality, EII).  I think another reason is the move to more real-time analytics, and the term “data warehouse” conjures up visions of static information sitting in an Oracle (or Teradata) database. 

Ten years ago, all you had to do was say “I’m building a data warehouse” and most people knew what you were talking about.  Now, it could mean a dozen different things, which makes communicating more difficult.  It would be nice to have all of this wrapped up into one nice term that everyone can agree upon, but I doubt that’s going to happen, which is actually a good thing.  It means that people (both business and technical) realize that data-driven solutions are not one-size-fits-all, and that there are a myriad of implementation options available.

The “Data Warehouse” isn’t dead, it just lives on in its numerous children and grandchildren.

Buzzword: “Data Quality”

Data Quality - everyone wants it, and everyone complains that they don’t have “good quality data”.  But how do you define data quality? What are the business benefits associated with the investment required to improve the quality of corporate data? Those are the questions you should be asking when approached by an angry business user complaining they can’t do their job because their data source(s) stink.

I think the most common misconception around DQ is that it’s an all or nothing proposition.  In reality there’s a cost-benefit analysis required to determine the payback associated with improving data quality.  Raising the data quality bar has a cost, and unless you can justify the expenditure you’re wasting corporate resources.

At its simplest, the business case is an exercise in comparing the cost of automation against the current cost of the manual labor required to fix or work around incorrect data elements.  For example, it doesn’t make sense to spend a half million dollars implementing a data quality technology solution to save a couple of hours a week of a business or data analyst’s time.  On the other end of the spectrum are strategic implications such as financial reporting and risk management, where the reputation of the company is at stake (just ask Fannie Mae).
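The simple end of that spectrum is easy to put into numbers.  The Python sketch below uses the half million dollars and two hours a week from the example above; the $75 hourly rate and the function name are assumptions of mine, and the calculation deliberately ignores the strategic and reputational side.

```python
def payback_years(solution_cost, hours_saved_per_week, hourly_rate,
                  weeks_per_year=52):
    """Years of labor savings needed to recover the cost of the solution."""
    annual_savings = hours_saved_per_week * hourly_rate * weeks_per_year
    if annual_savings <= 0:
        return float("inf")  # nothing saved, so the cost is never recovered
    return solution_cost / annual_savings


# $500,000 solution, 2 analyst hours saved per week, assumed $75 per hour.
years = payback_years(500_000, 2, 75)
```

At the assumed rate the savings come to $7,800 a year, so the payback period is roughly 64 years.  On labor savings alone, that investment is not justifiable.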

Look at data quality as a bar that you raise and lower based on cost, business benefit, risk tolerance, and other factors that are important to the corporation.

Buzzword: “Business Intelligence”

The term “Business Intelligence”, or just BI, has been used and abused so much that it has nearly as many personalities as Herschel Walker.  And that’s just within the context of the analytics and data management community, never mind the legions of people who associate it with corporate espionage.

Within the analytics world, BI has taken on (at least) the following definitions:

  • The process of utilizing data for making better business decisions.  Sometimes used interchangeably with business or corporate performance management (originally coined by Gartner analyst Howard Dresner) 
  • Reporting and dissemination of data from a data warehouse
  • All systems required for collecting, integrating, cleansing, and reporting of data
  • Software tools that extract data from a repository (database or otherwise) and present to a user in various formats
  • Metrics used to measure business performance

So when someone uses the term BI, make sure you understand the context of the discussion (and the person’s background) so you’ll know which alter ego you’re conversing with.