Archive for the Industry Buzzwords Category

Dashboards

The term dashboard brings up a number of associations, including the housing for airplane controls, a place to mount your GPS in the car, and surprisingly (to me anyway) the wiki definition of an “application for Apple’s Mac OS X v10.4 Tiger and Mac OS X v10.5 Leopard operating systems” (who knew? - I think Wikipedia needs to work on that one).

In the business intelligence community, a dashboard is generally defined as a reporting tool or application that presents metrics or KPIs to an end user. It is meant to mimic the plane reference above, the idea being that corporate executives could sit in the “cockpit” and watch the dashboard while driving the company. In reality, this rarely if ever happens. The most effective dashboard implementations I’ve seen are targeted at an operations group (say customer care), and are used in a more tactical role. The group leader has the top level view, which displays key metrics for that group along with target values. When she notices a metric that is off base by a certain tolerance (good or bad), she can discuss the delta with the person in charge of that area. The value add for a dashboard is the ability to drill down from the top level metrics, and decompose those numbers into lower level supporting metrics. This fosters communication throughout the group, and allows for quick identification of problem areas or areas of opportunity.
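
To make the tolerance and drill-down idea concrete, here is a minimal Python sketch. It is not taken from any dashboard product: the Metric class, the off_target function, the 10% tolerance, and the customer care numbers are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """One dashboard metric with a target and optional supporting metrics."""
    name: str
    actual: float
    target: float
    children: list[Metric] = field(default_factory=list)

    def deviation(self) -> float:
        """Relative difference from target (positive means above target)."""
        return (self.actual - self.target) / self.target


def off_target(metric: Metric, tolerance: float) -> list[tuple[str, float]]:
    """Walk the metric tree and return every metric outside the tolerance.

    Deviations in either direction are reported, since a metric can be
    off base in a good or a bad way.
    """
    flagged = []
    if abs(metric.deviation()) > tolerance:
        flagged.append((metric.name, round(metric.deviation(), 3)))
    for child in metric.children:
        flagged.extend(off_target(child, tolerance))
    return flagged


# Hypothetical customer care numbers, invented for illustration.
calls = Metric("avg_handle_time_sec", actual=420, target=360, children=[
    Metric("billing_queue_sec", actual=365, target=360),
    Metric("tech_support_queue_sec", actual=510, target=360),
])

flagged = off_target(calls, tolerance=0.10)
```

Here the top level metric is flagged, and drilling down shows that the tech support queue, not billing, accounts for the delta. That is the conversation starter the group leader needs.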

One common misconception is to confuse dashboards with corporate performance management (CPM). CPM is a process for utilizing technology to define measures that drive the business, and then managing to those measures. A dashboard is usually an important component of a CPM initiative, but they are not one and the same. Be particularly wary of a dashboard vendor trying to sell you a CPM solution.

So what are the key takeaways when considering a dashboard?

  1. Make sure the business unit(s) are sponsoring the initiative, and have bought in to using it to manage the business
  2. Identify data sources for all levels in the business unit, to support drill-down capability
  3. Don’t buy into the hype around dashboards transforming the business - it’s a tool, nothing more or less
  4. Don’t overlook usability design and testing.  If the application is confusing or difficult to use, it will quickly be abandoned.

Semantic Web

I’ve been trying to get my arms around the semantic web movement, and finally decided to devote some time to the topic this morning.  First, let’s break down this phrase by defining the two words (courtesy of Websters.com):

semantic - “of, pertaining to, or arising from the different meanings of words or other symbols…”

web - “something formed by or as if by weaving or interweaving.”

So we have a weaving together of the different meanings of words or symbols, and presumably other objects such as video clips and files.  So how is that different from the version of the “web” we’ve woven today?  The answer comes from an old Twilight Zone episode - it’s another dimension.  The semantic web concept boils down to providing context (or dimensions) to the words, phrases, files, and other detritus floating around out there now.  The “Web 2.0” movement is attempting to address this issue, by building a community that comments on subject areas, thereby giving others context on that subject area.  The semantic web concept goes beyond this, by embedding this extra dimension into the structure in which content is stored.  Which highlights another important difference: Who gets to define content?  The author, or the viewers?  Ideally it would be both, with the ability to determine gaps in the definitions.  So where “Web 2.0” supports the viewer definition, the “Semantic Web” as advertised today encompasses technologies that support the author definition.   But going back to our original breakdown of this phrase, in particular the piece about “weaving together of the different meanings of words and symbols” - doesn’t that mean capturing both author and viewer definitions?

Leaving all the philosophical discussions aside, how do you implement a “semantic web” solution? And what are the benefits and drawbacks? The implementation starts with how the data is stored.  A 1.0/2.0 generation website stores information in HTML files that are then directly translated and presented via a browser.  A semantic website stores information in a structured format (a database, a Resource Description Framework (RDF) store, or an XML file) that supports a metadata layer.  The metadata layer provides this extra dimension, by allowing descriptors to be stored on the content itself.  This also decouples the storage from the presentation, which provides flexibility at a cost of presentation speed.  The content can still be translated for web page viewing, but more importantly other applications can accurately integrate the data by using the metadata as a roadmap.  The result is a web within a web, where applications (calendaring systems, for example) talk to one another without human intervention.  The benefit - all the data on this new web becomes much more valuable because of the leverage you get by combining content across multiple sites.  The downside is the enormous cost and effort to implement a semantic web solution.  There is an order of magnitude difference between putting content in an HTML file and storing data in a structured format with associated metadata.
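
As a rough illustration of the metadata layer acting as a roadmap, here is a small Python sketch. It stores content as (subject, predicate, object) triples, loosely in the spirit of RDF but without any RDF library. The site contents, the predicate names ("type", "title", "starts"), and the dates are all made up for the example.

```python
# Each fact is a (subject, predicate, object) triple. The predicate names
# and all of the content below are hypothetical.
site_a = [
    ("a/page1", "type", "Event"),
    ("a/page1", "title", "User group meeting"),
    ("a/page1", "starts", "2008-05-01T18:00"),
    ("a/page2", "type", "Article"),
    ("a/page2", "title", "Notes on dashboards"),
]
site_b = [
    ("b/item9", "type", "Event"),
    ("b/item9", "title", "Vendor webinar"),
    ("b/item9", "starts", "2008-04-20T12:00"),
]


def describe(triples, subject):
    """Collect all metadata for one subject into a dict."""
    return {p: o for s, p, o in triples if s == subject}


def find_events(triples):
    """Use the metadata as a roadmap: keep only subjects typed as Event."""
    subjects = [s for s, p, o in triples if p == "type" and o == "Event"]
    return [describe(triples, s) for s in subjects]


# A calendaring application can merge events from both sites without
# parsing any presentation markup.
calendar = sorted(find_events(site_a + site_b), key=lambda e: e["starts"])
titles = [e["title"] for e in calendar]
```

The article on site A is ignored because its metadata says it is not an event. No screen scraping or human judgment is involved, which is the whole point of the extra dimension.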

What does the future hold for the semantic web? Data that has significant value to the author or publisher will migrate towards a structured solution.  These semantic enabled sites will then link up on an opportunistic basis, forming informal networks based on common interests.  As these networks grow, the value proposition (and technological capabilities) will allow more sites to migrate.

As a side note, the semantic web is a subset of Web 3.0, but I’m out of breath and will save that for another posting.

Buzzword: “Knowledge Management”?

Knowledge Management - is there really such a thing as managing your knowledge? Isn’t it more accurate to call it “Knowledge Capitalization”? Let’s break it down by pulling the most appropriate definitions from Webster for these terms:

  • Knowledge Management - “the technologies involved in creating, disseminating, and utilizing knowledge data; also any enterprise involved in this”
  • Knowledge - “the body of truths or facts accumulated in the course of time”; “acquaintance with facts, truths, or principles, as from study or investigation”
  • Management - “the act or manner of managing; handling, direction, or control”; “skill in managing; executive ability”
  • Capitalize - “to take advantage of; turn something to one’s advantage (often fol. by on): to capitalize on one’s opportunities.”

Seems to me that the primary objective is to “take advantage of” the “body of truths or facts accumulated in the course of time”, as opposed to just “handling or controlling” this information. It’s no accident that business users have become gun-shy about the whole “Knowledge Management” concept. This has become an IT driven endeavor, and as a result the focus has been put on “handling” and “controlling”, task oriented words, as opposed to end goals such as “capitalize”.

Too many “Knowledge Management” systems today place a disproportionate emphasis on the collection and storage of knowledge, and not enough on the end results. This makes it prohibitively expensive for users to add information, which dooms the system to mediocrity. All of us involved in delivering technology solutions should be focused on the end benefit of our work. In the case of “Knowledge Management”, put the focus on finding ways to capitalize on the “body of truths or facts” that are part of the corporate history.

How do we do this?

  1. Information capture should be as seamless to the knowledge worker as possible - for instance a user should be able to designate certain directories on their laptop as sharable, and the knowledge system will automatically scan for changes and upload. The same applies to corporate web sites used for collaboration on projects or other initiatives
  2. Categorization should be automatic, with multiple categories associated (or tagged) to each piece of information. Users should be able to enhance or override the categorizations, but it shouldn’t be a mandatory step
  3. Collaboration should be enabled for a particular unit of information, with this activity becoming part of the body of knowledge
  4. Users should be able to access information in a variety of ways (e.g., mobile phone, Excel, corporate applications), in addition to access via a web browser
  5. A feedback loop should be available to capture results from knowledge exploration, including benefits achieved. This will have three advantages - (1) it will provide insights into the effectiveness of the knowledge system as a whole, (2) it will give other knowledge workers ideas on how to best exploit the underlying knowledge, and (3) it will close the loop by providing suggestions on how to improve on step 1.
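
The "scan for changes" part of step 1 is simple enough to sketch. The Python below is one possible approach, assuming content hashes are an acceptable way to detect changes. The snapshot and changed_files functions are hypothetical names, a temporary directory stands in for the shared folder, and the upload itself is left out.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def snapshot(shared_dir: Path) -> dict[str, str]:
    """Map each file's relative path to a hash of its contents."""
    result = {}
    for path in sorted(shared_dir.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(shared_dir).as_posix()] = digest
    return result


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return files that are new or modified since the previous scan."""
    return sorted(name for name, digest in current.items()
                  if previous.get(name) != digest)


# Demonstration against a temporary directory standing in for a shared folder.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp)
    (shared / "plan.txt").write_text("draft 1")
    first_scan = snapshot(shared)

    (shared / "plan.txt").write_text("draft 2")    # modified
    (shared / "notes.txt").write_text("kickoff")   # new
    second_scan = snapshot(shared)

    to_upload = changed_files(first_scan, second_scan)
```

The knowledge worker does nothing beyond saving files in the designated directory. Everything else happens on the next scan.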

Knowledge Capitalization should be, like learning itself, an iterative process.

Buzzword: “Column-store database”

Copious amounts have been written recently about the advantages (and disadvantages) of column-store databases, so I thought I’d do a little research to find out what the noise was about.  After all, Sybase IQ has been around for a decade now, touting the benefits of column storage and compression.  Vertica seems to be making the most noise out there now, with Michael Stonebraker leading the charge.  But there are a number of column-based vendors out there these days (see my data warehouse appliance spreadsheet), among them Kickfire, Calpont, InfoBright, and ParAccel, so this is obviously not just a lone-wolf situation.

Breaking this down, the concept is really simple - you are storing your data in what you would typically think of as an “index” in the row based world.  Storing your data in this manner gives you two big advantages:

  1. You can achieve much higher compression rates, since the likelihood of encountering repeating values within one column is much higher than within a row.
  2. Since typical analytical queries access a small number of columns, you can skip all the other columns entirely, which provides a huge performance boost.
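
Both advantages show up even in a toy example. The Python sketch below uses run-length encoding as a stand-in for compression. That is my choice for illustration, not a claim about what any of the vendors above actually do, and the sales rows are invented.

```python
from itertools import groupby

# Hypothetical sales rows: (region, product, units).
rows = [
    ("east", "widget", 10),
    ("east", "widget", 12),
    ("east", "gadget", 7),
    ("west", "gadget", 7),
    ("west", "gadget", 9),
]

# Column layout: one list per column instead of one tuple per row.
columns = {
    "region": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "units": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Repeats are common within a column, so it compresses well.
region_rle = run_length_encode(columns["region"])

# An analytical query such as SUM(units) touches one column and never
# reads region or product at all.
total_units = sum(columns["units"])
```

Five region values collapse to two pairs, and the sum reads a third of the data. Try run-length encoding the same data row by row and you get no repeats at all.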

As with any technological approach, there are downsides:

  1. In the case of operational reporting, you can actually see performance degradations since you’re typically reading across rows as opposed to down columns.
  2. Writing data in a row format is quicker, which is important in a low-latency reporting environment where trickle-feeds or other near-real time updating is required.

This discussion is similar to the “dimensional model vs 3rd normal form” debate.  I don’t think there is a right or wrong answer.  You need to understand how your users are accessing the data and the loading requirements before making a decision.

Buzzword: “EAI”

Although Enterprise Application Integration (EAI) is not usually considered part of the business intelligence and data management sandbox that I play in, it is useful to discuss how it fits into the overall data integration and data delivery picture.  EAI technologies are used to keep systems in sync, and to provide a virtual layer on top of multiple underlying systems to present a consolidated view of data to an end user.

There are two primary architectures: hub-and-spoke, and point-to-point.  In a hub-and-spoke model, all systems are connected to a central “routing” point, and all transactions and inquiries flow through that central router to the required system(s).  The point-to-point model employs a directory that allows for systems to talk directly to one another.  Both have their advantages and disadvantages.  Hub-and-spoke reduces the number of interfaces for a given system, but transaction latency can suffer if the central “router” becomes overloaded.  Point-to-point provides the fastest response times, but the number of interfaces can become unmanageable for a large number of systems.
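
The interface counts are easy to work out. The Python sketch below assumes the worst case for point-to-point, where every pair of systems needs its own interface. Real deployments rarely connect every pair, so treat these as upper bounds.

```python
def hub_and_spoke_interfaces(systems: int) -> int:
    """Each system needs one interface, to the central router."""
    return systems


def point_to_point_interfaces(systems: int) -> int:
    """Worst case: every pair of systems has its own interface."""
    return systems * (systems - 1) // 2


# Map system count to (hub-and-spoke, point-to-point) interface counts.
counts = {n: (hub_and_spoke_interfaces(n), point_to_point_interfaces(n))
          for n in (5, 20, 50)}
```

At 5 systems the difference is 5 interfaces versus 10. At 50 systems it is 50 versus 1,225, which is where "unmanageable" stops being an exaggeration.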

Now let’s tie this all back to the business intelligence world.  We’ve discussed Virtual Data Integration before, and if you break VDI down it’s basically an EAI technology.  The focus is on providing a consolidated view of two or more underlying systems, and it accomplishes this by building interfaces into these systems.   Sounds like EAI to me.

Buzzwords: “ETL” (and “ELT”)

Extract, Transform, and Load (ETL) is another one of those terms that was very useful as recently as 3-5 years ago, but now has lost some of its value for some of the same reasons “data warehouse” has become archaic - it describes a component of the architecture that has sprawled outside the bounds defined by this term. Ten years ago, ETL accurately described the majority of the processing required to load data into a data warehouse, data mart, etc. The term mimics the process used to accomplish this task, namely “extracting” (or receiving) data from source systems, “transforming” the data by applying business rules, data cleansing, and other manipulations on the data sets, and then “loading” the data into the target data store. In a predominantly batch world, this is a fine way to handle the loading process, and in fact a significant amount of data is still being loaded in this manner today.

But things are changing - the analytical world today is not so straightforward, and as a result the loading requirements have become much more complex. And in fact the biggest change is the move away from a physical “load” of data, to a virtual integration of data. Virtual data integration (VDI) is not always the answer, but in a business climate that increasingly rewards real-time feedback, VDI provides visibility into the business operations as they occur, not days or even hours later.

One final word about “ELT” - extract, load, and transform. I’ve heard some vendors (and analysts) talk up ELT as a new and improved ETL, but in my mind this is just an architectural choice. It makes little difference if you load the data before transforming, or the other way around.
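
To show what the architectural choice looks like, here is a small sketch using Python’s built-in sqlite3 module. The table names and source records are hypothetical, and an in-memory SQLite database stands in for the target data store.

```python
import sqlite3

# Hypothetical source records; names are padded and amounts arrive as text.
source = [(" alice ", "10.50"), ("BOB", "3"), (" Carol", "7.25")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE etl_target (name TEXT, amount REAL)")
conn.execute("CREATE TABLE elt_staging (name TEXT, amount TEXT)")
conn.execute("CREATE TABLE elt_target (name TEXT, amount REAL)")

# ETL: transform in the integration layer, then load the clean rows.
clean = [(name.strip().lower(), float(amount)) for name, amount in source]
conn.executemany("INSERT INTO etl_target VALUES (?, ?)", clean)

# ELT: load the raw rows first, then transform inside the database.
conn.executemany("INSERT INTO elt_staging VALUES (?, ?)", source)
conn.execute(
    "INSERT INTO elt_target "
    "SELECT lower(trim(name)), CAST(amount AS REAL) FROM elt_staging"
)

etl_rows = conn.execute("SELECT * FROM etl_target ORDER BY name").fetchall()
elt_rows = conn.execute("SELECT * FROM elt_target ORDER BY name").fetchall()
conn.close()
```

Both targets end up with the same rows. What changes is where the transformation runs (the integration layer or the database engine), which is a question of architecture and capacity rather than a new capability.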

Buzzword: “CPM” and “BPM”

Corporate Performance Management (CPM) and Business Performance Management (BPM) have gained significant visibility in the past 5 years.  Although there are purists that would argue these are different, I’m lumping them together because from a consumer perspective I don’t think there are appreciable differences.  Dashboards have also gotten a lot of press recently, and often get lumped into the CPM discussion.  But I see dashboards as an implementation option within a larger CPM initiative.

So what is CPM/BPM?  Essentially, it’s using metrics and KPIs to measure and improve the business.  A successful CPM solution must be driven from the executive ranks down through the business to technology.  It’s DOA if it starts in IT, and has little chance when germinating within a particular business unit such as Finance.  The reason for this is simple: the goal of CPM is to improve the business by taking measurements at various levels (starting at the top), setting thresholds, and managing to exceptions.   Now I don’t necessarily agree with this approach to running a business, particularly if this is presented as the silver bullet.  This approach often fosters a very reactive culture within the business, but coupled with executive direction and sponsorship, and properly defined non-punitive measures, CPM can provide significant visibility into the operations of the organization.

That being said, I can’t stress enough the importance of not driving this from the IT side of the house.  Particularly insidious is the temptation to follow vendor claims of implementing a CPM solution (usually just a tricked out dashboard).  If the executive suite doesn’t buy in and drive this, the technical solution is a moot point.

Buzzword: “OLAP” & “OLTP”

On-line Analytical Processing, or OLAP, is a term not used much anymore, probably for the same reasons that you don’t hear “data warehouse” that much.  The term in its broadest sense is meant to be the yin to OLTP’s (On-line Transaction Processing) yang.   Ten years ago it was insightful to distinguish between transaction and analytical processing, and it was sufficient to lump processing into those two broad categories.  Now, of course, not only have the distinctions blurred but the two processing types have become much more granular.

Within the business intelligence community, OLAP has taken on a more narrow definition, one centered around structuring data to optimize reporting queries.  The structure centers around a cube, either physically or conceptually.  There are several flavors:

  • MOLAP (Multi-dimensional On-line Analytical Processing) - just another term for OLAP
  • ROLAP (Relational On-line Analytical Processing) - structuring relational tables to mimic a cube or in-memory analytical structure
  • HOLAP (Hybrid On-line Analytical Processing) - a combination of MOLAP & ROLAP

So when you do hear the term OLAP now, it’s usually in reference to a very niche reporting tool that stores data in a multidimensional in-memory structure (either read from a database or files) and supports random walking through the data.
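
For anyone who hasn’t worked with a cube, the core operations are small. The Python sketch below holds a handful of fact rows in memory and implements a roll-up and a slice. The dimension names, the roll_up and slice_cube functions, and the revenue figures are all invented for illustration, and a real OLAP tool would pre-aggregate rather than scan the rows each time.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue).
facts = [
    ("east", "widget", "Q1", 100),
    ("east", "widget", "Q2", 120),
    ("east", "gadget", "Q1", 80),
    ("west", "widget", "Q1", 90),
    ("west", "gadget", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Aggregate revenue over every dimension not listed in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Fix one dimension to a single value, keeping the matching rows."""
    position = DIMENSIONS.index(dimension)
    return [row for row in rows if row[position] == value]


by_region = roll_up(facts, keep=["region"])
q1_by_product = roll_up(slice_cube(facts, "quarter", "Q1"), keep=["product"])
```

Walking through the data is just a sequence of these calls: roll up by region, notice something odd in the east, slice on it, and roll up again by product.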

Buzzword: “Single Version of the Truth”

I was once a firm believer in the SVT concept, and I still believe in the fundamental principles.  But it’s not as cut and dried as people make it out to be.

On its face, an SVT seems to be a noble goal.  Who doesn’t support the truth, right?  The problem is, who defines what the truth really is, and in this case “the truth” may be different depending on your audience.  I’ll give you a simplistic example: who is your customer, where are they located, and how valuable are they to your company? If you’re in sales, the customer is the one who made the decision to buy your company’s product and pays the bill.  If you’re in engineering or customer support, the customer is the person using your product and is requesting service enhancements or technical support.  That’s the easy part.  Determining the value of the customer is more difficult, and involves a number of variables that may or may not apply depending on where you sit within the organization.

These are certainly not insurmountable issues by any stretch, but they underscore the nuance required in getting to the “truth”.

Buzzword: “ODS”

The Operational Data Store (or ODS) is classically defined as a physically integrated view of all or part of the transactional data environment.  The term is generally used in conjunction with a Data Warehouse (DW) and Data Marts (DM) to form the analytical data architecture triumvirate.

The ODS typically distinguishes itself from the DW and DM in two ways:

  1. Latency - the ODS is generally populated more frequently than a DW or DM, and newer systems offer near-real time access to underlying transactional data, either via virtual data integration or trickle feeds that populate the ODS on a continuous basis.
  2. Data structure - the ODS is typically a more normalized model than a DW or DM.  This facilitates lower latency refreshes as the model more closely matches the transactional system.  This also supports the type of reporting and data distribution methods typically seen with an ODS, e.g., spreadsheet-like operational reports, or data feeds to a customer care application.

In addition to providing the DW/DM with a clean integrated view of the transactional environment, the ODS directly supports business groups such as Customer Care.  Integrating care applications with the ODS allows for richer customer data for screen pops, real-time insight into multiple communication channels, and access to all products and services for a customer.

That being said, the ODS as it was defined 10 years ago is dying, and is being replaced by Enterprise Information Integration (EII) technology that combines virtual and physical data integration with a metadata layer, providing end users with a deeper understanding of the data.