Archive for the Industry Buzzwords Category

Buzzword: “EII”

Enterprise Information Integration is a hybrid of service oriented architecture (SOA), enterprise application integration (EAI), virtual data integration, and physical data integration, with a little meta-data management thrown in for kicks.  The concept is to throw a layer on top of physical data storage, in order to provide a single interface to end users or other applications. (Wikipedia entry)

This layer generally includes interfaces into physical data stores, message buses, and other sources of data, with a meta-data component to tie it all together.  A conceptual data layer is then defined, modeled on the consumer’s desired view of the world.  The final piece is the set of interface methods that end users or applications use to access the conceptual information view.

That’s the technical view of EII, but the real business benefit is in providing real-time or near-real-time access to business information without having to navigate the underlying data stores.  The value is realized on two levels:

  1. Integration of data elements across disparate systems - this is the grunt work associated with mapping a customer name between two systems when one stores the name in a single field and the other stores it in three separate fields (First, Middle, Last)
  2. Providing contextual understanding of information - this is where the meta-data comes into play, by providing the end user with background and additional meaning to the information
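The name-mapping grunt work in item 1 can be sketched roughly as follows.  This is only an illustration: the `CanonicalName` type, the function names, and the naive "first token / last token" splitting rule are all assumptions, and real names (suffixes, multi-word surnames) need far more care.

```python
from dataclasses import dataclass


@dataclass
class CanonicalName:
    """Hypothetical common shape that both systems are mapped into."""
    first: str
    middle: str
    last: str


def from_single_field(full_name: str) -> CanonicalName:
    """Map a system that stores the name in one field.

    Naive assumption: the first token is the first name, the last token is
    the last name, and anything in between is the middle name.
    """
    parts = full_name.split()
    if not parts:
        return CanonicalName("", "", "")
    if len(parts) == 1:
        return CanonicalName(parts[0], "", "")
    return CanonicalName(parts[0], " ".join(parts[1:-1]), parts[-1])


def from_three_fields(first: str, middle: str, last: str) -> CanonicalName:
    """Map a system that already stores First, Middle, and Last separately."""
    return CanonicalName(first.strip(), middle.strip(), last.strip())


def same_name(a: CanonicalName, b: CanonicalName) -> bool:
    """Case-insensitive comparison of two canonical names."""
    def key(n: CanonicalName):
        return (n.first.lower(), n.middle.lower(), n.last.lower())
    return key(a) == key(b)
```

For example, `same_name(from_single_field("john q public"), from_three_fields("John", "Q", "Public"))` evaluates to `True`, since both records end up in the same canonical shape before they are compared.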

Of course there are a number of companies claiming to provide a complete EII solution, but in my mind a true EII solution is too broad for any one product.  It should be treated as a business solution: start with the benefits and work back to the technologies required to deliver those benefits in the most cost-effective manner possible.

Buzzword: “Data Warehouse”

I almost didn’t bother with this one, since it’s almost too generic now to be useful, but it does deserve a few sentences, if for nothing more than to pay respects.  (Wikipedia has a good definition and some of the history around the term if you want background.)

Nowadays, I find that the term is hardly used anymore, probably because of the proliferation of more specific terms that describe the individual components (e.g., ETL, data quality, EII).  Another reason, I think, is the move to more real-time analytics: the term “data warehouse” conjures up visions of static information sitting in an Oracle (or Teradata) database.

Ten years ago, all you had to do was say “I’m building a data warehouse” and most people knew what you were talking about.  Now, it could mean a dozen different things, which makes communicating more difficult.  It would be nice to have all of this wrapped up into one nice term that everyone can agree upon, but I doubt that’s going to happen, which is actually a good thing.  It means that people on both the business and technical sides realize that data-driven solutions are not one-size-fits-all, and that there are a myriad of implementation options available.

The “Data Warehouse” isn’t dead; it just lives on in its numerous children and grandchildren.

Buzzword: “Data Quality”

Data Quality - everyone wants it, and everyone complains that they don’t have “good quality data”.  But how do you define data quality? What are the business benefits associated with the investment required to improve the quality of corporate data? Those are the questions you should be asking when approached by an angry business user complaining they can’t do their job because their data source(s) stink.

I think the most common misconception around DQ is that it’s an all or nothing proposition.  In reality there’s a cost-benefit analysis required to determine the payback associated with improving data quality.  Raising the data quality bar has a cost, and unless you can justify the expenditure you’re wasting corporate resources.

At one end of the spectrum, the business case is a simple exercise in comparing the cost of automation with the current cost of the manual labor required to fix or work around incorrect data elements.  For example, it doesn’t make sense to spend a half million dollars implementing a data quality technology solution to save a couple of hours a week of a business or data analyst’s time.  At the other end of the spectrum are strategic implications such as financial reporting and risk management, where the reputation of the company is at stake (just ask Fannie Mae).
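To put rough numbers on the half-million-dollar example, here is a minimal payback sketch.  The function name, the $75 hourly rate, and the 52-week year are assumptions picked purely for illustration; plug in your own figures.

```python
def payback_years(implementation_cost, hours_saved_per_week,
                  hourly_rate, weeks_per_year=52):
    """Years needed for labor savings to cover the implementation cost.

    Only counts saved labor; strategic benefits such as reduced reporting
    risk are deliberately left out of this simple view.
    """
    annual_savings = hours_saved_per_week * hourly_rate * weeks_per_year
    if annual_savings <= 0:
        return float("inf")  # no savings means the cost is never recovered
    return implementation_cost / annual_savings


# Hypothetical inputs: $500,000 project, 2 analyst hours saved per week,
# analyst time valued at $75 per hour.
years = payback_years(500_000, 2, 75)
print(f"Payback period: {years:.1f} years")
```

With those assumed inputs the savings come to $7,800 a year, so the payback period is roughly 64 years, which is why the simple labor-savings case rarely justifies the spend on its own.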

Look at data quality as a bar that you raise and lower based on cost, business benefit, risk tolerance, and other factors that are important to the corporation.

Buzzword: “Business Intelligence”

The term “Business Intelligence”, or just BI, has been used and abused so much that it has nearly as many personalities as Herschel Walker.  And that’s just within the context of the analytics and data management community, never mind the legions of people who associate it with corporate espionage.

Within the analytics world, BI has taken on (at least) the following definitions:

  • The process of utilizing data for making better business decisions.  Sometimes used interchangeably with business or corporate performance management (originally coined by Gartner analyst Howard Dresner) 
  • Reporting and dissemination of data from a data warehouse
  • All systems required for collecting, integrating, cleansing, and reporting of data
  • Software tools that extract data from a repository (database or otherwise) and present it to a user in various formats
  • Metrics used to measure business performance

So when someone uses the term BI, make sure you understand the context of the discussion (and the person’s background) so you’ll know which alter ego you’re conversing with.

CDI (Customer Data Integration)

CDI is the use of technology and business processes to build a 360 degree view of the customer.  CDI is sometimes considered to be a subset of or related to Master Data Management (MDM). 

From a technical perspective, CDI involves a number of typical data warehouse activities, such as extract, transform, and load (ETL), data cleansing, and meta-data management, among others.  These approaches have been discussed in great detail by numerous sources, so I won’t elaborate here.  One aspect of the technical approach is critical to the solution: in-house or outsource?  You need to decide whether to develop this solution in-house, with either custom code or one of the numerous products marketed under the CDI banner, or to outsource the CDI effort.  The pros and cons of each are outside the scope of this posting, but it’s safe to say this is a critical decision point for the project, as it impacts all downstream decisions.

The major difference as I see it is in the business approach.  The typical data warehouse initiative may be very broad in nature (enterprise focus), or it may be targeted at a particular business unit and/or functional group (e.g., marketing).  The CDI initiative is all about integrating and eliminating duplicate customer data, to provide a clean, consistent view of the relationship the company has with each of its customers.  The core data associated with a 360 degree customer view includes:

  • Unique identifiers such as customer id and SSN
  • Transaction data such as orders and payments
  • Account structure including parent-child relationships
  • All interactions between the company and the customer, including e-mail, phone calls, and snail mail
  • Marketing attribution elements that allow the customer to be tied to specific marketing campaigns
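As a rough illustration of the "integrate and eliminate duplicates" step, the sketch below groups records from different systems by a shared unique identifier and folds each group into one customer view.  The record fields (`system`, `customer_id`, `ssn`, `name`), the function names, and the "keep the longest name" rule are assumptions made up for this example; real match-merge logic usually needs fuzzy matching and agreed survivorship rules.

```python
from collections import defaultdict


def normalize_ssn(ssn: str) -> str:
    """Strip punctuation so '123-45-6789' and '123456789' compare equal."""
    return "".join(ch for ch in ssn if ch.isdigit())


def merge_customers(records):
    """Group source records by normalized SSN and merge each group."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[normalize_ssn(rec["ssn"])].append(rec)

    merged = []
    for ssn, recs in grouped.items():
        view = {"ssn": ssn, "source_ids": [], "name": ""}
        for rec in recs:
            # Remember where each record came from so it can be traced back.
            view["source_ids"].append((rec["system"], rec["customer_id"]))
            # Crude stand-in for a survivorship rule: keep the longest name.
            if len(rec["name"]) > len(view["name"]):
                view["name"] = rec["name"]
        merged.append(view)
    return merged


# Hypothetical source records from two systems.
records = [
    {"system": "billing", "customer_id": "B-1", "ssn": "123-45-6789", "name": "J. Smith"},
    {"system": "crm", "customer_id": "C-9", "ssn": "123456789", "name": "John Smith"},
    {"system": "crm", "customer_id": "C-10", "ssn": "987-65-4321", "name": "Ann Lee"},
]
customers = merge_customers(records)
```

Here the three source records collapse into two customer views, and the first view keeps both source identifiers.  Note that the code can only run once someone has decided what counts as the same "customer", which is the business question, not the technical one.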

The key issue in a CDI initiative will revolve around the proper definition of a “customer”, and the resulting effort to build a business model of the customer that accurately reflects the views of all business constituents.  Until this issue has been resolved, the technical implementation should not begin.

Please download my white paper 360 Degree Customer View for a more detailed discussion on this topic.

Reference Links:

  • Wikipedia Entry
  • DMReview Article
  • Gartner CDI Magic Quadrant
  • CDI Station

VDI (Virtual Data Integration)

What is virtual integration and when is it appropriate to use?  Virtual data integration (VDI) is the use of a software layer that interfaces with the reporting or end-user delivery application, in place of direct access to a data repository.  The software layer provides a mapping between the conceptual data model that the user interfaces with and the physical model(s) and underlying data stores.  The reason I say “model(s)” is that one of the most powerful aspects of VDI is the ability to make multiple data stores (whether they be transactional, ODS, data warehouse/marts, or others) look like one integrated repository.  Below are the pros and cons of using VDI versus physical data integration.

Pros:

  • Shortens implementation time since data does not have to be physically integrated
  • Provides short term benefit to the business while buying time for technical team to integrate/retire existing redundant and fragmented data architecture
  • Can reduce data storage requirements
  • Source data updates are available immediately to end-users
  • Eliminates cost and time associated with extraction and movement of data (e.g., no batch window)
  • Business rules can be modified “on the fly”

Cons:

  • Data is not accessible if source system is unavailable
  • Complex transformations and aggregations can impact report response time
  • Updates to source systems can result in out of sync data on reports
  • Complex match-merge routines may not be possible outside of batch processing
  • Accessing large data sets can impact report response time
  • Query processing can impact source system performance

In my mind, the key differentiator is the volume of data being analyzed.  If end users are accessing a relatively small number of records, a VDI solution is viable.  But when large data volumes need to be accessed, such as in a data mining or long-term trending exercise, performance may become an insurmountable issue, from the perspective of both the end user and the source system owner.
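To make the mapping layer a little more concrete, here is a toy sketch of a virtual view over two stores.  Everything in it is an assumption for illustration: the class name, the physical field names (`cust_nm`, `email_addr`, `bal_amt`), and the use of plain dictionaries as stand-ins for a CRM database and a billing system.

```python
class VirtualCustomerView:
    """Presents two separate stores as one conceptual 'customer' record."""

    def __init__(self, crm_store, billing_store):
        # Mapping: conceptual field -> (source store, physical field name).
        self._mapping = {
            "name": (crm_store, "cust_nm"),
            "email": (crm_store, "email_addr"),
            "balance": (billing_store, "bal_amt"),
        }

    def get(self, customer_id):
        """Assemble the conceptual record at request time; nothing is copied."""
        record = {"customer_id": customer_id}
        for field, (store, physical_field) in self._mapping.items():
            row = store.get(customer_id)
            if row is None:
                # Mirrors the first "con": no source row, no data.
                raise LookupError(f"no source row for {field!r} of {customer_id!r}")
            record[field] = row[physical_field]
        return record


# Hypothetical source systems, keyed by customer id.
crm = {"42": {"cust_nm": "Ann Lee", "email_addr": "ann@example.com"}}
billing = {"42": {"bal_amt": 10.0}}

view = VirtualCustomerView(crm, billing)
before = view.get("42")

billing["42"]["bal_amt"] = 25.0  # an update lands in the source system
after = view.get("42")           # the next read sees it, with no batch load
```

Because each read goes back to the sources, the update is visible immediately, but every read also costs the source systems a lookup.  That trade-off is tolerable for a handful of records and painful for millions.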