Archive for July 2008

Kickfire Overview

I spoke with Karl Van Den Bergh, VP Business Development at Kickfire (founded June 2006), today and wanted to share my impressions of the company and its offering, the Kickfire Database Appliance. Kickfire is venture-backed (Accel, Greylock, Mayfield Fund, and Pinnacle Ventures), and is based in Santa Clara, CA. They have a strategic alliance with Sun, and technology partnerships established with Jaspersoft, Talend, Pentaho, and Zmanda. Karl said they currently have approximately 60 people in the company, most of them in research and engineering.

The Kickfire Database Appliance has been in beta testing since April, and is scheduled to be launched commercially sometime in Q4 2008. The two key differentiators of the Kickfire platform are the Query Processing Module (QPM) and the Kickfire Storage Plug-in for MySQL.

QPM is a SQL accelerator chip, akin to a graphics chip. QPM plugs into a motherboard alongside a standard Intel-based quad processor and other off-the-shelf components. By processing SQL statements on the chip, they are able to achieve significant performance gains, resulting in impressive price/performance and raw performance numbers. Kickfire recently released TPC-H numbers for the 100GB and 300GB classes, and set records in those categories for both performance (non-clustered category) and price/performance. They plan to run tests on larger datasets, and feel the existing numbers will scale to these larger sizes.

The storage plug-in sits under native MySQL and on top of CentOS Linux. The plug-in provides modern data warehouse features such as column store and compression. The big lift comes from deploying out-of-the-box MySQL: access to the approximately 11 million installations of MySQL, and growing. By going this route, Kickfire will not have to certify their platform with the myriad of business intelligence and data integration vendors. As long as those vendors work with MySQL, in theory they should work with Kickfire.

Kickfire has a small consulting group focused on installation and configuration of their product, but is putting partnerships in place with larger systems integrators to support full life-cycle implementations.

If you’re running, or planning on running, an analytics solution on MySQL, I think you have to give this product serious consideration. At a starting cost of about $20,000, you’ll be hard pressed to find a better price point on a system in this category. Even if you have another platform for your enterprise solution, it’s worth investigating using Kickfire to support data marts or other departmental level systems. If you’re a Microsoft shop, you’re probably best off avoiding this system, unless you’re making a strategic decision to migrate part or all of your infrastructure to open source. In most cases, the cost savings won’t justify the added cost and complexity of introducing one MySQL instance into your environment.

The big caveat to all of this is the production readiness of the system. Assuming they go production in Q4, they will have had less than 9 months of beta testing feedback. Any early adopters (read: anyone buying this before next Spring) should bake plenty of internal testing into their deployment schedule, or better yet set this up in a sandbox environment until the 1.0 bugs have shaken out.

Dashboards

The term dashboard brings up a number of responses, including the housing for airplane controls, a place to mount your GPS in the car, and surprisingly (to me anyway) the wiki definition of an “application for Apple’s Mac OS X v10.4 Tiger and Mac OS X v10.5 Leopard operating systems” (who knew? - I think Wikipedia needs to work on that one.)

In the business intelligence community, dashboard is generally defined as a reporting tool or application that presents metrics or KPIs to an end user. It is meant to mimic the plane reference above, presumably whereby corporate executives could sit in the “cockpit” and watch the dashboard while driving the company. In reality, this rarely if ever happens. The most effective dashboard implementations I’ve seen are targeted at an operations group (say customer care), and are used in a more tactical role. The group leader has the top level view, which displays key metrics for that group along with target values. When she notices a metric that is off base by a certain tolerance (good or bad), she can discuss the delta with the person in charge of that area. The value add for a dashboard is the ability to drill down from the top level metrics, and decompose those numbers into lower level supporting metrics. This fosters communication throughout the group, and allows for quick identification of problem areas or areas of opportunity.
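To make the tolerance check and drill-down concrete, here is a minimal sketch. The metric names, the numbers, and the 10% tolerance are all made up for illustration; a real dashboard product would pull these from its own data sources and configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    """One dashboard metric with a target and optional supporting metrics."""
    name: str
    actual: float
    target: float
    children: list[Metric] = field(default_factory=list)

    def deviation(self) -> float:
        # Relative difference from target; positive means above target.
        return (self.actual - self.target) / self.target


def flag_off_target(metric: Metric, tolerance: float = 0.10) -> list[str]:
    """Return names of metrics outside the tolerance (good or bad),
    drilling into supporting metrics only when the parent is flagged."""
    flagged: list[str] = []
    if abs(metric.deviation()) > tolerance:
        flagged.append(metric.name)
        for child in metric.children:
            flagged.extend(flag_off_target(child, tolerance))
    return flagged


# Hypothetical customer care example: average handle time in minutes.
top_level = Metric(
    name="avg_handle_time",
    actual=7.2,
    target=6.0,
    children=[
        Metric(name="billing_calls", actual=9.0, target=6.0),
        Metric(name="tech_support_calls", actual=6.1, target=6.0),
    ],
)

off_target = flag_off_target(top_level)
print(off_target)
```

In this made-up case the top-level metric is 20% over target, and the drill-down points at billing calls as the place to start the conversation.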

One common mistake is to confuse dashboards with corporate performance management (CPM). CPM is a process for utilizing technology to define measures that drive the business, and then managing to those measures. A dashboard is usually an important component of a CPM initiative, but they are not one and the same. Be particularly wary of a dashboard vendor trying to sell you a CPM solution.

So what are the key takeaways when considering a dashboard?

  1. Make sure the business unit(s) are sponsoring the initiative, and have bought in to using it to manage the business
  2. Identify data sources for all levels in the business unit, to support drill-down capability
  3. Don’t buy into the hype around dashboards transforming the business - it’s a tool, nothing more or less
  4. Don’t overlook usability design and testing.  If the application is confusing or difficult to use, it will quickly be abandoned.

Semantic Web

I’ve been trying to get my arms around the semantic web movement, and finally decided to devote some time to the topic this morning.  First, let’s break down this phrase by defining the two words (courtesy of Websters.com):

semantic - “of, pertaining to, or arising from the different meanings of words or other symbols…”

web - “something formed by or as if by weaving or interweaving.”

So we have a weaving together of the different meanings of words or symbols, and presumably other objects such as video clips and files. So how is that different from the version of the “web” we’ve woven today? The answer comes from an old Twilight Zone episode - it’s another dimension. The semantic web concept boils down to providing context (or dimensions) to the words, phrases, files, and other detritus that’s floating around out there now. The “Web 2.0” movement is attempting to address this issue, by building a community that comments on subject areas, thereby giving others context on that subject area. The semantic web concept goes beyond this, by embedding this extra dimension into the structure in which content is stored. Which highlights another important difference: Who gets to define content? The author, or the viewers? Ideally it would be both, with the ability to determine gaps in the definitions. So where “Web 2.0” supports the viewer definition, the “Semantic Web” as advertised today encompasses technologies that support the author definition. But going back to our original breakdown of this phrase, in particular the piece about “weaving together of the different meanings of words and symbols” - doesn’t that mean capturing both author and viewer definitions?

Leaving all the philosophical discussions aside, how do you implement a “semantic web” solution? And what are the benefits and drawbacks? The implementation starts with how the data is stored. A 1.0/2.0 generation website stores information in HTML files that are then directly translated and presented via a browser. A semantic website stores information in a structured format (a database, Resource Description Framework, or an XML file) that supports a metadata layer. The metadata layer provides this extra dimension, by allowing descriptors to be stored on the content itself. This also decouples the storage from the presentation, which provides flexibility at a cost of presentation speed. The content can still be translated for web page viewing, but more importantly other applications can accurately integrate the data, using the metadata as a roadmap. This creates a web within a web, where applications (calendaring systems, for example) talk to one another without human intervention. The benefit - all the data on this new web becomes much more valuable because of the leverage you get by combining content across multiple sites. The downside is the enormous cost and effort to implement a semantic web solution. There is an order of magnitude difference between putting content in an HTML file and storing data in a structured format with associated metadata.
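Here is a rough sketch of that decoupling, using plain Python dictionaries rather than any particular RDF or XML toolkit. The event, its field names, and both functions are hypothetical; the point is only that one stored item can feed a browser view and a calendaring application without either one scraping the other.

```python
from datetime import date
from html import escape

# Hypothetical content item: the body is stored once, with descriptors
# (the metadata layer) stored alongside it.
event = {
    "content": "Quarterly data warehouse user group meeting",
    "metadata": {
        "type": "event",
        "start_date": "2008-09-15",
        "location": "Main conference room",
    },
}


def render_html(item):
    """Presentation layer: turn the stored content into markup for a browser."""
    return f"<p>{escape(item['content'])}</p>"


def to_calendar_entry(item):
    """A second consumer, such as a calendaring application, reads the
    metadata as its roadmap instead of parsing the rendered page."""
    meta = item["metadata"]
    if meta.get("type") != "event":
        return None  # not something a calendar knows how to use
    return {
        "summary": item["content"],
        "date": date.fromisoformat(meta["start_date"]),
        "location": meta["location"],
    }


page = render_html(event)
entry = to_calendar_entry(event)
print(page)
print(entry)
```

The extra effort shows up even in this toy version: someone has to decide on the descriptors and fill them in for every item, which is exactly the cost a plain HTML file avoids.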

What does the future hold for the semantic web? Data that has significant value to the author or publisher will migrate towards a structured solution.  These semantic enabled sites will then link up on an opportunistic basis, forming informal networks based on common interests.  As these networks grow, the value proposition (and technological capabilities) will allow more sites to migrate.

As a side note, the semantic web is a subset of Web 3.0, but I’m out of breath and will save that for another posting.

Buzzword: “Knowledge Management”?

Knowledge Management - is there really such a thing as managing your knowledge? Isn’t it more accurate to call it “Knowledge Capitalization”? Let’s break it down by pulling the most appropriate definitions from Webster for these terms:

  • Knowledge Management - “the technologies involved in creating, disseminating, and utilizing knowledge data; also any enterprise involved in this”
  • Knowledge - “the body of truths or facts accumulated in the course of time”; “acquaintance with facts, truths, or principles, as from study or investigation”
  • Management - “the act or manner of managing; handling, direction, or control”; “skill in managing; executive ability”
  • Capitalize - “to take advantage of; turn something to one’s advantage (often fol. by on): to capitalize on one’s opportunities.”

Seems to me that the primary objective is to “take advantage of” the “body of truths or facts accumulated in the course of time”, as opposed to just “handling or controlling” this information. It’s no accident that business users have become gun-shy about the whole “Knowledge Management” concept. This has become an IT driven endeavor, and as a result the focus has been put on “handling” and “controlling”, task oriented words, as opposed to end goals such as “capitalize”.

Too many “Knowledge Management” systems today place a disproportionate emphasis on the collection and storage of knowledge, and not enough on the end results. This makes it prohibitively expensive for users to add information, which dooms the system to mediocrity. All of us involved in delivering technology solutions should be focused on the end benefit of our work. In the case of “Knowledge Management”, put the focus on finding ways to capitalize on the “body of truths or facts” that are part of the corporate history.

How do we do this?

  1. Information capture should be as seamless to the knowledge worker as possible - for instance a user should be able to designate certain directories on their laptop as sharable, and the knowledge system will automatically scan for changes and upload. Same with corporate web sites used for collaboration on projects or other initiatives
  2. Categorization should be automatic, with multiple categories associated (or tagged) to each piece of information. Users should be able to enhance or override the categorizations, but it shouldn’t be a mandatory step
  3. Collaboration should be enabled for a particular unit of information, with this activity becoming part of the body of knowledge
  4. Users should be able to access information in a variety of ways (e.g., mobile phone, Excel, corporate applications), in addition to access via a web browser
  5. A feedback loop should be available to capture results from knowledge exploration, including benefits achieved. This will have three advantages - (1) it will provide insights into the effectiveness of the knowledge system as a whole, (2) it will give other knowledge workers ideas on how to best exploit the underlying knowledge, and (3) it will close the loop by providing suggestions on how to improve on step 1.

Knowledge Capitalization should be, like learning itself, an iterative process.

SAND DNA Overview

I spoke with Linda Arens, VP Alliances and Marketing for SAND. Here’s a summary of the discussion:

  • SAND is based in Montreal and is listed on the OTC (SNDTF.OB). They have about 50-60 employees in the company.
  • They started in 1983 as a hardware company, but have remade themselves with proprietary storage software purchased from Lockheed Martin
  • They have a number of products listed on their web site, but according to Linda they are all based on the same underlying foundation, hence the “DNA” moniker
  • At the core is a column based, high compression database management system (DNA Analytics). Linda said they advertise 90% compression rates, but see mid-90s and above in real world scenarios. The database runs on a variety of operating systems (HP, Sun, Linux, Windows), and hardware platforms. This is not a full-stack appliance, but a special purpose database management system.
  • SAND has pre-built integration with SAP BI, Oracle, IBM DB2, and Microsoft, and can operate behind or in conjunction with any of them.
  • The DNA Access component is intended to offload infrequently used data from the primary server, while still allowing seamless on-line access to the data. The big advantage of this is zero impact to existing user applications. This sounded similar to the way Dataupia operates, working behind the scenes to accelerate queries. DNA Access does allow direct queries, either using SQL or via an ODBC-compliant access tool (she mentioned Business Objects, MicroStrategy, and Cognos).
  • SAND does have a small professional services group that focuses primarily on the setup and installation of their product. She said they leave the systems integration work to partners such as Accenture.

In their most recent financial results announcement, they touted a deal with P&G for the SAP BI product. Given the volumes in some SAP systems this capability would seem to be a competitive advantage. But I don’t know if SAND has anything else that differentiates them from the host of companies that are in this space now. Looking at their financial report, they listed $1.9M in revenue for the past quarter, which says they are still in their infancy in terms of actual sales. Given the amount of venture capital flowing into this sector, they will have a tough time getting traction.

Data Warehouse Appliance Comparison

I’ve added some information recently to the appliance spreadsheet, and figured it was time to repost. I made the following changes:

  • Added PANTA to the “Hardware” category.
  • Updated information on SAND based on my discussion today with Linda Arens (VP Alliances and Marketing)
  • Updated information on ParAccel based on my discussion last week with Kim Stanick (VP Marketing)


The information has also been added to the 360DegreeVendor site.

Data Warehouse in the Clouds

I’ve been hearing and reading a lot lately about “cloud computing”. Information Week in particular has run several articles on the topic, including last week’s “Guide to Cloud Computing”. Most of these articles have been about general purpose platforms, with the Amazon, Google, and Salesforce.com offerings getting the most press.

On the surface, computing in the cloud looks very similar to the hosted service offerings that sprang up in the mid-90s, led by companies such as Digex. I believe fundamentally it is, the main difference being that well known on-line brands are now entering the market. And with the exception of connection speed and reliability, most of the old issues are still in play, namely security, performance, and environment change control. The benefits touted are lower TCO, faster time to market, and solution scalability.

The Data Warehouse in the Cloud (DWC) concept is just starting to take hold, led by such companies as Vertica (in partnership with Amazon).  I can see the benefits of going this route, most notably time to market.  Connection speeds might still be an issue, particularly in the case of large data load files.  And security will always be an issue, particularly with sensitive customer data.  TCO is often presented as a plus for the DWC, but it’s not that straightforward.  Factors such as initial hardware & software costs, data center operational costs, labor, and upgrade costs must all be included in the mix.
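To show why the TCO comparison isn’t straightforward, here is a back-of-the-envelope sketch that adds up the factors listed above. Every figure is invented for illustration, as are the function names and the assumption of a fixed upgrade cycle; plug in your own numbers before drawing any conclusions.

```python
def on_premise_tco(years, hardware_software, annual_operations,
                   annual_labor, upgrade_cost, upgrade_every_years):
    """Cumulative in-house cost: up-front purchase, yearly running
    costs, plus one upgrade per completed upgrade cycle."""
    upgrades = years // upgrade_every_years
    return (hardware_software
            + years * (annual_operations + annual_labor)
            + upgrades * upgrade_cost)


def cloud_tco(years, annual_subscription, annual_labor):
    """Cumulative hosted cost: no up-front purchase, but the
    subscription never goes away."""
    return years * (annual_subscription + annual_labor)


# Hypothetical figures, in dollars.
in_house = dict(hardware_software=200_000, annual_operations=30_000,
                annual_labor=60_000, upgrade_cost=50_000,
                upgrade_every_years=3)
hosted = dict(annual_subscription=130_000, annual_labor=20_000)

for years in (1, 3, 5):
    print(years, on_premise_tco(years, **in_house), cloud_tco(years, **hosted))
```

With these made-up inputs the hosted option is well ahead in year one (150,000 versus 290,000), still ahead at three years (450,000 versus 520,000), and behind by year five (750,000 versus 700,000). Different inputs will move the crossover point or remove it entirely.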

In short, the DWC is a viable alternative, particularly for a company with the following characteristics:

  • Limited data center resources - hardware and/or operations staff are tapped out, and you’d need to add significant capacity to deploy a new data solution
  • Deploying a data solution on a “new” platform - you’re an Oracle shop but planning to deploy on Vertica
  • Dispersed users - your user base is geographically spread out, and access to corporate systems is through a variety of network channels
  • You’re a mid-sized company and don’t have access to volume hardware and software discounts


But remember, it’s like leasing a car.  You never get rid of that payment.
