Qlikview 11 is announced (10/11/11)

Qlikview 11 was announced on 10/11/11, one year after 10/10/10, the release date of Qlikview 10! Qliktech also launched a new demo site with 12 demos of Qlikview 11 Data Visualizations: http://demo11.qlikview.com/ . The actual release should happen (hopefully) before the end of 2011; my personal preference for the release date would be 11/11/11, but that may be too much to ask…

QlikView 11 introduces comparative analysis: the interactive comparison of user-defined groupings. Business users can now create their own data (sub)sets and decide which dimensions and values define them. Users can then view the data sets they have created side by side, in a single chart or in different charts.

Collaborative Data Visualization and Discovery.

Qlikview 11 also enables Collaborative Workspaces: QlikView users can invite others, even those who do not have a license, to participate in live, interactive, shared sessions. All participants in a collaborative session interact with the same analytic app and can see others’ interactions live.

QlikView users can engage each other in discussions about QlikView content. A user can create notes associated with any QlikView object. Other users can then add their own commentary to create a threaded discussion. Users can capture snapshots of their selections and include them in the discussion so others can get back to the same place in the analysis when reviewing notes and comments. QlikView captures the state of the object (the user’s selections), as well as who made each note and comment and when. Qliktech’s press release is here:

http://www.qlikview.com/us/company/press-room/press-releases/2011/en/1011-qliktech-introduces-social-business-discovery-in-launch-of-qlikview-11

“Our vision for QlikView 11 builds on the fact that decisions aren’t made in isolation, but through social exchanges driven by real-time debate, dialog, and shared insight,” says Anthony Deighton, CTO and senior Vice President, Products at QlikTech. “QlikView 11’s social business discovery approach allows workgroups and teams to collaborate and make decisions faster by collectively exploring data, anywhere, anytime, on any device. Business users are further empowered with new collaborative and mobile capabilities, and IT managers will appreciate the unified management functionality that allows them to keep control and governance at the core while pushing usage out to the edges of the organization.”

New Features in Qlikview 11

Qlikview is now integrated (I think it is a big deal) with TFS, the source control system from Microsoft. This makes me think that maybe Donald Farmer (he left Microsoft in January 2011 and joined Qliktech) has an additional assignment: to make it possible for Microsoft to buy Qliktech? [Dear Donald - please be careful: Microsoft already ruined ProClarity and some others after buying them]. QlikView 11 Personal Edition will be available for free download by the end of the year at www.qlikview.com/download.

Also, if you check the demo “What is new in Qlikview 11” here:
http://us.demo11.qlikview.com/QvAJAXZfc/opendoc.htm?document=Whats%20New%20in%20QlikView11.qvw&host=demo11&anonymous=true , you can find the following new features:

  • Comparative Analysis (mentioned above)
  • Collaborative Data Visualization
  • Integration with TFS
  • Granular chart dimension control
  • Conditional Enabling (dynamic add/remove) of dimensions and/or expressions/metrics
  • Grid Container to show multiple objects, including other containers
  • Metadata for Charts: annotations, tips, labels/keywords, comments, mouse-over pop-up labels
  • Some new actions (including Clear Field)

Omniscope 2.6 (wait is over!)

Do you want 1st class Data Visualization on your cool Mac without any Virtual Machine running Windows? If so, your best choice will be Omniscope 2.6, which is finally about to be released (after more than 2 years of delays) by Visokio, located in the UK. Of course Omniscope runs on Windows too (most customers use it on Windows anyway): all it needs is Java (if needed, a private copy of Java will be installed on your computer as part of the Omniscope package). You can get the Omniscope Viewer on a Linux workstation as well, but if you need the full Omniscope 2.6 on Linux, you will have to ask Visokio for a special license.

Java was the problem for me when I first heard about Omniscope, but more about that in the Special Note at the end of this post. Visokio is a tiny company, started in 2002. Because of its size and private funding it took 3 years to release Omniscope 1.0 in 2005 and another 4 years to release Omniscope 2.5 in 2009, which is what Visokio is still shipping.

Visokio obviously has rich customers in financial (13+ clients), publishing and marketing (10+) and many other industries; some of them are in love with Apple’s Macs, but most customers prefer Windows. Omniscope is a desktop Java application, but it is completely integrated with the internet. It has 4 editions (in both 32-bit and 64-bit versions), which are identical as far as the deployment file-set is concerned, so all you need is to buy an appropriate license. The installation process requires about 5 clicks, and a user can get started by simply dragging in an Excel file: the data will immediately appear and can be explored organically.

 


Omniscope Editions: Viewer, Desktop, Server, Server Plus.

 

The free Viewer allows server-less distribution of all Data Visualizations and lets users interact fully (explore, select, filter and drill down, among other interactions) with all data, charts and reports, all of which can be easily exported to PDF, PPT, XLS and JPG files. Omniscope also has a zero-install “Web Start” online version of the free Viewer.

Omniscope Desktop/Professional ($4000, with a discount for volume orders), in addition to all Viewer functionality, acts as a Development Studio for Data Visualizations (the so-called IOK applications are secure, compressed files, ready for easy internet delivery) and as an ETL wizard for data (using the Drag-and-Drop Data Manager):

Omniscope Desktop creates, edits and continuously refreshes all involved datasets, formulas, filters, views, layouts and even assumption-driven models; it also designs and exports interactive Flash DataPlayers, embeddable into websites and documents. Desktop is able to read multidimensional cubes, just like Tableau and PowerPivot, which is a big advantage over Qlikview and Spotfire.

Omniscope Server (about $16000) adds the following to the Desktop functionality. It enables 64-bit IOK files to behave (even remotely) as Central Datamarts (multi-source data assembly), as Timeslices (auto-refreshable proxies for datasources, one per datasource), as Master Report IOKs (automatically refreshed from the Central Datamart IOK) and as Distributed Report IOKs (automatically distributed and live-refreshed from the Master Report IOK). It also automates the refreshing of data and enables batch and scheduled distribution of customized IOK files.

Server Plus (about $24000) includes all Server functionality and adds the ability to enable selected actions in free Omniscope Viewers (e.g. continuous data refreshing from Datamart IOK files, export to XLS, PPT and PDF, add/edit/save comments and queries etc.). It also permits unrestricted publishing of IOK visualizations, enables white labeling and branding of Viewers and IOK files to customer specifications, and allows multiple servers to work as one.

Data Engine.

Omniscope uses an in-memory Columnar Database, as all the best Data Visualizers do, but its architecture is different. For example, all datasets are collections of Cells (organized in columns, rows and tables). Each Cell with a String or Text is a separate Java Object, which leads to a large overhead in terms of memory usage (I always blame Java, which allows only 1.2GB of addressable memory on 32-bit Windows). Some usage statistics suggest that 32-bit Omniscope Desktop/Professional treats 5 million cells as a large dataset and 15 million cells as a very large dataset. According to Visokio, the average client data file has around 40 fields and 50,000 records (2 million cells).
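To make the cell arithmetic above concrete, here is a minimal Python sketch. The function names and the "typical" label are my own (hypothetical, not anything from Omniscope); the only inputs taken from the text are the 5 million and 15 million cell thresholds and the 40 fields by 50,000 records average file.

```python
# Thresholds quoted above for 32-bit Omniscope Desktop/Professional.
LARGE_CELLS = 5_000_000
VERY_LARGE_CELLS = 15_000_000


def cell_count(fields: int, records: int) -> int:
    """A rectangular dataset has one cell per field per record."""
    return fields * records


def size_class(cells: int) -> str:
    """Classify a dataset by cell count (the labels are illustrative)."""
    if cells >= VERY_LARGE_CELLS:
        return "very large"
    if cells >= LARGE_CELLS:
        return "large"
    return "typical"


average_file = cell_count(fields=40, records=50_000)
print(average_file, size_class(average_file))
```

So the average client file sits well below the "large" threshold, which is consistent with the claim that most customers never hit the 32-bit memory limits.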

With Omniscope 2.6, experts from Visokio were able to run a Data Visualization with 70 million cells on a 32-bit Windows PC (with 2GB of RAM). For comparison, with Qlikview I was able to fit 600+ million (data) cells into the same 32-bit PC, basically 9 times more data than with Omniscope, and overall Omniscope is slower than its competitors. As of now, Omniscope will try to use as much memory as possible in order to accelerate performance. I expect a version of Omniscope with large performance and memory management improvements in the near future.

64-bit installations of Omniscope are far more scalable: for example, with 8GB of RAM, 120 million cells were not a problem; the largest known installation of Omniscope has 34 million rows (about half a billion cells) running on a 64-bit Windows/Java PC with 16GB of RAM.
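The capacity figures quoted in the last two paragraphs can be cross-checked with a few lines of Python. This is only back-of-the-envelope arithmetic on the rounded numbers from the text ("600+ million", "about half a billion"), so the results are approximate; the variable names are mine.

```python
# Rounded figures taken from the text above.
qlikview_cells_32bit = 600_000_000   # "600+ million" cells on a 32-bit PC
omniscope_cells_32bit = 70_000_000   # 70 million cells on the same kind of PC

# How much more data fit into Qlikview on the same 32-bit machine.
capacity_ratio = qlikview_cells_32bit / omniscope_cells_32bit

# Largest known 64-bit Omniscope installation.
largest_rows = 34_000_000
largest_cells = 500_000_000          # "about half a billion" cells

# Implied average number of columns (fields) per row.
implied_columns = largest_cells / largest_rows

print(f"capacity ratio: {capacity_ratio:.1f}x")
print(f"implied columns per row: {implied_columns:.1f}")
```

The ratio comes out between 8 and 9 (hence "basically 9 times"), and the largest installation implies roughly 15 columns per row.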

In Omniscope 2.6, the DataManager can be used as an entirely new and independent application, allowing you to create and automate ETL workflows without even loading data into the classic Omniscope interface. You can visually drag sources in, append and merge, and transform with a variety of powerful operations such as Field Organiser, which allows you to add formulas. You can then publish, including with a Batch Publisher which allows you to specify commands in another IOK file, such as “Publish [this subset] to [email] using [this view template]”, etc.

For full list of Omniscope features please check this: http://www.visokio.com/omniscope-features and for new features in version 2.6 please review this: http://www.visokio.com/omniscope-new-in-2-6 .

The original foundation of exportable Flash DataPlayer “generation” was totally rewritten (for Omniscope 2.6) in ActionScript 3, which increased the scalability of the DataPlayer and added new view types/features. DataPlayers are available as an experimental feature in Omniscope 2.6 and will be fully feature-complete in Omniscope 2.7 (I personally think that the time for Flash is over and it is time to port DataPlayers to HTML5).

Visokio is confident that Omniscope 2.7 will come soon after the release of Omniscope 2.6; it will be integrated with the super-popular open source statistical library R and will hopefully contain an HTML5-based DataPlayer, integration with Salesforce etc. If customers demand it, I also expect a Linux version of Omniscope at some future point.

By the way, my recent Poll is confirming that Omniscope is among Data Visualization Leaders and it got respectable 6% of votes so far! You can vote on this poll, just click here!

Special Note about Java.

While Java gave Omniscope the unique ability to run everywhere, it also gave it a performance disadvantage compared with my favorites Qlikview, Spotfire, Tableau and PowerPivot (all 4 written as native Windows applications).

Spotfire Silver 2.0

Spotfire Silver version 2.0 is available now on https://silverspotfire.tibco.com/us/home and it will be officially announced at TIBCO User Conference 2011 (9/27-9/29/11) at http://tucon.tibco.com/

Spotfire Silver is available in 4 Editions; see the Product Comparison Chart here: https://silverspotfire.tibco.com/us/product-comparison-chart and the feature list in the Feature Matrix here: https://silverspotfire.tibco.com/us/get-spotfire/feature-matrix

Update 9/27/11: TIBCO officially released Silver 2.0, see http://www.marketwatch.com/story/tibco-unveils-silver-spotfire-20-to-meet-growing-demand-for-easy-to-use-cloud-based-analytics-solutions-2011-09-27 “TIBCO Silver Spotfire 2.0 gives users the ability to embed live dashboards into their social media applications, including business blogs, online articles, tweets, and live feeds, all without complex development or corporate IT resources… Overall, the software’s capabilities foster collaboration, which allows users to showcase and exchange ideas and insights — either internally or publicly. In addition, it allows users to share solutions and application templates with customers, prospects, and other members of the community.”

Spotfire Silver Personal Edition is free (a trial for one year, which can be “renewed” with another email address for free); it allows 50MB of storage (exactly the same amount as Tableau Public) and 10 concurrent read-only web users of your content. If you want more than the Personal Edition offers, you can buy a Personal Plus ($99/year), Publisher ($99/month or $1000/year) or Analyst ($399/month) Account.
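For a quick comparison of the paid accounts on a yearly basis, here is a small Python sketch. It uses only the list prices quoted above and assumes (my assumption, not TIBCO's statement) that a monthly plan is simply paid for 12 months; the dictionary layout and function name are hypothetical.

```python
# List prices quoted above, in US dollars. Layout is illustrative only.
PLANS = {
    "Personal Plus": {"yearly": 99},
    "Publisher": {"monthly": 99, "yearly": 1000},
    "Analyst": {"monthly": 399},
}


def cheapest_annual_cost(plan: dict) -> int:
    """Cheapest way to pay for 12 months, assuming monthly billing x 12."""
    options = []
    if "yearly" in plan:
        options.append(plan["yearly"])
    if "monthly" in plan:
        options.append(12 * plan["monthly"])
    return min(options)


for name, plan in PLANS.items():
    print(f"{name}: ${cheapest_annual_cost(plan)} per year")
```

Under these assumptions the yearly Publisher price ($1000) saves $188 compared with paying $99 for 12 months.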

In any case, with your Account you will get a real Spotfire Desktop Client and worry-free, hassle-free web hosting (by TIBCO) of your Data Visualization applications: you do not need to buy any hardware, software or services for web hosting, it is all part of your Spotfire Silver account.

To test Spotfire Silver 2.0 Personal Edition I took the Adventure Works dataset from Microsoft (60398 rows, which is 6 times more than Spotfire’s own estimate of 10000 rows for 50MB of Web storage). The Adventure Works dataset requires 42MB as an Excel XLS file (or 16MB as XLSX with data compression) and only 5.6MB as a Spotfire DXP file (the Tableau file took approximately the same disk space, because both Spotfire and Tableau do a good data compression job). This 5.6MB DXP file for Adventure Works is just 11% of the web storage allowed by Spotfire (50MB) to each user of the free Spotfire Silver 2.0 Personal Edition.
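The percentages in this test are easy to re-derive. The Python sketch below uses only the sizes and row counts from the paragraph above; the variable names are mine, and the compression ratio is simply XLS size divided by DXP size.

```python
# Sizes in MB and row counts, as reported in the paragraph above.
quota_mb = 50.0        # web storage of the free Personal Edition
xls_mb = 42.0          # Adventure Works as Excel XLS
xlsx_mb = 16.0         # same data as XLSX
dxp_mb = 5.6           # same data as Spotfire DXP

rows = 60_398
estimated_rows_for_quota = 10_000   # Spotfire's own estimate for 50MB

quota_used_pct = dxp_mb / quota_mb * 100           # share of the 50MB quota
rows_vs_estimate = rows / estimated_rows_for_quota
xls_to_dxp = xls_mb / dxp_mb                       # compression vs. XLS
xlsx_to_dxp = xlsx_mb / dxp_mb                     # compression vs. XLSX

print(f"quota used: {quota_used_pct:.1f}%")
print(f"rows vs. estimate: {rows_vs_estimate:.1f}x")
print(f"XLS -> DXP: {xls_to_dxp:.1f}x, XLSX -> DXP: {xlsx_to_dxp:.1f}x")
```

In other words, a dataset 6 times larger than Spotfire's own row estimate still leaves almost 89% of the free quota unused.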

Spotfire Silver 2.0 is a very good and mature Data Visualization product with an excellent Web Client, a Desktop Client development tool and tutorials online here: https://silverspotfire.tibco.com/us/tutorials . Functionally (and Data Visualization-wise) Spotfire Silver 2.0 has more to offer than Tableau Public. However, a Tableau Public account will not expire after 1 year of “trial” and does not restrict the number of simultaneous users to 10.

Spotfire Silver 2.0 Publisher and Analyst Accounts can compete successfully with Tableau Digital and they have much clearer licensing than Tableau Digital (see http://www.tableausoftware.com/products/digital#top-10-features-of-tableau-digital ), whose licensing is based on the number of “impressions” and can be confusing and more expensive than Spotfire Silver Analyst Edition.

Data Visualization Poll (Fall 2011)

7 months ago I published a poll on LinkedIn and got a lot of responses: 1340 votes (on average 1 vote per hour) and comments. People asked me many times to repeat this poll from time to time. I guess it is time to re-Poll. Based on the feedback I got, I added 2 more choices: Omniscope and Visual Insight/Microstrategy (LinkedIn allows a maximum of 5 choices in its polls, which is clearly not enough for this poll). I also got some angry voters complaining that certain vendors are funding this poll. This is completely FALSE: I am not affiliated with any of the vendors mentioned in this poll, and I work for a software company that is completely independent of those vendors; see the About page of this Blog.


Tableau 6.1 is released

Today Tableau 6.1 is released (along with a client for iPad and Tableau Public for iPad); it includes full support for incremental Data updates, whether they are scheduled or on demand:

New in Tableau 6.1

  • Incremental Data updates scheduled or on demand
  • Text parser faster, can parse any text files as data source (no 4GB limit)
  • Files larger than 2GB can now be published to Tableau Server (more “big data” support)
  • Impersonation for SQL Server and Teradata; 4 times faster Teradata reading
  • Tableau Server auto-enables touch, pinch, zoom, gesture UI for Data Views
  • Tableau iPad app is released; it browses and filters content on the Server
  • Any Tableau Client sees Server-Published View: web browser, mobile Safari, iPad
  • Server enforces the same (data and user) security on desktop, browser, iPad
  • Straight links from an image on a dashboard, Control of Legend Layout etc.

Here is a quick demo of how to create a Data Visualization with Tableau 6.1 Desktop, how easy it is to publish it on Tableau Server 6.1 and how it is instantly visible, accessible and touch-optimized on the iPad:

 

New since Tableau 6.0: more than 60 features, including:

  • Tableau now has in-memory Data Engine, which greatly improves I/O speed
  • Support for “big” data
  • Data blending from multiple sources
  • Unique support for local PowerPivot Multidimensional Cubes as Data Source
  • Support for Azure Datamarket and OData (Open Data Protocol) as Data Sources
  • Support for parameters in Calculations
  • Motion Charts and Traces (Mark History)
  • On average 8 times faster rendering of Data Views (compared with the previous version)

Tableau Product Family

  • Desktop: Personal ($999), Professional ($1999), Digital, Public.
  • Server: Standard, Core Edition, Digital, Public Edition.
  • Free Client: Web Browser, Desktop/Offline Tableau Reader.
  • Free Tableau Reader enables Server-less distribution of Visualizations!
  • Free Tableau Public served 20+ million visitors since inception

Tableau Server

  • Easy to install: 13 minutes + optional 10 minutes for firewall configuration
  • Tableau has useful command line tools for administration and remote management
  • Scalability: Tableau Server can run (while load balancing) on multiple machines
  • Straightforward licensing for Standard Server (min 10 users, $1000/user)
  • With Core Edition Server License: unlimited number of users, no need for User Login
  • Digital Server Licensing based on impressions/month, allows unlimited data, Tableau-hosted.
  • Public Server License: Free, limited (100000 rows from flat files) data, hosted by Tableau.

Widest (and Tableau optimized) Native Support for data sources

  • Microsoft SSAS and PowerPivot: Excel Add-in for PowerPivot, native SSAS support
  • Native support for Microsoft SQL Server, Access, Excel, Azure Marketplace DataMarket
  • Other Enterprise DBMSes: Oracle, IBM DB2, Oracle Essbase
  • Analytical DBMSes: Vertica, Sybase IQ, ParAccel, Teradata, Aster Data nCluster
  • Database appliances: EMC/GreenPlum, IBM/Netezza
  • Many Popular Data Sources: MySQL, PostgreSQL, Firebird, ODBC, OData, Text files etc.

Some old problems I still have with Tableau

  • No MDI support in Dashboards, all charts share the same window and paint area
  • Wrong User Interface (compared with Qlikview UI) for Drilldown Functionality
  • Tableau’s approach to Partners is from stone ages
  • Tableau is 2 generations behind Spotfire in terms of API, Modeling and Analytics

Excel as a BI Platform – Part 3

Below is Part 3 of the Guest Post by my guest blogger Dr. Kadakal (CEO of Pagos, Inc.). This article is about how to build Dashboards and Data Visualizations with Excel. The topic is large, and the first portion of the article (published on this blog 3 weeks ago) contains the general Introduction and Part 1, “Use of Excel as a BI Platform Today”. Part 2, “Dos and Don’ts of building dashboards in Excel”, was published 2 weeks ago, and Part 3, “Publishing Excel dashboards to the Internet”, starts below; its full text is here.

As I said many times, BI is just a marketing umbrella for multiple products and technologies, and Data Visualization has recently become one of the most important among them. Data Visualization (DV) so far is a very focused technology, and the article below shows how to publish Excel Data Visualizations and Dashboards on the Web. Actually, a few vendors provide tools to publish Excel-based Dashboards on the Web, including Microsoft, Google, Zoho, Pagos and 4+ other vendors:

I leave it to the reader to decide if other vendors can compete in the business of publishing Excel-based Dashboards on the Web, but the author of the article below provides 3 very good criteria for selecting the vendor, tool and technology for it (and when I applied them myself, they left me with only 2 choices, the same as described in the article).

Author: Ugur Kadakal, Ph.D., CEO and founder of Pagos, Inc. 

Publishing of Excel Dashboards on the Internet

Introduction

In the previous article (see “Excel as BI Platform” here) I discussed Excel’s use as a Business Intelligence platform and why it is exceedingly popular software among business users. In the 2nd article (“Dos&Don’ts of Building Successful Dashboards in Excel”) I talked about some of the principles to follow when building a dashboard or a report in Excel. Together this is a discussion of why Excel is the most powerful self-service BI platform.

However, one of the most important facets of any BI platform is web enablement and collaboration. It is important for business users to be able to create their own dashboards but it is equally important for them to be able to distribute those dashboards securely over the web. In this article, I will discuss two technologies that enable business users to publish and distribute their Excel based dashboards over the web.

Selection Criteria

The following criteria were selected in order to compare the products:

  1. Ability to convert a workbook with most Excel-supported features into a web based application with little to no programming.
  2. Dashboard management, security and access control capabilities that can be handled by business users.
  3. On-premise, server-based deployment options.

Criteria #3 eliminates online spreadsheet products such as Google Docs or Zoho. As much as I support cloud based technologies, in order for a BI product to be successful it should have on-premise deployment options. Without on-premise you neglect the possibility of integration with other data sources within an organization.

There are other web based Excel conversion products on the market but none of them meet the criteria of supporting most Excel features relevant to BI; therefore, they were not included in this article about how to publish Excel Dashboards on the Web.

Excel as a BI Platform – Part 2

Below is Part 2 of the Guest Post by my guest blogger Dr. Kadakal (CEO of Pagos, Inc.). This article is about how to build Dashboards and Data Visualizations with Excel. The topic is large, and the first portion of the article (published on this blog last week) contains the general Introduction and Part 1, “Use of Excel as a BI Platform Today”.

Part 2, “Dos and Don’ts of building dashboards in Excel”, is below, and Part 3, “Publishing Excel dashboards to the Internet”, is coming soon. It is easy to fall into a trap with Excel, but if you avoid those risks as described in the article below, Excel can become one of the most valuable BI and Data Visualization (DV) tools for a user. Dr. Kadakal said to me recently: “if the user doesn’t know what he is doing he may end up spending lots of time maintaining the file or create unnecessary calculation errors”. So we (Dr. Kadakal and I) hope that the article below can save time for visitors of this blog.

BI in my mind is a marketing umbrella for multiple products and technologies, including RDBMS, Data Collection, ETL, DW, Reporting, Multidimensional Cubes, OLAP, Columnar and in-Memory Databases, Predictive and Visual Analytics, Modeling and DV.

Data Visualization (aka DV), on the other hand, is a technology which enables people to explore, drill down, visually analyze their data and visually search for data patterns like trends, clusters, outliers, etc. So BI is a super-abused marketing term, while DV so far is a focused technology, and the article below shows how to use Excel as a great Dashboard builder and Data Visualization tool.

Dos&Don’ts of Building Successful Dashboards in Excel

Introduction (click to see the full article here)

In previous week’s post (see also article “Excel as BI Platform” here) I discussed Excel’s use as a Business Intelligence platform and why it is exceedingly popular software among business users. In this article I will talk about some of the principles to follow when building a dashboard or a report in Excel.

One of the greatest advantages of Excel is its flexibility: it puts little or no constraints on the user’s ability to create their ideal dashboard environments. As a result, Excel is being used as a platform for solving practically any business challenge. You will find individuals using Excel to solve a number of business-specific challenges in practically any organization or industry. This makes Excel the ultimate business software.

On the other hand, this same flexibility can lead to errors and long term maintenance issues if not handled properly. There are no constraints on data separation, business logic or the creation of a user interface. Inexperienced users tend to build their Excel files by mixing them up. When these facets of a spreadsheet are not properly separated, it becomes much harder to maintain those workbooks and they become prone to errors.

In this article, I will discuss how you can build successful dashboards and reports by separating data, calculations and the user interface. You can find the rest of this post in the article “Dos and Don’ts of building dashboards in Excel” here.

It discusses how to prepare Data (both static and external) for dashboards, how to build formulas and calculation models, UI and Input Controls for Dashboards and, of course, Pivots, Charts, Sparklines and Conditional Formatting for innovative and powerful Data Visualizations in Excel.

Footprint Comparison for DV Leaders

Comparison of DV Tools is the most popular page (and post) of this site, visited by many thousands of people. Some of them keep asking me to extend this comparison with additional features; one of them is a comparison of the leading DV tools by file and memory footprint and also by reading and saving time.

I took a mid-sized dataset (428999 rows and 135 columns), exported it into CSV and compressed it to ZIP format, because all native DV formats (QVW by Qlikview, DXP by Spotfire, TWBX by Tableau and XLSX by Excel and PowerPivot) are compressed one way or another. My starting file size (of the ZIPped dataset) was 56 MB. Here is what I got, see for yourself:
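For readers who want to prepare a comparable baseline for their own data, here is a minimal Python sketch of the "export to CSV, then ZIP" step using only the standard library. The sample rows, the file name inside the archive and the function name are made up for illustration; my actual test dataset is not reproduced here.

```python
import csv
import io
import zipfile


def zipped_csv_size(rows):
    """Write rows as CSV, ZIP the result in memory, return (raw, zipped) bytes."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    raw = text.getvalue().encode("utf-8")

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("dataset.csv", raw)   # hypothetical file name
    return len(raw), len(archive.getvalue())


# Number of cells in the dataset described above: rows x columns.
total_cells = 428_999 * 135
print(f"cells in the test dataset: {total_cells:,}")

# Small synthetic sample, just to show the size comparison.
sample = [["id", "region", "sales"]]
sample += [[i, "North" if i % 2 else "South", i * 10] for i in range(10_000)]
raw_size, zipped_size = zipped_csv_size(sample)
print(f"raw CSV: {raw_size} bytes, zipped: {zipped_size} bytes")
```

The zipped size is the fair starting point for comparing against QVW, DXP, TWBX and XLSX files, since all of those are already compressed.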

One comment: the numbers above are all relative to the configuration of the hardware used for the tests and also depend on other software I ran during the tests, because that software also requires RAM, CPU cycles and disk I/O; they even depend on the speed of repainting application windows on screen, especially for Excel. I will probably add more comments to this post/page, but my first impression from this comparison is that Tableau’s new Data Engine (released in version 6.0 and soon to be updated in 6.1) made Tableau more competitive. Please keep in mind that the comparison of in-memory footprint is much less meaningful in the above test, because Qlikview, Excel and PowerPivot put the entire dataset into RAM, while Tableau and Spotfire can leave some data (unneeded for visualization) on disk, treating it as “virtual memory”. Also, Tableau uses 2 executables (not just one EXE like the others): tableau.exe (or tabreader.exe) and tdserver64.exe.

Since Tableau is the only leading DV software capable of reading from SSAS Cubes and from PowerPivot (local SSAS) Cubes, I also took a large SSAS Cube and, for testing purposes, selected an SSAS Sub-Cube with 3 Dimensions, 2 Measures and 156439 “rows”. I measured the time and footprint needed for Tableau to read the Sub-Cube, refresh it in memory and save it to a local application file, measured its “cubical” footprint in memory and on disk, and then compared all results with the same tests run with Excel 2010 alone and with Excel 2010 plus PowerPivot:

While Tableau’s ability to read and visualize Cubes is cool, performance-wise Tableau is far behind Excel and PowerPivot, especially in reading time and memory footprint. For saving time and file footprint with SSAS Cubes, Tableau is doing nothing, because it does not save the cube locally in its application TWBX file (it keeps the data in the SSAS cube outside of Tableau), so Tableau’s file footprint for SSAS Cubes is not an indicator. For PowerPivot-based local Cubes, however, Tableau does a better job (saving data into the local application file) than both Excel and PowerPivot!

Spotfire 3.3: mature, scalable, social

TIBCO released Spotfire 3.3, and the first thing that caught my eye (see what is new here) was how mature this product is. For example, among the new features is improved scalability: each additional simultaneous user of a web analysis initially claims very little additional system memory:

Many Spotfire customers will be able to support a greater number of web users on their existing hardware by upgrading to 3.3. Spotfire Web Player 3.3 includes significant improvements in memory consumption (as shown above for certain scenarios). The goal, in theory, is to minimize the amount of system memory needed to support larger numbers of simultaneous users on the same analysis file. The main use case here: the larger the file and the greater the number of simultaneous web users on that file, the less initial system memory is required to support each additional user; it is greatly reduced compared to version 3.2.1 and earlier.

A comparison with the competition and thorough testing of the new Spotfire scalability still have to be done (similar to what Qliktech did with Qlikview here), but my initial reaction is as I said in the title: we are witnessing very mature software. Apparently the Defense Intelligence Agency (DIA) agrees with me: “Defense Intelligence Agency Selects TIBCO Spotfire Analytics Solutions for Department of Defense Intelligence Information System Community”. “With more than 16,500 military and civilian employees worldwide, DIA is a major producer and manager of foreign military intelligence.”

Spotfire 3.3 also includes collaborative bookmarking, which enables all Spotfire users to capture a dashboard (its complete configuration, including markings, drop-down selections and filter settings) and share that visualization immediately with other users of that same dashboard, regardless of the client in use. Spotfire is actually not just a piece of Data Visualization software, but a real Analytical Platform with a large portfolio of products, including completely integrated S-PLUS (a commercial version of the R Library, which has millions of users), the best Web Client (you can go zero-footprint with Spotfire Web Player and/or the partially free Spotfire Silver), a free iPad Client version 1.1.1 (requires iTunes, so be prepared for Apple intrusion), a very rich API, an SDK, integration with Visual Studio, support for IronPython and JavaScript, a well-thought-out Web Architecture, a set of Extension Points etc.

System requirements for Spotfire 3.3 can be found here. Coincidentally with 3.3 Release Spotfire VAR Program got expansion too. Spotfire has a very rich set of training options, see it here. You can also find set of good Spotfire videos from Colin White’s Screencast Library, especially 2011 Webcasts.

My only (and large) concern with Spotfire is its focus, since it is part of a large corporation, TIBCO, which has 50+ products and 50+ reasons to focus on something else. Indirectly this can be confirmed by sales: my estimate is that Tableau is growing much faster than Spotfire (sales-wise) and that Qlikview sales are probably 3 times larger (dollar-wise) than Spotfire sales. Since TIBCO bought Spotfire in 2007, I expected Spotfire to be integrated with other great TIBCO products, but after 4 years this is still not the case… And TIBCO has no reason to change its corporate policies, since its business is good and its stock is doing well:

(at least 500% increase of share price since end of 2008!). Also see article written by Ted Stamas for SeekingAlpha and comparison of TIBX vs. ETF here:

Trend Analysis: see it 1st

Data Visualization can be a good thing for Trend Analysis: it allows you to “see this” before you “analyze this” and to take advantage of the human eye’s ability to recognize trends quicker than any other method. Dr. Ahlberg (after selling Spotfire to TIBCO and claiming that “Second place is first loser”) started “Recorded Future” to basically sell … future trends, mostly in the form of Sparklines; he succeeded at least in selling Recorded Future to investors from CIA and Google. Trend analysis is an attempt to “spot” a pattern, or trend, in data (in most cases a well-ordered set of datapoints, e.g. ordered by timestamps) or to predict future events.

Visualizing Trends means in many cases either a Time Series Chart (can you spot a pattern here with your naked eye?):

or a Motion Chart (both best done by … Google, see it here http://visibledata.blogspot.com/p/demos.html ). Can you predict the future here(?):

or Sparklines (I like the Sparkline implementations by Qlikview and Excel 2010). Sparklines are scale-less visualizations of “trends”:

maybe a Scatter chart (Excel is good for it too):

and in some cases a Stock Chart (Volume-Open-High-Low-Close, best done with Excel). For example, Microsoft stock has been fluctuating near the same level for many years, so I guess there is no visible trend here, which may spell trouble for Microsoft’s future (compare with the visible trend of Apple and Google stocks):

Or you can see Motion, Timeline, Sparkline and Scatter charts alive/online below: for Motion Chart Demo, please Choose a few countries (e.g. check checkboxes for US and France) and then Click on “Right Arrow” button in the bottom left corner of the Motion Chart below:

In statistics, trend analysis often refers to techniques for extracting an underlying pattern of behavior in a well-ordered dataset which would otherwise be partly hidden by “noise data”. It means that if one cannot “spot” a pattern by visualizing such a dataset, then (and only then) it is time to apply regression analysis and other mathematical methods (unless you are smart or lucky enough to remove the noise from your data). As I said in the beginning: try to see it first! However, extrapolating the past into the future can be a source of very dangerous mistakes (just check the history of almost any empire: Roman, Mongol, British, Ottoman, Austrian, Russian etc.).
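
To make the “regression analysis” step above concrete, here is a minimal sketch in Python (standard library only). It is not taken from any of the tools discussed here: the synthetic series, its slope of 0.5, the noise level and the function names are all assumptions chosen for illustration. It fits an ordinary least-squares line to a noisy, well-ordered series and also computes a simple moving average, two of the most basic ways to pull a trend out of noise.

```python
import random


def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two datapoints to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    return slope, v_mean - slope * t_mean


def moving_average(values, window):
    """Simple moving average; the result has len(values) - window + 1 points."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def make_noisy_series(n=100, slope=0.5, intercept=2.0, noise=1.0, seed=42):
    """Hypothetical test data: a straight line plus Gaussian noise."""
    rng = random.Random(seed)
    return [intercept + slope * t + rng.gauss(0.0, noise) for t in range(n)]


if __name__ == "__main__":
    series = make_noisy_series()
    slope, intercept = fit_linear_trend(series)
    print(f"estimated trend: value = {intercept:.2f} + {slope:.3f} * t")
    print("first smoothed points:", [round(v, 2) for v in moving_average(series, 10)[:3]])
```

The fitted slope describes only the period that was observed; it says nothing about whether the trend will continue, which is exactly the extrapolation risk mentioned above.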

Visual BI with Vizubi

Since many people will use Excel regardless of how good other BI and DV tools are, I regularly compare the ability of Excel to solve the Data Visualization problems I discuss on this site. In most cases Excel 2003 is completely inappropriate and obsolete (especially visually), and Excel 2007 is good only for limited DV tasks like Infographics, Data Slides, Data Presentations, Static Dashboards and Single-Chart Visualizations. Excel 2010 has some features relevant to Data Visualization, including one of the best columnar in-memory databases (PowerPivot, a free add-in), an ability to synchronize multiple Charts through slicers, a limited ability to drill down into data using slicers and even support for both 64-bit and 32-bit. However, compared with Qlikview, Spotfire and Tableau, Excel 2010 feels like a stone-age tool, or at least 2 generations behind as far as Data Visualization (and BI) is concerned…

That was my impression until I started to use the Excel plugin called Vizubi (from the company with the same name, see it here). Suddenly my Excel 2003 and Excel 2007 (I keep them for historical purposes) became almost as capable as Excel 2010, because Vizubi adds to all those versions of Excel a very capable columnar in-memory database, slicers and many features you cannot find in Excel 2010 and PowerPivot, and in addition greatly improves the functionality of Excel PivotTables and Tables! Vizubi enables me to read (in addition to the usual data sources like ODBC, CSV, XLS, XLSX etc.) even my QVD files (Qlikview Data files)! Vizubi, unlike PowerPivot, will create Time Dimension(s) the same way SSAS does. All of the above means that users are not forced to migrate to Office 2010 but will have many PowerPivot features with their old version of Excel. In addition, Vizubi added a unique feature to my Excel Tables and Pivots: I can easily switch back and forth between the Table and PivotTable presentation of my data.

The most important feature of Vizubi is that all its tables and pivots are interactive: each piece of data is clickable and enables me to drill down/up/through my entire dataset:

It basically equals or exceeds the drilldown ability of Qlikview, with one exception: Qlikview allows you to do it through charts, while Vizubi does it through Tables and PivotTables. Vizubi enables an Excel user to create large databases with millions of rows (e.g. the test database has 15 million rows) and enables ordinary users (non-developers) to easily create Tables, Reports, Charts, Graphs and Dashboards with such a database, all within the familiar Excel environment and using an easy Drag-and-Drop UI:

Vizubi’s Database(s) enable users to share data over a central datastore, while keeping Excel as a personal desktop DV (or BI) client. See Vizubi videos here and tutorials here.

Vizubi is a small (15 employees), profitable Italian company and it is living proof that size does not matter: Vizubi did something extremely valuable and cool for Excel users that the giant Microsoft failed to do for many years, even with PowerPivot. The price of Vizubi is minimal considering the value it adds to Excel: between $99 and $279, depending on the version and the number of seats (discounts are available, see it here ).

Vizubi is not perfect (it is just at version 1.21, a product less than one year old). For example, I wish it supported a graphical drilldown like Qlikview does (outlining rectangles right on Charts, followed by instant selection of the appropriate subset of data), a web client (like Spotfire) and web publishing for its functionality (even Excel 2010 supports Slicers on the web in the Office Live environment), 64-bit Excel (32 bits is so 20th century), the ability to read and use SSAS and PowerPivot directly (like Tableau does), some scripting (JavaScript or VBScript, like Qlikview), a “formula” language (like PowerPivot with DAX) etc.

I suggest reviewing these articles about Vizubi: one in TDWI by Stephen Swoyer and a relatively old article by Marco Russo at SQLBlog.

Permalink: http://apandre.wordpress.com/2011/04/10/visubi/

Deloitte: me DV too (it wishes…)

Last week Deloitte suddenly declared that 2011 will be the year of Data Visualization (DV for short, at least on this site) and that the main technology trend in 2011 will be Data Visualization as an “Emerging Enabler”. It took Deloitte many years to see the trend (I advise them to re-read posts by observers and analysts like Stephen Few, David Raab, Boris Evelson, Curt Monash, Mark Smith, Fern Halper and other known experts). Yes, I welcome Deloitte to the DV Party anyway: better late than never. You can download their “full” report here, in which they allocated the first(!) 6 pages to Data Visualization. I cannot resist noticing that the “DV Specialists” at Deloitte are just recycling (in their own words!) some stuff (even from this blog) that has been known for ages and from multiple places on the Web, and I am glad that Deloitte knows how to use the Internet and how to read.

However, some details in Deloitte’s report amazed me with how out of touch with reality they are and made me wonder in what Cave or Cage (or Ivory Tower?) these guys are wasting their well-paid time. In a sidebar of their “Visualization” Pages/Post they published a poll: “What type of visualization platform is most effective in supporting your organization’s business decision making?”. Among the most laughable options to vote for you can find “Lotus” (hello, people, are you there? The 20th century ended many years ago!), Access (what are you smoking, people?), Excel (it cannot even have interactive charts and proper drilldown functionality, but yes, everybody has it), Crystal Reports (static reports are among the main reasons why people are looking for interactive Data Visualization alternatives), “Many Eyes” (I love enthusiasts, but it will not help me to produce actionable data views) and some “standalone options” like SAS and ILOG, which are 2 generations behind the leading DV tools. What is more amazing is that the “BI and Reporting” option (Crystal, BO etc.) collected 30% of the votes; the other vote getters are the “standalone” option (Deloitte thinks SAS and ILOG belong there) with 19% and the “None of the Above” option with 22%!

In the second part of their 2011 Tech Trends report Deloitte declares “Real Analytics” to be a main trend among “Disruptive Deployments”. The use of the words “Real Analytics” made me laugh again and reminded me of other funny usages of the word “real”: “Real Man”, “Real Woman” etc. I just want to see what “unreal analytics” or “not real analytics”, or whatever the real antonym for “real analytics” is, would look like.

Update: Deloitte and Qliktech formed an alliance in the last week of April 2011, see it here.

More updates: In August 2011 Deloitte opened “The Real Analytics website” here: http://realanalyticsinsights.com/ and on 9/13/11 it “joined forces” in the US with Qliktech: http://investor.qlikview.com/releasedetail.cfm?ReleaseID=604843

Permalink: http://apandre.wordpress.com/2011/03/29/deloitte-too/

Dimensionality of Visible Data

The human eye has its own Curse of Dimensionality (the term was suggested in 1961 by R. Bellman and described independently by G. Hughes in 1968). In most cases the data (before it is visualized) is organized in multidimensional Cubes (n-Cubes) and/or Data Warehouses, or, speaking more cloudily, in a Data Cloud. It needs to be projected into lower-dimensional datasets (small-dimensional Cubes, e.g. 3d-Cubes) before it can be exposed in the form of Charts (preferably as an interactive and synchronized set of charts, sometimes called a dashboard) on the 2-dimensional surface of a computer monitor.

Projection of DataCloud to DataCubes and then to Charts
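
As a rough illustration of such a projection, the sketch below aggregates a tiny, entirely hypothetical cube (a year parameter, two attributes and one sales measure; all names and numbers are made up) down to the few dimensions a single chart can show. It is only one possible reading of “projection”, namely summing the measure over the dimensions that are dropped.

```python
from collections import defaultdict

# Hypothetical small cube: one parameter (year), two attributes
# (region, product) and one measure (sales). All values are made up.
CUBE = [
    {"year": 2009, "region": "EU", "product": "A", "sales": 10.0},
    {"year": 2009, "region": "EU", "product": "B", "sales": 5.0},
    {"year": 2009, "region": "US", "product": "A", "sales": 20.0},
    {"year": 2010, "region": "EU", "product": "A", "sales": 12.0},
    {"year": 2010, "region": "US", "product": "A", "sales": 25.0},
    {"year": 2010, "region": "US", "product": "B", "sales": 8.0},
]


def project(records, keep_dims, measure):
    """Sum `measure` over every dimension that is not listed in keep_dims."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[dim] for dim in keep_dims)
        totals[key] += rec[measure]
    return dict(totals)


if __name__ == "__main__":
    # Data for a simple line chart: sales by year.
    print(project(CUBE, ["year"], "sales"))
    # Data for a multiline or stacked chart: sales by year and region.
    print(project(CUBE, ["year", "region"], "sales"))
```

Each call produces a dataset small enough for one chart; a dashboard is then a set of such projections kept in sync.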

During the last 200+ years people kept inventing all types of charts to be printed on paper or shown on screen, so most charts show 2- or 3-dimensional datasets. Prof. Hans Rosling led Gapminder.org to create the web-based, animated 6-dimensional Color Bubble Motion Chart (Trendalyzer),

which he used in his famous demos: http://www.gapminder.org/world/ , where the 6 dimensions in this specific Chart (almost a record for a 2-dimensional chart to carry) are:

  • X coordinate of the Bubble = Income per person,
  • Y coordinate of the Bubble = Life expectancy,
  • Size of the Bubble = Population of the Country,
  • Color of the Bubble = Continent of the Country,
  • Name of the Bubble = Country,
  • Year = animated 6th Dimension/Parameter as time-stamp of the Bubble.

Trendalyzer was bought from Gapminder by Google in 2007 and was converted into the Google Motion Chart, but Google is somehow in no rush to enter the Data Visualization (DV) market.

The dimensionality of this Motion Chart can be pushed even further to 7 dimensions (dimension as an expression of measurement without units) if we use different Shapes (in addition to filled Circles we can use Triangles, Squares etc.), but that would literally be pushing the limit of what the human eye can handle. If you add to the consideration the tendency of DV Designers to squeeze more than one chart onto a screen (how about overcrowded Dashboards with multiple synchronized interactive Charts?), we are literally approaching the limits of both the human eye and the human brain, regardless of the dimensionality of the Data Warehouse in the backend.

Below I approximately assess the dimensionality of the datasets behind some popular charts (please feel free to send me corrections). For each Dataset and its respective Chart I estimated the number of measures (usually a real or integer number, possibly calculated from other dimensions of the dataset), the number of attributes (in many cases categories, enumerations or values with a string datatype) and 0 or 1 parameter (representing a well-ordered set such as time (for time series), date, year, sequence (can be used for Data Slicing) or a natural, integer or real number). The Dimensionality (the number of Dimensions) is the total number of measures, attributes and parameters in a given dataset.

Chart Measures Attributes Parameter Dimensionality
Gauge, Bullet, KPI 0 0
Monochromatic Pie 1 1
Colorful Pie 1 1 2
Bar/Column 1 1 2
Sparkline 1 1 2
Line 1 1 2
Area 1 1 2
Radar 1 1 2
Stacked Line 1 1 1 3
Multiline 1 1 1 3
Stacked Area 1 1 1 3
Overlapped Radar 1 1 1 3
Stacked Bar/Column 1 1 1 3
Heatmap 1 2 3
Combo 1 2 3
Mekko 2 1 3
Scatter (2-d set) 2 1 3
Bubble (3-d set) 3 1 4
Shaped Motion Bubble 3 1 1 5
Color Shaped Bubble 3 2 5
Color Motion Bubble 3 2 1 6
Motion Chart 3 3 1 7
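
The counting rule used for the table (dimensionality = measures + attributes + parameter) can be written down as a small Python sketch. The class name and field names are my own for illustration; the two examples are the Gapminder chart described earlier (3 measures, 2 attributes and the animated year) and the same chart with a hypothetical extra attribute mapped to the bubble shape.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChartDataset:
    """Describes what a chart encodes; the names here are illustrative only."""
    name: str
    measures: Tuple[str, ...] = ()
    attributes: Tuple[str, ...] = ()
    parameter: Optional[str] = None  # at most one well-ordered parameter

    @property
    def dimensionality(self) -> int:
        # Total number of measures, attributes and (0 or 1) parameter.
        return (len(self.measures) + len(self.attributes)
                + (1 if self.parameter is not None else 0))


GAPMINDER = ChartDataset(
    name="Color Motion Bubble",
    measures=("Income per person", "Life expectancy", "Population"),
    attributes=("Continent", "Country"),
    parameter="Year",
)

# Hypothetical extension: one more attribute shown as the shape of the bubble.
SHAPED = ChartDataset(
    name="Motion Chart with shapes",
    measures=GAPMINDER.measures,
    attributes=GAPMINDER.attributes + ("Shape category (assumed)",),
    parameter=GAPMINDER.parameter,
)

if __name__ == "__main__":
    for chart in (GAPMINDER, SHAPED):
        print(chart.name, chart.dimensionality)
```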


The diversity of Charts and their Dimensionality adds another complexity for the DV Designer: which Chart(s) to choose. You can find some good suggestions about that on the web. Dr. Andrew Abela created the Chart Chooser Diagram

Choosing a good chart by Dr. Abela

and it was even converted into online “application“!

Permalink: http://apandre.wordpress.com/2011/03/02/dimensionality/

“Quadrant” for Data Visualization Platforms

For many years Gartner has been annoying me every January by publishing the so-called “Magic Quadrant for Business Intelligence Platforms” (MQ4BI for short). Most vendors mentioned in it (this is funny, even Donald Farmer quotes MQ4BI) almost immediately republish it, either in the so-called reprint area of the Gartner website (e.g. here, for a few months) or on their own website; some of them also make this “report” available to web visitors for free in exchange for contact info. To channel my feelings toward Gartner into something constructive, I decided to produce my own “Quadrant” for Data Visualization Platforms (DV “Quadrant” or Q4DV for short). It is below; it is a work in progress and will be modified and republished over time:

The 3 DV Leaders (green dots in the upper right corner of Q4DV above) are compared with each other and with the Microsoft BI stack on this blog, and were also voted on in the DV Poll on LinkedIn. The MQ4BI report actually contains a lot of useful info and deserves to be used as one of the possible data sources for my new post, which has a more specific target: Data Visualization Platforms. As I said above, I will call it a Quadrant too: Q4DV. But before I do that, I have to comment on Gartner’s annual MQ4BI.

The MQ4BI customer survey included vendor-provided references, as well as survey responses from BI users in Gartner’s BI summit and inquiry lists. There were 1,225 survey responses (funnily enough, almost the same number of responses as in my DV Poll on LinkedIn), with 247 (20%) from non-vendor-supplied reference lists. Gartner promised to publish the results of the Magic Quadrant Customer Survey in 1Q11. Gartner has somewhat reasonable “Inclusion and Exclusion Criteria” (for the Data Visualization Q4DV I excluded some vendors from Gartner’s list and included a few too) and an almost tolerable but fuzzy BI Market Definition (based on 13 loosely pre-defined capabilities organized into 3 categories of functionality: integration, information delivery and analysis).

I also partially agree with the definition and the usage of “Ability to Execute” as one (the Y axis) of the 2 dimensions of the bubble Chart above (which has the same name as the entire report, “Magic Quadrant for Business Intelligence Platforms”). However, I disagree with Gartner’s ordering of vendors by their ability to execute, and for DV purposes I had to completely change the order of DV Vendors on the X axis (“Completeness of Vision”).

For Q4DV purposes I am reusing Gartner’s MQ as a template. I also excluded almost all vendors classified by Gartner as niche players with a lower ability to execute (the bottom-left quarter of MQ4BI), except Panorama Software (Gartner put Panorama in last place, which is unfair), and I will add the following vendors: Panopticon, Visokio, Pagos and maybe some others after further testing.

I am going to update this DV “Quadrant” using the method suggested by Jon Peltier: http://peltiertech.com/WordPress/excel-chart-with-colored-quadrant-background/ - Thank you Jon! I hope I will have time for it before the end of 2011…

Permalink: http://apandre.wordpress.com/2011/02/13/q4dv/

Google keeps own Data Visualizations options open

Recently I had a few reasons to review the Data Visualization technologies in Google’s portfolio. In short: Google (if it decides to do so) has all the components to create a good visualization tool, but the same can be said about Microsoft, and Microsoft decided to postpone the production of a DV tool in favor of other business goals.

I remember that a few years ago Google bought Trendalyzer from Gapminder (Hans Rosling did some very impressive Demos with it a while ago)

and converted it into a Motion Chart “technology” of its own. (For the Motion Chart Demo I did below, please choose a few countries, e.g. check the checkboxes for US and France, and then click on the “Right Arrow” button in the bottom left corner of the Motion Chart.)

The Motion Chart (see also here a sample I did myself, using Google’s Motion Chart) allows 5-6 dimensions to be crammed into a 2-dimensional chart: the shape, color and size of the bubbles, the X and Y axes as usual (above they are Life Expectancy and Income per Person) and an animated time series (see the light blue 1985 in the background above; all bubbles will move as “time” goes by). Google uses this and its other visualization technologies in its very useful Public Data Explorer.

Google Fusion Tables is a free service for sharing and visualizing data online. It allows you to upload and share data, merge data from multiple tables into interesting derived tables, and see the most up-to-date data from all sources. It has Tutorials, a User’s Group, a Developer’s Guide and sample code, as well as examples. You can check a video here:

The Google Fusion Tables API enables programmatic access to Google Fusion Tables content. It is an extension of Google’s existing structured data capabilities for developers. A developer can populate a table in Google Fusion Tables with data, from a single row to hundreds at a time. The data can come from a variety of sources, such as a local database, a .CSV file, a data collection form or a mobile device. The Google Fusion Tables API is built on top of a subset of the SQL querying language. By referencing data values in SQL-like query expressions, a developer can find the needed data and then download it for use by an application. The application can do any desired processing on the data, such as computing aggregates or feeding it into a visualization gadget. Data can also be synchronized: when you add or change data in the tables of your offline repository, you can ensure the most up-to-date version is available to the world by synchronizing those changes up to Google Fusion Tables.

Everybody knows about Google Web Analytics for your web traffic, visitors, visits, pageviews, and length and depth of visits, presented by very simple charts and a dashboard, see the sample below:

Fewer people know that Panorama Software has an OEM partnership with Google, enabling Google Spreadsheets with SaaS Data Visualizations and Pivot Tables.

Google has a Visualization API (and interactive Charts, including all standard Charts, GeoMap, Intensity Map, Map, DyGraph, Sparkline, WordCloud and other Charts) which enables developers to expose their own data, stored on any data-store that is connected to the web, as a Visualization-compliant datasource. The Google Visualization API also provides a platform that can be used to create, share and reuse visualizations written by the developer community at large. Google provides samples, a Chart/API Gallery (JavaScript-based visualizations) and a Gadget Gallery.

And last but not least, Google has the excellent back-end technologies needed for big Data Visualization applications, like BigTable (BigTable is a compressed, high performance, and proprietary database system built on Google File System (GFS), Chubby Lock Service, and a few other Google programs; it is currently not distributed or used outside of Google, although Google offers access to it as part of their Google App Engine) and MapReduce. Add to this list Google Maps and Google Earth

and then ask yourself: what is stopping Google from producing a Competitor for the Holy Trinity (of Qlikview+Spotfire+Tableau) of DV?

Permalink: http://apandre.wordpress.com/2011/02/08/dvgoogle/

Poll about Data Visualization tools

On New Year’s Eve I started on LinkedIn the Poll “What tool is better for Data Visualization?” and 1340 people voted there (an unusually high return for LinkedIn polls, most of which get fewer than 1000 votes), on average one vote per hour during 8 weeks. I consider this statistically significant, and it reflects the fact that the Data Visualization market has 3 clear leaders (probably at least a generation ahead of all other competitors): Spotfire, Tableau and Qlikview. Spotfire is the top vote getter. As of 2/27/11, 1pm EST, Spotfire got 450 votes (34%), Tableau 308 (23%), Qlikview 305 (23%; the Qlikview result improved during the last 3 weeks of this poll), PowerPivot 146 (11%, more votes than all “Other” DV tools) and all Other DV tools got just 131 votes (10%). The poll got 88 comments (more than 6% of voters commented on the poll!) and will be open for more unique voters until 2/27/11, 7pm. Its results have been consistent during the last 5 weeks, so statistically it represents the user preferences of the LinkedIn population:
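
The percentages and rates quoted above can be re-derived from the raw vote counts with a few lines of Python. This is only an arithmetic check; the variable and function names are my own, and the 8-week duration is taken from the text.

```python
# Vote counts as quoted in the post (as of 2/27/11, 1pm EST).
VOTES = {
    "Spotfire": 450,
    "Tableau": 308,
    "Qlikview": 305,
    "PowerPivot": 146,
    "Other": 131,
}
COMMENTS = 88
POLL_WEEKS = 8


def vote_shares(votes):
    """Share of each option as a whole-number percentage of all votes."""
    total = sum(votes.values())
    return {tool: round(100 * count / total) for tool, count in votes.items()}


def votes_per_hour(votes, weeks):
    """Average number of votes per hour over the given number of weeks."""
    return sum(votes.values()) / (weeks * 7 * 24)


if __name__ == "__main__":
    print("total votes:", sum(VOTES.values()))
    print("shares (%):", vote_shares(VOTES))
    print("votes per hour:", round(votes_per_hour(VOTES, POLL_WEEKS), 2))
    print("comments per 100 voters:", round(100 * COMMENTS / sum(VOTES.values()), 1))
```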

URL is http://linkd.in/f5SRw9 but you need to log in to LinkedIn.com to vote. Also see some demographic info (in a somewhat ugly visualization by … LinkedIn) about the poll voters below:

It is interesting that Tableau voters are younger than voters for the other DV tools and that more than 82% of voters in the poll are men. Summary of some comments:

  • the poll’s question is too generic, because an answer partially depends on what you are trying to visualize;
  • the poll is limited by LinkedIn restrictions, which allow no more than 5 possible/optional answers to the Poll’s question;
  • the poll’s results may correlate with the number of Qlikview/Tableau/Spotfire groups (and the size of their membership) on LinkedIn and also with the ability of employees of the vendors of the respective tools to vote in favor of the tool produced by their company (I did not see this happen). LinkedIn has 85 groups related to Qlikview (with almost 5000 members), 34 groups related to Tableau (with 2000+ members total) and 7 groups related to Spotfire (with about 400 members total).
  • Randall Hand posted interesting comments about my poll here: http://www.vizworld.com/2011/01/tool-data-visualization/#more-19190 . I disagreed with some of Randall’s assessments, namely that “Gartner is probably right” (in my opinion Gartner is usually wrong when it talks about BI; I posted on this blog about it and Randall agreed with me) and that “IBM & Microsoft rule … markets”. In fact IBM is very far behind Qlikview, Spotfire and Tableau, and Microsoft, while it has excellent technologies (like PowerPivot and SSAS), is behind too, because Microsoft made a strategic mistake and does not have a visualization product, only technologies for it.
  • Spotfire fans on Facebook got some “advice” here: http://www.facebook.com/TIBCOSpotfire (the post said “TIBCO Spotfire LinkedIn users: Spotfire needs your votes! Weigh in on this poll and make us the Data Visualization tool of choice…”; there is nothing I can do to prevent people from doing that, sorry). I think that the poll is statistically significant anyway and voters from Facebook may have added just a couple of dozen votes for … their favorite tool.
  • Among the Other Data Visualization tools mentioned in the 88 comments so far were JMP, R, Panopticon, Omniscope (from Visokio), BO/SAP Explorer and Xcelsius, IBM Cognos, SpreadsheetWEB, IBM’s Elixir Enterprise Edition, iCharts, UC4 Insight, Birst, Digdash, Constellation Roamer, BIme, Bissantz DeltaMaster, RA.Pid, Corda Technologies, Advizor, LogiXml, TeleView etc.

Permalink: http://apandre.wordpress.com/2011/01/26/poll/

Happy New 2011 Year!

Happy holidays to visitors of this blog and my best wishes for 2011! December 2010 was so busy for me that I did not have time to blog about anything. I will just mention some news in this last post of 2010.

Tableau sales will exceed $40M in 2010 (and they are planning to employ 300+ by the end of 2011!), which is almost 20% of Qliktech’s sales in 2010. My guesstimate (if anybody has better data, please comment on it) is that Spotfire’s sales in 2010 are about $80M. Qliktech’s market capitalization recently exceeded $2B, more than twice the market cap of Microstrategy ($930M as of today)!

I recently noticed that Gartner is trying to coin a new catch phrase because the old one (BI, which never worked because intelligence is an attribute of humans and not of businesses) does not work. Now they are saying that for the last 20+ years, when they talked about business intelligence (BI), they meant an intelligent business. I think this is confusing because (at least in the USA) business is all about profit, and Chief Business Intelligent Dr. Karl Marx would agree with that. I respect the phrase “Profitable Business”, but “Intelligent Business” reminds me of the old phrase “Crocodile tears”. Gartner is also saying that BI projects should be treated as a “cultural transformation”, which reminds me of a road paved with good intentions.

I also noticed the huge attention paid by Forrester to Advanced Data Visualization, probably for 4 good reasons (I have different reasoning, but I am not part of Forrester):

  • data visualization can fit many more data points (tens of thousands) into one screen or page compared with numerical information in a datagrid (hundreds of datapoints per screen);
  • the ability to visually drill down and zoom through interactive and synchronized charts;
  • the ability to convey a story behind the data to a wider audience through data visualization;
  • analysts and decision makers cannot see patterns (and in many cases also trends and outliers) in data without data visualization. A 37+ year old example is Anscombe’s quartet, which comprises four datasets that have nearly identical simple statistical properties, yet appear very different when visualized. They were constructed by F.J. Anscombe to demonstrate the importance of Data Visualization (DV):
Anscombe’s quartet
I II III IV
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
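
The claim about the quartet can be checked directly from the table above. The following Python sketch (standard library only; the function and variable names are mine) computes the mean, sample variance, Pearson correlation and least-squares line for each of the four datasets. For every dataset the mean of x is 9, the mean of y is about 7.50, the correlation is about 0.816 and the fitted line is close to y = 3 + 0.5x, although the four scatter plots look nothing alike.

```python
from statistics import mean, variance

X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values of sets I, II and III

QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Simple summary statistics and the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),          # sample variance
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 3) for key, value in stats.items()})
```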

In 2nd half of 2010 all 3 DV leaders released new versions of their beautiful software: Qlikview, Spotfire and Tableau. Visokio’s Omniscope 2.6 will be available soon and I am waiting for it since June 2010… In 2010 Microsoft, IBM, SAP, SAS, Oracle, Microstrategy etc. all trying hard to catch up with DV leaders and I wish to all of them the best of luck in 2011. Here is a list of some other things I still remember from 2010:

  • Microsoft officially declared that it prefers BISM over OLAP and will invest in their future accordingly. I am very disappointed with Microsoft, because it did not include BIDS (Business Intelligence Development Studio) in Visual Studio 2010. Even with the release of the supercool and free PowerPivot, it now seems unlikely that Microsoft will be a leader in DV (Data Visualization), given that it discontinued ProClarity and PerformancePoint and considering the ugliness of SharePoint. Project Crescent (a new visualization “experience” from Microsoft) was announced 6 weeks ago, but there are still not many details about it, except that it is mostly built with Silverlight 5 and that a Community Technology Preview will be available in the 1st half of 2011.
  • SAP bought Sybase and released the new version 4.0 of Business Objects and the HANA “analytic appliance”.
  • IBM bought Netezza and released Cognos 10.
  • Oracle released OBIEE 11g with ROLAP and MOLAP unified.
  • Microstrategy released version 9 Release 3, with much faster performance, integration with ESRI and support for data from web services.
  • EMC bought Greenplum and started a new DCD (Data Computing Division), which is an obvious attempt to join the BI and DV market.
  • Panorama released NovaView for PowerPivot, which connects natively to PowerPivot in-memory models.
  • Actuate’s BIRT was downloaded 10 million times (!) and has over a million (!) BIRT developers.
  • Panopticon 5.7 was released recently (on 11/22/10) and adds the ability to display real-time streaming data.

David Raab, one of my favorite DV and BI gurus, published on his blog an interesting comparison of some leading DV tools. According to David’s scenario, one possible ranking of DV tools is: Tableau 1st, then Advizor (version 5.6 available since June 2010), Spotfire and Qlikview (it seems to me David implied that order). In my recent DV comparison “my scenario” gave a different ranking: Qlikview is slightly ahead, while Spotfire and Tableau share 2nd place (but are very competitive with Qlikview) and Microsoft is a distant 4th, but it is possible that David knows something that I don’t…

In addition to David, I want to thank Boris Evelson, Mark Smith, Prof. Shneiderman, Prof. Rosling, Curt Monash, Stephen Few and others for their publications, articles, blogs and demos dedicated to Data Visualization in 2010 and before.

Permalink: http://apandre.wordpress.com/2010/12/25/hny2011/

This DV blog is a work in progress (as a website)


This blog was started just a few weeks ago and it is a work in progress, because in addition to the blog’s posts it has multiple webpages, and most of them will be completed over time, at approximately 1 post or page per week. After a few weeks of blogging I really started to appreciate what E.M. Forster (in “Aspects of the Novel”), Graham Wallas (in “The Art of Thought”) and Andre Gide said almost 90 years ago: “How do I know what I think until I see what I say?”.

So yes, it is under construction as a website and it is mostly a weekly blog.

Update for 3/24/2011: This site has got 22 posts since the first post (on 10/12/2010, so roughly one post per week), 43 (and still growing) pages (some of them incomplete and all of them work in progress) and 20 comments, and in the last few weeks it has been getting on average almost 200 visitors per day (this number is growing steadily). I am starting to get a lot of feedback, and some of the new posts were actually prompted by questions and requests from visitors and by phone conversations with some of them (they asked me to keep them confidential).

Permalink: http://apandre.wordpress.com/2010/12/03/dvblogasworkinprogress/

SAP HANA scales linearly

SAP released HANA today, which does in-memory computing with an in-memory database. A sample appliance has 10 blades with 32 cores each (using XEON 7500); the sample appliance (another buzzword: “data source agnostic”) costs approximately half a million dollars. SAP claimed that “Very complex reports and queries against 500 billion point-of-sale records were run in less than one minute” using parallel processing. SAP HANA “scales linearly”, with performance proportional to hardware improvements, which enables complex real-time analytics.

Pricing will likely be value-based, and SAP is looking for an all-in figure of around $10 million per deal. Each deal will be evaluated based on requirements, and during the call the company confirmed that each engagement will be unique (so SAP is hoping for 40-60 deals in the pipeline).

I think that with such pricing and data sizes the HANA appliance (as well as other pricey data appliances) can be useful mostly in 2 scenarios:

  • when it integrates with mathematical models to enable users to discover patterns, clusters, trends, outliers and hidden dependencies and
  • when those mountains of data can be visualized, interactively explored and searched, drilled down and pivoted…

Permalink: http://apandre.wordpress.com/2010/12/01/sap-hana/

DV Comparison: Qlikview, Spotfire, Tableau, MS BI Stack

I published a comparison of 4 leading DV products, see http://wp.me/PCJUg-1T

I did not include in the comparison the 5th leading product – Visokio’s Omniscope – because it has very limited scalability due to the specifics of its implementation: Java does not allow it to visualize too much data. Among the factors to consider when comparing DV tools:

  • memory optimization [Qlikview is the leader in in-memory columnar database technology];
  • load time [I tested all the products above and PowerPivot is the fastest];
  • memory swapping [Spotfire is the only one that can use a disk as virtual memory, while Qlikview is limited by RAM only];
  • incremental updates [Qlikview is probably the best in this area];
  • thin clients [Spotfire has the best THIN/Web/ZFC (zero-footprint) client, especially with their recent release of Spotfire 3.2 and Spotfire Silver];
  • thick clients [Qlikview has the best THICK client];
  • access by 3rd party tools [PowerPivot's integration with Excel 2010, SQL Server 2008 R2 Analysis Services and SharePoint 2010 is a big attraction];
  • interface with SSAS cubes [PowerPivot has it, Tableau has it, Omniscope will have it very soon, Qlikview and Spotfire do not have it];
  • GUI [3-way tie; it heavily depends on personal preferences, but in my opinion Qlikview is easier to use than the others];
  • advanced analytics [Spotfire 3.2 is the leader here with its integration with S-PLUS and support for IronPython and other add-ons];
  • the productivity of developers working with the tools mentioned above [in my experience Qlikview is a much more productive tool in this regard].

p003: http://wp.me/pCJUg-3R

Google keeps own Data Visualizations options open

Recently I had a few reasons to review the Data Visualization technologies in Google’s portfolio. In short: Google (if it decides to do so) has all the components to create a good visualization tool, but the same can be said about Microsoft, and Microsoft decided to postpone the production of a DV tool in favor of other business goals.

I remember that a few years ago Google bought Trendalyzer from Gapminder (Hans Rosling did some very impressive demos with it a while ago)

and converted it into a Motion Chart “technology” of its own. (For the Motion Chart demo I did below, please choose a few countries, e.g. check the checkboxes for US and France, and then click the “Right Arrow” button in the bottom left corner of the Motion Chart.)

A Motion Chart (see also here a sample I did myself, using Google’s Motion Chart) allows 5-6 dimensions to be crammed into a 2-dimensional chart: shape, color and size of bubbles, the X and Y axes as usual (above, these are Life Expectancy and Income per Person) and an animated time series (see the light blue 1985 in the background above; all bubbles move as “time” goes by). Google uses this and its other visualization technologies in its very useful Public Data Explorer.

Google Fusion Tables is a free service for sharing and visualizing data online. It allows you to upload and share data, merge data from multiple tables into interesting derived tables, and see the most up-to-date data from all sources. It has Tutorials, a User’s Group, a Developer’s Guide and sample code, as well as examples. You can check a video here:

The Google Fusion Tables API enables programmatic access to Google Fusion Tables content. It is an extension of Google’s existing structured data capabilities for developers. A developer can populate a table in Google Fusion Tables with data, from a single row to hundreds at a time. The data can come from a variety of sources, such as a local database, a .CSV file, a data collection form, or a mobile device. The Google Fusion Tables API is built on top of a subset of the SQL querying language. By referencing data values in SQL-like query expressions, you can find the data you need, then download it for use by your application. Your app can do any desired processing on the data, such as computing aggregates or feeding it into a visualization gadget. Data can also be synchronized: when you add or change data in the tables in your offline repository, you can ensure the most up-to-date version is available to the world by synchronizing those changes up to Google Fusion Tables.
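
To make the “populate a table, query it with SQL, compute aggregates” workflow concrete, here is a small local sketch. Please note that it does not call the Fusion Tables API at all: it uses Python’s built-in `sqlite3` module as a stand-in, and the `sales` table, its columns and the function name `total_sales_by_region` are hypothetical names I made up for illustration.

```python
import sqlite3


def total_sales_by_region(rows):
    """Load (region, amount) rows into a table and aggregate them with SQL.

    sqlite3 is only a local stand-in here; it is NOT the Fusion Tables API.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Populate a (hypothetical) table, from a single row to many at a time.
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        # Find the data with a SQL query and compute an aggregate per region.
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("East", 10.0), ("West", 5.0), ("East", 2.5)]
    for region, total in total_sales_by_region(sample):
        print(region, total)
```

The aggregated rows returned by such a query are exactly the kind of small result set that can then be fed into a chart or a visualization gadget.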

Everybody knows about Google Web Analytics for your web traffic: visitors, visits, pageviews, and length and depth of visits, presented in very simple charts and a dashboard, see the sample below:

Fewer people know that Panorama Software has an OEM partnership with Google, enabling Google Spreadsheets with SaaS Data Visualizations and Pivot Tables.

Google has a Visualization API (and interactive Charts, including all standard Charts, GeoMap, Intensity Map, Map, DyGraph, Sparkline, WordCloud and other Charts) which enables developers to expose their own data, stored on any data store that is connected to the web, as a Visualization-compliant data source. The Google Visualization API also provides a platform that can be used to create, share and reuse visualizations written by the developer community at large. Google provides samples, a Chart/API Gallery (Javascript-based visualizations) and a Gadget Gallery.

And last but not least, Google has the excellent back-end technologies needed for big Data Visualization applications, like BigTable (BigTable is a compressed, high-performance, proprietary database system built on Google File System (GFS), Chubby Lock Service, and a few other Google programs; it is currently not distributed or used outside of Google, although Google offers access to it as part of their Google App Engine) and MapReduce. Add to this list Google Maps and Google Earth

and then ask yourself: what is stopping Google from producing a competitor to the Holy Trinity (of Qlikview+Spotfire+Tableau) of DV?

Permalink: http://apandre.wordpress.com/2011/02/08/dvgoogle/

Trend Analysis: see it 1st

Data Visualization can be a good thing for Trend Analysis: it allows you to “see this” before you “analyze this” and to take advantage of the human eye’s ability to recognize trends quicker than any other method. Dr. Ahlberg (after selling Spotfire to TIBCO and claiming that “Second place is first loser”) started “Recorded Future” to basically sell … future trends, mostly in the form of Sparklines; he succeeded at least in selling Recorded Future to investors from the CIA and Google. Trend analysis is an attempt to “spot” a pattern, or trend, in data (in most cases a well-ordered set of data points, e.g. ordered by timestamps) or to predict future events.

Visualizing trends means in many cases either a Time Series Chart (can you spot a pattern here with your naked eye?):

or a Motion Chart (both best done by … Google, see it here http://visibledata.blogspot.com/p/demos.html ) – can you predict the future here(?):

or Sparklines (I like the Sparkline implementations by Qlikview and Excel 2010) – sparklines are a scale-less visualization of “trends”:

or maybe a Scatter chart (Excel is good for it too):

and in some cases a Stock Chart (Volume-Open-High-Low-Close, best done with Excel) – for example, Microsoft stock has been fluctuating near the same level for many years, so I guess there is no visible trend here, which maybe spells trouble for Microsoft’s future (compare with the visible trend of Apple and Google stocks):

Or you can see Motion, Timeline, Sparkline and Scatter charts alive/online below: for the Motion Chart demo, please choose a few countries (e.g. check the checkboxes for US and France) and then click the “Right Arrow” button in the bottom left corner of the Motion Chart below:

In statistics, trend analysis often refers to techniques for extracting an underlying pattern of behavior in a well-ordered dataset which would otherwise be partly hidden by “noise data”. It means that if one cannot “spot” a pattern by visualizing such a dataset, then (and only then) it is time to apply regression analysis and other mathematical methods (unless you are smart or lucky enough to remove the noise from your data). As I said at the beginning: try to see it first! However, extrapolating the past into the future can be a source of very dangerous mistakes (just check the history of almost any empire: Roman, Mongol, British, Ottoman, Austrian, Russian etc.)
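
For the cases where the eye is not enough, the simplest form of regression analysis is an ordinary least-squares straight line fitted through the time series. Below is a minimal sketch in plain Python; the function name `fit_line` and the sample numbers are my own illustration, and a real trend may of course need a richer model than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept). Requires at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up noisy series: the underlying trend is roughly +2 per time step.
    time_steps = [0, 1, 2, 3, 4, 5]
    values = [1.2, 2.9, 5.1, 6.8, 9.3, 10.9]
    slope, intercept = fit_line(time_steps, values)
    print(f"trend: {slope:.2f} per step, starting near {intercept:.2f}")
```

The fitted slope describes the past only; using it to extrapolate far into the future runs into exactly the danger mentioned above.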

Dimensionality of Visible Data

The human eye has its own Curse of Dimensionality (a term suggested in 1961 by R. Bellman and described independently by G. Hughes in 1968). In most cases the data, before they are visualized, are organized in multidimensional Cubes (n-Cubes) and/or Data Warehouses (or, speaking more cloudily, in a Data Cloud) and need to be projected into lower-dimensional datasets (small-dimensional Cubes, e.g. 3d-Cubes) before they can be exposed in the form of Charts on the 2-dimensional surface of a computer monitor (preferably as an interactive and synchronized set of charts, sometimes called a dashboard).

Projection of DataCloud to DataCubes and then to Charts
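
One common way to do such a projection is to keep only a few dimensions of the cube and aggregate the measure over all the others. The sketch below assumes a tiny made-up cube with the dimensions `year`, `region` and `product` and the measure `sales`; these names and the function `project` are hypothetical and do not come from any specific DV tool.

```python
from collections import defaultdict


def project(records, keep, measure):
    """Project a cube onto the dimensions in `keep`, summing `measure`
    over every dimension that is dropped."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[dim] for dim in keep)
        totals[key] += record[measure]
    return dict(totals)


# A made-up 3-dimensional cube (year x region x product) with one measure.
CUBE = [
    {"year": 2009, "region": "East", "product": "A", "sales": 10.0},
    {"year": 2009, "region": "West", "product": "A", "sales": 5.0},
    {"year": 2010, "region": "East", "product": "B", "sales": 7.0},
    {"year": 2010, "region": "East", "product": "A", "sales": 3.0},
]

if __name__ == "__main__":
    # 2-dimensional slice, suitable e.g. for a stacked bar chart.
    print(project(CUBE, ("year", "region"), "sales"))
    # 1-dimensional slice, suitable e.g. for a simple line chart.
    print(project(CUBE, ("year",), "sales"))
```

Each projected result has few enough dimensions to be shown as a single chart on a flat screen.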

During the last 200+ years people kept inventing all types of charts to be printed on paper or shown on screen, so most charts show 2- or 3-dimensional datasets. Prof. Hans Rosling led Gapminder.org to create the web-based, animated 6-dimensional Color Bubble Motion Chart (Trendalyzer),

which he used in his famous demos: http://www.gapminder.org/world/ , where the 6 dimensions in this specific Chart are (almost a record for a 2-dimensional chart to carry):

  • X coordinate of the Bubble = Income per person,
  • Y coordinate of the Bubble = Life expectancy,
  • Size of the Bubble = Population of the Country,
  • Color of the Bubble = Continent of the Country,
  • Name of the Bubble = Country,
  • Year = animated 6th Dimension/Parameter as time-stamp of the Bubble.

Trendalyzer was bought from Gapminder by Google in 2007 and converted into Google Motion Chart, but Google is somehow in no rush to enter the Data Visualization (DV) market.

The dimensionality of this Motion Chart can be pushed even further, to 7 dimensions (dimension as an expression of measurement without units), if we use different Shapes (in addition to filled Circles we can use Triangles, Squares etc.), but that would be literally pushing the limit of what the human eye can handle. If you add to this the tendency of DV Designers to squeeze more than one chart onto a screen (how about overcrowded Dashboards with multiple synchronized interactive Charts?), we are literally approaching the limits of both the human eye and the human brain, regardless of the dimensionality of the Data Warehouse in the backend.

Below I approximately assessed the dimensionality of datasets for some popular charts (please feel free to send me corrections). For each Dataset and respective Chart I estimated the number of measures (usually a real or integer number; can be a calculation from other dimensions of the dataset), the number of attributes (in many cases they are categories, enumerations or have string as their datatype) and 0 or 1 parameter (representing a well-ordered set, like time (for time series), date, year, sequence (can be used for Data Slicing), or a natural, integer or real number), and the Dimensionality (the number of Dimensions) as the total number of measures, attributes and parameters in a given dataset.

Chart Measures Attributes Parameter Dimensionality
Gauge, Bullet, KPI 0 0
Monochromatic Pie 1 1
Colorful Pie 1 1 2
Bar/Column 1 1 2
Sparkline 1 1 2
Line 1 1 2
Area 1 1 2
Radar 1 1 2
Stacked Line 1 1 1 3
Multiline 1 1 1 3
Stacked Area 1 1 1 3
Overlapped Radar 1 1 1 3
Stacked Bar/Column 1 1 1 3
Heatmap 1 2 3
Combo 1 2 3
Mekko 2 1 3
Scatter (2-d set) 2 1 3
Bubble (3-d set) 3 1 4
Shaped Motion Bubble 3 1 1 5
Color Shaped Bubble 3 2 5
Color Motion Bubble 3 2 1 6
Motion Chart 3 3 1 7
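
The counting rule used in the table (Dimensionality = measures + attributes + parameter) can be written down as a tiny data structure. This is only a sketch: the class name `ChartDataset` is my own invention, and treating Shape as a third attribute is my reading of the last row of the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChartDataset:
    """Describes what a chart has to carry: measures, attributes, parameter."""
    name: str
    measures: Tuple[str, ...]
    attributes: Tuple[str, ...]
    parameter: Optional[str] = None  # 0 or 1 well-ordered parameter, e.g. time

    @property
    def dimensionality(self) -> int:
        # Total number of measures, attributes and parameters.
        has_parameter = 1 if self.parameter is not None else 0
        return len(self.measures) + len(self.attributes) + has_parameter


# The Gapminder chart described above: 3 measures, 2 attributes, 1 parameter.
color_motion_bubble = ChartDataset(
    name="Color Motion Bubble",
    measures=("Income per person", "Life expectancy", "Population"),
    attributes=("Continent", "Country"),
    parameter="Year",
)

# The same chart with bubble Shape added as one more attribute (assumption).
motion_chart = ChartDataset(
    name="Motion Chart",
    measures=color_motion_bubble.measures,
    attributes=color_motion_bubble.attributes + ("Shape",),
    parameter="Year",
)

if __name__ == "__main__":
    for chart in (color_motion_bubble, motion_chart):
        print(chart.name, chart.dimensionality)
```

Both results agree with the corresponding rows of the table (6 and 7).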


The diversity of Charts and their Dimensionality adds another complexity for the DV Designer: which Chart(s) to choose. You can find some good suggestions about that on the web. Dr. Andrew Abela created a Chart Chooser Diagram

Choosing a good chart by Dr. Abela

and it was even converted into online “application“!

Permalink: http://apandre.wordpress.com/2011/03/02/dimensionality/
