In my previous blog post about taking charge of the data problem, I mentioned that data is often the biggest challenge in self-service analytics and, unfortunately, is often ignored. This is largely because an organization's data lives in 10, 20, or even 30 different locations, which makes it difficult to get a true 360-degree view of the business.
But it is possible, so I want to dig deeper into this issue and discuss how you can obtain that comprehensive view.
Step One: Find the Data
Part of “owning” the data problem is acknowledging that not all of your data is under your control. Assuming that you “own” all of your data, or that your in-house data alone is enough, will simply set you up to fail. Small and medium-sized organizations led the charge, but today almost every business outsources the applications that are fundamental to running the business (e.g., CRM, marketing, finance, HR), and the data lives with those applications.
Unfortunately, each of these cloud applications stores data in a different place, and usually in a different format. Moreover, these applications don’t share common underlying data structures or common definitions of terms such as “customer.” Each application's data is specific to that application and is not designed for analysis.
Step Two: Get the Data
Now that you know where your data resides, you need to connect to it. The answer is not simply to download a bunch of CSV files and open them in Excel: there is no easy way to merge that data into a complete view within Excel, because it was not designed to do that.
These source systems were also designed for transactional workloads, not for pulling data out. They are built to accept and read individual records, not to run the complex queries and calculations that analytical reports and dashboards require. And since you want to report across data from multiple apps, the basic analytic tools built into each app will not be enough.
That’s where smart data blending comes in: anyone should be able to match columns that contain the same information but are named differently (for example, “customer” in one app and “customer name” in another). Bringing these fields together from different applications lets you begin to build a business perspective on which to analyze the data.
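To make the “customer” vs. “customer name” idea concrete, here is a minimal Python sketch using only the standard library. The two CSV exports, their column names and values, and the `COLUMN_MAP`, `load`, `normalize`, and `blend` helpers are all hypothetical, chosen for illustration; the sketch maps each source's column onto one shared term, lightly cleans the values, and joins on the result.

```python
import csv
import io

# Hypothetical exports from two cloud apps; column names and values are illustrative only.
CRM_CSV = """customer,region
Acme Corp,West
Globex,East
"""

BILLING_CSV = """customer_name,amount
acme corp ,1200
Globex,800
Initech,300
"""

# Map each source's column name onto one shared business term ("customer").
COLUMN_MAP = {
    "crm": {"customer": "customer"},
    "billing": {"customer_name": "customer"},
}


def load(text, source):
    """Read CSV text and rename columns according to COLUMN_MAP[source]."""
    renames = COLUMN_MAP[source]
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({renames.get(k, k): v for k, v in row.items()})
    return rows


def normalize(name):
    """Light cleanup so values like 'acme corp ' and 'Acme Corp' match."""
    return name.strip().lower()


def blend(left, right, key="customer"):
    """Left-join right onto left by normalized key; unmatched left rows keep only their own columns."""
    index = {}
    for row in right:
        index.setdefault(normalize(row[key]), []).append(row)
    blended = []
    for row in left:
        matches = index.get(normalize(row[key]), [])
        if not matches:
            blended.append(dict(row))
        for match in matches:
            merged = dict(row)
            merged.update({k: v for k, v in match.items() if k != key})
            blended.append(merged)
    return blended


crm = load(CRM_CSV, "crm")
billing = load(BILLING_CSV, "billing")
for row in blend(crm, billing):
    print(row)
```

Initech, which appears only in the billing export, drops out because this sketch does a left join from the CRM side; a real blend needs a deliberate choice about which unmatched records to keep.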
For most business people, this process of data blending, wrangling, and enrichment is best done with a data prep tool designed for them, not for data scientists or engineers. There is no need for a full-fledged ETL tool, which typically requires advanced IT skills and expertise. What is needed is a data prep experience that is easy to understand yet powerful enough to wrangle the data into “analytical shape” so it is ready for exploration.
Step Three: Manage the Data for Performance
Now your data is ready, but what do you do with it? You have many questions, you need answers quickly, and you want to share the data with colleagues. You need performance. You also need a way to manage the data so it stays up to date. Both are particularly challenging given that your data lives in the cloud, in your internal systems, and probably on your own computer.
The next step is to have an intermediate place to store the data for analysis: one smart enough to tune itself for performance and simple enough for anyone to manage, so the data stays fresh.
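As a rough illustration of an intermediate store that keeps data fresh, the Python sketch below copies source rows into an in-memory SQLite database and reloads them when they go stale. The `AnalyticsStore` class, the `sales` table, the one-hour freshness window, and the `fetch_sales` source are hypothetical; the fixed index only stands in for the self-tuning described above, and a production store would also need incremental loads and concurrent access, which this sketch does not attempt.

```python
import sqlite3
import time

# Assumption: a one-hour freshness window; pick whatever your reports can tolerate.
MAX_AGE_SECONDS = 3600


class AnalyticsStore:
    """Hypothetical intermediate store: an in-memory SQLite copy of source data
    that reloads itself when it is older than max_age seconds."""

    def __init__(self, fetch_rows, max_age=MAX_AGE_SECONDS, clock=time.time):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
        # A hand-picked index on a commonly filtered column; real self-tuning
        # would choose indexes from observed queries instead.
        self.conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
        self.fetch_rows = fetch_rows  # callable returning (customer, region, amount) tuples
        self.max_age = max_age
        self.clock = clock
        self.loaded_at = None  # time of the last successful load

    def refresh_if_stale(self):
        """Reload from the source if never loaded or older than max_age; return True if reloaded."""
        now = self.clock()
        if self.loaded_at is not None and now - self.loaded_at < self.max_age:
            return False
        # One transaction: if fetching or inserting fails, the previous rows are kept.
        with self.conn:
            self.conn.execute("DELETE FROM sales")
            self.conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", self.fetch_rows())
        self.loaded_at = now
        return True

    def query(self, sql, params=()):
        """Run a read query against data that is fresh enough."""
        self.refresh_if_stale()
        return self.conn.execute(sql, params).fetchall()


# Usage with a fake source system and a controllable clock.
calls = []


def fetch_sales():
    calls.append(1)  # count how often the "source system" is hit
    return [("Acme Corp", "West", 1200.0), ("Globex", "East", 800.0)]


fake_now = [0.0]
store = AnalyticsStore(fetch_sales, clock=lambda: fake_now[0])
print(store.query("SELECT SUM(amount) FROM sales WHERE region = ?", ("West",)))  # [(1200.0,)]
store.query("SELECT COUNT(*) FROM sales")  # still fresh: no reload
fake_now[0] = 4000.0
store.query("SELECT COUNT(*) FROM sales")  # older than an hour: reloads
print(len(calls))  # 2
```

Injecting the clock keeps the freshness rule easy to test, and doing the reload in a single transaction means a failed refresh leaves the last good copy in place.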
Now you can get that comprehensive perspective of your business, because you own the data problem. As we think about self-service analytics, and the fact that data is often the key to making it happen, we need to be able to get to the data regardless of where it is housed and bring it all together in one place where we can blend it and ask the kinds of questions we have.
In the past, accessing the data, blending it, and storing it for analysis required three separate sets of technology: drivers to get the data, ETL tools to blend it, and data repositories such as relational databases and columnar stores to store it. These are all thought of as infrastructure tools, and all require specialized skills or expertise that most people don’t have.
We need to recognize that 70–80% of the time, most people don’t need that level of sophistication and power. In fact, they need the opposite: something simple enough to be used, so that it actually gets used. That is the first step in making self-service analytics a reality. When we give users this, they will be able to succeed.
After all, we do not want to let “perfect be the enemy of good.” There is a better and faster way to address the remaining 20–30%, but it requires self-service analytics working in conjunction with your IT-delivered analytics. But that is a topic for another time…