By now everyone has been inundated with information on big data – everyone is talking about it, everyone is worrying about it, and some people are actually using it! But there are a lot of misconceptions around what big data actually IS, so let’s clear that up right now. Big data refers to our ability to collect and analyze large and complex datasets to uncover new insights about the world around us. Across all industries and walks of life, we’re collecting more data than ever before, and now we need to make sense of it all. In some cases, traditional database management tools and data processing applications have failed to handle these new datasets because of challenges in capturing, curating, storing, searching, sharing, transferring, analyzing, and visualizing the data. This has given rise to new technologies, including analytic databases (e.g. HP Vertica, ParAccel, ParStream), NoSQL databases (e.g. MongoDB), Hadoop distributions (e.g. Cloudera, Hortonworks), and cloud data warehouses (e.g. Amazon Redshift).
As Bernard Marr explained in his viral LinkedIn article, big data has changed our lives in many ways, enabling us to “decode human DNA in minutes, find cures for cancer, accurately predict human behavior, foil terrorist attacks, pinpoint marketing efforts, and prevent diseases.” And as we’re able to collect more and more data, we’ve found that information ABOUT everyday transactions has become more valuable than the transactions themselves.
The 3 Vs – Volume, Variety, and Velocity
Introduced by Doug Laney in 2001, the 3 Vs are the defining attributes of big data. Not only are we seeing higher volumes of data, but that data also comes from many different sources (variety) and changes very quickly (velocity).
Volume: Volume presents the most immediate challenge to IT infrastructures. More data often beats better models – even simple mathematical models can be very effective when given large datasets. More data requires scalable storage and a distributed approach to querying. Today’s servers can store petabytes of data. To put that into perspective, a single petabyte is enough to “store the DNA of the entire population of the US – and then clone them, twice.”
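The “distributed approach to querying” that volume demands is the core idea behind MapReduce-style systems such as Hadoop: each node summarizes its own slice of the data locally, and only the small partial results are shipped around and combined. Here is a minimal sketch in plain Python – the partitioned event log is invented for illustration, and a real cluster would run the map step on separate machines:

```python
from collections import Counter
from functools import reduce

# Toy event log split into "partitions", as it might be spread across servers.
partitions = [
    ["user login", "user purchase"],
    ["user login", "guest browse"],
    ["user purchase", "user purchase"],
]

def map_partition(records):
    """Count event types locally on one node (the 'map' step)."""
    return Counter(record.split()[1] for record in records)

def merge(a, b):
    """Combine partial counts from two nodes (the 'reduce' step)."""
    return a + b

# Only these small Counter objects would cross the network, not the raw data.
total = reduce(merge, (map_partition(p) for p in partitions))
print(total)  # purchase: 3, login: 2, browse: 1
```

The point of the pattern is that the raw records never leave their node; only the compact per-partition counts are merged.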
Variety: As any analyst knows, data preparation takes far more time than the analysis itself. With an increasing number of data sources – relational databases, web services, spreadsheets, proprietary systems, and so on – it has become harder to conduct “big picture” analysis, since data from these sources needs to be combined and formatted before you can create even a single chart.
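To make that combine-and-format step concrete, here is a small sketch using only the Python standard library. The CSV and JSON snippets are hypothetical stand-ins for two sources (say, a spreadsheet export and a web service) describing the same sales data in different shapes:

```python
import csv
import io
import json

# Two hypothetical sources with different formats but overlapping content.
crm_csv = "region,revenue\nEast,1200\nWest,950\n"
api_json = '[{"region": "North", "revenue": 700}]'

# Normalize both into one list of uniform records before any chart is drawn.
rows = [{"region": r["region"], "revenue": int(r["revenue"])}
        for r in csv.DictReader(io.StringIO(crm_csv))]
rows += [{"region": r["region"], "revenue": int(r["revenue"])}
         for r in json.loads(api_json)]

print(rows)
# [{'region': 'East', 'revenue': 1200}, {'region': 'West', 'revenue': 950},
#  {'region': 'North', 'revenue': 700}]
```

Real pipelines face the same problem at much larger scale, with messier schemas, which is why the preparation step dominates the analyst’s time.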
Velocity: The final V, velocity, refers to the increasing rate at which data flows into an organization and how quickly it can be presented to information consumers. In the past, it might have taken months to collect, analyze, and visualize data in a simple dashboard; today’s companies rely on real-time information to drive better, faster decision-making.
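Real-time dashboards typically handle velocity by keeping running aggregates that update as each data point arrives, instead of recomputing over the full dataset every time. A toy sketch of a rolling average, with made-up sensor readings:

```python
from collections import deque

class RollingAverage:
    """Maintain the average of the last `size` readings as they stream in."""

    def __init__(self, size):
        # deque with maxlen automatically evicts the oldest reading.
        self.window = deque(maxlen=size)

    def add(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

dashboard_metric = RollingAverage(size=3)
for reading in [10, 20, 30, 40]:
    latest = dashboard_metric.add(reading)

print(latest)  # average of the last three readings: (20 + 30 + 40) / 3 = 30.0
```

Each update costs a constant amount of work regardless of how much history has flowed past, which is what makes second-by-second dashboards feasible.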
Uses of Big Data
Here are a few examples of organizations that have been using big data in big ways:
- In 2012, the Obama Administration announced the “Big Data Research and Development Initiative,” which would help strengthen national security
- Amazon analyzes petabytes of data to better understand its customers’ purchasing habits, allowing it to recommend the most relevant related items for purchase
- The Large Hadron Collider particle accelerator in Switzerland delivers 40 million data points every second from 150 million sensors to test predictions of different theories of particle physics
But you don’t have to be a large organization, like the Federal government or Amazon, to use big data. Even small companies are now able to collect and analyze large sets of data with the right technologies.
The Future of Big Data
In the McKinsey study, Big data: The next frontier for innovation, competition, and productivity, it’s easy to see that analysis of big data is the future of productivity and innovation in the technology world. In the United States alone, there will be an estimated “140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data” by the year 2018. So while big data is relatively new, this trend isn’t going anywhere anytime soon.