

Big Data 101 – NoSQL, Hadoop, Analytic DS, RDBMS Differences for Business People

By David Abramson | March 3, 2015

The following post was coauthored by David Abramson, director of product management, Logi Analytics, and Steven Schneider, VP of sales and business development, Logi Analytics, and was originally published on Slinging Software.

Confusion Reigns – The basic differences between Hadoop, NoSQL, analytic data stores, and traditional databases.

Organizations are now creating more data than ever before, and a new set of tools and technologies is becoming popular for storing and retrieving this information in a timely and cost-effective manner. Many technologies attempt to address these challenges, so there are different (and often incompatible) approaches, each with positives and negatives depending on the use case.

While big data was initially synonymous with Hadoop, aggressive vendor marketing and leadership discussion have broadened the term to mean “a lot of data” and a wider set of data storage technologies. At a high level, there are four competing sets of data storage/access technologies that you are likely to hear about in relation to big data:

| | RDBMS | Analytic Data Stores | NoSQL | Hadoop |
| --- | --- | --- | --- | --- |
| Description | Traditional row-column databases used for transactional systems, reporting, and archiving. | Optimized for data access (as opposed to writes); they leverage columnar or in-memory technology to provide fast data access at the expense of write performance. | Designed for rapid access to “key-value” pair combinations. Useful for products like Facebook and Twitter where most information revolves around one key piece of data. | An open-source approach to storing data in a file system across a range of commodity hardware and processing it using parallelism (multiple systems at once). |
| Examples | SQL Server, MySQL, Oracle, etc. | Vertica, Kognitio, ParAccel, Netezza, Infobright, Amazon Redshift | MongoDB, Cassandra | Hadoop implementations by Cloudera, Intel, Amazon, Hortonworks |
| Good for… | Reads & writes, “reasonable” data sets (< 1B rows) | Storing lots of information, great query/retrieval speeds | Storing information of a certain type, great retrieval speed based on a key, write performance | Inexpensive storage of mass data, structured & semi-structured |
| Not good for… | Massive data volumes, unstructured & semi-structured data | Unstructured & semi-structured data, writes (one at a time) | Grouping information across keys (such as for reporting) | Complex, code-based, incompatible approaches in market, writes (one at a time) |
| Notes | Challenging to “scale out” | Often viewed as an alternative to traditional RDBMS when read performance is important | Enables faster productivity when creating data-driven applications, as less up-front design work is needed | Strong bias to the open-source community & Java |
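The distinctions in the comparison above can be made concrete with a toy sketch. The Python below is illustrative only: the `ORDERS` sample data and every function name are hypothetical, plain in-memory structures stand in for real products, and a thread pool stands in for the multiple machines a Hadoop cluster would use. It shows why an aggregate is cheap on a columnar layout, why a key-value layout is fast for lookups but awkward for grouping across keys, and how a split-then-combine computation works.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: one dict per order (a "row" in RDBMS terms).
ORDERS = [
    {"order_id": 1, "customer": "acme", "region": "east", "amount": 120.0},
    {"order_id": 2, "customer": "bolt", "region": "west", "amount": 80.0},
    {"order_id": 3, "customer": "acme", "region": "east", "amount": 50.0},
    {"order_id": 4, "customer": "core", "region": "west", "amount": 200.0},
]


def to_columns(rows):
    """Columnar layout: one list per field, the idea behind analytic data stores."""
    columns = defaultdict(list)
    for row in rows:
        for field, value in row.items():
            columns[field].append(value)
    return dict(columns)


def total_amount_columnar(columns):
    """An aggregate reads only the single column it needs."""
    return sum(columns["amount"])


def to_key_value(rows, key):
    """Key-value layout: each record is stored and fetched by one key."""
    return {row[key]: row for row in rows}


def total_by_region_key_value(store):
    """Grouping across keys has to scan every value in the store."""
    totals = defaultdict(float)
    for row in store.values():
        totals[row["region"]] += row["amount"]
    return dict(totals)


def split_and_combine_total(rows, workers=2):
    """Split the data, sum each split in parallel, then combine the partial sums."""
    chunks = [rows[i::workers] for i in range(workers)]

    def partial_sum(chunk):
        return sum(r["amount"] for r in chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)
```

For example, `to_key_value(ORDERS, "order_id")[3]` fetches one order directly by its key, while `total_by_region_key_value` must visit every record to produce a report-style summary. That trade-off is the one the "Good for…" and "Not good for…" rows describe.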

 

About the Author

David Abramson has more than 10 years of experience in full-lifecycle product development and management, from product inception through general availability. He has shepherded multiple analytics and business intelligence products, and has worked with hundreds of customers, both enterprises and ISVs, to support data-driven application implementations.
