Notes

Big Data - what’s next?

Big Data is the buzzword everyone loves. The world is all about big data. Big big big - bigger than you ever thought possible. More complex. Needing more processing power than you ever thought possible. 

Let’s break it down.

In this post, I am using a framework I came across in Roger and Mike’s Hypernet blog, thanks to a blog post by Rob Go of NextView Ventures. It talks about Technology Waves.


Technology waves begin with the infrastructure, on top of which enabling technology and platforms are built. End-user applications are then built on top of enabling technology and platforms, and take the whole wave to the mass market.

Just to bring this framework to life, consider the social networking wave: infrastructure = broadband internet; enabling tech/platforms = Facebook, Twitter and LinkedIn; end-user applications = social games, apps, etc. One of the key points is that the winning enabling tech and platform companies are almost always super hits! This makes sense - app developers would want to build on one well-known platform, rather than having to support multiple platforms.

Let’s see how it breaks down for big data. 

Infrastructure

  • The continually lower cost of storage
  • Cloud computing on cheap hardware, which enables distributed processing of large amounts of data

Enabling technologies and platforms

  • Emergence of new types of databases, especially the NoSQL variety, that support real-time analysis of a growing data source (e.g., Twitter streams, website logs)
  • Frameworks that support data-intensive distributed applications such as Apache Hadoop
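
To make the Hadoop bullet a bit more concrete, here is a minimal sketch of the map / shuffle / reduce pattern that Hadoop's MapReduce model is built around. This is plain single-process Python, not Hadoop's actual API; the sample log lines and all function names are made up for illustration. In a real cluster, the map and reduce steps would run in parallel across many cheap machines.

```python
from collections import defaultdict

# Hypothetical website log lines, invented for this example.
LOG_LINES = [
    "GET /home 200",
    "GET /pricing 200",
    "GET /home 404",
    "POST /signup 200",
]


def map_phase(line):
    """Emit (key, 1) pairs; here the key is the requested path."""
    _method, path, _status = line.split()
    yield path, 1


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(LOG_LINES))
```

The point of the pattern is that each map call only looks at one line and each reduce call only looks at one key, which is what lets a framework spread the work across machines.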

Applications

I think this is where the primary action is at this point in time - in applications that aid developers and data scientists in big data analysis. Key examples include:

  • Cloudera, built to ease the adoption of Hadoop in the enterprise. I debate whether Hadoop belongs in the enabling tech category rather than application category, as it seems to be serving as a single technology that everyone is adopting
  • Emerging statistical programming languages such as R, and in-memory analytics platforms such as SAP HANA

So where are we?

I think we are in the Application phase of wave 1 of big data - the wave whose end users are developers and data scientists. There is a whole set of enabling technologies that still need to be ironed out, especially on the database side, where a lot of new DBs such as MongoDB, CouchDB, etc. are emerging.

All this infrastructure and enabling technology is also enabling applications focused on non-technical end users: applications that take this data and generate insights through analytics. Whether it is Lattice Engines, which uses big data analytics in sales, or Quant5, which provides big data analytics for marketing, the focus is going to start turning towards what this data is supposed to provide in the end: business-focused insight, decision-making and automation tools.

I believe that these two sets of applications will continue to emerge, but there will be some standardization of the enabling technologies, particularly on the DB side. Surely an exciting time for all!