Big Data is more than hot. It is one of the most talked-about phenomena of the past year and will continue to be a hot topic going forward. Just as with social media, there is enormous pressure on organizations to get into the Big Data game, and quickly. Beyond the excitement and anxiety, there are reasons you should slow down and think about what you want to do.
Environments are complex, requiring organizations to seek technology that is plug-and-play and can stand up easily in diverse infrastructures. The current technology market could be described as many tools that are equally complex to understand, deploy and use. Standing them up has nuances that anyone considering a Big Data solution should understand first. The nuances fall into three categories: resources, timelines and tools.
Most new data analytics technologies were created for developers and require Java skills or SQL experience. The traditional data scientist who understands data modeling, on the other hand, doesn’t come from a coding background and can’t access the data they’d like to analyze. The data integration skills lie on one side of the technical fence and the data knowledge on the other. I guess you could say data scientists are from Mars, developers are from Venus.
Data science is a challenging field. Data scientists are used to writing algorithms for others to develop, test and implement. The traditional cycle for doing that was six months or more in most industries. This waterfall approach is methodical but takes too long to stand up. The world can change in six months. Time to market is both a barrier to getting started and a competitive differentiator if you can shorten it.
Pieces of the solution exist. First and foremost, there is Hadoop, the premier product for distributed computing, which involves shuffling jobs between servers to run large-scale analytics. Hadoop solves the problem of storage and parallel processing in an elegant way. While Hadoop is the rallying point for Big Data, by itself it isn’t a solution. It sometimes seems like the solution because when data gets large, there is nothing that can replace Hadoop. There’s a real expectation gap, however, between the engine that is Hadoop and the drive train that is required to do useful things with Big Data.
So what have companies done to address these issues? That’s another story.