Friday 2 December 2011

Big Data needs Big Collection and Big Execution


[Image by John M. Kennedy T]


Big Data is the new buzz, it seems, and I must say I have been sceptical of it since I first saw the very word - or phrase, whichever it is.
As an IT architect, I've always equated data with databases, information with applications, and knowledge with the people on top of these.

For one, I think you can very easily handle the perceived issue by dropping the data and acting on the information instead. Since when did databases contain useful information?

Vijay Vijayasankar wrote a good post on it, and I'd like to add to that from another point of view.

Structured versus unstructured has always been an important topic. It is why file-system based information storage moved up to databases. From that, relational databases were formed. All of these contain data, but combining data was the goal, and that has become a lot easier, for the simple reason that one database has one database administrator who can work his magic.

Big Data's definition on the fabulous Wikipedia puzzles me a bit. I don't have an attention deficit disorder, but this seems to be the opposite of it. Why take on petabytes of data? It brings back visuals of Don Quixote fighting the windmills...
I do understand the temptation to catch all possible signals out there, but as a human being I've found that this simply is not done. The bottleneck is my own processing power at best, but to be fair the real bottleneck in my life has been my own mindset - with my mind being set some 20-30 years ago. Regardless of how much you process, you usually process it in the same way.

Having said that, let me make the connection to Big Data. First, you have to identify the Big Data you want to have. Next, you need to get it "in your hands". Then, you need to juggle it so you can analyse it, and what remains as a finished product is actionable information - in the ideal situation, that is - and then you have to act on that.
Sounds like a lengthy journey? To me it does, with quite a few unpleasant complications. It's pretty much like the daily dinner challenge: what on earth will we eat today? First, you have to decide what that is, then you have to go out and buy it, after which you have to prepare it, then serve it, and then wait and see whether your kids will like it or not.

In the above example, Big Data would compare to huge supermarkets that offer an abundance of food in different sizes, colours and combinations.
If you really need to cook an awful lot of that, that means you need a bigger car or maybe even cars, or a truck even, to transport that to your home. When home, you might need a bigger kitchen to lay it all out, and an extra stove to do all the cooking so it's still warm enough when you serve it. Probably you being the only cook will not suffice either, and you will need to hire extra people to do the cooking with you.
Then, le moment suprême, dinner is served! And the real trouble begins...
"Mom, I don't like this" "It doesn't taste good" "Oh not ... again!"

I see similar issues around Big Data. Maybe you can serve this big meal now thanks to big hardware and software such as clusters and cloud or Hadoop and HBase, but how do you get that data to you?
You can cook it on the Big Data stoves, but how do you get it there? And how do you make certain that that happens in a steady flow? Much like road traffic, that could result in traffic jams or even complete gridlock.
Then, when it is served, how will people react? It will be the product of a whole new way of cooking, and thus questioned.

About the real-time aspect I'm as unsure as Vijay Vijayasankar: even if you can make decisions in real time, can you execute them in real time as well? I actually see a good market for Small Data with automated decision-making, e.g. in food and retail, where products have a fixed lifecycle.
I am not sure that Big Data and Business Intelligence belong together. I do see that bigger applications are covering more territory and handling more users, hence generating more data - but is that data really so important for BI? And if it is, how about the old SETI approach of chopping it up and federating it across machines?
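
To make that chop-it-up idea a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the function names (`chop`, `summarise`, `combine`, `federated_mean`) are hypothetical, worker threads merely stand in for separate machines, and the "analysis" is nothing more than a mean. It is not how SETI or any BI product actually does it.

```python
from concurrent.futures import ThreadPoolExecutor


def chop(records, chunk_size):
    """Split the records into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def summarise(chunk):
    """Work done by one 'machine': reduce its chunk to a small partial result."""
    return {"count": len(chunk), "total": sum(chunk)}


def combine(partials):
    """Merge the small partial results into one overall answer."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return {"count": count, "total": total, "mean": total / count if count else None}


def federated_mean(records, chunk_size=1000, workers=4):
    """Chop the data up, let each worker summarise its chunk, then combine."""
    chunks = chop(records, chunk_size)
    # Assumption: worker threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarise, chunks))
    return combine(partials)


if __name__ == "__main__":
    print(federated_mean(list(range(1, 101)), chunk_size=30))
```

The point of the sketch is that only the small partial results travel back to the centre; the bulk of the data stays where it was chopped up.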

One thing is for sure: if you want to devour Big Data, make sure you have a steady flow of it inbound, and a good number of consumers outbound. Luckily, no worries about who's doing the dishes.
