Tuesday, 27 May 2014

The Roadmap to 'Hadoop in the Cloud'


The Twitter ball started rolling again just now. Matt Asay posed an interesting question about Forrester suggesting Hadoop isn't a great fit for the cloud. (Even) without context, Vijay Vijayasankar and I started firing off questions and answers, which inevitably led to my promise of writing down the transition plan for it.

Here it is:
I'll start bottom-up, from an enterprise perspective, detailing what needs to be done, how, and why, to tackle my biggest beef with #bigdata: 1) getting all that fine data fast enough, and neatly fitted, into that big number cruncher so you can 2) make split-second market-making or -breaking decisions based upon it. I'm betting a fine bottle of wine that Vijay will do at least a slightly better job on no. 2 than I will, just as I'm reassured that hardly anyone will be able to contest the solution I'll propose to no. 1.

I ranted about Big Data's bottleneck over two years ago but didn't provide much more than a blurry vision of what would fix the problem: it's all fair and square that you can crunch anything within hours, minutes or even seconds, but that defeats the purpose if it takes you days, weeks or even months to get all that data into that same number cruncher.

The issue

What's the issue? The one all big data evangelists and protagonists try so hard to evade?
It's the same issue that has prevented many TLAs of the past decade (ESB, BPM, SOA) from becoming successful, and it's called the Information Problem: on a conceptual enterprise level you sell perfectly coherent services and products via solid processes, but on the IT infrastructure level, where all of that resides, it's just disparate bits and pieces scattered across an endlessly diverse multitude of incompatible databases, tablespaces, tables and columns, with different rules of entry for every single one of them.

The origin of all that? Gazillions of decisions and compromises made under the stress of go-to-market, 'Quick Wins' and anything else that in regular life would qualify as a one-night stand without much, if any, afterthought. The culprit? You, the enterprise, as you failed to control it. All of it was once labelled 'legacy' by cunning vendors and SIs (system integrators) trying to establish a mutual enemy, but you should have figured out by now that diversity, differentiation and bending the rules are what make a business tick.

The problem

So. How do you feed that evolutionary chaos into your smart number cruncher, e.g. Hadoop, without any hassle? The answer: you won't, not on a regular basis, not in a lifetime - unless you adapt. How? In stages, as we all know the boat must be kept afloat while we plug the holes.

Imagine da Cloud. Pretty much like Gawd, it's one coherent single form (which it ain't, of course, but we'll come back to that later). Now look at your enterprise IT landscape: hundreds and thousands of different shapes, colours, access paths, etcetera.
Is it gonna fit? Hell no

Don't be fooled by appearances. The diversity in form is there, but the real problem is the diversity in information. Do you have enterprise-wide definitions of products, down to the letter? Of course not; each department has its own. The marketing department is only interested in certain aspects, the complaints-handling department likewise, the R&D department doesn't even have a name for the product, and the sales department might even club a few together.
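To make that concrete, here's a minimal sketch (in Python, with every system, field and value hypothetical) of how one and the same product might look across departments. None of these records is wrong; they just answer different questions, and none of them is the enterprise-wide definition.

```python
# One physical product, four departmental views - all valid, none canonical.

marketing_view = {
    "campaign_sku": "SKU-0042",
    "display_name": "SuperWidget Deluxe",
    "segment": "consumer",
    "launch_quarter": "2014-Q3",
}

complaints_view = {
    "product_code": "42",            # same product, different key and type
    "warranty_months": 24,
    "known_defects": ["hinge", "battery"],
}

rnd_view = {
    "project_id": "PRJ-7781",        # R&D doesn't even have the product's name
    "bill_of_materials": ["casing-v2", "board-r3"],
    "spec_revision": 17,
}

# Sales might even club a few products together:
sales_view = {
    "bundle_id": "BNDL-9",
    "contains": ["SKU-0042", "SKU-0017"],
    "list_price_eur": 199.00,
}
```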

The solution

Your enterprise-wide coherent data model will never be reflected down into every single app that supports it - not ever. So you need to feed it bit by bit, slowly tweaking and tuning on the side, until it is. In the end you'll have a single Data Warehouse, probably a huge database grid with massive scale-up, scale-out and fail-over capabilities, which you can replicate via a very fat pipe to your Hadoop instance in the cloud - where everybody can go crazy analysing everything. [Disclaimer] That *is* my wet dream of the moment (no sponsors or partners yet; feel free to apply).

The roadmap

1) Take an application, preferably one supporting your simplest and most static business process. Whenever a transaction is completed, send it off. Send it off to your central Data Warehouse (DWH). This is probably the moment where you start your DWH, and it's only fit to absorb that single request.
2) Naturally, your DWH has a translation ring around it, much like that of the European Parliament, where 24 different languages get 'un-languaged' by a single interpreters' department (there's a sketch of this ring in code after this list).
3) Your DWH has its own data model, decoupled from the apps by the translation ring, that exactly fits your enterprise-wide data model on a business level. It probably starts with this one service, which is fine.

4) Repeat steps 1 through 3 for other applications. You'll find that some applications deliver similar process data, hopefully from other departments, and you'll need to either adjust your enterprise-wide data model or the app(s) delivering the data, or both, or simply drop one of the apps delivering it and replace it with another. Doing so, you'll establish domains across your enterprise, singling out single sources of truth, and/or accept the (usually business) fact that you need to keep supporting more than one version of that truth. Small steps, remember? And evolution - remember that too.

5) Ignore the techies. Insult them. Scare them off. Kill them. Slaughter them. Keep them out of all this. This is a business exercise, and don't let anybody tell you differently. If anyone does, send them to me and I'll handle them for free, no bills sent. Scout's honour. It's all about the information, and whoever dares to mention XML and XSD should have been decapitated 15 years ago anyway - it's a clear indication that they are clueless about enterprise (machine-to-machine) information exchange.

6) Your first try probably resulted in synchronising information once a day, maybe even once a week. We old folks used to call this 'batch'. Over the past two decades, batch intervals have become ever smaller. Some (of those same old folks) confuse this with 'online' - a term we same old folks used to mean 'pretty fast'. Real-time is the goal of course, and implies that not a second is spilled between letting 'you' know what happened to 'me'. Don't go for that from the start, as it will hamper your progress (the second sketch after this list shows the progression).

7) The ultimate and hardest goal is to get the information across by transforming it from the blue cube (the native app) into the green peg (your enterprise-wide business data model) - where transforming it is merely a minor tech effort, because you simply convert what you functionally require from that same native app into pretty much any syntax you deem fit. Everything else is *really* an afterthought.

8) Take it easy. Take it slow. You will suffer from the benefit of hindsight many a time, but trust me: life's generally on a par with evolution, and this way you're at least not handicapped for the future.
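As promised in step 2, here's the translation ring in code: a minimal Python sketch (every class, function and field name is hypothetical) of per-application translators that turn departmental dialects into the one canonical model the DWH speaks. The point is the decoupling of step 3: the DWH never sees a dialect, only canonical records.

```python
from dataclasses import dataclass

# The canonical, enterprise-wide model - the only thing the DWH ever stores.
@dataclass
class CanonicalProduct:
    product_id: str
    name: str
    source_department: str

# One translator per application: the 'interpreters' department'.
def from_marketing(record: dict) -> CanonicalProduct:
    return CanonicalProduct(
        product_id=record["campaign_sku"],
        name=record["display_name"],
        source_department="marketing",
    )

def from_complaints(record: dict) -> CanonicalProduct:
    return CanonicalProduct(
        product_id="SKU-" + record["product_code"].zfill(4),  # normalise the key
        name=record.get("product_name", "<unknown>"),
        source_department="complaints",
    )

# The ring itself: route every completed transaction through its app's translator.
TRANSLATION_RING = {
    "marketing_app": from_marketing,
    "complaints_app": from_complaints,
}

def absorb(source_app: str, record: dict) -> CanonicalProduct:
    """What the DWH does with each transaction it receives (step 1's send-off)."""
    return TRANSLATION_RING[source_app](record)
```

Note that this is also step 7 in miniature: the conversion itself is a minor tech effort once you know what you functionally require from each app, and adding an application means adding one translator, nothing more.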
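And step 6 in code, equally hypothetical: the same send-off at three shrinking intervals, from nightly batch via micro-batch to per-transaction (near) real-time.

```python
import time

def send_to_dwh(transactions):
    """Hypothetical transport - in reality a queue, a file drop, an API call."""
    print(f"shipping {len(transactions)} transaction(s) to the DWH")

# 'Batch', the old way: hoard a day's (or week's) worth and ship it overnight.
def nightly_batch(days_transactions):
    send_to_dwh(days_transactions)

# Shrinking intervals: ship whatever has accumulated, every few minutes.
def micro_batch(buffer, interval_seconds=300):
    while True:
        time.sleep(interval_seconds)
        if buffer:
            send_to_dwh(list(buffer))
            buffer.clear()

# (Near) real-time, the end goal: ship each transaction the moment it completes.
def on_transaction_completed(transaction):
    send_to_dwh([transaction])
```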

The short version of the long version - which will last you a lifetime

You, dear CxO, need to get rid of the dependency on IT implementation. Whether you have a single platform/vendor or dozens or hundreds, you're still locked in. You need to break free. You want to break free. You must. So you build your enterprise-wide data model that supports all your processes down to the finest detail, let the apps supporting it all spit out what you functionally require, let a team of 2-5 people take care of translating their odd dialects and accents into your business language, and you're done.

Then, and only then, can you replicate your own DWH to the Hadoop cloud (hey, it's what started this post, but it's only an example!) and crunch all you can crunch.
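For that final hop, one last hypothetical sketch: dumping a DWH extract to a local file and pushing it into HDFS with the stock `hadoop fs -put` command - the crudest possible version of the 'very fat pipe'. A real setup would use proper replication tooling; this only shows the shape of the step.

```python
import json
import subprocess

def replicate_to_hadoop(dwh_rows, local_path="/tmp/dwh_extract.jsonl",
                        hdfs_path="/data/dwh/extract.jsonl"):
    # 1) Dump the canonical DWH records to a local file, one JSON object per line...
    with open(local_path, "w") as f:
        for row in dwh_rows:
            f.write(json.dumps(row) + "\n")
    # 2) ...and push it into HDFS, where the cloud cluster can crunch away.
    subprocess.run(["hadoop", "fs", "-put", "-f", local_path, hdfs_path],
                   check=True)
```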

Of course, executing on the insights gained from all that Big Data crunching will take old-fashioned days, weeks and months, but I'm confident you'll find a way to do so - that's not my cup of tea, but I'm sure Vijay has some thoughts on that.
