An Alternative to Centralizing Big Data
This alternative to the "centralize everything" strategy recognizes the sheer physics problem of moving, storing, and processing huge volumes of data, as well as the time, cost, and risk of attempting to do so.
By Simon Moss, CEO, Pneuron Corporation
Let's face it: we are drowning in data. As if our own transactional systems weren't enough, we now add niche applications, supplier and customer data, external data services, and even sensor and machine data. Increasingly, enterprises are realizing that the projected cost of managing the full life cycle of information across this incredible breadth of sources is prohibitive, yet they remain paralyzed by the fear of missing that last nugget of data that will magically turn a huge investment into a profitable return.
Most businesses have not yet moved beyond this first generation of big data, the "get everything we can because we aren't sure what will be valuable" phase. Unfortunately, today's data streams are simply outpacing the capacity of nearly any firm to capture, store, integrate, manage, and extract value from the data. This leaves businesses in an uncomfortable quandary: what should we take, and what should we leave behind?
There is an alternative to centralizing everything. It recognizes not only the sheer physics problem of moving, storing, and processing huge volumes of data but also the time, cost, and risk of attempting to do so. A purpose-driven, distributed approach to analytics is a robust complement to centralization, which can then be reserved for the circumstances where it is truly required. This requires firms to accept that targeting and realizing specific instances of business value is not only acceptable but preferable to "blindly" (with all due respect) vacuuming up every bit in their path.
This distributed paradigm rests on the premise that, for many business problems, the elements of value are already available, albeit spread across a broad set of distributed and diverse sources. The integration and processing involved are relatively light compared with the heavy-duty advanced analytics performed on big data, but the result is no less valuable, because it delivers the value-added insight the business needs. Furthermore, you should target and extract only what each source must contribute to meet the given need. This sidesteps the costly prerequisite of aggregating and processing everything, dramatically accelerates time to value, and, critically, restores a high degree of solution agility.
Those dynamics make it feasible to rapidly prototype and iterate on solution variants until the optimal mix of inputs produces the most valuable result. This approach is ideal for frequently changing scenarios in which perfect knowledge is not available up front, so the business must adapt quickly as conditions and requirements change.
Let's examine a step-by-step approach for determining which strategy best suits the challenge at hand.
Step 1: Segment opportunities
In this step, it is crucial that you properly characterize your business opportunities and understand, with specificity, the components of value and the inputs needed to realize that value. Although the list is long, each answer provides critical insight into the solution strategy. Key diagnostic questions include:
- What is an improvement in the targeted business metric worth?
- What must I know, what must I decide and what must I act on to realize that value?
- What are the sources of data and information that I must acquire to properly inform myself of the opportunities and decisions to be made?
- Are those sources internal or external, how difficult are they to capture, and by what means?
- What processing must be performed against those captured elements to achieve the necessary insights?
- How quickly must actionable insight be delivered into my business operations?
- What rate of change should I expect in each relevant part of this "value chain"?
After you have characterized the sources, identified the potential value, and documented requirements for processing and timeliness, your solution architect will be well positioned to select the most appropriate solution strategy.
Step 2: Architect the solution
Armed with the answers from the previous step, the solution architect can work in a highly purpose-driven way. For problems where fast time to value, lower total cost of ownership (TCO), and agility are critical to business value, and where the necessary information and targeted data are readily available, architects should examine the distributed strategy described above and detailed below. For problems where fundamental insight must be generated across large data sets, a centralized approach may well be necessary. In practice, many problems will require a combination of the two approaches.
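To make the Step 2 decision explicit, the Step 1 answers can be recorded and run through a rule that mirrors the criteria above. The following Python sketch is a hypothetical illustration only. The `Opportunity` fields, the `recommend_strategy` function, and the `UNDETERMINED` outcome are assumptions and do not come from any published method. A real assessment would also weigh value, cost, and risk rather than reducing everything to three yes/no flags.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    DISTRIBUTED = "distributed"
    CENTRALIZED = "centralized"
    HYBRID = "hybrid"
    UNDETERMINED = "undetermined"  # hypothetical outcome: revisit the Step 1 questions


@dataclass
class Opportunity:
    """Hypothetical summary of the Step 1 answers for one business opportunity."""
    name: str
    # Fast time to value, lower TCO, and agility are critical to the business value
    needs_speed_and_agility: bool
    # The needed information and targeted data are readily available in existing sources
    targeted_data_available: bool
    # Fundamental insight must be generated across large data sets
    needs_large_scale_insight: bool


def recommend_strategy(opp: Opportunity) -> Strategy:
    """Map the Step 2 criteria onto a strategy (an illustrative rule, not a formula)."""
    distributed_fit = opp.needs_speed_and_agility and opp.targeted_data_available
    centralized_fit = opp.needs_large_scale_insight
    if distributed_fit and centralized_fit:
        return Strategy.HYBRID  # many problems need both approaches
    if distributed_fit:
        return Strategy.DISTRIBUTED
    if centralized_fit:
        return Strategy.CENTRALIZED
    return Strategy.UNDETERMINED
```

For example, `recommend_strategy(Opportunity("supplier risk alerts", True, True, False))` returns `Strategy.DISTRIBUTED`. The opportunity name in that call is made up for illustration.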
Step 3: Connect to data and information non-invasively
To support fast time to value, connecting to and extracting from diverse sources becomes critical. A non-invasive approach that leverages available standards (JDBC, HTTP, FTP, etc.) without requiring installed agents is an effective way to gain access to these diverse systems. These connections can be created or discarded with very little effort, preserving the desired agility as solution development proceeds. Through these connections, developers can discover and extract specific value elements as part of the development process.
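Python has no JDBC, but its DB-API 2.0 interface (PEP 249) fills the same role, and the standard library already supports HTTP and delimited text. The sketch below illustrates lightweight, agent-free connectors under those assumptions. The function names are hypothetical, and an in-memory SQLite database stands in for an operational system. Because each connector is a plain function, adding or discarding one costs only a few lines.

```python
import csv
import io
import json
import sqlite3
import urllib.request


def sql_source(conn, query, params=()):
    """Run a read-only query over any DB-API 2.0 connection; return rows as dicts.

    DB-API (PEP 249) is Python's counterpart to JDBC. The placeholder style
    used in `query` ("?", "%s", ...) depends on the driver.
    """
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


def http_json_source(url, timeout=10.0):
    """Fetch a JSON document over HTTP(S); nothing is installed on the remote side."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def delimited_text_source(text):
    """Parse CSV text (for example, a file retrieved via FTP) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # Usage: an in-memory SQLite database stands in for an operational system
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 'open'), (2, 'closed')")
    print(sql_source(conn, "SELECT id FROM orders WHERE status = ?", ("open",)))
    print(delimited_text_source("sku,qty\nA-1,5\n"))
```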
Step 4: Extract only what's needed
Another key tenet of the distributed approach is to extract only the element(s) that directly contribute to creating value. This keeps the solution lightweight in terms of the data it must move, process, and possibly store. With the non-invasive strategy, new data elements can readily be added to the extraction, and unneeded fields left behind, without the overhead of complex extraction logic. These extractions take place only at run time and are guided by the specific business problem.
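One way to extract only what is needed is to push the field list down to the source. The hypothetical SQLite-based sketch below builds its query from an explicit list of required columns, so adding a data element means appending one name to that list. The `extract_fields` helper is an illustration and is not part of any product. Its identifier check assumes plain, unquoted table and column names.

```python
import re
import sqlite3

# Column and table names cannot be bound as parameters, so only plain
# SQL identifiers are accepted when they are formatted into the query.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(name):
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def extract_fields(conn, table, fields, where=None, params=()):
    """Pull only the listed columns from `table` at run time.

    `where` is trusted SQL written by the developer; values belong in
    `params` placeholders, never formatted into the string.
    """
    if not fields:
        raise ValueError("at least one field is required")
    columns = ", ".join(_checked(f) for f in fields)
    sql = f"SELECT {columns} FROM {_checked(table)}"
    if where:
        sql += f" WHERE {where}"
    return [dict(zip(fields, row)) for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER, name TEXT, email TEXT, balance REAL)"
    )
    conn.execute(
        "INSERT INTO customers VALUES "
        "(1, 'Ann', 'ann@example.com', 50.0), (2, 'Bo', 'bo@example.com', 250.0)"
    )
    needed = ["id", "balance"]  # the only fields this business problem uses
    print(extract_fields(conn, "customers", needed, "balance > ?", (100,)))
    needed.append("email")  # adding a new element is a one-line change
    print(extract_fields(conn, "customers", needed, "balance > ?", (100,)))
```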
Step 5: Create the prototype
Because modern solutions typically require a diverse mix of data sources, applications, and services, a high degree of interoperability is critical to accelerating initial solution design. The ability to quickly and easily mix and match different data sources, analytics, models, technologies, and any number of other points of value, without having to forcibly integrate them, makes developers far more productive when building initial prototypes. These prototypes are especially valuable because they validate the initial assessment not only of realizable value but also of the effort and cost of achieving it.
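A prototype can often combine sources without integrating them first. In this minimal Python sketch, two hypothetical extracts are joined in memory on a shared key and fed to a simple rule. One extract holds ledger balances and the other holds externally supplied risk scores. The functions `join_on` and `flag_accounts`, the field names, and the thresholds are all illustrative assumptions. The rule could be swapped for a model, or a third extract added, without disturbing the other parts.

```python
def join_on(key, left, right):
    """Inner-join two extracts (lists of dicts) on a shared key, in memory.

    Nothing is loaded into a central store. If a key repeats in `right`, the
    last row wins, and right-hand fields override left-hand ones on a name clash.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def flag_accounts(rows, min_balance, min_risk):
    """A stand-in analytic: flag accounts that are both large and high-risk."""
    return [
        r["account"]
        for r in rows
        if r["balance"] >= min_balance and r["risk"] >= min_risk
    ]


if __name__ == "__main__":
    # Hypothetical extracts: balances from an internal ledger,
    # risk scores from an external service
    ledger = [{"account": "A1", "balance": 500.0}, {"account": "A2", "balance": 20.0}]
    scores = [{"account": "A1", "risk": 0.9}, {"account": "A2", "risk": 0.95}]
    print(flag_accounts(join_on("account", ledger, scores), 100.0, 0.8))
```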
Step 6: Iterate to optimize value
A final component of this distributed approach is the ability to adapt rapidly to changing requirements and constraints. By composing solutions in a non-invasive, highly targeted, and interoperable way, designers gain the flexibility and agility to respond to material changes in their environment. This distributed strategy makes it easy to add a new data source, change the algorithms applied, or divert results to new destinations. All of this can be done without re-architecting and rebuilding the entire solution, saving tremendous time and cost while sustaining the core value proposition.
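One way to keep such changes cheap is to treat the solution as a composition of interchangeable parts. In the hypothetical Python sketch below, the `Pipeline` class and its field names are assumptions, not a product API. It is an immutable record of sources, one algorithm, and destinations. Calling `dataclasses.replace` on it yields a variant with a new source, a different algorithm, or a new destination, while the original pipeline keeps working unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Pipeline:
    """A hypothetical composition of pluggable parts: sources -> algorithm -> sinks."""
    sources: tuple[Callable[[], list], ...]
    algorithm: Callable[[list], list]
    sinks: tuple[Callable[[list], None], ...] = ()

    def run(self) -> list:
        # Pull from every source at run time; nothing is staged centrally
        rows = [row for source in self.sources for row in source()]
        results = self.algorithm(rows)
        # Deliver the same results to each destination
        for sink in self.sinks:
            sink(results)
        return results


if __name__ == "__main__":
    base = Pipeline(
        sources=(lambda: [{"v": 1}, {"v": 2}],),
        algorithm=lambda rows: [{"total": sum(r["v"] for r in rows)}],
        sinks=(print,),
    )
    base.run()
    # Add a source, then change the algorithm, without rebuilding the pipeline
    more_data = replace(base, sources=base.sources + (lambda: [{"v": 10}],))
    new_algo = replace(
        more_data, algorithm=lambda rows: [{"max": max(r["v"] for r in rows)}]
    )
    new_algo.run()
```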
A Final Word
Architecting solutions in today's complex environment requires the ability to choose the solution strategy that best fits the diverse needs and constraints of that environment. A distributed, targeted, and highly agile approach is a strong complement to the centralized, processing-heavy, insight-generation approach so commonly used today. By blending these approaches, your business will be best positioned to optimize its overall return on investment while retaining the ability to adapt rapidly to the inevitable high rate of change.
Simon Moss is the chief executive officer and board member of Pneuron Corporation. Simon brings over 20 years of successful strategic leadership at CEO, partner, and board-of-director executive levels in the financial services industry.  You can contact the author at firstname.lastname@example.org.