Open Source Intelligence: The Information Is Out There
Chances are, someone, somewhere has already experienced most of the same problems you're about to tackle. You'd be surprised how willing they are to talk about it.
- By Steve Swoyer
- April 4, 2016
When you want to assess the maturity of your business intelligence (BI) or analytics initiatives, you start by developing a model as a basis or reference for comparison.
More than likely, you'll decide to bring in outside expertise to help you with this, usually that of a consultancy or advisory service that has developed a model and a methodology of its own. At some point, either on your own or with help from an outside counsel, you're going to contrast your own maturity or sophistication with that of other companies. In the vast majority of cases, you're going to narrow your perspective to companies that are in the same industry or vertical as you.
That would be a mistake. No, not the decision to bring in outside help. In most cases, that's a good call. Implementing analytics -- or, more ambitious still, retrofitting an organization for data-driven decision-making -- is an incredibly hard problem. The mistake lies in going narrow, i.e., in taking an old-school approach to sizing up the competition, the challenge, and the opportunity.
In the teeming world of open source intelligence, the information is out there.
You just have to look for it.
Even if you aren't interested in BI or analytics maturity assessment, benchmarking, or gap analysis vis-à-vis your company and others, you should be interested in open source intelligence.
There are two big issues at stake:
- You risk setting the bar too low. Perhaps you're in a comparatively backward industry -- one that hasn't been a site of much BI or analytic transformation. This isn't an excuse for you to sustain a status quo that could be destabilized at any moment. All it takes is for one of your competitors to get analytics religion. Unlike you, your competitor could decide to pursue a much more aggressive analytic strategy. More to the point, BI and analytics readiness isn't, or shouldn't be, about maintaining parity -- about doing what everybody else is doing. It's both a strategy for competitive advantage and a strategy for managing (or, ideally, co-opting) change. BI and, especially, advanced analytics technologies give you a means to monitor, measure, and anticipate change.
- You risk missing out. There's a ton of great information out there. Seriously. It might not make sense to compare yourself against -- much less to emulate -- companies such as Google, eBay, or Netflix, but all three are doing new, imaginative, and (above all) ambitious things with data and analytics. Many of these same companies are talking about what they're doing, mostly about their successes, but sometimes (as with companies such as Netflix) about their many failures. Web hotshots aren't the only ones doing interesting stuff, either. Traditional organizations are also warming up to the practice (if not the theory) of open source intelligence.
At TDWI's quarterly Executive Summits and Teradata's annual Partners Conference, companies of all sizes are keen to talk about the things they're doing with data and analytics. TDWI, Teradata, and others now post links to these presentations. If they don't, Google, SlideShare, YouTube, and other resources -- e.g., a speaker's own website -- can often be of help. Even if companies themselves aren't eager to trumpet their successes, the human actors who enabled these successes are. Learn from them.
Why Does This Matter?
Why should we care about open source intelligence? For a whole host of reasons, starting with the fact that some kinds of analytic innovations are or can be generalized. They aren't specific to any one industry, even though their use might have been popularized in a certain vertical. They have general-purpose salience. A great example is Netflix's streaming data pipeline, which has undergone a slew of changes in the last few years. We know about these changes because Netflix is happy to talk about them on its "Tech Blog." The problems Netflix is trying to address can be generalized as follows:
- It's capturing real-time event data from several hundred different sources, ranging from video viewing activities to UI activities, error logs, and performance data.
- It's processing, preparing, and persisting that data. In its first iteration, Netflix was capturing data and persisting it to Amazon's S3 cloud storage service, there to be processed, in batch, via Amazon's Elastic MapReduce (EMR) service. In later revisions, it's still doing that.
- It's also performing real-time analytics on streaming data. In the first iteration of Netflix's data pipeline, this wasn't possible. In its second (which Netflix calls version 1.5), it used a combination of Kafka (a publish-subscribe event messaging system used for streaming ingest) and Elasticsearch (a full-text search service based on the Apache Lucene library) to support this. In its current iteration (version 2.0), it uses the same technologies while feeding substreams to Spark (via its Spark Streaming library, which supports microbatch ingestion) and other consumers.
In other words, the folks at Netflix have documented the development of an in-house data pipeline designed to support streaming data ingest, processing, preparation, persistence, and -- most compelling of all -- in-flight analytics. Is Netflix marketing a blueprint you can use to develop your own event-streaming pipeline? No. It isn't intended to be. It's a resource, nothing more.
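To make the general shape of such a pipeline concrete, here is a minimal sketch in Python. It is not Netflix's code and it uses no Kafka, S3, or Spark APIs. Every name in it (Topic, BatchSink, MicroBatchCounter, run_pipeline) is a hypothetical in-memory stand-in, and the microbatches are cut by event count for simplicity, where a real streaming engine would typically cut them by time interval. It shows only the pattern described above: one published event stream, fanned out to a sink that persists everything for later batch processing and to a consumer that analyzes events in small batches as they arrive.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Event = Dict[str, str]


class Topic:
    """Hypothetical in-memory stand-in for a publish-subscribe topic."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        # Every subscriber sees every event (fan-out).
        for handler in self._subscribers:
            handler(event)


class BatchSink:
    """Persists every event for later batch processing (stand-in for durable storage)."""

    def __init__(self) -> None:
        self.stored: List[Event] = []

    def __call__(self, event: Event) -> None:
        self.stored.append(event)


class MicroBatchCounter:
    """Counts events per type in fixed-size microbatches (assumption: count-based batches)."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[Event] = []
        self.results: List[Counter] = []

    def __call__(self, event: Event) -> None:
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Emit one summary per microbatch, then start a new batch.
        if self._buffer:
            self.results.append(Counter(e["type"] for e in self._buffer))
            self._buffer = []


def run_pipeline(events: Iterable[Event], batch_size: int = 3) -> Tuple[BatchSink, MicroBatchCounter]:
    topic = Topic()
    sink = BatchSink()
    counter = MicroBatchCounter(batch_size)
    topic.subscribe(sink)      # batch path: persist everything
    topic.subscribe(counter)   # streaming path: analyze in flight
    for event in events:
        topic.publish(event)
    counter.flush()            # summarize any trailing partial batch
    return sink, counter


if __name__ == "__main__":
    sample = [{"type": "play"}, {"type": "error"}, {"type": "play"}, {"type": "ui"}]
    sink, counter = run_pipeline(sample, batch_size=3)
    print(len(sink.stored), "events persisted")
    print(counter.results)
```

The point of the sketch is the design choice, not the plumbing: the batch path and the in-flight path subscribe to the same stream independently, so either can be changed or replaced without touching the other.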
What's more, in the age of Twitter, LinkedIn, and other democratizing services, resources such as Netflix's Steven Wu, Allen Wang, and others are only a tweet, PM, InMail, or comment away. (That's right: you can comment on the Netflix Tech Blog. If you're lucky, a real, live Netflix engineer will respond.) In future postings, Netflix has promised to tackle other subjects, including:
- How it runs Kafka in the cloud at scale
- How it uses Samza, an Apache Software Foundation project designed to support near-real-time, asynchronous stream processing, to power its event-routing service
- How it manages and deploys Docker containers to support that same event-routing service
Marc Demarest, a principal in Noumenal, Inc., an international management consulting firm, cites Netflix's Tech Blog as one high-value example of the overlooked -- and often rich -- world of open source intelligence.
"Netflix does a stellar job of documenting their process of ceaseless technological transformation. In a recent blog post, one of their developers talked about the various incarnations of their high-speed streaming data processing infrastructure. This kind of transparency makes it possible for later adopters -- the folks who will travel in the paths blazed by Netflix and others -- to make new mistakes: to know, in advance, where the pitfalls, tar pits, and paths-not-taken are to be found. [It's] basic technological navigation. Why anyone would go through the process of remapping territory Netflix has already mapped is beyond me, particularly given the fact that we're all going to be deploying essentially the same real-time streaming data infrastructure, as the world migrates to real-time, event-based models for everything data-ish."
Too few companies are hip to the existence -- let alone the potential value -- of open source intelligence, Demarest argues. "I'm not one of those people who believes we should all be chasing Netflix across the parking lot. They stream videos, we make, say, washing machines. I get that. Strategy, business model, competitive dynamics -- all these things influence technology architecture and implementation decisions, but again, why re-make someone else's mistakes? Why rediscover someone else's approximations? Why remap territory?" he asks.
"We live in a world in which open source intelligence is plentiful, if distorted. I realize that a lot of people working today don't remember what it was like before the open source intelligence transformation -- when, for example, we had to pay tens of thousands of dollars to market analysts to obtain less actionable information about a competitor than we can gather from that competitor's website, today," concludes Demarest, who acknowledges that open source intelligence does have its drawbacks.
"Because it's tinctured with a strong dose of marketing, open source intelligence requires more critical analysis skills than we needed in the old, proprietary intelligence world. In some cases, too, it's still worth paying for certain kinds of proprietary intelligence. For most commercial purposes, open source intelligence should allow companies to frame up strategies and tactics, make informed choices, and generally avoid recapitulating the mistakes of competitors and fellow travelers."
About the Author
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].