Best Practices for Big Data Management

A new TDWI report outlines a best practices approach to big data management.

Sooner or later, it had to happen -- an update of the time-tested term "data management" to reflect a changing technological and cultural landscape.

A new report from TDWI Research outlines a best practices approach to what its author -- Philip Russom, research director for data management with TDWI -- describes as "big data management," or BDM. Russom explains that BDM is "an amalgam of old and new best practices, skills, teams, data types, and home-grown or vendor-built functionality. All of these are expanding and realigning so that businesses can fully leverage big data, not merely manage it."

Russom's report, aptly titled Managing Big Data, approaches BDM from a holistic perspective. "Big data," he writes, "must eventually find a permanent place in enterprise data management."

It's well on its way. According to TDWI survey data, more than half of all organizations are managing big data today. In TDWI's sample of 461 completed responses, two-thirds were from IT professionals. Because of "branching" in the survey, however, "some questions were answered by only 189 respondents who have experience managing big data." Russom concedes that two big data-heavy segments (mid-to-large-sized Internet firms and corporations with $10 billion or more in annual revenues) accounted for a large enough share of responses that they likely skewed the results in favor of big data.

The big data these organizations manage takes more than one form. It could be "mostly structured" data, e.g., in very large database (VLDB) systems of one terabyte or more, a use case cited by more than a quarter (26 percent) of respondents. It could also be "multi-structured" data (e.g., human-readable, semi-structured, "unstructured," and other types), which was cited by almost one-third (31 percent) of respondents.

In other words, Russom writes, "big data and its management have crossed into mainstream usage."

Big data is often viewed through the lens of Web 2.0: the decade-long shift to Web application development, the emergence of representational state transfer (REST) as a primary app-dev paradigm, and the unabated explosion in RESTful applications and services, particularly in the social media space. Russom, a data management veteran, casts back even further, to the "eBusiness" boom of the 1990s.

The upshot is that organizations have been grappling with key aspects of big data (such as exploding volumes and radically compressed refresh windows) for some time. "A consequence of the post-eBusiness era is that many organizations now have massive volumes of application data to manage and to leverage for business value," he writes.

"[O]rganizations have the skills for structured data -- which is what comes out of most operational applications -- [but] today's unprecedented data volume and speed of generation make big data management a challenge."

We tend to think of multi-structured information as ancillary to -- that is, as supplementing or enriching -- the structured data produced by operational applications. The reality is that multi-structured data is actually integral to (and in many cases produced by) many business processes. Far from being supplemental, it's essential.

"Some industries have large, valuable stores of unstructured data, typically in the form of human language text. For example, the claims process in insurance generates many textual descriptions of accidents and other losses, plus the related people, locations, and events," he writes.

"Most insurance companies process this unstructured big data using technologies for natural language processing (NLP), often in the form of text analytics. The output from NLP may feed into older applications for risk and fraud analytics or actuarial calculations, which benefit from the larger data sample provided via NLP."

Other examples include sensor and machine data, the pervasiveness (and value) of which is only expected to increase over time. In addition to providing grist for analytics, this information must be stored and managed to address legal, regulatory, and other requirements.

Russom's report makes a compelling case for cultivating a strategic (as distinct from a tactical or fragmented) BDM practice. Joining (or even enriching) traditional data sources with information from big data sources, and vice versa, is a key "path to value," he argues.

He also assesses big data in terms of what he calls its "primary path to value," i.e., big data analytics. "It's important to beef up data management infrastructure and skills as early as possible. Otherwise, an organization can get so far behind from a technology viewpoint that it's difficult to catch up. From a business viewpoint, delaying the leverage of big data delays the business value. Similarly, capacity planning is more important than ever, and should be adjusted to accommodate the logarithmic increases typical of big data," he writes.

The report is available as a free download from TDWI. (Brief registration is required for users accessing TDWI reports for the first time.)
