Data Hunting: Mastering the Big Data Challenge
The data is out there. It’s time to use it effectively to bring your big data strategy to life.
- By Troy Hiltbrand
- February 9, 2016
The term big data can be daunting, mainly because the word big suggests something insurmountable. As organizations begin to implement solutions to harness big data, they face the challenge of obtaining the data that will power their strategy. Once that strategy is in place, the work becomes a hunt to locate the needed data and rein it in so that it does what the organization needs.
To effectively consume the flood of available data, the first step is to identify the end target. Organizations that start with the data and try to figure out what to do with it will waste valuable time and resources and end up drowning in the data’s sheer volume. Those that identify a target question or decision and work backwards to the data will still find it difficult but will have a fighting chance.
Once a business target is identified and the problem statement is well defined and broken down into manageable chunks, the next step is to go data hunting. Many organizations start with the most attainable and best understood sources of data: their transactional systems. This data is often structured and at least partially within their control.
This does not mean that this data is always ready to use. Political and technological barriers can prevent teams from accessing and aggregating the data they need to solve the business challenge. Even once it has been accessed and aggregated, the data might require cleansing and munging before it can support the decision-making process.
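As a rough illustration of what cleansing and munging can involve, the sketch below normalizes a few rows and aggregates them by customer. The rows, the field names (`customer`, `amount`), and the helper functions are all hypothetical and invented for this example; real transactional exports will need their own rules.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might arrive from a transactional export.
raw_rows = [
    {"customer": " Acme Corp ", "amount": "1,200.50"},
    {"customer": "acme corp", "amount": "300"},
    {"customer": "Globex", "amount": ""},  # missing amount, will be skipped
    {"customer": "Globex", "amount": "75.25"},
]


def clean_row(row):
    """Return a normalized (customer, amount) pair, or None if the row is unusable."""
    customer = row["customer"].strip().lower()
    amount_text = row["amount"].replace(",", "").strip()
    if not customer or not amount_text:
        return None
    try:
        return customer, float(amount_text)
    except ValueError:
        return None


def total_by_customer(rows):
    """Sum the cleaned amounts per normalized customer name."""
    totals = defaultdict(float)
    for row in rows:
        cleaned = clean_row(row)
        if cleaned is not None:
            customer, amount = cleaned
            totals[customer] += amount
    return dict(totals)


totals = total_by_customer(raw_rows)
print(totals)
```

Even in this toy case, two spellings of the same customer have to be reconciled and an incomplete row dropped before the totals mean anything.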
With today’s complex business challenges, the data inside an organization’s transactional systems paints only a portion of the picture. This is where the team has to figure out where else to source data that will bring perspective and life to its big data challenges.
Here are some other sources of data your organization can explore.
Purchased data: A growing number of companies merge data from across the Web and sell segments of it to paying partners.
Open data: Federal governments and other entities have seen the value to society in publishing open data sets that the public can use to enhance its own data offerings. These data sets are often available in standard text-based file formats, but some require special programs to read and consume.
Log data and machine data: The Internet of Things is becoming more “real” as devices are connected to share data. The output logs of these connected devices and machines will become some of the largest data stores available, but they will require extensive processing before value can be extracted from them.
Social data: With the advent of social networking, people are revealing a lot about themselves and their lives on the Web. Combining data from multiple social media sources has the potential to generate extensive and detailed profiles about customers and partners; such data can significantly augment internal data on these individuals.
Dark data: This is data that the organization keeps in backup files or archives rather than in its transactional systems. Dark data tells stories of the organization’s past that can be extremely valuable when making decisions. Examples include email and employees’ document stores. These are often highly unstructured and full of information that has little relevance to the task at hand, but the nuggets hidden within that information can be critically important.
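To give a sense of the processing that log and machine data can require, here is a minimal sketch that parses device log lines and counts errors per device. The log layout (timestamp, device, level, message), the sample lines, and the function name are assumptions made up for this example, not a standard format.

```python
import re
from collections import Counter

# Assumed line layout: "<timestamp> <device> <LEVEL> <message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<device>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

# Hypothetical sample output from two connected devices.
sample_log = """\
2016-02-09T10:00:00Z sensor-1 INFO temperature=21.5
2016-02-09T10:00:05Z sensor-2 ERROR read timeout
garbled line without the expected fields
2016-02-09T10:00:10Z sensor-2 ERROR read timeout
2016-02-09T10:00:15Z sensor-1 WARN battery low
"""


def count_errors_by_device(log_text):
    """Count ERROR entries per device, skipping lines that do not parse."""
    errors = Counter()
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # unparseable line: ignore it rather than guess
        if match.group("level") == "ERROR":
            errors[match.group("device")] += 1
    return dict(errors)


errors = count_errors_by_device(sample_log)
print(errors)
```

Most of the raw lines contribute nothing to the answer, which is typical: the value lies in the small fraction of entries that bear on the question being asked.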
With a well-formed analytics strategy and defined problem set, your organization doesn’t have to be overwhelmed. The data is out there. Now it is time to find it and use it effectively to bring your strategy to life.