Top 5 Best Practices for Implementing Big Data Projects
Even as companies embark on big data initiatives, many still have unanswered questions about how to drive the most business value from them.
By Raghu Sowmyanarayanan
Businesses are always looking for ways to improve efficiencies and processes by implementing big data technologies and associated solutions. These projects face a variety of planning challenges, however, and may come at a high price in terms of financial cost, implementation nightmares, and people issues. Often these implementations fail miserably: they run behind schedule, finish over budget, or don't meet business expectations. To ensure a successful rollout, your project team must be able to address its most pressing issues about big data projects by following these five best practices.
1. Start Slow
There are several key success factors for implementing big data initiatives. Among them: start slow. Begin with a proof of concept or pilot project. Choose an important area where you want to improve your decision making but one that won't greatly impact the business if the pilot falls short. Let this initial project answer the business problem you are trying to solve. The project should be operationalized only after the findings have been proven valuable, feasible from the point of view of your business model, able to meet compliance demands, and technologically sound.
Select the initial project wisely. Do not force a big data solution approach if the problem statement does not need it. Be sure you take the time to have the right skills in place; there's room to learn with this project, but your staff needs at least a fundamental knowledge of big data issues. Finally, be sure you have the "right" data available -- you want to leverage the available internal and external data.
2. Collaborate with Your Project Objectives in Mind
Data teams and business units must work together to meet your business goals. Data scientists present their analysis using data and models, and they are expected to understand what the business users are trying to achieve. Conversely, business leaders should have at least a high-level understanding of what the business can achieve (and cannot achieve) with data.
Effective collaboration requires effective communication. For example, consider a business intelligence team that built a model to predict customer churn. They considered it a "fantastic" project based on hypothetical cases. The marketing department thought the model was a disaster because it wasn't 100 percent accurate. If you have a data science team that says it built a great model and a marketing team that says the model doesn't work, you have either serious people gaps or communication gaps. You must close these gaps before you begin your project.
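One way to narrow the gap in the churn example is to agree on how the model will be judged before it is built. The following is a minimal Python sketch, not a prescribed method: the `evaluate_churn_model` helper and all of the labels are invented for illustration. It compares a model against a naive "nobody churns" baseline so both teams can discuss precision and recall instead of expecting 100 percent accuracy.

```python
def evaluate_churn_model(actual, predicted):
    """Return accuracy, precision, and recall for binary churn labels (1 = churned)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # loyal customers flagged by mistake
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # churners the model missed
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # loyal customers correctly ignored
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented example data: 10 customers, 3 of whom actually churned.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]
baseline = [0] * len(actual)  # naive baseline: predict that nobody churns

model_scores = evaluate_churn_model(actual, predicted)
baseline_scores = evaluate_churn_model(actual, baseline)

print("model:   ", model_scores)
print("baseline:", baseline_scores)
```

With this toy data the baseline is 70 percent accurate yet identifies no churners, while the model is 80 percent accurate and catches two of the three. A comparison of this kind gives marketing and data science a shared, realistic definition of "works."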
3. Have All the Right Data
Sometimes it's not possible to answer particular questions because the data is not available. Even when the data is available, enterprises aren't always sure they're asking the right questions. Your project must deliver measurable results that have an impact, and that means having the right data and leveraging that data effectively. You can run very sophisticated regressions and build very complex models -- that can be exciting -- but the bottom line is delivering measurable results to the business.
To be successful, you must decide what questions you need to answer and determine whether any of them cannot be answered by the available data. If that is the case, the missing data must be acquired.
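This check can be made explicit with a simple inventory. The sketch below is illustrative only: the questions, the field names, and the `find_data_gaps` helper are all hypothetical. It maps each business question to the fields it would need and reports which fields are missing from the data you already have.

```python
def find_data_gaps(questions, available_fields):
    """For each question, list the required fields missing from the available data."""
    available = set(available_fields)
    return {
        question: sorted(set(required) - available)
        for question, required in questions.items()
    }


# Hypothetical business questions and the fields each one would require.
questions = {
    "Which customers are likely to churn?": [
        "customer_id", "last_purchase_date", "support_tickets",
    ],
    "Which regions respond to promotions?": [
        "region", "promotion_id", "sales_amount",
    ],
    "What are total sales by region?": ["region", "sales_amount"],
}

# Hypothetical inventory of fields already available internally.
available_fields = ["customer_id", "last_purchase_date", "region", "sales_amount"]

gaps = find_data_gaps(questions, available_fields)
answerable = [q for q, missing in gaps.items() if not missing]
needs_data = {q: missing for q, missing in gaps.items() if missing}

print("Answerable now:", answerable)
print("Data to acquire:", needs_data)
```

In this example only the sales-by-region question can be answered today; the other two each depend on one field that would have to be acquired first.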
Sometimes it may not be obvious that you are missing important data. For example, when you try to create an agricultural analytics model, your prevailing belief may be that weather has the best predictive impact on future farming conditions. However, a local data set may reveal that the factor affecting farming in some regions is a peculiar type of pest that attacks one specific type of plant and does not affect any other plants under the same weather conditions. You never would have made that observation starting from your hypothesis. Be careful about what you think the data can tell you: test it and review the results. You might be surprised.
4. Don't Skew the Results
Human tendencies and non-representative data sets tend to skew results and lead to incorrect conclusions. It's important to make sure your data is not skewed towards a subset because even if you have a lot of data, it may not represent the entire population. If your data doesn't represent the entire population, your conclusions will not be accurate.
Confirmation bias -- the tendency to search for, interpret, favor, and recall information in a way that confirms your beliefs or hypotheses while giving disproportionately less attention to information that contradicts them -- influences the approach to problem solving as well as the way individuals view data and results. When the purpose of your analysis is to prove a hypothesis, bias influences the data sets, tests, and outcomes. To take bias out of data, use statistical expertise: if you're looking at a population of consumers, for example, make sure the sample is statistically representative of the population your questions are about.
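A basic representativeness check compares the makeup of your sample with the known makeup of the population. The sketch below is one possible approach, assuming you know the population shares for some attribute (age bands here); the numbers and the `chi_square_statistic` helper are invented for illustration. It computes a Pearson chi-square goodness-of-fit statistic and compares it with the standard 5 percent critical value for two degrees of freedom. The test assumes independent observations and reasonably large expected counts, so treat it as a first screen rather than a verdict.

```python
def chi_square_statistic(sample_counts, population_shares):
    """Pearson chi-square statistic: observed sample counts vs. counts expected
    if the sample mirrored the population shares."""
    total = sum(sample_counts.values())
    statistic = 0.0
    for group, share in population_shares.items():
        expected = total * share
        observed = sample_counts.get(group, 0)
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Invented figures: share of each age band in the full customer population.
population_shares = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Invented figures: customers per age band in the data set being analyzed.
sample_counts = {"18-34": 60, "35-54": 30, "55+": 10}

# Chi-square critical value for 2 degrees of freedom (3 groups - 1) at the 5% level.
CRITICAL_VALUE_DF2_ALPHA_05 = 5.991

statistic = chi_square_statistic(sample_counts, population_shares)
is_skewed = statistic > CRITICAL_VALUE_DF2_ALPHA_05

print(f"chi-square = {statistic:.2f}, skewed = {is_skewed}")
```

Here the sample over-represents younger customers so heavily that the statistic far exceeds the critical value, a signal that conclusions drawn from this data set may not hold for the whole customer base.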
5. Understand Big Data's Impact on Your Information Architecture
Big data implementations can impact an organization's enterprise architecture in multiple ways.
For example, many organizations have standardized hardware, DBMSes, and analytics platforms, which may not be sufficient to handle the volume, velocity, or variety of information or the information processing demanded by big data. CIOs and CTOs need to be open to innovative forms of processing and hybrid approaches to accommodate the variety of data, structured and unstructured, internal and external.
Furthermore, the size, speed, and range of data sources you may need to manage will likely mean the data need not be physically co-located. In these scenarios, a logical data warehouse approach may be more appropriate because it can provide analytics on near-real-time data without limiting data to the pre-built structures of the data warehouse's persistent data store.
As many big data initiatives are at least initially experimental in nature, their architecture must be able to scale to support an unpredictable workload. The dynamic storage and processing capacity offered by the cloud can be one way to deal with this.
For big data to provide a competitive advantage, your enterprise needs to make analytics the way you do business. Analytics needs to be a part of the corporate culture. Nowadays, the competitive advantage of data-driven organizations is no longer achieved through siloed teams but through collaboration between IT and business users. The most vital best practice is to keep business leaders and users engaged in all important decisions (such as identifying the proof-of-concept scope) and involved in validating the data used at each stage.
Raghuveeran Sowmyanarayanan is a vice president at Accenture and is responsible for designing solution architecture for RFPs and opportunities. You can contact him at firstname.lastname@example.org.