Case Study: NationBuilder Goes Data Wrangling
Integrating voter data from dozens of sources in dozens of formats presented a data quality problem for NationBuilder.
- By Brian J. Dooley
- April 10, 2017
NationBuilder, a community software platform for organizing political campaigns, had a big data problem. Its mission depends on gathering voter data from across the country, normalizing it, and regularly updating it to include current voter registration and recent election information. National elections loomed, and the company needed to draw together a huge collection of voter records in a hurry. Data access was a major issue.
The data existed in a wide variety of formats, with different standards for data entry, across 50 states and more than 3,000 counties, which only added to the challenge. "Voter records are not all in one format or even in one place," says VP of professional services Gina Davis.
"It requires a fair amount of legwork to gather data from Secretaries of State and other agencies and officials. For different campaigns, you may need to get registered voter data at the state level, by individual county, or by subdivisions of a county. There's a lot of calling and emailing.
"Once you have the data, it's in different formats. Sometimes it's mailed, sometimes it's on CD, and sometimes it's by FTP. It's absolutely different for every touchpoint. Then, when all of this data is gathered, you need to normalize it across the country -- checking for differences in names and addresses, as well as different references to political boundaries, and making sure it's all consistent. Our customers may run campaigns across districts; campaigns may be local or national, and being able to interact with details in the same way is essential."
Voter data is critical to the success of any campaign, and NationBuilder's strength lies in focusing on candidates and people who want to make a change -- whether it be Britain's Brexit campaign, Trump's U.S. election campaign, or any of the thousands of political and NGO campaigns, small and large, across the U.S. and around the world. Supporters need to be organized through email, social media, and door-to-door canvassing. Everything starts with having the right data.
The Wrangling Requirement
NationBuilder is a cross between a content management system and a customer relationship management (CRM) package, geared towards attracting people to engage with a cause. It provides supporter management through a workflow that tracks how individual supporters are contributing to the campaign result. Sharing on Facebook, Twitter, LinkedIn, and other sites is interwoven into the platform, as is access to social media data. This makes it possible to tie campaign goals to individual social media actions. All of this demands accurate data.
"The most difficult thing was data availability," says Davis. "It's not just 50 format variations; there are differences between as many as 50 different areas in a single state. Then there are constant changes in the voter files. People are cycling in and out of voter rolls. Everything we thought we had figured out would suddenly need to change. That became the biggest challenge; we were using a lot of hard coding.
"Another issue is holes in the data -- items that are wrong or missing. You may need to get information out in a hurry, but if there are holes, you don't have a comprehensive view."
Having begun with a manual solution based on Ruby scripts, NationBuilder staff soon realized that this was unworkable across the whole country on a routine basis. In creating the giant voter database, they were only able to get through five or six states before recognizing that they had a problem. The company was growing, and the need for immediate access to accurate records was becoming more acute. To solve the problem, Trifacta was called in to create a semiautomated solution that would handle the numerous conversion challenges but still provide plenty of oversight and control.
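To make the conversion problem concrete, here is a minimal Python sketch of normalizing records from two differently formatted sources into one schema. It is an illustration only, not NationBuilder's Ruby scripts or Trifacta's method. The source names, column names, and abbreviation table are all hypothetical; real voter files vary by jurisdiction.

```python
import re

# Hypothetical per-source column mappings (assumed for illustration only).
FIELD_MAPS = {
    "state_a": {"LAST_NAME": "last_name", "FIRST_NAME": "first_name",
                "RES_ADDR": "address", "CNTY": "county"},
    "county_b": {"Surname": "last_name", "Given": "first_name",
                 "Street Address": "address", "County Name": "county"},
}

# A tiny, illustrative table of street-suffix spellings.
SUFFIXES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave",
            "road": "Rd", "rd": "Rd"}


def clean_text(value):
    """Collapse runs of whitespace and trim; treat missing values as empty."""
    return re.sub(r"\s+", " ", value or "").strip()


def normalize_address(address):
    """Drop periods, standardize known suffixes, and fix capitalization."""
    words = clean_text(address).replace(".", "").split(" ")
    return " ".join(SUFFIXES.get(w.lower(), w.capitalize()) for w in words)


def normalize_record(source, raw):
    """Map one raw row from a named source into the common schema."""
    mapping = FIELD_MAPS[source]
    record = {target: clean_text(raw.get(src)) for src, target in mapping.items()}
    record["last_name"] = record["last_name"].title()
    record["first_name"] = record["first_name"].title()
    record["address"] = normalize_address(record["address"])
    # Treat "King County" and "KING" as the same county name.
    county = re.sub(r"\s+county$", "", record["county"], flags=re.IGNORECASE)
    record["county"] = county.title()
    return record


state_row = {"LAST_NAME": "SMITH ", "FIRST_NAME": "  jane",
             "RES_ADDR": "123  MAIN STREET", "CNTY": "KING"}
county_row = {"Surname": "Smith", "Given": "Jane",
              "Street Address": "123 Main St.", "County Name": "King County"}

a = normalize_record("state_a", state_row)
b = normalize_record("county_b", county_row)
```

Keeping the per-source differences in a mapping table rather than in per-state code is one way to reduce the hard coding Davis describes: when a file layout changes, only the table entry needs to be updated.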
Trifacta is a self-service data preparation tool designed to let users transform and enrich raw, complex data, producing clean, structured formats for analysis. Its approach focuses on creating a "partnership between user and machine" that combines human domain expertise and data gathering with machine learning and automation.
"Trifacta takes all of what we have from various sources, all of the wrangle scripts that we have written, identifies what needs cleanup, standardizes the data points and language (making it consistent), then clears up bad data. Particular issues are voter records that don't contain enough contact information, people who have died, and people who are too young to vote. Cleaning up all of this is done through Trifacta."
Once the Trifacta solution was in place, NationBuilder was able to complete its mission of assembling a complete and updatable voter database for its clients. The tool took some learning, but the data can now be refreshed with much less manual intervention.
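The cleanup rules mentioned above (records without enough contact information, people who have died, and people too young to vote) can be sketched in a few lines of Python. This is a hedged illustration, not the Trifacta wrangle scripts themselves. The field names (`deceased`, `birth_date`, `address`, `phone`, `email`) and the rule that any one contact field is "enough" are assumptions made for the example.

```python
from datetime import date

VOTING_AGE = 18  # U.S. voting age


def age_on(birth_date, as_of):
    """Whole years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def rejection_reason(record, as_of):
    """Return why a record should be flagged, or None if it passes."""
    if record.get("deceased"):
        return "deceased"
    birth = record.get("birth_date")
    if birth is not None and age_on(birth, as_of) < VOTING_AGE:
        return "under_voting_age"
    # Assumption: one usable contact field counts as enough contact information.
    if not any(record.get(f) for f in ("address", "phone", "email")):
        return "insufficient_contact_info"
    return None


def split_records(records, as_of):
    """Separate passing records from flagged ones, keeping the reason for review."""
    kept, flagged = [], []
    for record in records:
        reason = rejection_reason(record, as_of)
        if reason is None:
            kept.append(record)
        else:
            flagged.append((record, reason))
    return kept, flagged


election_day = date(2016, 11, 8)
sample = [
    {"name": "A", "address": "1 Main St", "birth_date": date(1980, 1, 1)},
    {"name": "B", "address": "2 Main St", "deceased": True},
    {"name": "C", "email": "c@example.org", "birth_date": date(1998, 11, 9)},
    {"name": "D", "phone": "555-0100", "birth_date": date(1998, 11, 8)},
    {"name": "E"},
]
kept, flagged = split_records(sample, election_day)
```

Flagged records are returned with a reason rather than silently dropped, which leaves room for the human oversight the semiautomated approach is meant to preserve.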
On to the Future
NationBuilder is growing quickly and moving out into new territory. Its base is politics, but it is also using its campaign leadership tools for causes and even to support artists who need to develop campaigns. The capability to make better use of voter data and support more demographic analysis is also coming. Voter records provide complex data that can be used to make campaigns more efficient and effective.
Brian J. Dooley is an author, analyst, and journalist with more than 30 years' experience in analyzing and writing about trends in IT. He has written six books, numerous user manuals, hundreds of reports, and more than 1,000 magazine features. You can contact the author at firstname.lastname@example.org.