Case Study: Creating A Visual Dashboard of Forced Labor Around the Globe
The SAS Hackathon winner for 2023 was a group effort. Here’s how the project came about and came together, what data sources it used, who did what (and why), how the team overcame challenges, and what’s ahead for the project.
- By Upside Staff
- January 30, 2024
Slave-Free Alliance and Hope for Justice won the SAS Hackathon 2023 (in the Americas region) for the creation of a pioneering visual dashboard showing forced labor by country, industry, and commodity, pulling from a wide variety of data sources and formats.
The SAS Hackathon brings together data scientists and technology enthusiasts worldwide to tackle some of the most challenging business and humanitarian issues using analytics, AI, and open source on SAS Viya in the cloud.
TDWI spoke with members of the winning team: Tom Frost, an advisor to Human Rights in Supply Chains for the Slave-Free Alliance; Becky Lorig, senior data analyst at the University of Nevada, Las Vegas; and Zoraya Cruz-Bonilla, data research analyst at Binghamton University. Frost, Lorig, and Cruz-Bonilla all worked together to create the forced-labor visual dashboard using data analytics tools from SAS.
What was the original idea for this project and how did the project start?
Tom Frost: This was Slave-Free Alliance’s (SFA) first exposure to the SAS Hackathon and we went into the project completely open-minded. At SFA our mission is to help businesses and organizations of all sizes manage the threat of modern slavery and labor exploitation within their own organizations and supply chains. The project really kicked off with us individually highlighting our skills and experiences and putting these to immediate use to meet the short timescale of the project.
What were some of the driving forces or motivations that fueled the team?
Becky Lorig: The data science team members felt that there was no better way to use our analytics and technological skills than to address a humanitarian cause, so that really drove the project forward.
Zoraya Cruz-Bonilla: With so much turmoil going on in the world, it’s sometimes hard to see how we can make an impactful contribution to complex social issues. On top of that, it can be paralyzing to hear numbers such as 27 million people in situations of forced labor. It’s a significant statistic with human consequences, but it was also one of the driving forces that propelled us forward, knowing that we are all empowered to raise awareness of labor exploitation. As the data science team, we do that through data storytelling.
Lorig: We also had the unique opportunity of working directly with subject matter experts from Slave-Free Alliance. We were hopeful that the results of the project would bring further attention to commodities and industries with a high prevalence or an elevated risk of global labor exploitation, like Zoraya said.
Who were the key players and what were the key roles on the team? Why were these folks invited or why did they join?
Lorig: For the data science team -- my team -- we got to work with SAS mentors. A SAS mentor is a SAS employee who helps to move the project forward by showcasing the capabilities of the SAS Viya environment. They are also instrumental in setting team expectations and creating an open line of communication among all team members.
Our SAS mentors, Tom Sabo and John Stultz, reached out to a few SAS Hackathon public sector enrollees without a team and pitched our project idea along with the opportunity to use technology such as natural language processing techniques to tackle the issue.
Cruz-Bonilla: Coincidentally, everyone who accepted the invitation to join the End Forced Labor team was affiliated with higher education. That made us feel connected to each other.
I was driven by a curiosity and willingness to step into the unknown. Like others in the data science team, this was my very first SAS Hackathon and I really did not know what to expect. Our SAS mentors guided us throughout the entire journey.
Lorig: This made us all passionate about the use case. We found a natural split of roles based on individual skill sets, technological experience, and big-picture thinking. Everyone brought a different talent to the table and it was one of the most well-executed, collaborative teams I’ve worked with in my decade-long analytics career.
What role did the Slave-Free Alliance and Hope for Justice play in the project?
Frost: During the project, Hope for Justice and Slave-Free Alliance representatives imparted knowledge on global modern slavery risks relating to particular commodities and geographical locations. Hope for Justice and Slave-Free Alliance use and refer to a variety of reports and publications as part of their day-to-day operations, many of which were suggested as potential data sources for the project. SFA also recommended realistic products that can be produced as part of this project, such as a supply chain mapping capability and risk assessment tool, that could be used by a business.
What challenges did you face during the project?
Frost: Despite there being an estimated 27.6 million people experiencing forced labor worldwide, awareness and understanding of this among the general population are low. As a result, publicly available information can be difficult to find and verify. In addition, the data often comes in a variety of forms, both quantitative and qualitative, and spanning numerous time periods. This meant that a lot of work was required to identify trusted and reliable sources of data before our data analytics experts could process it.
Lorig: The 5-week timeline was certainly a big challenge. We coordinated our team meetings across continents and time zones and we started as a group of strangers -- we had never met one another before. I assumed that data sets were already available and we could immediately start analyzing data. However, the data science team realized that the bulk of data existed in unstructured forms such as news articles, reports, and documents, and there were no ready-for-analysis data sets. We had to create all our own data to analyze, which was probably the biggest challenge we faced.
We capitalized on the team’s unique skills; for example, I wrote the Python code to scrape PDF documents programmatically, and Zoraya converted the PDF tables into data using her Power BI knowledge. At this point we had the data but no expertise in using SAS Text Analytics (natural language processing or NLP) tools.
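The team’s actual scraping scripts aren’t shown in the article. As a rough illustration of the kind of parsing involved, a minimal sketch might turn text already extracted from a PDF report into structured records; the column layout, field names, and sample text below are all hypothetical:

```python
import re

def parse_goods_rows(raw_text):
    """Parse lines of the form 'Country  Commodity  Exploitation type'
    out of text extracted from a PDF report (layout is hypothetical)."""
    pattern = re.compile(
        r"^(?P<country>[A-Z][\w' .-]+?)\s{2,}"   # country name, ends at 2+ spaces
        r"(?P<commodity>[\w -]+?)\s{2,}"          # commodity column
        r"(?P<type>[\w ]+)$"                      # exploitation type
    )
    rows = []
    for line in raw_text.splitlines():
        m = pattern.match(line.strip())
        # Skip the header row, which would otherwise match the pattern.
        if m and m.group("country") != "Country":
            rows.append(m.groupdict())
    return rows

sample = """
Country            Commodity        Exploitation
Brazil             Coffee           Forced Labor
Uzbekistan         Cotton           Forced Labor
"""
records = parse_goods_rows(sample)
```

In practice, a PDF-extraction library (the team used Python scraping plus Power BI for tables) would supply the raw text, and the regular expression would be tuned to each report’s layout.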
Cruz-Bonilla: If I remember correctly, one team member had limited hands-on experience using the SAS Viya environment.
Lorig: One of our SAS mentors, Tom Sabo, gave a two-hour tutorial on the technology and we ran with it. Surprisingly, learning the SAS platform and the advanced analytics tools was one of the easier tasks we encountered. From there, the data team split off into different sections, guided by data insights and our areas of interest, and we created NLP models, dynamic dashboards, and a risk assessment tool for businesses to assess their knowledge of their supply chains.
Do you recall making any significant decisions or changes to the dashboard throughout the process?
Lorig: There was little time to make sweeping changes; therefore, the data science team members worked on their sections of interest, independently culled the unnecessary or overly complicated pieces, and presented their best work product to the entire team.
Cruz-Bonilla: Becky has a good point about the short timeline of the project. One tweak I remember making in light of the tight deadline was to the initial wireframe for the dashboard. It originally included a case study to provide concrete examples of goods, services, or commodities that use forced labor, such as those from the Xinjiang Uyghur Autonomous Region. However, we made a strategic decision to cull this section because it would have required extensive background information that we could not neatly fit onto a single page. Furthermore, we wanted to save space for the risk assessment tool that was paramount to the much needed call to action.
Why did you pick the data sources you did? How often are these databases updated?
Frost: Data sources such as the U.S. Department of Labor and U.S. Department of State reports on human rights practices were selected because they span multiple years of data, which can allow for comparisons over periods of time. The Global Slavery Index is also a staple for any organization that deals in global supply chains, and it provided reliable figures as to estimated victims and the quality of the government’s efforts to tackle forced labor. Many of the databases are updated annually; however, this is not always the case, which required additional processing by our expert analysts.
Lorig: Due to the time limit of only five weeks, we had to pare down the analysis we had hoped to do to something more manageable. As a result, we focused mostly on issues with commodities and business sectors at risk for forced labor in their supply chains.
Most of our data sources are static reports, like Tom mentioned. Other data sources are single reports addressing an issue at one point in time, such as “Strengthening Protections Against Trafficking in Persons in Federal and Corporate Supply Chains,” written by the fair-labor group Verité.
The data sets can be updated as new information becomes available and we can update the NLP model with the additional information. That would certainly be a significant goal of the project moving into the future.
Was there any training required for the models? If so, what was the process?
Lorig: Natural language processing and text analytics models can be slightly different from traditional machine learning models depending on the required outcome. Our NLP model helped us understand the trends and patterns in the written text by analyzing the chunks, or entities, of sentences and paragraphs. It was also great at extracting relationships among words and assigning sentiment scores.
In our case, we did not use NLP to predict occurrences; we used it to show patterns we wouldn’t have seen without it, such as similarities in word groupings. Therefore, we didn’t partition the data into training and validation data sets (as one would for machine learning models) because our end goal was not statistically supported predictions but rather the frequency of problematic word groups associated with specific industries and commodities that could indicate risk for forced labor. Of course, I would recommend an enhanced analysis that does have more statistical rigor and predictive capabilities as the next iteration now that we have a foundation of understanding from our completed project.
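The SAS Text Analytics pipeline itself isn’t reproduced in the article. As a toy sketch of the frequency-based idea Lorig describes, one might count how often risk-related terms co-occur with commodities across documents; the term lists and sample sentences here are invented for illustration:

```python
from collections import Counter
from itertools import product

# Hypothetical term lists; the real project derived entities via NLP extraction.
COMMODITIES = {"cotton", "cocoa", "coffee", "bricks"}
RISK_TERMS = {"forced", "trafficking", "debt bondage", "child labor"}

def cooccurrence_counts(documents):
    """Count how often a commodity appears in the same document as a risk term."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        found_commodities = {c for c in COMMODITIES if c in text}
        found_risks = {r for r in RISK_TERMS if r in text}
        counts.update(product(sorted(found_commodities), sorted(found_risks)))
    return counts

docs = [
    "Reports describe forced labor in cotton harvesting.",
    "Debt bondage remains a concern in kilns producing bricks.",
    "Cotton supply chains have been linked to trafficking.",
]
counts = cooccurrence_counts(docs)
```

High counts for a (commodity, risk term) pair would flag that commodity for closer review, which matches the team’s goal of surfacing patterns rather than making statistical predictions.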
How did you go about verifying the results of your model?
Lorig: We started with legitimate and verified data sources, such as the U.S. Department of Labor and U.S. Department of State reports, as well as documents from Verité, a trusted fair labor group. There was less concern about verifying the integrity of the source and more focus on preparing, cleaning, and merging data sets -- this is usually when data errors creep in.
For example, I discovered data quality issues around inconsistent country names that can cause incomplete or incorrect data joins. We conducted quality checks using standard data quality best practices. Unlike some machine learning models that are evaluated based on accuracy scores or similar metrics, our NLP model output served more as guideposts. We created themes and word groupings to analyze commodities and industries. The results of our models were quality checked for double-counting or similar types of errors rather than a model metric for verification.
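The inconsistent-country-name problem Lorig mentions is a classic join pitfall. A minimal sketch of the normalization step might look like this; the alias table is a small invented sample, not the project’s actual mapping:

```python
# Hypothetical alias table mapping alternate spellings to one canonical name.
COUNTRY_ALIASES = {
    "burma": "Myanmar",
    "ivory coast": "Côte d'Ivoire",
    "cote d'ivoire": "Côte d'Ivoire",
    "drc": "Democratic Republic of the Congo",
}

def normalize_country(name):
    """Map alternate spellings onto one canonical name before joining."""
    return COUNTRY_ALIASES.get(name.strip().lower(), name.strip())

def join_on_country(left, right):
    """Inner-join two {country: value} mappings after normalizing keys,
    so 'Burma' and 'Myanmar' land on the same row instead of being dropped."""
    l = {normalize_country(k): v for k, v in left.items()}
    r = {normalize_country(k): v for k, v in right.items()}
    return {c: (l[c], r[c]) for c in l.keys() & r.keys()}

labor_data = {"Burma": 12, "Ivory Coast": 7}
index_data = {"Myanmar": 0.9, "Côte d'Ivoire": 0.6}
merged = join_on_country(labor_data, index_data)
```

Without the normalization pass, both rows in this example would silently fall out of an inner join, which is exactly the kind of incomplete-join error the team’s quality checks were designed to catch.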
Are there applications that use this model, or a similar model, that enterprises can make use of now?
Frost: Slave-Free Alliance uses several models to help businesses better understand their supply chains and address risks of modern slavery. These include software that can notify businesses of relevant news articles and reports relating to countries where they operate and where there may be a modern slavery risk.
Other software allows the user to input nominal data, such as names or business entities, which are then searched for associations with human rights risks or labor market offenses. Both products enable businesses to add an extra layer of due diligence around the people and companies they work with.
What are your plans for the future of this project?
Lorig: This is just the start of the work that can be done to deepen our understanding of, and document changes in, global labor exploitation and modern-day slavery. The data science team members would certainly enjoy continuing to expand on the original project. Perhaps we could create a second iteration for the next SAS Hackathon (which is now a year-long experience for 2024).
Frost: Slave-Free Alliance continues to maintain contact with its SAS mentor and the Hackathon team, and there are some projects currently being discussed which build on the tools included in this project. This project has highlighted the potential of analytics and technological skills to address global issues such as modern slavery and human rights violations, and Slave-Free Alliance is excited to see how far this project can go.