Spatial Data Science: The Basics You Need to Know
James Kobielus, TDWI’s senior director of research for data management, discusses trends and topics in spatial data science.
- By Upside Staff
- August 4, 2023
In this “Speaking of Data” podcast, TDWI’s James Kobielus explores the latest developments in spatial data science with host Andrew Miller. Kobielus is senior research director for data management at TDWI. [Editor’s note: Speaker quotations have been edited for length and clarity.]
Miller began the conversation by asking for a simple definition of spatial data science. “Spatial data science is data science on geospatial data -- location data, navigation data, GPS data, any data that is geocoded,” Kobielus explained. “Geospatial data science builds on and extends the capabilities of geographic information systems.” In addition, because it's data science, it focuses on understanding not just where events take place but also why those events are occurring and what's likely to happen under various future scenarios.
When asked about principal use cases for spatial data science, Kobielus suggested several possibilities. “A core and mainstream enterprise application for spatial data science has been address management. Customer information management needs to be integrated with permanent addressing which then is geocoded so that as your customers move around you always know what their actual address is.”
Other possible uses include determining optimal locations for things such as retail outlets or manufacturing facilities, optimizing supply chain logistics, tracking inventory, personalizing user experiences on mobile devices, allowing businesses to provide targeted content (such as restaurant recommendations or other localized content), and indoor applications to help organizations optimally arrange things within warehouses or other indoor spaces.
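As a rough illustration of the site-selection use case, the sketch below picks the candidate location with the smallest average great-circle distance to a set of customers. The coordinates, the site names, and the `best_site` helper are hypothetical, and a real analysis would likely weigh travel time, demand, and cost rather than straight-line distance alone.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def best_site(candidates, customers):
    """Return the name of the candidate site with the smallest mean distance to customers."""
    def mean_distance(site):
        return sum(haversine_km(site, c) for c in customers) / len(customers)

    return min(candidates, key=lambda name: mean_distance(candidates[name]))


# Hypothetical geocoded inputs: candidate outlets and customer locations.
candidates = {"north": (40.80, -73.95), "south": (40.65, -73.95)}
customers = [(40.78, -73.96), (40.81, -73.94), (40.79, -73.97)]

print(best_site(candidates, customers))  # prints "north"
```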
“I used to be a product manager at a company that helped wireless engineers to build out cellular networks at a macro scale,” Kobielus said. “I also helped facilities managers with GIS tools for indoor optimization of wireless signals, Wi-Fi signals, and so forth, based on any number of variables related to the design of the facility.
“Of course, because it’s data science, a key function is using machine learning and predictive models to find patterns in geospatial data sets based not just on location data but on any number of contextual variables.”
Miller asked about the core tools, platforms, and libraries required by spatial data scientists. According to Kobielus, the tools are similar to those used by any other type of data scientist. “In addition to geographic information systems, which visualize geospatial data and enable complex analyses on that data, spatial data scientists use all the tools of data science: data mining, predictive modeling, machine learning, and data visualization capabilities so you can overlay various information over maps.” Spatial data scientists also use geocoding tools because being able to code existing data with geographic coordinates in a consistent way is absolutely essential.
With spatial data science playing such an important role in modern businesses, the conversation turned to how organizations can future-proof their data science practices.
“What you need for spatial data science and data scientists is to recognize that spatial data science is a core enterprise focus area for your data science practices,” Kobielus said. Data scientists need to be skilled in and knowledgeable about the tools and techniques of spatial data science. Organizations should therefore evaluate candidates for data science positions on their spatial data skills, among other aspects of their knowledge, and should provide internal training resources.
Organizations also need to recognize that spatial data is huge and getting bigger. As they pull in geospatial data from a multitude of sources, data scientists need the scalability and elasticity of a cloud data lake, both to store that data and to support intensive processing on it, analytical and otherwise, such as high-speed name and address matching.
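To give a sense of what address matching involves at its simplest, here is a minimal sketch that normalizes address strings before comparing them. The abbreviation table and both function names are assumptions made for illustration; production systems typically rely on dedicated address-standardization and geocoding services and on fuzzy matching rather than exact comparison.

```python
import re

# Hypothetical abbreviation table; real standardization tables are far larger.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "boulevard": "blvd",
    "suite": "ste",
}


def normalize_address(raw):
    """Lowercase, replace punctuation with spaces, and abbreviate common street terms."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = text.split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_address(a, b):
    """Treat two address strings as a match if their normalized forms are identical."""
    return normalize_address(a) == normalize_address(b)


print(normalize_address("123 Main Street, Suite 4"))  # prints "123 main st ste 4"
print(same_address("123 Main St.", "123 MAIN STREET"))  # prints "True"
```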
Another aspect of future-proofing an organization’s spatial data science practice is addressing ethical, sustainability, and governance concerns.
“There are obviously privacy issues,” Kobielus said. “Personalization of my experience continuously based on where I am, where I'm likely to be one minute from now, one day from now, one week from now -- that definitely presents issues.” To future-proof their spatial data science practices, organizations need to build guardrails into their management of geospatial data. They need to comply with all applicable regulations but also be aware of the sensitivities related to tracking and surveillance.
As for sustainability, geospatial data science models, as they proliferate into the billions of dimensions, take on a huge carbon footprint. Part of the solution involves such things as adopting renewable energy sources to power data centers. Another part is reusability of models. There's a concept called “foundation models” -- models that can be easily adapted to new tasks without much (if any) retraining. The more existing models can be reused, the less retraining is necessary and, in theory, the less carbon is generated.
“We recently published a Best Practices Report about responsible data analytics,” Kobielus said. In it, survey respondents said the principles of data governance -- data quality and regulatory compliance, among others -- were at the top of their list of priorities when it comes to responsible data and analytics. Ethics and sustainability are not yet priorities, but one can hope that they will become so as society adapts to the increasing impact of AI and machine learning.
[Editor’s note: To listen to the entire conversation, replay the podcast episode on demand here.]