Emerging Practices in Location Data Management and Analytics
Traditional geographic data combined with new geocoding is giving business operations and analytics greater precision and innovation.
- By Philip Russom
- October 31, 2018
Some of the fastest-growing dimensions within data warehouses today (in terms of data volumes, structures, and analytics use) are the dimensions that record various definitions of location. Location data for operational applications such as billing, shipping, and logistics is growing just as rapidly.
Older dimensions that simply capture states, ZIP codes, and other traditional location units continue to be relevant because they are still needed for common tasks such as bulk mail sorting, sales region management, and analytics based on coarse units.
However, a number of applications benefit from more precise units based on longitude and latitude coordinates. Modern geocoding is enabling a new generation of super-efficient deliveries, physically distributed asset management, and analytics that need to draw spaces independent of traditional boundaries.
Location Data Challenges
Data about the precise location of people, vehicles, shipments, buildings, roads, and events can improve business operations and analytics -- but only if you can properly correlate location data with other data domains and business processes. To that end, location data and analytics face a number of challenges that a data lake, data warehouse, or similar data platform can help mitigate.
Location data is strewn across many systems. Analytics and operational reporting can benefit from a single, central data platform for aggregating location data, whether a data lake, data warehouse, or the two integrated.
Location data comes in massive volumes. A data lake on Hadoop or in the cloud can handle the volume and the complex calculations typical of location analytics.
Location data comes in proprietary data formats and standard structures. Data lakes, by definition, quickly store data in its original state so it can be repurposed in multiple ways later. To avoid excessive normalization processing at the time of data capture, the data in a lake is usually standardized and integrated at read time.
Location data arrives at diverse latencies, from batch to real time. A broad data integration infrastructure can handle all latencies, interfaces, transformations, and incoming data structures. The new onslaught of big data from the Internet of Things (IoT) is particularly diverse in this regard.
Location data is hard to relate to business processes. Data lakes and warehouses can provide more data for richer cross-source analytics correlations, which in turn can reveal how a location (or a group of location types) relates to more efficient delivery routes, traffic management in smart cities, creative customer views, demographics that defy traditional boundaries, and insurance fraud or other criminal activities.
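To make the read-time standardization mentioned above more concrete, here is a minimal Python sketch. It assumes three hypothetical raw record shapes (separate `lat`/`lon` fields, a combined `coords` string, and a ZIP-only record); the field names, the `normalize_location` helper, and the sample values are illustrative assumptions, not taken from any particular product or data set.

```python
from typing import Any, Dict

def normalize_location(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map one raw record to a common shape at read time (hypothetical schema)."""
    # Shape A: explicit latitude and longitude fields, possibly stored as text.
    if "lat" in record and "lon" in record:
        return {"lat": float(record["lat"]), "lon": float(record["lon"]),
                "zip": None, "precision": "point"}
    # Shape B: a single "lat,lon" string.
    if "coords" in record:
        lat_text, lon_text = record["coords"].split(",")
        return {"lat": float(lat_text), "lon": float(lon_text),
                "zip": None, "precision": "point"}
    # Shape C: only a coarse unit (ZIP code) is available; keep it, flag the precision.
    if "zip" in record:
        return {"lat": None, "lon": None,
                "zip": str(record["zip"]).zfill(5), "precision": "zip"}
    # Anything else is kept but marked as having no usable location.
    return {"lat": None, "lon": None, "zip": None, "precision": "unknown"}

# Raw records stay in the lake as captured; they are standardized only when read.
raw_records = [
    {"lat": "41.8781", "lon": "-87.6298"},
    {"coords": "29.7604, -95.3698"},
    {"zip": 2139},
    {"note": "no location captured"},
]
normalized = [normalize_location(r) for r in raw_records]
```

Carrying a `precision` flag alongside each record is one possible design choice: it lets downstream analytics tell coarse ZIP-level records apart from point-level geocodes instead of treating them as equally exact.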
Embracing New Practices
User organizations are developing a data management strategy for the capture, analytics, and operational use of location data. They are also developing infrastructure that generates the location data that new practices require. For example, they are deploying sensors in great numbers in the form of RFID chips on shipping pallets and other mobile assets. Similarly, they are using electronic devices (smartphones, tablets) and vehicles (trucks, railcars, farm equipment) with built-in sensors and GPS.
In addition, firms carefully record the location of a fixed asset when they deploy it so they can find it efficiently or perform analytics, as when an electric utility installs a meter on a building or a transformer on a light pole. All the above -- mostly from the IoT -- are generating a new onslaught of big data that is challenging to capture but very promising for new analytics and innovative ways to run a business.
The use cases just described depend largely on geocoding for their granular precision and standardization of location data. Geocoding assigns a location its spatial coordinates in latitude and longitude, called geocodes, whether those coordinates are captured with GPS technology or derived from existing location data. Geocodes have become the preferred standard for describing locations, and complying with that standard guides data quality measures for modern location data.
Geocoding gives location data high precision. However, geocoding is only precise when created properly. For example, geocodes are sometimes added to legacy data by generating them from coarse units such as ZIP codes, town names, or street addresses. Codes created this way suffer the same lack of precision as their sources. The most precise geocodes are captured by on-site GPS equipment and then curated by a domain expert (i.e., a human) who registers them in a commercially available database.
This level of precision and trust is required for, say, finding the delivery entrance of a large factory or a condo front door within a gated community. Furthermore, many rural or undeveloped locations have no street address, which makes geocodes indispensable.
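One way to see how much precision a coarse source can cost is to measure the distance between a geocode derived from a ZIP code centroid and a geocode captured on site. The Python sketch below uses the standard haversine great-circle formula; the two coordinate pairs are made-up values for illustration only, and the `haversine_km` helper name is an assumption of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical values: a geocode generated from a ZIP centroid versus
# the same site's delivery entrance surveyed with on-site GPS.
zip_centroid = (40.00, -105.00)
surveyed_entrance = (40.01, -105.02)

error_km = haversine_km(*zip_centroid, *surveyed_entrance)
print(f"Positional error from the coarse source: {error_km:.2f} km")
```

An error on this scale is harmless for sales region analytics but would send a driver to the wrong gate, which is why the source of each geocode matters.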
Creatively Define Spatial Areas
Modern tools support geometric functions that read the X-Y data points of geocoding and enable users to draw many area types, from symmetrical grids to amoeba-like shapes. Users can incorporate boundaries recorded in geocodes, such as highways, rivers, topography, and districts. They can also correlate and cluster locations that repeat in the data -- such as concentrations of customers, accidents, crimes, tornados, and diseases -- and develop areas from that information. Users may also plot points of interest and create proximity areas around them.
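Two of the geometric functions described above can be sketched in a few lines of Python: testing whether a point falls inside an arbitrarily shaped area, and grouping repeated locations into grid cells to find concentrations. This is a simplified illustration that treats longitude and latitude as flat X-Y coordinates, which is only a reasonable approximation over small areas; the function names and sample data are assumptions of this example, and production tools typically use dedicated spatial libraries.

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), i.e., (longitude, latitude)

def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a rightward ray crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def grid_cell(point: Point, cell_size: float) -> Tuple[int, int]:
    """Assign a point to a square grid cell of the given size."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))

# A hypothetical L-shaped service area drawn independently of traditional boundaries.
service_area = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

# Hypothetical event locations (e.g., customers or incidents).
events = [(0.5, 0.5), (0.7, 0.2), (0.9, 0.9), (3.1, 0.4), (1.2, 3.3)]

events_in_area = [p for p in events if point_in_polygon(p, service_area)]
cell_counts = Counter(grid_cell(p, cell_size=1.0) for p in events)
densest_cell, densest_count = cell_counts.most_common(1)[0]
```

The densest grid cells are natural seeds for new areas: a user could draw a boundary around them or use one as a point of interest for a proximity area.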
A good tool will automatically redraw areas as the data, triggers, and user input evolve, as well as recalculate distance and time or efficiency and risk metrics. Of course, geocoded location data lends itself to mapping and analytics visualizations. For even more functionality and business value, geocoded data should be exported to tools for GIS, reporting, analytics, and mobile apps.
Adopt the emerging practices of location data for their compelling use cases. Location data has high business value in both operations and analytics. Use it to extend existing applications and create new analytics.
Create a strategy for managing location data. Consider the data lake, integrated with a data warehouse, as a scalable data store for the storage, study, and use of location data. Deploy hefty data integration and quality functions for capturing and standardizing location data, especially when the IoT is involved. Also rely on modern tools that support GPS and geocoding, plus analytics for these.
Leverage both traditional location data and modern geocoding. Both are needed for the full range of legacy and new use cases.
To Learn More
For more information about managing location data, replay the TDWI webinar "Location Analytics for Your Data Lake: Driving New Business Insights and Outcomes."
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 500 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at firstname.lastname@example.org, @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.