Geocoding: preparing address data for use in analysis, display, and spatial optimization applications
Whether you are setting out to optimize delivery routes, analyse catchment areas, deploy geomarketing techniques or define the ideal sectorization for your sales force, geocoding is without question the place to start. Geocoding generates the basic units of information for building the geographic dimension of your business and fully exploiting your applications and business processes.
Let’s start with a definition:
Geocoding is an operation that consists of assigning geographic coordinates (X,Y / latitude and longitude) to a postal address so as to be able to locate it in geographical space.
Any object that already has an associated postal address, or that can be associated with a postal address, can be geocoded.
By geocoding your customer files, lists of business establishments, depots and warehouses, competitors or any other files containing addresses, you make them usable by a wide range of viewing, analysis and optimization applications. These applications need geographic coordinates on every record in order to place it accurately on a map, group records together, perform the calculations involved in plotting itineraries or routes, and define business sectors or areas of influence… Geographic coordinates are the magic ingredient of precision mapping: without them, none of this processing is possible.
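To make this concrete, here is a minimal sketch of one calculation that only becomes possible once records carry coordinates: the straight-line (great-circle) distance between two geocoded points, using the haversine formula. The record names and coordinates are hypothetical and approximate; a real routing application would use road-network distances rather than this simplification.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical geocoded records (approximate coordinates, for illustration only).
depot = {"name": "Depot", "lat": 48.8566, "lon": 2.3522}        # central Paris
customer = {"name": "Customer", "lat": 45.7640, "lon": 4.8357}  # central Lyon

distance = haversine_km(depot["lat"], depot["lon"], customer["lat"], customer["lon"])
print(f"{depot['name']} -> {customer['name']}: {distance:.0f} km")
```

Without the latitude and longitude fields, neither record could take part in this calculation, however complete its postal address.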
Understanding the geocoding mechanism
Taken in isolation, geocoding is a data-enhancing operation that matches input data against stored standard address data, using a geocoding engine designed specifically for that purpose. The three main components of this process are:
- Input data, that is, a file of postal addresses in text format. It could be a flat file, an Excel file, a list extracted from a customer database, from a CRM or an ERP or another external system.
- A reference database, or database directory, in standard format, of all addresses that exist for a given territory along with the geographic coordinates of each address.
- A geocoding engine: this is the algorithm that will interpret input data and search in the database for the best possible match for each address, for the purpose of assigning the appropriate X, Y coordinates.
This is a rather simplified description, but it will serve perfectly to reflect and clarify the three main points (developed below) you need to keep in mind before finally taking the step of purchasing and using a geocoding solution.
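The three components described above can be sketched in a few lines of Python. This is a deliberately naive illustration, not how a production engine works: the reference entries and their coordinates are hypothetical, and the "engine" only accepts exact matches after a basic clean-up of the input text.

```python
import re

# Reference database: standardised address -> (latitude, longitude).
# Hypothetical entries with approximate coordinates.
REFERENCE = {
    "10 RUE DE RIVOLI 75004 PARIS": (48.8555, 2.3590),
    "1 PLACE BELLECOUR 69002 LYON": (45.7578, 4.8320),
}


def normalise(address: str) -> str:
    """Upper-case the text, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(cleaned.split())


def geocode(address: str):
    """Geocoding engine: return (lat, lon) on an exact normalised match, else None."""
    return REFERENCE.get(normalise(address))


# Input data: free-text addresses, as they might come out of a CRM export.
input_addresses = ["10, rue de Rivoli  75004 Paris", "99 unknown street"]
results = {address: geocode(address) for address in input_addresses}

for address, coords in results.items():
    print(address, "->", coords if coords else "rejected")
```

The first address is matched once its punctuation and spacing have been cleaned up; the second has no counterpart in the reference database and is rejected.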
1 – Source data quality is crucial
Even with the most powerful geocoding engine on the market and the most comprehensive reference database available, if the input data is of low quality, the algorithm will be unable to make intelligent and accurate matches between the addresses it is presented with and those it can find in the database. When it encounters addresses that are imprecise, incomplete or wrongly structured, the geocoding engine can behave in one of two ways:
- Generate «false positives», that is, interpret addresses wrongly and assign incorrect geographic coordinates, leading to errors further down the line. If your geocoding project is intended to feed a delivery scheduling application, for example, it is easy to imagine the consequences these errors could have on operations in the field…
- Simply reject a large number of addresses for lack of an identified match. You would then have to process these rejections manually, examining and rectifying the addresses one by one…
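The trade-off between these two behaviours can be illustrated with a small sketch. It assumes a similarity score computed with Python's standard `difflib.SequenceMatcher`, used here purely as a stand-in for whatever scoring a real engine applies; the reference addresses and the thresholds are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical reference addresses: the same street name exists in two cities.
REFERENCE = [
    "12 RUE VICTOR HUGO 69002 LYON",
    "12 RUE VICTOR HUGO 75116 PARIS",
]


def best_match(address, min_score):
    """Return (reference_address, score), or (None, score) if below the threshold."""
    scored = [
        (SequenceMatcher(None, address.upper(), ref).ratio(), ref)
        for ref in REFERENCE
    ]
    score, ref = max(scored)
    return (ref, score) if score >= min_score else (None, score)


incomplete = "12 rue Victor Hugo"  # no post code, no city

lenient = best_match(incomplete, min_score=0.7)  # accepts a guess
strict = best_match(incomplete, min_score=0.9)   # rejects the record

print("lenient:", lenient)
print("strict: ", strict)
```

With the lenient threshold, the incomplete address is silently attached to one of the two cities, which may well be the wrong one (a false positive). With the strict threshold it is rejected and has to be corrected by hand. Neither setting can recover the information that the input never contained.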
At NOMADIA, we know from experience that businesses tend to overrate the quality of their address data files. That’s why we usually recommend carrying out an audit of the databases to be geocoded, followed by a radical rationalisation of existing data to bring it into line with established standards for the various address fields (exact postal location, town or city name, country…).
Once this is done, the best way to reduce geocoding engine error and rejection rates sustainably is to act upstream, at the point where data is first collected, by putting input constraints and address standardisation help routines in place in your applications and address gathering forms. This way, you can side-step a host of problems: misspelt or wildly imaginative city and street names, incomprehensible abbreviations, post codes that don’t exist or are entered in the wrong field and, above all, incomplete addresses, for example by making it impossible to save an address unless all the required fields are filled.
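As an illustration of such input constraints, the sketch below checks an address record before it is saved. The field names and the five-digit post code pattern are assumptions chosen for the example (the pattern suits French post codes and would need adapting for other countries); this is a starting point, not a full validation routine.

```python
import re

# Assumed field names for an address entry form.
REQUIRED_FIELDS = ("street", "postcode", "city", "country")

# Assumption: French-style five-digit post codes; adapt the pattern per country.
POSTCODE_PATTERN = re.compile(r"\d{5}")


def validate_address(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record may be saved."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            errors.append(f"missing {field}")
    postcode = str(record.get("postcode", "")).strip()
    if postcode and not POSTCODE_PATTERN.fullmatch(postcode):
        errors.append("invalid postcode format")
    return errors


complete = {"street": "1 place Bellecour", "postcode": "69002", "city": "Lyon", "country": "France"}
no_city = {"street": "1 place Bellecour", "postcode": "69002", "city": "", "country": "France"}
wrong_field = {"street": "1 place Bellecour", "postcode": "Lyon", "city": "Lyon", "country": "France"}

for record in (complete, no_city, wrong_field):
    print(validate_address(record))
```

A form that refuses to save a record while this list is non-empty stops incomplete addresses and misplaced post codes before they ever reach the geocoding engine.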
2 – Reliable data makes all the difference
The choice of database is hugely important. Most geocoding solution publishers work with several suppliers, and allow their customers to select the database that most suits their needs as regards precision, geographical coverage (France, Europe, the World) and update frequency. For highest levels of precision – «to the exact address» – it will be prudent to put established databases at the top of your list: these will have been tried and tested by longstanding players in the field of high repute such as the IGN and HERE, both partners of NOMADIA, not forgetting TELE ATLAS and TOMTOM.
You can also try using so-called ‘free’ databases, built by community groups that collect data and share it so that it can be used free of charge. Because of its long experience and world coverage, OpenStreetMap is without doubt the most comprehensive offering in this category. It is all the more reliable for France because, since 2016, the IGN has made its aerial coverage of the whole of the French territory available to the OpenStreetMap community.
Google Maps can also be used as a database. It rates highly in terms of quality, and its data is constantly being updated, but users quickly find it heavy-going if used intensively for the purpose of geocoding. Another disadvantage worth considering is that if you opt for Google Maps as your database, you will not be able to keep the geographic coordinates collected, nor will you be able to exploit them elsewhere in applications where you might need them. These data remain the property of Google.
So our advice is: when choosing a database, be sure to check not only that it contains all the data you need for your business perimeter or project, but also the terms and conditions for using the data, the status of any geocoding results you obtain, and the cost of updates.
3 – Rule out a «black box» geocoding engine solution
There are two main types of geocoding solution:
- Autonomous geocoding engines that can be deployed in standalone mode, or as a bespoke integration within third party applications to feed processing sequences or specific business processes.
- Geocoding modules that are increasingly being integrated as standard in packaged route optimization, geomarketing or territory sectorization solutions.
In either type of solution, you should have the option to perform all three of the following: on-the-fly geocoding of individual addresses, which in practice means configuring geocoding to run each time an address is entered in the source application, so that it is geocoded automatically at the point of entry; batch geocoding (batch mode), which is essential when handling large volumes of data; and incremental geocoding, which shortens processing times by geocoding only the addresses that have recently been added to the source database. It should be possible to run all of these processes fully automatically in the back office, or on demand by the user.
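Incremental geocoding, in particular, is easy to picture in code. The sketch below assumes that already-geocoded addresses are kept in a simple dictionary acting as a cache, and uses a stand-in engine that records each call; both are hypothetical simplifications (a real system might instead flag records by creation date or by the absence of coordinates).

```python
def geocode_incremental(records, geocode, cache):
    """Geocode only addresses not yet in the cache; return the number of new lookups."""
    new_lookups = 0
    for record in records:
        key = record["address"]
        if key not in cache:
            cache[key] = geocode(key)  # the only place the engine is called
            new_lookups += 1
        record["coords"] = cache[key]
    return new_lookups


# Stand-in engine that records each call (hypothetical, for illustration).
calls = []


def fake_geocode(address):
    calls.append(address)
    return (0.0, 0.0)


cache = {}
first_batch = [{"address": "A"}, {"address": "B"}]
first_run = geocode_incremental(first_batch, fake_geocode, cache)

# Later, one new address has been added to the source database.
second_batch = first_batch + [{"address": "C"}]
second_run = geocode_incremental(second_batch, fake_geocode, cache)

print(first_run, second_run, calls)
```

On the second run the engine is called only for the newly added address, while the existing records reuse their stored coordinates.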
However you intend to deploy and use it, a geocoding engine should offer configuration options and an interface for handling errors manually and processing rejected address records. Without these interactive control tools, your geocoding engine will be a black box: you will have no means of checking or adjusting its strategies for address recognition, close-match handling and machine learning. To side-step this «black box» effect, the tolerance criteria of the NOMADIA geocoding engine can be modified as necessary, and its address handling strategies fine-tuned to produce the desired result. Our engine also features a geocoding wizard that displays a recognition score, suggests correction options for rejected records, and gives the user the option to save to memory any corrections applied. This means you can capitalise on acquired knowledge while keeping control over how strictly the algorithm applies its configured criteria to the data being processed.
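To show what "not a black box" can mean in practice, here is a generic sketch of an engine with an adjustable tolerance, visible scores, suggested candidates and a memory of user corrections. It is not the NOMADIA engine or its API: the class, its method names, the default threshold and the use of `difflib.SequenceMatcher` for scoring are all assumptions made for the illustration.

```python
from difflib import SequenceMatcher


class ReviewableGeocoder:
    """Toy engine exposing its tolerance, its scores and a corrections memory."""

    def __init__(self, reference, min_score=0.85):
        self.reference = reference  # standardised address -> (lat, lon)
        self.min_score = min_score  # adjustable tolerance
        self.corrections = {}       # raw input -> validated reference address

    def suggest(self, address, limit=3):
        """Return up to `limit` (score, candidate) pairs, best first."""
        scored = [
            (SequenceMatcher(None, address.upper(), ref).ratio(), ref)
            for ref in self.reference
        ]
        return sorted(scored, reverse=True)[:limit]

    def geocode(self, address):
        """Use a saved correction if there is one, else the best candidate above the threshold."""
        if address in self.corrections:
            return self.reference[self.corrections[address]]
        suggestions = self.suggest(address, limit=1)
        if not suggestions:
            return None
        score, candidate = suggestions[0]
        return self.reference[candidate] if score >= self.min_score else None

    def remember(self, address, corrected):
        """Save a user-validated correction for future runs."""
        self.corrections[address] = corrected


# Hypothetical reference entry with approximate coordinates.
engine = ReviewableGeocoder({"1 PLACE BELLECOUR 69002 LYON": (45.7578, 4.8320)})

before = engine.geocode("Bellecour Lyon")             # below the threshold: rejected
candidates = engine.suggest("Bellecour Lyon")         # shown to the user with scores
engine.remember("Bellecour Lyon", candidates[0][1])   # the user validates the suggestion
after = engine.geocode("Bellecour Lyon")              # now resolved from memory

print(before, candidates, after)
```

The rejected address is resolved the next time it appears because the user's correction has been saved, while the threshold itself remains strict for every other record.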
Now that you know what to watch out for when choosing a geocoding solution, you are ready to take the next step: let’s talk about your next project!