September-3-2016: Bash one liner for batch geocoding CSV/Text files
Of all the various clients interfacing with our geocoding API, the simplest one deserves some praise.
Here is a bash one liner that takes as input a CSV file of locations, and outputs the file with the location information appended at the end of each line:
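#!/bin/bash
# For each line of the input file ($1), print the line followed by the geocoder's CSV response.
while IFS='' read -r line || [[ -n "$line" ]]; do
  echo "$line,$(curl -s -X POST --data-urlencode "locate=$line" -d geoit="csv" https://geocoder.ca)"
done < "$1"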
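Save as geocode.sh, then chmod a+x geocode.sh, then run it with the input file as its only argument (the script reads "$1" and writes the result to standard output).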
Your input file might look something like:
20 WYPER SQ , Toronto, ON M1S0B3
415 LAKEWAY, Ding Dong, TX 76549
Yonge and Dundas, Toronto
and the output:
20 WYPER SQ , Toronto, ON M1S0B3,200,1,43.790920,-79.245789
415 LAKEWAY, Ding Dong, TX 76549,200,0.8,30.976936,-97.817796
Yonge and Dundas, Toronto,200,0.9,43.655778,-79.380650
Four columns were appended to each line of the input file: the status (200 means success), the confidence score (a number between 0 and 1, with 1 being the best match), the latitude and the longitude.
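Because the addresses themselves contain commas, the appended columns are easiest to read by counting from the end of each line. As a rough sketch (the function name filter_confident and the threshold are our own placeholders, not part of the API), the following keeps only the rows with status 200 and a confidence score at or above a chosen minimum:

```shell
# filter_confident MIN
# Reads geocoded lines on stdin and prints those whose status is 200 and
# whose confidence score is >= MIN. The last four comma-separated fields
# are assumed to be: status, confidence, latitude, longitude.
filter_confident() {
  awk -F',' -v min="$1" 'NF >= 5 && $(NF-3) == 200 && $(NF-2) + 0 >= min + 0'
}

# Hypothetical usage, with placeholder file names:
#   ./geocode.sh input.csv | filter_confident 0.9 > confident.csv
```

Rows that fall below the threshold are good candidates for manual review.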
August-15-2016: Batch Geocode / GeoJson and GeoCluster options
You can now export your batch geocoded files as GeoJson or view on a map as a Geo Cluster.
Click here for more information.
July-15-2016: Crowdsourced Postal Code Data and more Census Information
We've added Dissemination area codes and Federal electoral district data.
Click Here for more information.
June-1-2016: Crowdsourced Postal Code Data and Census Information
We've added a new file to our crowdsourced postal code dataset, the Crowdsourced Postal Code Data and Census Information File. This links postal codes to census information. The census fields included are: "CTUID","CTNAME","CMAUID","CMANAME","CMATYPE","CMAPUID".
Click Here for more information.
Geocoder.ca has come a long way to be the most accurate geocoding service in North America (99% coverage and 94% accuracy rate in both USA and Canada.)
Those numbers can be verified independently by a) obtaining a random set of locations from Foursquare with a large number of checkins - i.e. ground truth, b) Geocoding each location and comparing the result to the Foursquare location which must be accurate with a very high probability (because it is highly unlikely that when someone checks in to a place, their device sends the wrong latitude, longitude.)
I've also compared other geocoding services using the same method, with Google Geocoder coming a close second (99% coverage, 93% accuracy).
You can download my test data here: https://github.com/eruci/openaddresses/tree/master/test (this data was generated as part of a geocoding talk I gave at Fosdem 2016: Geocoding on the cloud)
There is still a way to go to reach 100% coverage and 100% accuracy.
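The comparison step of the verification method described above can be sketched roughly as follows. The input layout (geocoded latitude, geocoded longitude, ground truth latitude, ground truth longitude on each line), the function name and the distance threshold are all assumptions made for illustration; the text does not state what distance counts as an accurate match.

```shell
# accuracy_within_km MAX_KM
# Reads lines of "geo_lat,geo_lon,truth_lat,truth_lon" on stdin and prints
# the percentage of rows where the geocoded point lies within MAX_KM
# (great-circle distance, haversine formula) of the ground truth point.
accuracy_within_km() {
  awk -F',' -v max_km="$1" '
    function rad(d) { return d * 3.141592653589793 / 180 }
    function haversine_km(lat1, lon1, lat2, lon2,    dlat, dlon, a) {
      dlat = rad(lat2 - lat1)
      dlon = rad(lon2 - lon1)
      a = sin(dlat / 2) ^ 2 + cos(rad(lat1)) * cos(rad(lat2)) * sin(dlon / 2) ^ 2
      return 2 * 6371 * atan2(sqrt(a), sqrt(1 - a))   # 6371 km: mean Earth radius
    }
    { total++; if (haversine_km($1, $2, $3, $4) <= max_km) hits++ }
    END { if (total) printf "%.1f\n", 100 * hits / total }'
}
```

Coverage can be measured the same way by counting how many input locations produced any result at all.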
January-27-2016: Fulltext Geocoding - Extracting and Geocoding Locations from free form text
We have released a Fulltext Geocoder, which matches and extracts one or more locations from a body of text, such as street addresses, street intersections and city names.
The matches are ordered on confidence score and may be obtained in XML, CSV, JSON or Web format.
Currently it extracts North American locations (Canada, USA and Mexico), but more countries will be added soon based on the openaddresses.io free data repo.
This and That and the Other street in Porters Lake Nova Scotia
Fulltext Geocoding references:
December-18-2015: RoofTop vs Interpolated
A rooftop result is when an exact match to a single address is found (this result is generally considered to be more accurate, because the location is placed at the middle of the "roof" of the property at that address, hence the name "rooftop").
An interpolated result is when the location is estimated to be between two known points (hence the name "interpolated").
A rooftop result always has Confidence Score = 1, while an interpolated result has a score less than 1. For example:
2525 Olympic ST, SPRINGFIELD, OR 97477 (Rooftop, Confidence Score: 1)
2524 Olympic ST, SPRINGFIELD, OR 97477 (Interpolated, Confidence Score: 0.9)
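To make the idea of an interpolated result concrete, here is a minimal sketch that assumes plain linear interpolation by house number between two known address points. This is an illustrative assumption only, not necessarily how the Geocoder.ca engine computes its estimates, and the function name and arguments are our own.

```shell
# interpolate_address N N1 LAT1 LON1 N2 LAT2 LON2
# Estimates the position of house number N on a street segment whose two
# known points are house number N1 at (LAT1, LON1) and N2 at (LAT2, LON2).
interpolate_address() {
  awk -v n="$1" -v n1="$2" -v lat1="$3" -v lon1="$4" \
      -v n2="$5" -v lat2="$6" -v lon2="$7" 'BEGIN {
    if (n2 == n1) exit 1                     # cannot interpolate on a zero-length range
    t = (n - n1) / (n2 - n1)                 # fraction of the way from N1 to N2
    printf "%.6f,%.6f\n", lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)
  }'
}
```

A house number halfway between the two known numbers lands at the midpoint of the segment, which is why an interpolated point can be off by a few houses when lots are unevenly sized.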
November-30-2015: Canadian Parcel Plan/Block/Lot Geocoding
Parcel Geocoding for Canada is now available using open data from NRCAN.
November-20-2015: North American Telephone Area Codes and Time Zones
We've incorporated North American Telephone Area Codes and Time Zones with reverse geocoding based on open data from UCLA.
For example: Area Code 506. These data are now provided in the API and in data products such as the Canadian Postal Code Datasets.
November-15-2015: TIGER/Line Shapefiles - New 2015 Shapefiles
We have integrated the TIGER/Line Shapefiles - New 2015 Shapefiles into the US geocoding engine.
(We convert the Tiger data into SQL then process it into our data structures. If you want to download a SQL version of the latest Tiger Line go to our Free Data Download Page.)
We've also updated our zip code polygons. Click here for the complete zip code compass.
Geocoder is expanding to include Mexico. Starting with reverse geocoding for Mexico (beta; e.g. http://geocoder.ca/19.4284665,-99.1685782?geoit=xml), we will gradually implement for Mexico all the geocoding functionality we currently offer for the USA and Canada, making geocoder.ca the most complete geocoding solution for North America (Canada, the United States and Mexico).
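The reverse geocoding URL in the example above follows a simple latitude,longitude pattern, so a call can be wrapped in a small helper. This is only a sketch based on that one example URL; the function names are ours, and the shape of the XML response is not shown here.

```shell
# reverse_geocode_url LAT LON
# Builds a reverse geocoding URL in the same form as the example above.
reverse_geocode_url() {
  printf 'http://geocoder.ca/%s,%s?geoit=xml\n' "$1" "$2"
}

# reverse_geocode LAT LON
# Fetches the XML response for the given point (requires curl and network access).
reverse_geocode() {
  curl -s "$(reverse_geocode_url "$1" "$2")"
}

# Usage: reverse_geocode 19.4284665 -99.1685782
```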
We are expanding our coverage to Europe, starting with Geocode.xyz, a free Geocoder for Spain based on free and open data, such as Spain's cadastral data released by Portal de la Direccion General del Catastro.
August-3-2015: Amazon Cloud Geocoder
We've partnered with Amazon Web Services to provide a cloud based geocoder.
AWS Marketplace Forward and Reverse Geocoder | API
July-29-2015: FoodPages Neighborhood Data
We have added the crowdsourced Foodpages.ca neighborhood data (for Canada) and the crowdsourced Foodpage.us neighborhood data (for the USA) to our main dataset.
(Available for download on foodpages.ca and foodpage.us)
June-19-2015: Neighborhood and Postal Code Polygons
We've cleaned up our postal code polygon data and fitted them nicely inside neighborhood polygons. Both datasets are now available for download.
(Some postal code polygons are very large; the largest of them all is J0K3K0. As a result, not every postal code polygon fits inside a neighborhood polygon.)
We've merged our neighborhood data with the excellent dataset from quattroshapes (http://quattroshapes.com/) to give you the widest coverage of North American neighborhoods on our reverse geocoding port.
June-11-2015: Free Data License Upgrade
We've upgraded our free data license to something less restrictive: This work is licensed under a Creative Commons Attribution 2.5 Canada License.
Here is what has kept us busy recently:
State of Geocoder.ca - More Features, Coverage & Accuracy
Expanding and improving our bulk geocoding service. It can now handle uploads of large files (>1M records), and it runs faster (by running the optimal number of parallel processes). Try it.
Address validation has been one of our earliest features (since 2005). We now return a "standard format" of an input location string with every query by default. This standardization has been expanded to other inputs (such as street intersections).
More data sources backing geocoder.ca. This is the most important addition. We have imported new datasets from municipalities (Toronto, Vancouver, Region of Peel, Niagara Falls, Halifax, etc). We now have an automated process that imports new data as they are released by these sources, converts them from their various cartographic projections to lat,lon and standardizes their address location entities. More Accurate Data = More Accurate Geocoding. And more coverage. (As an added bonus, many municipalities also publish their full list of municipal addresses, including the postal codes.)
Fun with Maps - Visualizing Data
The Geocoder.ca batch geocoding port converts your spreadsheet data into a map that can be shared publicly (or privately if you password-protect the map).
Use this tool to upload your files and visualize on a map anything that we can geocode.
Here are some datasets we have visualized using Batch Geocoding.
You may experiment with these maps by using our online mapping tool to set map styles and export maps in pdf or png formats.
A map visualizing the location of most frequent Geocoder.ca users (based on ip address)
Click to view IP Addresses in North America
A map visualizing free street parking.
Click to view Free Street Parking in Toronto
(Calculated using the city of Toronto street parking public data and geocoder's street address database).
(Wait a few seconds for the maps to render. There are over 50,000 points in each sample)
New Fuzzy Match
Try our new fuzzy match algorithm: 151 front st w, twrwntw, on or Intersection of Blue Ridge Parkway and Virginia Route 612, VA
PN, PP, NP, NN Codes for all
Let's solve the problem of zip/postal/etc codes around the world. Let's invent a simple, intuitive schema that works everywhere: a zip code based on the latitude/longitude pair of a place. For example, my zip code is P45x3691N75x7007, which translates neatly into 45.3691,-75.7007, the latitude longitude pair of my location. (Four digits after the decimal point are enough.)
In each code the first letter gives the sign of the latitude and the second letter the sign of the longitude: 'P' means positive and 'N' means negative. So, for someone in the southern hemisphere the leading 'P' becomes 'N', and so on. What could be so hard? At 16 symbols the code is 7 symbols longer than the 9-digit US ZIP+4 code; however, if we discount the letters ('P', 'N' and the two 'x' characters representing the decimal points) the difference is only 3! Not bad! Considering nobody can possibly claim copyright over this scheme. Or can they?
PS. Anticipating widespread acceptance, geocoder is currently supporting this new scheme: Check out my Geo-Code P45x3691N75x7007
(suggestions, improvements, enhancements of this scheme are welcome. Everything in this site is licensed under the Creative Commons (CC) License.)
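The scheme is simple enough to convert in both directions with a few lines of shell. The sketch below is our own reading of the example code P45x3691N75x7007; the function names are made up, and the fixed four decimal places follow the remark that four digits are enough.

```shell
# geocode_encode LAT LON
# Turns a latitude/longitude pair into a P/N code with 4 decimal places.
geocode_encode() {
  awk -v lat="$1" -v lon="$2" 'BEGIN {
    s = sprintf("%s%.4f%s%.4f",
                (lat < 0 ? "N" : "P"), (lat < 0 ? -lat : lat),
                (lon < 0 ? "N" : "P"), (lon < 0 ? -lon : lon))
    gsub(/\./, "x", s)                       # "x" stands in for the decimal point
    print s
  }'
}

# geocode_decode CODE
# Turns a P/N code back into "lat,lon"; returns non-zero on malformed input.
geocode_decode() {
  printf '%s\n' "$1" | awk '
    /^[PN][0-9]+x[0-9]+[PN][0-9]+x[0-9]+$/ {
      s = $0
      gsub(/x/, ".", s)                      # restore the decimal points
      sub(/^P/, "", s); sub(/^N/, "-", s)    # sign of the latitude
      sub(/P/, ",", s); sub(/N/, ",-", s)    # separator and sign of the longitude
      print s
      ok = 1
    }
    END { exit !ok }'
}
```

For the example in the text, geocode_decode P45x3691N75x7007 prints 45.3691,-75.7007, and geocode_encode 45.3691 -75.7007 gives the code back.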
The Geocoding Confidence Score is a number representing our accuracy estimate on a geocoding request. This number ranges from 0 to 1. A higher score indicates a closer match (a score of 1 meaning best match accuracy). A result with a confidence score less than 0.5 is never returned to the user (it will most likely result in a suggestion being returned), except for ip address geocoding, where rough approximations are allowed because in most cases we are looking at city level accuracy.
Geocoding Confidence Score
This number is now returned with every type of geocoding request. In the XML and Jsonp ports the new parameter is called confidence.
A recent random selection study of our ip address geocoding (which works only for ip addresses in North America), revealed that the estimation is about 80% accurate at city level, 50% accurate at street address level.
All the data used for ip address geocoding have been crowdsourced from the origin of accesses on our XML and Jsonp ports. If you have further questions or want to test this new service, let us know at email@example.com
Some Numbers on accuracy
Now available over SSL at https://geocoder.ca
Rooftop Geocoding
Rooftop geocoding is an often requested feature. We are rolling out a new version of the software for major Canadian and US cities, which defaults to rooftop geocoding wherever possible. For example: 1658 SILVERTREE, Ottawa, ON. The differences will often be subtle, with the rooftop option being the most accurate result compared to interpolation. A new parameter in the XML, Jsonp and CSV interfaces will mark the distinction in the result. Read the API documentation for more information.
Further Explanation - interpolation is the process of estimating where on the street a particular address location is. There are several upsides of the rooftop method, most importantly: The point will be properly placed on the middle of the roof of the given address as opposed to being placed in the middle of the road (which is often the case with interpolation.)
Another big advantage is better coverage. If a location belongs to a recently built house for instance, most current geocoders or gps devices will not find it, but the municipality has that data up to date because, well, everybody must pay the property tax.
Here is an example address that I was not able to find on other services 7636 GREEN VISTA GT , Niagara Falls, ON
The data for rooftop geocoding is provided by various open sources:
It is mostly pretty good data, but in Canada it is missing postal codes (with a few notable exceptions such as District of North Vancouver / http://geoweb.dnv.org/data/). If the response of a rooftop geocoding request includes a postal code, in most cases it has been estimated by our algorithm and it may not always be accurate. (This is an ongoing issue with all the geocoders I know of; Google has even resorted to giving just the first three characters of the postal code on an address query, as their failure rate was reportedly over 50%.)
The situation with the US data is slightly better, but still... Here is a recent blog post about that.
One last thing, you can also geocode locations directly on a map. Enjoy!
USA Address Geocoding Match Rates
Geocoding of USA locations remains our most used API with about 65% of all requests, in spite of the larger number of alternatives for accurate and high coverage US geocoding. Recently we ran a test to measure our coverage against a dataset of over 50 million USA addresses. Overall, about 94% of all addresses were geocoded with a level 1 confidence score, which is pretty good compared to the coverage of other major vendors.
We've added a new batch geocoding tool. You must have a Batch Geocoding Account and Log In to use this tool.
How it works: You upload a text only csv file -> We geocode and standardize the location information on each line -> You log in to download the processed file.
Batch geocoding is available for both USA locations, addresses and zip codes and for Canadian locations, addresses and postal codes.
Strict Mode
A common problem with geocoding is ambiguity. The geocoding engine often makes probabilistic choices when it receives ambiguous input, for example: Vancouver. However, when you don't want the geocoding engine to guess, you may use the strict mode. Here is an example that gives the two possibilities that may be returned: http://geocoder.ca/?locate=vancouver&geoit=xml&standard=1&topmatches=1&strictmode=1
Use the strict mode whenever possible. Or include the province or state name or abbreviation to remove the ambiguity.
Ambiguity in input. Several possible cities match your query.
I pointed out this example because it concerns two cities of similar size that are in close geographical proximity.
An even better example of ambiguity is Springfield.
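A strict mode query like the one above is easy to script. The helper below is a rough sketch that reuses exactly the parameters from the example URL; the function name is ours, and the encoding step only replaces spaces, so it assumes the location string contains no other special characters.

```shell
# strict_mode_url LOCATION
# Builds a strict mode query URL using the parameters from the example above.
strict_mode_url() {
  # Minimal encoding: spaces become "+". Other special characters are not handled.
  q=$(printf '%s' "$1" | sed 's/ /+/g')
  printf 'http://geocoder.ca/?locate=%s&geoit=xml&standard=1&topmatches=1&strictmode=1\n' "$q"
}

# Usage (requires curl and network access):
#   curl -s "$(strict_mode_url 'springfield')"
```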
Geocoding IP addresses
Geocoding of IP addresses is an oft requested feature. We have now collected sufficient data to generate our own ip geolocation database. As a result we are releasing a limited version of ip location services for Canada and the USA. (Sorry to those looking for more coverage; we have no data for countries outside of North America.)
While this is still an experimental feature, initial tests seem promising. Give it a try. We are looking forward to your feedback. (Just enter an ip address in the location parameter. The API remains the same as in the reverse geocoding API.)
Web Example | XML Example | Jsonp Example
Major Upgrade
A new version of the software has been released fixing many bugs, mostly in reverse geocoding. The new version also implements the "agent-based paradigm" of geocoding for better match rates in locations described by street addresses or street intersections.
We have also integrated the latest version of Statistics Canada Road Network Files. (released on 2013-07-22)
Crowdsourced Canadian Postal Code Polygon File
We have released our crowdsourced postal code polygon file under the terms of the Open Database License (ODbL).
This is one of our most requested new features.
Neighborhood Boundaries and Polygons
For now it will only work for the Toronto, Montreal and Vancouver greater metropolitan areas.
(for example: http://geocoder.ca/?moreinfo=1&latt=43.645119&longt=79.383839&reverse=Reverse+GeoCode+it%21 once there click on boundary to see the neighborhood polygon boundary.)
This feature is also available on the premium XML interface for the time being.
We have made our zip and postal code polygons available.
Lookup a zip code (USA) or postal code (Canada) and click on the code to view the polygon.
Zip & Postal Code Polygons
Geocoder.ca public web page has a new look!
More Coverage / New Datasets
New datasets from Statistics Canada have been merged with the existing database.
JSONp has been extended to support zip+4 codes in USA and also improved on existing functionality.
JSON has been added as an optional output option.
You can now perform batch lookups (standalone server only), retrieve all streets that cross a particular street, and much more... See our documentation section.
New datasets integrated into our software
June-06-2007: We have released a new standalone geocoding solution, emulating all the functionality of geocoder.ca in a closed network environment as a Turn-key solution.
Turn-key standalone geocoder for Canada and the USA
April-29-2007: Many improvements & additions
In addition to new functionality we are re-designing parts of the algorithm to improve performance. Stay tuned.
We have improved the coverage and quality of our geocoding engine by recognizing non-standard city names (e.g. NYC).
The reverse geocoding engine also returns the road or highway having the smallest vertical distance from the input point in addition to the nearest street civic address (e.g. This Point).
We are also adding more reverse geocoding info such as returning the closest two streets on either side of a given location. This feature is still in beta.
March-21-2007: The reverse geocoder for the USA has been extended to return the county and metro area information.
More info for Reverse Geocoding USA
Dec-14-2006: The reverse geocoder has been extended to return the nearest major street intersection to a given point, in addition to the nearest intersection for both the US and Canada.
Additions to Reverse Geocoding for Canada and USA
Sept-24-2006: We have added reverse geocoding functionality for the United States, expanding our coverage to most of North America (except Mexico). Similarly Forward geocoding for the US works as advertised on the API pages.
Reverse Geocoding and Forward Geocoding for USA
Sept-07-2006: Multiple suggestions for the xml port
Now you can enter a partial address and obtain a predefined number of suggestions. See the API page for details.
June-22-2006: Suggestion System Addition for mis-spelled street names
The suggestion system now works for street intersection queries also. In case one or two mistyped street names are entered and the system can not find a fairly close match, a suggestion may be returned with the correct spelling of both streets. See the API page for details.
June-19-2006: A new redundant backup server has been installed. Now the geocoder service is backed up at two geographically distinct locations. The address of the backup server is backup-geocoder.ca
More Backup and redundancy
June-16-2006: Reverse Geocoding Addition
The reverse geocoder can now return the nearest street corner for a latitude longitude pair.
If you are using the xml interface you must send the corner=1 parameter to obtain nearby street intersection information. See the API documentation for more information.
March-25-2006: More Intersections
You can now locate the intersection of Highways with other Highways or streets.
Feb-25-2006: Standard Address Formatting
We now return the standard address formatting with a successful request. If you use the xml api to obtain geocodes, use the parameter standard=1 to obtain a properly formatted address too. See the API page for details.
Feb-01-2006: Alternate Spelling of road names and city names
Use our suggestion system. If we can not find a mis-spelled location we will provide a suggestion with the correct spelling.
Jan-21-2006: NEW! - Backup and Redundancy of the real-time xml port
We have added a backup service (backup-geocoder.ca) at a location independent from our main ISP. What that means is that in the unlikely event of the main server (geocoder.ca) being offline, you can redirect your queries to the redundant backup service.
Note: The backup location will _only_ process queries in the event of the main service being off-line. Otherwise it will simply re-direct your query to geocoder.ca. This is only intended to give more "peace of mind" to those who wish to integrate their applications in real-time with our geocoding port and are worried about a possible service outage. To use the backup service, add a few extra lines to your application so that it calls the backup service when it receives a networking error from the main service. _DO NOT_ use the backup service when the main service can not locate a particular address. Both the main service and the backup service are identical in that respect!
Contact us if you need further explanation.
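The "few extra lines" for calling the backup service might look roughly like the sketch below. It assumes the backup host accepts the same query string as the main xml port, that the location string is already URL-encoded, and that a non-zero curl exit status is a good enough signal of a networking error; the function names are ours.

```shell
# fetch URL
# Thin wrapper around curl. Without -f, curl only fails on transport-level
# errors (DNS, connection refused, timeout), not on "address not found" replies.
fetch() {
  curl -s --max-time 10 "$1"
}

# geocode_with_backup QUERY
# Tries the main service first and falls back to the backup host only when
# the request to the main service fails at the network level.
geocode_with_backup() {
  for host in geocoder.ca backup-geocoder.ca; do
    if fetch "http://${host}/?locate=${1}&geoit=xml"; then
      return 0
    fi
  done
  return 1
}

# Usage: geocode_with_backup 'vancouver'
```

A reply saying that the address could not be located still counts as a successful request here, so the backup is never consulted for it, in line with the note above.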
Oct-01-2005: Cross Street Geocoding!
Now you can also geocode street intersections.
Sept-11-2005: Due to popular demand we have added reverse geocoding.