Using Google Maps for Geocoding locations
If you need coordinates for a thousand addresses, don’t rush to write your own service: you can use Google for it. Append the address you want WGS-84 coordinates for to the URL http://maps.google.com/maps/geo?q= and read the response, which lists each match with an id ("p1", "p2", ...) and its coordinates.
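Building the request URL is the first step. Here is a minimal sketch in Python 3. It uses urllib.parse.quote rather than a plain replace(' ', '%20'), so commas and non-ASCII letters (such as the Estonian ä) are percent-encoded as well. The helper name geocode_url is hypothetical, and the function only builds the URL. It does not call the service.

```python
from urllib.parse import quote

# Base URL of the geocoder used in this post
GEOCODER_URL = 'http://maps.google.com/maps/geo?q='


def geocode_url(address: str) -> str:
    """Return the request URL for one address (hypothetical helper)."""
    # quote() percent-encodes spaces, commas and non-ASCII characters (as UTF-8);
    # strip() drops the trailing newline left over from reading a file line
    return GEOCODER_URL + quote(address.strip())
```

For an illustrative address, geocode_url('Pärnu mnt 10, Tallinn') returns http://maps.google.com/maps/geo?q=P%C3%A4rnu%20mnt%2010%2C%20Tallinn.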
I wanted to geocode 9000 locations, so I wrote a script that does it using the URL above.
The main principles were:
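- As input, it reads a CSV file with the ID in the first column and the address in the second (columns are separated by semicolons).
- As output, it writes a CSV file with the ID in the first column and the coordinates in the second and third columns (longitude first, then latitude, in the order Google returns them).
- If an address returns two or more results, it is not geocoded, because the match is ambiguous. A second result shows up as the id "p2", so an address is only accepted if: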
if pageXML.find('"id": "p2"') == -1:
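- It keeps running until there are no more changes. Google sometimes rejects a request without giving any reason, and then the response does not even contain the first result "p1". A response is therefore only processed if:
if pageXML.find('"id": "p1"') != -1:
Otherwise the address is saved to a file of skipped addresses and geocoded again in the next pass. The passes continue until one leaves as many skipped addresses as the pass before it. This is tracked by comparing line counts: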
currentLineCount != newFileLineCount
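The two checks above can be combined into one function that decides what happens to an address. This is a minimal sketch in Python 3. It applies the same string tests as the script (presence of "p1" and "p2") and cuts the coordinates out of the text in the same way. The function name classify and the status labels are hypothetical. The sample string used to exercise it is made up and only imitates the fragments the script searches for.

```python
from typing import Optional, Tuple

# Text fragments the script relies on to locate the coordinates
COORDS_START = '"coordinates": [ '
COORDS_END = ', 0 ]'


def classify(page: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Apply the script's rules to one response (hypothetical helper).

    Returns ('retry', None) when the first result "p1" is missing,
    ('ambiguous', None) when a second result "p2" exists,
    and ('ok', (x, y)) when there is exactly one result.
    """
    if page.find('"id": "p1"') == -1:
        # Rejected or empty response: geocode the address again in the next pass
        return 'retry', None
    if page.find('"id": "p2"') != -1:
        # Two or more matches: skip the address instead of guessing
        return 'ambiguous', None
    start = page.find(COORDS_START) + len(COORDS_START)
    end = page.find(COORDS_END, start)
    x, y = page[start:end].replace(' ', '').split(',')
    return 'ok', (x, y)
```

For the illustrative string {"Placemark": [ {"id": "p1", "Point": { "coordinates": [ 24.7536, 59.4370, 0 ] } } ]}, classify returns ('ok', ('24.7536', '59.4370')). Like the script, it depends on the altitude being written as 0. Parsing the response with json.loads would be sturdier, but that requires knowing the exact response layout, which this post does not show.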
Here is the (very hardcoded) source code:
import os
import urllib.parse
import urllib.request

# Hardcoded file names: the input, the result, and the files used to
# carry rejected addresses over to the next pass
INPUT_FILE = 'C:\\EST_Addresses.csv'
OUTPUT_FILE = 'C:\\EST_Addresses_xy.csv'
TEMP_FILE = 'C:\\EST_Addresses_temp.csv'
SKIPPED_FILE = 'C:\\EST_Addresses_skipped.csv'
GEOCODER_URL = 'http://maps.google.com/maps/geo?q='
COORDS_START = '"coordinates": [ '

currentLineCount = 0
newFileLineCount = 0
first = True
i = 1

# Stop when a pass leaves as many rejected addresses as the previous one
while first or currentLineCount != newFileLineCount:
    currentLineCount = newFileLineCount
    newFileLineCount = 0
    # The first pass reads all addresses, later passes only the rejected ones
    fInput = open(INPUT_FILE if first else SKIPPED_FILE, 'r')
    fOutput = open(OUTPUT_FILE, 'w' if first else 'a')
    fTempFile = open(TEMP_FILE, 'w')
    j = 1
    for line in fInput:
        try:
            fields = line.rstrip('\n').split(';')
            targetID = fields[0]
            # Percent-encode spaces and non-ASCII characters in the address
            address = urllib.parse.quote(fields[1])
            page = urllib.request.urlopen(GEOCODER_URL + address)
            pageXML = page.read().decode('utf-8')
            page.close()
            if pageXML.find('"id": "p1"') != -1:
                # Accept only addresses with exactly one result
                if pageXML.find('"id": "p2"') == -1:
                    start = pageXML.find(COORDS_START) + len(COORDS_START)
                    coords = pageXML[start:pageXML.find(', 0 ]', start)]
                    xCoord, yCoord = coords.replace(' ', '').split(',')
                    fOutput.write(targetID + ';' + xCoord + ';' + yCoord + '\n')
                    print('input added! ' + str(j))
                    j += 1
            else:
                # Rejected by Google: keep the line for the next pass
                fTempFile.write(line)
                newFileLineCount += 1
        except Exception:
            print(line)
            raise
    first = False
    fInput.close()
    fOutput.close()
    fTempFile.close()
    # The rejected addresses become the input of the next pass
    os.replace(TEMP_FILE, SKIPPED_FILE)
    print('iteration ' + str(i) + ' has ' + str(newFileLineCount) + ' ungeocoded targets left')
    i += 1
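The retry strategy of the script above can also be written without temporary files. This is a sketch in Python 3 under a few assumptions. The skipped rows are kept in a list. The geocoder is passed in as a function, so the loop can be tested without network access. Both geocode_all and the geocoder argument are hypothetical. A real geocoder would fetch the URL for each address and return ('ok', (x, y)), ('ambiguous', None) or ('retry', None).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed geocoder result: ('ok', (x, y)), ('ambiguous', None) or ('retry', None)
Result = Tuple[str, Optional[Tuple[str, str]]]
Row = Tuple[str, str]  # (id, address)


def geocode_all(rows: List[Row],
                geocode: Callable[[str], Result]
                ) -> Tuple[Dict[str, Tuple[str, str]], List[Row]]:
    """Geocode rows, retrying rejected ones until a pass makes no progress.

    Returns the coordinates found per id and the rows still rejected.
    """
    results: Dict[str, Tuple[str, str]] = {}
    pending = list(rows)
    previous_count = None
    # Stop when nothing is left or a pass leaves as many rows as it started with
    while pending and len(pending) != previous_count:
        previous_count = len(pending)
        retry: List[Row] = []
        for target_id, address in pending:
            status, coords = geocode(address)
            if status == 'ok':
                results[target_id] = coords
            elif status == 'retry':
                retry.append((target_id, address))
            # 'ambiguous' rows are dropped, as in the script
        pending = retry
    return results, pending
```

Like the script, this stops once a pass resolves or drops no addresses. One difference is that the script runs a second pass even when the very first pass rejects every address. This sketch stops right away in that case.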