Using Google Maps for Geocoding Locations

August 17, 2011

If you need coordinates for a thousand addresses, don’t rush into writing your own service; you can use Google for it. Append the address you want WGS-84 coordinates for to the URL http://maps.google.com/maps/geo?q= and parse the response.
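
As a minimal sketch of building such a request URL, the snippet below percent-encodes the whole address rather than only replacing spaces, so commas and non-ASCII letters are also safe. It is written for Python 3; the helper name build_geocode_url and the sample address are made up for illustration, and only the base URL comes from this post.

```python
from urllib.parse import quote

# Base URL quoted in the post.
GEOCODE_BASE_URL = "http://maps.google.com/maps/geo?q="


def build_geocode_url(address):
    """Return the request URL for one address (hypothetical helper).

    Every character that is not safe in a query string is percent-encoded,
    and surrounding whitespace such as a trailing newline is dropped.
    """
    return GEOCODE_BASE_URL + quote(address.strip(), safe="")


# Made-up sample address, used only to show the encoding.
print(build_geocode_url("Narva mnt 5, Tallinn\n"))
```

Stripping the address first matters when it is the last column of a CSV line, because the line ending would otherwise end up inside the URL.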

I wanted to geocode 9000 locations, so I wrote a script to do it using the URL above.

The main requirements were:

  • As input, it reads a semicolon-separated CSV file with the ID in the first column and the address in the second.
  • As output, it writes a CSV file with the ID in the first column and the coordinates in the second and third columns.
  • If an address returns two or more results, it should not be geocoded, because the match is ambiguous:
    if pageXML.find('"id": "p2"') == -1:
  • Google sometimes rejects a request and returns no data, which shows up as a missing first result:
    if pageXML.find('"id": "p1"') != -1:

    When that happens, the address should be geocoded again. For that I used a loop that keeps running until a pass brings no more changes, tracked by the line count of the file of skipped addresses:

    currentLineCount != newFileLineCount
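
The two checks above, plus the coordinate extraction, can be sketched as one small function. This is only an illustration in Python 3 using the same substring markers as the post; the name classify_response and the sample response fragments are hypothetical and are shaped after nothing more than those markers.

```python
def classify_response(text):
    """Sort a raw response into one of three outcomes (hypothetical helper).

    Returns ("retry", None)     when there is no first result,
            ("ambiguous", None) when there is a second result,
            ("ok", (x, y))      when there is exactly one result.
    """
    if '"id": "p1"' not in text:
        return "retry", None
    if '"id": "p2"' in text:
        return "ambiguous", None
    marker = '"coordinates": [ '
    start = text.find(marker)
    end = text.find(', 0 ]', start)
    if start == -1 or end == -1:
        # A result without readable coordinates is treated like a rejection.
        return "retry", None
    x, y = text[start + len(marker):end].replace(' ', '').split(',')
    return "ok", (float(x), float(y))


# Made-up fragments that contain only the markers the post relies on.
single = '{"id": "p1", "Point": {"coordinates": [ 24.7536, 59.4370, 0 ]}}'
double = single + ' {"id": "p2", "Point": {"coordinates": [ 26.7, 58.4, 0 ]}}'

print(classify_response(single))
print(classify_response(double))
print(classify_response('{}'))
```

Keeping the three outcomes separate makes it explicit that ambiguous addresses are dropped for good, while rejected ones go back into the queue.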

Here is the source code, with the file paths hardcoded:

import urllib2
import os

currentLineCount = 0
newFileLineCount = 0
first = 1
fInput = open('C:\\EST_Addresses.csv', 'r')
fOutput = open('C:\\EST_Addresses_xy.csv', 'w')
fTempFile = open('C:\\EST_Addresses_temp.csv', 'w')

i = 1
j = 1
while (currentLineCount != newFileLineCount) or first:

        currentLineCount = newFileLineCount
        newFileLineCount = 0
        if (not first):
            fInput = open('C:\\EST_Addresses_skipped.csv', 'r')
            fOutput = open('C:\\EST_Addresses_xy.csv', 'a')
            fTempFile = open('C:\\EST_Addresses_temp.csv', 'w')

        j = 1
        for line in fInput:
            try:
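                # Columns are ';'-separated: ID first, address second.
                # Strip the line ending so it does not end up in the URL.
                fields = line.rstrip('\r\n').split(';')
                targetID = fields[0]
                address = fields[1]
                address = address.replace(' ', '%20')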
                url = "http://maps.google.com/maps/geo?q=" + address

                page = urllib2.urlopen(url)
                pageXML = page.read()
                page.close()

                if pageXML.find('"id": "p1"') != -1:
                    if pageXML.find('"id": "p2"') == -1:
                        skip = len('"coordinates": [ ')
                        coords = pageXML[pageXML.find('"coordinates": [ ') + skip:  pageXML.find(', 0 ]')]
                        xCoord, yCoord = (coords.replace(' ', '')).split(',')
                        fOutput.write(targetID+ ';' + str(xCoord) + ';' + str(yCoord) + '\n')
                        print "input added! " + str(j)
                        j+= 1
                else:
                    fTempFile.write(line)
                    newFileLineCount += 1
            except:
                print line
                raise

        first = 0
        fInput.close()
        fOutput.close()
        fTempFile.close()
        if os.path.exists('C:\\EST_Addresses_skipped.csv'):
            os.remove('C:\\EST_Addresses_skipped.csv')
        # The skipped addresses become the input of the next pass
        os.rename('C:\\EST_Addresses_temp.csv', 'C:\\EST_Addresses_skipped.csv')

        print "iteration " + str(i) + " has " + str(newFileLineCount) + ' ungeocoded targets left'
        i += 1
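
For comparison, here is a compact in-memory sketch of the same retry idea in Python 3, without the temporary files. It is not the script above rewritten: the function geocode_until_stable, the geocode_one callback and the fake geocoder in the demo are all hypothetical, and the loop stops as soon as a pass leaves the number of pending rows unchanged.

```python
def geocode_until_stable(rows, geocode_one):
    """Repeat passes over the pending rows until a pass changes nothing.

    rows        -- list of (target_id, address) tuples
    geocode_one -- hypothetical callable: address -> ("ok", (x, y)),
                   ("ambiguous", None) or ("retry", None)
    Returns (results, pending): results maps target_id to (x, y); pending
    lists the rows that were still rejected when the loop stopped.
    """
    results = {}
    pending = list(rows)
    previous_count = None
    while pending and len(pending) != previous_count:
        previous_count = len(pending)
        still_pending = []
        for target_id, address in pending:
            status, coords = geocode_one(address)
            if status == "ok":
                results[target_id] = coords
            elif status == "retry":
                still_pending.append((target_id, address))
            # "ambiguous" rows are dropped, as in the script above
        pending = still_pending
    return results, pending


# Fake geocoder for the demo: "B" is rejected once, "nowhere" always.
calls = {}

def fake_geocode(address):
    calls[address] = calls.get(address, 0) + 1
    if address == "nowhere":
        return "retry", None
    if address == "Main St":
        return "ambiguous", None
    if address == "B" and calls[address] == 1:
        return "retry", None
    return "ok", (1.0, 2.0)


rows = [("1", "A"), ("2", "B"), ("3", "Main St"), ("4", "nowhere")]
results, pending = geocode_until_stable(rows, fake_geocode)
print(results)
print(pending)
```

Passing the geocoder in as a callback keeps the loop testable without any network access.
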
2 Comments
  1. Vytautas
    September 7, 2011 17:21

    Google will block you after some time, try not to abuse it

    • September 18, 2011 10:15

      That’s true, but the ban is temporary: it will be removed after 24 hours
