Saturday, March 10, 2012

Location Geocoding (wonky, as Krugman would say)

Do any of you venuologists (or anyone else) have any ideas about how best to codify the geospatial data attached to the locations we love?

So let's say, hypothetically, I have a giant spreadsheet full of concerts and such. The temporal data are pretty well coded. For many of the events of interest, I have address, city, state and zip.

What's the best, quickest, and most fungible way to code in geospatial data? I guess I am thinking GPS coordinates.

Relatedly, I just tried the following process and got the following results.

1268 Sutter Street, San Francisco, CA, 94109

Google Maps: http://maps.google.com/maps?q=1268+Sutter+Street,+San+Francisco,+CA&hl=en&ll=37.787624,-122.421319&spn=0.010582,0.026157&sll=37.788929,-122.421319&sspn=0.010581,0.026157&oq=1268+Sutt&hnear=1268+Sutter+St,+San+Francisco,+California+94109&t=h&z=16

GPS Visualizer: http://www.gpsvisualizer.com/geocode, returns latitude-longitude of 37.787619, -122.4213121.

When I plug that back into Google Maps, it takes me to 1244 Sutter Street.

Ultimately, I am interested in street addresses. But since some of them may no longer exist, it seems to me that some kind of geospatial coding is preferable. Still, I worry that on my very first try I got a result like the one above, one that could really mess me up.

Help?


7 comments:

  1. I think it depends on what you want to do with the resultant lat/lon data. Are you looking just to display maps? You can process the data through a service like http://batchgeo.com/. You can also send me the data and I will code it up for you.

    What you saw in the Google service is not that uncommon. It is one of those interesting little issues with location. When geocoding against street data, the location is interpolated along the street segment. Say I have a segment numbered 100-198 and an address of 148. The location is found by taking the length of the segment and moving 49% of the way along it from the 100 endpoint, since 148 is 48 numbers into a 98-number range. When you plug the resulting lat/lon back into the system, you may get back 148 or something near it. The gap between 1268 and 1244 in your example is fairly large, but it could come from how the address ranges are recorded on that segment.

    --sk

  2. Thank you, sk. And thanks for the offer of help. I'll probably take you up on it.

    I am torn between different purposes. The cultural/social history of it demands the address as it was at the time of the event. So, 1268 Sutter Street was the location of the Avalon Ballroom, and I want to be able to say that that's where, e.g., the Grateful Dead played on 10/12/68 or whatever.

    But of course city grids and addressing schemes change, buildings get torn down, streets get reconfigured, etc., so a second criterion is the actual physical location. Think of this as GPS coordinates that would allow me to walk to the spot.

    The eventual possible uses are myriad, but mostly various kinds of mapping. Dynamic tour maps are one possibility. Being able to do some density maps (for example, a map of the Bay Area with venues represented as circles, and the size of the circle indicating the frequency with which Garcia appeared there). Etc. Your general spatial and mapping kinds of stuff.

    Thanks again!

  3. Geocoding error is a common problem even with current address data. With historical data the error rate will be higher. I'm also willing to help with the geocoding. Actually, grabbing coordinates from 2 sources might be a good way to validate accuracy. Let me know if you'd like me to do some geocoding.

    I could also help with the map animation if that's of interest. For a sample map animation take a look at this post:

    http://justinholman.com/2012/03/07/visualizing-unemployment-dynamics/

    The example above involves county level data but it wouldn't be difficult to do something similar with point data.

    Cheers!
    Justin

  4. Justin, this is really helpful and generous. Thank you!

    I will indeed also take you up on your offer to help.

    I am out of town for a few weeks, which may mean I have no time for all of this, or lots of time. I just can't tell yet. Probably closer to the former than to the latter. But I do hope to be in touch ca. first week of April at the latest.

    Thanks again!

  5. This comment has been removed by the author.

  6. "...a map of the Bay Area with venues represented as circles, and the size of the circle indicating the frequency with which Garcia appeared there)."

    You want GE Graph. GE Graph lets you upload an Excel spreadsheet with labels (dates, venue name, whatever), data (number of times played), and lat/lon. You can then present the data in Google Earth as circles/polygons at each venue location, with sizes and/or colors varying by number of shows. You can also create 3-D columns whose heights are based on that number.
    http://www.sgrillo.net/googleearth/gegraph.htm

  7. Wonderful, thank you!

    I will follow up in a few weeks, IM.

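
sk's description of address interpolation in the first comment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Google or any particular geocoder actually works. It assumes a straight street segment whose two ends carry the lowest and highest house numbers. It also treats latitude/longitude as flat x/y coordinates, which is only reasonable over one short block. The segment endpoints below are made up. The hypothetical `reverse_interpolate` helper does the opposite step: it projects a point onto the segment and reads off a house number, roughly what happens when a coordinate is plugged back into a map. On a single segment the round trip returns the same number. A discrepancy like 1268 versus 1244 would presumably need the two lookups to use different segment data.

```python
def interpolate(start, end, lo, hi, number):
    """Place a house number along a straight segment by linear interpolation.

    start/end are (lat, lon) pairs for the ends carrying house numbers lo/hi.
    Returns (lat, lon, fraction); fraction is 0.0 at the lo end, 1.0 at the hi end.
    """
    fraction = (number - lo) / (hi - lo)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon, fraction


def reverse_interpolate(start, end, lo, hi, point):
    """Project a (lat, lon) point onto the segment and estimate its house number.

    Assumption: lat/lon treated as planar (fine only for a short segment).
    Rounds to the nearest whole number; real systems also track odd/even sides.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    px, py = point[0] - start[0], point[1] - start[1]
    t = (px * dx + py * dy) / (dx * dx + dy * dy)
    t = min(max(t, 0.0), 1.0)  # clamp so the result stays on the segment
    return round(lo + t * (hi - lo))


# Hypothetical segment numbered 100-198 with made-up endpoint coordinates.
seg_start, seg_end = (37.0, -122.0), (37.0, -121.999)
lat, lon, frac = interpolate(seg_start, seg_end, 100, 198, 148)
print(f"{frac:.0%} of the way along")  # 48/98 of the segment
print(reverse_interpolate(seg_start, seg_end, 100, 198, (lat, lon)))
```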
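
The author's second reply names two criteria: the address as it was at the time of the event, and the present-day physical spot. That suggests keeping both in each record rather than replacing one with the other. The sketch below is a minimal Python version of that idea. The class and field names are hypothetical. Coordinates stay empty until they have been geocoded and checked, and a field records where they came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VenueLocation:
    """One venue, keeping the historical address and the physical spot separate."""
    name: str
    address_at_event: str               # address as it was when the shows happened
    lat: Optional[float] = None         # present-day latitude, decimal degrees
    lon: Optional[float] = None         # present-day longitude, decimal degrees
    coord_source: Optional[str] = None  # e.g. which geocoder, or a manual check

    def has_coordinates(self) -> bool:
        return self.lat is not None and self.lon is not None


avalon = VenueLocation("Avalon Ballroom", "1268 Sutter Street, San Francisco, CA 94109")
print(avalon.has_coordinates())  # False until coordinates are filled in
```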
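
For the density map in that same reply, one common choice is to scale each circle's area, rather than its radius, with the number of appearances. Under that rule, a venue with four times as many shows gets a circle twice as wide. The sketch below follows that assumption. The event list, the venue names, and `max_radius` are made up for illustration.

```python
import math
from collections import Counter


def circle_radii(events, max_radius=50.0):
    """Map each venue to a radius whose circle area is proportional to its show count.

    events: iterable of (date, venue) pairs. max_radius is the radius given to the
    most frequently played venue, in whatever units the map uses.
    """
    counts = Counter(venue for _date, venue in events)
    most = max(counts.values())
    return {venue: max_radius * math.sqrt(n / most) for venue, n in counts.items()}


# Hypothetical data: Venue A hosted four shows, Venue B one.
events = [("1968-01-01", "Venue A"), ("1968-02-01", "Venue A"),
          ("1968-03-01", "Venue A"), ("1968-04-01", "Venue A"),
          ("1968-05-01", "Venue B")]
print(circle_radii(events))  # {'Venue A': 50.0, 'Venue B': 25.0}
```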
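
Justin suggests grabbing coordinates from two sources to validate accuracy. That check can be automated by measuring how far apart the two answers are and flagging anything beyond a tolerance. The sketch below uses the haversine great-circle distance on a spherical Earth. The 25-meter tolerance is an arbitrary assumption. The sample values are the two pairs quoted in the post:

- the `ll` parameter from the Google Maps URL, which may be the map's center rather than a geocoded point;
- the GPS Visualizer result.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def sources_agree(a, b, tolerance_m=25.0):
    """True if two (lat, lon) results from different sources are within tolerance_m."""
    return haversine_m(a[0], a[1], b[0], b[1]) <= tolerance_m


google_ll = (37.787624, -122.421319)        # ll= value from the Google Maps URL in the post
gps_visualizer = (37.787619, -122.4213121)  # GPS Visualizer result from the post
print(round(haversine_m(*google_ll, *gps_visualizer), 2))
print(sources_agree(google_ll, gps_visualizer))
```

The two quoted pairs come out less than a meter apart, so they agree with each other. That is consistent with the 1244 surprise arising on the reverse (coordinate-to-address) lookup rather than from the two sources disagreeing, though it does not prove it.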
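
IM's GE Graph suggestion needs the venue labels, show counts, and latitude/longitude in one table. The column names and order below are assumptions, not GE Graph's documented format, so check its instructions before uploading. The sketch writes a CSV, which Excel opens directly. The `write_venue_table` helper and the rows are hypothetical.

```python
import csv
import io


def write_venue_table(rows, out):
    """Write (venue, show_count, lat, lon) rows as CSV with a header line.

    Column names are hypothetical; rename or reorder them to match the target tool.
    """
    writer = csv.writer(out)
    writer.writerow(["venue", "shows", "lat", "lon"])
    for venue, shows, lat, lon in rows:
        writer.writerow([venue, shows, lat, lon])


# Hypothetical rows; real coordinates would come from the geocoding step.
buf = io.StringIO()
write_venue_table([("Venue A", 4, 37.0, -122.0), ("Venue B", 1, 37.5, -122.5)], buf)
print(buf.getvalue())
```

To write a file instead of a string buffer, pass `open("venues.csv", "w", newline="")` as `out`. The Python `csv` documentation recommends `newline=""` for files handed to a CSV writer.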
