July 06, 2015

Planning for Relocation - Phase II

Geocoding itself was actually straightforward. Google’s Maps APIs are surprisingly easy to use and have the advantage of not requiring an API key (unlike their Bingified competition), so getting the geocoding data was not really a problem.
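
For reference, a minimal sketch of what a geocoding lookup could look like in Python. The endpoint and the `results[0].geometry.location` response shape are those of Google's Geocoding web service. The helper names `geocode_url` and `parse_location` are made up for illustration, and the `api_key` parameter is optional only because no key was needed when this was written (that may no longer hold).

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, api_key=None):
    """Build the request URL for one address (hypothetical helper)."""
    params = {"address": address}
    if api_key:  # assumption: include a key only if one is supplied
        params["key"] = api_key
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def parse_location(response_text):
    """Return (lat, lng) from a geocoding JSON response, or None on failure."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])
```

Fetching the URL is left out on purpose; any HTTP client would do, and the response body can be handed straight to `parse_location`.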

However, since I’m going to be doing this with an actual web browser (since I’ll be getting employment locations as text in all likelihood), I’ll probably need to redefine the phases as:

  • Phase I: Actually get the school data for the US. Since I’m not likely to move to the EU, we can skip that for now.
  • Phase II: Geocode the schools.
  • Phase III: Build a fun Chrome plugin that {can load the geo data | preloads the geo data} and {shoves it into a kd-tree or some other distance-query friendly structure | brute forces all the things} to find the nearest school to a given relocation city, returning the distance of the best match (in time to drive, mi. or whatever)
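
The "brute force all the things" option in Phase III can be sketched in a few lines. This is Python rather than plugin code, and it is only an illustration: the function names are hypothetical, the school entries in the usage note are placeholders rather than the real data set, and the result is straight-line (great-circle) distance, not drive time.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_school(city, schools):
    """Scan every school and return (name, distance_km) of the closest one."""
    return min(
        ((name, haversine_km(city, loc)) for name, loc in schools.items()),
        key=lambda pair: pair[1],
    )
```

For example, `nearest_school((37.7749, -122.4194), {"Mountain View": (37.3861, -122.0839), "Houston": (29.7604, -95.3698)})` picks the Mountain View entry. With only about a hundred schools a linear scan like this is probably plenty; a kd-tree would only start to matter with a much larger set.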

Also, since I had the data anyway, I decided to do something a little more interesting: Actually generate maps of the schools as is.

I’ve never actually seen a real distribution of where the current schools are located. I do know that the SF Bay has a high concentration of schools (because I needed to find a school to play after I started work traveling back to Mountain View), and there’s probably a similar effect near Houston.

Once I started digging into it, though, the static map APIs for both Bing and Google are limited in what they can provide:

  • Google’s Static Map API is limited to request URLs of 2048 characters. Some comical math says that I’m guaranteed 65 markers per request. Beyond that, I would need to recheck string lengths to squeeze out the maximum value, or if I was feeling particularly fanciful, throw it all away and use an approximation algorithm for bin packing.
  • Bing’s Imagery REST API is limited to 18 pushpins for requests using GET, but will do up to 100 using a POST and request body. However, the fact that you need to specify either the center point or bounding box manually coupled with a zoom level is… undesirable.
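
The "recheck string lengths" approach for the Google limit can be sketched as a greedy split: keep appending markers to a URL until the next one would push it past the limit, then start a new map. This is an illustrative sketch, not the script actually used here. `build_static_map_urls` is a made-up name, the `size=640x640` value and six-decimal coordinate format are assumptions, and the 2048 default is simply the limit quoted above.

```python
# Static Maps endpoint with one `markers` parameter; locations are joined by
# a URL-encoded pipe ("%7C").
BASE = "https://maps.googleapis.com/maps/api/staticmap?size=640x640&markers="
SEP = "%7C"


def build_static_map_urls(points, max_len=2048):
    """Greedily pack (lat, lng) points into as few URLs as the limit allows."""
    urls = []
    current = []             # marker tokens for the URL being built
    current_len = len(BASE)  # running length of that URL
    for lat, lng in points:
        token = f"{lat:.6f},{lng:.6f}"
        extra = len(token) + (len(SEP) if current else 0)
        if current and current_len + extra > max_len:
            # This marker would not fit: close the URL and start a new one.
            urls.append(BASE + SEP.join(current))
            current, current_len = [], len(BASE)
            extra = len(token)
        current.append(token)
        current_len += extra
    if current:
        urls.append(BASE + SEP.join(current))
    return urls
```

Since markers are taken in list order, sorting the points first (say, by longitude) would keep each map geographically tighter. That is a design choice layered on top of the sketch rather than something it does.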

Since there are 106 schools, I’m pretty much resigned to using two maps. So, because it’s easier, Google wins again. Bing, I am disappoint.

The results are pretty self-explanatory, but some of the things that I immediately notice are:

  • Schools cluster a lot more than I thought. The OK/AR cluster I wouldn’t have expected at all, and there’s a handful of others as well. To be sure, there is a small handful of people who own two schools, but that’s definitely not the norm.
  • I’ve always heard that WKSA considers the MN school somewhat isolated. That’s true, but it has nothing on Montana, which has a solid one-state buffer from everyone else (although there are a few Canadian schools). It ends up having the unfortunate honor of being the only school on Mountain time.
  • Sault Ste. Marie, MI looks like a bug at first glance, but it’s a real town. I feel warm and fuzzy knowing it’s there.
  • The mountainous regions are surprisingly barren. Even places that I would expect people to eventually migrate to as they move from tech community to tech community (there are a ton of schools in SF and two in Austin) have zero presence (e.g. Seattle, Boulder).
  • From the school list, I always kinda assumed that the eastern seaboard didn’t have many schools at all. But that perception seems to come from those schools being split across 3 of the 4 regional designations, because there is a fairly nice spread.

And for some pretty pictures:

KSW School Map 1

KSW School Map 2

As a side note, if I ever add more regions, the maps would probably vary so much in zoom that simply splitting by the marker limit would no longer make sense (the likely scenarios being mixed US / Korea and Korea / UK sets, among others). However, before I do that I would probably re-export the data to include a higher-level region designation for the US (e.g. Central, Southern) and work it that way. Korea would be similar, since it has an eyeballed count of ~100 schools, but the rest should be okay since the UK doesn’t have the density of Korea.