With machine learning, you’ll know every city as if you grew up there

Nathan Rooy

September 28, 2017

Data science

How do we know what an area is like before going there? We could spend hours researching on the internet, ending up with dozens of open browser tabs and only a vague understanding of what the area might be like.

At Spatial, we’re die-hard urbanists. We love everything about cities: the unique places to get food, the plentiful chances to catch live music, the green spaces and parks, and everything in between. But we haven’t lived in every city, so we couldn’t possibly tell you where to find the best sushi joint or dog park in each one. Between the nine people currently at Spatial, these are the cities we have lived in within the past ten years.

So what does this chart mean? Yes, Cincinnati is among the greatest cities on earth, but that’s beside the point. What should be obvious is that we have spent a lot of time in Cincinnati. Most of us live downtown, while a few are spread out in nearby neighborhoods. Collectively, we know Cincinnati inside and out. Wouldn’t it be great if we could somehow leverage that local knowledge to better understand an area other than Cincinnati that we wanted to explore? Or, more generally, if you could somehow map knowledge of a city you know very well onto a city you know nothing about? Enter the universe of vector space…

Word vectors are nothing new. They have existed within the world of machine learning and natural language processing (NLP) for several years and come in many different flavors. All of them, however, share a similar objective: to vectorize a string of text in a way that preserves word associations and meanings. Some interesting uses of word vectors can be found here:

The outcome of all this is that we can then perform vector operations on natural language. The most common example of word vector arithmetic is:

King - Man + Woman = Queen

What does the above statement mean? It means that when you start with the word vector for “King”, subtract the word vector for “Man”, and add the word vector for “Woman”, the resulting vector is most similar to the word vector for “Queen”. With word vectors, it’s possible to perform math on natural language!
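To make the arithmetic concrete, here is a minimal sketch using tiny, hand-made vectors. The numbers are invented purely for illustration (real word vectors are learned from large amounts of text and have hundreds of dimensions), and the `analogy` helper is a hypothetical name, not part of any particular library.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word whose vector is most similar to a - b + c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded so the answer is a new word.
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))

result = analogy("king", "man", "woman")
```

With these toy numbers, `result` comes out as `"queen"`. Note that the match is "most similar", not "exactly equal": with real embeddings the resulting vector only lands near the vector for “Queen”.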

If you’re interested in furthering your word vector knowledge, check out these great resources:

Now that you have a rough understanding of what word vectors are, let’s get back to the original issue. How do we map local knowledge of an area onto a new area without having been there before? Well, because we’re Spatial, and we’re interested in understanding humans from a data-centric viewpoint, we get creative. At our disposal we have a vast number of point of interest (POI) locations, along with terabytes of geotagged social media posts associated with these POIs. Could we vectorize the millions of geotagged tweets, social media posts, blog reviews, etc. onto every POI on the planet? Yes, and that’s exactly what we did.
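This post doesn’t spell out how the text gets turned into a POI vector, so the following is only one plausible sketch, not a description of the actual Spatial pipeline. It assumes the simplest possible approach: average the word vectors in each post, then average the post vectors for each POI. The toy word vectors and the helper names (`embed_text`, `embed_poi`) are made up for the example.

```python
# Toy 2-dimensional word vectors, invented for this example.
word_vectors = {
    "smoky": [1.0, 0.0],
    "ribs":  [0.8, 0.2],
    "latte": [0.0, 1.0],
    "cozy":  [0.2, 0.8],
}

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]

def embed_text(text):
    """Average the vectors of the known words in one post (None if there are none)."""
    known = [word_vectors[tok] for tok in text.lower().split() if tok in word_vectors]
    return mean_vector(known) if known else None

def embed_poi(posts):
    """Average the post vectors of every post geotagged to one POI (assumed method)."""
    post_vectors = [vec for vec in map(embed_text, posts) if vec is not None]
    return mean_vector(post_vectors) if post_vectors else None

bbq_vector = embed_poi(["Smoky ribs tonight", "best ribs in town"])
cafe_vector = embed_poi(["cozy latte spot"])
```

The useful property of this kind of averaging is that the POI vector ends up with the same number of dimensions as the word vectors it was built from, so POIs and words can be compared directly.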

So, how do we know if any of this witchcraft is even remotely correct? Simple: let’s test it on the other cities we know. For instance, I did my undergrad here in Cincy, so that’s five years. I also worked in Detroit for almost four years, so let’s work with these two cities.

So let’s take this for a spin. Eli’s BBQ is a popular BBQ spot in Cincinnati. It’s a unique, small, casual spot that caters to the local crowd. What is the Eli’s BBQ of Detroit?

According to vector math, it’s Slows Bar-B-Q. After living in Cincinnati for five years and Detroit for almost four, I can say this suggestion couldn’t be any more on point. Did we just map local knowledge of one area onto a completely different and unknown area?

Why does this work? It works because the tweets, the social media posts, everything on the internet associated with Eli’s BBQ is most similar to what is associated with Slows Bar-B-Q in Detroit. Both of these places are small, casual, high-quality, unique, local spots. When people talk about Eli’s or Slows, they talk differently than they would at a BBQ food truck or at a stuffy, expensive BBQ restaurant. It’s the words used, the context, and the associations between words that capture the unique essence of every POI. J. R. Firth sums it up nicely:

You shall know a word by the company it keeps -J. R. Firth
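The “Eli’s BBQ of Detroit” lookup boils down to a nearest-neighbor search that only considers POIs in the target city. Here is a minimal sketch of that idea. The place names come from this post, but the vectors are invented so the example runs on its own, and `most_similar_in_city` and the steakhouse entry are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# (name, city, vector). The vectors are made up for illustration.
pois = [
    ("Eli's BBQ", "Cincinnati", [0.9, 0.1, 0.7]),
    ("Collective Espresso", "Cincinnati", [0.1, 0.9, 0.6]),
    ("Slows Bar-B-Q", "Detroit", [0.8, 0.2, 0.6]),
    ("Astro Coffee", "Detroit", [0.2, 0.9, 0.7]),
    ("Hypothetical Steakhouse", "Detroit", [0.7, 0.1, 0.0]),
]

def most_similar_in_city(query_name, city):
    """Find the POI in `city` whose vector is closest to the query POI's vector."""
    query_vec = next(vec for name, _, vec in pois if name == query_name)
    candidates = [(name, vec) for name, poi_city, vec in pois
                  if poi_city == city and name != query_name]
    best_name, _ = max(candidates, key=lambda item: cosine(query_vec, item[1]))
    return best_name

detroit_bbq = most_similar_in_city("Eli's BBQ", "Detroit")
```

Cosine similarity is used here because it compares the direction of two vectors rather than their length, which is a common choice for embeddings. The post doesn’t say which similarity measure was actually used.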

Let’s try another. One of my favorite coffee spots in Cincinnati is Collective Espresso in Northside. It’s a low-key, quiet cafe that I often stop at on my bike ride home from the office.

This is almost comical at this point because Astro Coffee was my go-to spot while I was living in Detroit. Astro Coffee and Collective Espresso both have almost identical atmospheres, music selection, food offerings, and customers.

So let’s switch gears and try something else now. Instead of finding similar POI vectors based on a single input POI vector, let’s find similar POI vectors based on a single input word vector. We can do this because POI vectors have the same dimensions as the word vectors, which allows for a simple comparison. Using the vector for “sushi”, let’s find the most similar POIs in Cincinnati:

Not bad! Unsurprisingly, the POI vectors most similar to the word vector for “sushi” all belong to sushi restaurants. As another example, we could search for “sunsets” to find the areas in Cincinnati that are most associated with sunsets.

It seems that Roebling Bridge is most associated with the word “sunsets” in Cincinnati. Looks like another win for vector math!

The other suggestions include parks located on top of Cincinnati’s many hills as well as various rooftop bars, all of which make great locations for viewing the sunset.
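The “sushi” and “sunsets” searches above can be sketched as a ranking of POI vectors against a single word vector. This works only because both kinds of vector live in the same space. All of the numbers below are invented, `search_pois` is a hypothetical helper, and the POIs labeled “Hypothetical” are placeholders rather than real results.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Word vectors and POI vectors share the same dimensions (toy values).
word_vectors = {
    "sushi":   [1.0, 0.0, 0.1],
    "sunsets": [0.0, 1.0, 0.2],
}
poi_vectors = {
    "Hypothetical Sushi Bar":    [0.9, 0.1, 0.2],
    "Roebling Bridge":           [0.1, 0.9, 0.3],
    "Hypothetical Hilltop Park": [0.2, 0.7, 0.1],
    "Eli's BBQ":                 [0.3, 0.2, 0.9],
}

def search_pois(word, k=3):
    """Return the names of the k POIs most similar to the given word's vector."""
    query = word_vectors[word]
    ranked = sorted(poi_vectors,
                    key=lambda name: cosine(query, poi_vectors[name]),
                    reverse=True)
    return ranked[:k]

top_sunset_spots = search_pois("sunsets", k=2)
```

With these toy values, the two sunset results are Roebling Bridge followed by the hilltop park, and a search for “sushi” ranks the sushi bar first.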

Now let’s reverse this: instead of inputting a word vector and getting a list of POI vectors, let’s input a POI vector and find the most similar word vectors.

These are the words most associated with the Cincinnati Zoo according to social media. For those wondering why #harambe didn’t make an appearance, it’s because most people who use that hashtag aren’t producing geotagged social media posts within the geographic footprint of the Cincinnati Zoo.
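Going from a POI to its most associated words is the same comparison run in the other direction: rank every word in the vocabulary against one POI vector. A minimal sketch, assuming a tiny made-up vocabulary and an invented vector for the zoo (`describe_poi` is a hypothetical name):

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vocabulary; the vectors are invented for illustration.
vocabulary = {
    "giraffe":   [0.9, 0.1],
    "elephants": [0.8, 0.3],
    "espresso":  [0.1, 0.9],
    "brisket":   [0.2, 0.8],
}

zoo_vector = [0.85, 0.2]  # hypothetical POI vector for a zoo

def describe_poi(poi_vector, k=2):
    """Return the k vocabulary words whose vectors are closest to the POI vector."""
    return heapq.nlargest(k, vocabulary,
                          key=lambda word: cosine(poi_vector, vocabulary[word]))

zoo_words = describe_poi(zoo_vector)
```

`heapq.nlargest` avoids sorting the whole vocabulary when only a handful of top words are needed, which matters more as the vocabulary grows.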

Hopefully this post got you excited about word vectors and what’s possible. This is just another example of the many tools in the Spatial tech stack developed for pushing the boundaries of machine learning and spatial awareness. Thanks for reading!

Nathan Rooy

Machine Learning, Spatial.ai