Monday, June 23, 2014

187 million taxi rides


2013 taxi data from NYC yields some interesting insights.
https://www.mapbox.com/blog/nyc-taxi/

The discussion on Hacker News (https://news.ycombinator.com/item?id=7926358) is also worth reading. The current top comment explains how this data could be used to find out, say, who visits which bars and when: https://news.ycombinator.com/item?id=7927034

Reddit also picked up some interesting details from the data, including what percentage of people tip and how much: http://www.reddit.com/r/bigquery/comments/28ialf/173_million_2013_nyc_taxi_rides_shared_on_bigquery/

89,092,521 rides (47.57%) show no tip, but... "Cash tips are easier for cab drivers to 'forget' to report, so even though data suggests people tip more when paying by card (where the tip presets start at 20%!), drivers still prefer an under-the-table tip."
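
As a quick sanity check, the no-tip count and its percentage can be used to back out the total number of rides they imply. This is only a rough sketch: the variable names are mine, and the 47.57% figure is rounded, so the result is approximate.

```python
# Figures quoted above: rides with no recorded tip, and their share of all rides.
no_tip_rides = 89_092_521
no_tip_share = 0.4757  # 47.57%, rounded in the source

# Total rides implied by those two numbers (approximate because of rounding).
implied_total = no_tip_rides / no_tip_share

print(f"Implied total rides: about {implied_total / 1e6:.0f} million")
```

The result comes out at roughly 187 million, which matches the figure in the title.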

Someone in the Reddit thread linked a detailed yet approachable article about how the data may not be very anonymous: https://medium.com/@vijayp/of-taxis-and-rainbows-f6bc289679a1

> It took a while longer to de-anonymize the entire dataset, but thanks to Yelp’s MRJob, I ran a map-reduce over about 10 computers on EMR and had it done within an hour. 
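
To give a feel for why hashed IDs can be easy to reverse, here is a minimal sketch of the general idea. It assumes the identifiers were hashed with plain, unsalted MD5 (my understanding of the weakness the linked article describes) and uses a hypothetical ID format of digit, letter, digit, digit purely for illustration. It is not the article's actual code, and the real formats and scale are covered there.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def build_lookup():
    """Map MD5 hex digest -> plaintext for every ID in a small, structured keyspace.

    The digit-letter-digit-digit format is a hypothetical example.
    """
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        plain = f"{d1}{letter}{d2}{d3}"
        table[hashlib.md5(plain.encode()).hexdigest()] = plain
    return table


lookup = build_lookup()

# Pretend this digest came from the "anonymized" dataset.
hashed_id = hashlib.md5(b"5X55").hexdigest()

print(len(lookup), "candidate IDs hashed")
print("Recovered ID:", lookup[hashed_id])
```

With a keyspace this small, every possible ID can be hashed in well under a second on one machine. Larger keyspaces are where spreading the work across several machines, as in the quote above, starts to matter.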

Interesting stuff!
