A blog about the things I find interesting including, but not limited to, mathematics, education policy, data visualization, and juggling.
Monday, June 23, 2014
187 million taxi rides
2013 taxi data from NYC yields some interesting insights.
https://www.mapbox.com/blog/nyc-taxi/
The discussion on Hacker News was worth reading too: https://news.ycombinator.com/item?id=7926358 The current top comment explains how this data could be manipulated to find out, say, who is attending what bars and when: https://news.ycombinator.com/item?id=7927034
Reddit also picked up some interesting bits of info from the data, including what percentage of people tip and how much: http://www.reddit.com/r/bigquery/comments/28ialf/173_million_2013_nyc_taxi_rides_shared_on_bigquery/
From the Reddit thread, someone linked a detailed yet approachable article about how the data may not be very anonymous: https://medium.com/@vijayp/of-taxis-and-rainbows-f6bc289679a1
> It took a while longer to de-anonymize the entire dataset, but thanks to Yelp’s MRJob, I ran a map-reduce over about 10 computers on EMR and had it done within an hour.
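To get a feel for why that can go so fast, here is a rough sketch of the general idea as I understand it from the article: if IDs are hashed with plain, unsalted MD5 and the set of possible IDs is small, you can hash every candidate once and look the answers up. The ID format below (digit, letter, digit, digit) and the function names are my own simplified assumptions for illustration, not the exact formats or code from the article.

```python
import hashlib
import itertools
import string


def md5_hex(text):
    """Return the hex MD5 digest of an ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


def candidate_ids():
    """Yield every ID in a hypothetical format: digit, letter, digit, digit."""
    parts = itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    )
    for d1, letter, d2, d3 in parts:
        yield f"{d1}{letter}{d2}{d3}"


def build_reverse_lookup():
    """Map each candidate's hash back to the plain ID."""
    return {md5_hex(candidate): candidate for candidate in candidate_ids()}


# Hash the whole (small) ID space once, then reverse any hash by lookup.
lookup = build_reverse_lookup()
hashed = md5_hex("7B42")    # stand-in for a hashed value in the data
recovered = lookup[hashed]  # the plain ID, found without "breaking" MD5
```

With only 26,000 candidates in this toy format, the whole table builds almost instantly on a laptop. The point is that hashing alone does not hide values drawn from a small, guessable set.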
Interesting stuff!