Dataxi Science

A Data Science approach to analyse and predict Medallion Taxi Demand

Jishu Basak


Executive Summary

There are roughly 200 million taxi rides in New York City each year. Exploiting an understanding of taxi supply and demand could increase the efficiency of the city's taxi system. In New York City, people use taxis, specifically yellow taxis, far more frequently than in any other US city. Rather than being booked by phone a day in advance, New York taxis pick up passengers who hail them on the street. The ability to predict taxi ridership could give city planners and taxi dispatchers valuable insight into questions such as where to position cabs so that they are where they are most needed, how many taxis to dispatch, and how ridership varies over time. This mini project focuses on predicting the number of yellow taxi pickups given a one-hour time window and a location within New York City.

  • After analysing the data, I found that Manhattan (the borough), JFK Airport and LaGuardia Airport have the highest demand in terms of both passenger count and total amount.
  • Evenings are the busiest period, while early mornings, especially around 5 AM, see the least demand.
  • The Gradient Boosting Regressor gives the best performance in predicting passenger demand and total gross revenue from date/time and pickup location.

Problem Statement

There are roughly 200 million taxi rides in New York City each year. Exploiting an understanding of taxi supply and demand could increase the efficiency of the city's taxi system. This mini project focuses on predicting passenger demand and total gross revenue for a given one-hour time window and location within New York City. More broadly, the project aims to answer how data science can address the pitfalls in medallion taxi demand in New York City.
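
To make the two prediction targets concrete, the following is a minimal sketch of how raw trip records might be aggregated into passenger demand and total gross revenue per location and one-hour window. The record layout, the zone names used as keys, the sample values and the helper name hourly_demand are all illustrative assumptions and are not taken from the project's actual dataset or code.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip records: (pickup time, pickup zone, passengers, total amount in USD).
# The field layout and the values are illustrative assumptions, not project data.
trips = [
    ("2017-06-01 17:05:00", "Manhattan", 2, 12.50),
    ("2017-06-01 17:40:00", "Manhattan", 1, 8.00),
    ("2017-06-01 17:55:00", "JFK Airport", 3, 52.00),
    ("2017-06-01 18:10:00", "Manhattan", 1, 9.75),
]


def hourly_demand(records):
    """Sum passengers and total amount per (zone, start of one-hour window)."""
    demand = defaultdict(lambda: {"passengers": 0, "total_amount": 0.0})
    for pickup_time, zone, passengers, amount in records:
        ts = datetime.strptime(pickup_time, "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its one-hour window.
        window = ts.replace(minute=0, second=0)
        key = (zone, window)
        demand[key]["passengers"] += passengers
        demand[key]["total_amount"] += amount
    return dict(demand)


demand = hourly_demand(trips)
for (zone, window), totals in sorted(demand.items()):
    print(zone, window, totals)
```

Each (zone, window) pair then forms one training example: the hour, weekday and zone can serve as input features, and the two summed values are the quantities a regression model would be asked to predict.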