In this unit, you'll import Matplotlib into the notebook you've been working with and configure the notebook to support inline Matplotlib output.
%matplotlib inline import matplotlib.pyplot as plt import seaborn as sns sns.set()
The first statement is one of several magic commands supported by the Python kernel that you selected when you created the notebook. It enables Jupyter to render Matplotlib output in a notebook without making repeated calls to show. And it must appear before any references to Matplotlib itself. The final statement configures Seaborn to enhance the output from Matplotlib.
from sklearn.metrics import roc_curve fpr, tpr, _ = roc_curve(test_y, probabilities[:, 1]) plt.plot(fpr, tpr) plt.plot([0, 1], [0, 1], color='grey', lw=1, linestyle='--') plt.xlabel('False Positive Rate') plt.ylabel('True Positive Rate')
The dotted line in the middle of the graph represents a 50-50 chance of obtaining a correct answer. The blue curve represents the accuracy of your model. More importantly, the fact that this chart appears at all demonstrates that you can use Matplotlib in a Jupyter notebook.
The reason you built a machine-learning model is to predict whether a flight will arrive on time or late. In this exercise, you'll write a Python function that calls the machine-learning model you built in the previous lab to compute the likelihood that a flight will be on time. Then you'll use the function to analyze several flights.
def predict_delay(departure_date_time, origin, destination): from datetime import datetime try: departure_date_time_parsed = datetime.strptime(departure_date_time, '%d/%m/%Y %H:%M:%S') except ValueError as e: return 'Error parsing date/time - {}'.format(e) month = departure_date_time_parsed.month day = departure_date_time_parsed.day day_of_week = departure_date_time_parsed.isoweekday() hour = departure_date_time_parsed.hour origin = origin.upper() destination = destination.upper() input = [{'MONTH': month, 'DAY': day, 'DAY_OF_WEEK': day_of_week, 'CRS_DEP_TIME': hour, 'ORIGIN_ATL': 1 if origin == 'ATL' else 0, 'ORIGIN_DTW': 1 if origin == 'DTW' else 0, 'ORIGIN_JFK': 1 if origin == 'JFK' else 0, 'ORIGIN_MSP': 1 if origin == 'MSP' else 0, 'ORIGIN_SEA': 1 if origin == 'SEA' else 0, 'DEST_ATL': 1 if destination == 'ATL' else 0, 'DEST_DTW': 1 if destination == 'DTW' else 0, 'DEST_JFK': 1 if destination == 'JFK' else 0, 'DEST_MSP': 1 if destination == 'MSP' else 0, 'DEST_SEA': 1 if destination == 'SEA' else 0 }] return model.predict_proba(pd.DataFrame(input))[0][0]
This function takes as input a date and time, an origin airport code, and a destination airport code, and returns a value between 0.0 and 1.0 indicating the probability that the flight will arrive at its destination on time. It uses the machine-learning model you built in the previous lab to compute the probability. And to call the model, it passes a DataFrame containing the input values to predict_proba. The structure of the DataFrame exactly matches the structure of the DataFrame we used earlier.
predict_delay('1/10/2018 21:45:00', 'JFK', 'ATL')
Confirm that the output shows that the likelihood of an on-time arrival is 60%:
predict_delay('2/10/2018 21:45:00', 'JFK', 'ATL')
How likely is this flight to arrive on time? If your travel plans were flexible, would you consider postponing your trip for one day?
predict_delay('2/10/2018 10:00:00', 'ATL', 'SEA')
Is this flight likely to arrive on time?
You now have an easy way to predict, with a single line of code, whether a flight is likely to be on time or late. Feel free to experiment with other dates, times, origins, and destinations. But keep in mind that the results are only meaningful for the airport codes ATL, DTW, JFK, MSP, and SEA because those are the only airport codes the model was trained with.
import numpy as np labels = ('Oct 1', 'Oct 2', 'Oct 3', 'Oct 4', 'Oct 5', 'Oct 6', 'Oct 7') values = (predict_delay('1/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('2/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('3/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('4/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('5/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('6/10/2018 21:45:00', 'JFK', 'ATL'), predict_delay('7/10/2018 21:45:00', 'JFK', 'ATL')) alabels = np.arange(len(labels)) plt.bar(alabels, values, align='center', alpha=0.5) plt.xticks(alabels, labels) plt.ylabel('Probability of On-Time Arrival') plt.ylim((0.0, 1.0))
If you are new to Matplotlib and would like to learn more about it, you will find an excellent tutorial at https://www.labri.fr/perso/nrougier/teaching/matplotlib/. There is much more to Matplotlib than what was shown here, which is one reason why it is so popular in the Python community.
Next unit: Summary
Next