Module 05 — Self-Driving Cars, Case Study Discussion

Questions

You're at a strategy meeting with the stakeholders. They want to make sure you have the data required to answer the questions they're most interested in.

Be prepared to answer the following questions:

Network Architecture

Karl, Head of AI

Obviously you'll be using a convolutional neural network to build your model, but will you be using an existing architecture as a starting point, or do you think it'll be better to design your own?

Based on your initial analysis of the data, your team feels:

VGG-16 would be a good architecture to use on this data.
ResNet would be a good architecture to use on this data.
Inception would be a good architecture to use on this data.
For best results, we should design a custom architecture.

Preprocessing

Johnny, Data Science Intern

The training and test images have three color channels (red, green, and blue), with pixel values for each channel ranging from 0 to 255.
Do you think we need to do any preprocessing before using the data to train the model?

Based on your initial analysis of the data, your team feels:

The RGB pixel values should be fine with that scale, since they're all the same.
We should rescale the pixel values down to the range of 0 to 1.
We should rescale the pixel values down to the range of -1 to 1.
We should convert the image to grayscale, giving us a single color channel to work with.
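To make the rescaling and grayscale options above concrete, here is a minimal sketch in plain Python. It assumes pixels arrive as 8-bit integers in the range 0 to 255, as Johnny describes. The function names are hypothetical, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which are an assumption here and not something the case study specifies.

```python
def rescale_unit(pixels):
    """Map 8-bit values in [0, 255] to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


def rescale_signed(pixels):
    """Map 8-bit values in [0, 255] to floats in [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def to_grayscale(rgb_pixels):
    """Collapse (R, G, B) triples into one luminance value per pixel.

    Uses the BT.601 weights (0.299, 0.587, 0.114); this choice is an
    assumption made for illustration.
    """
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]


# Hypothetical values for a single channel: darkest, mid, brightest.
channel = [0, 128, 255]
print(rescale_unit(channel))
print(rescale_signed(channel))

# A white pixel and a pure red pixel, as (R, G, B) triples.
print(to_grayscale([(255, 255, 255), (255, 0, 0)]))
```

Both rescaling options keep all three channels and only change the numeric range. Grayscale conversion discards color, which may matter if color helps tell signs apart.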

Data Augmentation

Emma, CEO of GehirnWagen

I'm concerned that the model will only be able to recognize signs that look exactly like the ones we have images for.
I understand from Johnny that data augmentation can help with this problem. What strategy would you suggest?

Based on your initial analysis of the data, your team feels:

Data augmentation is not necessary due to the number of different training images.
We should use a data augmentation strategy on the data prior to loading it into the model.
We should use a data augmentation strategy that is based on model preprocessing layers.
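The last two options differ mainly in when the random transformations are applied. The sketch below is a rough, framework-free illustration, assuming small grayscale images stored as nested lists of 0 to 255 values. The helper names, the shift range, and the brightness range are all hypothetical. In practice a deep learning framework's own preprocessing layers or data pipeline would normally do this work.

```python
import random


def shift_horizontal(image, dx, fill=0):
    """Shift every row by dx pixels (positive = right), padding with `fill`."""
    width = len(image[0])
    shifted = []
    for row in image:
        if dx >= 0:
            shifted.append([fill] * dx + row[:width - dx])
        else:
            shifted.append(row[-dx:] + [fill] * (-dx))
    return shifted


def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clamping the result to [0, 255]."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def augment(image, rng, max_shift=1, brightness=0.2):
    """Apply one random shift and one random brightness change."""
    dx = rng.randint(-max_shift, max_shift)
    factor = rng.uniform(1 - brightness, 1 + brightness)
    return adjust_brightness(shift_horizontal(image, dx), factor)


rng = random.Random(0)  # seeded so repeated runs draw the same transformations
image = [[10, 20, 30], [40, 50, 60]]  # hypothetical 2x3 grayscale image

# Augmenting prior to loading: build a fixed, enlarged dataset once.
offline_copies = [augment(image, rng) for _ in range(3)]

# Augmenting inside the model or pipeline: draw a fresh variant every epoch.
for epoch in range(2):
    variant = augment(image, rng)
    print(epoch, variant)
```

Horizontal flips are deliberately left out of this sketch, since a mirrored sign could plausibly carry a different meaning (for example, a left arrow versus a right arrow).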

Model Evaluation

Johnny, Data Science Intern

This seems like one of those cases where straight accuracy might not be the best metric for model evaluation, but what do you think?

Based on your initial analysis of the data, your team feels:

Accuracy is still the best metric here.
The $F_1$ score is the best option for this problem.
We should probably use logarithmic loss, just to be safe.
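To compare the three metrics side by side, here is a minimal sketch in plain Python. It relies on the standard definitions: accuracy is the fraction of correct predictions, $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$ per class (averaged across classes here, the "macro" variant), and logarithmic loss is the mean negative log of the probability assigned to the true class. The labels and probabilities are made up for illustration, and the macro average is an assumption, since the case study does not say which averaging to use.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, probs):
        p = min(max(row[t], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical imbalanced data: 8 examples of class 0, 2 of class 1,
# and a model that always predicts class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
probs = [[0.9, 0.1]] * 10  # the same confident guess for every example

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
print("log loss:", log_loss(y_true, probs))
```

In this made-up example the always-predict-class-0 model scores 0.8 on accuracy but well under 0.5 on macro $F_1$, which is the kind of gap that motivates looking beyond accuracy when classes are imbalanced.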