Preparation Reading 04: Machine Learning Models

Overview

In this reading you'll learn about the k-Nearest Neighbors algorithm, as well as how to evaluate machine learning models.

Estimated Reading Time

Plan on about 60 to 90 minutes for this preparation reading, all of which comes from the textbook.

Reading

Reading the Text

As you read the algorithm chapters in the textbook, you may find it useful to start with the chapter summary at the end of the chapter.

This summary provides an overview of the key concepts presented in the chapter and shows how they connect to one another.

Complete the following preparation reading:

Read pages 179 - 195 of your text (Chapter 5 until section 5.4.2), which introduce similarity-based learning and the nearest neighbors family of algorithms.
Read pages 204 - 212 of your text (sections 5.4.3 and 5.4.4), which discuss some things to consider when using a nearest neighbor algorithm.
Read pages 397 - 413 of your text (Chapter 8 until section 8.4.2), which describe ways to verify how well a machine learning model works.
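
As a companion to the Chapter 5 reading, here is a minimal sketch of the nearest neighbor idea in plain Python. It is not taken from the textbook: the function names (euclidean, knn_predict), the toy data, and the choice of Euclidean distance with a simple majority vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Rank training indices by their distance to the query, then keep the closest k.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: euclidean(train_points[i], query),
    )[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two loose clusters with hypothetical labels.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))
```

Using an odd k with two classes avoids tied votes. This sketch does no feature scaling, so a feature with a much larger numeric range would dominate the distance.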

Extra Help

Below you'll find some optional videos and other resources that help supplement the reading.

You should absolutely still do the reading above. One technique would be to read the text, paying particular attention to new concepts (usually written in bold), then research those concepts using videos or other articles until you're confident you understand them. Afterwards, circle back to the text to pick up extra details you might have missed the first time.

Learning Complex Technical Information

Reading technical information can be difficult, and it is an acquired skill that you absolutely should develop if you're planning to work in data science. New research papers and algorithms are released constantly in this field, and keeping up with them requires you to parse through dense information and formulas.

Developing this skill helps you understand not only how an algorithm works, but also which types of problems it would and would not be suited for.

However, sometimes it's nice to have a different perspective. Some people learn better visually, through videos, interactively, or by example.

In some cases, a superficial understanding of an algorithm and its parameters may be good enough for what you need to do. But you'll always benefit from a deeper understanding of how the tools and algorithms you're using actually work, and the reasons they behave better in some situations than others.

Model Evaluation

"About Train, Validation and Test Sets in Machine Learning" is an article that explains why we split the dataset up to evaluate our machine learning models.
"Accuracy, Precision, Recall or F1?" is an article that explains a bit about each of these performance metrics, and when it is best to use each one.
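
To make the ideas in these two articles concrete, here is a small plain-Python sketch of splitting a dataset and computing the four metrics. It is an illustration only: the helper names (train_val_test_split, evaluate), the 60/20/20 split, and the example labels are all assumptions, not something taken from the articles or the textbook.

```python
import random


def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and cut it into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Split ten made-up row ids into 6 training, 2 validation, and 2 test rows.
train, val, test = train_val_test_split(list(range(10)))
print(len(train), len(val), len(test))

# Made-up true labels and predictions: 3 true positives, 1 false negative,
# 1 false positive, and 5 true negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```

In a typical workflow the model is fit on the training rows, tuned against the validation rows, and scored once on the test rows, so the final metrics reflect data the model has never seen.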