Machine Learning & Data Analytics

Programming Resources

The assignments in this course require a fair amount of Python coding, as well as the use of a few popular Python-based data science tools.

Agency Based Education

It's expected that you'll either know some Python coming into this course, or be able to get up to speed quickly, largely on your own. You don't need to be an expert, or even highly skilled, but the more Python you know, the better off you'll be.

In addition, we'll be using a variety of machine learning and data science libraries and frameworks that you may or may not have experience with.

We don't spend much time in this class explicitly learning Python or these tools, for a few reasons:

However, we have put together this guide, as well as a separate guide with tips for reading technical documentation that you may find useful.

Python 2 vs 3

Python went through a major revision with the release of Python 3 in 2008, and Python 2 reached end of life in January 2020. In this course, we use Python 3. You may still find tutorials for (or already know) Python 2.

For the purposes of this course, there really aren't that many differences you have to think about.

You can see a good summary of the most important differences here.
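As a rough illustration (not an exhaustive list), here is a small sketch of a few differences that tend to trip people up when following older Python 2 tutorials: print is a function, the / operator always performs true division, and strings are Unicode text by default. The helper name describe_division is made up for this example.

```python
# Python 3 behavior; the Python 2 equivalents are noted in comments.

def describe_division(a, b):
    """Return the results of true division and floor division for two numbers."""
    true_div = a / b    # Python 3: always a float (Python 2 truncated int / int)
    floor_div = a // b  # floor division behaves the same in both versions
    return true_div, floor_div

# print is a function in Python 3; Python 2 used the statement form: print "hello"
print("7 / 2 and 7 // 2 ->", describe_division(7, 2))

# Python 3 strings are Unicode text; raw bytes are a separate type.
text = "café"
encoded = text.encode("utf-8")
print(type(text).__name__, type(encoded).__name__)
```

If a tutorial's code fails with a syntax error on a print line, that is usually a sign it was written for Python 2.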

Python

Google tells me there are 387 million hits for "Python Tutorial", so you obviously have a lot of options, from the official Python tutorial to this 11-hour video course.

Students often wonder which tutorial is the best for a given subject. Unfortunately, there is no such thing.

Some students learn better from books, others from websites, and some prefer videos. Some students want interactive tutorials, others feel like they can only learn if they take a class.

Here are some Python tutorials that I think are good for students:

Pandas

Pandas is a data science library that makes it easy to do common data manipulations and analysis. We'll be using it quite a bit in this course.
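To give a feel for what "common data manipulations" means, here is a minimal sketch of filtering, adding a column, and grouping with Pandas. The table contents and column names are invented for this example and are not course data.

```python
import pandas as pd

# Hypothetical data, invented purely for this sketch.
scores = pd.DataFrame({
    "student": ["Ana", "Ben", "Cho", "Dev"],
    "section": ["A", "B", "A", "B"],
    "score": [88, 92, 75, 81],
})

# Filter rows with a boolean condition.
high_scores = scores[scores["score"] >= 80]

# Add a derived column computed from an existing one.
scores["passed"] = scores["score"] >= 80

# Group and aggregate: mean score per section.
mean_by_section = scores.groupby("section")["score"].mean()

print(high_scores)
print(mean_by_section)
```

Most everyday Pandas work is some combination of these three moves: select rows, derive columns, and summarize by group.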

There aren't quite as many Pandas tutorials as there are Python tutorials (only 16 million hits for "Pandas Tutorial").

Here are some Pandas tutorials that I think are good for students:

NumPy

NumPy is a numeric computation library designed to allow Python to carry out super-optimized matrix algebra operations. It'll be rare in this course that we need to use NumPy directly, but many libraries (including Pandas and scikit-learn) are built on top of NumPy, so knowing something about it can sometimes be handy.
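For a sense of what those matrix operations look like, here is a short sketch using a small made-up matrix and vector. The variable names are arbitrary.

```python
import numpy as np

# A small example matrix and vector, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, -1.0])

# Matrix-vector product using the @ operator.
y = A @ x

# Elementwise arithmetic applies to every entry at once, with no explicit loop.
doubled = A * 2

# Reductions along an axis: axis=0 computes one mean per column.
col_means = A.mean(axis=0)

print(y)
print(doubled)
print(col_means)
```

Working on whole arrays like this, rather than looping over elements in Python, is where NumPy gets its speed.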

Here are some NumPy tutorials that I think are good for students:

scikit-learn

scikit-learn is a machine learning library that we'll be using quite a bit in this course. It is built on NumPy and works directly with Pandas DataFrames.

My advice for this library is to get an overview of how it works, and especially to learn about pipelines. Then, as you need each algorithm, read the details about that algorithm in the User Guide and API reference.
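As a starting point for that overview, here is a minimal sketch of a pipeline that chains a preprocessing step with a classifier. The dataset (the iris data bundled with the library), the choice of logistic regression, and the step names "scale" and "classify" are all just assumptions for this example, not the course's prescribed setup.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each step is a (name, estimator) pair; the last step is the model itself.
model = Pipeline([
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# fit() runs every step in order on the training data only, so the scaler
# never sees the test set.
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit and score calls work if you swap in a different estimator as the last step, which is why learning the pipeline pattern first pays off.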

Altair

There are lots of visualization libraries out there, and Pandas has some visualization functions built in, but we recommend that you become familiar with Altair: