A machine that learns to perform better from experience.

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell

Cars, ships, planes, drones, and... pancake-flipping robots?

Visual search, shopping w/ no checkout lines, cucumber sorting.

Translation, understanding queries, support automation, virtual coaches.

Generative music, AI writers for articles and movies.

- general prediction
- medicine
- pathfinding
- game playing
- optimization
- pretty sure there's even more...

- General and basic introduction
- Doesn't apply to all kinds of machine learning
- Useful for intuition
- Let's start with **supervised learning**

- You want a certain output given a certain input.
- Too hard or infeasible to program a solution.
- Use ML!

- Give the machine lots of labeled training data. Labeled just means you provide the answers / correct output.
- Feed the data into the machine, not showing it the labels.
- Machine tries to label all the training data itself.
- Compare the machine's labels to the correct labels.
- Tell the machine how far off its labels were.
- Machine optimizes itself a little bit using ~~magic~~ **math**.
- Do steps 2-6 a **shitton** of times. This is called `training` your machine, and each pass is called an `epoch`.
- Once satisfied with the machine's labels, you may stop training.
- Machine can now give you the output you want on data it hasn't seen yet.
- Mind === blown.
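The loop above can be sketched in a few lines of Python. This is a toy sketch, not any real library: the "machine" is a single adjustable number `w`, the labeled data is made up from the hidden rule `y = 2 * x`, and the "optimize a little bit" step is a simple nudge proportional to the error.

```python
# Toy version of the supervised learning loop (all names/numbers made up):
# the machine must discover the hidden rule y = 2 * x from labeled data.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct label)

w = 0.0      # the machine starts out knowing nothing
lr = 0.05    # how big each "optimize a little bit" step is

for epoch in range(200):          # step 7: do it a shitton of times
    for x, label in data:
        guess = w * x             # steps 2-3: machine labels the data
        error = guess - label     # steps 4-5: compare to correct labels
        w -= lr * error * x       # step 6: ~~magic~~ math

# w should now be very close to 2.0
```

Each pass over `data` is one epoch; after enough of them the machine has effectively learned the rule.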

[Interactive demo: a button steps through labeled fruit examples ("Apple!" / "Not apple."), reports how many the robot got wrong, then shows its current epoch count and accuracy %.]

Click on the button above! This is a robot that learns to identify apples among other fruits. Let's call him Gary. Explanation on the next slide.

- We get 5 fruits from the store: 2 apples, 1 orange, 1 banana, and 1 kiwi, and label them properly.
- We give those fruits to Gary without labels.
- We show Gary each of the fruits. He tries to label them.
- We compare Gary's labels to the correct labels.
- We tell Gary how many he got wrong.
- Gary optimizes self based on error. He will perform better the next time around.
- You may train Gary again and again.
- Gary should now be able to identify apples, oranges, bananas and kiwis from a different store. This is not part of the demo.
- Mind === blown.

- Say we want to predict housing prices given a house's area.
- Area alone is insufficient to actually predict house prices, but just pay attention to the concepts.
- House area is formally called our *feature*. In real-life ML problems you usually have a lot more features.

Go outside, look at houses, and get each one's area and price. Plot all the house data out on a graph with area on the `x-axis` and price on the `y-axis`.

The intuition is, we want our machine (who we'll call Gary, after our apple-bot) to draw a line that best fits our data. From this line, we will be able to predict house prices given house areas our machine hasn't seen yet.

Remember the formula for a line? `y = mx + b`. This is called our *model*.

We want Gary to draw that line for us. He can do that by plugging in values for `m` and `b`.

These are called *weights* and are usually denoted by the Greek letter theta (Θ).

It is Gary's job to find the best values for the weights, to get the best fitting line.
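A minimal sketch of the model in Python. The names `h` and `theta` mirror the slides' notation; the weight values here are made up for illustration.

```python
# The model y = mx + b, with both weights collected in theta = [b, m].
def h(theta, x):
    b, m = theta          # theta holds our two weights
    return m * x + b      # y = mx + b

theta = [50.0, 200.0]     # pretend weights: intercept 50, slope 200
print(h(theta, 3.0))      # 200 * 3 + 50 = 650.0
```

Gary's whole job is picking the two numbers in `theta`.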

But how do we know if it's the best fitting line?

The cost function is how you tell Gary he's wrong. Specifically, it tells Gary **how wrong** he was.

Here's the formula:

`J(Θ) = (1 / (2m)) * Σ (h(xᵢ) - yᵢ)²`

- `m` - number of houses in our dataset (training examples)
- `h` - just means apply our weights to our model and get the result

All this is doing is squaring each of our errors and summing them up.

Visually, it looks like this:

We're just getting the distance of each training example from the line our machine guessed, squaring them, adding them all together, and dividing the result by `2 * num training examples`.
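The cost function is small enough to write out as a Python sketch. The function names are ours, and the data is fabricated from `y = 2x + 1` so the "perfect" weights cost exactly zero.

```python
# Cost: sum of squared errors, divided by 2 * num training examples.
def h(theta, x):
    return theta[1] * x + theta[0]   # model: y = mx + b, theta = [b, m]

def cost(theta, xs, ys):
    m = len(xs)
    return sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs = [1.0, 2.0, 3.0]                 # house areas (made up)
ys = [3.0, 5.0, 7.0]                 # prices, generated by y = 2x + 1
print(cost([1.0, 2.0], xs, ys))      # the perfect line costs 0.0
print(cost([0.0, 0.0], xs, ys))      # a bad line costs a lot more
```

The worse Gary's line fits, the bigger this number gets — that's the "how wrong" signal.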

Ok, now that Gary knows how bad his answers are, what can he do?

The answer is to turn to ~~meth~~ math!

Turns out with the magic of derivatives (we take the derivative of the cost function), we can show Gary whether a weight should be increased or decreased to minimize the cost function.

A helpful analogy for gradient descent is that we're throwing Gary off a hill.

Another name for a derivative is a gradient (thus, gradient descent).

I won't go into details of how to get derivatives (lots of nice tuts online) but just think of them as gravity, telling Gary's poor body where to go.

The more downwards Gary falls, the lower the cost function, thus the less we shame him for failing us.

So it follows that Gary's goal is to try to reach the bottom.

Each time Gary trains, all we have to do is subtract from each weight the learning rate (a small number) times that weight's gradient. Think of the learning rate as how big a fall off the hill Gary takes at a time.

We just keep doing the descent and adjusting our weights until our cost function stops decreasing by much (say, by less than some tiny number like 0.0001).
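Putting it all together, gradient descent for `y = mx + b` can be sketched like this. Everything here (data, learning rate, stopping threshold) is made up for illustration, and the threshold is tighter than the 0.0001 mentioned above so the fit comes out snug.

```python
def h(theta, x):
    return theta[1] * x + theta[0]       # model: y = mx + b, theta = [b, m]

def cost(theta, xs, ys):
    m = len(xs)
    return sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def descend(theta, xs, ys, lr):
    # one step: subtract learning rate * gradient from each weight
    m = len(xs)
    errs = [h(theta, x) - y for x, y in zip(xs, ys)]
    grad_b = sum(errs) / m                             # dJ/db
    grad_m = sum(e * x for e, x in zip(errs, xs)) / m  # dJ/dm
    return [theta[0] - lr * grad_b, theta[1] - lr * grad_m]

xs, ys = [1.0, 2.0, 3.0], [3.0, 5.0, 7.0]  # fake data from y = 2x + 1
theta, lr = [0.0, 0.0], 0.1
prev = cost(theta, xs, ys)
while True:
    theta = descend(theta, xs, ys, lr)
    now = cost(theta, xs, ys)
    if prev - now < 1e-7:      # cost barely decreasing: Gary has landed
        break
    prev = now

# theta should now be close to [1.0, 2.0] (the b and m of the real line)
```

Note the loop stops on "cost stopped improving", not a fixed number of epochs — that's the stopping rule described above.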

Once we're satisfied that Gary has fallen enough, he's ready to predict house prices with his weights!

We'll find some house we haven't seen yet, get its area as `x`, apply Gary's weights, and we should get a number that predicts the price of that house! Recall `y = mx + b`.

Get your price by calculating `price = weight1 * house area + weight2`.
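With some made-up learned weights, the prediction is just that one line of arithmetic:

```python
# Hypothetical weights Gary learned (numbers invented for illustration).
weight1, weight2 = 0.2, 50.0     # slope and intercept
house_area = 1000.0              # a house Gary has never seen
price = weight1 * house_area + weight2
print(price)                     # 0.2 * 1000 + 50 = 250.0
```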

As a recap, take away the following:

- **Model** - the features (and modifications to them) you choose to use
- **Weights** - numbers applied to the model to get the output. These are what our algorithm tries to find via optimization.

- Deep Learning is the current hot trend
- Uses deeply layered Neural Networks
- Very good for image classification and understanding text/speech
- A lot more things to learn on top of what we discussed but underlying concepts are the same
- Check out ML moocs, like Stanford's Machine Learning course by Andrew Ng
- Check out the fast.ai course