# Lecture 1: Introduction, Optimization Problems

Source: https://www.youtube.com/watch?v=C1lhuz6pZC0

- Computer Models
- Optimization models
- Knapsack problem
- Brute force for optimization problems
- Greedy algorithm for optimization problems

# Lecture 2: Optimization problems

Source: https://www.youtube.com/watch?v=uK5yvoXnkSk

# Lecture 3: Graph-theoretic models

Source: https://www.youtube.com/watch?v=V_TulH374hw

# Lecture 4: Stochastic Thinking

Source: https://www.youtube.com/watch?v=-1BnXEwHUok

- Uncertainty
- Stochastic processes
- Probability
- Random numbers
- Sample probability
- The Birthday Problem
- Simulation Models

# Lecture 5: Random Walks

Source: https://www.youtube.com/watch?v=6wUD_gp5WeE

# Lecture 6: Monte Carlo Simulation

Source: https://www.youtube.com/watch?v=OgO1gpXSUzU

- Monte Carlo Simulations
- Inferential Statistics
- Confidence intervals
- Law of Large Numbers
- Gambler’s Fallacy
- Regression to the Mean
- Variance
- Empirical Rule
- Probability Distributions
- Probability Density Function
- Normal Distributions

# Lecture 7: Confidence Intervals

Source: https://www.youtube.com/watch?v=rUxP7TM8-wo

# Lecture 8: Sampling and standard error

Source: https://www.youtube.com/watch?v=soZv_KKax3E

- Inferential Statistics
- Monte Carlo Simulations
- Sampling
- Confidence intervals
- Standard Error of the Mean
- Skew

# Lecture 9: Understanding Experimental Data

Source: https://www.youtube.com/watch?v=vIFKGFl1Cn8

- Data
- Modelling a spring
- Objective Functions
- Least Squares Objective Function
- Linear regression
- Coefficient of Determination

# Lecture 10: Understanding Experimental Data (Cont.)

Source: https://www.youtube.com/watch?v=fQvg-hh9dUw

# Lecture 11: Introduction to Machine Learning

Source: https://www.youtube.com/watch?v=h0e2HAPTGF4

You could say that all computer programs learn a little; the degree depends on the kind of algorithm. Here we are specifically interested in programs that learn from experience: they see examples and generalize from them, so that we do not have to program that generalization ourselves.

In “regular” programming we write a program so that the system can process the data we provide and generate output. In machine learning, we provide the data and the output so that the computer generates the program.

Memorization is declarative knowledge: the accumulation of individual facts. It is limited by the time needed to observe them and the memory required to store them.

Generalization, instead, is imperative knowledge: deducing new facts from old facts. It is limited only by the accuracy of the deduction process, and it assumes that the past predicts the future.

Observations: training data.

Supervised learning: for each example we have a label, and we’ll find a way to predict that label associated with the input.

Unsupervised learning: we have a set of feature vectors without labels, and we’ll try to group them into “natural clusters” (or find labels for those groups). In some cases we know how many clusters there should be; in others we have to find the best number ourselves.

Clustering examples into groups:

- Pick $k$ examples (at random?) as exemplars
- Cluster remaining samples by minimizing distance between samples in same cluster (objective function) — put sample in group with closest exemplar
- Find median example in each cluster as new exemplar
- Repeat until there is no change
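
The steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's code: the function names (`distance`, `assign`, `median_example`, `cluster_examples`), the choice of Euclidean distance, and the reading of "median example" as the cluster member with the smallest total distance to the other members are all assumptions. The optional `initial` parameter is also hypothetical; it lets the caller fix the starting exemplars instead of picking them at random.

```python
import random


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (an assumed choice)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def assign(examples, exemplars):
    """Put each example in the group of its closest exemplar."""
    clusters = [[] for _ in exemplars]
    for example in examples:
        closest = min(range(len(exemplars)),
                      key=lambda i: distance(example, exemplars[i]))
        clusters[closest].append(example)
    return clusters


def median_example(cluster):
    """Cluster member with the smallest total distance to the other members."""
    return min(cluster,
               key=lambda c: sum(distance(c, other) for other in cluster))


def cluster_examples(examples, k, initial=None, seed=0, max_iters=100):
    """Repeat assignment and exemplar updates until the exemplars stop changing."""
    if initial is None:
        # Pick k examples at random as the starting exemplars.
        exemplars = random.Random(seed).sample(examples, k)
    else:
        exemplars = list(initial)
    for _ in range(max_iters):
        clusters = assign(examples, exemplars)
        # Keep the old exemplar if a cluster ended up empty.
        new_exemplars = [median_example(c) if c else old
                         for c, old in zip(clusters, exemplars)]
        if new_exemplars == exemplars:
            break
        exemplars = new_exemplars
    return assign(examples, exemplars), exemplars


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, exemplars = cluster_examples(points, k=2, initial=[(0, 0), (10, 10)])
print(clusters)
print(exemplars)
```

The result depends on the starting exemplars: the procedure only improves the grouping locally, so a poor random start can leave it in a poor grouping. Running it from several starts and keeping the best result is a common workaround.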

This works with unlabeled data. If the data were labeled, we would instead look for a surface in the feature space (a line, for 2D data) that naturally separates the labeled groups.

Features are the pieces of information we can gather from our examples. They never fully describe the situation. Extra features can actually hurt the model, because there is a danger of finding spurious correlations. They can also cause overfitting, depending on how our feature engineering process combines them to separate instances.

Feature engineering is the process of representing examples by feature vectors that will facilitate generalization.

During the construction of the model we might need to make design choices about which kinds of error the model will make, like prioritizing minimizing false positives.

Minkowski Metric:

$\left(\sum_{k=1}^{len} \left|X1_{k} - X2_{k}\right|^{p}\right)^{1/p}$

When $p=1$, we get the Manhattan distance. When $p=2$, we get the Euclidean distance.
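
As a small sketch, the Minkowski metric can be written as a single Python function. The name `minkowski_dist` and the argument names are illustrative assumptions; the two calls at the end check the $p=1$ and $p=2$ cases on a 3-4-5 triangle.

```python
def minkowski_dist(x1, x2, p):
    """Minkowski distance of order p between two equal-length feature vectors."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same length")
    # Sum the p-th powers of the per-feature absolute differences,
    # then take the p-th root of the total.
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1 / p)


print(minkowski_dist([0, 0], [3, 4], 1))  # Manhattan distance: 7.0
print(minkowski_dist([0, 0], [3, 4], 2))  # Euclidean distance: 5.0
```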

Accuracy: the fraction of all instances the model got right.

$accuracy = \frac{\text{true positive} + \text{true negative}}{\text{true positive} + \text{true negative} + \text{false positive} + \text{false negative}}$

PPV (positive predictive value): how many of the instances the model labeled positive are true positives.

$positive\ predictive\ value = \frac{\text{true positive}}{\text{true positive} + \text{false positive}}$

Sensitivity: what percentage of the actual positives the model correctly found.

$sensitivity = \frac{\text{true positive}}{\text{true positive} + \text{false negative}}$

Specificity: what percentage of the actual negatives the model correctly rejected.

$specificity = \frac{\text{true negative}}{\text{true negative} + \text{false positive}}$

Sensitivity and specificity trade off against each other.
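
The four measures above can be computed directly from the counts of true positives, false positives, true negatives, and false negatives. The sketch below is illustrative only: the function names and the example counts are made up for the demonstration and do not come from the lecture.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of the instances labeled positive that really are positive."""
    return tp / (tp + fp)


def sensitivity(tp, fn):
    """Fraction of the actual positives that the model found."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of the actual negatives that the model rejected."""
    return tn / (tn + fp)


# Hypothetical counts for 100 classified instances.
tp, fp, tn, fn = 40, 10, 45, 5
print("accuracy:   ", accuracy(tp, fp, tn, fn))
print("PPV:        ", positive_predictive_value(tp, fp))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

These functions raise `ZeroDivisionError` when a denominator is zero, for example when the model labels nothing as positive; a real implementation would need to decide how to report that case.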

# Lecture 12: Clustering

Source: https://www.youtube.com/watch?v=esmzYhuFnds

(Pending)

# Lecture 13: Classification

Source: https://www.youtube.com/watch?v=eg8DJYwdMyg

(Pending)

# Lecture 14: Classification and statistical sins

Source: https://www.youtube.com/watch?v=K2SC-WPdT6k

(Pending)

# Lecture 15: Statistical Sins and Wrap Up

Source: https://www.youtube.com/watch?v=iOZVbILaIZc

(Pending)