When sampling, it’s not guaranteed to get perfect accuracy, because it’s always possible to get a “weird” sample that skews the results. How many samples to take before we get a justifiable answer? It depends on the variability and underlying possibilities.
Quantifying variation in data: Variance
- The standard deviation () is just the square root of the variance.
- Outliers can have a big effect.
- The standard deviation should aways be considered relative to the mean ().
- is the cardinality (size) of the set, the number of members in the set.
Squaring the distance means that:
- It doesn’t matter the direction of the difference
- Outliers (big distances) get emphasized.