Random Sample Consensus (RANSAC) is a robust regression algorithm that automatically identifies outliers and excludes them from the fit.
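As a quick illustration of the outlier-discarding behaviour, here is a minimal sketch using scikit-learn's `RANSACRegressor` (listed in the sources). The synthetic data, the threshold of 1.0, and the variable names are assumptions made for this example; the default base estimator (a linear regression) is used.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Synthetic data (an assumption for this sketch): a noisy line y = 2x + 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

# Corrupt the first 20 targets so that they become gross outliers.
y[:20] += 50.0

# residual_threshold is the largest absolute residual a point may have
# and still count as an inlier; 1.0 is an arbitrary choice for this data.
ransac = RANSACRegressor(residual_threshold=1.0, max_trials=100, random_state=0)
ransac.fit(X, y)

slope = ransac.estimator_.coef_[0]       # fitted on the inliers only
intercept = ransac.estimator_.intercept_
inlier_mask = ransac.inlier_mask_        # boolean array, True for inliers

print("slope:", slope, "intercept:", intercept)
print("points kept as inliers:", int(inlier_mask.sum()))
```

The final model in `ransac.estimator_` is fitted only on the points flagged in `inlier_mask_`, so the corrupted targets should have little or no influence on the estimated slope and intercept.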
The algorithm works as follows:
- Select a random subset of the data. Call this subset the hypothetical inliers.
- Fit a model to these hypothetical inliers.
- Test all of the other data points against this model.
- Points that fit the model well enough, according to a loss function, are added to the consensus set.
- The model is considered good enough if a sufficient number of points made it into the consensus set.
- Repeat the steps above for a fixed number of iterations, keeping the best model found.
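The steps above can be sketched in plain Python for the simple case of fitting a line. This is an illustrative sketch, not a reference implementation: the names `fit_line` and `ransac_line`, the absolute residual as the loss, the default parameter values, and the final refit on the consensus set are all assumptions made for the example.

```python
import random


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept; None if degenerate."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # all x values are equal, so no unique line exists
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_sample=2, threshold=1.0, min_consensus=10,
                n_iterations=100, seed=0):
    """Return (model, consensus_set) for the best line found, or (None, [])."""
    rng = random.Random(seed)
    best_model, best_consensus = None, []
    for _ in range(n_iterations):
        # Select a random subset: the hypothetical inliers.
        sample_idx = set(rng.sample(range(len(points)), n_sample))
        sample = [points[i] for i in sample_idx]

        # Fit a model to the hypothetical inliers.
        model = fit_line(sample)
        if model is None:
            continue
        slope, intercept = model

        # Test all other points; the loss here is the absolute residual.
        consensus = list(sample)
        for i, (x, y) in enumerate(points):
            if i in sample_idx:
                continue
            if abs(y - (slope * x + intercept)) <= threshold:
                consensus.append((x, y))

        # Keep the model if the consensus set is large enough and is the
        # largest seen so far. Refitting on the whole consensus set is an
        # optional refinement chosen for this sketch.
        if len(consensus) >= min_consensus and len(consensus) > len(best_consensus):
            best_model = fit_line(consensus)
            best_consensus = consensus
    return best_model, best_consensus


# Usage: 20 points near y = 2x + 1 plus 5 gross outliers (made-up data).
noise = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + noise.uniform(-0.05, 0.05)) for x in range(20)]
data += [(float(x), 2.0 * x + 31.0) for x in (2, 6, 10, 14, 18)]

model, inliers = ransac_line(data)
print("model (slope, intercept):", model)
print("consensus set size:", len(inliers))
```

Here the quality of a model is measured by the size of its consensus set, which is one common choice; other scoring rules, such as the total loss over the consensus set, are possible.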
Sources:
- Random Sample Consensus, Wikipedia
- RANSACRegressor, scikit-learn Documentation
- RANdom SAmple Consensus, scikit-learn User Guide
- Robust Regression for Machine Learning in Python, Machine Learning Mastery