Regression
Supervised learning > regression
- mapping of continuous inputs to continuous outputs
- concept from statistics: observe data and construct an equation that can be used to make predictions for missing or future data
- regression can fit polynomials of different orders (constant, line, parabola, etc.) or, for vector inputs, multiple dimensions (hyperplanes)
- input representation must be numeric and continuous => discrete inputs must be enumerated and ordered
Linear Regression
- linear regression is an attempt to model the relationship between a dependent variable \(y\) and independent variables \(x_1, x_2,...,x_n\) => want to find an equation of the form \(y = \theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n\)
- \(y\) is the output variable
- \(x_1, x_2,...,x_n\) are the input variables
- \(\theta_0, \theta_1,...,\theta_n\) are the parameters or weights of the model (\(\theta_0\) is the intercept)
- the weights tell how important each corresponding \(x\) is for predicting the outcome
- sample data may not perfectly fit a linear model, causing error in the model
- many ways to calculate error, e.g. sum of absolute errors or sum of squared errors
- let \(\hat{y}\) be the predicted output, then (see the code sketch below):
  - sum of absolute errors: \(\displaystyle \sum_{i=1}^{m} \lvert \hat{y}_i - y_i \rvert\)
  - sum of squared errors: \(\displaystyle \frac{1}{2}\sum_{i=1}^{m} (\hat{y}_i - y_i)^2\)
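As a concrete check of these two error measures, here is a minimal sketch in Python/NumPy; the toy data and the helper names `sum_absolute_error` / `sum_squared_error` are illustrative, not from the notes:

```python
import numpy as np

def sum_absolute_error(y_pred, y_true):
    """Sum of absolute errors: sum_i |y_hat_i - y_i|."""
    return np.sum(np.abs(y_pred - y_true))

def sum_squared_error(y_pred, y_true):
    """Half the sum of squared errors: (1/2) * sum_i (y_hat_i - y_i)^2."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)

# toy example: true outputs and predictions from some model
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])

print(sum_absolute_error(y_pred, y_true))  # 0.5 + 0.5 + 0.5 = 1.5
print(sum_squared_error(y_pred, y_true))   # 0.5 * (0.25 + 0.25 + 0.25) = 0.375
```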
- use the gradient descent algorithm to find the weights that minimize the squared error (a worked sketch follows the formula below):
\(\displaystyle E(\theta) = \frac{1}{2}\sum_{i=1}^{m} \Big(\sum_{j=0}^{n} \theta_j x_j^{(i)} - y_i\Big)^2\), with \(x_0^{(i)} = 1\); at the minimum, the gradient of \(E\) with respect to each \(\theta_j\) is zero
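A minimal gradient-descent sketch for this error function; the synthetic data, learning rate, and iteration count are hand-picked for illustration and not part of the notes:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Minimize (1/2) * sum_i (theta . x_i - y_i)^2 by gradient descent.

    X is an (m, n) matrix of inputs; a column of ones is prepended so that
    theta[0] plays the role of the intercept term theta_0.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])  # x_0 = 1 for every sample
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ theta - y        # y_hat_i - y_i for every sample
        grad = Xb.T @ residuals           # gradient of the squared-error sum
        theta -= lr * grad / m            # step opposite the gradient
    return theta

# synthetic data roughly following y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(50, 1))
y = 1 + 2 * x[:, 0] + rng.normal(scale=0.1, size=50)

theta = gradient_descent(x, y, lr=0.05, n_iters=5000)
print(theta)  # should be close to [1, 2]
```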
- for a constant function, the value that minimizes the squared error is the mean of the data points (a short derivation follows)
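A one-line derivation of this fact, using the squared-error measure above: for a constant prediction \(c\),
\(\displaystyle \frac{d}{dc}\,\frac{1}{2}\sum_{i=1}^{m} (c - y_i)^2 = \sum_{i=1}^{m} (c - y_i) = mc - \sum_{i=1}^{m} y_i = 0 \;\Rightarrow\; c = \frac{1}{m}\sum_{i=1}^{m} y_i\)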
Polynomial Regression
In the more general case, for a dataset mapping values \(x_i \rightarrow y_i\), we want to find the weight coefficients \(c_j\) of a polynomial \(y \approx c_0 + c_1x + c_2x^2 + ... + c_kx^k\):
\(\displaystyle c = (X^T \cdot X)^{-1} \cdot X^T \cdot Y\)
where \(X\) is the design matrix whose \(i\)-th row is \([1, x_i, x_i^2, ..., x_i^k]\) and \(Y\) is the column vector of outputs \(y_i\)
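A minimal sketch of this normal-equation fit in Python/NumPy; the toy data and the chosen degree are illustrative assumptions, not from the notes:

```python
import numpy as np

def poly_fit(x, y, degree):
    """Fit polynomial coefficients c via the normal equation (X^T X) c = X^T Y.

    X is the design (Vandermonde) matrix whose i-th row is
    [1, x_i, x_i^2, ..., x_i^degree].
    """
    X = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(X.T @ X, X.T @ y)

# toy data sampled from y = 2 - x + 0.5 x^2 with a little noise
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 30)
y = 2 - x + 0.5 * x**2 + rng.normal(scale=0.05, size=x.shape)

c = poly_fit(x, y, degree=2)
print(c)  # should be close to [2, -1, 0.5]
```

In practice, `np.polyfit` or `np.linalg.lstsq` is preferred over forming \((X^T X)^{-1}\) explicitly, for numerical stability.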