Regression
Supervised learning > regression
- mapping of continuous inputs to continuous outputs
- concept from statistics: observe data and construct an equation that can be used to make predictions for missing or future data
- regression can fit polynomials of different orders (constant, line, parabola, etc.) or, for vector inputs, multiple dimensions (hyperplanes)
- input representation must be numeric and continuous => discrete inputs must be enumerated and ordered
Linear Regression
- linear regression is an attempt to model the relationship between a dependent variable \(y\) and independent variables \(x_1, x_2,...,x_n\) => want to find an equation of the form \(y = \theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n\)
- \(y\) is the output variable
- \(x_1, x_2,...,x_n\) are the input variables
- \(\theta_0, \theta_1,...,\theta_n\) are the parameters or weights of the model (\(\theta_0\) is the intercept)
- the weights tell how important each corresponding \(x\) is for predicting the outcome
- sample data may not perfectly fit a linear model, causing error in the model
- many ways to calculate error, e.g. sum of absolute errors or sum of squared errors
- let \(\hat{y}\) be the predicted output, then (see the code sketch below):
  - sum of absolute errors: \(\displaystyle \sum_{i=1}^{m} \lvert \hat{y}_i - y_i \rvert\)
  - sum of squared errors: \(\displaystyle \frac{1}{2}\sum_{i=1}^{m} (\hat{y}_i - y_i)^2\)
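As a concrete check of these two error measures, here is a minimal sketch in Python/NumPy; the toy data and the helper names `sum_absolute_error` / `sum_squared_error` are illustrative, not from the notes:

```python
import numpy as np

def sum_absolute_error(y_pred, y_true):
    """Sum of absolute errors: sum_i |y_hat_i - y_i|."""
    return np.sum(np.abs(y_pred - y_true))

def sum_squared_error(y_pred, y_true):
    """Half the sum of squared errors: (1/2) * sum_i (y_hat_i - y_i)^2."""
    return 0.5 * np.sum((y_pred - y_true) ** 2)

# toy example: true outputs and predictions from some model
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])

print(sum_absolute_error(y_pred, y_true))  # 0.5 + 0.5 + 0.5 = 1.5
print(sum_squared_error(y_pred, y_true))   # 0.5 * (0.25 + 0.25 + 0.25) = 0.375
```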
- use the gradient descent algorithm to find the weights that minimize the squared error (a worked sketch follows the formula below):
\(\displaystyle E(\theta) = \frac{1}{2}\sum_{i=1}^{m} \Big(\sum_{j=0}^{n} \theta_j x_j^{(i)} - y_i\Big)^2\), with \(x_0^{(i)} = 1\); at the minimum, the gradient of \(E\) with respect to each \(\theta_j\) is zero
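A minimal gradient-descent sketch for this error function; the synthetic data, learning rate, and iteration count are hand-picked for illustration and not part of the notes:

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Minimize (1/2) * sum_i (theta . x_i - y_i)^2 by gradient descent.

    X is an (m, n) matrix of inputs; a column of ones is prepended so that
    theta[0] plays the role of the intercept term theta_0.
    """
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])  # x_0 = 1 for every sample
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ theta - y        # y_hat_i - y_i for every sample
        grad = Xb.T @ residuals           # gradient of the squared-error sum
        theta -= lr * grad / m            # step opposite the gradient
    return theta

# synthetic data roughly following y = 1 + 2x plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=(50, 1))
y = 1 + 2 * x[:, 0] + rng.normal(scale=0.1, size=50)

theta = gradient_descent(x, y, lr=0.05, n_iters=5000)
print(theta)  # should be close to [1, 2]
```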
- for a constant function, the value that minimizes the squared error is the mean of the data points (a short derivation follows)
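A one-line derivation of this fact, using the squared-error measure above: for a constant prediction \(c\),
\(\displaystyle \frac{d}{dc}\,\frac{1}{2}\sum_{i=1}^{m} (c - y_i)^2 = \sum_{i=1}^{m} (c - y_i) = mc - \sum_{i=1}^{m} y_i = 0 \;\Rightarrow\; c = \frac{1}{m}\sum_{i=1}^{m} y_i\)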
Polynomial Regression
In the more general case, for a dataset mapping values \(x_i \rightarrow y_i\), we want to find the weight coefficients \(c_j\) of a polynomial \(y \approx c_0 + c_1x + c_2x^2 + ... + c_kx^k\):
\(\displaystyle c = (X^T \cdot X)^{-1} \cdot X^T \cdot Y\)
where \(X\) is the design matrix whose \(i\)-th row is \([1, x_i, x_i^2, ..., x_i^k]\) and \(Y\) is the column vector of outputs \(y_i\)
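A minimal sketch of this normal-equation fit in Python/NumPy; the toy data and the chosen degree are illustrative assumptions, not from the notes:

```python
import numpy as np

def poly_fit(x, y, degree):
    """Fit polynomial coefficients c via the normal equation (X^T X) c = X^T Y.

    X is the design (Vandermonde) matrix whose i-th row is
    [1, x_i, x_i^2, ..., x_i^degree].
    """
    X = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(X.T @ X, X.T @ y)

# toy data sampled from y = 2 - x + 0.5 x^2 with a little noise
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 30)
y = 2 - x + 0.5 * x**2 + rng.normal(scale=0.05, size=x.shape)

c = poly_fit(x, y, degree=2)
print(c)  # should be close to [2, -1, 0.5]
```

In practice, `np.polyfit` or `np.linalg.lstsq` is preferred over forming \((X^T X)^{-1}\) explicitly, for numerical stability.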