
Regression

Supervised learning > regression

  • mapping of continuous inputs to discrete or continuous outputs
  • concept from statistics: observe data to construct an equation, to be able to make predictions for missing or future data
  • regression can fit polynomials of different orders (constant, line, parabola, etc.) or, for vector inputs, multiple dimensions (hyperplanes)
  • input representation must be numeric and continuous => discrete inputs must be enumerated and ordered
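
As a minimal illustration of the last point (the size labels below are hypothetical), a discrete input can be enumerated in a meaningful order before it is used as a regression input:

```python
# Minimal sketch (hypothetical feature): a discrete input such as a size label
# is mapped to ordered numeric values before regression can use it.
sizes = ["small", "medium", "large", "medium"]

order = {"small": 0, "medium": 1, "large": 2}   # enumerate categories in a meaningful order
x = [order[s] for s in sizes]                   # numeric, ordered representation

print(x)                                        # [0, 1, 2, 1]
```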

Linear Regression

  • linear regression is an attempt to model relationships between a dependent variable \(y\) and independent variables (\(x_1, x_2,...,x_n\)) => want to find equation of type \(y = \theta_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n\)

    • \(y\) is the output variable
    • \(x_1, x_2,...,x_n\) are the input variables
    • \(\theta_0, \theta_1,...,\theta_n\) are the parameters or weights of the model
    • the weights tell how important each corresponding \(x\) is for predicting the outcome
  • sample data may not perfectly fit a linear model, causing error in the model

    • many ways to calculate error, e.g. sum of absolute errors or sum of squared errors
    • let \(\hat{y}\) be the predicted output, then:

      sum of absolute errors: \(\displaystyle \sum_{i=1}^{m} \lvert \hat{y}_i - y_i \rvert\)
      sum of squared errors: \(\displaystyle \frac{1}{2}\sum_{i=1}^{m} (\hat{y}_i - y_i)^2\)
    • use the gradient descent algorithm to find the weights that minimize the error (a minimal sketch follows this list):

      \(\displaystyle \min_{\theta} \; \frac{1}{2}\sum_{i=1}^{m} \Big(\sum_{j=0}^{n} \theta_j x_j^{(i)} - y_i\Big)^2\) where \(x_0^{(i)} = 1\) by convention

    • for a constant function, the best fit (minimizing the squared error) is the mean of the data points
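
A minimal sketch of the bullet above: batch gradient descent on illustrative synthetic data, minimizing the sum-of-squared-errors objective (the data, learning rate, and iteration count are assumptions made for this example):

```python
import numpy as np

# Minimal sketch: batch gradient descent for linear regression,
# minimizing J(theta) = 1/2 * sum_i (theta . x_i - y_i)^2.
# Data, learning rate, and iteration count are illustrative assumptions.
m = 50
x = np.linspace(0, 1, m)
rng = np.random.default_rng(0)
y = 3.0 + 2.0 * x + rng.normal(0, 0.1, size=m)   # noisy line y = 3 + 2x

X = np.column_stack([np.ones(m), x])             # prepend x_0 = 1 so theta_0 acts as the intercept
theta = np.zeros(2)
alpha = 0.01                                     # learning rate

for _ in range(5000):
    residual = X @ theta - y                     # y_hat - y for every sample
    gradient = X.T @ residual                    # gradient of J(theta)
    theta -= alpha * gradient

print(theta)                                     # approximately [3, 2]
```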

Polynomial regression

In the more general case, for a dataset of \(m\) points mapping values \(x_i \rightarrow y_i\), we want to find weight coefficients \(c_i\) such that:

\[ c_0 + c_1x + c_2x^2 + c_3x^3 + ... + c_nx^n \approx y \]
\[ \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ 1 & x_3 & x_3^2 & \cdots & x_3^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \\ \end{bmatrix} \times \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \approx \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_m \end{bmatrix} \]

where the least-squares solution (the normal equation) is \(\displaystyle c = (X^T X)^{-1} X^T Y\)
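
A minimal sketch of this solution, building the matrix \(X\) with NumPy's np.vander (the sample points and polynomial degree below are assumptions for illustration):

```python
import numpy as np

# Minimal sketch: fit a cubic via the normal equation c = (X^T X)^{-1} X^T Y.
# The sample points are illustrative; X is the Vandermonde matrix with
# columns 1, x, x^2, x^3 as in the notes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 - 2.0 * x + 0.5 * x**3                 # exact cubic, so the fit recovers it

X = np.vander(x, N=4, increasing=True)         # rows [1, x_i, x_i^2, x_i^3]
c = np.linalg.inv(X.T @ X) @ X.T @ y           # normal equation

print(c)                                       # approximately [1, -2, 0, 0.5]
```

In practice np.linalg.lstsq(X, y, rcond=None) solves the same least-squares problem without forming the explicit inverse, which is numerically more stable.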