K-Nearest Neighbors
Supervised > classification/regression
- method of instance-based learning: save the examples in a database and obtain a result by comparing a new query against the stored examples
- general features of instance-based learners:
- defer generalization decision until new query instance is observed
- classify a new instance based on analyzing similar instances while ignoring distant examples
- represent instances as real-valued points in Euclidean space
- k-nearest neighbors
- the choice of representative distance function is very important for generalizing well => domain knowledge matters
- particularly susceptible to high dimensionality => the space to covered by examples grows exponentially w.r.t number of features
- there is no general best choice for k; it is also possible to use all n examples
- if all instances are considered (distance-weighted), the algorithm is a global method
- if only nearest examples are considered, then the algorithm is a local method
- strategies for improving classification efficiency
- multi-dimensional access methods (k-d trees), as sketched after this list
- fast approximate similarity search
- locality sensitive hashing
- condensing by choosing a smaller subset of records that give the same results
- editing data by removing objects to improve efficiency
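A rough sketch of the k-d tree strategy using SciPy's `cKDTree` (the random data and query point are made up for illustration): the index is built once, after which neighbor queries avoid scanning every stored example.

```python
import numpy as np
from scipy.spatial import cKDTree

# toy training data: 1000 random points in 3-D (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((1000, 3))

tree = cKDTree(X)                    # build the multi-dimensional index once
query = np.array([0.5, 0.5, 0.5])
dist, idx = tree.query(query, k=5)   # distances and indices of the 5 nearest neighbors
print(idx, dist)
```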
Pseudo algorithm
- given:
    - training data $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$
    - distance metric $d$
    - number of neighbors: $k$
    - query point: $q$
- find the set $N_k(q)$ of the $k$ training examples nearest to $q$ under $d$ (if equally near, take all)
- return:
    - classification - plurality vote: $\hat{y} = \operatorname{argmax}_v \sum_{i \in N_k(q)} \mathbb{1}[y_i = v]$ (possibly weighted vote); break ties e.g. by the most common label overall, random choice, etc.
    - regression - mean: $\hat{y} = \frac{1}{k} \sum_{i \in N_k(q)} y_i$ (possibly weighted mean)
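A minimal NumPy sketch of the algorithm above; the function and variable names are illustrative, and the "if equally near, take all" rule is not handled (exactly k neighbors are used).

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, query, k=3, distance=None, classify=True):
    """k-NN prediction: plurality vote (classification) or mean (regression)."""
    if distance is None:
        distance = lambda a, b: np.linalg.norm(a - b)   # Euclidean by default
    dists = np.array([distance(x, query) for x in X])
    nearest = np.argsort(dists)[:k]                     # indices of the k nearest examples
    if classify:
        votes = Counter(y[i] for i in nearest)
        return votes.most_common(1)[0][0]               # ties broken arbitrarily here
    return float(np.mean([y[i] for i in nearest]))

# usage on toy data
X = np.array([[1.0], [2.0], [3.0], [10.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([2.5]), k=3))          # -> 0 (two of the three nearest are 0)
```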
Complexity
method | time | space
---|---|---
k-NN, learning | O(1) | O(n)
k-NN, query (sorted)* | O(log n + k) | O(1)
k-NN, query (unsorted) | O(n) | O(1)

\* If the examples are kept sorted (practical when sorting by a one-dimensional feature value); learning then costs O(n log n).
Example
[Figure: k-NN predictions for query points x = 1 … 15, compared against the actual target function.]
Distance metrics make assumptions about the domain, which may not apply.
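A tiny made-up illustration of that point: the same query has a different nearest neighbor under Euclidean (L2) distance than under Manhattan (L1) distance.

```python
import numpy as np

q = np.array([0.0, 0.0])                 # query point
a = np.array([2.0, 2.0])                 # diagonal neighbor
b = np.array([3.0, 0.0])                 # axis-aligned neighbor

euclid = lambda u, v: np.linalg.norm(u - v)        # L2 distance
manhattan = lambda u, v: np.abs(u - v).sum()       # L1 distance

print(euclid(q, a), euclid(q, b))        # ~2.83 vs 3.0 -> a is nearer
print(manhattan(q, a), manhattan(q, b))  # 4.0  vs 3.0 -> b is nearer
```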
Locally weighted regression
Locally weighted (linear) regression is a generalization of kNN where an explicit local approximation of the target function is constructed for each query instance.
- uses nearby or distance-weighted examples to form a local approximation to the target function
- the target function is approximated using a constant, linear, or quadratic function
- using more complex functional forms is challenging because cost of fitting more complex function for each query instance is high
- these simpler functions model the target quite well over sufficiently small subregion
- it is local because the function is approximated using only data near the query point
- it is weighted because the contribution of each point is weighted by its distance from the query point
- starts with a simple hypothesis space (e.g. linear functions)
- over learning this effectively becomes a much richer hypothesis space: local knowledge built from nearby concepts can represent arbitrarily complicated functions
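A compact sketch of locally weighted linear regression, assuming a Gaussian kernel with a made-up bandwidth parameter `tau`: for each query point a weighted least-squares line is fitted using distance-weighted nearby examples.

```python
import numpy as np

def lwr_predict(X, y, query, tau=1.0):
    """Locally weighted linear regression prediction for a single query point."""
    Xb = np.column_stack([np.ones(len(X)), X])           # add bias column
    qb = np.concatenate([[1.0], np.atleast_1d(query)])
    # Gaussian kernel: nearby examples get weight ~1, distant ones ~0
    w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return float(qb @ theta)

# usage: local linear fits approximate a nonlinear curve
X = np.linspace(0, 6, 50).reshape(-1, 1)
y = np.sin(X).ravel()
print(lwr_predict(X, y, np.array([2.0]), tau=0.5))       # close to sin(2.0) ~ 0.909
```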
Bias
Inductive bias
- restriction bias: N/A => the kNN algorithm never forms an explicit general hypothesis regarding the target function
- preference bias:
- locality: nearest points are similar => domain knowledge required to choose right distance function
- smoothness: in choosing near neighbors we expect similar points to behave similarly
- all features matter equally: points are represented in n-dimensional space and every attribute contributes equally to the distance
Advantages
- ability to model complex target functions by collection of local approximations
- can estimate locally and differently for each new instance to be classified
- information presented in examples is never lost
- simple to implement, no training
- attempts to construct a local target function and never generalize over entire instance space => this is advantageous when the target function is very complex but can be described by less complex local approximations
Disadvantages
- no generalization: all examples must be kept, so labeling a new instance requires consulting the stored data
- cost of classifying new instances is high since all computation takes place at classification time
- overfits and can be sensitive to noise => use weighted distance to smooth out the impact of noisy examples (see the sketch after this list)
- having to break ties when two or more similar points with different labels exist
- considers all attributes when attempting to retrieve similar examples
- instances that are truly similar may appear far apart because many (possibly irrelevant) attributes dominate the distance
- highly sensitive to curse of dimensionality
- highly sensitive to choosing the right distance metric
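A brief sketch of the distance-weighted vote mentioned above (the 1/d² weighting is one common choice, not the only one): closer neighbors get more influence, which dampens the effect of a single noisy, more distant example.

```python
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X, y, query, k=3, eps=1e-9):
    """Distance-weighted k-NN vote: each neighbor contributes weight 1/d^2."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        scores[y[i]] += 1.0 / (dists[i] ** 2 + eps)      # eps avoids division by zero
    return max(scores, key=scores.get)

# a farther-away neighbor with a different label is outvoted by two close, consistent ones
X = np.array([[0.1, 0.0], [0.2, 0.0], [1.5, 0.0]])
y = np.array([0, 0, 1])
print(weighted_knn_classify(X, y, np.array([0.0, 0.0]), k=3))   # -> 0
```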