Title: | An Accurate kNN Implementation with Multiple Distance Measures |
---|---|
Description: | Similarly to the 'FNN' package, this package allows calculation of the k nearest neighbors (kNN) of a data matrix. The implementation is based on cover trees introduced by Alina Beygelzimer, Sham Kakade, and John Langford (2006) <doi:10.1145/1143844.1143857>. |
Authors: | Philipp Angerer [cre, aut] , David Crane [cph, aut] |
Maintainer: | Philipp Angerer <[email protected]> |
License: | AGPL-3 |
Version: | 1.0 |
Built: | 2024-10-24 04:13:26 UTC |
Source: | https://github.com/flying-sheep/knn.covertree |
k nearest neighbor search with custom distance function.
find_knn(data, k, ..., query = NULL, distance = c("euclidean", "cosine", "rankcor"), sym = TRUE)
data |
Data matrix |
k |
Number of nearest neighbors |
... |
Unused. All parameters to the right of the |
query |
Query matrix. In |
distance |
Distance metric to use. Allowed measures: Euclidean distance (default), cosine distance ( |
sym |
Return a symmetric matrix (as long as query is NULL)? |
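To make the three `distance` options more concrete, the following base R sketch computes each measure for a single pair of vectors. The helper names (`euclidean_dist`, `cosine_dist`, `rankcor_dist`) are hypothetical and not part of the package. The cosine and rank-correlation formulas used here (one minus cosine similarity, one minus Spearman correlation) are common conventions assumed for illustration; the package's exact definitions may differ.

```r
# Hypothetical helpers for illustration; not part of the package's API.
euclidean_dist <- function(a, b) sqrt(sum((a - b)^2))

# Assumed convention: 1 - cosine similarity.
cosine_dist <- function(a, b) 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Assumed convention: 1 - Spearman rank correlation.
rankcor_dist <- function(a, b) 1 - cor(a, b, method = "spearman")

a <- c(1, 2, 3, 4)
b <- c(2, 4, 6, 9)

euclidean_dist(a, b)  # sqrt(39)
cosine_dist(a, b)     # close to 0: the vectors point in nearly the same direction
rankcor_dist(a, b)    # 0 up to floating point: a and b have the same rank order
```

Under these assumed definitions, the rank-based measure ignores the magnitudes entirely and only responds to changes in ordering, which is why it is zero for this pair even though the Euclidean distance is not.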
A list with the entries:
index
An integer matrix containing the indices of the k nearest neighbors for each cell.
dist
A double matrix containing the distances to the k nearest neighbors for each cell.
dist_mat
A dgCMatrix if sym == TRUE, else a dsCMatrix.
Any zero in the matrix (except for the diagonal) indicates that the cells in the corresponding pair are close neighbors.
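As a rough illustration of the shape of the `index` and `dist` entries, here is a brute-force sketch in base R. It is not the package's cover-tree algorithm, and `brute_force_knn` is a hypothetical name. It assumes Euclidean distance and excludes each row from its own neighbor list, which may not match how `find_knn` treats self-matches.

```r
# Brute-force reference for illustration only; the package uses cover trees.
brute_force_knn <- function(data, k) {
  d <- as.matrix(dist(as.matrix(data)))  # full Euclidean distance matrix
  diag(d) <- Inf                         # assumption: a row is not its own neighbor
  n <- nrow(d)
  # For each row, the column indices of the k smallest distances
  index <- do.call(rbind, lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)]))
  # Look up the matching distances (index is read column by column)
  dist_k <- matrix(d[cbind(rep(seq_len(n), k), as.vector(index))], nrow = n)
  list(index = index, dist = dist_k)
}

res <- brute_force_knn(mtcars, 5L)
dim(res$index)  # 32 5: one row per observation, k columns
```

Each row of `res$dist` is sorted in ascending order, so column 1 holds the distance to the closest neighbor. A brute-force search like this needs the full pairwise distance matrix, which is the cost a tree-based search is meant to avoid.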
# The default: symmetricised pairwise distances between all rows
pairwise <- find_knn(mtcars, 5L)
image(as.matrix(pairwise$dist_mat))

# Nearest neighbors of a subset within all
mercedeses <- grepl('Merc', rownames(mtcars))
merc_vs_all <- find_knn(mtcars, 5L, query = mtcars[mercedeses, ])
# Replace row index matrix with row name matrix
matrix(
  rownames(mtcars)[merc_vs_all$index],
  nrow(merc_vs_all$index),
  dimnames = list(rownames(merc_vs_all$index), NULL)
)[, -1]  # 1st nearest neighbor is always the same row
A not-too-fast but accurate kNN implementation supporting multiple distance measures