k nearest neighbor (KNN) graphs via Pearson correlation distance and locality-sensitive hashing (LSH).
MIT License
CPython module for fast calculation of k nearest neighbor (KNN) graphs in high-dimensional vector spaces using Pearson correlation distance and locality-sensitive hashing (LSH).
The current application is the analysis of single-cell RNA-Seq data. The module is the result of a collaboration between Fabio Zanini (now at UNSW) and Paolo Carnevali at the Chan Zuckerberg Initiative, who is the owner of the algorithm code, which is also released under the MIT license:
https://github.com/chanzuckerberg/ExpressionMatrix2
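To illustrate the idea behind the approach, here is a minimal NumPy sketch of one common LSH scheme for correlation-like similarities: random hyperplane signatures. It is not the lshknn implementation. The helper names `signatures` and `estimated_correlation`, and the reading of `m` as the number of signature bits, are assumptions made for illustration only. Each sample is centered across its features, so that the cosine of the angle between two samples equals their Pearson correlation, and the fraction of differing signature bits estimates that angle.

```python
import numpy as np


def signatures(data, m, seed=0):
    """Hypothetical helper: m-bit random-hyperplane signature per sample.

    data has shape (n_features, n_samples); the result has shape (n_samples, m).
    """
    rng = np.random.default_rng(seed)
    # Center every sample (column) over its features
    centered = data - data.mean(axis=0, keepdims=True)
    # One random hyperplane normal per signature bit
    planes = rng.standard_normal((m, data.shape[0]))
    # Each bit records on which side of a hyperplane the sample falls
    return (planes @ centered > 0).T


def estimated_correlation(sig):
    """Hypothetical helper: estimate pairwise Pearson correlation from signatures."""
    m = sig.shape[1]
    # Number of differing bits between every pair of samples
    hamming = (sig[:, None, :] != sig[None, :, :]).sum(axis=2)
    # Two samples at angle theta differ in a given bit with probability theta / pi
    return np.cos(np.pi * hamming / m)


data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
sig = signatures(data, m=10)
est = estimated_correlation(sig)
```

For the four mock samples in this block, identical samples receive identical signatures, so their estimated correlation is 1, while perfectly anticorrelated samples differ in every bit. On real data the estimate is noisy, and its accuracy improves as the number of bits grows.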
(you may need superuser privileges)
pip install lshknn
For the development version:
git clone https://github.com/iosonofabio/lshknn.git
cd lshknn
python setup.py install
import numpy as np
import lshknn
# Make mock data
# 2 features (rows), 4 samples (columns)
data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
# Instantiate class
c = lshknn.Lshknn(
    data=data,
    k=1,
    threshold=0.2,
    m=10,
    slice_length=4)
# Call subroutine
knn, similarity, n_neighbors = c()
# Check result
assert (knn == [[2], [3], [0], [1]]).all()
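For a small dataset, the expected neighbors can be cross-checked against an exact brute-force computation. The sketch below uses only NumPy (`np.corrcoef`, `np.argsort`) and does not call lshknn; the name `knn_exact` is introduced here for illustration.

```python
import numpy as np

data = np.array(
    [[1, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=np.float64)
k = 1

# Pearson correlation between samples (columns), shape (n_samples, n_samples)
corr = np.corrcoef(data.T)
# Exclude each sample from its own neighbor list
np.fill_diagonal(corr, -np.inf)
# Indices of the k most correlated samples for each sample
knn_exact = np.argsort(-corr, axis=1)[:, :k]

assert (knn_exact == [[2], [3], [0], [1]]).all()
```

The brute-force approach computes all pairwise correlations, so its cost grows quadratically with the number of samples. It is practical only as a sanity check on small inputs.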