An implementation of Anchor Graph Hashing (Liu et al. 2011) in Python.
MIT License
An implementation of the Anchor Graph Hashing algorithm (AGH-1), presented in Hashing with Graphs (Liu et al. 2011).
aghasher supports python>=3.6
, with numpy and scipy. These should be linked with a BLAS implementation
(e.g., OpenBLAS, ATLAS, Intel MKL). Without being linked to BLAS, numpy/scipy will use a fallback that causes
PyAnchorGraphHasher to run over 50x slower.
aghasher is available on PyPI, the Python Package Index.
$ pip install aghasher
To use aghasher, first import the aghasher module.
import aghasher
An AnchorGraphHasher is constructed using the train method, which returns an AnchorGraphHasher and the hash bit embedding for the training data.
agh, H_train = aghasher.AnchorGraphHasher.train(X, anchors, num_bits, nn_anchors, sigma)
AnchorGraphHasher.train takes 5 arguments:
With an AnchorGraphHasher object, which has variable name agh in the preceding and following examples, hashing out-of-sample data is done with the object's hash method.
agh.hash(X)
The hash method takes one argument:
Since Python does not have a native bit vector data structure, the hash method returns an n-by-r numpy.ndarray, where n is the number of observations in data, and r is the number of hash bits specified when the model was trained. The elements of the returned array are boolean values that correspond to bits.
Testing is performed with the AnchorGraphHasher.test method.
precision = AnchorGraphHasher.test(H_train, H_test, y_train, y_test, radius)
AnchorGraphHasher.test takes 5 arguments:
Tests are in tests/.
# Run tests
$ python3 -m unittest discover tests -v
The code is structured differently than the Matlab reference implementation.
The Matlab code implements an additional hashing method, hierarchical hashing (referred to as 2-AGH), an extension of 1-AGH that is not implemented here.
There is one functional difference relative to the Matlab code. If sigma is specified (as opposed to being auto-estimated), then for the same value of sigma, the Matlab and Python code will produce different results. They will produce the same results when the Matlab sigma is sqrt(2) times bigger than the manually specified sigma in the Python code. This is because in the Gaussian RBF kernel, the Python code uses a 2 in the denominator of the exponent, and the Matlab code does not. A 2 was included in the denominator of the Python code, as that is the canonical way to use an RBF kernel.
aghasher has an MIT License.
See LICENSE.
Liu, Wei, Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. 2011. “Hashing with Graphs.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), edited by Lise Getoor and Tobias Scheffer, 1–8. ICML ’11. New York, NY, USA: ACM.