A collection of string comparison algorithms
MIT License
This library provides functions for calculating text similarity, so you can measure how alike two pieces of text are within an application. It implements well-established similarity metrics; the currently supported algorithms are listed in the comparison table below.
Assuming you have Node.js and npm/yarn/pnpm installed, install the library using:
# Install the 'string-comparisons' package using npm
npm install string-comparisons
# Alternatively, install the 'string-comparisons' package using yarn
yarn add string-comparisons
# Or, install the 'string-comparisons' package using pnpm
pnpm add string-comparisons
For details on a specific algorithm, see the documentation of the class that implements it.
Algorithm | Normalized | Metric | Similarity | Distance | Space Complexity |
---|---|---|---|---|---|
cosine.js | Yes | Vector Space Model | ✓ | | O(n) |
jaro.js | No | Edit Distance | ✓ | | O(min(n, m)) |
jaccard.js | No | Set Theory | ✓ | | O(min(n, m)) |
damerauLevenshtein.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
hammingDistance.js | No | Bitwise Operations | ✓ | | O(1) |
jaroWinkler.js | No | Edit Distance | ✓ | | O(min(n, m)) |
levenshtein.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
smithWaterman.js | No | Dynamic Programming (Local Alignment) | ✓ | | O(n * m) |
sorensenDice.js | No | Set Theory | ✓ | | O(min(n, m)) |
trigram.js | No | N-gram Overlap | ✓ | | O(n²) |
szymkiewiczSimpsonOverlap.js | Yes | Overlap Coefficient | ✓ | | O(min(m, n)) |
nGram.js | Yes | Jaccard similarity coefficient | ✓ | | O(m * n) |
qGram.js | Yes | Jaccard similarity coefficient | ✓ | | O(n + m) |
optimalStringAlignment.js | No | Edit Distance | ✓ | | O(max(n, m)²) |
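To make the Edit Distance rows more concrete, here is a minimal textbook sketch of Levenshtein distance that fills a full dynamic-programming matrix. It is an illustration only and not the code shipped in this package; the function name `levenshteinDistance` is hypothetical. The matrix has (n + 1) × (m + 1) cells, which stays within the O(max(n, m)²) bound listed in the table.

```javascript
// Illustrative sketch only: a textbook full-matrix Levenshtein distance.
// This is NOT the implementation used by string-comparisons.
function levenshteinDistance(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;

  // dp[i][j] = edits needed to turn the first i chars of a
  // into the first j chars of b.
  const dp = Array.from({ length: rows }, () => new Array(cols).fill(0));

  for (let i = 0; i < rows; i++) dp[i][0] = i; // delete i characters
  for (let j = 0; j < cols; j++) dp[0][j] = j; // insert j characters

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,        // deletion
        dp[i][j - 1] + 1,        // insertion
        dp[i - 1][j - 1] + cost  // match or substitution
      );
    }
  }
  return dp[rows - 1][cols - 1];
}

console.log(levenshteinDistance('kitten', 'sitting')); // 3
console.log(levenshteinDistance('flaw', 'lawn'));      // 2
```

Each cell depends only on its left, upper, and upper-left neighbours, so an implementation could keep just two rows at a time if memory matters more than simplicity.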
Explanation of Columns:
Notes:
import StringComparisons from 'string-comparisons';
const { Cosine, Jaccard, Jaro, DamerauLevenshtein, HammingDistance, JaroWrinker, Levenshtein, SmithWaterman, SorensenDice, Trigram } = StringComparisons;
const string1 = 'programming';
const string2 = 'programmer';
console.log('Jaro-Winkler similarity:', JaroWrinker.similarity(string1, string2)); // Output: ~0.9054545454545454
console.log('Levenshtein distance:', Levenshtein.similarity(string1, string2)); // Output: 3
console.log('Smith-Waterman similarity:', SmithWaterman.similarity(string1, string2)); // Output: 16
const set1 = new Set([1, 2, 3]);
const set2 = new Set([2, 3, 4]);
console.log('Sørensen-Dice similarity:', SorensenDice.similarity(set1, set2)); // Output: 0.6666666666666667
const trigram1 = 'hello';
const trigram2 = 'world';
console.log('Trigram Jaccard similarity:', Trigram.similarity(trigram1, trigram2)); // Output: 0 (no shared trigrams)
// ...and so on for the remaining algorithms
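To show where the set-based numbers above come from, the following standalone sketch computes a Sørensen-Dice coefficient and a trigram Jaccard index by hand. The helper names (`diceCoefficient`, `trigrams`, `jaccardIndex`) are hypothetical and not part of this package. It also assumes trigrams are taken without padding, which may differ from what the library does internally.

```javascript
// Illustrative sketch only; these helpers are hypothetical and are
// NOT the functions exported by string-comparisons.

// Sørensen-Dice coefficient for two Sets: 2 * |A ∩ B| / (|A| + |B|)
function diceCoefficient(setA, setB) {
  // Two empty sets are treated as identical (a convention chosen for this sketch).
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared++;
  }
  return (2 * shared) / (setA.size + setB.size);
}

// Split a string into the set of its overlapping 3-character substrings.
// Assumption: no padding characters are added at the ends.
function trigrams(text) {
  const grams = new Set();
  for (let i = 0; i + 3 <= text.length; i++) {
    grams.add(text.slice(i, i + 3));
  }
  return grams;
}

// Jaccard index for two Sets: |A ∩ B| / |A ∪ B|
function jaccardIndex(setA, setB) {
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared++;
  }
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 1 : shared / unionSize;
}

// {1, 2, 3} and {2, 3, 4} share two items: 2 * 2 / (3 + 3), roughly 0.667
console.log(diceCoefficient(new Set([1, 2, 3]), new Set([2, 3, 4])));

// 'hello' -> hel, ell, llo; 'world' -> wor, orl, rld; nothing in common
console.log(jaccardIndex(trigrams('hello'), trigrams('world'))); // 0

// 'hello' and 'help' share only 'hel' out of 4 distinct trigrams
console.log(jaccardIndex(trigrams('hello'), trigrams('help'))); // 0.25
```

Strings shorter than three characters produce an empty trigram set in this sketch, so very short inputs would need separate handling.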
We encourage contributions to this library! Feel free to fork the repository, make your changes, and submit pull requests.
If you feel awesome and want to support us in a small way, please consider starring and sharing the repo! This helps us gain visibility and allows the community to grow. 🙏
If you have any questions or feedback, please don't hesitate to contact us at [email protected], or reach out to Suman directly. We hope you find this resource helpful 💜.
This project is licensed under the MIT License, which means that you are free to use, modify, and distribute the code as long as you comply with the terms of the license.