Top2Vec

Top2Vec learns jointly embedded topic, document and word vectors.

BSD-3-CLAUSE License

Downloads: 11.9K
Stars: 2.9K
Committers: 2


Top2Vec - hierarchical topic reduction improvements (latest release)

Published by ddangelov 11 months ago

  • Fixed a loading bug
  • Fixed a hierarchical topic reduction bug
  • Added a parameter for optimizing hierarchical reduction speed
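The general idea behind hierarchical topic reduction can be sketched as repeatedly merging the smallest topic into its most similar topic until a target number of topics remains. This is a minimal sketch under stated assumptions, not Top2Vec's implementation: topic vectors are plain Python lists, similarity is cosine, and a merged vector is the size-weighted mean of its two parents. The function name `reduce_topics` is hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def reduce_topics(vectors, sizes, num_topics):
    """Hypothetical sketch: merge the smallest topic into its most similar
    topic until only num_topics remain.

    Returns (vectors, sizes, groups); groups[i] lists the original topic
    indices that were merged into reduced topic i.
    """
    if num_topics < 1:
        raise ValueError("num_topics must be at least 1")
    vectors = [list(v) for v in vectors]
    sizes = list(sizes)
    groups = [[i] for i in range(len(vectors))]
    while len(vectors) > num_topics:
        # The topic with the fewest documents is merged away first.
        smallest = min(range(len(sizes)), key=lambda i: sizes[i])
        # Its merge target is the remaining topic with the highest cosine similarity.
        target = max(
            (i for i in range(len(vectors)) if i != smallest),
            key=lambda i: cosine(vectors[smallest], vectors[i]),
        )
        # Assumption: the merged vector is the size-weighted mean of both topics.
        total = sizes[smallest] + sizes[target]
        vectors[target] = [
            (sizes[smallest] * s + sizes[target] * t) / total
            for s, t in zip(vectors[smallest], vectors[target])
        ]
        sizes[target] = total
        groups[target] = groups[target] + groups[smallest]
        # Remove the absorbed topic from every parallel list.
        del vectors[smallest], sizes[smallest], groups[smallest]
    return vectors, sizes, groups
```

For example, reducing three topics of sizes 10, 2 and 8 to two topics merges the size-2 topic into whichever of the other two points in the more similar direction.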
Top2Vec - Topic indexing bugfix

Published by ddangelov 12 months ago

Top2Vec -

Published by ddangelov 12 months ago

Indexing bugfix

Top2Vec - gpu hdbscan and topic indexing

Published by ddangelov 12 months ago

  1. Added GPU HDBSCAN
  2. Added topic indexing
Top2Vec - gpu umap

Published by ddangelov 12 months ago

  1. Changed default embedding model to universal-sentence-encoder-multilingual.
  2. Added an option for GPU UMAP via the gpu_umap parameter.
Top2Vec - Adding compute_topics

Published by ddangelov over 1 year ago

  • Added the compute_topics method for computing topics.
  • Exposed topic deduplication parameter topic_merge_delta.
  • Bug fixes.
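The release notes do not describe how topic_merge_delta is applied. As a hedged illustration of topic deduplication in general, the sketch below greedily groups topic vectors whose cosine distance to a group's first member is below a threshold `delta`, then averages each group. The function `dedupe_topics` is hypothetical, and Top2Vec's actual merging procedure may differ.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dedupe_topics(vectors: list[list[float]], delta: float) -> list[list[float]]:
    """Greedy sketch: each vector joins the first group whose representative
    (the group's first member) is within cosine distance delta; otherwise it
    starts a new group. Each group is returned as the mean of its members."""
    groups: list[list[list[float]]] = []
    for vec in vectors:
        for group in groups:
            if cosine_distance(vec, group[0]) < delta:
                group.append(vec)
                break
        else:
            # No sufficiently close group exists, so this is a new topic.
            groups.append([vec])
    # Component-wise mean of the members of each group.
    return [[sum(col) / len(group) for col in zip(*group)] for group in groups]
```

A smaller `delta` keeps more near-duplicate topics apart; a larger one merges more aggressively.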
Top2Vec - scikit-learn API change fix

Published by ddangelov over 1 year ago

Replaced calls to scikit-learn's get_feature_names() with get_feature_names_out().
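scikit-learn 1.0 introduced get_feature_names_out() and deprecated get_feature_names(), which was later removed. Code that must run against both older and newer scikit-learn releases can call whichever method exists. This is a minimal sketch; the helper name `feature_names` is hypothetical and not part of Top2Vec.

```python
def feature_names(vectorizer) -> list[str]:
    """Return a fitted vectorizer's vocabulary terms on old or new scikit-learn."""
    if hasattr(vectorizer, "get_feature_names_out"):
        # scikit-learn >= 1.0 returns an array of feature names.
        return list(vectorizer.get_feature_names_out())
    # Older scikit-learn releases only provide get_feature_names().
    return list(vectorizer.get_feature_names())
```

With a fitted CountVectorizer, `feature_names(vectorizer)` returns the learned vocabulary under either API.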

Top2Vec - Phrases and new embedding options

Published by ddangelov over 2 years ago

  • New pre-trained transformer models available
  • Ability to use any embedding model by passing a callable to embedding_model
  • New embedding_batch_size option
  • Document chunking options for long documents
  • Phrases in topics by setting ngram_vocab=True
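Chunking a long document generally means splitting its token sequence into fixed-length windows, optionally overlapping, so that each window fits the embedding model. The sketch below shows sequential chunking with an overlap ratio. The names `chunk_tokens`, `chunk_length`, and `overlap_ratio` are illustrative assumptions, not a description of Top2Vec's parameters or internals.

```python
def chunk_tokens(tokens: list[str], chunk_length: int, overlap_ratio: float) -> list[list[str]]:
    """Split tokens into windows of at most chunk_length tokens, where
    consecutive windows share about overlap_ratio * chunk_length tokens."""
    if chunk_length < 1 or not 0 <= overlap_ratio < 1:
        raise ValueError("chunk_length must be >= 1 and 0 <= overlap_ratio < 1")
    # How far each window start advances; at least one token per step.
    step = max(1, int(chunk_length * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_length])
        if start + chunk_length >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

With 8 tokens, `chunk_length=4` and `overlap_ratio=0.5`, the windows start at positions 0, 2 and 4, and each shares two tokens with its neighbour.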
Top2Vec - Query documents and topics fix

Published by ddangelov over 3 years ago

Top2Vec - Query documents and topics

Published by ddangelov over 3 years ago

Added the query_documents and query_topics methods, which use a sequence of text, such as a question, a sentence, a paragraph, or a document, to query documents or topics.

Added a num_topics parameter to the get_documents_topics method, which allows retrieving multiple topics per document.
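Retrieving several topics per document amounts to ranking topic vectors by their similarity to the document vector and keeping the top k. This is a minimal sketch using plain lists and cosine similarity; `top_k_topics` is a hypothetical helper, not the get_documents_topics implementation.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_topics(doc_vector, topic_vectors, num_topics):
    """Return (topic_index, score) pairs for the num_topics topics most
    similar to doc_vector, highest score first."""
    scores = ((i, cosine(doc_vector, t)) for i, t in enumerate(topic_vectors))
    # nlargest avoids sorting every topic when only a few are needed.
    return heapq.nlargest(num_topics, scores, key=lambda pair: pair[1])
```

For a document vector `[1, 0]` and topic vectors `[[0, 1], [1, 0], [1, 1]]`, asking for two topics returns topic 1 first, then topic 2.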

Top2Vec - gensim version fix

Published by ddangelov over 3 years ago

Fixes #152

Top2Vec -

Published by ddangelov over 3 years ago

Added numpy>=1.20.0 dependency.

Top2Vec -

Published by ddangelov over 3 years ago

NumPy-related bug fix and a performance improvement for document ID validation.

Top2Vec - added umap/hdbscan custom args

Published by ddangelov over 3 years ago

Addressed #90, #125, #126

Added an option to pass custom UMAP and HDBSCAN arguments. Fixed an issue with loading a model that uses a custom tokenizer.

Top2Vec - added use_embedding_model_tokenizer option

Published by ddangelov almost 4 years ago

Added the use_embedding_model_tokenizer parameter. If it is set to True and an embedding_model other than doc2vec is used, that model's tokenizer is used for document embedding.

Fixed a dependency issue with joblib.

Fixed issues with wordclouds caused by negative similarity scores.
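Word cloud renderers use scores as word frequencies, so negative cosine similarities can make them fail or render incorrectly. One way to guarantee valid weights, sketched here as an assumption rather than a description of the actual fix, is to pass the scores through a softmax, which maps any real-valued scores to positive weights while preserving their order. The function name `wordcloud_weights` is hypothetical.

```python
import math


def wordcloud_weights(words: list[str], scores: list[float]) -> dict[str, float]:
    """Map words to strictly positive weights that sum to 1 via a softmax,
    so negative similarity scores still yield valid frequencies."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(words, exps)}
```

Higher-scoring words still receive larger weights, so the relative ordering of the topic words is unchanged.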

Top2Vec - fix saving bug

Published by ddangelov almost 4 years ago

Fixed bug #91

Top2Vec - word indexing

Published by ddangelov almost 4 years ago

Added an option for indexing word vectors, which speeds up search for models with large vocabularies, specifically search_words_by_vector and similar_words.

Added new method search_words_by_vector.

Top2Vec - document indexing

Published by ddangelov almost 4 years ago

Added an option for indexing document vectors, which speeds up search for models with a large number of documents, specifically search_documents_by_vector, search_documents_by_keywords, and search_documents_by_documents.

Added new method search_documents_by_vector.

Added a check to prevent the hierarchical topic reduction error (#79).

Top2Vec - Separate dependencies

Published by ddangelov almost 4 years ago

Dependencies for the universal sentence encoder and BERT sentence transformer options are now optional. Install them with pip install top2vec[sentence-encoders] and pip install top2vec[sentence_transformers], respectively.
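Because these dependencies are optional, code that wants a sentence-encoder backend can check whether the relevant package is importable before selecting it. This is a minimal sketch using `importlib.util.find_spec`; the helper name `is_installed` is hypothetical, and the module names mentioned below (`tensorflow_hub`, `sentence_transformers`) are assumptions about what each extra installs.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if module_name can be imported, without actually importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


if __name__ == "__main__":
    # Assumed module name for the sentence_transformers extra.
    if not is_installed("sentence_transformers"):
        print("Install with: pip install top2vec[sentence_transformers]")
```

Checking with `find_spec` avoids paying the import cost of large packages such as TensorFlow just to test for their presence.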

Faster cosine similarity.
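The release notes do not say how cosine similarity was made faster. A common technique, shown here as a hedged sketch rather than Top2Vec's code, is to L2-normalize all stored vectors once so that every later similarity is a single dot product. Plain lists keep the example dependency-free; in practice this would typically be one matrix-vector product in NumPy. The helper names are hypothetical.

```python
import math


def normalize(vectors: list[list[float]]) -> list[list[float]]:
    """Scale each vector to unit length (zero vectors are left as zeros)."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm else list(v))
    return out


def cosine_scores(query: list[float], unit_vectors: list[list[float]]) -> list[float]:
    """Cosine similarity of query against pre-normalized vectors: once both
    sides have unit length, each score is just a dot product."""
    q = normalize([query])[0]
    return [sum(a * b for a, b in zip(q, v)) for v in unit_vectors]
```

The normalization cost is paid once per stored vector instead of once per comparison, which matters when the same document or word vectors are searched repeatedly.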

Top2Vec - logging bug fix and default change

Published by ddangelov about 4 years ago

The verbose parameter now defaults to True.

Fixed a bug that stopped showing logging updates after downloading pre-trained models.

Package Rankings
Top 25.39% on Conda-forge.org
Top 2.62% on Pypi.org