Top2Vec learns jointly embedded topic, document and word vectors.
BSD-3-CLAUSE License
Published by ddangelov 11 months ago
Published by ddangelov 12 months ago
Published by ddangelov 12 months ago
Published by ddangelov 12 months ago
universal-sentence-encoder-multilingual.
gpu_umap parameter.
Published by ddangelov over 1 year ago
topic_merge_delta.
Published by ddangelov over 1 year ago
get_feature_names() -> get_feature_names_out()
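The rename above follows scikit-learn's vectorizer API, where get_feature_names() was deprecated in favor of get_feature_names_out(). As a rough sketch of how calling code can tolerate both, the helper below prefers the new method and falls back to the old one. The helper name feature_names and the two stand-in classes are hypothetical and are not part of Top2Vec.

```python
def feature_names(vectorizer):
    """Return vocabulary terms from a scikit-learn style vectorizer.

    Prefers get_feature_names_out() and falls back to the older
    get_feature_names() when only that method is available.
    """
    getter = getattr(vectorizer, "get_feature_names_out", None)
    if getter is None:
        getter = vectorizer.get_feature_names
    return list(getter())


# Hypothetical stand-ins so the sketch runs without scikit-learn installed.
class OldStyleVectorizer:
    def get_feature_names(self):
        return ["data", "topic"]


class NewStyleVectorizer:
    def get_feature_names_out(self):
        return ["data", "topic"]


old_terms = feature_names(OldStyleVectorizer())
new_terms = feature_names(NewStyleVectorizer())
```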
Published by ddangelov over 2 years ago
embedding_model
embedding_batch_size option
ngram_vocab=True
Published by ddangelov over 3 years ago
Published by ddangelov over 3 years ago
Added query_documents and query_topics methods, which allow using a sequence of text such as a question, a sentence, a paragraph or a document to query documents or topics.
Added num_topics parameter to the get_documents_topics method, which allows retrieving multiple topics per document.
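To illustrate what retrieving multiple topics per document can look like, here is a minimal sketch that ranks topic vectors by cosine similarity to one document vector and keeps the best num_topics. It assumes topics are ranked by cosine similarity in the shared embedding space, which these notes do not state. The function top_topics and the toy vectors are hypothetical and do not reproduce the library's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_topics(doc_vector, topic_vectors, num_topics=1):
    """Return (topic_index, score) pairs for the closest topics, best first."""
    scored = [(cosine(doc_vector, t), i) for i, t in enumerate(topic_vectors)]
    scored.sort(reverse=True)
    return [(i, s) for s, i in scored[:num_topics]]


# Toy 2-D vectors, for illustration only.
topic_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_vector = [1.0, 0.1]
closest_two = top_topics(doc_vector, topic_vectors, num_topics=2)
```

With num_topics=1 the same function returns only the single nearest topic.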
Published by ddangelov over 3 years ago
Fixes #152
Published by ddangelov over 3 years ago
Addressed #90, #125, #126
Added option for passing custom UMAP and HDBSCAN arguments. Fixed issue with loading a model that uses a custom tokenizer.
Published by ddangelov almost 4 years ago
Added use_embedding_model_tokenizer parameter. If set to True and if using an embedding_model other than doc2vec, use the model's tokenizer for document embedding.
Fixed dependency issue with joblib.
Fixed issues with wordclouds caused by negative similarity scores.
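Word cloud generators typically expect non-negative word weights, while cosine similarity scores can be negative. One possible way to handle this, shown here only as a sketch, is to pass the scores through a softmax so every weight is positive and the ordering is preserved. Whether the library uses exactly this transformation is not stated in these notes, and the function name softmax_weights is hypothetical.

```python
import math


def softmax_weights(word_scores):
    """Map word -> similarity score to word -> strictly positive weight."""
    peak = max(word_scores.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = {w: math.exp(s - peak) for w, s in word_scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Toy scores, including a negative one, for illustration only.
weights = softmax_weights({"vector": 0.5, "noise": -0.2})
```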
Published by ddangelov almost 4 years ago
Fixed bug #91
Published by ddangelov almost 4 years ago
Added option for indexing word vectors. This will speed up search for models with large vocabularies, specifically search_words_by_vector and similar_words.
Added new method search_words_by_vector.
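For context on why an index helps, the sketch below shows the exhaustive baseline: a search by vector that compares the query against every word vector using cosine similarity. Its cost grows linearly with vocabulary size, which is the work an index is meant to reduce. The function search_by_vector and the toy data are hypothetical and do not reproduce the library's method.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def search_by_vector(query, words, vectors, num_words=2):
    """Return the num_words closest words as (word, score), best first."""
    q = normalize(query)
    scored = []
    for word, vec in zip(words, vectors):
        # The dot product of unit vectors equals their cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((score, word))
    scored.sort(reverse=True)
    return [(w, s) for s, w in scored[:num_words]]


# Toy vocabulary, for illustration only.
words = ["cat", "dog", "car"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
nearest = search_by_vector([1.0, 0.0], words, vectors, num_words=2)
```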
Published by ddangelov almost 4 years ago
Added option for indexing document vectors. This will speed up search for models with a large number of documents, specifically search_documents_by_vector, search_documents_by_keywords, and search_documents_by_documents.
Added new method search_documents_by_vector.
Added code to prevent hierarchical topic reduction error #79.
Published by ddangelov almost 4 years ago
Dependencies for the universal sentence encoder and BERT sentence transformer options are now optional. Install them with pip install top2vec[sentence-encoders] and pip install top2vec[sentence_transformers].
Faster cosine similarity.
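The optional dependencies in this release suggest a common pattern: import the extra package only when it is needed, and point the user to the matching install command if it is missing. The sketch below shows that pattern in general terms. The helper load_optional and its error message are hypothetical and are not taken from the library's source.

```python
import importlib


def load_optional(module_name, extra):
    """Import an optional module, or raise an error naming the pip extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is not installed. "
            f"Try: pip install top2vec[{extra}]"
        ) from err


# A standard-library module stands in for an optional package here.
math_module = load_optional("math", "sentence_transformers")
```

Deferring the import this way keeps the base install small while still failing with an actionable message.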
Published by ddangelov about 4 years ago
The verbose parameter will be set to True by default.
Fixed a bug that stopped showing logging updates after downloading pre-trained models.