Making text a first-class citizen in TensorFlow.
APACHE-2.0 License
Published by broken over 2 years ago
This release contains contributions from many people at Google, as well as:
Aflah, Connor Brinton, devnev39, Janak Ramakrishnan, Martin, Nathan Luehr, Pierre Dulac, Rabin Adhikari
Published by broken over 2 years ago
This release contains contributions from many people at Google, as well as:
Abhijeet Manhas, chunduriv, Dean Wyatte, Feiteng, jaymessina3, Mao, Olivier Bacs, RenuPatelGoogle, Steve R. Sun, Stonepia, sun1638650145, Tharaka De Silva, thuang513, Xiaoquan Kong, devnev39, Janak Ramakrishnan, Pierre Dulac
Published by mms4devops over 2 years ago
ShrinkLongestTrimmer
This release contains contributions from many people at Google, as well as:
Abhijeet Manhas, chunduriv, Dean Wyatte, Feiteng, jaymessina3, Mao, Olivier Bacs, RenuPatelGoogle, Steve R. Sun, Stonepia, sun1638650145, Tharaka De Silva, thuang513, Xiaoquan Kong
Published by mms4devops almost 3 years ago
Published by mms4devops almost 3 years ago
This release contains contributions from many people at Google, as well as:
Aaron Siddhartha Mondal, Abhijeet Manhas, Dominik Schlösser, jaymessina3, Mao, Xiaoquan Kong, Yasir Modak, Olivier Bacs, Tharaka De Silva
Published by broken almost 3 years ago
This release contains contributions from many people at Google, as well as:
Aaron Siddhartha Mondal, Abhijeet Manhas, Dominik Schlösser, jaymessina3, Mao, Xiaoquan Kong, Yasir Modak
Published by broken about 3 years ago
This release contains contributions from many people at Google, as well as:
Aaron Siddhartha Mondal, Dominik Schlösser, Xiaoquan Kong, Yasir Modak
Published by broken about 3 years ago
__init__.py: Added a __version__ variable.
This release contains contributions from many people at Google, as well as:
8bitmp3, akiprasad, bongbonglemon, Jules Gagnon-Marchand, Stonepia
Published by broken over 3 years ago
We particularly want to point out that guides, tutorials, and API docs are now published at http://tensorflow.org/text, which should make our documentation easier to find. We worked hard on improving docs across the board, so let us know if anything needs further clarification.
Published by gregbillock over 3 years ago
BertTokenizer and WordpieceTokenizer
.shape attribute to the ToDense Keras layer.
unselectable_ids shape check in ItemSelector.
text.BertTokenizer
tensorflow_text pip package.
tools pip package inclusion.
This release contains contributions from many people at Google, as well as:
Rens, Samuel Marks, thuang513
Published by broken almost 4 years ago
This release contains contributions from many people at Google, as well as:
fsx950223
Published by broken almost 4 years ago
tensorflow-text-nightly. This is available for Linux immediately, with other platforms to be added soon.
Published by thuang513 almost 4 years ago
Splitter
RegexSplitter
StateBasedSentenceBreaker
Trimmer
WaterfallTrimmer
RoundRobinTrimmer
ItemSelector
RandomItemSelector
FirstNItemSelector
MaskValuesChooser
mask_language_model()
combine_segments()
pad_model_inputs()
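The two trimmers listed above differ in how they spend a fixed length budget across several segments. The sketch below is a plain-Python illustration of that difference as commonly described for WaterfallTrimmer and RoundRobinTrimmer; it is not the tensorflow_text implementation. The function names `waterfall_budget`, `round_robin_budget`, and `trim` are hypothetical, and plain lists stand in for ragged tensors.

```python
from typing import List, Sequence


def waterfall_budget(segment_lengths: Sequence[int], max_seq_length: int) -> List[int]:
    """Assumed waterfall policy: fill the budget segment by segment, in order."""
    remaining = max_seq_length
    kept = []
    for length in segment_lengths:
        take = min(length, remaining)
        kept.append(take)
        remaining -= take
    return kept


def round_robin_budget(segment_lengths: Sequence[int], max_seq_length: int) -> List[int]:
    """Assumed round-robin policy: hand out one item at a time, cycling over segments."""
    kept = [0] * len(segment_lengths)
    remaining = max_seq_length
    progressed = True
    # Stop when the budget is spent or every segment is fully kept.
    while remaining > 0 and progressed:
        progressed = False
        for i, length in enumerate(segment_lengths):
            if remaining == 0:
                break
            if kept[i] < length:
                kept[i] += 1
                remaining -= 1
                progressed = True
    return kept


def trim(segments: Sequence[Sequence[str]], budgets: Sequence[int]) -> List[List[str]]:
    """Keep the first `budget` items of each segment."""
    return [list(seg[:n]) for seg, n in zip(segments, budgets)]


if __name__ == "__main__":
    segments = [["a", "b", "c", "d", "e"], ["x", "y", "z"]]
    lengths = [len(s) for s in segments]
    print(trim(segments, waterfall_budget(lengths, 6)))
    print(trim(segments, round_robin_budget(lengths, 6)))
```

With a budget of 6, the waterfall policy keeps all of the first segment and only one item of the second, while the round-robin policy keeps three items of each.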
Splitter / SplitterWithOffsets abstract base classes. These are meant to replace the current Tokenizer / TokenizerWithOffsets base classes. The Tokenizer base classes will continue to work and will implement the new Splitter base classes. The change is meant to prevent confusion when future splitting operations that use this interface split text into units other than words (sentences, subwords, etc.).
offset_end is a positional value rather than a length.
HubModuleSplitter, which helps handle ragged tensor inputs and outputs for hub modules that implement the Splitter class.
SplitMergeFromLogitsTokenizer, a narrowly focused tokenizer that splits text based on logits from a model. It is used with the newly released Chinese segmentation model.
normalize_utf8_with_offsets and find_source_offsets ops.
normalization_form that will be ignored.
This release contains contributions from many people at Google, as well as:
Pranay Joshi, Siddharths8212376, Vincent Bodin
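To make the Splitter / SplitterWithOffsets relationship and the "end is a position, not a length" point concrete, here is a minimal plain-Python sketch. The classes `ToySplitter`, `ToySplitterWithOffsets`, and `ToyWhitespaceSplitter` are hypothetical stand-ins, not the tensorflow_text classes: they work on Python strings and character indices, whereas the library operates on tensors.

```python
import abc
from typing import List, Tuple


class ToySplitter(abc.ABC):
    """Hypothetical stand-in for a splitter interface: text in, pieces out."""

    @abc.abstractmethod
    def split(self, text: str) -> List[str]:
        ...


class ToySplitterWithOffsets(ToySplitter):
    """Adds start/end offsets; `end` is an exclusive position, not a length."""

    @abc.abstractmethod
    def split_with_offsets(self, text: str) -> Tuple[List[str], List[int], List[int]]:
        ...

    def split(self, text: str) -> List[str]:
        pieces, _, _ = self.split_with_offsets(text)
        return pieces


class ToyWhitespaceSplitter(ToySplitterWithOffsets):
    """Splits on whitespace; a word-level tokenizer is one kind of splitter."""

    def split_with_offsets(self, text: str) -> Tuple[List[str], List[int], List[int]]:
        pieces: List[str] = []
        starts: List[int] = []
        ends: List[int] = []
        start = None
        for i, ch in enumerate(text):
            if ch.isspace():
                if start is not None:
                    pieces.append(text[start:i])
                    starts.append(start)
                    ends.append(i)
                    start = None
            elif start is None:
                start = i
        if start is not None:
            pieces.append(text[start:])
            starts.append(start)
            ends.append(len(text))
        return pieces, starts, ends


if __name__ == "__main__":
    text = "hello big world"
    pieces, starts, ends = ToyWhitespaceSplitter().split_with_offsets(text)
    for piece, start, end in zip(pieces, starts, ends):
        # Slicing with [start:end] recovers the piece; its length is end - start.
        print(piece, start, end, text[start:end] == piece)
```

Because the end offset is a position, `text[start:end]` recovers each piece directly, and the piece length is `end - start`.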
Published by broken almost 4 years ago
Please note that this is a pre-release and meant to run with TF v2.3.x. We wanted to give access to some of the features we were adding to 2.4.x, but did not want to wait for the TF release.
Splitter / SplitterWithOffsets abstract base classes. These are meant to replace the current Tokenizer / TokenizerWithOffsets base classes. The Tokenizer base classes will continue to work and will implement the new Splitter base classes. The change is meant to prevent confusion when future splitting operations that use this interface split text into units other than words (sentences, subwords, etc.).
offset_end is a positional value rather than a length.
HubModuleSplitter, which helps handle ragged tensor inputs and outputs for hub modules that implement the Splitter class.
SplitMergeFromLogitsTokenizer, a narrowly focused tokenizer that splits text based on logits from a model. It is used with the newly released Chinese segmentation model.
Published by broken about 4 years ago
Published by broken over 4 years ago