Bot releases are visible (Hide)

transformers.js - 2.9.0

Published by xenova 11 months ago

What's new?

😍 Exciting new tasks!

Transformers.js v2.9.0 adds support for three new tasks: (1) Depth estimation, (2) Zero-shot object detection, and (3) Optical document understanding.

🕵️‍♂️ Depth Estimation

The task of predicting the depth of objects present in an image. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create depth estimation pipeline
let depth_estimator = await pipeline('depth-estimation', 'Xenova/dpt-hybrid-midas');

// Predict depth for image
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let output = await depth_estimator(url);

Input	Output

// {
//   predicted_depth: Tensor {
//     dims: [ 384, 384 ],
//     type: 'float32',
//     data: Float32Array(147456) [ 542.859130859375, 545.2833862304688, 546.1649169921875, ... ],
//     size: 147456
//   },
//   depth: RawImage {
//     data: Uint8Array(307200) [ 86, 86, 86, ... ],
//     width: 640,
//     height: 480,
//     channels: 1
//   }
// }

🎯 Zero-shot Object Detection

The task of identifying objects of classes that are unseen during training. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create zero-shot object detection pipeline
let detector = await pipeline('zero-shot-object-detection', 'Xenova/owlvit-base-patch32');

// Predict bounding boxes
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/astronaut.png';
let candidate_labels = ['human face', 'rocket', 'helmet', 'american flag'];
let output = await detector(url, candidate_labels);

// [
//   {
//     score: 0.24392342567443848,
//     label: 'human face',
//     box: { xmin: 180, ymin: 67, xmax: 274, ymax: 175 }
//   },
//   {
//     score: 0.15129457414150238,
//     label: 'american flag',
//     box: { xmin: 0, ymin: 4, xmax: 106, ymax: 513 }
//   },
//   {
//     score: 0.13649864494800568,
//     label: 'helmet',
//     box: { xmin: 277, ymin: 337, xmax: 511, ymax: 511 }
//   },
//   {
//     score: 0.10262022167444229,
//     label: 'rocket',
//     box: { xmin: 352, ymin: -1, xmax: 463, ymax: 287 }
//   }
// ]

📝 Optical Document Understanding (image-to-text)

This task involves translating images of scientific PDFs to markdown, enabling easier access to them. See here for more information.

import { pipeline } from '@xenova/transformers';

// Create image-to-text pipeline
let pipe = await pipeline('image-to-text', 'Xenova/nougat-small');

// Generate markdown
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/nougat_paper.png';
let output = await pipe(url, {
  min_length: 1,
  max_new_tokens: 40,
  bad_words_ids: [[pipe.tokenizer.unk_token_id]],
});
// [{ generated_text: "# Nougat: Neural Optical Understanding for Academic Documents\n\nLukas Blecher\n\nCorrespondence to: [email protected]\n\nGuillem Cucur" }]

💻 New architectures: Nougat, DPT, GLPN, OwlViT

We added support for 4 new architectures, bringing the total up to 61!

DPT for depth estimation. See here for the list of available models.
GLPN for depth estimation. See here for the list of available models.
OwlViT for zero-shot object detection. See here for the list of available models.
Nougat for optical understanding of academic documents (image-to-text). See here for the list of available models.

🔨 Other improvements

Add support for Grouped Query Attention on Llama Model by @felladrin in https://github.com/xenova/transformers.js/pull/393
Implement max character check by @samlhuillier in https://github.com/xenova/transformers.js/pull/398
Add CLIPFeatureExtractor (and tests) in https://github.com/xenova/transformers.js/pull/387
Add jsDelivr stats to README in https://github.com/xenova/transformers.js/pull/395
Update sharp dependency version in https://github.com/xenova/transformers.js/pull/400

🐛 Bug fixes

Move tensor clone to fix Worker ownership NaN issue by @kungfooman in https://github.com/xenova/transformers.js/pull/404
Add default token_type_ids for multilingual-e5-* models by @do-me in https://github.com/xenova/transformers.js/pull/403
Ensure WASM fallback does not crash in GH actions in https://github.com/xenova/transformers.js/pull/402

🤗 New contributors

@felladrin made their first contribution in https://github.com/xenova/transformers.js/pull/393
@samlhuillier made their first contribution in https://github.com/xenova/transformers.js/pull/398
@do-me made their first contribution in https://github.com/xenova/transformers.js/pull/403

Full Changelog: https://github.com/xenova/transformers.js/compare/2.8.0...2.9.0

transformers.js - 2.8.0

Published by xenova 11 months ago

What's new?

🖼️ New task: Image-to-image

This release adds support for image-to-image translation (e.g., super-resolution) with Swin2SR models.

Side-by-side (full)	Animated (zoomed)

As always, you can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/spaces/jjourney1125/swin2sr/resolve/main/testsets/real-inputs/0855.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-compressed-sr-x4-48');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(12582912) [165, 166, 163, ...],
//   width: 2048,
//   height: 2048,
//   channels: 3
// }

💻 New architectures: TrOCR, Swin2SR, Mistral, and Falcon

We also added support for 4 new architectures, bringing the total up to 57! 🤯

TrOCR for optical character recognition (OCR).

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/handwriting.jpg';
let captioner = await pipeline('image-to-text', 'Xenova/trocr-small-handwritten');
let output = await captioner(url);
// [{ generated_text: 'Mr. Brown commented icily.' }]

Added in https://github.com/xenova/transformers.js/pull/375. See here for the list of available models.

Swin2SR for super-resolution and image restoration.

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/butterfly.jpg';
let upscaler = await pipeline('image-to-image', 'Xenova/swin2SR-classical-sr-x2-64');
let output = await upscaler(url);
// RawImage {
//   data: Uint8Array(786432) [ 41, 31, 24,  43, ... ],
//   width: 512,
//   height: 512,
//   channels: 3
// }

Added in https://github.com/xenova/transformers.js/pull/381. See here for the list of available models.

Mistral and Falcon for text-generation. Added in https://github.com/xenova/transformers.js/pull/379.
Note: Other than testing models, we haven't yet converted any of the larger (≥7B parameter) models. Stay tuned for more updates on this!

🐛 Bug fixes:

By default, do not add special tokens at start of text-generation (see commit)
Fix Firefox bug when displaying progress events while reading file from browser cache in https://github.com/xenova/transformers.js/pull/374. Thanks to @felladrin for reporting this issue!
Fix text2text-generation pipeline output inconsistency w/ python library in https://github.com/xenova/transformers.js/pull/384

🔨 Minor improvements:

Upgrade typescript dependency version by @Kit-p in https://github.com/xenova/transformers.js/pull/368
Improve docs in https://github.com/xenova/transformers.js/pull/385

🤗 New Contributors

@Kit-p made their first contribution in https://github.com/xenova/transformers.js/pull/368

Full Changelog: https://github.com/xenova/transformers.js/compare/2.7.0...2.8.0

transformers.js - 2.7.0

Published by xenova 12 months ago

What's new?

🗣️ New task: Text to speech/audio

Due to popular demand, we've added text-to-speech support to Transformers.js! 😍

https://github.com/xenova/transformers.js/assets/26504141/9fa5131d-0e07-47fa-9a13-122c1b69d233

You can get started in just a few lines of code!

import { pipeline } from '@xenova/transformers';

let speaker_embeddings = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
let synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', { quantized: false });
let out = await synthesizer('Hello, my dog is cute', { speaker_embeddings });
// {
//   audio: Float32Array(26112) [-0.00005657337896991521, 0.00020583874720614403, ...],
//   sampling_rate: 16000
// }

You can then save the audio to a .wav file with the wavefile package:

import wavefile from 'wavefile';
import fs from 'fs';

let wav = new wavefile.WaveFile();
wav.fromScratch(1, out.sampling_rate, '32f', out.audio);
fs.writeFileSync('out.wav', wav.toBuffer());

Alternatively, you can play the file in your browser (see below).

Don't like the speaker's voice? Well, you can choose another from the >7000 speaker embeddings in the CMU Arctic dataset (see here)!

Note: currently, we only support TTS w/ speecht5, but in future we'll add others like bark and MMS!

🖥️ TTS demo and example app

To showcase the power of in-browser TTS, we're also releasing a simple example app (demo, code). Feel free to make improvements to it... and if you do (or end up building your own), please tag me on Twitter! 🤗

https://github.com/xenova/transformers.js/assets/26504141/98adea31-b002-403b-ba9d-1edcc7e7bf11

Misc. changes

Update falcon tokenizer in https://github.com/xenova/transformers.js/pull/344
Add more links to example section in https://github.com/xenova/transformers.js/pull/343
Improve electron example template in https://github.com/xenova/transformers.js/pull/342
Update example app dependencies in https://github.com/xenova/transformers.js/pull/347
Do not post-process < and > symbols generated from docs in https://github.com/xenova/transformers.js/pull/335

Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.2...2.7.0

transformers.js - 2.6.2

Published by xenova about 1 year ago

What's new?

📝 New task: Document Question Answering

Document Question Answering is the task of answering questions based on an image of a document. Document Question Answering models take a (document, question) pair as input and return an answer in natural language. Check out the docs for more info!

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

let image = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/invoice.png';
let question = 'What is the invoice number?';

// Create document question answering pipeline
let qa_pipeline = await pipeline('document-question-answering', 'Xenova/donut-base-finetuned-docvqa');

// Run the pipeline
let output = await qa_pipeline(image, question);
// [{ answer: 'us-001' }]

🤖 New models

Add support for DonutSwin models in https://github.com/xenova/transformers.js/pull/320
Add support for Blenderbot and BlenderbotSmall in https://github.com/xenova/transformers.js/pull/292
Add support for LongT5 models https://github.com/xenova/transformers.js/pull/316

💻 New example application

In-browser semantic image search in https://github.com/xenova/transformers.js/pull/326 (demo, code, tweet)

https://github.com/xenova/transformers.js/assets/26504141/c2ea6e69-2344-401e-8745-fdea3a0613ad

🐛 Misc. improvements

Fixing more _call LSP errors + extra typings by @kungfooman in https://github.com/xenova/transformers.js/pull/304
Remove CustomCache requirement for example browser extension project in https://github.com/xenova/transformers.js/pull/325

Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.1...2.6.2

transformers.js - 2.6.1

Published by xenova about 1 year ago

What's new?

Add Vanilla JavaScript tutorial by @perborgen in https://github.com/xenova/transformers.js/pull/271. This includes an interactive video tutorial ("scrim"), which walks you through the code! Let us know if you want to see more of these video tutorials! 🤗
Add support for min_length and min_new_tokens generation parameters in https://github.com/xenova/transformers.js/pull/308
Fix issues with minification in https://github.com/xenova/transformers.js/pull/307
Fix ByteLevel pretokenizer and improve whisper test cases in https://github.com/xenova/transformers.js/pull/287
Misc. documentation improvements by @rubiagatra in https://github.com/xenova/transformers.js/pull/293

New Contributors

@rubiagatra made their first contribution in https://github.com/xenova/transformers.js/pull/293

Full Changelog: https://github.com/xenova/transformers.js/compare/2.6.0...2.6.1

transformers.js - 2.6.0

Published by xenova about 1 year ago

What's new?

🤯 14 new architectures

In this release, we've added a ton of new architectures: BLOOM, MPT, BeiT, CamemBERT, CodeLlama, GPT NeoX, GPT-J, HerBERT, mBART, mBART-50, OPT, ResNet, WavLM, and XLM. This brings the total number of supported architectures up to 46! Here's some example code to help you get started:

Text-generation with MPT (models):

import { pipeline } from '@xenova/transformers';
const generator = await pipeline('text-generation', 'Xenova/ipt-350m', {
    quantized: false, // using unquantized to ensure it matches python version
});

const output = await generator('La nostra azienda');
// { generated_text: "La nostra azienda è specializzata nella vendita di prodotti per l'igiene orale e per la salute." }

Other text-generation models: BLOOM, GPT-NeoX, CodeLlama, GPT-J, OPT.

CamemBERT for masked language modelling, text classification, token classification, question answering, and feature extraction (models). For example:

import { pipeline } from '@xenova/transformers';
let pipe = await pipeline('token-classification', 'Xenova/camembert-ner-with-dates');
let output = await pipe("Je m'appelle jean-baptiste et j'habite à montréal depuis fevr 2012");
// [
//   { entity: 'I-PER', score: 0.9258053302764893, index: 5, word: 'jean' },
//   { entity: 'I-PER', score: 0.9048717617988586, index: 6, word: '-' },
//   { entity: 'I-PER', score: 0.9227054119110107, index: 7, word: 'ba' },
//   { entity: 'I-PER', score: 0.9385354518890381, index: 8, word: 'pt' },
//   { entity: 'I-PER', score: 0.9139659404754639, index: 9, word: 'iste' },
//   { entity: 'I-LOC', score: 0.9877734780311584, index: 15, word: 'montré' },
//   { entity: 'I-LOC', score: 0.9891639351844788, index: 16, word: 'al' },
//   { entity: 'I-DATE', score: 0.9858269691467285, index: 18, word: 'fe' },
//   { entity: 'I-DATE', score: 0.9780661463737488, index: 19, word: 'vr' },
//   { entity: 'I-DATE', score: 0.980688214302063, index: 20, word: '2012' }
// ]

WavLM for feature-extraction (models). For example:

import { AutoProcessor, AutoModel, read_audio } from '@xenova/transformers';

// Read and preprocess audio
const processor = await AutoProcessor.from_pretrained('Xenova/wavlm-base');
const audio = await read_audio('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav', 16000);
const inputs = await processor(audio);

// Run model with inputs
const model = await AutoModel.from_pretrained('Xenova/wavlm-base');
const output = await model(inputs);
// {
//   last_hidden_state: Tensor {
//     dims: [ 1, 549, 768 ],
//     type: 'float32',
//     data: Float32Array(421632) [-0.349443256855011, -0.39341306686401367,  0.022836603224277496, ...],
//     size: 421632
//   }
// }

MBart +MBart50 for multilingual translation (models). For example:

import { pipeline } from '@xenova/transformers';
let translator = await pipeline('translation', 'Xenova/mbart-large-50-many-to-many-mmt');
let output = await translator('संयुक्त राष्ट्र के प्रमुख का कहना है कि सीरिया में कोई सैन्य समाधान नहीं है', {
  src_lang: 'hi_IN', // Hindi
  tgt_lang: 'fr_XX', // French
});
// [{ translation_text: 'Le chef des Nations affirme qu 'il n 'y a military solution in Syria.' }]

See here for the full list of languages and their corresponding codes.

BeiT for image classification (models):

import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/beit-base-patch16-224');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7168469429016113 }]

ResNet for image classification (models):

import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let pipe = await pipeline('image-classification', 'Xenova/resnet-50');
let output = await pipe(url);
// [{ label: 'tiger, Panthera tigris', score: 0.7576608061790466 }]

😍 Over 150 newly-converted models

To get started with these new architectures (and expand coverage for other models), we're releasing over 150 new models on the Hugging Face Hub! Check out the full list here.

🏋️ HUGE reduction in model sizes (up to -40%)

Thanks to a recent update of 🤗 Optimum, we were able to remove duplicate weights across various models. In some cases, like whisper-tiny's decoder, this resulted in a 40% reduction in size! Here are some improvements we saw:

Whisper-tiny decoder: 50MB → 30MB (-40%)
NLLB decoder: 732MB → 476MB (-35%)
bloom: 819MB → 562MB (-31%)
T5 decoder: 59MB → 42MB (-28%)
distilbert-base: 91MB → 68MB (-25%)
bart-base decoder: 207MB → 155MB (-25%)
roberta-base: 165MB → 126MB (-24%)
gpt2: 167MB → 127MB (-24%)
bert-base: 134MB → 111MB (-17%)
many more!

Play around with some of the smaller whisper models (for automatic speech recognition) here!

whisper-smaller-models

Other

Transformers.js integration with LangChain JS (docs)

import { HuggingFaceTransformersEmbeddings } from "langchain/embeddings/hf_transformers";

const model = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L6-v2",
});

/* Embed queries */
const res = await model.embedQuery(
  "What would be a good company name for a company that makes colorful socks?"
);
console.log({ res });
/* Embed documents */
const documentRes = await model.embedDocuments(["Hello world", "Bye bye"]);
console.log({ documentRes });

Refactored PreTrainedModel to require significantly less code when adding new models
Typing improvements by @kungfooman

transformers.js - 2.5.4

Published by xenova about 1 year ago

What's new?

Add support for 3 new vision architectures (Swin, DeiT, Yolos) in https://github.com/xenova/transformers.js/pull/262. Check out the Hugging Face Hub to see which models you can use!

Swin for image classification. e.g.:

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/swin-base-patch4-window7-224-in22k');
let output = await classifier(url, { topk: null });
// [
//   { label: 'Bengal_tiger', score: 0.2258443683385849 },
//   { label: 'tiger, Panthera_tigris', score: 0.21161635220050812 },
//   { label: 'predator, predatory_animal', score: 0.09135803580284119 },
//   { label: 'tigress', score: 0.08038495481014252 },
//   // ... 21838 more items
// ]

DeiT for image classification. e.g.,:

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg';
let classifier = await pipeline('image-classification', 'Xenova/deit-tiny-distilled-patch16-224');
let output = await classifier(url);
// [{ label: 'tiger, Panthera tigris', score: 0.9804046154022217 }]

Yolos for object detection. e.g.,:

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/cats.jpg';
let detector = await pipeline('object-detection', 'Xenova/yolos-small-300');
let output = await detector(url);
// [
//   { label: 'remote', score: 0.9837935566902161, box: { xmin: 331, ymin: 80, xmax: 367, ymax: 192 } },
//   { label: 'cat', score: 0.94994056224823, box: { xmin: 8, ymin: 57, xmax: 316, ymax: 470 } },
//   { label: 'couch', score: 0.9843178987503052, box: { xmin: 0, ymin: 0, xmax: 639, ymax: 474 } },
//   { label: 'remote', score: 0.9704685211181641, box: { xmin: 39, ymin: 71, xmax: 179, ymax: 114 } },
//   { label: 'cat', score: 0.9921762943267822, box: { xmin: 339, ymin: 17, xmax: 642, ymax: 380 } }
// ]

Documentation improvements by @perborgen in https://github.com/xenova/transformers.js/pull/261

New contributors 🤗

@perborgen made their first contribution in https://github.com/xenova/transformers.js/pull/261

Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.3...2.5.4

transformers.js - 2.5.3

Published by xenova about 1 year ago

What's new?

Fix whisper timestamps for non-English languages in https://github.com/xenova/transformers.js/pull/253
Fix caching for some LFS files from the Hugging Face Hub in https://github.com/xenova/transformers.js/pull/251
Improve documentation (w/ example code and links) in https://github.com/xenova/transformers.js/pull/255 and https://github.com/xenova/transformers.js/pull/257. Thanks @josephrocca for helping with this!

New contributors 🤗

@josephrocca made their first contribution in https://github.com/xenova/transformers.js/pull/257

Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.2...2.5.3

transformers.js - 2.5.2

Published by xenova about 1 year ago

What's new?

Add audio-classification with MMS and Wav2Vec2 in https://github.com/xenova/transformers.js/pull/220. Example usage:

// npm i @xenova/transformers
import { pipeline } from '@xenova/transformers';

// Create audio classification pipeline
let classifier = await pipeline('audio-classification', 'Xenova/mms-lid-4017');

// Run inference
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jeanNL.wav';
let output = await classifier(url);
// [
//   { label: 'fra', score: 0.9995712041854858 },
//   { label: 'hat', score: 0.00003788191679632291 },
//   { label: 'lin', score: 0.00002646935718075838 },
//   { label: 'hun', score: 0.000015628289474989288 },
//   { label: 'bre', score: 0.000007014674793026643 }
// ]

Adds automatic-speech-recognition for Wav2Vec2 models in https://github.com/xenova/transformers.js/pull/220 (MMS coming soon).
Add support for multi-label classification problem type in https://github.com/xenova/transformers.js/pull/249. Thanks @KiterWork for reporting!
Add M2M100 tokenizer in https://github.com/xenova/transformers.js/pull/250. Thanks @AAnirudh07 for the feature request!
Documentation improvements

New Contributors

@celsodias12 made their first contribution in https://github.com/xenova/transformers.js/pull/247

Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.1...2.5.2

transformers.js - 2.5.1

Published by xenova about 1 year ago

What's new?

Add support for Llama/Llama2 models in https://github.com/xenova/transformers.js/pull/232
Tokenization performance improvements in https://github.com/xenova/transformers.js/pull/234 (+ The Tokenizer Playground example app)
Add support for DeBERTa/DeBERTa-v2 models in https://github.com/xenova/transformers.js/pull/244
Documentation improvements for zero-shot-classification pipeline (link)

Full Changelog: https://github.com/xenova/transformers.js/compare/2.5.0...2.5.1

transformers.js - 2.5.0

Published by xenova about 1 year ago

What's new?

Support for computing CLIP image and text embeddings separately (https://github.com/xenova/transformers.js/pull/227)

You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.

Example: Compute text embeddings with CLIPTextModelWithProjection.

import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';

// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });

// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
//   dims: [ 2, 512 ],
//   type: 'float32',
//   data: Float32Array(1024) [ ... ],
//   size: 1024
// }

Example: Compute vision embeddings with CLIPVisionModelWithProjection.

import { AutoProcessor, CLIPVisionModelWithProjection, RawImage} from '@xenova/transformers';

// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');

// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);

// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
//   dims: [ 1, 512 ],
//   type: 'float32',
//   data: Float32Array(512) [ ... ],
//   size: 512
// }

Improved browser extension example/template (https://github.com/xenova/transformers.js/pull/196)

We've updated the source code for our example browser extension, making the following improvements:

Custom model caching - meaning you don't need to ship the weights of the model with the extension. In addition to a smaller bundle size, when the user updates, they won't need to redownload the weights!
Use ES6 module syntax (vs. CommonJS) - much cleaner code!
Persistent service worker - fixed an issue where the service worker would go to sleep after a portion of inactivity.

Summary of updates since last minor release (2.4.0):

(2.4.1) Improved documentation
(2.4.2) Support for private/gated models (https://github.com/xenova/transformers.js/pull/202)
(2.4.3) Example Next.js applications (https://github.com/xenova/transformers.js/pull/211) + MPNet model support (https://github.com/xenova/transformers.js/pull/221)
(2.4.4) StarCoder models + example application (release; demo + source code)

Misc bug fixes and improvements

Fixed floating-point-precision edge-case for resizing images
Fixed RawImage.save()
BPE tokenization for weird whitespace characters (https://github.com/xenova/transformers.js/pull/208)

transformers.js - 2.4.4

Published by xenova about 1 year ago

What's new?

New model: StarCoder (Xenova/starcoderbase-1b and Xenova/tiny_starcoder_py)
In-browser code completion example application (demo and source code)

Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.3...2.4.4

transformers.js - 2.4.3

Published by xenova about 1 year ago

What's new?

Example next.js applications in https://github.com/xenova/transformers.js/pull/211
- Tutorial
- Demo: client-side or server-side
- Source code: client-side or server-side
Add support for mpnet models by @xenova in https://github.com/xenova/transformers.js/pull/221

Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.2...2.4.3

transformers.js - 2.4.2

Published by xenova about 1 year ago

What's new?

Add support for private/gated model access by @xenova in https://github.com/xenova/transformers.js/pull/202
Fix BPE tokenization for weird whitespace characters by @xenova in https://github.com/xenova/transformers.js/pull/208
- Thanks to @fozziethebeat for reporting and helping to debug
Minor documentation improvements

Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.1...2.4.2

transformers.js - 2.4.1

Published by xenova over 1 year ago

What's new?

Minor bug fixes

Fix padding and truncation of long sequences in certain pipelines by @xenova in https://github.com/xenova/transformers.js/pull/190
Object-detection pipeline improvements + better documentation by @xenova in https://github.com/xenova/transformers.js/pull/189

Full Changelog: https://github.com/xenova/transformers.js/compare/2.4.0...2.4.1

transformers.js - 2.4.0

Published by xenova over 1 year ago

What's new?

Word-level timestamps for Whisper automatic-speech-recognition 🤯

This release adds the ability to predict word-level timestamps for our whisper automatic-speech-recognition models by analyzing the cross-attentions and applying dynamic time warping. Our implementation is adapted from this PR, which added this functionality to the 🤗 transformers Python library.

Example usage: (see docs)

import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
    revision: 'output_attentions',
});
let output = await transcriber(url, { return_timestamps: 'word' });
// {
//   "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
//   "chunks": [
//     { "text": " And", "timestamp": [0, 0.78] },
//     { "text": " so", "timestamp": [0.78, 1.06] },
//     { "text": " my", "timestamp": [1.06, 1.46] },
//     ...
//     { "text": " for", "timestamp": [9.72, 9.92] },
//     { "text": " your", "timestamp": [9.92, 10.22] },
//     { "text": " country.", "timestamp": [10.22, 13.5] }
//   ]
// }

Note: For now, you need to choose the output_attentions revision (see above). In future, we may merge these models into the main branch. Also, we currently do not have exports for the medium and large models, simply because I don't have enough RAM to do the export myself (>25GB needed) 😅 ... so, if you would like to use our conversion script to do the conversion yourself, please make a PR on the hub with these new models (under a new output_attentions branch)!

From our testing, the JS implementation exactly matches the output produced by the Python implementation (when using the same model of course)! 🥳

Python (left) vs. JavaScript (right)

I'm excited to see what you all build with this! Please tag me on twitter if you use it in your project - I'd love to see! I'm also planning on adding this as an option to whisper-web, so stay tuned! 🚀

Misc bug fixes and improvements

Fix loading of grayscale images in node.js (#178)

transformers.js - 2.3.1

Published by xenova over 1 year ago

What's new?

New models and tokenizers

Models:
- MobileViT for image classification
- Roberta for token classification (thanks @julien-c)
- XLMRoberta for masked language modelling, sequence classification, token classification, and question answering
Tokenizers: FalconTokenizer, GPTNeoXTokenizer

Improved documentation

Details on how to discover and share transformers.js models on the hub (link)
Example text-generation code (link)
Example image-classification code (link)

Misc bug fixes

Fix conversion to grayscale (commit)
Aligned .generate() function output with original python implementation
Fix issue with non-greedy samplers
Use WASM SIMD on iOS != 16.4.x (thanks @lsb)

New Contributors

@julien-c made their first contribution in https://github.com/xenova/transformers.js/pull/170
@lsb made their first contribution in https://github.com/xenova/transformers.js/pull/174

Full Changelog: https://github.com/xenova/transformers.js/compare/2.3.0...2.3.1

transformers.js - 2.3.0

Published by xenova over 1 year ago

What's new?

Improved 🤗 Hub integration and model discoverability!

All Transformers.js-compatible models are now displayed with a super cool tag! To indicate your model is compatible with the library, simply add the "transformers.js" library tag in your README (example).

This also means you can now search for and filter these models by task!

For example,

https://huggingface.co/models?library=transformers.js lists all Transformers.js models
https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction lists all models which can be used in the feature-extraction pipeline!

And lastly, clicking the "Use in Transformers.js" button will show some sample code for how to use the model!

Chroma 🤝 Transformers.js

You can now use all Transformers.js-compatible feature-extraction models for embeddings computation directly in Chroma! For example:

const {ChromaClient, TransformersEmbeddingFunction} = require('chromadb');
const client = new ChromaClient();

// Create the embedder. In this case, I just use the defaults, but you can change the model,
// quantization, revision, or add a progress callback, if desired.
const embedder = new TransformersEmbeddingFunction({ /* Configuration goes here */ });

const main = async () => {
    // Empties and completely resets the database.
    await client.reset()

    // Create the collection
    const collection = await client.createCollection({name: "my_collection", embeddingFunction: embedder})

    // Add some data to the collection
    await collection.add({
        ids: ["id1", "id2", "id3"],
        metadatas: [{"source": "my_source"}, {"source": "my_source"},  {"source": "my_source"}],
        documents: ["I love walking my dog", "This is another document", "This is a legal document"],
    }) 
    
    // Query the collection
    const results = await collection.query({
        nResults: 2, 
        queryTexts: ["This is a query document"]
    }) 
    console.log(results)
    // {
    //     ids: [ [ 'id2', 'id3' ] ],
    //     embeddings: null,
    //     documents: [ [ 'This is another document', 'This is a legal document' ] ],
    //     metadatas: [ [ [Object], [Object] ] ],
    //     distances: [ [ 1.0109775066375732, 1.0756263732910156 ] ]
    // }
}

main();

Better alignment with python library for calling decoder-only models

You can now call decoder-only models loaded via AutoModel.from_pretrained(...):

import { AutoModel, AutoTokenizer } from '@xenova/transformers';

// Choose model to use
let model_id = "Xenova/gpt2";

// Load model and tokenizer
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let model = await AutoModel.from_pretrained(model_id);

// Tokenize text and call
let model_inputs = await tokenizer('Once upon a time');
let output = await model(model_inputs);

console.log(output);
// {
//     logits: Tensor {
//         dims: [ 1, 4, 50257 ],
//         type: 'float32',
//         data: Float32Array(201028) [
//             -20.166624069213867, -19.662782669067383, -23.189680099487305,
//             ...
//         ],
//         size: 201028
//     },
//     past_key_values: { ... }
// }

Examples for computing perplexity: https://github.com/xenova/transformers.js/issues/137#issuecomment-1595496161

More accurate quantization parameters for whisper models

We've updated the quantization parameters used for the pre-converted whisper models on the hub. You can test them out with whisper web! Thanks to @jozefchutka for reporting this issue.

Thanks to @jozefchutka for reporting this issue!

Misc bug fixes and improvements

Do not use spread operator to concatenate large arrays (https://github.com/xenova/transformers.js/pull/154)
Set chunk timestamp to rounded time by @PushpenderSaini0 (https://github.com/xenova/transformers.js/pull/160)

transformers.js - 2.2.0

Published by xenova over 1 year ago

What's new?

Multilingual speech recognition and translation w/ Whisper

You can now transcribe and translate speech for over 100 different languages, directly in your browser, with Whisper! Play around with our demo application here.

Example: Transcribe English.

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url);
// { text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country." }

Example: Transcribe English w/ timestamps.

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url, { return_timestamps: true });
// {
//   text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country."
//   chunks: [
//     { timestamp: [0, 8],  text: " And so my fellow Americans ask not what your country can do for you" }
//     { timestamp: [8, 11], text: " ask what you can do for your country." }
//   ]
// }

Example: Transcribe French.

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'transcribe' });
// { text: " J'adore, j'aime, je n'aime pas, je déteste." }

Example: Translate French to English.

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'translate' });
// { text: " I love, I like, I don't like, I hate." }

Misc

Aligned .generate() function with original python implementation
Minor improvements to documentation (+ some examples). More to come in the future.

Full Changelog: https://github.com/xenova/transformers.js/compare/2.1.1...2.2.0

transformers.js - 2.1.1

Published by xenova over 1 year ago

Minor patch for v2.1.0 to fix an issue with browser caching.

Package Rankings

Top 1.41% on Npmjs.org