node-tesseract-ocr

A Node.js wrapper for the Tesseract OCR API

MIT License

Downloads
64.1K
Stars
304
Committers
5

Tesseract OCR for Node.js

Installation

First, you need to install the Tesseract project. Instructions for installing Tesseract for all platforms can be found on the project site. On Debian/Ubuntu:

apt-get install tesseract-ocr

After you've installed Tesseract, you can go installing the npm-package:

npm install node-tesseract-ocr

Usage

const tesseract = require("node-tesseract-ocr")

const config = {
  lang: "eng", // default
  oem: 3,
  psm: 3,
}

async function main() {
  try {
    const text = await tesseract.recognize("image.jpg", config)
    console.log("Result:", text)
  } catch (error) {
    console.log(error.message)
  }
}

main()

Also you can pass URL:

const img = "https://tesseract.projectnaptha.com/img/eng_bw.png"
const text = await tesseract.recognize(img)

or Buffer:

const tesseract = require("node-tesseract-ocr")
const fs = require("fs/promises")

async function main() {
  const img = await fs.readFile("image.jpg")
  const text = await tesseract.recognize(img)

  console.log("Result:", text)
}

If you want to process multiple images in a single run, then pass an array:

const images = ["./samples/file1.png", "./samples/file2.png"]
const text = await tesseract.recognize(images)

In the config object you can pass any OCR options. Also you can pass here any control parameters or use ready-made sets of config files (like hocr):

await tesseract.recognize("image.jpg", {
  load_system_dawg: 0,
  tessedit_char_whitelist: "0123456789",
  presets: ["tsv"],
})

Alternatives

If you want to use Tesseract in the browser, choose Tesseract.js package, which compiles original Tesseract from C to JavaScript WebAssembly. You can also use it in Node.js, but the performance may not be as good.