Pix2Text-Mac: A Mac desktop application for recognizing mathematical formulas

This project is a Mac local OCR application based on Pix2Text (no internet connection required). It can recognize mathematical formula images from the clipboard and convert them to their LaTeX representation, which can then be copied to the clipboard. Additionally, it supports text recognition (Text OCR) from general images.

Note ⚠️: This application is only available for macOS.

The initial code of this project was forked from: horennel/LaTex-OCR_for_macOS. Special thanks to the author of this project.

Features

After opening the application, the Pix2Text icon appears in the Mac menu bar. Its menu offers four OCR modes.

1. `Text_Formula OCR`: Recognizing images with both formulas and text

This mode can recognize images containing both mathematical formulas and text. The recognition result is in Markdown format, which can be pasted into the Pix2Text Online Service to view the rendered result.

For example, it can recognize the following image (assets/mixed-en.jpg):

2. `Formula OCR`: Recognizing images with pure formulas

This mode can recognize images containing only mathematical formulas. The recognition result is in LaTeX format, which can be pasted into the Pix2Text Online Service to view the rendered result.

For example, it can recognize the following image (assets/math-formula-42.png):

3. `Text OCR`: Recognizing images with pure text

This mode can recognize images containing only text. The recognition result is in plain text.

For example, it can recognize the following image (assets/text.jpg):

4. `Page OCR`: Recognizing page screenshots with complex layouts

If an image has a complex layout, such as multiple columns, tables, or other structured content, use this mode. It additionally loads the Layout Analysis and Table Recognition models from pix2text~=1.1, recognizes all the information in the image, and merges the results into Markdown format. You can paste the result into the Pix2Text Online Service to view the rendered result.

The recognition results are also saved to a local folder, set by the output_md_root_dir variable in the configuration file config.yaml (default: /tmp/output_mds). The parsing results are saved to a second local folder, set by the output_debug_dir variable in the same file (default: /tmp/output_debugs). Edit these two variables to change where the files are stored.

For example, it can recognize the following image (assets/page.png):
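
The two output-folder settings used by `Page OCR` can be sketched as a small lookup with fallbacks. This is only an illustration: the function name `resolve_output_dirs` is hypothetical, and it assumes that `output_md_root_dir` and `output_debug_dir` are top-level keys once config.yaml has been loaded into a dictionary, which may not match the application's actual loader.

```python
from pathlib import Path
from typing import Mapping, Tuple

# Defaults quoted in the Page OCR description.
DEFAULT_MD_ROOT = "/tmp/output_mds"
DEFAULT_DEBUG_DIR = "/tmp/output_debugs"


def resolve_output_dirs(config: Mapping[str, object]) -> Tuple[Path, Path]:
    """Return (markdown_dir, debug_dir), falling back to the documented defaults.

    `config` stands in for the parsed contents of config.yaml (assumption:
    both variables are top-level keys).
    """
    md_root = config.get("output_md_root_dir") or DEFAULT_MD_ROOT
    debug_dir = config.get("output_debug_dir") or DEFAULT_DEBUG_DIR
    # expanduser() lets a value such as "~/Documents/..." point into the home folder.
    return Path(str(md_root)).expanduser(), Path(str(debug_dir)).expanduser()


# Hypothetical configuration: only the Markdown folder is overridden.
md_dir, debug_dir = resolve_output_dirs({"output_md_root_dir": "~/Documents/p2t_mds"})
```

Moving the folders out of /tmp is worth considering if you want the saved Markdown files to survive a restart, since the system may clear /tmp.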

Installation

1. Clone the repository:

git clone https://github.com/breezedeus/Pix2Text-Mac

2. Install dependencies:

pip install -r requirements.txt

If you want to recognize text images in languages other than Simplified Chinese and English, please run the following command to install additional dependencies:

pip install "pix2text[multilingual]>=1.1.0.1"

3. Verify the installation:

Run the following command to check that the installed Pix2Text works correctly:

p2t predict -l en,ch_sim --resized-shape 768 --file-type page -i assets/page.png -o output-page --save-debug-res output-debug-page

4. Package the application:

python setup.py py2app -A

You can find the application Pix2Text.app in the generated dist folder. Double-click to open it, or move it to the Applications folder.
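
If you prefer to run the verification from step 3 inside a script (for example before packaging), the same command can be wrapped as below. This is a minimal sketch: the helper names `build_verify_command` and `verify_install` are hypothetical, and the arguments are copied unchanged from the command shown in step 3.

```python
import shutil
import subprocess
from typing import List


def build_verify_command(image: str = "assets/page.png",
                         out_dir: str = "output-page",
                         debug_dir: str = "output-debug-page") -> List[str]:
    """Build the argument list for the verification command from step 3."""
    return [
        "p2t", "predict",
        "-l", "en,ch_sim",
        "--resized-shape", "768",
        "--file-type", "page",
        "-i", image,
        "-o", out_dir,
        "--save-debug-res", debug_dir,
    ]


def verify_install() -> bool:
    """Return True if the p2t command is on PATH and the test run exits with 0."""
    if shutil.which("p2t") is None:
        # The CLI is missing, so the dependencies were probably not installed
        # into the currently active environment.
        return False
    return subprocess.run(build_verify_command()).returncode == 0
```

Calling `verify_install()` from the repository root runs the full recognition, so the first call can be slow while models are downloaded.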

How to Use

Launch the application
- Start the Pix2Text.app application, and you will see the Pix2Text application icon in the menu bar.
- Click the On / Off button in the menu bar icon's menu to ensure that the OCR buttons, such as Text_Formula OCR, Formula OCR, and Text OCR, are lit up.
Take a screenshot
- Use any screenshot software, such as Snipaste, to capture and copy to the clipboard.
Recognition
- Recognize images with both mathematical formulas and text
  - Click the Text_Formula OCR button.
  - After successful recognition, you will receive a notification in the notification center.
- Recognize images with pure mathematical formulas
  - Click the Formula OCR button.
  - After successful recognition, you will receive a notification in the notification center.
- Recognize images with pure text
  - Click the Text OCR button.
  - After successful recognition, you will receive a notification in the notification center.
- Recognize screenshots of pages with complex layouts
  - Click the Page OCR button.
  - After successful recognition, you will receive a notification in the notification center.
- If you do not want to receive notifications, you can turn them off in the system settings.
- After receiving a notification, you can paste the result into the Pix2Text Online Service to view the rendered result.
- You can change how Pix2Text is initialized, such as which model to use and the path to the model, by editing the configuration file config.yaml. If you have purchased the premium models (which provide better results), refer to pro-config.yaml when modifying config.yaml.
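
The workflow above (copy a screenshot, click a mode, get a notification, paste the result) can be sketched as one function. This is not the application's actual code: `run_mode` and its four callable parameters are hypothetical stand-ins for clipboard access, the Pix2Text recognizer, and the notification center, passed in as arguments so the flow can be read and tried without any of those dependencies. Only the mode names and their output formats come from the Features section.

```python
from typing import Callable, Optional

# Output format of each menu item, as described in the Features section.
MODE_OUTPUT_FORMAT = {
    "Text_Formula OCR": "markdown",
    "Formula OCR": "latex",
    "Text OCR": "text",
    "Page OCR": "markdown",
}


def run_mode(mode: str,
             grab_image: Callable[[], Optional[object]],
             recognize: Callable[[str, object], str],
             copy_text: Callable[[str], None],
             notify: Callable[[str], None]) -> Optional[str]:
    """Run one OCR mode on the clipboard image and copy the result back."""
    if mode not in MODE_OUTPUT_FORMAT:
        raise ValueError(f"unknown mode: {mode}")
    image = grab_image()  # hypothetical: returns None if no image was copied
    if image is None:
        notify("No image found on the clipboard")
        return None
    result = recognize(mode, image)  # hypothetical wrapper around the OCR call
    copy_text(result)                # the result is now ready to paste
    notify(f"{mode} finished ({MODE_OUTPUT_FORMAT[mode]})")
    return result
```

A quick dry run with fake callables, for example `run_mode("Formula OCR", lambda: "img", lambda m, i: "x^2", print, print)`, shows the order of the steps without loading any model.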

Notes

The first time you start the application, it will download models and configuration files, resulting in a long startup time. Subsequent startups will return to normal speed.
The downloaded models and configuration files are stored in ~/.cnstd, ~/.cnocr, and ~/.pix2text.
The application depends on the Python environment used during packaging. If the Python environment changes (e.g., the virtual environment used for packaging is deleted, the dependencies in the environment used for packaging are deleted or modified, or the Python environment on the computer is completely uninstalled), the application may not work properly and needs to be repackaged.
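
To estimate whether the next launch will trigger the slow first-time download, you can check which of the storage folders listed above already exist. This is a rough sketch with a hypothetical helper name, `missing_model_dirs`; note that an existing folder does not guarantee that every model inside it has been downloaded.

```python
from pathlib import Path
from typing import List, Optional

# Folder names taken from the note on storage paths above.
MODEL_DIR_NAMES = (".cnstd", ".cnocr", ".pix2text")


def missing_model_dirs(home: Optional[Path] = None) -> List[Path]:
    """Return the model/config folders that do not exist yet under `home`."""
    base = Path.home() if home is None else Path(home)
    return [base / name for name in MODEL_DIR_NAMES if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_model_dirs()
    if missing:
        print("Expect a download on next launch; missing:", ", ".join(map(str, missing)))
    else:
        print("All model folders exist.")
```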

Acknowledgments

The initial code of this project was forked from: horennel/LaTex-OCR_for_macOS. Special thanks to the author of this project.
Pix2Text
pyperclip
rumps
py2app