tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
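To give a feel for what "detecting" a file type can involve, here is a minimal sketch of signature-based (magic-byte) detection in plain Java. It is not Tika's implementation and does not use Tika's API: the class name `MagicSniffer`, the two-entry signature table, and the returned labels are all assumptions made for illustration. A real detector would draw on a far larger signature database and on further hints such as file names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: a tiny magic-byte sniffer, not Apache Tika code. */
public class MagicSniffer {

    // Hypothetical signature table: each row pairs leading bytes with a label.
    private static final byte[][] MAGIC = {
        "%PDF-".getBytes(StandardCharsets.US_ASCII),
        // OLE2 compound-file header, the container used by legacy PPT and XLS files
        {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1},
    };

    // Labels are illustrative choices for this sketch.
    private static final String[] LABEL = {
        "application/pdf",
        "application/x-ole-storage",
    };

    /** Returns a label for the first signature that prefixes {@code head}. */
    public static String detect(byte[] head) {
        for (int i = 0; i < MAGIC.length; i++) {
            byte[] sig = MAGIC[i];
            if (head.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(head, sig.length), sig)) {
                return LABEL[i];
            }
        }
        // Generic fallback when no signature matches.
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

The sketch only inspects the first few bytes, so it cannot tell a PPT from an XLS file: both share the same OLE2 container header, and telling them apart requires looking inside the container.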

Apache-2.0 License

Stars: 2.2K
Committers: 162