webvtt-to-json

Convert WebVTT to JSON, optionally removing duplicate lines

Apache-2.0 License

Installation

Install this tool using pip:

pip install webvtt-to-json

Usage

To output JSON for a WebVTT file:

webvtt-to-json subtitles.vtt

This will output to standard output. Use -o filename to send it to a specified file.

Subtitles can often include duplicate lines. Add -d or --dedupe to attempt to remove those duplicates from the output:

webvtt-to-json --dedupe subtitles.vtt

Use -s or --single to give each entry a single "line" string instead of a "lines" array.

You can also use:

python -m webvtt_to_json ...

Output

Default output (no options):

[
    {
        "start": "00:00:00.000",
        "end": "00:00:01.829",
        "lines": [
            " ",
            "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c><00:00:00.640><c> side</c><00:00:00.880><c> projects</c><00:00:01.280><c> and</c><00:00:01.520><c> open</c>"
        ]
    }
]
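The default output keeps WebVTT inline markup such as per-word timestamps and <c> tags. If you want plain text from that output without using --dedupe, a minimal sketch could look like the following. The helper name strip_cue_tags and the regular expression are assumptions made for illustration; this is not necessarily how the tool cleans lines internally.

```python
import re

# Assumption: any <...> span in a line is inline WebVTT markup
# (cue timestamps like <00:00:00.160> or tags like <c> and </c>).
TAG_RE = re.compile(r"<[^>]+>")


def strip_cue_tags(line: str) -> str:
    """Remove inline tags and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(TAG_RE.sub("", line).split())


raw = (
    "my<00:00:00.160><c> career</c><00:00:00.480><c> in</c>"
    "<00:00:00.640><c> side</c><00:00:00.880><c> projects</c>"
    "<00:00:01.280><c> and</c><00:00:01.520><c> open</c>"
)
print(strip_cue_tags(raw))  # my career in side projects and open
```

Whitespace-only lines, like the " " entry shown above, come out as empty strings and can be filtered afterwards.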

--dedupe output:

[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "lines": ["my career in side projects and open"]
    }
]

--dedupe --single output:

[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "line": "my career in side projects and open"
    }
]
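As a rough illustration of consuming this JSON from Python, the sketch below reads entries in either shape ("lines" array or single "line" key) and converts the timestamps to seconds. The function names to_seconds and entry_text are hypothetical, and the sample string simply repeats the --dedupe output shown above.

```python
import json


def to_seconds(timestamp: str) -> float:
    """Convert "HH:MM:SS.mmm" (or "MM:SS.mmm") to seconds."""
    total = 0.0
    for part in timestamp.split(":"):
        total = total * 60 + float(part)
    return total


def entry_text(entry: dict) -> str:
    """Return the text of an entry, whether it has "line" or "lines"."""
    if "line" in entry:
        return entry["line"]
    return " ".join(entry.get("lines", []))


# Sample data copied from the --dedupe output format above.
sample = """[
    {
        "start": "00:00:01.829",
        "end": "00:00:01.839",
        "lines": ["my career in side projects and open"]
    }
]"""

entries = json.loads(sample)
for entry in entries:
    print(f"{to_seconds(entry['start']):.3f}s  {entry_text(entry)}")
```

In practice the JSON would come from a file written with -o rather than from an inline string.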

Development

To contribute to this tool, first check out the code. Then create a new virtual environment:

cd webvtt-to-json
python -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest