ChatGPT Markdown Translator: CLI to translate Markdown docs using ChatGPT API
MIT License
This tool translates a Markdown file into another (natural) language by using OpenAI's ChatGPT API. Its main purpose is to help people translate docs sites for open-source libraries.
As an example, this is React Tutorial translated into Japanese with this tool: チュートリアル:三目並べ
As compared to other translation services, ChatGPT is well suited for translating technical documents for several reasons:
This tool itself is free, but you will be charged according to OpenAI's pricing page. Even if you're a ChatGPT Plus subscriber, access to the API is not free.
Make sure you have a recent version of Node.js installed.
Run npm install --global chatgpt-md-translator
.
Go to OpenAI's developer section, sign up, register your own credit card, and generate an API key.
Create a config file in the dotenv format. Use this template and save it to one of the following locations ($CWD refers to the directory where this tool will be invoked).
$CWD/.chatgpt-md-translator
$CWD/.env
$HOME/.config/chatgpt-md-translator/config
$HOME/.chatgpt-md-translator
At least, you need to specify your API key as OPENAI_API_KEY
.
Create a prompt file. You can copy this prompt-example.md
to one of the following locations and edit its contents. At least, you need to specify the language name. The contents of this file will be included in each API call, so you can write instructions to ChatGPT in a natural language.
$CWD/prompt.md
$CWD/.prompt.md
$HOME/.config/chatgpt-md-translator/prompt.md
$HOME/.chatgpt-md-translator-prompt.md
Run chatgpt-md-translator [file-to-translate.md]
. By default, the Markdown file will be overwritten, so make sure it is in a VCS.
In addition to OPENAI_API_TOKEN
, you can set several values in the config file. Only important ones are described below. See the .env-example
for the full list.
MODEL_NAME
)This is the setting that has the greatest impact on translation accuracy (and price!). Set this to one of the Chat models accepted by the OpenAI API.
gpt-4o
gpt-4-turbo
(4
)gpt-3.5-turbo
(3
)gpt-4
gpt-4-32k
(4large
)gpt-3.5-turbo-16k
(3large
)Shortcuts (in brackets) are available. Starting from v1.7.0, the shortcut 4
points to gpt-4-turbo
rather than gpt-4
. In the next version, it is likely that 4
will refer to gpt-4o
.
Although GPT-4 is much smarter, it is slower and much more expensive than GPT-3. Try using the GPT-3 model first, especially while you are experimenting with this tool. It's recommended to set the usage limit to a reasonable amount (e.g., $10) on the OpenAI's account management page.
FRAGMENT_TOKEN_SIZE
)Since ChatGPT cannot handle long texts, this program works by splitting a given Markdown file into multiple parts (fragments), passing them to the API along with the prompt in parallel, and combining the translated results. This option determines the (soft) maximum length of each fragment. The default is 2048 (in string .length
). The optimal value depends on several factors:
Setting a value that is too large can result in longer processing time, and in worse cases, the transfer of the translated text may stop midway. If this happens, the program will automatically split the fragment in half and try again recursively. Try to avoid this as it can waste your time and money.
On the other hand, splitting the text into too small fragments can result in a loss of term consistency or accuracy in the translation, since there is less context available for each translation process.
[!TIP] GPT-4 Turbo models support a massive context window, effectively allowing for unlimited prompt file size. However, since the output token size is still limited to 4,096, the size of the input text is limited accordingly. Splitting a long article remains a useful approach.
TEMPERATURE
)This controls the randomness of the output. See the official API docs. The default is 0.1
, which is intentionally much lower than the original ChatGPT API default (1.0
). Since this tool works by splitting a file, too much randomness can lead to inconsistent translations. Experience suggests that a high value also increases the risk of breaking markups or ignoring your instructions.
API_CALL_INTERVAL
)The tool will not call the API more frequently than this value (in seconds) to avoid these rate limits. The default is 0, which means the API tries to translate all fragments simultaneously. This is especially useful when you're translating a huge file or when you're still a free trial user.
CODE_BLOCK_PRESERVATION_LINES
)Code blocks usually don't need to be translated, so code blocks that are longer than the number of lines specified by this option will be replaced with a dummy string before being sent to the API, saving you time and money. They will be restored after translation.
Short code blocks (up to 5 lines by default) are sent as-is to give the API a better context for translation. If you want to replace all code blocks, specify 0
. If you don't want this feature (for example, if you want to translate comments in code examples), you can specify a large value like 1000
. But code blocks will never be split into fragments, so be mindful of the token limit!
OUTPUT_FILE_PATTERN
)By default, the content of the input file will be overwritten with the translated content. If you prefer to save the new content in a different directory or as a different file name, you can specify a pattern to transform the input path into the output path. For example, when the input file is /projects/abc/docs/tutorial/index.md
, you can set OUTPUT_FILE_PATTERN="{dir}/i18n/{basename}-es.md"
to write the translated content to /projects/abc/docs/tutorial/i18n/index-es.md
. Ensure the target directory has been created beforehand. The following placeholders are available:
{dir}
: /projects/abc/docs/tutorial
(absolute directory path){filename}
: index.md
{basename}
: index
(filename without extension){main}
: /projects/abc/docs/tutorial/index
(dir + basename){ext}
: md
BASE_DIR
is defined in the config file:
{basedir}
: /projects/abc/docs
(the BASE_DIR
itself){reldir}
: tutorial
(relative to BASE_DIR
){relmain}
: tutorial/index
(relative to BASE_DIR
)Alternatively, you can directly specify the output file name in command line, like -o translated.md
or --out=translated.md
. The path will be relative to the current directory (or BASE_DIR
if it's defined in the config file).
If you are translating many files, consider using the OVERWRITE_POLICY
option as well to skip already translated files.
These can be used to override the settings in the config file.
Example: markdown-gpt-translator -m 4 -f 1000 learn/thinking-in-react.md
-m MODEL
, --model=MODEL
: Sets the language model.-f NUM
, --fragment-size=NUM
: Sets the fragment size (in string length).-t NUM
, --temperature=NUM
: Sets the "temperature", or the randomness of the output.-i NUM
, --interval=NUM
: Sets the API call interval.-o NAME
, --out=NAME
: Explicitly sets the output file name. If set, the OUTPUT_FILE_PATTERN
setting will be ignored.-w ARG
, --overwrite-policy=ARG
: Determines what happens when the output file already exists. One of "overwrite" (default), "skip", and "abort".