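Neural Cangjie Machine

TL;DR: A simple neural network for analyzing the glyph structure of Chinese characters. A pretrained Cangjie version 5 model is provided.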

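Predicting a Cangjie code from a glyph resembles image captioning, so this code is based mainly on the Show, Attend and Tell method [1], with parts of the code borrowed from [2]. The repository also includes a Transformer [3] decoder implementation for reference.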
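The soft attention step at the heart of Show, Attend and Tell can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name `soft_attention` is hypothetical, and a plain dot product stands in for the learned scoring network described in the paper.

```python
import math

def soft_attention(features, query):
    """Return (weights, context) for one decoding step.

    features: list of region feature vectors, one per image location
    query:    decoder hidden state, same length as each feature vector
    """
    # Score each region. Assumption: a dot product replaces the learned
    # scoring network used in the paper.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    # Numerically stable softmax over regions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the region features.
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Three toy "image regions" with two-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(features, query=[2.0, 0.0])
```

At each step the decoder would consume `context` together with the previously emitted code letter, and `weights` is the kind of per-region map that an attention visualization displays.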

Requires Python >= 3.7. The remaining dependencies are listed in requirements.txt or environment.yaml.

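Example result:

Prediction

Run inference.py to predict with a pretrained model (pretrained models are available under releases). The pretrained model was trained on Hanazono Mincho glyphs, so HanaMinA.ttf and HanaMinB.ttf must be placed under data/hanazono.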

python inference.py --model data/cangjie5.pth

The program enters an interactive command-line prompt:

>> 拉
qyt

The visualization is saved to result.png.

To run on the CPU instead:

python inference.py --model data/cangjie5.pth --use_cpu

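See --help for the other command-line options.

Training

See python train.py --help. The default configuration needs about 10 GB of GPU memory, and training takes about 4 hours.

The default Cangjie 5 code table comes from Jackchows/Cangjie5, with all codes beginning with X or Z removed. During training, the data is randomly split 7:3 into a training set and a validation set.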
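The filtering and 7:3 split described above could look roughly like the following. This is a sketch under assumptions: the function name `filter_and_split`, the in-memory `(character, code)` pair format, and the fixed seed are all hypothetical, and the two entries using the placeholder character □ are invented solely to show the X/Z filter.

```python
import random

def filter_and_split(entries, train_ratio=0.7, seed=0):
    """Drop codes starting with X or Z, then split randomly.

    entries: list of (character, code) pairs (assumed format).
    Returns (train, validation) lists.
    """
    # Remove every entry whose code begins with X or Z (case-insensitive).
    kept = [(ch, code) for ch, code in entries
            if code[:1].upper() not in ("X", "Z")]
    # Shuffle a copy with a seeded generator so the split is reproducible.
    shuffled = kept[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

entries = [
    ("拉", "qyt"), ("日", "a"), ("月", "b"), ("金", "c"), ("木", "d"),
    ("水", "e"), ("火", "f"), ("土", "g"), ("竹", "h"), ("戈", "i"),
    ("□", "xabc"), ("□", "zabc"),  # made-up entries that the filter removes
]
train, val = filter_and_split(entries)
```

Because the split is random over the same table, the validation characters are drawn from the same distribution as the training characters.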

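Training progress:

In the training logs, green is the LSTM decoder and orange is the Transformer decoder. On the randomly split validation set, the LSTM model reaches a peak accuracy of 95.4%, counted per code letter.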
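To illustrate how a per-letter accuracy differs from a whole-code accuracy, here is a small sketch. It assumes that "per code letter" means a position-by-position comparison, with any missing or extra letter counted as an error. The README does not spell out the exact definition, so this is one plausible reading, and both function names and the sample strings are hypothetical.

```python
def per_letter_accuracy(predictions, targets):
    """Fraction of code positions predicted correctly (assumed definition)."""
    correct = 0
    total = 0
    for pred, target in zip(predictions, targets):
        # Positions beyond the shorter string count as wrong.
        total += max(len(pred), len(target))
        correct += sum(1 for p, t in zip(pred, target) if p == t)
    return correct / total if total else 0.0

def whole_code_accuracy(predictions, targets):
    """Fraction of characters whose full code is predicted exactly."""
    pairs = list(zip(predictions, targets))
    return sum(p == t for p, t in pairs) / len(pairs) if pairs else 0.0

# Illustrative strings only, not real model output.
preds = ["qyt", "ab", "hqi"]
targets = ["qyt", "a", "hqo"]
letter_acc = per_letter_accuracy(preds, targets)
code_acc = whole_code_accuracy(preds, targets)
```

A per-letter figure is normally higher than the whole-code figure, since a single wrong letter makes the entire code wrong.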

License

The license for data/Cangjie5.txt is given in data/LICENSE.

Everything else is licensed under the WTFPL.

References

[1] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

[2] https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

[3] Attention Is All You Need
