A tool for converting dictionary files aka glossaries. Mainly to help use our offline glossaries in any Open Source dictionary we like on any modern operating system / device.
GPL-3.0 License
4.6.0
Fix a bug causing broken installation if ~/.local/lib is a symbolic link, or if site-packages or any of its parents are a symbolic link
Fix incompatibility with Python 3.9 (despite documentation)
Fix scripts/entry-filters-doc.py, scripts/plugin-doc.py and doc/entry-filters.md
AppleDict: Fix typos in Chinese language module
VERBOSITY as default (a number from 0 to 5)
AppleDict Binary: set html_full=True by default
Update wcwidth to 0.2.6
Add glos.stripFullHtml(errorHandler) and use it in 3 plugins
StripFullHtml and change entry.stripFullHtml() to return error
Refactor entryFiltersRules
Remove empty plugin gettext_mo.py
Remove glos.titleElement from glossary_v2.Glossary
glossary.Glossary for compatibility
glossary.Glossary is a wrapper (child class) on top of glossary_v2.Glossary
Update doc/entry-filters.md to list some entry filters that were enabled conditionally (besides config)
Remove sdict.md and sdict_source.md (removed plugins)
GlossaryType class
mypy errors on most of code base and some of plugins
list, dict, tuple, set for type annotations
Optional[X] with X or None
Published by ilius over 1 year ago
4.5.0
We now require Python 3.9 or a later version.
Fix exception in scripts/plugin-index.py: 8a94b8c60cce50a21e229020970f085a0fb55fb0
StarDict: Fix writing to .zip file produced empty zip, and fix bad test
dictunformat: fix #367: add option headword_separator, default to ;
Fixes in ui_gtk, #380 #382 #403
AppleDict source: fix #407 missing quotes for title, and refactor duplicate codes
DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines
StarDict: use Unix newline when reading and writing .ifo file on Windows
Fix bug of glos.addEntryObj(dataEntry) adding empty file because tmpDataDir is not set until glos.read()
tmpDataDir on glos.tmpDataDir access, and add test, #424
Fix scripts/wiki-formats.py, #428
Dictd / Dict.org: fix exception on Windows
Support sorting by an ICU locale, see Sorting section of README
Add Gtk4 interface --ui=gtk4 / --gtk4
Add flag --optimize-memory, config key optimize_memory
--indirect
Allow plugin's reader.open() to return an Iterator for progress bar
KeyText.data)
Add read and write support for StarDict Textual File (.xml), #348
Add support for writing Yomichan dictionary files, #395 by @tomtung
StarDict reader: support .syn.dz file, #410
StarDict writer: add write option large_file, #392 #422
StarDict reader: support dxoffsetbits=64 on read, #392 #422
JMDict: support examples, #383
Add read support for JMnedict, #386
Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365
skip_duplicate_words, #365
Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366
Add read support for IUPAC goldbook (.xml), #355
Add write support for DIKT JSON
StarDict writer: limit memory usage by using SQLite for idx and syn data, #409
CSV: add newline option, defaulting to Unix-style
Aard2 Slob writer: add option file_size_approx_check_num_entries
Add scripts/diff-glossary and scripts/view-glossary
When removing HTML tags, also replace <div> with \n, #394 by @tomtung
<div> the same way <p> is treated.
Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb
Octopus MDict: ignore directories with same_dir_data_files, #362
StarDict reader: handle definitions with mixed types/formats
Dictfile: strip whitespaces from word and defi before going through entry filters
BGL: strip whitespaces from word and defi before going through entry filters
Improvement in glos.write: avoid printing exception for invalid encoding
Remove empty logs in glos.convert
StarDict reader: fix validating sametypesequence, and add test
glos.convert: Allow an existing empty directory as output path
TextGlossaryReader: replace nextPair method with nextBlock which returns resource files as third item
ui_cmd_interactive: allow converting several times before exiting
Change title tag for Greek from <big> to <b>
Update language data set (langs.json)
ui/main.py: print 1-line error instead of full exception on ImportError
ui/main.py: Windows: try Tkinter before Gtk
ebook_base.py: avoid shutil.move on Windows, #368
TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd83d4c79b32e34bf64c3101cb89093b2a4e
Entry: Allow word to be tuple in Entry(word=...)
glos.iterInfo() return Iterator rather than Iterable
Zim: change dependency to libzim>=1.0, and some comments
Mobi: work with kindlegen executable in PATH directories, #401
ui: limit the length of option comments in Format Options dialog
ui_gtk: improvement: show (last) critical error on status bar
ui_gtk: set initial focus
ui_gtk: improvements in About tab
ui_tk: revert most ttk widgets to tk because the theme doesn't match
Add SVG icon, #414 by @proletarius101
Prevent exception/traceback on Ctrl+C
Optimize progress bar
Aard2 slob: show info log before and after slobWriter.finalize(), #437
Remove read support for Wiktionary Dump, #48
Remove support for Sdictionary Binary and Source
Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang
Fix files created without UUID in header, #387 by @xiaoqiangwang
Decode mdict title & description if they're bytes, #393 by @tomtung
readmdict: Skip zlib decompress exceptions, #384
readmdict: Use __name__ as logger name, and add 2 debug logs, #384
readmdict: improve exception msg for xxhash, #385
<categ> <iref> <mrkd> <etm> <ex>, #396
.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
--write-options=xsl=True
Fix css name on html_full=True
Fix using self._encoding when should use utf-8
Fix internal links, #343
x-dictionary:d: prefix from href
x-dictionary:r: : use title if present
bword:// prefix to href (unless it points to http/https)
x-dictionary:r:
Add plistlib to dependencies
Add tests
Replace <entry ...> with <div>
Fix bad exception formatting
Fixes from PR #436
Support morphology (alternates): #434 by @soshial
Support different AppleDict offsets, #417 by @soshial
Extract AppleDict meta-info (langs, title, author), #418 by @soshial
Progress Bar on open() / loading KeyText.data
Improve memory usage of loading KeyText.data
Replace appledict_bin.py with appledict_bin directory and more refactoring
glossary.py)
Lots of refactoring in glossary.py
Glossary inherits from
Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)
Fix style errors using ruff based on pyproject.toml configuration
Remove all usages of pyglossary.plugins.formats_common
Use str.startswith(tuple) and str.endswith(tuple)
Reduce complexity of Glossary methods
Rename entry filter strip to trim_whitespaces
Some refactoring in StarDict reader
Use f-string equal syntax added in Python 3.8
Use str.removeprefix and str.removesuffix added in Python 3.9
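For readers unfamiliar with the three Python idioms named in the items above, here is a minimal sketch of each. The sample values (file name, suffix list, link) are made up for illustration and are not taken from the PyGlossary code base.

```python
# Illustrative sample value only; not taken from the PyGlossary sources.
name = "test.xdxf.gz"

# str.endswith / str.startswith accept a tuple of candidates,
# which replaces a chain of "or" checks.
is_compressed = name.endswith((".gz", ".bz2", ".lzma"))

# str.removesuffix / str.removeprefix (Python 3.9+) strip an exact affix
# at most once, and return the string unchanged if it is absent.
base = name.removesuffix(".gz")
link = "bword://hello".removeprefix("bword://")

# The f-string "=" specifier (Python 3.8+) renders the expression text
# followed by the repr of its value, which is handy in debug logs.
debug_line = f"{base=}"

print(is_compressed, base, link, debug_line)
```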
langs/writing_system.py:
iso field to list
getAllWritingSystemsFromText
Split up TextGlossaryReader.loadInfo method
plugin_manager.py: make some methods private
Update plugins' documentation
Glossary: add comments about entryFilters
Update config.rst
Update doc/entry-filters.md
Update README.md
Update doc/sort-key.md
Update doc/pyicu.md
Update plugins/testformat.py
Add types for arguments and result of all functions/methods
Add types for r/w options in reader/writer classes
Fix a few incorrect type annotations
README.md: Add document for adding data entries, #412
README.md: Fix -> nixos command, #400 by @srghma
Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/
Add test for DSL -> Tabfile conversion
dsl_test.py: fix method names not starting with test_
StarDict reader: better testing for handling definitions with mixed types
StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%
Refactoring and improvements in tests of Glossary, along with new tests
Add test for dictunformat -> Tabfile
AppleDict (source) tests: validate plist file contents
Allow forking and branching pyglossary-test repo
Fix some failing tests on Windows
Slob: test file_size_approx
Test Tabfile -> SQL conversion
Test StarDict error/warning for sortKeyName with and without locale
Print useful messages for unhandled warnings
Improve logs
Add showDiff=False arg to compareTextFiles and convert
Update and refactor Dockerfile and run-with-docker.sh
Dockerfile: change WORKDIR to /root/home which is mapped to host's home dir
run-with-docker.sh: create confDir before docker build (to check the owner later)
run-with-docker.sh: accept version (image tag) as argument
$HOME to docker's user home
Update setup.py
Published by ilius over 2 years ago
Fix 2 log messages in glos._resolveConvertSortParams
Fixes and improvements in Dictfile (.df) reader
error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to output path
Fix not cleaning up temp directory on return with error from glos.convert
ui_gtk: add a "General Options" button that opens a dialog for:
sort and sortKey
save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
Add support for --sort-key random to shuffle entries
Performance improvement: remove gc.collect() calls in Glossary and *EntryList
README.md
Do not import all plugin modules (only import two plugins that are used)
plugins-meta/index.json instead
langs.json: add new 3-letter codes for 25 languages
glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
glos.cleanup: reset path list to avoid (non-critical) error if called again
Glossary.init()
DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
Glossary object every time Convert button is clicked
Glossary.init
tests/glossary_errors_test.py
Plugins: replace import of formats_common from current directory with pyglossary.plugins.formats_common
Fix logging.warn method is deprecated, use warning instead, PR #360 by @BoboTiG
Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG
Move some functions from glossary_utils.py to compression.py
Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo
Some refactoring in plugin_prop.py and plugin_manager.py
plugin.pluginModule to plugin.module
plugin.module, plugin.readerClass or plugin.writerClass
PluginProp
glossary.py
plugin_prop.py: fix checking debug level
sq_entry_list.py: rename sortColumns to sqliteSortKey
Some refactoring around setSortKey between Glossary, EntryList and SqEntryList
Remove Entry.sqliteSortKeyFrom and related classmethods
Some more simplification in glossary.py
Remove Entry.defaultSortKey
Some style fixes
iter_utils.py: remove unused key= argument from unique_everseen
Refactor ui_gtk and update config comments
extractInlineHtmlImages: avoid writing file within sub func
Published by ilius over 2 years ago
cacheDir on Glossary.init()
ui_cmd_interactive: support setting sortKey
glossary.py: update docstrings for sortKeyName
sort_keys.py: add desc to NamedSortKey
doc/sort-key.md
Published by ilius over 2 years ago
Remove partial sorting support (obsolete feature)
--sort-cache-size flag in command line
sortCacheSize argument to glos.write and glos.convert
Re-design sorting and sortKey parameters
Breaking change for library users, and user plugins that need sorting (sortOnWrite = ALWAYS)
Change glos.convert
sortKey (Callable) with sortKeyName (str)
sortEncoding (str) defaulting to utf-8
Change glos.write
sortKey (Callable) with namedSortKey (sort_keys.NamedSortKey)
sortEncoding (str) defaulting to utf-8
Change glos.sortWords
key (Callable) with sortKeyName (str)
sortEncoding (str) defaulting to utf-8
Change API of plugins that use sortOnWrite = ALWAYS
writer.sortKey and Writer.sqliteSortKey with sortKeyName in plugin module.
Note 1: All sortKey and sortEncoding arguments are optional.
Note 2: Values of sortKeyName are documented in doc/sort-key.md
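To illustrate the idea of passing a sort key by name plus an encoding, rather than passing a callable, here is a rough sketch of a name-to-factory registry. It is not PyGlossary's implementation: NamedSortKeySketch, REGISTRY, resolve_sort_key and the key name "headword_lower" are all hypothetical names chosen for this example; the real values are the ones listed in doc/sort-key.md.

```python
from typing import Callable, NamedTuple


class NamedSortKeySketch(NamedTuple):
    """Hypothetical stand-in for a named sort key (name, description, factory)."""
    name: str
    desc: str
    make: Callable[[str], Callable[[list], bytes]]


def _headword_lower(sort_encoding: str) -> Callable[[list], bytes]:
    # Build a key function that compares the lowercased first headword
    # as bytes in the requested encoding.
    def key(words: list) -> bytes:
        return words[0].lower().encode(sort_encoding, errors="replace")
    return key


# Hypothetical registry; the key name is an assumption for illustration.
REGISTRY = {
    "headword_lower": NamedSortKeySketch(
        name="headword_lower",
        desc="Lowercase headword",
        make=_headword_lower,
    ),
}


def resolve_sort_key(sort_key_name: str, sort_encoding: str = "utf-8"):
    """Turn a (name, encoding) pair into a callable usable with sorted()."""
    try:
        named = REGISTRY[sort_key_name]
    except KeyError:
        raise ValueError(f"unknown sort key name: {sort_key_name!r}") from None
    return named.make(sort_encoding)


key = resolve_sort_key("headword_lower")
entries = [["b"], ["A"], ["c"]]
print(sorted(entries, key=key))
```

A string name can be stored in config files and passed on the command line, which a callable cannot; that may be part of the motivation for the redesign, though the notes do not say so explicitly.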
Rename 2 files in doc/:
doc/entry_filters.md to doc/entry-filters.md
doc/term_colors.md to doc/term-colors.md
--sort-key and --sort-encoding command line flags (as part of above re-design)
Now SQLite mode works for all output formats.
FileNotFoundError traceback in glos.read and glos.write
glos.convert if write failed
Glossary.__str__
glos.setInfo: convert non-str value to str, and add tests
Add new tests and improve existing tests.
glossary.py: 89%
glos object to EntryList()
SqList with SqEntryList
__iter__ of SqEntryList and EntryList to give entry objects
Glossary by moving gc.collect to EntryList and SqEntryList
xml_unescape
operator.itemgetter in stardict.py, dict_cc.py, ebook_kobo.py, reverse.py
glossary.py: cleanup, simplify and optimize generators logic
index argument from entryFilter.run method and add some comments
glos.progress
_getLangByStr
Glossary.detectOutputFormat
Published by ilius almost 3 years ago
Published by ilius almost 3 years ago
\
with \\
--remove-html flag: fix bad regex
<a href="bword://...) when --lower flag is passed
TextGlossaryWriter: do not skip words that start with #
StdLogHandler: was not applying --no-color
sys.frozen
Add auto_sqlite config parameter
--no-sqlite flag
Add 3 config parameters to allow changing log colors in terminal:
color.cmd.critical
color.cmd.error
color.cmd.warning
Add 2 keys to config to enable/disable colors in Unix and Windows separately
color.enable.cmd.unix: default true
color.enable.cmd.windows: default false
Allow glos.setInfo(key, None) to delete the info / metadata key
Add glos.alts property as shortcut, and use it internally
Change rawEntry[0] from bytes to List[str] and avoid split/join when converting rawEntry <-> entry.
This fixes some very edge cases involving | in words, but uses more RAM in indirect mode (converting to StarDict), which can be solved with --sqlite.
doc/config.md with doc/config.rst, update comments and other improvements
Coverage of glossary.py: 75%
There are 2501 lines of test code in tests directory.
Tests for Glossary class include:
lower, rtl, remove_html, remove_html_all)
Other improvements:
glossary_test.py: check CRC32 of downloaded test files
glossary_test.py: use a new temp dir for each test method for isolation.
ebook_kobo_test.py: split into several test methods
glos.config to be set twice
--lower --no-lower
utf8_check config parameter by default (not needed since 3.0.0)
scripts/ directory
DataEntry.fromFile and improve behavior of DataEntry.__init__
option.cmdFlag to option.customFlag
glos.rawEntryCompress property, and use in entry.py
Reader.open
text_utils.py
plugin_prop.py: refactor getExtraOptions
text_writer.py and plugins/tabfile.py
entry_filters.py
sortKey and get_prefix implementations from ebook_base.py to epub and mobi plugins
Published by ilius almost 3 years ago
text_utils.py
urlToPath using urllib.parse.unquote
replacePostSpaceChar: remove trailing space from the output str
isControlChar
formatByteStr
exclude from function isASCII
ui_cmd_interactive.py: fix a minor bug and some small refactoring
Command line: Override input glossary info with --source-lang and --target-lang flags
Add unit tests for CSV -> Tabfile conversion
CSV plugin: some refactoring, and rename the module to csv_plugin.py
Update setup.py: add python_requires=">=3.7.0", update extras_require
Update README.md
--name flag for changing glossary name
Glossary: convert: add infoOverride optional argument
Published by ilius almost 3 years ago
Breaking changes:
glos.getAuthor() with glos.author
apply_css to css for mobi and epub2
glos.getInfo and glos.setInfo only accept str as key (or a subclass of str)
Bug fixes:
Indirect mode: Fix handling '|' character in words.
| in words when converting entry <-> rawEntry
Escape/unescape | in words when writing/reading text-based file formats
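The escaping described in the two items above can be sketched as follows. This is only an illustration of the general technique (backslash-escaping a separator so that words containing | survive a join/split round trip); the function names are hypothetical and the exact escape rules PyGlossary uses are not stated in these notes.

```python
def escape_word(word: str) -> str:
    # Escape the escape character first, then the separator.
    return word.replace("\\", "\\\\").replace("|", "\\|")


def join_words(words: list) -> str:
    """Join headword alternates with "|" after escaping each one."""
    return "|".join(escape_word(w) for w in words)


def split_words(raw: str) -> list:
    """Split on unescaped "|" and undo the escaping in a single pass."""
    words = []
    current = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            # An escaped character is taken literally.
            current.append(raw[i + 1])
            i += 2
            continue
        if ch == "|":
            words.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    words.append("".join(current))
    return words


words = ["a|b", "c\\d", "plain"]
raw = join_words(words)
print(raw)
print(split_words(raw) == words)
```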
JSON: Prevent duplicate keys in json output, #344
glos.preventDuplicateWords()
Features and improvements
Add SQLite mode with --sqlite flag for converting to StarDict.
--sqlite to your command, even for running GUI.
python3 main.py --tk --sqlite
Add --source-lang and --target-lang flags
XDXF: support more tags and improvements
Add unit tests for Glossary class, and some functions in text_utils.py
Windows: change cache directory to %LOCALAPPDATA%
Some refactoring and optimization
Update, improve and re-format documentations
Published by ilius almost 3 years ago
There are a lot of changes since last release, but here is what I could gather and organize!
Please see the commit list for more!
Improvements in ui_gtk
Improvements in ui_tk
Improvements in ui_cmd_interactive
Refactoring and improvements in ui-related codebase
Fix not loading config with --ui=none
Code style fixes and cleanup
Documentation
README.md
scripts/plugin-doc-gen.py script
Add Dockerfile and run-with-docker.sh script
New command-line flags:
--json-read-options and --json-write-options
; in option values
'--json-write-options={"delimiter": ";"}'
--gtk, --tk and --cmd as shortcut for --ui=gtk etc
--rtl to change direction of definitions, #268, also added to config.json
Fix non-working --remove-html flag
Changes in Glossary class
glos.getPref to glos.getConfig
formatsReadOptions and formatsWriteOptions to Dict[str, OrderedDict[str, Any]]
glos.writeTabfile, replace with a func in pyglossary/text_writer.py
Glossary.init: avoid showing error if user plugin directory does not exist
Fixes and improvements in code base
dataEntry.save() from raising exception because of invalid filename or permission
mktemp and more improvements
~/.cache/pyglossary/ directory instead of /tmp/
runDictzip
RuntimeError instead of StopIteration when iterating over a non-open reader
DataEntry: replace inTmp argument with tmpPath argument
Entry: fix html pattern for hyperlinks, #330
dataDir detection, #307 #316
log.emit
dataDir detection, #321
StdLogHandler.emit
Fixes and improvements in Windows
dataDir on Windows, #307
shutil.rmtree exception on Windows
Changes in Config:
skipResources to skip_resources
utf8Check to utf8_check
Implement direct compression and uncompression, and some refactoring
fileObj= argument from glos.writeTxt
Update setup.py
Show version from 'git describe --always' on --version
FileSize option (used in many formats):
K, M, G units
KiB, MiB, GiB for powers of 1024
Add extensionCreate variable (str) to plugins and plugin API
Text-based glossary code-base (affecting Tabfile, Kobo Dictfile, LDF)
.N.txt to .txt.N (where N>=1)
file_count=-1 to metadata
Tabfile
writeInfo to enable_info
*.txt_res directory if exists
*.txt_res directory to *.zip file
Zim Reader:
image/webp, fix #329
Slob and Tabfile Writer: add file_size_approx option to allow writing multi-part output
5500k, 100m, 1.2g
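A value such as 5500k, 100m or 1.2g has to be turned into a byte count somewhere. The sketch below shows one way that could be done; it is not PyGlossary's parser. The function name parse_file_size is hypothetical, and the assumption that plain K, M, G mean powers of 1000 (with KiB, MiB, GiB reserved for powers of 1024, as the notes state) is inferred rather than confirmed by the source.

```python
import re
from decimal import Decimal

# Assumption: k/m/g are decimal (powers of 1000); the notes only state
# explicitly that KiB/MiB/GiB are powers of 1024.
_UNITS = {
    "": 1,
    "k": 1000,
    "m": 1000 ** 2,
    "g": 1000 ** 3,
    "kib": 1024,
    "mib": 1024 ** 2,
    "gib": 1024 ** 3,
}

_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]*)\s*$")


def parse_file_size(text: str) -> int:
    """Parse a human-readable size such as "5500k" into a number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid file size: {text!r}")
    number, unit = match.groups()
    try:
        factor = _UNITS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit in file size: {text!r}") from None
    # Decimal avoids binary floating-point rounding for inputs like "1.2".
    return int(Decimal(number) * factor)


for sample in ("5500k", "100m", "1.2g", "1KiB"):
    print(sample, parse_file_size(sample))
```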
Add word_title=False option to some writers
word_title=False option
word_title=False option
word_title=False option
word_title=False option
keywords_header option to word_title
glos.wordTitleStr, used in plugins with word_title option
definition_has_headwords=True info key to avoid adding the title next time we read the glossary
Aard2 (slob)
separate_alternates=False, #270
content_type option
~/.cache/pyglossary/ instead of /tmp
slob.py library: Refactoring and cleanup
StarDict:
audio_goldendict, #327
audio_icon=True, and add option comment, #327
FreeDict Reader
pron
unescape_unicode
by encoding="utf-8"
arg to ET.htmlfile
edition
is missing in header, and few other fixes<cit type="example">
with <cit type="trans">
inside it<cit type="trans">
inside nested second-level(nested) <sense>
"lang"
attribute to html elements<def>
, refactoring and improvement<note>
inside <sense>
<note>
in <gramGrp>
<a ... class="external">
<cit>
<xr>
inside <sense>
<sense>
XDXF
Fix not finding xdxf.xsl in installed mode
xdxf.xsl: generate <font color=...> instead of <span style=...>
StarDict Reader: Add xdxf_to_html=True option, #258
StarDict Reader: Import xdxf_transform lazily
lxml, #261
XDXF plugin: fix glos.setDefaultDefiFormat call
xdxf_transform.py: remove warnings for , #322
sr, gr, ex_orig, ex_transl tags and audio
None attribute from audio tag
Mobi
Changes in ebook_base.py
(Mobi and EPUB)
style.css
dataEntry, #299DSL Reader:
html.escape
on text before adding html tags, #265<i>
and <font color=...>
instead of <span style=...>
\ufeff
from header lines, #306AppleDict Source
encoding="utf-8"
AppleDict Binary
DefaultStyle.css
file, add as style.css
, #299html=True
Octopus MDict (MDX)
readmdict.py
audio=True
(default: False
), #327audio
: remove extra attrs and add commentsDICT.org plugin:
installToDictd
: skip if target directory does not existFixes and improvements in Dict.cc (SQLite3) plugin:
fetchall()
, #296unescape_unicode
JMDict
unescape_unicode
DigitalNK: work around Python's sqlite bug, #282
Changes in dict_org.py
plugin, By Justin Yang
CC-CEDICT Reader:
conv.py
<
, >
and &
 
instead of
<font>
instead of <span style=...>
len(syllables), #328
<font color=""> for each syllable in case of mismatch tones, #328
Rename read/write options:
infoKeys -> info_keys
addExtraInfo -> add_extra_info
havePrevLink to prev_link
writeInfo to enable_info
writeInfo to enable_info
New formats:
plugins/abc_medical_notes.py, #267
plugins/almaany.py, #267 #268
Remove TreeDict plugin, plugins/treedict.py
Remove FreeDict writer
Published by ilius almost 4 years ago
Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
Fix / rewrite setup.py
python3 setup.py sdist bdist_wheel, and pypi package
ui/ directory into pyglossary/
distutils to setuptools
py2exe
Add interactive command line user interface
$DISPLAY is not set
tkinter module is found
--ui=cmd flag is passed
New format support:
dictfmt source file
Remove Omnidic write support (Unmaintained J2ME dictionary)
Remove Octopus MDict Source plugin
Remove Babylon Source plugin
BGL Reader: improvements
DictionaryForMIDs Writer: fix non-working code
Gettext Source (po) Writer: fix info header
MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112
EPUB-2 E-Book Writer: fix sort order
XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
Lingoes Source (LDF) Reader: fix ignoring info/metadata header
dict_org.py: rewrite broken plugin (Reader and Writer)
DSL Reader: fix losing metadata/info
Aard 2 (slob) Reader:
bword:// prefix to entry links
Aard 2 (slob) Writer:
bword:// prefix from entry links
slob.py in debug mode
compression to zlib
compression
Octopus MDict Reader:
len(reader) for progressbar
StarDict Writer:
stardict_client: bool
True to make glossary more compatible with StarDict 3.x
sametypesequence option is given and a definition contains |
sametypesequence=x for xdxf
merge_syns option
sametypesequence=None option
XDXF Reader:
Kobo Writer:
<img src=... tags with [Image: name.bmp], #219
CSV:
delimiter option to Reader and Writer
add_defi_format=True (default False)
AppleDict Writer:
MDX Reader:
entry:// with bword:// in MDX Reader instead of AppleDict Writer
href="x:" and href="d:" links
file:// in images path, fix #243
User Interface improvements and fixes:
Add a list of 208 languages and ~40 writing systems
sourceLang and targetLang from glossary name/title
<b> and <big> tags depending on writing system
glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
glos.sourceLangName and glos.targetLangName properties (with setters) as str
Break compatibility of plugins
__init__(self, glos)
open(self, filename)
glos.setInfo
__len__(self) -> int
__iter__(self) -> "Iterator[BaseEntry]"
close(self)
__init__(self, glos)
open(self, filename)
glos.getInfo or glos.iterInfo and written to file
write(self) -> "Generator[None, BaseEntry, None]"
Entries must be fetched with entry = yield in a while True loop:
    while True:
        entry = yield
        if entry is None:
            break
        # process and write entry into file(s)
finish(self)
pyglossary/plugins/csv_pyg.py plugin for example
sortKey must be an instance method of Writer, instead of a function outside any class
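The generator-based write() protocol listed above may be easier to follow with a toy writer and a driver. The sketch below is self-contained and does not import PyGlossary: ListWriter and drive are hypothetical names, and entries are plain (word, defi) tuples standing in for BaseEntry objects. How the real Glossary class drives a plugin writer is not shown in these notes, so the driver is an assumption.

```python
from typing import Generator, Optional, Tuple


class ListWriter:
    """Toy writer following the open / write (generator) / finish shape."""

    def __init__(self) -> None:
        self.filename = ""
        self.lines = []
        self.finished = False

    def open(self, filename: str) -> None:
        self.filename = filename

    def write(self) -> Generator[None, Optional[Tuple[str, str]], None]:
        while True:
            entry = yield  # the caller sends the next entry here
            if entry is None:
                break  # None signals the end of input
            word, defi = entry
            # A real plugin would write to a file; this sketch keeps lines in memory.
            self.lines.append(f"{word}\t{defi}")

    def finish(self) -> None:
        self.finished = True


def drive(writer: ListWriter, entries) -> None:
    """Hypothetical driver: prime the generator, send entries, then send None."""
    gen = writer.write()
    next(gen)  # advance to the first "entry = yield"
    for entry in entries:
        gen.send(entry)
    try:
        gen.send(None)
    except StopIteration:
        pass  # the generator returned after seeing None
    writer.finish()


writer = ListWriter()
writer.open("out.txt")
drive(writer, [("a", "1"), ("b", "2")])
print(writer.lines)
```

Because the writer receives entries one at a time, it never needs the whole glossary in memory, which fits the direct-mode conversion mentioned elsewhere in these notes.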
Refactor and cleanup Glossary class
Glossary
git diff 3.3.0..master -- pyglossary/glossary.py
glos.addEntry method
glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
getMostUsedDefiFormats
iterEntryBuckets
zipOutDir and archiveOutDir
pyglossary/glossary_utils.py
archiveOutDir renamed to compressOutDir
writeDict
iterSqlLines -> moved to pyglossary/plugins/sql.py
reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
Glossary.plugins is changed to plugin_prop.PluginProp instances
glos.writeTxt arguments
sep1 and sep2 with entryFmt
rplList with defiEscapeFunc, wordEscapeFunc and tail
iterEntries, entryFilterFunc
Generator[None, BaseEntry, None] instead of bool
pyglossary/glossary.py -> def writeTabfile
pyglossary/plugins/dict_org_source.py
pyglossary/plugins/json_plugin.py
pyglossary/plugins/lingoes_ldf.py
pyglossary/plugins/sdict_source.py
Refactor, cleanup and fixes in Entry and DataEntry classes
entry.getWord() with entry.word
entry.getWords() with entry.l_word
entry.getDefi() with entry.defi
entry.getDefis()
Entry objects
entry.getDefiFormat() with entry.defiFormat
entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
dataEntry.getData() with dataEntry.data
__slots__ to Entry and DataEntry classes
DataEntry in indirect mode
dataEntry.save(...)
Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
New features of Entry
entry.stripFullHtml(), remove <html... <head>...</head>...<body>
Fix glos.writeTabfile:
\r from definitions and info values
Fix/improve html detection in definitions
Switch to lazy imports of non-standard modules in plugins
Optimize RAM usage of indirect conversion
Other new features of Glossary class
glos.getAuthor() to get "author", or "publisher" (as fallback)
glos.removeHtmlTagsAll() method, can be called by plugins' writer
glos.collectDefiFormat(maxCount) extract defiFormat counts
maxCount entries. (then iterator will be reset)
Bug fixes and improvements in code base
Apply entry filter when iterating over reader, fix #251
Fixes and improvements in TextGlossaryReader class
Fix evaluating None value in read/write options
Support reading multi-file Tabfile or other text formats
file.txt, file.txt.1, file.txt.2
file_count info key, for example: ##file_count 3
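The multi-file naming scheme above (file.txt, file.txt.1, file.txt.2 with a file_count info key) can be sketched as two small helpers. These are illustrative only: part_paths and parse_file_count are hypothetical names, and the sketch assumes a positive file_count and whitespace between the key and its value.

```python
from typing import List, Optional


def parse_file_count(line: str) -> Optional[int]:
    """Return the count from an info line like "##file_count 3", else None."""
    parts = line.split(None, 1)  # split on any whitespace (space or tab)
    if len(parts) != 2 or parts[0] != "##file_count":
        return None
    return int(parts[1].strip())


def part_paths(base_path: str, file_count: int) -> List[str]:
    """List the part files: the base file, then base.1 .. base.(count-1)."""
    if file_count < 1:
        raise ValueError("this sketch only handles a positive file_count")
    return [base_path] + [f"{base_path}.{n}" for n in range(1, file_count)]


count = parse_file_count("##file_count 3")
print(part_paths("file.txt", count))
```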
Fixes in Tabfile Writer
Add/update documentation
doc/termux.md
doc/apple.md
doc/lzo.md
.svg files in doc/ folder
Switch to f-strings, pep8 fixes, add types, style changes and refactoring
New command line flags:
--log-time to show datetime in logs (override log_time in config.json)
--no-alts to disable alternates handling
--normalize-html to lowercase tags (for now)
--cleanup and --no-cleanup
--info to save .info file alongside output file
Published by ilius over 4 years ago
Require Python 3.6 or higher (mainly because of f-strings)
New format support
Glossary: detect and load Writer class from plugins
Glossary: call gc.collect() on indirect mode after reading/writing each 128 entries
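As a rough illustration of collecting garbage once every 128 entries, the sketch below uses a bit mask: 0x7f is 127, so index & 0x7f is zero exactly when index is a multiple of 128. The function name and the returned list are hypothetical and exist only to make the behaviour observable; this is not the code from glossary.py.

```python
import gc


def process_entries(entries, handle):
    """Handle each entry and trigger gc.collect() on every 128th index."""
    collected_at = []
    for index, entry in enumerate(entries):
        handle(entry)
        if index & 0x7F == 0:  # true for index 0, 128, 256, ...
            gc.collect()
            collected_at.append(index)
    return collected_at


print(process_entries(range(300), lambda entry: None))
```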
Glossary: remove empty and duplicate alternate words when converting, using Entry Filter, #188
Add command line options to remove html tags:
--remove-html=tag1,tag2,tag3
--remove-html-all
Re-design format-specific options
option.py
option.py in option_test.py
optionsProp to all plugins
readOptions and writeOptions from all plugins
optionsProp variables
**kwargs in plugin read, Reader.open or write functions
Add depends variable to plugins
dict, keys are module names, values are pip's package name
Glossary.formatsDepends
Minor fixes and improvements in Glossary class:
DIRECTORY.zip as output glossary
index % 100 -> index & 0x7f
appledict_bin.py
Glossary.writeTxt
StarDict plugin
.ifo file as UTF-8
Babylon BGL plugin
b'...' and some refactoring in readType3
bgl_
stripHtmlTags
Octopus MDict plugin
readmdict.py: https://bitbucket.org/xwang/mdict-analysis/commits/8f66c30
Change yes/no options in AppleDict and ABBYY Lingvo DSL plugins to boolean
AppleDict plugin:
echo problem in Makefile (#177)
optionsProp
features= and fix a warning about from_encoding=
Fix misspelled "extension" (as "extention") in plugins
Detect entries with span tag as html, #193
Refactoring in ui_gtk and ui_tk
Fix some deprecated API in ui_gtk
Fix minor bugs and improvements in ui_tk and ui_gtk
Update setup.py to adapt packaging with wheel, #189
Add type hints to codebase and plugins
Refactoring and style changes:
pyglossary.pyw to main.py, add a small pyglossary.pyw for compatibility
Published by ilius over 5 years ago
Published by ilius over 5 years ago
Add read support for CC-CEDICT plugin
Fixes in DSL (ABBYY Lingvo) plugin:
#CONTENTS_LANGUAGE:
Improvement in Gtk interface:
Fix encoding problem with non-UTF-8 system locales
Improvements in Glossary class
Published by ilius over 6 years ago
--ui=none flag
--skip-resources flag
encoding write option
encoding read option
write and convert methods return absolute path of output file, or None
gzip_no_crc.py for Python 3.6 (required for some non-standard BGL files)
encoding='utf8' while opening xml file, fix for #84
return with argument inside generator) in Glossary.reverse with Python 3.6
Published by ilius almost 8 years ago
Glossary code base
ZeroDivisionError if wordCount < 500, #61
Glossary.writeTxt
open converts (modifies) newlines automatically, #66
Glossary.writeTxt method
sep which was a tuple of length two, with two mandatory arguments: sep1 and sep2
install was not working
<br/>, #61
encoding='utf-8' while opening file for write, #67
encoding='utf-8' while opening file for read, #78
newline argument to Glossary.writeTxt), #66
newline write option
CRC check failed error for some (rare) glossaries with Python 3.4
--read-options and --write-options (happened in very rare cases)
pyglossary.spec
pyglossary.desktop
Published by ilius over 8 years ago
Changes since 3.0.2
Published by ilius over 8 years ago
Changes since 3.0.1
setup.py, making it not work
README.md
Published by ilius over 8 years ago
Changes since 3.0.0
pyglossary.pyw
Published by ilius over 8 years ago
Since I believe this is the first standard version, I'm not sure which code revision I should compare it with. So I will just list the most important recent changes, from both the application view and the library view.
exec) to JSON
sort boolean flag
--sort in command line to enable sorting for most of output formats
plugins inside config directory
copy, attach, merge, deepMerge, takeWords, getInputList, getOutputList
reverseDic -> reverse
data -> _data
info -> _info
filename -> _filename
write operation
convert:
convert method is added to be used instead of read and then write
sort boolean flag is now an argument to write method
--sort in command line
--no-sort in command line
write method itself decides what to do
sortOnWrite variable in plugin, with allowed values:
ALWAYS: force sorting even if sort=False (user gives --no-sort), used only for writing StarDict
DEFAULT_YES: enable sorting unless sort=False (user gives --no-sort)
DEFAULT_NO: disable sorting unless sort=True (user gives --sort)
NEVER: disable sorting even if sort=True (user gives --sort)
sortOnWrite = DEFAULT_NO
sortKey function to be used for sorting
key argument to list.sort method, See pydoc list.sort)
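The four sortOnWrite values described above combine with the user's sort flag in a way that is easy to express as a small decision function. The sketch below follows the wording of the list; the constant values and the function name should_sort are hypothetical, and sort=None is used here to mean that the user gave neither --sort nor --no-sort.

```python
from typing import Optional

# Hypothetical constant values; only the four names come from the notes.
ALWAYS = "always"
DEFAULT_YES = "default_yes"
DEFAULT_NO = "default_no"
NEVER = "never"


def should_sort(sort_on_write: str, sort: Optional[bool]) -> bool:
    """Decide whether to sort, given the plugin policy and the user flag.

    sort is True for --sort, False for --no-sort, None if neither was given.
    """
    if sort_on_write == ALWAYS:
        return True  # the plugin forces sorting, even with --no-sort
    if sort_on_write == NEVER:
        return False  # the plugin forbids sorting, even with --sort
    if sort is None:
        # No user preference: fall back to the plugin's default.
        return sort_on_write == DEFAULT_YES
    return sort


for policy in (ALWAYS, DEFAULT_YES, DEFAULT_NO, NEVER):
    print(policy, [should_sort(policy, flag) for flag in (True, False, None)])
```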
glos.data.append((word, defi)) -> glos.addEntry(word, defi)
for item in glos.data: -> for entry in glos:
for key, value in glos.info.items(): -> for key, value in glos.iterInfo():
master branch was based on Python 3 since 2016 Apr 29, there were some problems that are fixed in this release
python2.7
--direct command line option
--sort in command line
--sort-cache-size=1000 is optional
Glossary.convert method), direct mode is enabled by default if sorting is not enabled (by user or plugin)
--indirect option in command line
Automatic command line Progress Bar for all input / output formats is now supported
--direct and --sort flags are given, will be fixed later
--no-progress-bar option (recommended for Windows users)
Feature: Add encoding option to read and write drivers of some plain-text formats
Feature: SQL and SQLite: read/write extra information from/to a new table dbinfo_extra, backward compatible
New format invented and implemented for later implementation of a Glossary Editor
edlin.py (Editable Linked List of Entries) is optimized for adding/modifying/removing one entry at a time
Rewrite non-working Reverse functionality
Improve and complete command line help (-h or --help)