Updated (11/18/2018) Yet Again: I planned to stop, but then I wanted something. So here we are again. I added -onyomi-to-hiragana and -kunyomi-to-katakana.
Updated (11/18/2018) Again: I added the -rename argument. I added a single-operation !tag match.
Updated (11/18/2018) Again: I added the -strip-rmgroup argument. I added the “!” not operator for attributes.
Updated (11/18/2018): I added the -strip-meaningless argument.
This is some pretty awfully specific, even more niche than usual, software. The program strips and cuts information out of the kanjidic2.xml file and performs various other operations on its data.
This is a command line tool. These are the arguments:
Input File (kanjidic2.xml)
-i
Output File (if you don’t specify a file, kanjidic2.modified.xml is created in the input folder)
Strip out kanji that don’t have a defined meaning (11/18/2018 update)
-strip-meaningless
Strip out non-ranked kanji (overrides -strip-meaningless)
-ranked-only
Order by ranking (none = unranked)
-rank-order <low-high-none, high-low-none, none-low-high, none-high-low>
Rank Limit (2500 being the maximum limit)
-rank-limit
English only: strips out Spanish, Portuguese, Hangul, Vietnamese, and anything not English/Japanese
-english-only
Strip out nanori
-strip-nanori
Strip out the dic_number group, or specific parts of it
-strip-dic-number
Strip out codepoint info
-strip-codepoint
Strip out radical info
-strip-radical
Strip out query code info
-strip-query-code
Strip out misc info
-strip-misc
Strip out reading meaning info (11/18/2018 Version B)
-strip-rmgroup
Rename tags (11/18/2018 Version C)
-rename
Convert OnYomi to Hiragana (11/18/2018 Version D)
-onyomi-to-hiragana
Convert KunYomi to Katakana (11/18/2018 Version D)
-kunyomi-to-katakana
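For anyone curious what the two kana conversions amount to, here is a minimal Python sketch. It is not the tool’s actual code; the function names are made up for illustration. It relies on the fact that kanjidic2.xml stores on readings in katakana and kun readings in hiragana, and that the two Unicode blocks sit exactly 0x60 code points apart.

```python
KANA_OFFSET = 0x60  # distance between the hiragana and katakana Unicode blocks


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana letters (U+30A1..U+30F6) down into the hiragana block."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana letters (U+3041..U+3096) up into the katakana block."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


print(katakana_to_hiragana("カン"))          # かん
print(hiragana_to_katakana("かんが.える"))    # カンガ.エル
```

Characters outside those ranges, such as the “.” that marks okurigana or the long-vowel mark “ー”, pass through unchanged.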
You can specify a tag to remove
e.g. -strip-misc “variant”
You can specify many tags to be removed by separating them with semi-colons
e.g. -strip-misc “variant;grade;freq”
In 11/18/2018 version C, you can put an exclamation mark “!” before the tag to specify that everything except that tag be removed. This is a powerful command, and only one of these can be specified. I’d advise against using or overusing this particular feature, as it may have unusual results.
e.g. -strip-misc “!freq”
You can specify attributes along with a tag, but not attributes on their own without a tag
e.g. -strip-dic-number “dic_ref[dr_type=nelson_c]”
In 11/18/2018 version B, you can put an exclamation mark “!” before the attribute name to ask for a “not match”. In the example, everything except a “dic_ref” tag with attribute “dr_type=heisig6” will be stripped.
e.g. -strip-dic-number “dic_ref[!dr_type=heisig6]”
You can even specify that it needs to match 2 or more attributes, despite the fact that kanjidic2.xml doesn’t seem to contain anything like that.
e.g. -strip-dic-number “dic_ref[dr_type=nelson_c][some_other_thing=1]”
As of 11/18/2018 version B, this feature is actually useful, thanks to the exclamation mark “!” “not matches”. In the example, any “dic_ref” carrying any one of the three attribute values is exempt from being stripped.
e.g. -strip-dic-number “dic_ref[!dr_type=heisig6][!dr_type=gakken][!dr_type=sh_kk]”
You can of course specify many by separating them with semi-colons as mentioned above.
e.g. -strip-dic-number “dic_ref[dr_type=nelson_c];dic_ref[dr_type=nelson_n]”
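The selector rules above can be sketched in Python roughly as follows. This is only my reading of the described behaviour, not the tool’s source: the names Selector, parse_spec, should_strip and strip_children are hypothetical, and I am assuming that an element is stripped when the tag test and every bracketed attribute test of some selector hold, with “!” simply inverting the individual test.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class Selector(NamedTuple):
    tag: str
    negate_tag: bool                      # True for "!tag"
    attrs: List[Tuple[str, str, bool]]    # (name, value, negated)


def parse_spec(spec: str) -> List[Selector]:
    """Parse e.g. 'dic_ref[dr_type=nelson_c];!freq' into selectors."""
    selectors = []
    for part in filter(None, (p.strip() for p in spec.split(";"))):
        m = re.fullmatch(r"(!?)([^\[\]!]+)((?:\[[^\[\]]*\])*)", part)
        if m is None:
            raise ValueError(f"bad selector: {part!r}")
        attrs = []
        for cond in re.findall(r"\[([^\[\]]*)\]", m.group(3)):
            negated = cond.startswith("!")
            name, _, value = cond.lstrip("!").partition("=")
            attrs.append((name, value, negated))
        selectors.append(Selector(m.group(2), m.group(1) == "!", attrs))
    return selectors


def should_strip(elem: ET.Element, selectors: List[Selector]) -> bool:
    """True if any selector matches: tag test AND all attribute tests."""
    for sel in selectors:
        tag_ok = (elem.tag == sel.tag) != sel.negate_tag
        attrs_ok = all((elem.get(name) == value) != negated
                       for name, value, negated in sel.attrs)
        if tag_ok and attrs_ok:
            return True
    return False


def strip_children(parent: ET.Element, spec: str) -> None:
    """Remove the direct children of parent that the spec selects."""
    selectors = parse_spec(spec)
    for child in list(parent):            # copy: we mutate while looping
        if should_strip(child, selectors):
            parent.remove(child)


# Illustrative fragment, not copied from the real file.
dic_number = ET.fromstring(
    '<dic_number>'
    '<dic_ref dr_type="nelson_c">1</dic_ref>'
    '<dic_ref dr_type="heisig6">2</dic_ref>'
    '<dic_ref dr_type="gakken">3</dic_ref>'
    '</dic_number>'
)
strip_children(dic_number, "dic_ref[!dr_type=heisig6][!dr_type=gakken]")
print([ref.get("dr_type") for ref in dic_number])  # ['heisig6', 'gakken']
```

Under this reading, stacking several “!” attribute tests works like a keep-list, which is why the multi-attribute form becomes useful once “not matches” exist.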
You can rename sections by separating the name you want to replace and the new name with a colon.
e.g. -rename “misc:stuff”
You can go into specific sections and rename inside them, using the forward slash “/” to denote a change in depth.
e.g. -rename “misc/freq:rank”
You can rename multiple items by separating them with semi-colons.
e.g. -rename “misc/freq:rank;misc/grade:school_grade”
Be careful: you cannot rename something inside something that has already been renamed, so do it backwards, with the innermost renames first.
Bad: -rename “misc:stuff;misc/freq:rank” will not rename “misc/freq” to “rank”
Good: -rename “misc/freq:rank;misc:stuff”
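Why the order matters is easier to see in code. The sketch below is an assumption about how such a rename pass could work, not the tool’s real implementation: apply_renames is a hypothetical helper that applies each rule left to right, with paths taken relative to one character element.

```python
import xml.etree.ElementTree as ET


def apply_renames(root: ET.Element, spec: str) -> None:
    """Apply 'path:new_name' rules left to right; paths are relative to root."""
    for rule in filter(None, spec.split(";")):
        path, _, new_name = rule.partition(":")
        # findall() sees the tree as already modified by earlier rules,
        # so a path that goes through a renamed parent finds nothing.
        for elem in root.findall(path):
            elem.tag = new_name


SAMPLE = "<character><misc><grade>1</grade><freq>2</freq></misc></character>"

bad = ET.fromstring(SAMPLE)
apply_renames(bad, "misc:stuff;misc/freq:rank")
print(ET.tostring(bad, encoding="unicode"))
# <character><stuff><grade>1</grade><freq>2</freq></stuff></character>

good = ET.fromstring(SAMPLE)
apply_renames(good, "misc/freq:rank;misc:stuff")
print(ET.tostring(good, encoding="unicode"))
# <character><stuff><grade>1</grade><rank>2</rank></stuff></character>
```

In the bad ordering, “misc” no longer exists by the time “misc/freq” is looked up, so “freq” keeps its old name.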
Lastly, here is a real example to get you started if all of this looks intimidating.
KanjiDictTool -i "kanjidic2.xml" -ranked-only -rank-order low-high-none -rank-limit 256 -english-only -strip-dic-number -strip-nanori -strip-codepoint -strip-radical -strip-query-code -strip-misc "variant"
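To make the ranking flags in that command a little more concrete, here is a rough Python sketch of what they could amount to. Everything in it is an assumption on my part: that the rank is the number in misc/freq, that “low-high” means numerically ascending, that the rank limit keeps ranks less than or equal to the given number, and that unranked kanji are unaffected by the limit. The helper names and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional


def rank_of(character: ET.Element) -> Optional[int]:
    """Return the misc/freq value as an int, or None for unranked kanji."""
    text = character.findtext("misc/freq")
    return int(text) if text else None


def select_by_rank(characters: Iterable[ET.Element],
                   ranked_only: bool = False,
                   limit: Optional[int] = None,
                   order: str = "low-high-none") -> List[ET.Element]:
    chars = list(characters)
    ranked = [c for c in chars if rank_of(c) is not None]
    # Assumed: ranked-only simply drops every kanji without a rank.
    unranked = [] if ranked_only else [c for c in chars if rank_of(c) is None]
    if limit is not None:
        ranked = [c for c in ranked if rank_of(c) <= limit]
    descending = order in ("high-low-none", "none-high-low")
    ranked.sort(key=rank_of, reverse=descending)
    # "none" at the front or the back decides where unranked kanji go.
    return unranked + ranked if order.startswith("none-") else ranked + unranked


# Illustrative sample only; the ranks are not quoted from the real file.
SAMPLE = """<kanjidic2>
<character><literal>亜</literal><misc><freq>1509</freq></misc></character>
<character><literal>唖</literal><misc></misc></character>
<character><literal>一</literal><misc><freq>2</freq></misc></character>
<character><literal>日</literal><misc><freq>1</freq></misc></character>
</kanjidic2>"""

root = ET.fromstring(SAMPLE)
picked = select_by_rank(root.findall("character"), ranked_only=True, limit=256)
print([c.findtext("literal") for c in picked])  # ['日', '一']
```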