Kanji Dictionary Tool – Release (11/17/2018)

Updated (11/18/2018) Yet Again: I planned to stop, but then I wanted something else, so here we are again. I added -onyomi-to-hiragana and -kunyomi-to-katakana.
Updated (11/18/2018) Again: I added the -rename argument. I added a single-operation “!tag” match.
Updated (11/18/2018) Again: I added the -strip-rmgroup argument. I added the “!” not operator for attributes.
Updated (11/18/2018): I added the -strip-meaningless argument.

This is some awfully specific, even more niche than usual software. The program strips and cuts information out of the kanjidic2.xml file, and performs various other operations on the data.

Download Version D (11/18/2018)

This is a command line tool. These are the arguments:
Input File (kanjidic2.xml)
-i inputFilePath
Output File (if you don’t specify a file, kanjidic2.modified.xml is created in input folder)
-o outputFilePath
Strip out kanji that don’t have a defined meaning (11/18/2018 update)
-strip-meaningless
Strip out non-ranked kanji (overrides -strip-meaningless)
-ranked-only
Order by ranking (none = unranked)
-rank-order <low-high-none, high-low-none, none-low-high, none-high-low>
Rank Limit (2500 being the maximum limit)
-rank-limit <1-2500>
English only, strips out Spanish, Portuguese, Hangul, Vietnamese, and anything not English/Japanese
-english-only
Strip out nanori
-strip-nanori
Strips out dic_number group, or specific parts of it
-strip-dic-number [tagAndAttrs*]
Strip out codepoint info
-strip-codepoint [tagAndAttrs*]
Strip out radical info
-strip-radical [tagAndAttrs*]
Strip out query code info
-strip-query-code [tagAndAttrs*]
Strip out misc info
-strip-misc [tagAndAttrs*]
Strip reading meaning info (11/18/2018 Version B)
-strip-rmgroup [tagAndAttrs*]
Rename tags (11/18/2018 Version C)
-rename [renameSpecifications*]
Convert OnYomi to Hiragana (11/18/2018 Version D)
-onyomi-to-hiragana
Convert KunYomi to Katakana (11/18/2018 Version D)
-kunyomi-to-katakana
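
For anyone curious what the two kana conversion arguments amount to, here is a minimal Python sketch. It is not the tool's own code. It relies on the fact that the katakana and hiragana blocks in Unicode sit 0x60 code points apart, and the function names are made up for illustration. Characters outside the kana ranges, such as the “.” and “-” markers in kunyomi readings, are left alone.

```python
# Distance between a katakana character and its hiragana counterpart in Unicode.
KANA_OFFSET = 0x60


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to hiragana; keep everything else."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana (U+3041..U+3096) up to katakana; keep everything else."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


if __name__ == "__main__":
    print(katakana_to_hiragana("ニチ"))    # an onyomi-style reading
    print(hiragana_to_katakana("ひ.く"))   # a kunyomi-style reading with its "." marker
```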

*tagAndAttrs Specifications
You can specify a tag to remove
e.g. -strip-misc "variant"

You can remove many tags at once by separating them with semicolons
e.g. -strip-misc "variant;grade;freq"

In 11/18/2018 version C, you can put an exclamation mark “!” before the tag to remove everything except that tag. This is a powerful command, and only one of these can be specified. I’d advise against overusing this feature, as it may have unusual results.
e.g. -strip-misc "!freq"

You can specify attributes along with a tag, but not attributes on their own without a tag
e.g. -strip-dic-number "dic_ref[dr_type=nelson_c]"

In 11/18/2018 version B, you can put an exclamation mark “!” before the attribute name to ask for a “not match”. In the example, everything except a “dic_ref” tag with attribute “dr_type=heisig6” will be stripped.
e.g. -strip-dic-number "dic_ref[!dr_type=heisig6]"

You can even require a match on 2 or more attributes, despite the fact that kanjidic2.xml doesn’t seem to contain anything like that.
e.g. -strip-dic-number "dic_ref[dr_type=nelson_c][some_other_thing=1]"

As of 11/18/2018 version B, this feature is actually useful when combined with “!” not matches. In the example, any “dic_ref” with any one of the three attributes is exempt from being stripped.
e.g. -strip-dic-number "dic_ref[!dr_type=heisig6][!dr_type=gakken][!dr_type=sh_kk]"

You can of course specify many of these by separating them with semicolons, as mentioned above.
e.g. -strip-dic-number "dic_ref[dr_type=nelson_c];dic_ref[dr_type=nelson_n]"
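
If the syntax above is easier to read as code, here is a rough Python sketch of how a tagAndAttrs string could be broken apart. This is only my reading of the rules described above, not the parser the tool actually uses. The names TagSpec, AttrMatch and parse_specs are hypothetical, and the sketch only parses the string; it does not decide what gets stripped.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AttrMatch:
    name: str
    value: str
    negated: bool  # True when written as [!name=value]


@dataclass
class TagSpec:
    tag: str
    keep_only: bool  # True when written as !tag
    attrs: list = field(default_factory=list)


# One bracketed attribute test, e.g. [dr_type=nelson_c] or [!dr_type=heisig6].
_ATTR = re.compile(r"\[(!?)([^=\]]+)=([^\]]*)\]")


def parse_specs(spec: str) -> list:
    """Split a tagAndAttrs string on ';' and parse each tag with its attribute tests."""
    result = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        keep_only = item.startswith("!")
        body = item[1:] if keep_only else item
        tag, bracket, rest = body.partition("[")
        if not tag:
            # Attributes on their own, without a tag, are not allowed.
            raise ValueError(f"attributes need a tag: {item!r}")
        attrs = [
            AttrMatch(name, value, bang == "!")
            for bang, name, value in _ATTR.findall(bracket + rest)
        ]
        result.append(TagSpec(tag, keep_only, attrs))
    return result


if __name__ == "__main__":
    for parsed in parse_specs("dic_ref[!dr_type=heisig6][!dr_type=gakken];variant"):
        print(parsed)
```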

*Rename Specifications
You can rename a section by writing its current name and the new name, separated by a colon.
e.g. -rename "misc:stuff"

You can go into specific sections and rename inside them, using the forward slash “/” to denote a change in depth.
e.g. -rename "misc/freq:rank"

You can rename multiple items by separating them with semicolons.
e.g. -rename "misc/freq:rank;misc/grade:school_grade"

Be careful: you cannot rename something inside a section that has already been renamed, so rename the innermost items first.
Bad: -rename "misc:stuff;misc/freq:rank" will not rename “misc/freq” to “rank”
Good: -rename "misc/freq:rank;misc:stuff"
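
To see why the order matters, here is a small Python sketch that applies rename rules one after another, each against the tree as it currently stands. I am assuming the tool behaves roughly like this; it is an illustration, not its actual code. The apply_renames and demo helpers and the tiny XML fragment are made up for the example.

```python
import xml.etree.ElementTree as ET


def apply_renames(character: ET.Element, spec: str) -> None:
    """Apply 'path:new_name' rules in order; each path is looked up in the current tree."""
    for rule in spec.split(";"):
        path, new_name = rule.split(":")
        for element in character.findall(path):
            element.tag = new_name


def demo(spec: str) -> str:
    # A cut-down character entry, just enough to show the effect.
    character = ET.fromstring(
        "<character><misc><freq>1</freq><grade>1</grade></misc></character>"
    )
    apply_renames(character, spec)
    return ET.tostring(character, encoding="unicode")


if __name__ == "__main__":
    print(demo("misc:stuff;misc/freq:rank"))  # "misc" is gone before the second rule runs
    print(demo("misc/freq:rank;misc:stuff"))  # inner rename first, then the outer one
```

In the first call the path “misc/freq” no longer exists by the time its rule is applied, so “freq” keeps its name. In the second call both renames take effect.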

Lastly, here is a real example to get you started if all of this looks intimidating.
KanjiDictTool -i "kanjidic2.xml" -ranked-only -rank-order low-high-none -rank-limit 256 -english-only -strip-dic-number -strip-nanori -strip-codepoint -strip-radical -strip-query-code -strip-misc "variant"
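
If you want a feel for what the ranking arguments in that example do, here is a rough Python sketch of the same idea. It is written under my own assumptions, not taken from the tool: that the rank is the “freq” tag inside “misc”, that “low-high” means rank 1 comes first, and that the limit keeps ranks up to and including the number given. The sample entries and their freq values are illustrative, not copied from the real file, and ranked_subset is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative entries only; the freq values are made up for this sketch.
SAMPLE = """<kanjidic2>
  <character><literal>日</literal><misc><freq>1</freq></misc></character>
  <character><literal>亜</literal><misc><freq>1509</freq></misc></character>
  <character><literal>一</literal><misc><freq>2</freq></misc></character>
  <character><literal>唖</literal><misc></misc></character>
</kanjidic2>"""


def rank_of(character: ET.Element):
    """Return the freq rank as an int, or None for an unranked kanji."""
    text = character.findtext("misc/freq")
    return int(text) if text else None


def ranked_subset(root: ET.Element, limit: int) -> list:
    """Keep ranked kanji with rank <= limit, ordered from rank 1 upward."""
    ranked = [(rank_of(c), c) for c in root.iter("character")]
    ranked = [(rank, c) for rank, c in ranked if rank is not None and rank <= limit]
    ranked.sort(key=lambda pair: pair[0])
    return [c.findtext("literal") for _, c in ranked]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(ranked_subset(root, 256))
```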

 
 
© 2018 NRGsoft