Resources for 常用漢字表. Thursday 28 July 2016

Leonardo F.S. Boiko’s post ‘Jōyō kanji variants: The curious case of 叱 and 𠮟’ is, among other things, an interesting discussion of how variants of characters are acknowledged in the 常用漢字表. It also pointed me to a project he is working on:


It might be useful for dealing with both the latest 常用漢字 and their variants. The output folder contains the lists that his script generates from the official 常用漢字表, which is published as a PDF.

However, the post also explained that Unicode has adopted a method to encode local variants of characters that were unified earlier (though no doubt it will be a long time before any software supports it). It’s amazing that after so many years Unicode gets a kind of after-the-fact fix for the controversial choices made in the unification. I’m not sure what to think about it.
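To make this a bit more concrete: I assume the method in question is what Unicode calls ideographic variation sequences, where a base character is followed by an invisible selector from the range U+E0100 to U+E01EF (VS17 to VS256). The sketch below is my own illustration, not code from Boiko’s project. The function names are made up, and the sample string is only an example of the encoding form; I make no claim about which glyph that particular sequence selects.

```python
# Variation Selectors Supplement: U+E0100 (VS17) .. U+E01EF (VS256).
# Assumption: these are the selectors used for ideographic variants.
VS_FIRST = 0xE0100
VS_LAST = 0xE01EF


def is_ideographic_vs(ch):
    """True if ch is a selector from the Variation Selectors Supplement."""
    return VS_FIRST <= ord(ch) <= VS_LAST


def strip_variation_selectors(text):
    """Drop the selectors, leaving only the unified base characters."""
    return "".join(ch for ch in text if not is_ideographic_vs(ch))


def find_variation_sequences(text):
    """List (base character, selector number) pairs found in text."""
    found = []
    prev = None
    for ch in text:
        if is_ideographic_vs(ch) and prev is not None:
            # U+E0100 is VS17, U+E0101 is VS18, and so on.
            found.append((prev, ord(ch) - VS_FIRST + 17))
        prev = ch
    return found


# Illustrative input: U+845B followed by VS17, then U+98FE.
sample = "\u845B\U000E0100\u98FE"
print(find_variation_sequences(sample))
print(len(sample), len(strip_variation_selectors(sample)))
```

Software that ignores the selector can fall back to the base character, which is why stripping the selectors gives back the plain unified text.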

Currently search engines seem to unify even characters that Unicode did not unify, like 圧/壓. An example from Google, using 外壓:

If I want to search for 外壓 only, excluding 外圧, I have to choose a language on the advanced search options page, which is a workable solution.

Currently I’ve ‘unified’ the characters with separate code points for old words in entries of 日蘭辭典, just like Google, but not for modern entries. (I did use a universal solution earlier, but it was a bit slow, so I removed the code. I lack real programming skills.)
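For what it is worth, this kind of ‘unification’ of characters with separate code points can be sketched as a simple folding step. This is not the code that runs on the site. The table holds only three pairs I am sure of, and the names OLD_TO_NEW, fold and matches are invented for the example.

```python
# Tiny old-form -> new-form table; a real one would need many more pairs.
OLD_TO_NEW = {
    "壓": "圧",
    "辭": "辞",
    "國": "国",
}

# Build the translation table once, so each lookup is a single pass.
_FOLD_TABLE = str.maketrans(OLD_TO_NEW)


def fold(text):
    """Replace every old form in the table with its modern counterpart."""
    return text.translate(_FOLD_TABLE)


def matches(query, headword, unify=True):
    """Compare a query with a headword, optionally ignoring old/new forms."""
    if unify:
        return fold(query) == fold(headword)
    return query == headword


print(fold("外壓"))
print(matches("外圧", "外壓"))
print(matches("外圧", "外壓", unify=False))
```

Storing the folded form of each headword next to the original when the entry is saved would mean only the query has to be folded at search time, which might help with the speed problem, though I have not measured that.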

Will I update www.jiten.nl when the new options become available? Maybe, if I’m still around then, and really bored.