💾 Archived View for gemini.circumlunar.space › users › shufei › phlog › 20210107-Tech-IMEBlues.gmi captured on 2022-07-16 at 13:59:07. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

2021/01/07 - Tech - CJK - IME Blues and LinTabs

More and more I am frustrated by the conventions of 中日韓越壯 (Chinese, Japanese, Korean, Vietnamese, and Zhuang) text input and encoding. Why are we still trying to graft characters onto the primitive and restricted ASCII alphabet? We have accepted a system “just because”, to the point that Unicode is effectively a dictionary of CJK. This is inefficient, and it ossifies Characters, which should be an organic and evolving system.

Cangjie and Wubi are on the right track. Lin Yutang, with his MingKwai typewriter, should light the way, not Silicon Valley &c. If he could do it with metal pulleys, gears, and keys, why can’t we?

There is no technical reason why characters can’t be encoded using a few basic frameworks like stroke, corner coordinates, and inherent, organic lineaments such as radicals. Doing so would help writers evade prescriptive regimes, keep characters close to the “bare metal”, and be more humane and ergonomic.
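As a rough illustration of what “encoding by lineaments” can look like with tools that already exist, here is a minimal sketch built on Unicode’s Ideographic Description Characters (the twelve original ones, U+2FF0 to U+2FFB). An Ideographic Description Sequence (IDS) such as ⿰女子 says “女 on the left, 子 on the right”, and sequences nest. The function names `parse_ids` and `leaves` are hypothetical, and this is only a parser for the description format, not a full encoding.

```python
# The twelve original Ideographic Description Characters, U+2FF0..U+2FFB.
# All take two parts except ⿲ (U+2FF2) and ⿳ (U+2FF3), which take three.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3


def parse_ids(ids: str):
    """Parse a prefix-notation IDS such as "⿰女子" into (operator, [children])."""
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            # An operator: parse as many sub-descriptions as its arity demands.
            return (ch, [parse() for _ in range(IDC_ARITY[ch])])
        return ch  # a leaf component (a radical or any other character)

    tree = parse()
    if pos != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def leaves(tree):
    """Flatten a parsed IDS tree into its leaf components, in reading order."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]
```

For example, 想 is 相 over 心, and 相 is 木 beside 目, so `leaves(parse_ids("⿱⿰木目心"))` gives `['木', '目', '心']`.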

I confess I mostly input using pinyin, with fuzzy matching for southern dialect. Shame on me. On tablet I have the option of handwriting, and I adore the iPadOS handwriting IME. However, its library is rather restricted, apparently because many of the libraries are still based on the old Big5 or GB codes or on older Unicode blocks. So those of us who work with vocabulary beyond dinner-table chit-chat must constantly beg cabals of devs to expand their libraries whenever the Unicode Consortium gets around to adding characters.
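For anyone who hasn’t met “fuzzy” pinyin: the IME treats certain sounds that southern speakers often merge as interchangeable, so typing zang can still find zhang. Below is a minimal sketch assuming a handful of typical merger pairs (zh/z, ch/c, sh/s, l/n, and -ang/-an, -eng/-en, -ing/-in). The rule list and the names `fuzzy_key` and `fuzzy_match` are illustrative assumptions, not any particular IME’s configuration, and tones are ignored here.

```python
# Hypothetical fuzzy rules of the kind pinyin IMEs offer for southern accents.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("l", "n")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]


def fuzzy_key(syllable: str) -> str:
    """Collapse a toneless pinyin syllable onto its fuzzy equivalence class."""
    s = syllable.lower()
    for src, dst in FUZZY_INITIALS:
        if s.startswith(src):
            s = dst + s[len(src):]
            break
    for src, dst in FUZZY_FINALS:
        if s.endswith(src):
            s = s[: -len(src)] + dst
            break
    return s


def fuzzy_match(typed: str, reading: str) -> bool:
    """True if the typed syllable should offer characters read as `reading`."""
    return fuzzy_key(typed) == fuzzy_key(reading)
```

With these rules `fuzzy_key("zhang")` is `"zan"`, so typing zan, zang, or zhan all land in the same candidate bucket, while ma and na stay distinct.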

It’s pure madness that in the 庚子 (gengzi) year of our Lord 4717 we must still do this.

Of course “certain parties” enjoy this state of affairs. When characters are hard-coded in Unicode, they are easier to censor and surveil. I don’t want to darkly insinuate a Beijing-Redmond-Palo Alto axis, but it may as well exist de facto.

PineTab and RasPad

Currently no device fits these criteria: an open, moddable tablet with a capable handwriting IME. But the tech is getting close, to the point where a Jane of all trades could easily build one herself.

Previously I drooled over the PineTab.

PineTab (web)

It’s an elegant beastie. But PineTabs aren’t often available, and they aren’t easy to mod.

There is now the RasPad by SunFounder.

RasPad (web)

I like this much better for numerous reasons. The Raspberry Pi can be upgraded or removed for various projects: the tablet is basically an I/O framework and portable power supply for any Pi you see fit to install. Furthermore, the ports are compleat, and even the GPIO is easily accessible, making this device a peach of a personal cybercortex.

The triangular form factor has pros and cons. The RasPad may be a bit harder to carry in a purse? But I’d enjoy those ports, and I’d be happy to use the thing as a total radio solution with a HackRF plastered on the back. (I’m thinking of devving a multilingual radio mode.)

I would love to use a RasPad with Zinnia/Tegaki or some other handwriting IME. This would be optimal, whether I’m in a Latin or a 中日韓越壯 headspace.

Zinnia (web)

Sadly, I’ve yet to get Zinnia/Tegaki to play nicely on any Linux install! On the laptop, the path of laziness has been to use the keyboard, and the ability to handwrite CJK on my iPad has yoked me to the iPad. I hope Ubuntu MATE for the RasPad makes it easy to use IBus with Zinnia.

Epaper Blues

There are now epaper tablets. To wit:

reMarkable 2 (web)

No tablet I’ve discovered has multicolour epaper. Furthermore, those that use epaper tend to be made for professional-class people whose purses don’t have steel tools banging about in them, haha. None offers serious port access.

But the RasPad looks eminently moddable. I’m pondering a nice secondary epaper screen. With a little effort, it could be mounted to the back on hinges or caster slides. There is now a luscious 7-colour epaper screen.

7 Colour Epaper Display (web)

However, this beastie doesn’t yet have a capacitive touch overlay, as far as I’ve seen. It is also slow to refresh, at 15 seconds, making it impractical for anything but reading and perhaps terminal work. For a tablet, a greyscale or 3-colour display is still the best option for interactive use.

Greyscale epaper with Pi hat (web)

Nevertheless, the time when a colour epaper Linux tablet may be had is drawing nigh.

Dream Tab

At current tech levels, my dream tablet would have a 7-colour epaper capacitive touch display. It would pair with a Pencil-type stylus and one with a brush tip. It would have a built-in SDR transceiver with SMA ports. The audio would be isolated and shielded by copper for optimal AFSK operation. The tablet would run x86 code. All ports would be accessible from the box edges, including GPIO. It would absolutely default to TB-RL (top-to-bottom, right-to-left) vertical text and dark mode. The tablet would have a smolnet suite out of the box (Gemini, gopher, and finger clients with integrated SFTP).

Writing by brush on such a tablet, orange text on black epaper, instantly encoded as UTF-8 and set vertically, would be glorious, seemly, and joyful.
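Vertical text is a matter of layout rather than encoding: the characters stay ordinary UTF-8, and only their placement changes. Here is a minimal sketch of TB-RL layout for a monospace terminal, assuming every character is full-width and padding short columns with the ideographic space (U+3000). The name `to_vertical` is hypothetical, and real vertical typesetting also swaps punctuation for vertical forms, which this ignores.

```python
IDEOGRAPHIC_SPACE = "\u3000"  # full-width blank, so columns stay aligned


def to_vertical(text: str, height: int) -> str:
    """Lay text out top-to-bottom, right-to-left (TB-RL), `height` characters per column."""
    if height < 1:
        raise ValueError("height must be positive")
    columns = []
    for line in text.splitlines():
        # Each source line fills one or more columns; an empty line still takes a column.
        for start in range(0, max(len(line), 1), height):
            columns.append(line[start:start + height])
    columns.reverse()  # the first column is read first, so it sits rightmost
    rows = [
        "".join(col[i] if i < len(col) else IDEOGRAPHIC_SPACE for col in columns)
        for i in range(height)
    ]
    return "\n".join(rows)
```

For instance, `print(to_vertical("天地玄黃\n宇宙洪荒", 4))` puts 天地玄黃 in the right-hand column and 宇宙洪荒 to its left, each read top to bottom.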

Dream 中日韓越壯 Charset

A new CJK character encoding, complementary to UTF-8. It would encode stroke, four corners, radical/sign, and/or vector data to construct characters organically. It would build as much as possible on traditional frameworks like 反切 (fanqie), 倉頡輸入法 (Cangjie input), 五筆 (Wubi), and so on, for a plethora of keyboard and HCR (handwritten character recognition) interface options...
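One way to picture “complementary to UTF-8” is a per-character record that carries the structural data, and falls back to a plain Unicode character when one exists or to its description sequence when it doesn’t. The sketch below is purely hypothetical: the `GlyphRecord` fields, the key format, and the choice of Kangxi radical numbers are assumptions made for illustration. The four-corner code is left optional rather than filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlyphRecord:
    """Hypothetical structural record for one character."""
    ids: str                            # layout of components, e.g. "⿰女子"
    radical: int                        # Kangxi radical number (女 is 38)
    strokes: int                        # total stroke count
    four_corner: Optional[str] = None   # four-corner code, if known
    vector: tuple = ()                  # optional stroke paths for novel characters
    codepoint: Optional[int] = None     # Unicode scalar value, if already encoded

    def to_text(self) -> str:
        """Plain UTF-8 fallback: the Unicode character if encoded, else its IDS."""
        return chr(self.codepoint) if self.codepoint is not None else self.ids

    def key(self) -> str:
        """Hypothetical sort key: radical, then stroke count, then structure."""
        return f"R{self.radical:03d}-S{self.strokes:02d}-{self.ids}"
```

So `GlyphRecord(ids="⿰女子", radical=38, strokes=6, codepoint=ord("好"))` renders as 好, while a composition treated as unencoded, such as `GlyphRecord(ids="⿰女字", radical=38, strokes=9)`, still travels as readable UTF-8 in the form ⿰女字.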

The key difference is that it would be designed to draw the user closer to the code and to the organic writing of characters. Phonetic inputs would be deprecated except as ancillary aids (mainly to support various 方言 (dialect) glossaries and vulgar or literary idiom libraries).

How big such an encoding would get depends on its character resolution. I think stroke, radical, four corners, and Unicode’s Ideographic Description Characters are on the right track for efficiency... But a vector description as an ancillary encoding would truly allow an almost infinite open set. The idea is that, beyond a standard set of words, the visual elements would be the encoding as much as possible. The IME would assist with predictive text, or with alternative methods such as phonetic input, but the logic of the encoding would encourage “drawing” the characters, whether by handwriting or by typing.
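To make “drawing” the characters concrete: if characters are indexed by their components, prediction can work from whatever parts have been drawn so far. This is a minimal sketch assuming a tiny hand-made decomposition table (a real one would be far larger); the table and the names `components` and `candidates` are hypothetical. Candidates are ranked by how few undrawn parts remain.

```python
from collections import Counter

# Hypothetical miniature decomposition table: character -> IDS.
IDS_TABLE = {
    "好": "⿰女子",
    "字": "⿱宀子",
    "安": "⿱宀女",
    "明": "⿰日月",
    "相": "⿰木目",
    "想": "⿱⿰木目心",
    "謝": "⿲言身寸",
}


def components(ids: str) -> Counter:
    """Multiset of leaf components: every character that is not an IDC (U+2FF0..U+2FFB)."""
    return Counter(ch for ch in ids if not "\u2ff0" <= ch <= "\u2ffb")


def candidates(drawn: list[str]) -> list[str]:
    """Characters containing every drawn component, fewest remaining parts first."""
    want = Counter(drawn)
    hits = []
    for char, ids in IDS_TABLE.items():
        have = components(ids)
        if all(have[c] >= n for c, n in want.items()):
            remaining = sum(have.values()) - sum(want.values())
            hits.append((remaining, char))
    return [char for _, char in sorted(hits)]
```

Drawing 木 and 目 offers 相 first and then 想, since 想 still needs its 心. Ties are broken by code point here, which a real IME would replace with frequency or context.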

The phonetic inputs would be called up by a hot key, be tone-sensitive, and be fuzzy *within* characters by rime. Latin or 注音 (Zhuyin) input somewhat obscures the etymological relationships between roots, and there’s no reason why IME predictive text couldn’t mediate intelligently between the pronunciations of different dialects. Not to mention that a finer-grained character encoding would make the genealogies of characters far more obvious and discoverable. A 字譜 IME predictive-text layer, with various dialect layers, would also enable easy literacy in the literary language.
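Here is a minimal sketch of dialect layers with tone-sensitive, rime-fuzzy lookup, assuming each character carries one reading per layer: Hanyu Pinyin for Mandarin (`cmn`) and Jyutping for Cantonese (`yue`), both with tone numbers. The seven-character table, the crude consonant-prefix syllable splitter, and all function names are illustrative assumptions. Syllabic nasals like Cantonese ng4 would need real per-scheme parsing.

```python
import re

# Hypothetical dialect layers: character -> romanised reading with tone number.
READINGS = {
    "好": {"cmn": "hao3", "yue": "hou2"},
    "人": {"cmn": "ren2", "yue": "jan4"},
    "字": {"cmn": "zi4", "yue": "zi6"},
    "安": {"cmn": "an1", "yue": "on1"},
    "明": {"cmn": "ming2", "yue": "ming4"},
    "零": {"cmn": "ling2", "yue": "ling4"},
    "情": {"cmn": "qing2", "yue": "cing4"},
}

# Crude split: leading consonants = initial, remaining letters = rime, digit = tone.
SYLLABLE = re.compile(r"^([^aeiouv\d]*)([a-z]+)([1-6])$")


def split_syllable(reading: str) -> tuple[str, str, int]:
    m = SYLLABLE.match(reading)
    if m is None:
        raise ValueError(f"unparseable reading: {reading!r}")
    initial, rime, tone = m.groups()
    return initial, rime, int(tone)


def lookup(dialect: str, reading: str) -> list[str]:
    """Exact, tone-sensitive lookup in one dialect layer."""
    return sorted(ch for ch, r in READINGS.items() if r.get(dialect) == reading)


def by_rime(dialect: str, reading: str) -> list[str]:
    """Fuzzy lookup: same rime and same tone, any initial."""
    _, rime, tone = split_syllable(reading)
    return sorted(
        ch for ch, r in READINGS.items()
        if dialect in r and split_syllable(r[dialect])[1:] == (rime, tone)
    )


def translate(char: str, dialect: str) -> str:
    """Show a character's reading in another dialect layer."""
    return READINGS[char][dialect]
```

Typing jan4 in the Cantonese layer finds 人, whose Mandarin reading `translate("人", "cmn")` gives as ren2, and a rime-fuzzy ding2 in the Mandarin layer offers 情, 明, and 零 because they share the rime ing in tone 2.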

In other words, I strongly suspect a far more organic integration of human, character, spoken dialect, digital encoding, and IME is possible. A humane IME would cultivate the self of the writer rather than encourage slovenly, totalitarian, hidebound language habits. A humane encoding would make the organic system of characters visible and accessible. A humane, multi-dialect, etymological approach to phonetic libraries would make intersectional (multi-dialect) communication more harmonious and visible.

A Zhuang person could write to a Cantonese speaker could write to a Shanghainese speaker could write to someone in Tokyo, all mediated by a far more integral and organic encoding and input.

Only modern ideology prevents all this. (Cough, cough)

-EOF-
