As I was trying to build a customized history browser for zsh, I quickly ran into a weird issue when trying to parse the history file in Ruby.
3.3.1 :001 > history_lines = File.read(File.expand_path('~/.zsh_history')).split("\n")
(irb):1:in `split': invalid byte sequence in UTF-8 (ArgumentError)
I was a bit surprised at first, but a quick search online taught me that zsh uses a weird trick to encode the higher byte values of some characters[1]. It was presumably designed with American-only (plain ASCII) users in mind, but nowadays, with emojis everywhere, it feels a bit odd.
Anyway, the solution was provided in that same email as a little C code excerpt.
/* from zsh utils.c */
char *unmetafy(char *s, int *len)
{
    char *p, *t;

    for (p = s; *p && *p != Meta; p++);
    for (t = p; (*t = *p++);)
        if (*t++ == Meta)
            t[-1] = *p++ ^ 32;
    if (len)
        *len = t - s;
    return s;
}
I propose the following port to Ruby.
def unmetafy(text)
  xor_next = false
  text.bytes.filter_map do |byte|
    if byte == 0x83
      # Meta char, next one must be changed
      xor_next = true
      next
    elsif xor_next
      # unmetafy byte
      xor_next = false
      byte ^ 32
    else
      byte
    end
  end.pack('C*').force_encoding('UTF-8')
end
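As a quick sanity check, here is a minimal sketch that runs the method on a hand-built byte string. The sample bytes are an assumption on my side: they are what the C excerpt implies zsh would write for the emoji U+1F600 (UTF-8 bytes F0 9F 98 80), supposing 9F and 98 are among the bytes it escapes, each one stored as Meta (0x83) followed by the byte XOR 32. The method body is repeated only so that the snippet runs on its own.

```ruby
# Same logic as the unmetafy method above, repeated to keep this self-contained.
def unmetafy(text)
  xor_next = false
  text.bytes.filter_map do |byte|
    if byte == 0x83
      # Meta byte: drop it and remember to fix the next byte
      xor_next = true
      next
    elsif xor_next
      xor_next = false
      byte ^ 32
    else
      byte
    end
  end.pack('C*').force_encoding('UTF-8')
end

# Assumed on-disk form of "echo " followed by U+1F600:
# F0, then Meta + (0x9F ^ 32 = 0xBF), then Meta + (0x98 ^ 32 = 0xB8), then 80.
metafied = ('echo '.bytes + [0xF0, 0x83, 0xBF, 0x83, 0xB8, 0x80]).pack('C*')

decoded = unmetafy(metafied)

puts decoded.valid_encoding?
puts decoded.bytes.map { |b| format('%02X', b) }.join(' ')
```

Since XOR with 32 is its own inverse, the same operation that zsh applies when writing the byte is the one that restores it when reading.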
Then, parsing the zsh history file is as easy as the following code example. In it, using `each_line' instead of `split("\n")' avoids the crash before we can do anything. `each_line' is more lenient when parsing, and since the Meta byte never replaces the end-of-line character, it should never break a line in the wrong place.
line_data = []
File.read(File.expand_path('~/.zsh_history')).each_line do |line|
  line_data << unmetafy(line)
end
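To illustrate the difference between `split("\n")' and `each_line' described above, here is a small sketch on an in-memory string rather than on a real history file. The sample content is made up: three lines, the middle one holding two bytes (0x83 0xBF) that are not valid UTF-8 on their own.

```ruby
# Made-up sample: the middle line contains bytes that are invalid as UTF-8.
raw = "ls\n\x83\xBF\npwd\n"

puts raw.valid_encoding?

# split("\n") refuses to work on a string with a broken encoding.
split_error = nil
begin
  raw.split("\n")
rescue ArgumentError => e
  split_error = e.message
end
puts split_error

# each_line only looks for the line separator, so it still yields every line.
lines = raw.each_line.to_a
puts lines.length
```

Each element of `lines' can then be passed to `unmetafy' as in the loop above.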
[1] weird trick to encode higher byte values of some characters (HTTPS)
--
Tuesday 30 April 2024 at 22:29
Étienne Pflieger with GNU/Emacs 29.4 (Org mode 9.7.11)