Parse a zsh history file in Ruby

As I was trying to build a customized history browser for zsh, I quickly stumbled on a weird issue when trying to parse the history file in Ruby.

3.3.1 :001 > history_lines = File.read(File.expand_path('~/.zsh_history')).split("\n")
(irb):1:in `split': invalid byte sequence in UTF-8 (ArgumentError)

I was a bit surprised at first, but a quick search online explained that zsh uses a weird trick to encode the higher byte values of some characters[1]. It presumably made sense when the target was American-only users, but nowadays, with emojis everywhere, it feels a bit odd.

Anyway, the solution was provided in that same email[1] as a little C code excerpt.

/* from zsh utils.c */
char *unmetafy(char *s, int *len)
{
  char *p, *t;

  for (p = s; *p && *p != Meta; p++);
  for (t = p; (*t = *p++);)
    if (*t++ == Meta)
      t[-1] = *p++ ^ 32;
  if (len)
    *len = t - s;
  return s;
}
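
To make clearer what this C function undoes, here is a minimal sketch of the opposite direction in Ruby. It is not taken from zsh: `metafy_sketch' is a hypothetical helper, and the escaped range (NUL, plus bytes 0x83 to 0xA2) is my assumption from reading the zsh sources, so it may differ between zsh versions.

```ruby
# Hypothetical helper, for illustration only: mimics how zsh appears to
# escape some bytes before writing them to the history file.
# Assumption: the escaped bytes are NUL and 0x83..0xA2 (may vary by version).
def metafy_sketch(text)
  text.bytes.flat_map do |byte|
    if byte.zero? || (0x83..0xA2).cover?(byte)
      # Emit the Meta marker, then the original byte with bit 5 flipped
      [0x83, byte ^ 32]
    else
      [byte]
    end
  end.pack('C*')
end

# U+1F600 (grinning face) is F0 9F 98 80 in UTF-8
puts metafy_sketch("\u{1F600}").bytes.map { |b| format('%02X', b) }.join(' ')
# prints "F0 83 BF 83 B8 80"
```

Under that assumption, two of the four bytes of the emoji get escaped, and the result is no longer valid UTF-8, which is exactly what `split' complains about.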

I propose the following port to Ruby.

def unmetafy(text)
  xor_next = false
  text.bytes.filter_map do |byte|
    if byte == 0x83
      # Meta char, next one must be changed
      xor_next = true
      next
    elsif xor_next
      # unmetafy byte
      xor_next = false
      byte ^ 32
    else
      byte
    end
  end.pack('C*').force_encoding('UTF-8')
end
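
As a cross-check, the same decoding can be sketched with a regular expression instead of a byte loop. This is only an alternative illustration, not what zsh does internally: `unmetafy_gsub' is a hypothetical name, and the sample bytes are hand-written, assuming zsh escaped the 0x9F and 0x98 bytes of an emoji.

```ruby
# Hypothetical regex-based variant: replace each "Meta byte + next byte"
# pair with the next byte XOR 32, working on a binary copy of the string.
def unmetafy_gsub(text)
  text.b
      .gsub(/\x83(.)/mn) { [$1.ord ^ 32].pack('C') }
      .force_encoding('UTF-8')
end

# Hand-written sample: "echo " followed by a metafied U+1F600
stored = "echo \xF0\x83\xBF\x83\xB8\x80"
decoded = unmetafy_gsub(stored)
puts decoded.valid_encoding?  # prints "true"
puts decoded == "echo \u{1F600}"  # prints "true"
```

One difference with the loop version: a stray Meta byte at the very end of the input is left in place here, whereas the loop drops it.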

Then, parsing the zsh history file is as easy as the following code example. In it, using `each_line' instead of `split("\n")' avoids a crash before we get a chance to do anything. `each_line' is more relaxed about invalid byte sequences, and since the Meta byte never replaces the end-of-line character, lines should never be cut in the wrong place.

line_data = []
File.read(File.expand_path('~/.zsh_history')).each_line do |line|
  line_data << unmetafy(line)
end
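
The difference between the two methods can be reproduced without a real history file. The following is a small sketch on made-up content (the sample string and the helper names `split_error' and `raw_lines' are mine), matching what I observed with Ruby 3.3.1; other versions may behave differently.

```ruby
# Returns the error message if split("\n") raises, nil otherwise.
def split_error(text)
  text.split("\n")
  nil
rescue ArgumentError => e
  e.message
end

# each_line does not validate the encoding, it only looks for "\n".
def raw_lines(text)
  text.each_line.to_a
end

# Made-up history content: "ls", then "echo" followed by a metafied emoji
raw = "ls\necho \xF0\x83\xBF\x83\xB8\x80\n"

puts raw.valid_encoding?       # false: the Meta bytes break UTF-8
puts split_error(raw).inspect  # the ArgumentError message seen in irb above
puts raw_lines(raw).length     # 2: each line is still found
```

Each returned line still ends with its "\n", so a `chomp' may be wanted before displaying it.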

[1] weird trick to encode higher byte values of some characters (HTTPS)

📅 Tuesday 30 April 2024 at 22:29

📝 Étienne Pflieger with GNU/Emacs 29.4 (Org mode 9.7.11)
