💾 Archived View for alltext.umaneti.net › gemlog › parse-zsh-history-file-in-ruby.gmi captured on 2024-07-08 at 23:22:19. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Parse zsh history file in ruby

While trying to build a customized history browser for zsh, I quickly ran into a weird issue when parsing the history file in Ruby.

3.3.1 :001 > history_lines = File.read(File.expand_path('~/.zsh_history')).split("\n")
(irb):1:in `split': invalid byte sequence in UTF-8 (ArgumentError)

I was a bit surprised at first, but a quick search online taught me that zsh uses a weird trick to encode the higher byte values of some characters[1]: some bytes are stored as a special "Meta" byte followed by a modified byte, so the file is not valid UTF-8. This was presumably designed with American-only (ASCII) users in mind, but nowadays, with emojis everywhere, it feels a bit odd.
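To reproduce the problem without touching a real history file, here is a minimal sketch. The sample line is hand-built: it assumes zsh stored the grinning face emoji (F0 9F 98 80 in UTF-8) with the bytes 0x9F and 0x98 escaped, which is an illustration and not a dump of an actual file.

```ruby
# Hand-built bytes, assumed to be how zsh would store the emoji F0 9F 98 80:
# each escaped byte is replaced by 0x83 followed by (byte ^ 32).
emoji_stored = [0xF0, 0x83, 0xBF, 0x83, 0xB8, 0x80].pack('C*')
line = ('echo '.b + emoji_stored + "\n".b).force_encoding('UTF-8')

# 0x83 cannot follow the lead byte 0xF0, so the string is broken UTF-8.
puts line.valid_encoding?

error = nil
begin
  line.split("\n")
rescue ArgumentError => e
  # Same kind of error as in the irb session
  error = e
end
puts error&.message
```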

Anyway, the solution was given in that same email, as a short C excerpt.

/* from zsh utils.c */
char *unmetafy(char *s, int *len)
{
  char *p, *t;

  for (p = s; *p && *p != Meta; p++);
  for (t = p; (*t = *p++);)
    if (*t++ == Meta)
      t[-1] = *p++ ^ 32;
  if (len)
    *len = t - s;
  return s;
}

I propose the following translation into Ruby.

def unmetafy(text)
  xor_next = false
  text.bytes.filter_map do |byte|
    if byte == 0x83
      # Meta char, next one must be changed
      xor_next = true
      next
    elsif xor_next
      # unmetafy byte
      xor_next = false
      byte ^ 32
    else
      byte
    end
  end.pack('C*').force_encoding('UTF-8')
end
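To check the decoding rule on a known input, here is a self-contained sketch. `unmetafy_gsub` is a hypothetical, more compact variant written for this example (one regexp pass over the raw bytes instead of a byte loop); it applies the same rule as the C excerpt. The sample bytes are hand-built and assume zsh escaped 0x9F and 0x98 of the emoji F0 9F 98 80.

```ruby
# Hypothetical one-pass variant: drop each Meta byte (0x83) and
# XOR the byte that follows it with 32.
def unmetafy_gsub(text)
  decoded = text.b.gsub(/\x83(.)/mn) { ($1.ord ^ 32).chr }
  decoded.force_encoding('UTF-8')
end

# Hand-built sample (assumption, not read from a real history file).
stored = [0xF0, 0x83, 0xBF, 0x83, 0xB8, 0x80].pack('C*')

decoded = unmetafy_gsub(stored)
puts decoded.valid_encoding?                                # true
puts decoded.bytes.map { |b| format('%02X', b) }.join(' ')  # F0 9F 98 80
```

The two XOR steps are 0xBF ^ 32 = 0x9F and 0xB8 ^ 32 = 0x98, which restores the original four bytes of the emoji.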

Then, parsing the zsh history file is as easy as the following code example. It uses `each_line' instead of `split("\n")', which avoids the crash shown earlier before anything can be done with the content. `each_line' is more relaxed about invalid byte sequences and, as the "meta char" never replaces the end-of-line character, it should never break.

line_data = []
File.read(File.expand_path('~/.zsh_history')).each_line do |line|
  line_data << unmetafy(line)
end
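As a quick check of that behaviour, the sketch below runs `each_line' on an in-memory string instead of a file. The content is hand-built (two fake history lines, the second holding bytes assumed to be a metafied emoji), so it only illustrates the line splitting, not a real history file.

```ruby
# Hand-built content: one plain line, one line with Meta-escaped bytes.
stored = [0xF0, 0x83, 0xBF, 0x83, 0xB8, 0x80].pack('C*')
data = ("ls -l\n".b + 'echo '.b + stored + "\n".b).force_encoding('UTF-8')

# each_line still cuts on "\n" even though the string is broken UTF-8.
lines = data.each_line.to_a
puts lines.length
puts lines.map(&:valid_encoding?).inspect
```

Each line keeps its raw bytes, so the decoding step still has to be applied afterwards.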

[1] weird trick to encode higher byte values of some characters (HTTPS)

--

📅 Tuesday 30 April 2024 at 22:29

📝 Étienne Deparis with GNU/Emacs 29.4 (Org mode 9.7.6)

propelled by fronde