Emacs: English Dictionary Completion With Dabbrev (publ. 2024-03-19)

[EDIT: A friend suggested I take a look at the "company" extension, which has an ispell backend, so I am looking into that. I see there is a helm-company package as well for interfacing that to Helm.]

I'm very fond of fuzzy completion and currently use the Helm system, which is layered on top of completing-read. Usually when I am writing a post or some documentation, at least once or twice I'll begin to type a word that is long or hard to spell, and I think that it would be helpful to be able to autocomplete it.

For a while I have been using dabbrev-expand, which completes the word at point using words found in the current buffer or in other open buffers. This is often helpful, but if it does not pick the correct word on the first attempt, you have to keep running dabbrev-expand until it does. I recently discovered that Emacs has a dabbrev-completion command as well, which instead offers all the matching expansions as completion candidates. Since completion on my system is handled by Helm fuzzy completion, it is generally fast to select the word I want.
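For reference, stock Emacs already binds both commands. A minimal sketch that simply restates those default bindings, assuming they have not been rebound elsewhere in your configuration:

```elisp
;; Default global bindings in stock Emacs, written out explicitly.
;; M-/ cycles through expansions one at a time.
(global-set-key (kbd "M-/") #'dabbrev-expand)
;; C-M-/ offers the matching expansions as completion candidates.
(global-set-key (kbd "C-M-/") #'dabbrev-completion)
```

Evaluating this changes nothing on a default setup; it only shows where the two commands live.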

However, this will only give us access to words in the open buffers, rather than the full English language. One easy trick to solve this problem is to keep open, in the background, a buffer that contains all the English words. I found one here:

List Of English Words

The file "words.txt" in this repository contains 466k English words. So I cloned the repository locally and then put some elisp in my init.el to open this file automatically in a background buffer. I also make the buffer read-only so that I do not accidentally corrupt the file.

(defvar dict-file "~/Repos/dwyl/english-words/words.txt"
  "Path to the English word list used for dabbrev completion.")

;; Open the dictionary file read-only in the background, so that
;; dabbrev-completion-all can pull candidates from it.
(with-current-buffer (find-file-noselect dict-file)
  (read-only-mode 1))

Now, the dabbrev-completion function, when called without a prefix argument, only pulls candidates from the current buffer. We need to pass the argument 16 (the value that C-u C-u produces) in order to have it pull from all buffers. I added a convenient wrapper function for that, as well as an ergonomic key binding:

(defun dabbrev-completion-all ()
  "Run `dabbrev-completion' with candidates drawn from all buffers."
  (interactive)
  ;; 16 is the numeric value of the C-u C-u prefix argument.
  (dabbrev-completion 16))

(global-set-key (kbd "C-:") #'dabbrev-completion-all)

Having the words.txt file searched has not introduced any noticeable latency or performance issues on my system. I notice a slight delay if I switch to the words.txt buffer, as Emacs figures out how to display the large file, but it does not take very long, and I don't need to do that anyway. I believe earlier versions of Emacs had difficulty displaying large files quickly, so keep this in mind if you are using an older version.

This is working great so far, without any breakage. I do often find, though:

- For more distinctive words, like "repository", it is still more convenient to use dabbrev-expand.

- There are so many words in the 466k dictionary file that completion will sometimes bring up 30+ candidates, so I still have to type most of the letters of the word to narrow it down.

Another option, rather than the 466k dictionary, would be to use this repository instead:

first20hours/google-10000-english

The file google-10000-english-usa.txt, along with a few other variants, contains the 10k most used words in the English language, according to Google.
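Switching to the smaller list should only require pointing dict-file at the other file. A minimal sketch, assuming the repository is cloned under ~/Repos in the same way as the first one (that path is my guess at a layout, not something fixed by the repository), with a guard added for the case where the file is missing:

```elisp
;; Hypothetical clone location, mirroring the layout used for words.txt;
;; adjust it to wherever the repository actually lives.
(defvar dict-file
  "~/Repos/first20hours/google-10000-english/google-10000-english-usa.txt"
  "Path to the word list kept open for dabbrev completion.")

;; Only open the list if it exists, so that a missing clone does not
;; leave an empty buffer behind.
(when (file-readable-p dict-file)
  (with-current-buffer (find-file-noselect dict-file)
    (read-only-mode 1)))
```

This is meant to replace the earlier dict-file definition rather than sit alongside it, since a second defvar does not override a value that is already set.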

I briefly looked into the idea of using the same dictionary that ispell uses. However, ispell uses a hashed dictionary database, so its dictionary file does not work for this simple approach.

If there is some extension out there which does full-dictionary completion more efficiently, I'd be interested in hearing about it. I wonder if perhaps flyspell has some similar completion functionality built-in, but I haven't had time to look into that yet.

Copyright

This work © 2024 by Christopher Howard is licensed under Attribution-ShareAlike 4.0 International.

CC BY-SA 4.0 Deed