💾 Archived View for freeside.wntrmute.net › log › 2020 › 20201222.gmi captured on 2021-12-05 at 23:47:19. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
now listening: I Couldn't Believe It Was True - Carla Bozulich - Red Headed Stranger
It's been a few days since the last post; I spent some time traipsing around in the mountains with snowshoes.
I've been working on the Lisp for the PCM4 computer. It's funny; writing a real lisp implementation has been something I've wanted to do for a while. I wrote a tiny scheme implementation based on Peter Norvig's lisp.py at one point, and did the "Write You a Scheme in 48 Hours" (naming my lisp Hemera). Hemera didn't even get to the point where it had lambdas, though:
$ ./hemera Hemera> (define (square x) (* x x)) The variable x is unbounded. Hemera> (define square (lambda (x) (* x x))) Illegal function call: "x"
Also, it's interesting to note that five years later, I get tons of deprecation warnings on the code base. I have a real love/hate relationship with Go, but at least my code from (longer than) five years ago still builds and runs. Even my C code will build and run. Anyways.
Peter Norvig's small Lisp interpreter in Python
An implementation of Scheme in Haskell
The book I was reading (writing an interpreter in Go) has been really helpful. I've built a tiny hash table implementation that uses an 8-bit hash value. I thought I would be able to put off hash table handling, but my first test case had a collision using the keys "one", ..., "five" - turns out "three" and "four" hash the same under my first incredibly sophisticated hashing algorithm.
#define HASH_TYPE uint8_t #define HASH_TABLE_SIZE 251 static HASH_TYPE hash_key(char *k) { HASH_TYPE h = 1; size_t i = 0; while (k[i] != 0) { h += k[i]; h = h << i; i++; } return h; }
I wrote a little program that hashed a lot of the symbol names (I won't say all) from ulisp, so I've been tweaking it. At first, I was trying this and came up with a list of keys that hash the same. I was also reading on a CS lecture notes page that prime hash table sizes are "better", so I wrote a quick program to figure out the highest 8-bit prime number (it's 251).
#define HASH_TYPE uint8_t #define HASH_TABLE_SIZE 251 static HASH_TYPE hash_key(char *k) { HASH_TYPE h = 1; size_t i = 0; while (k[i] != 0) { h *= (uint8_t)k[i]; h %= HASH_TABLE_SIZE; i++; } return h; }
This worked with the hash test program I wrote that just used keys like "one" and "two"; once I applied it to the list of Scheme symbols, I noticed some pretty significant clusters. I wondered if I could do better, and came up with this:
#define HASH_TYPE uint8_t #define HASH_TABLE_SIZE 251 static HASH_TYPE hash_key(char *k) { HASH_TYPE h = 1; size_t i = 0; while (k[i] != 0) { h >>= 1; h++; h *= (uint8_t)k[i]; h %= HASH_TABLE_SIZE; i++; } return h; }
The bit shift is sort of an arbitrary change that I decided to try out after reading about prime multipliers; multiplication is expensive but a bit shift down isn't. I had to add the increment, otherwise the hash value just ends up multiplying by zero right away and everything hashes to zero and that just won't do. At the end of this post, I've included the results of running the test program over the symbol list.
There's still a lot to figure out: garbage collection, evaluation, lambdas, etc. But it's engaging; I think I was up until 2 am working stuff out.
kyle@wntrmute:~/code/hashc$ xargs ./hashc < atoms.txt | awk '{print $1}' | \ sort | uniq -c | sort -n 1 000000 1 000006 1 000007 1 000011 1 000015 1 000016 1 000017 1 000021 1 000028 1 000030 1 000033 1 000037 1 000051 1 000056 1 000061 1 000062 1 000063 1 000064 1 000065 1 000066 1 000076 1 000086 1 000092 1 000100 1 000104 1 000105 1 000107 1 000110 1 000116 1 000120 1 000129 1 000130 1 000140 1 000141 1 000146 1 000149 1 000156 1 000164 1 000168 1 000169 1 000170 1 000173 1 000178 1 000182 1 000184 1 000186 1 000192 1 000196 1 000198 1 000200 1 000207 1 000217 1 000219 1 000223 1 000226 1 000228 1 000231 1 000232 1 000236 1 000238 1 000242 1 000243 2 000001 2 000024 2 000029 2 000031 2 000036 2 000038 2 000039 2 000040 2 000042 2 000045 2 000047 2 000060 2 000079 2 000084 2 000097 2 000098 2 000117 2 000125 2 000132 2 000136 2 000152 2 000160 2 000193 2 000204 2 000240 2 000247 3 000004 3 000012 3 000032 3 000048 3 000052 3 000099 3 000212 3 000248 4 000043 4 000148
kyle@wntrmute:~/code/hashc$ xargs ./hashc < atoms.txt 000042 * 000043 + 000045 - 000047 / 000060 < 000099 <= 000061 = 000062 > 000160 >= 000006 abs 000125 accumulate 000198 align 000012 analogread 000247 analogwrite 000032 and 000029 appearances 000104 append 000079 apply 000021 assoc 000125 before? 000164 begin 000236 bf 000024 bl 000051 boolean? 000039 break 000040 butfirst 000232 butlast 000084 car 000186 cdr 000148 ceiling 000038 children 000065 close-all-ports 000052 close-input-port 000052 close-output-port 000004 cond 000043 cons 000136 cos 000048 count 000066 c...r 000193 datum 000001 define 000217 delay 000212 digitalread 000097 digitalwrite 000156 display 000032 edit 000110 empty? 000129 eof-object? 000247 equal? 000148 error 000117 even? 000031 every 000084 expt 000204 filter 000116 first 000086 floor 000024 for-each 000004 format 000132 for-millis 000028 gc 000017 globals 000030 if 000212 integer? 000056 item 000048 keep 000037 lambda 000092 last 000248 length 000200 let 000100 list 000141 list? 000079 list-library 000182 list-ref 000032 list->vector 000060 load 000238 load-image 000219 log 000231 make-node 000098 make-vector 000248 makunbound 000064 map 000160 max 000152 member 000243 member? 000136 millis 000240 min 000170 newline 000196 not 000015 note 000117 null? 000007 number? 000173 odd? 000029 open-input-file 000193 open-output-file 000240 or 000031 pinmode 000204 pprint 000140 pprintall 000004 prin1 000045 princ 000076 print 000000 procedure? 000207 quote 000228 quotient 000105 random 000040 read 000097 read-byte 000149 read-line 000042 read-string 000184 reduce 000146 remainder 000132 repeated 000099 require 000148 restart-i2c 000011 room 000152 round 000043 save-image 000226 se 000001 sentence 000063 sentence? 000012 show 000130 show-line 000212 sin 000016 sleep 000012 sqrt 000036 square 000098 terpri 000033 trace 000099 untrace 000178 vector 000038 vector? 000248 vector-length 000192 vector->list 000120 vector-ref 000107 vector-set! 000223 vowel? 000168 with-i2c 000148 with-sd-card 000052 with-serial 000169 with-spi 000048 word 000039 word? 000043 write 000047 write-byte 000036 write-line 000242 write-string