💾 Archived View for freeside.wntrmute.net › log › 2020 › 20201222.gmi captured on 2021-12-05 at 23:47:19. Gemini links have been rewritten to link to archived content

View Raw

More Information

-=-=-=-=-=-=-

Back to gemlog

Scheming

2020-12-22 10:24 PST

now listening: I Couldn't Believe It Was True - Carla Bozulich - Red Headed Stranger

It's been a few days since the last post; I spent some time traipsing around in the mountains with snowshoes.

I've been working on the Lisp for the PCM4 computer. It's funny; writing a real lisp implementation has been something I've wanted to do for a while. I wrote a tiny scheme implementation based on Peter Norvig's lisp.py at one point, and did the "Write You a Scheme in 48 Hours" (naming my lisp Hemera). Hemera didn't even get to the point where it had lambdas, though:



$ ./hemera 
Hemera> (define (square x) (* x x))
The variable x is unbounded.
Hemera> (define square (lambda (x) (* x x)))
Illegal function call: "x"

Also, it's interesting to note that five years later, I get tons of deprecation warnings on the code base. I have a real love/hate relationship with Go, but at least my code from (longer than) five years ago still builds and runs. Even my C code will build and run. Anyways.

Peter Norvig's small Lisp interpreter in Python

An implementation of Scheme in Haskell

The book I was reading (writing an interpreter in Go) has been really helpful. I've built a tiny hash table implementation that uses an 8-bit hash value. I thought I would be able to put off hash table handling, but my first test case had a collision using the keys "one", ..., "five" - turns out "three" and "four" hash the same under my first incredibly sophisticated hashing algorithm.



#define HASH_TYPE	uint8_t
#define HASH_TABLE_SIZE	251

static HASH_TYPE
hash_key(char *k)
{
	HASH_TYPE	h = 1;
	size_t		i = 0;

	while (k[i] != 0) {
		h += k[i];
		h = h << i;
		i++;
	}

	return h;
}

I wrote a little program that hashed a lot of the symbol names (I won't say all) from ulisp, so I've been tweaking it. At first, I was trying this and came up with a list of keys that hash the same. I was also reading on a CS lecture notes page that prime hash table sizes are "better", so I wrote a quick program to figure out the highest 8-bit prime number (it's 251).



#define HASH_TYPE	uint8_t
#define HASH_TABLE_SIZE	251

static HASH_TYPE
hash_key(char *k)
{
	HASH_TYPE	h = 1;
	size_t		i = 0;

	while (k[i] != 0) {
		h *= (uint8_t)k[i];
		h %= HASH_TABLE_SIZE;
		i++;
	}

	return h;
}

This worked with the hash test program I wrote that just used keys like "one" and "two"; once I applied it to the list of Scheme symbols, I noticed some pretty significant clusters. I wondered if I could do better, and came up with this:



#define HASH_TYPE	uint8_t
#define HASH_TABLE_SIZE	251

static HASH_TYPE
hash_key(char *k)
{
	HASH_TYPE	h = 1;
	size_t		i = 0;

	while (k[i] != 0) {
		h >>= 1;
		h++;
		h *= (uint8_t)k[i];
		h %= HASH_TABLE_SIZE;
		i++;
	}

	return h;
}

The bit shift is sort of an arbitrary change that I decided to try out after reading about prime multipliers; multiplication is expensive but a bit shift down isn't. I had to add the increment, otherwise the hash value just ends up multiplying by zero right away and everything hashes to zero and that just won't do. At the end of this post, I've included the results of running the test program over the symbol list.

Hash table notes

There's still a lot to figure out: garbage collection, evaluation, lambdas, etc. But it's engaging; I think I was up until 2 am working stuff out.

Appendix: hash results

Collision results



kyle@wntrmute:~/code/hashc$ xargs ./hashc < atoms.txt | awk '{print $1}' | \
	sort | uniq -c | sort -n
      1 000000
      1 000006
      1 000007
      1 000011
      1 000015
      1 000016
      1 000017
      1 000021
      1 000028
      1 000030
      1 000033
      1 000037
      1 000051
      1 000056
      1 000061
      1 000062
      1 000063
      1 000064
      1 000065
      1 000066
      1 000076
      1 000086
      1 000092
      1 000100
      1 000104
      1 000105
      1 000107
      1 000110
      1 000116
      1 000120
      1 000129
      1 000130
      1 000140
      1 000141
      1 000146
      1 000149
      1 000156
      1 000164
      1 000168
      1 000169
      1 000170
      1 000173
      1 000178
      1 000182
      1 000184
      1 000186
      1 000192
      1 000196
      1 000198
      1 000200
      1 000207
      1 000217
      1 000219
      1 000223
      1 000226
      1 000228
      1 000231
      1 000232
      1 000236
      1 000238
      1 000242
      1 000243
      2 000001
      2 000024
      2 000029
      2 000031
      2 000036
      2 000038
      2 000039
      2 000040
      2 000042
      2 000045
      2 000047
      2 000060
      2 000079
      2 000084
      2 000097
      2 000098
      2 000117
      2 000125
      2 000132
      2 000136
      2 000152
      2 000160
      2 000193
      2 000204
      2 000240
      2 000247
      3 000004
      3 000012
      3 000032
      3 000048
      3 000052
      3 000099
      3 000212
      3 000248
      4 000043
      4 000148

Hash values for names



kyle@wntrmute:~/code/hashc$ xargs ./hashc < atoms.txt
000042	*
000043	+
000045	-
000047	/
000060	<
000099	<=
000061	=
000062	>
000160	>=
000006	abs
000125	accumulate
000198	align
000012	analogread
000247	analogwrite
000032	and
000029	appearances
000104	append
000079	apply
000021	assoc
000125	before?
000164	begin
000236	bf
000024	bl
000051	boolean?
000039	break
000040	butfirst
000232	butlast
000084	car
000186	cdr
000148	ceiling
000038	children
000065	close-all-ports
000052	close-input-port
000052	close-output-port
000004	cond
000043	cons
000136	cos
000048	count
000066	c...r
000193	datum
000001	define
000217	delay
000212	digitalread
000097	digitalwrite
000156	display
000032	edit
000110	empty?
000129	eof-object?
000247	equal?
000148	error
000117	even?
000031	every
000084	expt
000204	filter
000116	first
000086	floor
000024	for-each
000004	format
000132	for-millis
000028	gc
000017	globals
000030	if
000212	integer?
000056	item
000048	keep
000037	lambda
000092	last
000248	length
000200	let
000100	list
000141	list?
000079	list-library
000182	list-ref
000032	list->vector
000060	load
000238	load-image
000219	log
000231	make-node
000098	make-vector
000248	makunbound
000064	map
000160	max
000152	member
000243	member?
000136	millis
000240	min
000170	newline
000196	not
000015	note
000117	null?
000007	number?
000173	odd?
000029	open-input-file
000193	open-output-file
000240	or
000031	pinmode
000204	pprint
000140	pprintall
000004	prin1
000045	princ
000076	print
000000	procedure?
000207	quote
000228	quotient
000105	random
000040	read
000097	read-byte
000149	read-line
000042	read-string
000184	reduce
000146	remainder
000132	repeated
000099	require
000148	restart-i2c
000011	room
000152	round
000043	save-image
000226	se
000001	sentence
000063	sentence?
000012	show
000130	show-line
000212	sin
000016	sleep
000012	sqrt
000036	square
000098	terpri
000033	trace
000099	untrace
000178	vector
000038	vector?
000248	vector-length
000192	vector->list
000120	vector-ref
000107	vector-set!
000223	vowel?
000168	with-i2c
000148	with-sd-card
000052	with-serial
000169	with-spi
000048	word
000039	word?
000043	write
000047	write-byte
000036	write-line
000242	write-string