In a comment on my last post about the J programming language, Peter Kotrčka mentioned a flaw in the simple parsing trick I had used. I was basically assuming that the numbers in my input were separated by exactly *one* newline (or other J “word”).
I had spent some time trying to implement a parser that just looks for digits and ignores everything else, but failed. There’s an example of a sequential machine in their help pages, but I couldn’t get it to simply emit a list of numbers.
Peter made me return to that parser and I think I finally understood how it works! Thanks. 🙂
First, create an array mapping every input byte to a code. In this case, we create an array with 256 zeroes, and then we set the code to 1 for every digit.
m=: 256$0 m=: 1 (a.i.'0123456789')}m
The result looks something like this: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... – as you can see there are a bunch of ones in there. 🙂
Next we need a state transition table. It has 3 dimensions.
1. the first axis is the current state we’re in: 0 means we’re skipping stuff, 1 means we’re reading a number – we don’t need any more states
2. the second axis is the current code we’re looking at: 0 means we’re looking at some byte that’s not a digit, 1 means we’re looking at a digit (this is where m is used)
3. the third axis is simply two integers: the first one is the new state (we only have 0 or 1 to choose from), and the second one is what to do (the output code: 0 means doing nothing, 1 means start a word, 3 means emit a word and reset)
OK, so here’s how I built it:
NB. state 0 state 1 s=: 2 2 2$ 0 0 1 1 0 3 1 0
Or graphically:
<"1 s ┌───┬───┐ │0 0│1 1│ ├───┼───┤ │0 3│1 0│ └───┴───┘
Rows are the current state, columns are the input code we’re looking at.
Can you see it? I started thinking about it like this:
1. given a string such as `42 16`, starting at index 0, in state 0, with j being the beginning of a word pointing at -1 (in other words, no word is started) ...
2. we look at ’4’ and get the new state from our m: 1 (as we’re looking at a digit)
3. given this information, we need to look at the cell `[1 1]` (current state 0, input code 1)
4. the new state is set to 1, and output code 1 means we begin a new word at the current index 0
5. we look at ’2’ and get the new state from our m: still 1
6. now we’re looking at the cell `[1 0]` (current state 1, input code 1)
7. the new state continues set to 1, and output code 0 means nothing happens
8. we look at the space character and get the new state from our m: 0
9. now we’re looking at the cell `[0 3]` (current state 1, input code 0)
10. the new state changes to 0, and output code 3 means we emit a word, starting from index 0 (that’s when we started it) up to the current position ("42") and then we set j to -1 again
And so on.
The output code 3 means we emit a word and we do *not* begin a new word. This is important at the end: we could have used the output code 2 but that means we emit a word and begin a new one (j remains set), and the result is that any trailing garbage at the end is turned into a word.
And now the parser works for everything:
(0;s;m) ;: '42x16xx1' ┌──┬──┬─┐ │42│16│1│ └──┴──┴─┘
You might be wondering about the 0 at the beginning of that parser definition. The answer is that this tells the parser what to do with the words it finds. 0 means that the words end up in boxes. 2 means the word index and length, for example:
(2;s;m) ;: '42x16xx1' 0 2 3 2 7 1
At last I understand it!
#J #Programming
(Please contact me if you want to remove your comment.)
⁂
I installed j701 on the App Store.
I was also happy to learn:
J is written in portable C and is available for Windows, Linux, Mac, iOS, Android and Raspberry Pi. J can be installed and distributed for free. The source is provided under both commercial and GPL 3 licenses.
– Alex Schroeder 2019-12-13 00:31 UTC
---
When installing j901 on Debian, I had to install the following:
I don’t know what I should have installed instead.
– Alex Schroeder 2019-12-13 17:55 UTC