____ ____ ____ ____ 
||n |||i |||m |||f ||
||__|||__|||__|||__||
|/__\|/__\|/__\|/__\|

the nimf guide : strings

Strings are difficult in nimf. They work, but require some management and some thinking ahead.

String Literals

A string literal is a string that is written directly in code. For example, we have seen the following in other parts of this guide:

" Hello, world "

A string literal starts with the `"` word (the space around it is important: it is a nimf word, so it needs to be separated from the text that follows it). Then the text can be written. The `"` word is used again to finish the string. Basically, the `"` word works as a toggle that tells the interpreter to treat anything between it and the next instance of `"` as a string (a series of characters, which are actually integers).

Caveats

The same character escapes that are used in characters can be used within a string (`\_`, `\n`, `\e`, `\r`, `\t`, etc).
Whitespace is treated within a string the same as it is in nimf code. If you want more than one space in a row or you want a newline or a tab to appear, you must include that information as an escaped character.
Strings can span multiple lines in source code files, but must include `\n` wherever you actually want a line break in the string.
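
The caveats above can be pictured with a small model. This is a Python sketch, not nimf, and not how the interpreter is actually written. It assumes that runs of whitespace collapse to a single space and that `\_` stands for a space; the escape table and the name `normalize_literal` are made up for illustration.

```python
# Hypothetical model of how the text of a string literal ends up looking.
# The escape table is an assumption, not taken from the nimf source.
ESCAPES = {
    "\\_": " ",      # assumed: an escaped space
    "\\n": "\n",
    "\\t": "\t",
    "\\r": "\r",
    "\\e": "\x1b",
}

def normalize_literal(source_text):
    """Collapse whitespace the way code is split into words, then expand escapes."""
    # Spaces, tabs, and newlines in the source only separate words.
    joined = " ".join(source_text.split())
    for escape, char in ESCAPES.items():
        joined = joined.replace(escape, char)
    return joined

print(repr(normalize_literal("Hello,     world")))       # 'Hello, world'
print(repr(normalize_literal("two\\_\\_spaces")))        # 'two  spaces'
print(repr(normalize_literal("one\\ntwo\n   three")))    # 'one\ntwo three'
```

Note how the real newline in the last example disappears, while the escaped one survives.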

The Temporary String Buffer

Any time a string literal is encountered, the text of that string is stored in the temporary string buffer. This buffer resides at memory address `50`. If you inline the `text` module, you can use `str.buf-addr` to put the buffer address on the stack, or `str.print-buf` to print the current buffer data.

The name of this buffer having the word temporary in it is no accident. There is no guarantee that a string will remain in the buffer for long. Any time a word uses a string literal, that new string replaces whatever was in the buffer. As such, you will need to move temporary strings from the buffer into long-term memory if you want to keep them around without worrying about them getting lost. It is never an issue to input a string literal and then immediately call a word to output the temp buffer. Its presence can be relied upon for that time scale, and it is often used in that way.

It is important to know that the temporary string buffer extends from memory address `50` to memory address `29999`. If you need more memory than that, you will need to process your incoming string data in batches. Builtins that "read all" from a source (file or tcp) will truncate the data to fit if need be. If you are expecting a large amount of data, doing incremental reads is the better call. For a more detailed look at the nimf memory map please see the memory section of the nimf guide.
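
To make the size limit concrete, here is a rough Python sketch (not nimf) of cutting incoming data into buffer-sized batches. It assumes the cell at address `50` is used for bookkeeping and every cell after it holds one character; `CAPACITY` and `read_in_batches` are hypothetical names, not nimf words.

```python
BUF_START = 50     # first address of the buffer, per this guide
BUF_END = 29999    # last address of the buffer, per this guide

# Assumption: one cell is reserved, the rest hold one character each.
CAPACITY = BUF_END - BUF_START

def read_in_batches(data, batch_size=CAPACITY):
    """Yield successive pieces of data, each small enough to fit the buffer."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

incoming = "x" * 70000
sizes = [len(batch) for batch in read_in_batches(incoming)]
print(CAPACITY)   # 29949
print(sizes)      # [29949, 29949, 10102]
```

A "read all" builtin would, in this model, keep only the first batch and drop the rest.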

How Strings Are Stored In Memory

Many programmers familiar with C will be used to "null terminated strings". nimf does not use null terminated strings, though like C it does treat strings as character arrays. Instead, the memory cell at a string's address contains the length of the string, and the characters follow in the cells after it. You can then use that length to determine offsets for working with the string data.

Here is a visual representation of a string, `" hello "`, stored in memory:

    Address: [  50  ,  51 ,  52 ,  53 ,  54 ,  55 ]
      Value: [   5  , 104 , 101 , 108 , 108 , 111 ]
Description: [length,  `h ,  `e ,  `l ,  `l ,  `o ]
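
The same layout can be modeled in a few lines of Python (a sketch for illustration, not nimf code). Memory is a plain list of integers, and `store_string` and `load_string` are hypothetical helpers.

```python
def store_string(memory, addr, text):
    """Write text at addr as a length cell followed by character codes."""
    memory[addr] = len(text)
    for offset, ch in enumerate(text, start=1):
        memory[addr + offset] = ord(ch)

def load_string(memory, addr):
    """Read back the string whose length cell is at addr."""
    length = memory[addr]
    codes = memory[addr + 1 : addr + 1 + length]
    return "".join(chr(code) for code in codes)

memory = [0] * 100
store_string(memory, 50, "hello")
print(memory[50:56])             # [5, 104, 101, 108, 108, 111]
print(load_string(memory, 50))   # hello
```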

Storing the length first allows for quick access to the length of the string without having to read ahead. It also makes it easy to work with offsets. For example:

" text " inline
" hello " str.print-buf ( will output 'hello' )
str.buf-addr str.buf-addr @ + @ emit ( will output 'o', the last character of the string )

In the above example we first inline the `text` module so that we have access to some convenient words. We then enter the string literal `hello`, which gets stored in the temporary string buffer, and print it by calling `str.print-buf`, which outputs `hello`. Next we put the temporary string buffer's address on the stack twice. `@` consumes the top copy of the address and replaces it with the value stored there, which is the length of the string (5). `+` then adds the top two stack values, the buffer address and the length (50 + 5), leaving 55 on top of the stack (TOS). A second `@` fetches the value stored at address 55 (111), and `emit` outputs that value as a character: `o`.

That seems like a lot of work just to get one character out of a string. It would be easy to define a word that takes a string address and an offset and either outputs the character at that offset or puts its value on top of the stack. We are intentionally working at a low level here to get the basics down.
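
As a sketch of what such a helper would do, here is the idea modeled in Python rather than nimf. `char_at` is a hypothetical name, and the bounds check is an addition for safety, not something the guide says nimf does for you.

```python
# Cells 50..55 hold the string from the diagram: the length, then the characters.
memory = [0] * 50 + [5, 104, 101, 108, 108, 111]

def char_at(memory, addr, offset):
    """Return the character `offset` cells past the length cell at addr.

    Offsets run from 1 (first character) to the stored length (last character).
    """
    length = memory[addr]
    if not 1 <= offset <= length:
        raise IndexError(f"offset {offset} outside 1..{length}")
    return chr(memory[addr + offset])

print(char_at(memory, 50, 1))            # h
print(char_at(memory, 50, memory[50]))   # o
```

The last line mirrors the nimf example: address plus length lands on the final character.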

Something else to note about the above example is that when we inlined the `text` module we did so with a string literal. That means that before we entered the string literal `hello`, the temporary string buffer had the word `text` in it. You could absolutely interact with that string before inlining. After inlining, it is highly probable that the temporary string buffer holds a different value. Since `text` also inlines other modules, the strings that were used to inline them would also go into the temporary string buffer.

Be sure to read the next section of this guide, which will cover variables, memory, and string variables. There are a number of techniques to work with strings in nimf, and most involve reserving extra memory and overwriting variables with new string values.

Strings and Builtins

A number of builtins (`file.read`, `file.open`, `get-env`, and `tcp.read` all come to mind) put data into the temporary string buffer. For example, if you use `file.read` to read a line from a file it will read the line into the temporary string buffer. From there you can move it into long term memory, analyze it, print/output it, modify it... anything you like. Then, when you call `file.read` again the next available line will replace the prior one in the temporary string buffer.
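
The read, save, read again pattern can be modeled like this. It is a Python sketch, not nimf: `fill_buffer` stands in for any builtin that writes to the buffer, and `SAVE_ADDR` is a made-up long-term address chosen only for illustration.

```python
BUF_ADDR = 50        # temporary string buffer, per this guide
SAVE_ADDR = 30000    # hypothetical long-term address, for illustration only

memory = [0] * 30100

def fill_buffer(text):
    """Stand-in for a builtin that leaves text in the temporary buffer."""
    memory[BUF_ADDR] = len(text)
    for i, ch in enumerate(text, start=1):
        memory[BUF_ADDR + i] = ord(ch)

def copy_string(src, dst):
    """Copy the length cell and every character cell from src to dst."""
    for i in range(memory[src] + 1):
        memory[dst + i] = memory[src + i]

def read_string(addr):
    """Read the length-prefixed string stored at addr."""
    length = memory[addr]
    return "".join(chr(c) for c in memory[addr + 1 : addr + 1 + length])

fill_buffer("first line")
copy_string(BUF_ADDR, SAVE_ADDR)   # keep the first line somewhere safe
fill_buffer("second")              # the next read replaces the buffer contents
print(read_string(BUF_ADDR))       # second
print(read_string(SAVE_ADDR))      # first line
```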

It is very possible to define your own words that write to the buffer as well. Just remember to update the length as needed so that words like `str.print-buf` do not try to read more or less than you expect them to.
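
Here is what a stale length looks like, again as a Python model and not nimf. `read_buffer` imitates a printer that trusts the length cell, which is an assumption about how a word like `str.print-buf` behaves. `append_char` is a hypothetical helper.

```python
BUF_ADDR = 50
memory = [0] * 100

def read_buffer():
    """Return what a reader that trusts the length cell would see."""
    length = memory[BUF_ADDR]
    return "".join(chr(c) for c in memory[BUF_ADDR + 1 : BUF_ADDR + 1 + length])

def append_char(ch, update_length=True):
    """Write ch just past the current string, optionally fixing the length."""
    length = memory[BUF_ADDR]
    memory[BUF_ADDR + 1 + length] = ord(ch)
    if update_length:
        memory[BUF_ADDR] = length + 1

# Start with "hi" in the buffer.
memory[BUF_ADDR:BUF_ADDR + 3] = [2, ord("h"), ord("i")]

append_char("!", update_length=False)
print(read_buffer())    # hi   (the new character is stored but not counted)

memory[BUF_ADDR] += 1   # bring the length cell up to date
print(read_buffer())    # hi!
```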
