💾 Archived View for auragem.space › odin › ref › spec.gmi captured on 2022-04-29 at 11:23:16. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
This is a reference manual for the Odin programming language.
Odin is a general-purpose language designed for systems programming. It is a strongly typed language with manual memory management. Programs are constructed from *packages*.
The syntax is specified using Extended Backus-Naur Form (EBNF):
Production  = production_name "=" [ Expression ] "." .
Expression  = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term        = production_name | token [ "…" token ] | Group | Option | Repetition .
Group       = "(" Expression ")" .
Option      = "[" Expression "]" .
Repetition  = "{" Expression "}" .
Productions are expressions constructed from terms and the following operators, in increasing precedence:
|   alternation
()  grouping
[]  option (0 or 1 times)
{}  repetition (0 to n times)
Source code is Unicode text encoded in UTF-8. The text is not canonicalized, so a single accented code point is distinct from the same character constructed from combining an accent and a letter; those are treated as two separate code points. In this document, the term *character* will be used to refer to a Unicode code point in the source text.
Each code point is distinct; for instance, upper-case and lower-case letters are different characters.
Implementation restriction: A compiler *must* disallow the NUL character (U+0000) in the source text.
Implementation restriction: A compiler *may* ignore a UTF-8-encoded byte order mark (U+FEFF) if it is the first Unicode code point in the source text. A byte order mark *must* be disallowed anywhere else in the source text.
The following terms are used to denote specific Unicode character classes:
newline        = /* the Unicode code point U+000A */
unicode_char   = /* an arbitrary Unicode code point except newline */
unicode_letter = /* a Unicode code point classified as "Letter" */
unicode_digit  = /* a Unicode code point classified as "Number, decimal digit" */
In The Unicode Standard 8.0[1], Section 4.5 "General Category" defines a set of character categories. Odin treats all characters in any of the Letter categories Lu, Ll, Lt, Lm, or Lo as Unicode letters, and those in the Number category Nd as Unicode digits.
1: https://www.unicode.org/versions/Unicode8.0.0/
The underscore character `_` (U+005F) is considered a letter.
letter        = unicode_letter | "_" .
binary_digit  = "0" … "1" .
octal_digit   = "0" … "7" .
decimal_digit = "0" … "9" .
dozenal_digit = "0" … "9" | "A" … "B" | "a" … "b" .
hex_digit     = "0" … "9" | "A" … "F" | "a" … "f" .
binary_char   = binary_digit | "_" .
octal_char    = octal_digit | "_" .
decimal_char  = decimal_digit | "_" .
dozenal_char  = dozenal_digit | "_" .
hex_char      = hex_digit | "_" .
Comments serve as program documentation. There are three forms:
1. *Line comments* start with the character sequence `//` and stop at the end of the line
2. *General comments* start with the character sequence `/*` and stop at the matching character sequence `*/`; because the delimiters are paired, general comments may be nested
3. *Hash-bang comments* start with the character sequence `#!` and stop at the end of the line
A comment cannot start inside a *rune* or *string* literal, or inside a line or hash-bang comment.
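The line and general comment forms can be illustrated with a small program. This is a minimal sketch rather than normative text; the package name `main` and the use of `core:fmt` are assumptions made only so the example is complete.

```odin
package main

import "core:fmt"

// A line comment runs to the end of the line.

/*
	A general comment.
	/* A nested general comment; the outer comment continues after it. */
	Still inside the outer comment.
*/

main :: proc() {
	// The two slashes inside the string literal do not start a comment.
	s := "// not a comment"
	fmt.println(s) // prints: // not a comment
}
```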
Tokens form the vocabulary of the Odin language. There are four classes: *identifiers*, *keywords*, *operators* and *punctuation*, and *literals*. *White space*, formed from spaces (U+0020), horizontal tabs (U+0009), carriage returns (U+000D), and new lines (U+000A), is ignored except as it separates tokens that would otherwise combine into a single token.
The formal grammar uses semicolons `;` as terminators. Odin programs may omit a semicolon in any of the following cases:
1. The semicolon would be followed on the same line by one of the operators and punctuation: `}`, `)`
2. The semicolon would be followed on the same line by one of the keywords: `else`
3. The semicolon would be preceded by one of the statements: block, if, when, for, switch
4. The semicolon would be preceded by one of the declarations: package, import, foreign import, foreign block
5. The last expression in a constant value declaration is a procedure literal followed by a new line
6. The last expression in a constant value declaration is one of the following types followed by a new line: struct type, union type, enum type, bit field type, or a helper type or pointer type of any type in this list
Identifiers name program entities such as variables and types. An identifier is a sequence of one or more letters and digits. The first character in an identifier must be a letter.
identifier = letter { letter | unicode_digit } .
Some identifiers are predeclared.
The following keywords are reserved and may not be used as identifiers:
align_of   case      defer     enum         import  no_inline  proc     transmute  when
auto_cast  cast      distinct  fallthrough  in      notin      return   type_of
bit_field  const     do        for          inline  offset_of  size_of  typeid
bit_set    context   dynamic   foreign      macro   opaque     struct   union
break      continue  else      if           map     package    switch   using
Some keywords are not currently used by the language and are reserved for future use.
The following character sequences represent operators (including assignment operators) and punctuation:
+    &    +=    &=    &&    ==    !=    (    )    #    ->
-    |    -=    |=    ||    <     <=    [    ]    @    <-
Implementation option: A compiler may allow the following character sequences as aliases for other operators, punctuation, and keywords:
'≠' (U+2260) alias for '!='
'≤' (U+2264) alias for '<='
'≥' (U+2265) alias for '>='
'∈' (U+2208) alias for 'in'
'∉' (U+2209) alias for 'notin'
An integer literal is a sequence of digits representing an integer constant. An optional prefix sets a specific radix: `0b` for binary, `0o` for octal, `0d` for decimal, `0z` for dozenal, or `0x` for hexadecimal. In dozenal literals, letters `a-b` and `A-B` represent values ten and eleven. In hexadecimal literals, letters `a-f` and `A-F` represent values ten through fifteen.
Integer literals may contain any number of underscore characters `_` (U+005F) after the first digit.
int_lit     = binary_lit | octal_lit | decimal_lit | dozenal_lit | hex_lit .
binary_lit  = "0b" binary_digit { binary_char } .
octal_lit   = "0o" octal_digit { octal_char } .
decimal_lit = ["0d"] decimal_digit { decimal_char } .
dozenal_lit = "0z" dozenal_digit { dozenal_char } .
hex_lit     = "0x" hex_digit { hex_char } .
42
042 // == 42
0b1001011
0o712
0d42
0z19b3
0xDeadBeef
210206826754181103207028761697008013415622289
210_206_826_754_181_103_207_028_761_697_008_013_415_622_289
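The values denoted by several of these literals can be checked directly. The following is a sketch, assuming a compiler that accepts every radix prefix listed in this section; the decimal values in the assertions were worked out by hand from the digit rules.

```odin
package main

import "core:fmt"

main :: proc() {
	assert(042 == 42)            // a leading zero does not select octal
	assert(0b1001011 == 75)      // binary
	assert(0o712 == 458)         // octal: 7*64 + 1*8 + 2
	assert(0d42 == 42)           // explicit decimal prefix
	assert(0z19b3 == 3159)       // dozenal: 1*1728 + 9*144 + 11*12 + 3
	assert(1_000_000 == 1000000) // underscores do not change the value

	x: u32 = 0xDead_Beef         // hex digits are case-insensitive
	fmt.println(x)               // prints 3735928559
}
```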
A floating-point literal is a textual representation of a floating-point constant. There are two forms of floating-point literals: decimal and hexadecimal. Hexadecimal floating-point literals represent the internal integer representation of the floating-point number for that platform.
float_lit               = decimal_float_lit | hexadecimal_float32_lit | hexadecimal_float64_lit .
decimal_float_lit       = decimals "." [decimals] [exponent] |
                          decimals exponent |
                          "." decimals [exponent] .
decimals                = decimal_digit { decimal_char } .
exponent                = ( "e" | "E" ) [ "+" | "-" ] decimals .
hexadecimal_float32_lit = "0h" hex_digit hex_char hex_char hex_char hex_char hex_char hex_char hex_char .
hexadecimal_float64_lit = "0h" hex_digit hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char hex_char .
0.
0.0
42.36
042.36 // == 42.36
6.28318530718
1.e+0
1.054571800e-34
1.054_571_800e-34
1E9
.125
.12345e+5
An imaginary literal is a decimal representation of the imaginary part of a complex constant. It consists of a floating-point literal or a decimal integer followed by the lower-case letter `i`.
imaginary_lit = (decimals | float_lit) "i" .
0.i
0.0i
42.36i
042.36i // == 42.36i
6.28318530718i
1.e+0i
1.054571800e-34i
1.054_571_800e-34i
1E9i
.125i
.12345e+5i
A rune literal represents a rune constant, an integer value identifying a Unicode code point. A rune literal is expressed as one or more characters enclosed in single quotes, such as `'b'` or `'\t'`.
TODO:
\a   U+0007 alert or bell
\b   U+0008 backspace
\e   U+001B escape
\f   U+000C form feed
\n   U+000A newline or line feed
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote (valid escape only within rune literals)
\"   U+0022 double quote (valid escape only within string literals)
rune_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "e" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
TODO:
string_lit             = raw_string_lit | interpreted_string_lit .
raw_string_lit         = "`" { unicode_char | newline } "`" .
interpreted_string_lit = `"` { unicode_value | byte_value } `"` .
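The prose for rune and string literals is still marked TODO, so the following is only a sketch of what the grammar and the escape table above suggest. It assumes that escapes are interpreted inside rune literals and interpreted strings and are left untouched inside raw strings.

```odin
package main

import "core:fmt"

main :: proc() {
	// Rune literals denote Unicode code points.
	assert(int('A') == 65)
	assert(int('\t') == 9)
	assert('\x41' == 'A')   // two-digit hex byte value
	assert('\u0041' == 'A') // four-digit hex code point

	// An interpreted string processes escapes; a raw string does not.
	interpreted := "a\nb"
	raw := `a\nb`
	assert(len(interpreted) == 3) // 'a', newline, 'b'
	assert(len(raw) == 4)         // 'a', backslash, 'n', 'b'

	fmt.println(raw) // prints a\nb
}
```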
A type determines a set of values together with operations specific to those values. A type may be denoted by a *type name*, if it has one, or specified using a *type literal*, which composes a type from existing types.
Type     = TypeName | TypeLit | "(" Type ")" | HelperType .
TypeName = identifier | QualifiedIdent .
TypeLit  = ArrayType | SliceType | DynamicArrayType | StructType | UnionType |
           PointerType | ProcedureType | MapType | EnumType | BitSetType |
           BitFieldType | OpaqueType | HelperType .
A boolean type represents the set of boolean truth values denoted by the predeclared constants `true` and `false`. The predeclared architecture-independent boolean types are:
bool    1 byte boolean type
b8      8-bit boolean type
b16     16-bit boolean type
b32     32-bit boolean type
b64     64-bit boolean type
A numeric type represents sets of integer, floating-point, or rune values. The predeclared architecture-independent numeric types are:
u8          the set of all unsigned 8-bit integers (0 to 255)
u16         the set of all unsigned 16-bit integers (0 to 65535)
u32         the set of all unsigned 32-bit integers (0 to 4294967295)
u64         the set of all unsigned 64-bit integers (0 to 18446744073709551615)
u128        the set of all unsigned 128-bit integers (0 to 340282366920938463463374607431768211455)
i8          the set of all signed 8-bit integers (-128 to 127)
i16         the set of all signed 16-bit integers (-32768 to 32767)
i32         the set of all signed 32-bit integers (-2147483648 to 2147483647)
i64         the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
i128        the set of all signed 128-bit integers (-170141183460469231731687303715884105728 to 170141183460469231731687303715884105727)
f32         the set of all IEEE-754 32-bit floating-point numbers
f64         the set of all IEEE-754 64-bit floating-point numbers
complex64   the set of all complex numbers with f32 real and imaginary parts
complex128  the set of all complex numbers with f64 real and imaginary parts
byte        alias for u8
rune        the set of all Unicode code points represented by a 32-bit integer (-2147483648 to 2147483647)
The value of an n-bit integer is n bits wide and represented using two's complement arithmetic.
There is also a set of architecture-independent numeric types with a specified endianness:
u16le   little endian representation of the set of all unsigned 16-bit integers (0 to 65535)
u32le   little endian representation of the set of all unsigned 32-bit integers (0 to 4294967295)
u64le   little endian representation of the set of all unsigned 64-bit integers (0 to 18446744073709551615)
u128le  little endian representation of the set of all unsigned 128-bit integers (0 to 340282366920938463463374607431768211455)
i16le   little endian representation of the set of all signed 16-bit integers (-32768 to 32767)
i32le   little endian representation of the set of all signed 32-bit integers (-2147483648 to 2147483647)
i64le   little endian representation of the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
i128le  little endian representation of the set of all signed 128-bit integers (-170141183460469231731687303715884105728 to 170141183460469231731687303715884105727)
u16be   big endian representation of the set of all unsigned 16-bit integers (0 to 65535)
u32be   big endian representation of the set of all unsigned 32-bit integers (0 to 4294967295)
u64be   big endian representation of the set of all unsigned 64-bit integers (0 to 18446744073709551615)
u128be  big endian representation of the set of all unsigned 128-bit integers (0 to 340282366920938463463374607431768211455)
i16be   big endian representation of the set of all signed 16-bit integers (-32768 to 32767)
i32be   big endian representation of the set of all signed 32-bit integers (-2147483648 to 2147483647)
i64be   big endian representation of the set of all signed 64-bit integers (-9223372036854775808 to 9223372036854775807)
i128be  big endian representation of the set of all signed 128-bit integers (-170141183460469231731687303715884105728 to 170141183460469231731687303715884105727)
There is also a set of predeclared numeric types with implementation-specific sizes:
uintptr  an unsigned integer large enough to store the uninterpreted bits of a pointer value
int      architecture word sized (32-bit or 64-bit depending on the platform)
uint     same size as int
To avoid portability issues, all numeric types are defined types and thus distinct, except `byte`, which is an alias for `u8`. Explicit conversions are required when different numeric types are mixed in an expression or assignment. For instance, `i64` and `int` are not the same type even though they may have the same size on a particular machine.
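The distinction between numeric types can be seen in a short program. This is an illustrative sketch; the variable names are arbitrary, and the rejected line is left as a comment because it would not compile.

```odin
package main

import "core:fmt"

main :: proc() {
	a: int = 10
	b: i64 = 32

	// sum := a + b   // rejected: int and i64 are distinct types
	sum := i64(a) + b // an explicit conversion makes the operand types match

	c: byte = 7
	d: u8 = c         // allowed without conversion: byte is an alias for u8

	fmt.println(sum, d) // prints 42 7
}
```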
A *string* type represents the set of string values. A string value is a (possibly empty) sequence of bytes. The number of bytes is called the length of the string and is a non-negative integer.
The predeclared string types are:
string cstring
The length of a string `s` can be determined using the built-in procedure `len`. The length is a compile-time constant if the string is a constant. For string types not derived from `cstring`, a string's bytes can only be accessed by integer indices 0 through `len(s)-1`.
An array is a numbered sequence of elements of a single type, called the element type. The number of elements is called the length of the array and is a non-negative integer.
ArrayType   = "[" ArrayLength "]" ElementType .
ArrayLength = Expression .
ElementType = Type .
The length is part of the array's type; it must evaluate to a non-negative constant representable by a value of type `int`. The length of array `a` can be determined using the built-in procedure `len`. The elements can be addressed by indices 0 through `len(a)-1`. Array types are always one-dimensional but may be composed to form multi-dimensional types.
[32]byte
[2*N + 1]union{int, string}
[42]^f32
[2][3]int
[3][3][3]f64 // same as [3]([3]([3]f64))
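As a brief sketch of how array lengths and composed array types behave, consider the program below. The constant `N` and the variable names are illustrative only.

```odin
package main

import "core:fmt"

main :: proc() {
	N :: 3

	a: [2*N + 1]int  // the length is the constant expression 7
	grid: [2][3]int  // an array of 2 elements, each of type [3]int

	assert(len(a) == 7)
	assert(len(grid) == 2)
	assert(len(grid[0]) == 3)

	a[len(a)-1] = 9  // the last valid index is len(a)-1
	grid[1][2] = 5

	fmt.println(len(a), len(grid), len(grid[0])) // prints 7 2 3
}
```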
A slice is a descriptor for a contiguous segment of an *underlying array/memory* and provides access to an indexed sequence of elements from that array/memory. A slice type denotes the set of all slices of arrays of its element type. The number of elements is called the length of the slice and is never negative. The value of an uninitialized slice is `nil`.
SliceType = "[" "]" ElementType .
The length of a slice `s` can be determined by the built-in procedure `len`. Unlike with regular arrays, this length is not a compile-time constant and must be determined during execution. The elements can be addressed by integer indices `0` through `len(s)-1`.
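The relationship between a slice and its underlying array can be sketched as follows. The slicing expression `arr[1:4]` is not described in this section and is assumed here to select the elements at indices 1 through 3.

```odin
package main

import "core:fmt"

main :: proc() {
	arr := [5]int{1, 2, 3, 4, 5}

	s := arr[1:4]        // a view of arr[1], arr[2], arr[3]
	assert(len(s) == 3)

	s[0] = 20            // writes through to the underlying array
	assert(arr[1] == 20)

	empty: []int         // an uninitialized slice
	assert(empty == nil)
	assert(len(empty) == 0)

	fmt.println(len(s), arr[1]) // prints 3 20
}
```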
DynamicArrayType = "[" "dynamic" "]" ElementType .
A struct is a sequence of named elements, called fields, each of which has a name and a type. Within a struct, non-blank field names must be unique.
StructType            = "struct" StructTypeTags "{" FieldList "}" .
FieldList             = FieldDecl { "," FieldDecl } .
FieldDecl             = [ [ "using" ] IdentifierList ":" ] Type .
StructTypeTags        = { (StructTypeTagPacked | StructTypeTagRawUnion | StructTypeTagAlign) } .
StructTypeTagPacked   = "#" "packed" .
StructTypeTagRawUnion = "#" "raw_union" .
StructTypeTagAlign    = "#" "align" Expression .
// An empty struct
struct {}
// A struct with 6 fields
struct {
	x, y: int,
	f:    f32,
	_:    f32, // padding
	a:    ^[]int,
	p:    proc() -> int,
}
// A struct with no padding between its fields
struct #packed {
	a: u8,
	b: u16,
	c: u32,
	d: u8,
}
The `#packed` tag specifies that the memory representation of the struct type has no padding between its fields.
The `#raw_union` tag specifies that all fields of the struct type are located at the same memory offset of zero. The struct type then has the size of its largest field and the alignment of its most strictly aligned field.
The `#raw_union` tag and the `#packed` tag cannot be combined.
The `#align` tag explicitly states the alignment required for the struct type.
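The effect of the `#packed` and `#raw_union` tags can be observed with `size_of` and `align_of`. The type names below are hypothetical, and the comparison against the untagged struct is kept deliberately loose because the amount of padding is chosen by the implementation.

```odin
package main

import "core:fmt"

Padded :: struct {
	a: u8,
	b: u16,
	c: u32,
	d: u8,
}

Packed :: struct #packed {
	a: u8,
	b: u16,
	c: u32,
	d: u8,
}

Raw :: struct #raw_union {
	i: u32,
	f: f32,
	b: [8]u8,
}

main :: proc() {
	assert(size_of(Packed) == 8)               // 1 + 2 + 4 + 1, no padding
	assert(align_of(Packed) == 1)
	assert(size_of(Padded) >= size_of(Packed)) // padding may only add bytes
	assert(size_of(Raw) == 8)                  // size of the largest field

	fmt.println(size_of(Packed), size_of(Raw)) // prints 8 8
}
```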
TODO: `rawptr`
Two types are either *identical* or *different*.
A distinct type is always different from any other type. Otherwise, two types are identical if their underlying type literals are structurally equivalent; that is, they have the same literal structure and corresponding components have identical types. In detail:
A value `x` is assignable to a variable of type `T` ("`x` is assignable to `T`") if one of the following conditions applies:
A constant `x` is representable by a value of type `T` if one of the following conditions applies:
A *block* is a possibly empty sequence of declarations and statements within matching braces.
Block         = "{" StatementList "}" .
StatementList = { Statement ";" } .
In addition to explicit blocks in the source text, there are implicit blocks:
1. The *universal block* encompasses all Odin source text.
2. Each package has a *package block* containing all Odin source text for that package.
3. Each file has a *file block* containing all Odin source text in that file.
4. Each "if", "for", and "switch" statement is considered to be in its own implicit block.
Blocks nest and influence scoping.
Labels are declared by labeled statements and used in the "break" and "continue" statements. In contrast to other identifiers, labels are not block scoped. The scope of the label is the body of the procedure in which it is declared and excludes the body of any nested procedure.
The *blank identifier* is represented by the underscore character `_`. It acts as an anonymous placeholder rather than a regular non-blank identifier. It has a special meaning in declarations and in assignments.
Types:
	bool b8 b16 b32 b64 byte complex64 complex128 f32 f64
	int i8 i16 i32 i64 i16le i32le i64le i16be i32be i64be
	uint uintptr u8 u16 u32 u64 u16le u32le u64le u16be u32be u64be
	any rawptr rune string cstring
Constants:
	true false
Zero value:
	nil
Procedures:
	len cap complex real imag conj swizzle expand_to_tuple
	min max abs clamp
Unary operators have the highest precedence.
There are eight precedence levels for binary operators.
Precedence    Operator
    8         *  /  %  %%  <<  >>  &  &~
    7         +  -  |  ~
    6         'in'  'notin'
    5         ==  !=  <  <=  >  >=
    4         &&
    3         ||
    2         ..    // If allowed
    1         ?     // Ternary expression
Binary operators of the same precedence associate from left to right. For instance, `x / y * z` is the same as `(x / y) * z`.
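A few constant expressions make the precedence and associativity rules concrete. This is a sketch; the comment on each assertion spells out the grouping that the rules above imply.

```odin
package main

import "core:fmt"

main :: proc() {
	assert(8 / 4 * 2 == 4)           // (8 / 4) * 2, left to right
	assert(1 + 2 * 3 == 7)           // * binds tighter than +
	assert(1 << 2 + 1 == 5)          // (1 << 2) + 1, << binds tighter than +
	assert(1 + 1 == 2 && 2 < 3)      // arithmetic, then comparison, then &&
	assert(true || false && false)   // && binds tighter than ||

	fmt.println(8 / 4 * 2, 1 + 2 * 3) // prints 4 7
}
```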
+    sum                    integers, floats, complex values, constant string values
-    difference (binary)    integers, floats, complex values
-    negation (unary)       integers, floats, complex values
Arithmetic operators also work on fixed-length arrays of numeric types.
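Arithmetic on fixed-length arrays can be sketched as below. The example assumes the operators are applied element by element, and it also assumes arrays of the same type can be compared with `==`, which this section does not state.

```odin
package main

import "core:fmt"

main :: proc() {
	a := [3]f32{1, 2, 3}
	b := [3]f32{10, 20, 30}

	sum := a + b  // element-wise addition
	diff := b - a // element-wise subtraction

	assert(sum == [3]f32{11, 22, 33})
	assert(diff == [3]f32{9, 18, 27})

	fmt.println(sum, diff)
}
```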
For two integer values `x` and `y`, the integer quotient `q = x / y` and remainder `r = x % y` satisfy the following relationships:
x = q*y + r and |r| < |y|
with `x / y` truncated towards zero ("truncated division[2]").
2: https://wikipedia.org/wiki/Modulo_operation
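The quotient and remainder relationship can be checked with a negative dividend, where truncation towards zero differs from rounding down. A minimal sketch:

```odin
package main

import "core:fmt"

main :: proc() {
	x, y := -7, 2

	q := x / y // truncated towards zero: -3, not -4
	r := x % y // the remainder takes the sign of x: -1

	assert(q == -3)
	assert(r == -1)
	assert(x == q*y + r)    // x = q*y + r
	assert(abs(r) < abs(y)) // |r| < |y|

	fmt.println(q, r) // prints -3 -1
}
```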
A conversion changes the type of an expression to the type specified by the conversion. A conversion may appear literally in the source, or it may be implied by the context in which an expression appears.
An *explicit* conversion is an expression of the form `T(x)` or `cast(T)x` where `T` is a type and `x` is an expression that can be converted to type `T`.
Conversion     = CallConversion | CastConversion .
CallConversion = Type "(" Expression [ "," ] ")" .
CastConversion = "cast" "(" Type ")" Expression .
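Both explicit conversion forms can be shown side by side. This sketch assumes the two forms yield the same value for the same operand, which the grammar suggests but does not state outright.

```odin
package main

import "core:fmt"

main :: proc() {
	i := 42

	a := f64(i)      // call-style conversion
	b := cast(f64)i  // cast-style conversion
	assert(a == b)

	u := u8(i)       // 42 is representable as a u8
	assert(u == 42)

	fmt.println(a, b, u)
}
```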
When storage is allocated for a variable, through a declaration, or when a new value is created through a composite literal, and no explicit initialization is provided, the variable or value is given a default value. Each element of such a variable or value is set to the *zero value* for its type.
These two simple declarations are equivalent:
i: int;
i: int = 0;
type                                         size in bytes
byte, u8, i8, b8                             1
u16, i16, u16le, i16le, b16                  2
u32, i32, u32le, i32le, b32, f32, rune       4
u64, i64, u64le, i64le, b64, f64, complex64  8
complex128                                   16
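Zero values for a handful of types can be checked as follows. This is a sketch; the struct type `Point` is hypothetical, and the zero values shown for strings, pointers, and slices are assumptions based on the predeclared `nil` and the slice description in this document.

```odin
package main

import "core:fmt"

Point :: struct {
	x, y: int,
}

main :: proc() {
	i: int
	f: f64
	b: bool
	s: string
	p: ^int
	sl: []int
	pt: Point
	arr: [3]int

	assert(i == 0)
	assert(f == 0)
	assert(b == false)
	assert(s == "")
	assert(p == nil)
	assert(sl == nil)
	assert(pt.x == 0 && pt.y == 0)         // each field is zeroed
	assert(arr[0] == 0 && arr[2] == 0)     // each element is zeroed

	fmt.println(i, b) // prints 0 false
}
```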
Minimal alignment properties:
1. For a variable `x` of any type, `align_of(x)` is at least 1
2. For a variable `x` of array type, `align_of(x)` is the same as the alignment of a variable of the array's element type
3. For a variable `x` of enum type, `align_of(x)` is the same as the alignment of a variable of the enum's base type (default of `int` if not specified)
4. For a variable `x` of struct type, `align_of(x)` is the largest of all the values `align_of(x.f)` for each field `f` of `x`, but at least 1, unless the alignment has been explicitly stated by the struct tag `#align`, or the alignment is 1 if the struct is declared to be `#packed`.
A struct, bit set, bit field, or array has size zero if it contains no fields (or elements) that have a size greater than zero.
A union has size zero if it contains zero variant types.
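Several of the size and alignment guarantees can be verified with `size_of` and `align_of`. This is a sketch; the type names `Empty` and `Small` are hypothetical.

```odin
package main

import "core:fmt"

Empty :: struct {}

Small :: enum u8 {
	A,
	B,
}

main :: proc() {
	// Sizes from the table of fixed-size types.
	assert(size_of(u16le) == 2)
	assert(size_of(rune) == 4)
	assert(size_of(complex64) == 8)
	assert(size_of(complex128) == 16)

	// Minimal alignment properties.
	assert(align_of(int) >= 1)
	assert(align_of([4]u32) == align_of(u32)) // array: alignment of the element type
	assert(align_of(Small) == align_of(u8))   // enum: alignment of the base type

	// A struct with no fields has size zero.
	assert(size_of(Empty) == 0)

	fmt.println(size_of(int), align_of(int))
}
```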