The BONES Manual
================

/Release 8/


Table of Contents
=================

1 Introduction
2 Portability
3 Requirements
4 Installation
5 Reporting Bugs and Upgrading
6 Usage
  6.1 Compiling Scheme code
  6.2 Assembling and linking Executables
  6.3 Command-line Options
  6.4 Configuration Macros for Generated Assembly Code
  6.5 Configurations
7 Language Description
  7.1 Program Specifications
  7.2 Deviations from R5RS
  7.3 Extensions to R5RS
  7.4 Additional Library Code
  7.5 The Evaluator
8 Compiler Details
  8.1 Compilation Strategy
  8.2 Compiler Stages
  8.3 Register Usage
  8.4 Optimizations Performed
  8.5 Performance
  8.6 Hacking and Extending the Compiler
9 Data Representation
10 Garbage Collection
11 Interfacing to Foreign Code
12 Embedding
13 Debugging
14 Bugs and Limitations
15 Porting Guide
16 Suggestions for Projects
17 Terms of Use
18 Further Reading
19 Contact


1 Introduction
~~~~~~~~~~~~~~~

This is BONES, a compiler for R5RS Scheme that generates x86_64 assembly code. BONES is designed to be simple and easy to understand, both to reduce the effort needed to learn and extend the system and to keep the complexity of the compiler at a minimum.

BONES is a batch compiler: it takes a Scheme source file and produces an assembler file that is subsequently translated into object code. It is also a whole-program compiler, which means it does not support separate compilation of multiple modules. The runtime system is by default added to your program before it is compiled, so there are no external libraries apart from a few functions from the C library ("libc"), and even that dependency is optional.

BONES is mostly R5RS-compliant, but intentionally cuts some corners to reduce code size, increase performance and simplify the compiler and runtime system. Type checks are generally omitted, for example. Very little error checking is done and arithmetic overflow of small integers ("fixnums") is not detected. Some R7RS procedures are available in addition to the primitives required for R5RS.

Since BONES produces assembly code, build times are quite short. The produced code is CPU- and OS-specific and currently supports x86_64 on Linux, *BSD, Mac or Windows. Porting the system to other architectures and operating systems should be (relatively) straightforward, as the platform-specific parts of the compiler are small and the runtime system is just a single file of about two thousand lines of assembly code. The compiler is self-compiling and uses just a few functions from the C library. Alternatively, programs can be compiled on Linux without using a C library, by just invoking raw system calls, but support for this is currently incomplete (mostly related to number<->string conversion).

Code compiled with BONES can be easily embedded into programs written in other languages as long as they allow calling C functions.

There are very few debugging facilities. The compiler expects correct code and no attempt is made to provide more than the most basic error messages. It is recommended to develop and test code first in an interactive Scheme implementation and use BONES only for generating executables when the code can be assumed to work.

The development files for BONES are hosted at [bitbucket.org], where you will find example code (mostly tests). The compiler itself is developed with a currently unreleased Scheme system, but as BONES compiles itself, no additional implementation is needed to make changes and extend the system.

[bitbucket.org]: https://bitbucket.org/bunny351/bones


2 Portability
~~~~~~~~~~~~~~

To run programs compiled with BONES (and to run the compiler itself), an x86_64 system is required, with support for SSE2 and SSE3 instructions. Currently the compiler runs under the Linux, *BSD, Mac OS X and Windows operating systems.

BONES has been successfully tested on the following systems:

 OS                                                                              nasm      C compiler       libc
-------------------------------------------------------------------------------+---------+----------------+-------------
 Linux Mint 13 Maya Ubuntu/Linaro 4.6.3-1ubuntu4 Linux 3.2.0-23-generic x86_64   2.09.10   gcc 4.6.3        EGLIBC 2.15
                                                                                                            musl 1.1.0
 openSUSE 13.1 (Bottle) Linux 3.15.1-2.g3289da4-default x86_64                   2.09.10   gcc 4.8.1        glibc 2.18
 Windows 7 Professional
 Mac OS 10.8.5                                                                   2.11.05   clang-503.0.38
 OpenBSD 5.5 (GENERIC.MP) #315                                                   2.10.09   gcc 4.2.1


3 Requirements
~~~~~~~~~~~~~~~

The [NASM] assembler is required on all supported platforms. To link the assembled code you will typically need a C development environment.

On Linux [GCC] is the recommended compiler, but you will only need it for linking, and for providing the basic C runtime code (CRT). A subset of the functionality provided by this system can be compiled to native code that does not need C runtime support - in this case all you need is `ld', the GNU linker.

On Windows, [Microsoft Visual C] and [MINGW] are supported.

On Mac OS, [Xcode] should be installed, with `gcc' or `clang' binaries in the `PATH', or another C compiler that is able to link macho64 binaries.

[NASM]: http://www.nasm.us

[GCC]: http://gcc.gnu.org

[Microsoft Visual C]: http://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx

[MINGW]: http://www.mingw.org

[Xcode]: https://developer.apple.com/xcode/


4 Installation
~~~~~~~~~~~~~~~

BONES is written in Scheme and translates Scheme to x86_64 assembler for the syntax accepted by NASM, which will have to be installed on your system. To link the assembled code you will need a linker and C runtime support files, including the C library, so a C compiler should be installed as well.

To build the system, you need the pre-compiled assembly code for the `bones' executable, which you can obtain by downloading a distribution tarball at this location:

[http://www.call-with-current-continuation.org/bones/bones.tar.gz]

or

[http://www.call-with-current-continuation.org/bones/bones.zip]

After downloading, unpack the file using the tar(1) command and assemble and link the appropriate assembler file for the compiler:

- For Linux:

    tar xfz bones.tar.gz
    cd bones-<date>
    nasm -f elf64 bones-x86_64-linux.s -o bones.o
    gcc bones.o -o bones -lrt

- For Free/Net/OpenBSD:

    tar xfz bones.tar.gz
    cd bones-<date>
    nasm -f elf64 bones-x86_64-bsd.s -o bones.o
    gcc bones.o -o bones

  At least on OpenBSD, you may need to invoke `gcc' with the `-static' and `-nopie' options to generate a working executable. Alternatively pass `-feature pic' to bones when compiling the Scheme code.

- For Windows:

  Unzip the archive and enter the following commands in a command shell window that has access to the command-line MSVC development tools:

    cd bones-<date>
    nasm -f win64 bones-x86_64-windows.s -o bones.obj
    link bones.obj libcmt.lib /out:bones.exe

  If you are using MINGW, linking is done with the following command:

    gcc bones.obj -o bones.exe

- For Mac OS:

    tar xfz bones.tar.gz
    cd bones-<date>
    nasm -f macho64 bones-x86_64-mac.s -o bones.o
    gcc bones.o -o bones

  You may get a linker warning complaining that the object file produced by NASM does not have "PIE" enabled. I'm not completely sure why this happens and how this warning can be prevented. The linked executables run fine as far as I can tell (it may be caused by a bug in NASM or the Apple linker, but I'm not sure about it.) You can disable this warning by adding the option `-Wl,-no_pie' when linking an executable.

Now you have a compiler, which you can test by entering

    ./bones

which should give you some usage information.
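
The per-platform NASM output formats used in the commands above could be selected automatically in a build script. The following is only a sketch: the function name `nasm_format' is hypothetical, and the `MINGW*'/`MSYS*' patterns assume a MINGW or MSYS shell whose `uname -s' output starts with one of those prefixes.

```shell
# Map an operating system name, as printed by `uname -s`, to the NASM
# output format named in the build instructions above.
# (nasm_format is a hypothetical helper, not part of BONES.)
nasm_format() {
    case "$1" in
        Linux|FreeBSD|NetBSD|OpenBSD) echo elf64 ;;
        Darwin)                       echo macho64 ;;
        MINGW*|MSYS*)                 echo win64 ;;   # assumption: MINGW/MSYS shell
        *) echo "unsupported system: $1" >&2; return 1 ;;
    esac
}

# Print the format for the machine this script runs on.
nasm_format "$(uname -s)"
```

The result can then be passed to NASM, for example as `nasm -f "$(nasm_format "$(uname -s)")" ...'.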

The `bones' executable can be moved anywhere you desire, but assumes that some supporting files are in the directory where it is invoked. You can use the `-L' command-line option (described below) to set the directory where the supporting files are located. The default search path contains the following directories:

- The current directory (".")
- The value of `BONES_LIBRARY_PATH', if set. Multiple directories can be given, separated by ";" (semicolon) on Windows or ":" (colon) on other systems.
- "/usr/share/bones"
- "/usr/local/share/bones"

Additionally, when assembling generated code produced by BONES, you will have to pass `-I<INSTALLDIR>/' to the `NASM' invocation, as the code contains `%include' directives that refer to the assembler parts of the runtime system. Note the trailing slash - it is required by `NASM'.


5 Reporting Bugs and Upgrading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you find any bugs in BONES (you probably will), please report them to the author (see below under "Contact") so that they can be fixed in future releases. You should also first check the [git(1)] development repository at [http://bitbucket.org/bunny351/bones/] - perhaps the bug has already been fixed. The `master' branch should always be in a usable state. Just clone the repository and use the contained sources to build a new `bones':

    git clone https://bitbucket.org/bunny351/bones.git
    cd bones
    bones bones.scm -o mynewbones.s

Consult the `NEWS' file for information about the latest changes.

[git(1)]: http://www.git-scm.org


6 Usage
~~~~~~~~

6.1 Compiling Scheme code
==========================

Compiling code is straightforward. The `bones' program will compile a single Scheme source file given on the command-line and write the generated assembly code to stdout. You will have to assemble and link the generated code by hand; subprocesses to do this are not invoked automatically. The compiler accepts a few command-line options which are described below.
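
Since nothing is invoked automatically, the three manual steps (compile, assemble, link) can be collected in a small script. The sketch below only prints the commands for a Linux build; it uses the flags already shown in this manual. The function name and the `BONES_DIR' variable are hypothetical, and the sketch assumes that the same directory serves both as the `-L' library path and as the NASM include directory.

```shell
# Print the commands that turn one Scheme source file into an executable
# on Linux. Nothing is executed; pipe the output into `sh` to run it.
# BONES_DIR is a hypothetical variable naming the directory that holds
# the BONES supporting files (assumed to be the same for -L and -I).
bones_build_commands() {
    src=$1
    base=${src%.scm}
    dir=${BONES_DIR:-/usr/local/share/bones}
    # bones writes the assembly code to stdout unless -o is given.
    echo "bones -L $dir $src -o $base.s"
    # The trailing slash after the include directory is required by NASM.
    echo "nasm -f elf64 -I$dir/ $base.s -o $base.o"
    # -lrt is needed when linking on Linux.
    echo "gcc $base.o -o $base -lrt"
}

bones_build_commands hello.scm
```

Printing instead of executing keeps the sketch easy to inspect; on other platforms the NASM format and the link command have to be adjusted as described in the installation instructions.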

Optionally, the file given to `bones' may be a /program specification/, an extended variant of the configuration language described in [SRFI-7]. The compiler treats code that consists of a single toplevel expression of the form `(program ...)' as a program specification. If the source file does not have this form, it is treated as if the code were embedded into the following specification:

    (program
      (include "base.scm")
      (code <...your code...>))

See below for a description of the configuration language and the default configuration.

[SRFI-7]: http://srfi.schemers.org/srfi-7


6.2 Assembling and linking Executables
=======================================

Once the compiler has produced an assembler file, it has to be assembled and linked. Using NASM, this is done like this:

    nasm -f <FORMAT> <FILENAME> -o <OBJECTFILENAME>

`<FORMAT>' specifies the output format suitable for the platform on which the object file should be linked. For Linux and *BSD this is `elf64', for Windows `win64' and for Mac OS `macho64'.

You can optionally add debug information to have source-level access when debugging the generated executable with (say) `gdb'. This is done by adding `-g -F <DEBUGFORMAT>' to the `nasm' command line, where `<DEBUGFORMAT>' should be the format suitable for your particular platform; on Linux, `dwarf' seems to work for me.

Linking is done by invoking the appropriate linker for the target platform. On Linux, *BSD or Mac OS this is done by the "gcc" compiler driver, which also adds the necessary C runtime libraries:

    gcc <OBJECTFILENAME> -o <PROGRAMFILENAME>

Note that on Linux you'll need to add `-lrt' to the linker invocation.

On Windows the `link' program from Microsoft Visual Studio can be used:

    link <OBJECTFILENAME> libcmt.lib /out:<PROGRAMFILENAME>

Alternatively, use the MINGW compiler driver:

    gcc <OBJECTFILENAME> -o <PROGRAMFILENAME>


6.3 Command-line Options
=========================

The compiler accepts the following options in any order.

`-case-insensitive': Compile source code in case-insensitive mode. The default is to be case-sensitive.

`-comment': Emit source-code forms as comments in generated assembly code.

`-dump-features': Dump the set of enabled features after any program specification has been completely parsed and stop.

`-dump-unused': Dump unused global variables, including library definitions, and stop.

`-expand': Dump source code to stdout after syntax-expansion and stop.

`-feature FEATURE': Define `FEATURE', for use in program specifications or `cond-expand' clauses.

`-L LIBRARYPATH': Tells the compiler where to look for files to include directly or indirectly from program specifications. This option may be given multiple times; every occurrence of a library path is prepended to the search path.

`-nostdlib': Do not wrap the source code into a default program specification as described above. This is mainly useful when you have pre-expanded code that was earlier produced by using `-expand'.

`-o FILENAME': Write generated code to `FILENAME' instead of stdout.

`-v': Print the compiler version and exit.

`-verbose': Print currently executing compilation phases and some optimization statistics to stderr.


6.4 Configuration Macros for Generated Assembly Code
=====================================================

`ENABLE_WRITE_BARRIER': When mutating data structures that are not stored in the heap, the assigned value may be lost for tracing during garbage collection, leading to errors that are very hard to detect. When this macro is enabled, the program aborts in such cases with an error message.

`ENABLE_GC_LOGGING': Write some informational output to stderr every time a garbage collection is triggered.

`PREFIX': Defines a custom entry-point prefix for embedding. By passing `-DPREFIX=my' to NASM, the Scheme toplevel can be run by calling `my_bones'.

`TOTAL_HEAP_SIZE': Total size of the heap. Half of this space can be in use at any time. Defaults to 100MB.
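
These macros are set on the NASM command line with `-D', as in the `-DPREFIX=my' example above. The following sketch builds such a command line from a list of macros. The helper name `nasm_with_macros' is hypothetical, and passing `ENABLE_GC_LOGGING' without a value assumes that the runtime only tests whether the macro is defined. The `-I<INSTALLDIR>/' option described under "Installation" is left out for brevity and still has to be added.

```shell
# Build a NASM command line that enables the given configuration macros.
# Usage: nasm_with_macros FORMAT FILE MACRO...
# (nasm_with_macros is a hypothetical helper; it only prints the command.)
nasm_with_macros() {
    format=$1
    file=$2
    shift 2
    cmd="nasm -f $format"
    for macro in "$@"; do
        # Each macro becomes one -D option, e.g. -DPREFIX=my
        cmd="$cmd -D$macro"
    done
    echo "$cmd $file -o ${file%.s}.o"
}

nasm_with_macros elf64 prog.s ENABLE_GC_LOGGING PREFIX=my
```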


6.5 Configurations
===================

/Configurations/ are used for the composition of programs, depending on source files and /features/ that are used to perform more fine-grained control over the characteristics of the compiled source and generated assembly code.

The compiler will compile normal Scheme code with a default configuration, mentioned above, that includes further configuration options from the file `base.scm', selecting the default target and the primitives available, that is, most R5RS standard procedures and some non-standard extensions.

To exercise more control, one can use the `program' form to specify source files and enable or disable code to be included in the program that will be compiled.

If you use your own `program' form, make sure to add a clause including the base components in `base.scm', as it provides some intrinsic forms that are necessary for correct execution of the standard procedures. Study `base.scm' for more information about this.

Default features defined by `base.scm':

 Feature               Meaning
---------------------+------------------------------------------------------------------
 x86_64                Architecture identifier
 ieee754               Floating point format used
 linux                 Operating system
 bsd                   Operating system
 windows               Operating system
 mac                   Operating system
 lp64                  64-bit data model (Linux, *BSD + Mac)
 llp64                 64-bit data model (Windows)
 file-ports            Primitives operating on file-ports are available
 file-system           Primitives operating on file-system entities are available
 process-environment   Access to the process-environment and subprocesses is available
 time                  Access to the system time is available
 jiffy-clock           Access to the internal real-time clock is available
 srfi-0                Specify availability of SRFI functionality
 srfi-6
 srfi-7
 srfi-16
 srfi-46

The compiler option `-dump-features' can be used to show the features that are by default available for a given configuration.

The following features are available for enabling language extensions or target-specific compilation modes:

 Feature    Meaning
----------+---------------------------------------------------------------------------------------------------
 check      Insert low-level type- and limit-checks into the generated code
 embedded   Generate code that can be embedded into an application
 nolibc     Generate code that does not require linking with the C runtime library
 pic        Generate position-independent code, always enabled on Windows and Mac, optional on *BSD and Linux


7 Language Description
~~~~~~~~~~~~~~~~~~~~~~~

7.1 Program Specifications
===========================

7.1.1 Syntax
-------------

Program specifications mostly follow the format defined in the [SRFI-7 specification], with the following extensions:

`(cond-expand ...)': Can be used interchangeably with `feature-cond'.

`(error STRING)': Report error and abort compilation.

`(include FILENAME ...)': Includes more program-specification clauses from `FILENAME'.

`(provide FEATURE ...)': Adds additional features to the set of currently active features visible by `cond-expand' and `feature-cond' clauses. `FEATURE' should be a symbol.

[SRFI-7 specification]: http://srfi.schemers.org/srfi-7/srfi-7.html


7.1.2 Builtin Features
-----------------------

Features available by default are: `bones', `srfi-0', `srfi-7', `srfi-16', `srfi-46'.


7.2 Deviations from R5RS
=========================

7.2.1 Section 1.3.2
--------------------

Argument types to primitive procedures are not checked. The only detected errors are errors related to port- and file-system operations, heap exhaustion, and explicit calls to the `error' procedure.


7.2.2 Section 2
----------------

BONES is by default case-sensitive, both at compile-time and at run-time.


7.2.3 Section 3.4
------------------

Literal constants are immutable and destructively modifying such a constant will have an undefined effect.


7.2.4 Section 4.2.1
--------------------

Vectors are self-evaluating, as in R7RS.


7.2.5 Section 6
----------------

Built-in procedures may be redefined (using `define' or `define-syntax'), but it is undefined whether other standard procedures will still work as expected.


7.2.6 Section 6.1
------------------

`eqv?' performs /structural/ comparison, which means it compares the contents of its two arguments, in case they are of equal type. That means it will return `#t' if both arguments have the same type and identical contents, even if the arguments are not numbers or characters.

`eq?' will not necessarily return the expected result when comparing primitive procedures. Many of these will be compiled in-line using /identifier macros/, a feature of the syntax-expander used, which means that for example the two occurrences of `zero?' below will refer to two different occurrences of the code for that primitive:

    (eq? zero? zero?)  ==> #f


7.2.7 Section 6.2.1
--------------------

Only exact signed integers with 63 bits of magnitude (/fixnums/) and IEEE-754 extended precision floating point numbers (/flonums/) are supported. Numerical overflow is not detected, so arithmetic operations on fixnums will silently wrap on overflow.


7.2.8 Section 6.2.3
--------------------

`/' always returns an inexact number. This is not a violation of R5RS, but should be kept in mind.


7.2.9 Section 6.2.4
--------------------

The `#' character is not allowed inside inexact numeric constants.


7.2.10 Section 6.2.5
---------------------

`complex?' is not available.

`max' and `min' return the maximal or minimal argument unchanged, without coercing between types in the case of arguments of mixed types.

`numerator', `denominator' and `rationalize' are not available.

`make-rectangular', `make-polar', `real-part', `imag-part', `magnitude' and `angle' are not available.

The behaviour of the transcendental built-in procedures when given one of the special IEEE-754 numbers like /nan/ and /infinity/ is undefined.


7.2.11 Section 6.2.6
---------------------

`string->number' does not allow radix or exactness prefixes inside the given string argument, nor is the `#' character in inexact numeric strings supported.

`string->number' and `number->string' only support the bases 8, 10 and 16. Any other base is ignored and the conversion will be done using base 10.


7.2.12 Section 6.4
-------------------

`map' and `for-each' will terminate once any argument list ends.

`force', when given an argument that is not a promise, will return that argument unchanged.


7.2.13 Section 6.5
-------------------

`eval', `scheme-report-environment', `null-environment' and `interaction-environment' are not available.


7.2.14 Section 6.6.1
---------------------

`close-input-port' and `close-output-port' will signal an error when the port has already been closed.


7.2.15 Section 6.6.2
---------------------

`char-ready?' is not available.


7.2.16 Section 6.6.4
---------------------

`load', `transcript-on' and `transcript-off' are not available.


7.3 Extensions to R5RS
=======================

7.3.1 Section 2.1
------------------

The special notation `|...|' can be used to enter symbols containing characters that are otherwise illegal in identifiers, for example `|x y|' would designate the symbol containing the characters `#\x', `#\space' and `#\y'. `write' will print such symbols using this notation. The `|' (vertical bar) character itself can be escaped using `\' (backslash).


7.3.2 Section 2.2
------------------

Expression-comments of the form `#;' are supported as in R7RS.

The character-sequence `#!' is treated like `;' and ignores the rest of the line.


7.3.3 Section 4.2.2
--------------------

BONES supports the R7RS binding form `letrec*'.


7.3.4 Section 4.3.1
--------------------

BONES uses Al Petrofsky's "alexpander", which provides several enhancements.

Among these, /transformer/ expressions in `let-syntax' and `letrec-syntax' forms are extended to allow arbitrary expressions.

The core generalization in this system is that both the transformer-specification and the expression in operator position of another expression may be any type of expression or syntax. The four forms of syntax allowed are: a transformer (as allowed in the transformer-spec position in R5RS), a keyword (as allowed in the operator position in R5RS), a macro use that expands into a syntax, and a macro block (`let-syntax' or `letrec-syntax') whose body is a syntax.

Some examples:

    ;; a macro with a local macro
    (let-syntax ((foo (let-syntax ((bar (syntax-rules () ((bar x) (- x)))))
                        (syntax-rules () ((foo) (bar 2))))))
      (foo))
    => -2

    ;; an anonymous let transformer, used directly in a macro call.
    ((syntax-rules ()
       ((_ ((var init) ...) . body)
        ((lambda (var ...) . body)
         init ...)))
     ((x 1) (y 2))
     (+ x y))
    => 3

    ;; a keyword used to initialize a keyword
    (let-syntax ((q quote)) (q x)) => x

    ;; Binding a keyword to an expression (which could also be thought
    ;; of as creating a macro that is called without arguments).
    (let ((n 0))
      (let-syntax ((x (set! n (+ n 1))))
        (begin x x x n)))
    => 3

    (let-syntax ((x append)) ((x x))) => ()

Top-level macro blocks.

At top level, if a macro block (a `let-syntax' or `letrec-syntax' form) has only one body element, that element need not be an expression (as would be required in R5RS). Instead, it may be anything allowed at top level: an expression, a definition, a begin sequence of top-level forms, or another macro block containing a top-level form.

    (let-syntax ((- quote))
      (define x (- 1)))
    (list x (- 1)) => (1 -1)

Note that, unlike the similar extension in Chez scheme 6.0, this is still R5RS-compatible, because we only treat definitions within the last body element as top-level definitions (and R5RS does not allow internal definitions within a body's last element, even if it is a `begin' form):

    (begin
      (define x 1)
      (let-syntax ()
        (define x 2)
        'blah)
      x)
    => 1, in R5RS and alexpander, but 2 in Chez scheme

    (begin
      (define x 1)
      (let-syntax ()
        (begin (define x 2)
               'blah))
      x)
    => 2, in alexpander and in Chez scheme, but an error in R5RS.


7.3.5 Section 4.3.2
--------------------

"Alexpander" fully supports [SRFI-46], including tail-patterns and user-selectable ellipses.

[SRFI-46]: http://srfi.schemers.org/srfi-46


7.3.6 Section 5.2.2
--------------------

Internal definitions expand into `letrec*' forms, as in R7RS.

A definition of the form `(define <expression>)' causes the expression to be evaluated at the conclusion of any enclosing set of internal definitions. That is, at top level, `(define <expression>)' is equivalent to just plain `<expression>'. As for internal definitions, the following are equivalent:

    (let ()
      (define v1 <init1>)
      (define <expr1>)
      (define <expr2>)
      (define v2 <init2>)
      (define <expr3>)
      (begin
        <expr4>
        <expr5>))

    (let ()
      (define v1 <init1>)
      (define v2 <init2>)
      (begin
        <expr1>
        <expr2>
        <expr3>
        <expr4>
        <expr5>))


7.3.7 Section 5.3
------------------

The /transformer spec/ of a `define-syntax' form may be any expression.


7.3.8 Section 6.3.5
--------------------

The third argument to `substring' is optional and defaults to the length of the argument string.


7.3.9 Section 6.6.1
--------------------

`current-input-port' and `current-output-port' accept a single optional argument, a /port/, which changes the current default input or output port.


7.3.10 Syntax Extensions
-------------------------

The following non-standard syntax is available:

`(assert EXP [MSG ARGUMENT ...])': Signals an error if `EXP' evaluates to `#f', passing `MSG' and `ARGUMENT ...' to `error' in that case.

`(begin0 EXP1 EXP ...)': Evaluates the expressions and returns the result value(s) of the first expression.

`(cond-expand CLAUSE ...)': The [SRFI-0] expansion-time conditional.

`(cut EXP ...)': Notation for specializing parameters without currying - see [SRFI-26] for more information.

`(define-inline (NAME ARGUMENTS ...) BODY ...)': Defines a procedure that should be in-lined at the call-site. This is implemented by defining an /identifier-macro/ for `NAME', which the syntax-expander will replace with a `lambda' form containing the procedure body, when used in the operator-position of a procedure-call. Any use of an inline procedure must appear textually after its definition.

`(define-syntax-rule (NAME ARGUMENTS ...) EXPRESSION)': A more concise way of defining single-rule syntax.

`(define-values (VAR ...) EXP)': Evaluates `EXP' and binds `VAR' ... to the values returned by it.

`(fluid-let ((VAR EXP) ...) BODY ...)': Syntax for dynamic scoping, see [SRFI-15] for more information.

`(handle-exceptions VAR HANDLE BODY ...)': Binds `current-exception-handler' temporarily during execution of `BODY ...' and evaluates the expression `HANDLE' if the body raises an exception. While evaluating `HANDLE', `VAR' is bound to the exception object and the next outer exception handler is active.

`(let-optionals VAR1 ((VAR EXP) ... [RVAR]) BODY ...)': Destructures the list in `VAR1', binding the optional arguments `VAR' ... with defaults taken from `EXP' ... Any remaining list elements can be bound in the optional final variable `RVAR'.

`(letrec* ((VAR VAL) ...) BODY ...)': Binds `VAR' ... to the results of evaluating `VAL' ... and evaluates `BODY', with the variables being in scope during the evaluation of their values. Mostly equivalent to `letrec' but binds the variables sequentially.

`(when EXP BODY ...)': Equivalent to `(if EXP (begin BODY ...))'.

`(unless EXP BODY ...)': Equivalent to `(if (not EXP) (begin BODY ...))'.

[SRFI-0]: http://srfi.schemers.org/srfi-0/

[SRFI-26]: http://srfi.schemers.org/srfi-26/

[SRFI-15]: http://srfi.schemers.org/srfi-15/


7.3.11 Additional procedures
-----------------------------