💾 Archived View for schinkel.bevuta.com › articles › PCN-tutorial.txt captured on 2024-12-17 at 10:42:38.
⬅️ Previous capture (2024-05-12)
-=-=-=-=-=-=-
P r o g r a m C o m p o s i t i o n N o t a t i o n (PCN) A T u t o r i a l by Felix L. Winkelmann I. Introduction This document is an attempt to describe "PCN"[1], a concurrent logic programming language developed at the Argonnne National Laboratory in 1990. PCN is a successor of a number of languages mixing concurrent and logic programming that provides a familiar syntax based on C but retains the essential features which make this particular style of programming interesting. The tutorial is based on "FLENG"[2], my implementation of a low-level language with the same name[10], which provides front-ends for PCN, "Flat Guarded Horn Clauses"[3] (FGHC) and "Strand"[4] and is available for UNIX-compatible operating systems. The FLENG system provides several extensions and a few minor differences to PCN as described in the available technical reports [5,6] for the original version. Concurrent logic programming is a fascinating paradigm, but, unfortunately lost in history, partly as a result of the fallout from the "AI winter". As modern computing systems struggle to find ways to use parallelism and concurrency to the extent now necessary to ensure high performance and to fully utilize available computing resources it may be worthwhile to look at concepts that have existed now for quite some time, yet have been forgotten, and which appear to promise a refreshing and easy to use method of writing correct and robust software systems in the presence of multiprocessing. II. History PCN is a member of a family of concurrent logic programming languages, an attempt to allow for more concurrency in languages based on PROLOG. One of the first was "Concurrent Prolog", followed by a plethora of descendants like "Guarded Horn Clauses" (GHC), "PARLOG", "Janus" and "Strand". The Japanese "Fifth Generation Computing Project"[7] was an ambitious undertaking to create advanced knowledge representation and research applications during the AI boom, where several experimental systems used "KL1"[8], another concurrent logic language based on (Flat) GHC. In the process of the development of these languages a trend in removing features of classic PROLOG can be noticed, with more emphasis on efficiently implementing concurrency[9]. For example, backtracking was removed and certain constraints were imposed to make the execution more streamlined and reduce implementation complexities. Most of the languages in this family are based on a number of common concepts: - "Committed choice" clause selection: once a solution is found, no backtracking is allowed. - Logic single-assignment variables. - "Flatness": clause-selection can only be constrained to a fixed set of built-in predicates.. - Fine-grained "AND" parallelism. With the exception of PCN, most languages retain the PROLOG syntax, while significantly changing the semantics. PCN uses a more traditional syntax based on C. Additionally, PCN can be considered a "composition language" for orchestrating existing sequential code in a concurrent environment, so the familiar syntax and ease of interfacing reduces the friction naturally imposed by mixing languages with different paradigms. PCN is still an imperative language but restricts side effects by design and convention, a necessary step when working with high levels of concurrency and parallelism. In what follows, no prior knowledge of PROLOG or logic programming is needed, the concepts originating in logic languages will be described in the context of PCN as they come up. Where the syntax or semantics of a particular feature differ from the original version, I describe it as implemented in [2]. Where examples are given, they are self-contained and can be directly executed by compiling and running the given piece of code. III. Basic Program Structure PCN largely follows C in terms of lexical syntax and comment format ("//" for single line comments, "/* ... */" for non-nested block comments). Programs are composed of procedures and functions, containing statements. Statements are separated by comma (",") or semicolon (";"). Execution starts in procedure "main": main() { writeln("Hello, world!") // write textual representation of } // some data object terminated by // newline If you have the FLENG system built and installed, you can compile and execute the example code like this: % cat hello.pcn main() { writeln("Hello, world!") } % fleng hello.pcn % ./a.out The keywords "char", "int", "double", "byte", "default", "idle", "over", "return", "foreign_call", "if", "else", "with", "let" and "function" are reserved and may not be used as variable or procedure names. IV. Values PCN handles values of data types usually found in dynamic programming languages: - Signed integers of machine-word size (31 or 63 bits, one bit used for marking). - Strings ("interned" text sequences also called atoms or symbols in other languages), enclosed in double quotes: "this is a string" - Characters, in the full UNICODE range, allowing the same escape sequences as strings. - Extended precision floating point numbers (64 bit IEEE doubles). - Lists of values, optionally with an explicit tail value: main() { writeln(["one", 2, 3.0]), writeln(["one" | [2 | [3.0 | []]]]) // writes exactly the same } - Tuples, which are fixed-size arrays or arbitrary values: main() { writeln({"one", 2, 3.0}) } - Arrays, fixed size homogenous numeric sequences of arbitrary length, with data types char (byte), short (16-bit integers), int (32-bit integers), long (32- or 64-bit integers, depending on platform) and floats (64-bit IEEE doubles). - Sub-arrays or "Views" are arrays sharing storage with other arrays or parts of existing arrays. Strings may not be longer than 255 bytes, supported escape sequences are: \t (tab), \n (newline), \", \xXX and \uXXXX for the UNICODE code points given in 2 or 4 hex digits. As string size is limited, larger text is usually handled in the form of "character lists", i.e. linked lists of UNICODE code points. This also has the advantage that normal list-processing operations transparently handle text encoding in this way. Also, for reasons explained later, working with text in this manner is less inefficient than it appears and provides ample opportunities to process text in parallel. Memory management in PCN automatic, unused data is garbage collected once it is no longer used. PCN is dynamically typed, only mutable variables and arrays have to be declared. Circular data structures are not supported. This simplifies the exchange of structured data between threads with separate heaps considerably. V. Variables Variables come in two flavors: "definitional" and "mutable". Mutable variables are scalars or arrays of one of the numeric types char, short, int, long and double. Mutable variables must be explicitly declared and can be changed (hence "mutable") during their lifetime. main() int x; // declares a mutable variable { x := 123, // assigns a value x := 456, // assigns another value writeln(x) // writes "456" } Mutable variables are initially bound to an unspecified value. Definitional variables on the other hand do not require a declaration and can hold values of every type but can only be assigned once. Unassigned definitional variables are first-class objects, they may be assigned to other definitional variables and stored in data structures. Unassigned (unbound) definitional variables can be considered "holes", to be filled with a value later. Referencing a definitional variable will automatically "dereference" any chain of links between unbound variables and access the final value. main() { y = x, // binds "y" to "x", even though "x" has no value, yet. x = 123, // binds definitional variable "x" // x = 456, <-- would be illegal (already bound) writeln(y) // writes "123" } When the concrete value of a definitional variable is required, either for some computation or comparison, then execution of the current process "suspends" until the variable is bound. main() { writeln(x), // results in "deadlock": x isn't bound yet x = 123 } The ability to work with unbound variables is a tremendous source of power and fundamental to the concepts used in PCN (and other concurrent logic languages) for communication with and the synchronization of concurrent processes. ":=" is the assignment operator and must have a mutable variable or array location at the left hand side. Definitions use the "=" operator and are only allowed for definitional variables. Certain restrictions apply about the use of mutable variables which will be described later. The special variable "_" is called the "anonymous" variable - every occurrence represents a fresh definitional unbound variable: main() { lst = [_, _, _], // a list of 3 unbound variables writeln(length(lst)) // writes "3" and does not suspend // as the elements are not needed } VI. Processes and Compositions The basic execution model of PCN programs can be described as a pool of currently active processes, where processes are taken from in some unspecified order and executed. Executing processes will start other processes which are themselves added to the pool. Processes that need access to definitional variables that are still unbound will be suspended. Once the pool is empty, the program has terminated, or, if processes are still suspended on unbound definitional variables, a "deadlock" is occurs as execution can not continue without some active process that may bind the variables processes are yet suspended on. In PCN code for processes is defined in procedures and functions and process-invocations are grouped inside "compositions". A procedure is defined by giving its name followed by a possibly empty list of parameter declarations and a composition, enclosed in "{...}", where the components of the composition are separated by "," or ";" (a final "," or ";" terminator is allowed but not required). main() { proc() } proc() { writeln("proc called") } A procedure is identified by its name and "arity" (the number of arguments it has declared) and multiple procedures of the same name may be defined if they have different arities. Compositions are statements and thus may be arbitrarily nested. Variable scope is always the whole procedure: variables are shared across all nested compositions inside a procedure definition. Three types of compositions exist: - Sequential composition: each statement in the composition is executed one after the other, in textual order: main() {; // <- sequential composition writeln(1), writeln(2) } The ";" may be omitted in a sequential composition. In the code examples here we will do so unless we want to emphasize the sequential nature of some code. - Parallel composition: each statement is executed in parallel: main() {|| write(x), // suspends until "x" is bound write(2), // may print "12" or "21" x = 1 } The order is completely undefined, but all statements will execute at some point, unless they are suspended. - "Choice" composition: execute statements depending on conditional guards: main() { x = 2 + 2, {? x == 4 -> writeln("ok"), default -> writeln("something's wrong") } } - "Fair" choice compositions ("{~ ...}"): similar to normal choice compositions but "fair" in the sense that if multiple guards succeed, they have an equal chance of being selected. A choice composition contains a list of rules, separated by "," or ";". A rule consists of one or more conditions (called "guards"), separated by ",", an arrow ("->") and a statement to be executed when the conditions all succeed. Should multiple rules succeed, one of them will be arbitrarily picked and executed. The "default" condition always succeeds if no other rule does. If no "default" rule is given and no rule succeeds, execution continues normally and the composition is equivalent to an empty statement. If the body of a procedure consists of a choice composition that holds only a single rule then the braces can be omitted: main() { check(2, 2, 4) } check(a, b, c) a + b == c -> writeln("everything still ok") In choice compositions, one arbitrary rule will execute, if more than one guard succeeds: main() { i over 1 .. 10 :: foo() } foo() {? 1 == 1 -> writeln("a"); 1 == 1 -> writeln("b") } will print 10 times the same string ("a" or "b"). But when we use a "fair" composition, then both branches will be eventually executed at some time: /* will print "a" or "b" */ foo() {~ 1 == 1 -> writeln("a"); 1 == 1 -> writeln("b") } Functions are special procedures returning a value: main() { writeln(square(3)) } function square(x) { return(x * x) } Note that "return" does not change the control flow: it merely stores the result to be returned on function exit: main() { writeln(square(3)) } function square(x) { return(x * x), writeln("done") // still shown } This may seem unexpected but makes sense when the body of the function contains parallel compositions, which always treats the statements in the body as unordered. VII. Calls, Terms and Expressions The right-hand side of definitions and assignments and arguments to procedure calls can be expressions and are evaluated in the familiar manner, as long as referenced definitional variables are bound. An argument to a procedure or the right-hand side of a definition may also be a "term", which designates a data structure like a list or tuple. Nested elements of terms are _not_ evaluated themselves, but may refer to other variables: main() { x = 2 + 2, y = [x, 3 * 3], writeln(y) // writes the list "[4, *(3,3)]" // (this shows an alternative notation // for tuples, equivalent to {"'*", 3, 3}) } Procedure and function arguments are evaluated when they are expressions. As unbound variables are first-class values, passing such a variable effectively is "by-reference": main() { proc(2 + 2, x), writeln(x) // writes "5" } proc(a, b) { b = a + 1 } This also applies to mutable variables: main() int x; { proc(2 + 2, x), writeln(x) // writes "5" as well } proc(a, b) int b; // shares location of "x" in "main" above { b := a + 1 } Mutable variables that are not directly passed to procedures as arguments are always dereferenced, including arrays. An array passed to a procedure but not declared as an array in the callee will be "cloned" and an immutable copy being assigned to the parameter representing the argument: main() int a[10]; { a[ 0 ] := 123, clone(a), modify(a), writeln(a[ 0 ]) // writes "456" } clone(a) { // ... here "a" is immutable and does not share space with // the "a" in main } modify(a) int a[]; // declares parameter to be an array { a[ 0 ] := 456 // here "a" shares memory with "a" in // main and can be modified } The reason for these restrictions is that mutable variables (including arrays) must obey three rules to minimize time-dependent bugs in parallel compositions: Rule 1: A mutable variable can be shared by statements in a parallel composition only if no statement modifies the variable (note that these rules are only partially enforced by the compiler and run-time system). Rule 2: When a mutable variable occurs on the right-hand side of a definition statement, a "snapshort" (copy) of the current value of the mutable is taken, and the definition then proceeds as if a definitional value were involved. Rule 3: When a definitional variable occurs on the right-hand side of an assignment to a mutable variable, the assignment suspends until the variable has a value and then proceeds. PCN supports tail-call optimization: if the last expression in a sequential composition (that is not contained in another sequential composition) is not a quantification, it is automatically compiled as being in a parallel context and does not wait for the operation to complete (and thus does not maintain additional state or data). VIII. Guards and Control Structures Choice compositions are the only primitive method of performing conditional execution and the set of guards or conditions is fixed to one of the following: - Comparisons, binary relations as in C ("==", "!=", ">", "<", ">=", "=<"), where "==" and "!=" handle arbitrary terms and the rest allow numerical expressions. - Type tests ("number(x)", "string(x)", etc.) - Matching ("<variable> ?= <pattern>"), the variable must be definitional and the pattern is a term. Comparisons and tests evaluate their argument expressions and thus suspend if any definitional variable inside the guard is unbound: main() {|| even_odd(x), // suspends inside the choice below x = 2 // resumes suspended choice and writes "even" } even_odd(n) {? n & 1 == 0 -> writeln("even"), default -> writeln("odd") } Matching allows the destructuring of data objects like lists and tuples: main() { x = ["one", 2], {? x ?= [a, b] -> y = [b, a] }, writeln(y) // writes "[2, one]" } Matching suspends only when the pattern requires dereferencing an unbound definitional variable, but when it does, the whole composition is suspended. User-defined or primitive functions may not be used directly in guards, only if part of a comparison or test. Guard expressions must be "atomic" in that only a fixed set of expressions is allowed, for semantic reasons (using the jargon from concurrent logic languages, PCN is "flat"). For convenience, calls to primitive or user-defined functions are syntactically valid but are extracted from the actual choice composition and evaluated before the set of choices is entered. As any variables introduced by match guards are not yet bound during the execution of an embedded function-call, its arguments may not refer to those variables: : if(foo ?= {a, b}, some_function(a)) ... // <-- ILLEGAL : Any expression or term that is used as a guard that is not a type-test, comparison or match is treated as a comparison with "false", so if(some_test(...)) ... is equivalent to if(some_test(...) != "false") ... Similarly if(!some_test(...)) ... equals if(some_test(...) == "false") ... Some convenience forms are available as syntactic sugar that allow one to write choice compositions in a more traditional style. First, "let", for a choice-composition that consists of a single rule and which aborts with an error if the guard fails: main() { x = ["one", 2], let x ?= [a, b], y = [b, a], // equivalent to // {? <guard> -> <statements>, default -> <error> } writeln(y) } Second, the familiar "if": main() { x = 2 + 2; if(x == 4) writeln("ok") // equivalent to else writeln("no!") // {? <guard> -> <if-statement>, default -> <else-statement> } } Iteration is performed with the "quantification" statement: main() { i over 1..10 :: writeln(i) // write numbers 1 to 10 } Whether the body statement is executed sequentially or in parallel depends on the type of the enclosing composition. The quantification variable ("i" in the example above) may be definitional (if undeclared) or mutable. In the former case, it is rebound during execution of the body and remains unbound in later statements following the quantification and can be re-used. If it is mutable, it is assigned the next value on every iteration and retains its value after the iteration completes. Iteration over lists is also supported: main() { x over [1, 2, 3] :: writeln(x) // write numbers 1 to 3 } In list-iteration, the quantification variable must always be definitional. Since PCN is tail-recursive, more complex iteration patterns are best expressed as recursive functions as the iteration runs in constant space: /* filter even numbers given as arguments on the command line */ main() { filter_even(command_line()) // obtain arguments } filter_even(args) {? args ?= [x|rest] -> { // destructure in head + tail n = string_to_integer(x); // arguments are strings if(n & 1 == 0) writeln(x); filter_even(rest) } // no match: do nothing, i.e. simply return } IX. Streams and Ports Now that we have concurrently running processes and automatic suspension when a definitional variable is unbound, we can easily synchronize processes: main() {|| work(done); // start some long-running task wait(done) // wait until it finished } work(done) { timer(1000, done, _) // primitive to set timer for 1000ms and assign // the empty list ("[]") to "done" when finished // the third argument (timer-ID) is not used } wait(done) {? data(done) -> writeln("ready") // data(): guard that succeeds when argument // is bound (and waits, if needed) } Combined with lists, we already have everything to establish communication channels between processes, by using a list with an unbound tail and successively bind the tail to a new list with another unbound tail and so on: main() {|| producer(10, stream); consumer(stream) } producer(n, out) {? n > 0 -> { out = [n|out2]; // create new stream element producer(n - 1, out2) }, default -> out = [] // close stream } consumer(in) {? in ?= [x|more] -> { writeln(x); // writes 10, 9, 8, ..., 1 consumer(more) } } Note how these two processes synchronize automatically: should the producer outrun the consumer, the list merely becomes longer. If the consumer executes faster than the producer it will suspend until the stream contains a new value in its head. We can make the consumer control the pace in which the producer generates values by reversing the role of who provides objects to consume - just use another stream to send unbound definitional variables to be filled by the producer: main() {|| producer(10, stream, [_|fill]); // provide initial slot for first item consumer(stream, fill) } producer(n, out, fill) {? fill ?= [place|more], n > 0 -> { place = n; // take element from placeholder stream and define it out = [place|out2]; // create new output-stream element producer(n - 1, out2, more) }, default -> out = [] } consumer(in, fill) {? in ?= [x|more] -> { writeln(x); // writes 10, 9, 8, ..., 1 fill = [_|fill2]; // create new slot and pass it to producer consumer(more, fill2) } } Streams of this sort are very useful not only as a communication device, they also serve as abstraction boundaries: a process can provide a "service", with requests sent to the process via a stream: main() {|| service(s, []); // start server s = [{"set", "waldo", 42}|s2]; // request setting a DB entry //... s2 = [{"get", "waldo", x}|s3]; // request reading an entry writeln(x); // writes "42" s3 = [] // close stream to terminate service } service(in, db) {? in ?= [{"set", key, val}|in2] -> { map:insert(key, val, db, db2); // call "insert" in the "map" library service(in2, db2) }, in ?= [{"get", key, val}|in2] -> { map:lookup(key, db, val); service(in2, db) } } Notice how we pass values ("set") and placeholders ("get") in the different requests. This is easy to extend and uses the built-in "map" lihrary which provides a balanced tree data structure wiht functional updates. We can easily intercept messages by using an intermediate process, here for logging: main() {|| service(s, []); logger(s): s = [{"set", "waldo", 42}|s2]; // request setting a DB entry //... s2 = [{"get", "waldo", n}|s3]; writeln(n); s3 = [] // close stream to terminate service } logger(in) {? in ?= [{op, key, _}|in2] -> { // use "fmt" library for formatted output fmt:format("log: ~a ~a\n", [op, key]); logger(in2) } } service(in, db) {? in ?= [{"set", key, val}|in2] -> { map:insert(key, val, db, db2); service(in2, db2) }, in ?= [{"get", key, val}|in2] -> { map:lookup(key, db, val); service(in2, db) } } Or perhaps we want to restrict access to certain data by using an intermediate stream: main() {|| service(s1, []); // [] represents an empty map filter(["waldo"], s, s1); s = [{"set", "waldo", 42}|s2]; // request setting a DB entry //... s2 = [{"get", "waldo", n}|s3]; writeln(n); s3 = [] // close stream to terminate service } filter(blacklist, in, out) {? in ?= [{"get", key, result}|in2] -> { list:member(key, blacklist, f); // check for list membership if(f) { result = "CONFIDENTIAL"; // request is not passed on filter(blacklist, in2, out) } else { out = [{"get", key, result}|out2]; filter(blacklist, in2, out2) } }, in == [] -> out = []; default -> let in ?= [req|in2] -> { out = [req|out2]; // pass on unchanged filter(blacklist, in2, out2) } } service(in, db) {? in ?= [{"set", key, val}|in2] -> { map:insert(key, val, db, db2); service(in2, db2) }, in ?= [{"get", key, val}|in2] -> { map:lookup(key, db, val); service(in2, db) } } Note how the original service process (stream consumer) never needs to be changed - we can connect software components by streams through which they communicate and add intermediate processes the perform various pre- and post-processing. Since lists and streams are structurally equivalent, using lists containing character codes makes them natural subjects for stream processing: /* convert input to uppercase */ main() {|| io:read_char_stream(0, in); // read characters from stdin upcase(in) } upcase(in) {? in ?= [c|in2] -> { if(c >= 'a', c <= 'z') c2 = c - 32 else c2 = c; fmt:format("~c", [c2]); upcase(in2) } } Using the "merger" primitive we can join multiple streams into one. "merger" takes an input-stream and copies its values into an output stream, but merges streams given by messages of the form {"merge", S} into the output. The values from different streams are arbitrarily interleaved but retain their order: /* count char-frequency of combined input files given on command line */ main() {|| merger(in, out); follow(command_line(), in); stats(out, []) } follow(files, in) {? files ?= [fname|next] -> { open_file(fname, "r", fd); // open file for reading io:read_char_stream(fd, chars); // read characters from file // as a stream in = [{"merge", chars}|in2]; follow(next, in2) }, files == [] -> in = [] // end input stream } stats(chars, map) {? chars ?= [c|chars2] -> { // notice how we already replace the entry by "new", which // is still unbound at this point map:replace(c, new, old, map, map2); if(old == []) new = 1 // indicates key in map doesn't exist else new = old + 1; stats(chars2, map2) }, chars == [] -> { map:map_to_list(map, lst); // converts map into sorted list of dump(lst) // {"-", KEY, VALUE} tuples } } dump(data) {? data ?= [{_, c, count}|more] -> { fmt:format("~c\t~d\n", [c, count]); dump(more) } } Streams are identified by their tail, the next slot to be filled with a new list cell holding a value and the new tail. This makes it awkward to store streams in data structures or global variables as they effectively have a new identity once extended with a fresh value. For this FLENG supports "ports", which are objects representing a stream: main() {|| open_port(p, s); // create port, associated with a stream put_global("port", p); // store port in thread-global variable consume(s); // process stream as usual work() } consume(s) {? s ?= [x|s2] -> { writeln(x); consume(s2) } } // we could of course just pass "p" as an argument, this code is // only intended to demonstrate storing a port reference somewhere work() { global("port", p); // suspends until variable is set send(p, "message"); // add value to stream put_global("port", []) // release, so it can be closed } Streams attached to ports are "closed" (terminated by the empty list), once the port is not longer referenced anywhere and can be garbage collected. IX. Using Threads PCN, like most implementations of concurrent logic languages is intended to run on multiple "nodes" or hardware processors, often on widely different platforms. The FLENG system utilizes operating system threads by allowing to spawn processes on another thread, running fully in parallel to the main thread. So far we have used concurrent processes running in a single OS thread. Each thread in the FLENG system has its own heap and process pool and doesn't share memory (with the exception of arrays, if desired) with any other thread. Values can be exchanged between processes running on different OS-level threads by passing definitional variables and binding them, which is transparently converted into messages following an internal value-exchange protocol. To launch a process on another thread, use the "at" ("@") notation: main() {|| spawn(x)@fwd; // invoke "spawn" on next thread in ring writeln(x) // suspends until "x" is bound by the other thread } spawn(x) { id = self(); // "self" gives thread ID fmt:format_chars("Hello, I'm #~d", [id], x) // formats into a char-list } When you compile this program, you can set the number of threads you require: % fleng hello2.pcn % ./a.out +THREADS 2 The right-hand side of the "@" operator may be an expression evaluating to an integer thread-ID or a string. The special identifiers "fwd", "bwd", "north", "south", "west", "east", "random" "unloaded" and "all" are also allowed and equivalent to their string form. Threads are addressed by a positive integer, the main thread has the ID "1". Threads can be addressed directly, via their ID or relatively, according to a topology in which all available threads are grouped. Currently ring and torus topologies are available, so "fwd" and "bwd" address the next or previous thread in a ring and "north", "east", "south", "west" address the four neighbours in a torus topology. "random" represents some random thread (including the current one), "unloaded" selects the thread with the least amount of active goals and "all" signifies a broadcast, invoking the call on all threads. The actual number of threads does not need to be reflected in the source code - the program above would still work on a single thread, since the next item in the ring would be the very same thread and invoking the current thread using the "@" notation is the same as calling the procedure without any annotation. Synchronization and communication between processes running on different threads is handled transparently, but note that data sent from one thread must be copied and re-created on the receiving side and so involves allocation and should be of limited size. Mutable variables are passed by value when given in a call decorated with "@", there locations can not be shared between threads. X. Modules and Separate Compilation PCN code can be structured into separately compiled modules, which are then linked together like normal C or C++ code. Say we have two files a.pcn: the main program main() { b:wave() } and b.pcn, a library module -module("b") // a "declaration" that names this compilation unit wave() { writeln("o/") } The files can be compiled and linked into a single executable: % fleng -c b.pcn # produces b.o % fleng a.pcn b.o % ./a.out Calls to procedures and functions in other modules need to be "qualified" with the module name and colon: "<name>:<procedure>(...)". Modules are resolved at the time of the first call, so mutually referencing modules are no problem. There can only be a single main module (internally named "", i.e. the empty string). A module can be declared by using the "module" declaration, as in the example above, or by giving the command line option ("-m") and a module name when compiling the file. A module always represents a single source file and there can not be more than one module in a file, nor can a module span more than one source file. Compiled modules can be combined into libraries just as in C or C++. There are no facilities for "importing" bindings from other modules, entities in modules other than the current one always require a qualificiation. There is also no mechanism to restrict what definitions are accessible from outside of the current module. All procedure and function definitions are exported and accessible from other modules that refer to them. A module is loaded when some other module declares it as "used". Usage in explicit calls to procedures of a module is an indicator for this and thus inferred automatically. But if a module is called indirectly (explained below) a "uses" declaration has to be added. One situation where this may occur is when the used module is intended to automatically start with some background task, which is done using the "initialization" declaration: it marks a zero-argument procedure to be called on thread startup. a.pcn: /* note: never stops, as the clock runs continuously */ -uses("b") main() { // ... } b.pcn: -module("b") -initialization("start") start() {|| clock(1000, clk, _); // produces a stream of integers spin(clk, 1) // in 1-second intervals } spin(clk, secs) let clk ?= [_|clk2] -> { fmt:format("seconds elapsed: ~d\n", [secs]); spin(clk2, secs + 1) } XI. Indirect Calls and Higher-Order Procedures Sometimes it is useful to be able to determine the target of a procedure call dynamically. PCN has "meta" calls for this purpose, where the module, procedure name (or both) is dynamically computed: /* convert input to uppercase or lowercase, depending on program-name so compile like this: % fleng a.pcn -o upcase % ln -s upcase downcase */ main(_, argv, _) // alternative signature: (argc, argv, status) {|| prg = argv[ 0 ]; // get first element of argv list (program name) path:basename(prg, name); // drop directory part + extension mode = list_to_string(name); // basename returns a character list io:read_char_stream(0, in); // read characters from stdin convert(in, mode) } convert(in, conv) {? in ?= [c|in2] -> { `conv`(c, c2); // call procedure with name given by "conv" fmt:format("~c", [c2]); convert(in2) } } upcase(c, result) { if(c >= 'a', c <= 'z') result = c - 32 else result = c } downcase(c, result) { if(c >= 'A', c <= 'Z') result = c + 32 else result = c } The part inside the `...` must be a definitional variable containing a string when in the module part. The procedure part of the call may be a string or a tuple holding the name (a string) and any number of default arguments preceding the arguments given in the call. If the meta-call is not qualified, the procedure part may also be a tuple of the form {":", MODULE, PROCEDURE}, where PROCEDURE may again be a string or tuple. main() { m = ""; // this module p = "foo"; p1 = {"foo", 123}; p2 = {"foo", 123, 456}; pm = {":", m, p1}; f = `(x) -> foo(123, x); `p`(123, 456); // all of these write "[123, 456]" `p1`(456); `p2`; `pm`(456); `f`(456) } foo(a, b) { writeln([a, b]) } The example above demonstrates "lambda" expressions, anonymous functions of the form `(<argument list>) -> <statement>. As `...` adds given default arguments in the call value, we effectively have "closures", that is, we can combine a piece of code with state, here also using the ``...`` notation to construct tuple representing calls conveniently: main() { adder = make_adder(123); writeln(`adder`(1)) } function make_adder(x) { return(``plus(x)``) // same as {":", "", {"plus", x}} } plus(x, y, z) { z = x + y } This allows the use of higher order procedures, procedures that call other procedures: /* write list of file given on command line that are directories */ main() { filter(``isdir``, command_line(), lst); writeln(lst) } isdir(x, r) { file_type(x, t); if(t == "directory") r = true else r = false } filter(func, lst, result) {? lst ?= [x|lst2] -> {|| `func`(x, r); if(r) result = [x|result2] else result = result2; filter(func, lst2, result2) }, default -> result = [] } If no module qualification is given, then the procedure to be called is assumed by built-in calling primitives like "call" and "apply" to be in the module that issues the call. Call targets defined using the ``...`` notation implicitly include this information, so it is preferrable to use this instead of constructing tuples by hand. XII. Interfacing with C/C++ Code To do anything useful, it must be possible to talk to external code like operating system APIs and third party libraries. This is relatively straightforward in PCN and allows it to be used as a language to "orchestrate" sequential native code using a higher level notation that is particularly well suited to coordinate concurrency and parallelism. Calls to undefined procedures which have no module-prefix are assumed to invoke a separately linked C function. Arguments are converted as necessary and are call by reference: main() { printf("hello, world.\n") // this is C's "printf" provided by libc } (This example may fail to link on some platforms, due to dynamic linking and calls via dynamic call stubs. An easy workaround is to use an intermediate function.) PCN passes the source code through the C preprocessor, so conditional compilation, inclusion, platform enquiries and macro definitions as in C are supported. During preprocessing a PCN source file, the macro "PCN" is defined. This allows expanding included files conditionally on whether they are currently processed by the PCN compiler or in a C program, which is handy when interfacing with foreign code that shares certain macro definitions. External calls are only allowed for procedures, not for functions inside expression. Arguments to C functions are converted depending on the type of the argument. All arguments are passed by reference, so mutable variables can be modified by the external code. Strings are passed as "char" pointers and are internally terminated by a zero byte, so that they match C's representation of strings. Arrays are passed as a pointers to the first element. Integers and floats that are passed as literals or as results of expressions are by reference but may not be modified as they refer to internal storage. Lists, tuples, ports and other data is passed as an undefined pointer and not usable on the C side. References to unbound definitional variables will suspend until all arguments are bound. Here we embed C code directly into the PCN source file. The code could also be stored in a separate file and compiled into object code, which would then have to be linked to the final program: /* format current time sing strftime(3) */ main() char buf[ 32 ]; // storage for buffer int len; { tm = current_seconds(); fmt_time(tm, "%F %R", buf, length(buf), len); array:get(0, len, buf, str); fmt:format("~s\n", [str]) } "C": #include <time.h> void fmt_time(long *t, char *fmt, char *buf, long *maxlen, int *len) { time_t t0 = *t; struct tm *ts = localtime(&t0); *len = strftime(buf, *maxlen, fmt, ts); } The "C" marks a section of C code and must be at the start of the source code line. "C++" is also a valid foreign code section marker. You can switch back to PCN code by using a "PCN" section marker. The "fleng" compiler driver will extract C/C++ code section and compile and link them automatically. As a further example, let's integrate Paul Heckbert's "Business Card Raytracer"[11] to take advantage of multiple threads: file "card.pcn": main() char buf[ 512 * 512 * 3 ]; { n = threads() - 1; // number of threads if(n == 0) error("invoke with +THREADS 2 or more"); {|| i over 2 .. threads() :: spawn(n, buf)@i }; write("P6 512 512 255 "); array:write(0, length(buf), buf, 1) // write out array to stdout } spawn(parts, buf) char buf[]; { step = 512 / parts; part = self() - 2; // "self" returns own thread-ID ts(buf, part, step) } file "card.cpp": #include <stdlib.h> #include <stdio.h> #include <math.h> typedef int i;typedef float f;struct v{f x,y,z;v operator+(v r){return v(x+r.x ,y+r.y,z+r.z);}v operator*(f r){ return v(x*r,y*r,z*r);}f operator%(v r){return x*r.x+y*r.y+z*r.z;}v(){}v operator^(v r){return v(y*r.z-z*r.y,z*r.x-x*r.z,x*r. y-y*r.x);}v(f a,f b,f c){x=a;y=b;z=c;}v operator!(){return*this*(1/sqrt(*this%* this));}};i G[]={247570,280596,280600,249748,18578,18577,231184,16,16 };f R(){return(f)rand()/RAND_MAX;}i T(v o,v d,f&t,v&n){t=1e9;i m=0;f p= -o.z/d.z; if(.01<p)t=p, n=v(0,0,1),m=1;for(i k=19;k--;)for(i j=9;j--;)if(G[j]& 1<<k){v p=o+v(-k,0,-j-4);f b=p%d,c=p%p-1,q=b*b-c;if(q>0){f s=-b-sqrt(q);if(s<t &&s>.01)t=s,n=!(p+d*t),m=2;}}return m;}v S(v o,v d){f t;v n;i m=T(o,d,t,n);if( !m)return v(.7,.6,1)*pow(1-d.z,4);v h=o+d*t,l=!(v(9+R(),9+R(),16)+h*-1),r=d+n* (n%d*-2);f b=l%n;if(b<0||T(h,l,t,n))b=0;f p=pow(l%r*(b>0),99);if(m&1){h=h*.2; return((i)(ceil(h.x)+ceil(h.y))&1?v(3,1,1):v(3,3,3))*(b*.2+.1);}return v(p,p,p )+S(h,r)*.5;}extern "C" void ts(char *buf,long *pt,long *st){ i sl=*pt * *st, off=512-sl;char *ptr=buf+sl*512*3;v g=!v(-6,-16,0),a=!(v(0,0, 1)^g)*.002,b=!(g^a)*.002,c=(a+b)*-256+g;for(i y=0;y<*st;++y)for(i x=512;x--;){v p (9,9,9);for(i r=64;r--;){v t=a*(R()-.5)*99+b*(R()-.5)*99;p=S(v(17,16,8)+t,!(t* -1+(a*(R()+x)+b*(off-y+R())+c)*16))*3.5+p;}*(ptr++)=(i)p.x;*(ptr++)=(i)p.y; *(ptr++)=(i)p.z;}} Compile and run like this: % fleng -O3 card.pcn card.cpp % ./a.out +THREADS 9 > aek.ppm # runs with 8 rendering threads XIII. Miscellaneous Features Due to the mostly side-effect free nature of code written in PCN, state is often updated in a functionally manner, by passing it along creating a new state from the old and so "threading" updated values through a chain of variables. Also, values are often accumulated, as for example when streams are built. To ease handling of paired accumulation arguments, this implementation of PCN provides a "variable pair" notation, similar to the argument pair notation used in KL1[8]. If the parameter list of a procedure definition contains a parameter prefixed by a colon (":abc"), then the parameter represents a pair of arguments, an input and an output argument that is practical for creating streams or passing on state. A colon-prefixed reference to such an argument pair in the argument list of a procedure call on the other hand will also be rewritten to a pair of arguments: /* average numbers given on command line */ main() { writeln(average(command_line())) } function average(lst) { average2(lst, 0, a1); return(a1 / length(lst)) } average2(lst, :a) lst ?= [n|lst2] -> { a = a + string_to_integer(n); // or: a += string_to_integer(n) average2(lst2, :a) } (This could of course be done using mutable variables, but once we want to parallelize code functional updates are the safer and often more correct method) The ":a" in "average2" in both occurences represents two arguments, the old (first) and the new (second) part of the argument pair. A definition of a paired variable stands for the new value, so the "a" in "a = a + n" is equivalent to "a_new = a_old + n". All other unqualified references to a paired variable refer to the old (current) value. PCN provides the "<op>=" operators known from C as syntactic sugar for updating pair variables. They are in all respects identical to the longer form "<identifier> = <identifier> <op> <expression>" and are only usable for definitional variables. There is no "default" clause in "average2" in the example above - in this case the new value defaults to the old, passing on the state unchanged, which is the default action, if a pair is not referenced at all, so: main() { foo(1, x); writeln(x) // simply writes "1" } foo(:_) {} Another example: main() {|| build(10, s, []); follow(s) // writes numbers "10" to "1" } follow(s) s ?= [x|s2] -> { writeln(x); follow(s2) } build(n, :s) {? n > 0 -> { s ++ n; build(n - 1, :s) } } Here we build a stream of integers. The use of the "++" operator is equivalent to "s_old = [n|s_new]": it constructs the stream value by value. This is the only operation that assigns of the first/input variable in a variable pair. Once "n" equals zero, the use of an unchanged argument pair will simply unify the input part of the pair with the output part, which is the empty list. "build" effectively injects values into a stream between the head (the first occurrence of "s") and the tail (the empty list) and once we are done we connect our current end of the stream with the tail, the stream ends and "follow" terminates. Lists and tuples can be used to hold structured data but a more convenient way is to use "structs", which are named tuples where fields can be accessed by name: -struct(point(x, y)) main() { // using the structure name as a function constructs a // new instance: p1 = point(100, 200); p2 = move(p1, 50, -10); writeln(p2) // writes "point(150, 190)" } function move(p, x, y) { return(point(p.x + x, p.y + y)) } Functional update is done with the "<instance>(.<field> = ...)" syntax: -struct(vector(x, y, len)) main() { p1 = vector(10, 20, 0); p2 = p1(.len = sqrt(p1.x * p1.x + p1.y * p1.y)); writeln(p2.len) / writes "22.360679774997" } Finally, the "<--" operator can be used to update a fields in structured data: -struct(point(x, y)) main() { p1 = point(100, 200); // create a {"point", 100, 200} tuple move(10, -20, p1, p2) } move(x, y, :p) { p.y <-- p.y - n } The "<--" operator also accepts lists and normal tuples by taking an index instead of a field name. XIV. Further Reading This completes this little tutorial on PCN. I hope I could interest you in the language and concepts used. Concurrent logic programming is a fascinating subject and addresses many common programming problems in a refreshing manner. I recommend reading the "Strand" Book[4] to learn more about various interesting techniques, which the book explains in an easy and often delightful manner. More information about the "FLENG" system can be found at my website[2], including reference documentation for the languages it supports and many libraries that provide extended functionality. If you have questions, suggestions or want to report errors, please go to the FLENG website where you will find contact information. XV. References [1] https://en.wikipedia.org/wiki/Program_Composition_Notation [2] http://www.call-with-current-continuation.org/fleng/fleng.html [3] "Guarded Horn Clauses" PhD Thesis, Kazunori Ueda [4] "Strand: New Concepts for Parallel Programming", Ian Foster and Stephen Taylor, Prentice Hall, 1990 Also available online: http://www.call-with-current-continuation.org/files/strand-book.pdf [5] "Parallel programming with PCN" Ian Foster and Steven Tuecke https://digital.library.unt.edu/ark:/67531/metadc1310344/ [6] "A Primer for Program Composition Notation." Chandy, K. Mani and Taylor, Stephen (1990), http://resolver.caltech.edu/CaltechCSTR:1990.cs-tr-90-10 [7] https://en.wikipedia.org/wiki/Fifth_generation_computer [8] "Design of the Kernel Language for the Parallel Inference Machine", U. Kazunori et al., 1990 [9] "The Deevolution of Concurrent Logic Programming Languages" Evan Tick [10] "FLENG Prolog - The Language which turns Supercomputers into Parallel Prolog Machines" Martin Nilsson and Hidehiko Tanaka [11] https://fabiensanglard.net/rayTracing_back_of_business_card/