Interfacing to R

	Interfacing to the R statistical language
What is R?

R is an open-source language and set of packages aimed principally at statistical analysis. It includes a huge library of pre-written statistical and mathematical routines, which can be accessed immediately and very conveniently from APLX. It also includes mathematically-oriented graphing facilities.

R is available from http://www.r-project.org, which describes R as follows:

"R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

"R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

"R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS."

Installing R

R can be downloaded either in source code form, or as a pre-compiled binary for most popular platforms, from a number of websites (see http://www.r-project.org). In each case you need the R shared library (called `libR.so` in Linux, `R.dll` under Windows, and `libR.dylib` under MacOS); this is usually available in the pre-compiled binaries. If installing from source, be sure to specify the option `--enable-R-shlib` when running the configure script.

Installing under Windows

This is most easily done using the installer provided with the pre-built binaries.
The only additional step which you might need to take is to add the R binary directory to your search path, so that APLX can find the DLL `R.dll`.

Installing under Linux and MacOS

Follow the instructions provided with the R download. You also need to set up environment variables for R; this is usually done in the R script.

Calling R from APLX

Most of the interface between APLX and R is done using a single external class, named `'r'`, which represents the R session that you are running. (Note that this is different from most of the other external class interfaces, where objects of many different classes can be created separately from APLX.) You create a single instance of this class using `⎕NEW`. R functions (either built-in or loaded from packages) then appear as methods of this object, and R variables as properties of the object. For example:

          ⍝ Open the R interface and try a few simple things
          r←'r' ⎕new 'r'
          r.sqrt 2
    1.414213562
          r.sqrt (⊂⍳5)
    1 1.414213562 1.732050808 2 2.236067977
          r.sqrt ¯1
    [r:NAN]                    ⍝ Returns a special R object NAN
          r.mean (⊂⍳10)
    5.5

When calling R functions, the APLX right argument is always a vector where each element corresponds to one argument of the R function. The calls to the `sqrt` and `mean` functions above illustrate this; to pass an array as the argument, it needs to be enclosed.

Creating variables in the R environment

Assigning to a symbol as though it were a property of the R session class creates a variable in the R world:

          r.x←2 3⍴⍳6         ⍝ x is an R variable
          r.x
    1 2 3
    4 5 6
          r.x.⎕ref
    [r:matrix]

Evaluating R expressions

Because R is an interpreted language, it is possible to use the System Function `⎕EVAL` to run lines of R code, for setting up variables in the R environment, for defining R functions, and so on.

          'r' ⎕eval '4:9'
    4 5 6 7 8 9

However, a more convenient syntax is provided (for the 'r' class only) in which `⎕EVAL` is a monadic system method.
The right argument is a text vector containing any expression which is a valid line of R code. The result is the explicit result (if any) of evaluating the expression in the external environment. For example:

          r←'r' ⎕new 'r'
          r.x←2 3⍴⍳6         ⍝ x is an R variable
          r.x
    1 2 3
    4 5 6
          r.⎕eval 'x[2,]'
    4 5 6
          r.⎕eval 'mean(x[2,])'
    5

Note that the last line could be executed using the alternative syntax where `⎕EVAL` is a system function:

          'r' ⎕eval 'mean(x[2,])'
    5

Example: 3-D plot

In this short but complete example (based on an article by Skomorokhov and Kutinsky from Quote Quad 123 No 4), we create some data in the R environment, define an R function, and run the R outer product to create some test data. We then call the R `persp` function to create a 3-D plot:

          r←'r' ⎕new 'r'
          x←r.⎕eval 'seq(-10,10,length=50)'
          y←x
          ⍝ Define an R function and return a reference to it:
          fn←r.⎕eval 'foo<-function(x,y){r<-sqrt(x^2+y^2);10*sin(r)/r}'
          fn
    [r:function]
          r.z←r.outer(x y fn)
          r.x←x
          r.y←y
          ⊣r.⎕eval 'persp(x,y,z,theta=30,phi=30,expand=0.5,xlab="X",ylab="Y",zlab="Z")'

This causes R to open a window and display a 3-D perspective chart.

Listing R variables and functions

The `⎕NL` system method can be used to get the names of R variables and/or functions. The function list includes built-in functions and functions from all the loaded R packages, so it may be several thousand items long:

          ⍝ List R variables:
          vars←r.⎕nl 2
          ⍴vars
    129 21
          ⍝ List R functions:
          fns←r.⎕nl 3
          ⍴fns
    2058 34                    ⍝ There are lots of them!
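For comparison, similar lists can be approximated from the R side. The following R sketch is an illustration only: it assumes that `⎕NL 2` and `⎕NL 3` report roughly what R itself can see, which this page does not state, and the environment `env` and the helper `split_names` are hypothetical names introduced here.

```r
# Illustrative sketch only: split the names visible in an R environment into
# variables and functions. `env` and `split_names` are hypothetical names and
# are not part of the APLX interface.
split_names <- function(envir) {
  nms <- ls(envir)
  # TRUE for each name that is bound to a function
  is_fn <- vapply(nms, function(n) is.function(get(n, envir = envir)),
                  logical(1))
  list(variables = nms[!is_fn], functions = nms[is_fn])
}

# A small environment holding one variable and one function
env <- new.env()
assign("x", matrix(1:6, nrow = 2, byrow = TRUE), envir = env)
assign("foo", function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r) / r },
       envir = env)

parts <- split_names(env)
print(parts$variables)   # names bound to data
print(parts$functions)   # names bound to functions

# Names exported by one attached package, here the standard 'stats' package
stats_fns <- ls("package:stats")
print(length(stats_fns))
```

Summing lists like `stats_fns` over every attached package would plausibly account for a function list that is thousands of items long.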
`⎕DESC` can be used to get the full R function list together with details of the parameters (Caution: the result is very large):

          fns2←r.⎕desc 3
          fns2[1445+⍳5;]
    pwilcox   (q, m, n, lower.tail = TRUE, log.p = FALSE)
    q         (save = "default", status = 0, runLast = TRUE)
    qbeta     (p, shape1, shape2, ncp = 0, lower.tail = TRUE, log.p = FALSE)
    qbinom    (p, size, prob, lower.tail = TRUE, log.p = FALSE)
    qbirthday (prob = 0.5, classes = 365, coincident = 2)

R naming conventions

R function names can contain characters such as `<` and `-`, which are not legal in symbol names in APLX. To call these functions in APLX as direct method calls, you need to escape each illegal character with a `$` character. (This is of course not necessary when using `⎕EVAL`, where the string is passed as-is to R.) For example, to call `attr<-` from APLX, you would call `r.attr$<$-`.

Conversion of R data types to APL data

Simple numeric arrays and arrays of strings passed from APLX to R are converted directly to the R equivalent array, and are converted back automatically ('unboxed') when referenced or returned from an R function call, unless you use `⎕REF` to force an object reference to be returned:

          r.y←2.2 3.3 4.4
          r.y
    2.2 3.3 4.4
          r.y.⎕ref
    [r:numeric]
          (r.y.⎕ref).⎕ds     ⍝ Use R to format the R array
    [1] 2.2 3.3 4.4
          r.⎕eval 'mean(y)'
    3.3

Complex, NA and NAN data types

The APLX R interface defines three special object classes for NA ('Not Available'), NaN ('Not A Number') and complex-number data, which R routines may return, or which you may want to pass as arguments into R functions.
For example, the following R expression returns a complex number:

          c←r.⎕eval '3+4i'
          c
    [r:complex]
          c.format
    3+4i

Instances of these object classes can be created by using `⎕NEW`:

          NA←'r' ⎕new 'NA'
          NA
    [r:NA]
          NAN←'r' ⎕new 'NAN'
          r.z←55.6 77.4 NAN 81 NA
          r.z
    55.6 77.4 [r:NAN] 81 [r:NA]
          r.sqrt (⊂r.z)
    7.456540753 8.797726979 [r:NAN] 9 [r:NA]

The `complex` class allows you to create either a single complex number, by using a constructor with two numbers for real/imaginary parts:

          c←'r' ⎕new 'complex' 2 3
          c
    [r:complex]
          c.format
    2+3i

or to build an R complex array by passing an array of length-2 vectors of the real and imaginary parts of each complex number:

          m←'r' ⎕new 'complex' (3 2⍴(1 2) (3 4) (5 6) (7 8) (9 10) (11 12))
          m
    [r:matrix]
          m.format
    1+ 2i  3+ 4i
    5+ 6i  7+ 8i
    9+10i 11+12i

You can access or specify the real and imaginary parts directly using the pseudo-properties `real` and `imag` of the `complex` object:

          m.real
    1  3
    5  7
    9 11
          m.imag←3 2⍴.1×⍳6
          m.format
    1+0.1i  3+0.2i
    5+0.3i  7+0.4i
    9+0.5i 11+0.6i
          m.imag
    0.1 0.2
    0.3 0.4
    0.5 0.6

NAs and NaNs are also supported in complex arrays:

          v←'r' ⎕new 'complex' ((3.2 3.4) NA (1.1 8.2))
          v.format
    3.2+3.4i NA 1.1+8.2i
          v.real
    3.2 [r:NA] 1.1
          v.imag
    3.4 [r:NA] 8.2
          (r.sqrt v).format
    1.983563+0.857043i NA 2.164885+1.893865i

Advanced R data types

Other R types, such as factors and lists, are left 'boxed up' as references to the underlying R object (unless you use `⎕VAL` to force an unbox, if this is possible):

          lst←r.⎕eval 'list(name="Fred",age=99)'
          lst
    [r:list]
          lst.⎕val
    Fred 99
          ⎕display lst.⎕val

An object which is still boxed up can be passed as an argument to an R function:

          r.length lst
    2
          r.names lst
    name age

As a convenience you can also write this last example as:

          lst.length
    2
          lst.names
    name age

This works because APLX treats the expression

          obj.function arg1,arg2,...

as equivalent to:

          r.function obj,arg1,arg2,...

Examining an object with `⎕DS`

The system method `⎕DS` can be used to examine an R object.
It's equivalent to calling the `print` method when working in an interactive R session.

          lst←r.⎕eval 'list(name="Fred",age=99)'
          lst
    [r:list]
          lst.⎕ds
    $name
    [1] "Fred"

    $age
    [1] 99

Functions on the left side of an R assignment

In R, a function name can sometimes be given on the left side of an assignment, as the fourth line of the following example written in the R language shows:

    > lst<-list(name="Fred",age=99)
    > names(lst)
    [1] "name" "age"
    > names(lst)<-c("firstname", "age")
    > names(lst)
    [1] "firstname" "age"

What actually happens 'under the hood' is that R treats an assignment like:

    function(obj) <- value

as being a call to a function called "function<-" with the function result assigned to the object, i.e.

    obj <- "function<-" (obj, value)

If you wanted to call this function in APLX you could do so, using the `$` character to escape the illegal characters in the function name:

          lst←lst.names$<$- (⊂'firstname' 'age')
          lst.names
    firstname age

However, APLX also supports a much more convenient syntax:

          lst.names←'firstname' 'age'

Indexing lists by name

In the R language a list can be indexed either by number or by name, e.g.

    > lst[[2]]
    [1] 99
    > lst$age
    [1] 99

This is achieved by special R indexing functions called `[[` and `$`, which can also be called from APLX (once again using a `$` to escape the function name):

          lst.$[$[ 2
    99
          lst.$$ 'age'
    99

It is also possible to change the value of a list item, which you would do in R by writing "`lst$age<-95`". Under the hood, R is using a function called `$<-` which we can call from APLX:

          lst←lst.$$$<$- 'age' 95

Attributes

R objects can have attributes attached to them. By convention, any reference to `∆XXX` is interpreted as an implicit call to `attr(obj, XXX)`:

          ⍝ Get a copy of the R 'Iris' variable, a sample 'data.frame'
          iris←r.iris
          iris
    [r:frame]
          (iris.attributes).names
    names row.names class
          iris.∆names
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species

You can also change the value of attributes or add your own.
Any assignment to `∆XXX` is interpreted as an implicit call to `attr<-(obj, XXX)`. In the following example, `f` is assumed to be a reference to an R object, such as `iris` above:

          f.∆mycustomatt ← 'Some attribute'
          f.∆mycustomatt
    Some attribute
          ⍝ Longer-winded way of doing the same thing, but creating a new object:
          f2←r.attr$<$- f 'mycustomattr' 'Some other attribute'
          r.attr f2 'mycustomattr'
    Some other attribute

Here is an example of creating an R data.frame object from some APL data:

          data←?3 5⍴100      ⍝ Random APL data for demo
          data
    95  6  77 78 83
    13  2  69 87 63
    74 73 100 89 24
          frame←r.data.frame (⊂data)
          frame.attributes.⎕ds
    $names
    [1] "X1" "X2" "X3" "X4" "X5"

    $row.names
    [1] 1 2 3

    $class
    [1] "data.frame"

          frame.∆names←'Fish' 'Chips' 'Ham' 'Eggs' 'Tea'
          frame.⎕ds
      Fish Chips Ham Eggs Tea
    1   95     6  77   78  83
    2   13     2  69   87  63
    3   74    73 100   89  24
          frame.summary.⎕ds
          Fish           Chips            Ham              Eggs            Tea
     Min.   :13.00   Min.   : 2.0   Min.   : 69.0   Min.   :78.00   Min.   :24.00
     1st Qu.:43.50   1st Qu.: 4.0   1st Qu.: 73.0   1st Qu.:82.50   1st Qu.:43.50
     Median :74.00   Median : 6.0   Median : 77.0   Median :87.00   Median :63.00
     Mean   :60.67   Mean   :27.0   Mean   : 82.0   Mean   :84.67   Mean   :56.67
     3rd Qu.:84.50   3rd Qu.:39.5   3rd Qu.: 88.5   3rd Qu.:88.00   3rd Qu.:73.00
     Max.   :95.00   Max.   :73.0   Max.   :100.0   Max.   :89.00   Max.   :83.00
          frame.plot

Using the R interface from multiple APL tasks

Because it is not safe to call the R interpreter from multiple threads, you cannot use the R interface from more than one APL task at a time. If you try to do so, you will get an error message and a FILE LOCKED error:

          r←'r' ⎕new 'r'
    This interface cannot be used by more than one APL task at a time
    FILE LOCKED
          r←'r' ⎕new 'r'
          ^

The lock will be cleared when the APL task which has been accessing R executes a `)CLEAR`, `)LOAD`, or `)OFF`.

Copyright © 1996-2010 MicroAPL Ltd