Chapter 28: Data Files

The subject of file-handling in general, and how data is organized in files, is a major topic in itself. In this chapter we will cover only a selection of the facilities available in J.

J functions to read files produce results in the form of character-strings, and similarly functions to write files take strings as arguments. Such a string can be the whole data content of a file when the available memory of the computer is sufficient.

Our approach here will be to look first at some J functions for input and output of strings. Then we look at a few examples of dealing with strings as representing data in various formats.

28.1 Reading and Writing Files

28.1.1 Built-in Verbs

In the following, a filename is a string which is valid as a filename for the operating-system of the computer where we are running J. For example:

   F =: 'c:\demofile.xyz'       NB. a filename

The built-in verb 1!:2 writes data to a file. The right argument is a boxed filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file. The result is null.

   'some data' 1!:2 < F    NB. write to file F

The built-in verb 1!:1 reads data from a file. The right argument is a boxed filename. The result is a character-string, the data read.

   data =: 1!:1 < F     NB.  read from file F

   data
some data
   $ data
9
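The same family of built-in verbs also includes 1!:3 (append), 1!:4 (file size) and 1!:55 (erase). The following is a minimal sketch of their use. The filename G is made up for this illustration, so that the demonstration file F is left untouched.

```j
G =: 'scratch.tmp'       NB. hypothetical scratch file in the current directory
'abc' 1!:2 < G           NB. create G with content abc
'def' 1!:3 < G           NB. append def to G
1!:4 < G                 NB. size of G in bytes: 6
1!:1 < G                 NB. whole content: abcdef
1!:55 < G                NB. erase G
```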

28.1.2 Screen and Keyboard As Files

The screen and keyboard can be treated as files. When the expression (1!:1) 1 is evaluated, a prompt is displayed, and the result is whatever string is then typed by the user at the keyboard.

Evaluating the expression x (1!:2) 2 causes the value of x to be displayed on-screen. x can be any noun; it is not restricted to being a string. Since 1!:2 is a verb, it returns the value of x as well as producing the on-screen display.

   x =: 'hello'; 6
   
   x (1!:2) 2
+-----+-+
|hello|6|
+-----+-+
+-----+-+
|hello|6|
+-----+-+
   

28.1.3 Library Verbs

The library script named files provides a number of useful verbs. Here is a brief summary of a selection:

s fwrite F      write string s to file F
fread F         read string from file F
s fappend F     append string s to file F
fread F;B,L     read slice from file F, starting at B, length L
s fwrites F     write text s to file F
freads F        read text from file F
fexist F        true if file F exists
ferase F        delete file F

From now on we will use library verbs for our file-handling.

   require 'files'

The library verb fwrite writes data to a file. The right argument is a filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file.

   'some data' fwrite F    NB. file write
9

The result shows the number of characters written. A result of _1 shows an error: either the left argument is not a string or the right argument is not valid as a filename, or the specified file exists but is read-only.

   (3;4) fwrite 'c:\junk'
_1
   'hello' fwrite 'q:\junk'
_1

The library verb fread reads data from file. The argument is a filename and the result is a character-string.

   z =: fread F
   z
some data
   $z
9

A result of _1 shows an error: the specified file does not exist, or is locked.

   fread 'qwerty'
_1
   fexist 'qwerty'
0
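Since fread and fwrite signal failure by returning _1 rather than by raising an error, a caller has to test the result. A minimal sketch follows. The filename G is hypothetical, and the comments state what the descriptions above lead us to expect.

```j
require 'files'
G =: 'scratch.tmp'          NB. hypothetical scratch file
_1 -: fread G               NB. 1 if G could not be read
'first' fwrite G            NB. 5 characters written
fexist G                    NB. 1, the file now exists
_1 -: fread G               NB. 0, the read succeeded
ferase G                    NB. delete the file again
fexist G                    NB. 0
```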

28.2 Large Files

For large files, the memory of the computer may not be sufficient to allow the file to be treated as a single string. We look at this case very briefly. An alternative approach, using mapped files, is at present outside our scope.

Write a file with some initial content:

   'abcdefgh' fwrite F
8

We can append some data to the file with library verb fappend.

   'MORE' fappend F
4

To see the effect of fappend we can read the whole file again (just for this demonstration; we would not of course do this with a large file):

   fread F
abcdefghMORE

We can read a selected slice of the file, say 8 bytes starting from byte 4. In this case we use fread with a right argument of the form filename;start,size.

   start =: 4
   size  =: 8
   fread F ; start, size
efghMORE
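Slices can be combined to work through a file piece by piece. The following is only a sketch, not a library facility: the names chunk, size, starts, slice and pieces are made up for this illustration, and 1!:4 is the built-in verb giving the size of a file in bytes.

```j
require 'files'
chunk  =: 5                                  NB. hypothetical slice length in bytes
size   =: 1!:4 < F                           NB. size of file F in bytes
starts =: chunk * i. >. size % chunk         NB. start position of each slice
slice  =: fread @ (F ; (, chunk <. size&-))  NB. slice s : read from s, at most chunk bytes
pieces =: <@slice"0 starts                   NB. one box per slice
; pieces                                     NB. razing the boxes gives the whole content
```

With the 12-byte file above, pieces holds abcde, fghMO and RE. For a genuinely large file each slice would be processed in turn rather than all being kept in memory together.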
   

28.3 Data Formats

We look now at a few examples of how data may be organized in a file, that is, represented by a string. Hence we look at converting between character strings, with various internal structures, and J variables.

We take it that files are read and written for the purpose of exchanging data between programs. Two such programs we can call "writer" and "reader". Questions which arise include:

  1. Are writer and reader both to be J programs? If so, then there is a convenient J-only format. If not, then we expect to work from a programming-language-independent description of the data.

  2. Are writer and reader to run on computers with the same architecture? If not, then even in the J-to-J situation, some finesse may be needed.

  3. Is the data organized entirely as a repetition of some structure (for example, "fixed length records")? If so, then we may usefully be able to treat it as one or more J arrays. If not, we may need some ad-hoc programming.

28.3.1 J-Only Files

Suppose we aim to handle certain files only in J programs, so that we are free to choose any file format convenient for the J programmer. The "binary representation" is particularly convenient.

For any array A,

   A =:  'Thurs'; 19 4 2001 

the binary representation of A is a character string. There are built-in verbs to convert between arrays and binary representations of arrays.

   arrbin  =: 3!:1   NB. array to binary rep.
   binarr  =: 3!:2   NB. binary rep. to array

If B is the binary representation of A, we see that B is a character string, with a certain length.

   A
+-----+---------+
|Thurs|19 4 2001|
+-----+---------+
   $ B =: arrbin A
88

We can write B to a file, read it back, and do the inverse conversion to recover the value of A:

   B fwrite F
88
   $ Z =: fread F
88
   binarr Z
+-----+---------+
|Thurs|19 4 2001|
+-----+---------+
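The write and read steps can be packaged as a pair of verbs. This is a sketch under assumptions: the names putarr and getarr are made up here, and no check is made for the _1 that fwrite or fread return on failure.

```j
require 'files'
arrbin =: 3!:1                  NB. array to binary rep.
binarr =: 3!:2                  NB. binary rep. to array
putarr =: arrbin@[ fwrite ]     NB. x putarr y : write array x to file y
getarr =: binarr@fread          NB. getarr y   : recover an array from file y

A putarr F
A -: getarr F                   NB. 1 if the array survived the round trip
```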

From J4.06 on, there are variations of the binary representation verbs above to allow for different machine architectures: see the Dictionary under 3!:1.

28.3.2 Text Files

The expression a. (lower-case a dot) is a built-in noun, a character-string containing all 256 characters of the J alphabet in sequence. The first 128 of these are the ASCII characters.

   65 66 67 { a.
ABC
   $ a.
256

In the ASCII character set, that is, in a., the character at position 0 is the null, at position 10 is line-feed, and at position 13 is carriage-return. In J, the names CR and LF are predefined in the standard profile to mean the carriage-return and line-feed characters.

   a. i. CR,LF
13 10

We saw fread and fwrite used for reading and writing character files. Text files are a special kind of character file, in that lines are delimited by CR and/or LF characters.

On some systems the convention is that lines of text are delimited by a single LF and on other systems a CR,LF pair is expected. Regardless of the system on which J is running, for J text variables, the convention is always followed of delimiting a line with single LF and no CR.

Here is an example of a text variable.

   t =: 0 : 0
There is physics
and there is 
stamp-collecting.
)

Evidently it is a string (that is, a 1-dimensional character list) with 3 LF characters and no CR characters.

   $ t
49
   +/t=LF
3
   +/t=CR
0

If we aim to write this text variable t to a text file, we must choose between the single-LF and CRLF conventions. In the case of single-LF, fwrite will do the job.

However, we might choose CRLF if the text file is destined for some other (non-J) application on our system which expects CRLF. In this case there is a suitable library verb named fwrites which converts LF characters to CRLF pairs before writing.

   t fwrites F
52

If we now read back file F with fread, we see that the data contains CRLF pairs:

   $ w =: fread F
52
   +/w=CR
3
   +/w=LF
3

There is another library verb, freads, which reads a file, converting CRLF pairs to LF.

   $ q =: freads F
49
   +/q=CR
0
   +/q=LF
3
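The effect of these conversions on the length of the text can be imitated in memory, without a file. The sketch below is not how the library verbs are implemented; tocrlf and tolf are hypothetical helpers written for this illustration, and tolf simply drops every CR, whether or not it is part of a CRLF pair.

```j
tocrlf =: ; @: (,&(CR,LF) &. >) @: (< ;. _2)   NB. hypothetical: LF-terminated lines to CRLF
tolf   =: -.&CR                                NB. hypothetical: drop every CR
# tocrlf t                                     NB. 52, the length of the file written by fwrites
t -: tolf tocrlf t                             NB. 1, the original text is recovered
```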

For convenience in dealing with a text variable such as q, we can cut it into lines. A verb for this purpose is cut (described more fully in Chapter 17).

   cut =: < ;. _2

cut produces a boxed list of lines, removing the LF at the end of each line.

   lines =: cut q
   lines
+----------------+-------------+-----------------+
|There is physics|and there is |stamp-collecting.|
+----------------+-------------+-----------------+

The inverse of cut we can call uncut. It restores the LF at the end of each box and then razes to make a string.

   uncut =: ; @: (,&LF &. >)
   uncut lines
There is physics
and there is 
stamp-collecting.
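Note that cut, being defined with ;. _2, takes the last character of its argument as the delimiter, so it relies on the text ending with LF. If a text may lack the final LF, one can be appended first. A sketch, with cutlf being a name made up for this illustration:

```j
cut   =: < ;. _2
cutlf =: cut @ (, LF -. {:)     NB. append LF unless the text already ends with one
cutlf 'one',LF,'two'            NB. final LF missing: two lines, one and two
cutlf 'one',LF,'two',LF         NB. final LF present: the same two lines
(cutlf -: cut) t                NB. 1, since t already ends with LF
```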

28.3.3 Fixed Length Records with Binary Data

Suppose our data is in two J variables: a table of customer-names, and for each customer a corresponding amount:

   cnames =: 'Mr Rochester' ,: 'Jane'
   cnames
Mr Rochester
Jane
   ,. amts =: _10000 3
_10000
     3

Now suppose the aim is to write this data to a file, formatted in 16-byte records, one record per customer. Each record is to have two fields: customer-name in 12 bytes followed by amount in 4 bytes, as a signed integer. Here is a possible approach.

The plan is to construct, from cnames and amts, an n-by-16 character table, to be called records. For this example, the number of records, n, is given by # cnames:

   ] n =: # cnames 
2
   

and with 2 customers, records will look like this:

          Mr Rochester####
          Jane        #### 
   

where #### represents the 4 characters of an integer in binary form.

We build the records table by stitching together side by side an n-by-12 table for the customer names field, and an n-by-4 table for the amounts field.

For the customer-names field we already have cnames which is suitable, since it is 12 bytes wide:

   $ cnames
2 12
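Here the names table happens to be exactly 12 characters wide. In general the width can be forced with take ({.) applied to each row, which pads short names with spaces and truncates long ones. A sketch, in which the data and the names names and field are made up for this illustration:

```j
names =: > 'Mr Rochester' ; 'Jane' ; 'Bertha Antoinetta Mason'   NB. hypothetical data
$ names                   NB. 3 23 : as wide as the longest name
field =: 12 {."1 names    NB. force each row to 12 characters
$ field                   NB. 3 12
```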

For the amounts field we convert amts to characters, using ci4 from Chapter 27. The result is a single string, which is reshaped to be n-by-4.

   ci4 =:  2 & (3!:4)  NB. integer to 4 char
   
   amtsfield =: (n , 4) $ ci4 amts
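To see what ci4 produces, the characters can be looked up in a. to give byte values. The byte order is that of the machine on which J is running, so the values noted in the comments are what a little-endian machine (for example an x86 PC) would be expected to show. This is an assumption about the platform, not a property of the file format.

```j
ci4 =: 2 & (3!:4)          NB. integer to 4 char, as defined above
$ ci4 _10000 3             NB. 8 : four characters per integer
a. i. 2 4 $ ci4 _10000 3   NB. little-endian: row 0 is 240 216 255 255, row 1 is 3 0 0 0
```

A reader on a machine with the opposite byte order would see the four bytes of each integer reversed, which is why the architecture question raised earlier matters for this format.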

Now we build the n-by-16 records table by stitching together side-by-side the two "field" tables:

   records =: cnames ,. amtsfield

To inspect records, here is a utility verb which shows each non-printing character as #:

   inspect =: 3 : ('A=.a.{~32+i.96';'(A i.y.) { A,''#''')
   

   inspect records
Mr Rochester####
Jane        ####
   $ records
2 16

The outgoing string to be written to the file is the ravel of the records.

   (, records) fwrite F
32

The inverse of the process is to recover J variables from the file. We read the file to get the incoming string.

   instr =: fread F

Since the record-length is known to be 16, the number of records is

   n  =: (# instr) % 16

Reshape the incoming string to get the records table.

   inspect records =: (n,16) $ instr
Mr Rochester####
Jane        ####

and extract the data. The customer-names are obtained directly, as columns 0-11 of records.

   cnames =: (i.12) {"1 records

For the amounts, we extract columns 12-15, ravel into a single string and convert to integers with c4i.

   c4i =: _2 & (3!:4)  NB. 4 char  to integer
   
   amts   =: c4i  , (12+i.4) {"1  records

   cnames
Mr Rochester
Jane
   ,. amts
_10000
     3

This is the end of Chapter 28




Copyright © Roger Stokes 2002. This material may be freely reproduced, provided that this copyright notice is also reproduced.

last updated 18 Mar 2002