The monadic system function ⎕IMPORT imports data from an external file into an APL array. It supports a number of data formats commonly used for data exchange by spreadsheets and other (non-APL) applications. (See also ⎕EXPORT, which allows you to export data to a file of a specified format that can then be read into another application.)
The right argument specifies the name of the file to be read and its format. If the right argument is a character vector, it is interpreted as the name of the file to import (including the full path if required), and the format is inferred from the file extension. If the right argument is a two-element nested vector, the first element is the file name (or full path name) and the second is a character vector specifying the file type. File types are case-insensitive.
The explicit result is the converted data.
For example, the following two statements are equivalent. Each imports the contents of the file Budget2007.csv, in 'comma-separated variables' (CSV) format, into a variable called BUDGET in the workspace:
BUDGET ← ⎕IMPORT 'Budget2007.csv'
BUDGET ← ⎕IMPORT 'Budget2007.csv' 'csv'
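When a file's extension does not reflect its contents, the two-element form lets you state the type explicitly. The sketch below is illustrative only; the file names Export.dat and Notes.log and the variable names are hypothetical.

```apl
      ⍝ Hypothetical tab-separated file with a non-standard extension
      DATA ← ⎕IMPORT 'Export.dat' 'tsv'
      ⍝ File types are case-insensitive, so 'TXT' is treated the same as 'txt'
      NOTES ← ⎕IMPORT 'Notes.log' 'TXT'
```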
The following file formats are supported by ⎕IMPORT, with the behavior shown:

| File type/extension | Description | Behavior |
|---|---|---|
| 'txt' | Text file, with data represented in 8-bit extended ASCII form. | The contents of the file are read as text and converted to APL text form. The result is a character vector, with APL newline characters (⎕R) between lines. |
| 'utf16' or 'utf-16' | Same as 'txt', but with characters represented in 16-bit UTF-16 Unicode form (2 bytes per character). | Same as 'txt'. Any Unicode characters which cannot be represented in APLX will be converted to the character specified by ⎕MC (by default, question mark). |
| 'utf8' or 'utf-8' | Same as 'txt', but with characters represented in the 8-bit UTF-8 Unicode form (variable number of bytes per character). | Same as 'txt'. Any Unicode characters which cannot be represented in APLX will be converted to the character specified by ⎕MC (by default, question mark). |
| 'csv' | 'Comma-separated variables' format, as used by many applications such as spreadsheets for data exchange. The file comprises one line of text per row of the data, with individual elements separated by commas. Numeric elements are expressed in text form. Text elements are usually surrounded by double-quotation marks. | The result is always a matrix. Elements which are either enclosed in quotes, or are not valid numbers, are converted to text vectors. Elements which are valid numbers and not enclosed in quotes are converted to APL numeric form. The result is either a nested or a numeric matrix, depending on whether any of the elements were text vectors. |
| 'tsv' | 'Tab-separated variables' format. Same as CSV, except that the fields are separated by tab characters instead of commas. | Same as for CSV. |
| 'slk' | Symbolic Link (SYLK) format, a Microsoft-specified file format typically used to exchange data between applications such as Excel and other spreadsheets. | The result is always a matrix. Elements which are either enclosed in quotes, or are not valid numbers, are converted to text vectors. Elements which are valid numbers and not enclosed in quotes are converted to APL numeric form. The result is either a nested or a numeric matrix, depending on whether any of the elements were text vectors. Some non-ASCII characters may not be representable in APL; these will be replaced by ⎕MC (by default, question mark). |
| 'xml' | Extensible Markup Language (XML) format, a format used for saving structured data with markup information. | The file is read and converted to an APL array with the same specification as ⎕XML. This conversion is equivalent to the two-stage command ⎕XML ⎕IMPORT 'filename' 'utf8'. The file may be encoded in UTF-8 or UTF-16 format; APLX determines the file encoding automatically. Some non-ASCII characters may not be representable in APL; these will be replaced by ⎕MC (by default, question mark). |
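Because a 'txt', 'utf8' or 'utf16' import returns a single character vector with ⎕R between lines, it is often convenient to split the result into one vector per line. The following is a minimal sketch, assuming a hypothetical file Notes.txt and the APL2-style partition primitive (dyadic ⊂). The variable names TEXT and LINES are introduced here for illustration, and this idiom discards empty lines.

```apl
      TEXT ← ⎕IMPORT 'Notes.txt'      ⍝ hypothetical file name
      LINES ← (TEXT≠⎕R)⊂TEXT          ⍝ nested vector, one item per non-empty line
      ⍴LINES                          ⍝ number of non-empty lines
```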
For example, suppose that the file 'Budget2007.csv' contains the following lines in CSV format:
"","Q1","Q2","Q3","Q4"
"Sales",11300,13220,16550,19230
"Expenses",12450,12950,13640,13980
"Profit",-1150,270,2910,5250
This can be read into a 4-row matrix of text vectors and numbers as follows:
BUDGET ← ⎕IMPORT 'Budget2007.csv'
BUDGET
Q1 Q2 Q3 Q4
Sales 11300 13220 16550 19230
Expenses 12450 12950 13640 13980
Profit ¯1150 270 2910 5250
⎕DISPLAY BUDGET
┌→───────────────────────────────────┐
↓ ┌⊖┐ ┌→─┐ ┌→─┐ ┌→─┐ ┌→─┐ │
│ │ │ │Q1│ │Q2│ │Q3│ │Q4│ │
│ └─┘ └──┘ └──┘ └──┘ └──┘ │
│ ┌→────┐ │
│ │Sales│ 11300 13220 16550 19230 │
│ └─────┘ │
│ ┌→───────┐ │
│ │Expenses│ 12450 12950 13640 13980 │
│ └────────┘ │
│ ┌→─────┐ │
│ │Profit│ ¯1150 270 2910 5250 │
│ └──────┘ │
└∊───────────────────────────────────┘
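Once imported, the matrix can be manipulated like any other APL array. As an illustration (the variable name DATA is introduced here and is not part of the original example), dropping the heading row and the label column leaves a simple numeric matrix whose rows can be summed:

```apl
      DATA ← 1 1↓BUDGET      ⍝ drop the heading row and the label column
      ⍴DATA
3 4
      +/DATA                 ⍝ annual total for Sales, Expenses and Profit
60300 53020 7280
```

Using drop rather than indexing keeps the sketch independent of the index origin ⎕IO.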
Errors
If the import fails because the file's format does not match what APLX is expecting, the error DATA DAMAGED will be reported. In addition, a longer text message giving more information will be displayed in the Session window (this is suppressed if error trapping is enabled). This example fails because the data is not in SYLK format:
BUDGET ← ⎕IMPORT 'Budget2007.csv' 'slk'
Could not determine bounds of array
DATA DAMAGED
BUDGET←⎕IMPORT 'Budget2007.csv' 'slk'
       ^
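If an import may fail, you can trap the error rather than let it interrupt execution. The following is a sketch only, assuming the APL2-compatible system function ⎕EA (Execute Alternate) is available in your version of APLX. The fallback value, an empty matrix, is an arbitrary choice made for illustration.

```apl
      ⍝ Try the import; if it signals an error, execute the left-hand expression instead
      'BUDGET←0 0⍴0' ⎕EA 'BUDGET←⎕IMPORT ''Budget2007.csv'' ''slk'''
      ⍴BUDGET      ⍝ the shape shows whether the fallback value was used
```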
Special considerations for Client-Server implementations of APLX
In Client-Server implementations of APLX, the front-end which implements the user-interface (the "Client") runs on one machine, and the APLX interpreter itself (the "Server") can run on a different machine. Typically, the Client will be the APLX front-end built as a 32-bit Windows application running on a desktop PC, and the Server will be a 64-bit APLX64 interpreter running on a 64-bit Linux or Windows server.
In such systems, you can specify whether the file should be accessed on the Client or the Server machine. You do this by preceding the file name with either an Up Arrow ↑ to indicate that the file should be accessed on the Client, or a Down Arrow ↓ to indicate that it should be accessed on the Server. If you specify neither, the file is accessed on the Client by default.
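For instance, the following sketch reads one file from each side of a Client-Server session. It assumes that the arrow is written as the first character of the file name string; both paths and both variable names are hypothetical and serve only to show where the prefix is placed.

```apl
      ⍝ ↑ prefix: read the file on the Client machine
      LOCAL ← ⎕IMPORT '↑C:\Reports\Budget2007.csv'
      ⍝ ↓ prefix: read the file on the Server machine, stating the type explicitly
      REMOTE ← ⎕IMPORT '↓/home/apl/Budget2007.csv' 'csv'
```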