
⎕DR Data Representation


The system function ⎕DR is used to examine or alter the internal type of an item of data using the following codes:

             1   =   Boolean (patterns of 1s and 0s)
             2   =   Integer
             3   =   Floating point
             4   =   Character
             5   =   Overlay
             6   =   Nested or mixed
             7   =   Object or class reference

One-argument form

Reports the data type of any array.

             ⎕DR 2.9
       3
             X←1 0 1 1 0 1
             ⎕DR X
       1
             ⎕DR 'ABC' 1 2 3
       6
             ⎕DR (⍳10) (2 2⍴⍳4)
       6

APL chooses which internal number format to use automatically, and certain operations force data to be of a specific type. For example, the result of a comparison (< = > ≠) is guaranteed to be 0 or 1, and is therefore stored as Boolean. Internally, the different types of data take up different amounts of space:

           Code    Type               Space
           ───────────────────────────────────────────────
             1     Boolean            1 bit per element
             2     Integer            4 bytes per element   (32 bits)
                                  or  8 bytes per element   (64 bits) under APLX64
             3     Floating point     8 bytes per element   (64 bits)
             4     Character          1 byte per element    (8 bits)
             5     Overlay            see ⎕OV
             6     Nested/mixed       depends on contents
             7     Object/Class ref.  4 bytes per element   (32 bits)
                                  or  8 bytes per element   (64 bits) under APLX64

(see also ⎕AT).
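
For example, the result of a comparison is stored as a Boolean array, while arithmetic on that result forces it back to integer (the internal types shown here assume a default 32-bit APLX workspace):

             ⎕DR 2 3 4 > 1 5 0
       1
             ⎕DR 2 + 2 3 4 > 1 5 0
       2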

Two-argument form (scalar left argument in range 1 to 4)

On occasion it is useful to examine or change the data type directly, for example for encryption purposes, or to combine character and numeric data quickly. When used with a left argument consisting of one of the data-type codes in the range 1 to 4, and a right argument consisting of a data array of type 1 to 4, ⎕DR converts the item to the representation specified:

             1 ⎕DR 5
       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
                                     (The 32-bit binary pattern for 5)

Note that, under APLX64, integers are represented internally as 64-bit numbers, so converting an integer scalar to binary returns a length-64 binary vector.

The conversion of one type to another changes the number of elements in the data, and so ⎕DR must change the dimensions of its result. The last dimension of the result is adjusted to compensate for the type change. Zero bits are used to pad out a row if necessary.
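
For example (assuming a 32-bit version of APLX, in which each integer occupies 4 bytes):

             ⍴1 ⎕DR 5 7
       64
             ⍴4 ⎕DR 2 3⍴⍳6
       2 12

(Two 32-bit integers give 64 bits; each row of three integers gives 12 characters.)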

How the data conversion works

In essence, all that dyadic ⎕DR does is to change the workspace entry's type, without changing the bit pattern of the data.

Suppose you start with the character vector '1234'. This data is held internally as the four bytes, hex 31 32 33 34. If those same four bytes are used as the data portion of a binary vector, you will get the bit pattern corresponding to those four bytes:

        1 ⎕DR '1234'
0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0

The first 8 bits are the binary pattern for hex 31, i.e. 0 0 1 1 0 0 0 1, the second 8 bits are the pattern for hex 32, etc.

If you represent this data as an integer (in a 32-bit version of APLX), you will get the 32-bit integer which corresponds to this bit pattern:

      2 ⎕DR '1234'
825373492

which is the decimal number equivalent to hex 31323334.
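
Converting the same bit pattern back to characters recovers the original text (again assuming 32-bit integers):

      4 ⎕DR 825373492
1234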

Notes

a) For a scalar or vector, the length obviously changes (a 32-bit scalar integer becomes a length four character vector or a length 32 binary vector). For higher dimensional arrays, the last dimension is increased or reduced as necessary.

b) But what if there are not enough elements? For example, suppose we ask for the 8-byte float representation but only give it 4 bytes (this would happen if we ask for 3 ⎕DR '1234'). The rule APLX applies in this case is that the data is padded on the right with null (zero) bytes to make up the necessary number of whole data elements. So 3 ⎕DR '1234' is the same as 3 ⎕DR '1234', ⎕AF 0 0 0 0
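
You can verify this equivalence directly (here ⎕AF 0 0 0 0 generates the four null characters used as padding, and the result of the conversion is a one-element floating point vector):

      (3 ⎕DR '1234') ≡ 3 ⎕DR '1234',⎕AF 0 0 0 0
1
      ⎕DR 3 ⎕DR '1234'
3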

c) Changing arbitrary data to float is potentially dangerous because you can produce a bit pattern which is not legal as a 64-bit IEEE floating point number (the internal representation used by APLX for float numbers).

d) If you are running APLX64, the 64-bit version of APLX, you will get different answers for integer arguments, because integers are represented as 8-byte numbers in APLX64.

In APLX:

      1 ⎕DR  825373492
0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0
      ⍴1 ⎕DR  825373492
32

In APLX64:

      1 ⎕DR  825373492
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
     0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0
      ⍴1 ⎕DR  825373492
64

Byte-Ordering Issues

In the case of a big-endian processor (PowerPC, SPARC, 68000, etc), the 32-bit hex number 31323334, if viewed as a series of bytes, is in the order you expect: hex bytes 31 32 33 34. But on a little-endian processor (Intel), it is backwards: 34 33 32 31. So the question arises: on a little-endian processor, should the result of transforming a 32-bit integer to binary/character treat the data as a 32-bit container (hex 31323334), or as a series of four bytes as they would appear in memory (hex 34 33 32 31)? In APLX, by default it is treated as a 32-bit container (i.e. the bytes are effectively swapped in the little-endian case - but see below for changing this default). This design decision was taken for two reasons. Firstly, it means the result is the same on all APLX (32-bit) platforms. Secondly, it means the results are consistent with what you would reasonably expect. For example, on all 32-bit platforms APLX gives the following result when converting an integer to binary:

    1 ⎕DR 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

(the bit pattern for the integer 2, comprising four bytes)

     ⎕AF 4 ⎕DR 2
0 0 0 2

(the individual bytes which make up the 4-byte number 2, even though on a little-endian machine they would be backwards if the processor read them as individual bytes)

     2 ⎕DR 4 ⎕DR 1 ⎕DR 2
2

(Convert the integer number 2 to binary. Convert the resulting binary vector to a length 4 character vector. You get back the number you started with.)
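
If instead you want the bytes in the order they actually appear in memory, you can set the endian flag to 2 (natural-endian) using the vector left argument described in the next section. For example, on a little-endian (Intel) machine this gives:

      ⎕AF 4 0 2 ⎕DR 2
2 0 0 0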

Two-argument form (vector left argument or compatibility mode)

As well as the default conversions described above, you can force alternative representations of the converted data by providing a two- or three-element vector as the left argument. You can also use alternative codes for the standard conversions 1 to 4 in the first element, for compatibility with other APL interpreters, as follows:

           Code    Type               Space
           ───────────────────────────────────────────────
             11     Boolean            1 bit per element
             83     Integer            1 byte per element    (8 bits)
            163     Integer            2 bytes per element   (16 bits, little-endian)
            323     Integer            4 bytes per element   (32 bits, little-endian)
            643     Integer            8 bytes per element   (64 bits, little-endian)
            645     Floating point     8 bytes per element   (64 bits, little-endian)
             82     Character          1 byte per element    (8 bits)

The optional second element, if supplied, is the number of bytes per element when converting from character to integer or float, and vice versa. 0 means use the default size implied by the first element.

The third element, if supplied, is the byte-ordering (endian) flag:

            0  Big-endian or as implied by the first element (default)
            1  Little-endian
            2  Natural-endian (big on big-endian systems, little on little-endian systems)

Note that you can only specify the element size if you are converting to or from a character representation.

Examples

      ⎕AF 4 ⎕DR 2       ⍝ Convert 32-bit integer 2 to four characters
0 0 0 2
      ⎕AF 4 0 1 ⎕DR 2   ⍝ Same, but little-endian byte order
2 0 0 0
      ⎕AF 4 2 1 ⎕DR 2   ⍝ Force the integer to be treated as a 16-bit value
2 0
      ⎕AF 4 2 1 ⎕DR 200000 ⍝ Too large to fit in 16 bits
DOMAIN ERROR
      ⎕AF 4 2 1 ⎕DR 200000
         ^
      
      ⎕AF 82 ⎕DR 2      ⍝ Compatibility mode: Int to Char, little-endian
2 0 0 0
      323 ⎕DR 82 ⎕DR 23 ⍝ Round trip
23

      ⎕AF 4 ⎕DR 2.56    ⍝ Convert float to 8 bytes (IEEE Double representation)
64 4 122 225 71 174 20 122
      ⎕AF 4 4 ⎕DR 2.56  ⍝ Convert float to 4 bytes (IEEE Single representation)
64 35 215 10

