From: Mark Grosberg <XXXXXXXXXXXXXXXXXXXXX>
To: Sean Conner <sean@conman.org>
Subject: Boston: Well, since you're in the land of non-portability …
Date: Sun, 29 Jan 2012 05:55:00
> ```
> static void hexout(char *dest,unsigned long value,size_t size,const int
> padding)
> {
> assert(dest != NULL);
> assert(size > 0);
> assert((padding >= ' ') && (padding <= '~'));
>
> dest[size] = padding;
> while(size--)
> {
> dest[size] = (char)((value & 0x0F) + '0');
> if (dest[size] > '9') dest[size] += 7;
> value >>= 4;
> }
> }
>
> ```
>
You're also in the land of ASCII (American Standard Code for Information Interchange) specificness. Couldn't you make that:
dest[size] = "0123456789ABCDEF"[value & 0x0f];
And then not be tied to ASCII? You could also then switch out that array pointer if you wanted to get a mix of uppercase, lower case depending on what you need.
-MYG
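Mark's second point, switching the digit table to pick upper or lower case, might look something like the sketch below. The name `hexout_with` and its `digits` parameter are hypothetical and appear nowhere in the original program; the rest follows the quoted hexout().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of hexout(): the caller passes the 16-character
 * digit table, so no arithmetic is done on character codes. */
void hexout_with(char *dest,unsigned long value,size_t size,
                 int padding,const char *digits)
{
  assert(dest   != NULL);
  assert(digits != NULL);
  assert(size   >  0);

  dest[size] = (char)padding;          /* separator after the digits */
  while(size--)
  {
    dest[size] = digits[value & 0x0F]; /* low nibble indexes the table */
    value >>= 4;
  }
}
```

Calling it with "0123456789abcdef" gives lower case output and with "0123456789ABCDEF" upper case, and neither call depends on where the letters sit in the character set.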
I initially rejected Mark's suggestion. My reasoning? The code itself is already non-portable, being restricted to a Posix [1]-like system. So what's one more non-portable item on the list? The sequence if (dest[size] > '9') dest[size] += 7 is around six bytes (on a lot of architectures that aren't RISC (Reduced Instruction Set Computer) based) to twelve bytes (on RISC systems) in size, and now you want to add an additional 16 bytes? [He asks, working from a system with a few gigabytes of RAM (Random Access Memory) -Editor] [Shut up! -Sean]. Also, in my nearly 30 years of working with computers, I've yet to come across a non-ASCII based computer system.
Yes, there are a few. Baudot code [2] is perhaps the oldest and perhaps the oddest one. Then there are the 6-bit character encoding schemes [3] and Radix-50 [4], which pack multiple 6-bit characters per “word” of storage (where a “word” could be 16, 18, 32, 36, 60 or 66 bits in size) and varied from system to system. And let's not forget EBCDIC (Extended Binary Coded Decimal Interchange Code) [5], one of about six nearly identical, but maddeningly different, encoding schemes developed by IBM [6]. All of these were developed for machines in the 60s, but ASCII won out in the end, being the most widely used and at the core of Unicode [7].
So I asked on a mailing list of classic computer enthusiasts:
From: Sean Conner <spc@conman.org>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 11:21:02 -0500
> A friend recently raised an issue with some code I wrote (a hex dump routine) saying it depended upon ASCII and thus would break on non-ASCII based systems (and proposed a solution, but that's beside the issue here). I wrote back, saying the code in question was non-portable to begin with (since it depended upon read() and write(), being targeted at Posix based systems) and besides, I've never encountered a non-ASCII system in the nearly 30 years I've been using computers.
So now I'm wondering—besides Baudot, 6-bit BCD (Binary Coded Decimal) and EBCDIC (Extended Binary Coded Decimal Interchange Code), is there any other encoding scheme used? And of Baudot, 6-bit BCD and EBCDIC, are there any systems using those encoding schemes AND have a C compiler available?
-spc (Or can I safely assume ASCII and derivatives these days?)
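If the answer to that last question turns out to be "yes", the assumption can at least be written down where the compiler will check it. The following is only a sketch in C89: the typedef name `assume_ascii_hex_digits` and the function `hex_digit_arith` are made up for illustration and are not in the program under discussion.

```c
#include <assert.h>

/* C89 compile-time check: the array size becomes -1 (a compile error)
 * unless the execution character set has the ASCII values that the
 * "+ '0', then += 7" conversion relies on. */
typedef char assume_ascii_hex_digits[
  (('0' == 0x30) && ('9' == 0x39) && ('A' == 0x41) && ('F' == 0x46)) ? 1 : -1
];

/* Hypothetical helper using the arithmetic conversion; only meaningful
 * when the check above compiles. */
char hex_digit_arith(unsigned int nibble)
{
  char c;

  assert(nibble < 16);
  c = (char)(nibble + '0');
  if (c > '9') c += 7;    /* in ASCII, 'A' is 8 past '9', hence 1 + 7 */
  return c;
}
```

On an ASCII system this compiles and costs nothing at run time; on anything else it should stop the build rather than print wrong digits.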
I figured that if anyone knew the answer, these people would (many of them not only use computers like the PDP-10 [8], but use them as heaters during the winter months).
The answers were fascinating.
From: "Shoppa, Tim" <XXXXXXXXXXXXXXXXX>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: Re: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 13:18:55 -0500
> IBM has a very handy page on C compatibility with EBCDIC system services:
http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html [9]
From: "Dave" <XXXXXXXXXXXXXXXXXXXX>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: RE: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 19:33:06 -0000
> Please consider other character codes. An EBCDIC port of GCC is alive and well on several of the "legacy" operating systems (MVS, VM and Music) that run on the Hercules IBM 360/370/XA/390/z emulator. And whilst zLinux runs in ASCII (or whatever it uses to get more than 256 points in a code page) many zLinux sites also have the zVM hypervisor, which includes an optional EBCDIC C compiler. Having ported the BREXX interpreter to this environment I was stung by the fact that the original author had made assumptions about character ordering that are not true on an EBCDIC platform.
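To see how an ordering assumption like the one Dave describes bites the hex conversion quoted at the top, here is a small sketch that runs the same arithmetic with the code points passed in, so EBCDIC can be simulated on an ASCII machine. The function `arith_digit` and the macro names are hypothetical; the code point values are the standard ones for '0', '9' and 'A' in ASCII and EBCDIC.

```c
#include <assert.h>

/* The three code points the arithmetic conversion depends on. */
#define ASCII_ZERO   0x30
#define ASCII_NINE   0x39
#define ASCII_A      0x41
#define EBCDIC_ZERO  0xF0
#define EBCDIC_NINE  0xF9
#define EBCDIC_A     0xC1

/* The "+ '0', then += 7" conversion, with the codes for '0' and '9'
 * supplied by the caller instead of taken from the host character set. */
int arith_digit(unsigned int nibble,int zero,int nine)
{
  int c;

  assert(nibble < 16);
  c = zero + (int)nibble;
  if (c > nine)
    c += 7;
  return c;
}
```

With the ASCII values, arith_digit(10,ASCII_ZERO,ASCII_NINE) comes out as 0x41, which is 'A'. With the EBCDIC values the same call yields 0x101, which is neither EBCDIC_A (0xC1) nor something that fits in an 8-bit char, because in EBCDIC the letters sit below the digits rather than just above them.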
From: Phil Budne <XXXXXXXXXXXXXXXXX>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: Re: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 13:00:52 -0500
> See “IBM libascii functions for z/OS UNIX System Services”
http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html [10]
Overview
The libascii functions are integrated into the base of the Language Environment. They help you port ASCII-based C applications to the EBCDIC-based z/OS UNIX environment.
From: Nemo <XXXXXXXXXXXXXXXX>
To: Classic Computer Talk <XXXXXXXXXXXXXXXXXXXXX>
Subject: Re: C compilers and non-ASCII systems
Date: Tue, 31 Jan 2012 13:32:06 -0500
> z/OS is not only POSIX, it is UNIX (see http://www.opengroup.org/openbrand/register/brand3470.htm [11]).
Oh.
Well then …
I figured I would then try Mark's suggestion (several other people on the mailing list suggested the same thing) and at least time the change to see if it's worthwhile for such odd-looking, but legal, C code.
```
/*************************************************************************
*
* Copyright 2012 by Sean Conner. All Rights Reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
* Comments, questions and criticisms can be sent to: sean@conman.org
*
*************************************************************************/
/* Style: C89, const correctness, assertive, system calls, full buffering */
/* lookup table */
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define LINESIZE 16
/********************************************************************/
extern const char *sys_errlist[];
extern int sys_nerr;
static void   do_dump   (const int,const int);
static size_t dump_line (char **const,unsigned char *,size_t,const unsigned long);
static void   hexout    (char *const,unsigned long,size_t,const int);
static void   myperror  (const char *const);
static size_t myread    (const int,char *,size_t);
static void   mywrite   (const int,const char *const,const size_t);
/********************************************************************/
int main(const int argc,const char *const argv[])
{
  if (argc == 1)
    do_dump(STDIN_FILENO,STDOUT_FILENO);
  else
  {
    int i;

    for (i = 1 ; i < argc ; i++)
    {
      int fhin;

      fhin = open(argv[i],O_RDONLY);
      if (fhin == -1)
      {
        myperror(argv[i]);
        continue;
      }

      mywrite(STDOUT_FILENO,"-----",5);
      mywrite(STDOUT_FILENO,argv[i],strlen(argv[i]));
      mywrite(STDOUT_FILENO,"-----\n",6);

      do_dump(fhin,STDOUT_FILENO);

      if (close(fhin) < 0)
        myperror(argv[i]);
    }
  }

  return EXIT_SUCCESS;
}
/************************************************************************/
static void do_dump(const int fhin,const int fhout)
{
  unsigned char buffer[4096];
  char outbuffer[75 * 109];
  char *pout;
  unsigned long off;
  size_t bytes;
  size_t count;

  assert(fhin >= 0);
  assert(fhout >= 0);

  memset(outbuffer,' ',sizeof(outbuffer));
  off = 0;
  count = 0;
  pout = outbuffer;

  while((bytes = myread(fhin,(char *)buffer,sizeof(buffer))) > 0)
  {
    unsigned char *p = buffer;

    for (p = buffer ; bytes > 0 ; )
    {
      size_t amount;

      amount = dump_line(&pout,p,bytes,off);
      p += amount;
      bytes -= amount;
      off += amount;
      count++;

      if (count == 109)
      {
        mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
        memset(outbuffer,' ',sizeof(outbuffer));
        count = 0;
        pout = outbuffer;
      }
    }
  }

  if ((size_t)(pout - outbuffer) > 0)
    mywrite(fhout,outbuffer,(size_t)(pout - outbuffer));
}
/********************************************************************/
static size_t dump_line(
  char **const pline,
  unsigned char *p,
  size_t bytes,
  const unsigned long off
)
{
  char *line;
  char *dh;
  char *da;
  size_t count;

  assert(pline != NULL);
  assert(*pline != NULL);
  assert(p != NULL);
  assert(bytes > 0);

  line = *pline;
  hexout(line,off,8,':');

  if (bytes > LINESIZE)
    bytes = LINESIZE;

  p += bytes;
  dh = &line[10 + bytes * 3];
  da = &line[58 + bytes];

  for (count = 0 ; count < bytes ; count++)
  {
    p --;
    da --;
    dh -= 3;

    if ((*p >= ' ') && (*p <= '~'))
      *da = *p;
    else
      *da = '.';

    hexout(dh,(unsigned long)*p,2,' ');
  }

  line[58 + count] = '\n';
  *pline = &line[59 + count];
  return count;
}
/**********************************************************************/
static void hexout(char *const dest,unsigned long value,size_t size,const int padding)
{
  assert(dest != NULL);
  assert(size > 0);
  assert((padding >= ' ') && (padding <= '~'));

  dest[size] = padding;
  while(size--)
  {
    dest[size] = "0123456789ABCDEF"[value & 0x0f];
    value >>= 4;
  }
}
/************************************************************************/
static void myperror(const char *const s)
{
  int err = errno;

  assert(s != NULL);

  mywrite(STDERR_FILENO,s,strlen(s));
  mywrite(STDERR_FILENO,": ",2);

  /* valid indices into sys_errlist[] are 0 .. sys_nerr - 1 */
  if ((err < 0) || (err >= sys_nerr))
    mywrite(STDERR_FILENO,"(unknown)",9);
  else
    mywrite(STDERR_FILENO,sys_errlist[err],strlen(sys_errlist[err]));
  mywrite(STDERR_FILENO,"\n",1);
}
/************************************************************************/
static size_t myread(const int fh,char *buf,size_t size)
{
  size_t amount = 0;

  assert(fh >= 0);
  assert(buf != NULL);
  assert(size > 0);

  while(size > 0)
  {
    ssize_t bytes;

    bytes = read(fh,buf,size);
    if (bytes < 0)
    {
      myperror("read()");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0)
      break;

    amount += bytes;
    size -= bytes;
    buf += bytes;
  }

  return amount;
}
/*********************************************************************/
static void mywrite(const int fh,const char *const msg,const size_t size)
{
  assert(fh >= 0);
  assert(msg != NULL);
  assert(size > 0);

  if (write(fh,msg,size) < (ssize_t)size)
  {
    if (fh != STDERR_FILENO)
      myperror("output");
    exit(EXIT_FAILURE);
  }
}
/***********************************************************************/
```
It can't be that much faster, can it?
```
[spc]lucy:~/projects/99/src>time ./22 ~/bin/firefox/libxul.so >/dev/null
real 0m0.468s
user 0m0.450s
sys 0m0.018s
[spc]lucy:~/projects/99/src>time ./23 ~/bin/firefox/libxul.so >/dev/null
real 0m0.257s
user 0m0.245s
sys 0m0.012s
```
Almost twice as fast as what I thought was the fastest version already.
Ouch.
Several people (including Mark) mentioned that on modern CPUs, a branch instruction is like hitting a brick wall.
Yes, it's quite apparent that that is true.
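For anyone who wants to poke at the branch cost in isolation, a rough harness along these lines might do. It is a sketch, not how the timings above were taken (those came from running the whole program under time); the names `digit_branch`, `digit_table` and `time_digits` are invented here, and `digit_branch` assumes ASCII just as the earlier hexout() did.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Branching conversion, as in the earlier hexout() (ASCII only). */
char digit_branch(unsigned int nibble)
{
  char c = (char)((nibble & 0x0F) + '0');
  if (c > '9') c += 7;
  return c;
}

/* Table lookup, as in the revised hexout(). */
char digit_table(unsigned int nibble)
{
  return "0123456789ABCDEF"[nibble & 0x0F];
}

/* Hypothetical harness: convert both nibbles of every byte in data,
 * rounds times over, and return the CPU seconds used.  The checksum
 * leaves through *sink so the compiler cannot discard the loop. */
double time_digits(char (*conv)(unsigned int),const unsigned char *data,
                   size_t len,unsigned int rounds,unsigned long *sink)
{
  unsigned long sum = 0;
  unsigned int r;
  size_t i;
  clock_t start;

  assert(conv != NULL);
  assert(data != NULL);
  assert(sink != NULL);

  start = clock();
  for (r = 0 ; r < rounds ; r++)
  {
    for (i = 0 ; i < len ; i++)
    {
      sum += (unsigned long)conv((unsigned int)data[i] >> 4);
      sum += (unsigned long)conv((unsigned int)data[i]);
    }
  }
  *sink = sum;
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

How big the gap is will depend on the input (a branch that always goes the same way is predicted well, so all-zero data may show little difference) and on the compiler, which may turn the if into a conditional move or inline both functions differently. Treat any numbers it produces as a hint, not a measurement.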
But this does give me an idea for removing one more [DELETED-brick wall-DELETED] branch point …
[1] http://en.wikipedia.org/wiki/POSIX
[2] http://en.wikipedia.org/wiki/Baudot_code
[3] http://en.wikipedia.org/wiki/BCD_(6-bit)
[4] http://en.wikipedia.org/wiki/DEC_Radix-50
[5] http://en.wikipedia.org/wiki/EBCDIC
[8] http://www.columbia.edu/cu/computinghistory/pdp10.html
[9] http://www-03.ibm.com/systems/z/os/zos/features/unix/bpxa1p03.html
[10] http://www-03.ibm.com/systems/z/os/zos/features/unix/libascii.html
[11] http://www.opengroup.org/openbrand/register/brand3470.htm