I spent today playing around with coroutines [1] and how I might go about implementing them. My feeling is that with the upcoming push for multiprocessor systems [2], using some form of coroutines might make better use of such power, while at the same time being a bit easier to wrap the brain around instead of dealing with actual multithreaded programming.
My “ideal sample program,” which is a pseudo-C like language, looked like:
>
```
int main(string argv[])
{
file input = file.stdin;
file output = file.stdout;
switch(argv.items)
{
case 3: output = file.open.write(argv[2]);
case 2: input = file.open.read(argv[1]);
default: break;
}
input => conversion(toupper)
=> strip_html
=> rot(13)
=> conversion(tolower)
=> rot(13)
=> output;
exit(EXIT_SUCCESS);
}
/********************************************/
coroutine void conversion(char (*conv)(char))
receive char c;
send char;
{
while(receive c)
send conv(c);
}
/*********************************************/
coroutine void strip_html
receive char c;
send char;
{
char c;
bool in_tag = false;
while(receive c)
{
if (c == '<')
in_tag = true;
else if (c == '>')
in_tag = false;
else
{
if (!in_tag)
send c;
}
}
}
/******************************************/
coroutine void rot(int bias)
receive char c;
send char;
{
while(receive c)
{
if (c.isupper)
{
c += bias;
if (c > 'Z') c -= 26;
}
else if (c.islower)
{
c += bias;
if (c > 'z') c -= 26;
}
send c;
}
}
```
Why yes, the code in main() is reminiscent of the Unix command line. That's the intent. And on a uniprocessor machine, I suppose the code generated would be such that control would pass from one routine to another via some form of interprocedural goto (but hidden, much like the gotos are hidden in a while loop). And on a multiprocessor (or multicore) machine, you could create cheap threads and pipe the data between each coroutine.
And that's what my first prototype version did. This prototype is about three times longer, and uses pipes to transfer the data among the five threads that were created.
>
```
/*----------------------------------------------
; error checking removed to keep this example
; short. It's already getting a bit silly
; in length, don't you think?
;---------------------------------------------*/
rc = pipe(p1);
rc = pipe(p2);
rc = pipe(p3);
rc = pipe(p4);
/*---------------------------------------------------
; the following is so I can avoid doing single byte
; calls to read() and write(), which will definitely
; destroy any performance gains we might get on a
; multiprocessor system. By wrapping these pipes in
; FILE *s, I automagically get buffering.
;
; Also, these structures contain the local data for
; each of the "coroutines" in this program,
; since I can only pass a single pointer to each.
;---------------------------------------------------*/
t1.fpin = stdin;
t1.fpout = fdopen(p1[1],"w");
t2.fpin = fdopen(p1[0],"r");
t2.fpout = fdopen(p2[1],"w");
t3.fpin = fdopen(p2[0],"r");
t3.fpout = fdopen(p3[1],"w");
t4.fpin = fdopen(p3[0],"r");
t4.fpout = fdopen(p4[1],"w");
t5.fpin = fdopen(p4[0],"r");
t5.fpout = stdout;
rc = clone(conversion,&m_stack1[1022],FLAGS,&t1);
rc = clone(strip_html,&m_stack2[1022],FLAGS,&t2);
rc = clone(rot, &m_stack3[1022],FLAGS,&t3);
rc = clone(conversion,&m_stack4[1022],FLAGS,&t4);
rc = clone(rot, &m_stack5[1022],FLAGS,&t5);
while(1)
{
child = waitpid(0,&status,0);
if (child == -1)
{
if (errno == ECHILD) break;
perror("waitpid()");
}
}
```
Not only that, but there're plenty of places where things can break, although in my limited testing, things worked with very minimal error checking. I didn't bother to do any testing, this was more or less a “proof-of-concept” type thing.
I then thought of doing a second implemention, this time doing away with the pipes and seeing if I could somehow pass data directly between the “coroutines.” It required a bit of thought, and the solution I cooked up almost works (and I'll get to that in a bit).
The second version looks a bit like the first, only a bit less overhead since I don't have to create all the pipes:
>
```
/*-------------------------------------
; two extra threads are required for
; this version. One to feed the file
; into the chain of coroutines, and one
; to accept the final data for the
; output file.
;--------------------------------------*/
t0.next = (struct parms_common *)&t1;
t1.next = (struct parms_common *)&t2;
t2.next = (struct parms_common *)&t3;
t3.next = (struct parms_common *)&t4;
t4.next = (struct parms_common *)&t5;
t5.next = (struct parms_common *)&t6;
t6.next = (struct parms_common *)&t0;
t0.pid = clone(INTERNA_input, &m_stack0[1022],FLAGS,&t0);
t1.pid = clone(conversion, &m_stack1[1022],FLAGS,&t1);
t2.pid = clone(strip_html, &m_stack2[1022],FLAGS,&t2);
t3.pid = clone(rot, &m_stack3[1022],FLAGS,&t3);
t4.pid = clone(conversion, &m_stack4[1022],FLAGS,&t4);
t5.pid = clone(rot, &m_stack5[1022],FLAGS,&t5);
t6.pid = clone(INTERNAL_output,&m_stack6[1022],FLAGS,&t6);
```
All the coroutines are linked together, and for the actual implementation, they all pause() (which stops the thread until a signal is received), and upon receiving a signal, process the data given to it (through a fixed location—you'll see in a second) and then signal the “coroutine” next in line before calling pause(). For instance, the conversion “coroutine”:
>
```
int conversion(void *arg)
{
struct parms_conversion *parm = arg;
int c;
while(1)
{
pause();
c = parm->c;
/*--------------------------------------
; convert the data, and dump into the
; next coroutine's data block, then
; signal it that it can process the data.
;----------------------------------------*/
parm->next->c = (*parm->conv)(c);
kill(parm->next->pid,SIGCONT);
/*-----------------------------------
; if we got an end-of-file marker,
; stop this thread.
;----------------------------------*/
if (c == EOF)
break;
}
_exit(EXIT_SUCCESS);
}
```
And it worked.
Mostly.
I was able to track the problem down to the stip_html() “coroutine”:
>
```
int strip_html(void *arg)
{
struct parms_strip_html *parm = arg;
int c;
int in_tag = 0;
while(1)
{
pause();
c = parm->c;
if (c == '<') { in_tag = 1; continue; }
if (c == '>') { in_tag = 0; continue; }
if (!in_tag)
{
parm->next->c = c;
kill(parm->next->pid,SIGCONT);
}
if (c == EOF) break;
}
_exit(EXIT_SUCCESS);
}
```
It doesn't always trigger the next “coroutine” to run, which ultimately causes everything to grind to a halt. While I can hack this to work, it ultimately shows a rather critical failure on this method of implementing coroutines, in that any coroutine that doesn't send all the data it received further down the line will stop the entire chain.
And issues aside, it was about four times the code as the “ideal program.”
If anything, doing this exercise showed me two things. One, it's probably best to use pipes for “coroutines” under Unix. And two, the overhead in using “coroutines” is a bit frightening to do by hand. And it's frightening to think of all the bits that could fail. But is it worth worrying about? In Assembly it's easy to detect if some mathematical operation overflows (in fact, I haven't come across a CPU (Central Processing Unit) that doesn't detect overflow), but it's something I don't really concern myself with in C. Heck, in the first “proof-of-concept” I went with using fgetc() and fputc() which simply return EOF when anything bad happens. Sure, it simplifies the code, but it's hard to determine what went wrong when it goes wrong. Which is why I'm concerned: does abstraction scale [3]?
More thoughts (now surfacing after writing all this): are coroutines really all that useful? I can see using them in mod_blog to simplify some code (mainly, formatting of entries) but in general, not much of the code I've written lends itself really to flow by coroutine. Perhaps it's that I don't have that particular tool in my toolchest and I've learned to work around it.
More pondering is required.
[3] http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?DoesAbstractionScale