I fixed the problems that valgrind was complaining about [1] in the greylist daemon [2] (and I had hoped such problems would fix the problem, but alas, it doesn't seem to be the case, but I'm still testing), but the very nature of one of the complaints is interesting in and of itself.
A design pattern I've long used in code is the following:
>
```
/*--------------------------------------------
; the following is written to clarify
; the issue I'm writing about. This
; is not how I would normally write such
; code. It presents a simplified, yet
; realistic bit of code.
;
; And yes, this is the format I use for block
; comments in C.
;------------------------------------------*/
char buffer[BUFSIZ];
size_t i;
size_t j;
for (i = j = 0 ; i < BUFSIZ ; i++)
{
if (!isctrl(buffer[i]))
buffer[j++] = buffer[i];
}
```
Basically, I'm going through a character array, removing certain elements (in this example, control characters) and doing so in place, to conserve memory consumption. Yes, there are instances where I'm reading and writing the same value to the same location, but in the scheme of things, it's not a bad thing. And, at least for this example, the overhead of trying to avoid unnecessary copying of data overwhelms the amount of time it takes to just do the copy.
So when I was writing the code to clean up the tuple array in the greylist daemon, I naturally wrote it as:
>
```
for (i = j = 0 ; i < g_poolnum ; i++)
{
/* other code */
if (difftime(Tao,g_pool[i].atime) < c_timeout_gray)
{
g_pool[j++] = g_pool[i];
continue;
}
/* rest of loop */
}
```
It follows a successful pattern I've used plenty of times before. I saw nothing necessarily wrong with it, yet valgrind complained bitterly about this fragment of code. And I was a mildly surprised to see a call to memcpy() when I never explicitly called memcpy().
I just got bit with a leaky abstraction [3], and a rather insidious one at that.
Pre-ANSI C, I wouldn't have been able to write that code, since back then, C didn't have the concept of structure assignments, and thus, I would have had to explicitly call memcpy():
>
```
if (difftime(Tao,g_pool[i].atime) < c_timeout_gray)
{
memcpy(&g_pool[j++],&g_pool[i],sizeof(struct tuple));
continue;
}
```
But one of the advantages of working up the abstraction scale (so to speak) is that you can let the compiler take care of the grunt work for you. I mean, what if struct tuple was the size of an int? The overhead of calling memcpy() would swamp the actual act of copying the data. In fact, if struct tuple was a small multiple of an int in size, it might still not be worth the overhead of calling memcpy(). And the compiler is a better place to push such knowledge, since it can keep track of not only structure sizes, but the overhead of calling a function to copy memory and handle things accordingly.
So ANSI C allowed the compiler to handle structure assignment. And it can do a pretty fine job of it too. For instance, using a recent version of gcc [4] with the compiler options -O3 -fomit-frame-pointer (some heavy duty optimization), it compiled the following bit of code:
>
```
struct foo
{
int f;
};
struct foo A[256];
size_t i;
size_t j;
for (i = j = 0 ; i < 256 ; i++)
{
if (A[i].f == 56)
A[j++] = A[i];
}
```
to the following bit of x86 code (and frankly, I was surprised at the quality—and it's been translated from the alien AT&T syntax gcc uses to proper Intel syntax):
>
```
xor edx,edx
xor eax,eax
jmps .L6
.L4: inc eax
cmp eax,255
ja .L12
.L6: cmp [A + eax*4],56
jne .L4
inc eax
mov [A + edx*4],56 ; A[j].f = 56
inc edx ; j++
cmp eax,255
jbe .L6
.L12:
```
It didn't even copy the data since the compiler figured out it didn't need to. Even if we increased the size of the structure a bit:
>
```
struct foo
{
size_t f;
size_t f2;
char f3a;
char f3b;
char f3c;
char f3d;
short f4a;
short f4b;
};
struct foo A[256];
size_t i;
size_t j;
for (i = j = 0 ; i < 256 ; i++)
{
if (A[i].f == 56)
A[j++] = A[i];
}
```
gcc still has yet to call memcpy():
>
```
push ebx
xor edx,edx
xor ebx,ebx
mov ecx,255
jmps .L18
.L16: add edx,16
dec ecx
js .L23
.L18: cmp [A + edx],56
jne .L16
mov [A + ebx],56 ; A[j].f = 56
mov eax,[A + 4 + edx] ; A[j].f2 = A[i].f2
mov [A + 4 + ebx],eax
mov eax,[A + 8 + edx] ; A[j].f3(a-d) = A[i].f3(a-d)
mov [A + 8 + ebx],eax
mov eax,[A + 12 + edx] ; A[j].f4(a,b) = A[i].f4(a,b)
mov [A + 12 + ebx],eax
add edx,16
add ebx,16
dec ecx
jns .L18
.L23: pop ebx
```
It just copies the 16 bytes of the structure as one assignment (because of constant propagation) and three four-byte moves. It's not until the structure gets significantly large:
>
```
struct foo
{
size_t f;
char b1[124];
size_t s2;
char b2[124];
};
struct foo A[256];
size_t i;
size_t j;
for (i = j = 0 ; i < 256 ; i++)
{
if (A[i].f == 56)
A[j++] = A[i];
}
```
that the compiler switches to calling memcpy():
>
```
push edi
push esi
push ebx
xor esi,esi
mov edi,offset A
mov ebx,255
jmps .L40
.L38: add esi,256
dec ebx
js .L45
.L40: cmp [A + esi],56
jne .L38
push ecx ; ???
push 256
lea eax,[A + esi]
mov edx,edi
push eax
push edx
call memcpy
add edi,256
add esp,16
add esi,256
dec ebx
jns .L40
.L45: pop ebx
pop esi
pop edi
```
(The only anomaly in the code is the push ecx. The register isn't initialized, and memcpy() only takes three parameters, not four. My guess is that this “additional parameter” exists to keep the stack aligned along a cache line and it's this type of detail that compilers exist to keep track of. It's also interesting to note that when compiling for an 80486, gcc never bothers to call memcpy() and instead inlines the operation using REP MOVS.)
By now, you're probably asking yourself, “So? Where's the leaky abstraction?”
Well, the leaky abstraction comes in the call to memcpy(), which happened inherently.
## Synopsis
> > > >
> ```
> #include <string.h>
> void *memcpy(void *s1,void *s2,size_t n);
>
> ```
>
## Description
The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
## Returns
The memcpy function returns the value of s1.
“The C Standard” (emphasis added)
Well, how about that? I'm inadvertantly invoking undefined behavior in the greylist daemon! (actually, I was hoping this was the cause of the problem, but I don't think it is—sigh) Technically speaking, I don't see how there could be a problem when I'm copying a block of memory over itself, except for consuming some extra CPU (Central Processing Unit) time. But I would think that a compiler could see I was modifying an array of items, and either include a check to skip the operation if they overlapped completely, or switch to using memmove() (which allows the objects to overlap).
But such is the nature of working with high abstractions. When they leak, they leak!
I suppose I could have realized I was ultimately calling memcpy(), and that memcpy() has undefined semantics when the source and destination overlap, but I also expected the compiler to inline the code to copy the structure (much like gcc did when compiling on an 80486), not actually call memcpy()!
Sheesh.
[3] http://www.joelonsoftware.com/articles/LeakyAbstractions.html