Tumbling through code, part III

I was going through my logs (I've been on vacation [1] for the past two weeks) and I noticed a few crashes of mod_blog [2]. It was easy enough to determine that a call to assert() was the culprit (the clue is in the stack trace):

CRASH(32421/000): pid=32421 signal='Aborted'
CRASH(32421/001): reason='Unspecified/untranslated error'
CRASH(32421/002): CS=B7EA0073 DS=007B ES=007B FS=0000 GS=0033
CRASH(32421/003): EIP=B7FE87A2 EFL=00000246 ESP=BFF9AE28 EBP=BFF9AE3C ESI=00007EA5 EDI=B7FAFFF4
CRASH(32421/004): EAX=00000000 EBX=00007EA5 ECX=00007EA5 EDX=00000006
CRASH(32421/005): UESP=BFF9AE28 TRAPNO=00000000 ERR=00000000
CRASH(32421/006): STACK DUMP
CRASH(32421/007):        BFF9AE28:  A5 07 EB B7 00 00 00 00 F4 FF FA B7 00 00 00 00
CRASH(32421/008):        BFF9AE38:  C0 86 E8 B7 6C AF F9 BF 09 22 EB B7 06 00 00 00
CRASH(32421/009):        BFF9AE48:  50 AE F9 BF 00 00 00 00 20 00 00 00 00 00 00 00
CRASH(32421/010):        BFF9AE58:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/011):        BFF9AE68:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/012):        BFF9AE78:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/013):        BFF9AE88:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/014):        BFF9AE98:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/015):        BFF9AEA8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/016):        BFF9AEB8:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH(32421/017):        BFF9AEC8:  00 00 00 00 00 00 00 00 C7 04 FB B7 C8 04 FB B7
CRASH(32421/018):        BFF9AED8:  F4 FF FA B7 C7 04 FB B7 80 04 FB B7 08 AF F9 BF
CRASH(32421/019):        BFF9AEE8:  28 85 CA 08 F4 FF FA B7 9F 70 EE B7 02 00 00 00
CRASH(32421/020):        BFF9AEF8:  C8 78 CA 08 4C 00 00 00 C8 78 CA 08 4C 00 00 00
CRASH(32421/021):        BFF9AF08:  44 AF F9 BF EC 72 EE B7 80 04 FB B7 C8 78 CA 08
CRASH(32421/022):        BFF9AF18:  4C 00 00 00 27 00 00 00 C7 04 FB B7 00 00 00 00
CRASH(32421/023): STACK TRACE
CRASH(32421/024):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi[0x805ccf0]
CRASH(32421/025):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi[0x805d46b]
CRASH(32421/026):        /lib/tls/libc.so.6[0xb7eb0890]
CRASH(32421/027):        /lib/tls/libc.so.6(abort+0xe9)[0xb7eb2209]

CRASH(32421/029):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi(max_monthday+0x5a)[0x80595a2]
CRASH(32421/030):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi(tumbler_new+0xbcb)[0x805aa5a]
CRASH(32421/031):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi[0x8057f19]
CRASH(32421/032):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi(main_cgi_get+0xbf)[0x8057c1a]
CRASH(32421/033):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi(main+0x99)[0x804cb8d]
CRASH(32421/034):        /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xb7e9dde3]
CRASH(32421/035):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi[0x804ca6d]
CRASH(32421/036): COMMAND LINE
CRASH(32421/037):        /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi
CRASH(32421/038): ENVIRONMENT
CRASH(32421/039):        REDIRECT_STATUS=200
CRASH(32421/040):        BLOG_CONFIG=/home/spc/web/sites/boston.conman.org/journal/boston.cnf
CRASH(32421/041):        HTTP_FROM=the.knowledge.ai@gmail.com
CRASH(32421/042):        HTTP_HOST=boston.conman.org
CRASH(32421/043):        HTTP_CONNECTION=Keep-Alive
CRASH(32421/044):        HTTP_USER_AGENT=The Knowledge AI
CRASH(32421/045):        HTTP_ACCEPT_ENCODING=gzip,deflate
CRASH(32421/046):        PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
CRASH(32421/047):        SERVER_SIGNATURE=<address>Apache/2.0.52 (CentOS) Server at boston.conman.org Port 80</address> 
CRASH(32421/048):        SERVER_SOFTWARE=Apache/2.0.52 (CentOS)
CRASH(32421/049):        SERVER_NAME=boston.conman.org
CRASH(32421/050):        SERVER_ADDR=66.252.224.242
CRASH(32421/051):        SERVER_PORT=80
CRASH(32421/052):        REMOTE_ADDR=64.62.252.174
CRASH(32421/053):        DOCUMENT_ROOT=/home/spc/web/sites/boston.conman.org/htdocs
CRASH(32421/054):        SERVER_ADMIN=sean@conman.org
CRASH(32421/055):        SCRIPT_FILENAME=/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi
CRASH(32421/056):        REMOTE_PORT=36622
CRASH(32421/057):        REDIRECT_URL=/2015/04-2015/
CRASH(32421/058):        GATEWAY_INTERFACE=CGI/1.1
CRASH(32421/059):        SERVER_PROTOCOL=HTTP/1.1
CRASH(32421/060):        REQUEST_METHOD=GET
CRASH(32421/061):        QUERY_STRING=
CRASH(32421/062):        REQUEST_URI=/2015/04-2015/
CRASH(32421/063):        SCRIPT_NAME=/boston.cgi
CRASH(32421/064):        PATH_INFO=/2015/04-2015/
CRASH(32421/065):        PATH_TRANSLATED=/home/spc/web/sites/boston.conman.org/htdocs/2015/04-2015/
CRASH(32421/066): DONE

The hard part was trying to figure out which of the three calls [3] to assert() was being triggered. Fortunately, there was enough information logged to reproduce the error (for the record, it was assert(month < 13)). Unfortunately, it has to do with the tumbler parsing code [4].
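To show why the crash report alone does not settle the question, here is a minimal sketch of a function guarded by three assert() calls. It is an assumed shape, not the real code (that is at [3]): the name max_monthday_sketch() and the first two preconditions are made up for illustration, and only assert(month < 13) is known from this post. Whichever check fails, the report above records only that abort() was reached from inside max_monthday().

```c
#include <assert.h>

/* Assumed shape only; see [3] for the real max_monthday(). */
int max_monthday_sketch(int year, int month)
{
  static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

  assert(year  >  0);  /* hypothetical precondition */
  assert(month >  0);  /* hypothetical precondition */
  assert(month < 13);  /* the check this post says fired */

  /* Gregorian leap year rule for February */
  if ((month == 2) && (year % 4 == 0) && ((year % 100 != 0) || (year % 400 == 0)))
    return 29;

  return days[month - 1];
}
```

All three failures end in the same abort() call, so the only way to tell them apart is to replay the logged request.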

One of the unique features of mod_blog is the “entry addressing scheme,” where you can address not only a single entry like 2018/10/14.1 [5] but a range of entries like 2000/08/10.2-15.5 [6]. In fact, the same code internally changes a reference like 2018/09 [7] to 2018/09/11.1-09/30.1 [8] (the first and last entry in the given month; it also works for days and years). When I wrote the code, I had a particular way of it working in mind, and the bug here comes from my inattention to detail in checking what I actually received.

When the code in question sees a request of the form “number / number - number,” it assumes that the number after the literal “-” is a month and not a year. “The Knowledge AI” program was making a request of 2015/04-2015, so max_monthday() was being given an invalid month (2015), which made assert(month < 13) false and triggered the crash. That I can fix.
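One way to make that kind of fix, sketched here under assumptions, is to range-check the month while parsing, so a bad request is rejected before anything like max_monthday() sees it. The struct month_range and parse_month_range() names are hypothetical and are not how mod_blog's tumbler parser is organized; the sketch only handles the single “year / month - month” form.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical type for illustration; not part of mod_blog. */
struct month_range
{
  int year;
  int first_month;
  int last_month;
};

/* Parse "YYYY/MM-MM", treating the number after '-' as a month.
   Returns false on malformed input, trailing characters, or a month
   outside 1..12, so the caller can report an error instead of
   tripping an assert() later on. */
bool parse_month_range(const char *text, struct month_range *out)
{
  int  year;
  int  first;
  int  last;
  char extra;

  /* Exactly three conversions means three numbers and nothing after them. */
  if (sscanf(text, "%d/%d-%d%c", &year, &first, &last, &extra) != 3)
    return false;

  if ((first < 1) || (first > 12) || (last < 1) || (last > 12))
    return false;

  out->year        = year;
  out->first_month = first;
  out->last_month  = last;
  return true;
}
```

With this check, a request like 2015/04-06 is accepted, while 2015/04-2015 is refused because 2015 is not a valid month.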

But I do question the programming of “The Knowledge AI” crawler. I don't have any links in that form, and I'm not aware of any links on other pages of that form (in fact, that particular feature of entry addressing is not used that often, even by me), so I have to wonder how it got a link like that. Does it try randomly generating links to see what it gets? Is it a bug in their code? It's inexplicable.

[1] /boston/2018/09/27-10/13

[2] https://github.com/spc476/mod_blog

[3] https://github.com/spc476/mod_blog/blob/3aa54424ab488bea5dc217448bfd154f78ae1e8c/src/timeutil.c#L31

[4] /boston/2015/07/20.1

[5] /boston/2018/10/14.1

[6] /boston/2000/08/10.2-15.5

[7] /boston/2018/09

[8] /boston/2018/09/11.1-09/30.1
