Assembly

Some random pointers on assembly. I mostly use NASM, which can usually be found in a ports or package system, on AMD64 systems that run some sort of unix, OpenBSD in particular. With QEMU or some other emulator (or suitable physical hardware of a different type) one could target something else.

Learning some assembly can be educational, though in 2023 it may not be a practical skill: assembly is not portable (the Linux kernel is only two percent assembly) and generally lacks libraries, so you must either use something someone else has written, call into libc, or re-implement some amount of libc (or libm and so forth) yourself.

On OpenBSD the nasm, bvi, and rizin packages may be interesting, possibly along with qemu if you want to run qemu-system-i386 or other archs. A hexdumper may also be handy; for that I usually resort to od(1) or hexdump(1) though there are others, or you could write a suitable program in a suitable language that shows you the needful.
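For the write-your-own route, here is a minimal sketch in C of a `hexdump -C` style formatter. The names hexdump_line and hexdump are made up for illustration, and unlike hexdump(1) this does not collapse repeated lines into a `*` or print the final offset.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format up to 16 bytes at p (n of them valid) into out, laid out like one
 * line of `hexdump -C`: offset, two groups of eight hex bytes, then the
 * printable characters between bars. out needs at least 80 bytes.
 * Returns the length of the formatted line.
 */
int hexdump_line(const unsigned char *p, size_t n, unsigned long offset, char *out)
{
    int pos = sprintf(out, "%08lx  ", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i == 8)
            out[pos++] = ' ';                 /* gap between the two groups */
        if (i < n) {
            pos += sprintf(out + pos, "%02x ", (unsigned)p[i]);
        } else {
            memcpy(out + pos, "   ", 3);      /* pad a short final line */
            pos += 3;
        }
    }
    out[pos++] = ' ';
    out[pos++] = '|';
    for (size_t i = 0; i < n && i < 16; i++)
        out[pos++] = isprint(p[i]) ? (char)p[i] : '.';
    out[pos++] = '|';
    out[pos] = '\0';
    return pos;
}

/* Dump a whole buffer to stdout, 16 bytes per line. */
void hexdump(const unsigned char *p, size_t len)
{
    char line[80];
    for (size_t off = 0; off < len; off += 16) {
        size_t n = len - off < 16 ? len - off : 16;
        hexdump_line(p + off, n, (unsigned long)off, line);
        puts(line);
    }
}
```

Feeding it the eight bytes at 0x318 from the section header example below produces the same line that `hexdump -C` prints there.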

how to run a slab of assembly code within a C program

Hello World

This can be tricky to get right, especially on OpenBSD. The comments at the top show how to link with either linker, the default ld or ld.bfd.

    ; AMD64 OpenBSD 7.4 NASM 2.16.01
    ;   nasm -f elf64 helloworld.asm -o helloworld.o
    ;   ld -nopie -static helloworld.o -o hw.ld
    ;   ld.bfd -m elf_x86_64_obsd -nopie helloworld.o -o hw.bfd

    BITS 64

    %define stdout    1
    %define sys_exit  1
    %define sys_write 4

    section .note.openbsd.ident note
        align 2
        dd 8, 4, 1
        db 'OpenBSD',0
        dd 0
        align 2

    section .data
        msg  db "Hello world!",10
        .len equ $-msg

    section .text
    global  _start
    _start:
        mov rax,sys_write
        mov rdi,stdout
        mov rsi,msg
        mov rdx,msg.len
        syscall

        mov rax,sys_exit
        xor rdi,rdi
        syscall

helloworld.asm

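The trailing "note" on the section line, a recent NASM addition, sets the section type to SHT_NOTE, which is 7. This can be confirmed by locating the section's entry in the section header table and reading its type word: each ELF64 section header is 64 bytes, so entry 1 starts 64 bytes past the table offset, and sh_type is the second 32-bit word of the entry, right after sh_name.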

    $ nasm -f elf64 helloworld.asm -o helloworld.o
    $ ld -nopie -static helloworld.o -o hello
    $ readelf -S hello | grep NOTE
      [ 1] .note.openbsd.ide NOTE             0000000000200190  00000190
    $ grep SHT_NOTE /usr/include/sys/exec_elf.h
    #define SHT_NOTE                7       /* note section */
    $ readelf -S hello | sed 1q
    There are 8 section headers, starting at offset 0x2d8:
    $ hexdump -s $(( 0x2d8 + 64 )) -n 8 -C hello
    00000318  01 00 00 00 07 00 00 00                           |........|
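The same lookup can be done in C without relying on <elf.h> or <sys/exec_elf.h>. This sketch, with the made-up function name elf64_section_type, assumes a little-endian ELF64 file already read into memory and uses the fixed ELF64 header offsets (e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian loads that work the same on any host byte order. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t le64(const unsigned char *p)
{
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/*
 * Return sh_type of section idx in an ELF64 little-endian image of len
 * bytes, or -1 if the image is not ELF64 LSB or the entry is out of range.
 */
int64_t elf64_section_type(const unsigned char *img, size_t len, unsigned idx)
{
    if (len < 64 || memcmp(img, "\177ELF", 4) != 0)
        return -1;
    if (img[4] != 2 || img[5] != 1)        /* ELFCLASS64, ELFDATA2LSB */
        return -1;
    uint64_t shoff = le64(img + 0x28);     /* e_shoff */
    uint16_t shentsize = le16(img + 0x3A); /* e_shentsize, 64 for ELF64 */
    uint16_t shnum = le16(img + 0x3C);     /* e_shnum */
    if (shoff >= len || idx >= shnum || shentsize < 8)
        return -1;
    /* sh_type follows the 4-byte sh_name at the start of the entry */
    uint64_t at = shoff + (uint64_t)idx * shentsize + 4;
    if (at > len - 4)
        return -1;
    return (int64_t)le32(img + at);
}
```

To check the hello binary, read the whole file into a buffer (fopen and fread will do) and ask for section 1; it should report 7, agreeing with the hexdump above.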

Disassembly

NASM ships with ndisasm. Be sure that its -b setting agrees with the "BITS" directive used when assembling.

If you build with the NASM elf64 output format, use `objdump -M intel -D` to disassemble the file. There are other syntaxes: NASM uses Intel syntax, while objdump defaults to AT&T unless told otherwise. ELF adds a lot of metadata, so if all you are interested in is the raw machine code it may be simpler to use NASM's default bin format and disassemble that with ndisasm.

Relative Addresses

The displacement of the x86 relative CALL (opcode E8) is measured from the address just after the CALL instruction to the destination. Here we call "mysub", so the encoded value is the address of mysub minus the address of the instruction that follows the CALL.

    BITS BW
    myadd:
        add bl,al
        ret
    mysub:
        sub bl,al
        ret
    times PADBY db 0
        mov al,42
        mov bl,12
        call mysub

calc.asm

    #!/bin/sh
    bits=${1:-16}
    width() {
        nasm -dBW="$bits" -dPADBY="$1" calc.asm -o cal"$1".o
        echo "WIDTH $1"
        hexdump -x cal"$1".o
        echo
        ndisasm -b "$bits" cal"$1".o
        echo
    }
    width 4
    width 8

runcalc.sh

    $ sh ./runcalc.sh | grep call
    0000000E  E8F2FF            call 0x3
    00000012  E8EEFF            call 0x3
    $ echo $((0xE + 2 - 0x3))
    13
    $ printf '%x\n' $((0xFFFF - 13))
    fff2
    $ echo $((0x12 + 2 - 0x3))
    17
    $ printf '%x\n' $((0xFFFF - 17))
    ffee

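In 16-bit code the CALL is three bytes (E8 plus a 16-bit displacement), so the displacement counts from the CALL address plus 3: 0x3 - (0xE + 3) = -14, which as a 16-bit two's complement value is 0x10000 - 14 = 0xFFF2. Subtracting from 0xFFFF instead of 0x10000 comes out one smaller, hence the plus 2 in the arithmetic above. NASM helpfully calculates all this for you.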
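A minimal C sketch of the same arithmetic. encode_call_rel is a hypothetical helper, not part of NASM, and it assumes a near CALL with no operand-size prefix: 3 bytes (E8 rel16) in 16-bit code, 5 bytes (E8 rel32) in 32- and 64-bit code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode a relative CALL located at address `at` that targets `dest`.
 * The displacement is dest minus the address of the next instruction,
 * stored little-endian after the E8 opcode. Returns the instruction
 * length, or 0 if a rel32 displacement cannot reach the target.
 */
size_t encode_call_rel(uint64_t at, uint64_t dest, int bits, unsigned char *out)
{
    out[0] = 0xE8;
    if (bits == 16) {
        /* IP wraps at 64 KiB, so truncating to 16 bits is enough */
        uint16_t rel = (uint16_t)(dest - (at + 3));
        out[1] = (unsigned char)(rel & 0xFF);   /* low byte first */
        out[2] = (unsigned char)(rel >> 8);
        return 3;
    }
    /* assumes addresses below 2^63 so the signed difference is exact */
    int64_t rel = (int64_t)dest - (int64_t)(at + 5);
    if (rel < INT32_MIN || rel > INT32_MAX)
        return 0;
    uint32_t u = (uint32_t)rel;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (unsigned char)(u >> (8 * i));
    return 5;
}
```

For the first case above, encode_call_rel(0xE, 0x3, 16, buf) fills buf with E8 F2 FF, the bytes ndisasm showed.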

x86 is little-endian, so 16-bit words may be shown byte-swapped depending on the tool: hexdump -x prints 16-bit words, while ndisasm shows the bytes in file order. Compare 0cb3 in the hexdump with B30C (mov bl,12) in the disassembly, and similar.

Endian

There are at least three byte orders (endians): little, big, and PDP, though you may not see PDP (3412 in the numbering below, or 2143 when counting from the most significant byte) in the wild.

    $ grep _ENDIAN /usr/include/sys/_endian.h | grep '#def'
    #define _SYS__ENDIAN_H_
    #define __FROM_SYS__ENDIAN
    #define _LITTLE_ENDIAN  1234
    #define _BIG_ENDIAN     4321
    #define _PDP_ENDIAN     3412
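A runtime check in C that reports the host's order using the same numbers as the header. host_byte_order is a hypothetical name; the sketch assumes uint32_t exists, as it does on the systems discussed here.

```c
#include <stdint.h>
#include <string.h>

/*
 * Report the host byte order as 1234 (little), 4321 (big) or 3412 (PDP).
 * Each byte of the probe holds its own significance (1 = least
 * significant), so reading the bytes back in address order spells out
 * the order directly.
 */
int host_byte_order(void)
{
    const uint32_t probe = 0x04030201;
    unsigned char b[sizeof probe];
    memcpy(b, &probe, sizeof b);
    return b[0] * 1000 + b[1] * 100 + b[2] * 10 + b[3];
}
```

On AMD64 this returns 1234.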

If values show up in the wrong byte order and you're in GDB for some reason, `set endian big` (or `little`, or `auto` to return to the default) changes the byte order GDB assumes on the fly. Commented-out lines in ~/.gdbinit can document handy but seldom-used settings; "layout regs", for example, is useful, but you may not want it on by default.

htonl(3) documents routines that convert 16- and 32-bit values between host and network byte order, or you could write a little script to reverse the bytes. One might better recognize 2130706433 if it were put into some other form.

    $ perl -e 'print pack "V", 2130706433' | hexdump -C
    00000000  01 00 00 7f                                       |....|
    00000004
    $ perl -e 'print pack "N", 2130706433' | od -t d1
    0000000  127   0   0   1
    0000004
    $ perl -E 'say 127 << 24 | 1'
    2130706433

Yep, IPv4 localhost. Or one of them: the entire 127.0.0.0/8 block was given over to loopback traffic. Whoops? Network protocols, Internet standards especially, are typically big-endian, while x86 and its AMD64 extension are little-endian.
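A C sketch of what Perl's pack "V" and "N" did above, plus a plain byte reversal. The function names are made up for illustration; shifting the value rather than poking at memory keeps the results the same on any host.

```c
#include <stdint.h>

/* Like Perl's pack "V": 4 bytes, least significant first. */
void pack_le32(uint32_t x, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(x >> (8 * i));
}

/* Like Perl's pack "N": 4 bytes, most significant first (network order). */
void pack_be32(uint32_t x, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(x >> (24 - 8 * i));
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}
```

pack_be32(2130706433, b) yields 127 0 0 1, the recognizable dotted form. On a little-endian host htonl(x) from <arpa/inet.h> gives the same result as swap_bytes32(x); on a big-endian host it returns x unchanged.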

External Links

https://jasper.la/posts/nasm-on-openbsd/

αcτµαlly pδrταblε εxεcµταblε