💾 Archived View for thrig.me › software › assembly captured on 2023-12-28 at 16:39:49. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
Some random pointers on assembly. I mostly use NASM, which usually can be found in a ports or package system, as I'm mostly on AMD64 systems that run some sort of unix, OpenBSD in particular. With QEMU or some other emulator (or suitable physical hardware of a different type) one could target something else.
Learning some assembly can be educational, though in 2023 it may not be a practical skill: assembly is not portable (the Linux kernel is only two percent assembly) and generally lacks libraries, so you either need to use something that someone else has written, call over to libc, or re-implement some amount of libc (or libm and so forth) yourself.
On OpenBSD the nasm, bvi, and rizin packages may be interesting, possibly along with qemu if you want to run qemu-system-i386 or other archs. A hexdumper may also be handy; for that I usually resort to od(1) or hexdump(1) though there are others, or you could write a suitable program in a suitable language that shows you the needful.
how to run a slab of assembly code within a C program
A standalone "Hello world" can be tricky to get right, especially on OpenBSD. The comments at the top of the listing show how to link it with either ld or ld.bfd.
; AMD64 OpenBSD 7.4 NASM 2.16.01
; nasm -f elf64 helloworld.asm -o helloworld.o
; ld -nopie -static helloworld.o -o hw.ld
; ld.bfd -m elf_x86_64_obsd -nopie helloworld.o -o hw.bfd

BITS 64

%define stdout    1
%define sys_exit  1
%define sys_write 4

section .note.openbsd.ident note
    align 2
    dd 8, 4, 1
    db 'OpenBSD',0
    dd 0
    align 2

section .data
msg db "Hello world!",10
.len equ $-msg

section .text
global _start
_start:
    mov rax,sys_write
    mov rdi,stdout
    mov rsi,msg
    mov rdx,msg.len
    syscall

    mov rax,sys_exit
    xor rdi,rdi
    syscall
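The trailing "note" on the `section .note.openbsd.ident` line sets the correct section type, a recent innovation. The type is SHT_NOTE, value 0x7, and can be confirmed by finding the type word in the relevant section header entry. ELF64 section header entries are 64 bytes each, and the type is the second 32-bit word of an entry, so for section 1 it sits just past the start of the header table plus 64.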
$ nasm -f elf64 helloworld.asm -o helloworld.o
$ ld -nopie -static helloworld.o -o hello
$ readelf -S hello | grep NOTE
  [ 1] .note.openbsd.ide NOTE             0000000000200190  00000190
$ grep SHT_NOTE /usr/include/sys/exec_elf.h
#define SHT_NOTE        7       /* note section */
$ readelf -S hello | sed 1q
There are 8 section headers, starting at offset 0x2d8:
$ hexdump -s $(( 0x2d8 + 64 )) -n 8 -C hello
00000318  01 00 00 00 07 00 00 00                           |........|
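The lookup above can be scripted. What follows is a minimal sketch, not a replacement for readelf: it assumes a little-endian ELF64 file, where each section header entry is 64 bytes and the type is the 32-bit word at offset 4 of the entry. The function name sh_type and its argument order are made up for illustration.

```shell
#!/bin/sh
# sh_type FILE SHOFF INDEX
# Print the section type of section header INDEX as a decimal number.
# Assumes ELF64 (64-byte entries, type word 4 bytes into the entry)
# and a little-endian file; SHOFF is the offset that readelf -S reports.
sh_type() {
    off=$(( $2 + $3 * 64 + 4 ))
    # read four bytes as unsigned decimals, then join them little-endian
    set -- $(od -A n -t u1 -j "$off" -N 4 "$1")
    echo $(( $1 | ($2 << 8) | ($3 << 16) | ($4 << 24) ))
}

# e.g. sh_type hello 0x2d8 1
```

For the hello binary above, with its headers at 0x2d8, `sh_type hello 0x2d8 1` should print 7, going by the hexdump output.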
NASM ships with a disassembler, ndisasm. Be sure that the bits given to its -b flag agree with the "BITS" used to assemble the code; ndisasm assumes 16-bit code by default.
If you build with a NASM format of elf64 then maybe use `objdump -M intel -D` to disassemble the file. There are other syntaxes; NASM uses the Intel form, while objdump defaults to AT&T syntax unless told otherwise. The ELF format adds a lot of metadata; it may be simpler to use the default bin format if all you are interested in is the raw instruction bytes.
The x86 relative CALL (opcode E8) is calculated from (just after) where the call is made to the destination. Here we call "mysub" which will be relative to just after where the CALL instruction appears.
BITS BW

myadd:
    add bl,al
    ret
mysub:
    sub bl,al
    ret

times PADBY db 0

    mov al,42
    mov bl,12
    call mysub
#!/bin/sh
bits=${1:-16}
width() {
    nasm -dBW="$bits" -dPADBY="$1" calc.asm -o cal"$1".o
    echo "WIDTH $1"
    hexdump -x cal"$1".o
    echo
    ndisasm -b "$bits" cal"$1".o
    echo
}
width 4
width 8
$ sh ./runcalc.sh | grep call
0000000E  E8F2FF            call 0x3
00000012  E8EEFF            call 0x3
$ echo $((0xE + 2 - 0x3))
13
$ printf '%x\n' $((0xFFFF - 13))
fff2
$ echo $((0x12 + 2 - 0x3))
17
$ printf '%x\n' $((0xFFFF - 17))
ffee
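The displacement is measured from just after the CALL to the destination. In 16-bit mode the instruction is three bytes long (E8 plus a 16-bit displacement), so the first call counts from 0xE + 3 = 0x11 back to 0x3, which is -14, or FFF2 as a 16-bit two's complement value. The arithmetic above adds 2 rather than 3 because subtracting from 0xFFFF instead of 0x10000 makes up the difference. NASM helpfully calculates this for you.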
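The same calculation can be written out directly. This is a sketch for the 16-bit case only, with a made-up function name call_rel16; it reproduces the bytes that ndisasm printed for the two padded builds.

```shell
#!/bin/sh
# call_rel16 ADDR DEST
# Print the bytes of a 16-bit relative CALL placed at ADDR that targets DEST.
# The instruction is 3 bytes long (E8 plus a 16-bit displacement), and the
# displacement is relative to the address of the instruction that follows.
call_rel16() {
    disp=$(( ($2 - ($1 + 3)) & 0xFFFF ))   # two's complement, 16 bits
    # the displacement is stored low byte first
    printf 'E8%02X%02X\n' $(( disp & 0xFF )) $(( disp >> 8 ))
}

call_rel16 0xE  0x3    # E8F2FF
call_rel16 0x12 0x3    # E8EEFF
```

A 32-bit or 64-bit build would use a four-byte displacement and a five-byte instruction, which this sketch does not handle.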
Intel is little-endian so words (16-bit values) may be shown swapped depending on the tool; observe the differences between the hexdump and the disassembly: 0cb3 versus B30C and similar.
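The swap between the two views is mechanical. A small sketch in shell arithmetic, with a made-up function name swap16, turns the word that hexdump -x prints into the byte order that the disassembly shows:

```shell
#!/bin/sh
# swap16 VALUE
# Print VALUE (a 16-bit number) with its two bytes exchanged,
# as four hex digits.
swap16() {
    printf '%04x\n' $(( (($1 & 0xFF) << 8) | (($1 >> 8) & 0xFF) ))
}

swap16 0x0cb3    # b30c
```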
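There are at least three endians: little, big, and PDP, though you may not see PDP in the wild. PDP order is written 2143 when the bytes are numbered from the most significant, or 3412 in the header below, which numbers them from the least significant.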
$ grep _ENDIAN /usr/include/sys/_endian.h | grep '#def'
#define _SYS__ENDIAN_H_
#define __FROM_SYS__ENDIAN
#define _LITTLE_ENDIAN  1234
#define _BIG_ENDIAN     4321
#define _PDP_ENDIAN     3412
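To see what those three orders mean for an actual value, here is a rough sketch that prints the bytes of a 32-bit number as they would sit in memory, lowest address first. The function name layout and the order names are invented for this example; the PDP case assumes the usual description of that format, which is the high 16-bit word first with each word stored little-endian.

```shell
#!/bin/sh
# layout ORDER VALUE
# Print the four bytes of the 32-bit VALUE in the order they would be
# stored in memory, lowest address first. ORDER is little, big, or pdp.
layout() {
    b3=$(( ($2 >> 24) & 0xFF ))   # most significant byte
    b2=$(( ($2 >> 16) & 0xFF ))
    b1=$(( ($2 >> 8) & 0xFF ))
    b0=$(( $2 & 0xFF ))           # least significant byte
    case $1 in
    little) printf '%02x %02x %02x %02x\n' $b0 $b1 $b2 $b3 ;;
    big)    printf '%02x %02x %02x %02x\n' $b3 $b2 $b1 $b0 ;;
    # PDP: high 16-bit word first, each word stored little-endian
    pdp)    printf '%02x %02x %02x %02x\n' $b2 $b3 $b0 $b1 ;;
    *)      echo "unknown order: $1" >&2; return 1 ;;
    esac
}

layout little 0x0A0B0C0D    # 0d 0c 0b 0a
layout big    0x0A0B0C0D    # 0a 0b 0c 0d
layout pdp    0x0A0B0C0D    # 0b 0a 0d 0c
```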
GDB can change the endian on the fly (set endian big), should addresses show up in the wrong endian while you're in GDB for some reason. Commented-out lines in ~/.gdbinit might help document handy but seldom used things, e.g. "layout regs" may also be useful but you may not want it on by default.
ntohl(3) details routines that do 16- and 32-bit swaps, or you could write a little script to reverse the numbers. One might better recognize 2130706433 if put into some other form.
$ perl -e 'print pack "V", 2130706433' | hexdump -C
00000000  01 00 00 7f                                       |....|
00000004
$ perl -e 'print pack "N", 2130706433' | od -t d1
0000000  127    0    0    1
0000004
$ perl -E 'say 127 << 24 | 1'
2130706433
Yep, IPv4 localhost. Or one of them. They gave the entire 127.0.0.0/8 for loopback traffic. Whoops? Network protocols are typically big endian (especially Internet standards) while Intel and subsequent AMD64 are little endian.
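The conversion needs nothing more than shifts and masks. A minimal sketch in plain shell arithmetic, with a made-up function name dotted, prints a 32-bit number most significant byte first, which is the network order:

```shell
#!/bin/sh
# dotted VALUE
# Print a 32-bit VALUE as an IPv4 dotted quad, most significant byte
# first (network byte order).
dotted() {
    printf '%d.%d.%d.%d\n' \
        $(( ($1 >> 24) & 0xFF )) $(( ($1 >> 16) & 0xFF )) \
        $(( ($1 >> 8) & 0xFF ))  $(( $1 & 0xFF ))
}

dotted 2130706433    # 127.0.0.1
```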