[date: 2013-10-04]
I can't seem to keep the details of memory ordering and memory barriers fresh in my mind, so this time I am taking notes.
References:
[1] "Memory Ordering in Modern Microprocessors" by Paul E. McKenney
[2] "Reordering Constraints for Pthread-Style Locks" by Hans-J. Boehm
[3] http://www.mjmwired.net/kernel/Documentation/memory-barriers.txt (the Linux kernel's memory-barriers.txt)
Processors used to be slow, in-order, and single-core. Now they are fast, out-of-order, and multi-core.
It's the out-of-order memory accesses, combined with multi-core, that really cause the problems. Where do these out-of-order memory accesses come from?
Two sources: the compiler may reorder memory accesses at compile time, and the CPU may reorder them at run time. Correspondingly there are two kinds of barrier: the compiler memory barrier (nothing emitted into the code; it only constrains the optimizer) and the hardware memory barrier (an actual fence instruction). Of course, in practice, macros which emit hardware memory barriers will also emit a compiler memory barrier.
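As a concrete sketch (assuming GCC or Clang; these are real builtins, and the macro names are mine):

```c
/* Compiler memory barrier: emits no instructions; it only forbids the
 * compiler from moving memory accesses across this point. */
#define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

/* Hardware memory barrier: emits a full fence instruction (e.g. MFENCE
 * on x86, DMB on ARM) and, per the note above, also acts as a
 * compiler barrier. */
#define hw_barrier()        __sync_synchronize()
```

The Linux kernel's barrier() is essentially the former; smp_mb() is the latter.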
Barriers also come in flavors by direction: read barriers order only loads, write barriers order only stores, and full barriers order both.
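A minimal sketch of how the flavors pair up, using Linux kernel primitives (function and variable names are mine; a producer's write barrier must pair with a consumer's read barrier):

```c
static int data, flag;

/* Runs on CPU 0: publish data, then set the flag. */
static void producer(void)
{
	data = 42;
	smp_wmb();              /* order the data store before the flag store */
	WRITE_ONCE(flag, 1);
}

/* Runs on CPU 1: wait for the flag, then read data. */
static void consumer(void)
{
	while (!READ_ONCE(flag))
		;
	smp_rmb();              /* order the flag load before the data load */
	BUG_ON(data != 42);     /* guaranteed by the paired barriers */
}
```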
release barrier: All previous memory stores are globally visible, and all previous memory loads have been satisfied, but following memory reads are not prevented from being speculated to before the barrier.
acquire barrier: References after the builtin cannot move to (or be speculated to) before the builtin, but previous memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied.
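In GCC's __atomic builtins those two barriers correspond to __ATOMIC_RELEASE and __ATOMIC_ACQUIRE. A minimal message-passing sketch (function and variable names are mine):

```c
static int payload;
static int ready;   /* accessed only through the __atomic builtins */

/* Release: the payload store must be globally visible before any
 * other thread can observe ready == 1. */
void publish(void)
{
	payload = 42;
	__atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Acquire: the payload load cannot be speculated to before the load
 * that observes ready == 1. */
int consume(void)
{
	while (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
		;
	return payload;   /* guaranteed to return 42 */
}
```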
Locking primitives typically act as some sort of memory barrier.
The Linux kernel clearly documents that lock/unlock critical sections are not fully fenced. Loosely speaking, memory accesses outside can "leak" into the critical section, but accesses inside cannot leak out. See [3].
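A sketch of that rule with kernel primitives (variable names are mine): spin_lock() is an acquire barrier and spin_unlock() is a release barrier, so the fencing is one-way in each direction.

```c
static DEFINE_SPINLOCK(lock);
static int a, b, shared_count;

static void example(void)
{
	a = 1;               /* MAY leak down into the critical section   */
	spin_lock(&lock);    /* acquire: accesses below cannot move above */
	shared_count++;      /* cannot leak out in either direction       */
	spin_unlock(&lock);  /* release: accesses above cannot move below */
	b = 1;               /* MAY leak up into the critical section     */
}
```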
I used to think that using pthreads in userspace was simple and safe. My understanding (assumption?) was that each pthread mutex call was, effectively, a full memory barrier. That is, memory accesses before the lock were completed strictly before; memory accesses between the lock and unlock were completed between, and memory accesses after the unlock remained after.
It's not clear that's a safe assumption. Quoting [2]:
It is clearly not generally acceptable to move memory operations out of critical sections, i.e. regions in which a lock is held; doing so would introduce a data race. But it is far less clear whether it is acceptable to move memory operations into critical sections, which is the question we pursue here.
...
...a memory operation immediately following a pthread_mutex_unlock operation may always be moved to just before the pthread_mutex_unlock. But the corresponding reordering of a pthread_mutex_lock with a preceding memory operation is not generally safe. More general movement into critical sections is possible in the absence of the pthread_mutex_trylock call.
As we point out in section 3, many current implementations either do not follow these rules, or add extra fences. Nor is it completely apparent that they should, since these semantics appear to have been largely accidental.
To me, this sounds like the only safe, portable expectation of pthreads mutexes is to treat them like Linux kernel locks: the enclosed memory references cannot leak out, but other memory references can leak in.
In fact, Table 1 of [2] shows this through code inspection.
To state it clearly: *pthreads locks are not full memory barriers*.
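A sketch of the reordering that [2] says is always allowed (variable names are mine):

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int protected_data;
static int flag;

void writer(void)
{
	pthread_mutex_lock(&m);
	protected_data = 42;      /* cannot leak out of the critical section */
	pthread_mutex_unlock(&m);
	flag = 1;                 /* may be moved to just before the unlock:
	                             unlock is a release, not a full fence */
}
```

Code that treats the unlock as a full fence, e.g. assuming the flag store cannot become visible until the unlock completes, is relying on implementation accidents, per [2].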
For example, kernel-style reference counting leans on exactly these rules: in Linux, value-returning atomic RMW operations such as atomic_dec_and_test() are fully ordered, while atomic_set() and atomic_inc() imply no barrier at all.

```c
#include <linux/atomic.h>
#include <linux/slab.h>

struct T {
	atomic_t refs;
};

struct T *alloc_t(void)
{
	struct T *t = kmalloc(sizeof(struct T), GFP_KERNEL);

	atomic_set(&t->refs, 1);	/* implies no barrier */
	return t;
}

static void free_t(struct T *t)
{
	kfree(t);
}

void get_t(struct T *t)
{
	atomic_inc(&t->refs);		/* implies no barrier */
}

void put_t(struct T *t)
{
	/*
	 * atomic_dec_and_test() is a value-returning atomic RMW, so it
	 * acts as a full memory barrier: whichever thread drops the last
	 * reference sees all prior accesses to *t before freeing it.
	 */
	if (atomic_dec_and_test(&t->refs))
		free_t(t);
}
```