Fighting Hyrum's Law in LLVM

By MaskRay
May 10, 2026, 15:00

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. (Hyrum's Law)

In a compiler, the most common form of Hyrum's Law is dependence on unspecified behavior: hash bucket order, the order of equal elements after std::sort, padding offsets. The same framing covers a few cases that are technically undefined behavior (use of an invalidated iterator) or plain incidental properties (ABI struct layout, ELF section offsets).

When the compiler itself harbors such a dependency, the symptom is usually output that varies build-to-build: an unstable sort that lands differently after the standard library changes, a hash map whose iteration order shifts when the hash function does. Occasionally the variation is run-to-run within a single build: DenseMap<void *, X> keys with an ASLR-derived seed reorder buckets each invocation. Either way, reproducible builds, bisection, and bug reports all assume same input → same output, and a stealth Hyrum dependency breaks that.

This post surveys some mechanisms that perturb the contract's blind spots so dependencies cannot quietly form.

Hash seed perturbation

The first line of defense is the hash function itself. llvm/include/llvm/ADT/Hashing.h:

inline uint64_t get_execution_seed() {
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
  return static_cast<uint64_t>(
      reinterpret_cast<uintptr_t>(&install_fatal_error_handler));
#else
  return 0xff51afd7ed558ccdULL;
#endif
}

The seed XORed into every llvm::hash_value is the runtime address of install_fatal_error_handler, which under ASLR differs in every process. The header comment is explicit:

the seed is non-deterministic per process (address of a function in LLVMSupport) to prevent having users depend on the particular hash values.

Every hash_combine / hash_integer_value call picks up the seed, and every DenseMap<K, V> keyed by a hash_value-using type then reorders its buckets per run. MD5, BLAKE3, SHA1, SHA256 stay byte-stable; those are the right tools when you actually want a digest.
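To make the effect concrete, here is a toy model of how a per-process seed changes which bucket a key lands in. It is only a sketch: toyHash and bucketIndex are hypothetical names, and LLVM's real mixing function is far more thorough than a single XOR.

```cpp
#include <cstdint>

// Hypothetical toy hash: XOR the seed into the key. This only illustrates
// the effect of a seed; it is not LLVM's hashing algorithm.
constexpr std::uint64_t toyHash(std::uint64_t Key, std::uint64_t Seed) {
  return Key ^ Seed;
}

// Bucket selection for a power-of-two sized table.
constexpr std::uint64_t bucketIndex(std::uint64_t Key, std::uint64_t Seed,
                                    std::uint64_t NumBuckets) {
  return toyHash(Key, Seed) & (NumBuckets - 1);
}

// Same two keys, two "processes" with different seeds: their relative
// bucket order flips, so any code relying on that order breaks visibly.
static_assert(bucketIndex(1, 0, 8) < bucketIndex(2, 0, 8), "seed 0: key 1 first");
static_assert(bucketIndex(1, 3, 8) > bucketIndex(2, 3, 8), "seed 3: key 2 first");
```

Code that only looks keys up is unaffected by the seed; code that iterates and assumes an order is not.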

My commit ce80c80dca45 introduced the seed in 2024.

Container iteration order

Code can grow dependencies on the iteration order. LLVM_ENABLE_REVERSE_ITERATION walks hash containers backwards to flag violations. llvm/include/llvm/Support/ReverseIteration.h:

template <class T = void *> constexpr bool shouldReverseIterate() {
#if LLVM_ENABLE_REVERSE_ITERATION
  return detail::IsPointerLike<T>::value;
#else
  return false;
#endif
}

DenseMap flips its BucketItTy to std::reverse_iterator<pointer>; SmallPtrSet swaps begin() and end(); StringMap bitwise-NOTs the hash before bucket selection, which is the only thing that perturbs StringMap, since its hash bypasses get_execution_seed.
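The kind of bug reverse iteration exposes can be sketched with a plain array standing in for a hash container. The helper names below are hypothetical; the point is only the difference between an order-independent consumer and an order-dependent one.

```cpp
#include <cstddef>

// Order-dependent consumer: returns whichever element is visited first.
template <std::size_t N>
constexpr int firstVisited(const int (&Slots)[N], bool Reverse) {
  return Reverse ? Slots[N - 1] : Slots[0];
}

// Order-independent consumer: the sum is the same in either direction.
template <std::size_t N>
constexpr int sumAll(const int (&Slots)[N], bool Reverse) {
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += Reverse ? Slots[N - 1 - I] : Slots[I];
  return Sum;
}

constexpr int Slots[] = {10, 20, 30};
static_assert(sumAll(Slots, false) == sumAll(Slots, true), "robust");
static_assert(firstVisited(Slots, false) != firstVisited(Slots, true),
              "fragile: the result changes when iteration is reversed");
```

A test suite run with reversed iteration would pass for the first kind of code and fail for the second, which is exactly the signal the build option is after.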

Unlike the hash seed, reverse iteration isn't auto-on with assertions; -DLLVM_REVERSE_ITERATION=ON opts in explicitly. In 2026 the project has already merged fixes triggered by it: 7f703cabf728 (MLIR SSA-value completion order), 0b3afd35c41d (MLIR SROA alloca order), and f5e2c5ddcec7 (a clang test).

Iterator invalidation

Orthogonal to iteration order: what happens to an existing iterator after a mutation. llvm/include/llvm/ADT/EpochTracker.h:

class DebugEpochBase {
  uint64_t Epoch = 0;
public:
  void incrementEpoch() { ++Epoch; }
  ~DebugEpochBase() { incrementEpoch(); } // catches use-after-free

  class HandleBase {
    bool isHandleInSync() const {
      return *EpochAddress == EpochAtCreation;
    }
  };
};

DenseMap and friends inherit from DebugEpochBase. Mutations bump the epoch; iterators capture it at construction and assert on mismatch. The destructor bumps too, so stale iterators into destroyed containers assert rather than read freed memory.

Without it, mutate-during-iteration "happens to work" depending on bucket layout, and bucket layout is what the hash seed and reverse iteration above perturb. The epoch check turns the latent bug into a clean assert regardless of which "lucky" layout the run lands on. The whole mechanism collapses to a no-op under NDEBUG.
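A miniature of the epoch idea, small enough to check at compile time, might look like the following. EpochBase, Handle, and handleGoesStale are illustrative names and not LLVM's classes; the destructor bump from the excerpt is left out of this sketch.

```cpp
#include <cstdint>

// Hypothetical miniature of an epoch-tracked container base.
class EpochBase {
  std::uint64_t Epoch = 0;

public:
  constexpr void incrementEpoch() { ++Epoch; }

  // A handle remembers the epoch it was created under.
  class Handle {
    const std::uint64_t *EpochAddress;
    std::uint64_t EpochAtCreation;

  public:
    constexpr explicit Handle(const EpochBase &Parent)
        : EpochAddress(&Parent.Epoch), EpochAtCreation(Parent.Epoch) {}
    constexpr bool isInSync() const { return *EpochAddress == EpochAtCreation; }
  };
};

// A handle taken before a mutation is detectably stale afterwards.
constexpr bool handleGoesStale() {
  EpochBase Container{};
  EpochBase::Handle H(Container);
  bool FreshBefore = H.isInSync();
  Container.incrementEpoch(); // stands in for insert/erase/grow
  return FreshBefore && !H.isInSync();
}

static_assert(handleGoesStale(), "mutation invalidates earlier handles");
```

The check costs one load and one compare per iterator use, which is why it can stay enabled in assertion builds.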

Pre-shuffling unstable sorts

The same defensive pattern shows up twice in the monorepo, in different sub-projects, years apart.

llvm::sort under EXPENSIVE_CHECKS

llvm/include/llvm/ADT/STLExtras.h:

#ifdef EXPENSIVE_CHECKS
namespace detail {
inline unsigned presortShuffleEntropy() {
  static unsigned Result(std::random_device{}());
  return Result;
}

template <class IteratorTy>
inline void presortShuffle(IteratorTy Start, IteratorTy End) {
  std::mt19937 Generator(presortShuffleEntropy());
  llvm::shuffle(Start, End, Generator);
}
} // end namespace detail
#endif

template <typename IteratorTy, typename Compare>
inline void sort(IteratorTy Start, IteratorTy End, Compare Comp) {
#ifdef EXPENSIVE_CHECKS
  detail::presortShuffle<IteratorTy>(Start, End);
#endif
  std::sort(Start, End, Comp);
}

std::sort and qsort are unstable; code observing the order of equal elements is depending on unspecified behavior. Pre-shuffling makes that observation different every run. Commit 5a3d47fabcb6 added the wrapper in 2018, motivated by PR35135.

LLVM also ships its own llvm::shuffle rather than calling std::shuffle, "so that LLVM behaves the same when using different standard libraries." A reproducibility tool whose reproducibility depends on the host stdlib is worse than no tool, and the linker section below relies on this.

llvm::stable_sort deliberately does not pre-shuffle; it is the explicit opt-in for code that legitimately needs ordering of equal elements.
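The contract that survives a pre-shuffle can be sketched in a self-contained way. The code below is an assumption-laden stand-in: a small linear congruential generator replaces std::mt19937, a hand-written Fisher-Yates loop replaces llvm::shuffle, and an insertion sort replaces std::sort, so the whole thing can be evaluated at compile time.

```cpp
#include <cstddef>
#include <cstdint>

struct Item { int Key; char Tag; };

// Hypothetical deterministic shuffle driven by a tiny LCG.
template <std::size_t N>
constexpr void presortShuffle(Item (&A)[N], std::uint32_t Seed) {
  std::uint32_t State = Seed;
  for (std::size_t I = N; I > 1; --I) {
    State = State * 1664525u + 1013904223u;
    std::size_t J = (State >> 16) % I;
    Item Tmp = A[I - 1];
    A[I - 1] = A[J];
    A[J] = Tmp;
  }
}

// Sort by Key only; Tag is deliberately not part of the comparison.
template <std::size_t N>
constexpr void sortByKey(Item (&A)[N]) {
  for (std::size_t I = 1; I < N; ++I)
    for (std::size_t J = I; J > 0 && A[J].Key < A[J - 1].Key; --J) {
      Item Tmp = A[J];
      A[J] = A[J - 1];
      A[J - 1] = Tmp;
    }
}

// The only property a caller may rely on: keys end up non-decreasing for
// every seed. Which Tag comes first among equal keys is not promised.
constexpr bool keysSortedAfterShuffle(std::uint32_t Seed) {
  Item A[] = {{2, 'a'}, {1, 'b'}, {2, 'c'}, {1, 'd'}};
  presortShuffle(A, Seed);
  sortByKey(A);
  for (std::size_t I = 1; I < 4; ++I)
    if (A[I].Key < A[I - 1].Key)
      return false;
  return true;
}

static_assert(keysSortedAfterShuffle(1) && keysSortedAfterShuffle(2), "");
```

Code that also inspects Tag order among equal keys is the code the shuffle is meant to flush out; the fix is either a tie-breaking comparator or a stable sort.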

libc++ _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY

libc++ has a near-perfect parallel mechanism, designed for downstream users rather than the project's own internals. libcxx/include/__debug_utils/randomize_range.h:

template <class _AlgPolicy, class _Iterator, class _Sentinel>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14
void __debug_randomize_range(_Iterator __first, _Sentinel __last) {
#ifdef _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
  if (!__libcpp_is_constant_evaluated())
    std::__shuffle<_AlgPolicy>(__first, __last, __libcpp_debug_randomizer());
#else
  (void)__first; (void)__last;
#endif
}

Three callsites:

  • std::sort: pre-shuffles the input.
  • std::partial_sort: pre-shuffles the input and re-shuffles the unsorted tail afterward.
  • std::nth_element: pre-shuffles, then re-shuffles each side of the partition.

Seed handling rhymes with get_execution_seed: ASLR or static std::random_device for per-process variation, with _LIBCPP_RANDOMIZE_UNSPECIFIED_STABILITY_SEED=<n> as a fixed-seed escape hatch. Off by default; C++11 and later only.

libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rst explains the motivation:

Google has measured couple of thousands of tests to be dependent on the stability of sorting and selection algorithms. As we also plan on updating (or least, providing under flag more) sorting algorithms, this effort helps doing it gradually and sustainably.

It cites PR20837 (a worst-case O(n²) std::sort) as motivating the upgrade libc++ specifically wanted to ship. The shuffle is the gating tool: if downstream tests pass with it enabled, they will pass after the algorithm change too.

Comparing the two is more interesting than either alone:

  • llvm::sort's wrapper is internal hygiene: LLVM is its own primary user, so the shuffle lives in STLExtras.h behind a build flag with no docs.
  • libc++'s wrapper is user-facing: DesignDocs/ page, public macro, public seed override, explicit "Patches welcome." invitation. It has to be: libc++'s users are not libc++, and the contract being defended is the C++ standard itself.
  • libc++ generalizes the primitive: __debug_randomize_range applies at three callsites, each declaring which sub-range the algorithm leaves unspecified. LLVM's wrapper only covers the simpler equal-element case.
  • Hashed containers (std::unordered_* iteration order) are unspecified in both, but libc++ does not randomize them. LLVM-the-library does; on this one surface LLVM is ahead of its own stdlib.

Linker output: --shuffle-sections and --randomize-section-padding

Two ELF-only lld knobs perturb layout details that no contract covers.

--shuffle-sections=<glob>=<seed>

lld/ELF/Writer.cpp:

for (const auto &patAndSeed : ctx.arg.shuffleSections) {
  ...
  const uint32_t seed = patAndSeed.second;
  if (seed == UINT32_MAX) {
    // If --shuffle-sections <section-glob>=-1, reverse the section order.
    // The section order is stable even if the number of sections changes.
    // This is useful to catch issues like static initialization order
    // fiasco reliably.
    std::reverse(matched.begin(), matched.end());
  } else {
    std::mt19937 g(seed ? seed : std::random_device()());
    llvm::shuffle(matched.begin(), matched.end(), g);
  }
}

Three regimes in one option:

  • seed = -1: deterministic reverse, stable even as new sections appear. Glob .init_array* to -1, rebuild, run the test suite: anything that breaks is a real static-init-order bug. One flag, no Frankenstein link script.
  • seed > 0: deterministic random shuffle, reproducible across runs and hosts (because llvm::shuffle is host-independent). Useful in CI without breaking bisection.
  • seed = 0: std::random_device()-seeded. Fresh nondeterminism every link.
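The three regimes follow directly from the branch structure of the excerpt. A minimal sketch of that decision, with a hypothetical enum and function name, could be:

```cpp
#include <cstdint>

enum class ShuffleMode { Reverse, FixedSeed, FreshEntropy };

// Mirrors the branches in the excerpt above; ShuffleMode and classifySeed
// are illustrative names, not part of lld.
constexpr ShuffleMode classifySeed(std::uint32_t Seed) {
  if (Seed == UINT32_MAX) // the excerpt treats this value as "=-1"
    return ShuffleMode::Reverse;
  return Seed ? ShuffleMode::FixedSeed : ShuffleMode::FreshEntropy;
}

static_assert(classifySeed(static_cast<std::uint32_t>(-1)) == ShuffleMode::Reverse, "");
static_assert(classifySeed(42) == ShuffleMode::FixedSeed, "");
static_assert(classifySeed(0) == ShuffleMode::FreshEntropy, "");
```

Only the FreshEntropy case gives up reproducibility; the other two can be recorded in a bug report and replayed.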

History: 423cb321dfae introduced the =-1 reverse mode; 16c30c3c23ef generalized to per-glob seeds, which is what makes the .init_array*=-1 recipe possible; c135a68d426f fixed a bug where the feature itself produced an invalid dynamic relocation order. Even Hyrum mitigations have correctness traps.

--randomize-section-padding=<seed>

The sister option perturbs section offsets by inserting padding between input sections and at segment starts (lld/ELF/Writer.cpp):

static void randomizeSectionPadding(Ctx &ctx) {
  std::mt19937 g(*ctx.arg.randomizeSectionPadding);
  // Insert padding between input sections and at segment starts.
}

Callers grow dependencies on padding-induced offsets the linker never promised: profile-guided pipelines, side-channel research, exploit toolchains pinning to specific addresses. A seeded perturbation makes those dependencies visible.

Both options are ELF-only; MachO and COFF ports have nothing equivalent.

ABI break detection

llvm/include/llvm/Config/abi-breaking.h.cmake:

#if LLVM_ENABLE_ABI_BREAKING_CHECKS
ABI_BREAKING_EXPORT_ABI extern int EnableABIBreakingChecks;
LLVM_HIDDEN_VISIBILITY
__attribute__((weak)) int *VerifyEnableABIBreakingChecks =
    &EnableABIBreakingChecks;
#else
ABI_BREAKING_EXPORT_ABI extern int DisableABIBreakingChecks;
...
#endif

Every TU including the header takes a weak reference to EnableABIBreakingChecks or DisableABIBreakingChecks depending on its own build flag. Mixing the two against the same libLLVM produces an unresolved symbol at link time. MSVC gets the same guarantee via #pragma detect_mismatch.

Out-of-tree users routinely compile against headers from one tree and link against a different libLLVM. Without this gate, whichever struct layout the link happens to pick silently miscompiles; with it, the link fails.

What LLVM is not doing

The mechanisms above all target surfaces no stable consumer should care about: bucket order, equal-element sort order, init-array order. What LLVM leaves alone are the outputs that debuggers, profilers, sanitizers, and reproducible-build infrastructure consume; those need to stay stable.

In some cases, a stronger guarantee is only provided with explicit options. For example, bitcode and textual IR preserve use-list order only under -preserve-bc-uselistorder / -preserve-ll-uselistorder.

A near-cousin: clang's -frandomize-layout-seed / __attribute__((randomize_layout)). Mechanically the same (a seeded std::shuffle on struct fields), and it does coincidentally invalidate offsetof dependencies. But the intent is exploit mitigation, cribbed from GrSecurity's Randstruct GCC plugin: per-build kernel hardening, not a developer tool.

Bit-field layout

By MaskRay
February 22, 2026, 16:00

The C and C++ standards leave nearly every detail to the implementation. C23 §6.7.3.2:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

C++ is also terse ([class.bit]p1):

Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.

The actual rules come from the platform ABI:

  • Itanium ABI: used on Linux, macOS, BSD, and most non-Windows platforms. The Itanium C++ ABI (section 2.4) defers bit-field placement to "the base C ABI" but adds its own constraints (notably: bit-fields are never placed in the tail padding of a base class).
  • System V ABI Processor Supplement. The x86-64 psABI says little about bit-fields, while the AArch64 AAPCS has a more detailed description.
  • Microsoft ABI: used on Windows (MSVC). In GCC and Clang, structs with the ms_struct attribute also mimic this ABI.

Clang implements both ABIs in clang/lib/AST/RecordLayoutBuilder.cpp. It processes bit-fields in two distinct phases:

  1. Layout (storage units): assign a bit offset to every bit-field. This is ABI-specified and determines sizeof and alignof.
  2. Codegen (access units): choose what LLVM IR loads and stores to emit. This is a compiler optimization that affects generated code but not the ABI.

Understanding these separately is the key to understanding bit-fields. This article focuses on Itanium (the default on most platforms), with a section on how the Microsoft ABI differs.

Phase 1: Storage Units

In clang/lib/AST/RecordLayoutBuilder.cpp, ItaniumRecordLayoutBuilder::LayoutFields lays out fields of a RecordDecl. For each bit-field, it calls LayoutBitField to determine the storage unit and bit offset.

A storage unit is a region of sizeof(T) bytes, by default aligned to alignof(T). For an int bit-field, that's a 4-byte region at a 4-byte-aligned offset. The alignment can be reduced by the packed attribute and #pragma pack.

  • StorageUnitSize = sizeof(T) * 8: the unit's size in bits
  • FieldAlign = alignof(T) in bits: the unit's alignment (before modifiers)
  • FieldOffset: the first bit after the last bit-field

Itanium's Core Rule

if (FieldSize == 0 ||
    (AllowPadding &&
     (FieldOffset & (FieldAlign-1)) + FieldSize > StorageUnitSize))
  FieldOffset = alignTo(FieldOffset, FieldAlign);

Compute where FieldOffset falls within its aligned storage unit. If the remaining space is less than FieldSize, round up to the next aligned boundary. Otherwise, pack the bit-field at the current position.
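The rule is small enough to lift into a standalone function and check against concrete numbers. The sketch below wraps the same condition in a hypothetical placeBitField helper; alignTo here is a local re-implementation for power-of-two alignments, not the LLVM utility, and all quantities are in bits.

```cpp
#include <cstdint>

// Round Value up to a multiple of Align (Align must be a power of two).
constexpr std::uint64_t alignTo(std::uint64_t Value, std::uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Returns the bit offset at which a bit-field is placed under the core rule.
constexpr std::uint64_t placeBitField(std::uint64_t FieldOffset,
                                      std::uint64_t FieldSize,
                                      std::uint64_t FieldAlign,
                                      std::uint64_t StorageUnitSize,
                                      bool AllowPadding = true) {
  if (FieldSize == 0 ||
      (AllowPadding &&
       (FieldOffset & (FieldAlign - 1)) + FieldSize > StorageUnitSize))
    FieldOffset = alignTo(FieldOffset, FieldAlign);
  return FieldOffset;
}

static_assert(placeBitField(0, 14, 32, 32) == 0, "fits at bit 0");
static_assert(placeBitField(24, 30, 32, 32) == 32, "overflows, rounds up");
static_assert(placeBitField(8, 0, 32, 32) == 32, "zero width always rounds up");
```

Feeding it a running FieldOffset field by field reproduces the kind of hand walk-through used in the rest of this section.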

Declared Type Matters

Consider two structs that store the same total number of bits (7 + 7 + 2 = 16) but use different declared types:

struct U8  { uint8_t  a:7, b:7, c:2; };   // sizeof = 3
struct U16 { uint16_t a:7, b:7, c:2; };   // sizeof = 2

struct S1 { int a:14; int b:10; int c:30; }; // sizeof = 8

Walk-through for U8 (all fields have StorageUnitSize = 8, FieldAlign = 8):

  • a at bit 0. Position = 0, 0 + 7 = 7 <= 8. Fits. Offset = 0.
  • b at bit 7. Position = 7, 7 + 7 = 14 > 8. Doesn't fit. New unit at bit 8. Offset = 8.
  • c at bit 15. Position = 15 - 8 = 7, 7 + 2 = 9 > 8. Doesn't fit. New unit at bit 16. Offset = 16.

Three 1-byte storage units. sizeof(U8) = 3. Eight padding bits wasted.

Walk-through for U16 (all fields have StorageUnitSize = 16, FieldAlign = 16):

  • a at bit 0. Position = 0, 0 + 7 = 7 <= 16. Fits. Offset = 0.
  • b at bit 7. Position = 7, 7 + 7 = 14 <= 16. Fits. Offset = 7.
  • c at bit 14. Position = 14, 14 + 2 = 16 <= 16. Fits. Offset = 14.

One 2-byte storage unit. sizeof(U16) = 2. No waste.

Walk-through for S1 (all fields have StorageUnitSize = 32, FieldAlign = 32):

  • a at bit 0. Position = 0, 14 fits in 32. Offset = 0.
  • b at bit 14. Position = 14, 14 + 10 = 24 <= 32. Fits. Offset = 14. Bits 24–31 are padding (unfilled tail of the first storage unit).
  • c at bit 24. Position = 24, 24 + 30 = 54 > 32. Doesn't fit. New unit at bit 32. Offset = 32. Bits 62–63 are padding (unfilled tail of the second storage unit).

sizeof(S1) = 8, alignof(S1) = 4.

Note: Phase 1 uses two int storage units, but Phase 2 is free to merge a, b, and c into a single i64 access unit (since there are no non-bit-field barriers and 8 bytes fits in a register). On x86_64, the LLVM type ends up as { i64 }.
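The sizes from the three walk-throughs can be pinned down with compile-time checks. This is a sketch that assumes an Itanium-ABI target with 32-bit int, such as x86-64 Linux with GCC or Clang; on a Microsoft-ABI target the assertions are expected to differ.

```cpp
#include <cstdint>

struct U8  { std::uint8_t  a : 7, b : 7, c : 2; };
struct U16 { std::uint16_t a : 7, b : 7, c : 2; };
struct S1  { int a : 14; int b : 10; int c : 30; };

// Same 16 payload bits, different declared type, different size.
static_assert(sizeof(U8) == 3, "three 1-byte storage units");
static_assert(sizeof(U16) == 2, "one 2-byte storage unit");
// 54 payload bits spread over two int storage units.
static_assert(sizeof(S1) == 8 && alignof(S1) == 4, "two int storage units");
```

Keeping checks like these next to a struct whose layout matters turns a silent ABI assumption into a build failure.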

Mixed Types

When bit-fields have different declared types, the storage unit size changes:

struct S2 { int a:24; short b:8; };   // sizeof = 4

  • a is int (StorageUnitSize = 32). Placed at bit 0.
  • b is short (StorageUnitSize = 16, FieldAlign = 16). Current offset = 24. Position within a 16-bit aligned unit: 24 % 16 = 8. 8 + 8 = 16 <= 16. Fits. Offset = 24.

sizeof(S2) = 4. The short bit-field overlaps into the int's storage unit. Under Itanium, storage units of different types can share bytes.

The short can also reuse space left by a smaller bit-field:

struct S2b { int a:16; short b:8; };   // sizeof = 4

  • a is int (StorageUnitSize = 32). Placed at bit 0.
  • b is short (StorageUnitSize = 16, FieldAlign = 16). Current offset = 16. Position within a 16-bit aligned unit: 16 % 16 = 0. 0 + 8 = 8 <= 16. Fits. Offset = 16.

Here b's 16-bit storage unit (bits 16–31) falls entirely within a's 32-bit storage unit.

Under Microsoft ABI, sizeof is 8: the type size change from int to short forces a new storage unit.

This overlapping extends to non-bit-field members too. A non-bit-field can be allocated within the unfilled bytes of a preceding bit-field's storage unit:

struct S2c { uint16_t first:8; uint8_t second; };   // sizeof = 2

  • first is uint16_t:8. Placed at bit 0. Uses 8 bits of a 16-bit storage unit (bytes 0–1).
  • second is a non-bit-field uint8_t. The bit-field state resets, but DataSize is only 1 byte. second (alignment 1) goes at byte 1 (bit 8), inside first's storage unit.

Note that this overlapping means a write to first via its access unit could touch byte 1 where second lives. Phase 2 must ensure the access units don't clobber each other (see Hard constraints).

Under Microsoft ABI, sizeof is 4: first gets a full uint16_t unit (2 bytes), and second starts at byte 2 instead of byte 1.
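The mixed-type cases above can be checked the same way, again assuming an Itanium-ABI target such as x86-64 Linux. offsetof is only usable on the non-bit-field member, which is enough to show that second really sits inside first's storage unit.

```cpp
#include <cstddef>
#include <cstdint>

struct S2  { int a : 24; short b : 8; };
struct S2b { int a : 16; short b : 8; };
struct S2c { std::uint16_t first : 8; std::uint8_t second; };

static_assert(sizeof(S2) == 4, "short unit overlaps the int unit");
static_assert(sizeof(S2b) == 4, "short unit lies inside the int unit");
static_assert(sizeof(S2c) == 2, "no extra byte for second");
static_assert(offsetof(S2c, second) == 1, "second lives in byte 1");
```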

Non-bit-field After Bit-field

When a non-bit-field member cannot fit within the remaining bytes, it resets the bit-field state and unfilled bits become padding:

struct S3 { int a:10; int b:6; char c; int d:6; };   // sizeof = 4

  • a at bit 0, b at bit 10: both fit in the first int storage unit. a + b occupy 16 bits = 2 bytes, leaving 16 bits unused in the 32-bit storage unit.
  • c is not a bit-field. It resets UnfilledBitsInLastUnit to 0. c (a char, alignment 1) goes at byte 2 (bit 16). A subsequent bit-field could have used bits 16–31, but the non-bit-field c claims byte 2.
  • d is a new int bit-field. Current bit offset = 24 (byte 3). Position = 24 % 32 = 24. 24 + 6 = 30 <= 32. Fits. Offset = 24.

sizeof(S3) = 4.

Under Microsoft ABI, sizeof is 12: a+b get a full int unit (4 bytes), c starts at byte 4, and d gets a new int unit at byte 8.

Bit-field After Non-bit-field

The overlap works in the other direction too. When a bit-field follows a non-bit-field, its storage unit can encompass the preceding bytes:

struct NB { char a; int b:4; };   // sizeof = 4

  • a is a char at byte 0. DataSize = 1 byte.
  • b is int:4. FieldOffset = 8, FieldAlign = 32, StorageUnitSize = 32. Position: 8 & 31 = 8. 8 + 4 = 12 ≤ 32. Fits. Offset = 8.

b's 4-byte int storage unit (bytes 0–3) encompasses a at byte 0. No padding is inserted: the core rule only cares whether the field fits within an aligned unit, not whether that unit overlaps earlier non-bit-field storage.

Under Microsoft ABI, sizeof is 8: b's int unit starts at byte 4, after a is padded to int alignment.
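Both overlap directions, a non-bit-field landing inside a bit-field's storage unit and a bit-field's unit covering an earlier non-bit-field, can be confirmed at compile time. As before, this sketch assumes an Itanium-ABI target with 32-bit int.

```cpp
#include <cstddef>

struct S3 { int a : 10; int b : 6; char c; int d : 6; };
struct NB { char a; int b : 4; };

// c claims byte 2 of the int unit that holds a and b.
static_assert(sizeof(S3) == 4 && offsetof(S3, c) == 2, "");
// b's int unit spans bytes 0-3 and therefore covers a at byte 0.
static_assert(sizeof(NB) == 4 && offsetof(NB, a) == 0, "");
```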

Attributes and Pragmas

Several attributes and pragmas alter the placement rules. They all work by changing FieldAlign.

packed: sets FieldAlign = 1 (bit-granular packing). Bit-fields pack at the next available bit with no alignment constraint.

struct [[gnu::packed]] P { int x:4, y:30, z:30; };
// 4 + 30 + 30 = 64 bits = 8 bytes. sizeof = 8.

Under Microsoft ABI, sizeof is 12: each bit-field must fit within a single int unit, so x, y, and z each get their own 4-byte unit.

packed can also be applied to individual fields:

struct P2 { short a:8; [[gnu::packed]] int b:30; };   // sizeof = 6, b at bit 8
// Without packed on b: b at bit 32, sizeof = 8

Without packed, b's FieldAlign is 32, so it doesn't fit in a's short storage unit and starts a new int unit at bit 32. With packed, b's FieldAlign drops to 1, so it packs immediately after a at bit 8.

#pragma pack(N): caps FieldAlign at N * 8 bits and suppresses the padding-insertion test (AllowPadding = false, so the overflow check is skipped and the field is placed at the current offset without rounding up).

#pragma pack(1)
struct PP { char a; int b:4; int c:28; char s; }; // sizeof = 6
#pragma pack()

b packs at bit 8 by the normal core rule: (8 & 31) + 4 = 12 ≤ 32, so it fits. Without #pragma pack, c:28 at bit 12 would fail the same check (12 + 28 = 40 > 32) and round up to bit 32. With #pragma pack(1), AllowPadding is false, so the overflow check is skipped and c stays at bit 12. Total: a(8) + b+c(32) + s(8) = 48 bits = 6 bytes.

aligned(N): forces minimum alignment. Overrides packed, but is itself overridden by #pragma pack.

struct A { char a; [[gnu::aligned(16)]] int b:1; char c; };
// b aligned to 16 bytes = bit 128. c at byte 17. sizeof = 32, alignof = 16.

Precedence (for non-zero-width bit-fields): #pragma pack > aligned attr > packed attr > natural alignment.
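Two of the modifier cases above are easy to verify in isolation. The sketch assumes GCC or Clang on an Itanium-ABI target, and it uses the push/pop form of #pragma pack so that the packing state is restored afterwards.

```cpp
// Whole-struct packed: FieldAlign drops to 1, so fields pack bit by bit.
struct [[gnu::packed]] P { int x : 4, y : 30, z : 30; };

// #pragma pack(1): the overflow check is skipped, so c stays at bit 12.
#pragma pack(push, 1)
struct PP { char a; int b : 4; int c : 28; char s; };
#pragma pack(pop)

static_assert(sizeof(P) == 8, "4 + 30 + 30 = 64 bits");
static_assert(sizeof(PP) == 6, "8 + 32 + 8 = 48 bits");
```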

Zero-width Bit-fields

T : 0 rounds up to alignof(T), acting as a separator. Subsequent fields start in a new storage unit.

struct Z { char x; int : 0; char y; };
// x86: y at offset 4, sizeof = 5, alignof = 1
// ARM/AArch64: y at offset 4, sizeof = 8, alignof = 4

On most targets, anonymous bit-fields don't contribute to struct alignment. But on AArch32/AArch64 (with useZeroLengthBitfieldAlignment()), zero-width bit-fields do raise the struct's alignment.

Zero-width bit-fields are exempt from both packed and #pragma pack; they always round up to alignof(T).
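The separator effect is the same on every target listed above, while the struct size is target-specific, so a portable check has to separate the two. The sketch below guards the x86-only expectation with predefined macros and assumes a non-Windows ABI.

```cpp
#include <cstddef>

struct Z { char x; int : 0; char y; };

// The zero-width int bit-field pushes y to the next int-aligned offset.
static_assert(offsetof(Z, y) == 4, "y starts a fresh unit");

#if defined(__x86_64__) || defined(__i386__)
// On x86 the anonymous bit-field does not raise the struct's alignment.
static_assert(sizeof(Z) == 5 && alignof(Z) == 1, "");
#endif
```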

Microsoft ABI Differences

Clang uses the Microsoft layout rules in two situations: targeting a Windows triple (e.g. x86_64-windows-msvc), which uses MicrosoftRecordLayoutBuilder; or applying __attribute__((ms_struct)) to individual structs on any target, which activates the IsMsStruct path inside ItaniumRecordLayoutBuilder. GCC documents the rules under TARGET_MS_BITFIELD_LAYOUT_P.

The Microsoft ABI uses a fundamentally different layout strategy. While Itanium packs bit-fields into overlapping storage units of potentially different types, Microsoft allocates a complete storage unit of the declared type, then parcels bits among successive bit-fields of the same type size.

The key differences:

Type size changes force a new storage unit. In the GCC documentation's wording: "a bit-field won't share the same storage unit with the previous bit-field if their underlying types have different sizes, and the bit-field will be aligned to the highest alignment of the underlying types of itself and of the previous bit-field." Itanium would let them overlap.

struct Itn { int a:24; short b:8; };                           // sizeof = 4
struct __attribute__((ms_struct)) MS { int a:24; short b:8; }; // sizeof = 8

Under Itanium, b's short storage unit overlaps into a's int unit, so everything fits in 4 bytes. Under Microsoft, the type size changes from 4 to 2, so b gets its own storage unit. The int unit (4 bytes) plus the short unit (2 bytes, padded to 4 for alignment) gives 8 bytes. Note that the rule is about type size, not type identity: int a:24; unsigned b:8 share a unit because both types are 4 bytes.

Each unit is discrete; this is a direct consequence of the type size rule.

Zero-width bit-fields are ignored unless they follow a non-zero-width bit-field (MicrosoftRecordLayoutBuilder::layoutZeroWidthBitField). GCC's documentation: "zero-sized bit-fields are disregarded unless they follow another nonzero-size bit-field." When honored, they terminate the current run and affect the struct's alignment.

// MS mode:
struct MS_ZW1 { long : 0; char bar; }; // sizeof = 1 (no preceding bit-field)
struct MS_ZW2 { char foo; int : 0; char bar; }; // sizeof = 2 (preceding non-bit-field doesn't count)
struct MS_ZW3 { int : 0; long : 0; char bar; }; // sizeof = 1 (zero-width doesn't count either)
struct MS_ZW4 { char foo : 4; int : 0; char bar; }; // sizeof = 8 (non-zero-width bit-field — honored)
struct MS_ZW5 { long : 0; char foo : 4; int : 0; char bar; }; // sizeof = 8 (first ignored, second honored)

Alignment = type size. The alignment of a fundamental type always equals its size: alignof(long long) == 8 even on targets where the natural alignment is 4 (like Darwin PPC32).

Unions. ms_struct ignores all alignment attributes in unions. All bit-fields use alignment 1 and start at offset 0.
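The type-size rule can be observed on a non-Windows host by comparing a plain struct with its ms_struct twin. This is a sketch under assumptions: it expects an Itanium-ABI x86 target, and the ms_struct part is guarded because GCC is assumed here to support that attribute only on x86.

```cpp
// Itanium layout: the short unit overlaps the int unit.
struct Itn { int a : 24; short b : 8; };
static_assert(sizeof(Itn) == 4, "everything fits in one int");

#if defined(__x86_64__) || defined(__i386__)
// Microsoft-style layout: the size change from int to short starts a new unit.
struct __attribute__((ms_struct)) MS { int a : 24; short b : 8; };
static_assert(sizeof(MS) == 8, "int unit plus padded short unit");
#endif
```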

Phase 2: Access Units

LLVM IR has no bit-field concept. To access a bit-field, the Clang-generated IR must:

  1. Load an integer from memory (the access unit)
  2. Mask and shift to extract or insert the bit-field's bits
  3. Store the integer back

The access unit is the LLVM type that gets loaded and stored. Choosing it well matters:

  • Too narrow means multiple memory operations for adjacent bit-field writes;
  • Too wide means touching memory unnecessarily or clobbering adjacent data.

Implementation: CGRecordLowering::accumulateBitFields (clang/lib/CodeGen/CGRecordLayoutBuilder.cpp).

Itanium: Merging Algorithm

Hard constraints. An access unit must never:

  1. Overlap non-bit-field storage. The C memory model allows non-bit-field members to be accessed from other threads. A load/store of the access unit must not touch bytes belonging to other members.
  2. Cross a zero-width bit-field at a byte boundary. Zero-width bit-fields define memory location boundaries; they are barriers.
  3. Extend into reusable tail padding. In C++, a derived class may place fields in a non-POD base class's tail padding. The access unit must not overwrite those bytes.

Soft goals. Subject to the hard constraints, access units should be:

  • Power-of-2 sized (1, 2, 4, 8 bytes). Non-power-of-2 sizes (e.g., 3 bytes) get lowered as multiple smaller loads plus bit manipulation.
  • No wider than a register. Avoids multi-register loads.
  • Naturally aligned (on strict-alignment targets). Avoids the compiler synthesizing unaligned access sequences.
  • As wide as possible within the above. Fewer, wider accesses let LLVM combine adjacent bit-field writes into one read-modify-write.

The algorithm: spans then merging.

Step 1: Spans. Bit-fields that share a byte are inseparable. They form a minimal "span" that must be in the same access unit. A span is a maximal run of bit-fields where each successive one starts mid-byte.

Spans break at byte-aligned boundaries and at zero-width bit-field barriers. A field mid-byte is unconditionally part of the current span; step 2 never sees it as a merge point.
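The span rule can be sketched as a single pass over the Phase 1 offsets. This is a simplified model, not Clang's implementation: BitFieldInfo and countSpans are hypothetical names, zero-width barriers are not modeled, and the first field of a run is assumed to start on a byte boundary.

```cpp
#include <cstddef>

struct BitFieldInfo { unsigned BitOffset; unsigned BitSize; };

// A bit-field that starts on a byte boundary opens a new span; one that
// starts mid-byte joins the current span.
template <std::size_t N>
constexpr std::size_t countSpans(const BitFieldInfo (&Fields)[N]) {
  std::size_t Spans = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Fields[I].BitOffset % 8 == 0)
      ++Spans;
  return Spans;
}

// S1 from Phase 1: a@0 (14 bits), b@14 (10 bits), c@32 (30 bits).
constexpr BitFieldInfo S1Fields[] = {{0, 14}, {14, 10}, {32, 30}};
static_assert(countSpans(S1Fields) == 2, "a+b share a span, c starts another");

// U16 from Phase 1: a@0, b@7, c@14 all chain through shared bytes.
constexpr BitFieldInfo U16Fields[] = {{0, 7}, {7, 7}, {14, 2}};
static_assert(countSpans(U16Fields) == 1, "one inseparable span");
```

Step 2 then works on these spans as atomic building blocks, deciding only whether neighbors should be combined.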

Step 2: Merge. Starting from each span, try to widen the access unit by incorporating the next span. Accept the merge if the combined unit:

  • Fits in one register (<= RegSize)
  • Is power-of-2 and naturally aligned (on strict-alignment targets)
  • Doesn't cross a barrier (zero-width bit-field or non-bit-field storage)
  • Has a natural iN type that fits before the limit offset

Track the best candidate and install it when merging can't improve further.

Access unit representation.

Clang represents each access unit as either an integer type iN or an array type [N x i8] (see CGRecordLowering::accumulateBitFields). iN is preferred because it generates a single load/store instruction. But LLVM's iN types have allocation sizes rounded up to powers of 2 (DataLayout.getTypeAllocSize). For example, i24 has allocation size 4 bytes.

If that rounded-up size would extend past the next field or past reusable tail padding, the access unit is clipped to [N x i8], which has an exact byte count. Clang assumes clipped for each new span (BestClipped = true) and sets it to false only when the natural iN fits within the available space (BeginOffset + TypeSize <= LimitOffset).

// Tail padding reuse (C++)
struct A { int x:24; ~A(); }; // non-POD: DataSize=3, Size=4
struct B : A { char c; }; // c at offset 3, in A's tail padding

// i24 allocates 4 bytes, but byte 3 belongs to B::c.
// Access unit for x is clipped to [3 x i8].
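The clipping decision reduces to one comparison once the allocation size is known. The sketch below uses a simplified model in which the allocation size of iN is its byte count rounded up to a power of two; the real value comes from the target's DataLayout, and allocSizeBytes and mustClip are hypothetical names.

```cpp
#include <cstdint>

// Simplified model of the allocation size of iN, in bytes.
constexpr std::uint64_t allocSizeBytes(std::uint64_t Bits) {
  std::uint64_t Bytes = (Bits + 7) / 8;
  std::uint64_t Size = 1;
  while (Size < Bytes)
    Size *= 2;
  return Size;
}

// True when the natural iN would run past LimitOffset (byte offsets), so the
// access unit has to be an exact-size [N x i8] instead.
constexpr bool mustClip(std::uint64_t BeginOffset, std::uint64_t Bits,
                        std::uint64_t LimitOffset) {
  return BeginOffset + allocSizeBytes(Bits) > LimitOffset;
}

static_assert(allocSizeBytes(24) == 4, "i24 occupies 4 bytes");
static_assert(mustClip(0, 24, 3), "struct A: byte 3 is reusable tail padding");
static_assert(!mustClip(0, 16, 2), "a 2-byte span fits an i16 exactly");
```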

Strict vs cheap unaligned. On targets with cheap unaligned access (x86, AArch64 without +strict-align), alignment checks are skipped, so spans merge freely up to register width. On strict-alignment targets (e.g. -mstrict-align), a merge is rejected if the combined access unit would not be naturally aligned at its offset within the struct.

struct Align { char x; short a:12; short b:4; char c:8; }; // sizeof = 6

// AArch64 -mno-strict-align: %struct.Align = type <{ i8, i8, i32 }>
//   → a+b+c merged into one i32 at offset 2 (unaligned, but cheap)
// AArch64 -mstrict-align:    %struct.Align = type { i8, i16, i8 }
//   → a+b merged
//   → +c rejected; a+b stay as i16, c gets its own i8

-ffine-grained-bit-field-accesses. This Clang flag disables merging entirely. Each span becomes its own access unit; no adjacent spans are combined. For example:

struct S4 { unsigned long f1:28, f2:4, f3:12; };
// Default:      %struct.S4 = type { i64 }       (spans merged into one access unit)
// Fine-grained: %struct.S4 = type { i32, i16 }  (each span kept separate)

The flag is incompatible with sanitizers and is automatically disabled (with a warning) when any sanitizer is active.

Returning to S3:

struct S3 { int a:10; int b:6; char c; int d:6; };

Phase 1 assigned: a@0, b@10, c@16 (byte 2), d@24 (byte 3).

Phase 2 sees two bit-field runs (separated by non-bit-field c):

Run 1: a and b (bits 0–15, bytes 0–1). They share byte 1 (bits 8–15), so they form one span. The span covers 2 bytes. The natural type i16 fits exactly, so no clipping is needed. Access unit: i16.

Run 2: d (bits 24–29, byte 3). Single span, 6 bits in 1 byte. Access unit: i8.

The resulting LLVM struct type:

%struct.S3 = type { i16, i8, i8 }
                    a,b  c   d

To read a, codegen loads the i16, extracts bits 0–9. To read b, it loads the same i16, extracts bits 10–15. Neither load touches c.

When clipping is needed. Widen the bit-fields so a + b no longer fits in 2 bytes:

struct S3w { int a:14; int b:10; char c; int d:6; };

Phase 1 assigned: a@0, b@14, c@24 (byte 3), d@32 (byte 4). sizeof(S3w) = 8.

Run 1: a and b (bits 0–23, bytes 0–2). The span covers 3 bytes. The natural type i24 has allocation size 4 bytes, but byte 3 belongs to c. The access unit is clipped to [3 x i8].

Run 2: d (bits 32–37, byte 4). Access unit: i8.

%struct.S3w = type { [3 x i8], i8, i8, [3 x i8] }
                     a,b       c   d   padding

Endianness.

Access unit selection is endianness-agnostic: spans, merging, and clipping all work in byte offsets from the start of the struct. Endianness matters only when codegen emits the shift/mask sequence to extract or insert a bit-field within its access unit.

LLVM loads an access unit as a single integer. On little-endian, bit 0 of the integer corresponds to the lowest-addressed byte's LSB, so bit-field offsets from Phase 1 can be used directly as shift amounts. On big-endian, bit 0 of the integer corresponds to the highest-addressed byte's LSB, while Phase 1 offsets count from the lowest-addressed byte, so the bit numbering within the loaded integer is reversed.

Clang handles this in setBitFieldInfo (CGRecordLayoutBuilder.cpp):

Info.Offset = (unsigned)(getFieldBitOffset(FD) - Context.toBits(StartOffset));
// ...
if (DataLayout.isBigEndian())
  Info.Offset = Info.StorageSize - (Info.Offset + Info.Size);

The little-endian offset counts up from the LSB; the big-endian offset is mirrored to count down from the MSB. EmitLoadOfBitfieldLValue (CGExpr.cpp) then uses Info.Offset uniformly: it right-shifts by Offset and masks to Size bits, which works for both endiannesses because the flip was already baked into Offset.
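The offset flip and the uniform extraction can be modeled with plain integers. In this sketch BitFieldAccess, makeAccess, and extract are hypothetical names; the arithmetic follows the excerpt, and the loaded access unit is passed in as an already-assembled integer value.

```cpp
#include <cstdint>

struct BitFieldAccess { unsigned Offset, Size, StorageSize; };

// Phase 1 bit offset made relative to the access unit, mirrored on big-endian.
constexpr BitFieldAccess makeAccess(unsigned FieldBitOffset, unsigned StartBits,
                                    unsigned Size, unsigned StorageSize,
                                    bool BigEndian) {
  unsigned Offset = FieldBitOffset - StartBits;
  if (BigEndian)
    Offset = StorageSize - (Offset + Size);
  return {Offset, Size, StorageSize};
}

// Uniform extraction: shift right by Offset, mask to Size bits (Size < 64).
constexpr std::uint64_t extract(std::uint64_t Unit, BitFieldAccess A) {
  return (Unit >> A.Offset) & ((std::uint64_t{1} << A.Size) - 1);
}

// S3's first run: a occupies bits 0-9 and b bits 10-15 of an i16.
static_assert(makeAccess(0, 0, 10, 16, false).Offset == 0, "LE: a at the LSB end");
static_assert(makeAccess(0, 0, 10, 16, true).Offset == 6, "BE: a at the MSB end");
static_assert(makeAccess(10, 0, 6, 16, true).Offset == 0, "BE: b at the LSB end");
```

Because the mirroring happens once when the access info is built, the extraction helper needs no endianness parameter at all.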

Microsoft: Discrete Access Units

Microsoft ABI's codegen is simple: each bit-field gets an access unit of its declared type. Adjacent bit-fields of the same type size share one access unit. Zero-width bit-fields and type-size changes break runs. There is no complex merging; the Phase 1 storage units are the access units.

Contrast S3 under both ABIs:

struct S3 { int a:10; int b:6; char c; int d:6; };

Itanium:   %struct.S3  = type { i16, i8, i8 }   // a,b merged into i16, d is i8
Microsoft: %struct.MS3 = type { i32, i8, i32 }  // a,b share i32 unit, d gets own i32

Itanium's Phase 2 merges a and b into the tightest access unit that covers both (i16), and clips or shrinks to avoid touching c. Microsoft uses the full declared type (int = i32) for each storage unit: no merging, no clipping.

Similarly for mixed types:

struct S2 { int a:24; short b:8; };

Itanium:   %struct.S2  = type { i32 }       // a and b merged into one i32
Microsoft: %struct.MS2 = type { i32, i16 }  // separate units: i32 for a, i16 for b

Itanium merges a and b into a single i32 since they share the same 4 bytes. Microsoft gives each its own access unit matching the declared type.

Conclusion

Phase 1 decides where bits go: it's specified by the ABI and determines sizeof and alignof. Phase 2 decides how to access them: it's a compiler optimization that affects codegen but not the binary layout. They answer different questions and often produce different-sized units. The storage unit for a bit-field is determined by its declared type; the access unit is determined by what's safe and efficient to load.
