普通视图

发现新文章,点击刷新页面。
今天 — 2026年5月11日首页

Fighting Hyrum's Law in LLVM

作者 MaskRay
2026年5月10日 15:00

With a sufficient number of users of an API, it does not matterwhat you promise in the contract: all observable behaviors of yoursystem will be depended on by somebody. — Hyrum's Law

In a compiler, the most common form of Hyrum's Law is dependence onunspecified behavior — hash bucket order, the order of equalelements after std::sort, padding offsets. The same framingcovers a few cases that are technically undefined behavior (use of aninvalidated iterator) or plain incidental properties (ABI struct layout,ELF section offsets).

When the compiler itself harbors such a dependency, the symptom isusually output that varies build-to-build: an unstable sort that landsdifferently after the standard library changes, a hash map whoseiteration order shifts when the hash function does. Occasionally thevariation is run-to-run within a single build —DenseMap<void *, X> keys with an ASLR-derived seedreorder buckets each invocation. Either way, reproducible builds,bisection, and bug reports all assume same input → same output, and astealth Hyrum dependency breaks that.

This post surveys some mechanisms that perturb the contract's blindspots so dependencies cannot quietly form.

Hash seed perturbation

The first line of defense is the hash function itself.llvm/include/llvm/ADT/Hashing.h:

1
2
3
4
5
6
7
8
inline uint64_t get_execution_seed() {
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
return static_cast<uint64_t>(
reinterpret_cast<uintptr_t>(&install_fatal_error_handler));
#else
return 0xff51afd7ed558ccdULL;
#endif
}

The seed XORed into every llvm::hash_value is theruntime address of install_fatal_error_handler — underASLR, different every process. The header comment is explicit:

the seed is non-deterministic per process (address of a functionin LLVMSupport) to prevent having users depend on the particular hashvalues.

Every hash_combine / hash_integer_valuecall picks up the seed, and every DenseMap<K, V>keyed by a hash_value-using type then reorders its bucketsper run. MD5, BLAKE3, SHA1, SHA256 stay byte-stable — those are theright tools when you actually want a digest.

My commitce80c80dca45 introduced the seed in 2024.

Container iteration order

Code can grow dependencies on the iteration order.LLVM_ENABLE_REVERSE_ITERATION walks hash containersbackwards to flag violations.llvm/include/llvm/Support/ReverseIteration.h:

1
2
3
4
5
6
7
template <class T = void *> constexpr bool shouldReverseIterate() {
#if LLVM_ENABLE_REVERSE_ITERATION
return detail::IsPointerLike<T>::value;
#else
return false;
#endif
}

DenseMap flips its BucketItTy tostd::reverse_iterator<pointer>;SmallPtrSet swaps begin() andend(); StringMap bitwise-NOTs the hash beforebucket selection — the only thing that perturbs StringMap,since its hash bypasses get_execution_seed.

Unlike the hash seed, reverse iteration isn't auto-on withassertions; -DLLVM_REVERSE_ITERATION=ON opts in explicitly.In 2026 has already merged fixes triggered by it: 7f703cabf728(MLIR SSA-value completion order), 0b3afd35c41d(MLIR SROA alloca order), and f5e2c5ddcec7(a clang test).

Iterator invalidation

Orthogonal to iteration order: what happens to an existing iteratorafter a mutation. llvm/include/llvm/ADT/EpochTracker.h:

1
2
3
4
5
6
7
8
9
10
11
12
class DebugEpochBase {
uint64_t Epoch = 0;
public:
void incrementEpoch() { ++Epoch; }
~DebugEpochBase() { incrementEpoch(); } // catches use-after-free

class HandleBase {
bool isHandleInSync() const {
return *EpochAddress == EpochAtCreation;
}
};
};

DenseMap and friends inherit fromDebugEpochBase. Mutations bump the epoch; iterators captureit at construction and assert on mismatch. The destructor bumps too, sostale iterators into destroyed containers assert rather than read freedmemory.

Without it, mutate-during-iteration "happens to work" depending onbucket layout — and bucket layout is what the hash seed and reverseiteration above perturb. The epoch check turns the latent bug into aclean assert regardless of which "lucky" layout the run lands on.Collapses to a no-op under NDEBUG.

Pre-shuffling unstable sorts

The same defensive pattern shows up twice in the monorepo, indifferent sub-projects, years apart.

llvm::sort underEXPENSIVE_CHECKS

llvm/include/llvm/ADT/STLExtras.h:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
#ifdef EXPENSIVE_CHECKS
namespace detail {
inline unsigned presortShuffleEntropy() {
static unsigned Result(std::random_device{}());
return Result;
}

template <class IteratorTy>
inline void presortShuffle(IteratorTy Start, IteratorTy End) {
std::mt19937 Generator(presortShuffleEntropy());
llvm::shuffle(Start, End, Generator);
}
} // end namespace detail
#endif

template <typename IteratorTy, typename Compare>
inline void sort(IteratorTy Start, IteratorTy End, Compare Comp) {
#ifdef EXPENSIVE_CHECKS
detail::presortShuffle<IteratorTy>(Start, End);
#endif
std::sort(Start, End, Comp);
}

std::sort and qsort are unstable; codeobserving the order of equal elements is depending on undocumentedbehavior. Pre-shuffling makes that observation different every run. commit5a3d47fabcb6 added the wrapper in 2018, motivated by PR35135.

LLVM also ships its own llvm::shuffle rather thancalling std::shuffle, "so that LLVM behaves the same whenusing different standard libraries." A reproducibility tool whosereproducibility depends on the host stdlib is worse than no tool — andthe linker section below relies on this.

llvm::stable_sort deliberately does not pre-shuffle; itis the explicit opt-in for code that legitimately needs ordering ofequal elements.

libc++_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY

libc++ has a near-perfect parallel mechanism, designed for downstreamusers rather than the project's own internals.libcxx/include/__debug_utils/randomize_range.h:

1
2
3
4
5
6
7
8
9
10
template <class _AlgPolicy, class _Iterator, class _Sentinel>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14
void __debug_randomize_range(_Iterator __first, _Sentinel __last) {
#ifdef _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY
if (!__libcpp_is_constant_evaluated())
std::__shuffle<_AlgPolicy>(__first, __last, __libcpp_debug_randomizer());
#else
(void)__first; (void)__last;
#endif
}

Three callsites:

  • std::sort — pre-shuffles the input.
  • std::partial_sort — pre-shuffles the input andre-shuffles the unsorted tail afterward.
  • std::nth_element — pre-shuffles, then re-shuffles eachside of the partition.

Seed handling rhymes with get_execution_seed: ASLR orstatic std::random_device for per-process variation, with_LIBCPP_RANDOMIZE_UNSPECIFIED_STABILITY_SEED=<n> as afixed-seed escape hatch. Off by default; C++11 and later only.

libcxx/docs/DesignDocs/UnspecifiedBehaviorRandomization.rstexplains the motivation:

Google has measured couple of thousands of tests to be dependenton the stability of sorting and selection algorithms. As we also plan onupdating (or least, providing under flag more) sorting algorithms, thiseffort helps doing it gradually and sustainably.

It cites PR20837 — aworst-case O(n²) std::sort — as the upgradelibc++ specifically wanted to ship. The shuffle is the gating tool: ifdownstream tests pass with it enabled, they will pass after thealgorithm change too.

Comparing the two is more interesting than either alone:

  • llvm::sort's wrapper is internal hygiene: LLVM is itsown primary user, so the shuffle lives in STLExtras.hbehind a build flag with no docs.
  • libc++'s wrapper is user-facing — DesignDocs/ page,public macro, public seed override, explicit "Patches welcome."invitation. It has to be: libc++'s users are not libc++, and thecontract being defended is the C++ standard itself.
  • libc++ generalizes the primitive:__debug_randomize_range applies at three callsites, eachdeclaring which sub-range the algorithm leaves unspecified. LLVM'swrapper only covers the simpler equal-element case.
  • Hashed containers — std::unordered_* iteration order —are unspecified in both, but libc++ does not randomize them.LLVM-the-library does; on this one surface LLVM is ahead of its ownstdlib.
Linkeroutput: --shuffle-sections and--randomize-section-padding

Two ELF-only lld knobs perturb layout details that no contractcovers.

--shuffle-sections=<glob>=<seed>

lld/ELF/Writer.cpp:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
for (const auto &patAndSeed : ctx.arg.shuffleSections) {
...
const uint32_t seed = patAndSeed.second;
if (seed == UINT32_MAX) {
// If --shuffle-sections <section-glob>=-1, reverse the section order.
// The section order is stable even if the number of sections changes.
// This is useful to catch issues like static initialization order
// fiasco reliably.
std::reverse(matched.begin(), matched.end());
} else {
std::mt19937 g(seed ? seed : std::random_device()());
llvm::shuffle(matched.begin(), matched.end(), g);
}
}

Three regimes in one option:

  • seed = -1 — deterministic reverse, stable even as newsections appear. Glob .init_array* to -1,rebuild, run the test suite: anything that breaks is a realstatic-init-order bug. One flag, no Frankenstein link script.
  • seed > 0 — deterministic random shuffle,reproducible across runs and hosts (because llvm::shuffleis host-independent). Useful in CI without breaking bisection.
  • seed = 0std::random_device()-seeded.Fresh nondeterminism every link.

History: 423cb321dfaeintroduced the =-1 reverse mode; 16c30c3c23efgeneralized to per-glob seeds, which is what makes the.init_array*=-1 recipe possible; c135a68d426ffixed a bug where the feature itself produced an invalid dynamicrelocation order — even Hyrum mitigations have correctness traps.

--randomize-section-padding=<seed>

The sister option perturbs section offsets by insertingpadding between input sections and at segment starts(lld/ELF/Writer.cpp):

1
2
3
4
static void randomizeSectionPadding(Ctx &ctx) {
std::mt19937 g(*ctx.arg.randomizeSectionPadding);
// Insert padding between input sections and at segment starts.
}

Callers grow dependencies on padding-induced offsets the linker neverpromised — profile-guided pipelines, side-channel research, exploittoolchains pinning to specific addresses. A seeded perturbation makesthose dependencies visible.

Both options are ELF-only; MachO and COFF ports have nothingequivalent.

ABI break detection

llvm/include/llvm/Config/abi-breaking.h.cmake:

1
2
3
4
5
6
7
8
9
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
ABI_BREAKING_EXPORT_ABI extern int EnableABIBreakingChecks;
LLVM_HIDDEN_VISIBILITY
__attribute__((weak)) int *VerifyEnableABIBreakingChecks =
&EnableABIBreakingChecks;
#else
ABI_BREAKING_EXPORT_ABI extern int DisableABIBreakingChecks;
...
#endif

Every TU including the header takes a weak reference toEnableABIBreakingChecks orDisableABIBreakingChecks depending on its own build flag.Mixing the two against the same libLLVM produces anunresolved symbol at link time. MSVC gets the same guarantee via#pragma detect_mismatch.

Out-of-tree users routinely compile against headers from one tree andlink against a different libLLVM. Without this gate,whichever struct layout the link happens to pick silently miscompiles;with it, the link fails.

What LLVM is not doing

The mechanisms above all target surfaces no stable consumer shouldcare about: bucket order, equal-element sort order, init-array order.Debuggers, profilers, sanitizers, and reproducible-build infrastructureconsume those outputs and need them stable.

In some cases, stronger guarantee is only provided with explicitoptions. For example, Bitcode and textual IR preserve use-list orderonly under -preserve-bc-uselistorder /-preserve-ll-uselistorder.

A near-cousin: clang's -frandomize-layout-seed /__attribute__((randomize_layout)). Mechanically the same —seeded std::shuffle on struct fields — and it doescoincidentally invalidate offsetof dependencies. But theintent is exploit mitigation, cribbed from GrSecurity's Randstruct GCCplugin: per-build kernel hardening, not a developer tool.

❌
❌