LLVM integrated assembler: Engineering better fragments

In my previous assembler posts, I've discussed improvements on expression resolving and relocation generation. Now, let's turn our attention to recent refinements within section fragments. Understanding how an assembler utilizes these fragments is key to appreciating the improvements we've made. At a high level, the process unfolds in three main stages:

  • Parsing phase: The assembler constructs section fragments. These fragments represent sequences of regular instructions or data, span-dependent instructions, alignment directives, and other elements.
  • Section layout phase: Once fragments are built, the assembler assigns offsets to them and finalizes the span-dependent content.
  • Relocation decision phase: In the final stage, the assembler evaluates fixups and, if necessary, updates the content of the fragments.

When the LLVM integrated assembler was introduced in 2009, its section and fragment design was quite basic. Performance wasn't the concern at the time. As LLVM evolved, many assembler features added over the years came to rely heavily on this original design. This created a complex web that made optimizing the fragment representation increasingly challenging.

Here's a look at some of the features that added to this complexity over the years:

  • 2010: Mach-O .subsection_via_symbols and atoms
  • 2012: NativeClient's bundle alignment mode. I've created a dedicated chapter for this.
  • 2015: Hexagon instruction bundle
  • 2016: CodeView variable definition ranges
  • 2018: RISC-V linker relaxation
  • 2020: x86 -mbranches-within-32B-boundaries
  • 2023: LoongArch linker relaxation. This is largely identical to RISC-V linker relaxation. Any refactoring or improvements to the RISC-V linker relaxation often necessitate corresponding changes to the LoongArch implementation.
  • 2023: z/OS GOFF (Generalized Object File Format)

I've included the start year for each feature to indicate when it was initially introduced, to the best of my knowledge. This doesn't imply that maintenance stopped after that year. On the contrary, many of these features, like RISC-V linker relaxation, require ongoing, active maintenance.

Despite the intricate history, I've managed to untangle these dependencies and implement the necessary fixes. And that, in a nutshell, is what this blog post is all about!

Reducing sizeof(MCFragment)

A significant aspect of optimizing fragment management involved directly reducing the memory footprint of the MCFragment object itself. Several targeted changes contributed to making sizeof(MCFragment) smaller, as mentioned in my previous blog post: Integrated assembler improvements in LLVM 19.

The fragment management system has also been streamlined by transitioning from a doubly-linked list (llvm::iplist) to a singly-linked list, eliminating unnecessary overhead. A few prerequisite commits removed backward iterator requirements. It's worth noting that the complexities introduced by features like NaCl's bundle alignment mode, x86's -mbranches-within-32B-boundaries option, and Hexagon's instruction bundles presented challenges.

The quest for trivially destructible fragments

Historically, MCFragment subclasses, specifically MCDataFragment and MCRelaxableFragment, relied on SmallVector member variables to store their content and fixups. This approach, while functional, presented two key inefficiencies:

  • Inefficient storage of small objects: The content and fixups for individual fragments are typically very small. Storing a multitude of these tiny objects individually within SmallVectors led to less-than-optimal memory utilization.
  • Non-trivial destructors: When deallocating sections, the ~MCSection destructor had to meticulously traverse the fragment list and explicitly destroy each fragment.
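To make the second point concrete, here is a minimal sketch (not LLVM code) contrasting a fragment that owns its bytes with one that only records offsets into storage owned elsewhere. OwningFragment and IndexFragment are hypothetical names, and std::vector stands in for SmallVector.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the old layout: the fragment owns its bytes,
// so its destructor must run to release them.
struct OwningFragment {
  std::vector<char> Contents;
};

// Hypothetical stand-in for the new layout: the fragment only records a
// half-open range [ContentStart, ContentEnd) into storage owned elsewhere.
struct IndexFragment {
  uint32_t ContentStart = 0;
  uint32_t ContentEnd = 0;
};

static_assert(!std::is_trivially_destructible_v<OwningFragment>,
              "owning fragments need their destructor to run");
static_assert(std::is_trivially_destructible_v<IndexFragment>,
              "index-only fragments can be dropped wholesale");
```

With the second shape, a container of fragments can release its memory in one step without visiting each element.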

In 2024, @aengelke initiated a draft to store fragment content out-of-line. Building upon that foundation, I've extended this approach to also store fixups out-of-line, and ensured compatibility with the aforementioned features that cause complexity (especially RISC-V and LoongArch linker relaxation).

Furthermore, MCRelaxableFragment previously contained MCInst Inst;, which also necessitated a non-trivial destructor. To address this, I've redesigned its data structure. Operands are now stored within the parent MCSection, and the MCRelaxableFragment itself only holds references:

uint32_t Opcode = 0;
uint32_t Flags = 0; // x86-only for the EVEX prefix
uint32_t OperandStart = 0;
uint32_t OperandSize = 0;

Unfortunately, we still need to encode MCInst::Flags to support the x86 EVEX prefix, e.g., {evex} xorw $foo, %ax. My hope is that the x86 maintainers might refactor X86MCCodeEmitter::encodeInstruction to make this flag storage unnecessary.

The new design of MCFragment and MCSection is as follows:

class MCFragment {
  ...
  // Track content and fixups for the fixed-size part as fragments are
  // appended to the section. The content remains immutable, except when
  // modified by applyFixup.
  uint32_t ContentStart = 0;
  uint32_t ContentEnd = 0;
  uint32_t FixupStart = 0;
  uint32_t FixupEnd = 0;

  // Track content and fixups for the optional variable-size tail part,
  // typically modified during relaxation.
  uint32_t VarContentStart = 0;
  uint32_t VarContentEnd = 0;
  uint32_t VarFixupStart = 0;
  uint32_t VarFixupEnd = 0;
};

class MCSection {
  ...
  // Content and fixup storage for fragments
  SmallVector<char, 0> ContentStorage;
  SmallVector<MCFixup, 0> FixupStorage;
  SmallVector<MCOperand, 0> MCOperandStorage;
};
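As a rough illustration of how index ranges into section-owned storage can be used, here is a toy model. ToySection, ToyFragment, startFragment, appendContents, and getContents are hypothetical names of mine, std::vector replaces SmallVector, and fixups and the variable-size tail are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Toy section: owns one byte buffer shared by all of its fragments.
struct ToySection {
  std::vector<char> ContentStorage;
};

// Toy fragment: a half-open range into the parent's ContentStorage.
struct ToyFragment {
  uint32_t ContentStart = 0;
  uint32_t ContentEnd = 0;
};

// Start a new, empty fragment at the current end of the storage.
inline ToyFragment startFragment(const ToySection &Sec) {
  ToyFragment F;
  F.ContentStart = F.ContentEnd =
      static_cast<uint32_t>(Sec.ContentStorage.size());
  return F;
}

// Append bytes to the most recently started fragment of the section.
inline void appendContents(ToySection &Sec, ToyFragment &Frag,
                           std::string_view Bytes) {
  assert(Frag.ContentEnd == Sec.ContentStorage.size() &&
         "can only append to the last fragment");
  Sec.ContentStorage.insert(Sec.ContentStorage.end(), Bytes.begin(),
                            Bytes.end());
  Frag.ContentEnd = static_cast<uint32_t>(Sec.ContentStorage.size());
}

// View a fragment's bytes; the view is invalidated by later appends.
inline std::string_view getContents(const ToySection &Sec,
                                    const ToyFragment &Frag) {
  return std::string_view(Sec.ContentStorage.data() + Frag.ContentStart,
                          Frag.ContentEnd - Frag.ContentStart);
}
```

Because a fragment stores offsets rather than pointers, the storage can reallocate as it grows without invalidating any fragment.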

(As a side note, LLVM's CamelCase variable names are odd. As the MC maintainer, I'd be delighted to see them refactored to camelBack or snake_case if people agree on the direction.)

Key changes:

Fewer fragments: fixed-size part and variable tail

Prior to LLVM 21.1, the assembler, operating with a fragment design dating back to 2009, placed every span-dependent instruction into its own distinct fragment. The x86 code sequence push rax; jmp foo; nop; jmp foo would be represented with numerous fragments: MCDataFragment(push rax); MCRelaxableFragment(jmp foo); MCDataFragment(nop); MCRelaxableFragment(jmp foo).

A more efficient approach emerged: storing both a fixed-size part and an optional variable-size tail within a single fragment.

  • The fixed-size part maintains a consistent size throughout the assembly process.
  • The variable-size tail, if present, encodes elements that can change in size or content, such as a span-dependent instruction, an alignment directive, a fill directive, or other similar span-dependent constructs.

The new design led to significantly fewer fragments:

MCFragment(fixed: push rax, variable: jmp foo)
MCFragment(fixed: nop, variable: jmp foo)
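The difference in fragment count can be sketched with a small counting model. This is only an illustration under my own simplifications: ToyInst, countFragmentsOld, and countFragmentsNew are hypothetical, and the model ignores alignment, fills, and any trailing empty fragment a real streamer might keep open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyInst {
  const char *Text;
  bool SpanDependent; // true for instructions such as `jmp foo`
};

// Old design: each run of fixed-size instructions forms one data fragment,
// and every span-dependent instruction gets a fragment of its own.
inline std::size_t countFragmentsOld(const std::vector<ToyInst> &Insts) {
  std::size_t N = 0;
  bool InDataFragment = false;
  for (const ToyInst &I : Insts) {
    if (I.SpanDependent) {
      ++N;
      InDataFragment = false;
    } else if (!InDataFragment) {
      ++N;
      InDataFragment = true;
    }
  }
  return N;
}

// New design: a span-dependent instruction becomes the variable-size tail
// of the current fragment, which then stops accepting fixed-size data.
inline std::size_t countFragmentsNew(const std::vector<ToyInst> &Insts) {
  std::size_t N = 0;
  bool Open = false; // is a fragment still accepting fixed-size data?
  for (const ToyInst &I : Insts) {
    if (!Open) {
      ++N;
      Open = true;
    }
    if (I.SpanDependent)
      Open = false;
  }
  return N;
}
```

For the sequence push rax; jmp foo; nop; jmp foo, this model counts four fragments under the old scheme and two under the new one.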

Key changes:

Reducing instruction encoding overhead

Encoding individual instructions is the most performance-critical operation within MCObjectStreamer. Recognizing this, significant effort has been dedicated to reducing this overhead since May 2023.

It's worth mentioning that x86's instruction padding features, introduced in 2020, have imposed considerable overhead. Specifically, these features are:

My recent optimization efforts demanded careful attention to this particularly complex and performance-sensitive code.

Eager fragment creation

Encoding an instruction is a far more frequent operation than appending a variable-size tail to the current fragment. In the previous design, the instruction encoder was burdened with an extra check: it had to determine if the current fragment already had a variable-size tail.

encodeInstruction:
  if (current fragment has a variable-size tail)
    start a new fragment
  append data to the current fragment

emitValueToAlignment:
  Encode the alignment in the variable-size tail of the current fragment

emitDwarfLocDirective:
  Encode the .loc in the variable-size tail of the current fragment

Our new strategy optimizes this by maintaining a current fragment that is guaranteed not to have a variable-size tail. This means functions appending data to the fixed-size part no longer need to perform this check. Instead, any function that sets a variable-size tail will now immediately start a new fragment.

Here's how the workflow looks with this optimization:

encodeInstruction:
  assert(current fragment doesn't have a variable-size tail)
  append data to the current fragment

emitValueToAlignment:
  Encode the alignment in the variable-size tail of the current fragment
  start a new fragment

emitDwarfLocDirective:
  Encode the .loc in the variable-size tail of the current fragment
  start a new fragment
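The same workflow can be written as a small C++ toy. ToyStreamer and ToyFragment are hypothetical, the tail payload is reduced to a single alignment value, and the real MCObjectStreamer does considerably more.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ToyFragment {
  std::string Fixed;       // fixed-size part
  bool HasVarTail = false; // set once a variable-size tail is attached
  uint32_t AlignTo = 0;    // toy tail payload: requested alignment
};

class ToyStreamer {
  std::vector<ToyFragment> Frags;

public:
  // Invariant: the last fragment never has a variable-size tail.
  ToyStreamer() { Frags.emplace_back(); }

  // Hot path: no "does the current fragment have a tail?" branch.
  void emitBytes(const std::string &Bytes) {
    assert(!Frags.back().HasVarTail && "current fragment must be open");
    Frags.back().Fixed += Bytes;
  }

  // Cold path: attach the tail, then eagerly open the next fragment so
  // the invariant relied on by emitBytes holds again.
  void emitValueToAlignment(uint32_t Align) {
    Frags.back().HasVarTail = true;
    Frags.back().AlignTo = Align;
    Frags.emplace_back();
  }

  const std::vector<ToyFragment> &fragments() const { return Frags; }
};
```

The cost moves from the frequent operation (appending bytes) to the rare one (attaching a tail), at the price of occasionally leaving an empty fragment at the end.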

Key changes:

It's worth noting that the first patch was made possible thanks to the removal of the bundle alignment mode.

Fragment content in trailing data

Our MCFragment class manages four distinct sets of appendable data: fixed-size content, fixed-size fixups, variable-size tail content, and variable-size tail fixups. Of these, the fixed-size content is typically the largest. We can optimize its storage by utilizing it as trailing data, akin to a flexible array member.

This approach offers several compelling advantages:

  • Improved data locality: Storing the content after the MCFragment object enhances cache utilization.
  • Simplified metadata: We can replace the pair of uint32_t ContentStart = 0; uint32_t ContentEnd = 0; with a single uint32_t ContentSize;.

This optimization leverages a clever technique made possible by using a special purpose bump allocator. After allocating sizeof(MCFragment) bytes for a new fragment, we know that any remaining space within the current bump allocator block immediately follows the fragment's end. This contiguous space can then be efficiently used for the fragment's trailing data.

However, this design introduces a few important considerations:

  • Tail fragment appends only: Data can only be appended to the tail fragment of a subsection. Fragments located in the middle of a subsection are immutable in their fixed-size content. Any post-assembler-layout adjustments must target the variable-size tail.
  • Dynamic Allocation Management: When new data needs to be appended, a function is invoked to ensure the current bump allocator block has sufficient space. If not, the current fragment is closed (its fixed-size content is finalized), and a new fragment is started. For instance, an 8-byte sequence could be stored as one single fragment, or, if space constraints dictate, as two fragments each encoding 4 bytes.
  • New block allocation: If the available space in the current block is insufficient, a new block large enough to accommodate both an MCFragment and the required bytes for its trailing data is allocated.
  • Section/subsection Switching: The previously saved fragment list tail cannot be simply reused. This is because it's tied to the memory space of the previous bump allocator block. Instead, a new fragment must be allocated using the current bump allocator block and appended to the new subsection's tail.
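The trailing-data idea and the first three considerations can be sketched with a toy bump allocator. Everything here is hypothetical (ToyFragment, ToyFragmentAllocator, the tiny 64-byte block size), a single section with no subsections is assumed, and unlike the description above this sketch never splits one append across two fragments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>
#include <vector>

// Toy fragment header; its fixed-size content lives directly after it.
struct ToyFragment {
  ToyFragment *Next = nullptr;
  uint32_t ContentSize = 0;
  char *data() { return reinterpret_cast<char *>(this + 1); }
};

class ToyFragmentAllocator {
  static constexpr std::size_t BlockSize = 64; // tiny, to force spilling
  std::vector<std::unique_ptr<char[]>> Blocks;
  char *Cur = nullptr; // bump pointer == end of the tail fragment's content
  char *End = nullptr; // end of the current block
  ToyFragment *Head = nullptr, *Tail = nullptr;

  static char *alignUp(char *P) {
    auto V = reinterpret_cast<std::uintptr_t>(P);
    constexpr std::uintptr_t A = alignof(ToyFragment);
    return reinterpret_cast<char *>((V + A - 1) & ~(A - 1));
  }

  // Close the tail fragment and start a new one at the bump pointer, in a
  // fresh block if the header plus MinTrailing bytes would not fit.
  void newFragment(std::size_t MinTrailing) {
    std::size_t Need = sizeof(ToyFragment) + MinTrailing;
    char *P = Cur ? alignUp(Cur) : nullptr;
    if (!P || P > End || static_cast<std::size_t>(End - P) < Need) {
      std::size_t Size = BlockSize;
      if (Need > Size)
        Size = Need;
      Blocks.push_back(std::make_unique<char[]>(Size));
      P = Blocks.back().get();
      End = P + Size;
    }
    ToyFragment *F = new (P) ToyFragment();
    Cur = P + sizeof(ToyFragment);
    if (Tail)
      Tail->Next = F;
    else
      Head = F;
    Tail = F;
  }

public:
  ToyFragmentAllocator() { newFragment(0); }

  // Append bytes to the tail fragment, growing it in place when the
  // current block still has room.
  void append(const char *Bytes, std::size_t N) {
    if (static_cast<std::size_t>(End - Cur) < N)
      newFragment(N);
    std::memcpy(Cur, Bytes, N);
    Cur += N;
    Tail->ContentSize += static_cast<uint32_t>(N);
  }

  ToyFragment *head() const { return Head; }
};
```

Growing the tail fragment is just a bump of Cur, which is why only the last fragment can accept more fixed-size content.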

I have thought about making the variable-size content immediately follow the fixed-size content, but leb128 and x86's potentially very long instruction (15 bytes) stopped me from doing it. There is certainly room for future improvements, though.

Key changes:

Fragment fixups stored in section

TODO

MCFragment should not hold references to fixups stored in the parent MCSection. Instead, fixups reference the fragment.

The optional variable-size tail of a fragment can have at most one fixup.

Deprecating complexity: NativeClient's bundle alignment mode

Google's now-discontinued Native Client (NaCl) project provided a sandboxing environment through a combination of Software Fault Isolation (SFI) and memory segmentation. A distinctive feature of its SFI implementation was the "bundle alignment mode", which adds NOP padding to ensure that no instruction crosses a 32-byte alignment boundary. The verifier's job is to check all instructions starting at 32-byte-multiple addresses.
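The padding rule itself is simple arithmetic. Here is a minimal sketch, assuming a power-of-two bundle size and instructions no longer than a bundle; bundlePadding is a hypothetical helper, not the function LLVM used, and it ignores .bundle_lock groups and align_to_end.

```cpp
#include <cstdint>

// Number of NOP bytes to insert before an instruction of InstSize bytes
// placed at Offset so that it does not straddle a BundleSize boundary.
// Assumes BundleSize is a power of two and InstSize <= BundleSize.
constexpr uint64_t bundlePadding(uint64_t Offset, uint64_t InstSize,
                                 uint64_t BundleSize = 32) {
  uint64_t OffsetInBundle = Offset & (BundleSize - 1);
  return OffsetInBundle + InstSize > BundleSize ? BundleSize - OffsetInBundle
                                                : 0;
}

static_assert(bundlePadding(0, 5) == 0, "fits at the start of a bundle");
static_assert(bundlePadding(27, 5) == 0, "ends exactly at the boundary");
static_assert(bundlePadding(30, 5) == 2, "pad up to the next bundle");
```

The arithmetic is trivial; the cost described below came from threading this decision through every instruction emission and the layout code.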

While the core concept of aligned bundling is intriguing, its implementation within the LLVM assembler proved problematic. Introduced in 2012, this feature imposed noticeable performance penalties on users who had no need for NaCl and, perhaps more critically, significantly increased the complexity of MC's internal workings. I was particularly concerned by its pervasive modifications to MCObjectStreamer and MCAssembler.

The complexity deepened with the introduction of

In MCObjectStreamer, newly defined labels were put into a "pending label" list and initially assigned to a MCDummyFragment associated with the current section. The symbols would be reassigned to a new fragment when the next instruction or directive was parsed. This pending label system introduced complexity, and a missing flushPendingLabels could lead to subtle bugs related to incorrect symbol values. flushPendingLabels was called by many MCObjectStreamer functions, notably once for each new fragment, adding overhead. It also complicated the label difference evaluation due to MCDummyFragment in MCExpr.cpp:AttemptToFoldSymbolOffsetDifference.

For the following code, aligned bundling requires that .Ltmp0 is defined at addl.

$ clang var.c -S -o - -fPIC -m32
...
.bundle_lock align_to_end
calll .L0$pb
.bundle_unlock
.L0$pb:
popl %eax
.Ltmp0:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax

Recognizing these long-standing issues, a series of pivotal changes were undertaken:

  • 2024: [MC] Aligned bundling: remove special handling for RelaxAll removed an optimization for NaCl in the mc-relax-all mode
  • 2024: [MC] Remove pending labels
  • 2024: [MC] AttemptToFoldSymbolOffsetDifference: remove MCDummyFragment check. NFC
  • 2025: Finally, MC: Remove bundle alignment mode, after Derek Schuff agreed to drop NaCl support from LLVM.

Should future features require a variant of bundle alignment, I firmly believe a much cleaner implementation is necessary. This could potentially be achieved through a backend hook within X86AsmBackend::finishLayout, applied after the primary assembler layout phase, similar to how the -mbranches-within-32B-boundaries option is handled, though even that implementation warrants an extensive revisit itself.

Lessons learned

The cost of missing early optimization

Early design choices can have a far-reaching impact on future code. The initial LLVM MC design, while admirably simple in its inception, inadvertently created a rigid foundation. As new features piled on, each relying more and more on the specific fragment internals, rectifying foundational inefficiencies became incredibly challenging. Hyrum's Law was evident: features built on this foundation inevitably depended on all its observable behaviors. Optimizing the underlying structure required not just a change to the core, but also a thorough fix for all its unsuspecting users. I encountered significant struggles with the deeply ingrained complexities stemming from NaCl's bundle alignment mode, x86's -mbranches-within-32B-boundaries option, and the intricacies of RISC-V linker relaxation.

Cargo cult programming and snowball effect

I observed instances of "cargo cult programming", where existing solutions were copied without a full understanding of their underlying rationale or applicability. For example:

  • The WebAssembly implementation heavily mirrored that of ELF. Consequently, many improvements made to the ELF component often necessitated corresponding, sometimes redundant, changes to the WebAssembly implementation. In addition, the WebAssembly implementation copied ELF-specific code that was irrelevant for WebAssembly's architecture, adding unnecessary bloat and complexity.
  • LoongArch's RISC-V replication: LoongArch's linker relaxation implementation directly copied the approach taken for RISC-V. Refactoring or improvements to RISC-V's linker relaxation frequently require mirrored changes in the LoongArch codebase, creating parallel maintenance burdens. I am particularly glad that I landed my foundational [RISCV] Make linker-relaxable instructions terminate MCDataFragment and [RISCV] Allow delayed decision for ADD/SUB relocations in 2023, before the LoongArch team replicated the RISC-V approach. This timing, I hope, mitigated some future headaches for their implementation.

These patterns illustrate how initial design choices, or the expedience of copying existing solutions, can lead to a "snowball effect" of accumulating complexity and redundant code that makes future optimization and maintenance significantly harder. On a positive note, I'm also pleased that the streamlining of the relocation generation framework was completed before Apple's upstreaming of their Mach-O support for 32-bit RISC-V. This critical work should provide a more robust and less complex base for their contributions, and reduce the maintenance burden on my end.

The cost of features

Specific features, particularly those designed for niche or specialized use cases like NaCl's bundle alignment mode, introduced disproportionate complexity and performance overhead across the entire assembler. Even though NaCl itself was deprecated in 2020, it took until 2025 to finally excise its complex support from LLVM. This highlights a common challenge in large, open-source projects: while many developers are motivated to add new features, there's often far less incentive or dedicated effort to streamline or remove their underlying implementation complexities once they're no longer strictly necessary or have become a performance drain.

I want to acknowledge the work of individuals like Rafael Ávila de Espíndola, Saleem Abdulrasool, and Nirav Dave, whose improvements to LLVM MC were vital. Without their contributions, the MC layer would undoubtedly be in a far less optimized state today.

Epilogue

This extensive work on fragment optimization would not have been possible without the invaluable contributions of Alexis Engelke. My sincere thanks go to Alexis for his meticulous reviews of numerous patches, his insightful suggestions, and for contributing many significant improvements himself.

What have I learned through the process?

Appendix: How GNU Assembler mastered fragments decades ago

After dedicating several paragraphs to explaining the historical shortcomings of LLVM MC's fragment representation, a natural question arises: how does GNU Assembler (GAS), arguably the other most popular assembler on Linux systems, approach fragment handling?

Delving into its history reveals a fascinating answer. The earliest commit I could locate is a cvs2svn-generated record from April 1991. Given the 1987 copyright notice within the code, it's highly probable that this foundational work on fragments was laid down as early as 1987.

You can explore this initial structure in as.h here: https://github.com/bminor/binutils-gdb/commit/3a69b3aca678a3caf3ade7f9d42d18233b097ec6#diff-0771d3312685417eb5061a8f0856da4f0406ca8bd6c7d68b6a50a026a4e48c9dR212. Please check out as.h and frags.c.

Observing the frag struct, a few points stand out:

  • While the exact purpose of fr_offset isn't immediately clear to me, fr_fix and fr_var bear a striking resemblance to the concepts we've recently introduced in MCFragment. It might make the variable-size content immediately follow the fixed-size content, though.
  • The char fr_literal[1] demonstrates an early use of what we now call a flexible array member. Today, GCC and Clang's -fstrict-flex-arrays=2 would report a warning.
  • fr_symbol could be more appropriately placed within a union.
  • fr_pcrel_adjust and fr_bsr would ideally be architecture-specific data.
  • Fragments are allocated using obstacks, which appear to be a more sophisticated form of a bump allocator, with additional bookkeeping overhead.

But truly, I should stop the minor nit-picking. What impresses me most is the sheer foresight demonstrated in GAS's fragment allocator design. Conceived in 1987 or even earlier, it masterfully anticipated solutions that LLVM MC, first conceived in 2009, has only now achieved decades later. This design held the lead on fragment architecture for nearly four decades!

My greatest tribute goes to the original authors of GNU Assembler for this remarkable piece of engineering.

/*
* A code fragment (frag) is some known number of chars, followed by some
* unknown number of chars. Typically the unknown number of chars is an
* instruction address whose size is yet unknown. We always know the greatest
* possible size the unknown number of chars may become, and reserve that
* much room at the end of the frag.
* Once created, frags do not change address during assembly.
* We chain the frags in (a) forward-linked list(s). The object-file address
* of the 1st char of a frag is generally not known until after relax().
* Many things at assembly time describe an address by {object-file-address
* of a particular frag}+offset.

BUG: it may be smarter to have a single pointer off to various different
notes for different frag kinds. See how code pans
*/
struct frag /* a code fragment */
{
unsigned long fr_address; /* Object file address. */
struct frag *fr_next; /* Chain forward; ascending address order. */
/* Rooted in frch_root. */

long fr_fix; /* (Fixed) number of chars we know we have. */
/* May be 0. */
long fr_var; /* (Variable) number of chars after above. */
/* May be 0. */
struct symbol *fr_symbol; /* For variable-length tail. */
long fr_offset; /* For variable-length tail. */
char *fr_opcode; /*->opcode low addr byte,for relax()ation*/
relax_stateT fr_type; /* What state is my tail in? */
relax_substateT fr_subtype;
/* These are needed only on the NS32K machines */
char fr_pcrel_adjust;
char fr_bsr;
char fr_literal [1]; /* Chars begin here. */
/* One day we will compile fr_literal[0]. */
};

GCC 13.3.0 miscompiles LLVM

For years, I've been involved in updating LLVM's MC layer. A recent journey led me to eliminate the FK_PCRel_ fixup kinds:

MCFixup: Remove FK_PCRel_

The generic FK_Data_ fixup kinds handle both absolute and PC-relative fixups. ELFObjectWriter sets IsPCRel to true for `.long foo-.`, so the backend has to handle PC-relative FK_Data_.

However, the existence of FK_PCRel_ encouraged backends to implement it as a separate fixup type, leading to redundant and error-prone code.

Removing FK_PCRel_ simplifies the overall fixup mechanism.

As a prerequisite, I had to update several backends that relied on the now-deleted fixup kinds. It was during this process that something unexpected happened. Contributors reported that when built by GCC 13.3.0, the LLVM integrated assembler had test failures.

To investigate, I downloaded and built GCC 13.3.0 locally:

../../configure --prefix=$HOME/opt/gcc-13.3.0 --disable-bootstrap --enable-languages=c,c++ --disable-libsanitizer --disable-multilib
make -j 30 && make -j 30 install

I then built a Release build (-O3) of LLVM. Sure enough, the failure was reproducible:

% /tmp/out/custom-gcc-13/bin/llc llvm/test/CodeGen/X86/2008-08-06-RewriterBug.ll -mtriple=i686 -o s -filetype=obj
Unknown immediate size
UNREACHABLE executed at /home/ray/llvm/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h:904!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: /tmp/out/custom-gcc-13/bin/llc llvm/test/CodeGen/X86/2008-08-06-RewriterBug.ll -mtriple=i686 -o s -filetype=obj
1. Running pass 'Function Pass Manager' on module 'llvm/test/CodeGen/X86/2008-08-06-RewriterBug.ll'.
2. Running pass 'X86 Assembly Printer' on function '@foo'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 llc 0x0000000002f06bcb
fish: Job 1, '/tmp/out/custom-gcc-13/bin/llc …' terminated by signal SIGABRT (Abort)

Interestingly, a RelWithDebInfo build (-O2 -g) of LLVM did not reproduce the failure, suggesting either an undefined behavior, or an optimization-related issue within GCC 13.3.0.

The Bisection trail

I built GCC at the releases/gcc-13 branch, and the issue vanished. This strongly indicated that the problem lay somewhere between the releases/gcc-13.3.0 tag and the releases/gcc-13 branch.

The bisection led me to a specific commit, directing me to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109934#c6.

I developed a workaround at the code block with a typo "RemaningOps". Although I had observed it before, I was hesitant to introduce a commit solely for a typo fix. However, it became clear this was the perfect opportunity to address both the typo and implement a workaround for the GCC miscompilation. This led to the landing of this commit, resolving the miscompilation.

Sam James from Gentoo mentioned that the miscompilation was introduced by a commit cherry-picked into GCC 13.3.0. GCC 13.2.0 and GCC 13.4.0 are good.

LLVM integrated assembler: Improving expressions and relocations

My previous post, LLVM integrated assembler: Improving MCExpr and MCValue, delved into enhancements made to LLVM's internal MCExpr and MCValue representations. This post covers recent refinements to MC, focusing on expression resolving and relocation generation.

Preventing cyclic dependencies

Equated symbols may form a cycle, which is not allowed.

# CHECK: [[#@LINE+2]]:7: error: cyclic dependency detected for symbol 'a'
# CHECK: [[#@LINE+1]]:7: error: expression could not be evaluated
a = a + 1

# CHECK: [[#@LINE+3]]:6: error: cyclic dependency detected for symbol 'b1'
# CHECK: [[#@LINE+1]]:6: error: expression could not be evaluated
b0 = b1
b1 = b2
b2 = b0

Previously, LLVM's integrated assembler used an occurs check to detect these cycles when parsing symbol equating directives.

bool parseAssignmentExpression(StringRef Name, bool allow_redef,
                               MCAsmParser &Parser, MCSymbol *&Sym,
                               const MCExpr *&Value) {
  ...
  // Validate that the LHS is allowed to be a variable (either it has not been
  // used as a symbol, or it is an absolute symbol).
  Sym = Parser.getContext().lookupSymbol(Name);
  if (Sym) {
    // Diagnose assignment to a label.
    //
    // FIXME: Diagnostics. Note the location of the definition as a label.
    // FIXME: Diagnose assignment to protected identifier (e.g., register name).
    if (Value->isSymbolUsedInExpression(Sym))
      return Parser.Error(EqualLoc, "Recursive use of '" + Name + "'");
    ...
}

isSymbolUsedInExpression implemented the occurs check as a tree (or more accurately, a DAG) traversal.

bool MCExpr::isSymbolUsedInExpression(const MCSymbol *Sym) const {
  switch (getKind()) {
  case MCExpr::Binary: {
    const MCBinaryExpr *BE = static_cast<const MCBinaryExpr *>(this);
    return BE->getLHS()->isSymbolUsedInExpression(Sym) ||
           BE->getRHS()->isSymbolUsedInExpression(Sym);
  }
  case MCExpr::Target: {
    const MCTargetExpr *TE = static_cast<const MCTargetExpr *>(this);
    return TE->isSymbolUsedInExpression(Sym);
  }
  case MCExpr::Constant:
    return false;
  case MCExpr::SymbolRef: {
    const MCSymbol &S = static_cast<const MCSymbolRefExpr *>(this)->getSymbol();
    if (S.isVariable() && !S.isWeakExternal())
      return S.getVariableValue()->isSymbolUsedInExpression(Sym);
    return &S == Sym;
  }
  case MCExpr::Unary: {
    const MCExpr *SubExpr =
        static_cast<const MCUnaryExpr *>(this)->getSubExpr();
    return SubExpr->isSymbolUsedInExpression(Sym);
  }
  }

  llvm_unreachable("Unknown expr kind!");
}

While generally effective, this routine wasn't universally applied across all symbol equating scenarios, such as with .weakref or some target-specific parsing code, leading to potential undetected cycles, and therefore an infinite loop in assembler execution.

To address this, I adopted a 2-color depth-first search (DFS) algorithm. While a 3-color DFS is typical for DAGs, a 2-color approach suffices for our trees, although this might lead to more work when a symbol is visited multiple times. Shared subexpressions are very rare in LLVM.
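Stripped of MC specifics, the 2-color idea looks roughly like the following. This is a toy model: ToySymbol, ToySymbolTable, and hasCycle are hypothetical, expressions are reduced to a list of referenced symbol names, and the IsResolving bit plays the role of the single extra color.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct ToySymbol {
  std::vector<std::string> Operands; // symbols referenced by `name = expr`
  bool IsResolving = false;          // the single "color" bit
};

using ToySymbolTable = std::unordered_map<std::string, ToySymbol>;

// Returns true if evaluating Name would revisit a symbol that is currently
// being evaluated, i.e. the equated symbols form a cycle.
inline bool hasCycle(ToySymbolTable &Syms, const std::string &Name) {
  auto It = Syms.find(Name);
  if (It == Syms.end())
    return false; // undefined symbol: a leaf
  ToySymbol &S = It->second;
  if (S.IsResolving)
    return true; // reached a symbol on the current DFS path
  S.IsResolving = true;
  bool Cyclic = false;
  for (const std::string &Op : S.Operands)
    if (hasCycle(Syms, Op)) {
      Cyclic = true;
      break;
    }
  // There is no "done" color, so a symbol shared by several expressions
  // is simply walked again the next time it is reached.
  S.IsResolving = false;
  return Cyclic;
}
```

A third color would let the search skip symbols that were already fully explored; with shared subexpressions being rare, the extra state is not worth it.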

Here is the relevant change to evaluateAsRelocatableImpl. I also need a new bit from MCSymbol.

@@ -497,13 +498,25 @@ bool MCExpr::evaluateAsRelocatableImpl(MCValue &Res, const MCAssembler *Asm,

   case SymbolRef: {
     const MCSymbolRefExpr *SRE = cast<MCSymbolRefExpr>(this);
-    const MCSymbol &Sym = SRE->getSymbol();
+    MCSymbol &Sym = const_cast<MCSymbol &>(SRE->getSymbol());
     const auto Kind = SRE->getKind();
     bool Layout = Asm && Asm->hasLayout();

     // Evaluate recursively if this is a variable.
+    if (Sym.isResolving()) {
+      if (Asm && Asm->hasFinalLayout()) {
+        Asm->getContext().reportError(
+            Sym.getVariableValue()->getLoc(),
+            "cyclic dependency detected for symbol '" + Sym.getName() + "'");
+        Sym.IsUsed = false;
+        Sym.setVariableValue(MCConstantExpr::create(0, Asm->getContext()));
+      }
+      return false;
+    }
     if (Sym.isVariable() && (Kind == MCSymbolRefExpr::VK_None || Layout) &&
         canExpand(Sym, InSet)) {
+      Sym.setIsResolving(true);
+      auto _ = make_scope_exit([&] { Sym.setIsResolving(false); });
       bool IsMachO =
           Asm && Asm->getContext().getAsmInfo()->hasSubsectionsViaSymbols();
       if (Sym.getVariableValue()->evaluateAsRelocatableImpl(Res, Asm,

Unfortunately, I cannot remove MCExpr::isSymbolUsedInExpression, as it is still used by AMDGPU ([AMDGPU] Avoid resource propagation for recursion through multiple functions).

Revisiting the .weakref directive

The .weakref directive had an intricate impact on the expression resolving framework.

.weakref enables the creation of weak aliases without directly modifying the target symbol's binding. This allows a header file in library A to optionally depend on symbols from library B. When the target symbol is otherwise not referenced, the object file affected by the weakref directive will include an undefined weak symbol. However, when the target symbol is defined or referenced (by the user), it can retain STB_GLOBAL binding to support archive member extraction. GCC's [[gnu::weakref]] attribute, as used in runtime library headers like libgcc/gthr-posix.h, utilizes this feature.

I've noticed a few issues:

  • Unreferenced .weakref alias, target created an undefined target.
  • Crash when alias was already defined.
  • VK_WEAKREF was mis-reused by the alias directive of llvm-ml (MASM replacement).

And addressed them with

  • [MC] Ignore VK_WEAKREF in MCValue::getAccessVariant (2019-12). Wow, it's interesting to realize I'd actually delved into this a few years ago!
  • MC: Rework .weakref (2025-05)

Expression resolving and reassignments

= and its equivalents (.set, .equ) allow a symbol to be equated multiple times. This means when a symbol is referenced, its current value is captured at that moment, and subsequent reassignments do not alter prior references.

.data
.set x, 0
.long x // reference the first instance
x = .-.data
.long x // reference the second instance
.set x,.-.data
.long x // reference the third instance

The assembly code evaluates to .long 0; .long 4; .long 8.

Historically, the LLVM integrated assembler restricted reassigning symbols whose value wasn't a parse-time integer constant (MCConstantExpr). This was a safeguard against potentially unsafe reassignments, as an old value might still be referenced.

% clang -c g.s
g.s:6:8: error: invalid reassignment of non-absolute variable 'x'
.set x,.-.data
^

The safeguard was implemented with multiple conditions, aided by a mysterious IsUsed variable.

// Diagnose assignment to a label.
//
// FIXME: Diagnostics. Note the location of the definition as a label.
// FIXME: Diagnose assignment to protected identifier (e.g., register name).
if (Value->isSymbolUsedInExpression(Sym))
  return Parser.Error(EqualLoc, "Recursive use of '" + Name + "'");
else if (Sym->isUndefined(/*SetUsed*/ false) && !Sym->isUsed() &&
         !Sym->isVariable())
  ; // Allow redefinitions of undefined symbols only used in directives.
else if (Sym->isVariable() && !Sym->isUsed() && allow_redef)
  ; // Allow redefinitions of variables that haven't yet been used.
else if (!Sym->isUndefined() && (!Sym->isVariable() || !allow_redef))
  return Parser.Error(EqualLoc, "redefinition of '" + Name + "'");
else if (!Sym->isVariable())
  return Parser.Error(EqualLoc, "invalid assignment to '" + Name + "'");
else if (!isa<MCConstantExpr>(Sym->getVariableValue()))
  return Parser.Error(EqualLoc,
                      "invalid reassignment of non-absolute variable '" +
                          Name + "'");

Over the past few years, during our work on building Linux kernel ports with Clang, we worked around this by modifying the assembly code itself:

  • ARM: 8971/1: replace the sole use of a symbol with its definition in 2020-04
  • crypto: aesni - add compatibility with IAS in 2020-07
  • powerpc/64/asm: Do not reassign labels in 2021-12

This prior behavior wasn't ideal. I've since enabled proper reassignment by implementing a system where the symbol is cloned upon redefinition, and the symbol table is updated accordingly. Crucially, any existing references to the original symbol remain unchanged, and the original symbol is no longer included in the final emitted symbol table.

Before rolling out this improvement, I discovered problematic uses in the AMDGPU and ARM64EC backends that required specific fixes or workarounds. This is a common challenge when making general improvements to LLVM's MC layer: you often need to untangle and resolve individual backend-specific "hacks" before a more generic interface enhancement can be applied.

  • MCParser: Error when .set reassigns a non-redefinable variable
  • MC: Allow .set to reassign non-MCConstantExpr expressions

For the following assembly, newer Clang emits relocations referencing foo, foo, bar, foo like GNU Assembler.

b = a
a = foo
call a
call b
a = bar
call a
call b
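
To see why the four calls reference foo, foo, bar, foo, here is a small Python model of clone-on-redefinition. It is a sketch under assumptions: Symbol, SymbolTable, resolve, and relocation_targets are hypothetical names, and the model only tracks equated symbols, not LLVM's actual MCSymbol machinery.

```python
class Symbol:
    def __init__(self, name):
        self.name = name
        self.value = None  # None: not equated; otherwise the Symbol it equals


class SymbolTable:
    def __init__(self):
        self.by_name = {}

    def lookup(self, name):
        if name not in self.by_name:
            self.by_name[name] = Symbol(name)
        return self.by_name[name]

    def assign(self, name, target_name):
        target = self.lookup(target_name)
        sym = self.lookup(name)
        if sym.value is not None:
            # Redefinition: clone instead of mutating, so that existing
            # references keep pointing at the old symbol object.
            sym = Symbol(name)
            self.by_name[name] = sym
        sym.value = target


def resolve(sym):
    """Follow the chain of equated symbols to the final symbol name."""
    while sym.value is not None:
        sym = sym.value
    return sym.name


def relocation_targets(program):
    table = SymbolTable()
    refs = []  # symbol objects captured at each `call`
    for op, *args in program:
        if op == "assign":
            table.assign(*args)
        elif op == "call":
            refs.append(table.lookup(args[0]))
    # Resolution happens after parsing, as with fixups.
    return [resolve(s) for s in refs]


PROGRAM = [
    ("assign", "b", "a"),
    ("assign", "a", "foo"),
    ("call", "a"),
    ("call", "b"),
    ("assign", "a", "bar"),
    ("call", "a"),
    ("call", "b"),
]
print(relocation_targets(PROGRAM))  # ['foo', 'foo', 'bar', 'foo']
```

Because b was equated to the first a, and that object is never mutated after the clone, the last call b still resolves to foo.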

Relocation generation

For a deeper dive into the concepts of relocation generation, you might find my previous post, Relocation generation in assemblers, helpful.

Driven by the need to support new RISC-V vendor relocations (e.g., Xqci extensions from Qualcomm) and my preference against introducing an extra MCAsmBackend hook, I've significantly refactored LLVM's relocation generation framework. This effort generalized existing RISC-V/LoongArch ADD/SUB relocation logic and enabled its customization for other targets like AVR and PowerPC.

  • MC: Generalize RISCV/LoongArch handleAddSubRelocations and AVR shouldForceRelocation

The linker relaxation framework sometimes generated redundant relocations that could have been resolved. This occurred in several scenarios, including:

.option norelax
j label
// For assembly input, RISCVAsmParser::ParseInstruction sets ForceRelocs (https://reviews.llvm.org/D46423).
// For direct object emission, RISCVELFStreamer sets ForceRelocs (#77436)
.option relax
call foo // linker-relaxable

.option norelax
j label // redundant relocation due to ForceRelocs
.option relax

label:

And also with label differences within a section without linker-relaxable instructions:

call foo

.section .text1,"ax"
# No linker-relaxable instruction. Label differences should be resolved.
w1:
nop
w2:

.data
# Redundant R_RISCV_SET32 and R_RISCV_SUB32
.long w2-w1

These issues have now been resolved through a series of patches, significantly revamping the target-neutral relocation generation framework. Key contributions include:

I've also streamlined relocation generation within the SPARC backend. Given its minimal number of relocations, the SPARC implementation could serve as a valuable reference for downstream targets seeking to customize their own relocation handling.

Simplification to assembly and machine code emission

For a dive into the core classes involved in LLVM's assembly and machine code emission, you might read my Notes on LLVM assembly and machine code emission.

The MCAssembler class orchestrates the emission process, managing MCAsmBackend, MCCodeEmitter, and MCObjectWriter. In turn, MCObjectWriter oversees MCObjectTargetWriter.

Historically, many member functions within the subclasses of MCAsmBackend, MCObjectWriter, and MCObjectTargetWriter accepted a MCAssembler * argument. This was often redundant, as it was typically only used to access the MCContext instance. To streamline this, I've added a MCAssembler * member variable directly to MCAsmBackend, MCObjectWriter, and MCObjectTargetWriter, along with convenient helper functions like getContext. This change cleans up the interfaces and improves code clarity.

  • MCAsmBackend: Add member variable MCAssembler * and define getContext
  • ELFObjectWriter: Remove the MCContext argument from getRelocType
  • MachObjectWriter: Remove the MCAssembler argument from getSymbolAddress
  • WinCOFFObjectWriter: Simplify code with member MCAssembler *

Previously, the ARM, Hexagon, and RISC-V backends had unique requirements that led to extra arguments being passed to MCAsmBackend hooks. These arguments were often unneeded by other targets. I've since refactored these interfaces, replacing those specialized arguments with more generalized and cleaner approaches.

  • ELFObjectWriter: Move Thumb-specific condition to ARMELFObjectWriter
  • MCAsmBackend: Remove MCSubtargetInfo argument
  • MCAsmBackend,X86: Pass MCValue to fixupNeedsRelaxationAdvanced. NFC
  • MCAsmBackend,Hexagon: Remove MCRelaxableFragment from fixupNeedsRelaxationAdvanced
  • MCAsmBackend: Simplify applyFixup

Future plan

The assembler's ARM port has a limitation where only relocations with implicit addends (REL) are handled. For CREL, we aim to use explicit addends across all targets to simplify linker/tooling implementation, but this is incompatible with ARMAsmBackend's current design. See this ARM CREL assembler issue https://github.com/llvm/llvm-project/issues/141678.

To address this issue, we should

  • In MCAssembler::evaluateFixup, generalize MCFixupKindInfo::FKF_IsAlignedDownTo32Bits (ARM hack, also used by other backends) to support more fixups, including ARM::fixup_arm_uncondbl (R_ARM_CALL). Create a new hook in MCAsmBackend.
  • In ARMAsmBackend, move the Value -= 8 code from adjustFixupValue to the new hook.

unsigned ARMAsmBackend::adjustFixupValue(const MCAssembler &Asm,
...
  case ARM::fixup_arm_condbranch:
  case ARM::fixup_arm_uncondbranch:
  case ARM::fixup_arm_uncondbl:
  case ARM::fixup_arm_condbl:
  case ARM::fixup_arm_blx:
    // Check that the relocation value is legal.
    Value -= 8;
    if (!isInt<26>(Value)) {
      Ctx.reportError(Fixup.getLoc(), "Relocation out of range");
      return 0;

Enabling RELA/CREL support requires significant effort and exceeds my expertise or willingness to address for AArch32. However, I do want to add a new MCAsmBackend hook to minimize AArch32's invasive modifications to the generic relocation generation framework.

For reference, the arm-vxworks port in binutils introduced RELA support in 2006.

LLVM integrated assembler: Improving MCExpr and MCValue

In my previous post, Relocation Generation in Assemblers, I explored some key concepts behind LLVM’s integrated assembler. This post dives into recent improvements I’ve made to refine that system.

The LLVM integrated assembler handles fixups and relocatable expressions as distinct entities. Relocatable expressions, in particular, are encoded using the MCValue class, which originally looked like this:

class MCValue {
  const MCSymbolRefExpr *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t RefKind = 0;
};

In this structure:

  • RefKind acts as an optional relocation specifier, though only a handful of targets actually use it.
  • SymA represents an optional symbol reference (the addend).
  • SymB represents another optional symbol reference (the subtrahend).
  • Cst holds a constant value.

While functional, this design had its flaws. For one, the way relocation specifiers were encoded varied across architectures:

  • Targets like COFF, Mach-O, and ELF's PowerPC, SystemZ, and X86 embed the relocation specifier within MCSymbolRefExpr *SymA as part of SubclassData.
  • Conversely, ELF targets such as AArch64, MIPS, and RISC-V store it as a target-specific subclass of MCTargetExpr, and convert it to MCValue::RefKind during MCValue::evaluateAsRelocatable.

Another issue was with SymB. Despite being typed as const MCSymbolRefExpr *, its MCSymbolRefExpr::VariantKind field went unused. This is because expressions like add - sub@got are not relocatable.

Over the weekend, I tackled these inconsistencies and reworked the representation into something cleaner:

class MCValue {
  const MCSymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};

This updated design not only aligns more closely with the concept of relocatable expressions but also shaves off some compile time in LLVM. The ambiguous RefKind has been renamed to Specifier for clarity. Additionally, targets that previously encoded the relocation specifier within MCSymbolRefExpr (rather than using MCTargetExpr) can now access it directly via MCValue::Specifier.

To support this change, I made a few adjustments:

  • Introduced getAddSym and getSubSym methods, returning const MCSymbol *, as replacements for getSymA and getSymB.
  • Eliminated dependencies on the old accessors, MCValue::getSymA and MCValue::getSymB.
  • Reworked the expression folding code that handles + and -
  • Stored the const MCSymbolRefExpr *SymA specifier at MCValue::Specifier
  • Some targets relied on PC-relative fixups with explicit specifiers forcing relocations. I have defined MCAsmBackend::shouldForceRelocation for SystemZ and cleaned up ARM and PowerPC
  • Changed the type of SymA and SymB to const MCSymbol *
  • Replaced the temporary getSymSpecifier with getSpecifier
  • Replaced the legacy getAccessVariant with getSpecifier

Streamlining Mach-O support

Mach-O assembler support in LLVM has accumulated significant technical debt, impacting both target-specific and generic code. One particularly nagging issue was the const SectionAddrMap *Addrs parameter in MCExpr::evaluateAs* functions. This parameter existed to handle cross-section label differences, primarily for generating (compact) unwind information in Mach-O. A typical example of this can be seen in assembly like:

        .section        __TEXT,__text,regular,pure_instructions
Leh_func_begin0:
.section __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
Ltmp3:
Ltmp4 = Leh_func_begin0-Ltmp3
.long Ltmp4

The SectionAddrMap *Addrs parameter always felt like a clunky workaround to me. It wasn’t until I dug into the Mach-O AArch64 object writer that I realized this hack wasn't necessary for that writer. This discovery prompted a cleanup effort to remove the dependency on SectionAddrMap for ARM and X86 and eliminate the parameter:

  • [MC,MachO] Replace SectionAddrMap workaround with cleaner variable handling
  • MCExpr: Remove unused SectionAddrMap workaround

While I was at it, I also tidied up MCSymbolRefExpr by removing the clunky HasSubsectionsViaSymbolsBit, further simplifying the codebase.

Streamlining InstPrinter

The MCExpr code also determines how expression operands in assembly instructions are printed. I have made improvements in this area as well:

  • [MC] Don't print () around $ names
  • [MC] Simplify MCBinaryExpr/MCUnaryExpr printing by reducing parentheses

Relocation generation in assemblers

This post explores how GNU Assembler and LLVM integrated assembler generate relocations, an important step to generate a relocatable file. Relocations identify parts of instructions or data that cannot be fully determined during assembly because they depend on the final memory layout, which is only established at link time or load time. These are essentially placeholders that will be filled in (typically with absolute addresses or PC-relative offsets) during the linking process.

Relocation generation: the basics

Symbol references are the primary candidates for relocations. For instance, in the x86-64 instruction movl sym(%rip), %eax (GNU syntax), the assembler calculates the displacement between the program counter (PC) and sym. This distance affects the instruction's encoding and typically triggers a R_X86_64_PC32 relocation, unless sym is a local symbol defined within the current section.

Both the GNU assembler and LLVM integrated assembler utilize multiple passes during assembly, with several key phases relevant to relocation generation:

Parsing phase

During parsing, the assembler builds section fragments that contain instructions and other directives. It parses each instruction into its opcode (e.g., movl) and operands (e.g., sym(%rip), %eax). It identifies registers, immediate values (like 3 in movl $3, %eax), and expressions.

Expressions can be constants, symbol references (like sym), or unary and binary operators (-sym, sym0-sym1). Those unresolvable at parse time (potential relocation candidates) turn into "fixups". These often skip immediate operand range checks, as shown here:

% echo 'addi a0, a0, 2048' | llvm-mc -triple=riscv64
<stdin>:1:14: error: operand must be a symbol with %lo/%pcrel_lo/%tprel_lo modifier or an integer in the range [-2048, 2047]
addi a0, a0, 2048
^
% echo 'addi a0, a0, %lo(x)' | llvm-mc -triple riscv64 -show-encoding
addi a0, a0, %lo(x) # encoding: [0x13,0x05,0bAAAA0101,A]
# fixup A - offset: 0, value: %lo(x), kind: fixup_riscv_lo12_i

A fixup ties to a specific location (an offset within a fragment), with its value being the expression (which must eventually evaluate to a relocatable expression).

Meanwhile, the assembler tracks defined and referenced symbols, and for ELF, it tracks symbol bindings (STB_LOCAL, STB_GLOBAL, STB_WEAK) from directives like .globl, .weak, or the rarely used .local.

Section layout phase

After parsing, the assembler arranges each section by assigning precise offsets to its fragments: instructions, data, or other directives (e.g., .line, .uleb128). It calculates sizes and adjusts for alignment. This phase finalizes symbol offsets (e.g., start: at offset 0x10) while leaving external ones for the linker.

This phase, which employs a fixed-point iteration, is quite complex. I won't go into details, but you might find Clang's -O0 output: branch displacement and size increase interesting.

Relocation decision phase

Then the assembler evaluates each fixup to determine if it can be resolved directly or requires a relocation entry. This process starts by attempting to convert fixups into relocatable expressions.

Evaluating relocatable expressions

In their most general form, relocatable expressions follow the pattern relocation_specifier(sym_a - sym_b + offset), where

  • relocation_specifier: This may or may not be present. I will explain this concept later.
  • sym_a is a symbol reference (the "addend")
  • sym_b is an optional symbol reference (the "subtrahend")
  • offset is a constant value

Most common cases involve only sym_a or offset (e.g., movl sym(%rip), %eax or movl $3, %eax). Only a few target architectures support the subtrahend term (sym_b). Notable exceptions include AVR and RISC-V, as explored in The dark side of RISC-V linker relaxation.
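
As an illustration of the sym_a - sym_b + offset shape, the following Python sketch folds a small expression tree into that form, or reports that the expression is not relocatable. The Reloc class, the tuple encoding of expressions, and the evaluate function are assumptions for this sketch; it only checks the shape and ignores relocation specifiers and target-specific restrictions (such as whether a subtrahend is allowed at all).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reloc:
    add_sym: Optional[str] = None   # sym_a
    sub_sym: Optional[str] = None   # sym_b
    offset: int = 0


def evaluate(expr):
    """expr is an int, a symbol name, or a tuple (op, lhs, rhs) with op in "+-".

    Returns a Reloc, or None when the expression does not fit
    sym_a - sym_b + offset.
    """
    if isinstance(expr, int):
        return Reloc(offset=expr)
    if isinstance(expr, str):
        return Reloc(add_sym=expr)
    op, lhs, rhs = expr
    left, right = evaluate(lhs), evaluate(rhs)
    if left is None or right is None:
        return None
    if op == "-":
        # Negate the right-hand side: swap addend and subtrahend.
        right = Reloc(right.sub_sym, right.add_sym, -right.offset)
    elif op != "+":
        return None
    adds = [s for s in (left.add_sym, right.add_sym) if s is not None]
    subs = [s for s in (left.sub_sym, right.sub_sym) if s is not None]
    for s in list(adds):
        if s in subs:               # a - a cancels out
            adds.remove(s)
            subs.remove(s)
    if len(adds) > 1 or len(subs) > 1:
        return None                 # e.g. a + b: two addend symbols
    return Reloc(adds[0] if adds else None,
                 subs[0] if subs else None,
                 left.offset + right.offset)


print(evaluate(("+", ("-", "a", "b"), 4)))  # Reloc(add_sym='a', sub_sym='b', offset=4)
print(evaluate(("+", "a", "b")))            # None
```

In this model a+b is rejected because it would need two addend symbols, while (a+8)-a folds to the constant 8.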

Attempting to use unsupported expression forms will result in assembly errors:

% echo -e 'movl a+b, %eax\nmovl a-b, %eax' | clang -c -xassembler -
<stdin>:1:1: error: expected relocatable expression
movl a+b, %eax
^
<stdin>:2:1: error: symbol 'b' can not be undefined in a subtraction expression
movl a-b, %eax
^

Let's use some notations from the AArch64 psABI.

  • S is the address of the symbol.
  • A is the addend for the relocation.
  • P is the address of the place being relocated (derived from r_offset).
  • GOT is the address of the Global Offset Table, the table of code and data addresses to be resolved at dynamic link time.
  • GDAT(S+A) represents a pointer-sized entry in the GOT for address S+A.

PC-relative fixups

PC-relative fixups compute their values as sym_a - current_location + offset (S - P + A) and can be seen as a special case that uses sym_b. (I’ve skipped - sym_b, since no target I know permits a subtrahend here.)

When sym_a is a non-ifunc local symbol defined within the current section, these PC-relative fixups evaluate to constants. But if sym_a is a global or weak symbol in the same section, a relocation entry is generated. This ensures ELF symbol interposition stays in play.

In contrast, label differences (e.g. .quad g-f) can be resolved even if f and g are global.

On some targets (e.g., AArch64, PowerPC, RISC-V), the PC-relative offset is relative to the start of the instruction (P), while others (e.g., AArch32, x86) are relative to P plus a constant.

Resolution Outcomes

The assembler's evaluation of fixups leads to one of three outcomes:

  • Error: When the expression isn't supported.
  • Resolved fixups: The assembler updates the relevant bits in the instruction directly. No relocation entry is needed.
    • There are target-specific exceptions that make the fixup unresolved. In AArch64 adrp x0, l0; l0:, the immediate might be either 0 or 1, depending on the instruction address. In RISC-V, linker relaxation might make fixups unresolved.
  • Unresolved fixups: When the fixup evaluates to a relocatable expression but not a constant, the assembler
    • Generates an appropriate relocation (offset, type, symbol, addend).
    • For targets that use RELA, usually zeros out the bits in the instruction field that will be modified by the linker.
    • For targets that use REL, leaves the addend in the instruction field.
    • If the referenced symbol is defined and local, and the relocation type is not in exceptions (gas tc_fix_adjustable), the relocation references the section symbol instead of the local symbol.

Fixup resolution depends on the fixup type:

  • PC-relative fixups that describe the symbol itself (the relocation operation looks like S - P + A) resolve to a constant if sym_a is a non-ifunc local symbol defined in the current section.
  • relocation_specifier(S + A) style fixups resolve when S refers to an absolute symbol.
  • Other fixups, including TLS and GOT related ones, remain unresolved.
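
The first rule in the list can be written down as a small predicate. This is a simplified Python sketch, not any assembler's real logic: the Sym class and pcrel_fixup_resolves are hypothetical names, and target-specific exceptions (such as the AArch64 adrp case or RISC-V linker relaxation mentioned earlier) are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sym:
    name: str
    binding: str = "local"          # "local", "global", or "weak"
    section: Optional[str] = None   # None means the symbol is undefined
    is_ifunc: bool = False


def pcrel_fixup_resolves(sym, fixup_section):
    """True when an S - P + A fixup against `sym` folds to a constant."""
    return (
        sym.binding == "local"
        and not sym.is_ifunc
        and sym.section is not None
        and sym.section == fixup_section
    )


local = Sym("l", "local", ".text")
exported = Sym("g", "global", ".text")
print(pcrel_fixup_resolves(local, ".text"))     # True
print(pcrel_fixup_resolves(exported, ".text"))  # False
```

A global or weak symbol fails the check even when it lives in the same section, which models the relocation that keeps ELF symbol interposition possible.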

For ELF targets, if a non-TLS relocation operation references the symbol itself S (not GDAT), it may be adjusted to reference the section symbol instead.

If you are interested in relocation representations in different object file formats, please check out my post Exploring object file formats.

If an equated symbol sym is resolved relative to a section, relocations are generated against sym. Otherwise, if it resolves to a constant or an undefined symbol, relocations are generated against that constant or undefined symbol.

Examples in action

Branches

% echo -e 'call fun\njmp fun' | clang -c -xassembler - -o - | fob -dr -
...
0: e8 00 00 00 00 callq 0x5 <.text+0x5>
0000000000000001: R_X86_64_PLT32 fun-0x4
5: e9 00 00 00 00 jmp 0xa <.text+0xa>
0000000000000006: R_X86_64_PLT32 fun-0x4
% echo -e 'bl fun\nb fun' | clang --target=aarch64 -c -xassembler - -o - | fob -dr -
...
0: 94000000 bl 0x0 <.text>
0000000000000000: R_AARCH64_CALL26 fun
4: 14000000 b 0x4 <.text+0x4>
0000000000000004: R_AARCH64_JUMP26 fun

Absolute and PC-relative symbol references

% echo -e 'movl a, %eax\nmovl a(%rip), %eax' | clang -c -xassembler - -o - | llvm-objdump -dr -
...
0: 8b 04 25 00 00 00 00 movl 0x0, %eax
0000000000000003: R_X86_64_32S a
7: 8b 05 00 00 00 00 movl (%rip), %eax # 0xd <.text+0xd>
0000000000000009: R_X86_64_PC32 a-0x4

(a-.)(%rip) would probably be more semantically correct but is not adopted by GNU Assembler.

Relocation specifiers

Relocation specifiers guide the assembler on how to resolve and encode expressions into instructions. They specify details like:

  • Whether to reference the symbol itself, its Procedure Linkage Table (PLT) entry, or its Global Offset Table (GOT) entry.
  • Which part of a symbol's address to use (e.g., lower or upper bits).
  • Whether to use an absolute address or a PC-relative one.

This concept appears across various architectures but with inconsistent terminology. The Arm architecture refers to elements like :lo12: and :lower16: as "relocation specifiers". IBM's AIX documentation also uses this term. Many GNU Binutils target documents simply call these "modifiers", while AVR documentation uses "relocatable expression modifiers".

Picking the right term was tricky. "Relocatable expression modifier" nails the idea of tweaking relocatable expressions but feels overly verbose. "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.

For example, RISC-V addi can be used with either an absolute address or a PC-relative address. Relocation specifiers %lo and %pcrel_lo could differentiate the two uses. Similarly, %hi, %pcrel_hi, and %got_pcrel_hi could differentiate the uses of lui and auipc.

# Position-dependent code (PDC) - absolute addressing
lui a0, %hi(var) # Load upper immediate with high bits of symbol address
addi a0, a0, %lo(var) # Add lower 12 bits of symbol address

# Position-independent code (PIC) - PC-relative addressing
auipc a0, %pcrel_hi(var) # Add upper PC-relative offset to PC
addi a0, a0, %pcrel_lo(.Lpcrel_hi1) # Add lower 12 bits of PC-relative offset

# Position-independent code via Global Offset Table (GOT)
auipc a0, %got_pcrel_hi(var) # Calculate address of GOT entry relative to PC
ld a0, %pcrel_lo(.Lpcrel_hi1)(a0) # Load var's address from GOT

Why use %hi with lui if it's always paired? It's about clarity and explicitness. %hi ensures consistency with %lo and cleanly distinguishes it from %pcrel_hi. Since both lui and auipc share the U-type instruction format, tying relocation specifiers to formats rather than specific instructions is a smart, flexible design choice.
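
As background for how %hi and %lo pair up: addi sign-extends its 12-bit immediate, so the upper part has to be rounded by 0x800 for the two halves to add back to the original address. The Python sketch below works through that arithmetic for 32-bit absolute addresses; the function names hi20, lo12, and lui_addi are made up for the illustration.

```python
def hi20(addr):
    """Upper 20 bits used by lui, rounded so the signed low part adds back."""
    return ((addr + 0x800) >> 12) & 0xFFFFF


def lo12(addr):
    """Low 12 bits as the signed immediate used by addi."""
    low = addr & 0xFFF
    return low - 0x1000 if low >= 0x800 else low


def lui_addi(addr):
    """Value produced by lui with hi20(addr) followed by addi with lo12(addr), modulo 2**32."""
    return ((hi20(addr) << 12) + lo12(addr)) & 0xFFFFFFFF


addr = 0x12345FFF
print(hex(hi20(addr)), lo12(addr), hex(lui_addi(addr)))  # 0x12346 -1 0x12345fff
```

The example address shows the rounding at work: the low part is -1, so the upper part becomes 0x12346 rather than 0x12345.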

Relocation specifier flavors

Assemblers use various syntaxes for relocation specifiers, reflecting architectural quirks and historical conventions. Below, we explore the main flavors, their usage across architectures, and some of their peculiarities.

expr@specifier

This is likely the most widespread syntax, adopted by many binutils targets, including ARC, C-SKY, Power, M68K, SuperH, SystemZ, and x86, among others. It's also used in Mach-O object files, e.g., adrp x8, _bar@GOTPAGE.

This suffix style puts the specifier after an @. It's intuitive (think sym@got). In PowerPC, operators can get elaborate, such as sym@toc@l(9). Here, @toc@l is a single, indivisible operator (not two separate @ pieces) indicating a TOC-relative reference with a low 16-bit extraction.

Parsing is loose: while both expr@specifier+expr and expr+expr@specifier are accepted (by many targets), conceptually it's just specifier(expr+expr). For example, x86 accepts sym@got+4 or sym+4@got, but don't misread: @got applies to sym+4, not just sym.
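
The "specifier applies to the whole expression" reading can be illustrated with a deliberately naive Python helper. split_specifier is a hypothetical function for this sketch only; it handles a single @name and would not cope with compound operators such as PowerPC's @toc@l, nor does it reflect how any real parser is written.

```python
import re


def split_specifier(text):
    """Return (specifier, expression) where the specifier covers the whole expression."""
    match = re.search(r"@([A-Za-z_]\w*)", text)
    if match is None:
        return None, text
    # Drop the @specifier wherever it appeared and keep the rest as one expression.
    expression = text[:match.start()] + text[match.end():]
    return match.group(1).lower(), expression


print(split_specifier("sym@got+4"))  # ('got', 'sym+4')
print(split_specifier("sym+4@got"))  # ('got', 'sym+4')
```

Both spellings end up as the same pair, which is the conceptual specifier(expr+expr) form.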

%specifier(expr)

MIPS, SPARC, RISC-V, and LoongArch favor this prefix style, wrapping the expression in parentheses for clarity. In MIPS, parentheses are optional, and operators can nest, like

# MIPS
addiu $2, $2, %lo(0x12345)
addiu $2, $2, %lo 0x12345
lui $1, %hi(%neg(%gp_rel(main)))
ld $1, %got_page($.str)($gp)

Like expr@specifier, the specifier applies to the whole expression. Don't misinterpret %lo(3)+sym: it resolves as sym+3 with an R_MIPS_LO16 relocation.

# MIPS
addiu $2, $2, %lo(3)+sym # R_MIPS_LO16 sym+0x3
addiu $2, $2, %lo 3+sym # R_MIPS_LO16 sym+0x3

SPARC has an anti-pattern. Its %lo and %hi expand to different relocation types depending on whether gas's -KPIC option (llvm-mc -position-independent) is specified.

expr(specifier)

A simpler suffix style, this is used by AArch32 for data directives. It's less common but straightforward, placing the operator in parentheses after the expression.

.word sym(gotoff)
.long f(FUNCDESC)

.long f(got)+3 // allowed by GNU assembler and LLVM integrated assembler, but probably not used in the wild

:specifier:expr

AArch32 and AArch64 adopt this colon-framed prefix notation, avoiding the confusion that parentheses might introduce.

// AArch32
movw r0, :lower16:x

// AArch64
add x8, x8, :lo12:sym

adrp x0, :got:var
ldr x0, [x0, :got_lo12:var]

Applying this syntax to data directives, however, could create parsing ambiguity. In both GNU Assembler and LLVM, .word :plt:fun would be interpreted as .word: plt: fun, treating .word and plt as labels, rather than achieving the intended meaning.

Recommendation

For new architectures, I'd suggest adopting %specifier(expr), and never use @specifier. The % symbol works seamlessly with data directives, and during operand parsing, the parser can simply peek at the first token to check for a relocation specifier.

I favor %specifier(expr) over %specifier expr because it provides clearer scoping, especially in data directives with multiple operands, such as .long %lo(a), %lo(b).

(%specifier(...) resembles % expansion in GNU Assembler's altmacro mode, as in the following snippet.)

.altmacro
.macro m arg; .long \arg; .endm
.data; m %(1+2)

Inelegance

RISC-V favors %specifier(expr) but clings to call sym@plt for legacy reasons.

AArch64 uses :specifier:expr, yet R_AARCH64_PLT32 (.word foo@plt - .) and PAuth ABI (.quad (g + 7)@AUTH(ia,0)) cannot use : after data directives due to parsing ambiguity. https://github.com/llvm/llvm-project/issues/132570

TLS symbols

When a symbol is defined in a section with the SHF_TLS flag (Thread-Local Storage), GNU assembler assigns it the type STT_TLS in the symbol table. For undefined TLS symbols, the process differs: GCC and Clang don’t emit explicit labels. Instead, assemblers identify these symbols through TLS-specific relocation specifiers in the code, deduce their thread-local nature, and set their type to STT_TLS accordingly.

// AArch64
add x8, x8, :tprel_hi12:tls

// x86
movl %fs:tls@TPOFF, %eax

Composed relocations

Most instructions trigger zero or one relocation, but some generate two. Often, one acts as a marker, paired with a standard relocation. For example:

  • PPC64 bl __tls_get_addr(x@tlsgd) pairs a marker R_PPC64_TLSGD with R_PPC64_REL24
  • PPC64's link-time GOT-indirect to PC-relative optimization (with Power10's prefixed instruction) generates a R_PPC64_PCREL_OPT relocation following a GOT relocation. https://reviews.llvm.org/D79864
  • RISC-V linker relaxation uses R_RISCV_RELAX alongside another relocation, and R_RISCV_ADD*/R_RISCV_SUB* pairs.
  • Mach-O scattered relocations for label differences.
  • XCOFF represents a label difference with a pair of R_POS and R_NEG relocations.
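
To make the R_RISCV_ADD*/R_RISCV_SUB* pairing concrete, here is a Python sketch of how a linker could apply such a pair to one 32-bit field holding a label difference. It assumes the usual reading of the two types (ADD32 adds S + A to the field, SUB32 subtracts S + A from it); apply_relocs, the tuple layout, and the addresses are invented for the illustration.

```python
def apply_relocs(field, relocs, symtab):
    """Apply (type, symbol, addend) records to one 32-bit field, in order."""
    for rtype, sym, addend in relocs:
        value = symtab[sym] + addend
        if rtype == "ADD32":
            field = (field + value) & 0xFFFFFFFF
        elif rtype == "SUB32":
            field = (field - value) & 0xFFFFFFFF
        else:
            raise ValueError(f"unsupported relocation type: {rtype}")
    return field


# Two records at the same location, standing for `.long w2 - w1`.
PAIR = [("ADD32", "w2", 0), ("SUB32", "w1", 0)]

before_relaxation = {"w1": 0x1000, "w2": 0x1004}
after_relaxation = {"w1": 0x1000, "w2": 0x1002}  # code between the labels shrank
print(apply_relocs(0, PAIR, before_relaxation))  # 4
print(apply_relocs(0, PAIR, after_relaxation))   # 2
```

Deferring the subtraction to link time is what lets the difference stay correct when relaxation changes the distance between the two labels.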

These marker cases tie into "composed relocations", as outlined in the Generic ABI:

If multiple consecutive relocation records are applied to the same relocation location (r_offset), they are composed instead of being applied independently, as described above. By consecutive, we mean that the relocation records are contiguous within a single relocation section. By composed, we mean that the standard application described above is modified as follows:

  • In all but the last relocation operation of a composed sequence, the result of the relocation expression is retained, rather than having part extracted and placed in the relocated field. The result is retained at full pointer precision of the applicable ABI processor supplement.

  • In all but the first relocation operation of a composed sequence, the addend used is the retained result of the previous relocation operation, rather than that implied by the relocation type.

Note that a consequence of the above rules is that the location specified by a relocation type is relevant for the first element of a composed sequence (and then only for relocation records that do not contain an explicit addend field) and for the last element, where the location determines where the relocated value will be placed. For all other relocation operands in a composed sequence, the location specified is ignored.

An ABI processor supplement may specify individual relocation types that always stop a composition sequence, or always start a new one.

Implicit addends

ELF SHT_REL and Mach-O utilize implicit addends. TODO

  • R_MIPS_HI16 (https://reviews.llvm.org/D101773)

GNU Assembler internals

GNU Assembler utilizes struct fix to represent both the fixup and the relocatable expression.

struct fix {
  ...
  /* NULL or Symbol whose value we add in. */
  symbolS *fx_addsy;

  /* NULL or Symbol whose value we subtract. */
  symbolS *fx_subsy;

  /* Absolute number we add in. */
  valueT fx_offset;
};

The relocation specifier is part of the instruction instead of part of struct fix. Targets have different internal representations of instructions.

// gas/config/tc-aarch64.c
struct reloc
{
  bfd_reloc_code_real_type type;
  expressionS exp;
  int pc_rel;
  enum aarch64_opnd opnd;
  uint32_t flags;
  unsigned need_libopcodes_p : 1;
};

struct aarch64_instruction
{
  aarch64_inst base;
  aarch64_operand_error parsing_error;
  int cond;
  struct reloc reloc;
  unsigned gen_lit_pool : 1;
};

// gas/config/tc-ppc.c
struct ppc_fixup
{
  expressionS exp;
  int opindex;
  bfd_reloc_code_real_type reloc;
};

The 2002 message stage one of gas reloc rewrite describes the passes.

In PPC, the result of @l and @ha can be either signed or unsigned, determined by the instruction opcode.

In md_apply_fix, TLS-related relocation specifiers call S_SET_THREAD_LOCAL (fixP->fx_addsy);.

LLVM internals

LLVM integrated assembler encodes fixups and relocatable expressions separately.

class MCFixup {
  /// The value to put into the fixup location. The exact interpretation of the
  /// expression is target dependent, usually it will be one of the operands to
  /// an instruction or an assembler directive.
  const MCExpr *Value = nullptr;

  /// The byte index of start of the relocation inside the MCFragment.
  uint32_t Offset = 0;

  /// The target dependent kind of fixup item this is. The kind is used to
  /// determine how the operand value should be encoded into the instruction.
  MCFixupKind Kind = FK_NONE;

  /// The source location which gave rise to the fixup, if any.
  SMLoc Loc;
};

LLVM encodes relocatable expressions as MCValue,

class MCValue {
  const MCSymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};

with:

  • Specifier as an optional relocation specifier (named RefKind before LLVM 21)
  • SymA as an optional symbol reference (addend)
  • SymB as an optional symbol reference (subtrahend)
  • Cst as a constant value

This mirrors the relocatable expression concept, but Specifier (added in 2014 for AArch64 as RefKind) remains rare among targets. (I've recently cleaned up some targets. For instance, I migrated PowerPC's @l and @ha folding to use Specifier.)

AArch64 implements a clean approach to select the relocation type. It dispatches on the fixup kind (an operand within a specific instruction format), then refines it with the relocation specifier.

// AArch64ELFObjectWriter::getRelocType
unsigned Kind = Fixup.getTargetKind();
switch (Kind) {
// Handle generic MCFixupKind.
case FK_Data_1:
case FK_Data_2:
  ...

// Handle target-specific MCFixupKind.
case AArch64::fixup_aarch64_add_imm12:
  if (RefKind == AArch64::S_DTPREL_HI12)
    return R_CLS(TLSLD_ADD_DTPREL_HI12);
  if (RefKind == AArch64::S_TPREL_HI12)
    return R_CLS(TLSLE_ADD_TPREL_HI12);
  ...
}
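
The same two-level selection can be pictured as a lookup keyed by (fixup kind, specifier). The Python sketch below is only a picture of the idea: the short strings for fixup kinds and specifiers, the RELOC_TABLE contents, and get_reloc_type are assumptions, with just a handful of AArch64 ELF relocation type names filled in.

```python
# Keys: (fixup kind, relocation specifier). The short strings are hypothetical labels.
RELOC_TABLE = {
    ("add_imm12", "lo12"): "R_AARCH64_ADD_ABS_LO12_NC",
    ("add_imm12", "dtprel_hi12"): "R_AARCH64_TLSLD_ADD_DTPREL_HI12",
    ("add_imm12", "tprel_hi12"): "R_AARCH64_TLSLE_ADD_TPREL_HI12",
    ("call26", None): "R_AARCH64_CALL26",
    ("jump26", None): "R_AARCH64_JUMP26",
}


def get_reloc_type(fixup_kind, specifier=None):
    """Dispatch on the fixup kind first, then refine with the specifier."""
    try:
        return RELOC_TABLE[(fixup_kind, specifier)]
    except KeyError:
        raise ValueError(
            f"invalid specifier {specifier!r} for fixup kind {fixup_kind!r}"
        ) from None


print(get_reloc_type("add_imm12", "tprel_hi12"))  # R_AARCH64_TLSLE_ADD_TPREL_HI12
print(get_reloc_type("call26"))                   # R_AARCH64_CALL26
```

Keeping the fixup kind tied to the operand encoding and the specifier tied to the expression means an unsupported combination is detected in one place.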

MCAssembler::evaluateFixup and ELFObjectWriter::recordRelocation record a relocation.

// MCAssembler::evaluateFixup
Evaluate `const MCExpr *Fixup::Value` to a relocatable expression.
Determine the fixup value. Adjust the value if FKF_IsPCRel.
If the relocatable expression is a constant, treat this fixup as resolved.

if (IsResolved && is_reloc_directive)
  IsResolved = false;
Backend.applyFixup(...)

// applyFixup
if (...)
  IsResolved = false;
if (!IsResolved) {
  // For exposition I've inlined ELFObjectWriter::recordRelocation here.
  // The function roughly maps to GNU Assembler's `md_apply_fix` and `tc_gen_reloc`.
  Type = TargetObjectWriter->getRelocType(Ctx, Target, Fixup, IsPCRel)
  Determine whether SymA can be converted to a section symbol.
  Relocations.push_back(...)
}
// Write a value to the relocated location. When using relocations with explicit addends, the function is a no-op when `IsResolved` is true.

FKF_IsPCRel applies to fixups whose relocation operations look like S - P + A, like branches and PC-relative operations, but not to GOT-related operations (e.g., GDAT - P + A).
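
As a worked example of the S - P + A shape, the following sketch computes the value for a PC-relative field. `pcRelValue` is a hypothetical helper; the addresses and the -4 addend (typical of an x86-64 call, whose displacement is measured from the end of the 4-byte field) are illustrative assumptions.

```cpp
#include <cstdint>

// S: address of the referenced symbol
// A: addend
// P: address of the location being relocated
// The arithmetic is done in unsigned 64-bit to get wrap-around semantics,
// then reinterpreted as a signed displacement.
int64_t pcRelValue(uint64_t S, int64_t A, uint64_t P) {
  return static_cast<int64_t>(S + static_cast<uint64_t>(A) - P);
}
```

For a field at P = 0x1000 referring to S = 0x1040 with A = -4, the stored displacement is 0x3c. A backward reference (S = 0x1000, P = 0x1040, A = 0) gives -0x40.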

MCSymbolRefExpr issues

The expression structure follows a traditional object-oriented hierarchy:

MCExpr
MCConstantExpr: Value
MCSymbolRefExpr: VariantKind, Symbol
MCUnaryExpr: Op, Expr
MCBinaryExpr: Op, LHS, RHS
MCTargetExpr:
X86MCExpr: x86 register
MCSpecifierExpr: expression with a relocation specifier

MCSymbolRefExpr::VariantKind enumerates relocation specifiers, but it's a poor fit:

  • Other expressions, like MCConstantExpr (e.g., PPC 4@l) and MCBinaryExpr (e.g., PPC (a+1)@l), also need it.
  • Semantics blur when folding expressions with @, which is unavoidable when @ can occur at any position within the full expression.
  • The generic MCSymbolRefExpr lacks target-specific hooks, cluttering the interface with any target-specific logic.

Consider what happens with addition or subtraction:

MCBinaryExpr
  LHS(MCSymbolRefExpr): VariantKind, SymA
  RHS(MCSymbolRefExpr): SymB

Here, the specifier attaches only to the LHS, leaving the full result uncovered. This awkward design demands workarounds.

  • Parsing a+4@got exposes clumsiness. After AsmParser::parseExpression processes a+4, it detects @got and retrofits it onto MCSymbolRefExpr(a), which feels hacked together.
  • PowerPC's @l @ha optimization needs PPCAsmParser::extractSpecifier and PPCAsmParser::applySpecifier to convert a MCSymbolRefExpr to a MCSpecifierExpr.

Worse, the leaky abstraction (MCSymbolRefExpr is accessed widely in backend code) introduces another problem: while an MCBinaryExpr with a constant RHS mimics MCSymbolRefExpr semantically, code often handles only the latter.
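
The last point can be shown with a toy expression tree. This is a sketch under stated assumptions: `Expr`, `symbolNaive`, and `symbolWithOffset` are hypothetical and unrelated to LLVM's real classes. The naive helper mirrors code that only pattern-matches a bare symbol reference, so `a+4` silently falls through.

```cpp
#include <cstdint>

// Toy expression node (hypothetical, not MCExpr).
struct Expr {
  enum Kind { Constant, SymbolRef, Add } kind;
  int64_t value;         // used by Constant
  const char *symbol;    // used by SymbolRef
  const Expr *lhs, *rhs; // used by Add
};

// Naive: recognizes only a bare symbol reference.
const char *symbolNaive(const Expr &e) {
  return e.kind == Expr::SymbolRef ? e.symbol : nullptr;
}

// Also accepts `sym + constant`, reporting the constant as an offset.
const char *symbolWithOffset(const Expr &e, int64_t &offset) {
  offset = 0;
  if (e.kind == Expr::SymbolRef)
    return e.symbol;
  if (e.kind == Expr::Add && e.lhs->kind == Expr::SymbolRef &&
      e.rhs->kind == Expr::Constant) {
    offset = e.rhs->value;
    return e.lhs->symbol;
  }
  return nullptr;
}
```

Working on an already evaluated symbol-plus-constant form avoids having every caller repeat this kind of tree matching.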

MCFixup should store MCValue instead of MCExpr

The const MCExpr *MCFixup::getValue() method feels inconvenient and less elegant compared to GNU Assembler's unified fixup/relocatable expression for these reasons:

  • Relocation specifier can be encoded by every sub-expression in the MCExpr tree, rather than the fixup itself (or the instruction, as in GNU Assembler). Supporting all of a+4@got, a@got+4, (a+4)@got requires extensive hacks in LLVM MCParser.
  • evaluateAsRelocatable converts an MCExpr to an MCValue without updating the MCExpr itself. This leads to redundant evaluations, as MCAssembler::evaluateFixup is called multiple times, such as in MCAssembler::fixupNeedsRelaxation and MCAssembler::layout.

Storing a MCValue directly in MCFixup, or adding a relocation specifier member, could eliminate the need for many target-specific MCTargetFixup classes that manage relocation specifiers. However, target-specific evaluation hooks would still be needed for specifiers like PowerPC @l or RISC-V %lo().

Computing label differences will be simplified as we can utilize SymA and SymB.

Our long-term goal is to encode the relocation specifier within MCFixup. (https://github.com/llvm/llvm-project/issues/135592)

MCSymbolRefExpr::VariantKind as the legacy way to encode relocations should be completely removed (probably in the distant future, as many cleanups are required).

AsmParser: expr@specifier

In LLVM's assembly parser library (LLVMMCParser), the parsing of expr@specifier was supported for all targets until I updated it to be an opt-in feature in March 2025.

AsmParser's @specifier parsing is suboptimal, necessitating lexer workarounds.

The @ symbol can appear after a symbol or an expression (via parseExpression) and may occur multiple times within a single operand, making it challenging to validate and reject invalid cases.

In the GNU Assembler, COFF targets permit @ within identifier names, and MinGW supports constructs like .long ext24@secrel32. It appears that a recognized suffix is treated as a specifier, while an unrecognized suffix results in a symbol that includes the @.
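
The "recognized suffix" behavior described above can be modeled as follows. This is a guess at the observable behavior, not GNU Assembler's implementation; `splitSpecifier` and the one-entry `kKnownSpecifiers` table are hypothetical.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical table; a real assembler's set is target-specific.
static const std::set<std::string> kKnownSpecifiers = {"secrel32"};

// Returns {symbol, specifier}. If the text after the last '@' is not a
// known specifier, the '@' stays part of the symbol name and the
// specifier is empty.
std::pair<std::string, std::string> splitSpecifier(const std::string &tok) {
  std::string::size_type at = tok.rfind('@');
  if (at != std::string::npos) {
    std::string suffix = tok.substr(at + 1);
    if (kKnownSpecifiers.count(suffix))
      return std::make_pair(tok.substr(0, at), suffix);
  }
  return std::make_pair(tok, std::string());
}
```

Under these assumptions, `ext24@secrel32` splits into the symbol `ext24` and the specifier `secrel32`, whereas `foo@bar` remains a single symbol named `foo@bar`.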

The PowerPC AsmParser (llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp) parses an operand and then calls PPCAsmParser::extractSpecifier to extract the optional @ specifier. When the @ specifier is detected and removed, it generates a PPCMCExpr. This functionality is currently implemented for @l and @ha, and it would be beneficial to extend this to include all specifiers.

AsmPrinter

In llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp, AsmPrinter::lowerConstant outlines how LLVM handles the emission of a global variable initializer. When processing ConstantExpr elements, this function may generate data directives in the assembly code that involve differences between symbols.

One significant use case for this intricate code is clang++ -fexperimental-relative-c++-abi-vtables. This feature produces a PC-relative relocation that points to either the PLT (Procedure Linkage Table) entry of a function or the function symbol directly.

Compiling C++ with the Clang API

This post describes how to compile a single C++ source file to an object file with the Clang API. Here is the code. It behaves like a simplified clang executable that handles -c and -S.

cat > main.cc <<'eof'
#include <clang/CodeGen/CodeGenAction.h> // EmitObjAction
#include <clang/Driver/Compilation.h>
#include <clang/Driver/Driver.h>
#include <clang/Frontend/CompilerInstance.h>
#include <clang/Frontend/FrontendOptions.h>
#include <llvm/Config/llvm-config.h> // LLVM_VERSION_MAJOR
#include <llvm/Support/TargetSelect.h> // LLVMInitialize*
#include <llvm/Support/VirtualFileSystem.h>

using namespace clang;

constexpr llvm::StringRef kTargetTriple = "x86_64-unknown-linux-gnu";

namespace {
struct DiagsSaver : DiagnosticConsumer {
  std::string message;
  llvm::raw_string_ostream os{message};

  void HandleDiagnostic(DiagnosticsEngine::Level diagLevel, const Diagnostic &info) override {
    DiagnosticConsumer::HandleDiagnostic(diagLevel, info);
    const char *level;
    switch (diagLevel) {
    default:
      return;
    case DiagnosticsEngine::Note:
      level = "note";
      break;
    case DiagnosticsEngine::Warning:
      level = "warning";
      break;
    case DiagnosticsEngine::Error:
    case DiagnosticsEngine::Fatal:
      level = "error";
      break;
    }

    llvm::SmallString<256> msg;
    info.FormatDiagnostic(msg);
    auto &sm = info.getSourceManager();
    auto loc = info.getLocation();
    auto fileLoc = sm.getFileLoc(loc);
    os << sm.getFilename(fileLoc) << ':' << sm.getSpellingLineNumber(fileLoc)
       << ':' << sm.getSpellingColumnNumber(fileLoc) << ": " << level << ": "
       << msg << '\n';
    if (loc.isMacroID()) {
      loc = sm.getSpellingLoc(loc);
      os << sm.getFilename(loc) << ':' << sm.getSpellingLineNumber(loc) << ':'
         << sm.getSpellingColumnNumber(loc) << ": note: expanded from macro\n";
    }
  }
};
}

static std::pair<bool, std::string> compile(int argc, char *argv[]) {
  auto fs = llvm::vfs::getRealFileSystem();
  DiagsSaver dc;
  std::vector<const char *> args{"clang"};
  args.insert(args.end(), argv + 1, argv + argc);
  auto diags = CompilerInstance::createDiagnostics(
#if LLVM_VERSION_MAJOR >= 20
      *fs,
#endif
      new DiagnosticOptions, &dc, false);
  driver::Driver d(args[0], kTargetTriple, *diags, "cc", fs);
  d.setCheckInputsExist(false);
  std::unique_ptr<driver::Compilation> comp(d.BuildCompilation(args));
  const auto &jobs = comp->getJobs();
  if (jobs.size() != 1)
    return {false, "only support one job"};
  const llvm::opt::ArgStringList &ccArgs = jobs.begin()->getArguments();

  auto invoc = std::make_unique<CompilerInvocation>();
  CompilerInvocation::CreateFromArgs(*invoc, ccArgs, *diags);
  auto ci = std::make_unique<CompilerInstance>();
  ci->setInvocation(std::move(invoc));
  ci->createDiagnostics(*fs, &dc, false);
  // Disable CompilerInstance::printDiagnosticStats, which might display "2 warnings generated."
  ci->getDiagnostics().getDiagnosticOptions().ShowCarets = false;
  ci->createFileManager(fs);
  ci->createSourceManager(ci->getFileManager());

  // Clang calls BuryPointer on the internal AST and CodeGen-related elements like TargetMachine.
  // This will cause memory leaks if `compile` is executed many times.
  ci->getCodeGenOpts().DisableFree = false;
  ci->getFrontendOpts().DisableFree = false;

  LLVMInitializeX86AsmParser();
  LLVMInitializeX86AsmPrinter();
  LLVMInitializeX86Target();
  LLVMInitializeX86TargetInfo();
  LLVMInitializeX86TargetMC();

  switch (ci->getFrontendOpts().ProgramAction) {
  case frontend::ActionKind::EmitObj: {
    EmitObjAction action;
    ci->ExecuteAction(action);
  } break;
  case frontend::ActionKind::EmitAssembly: {
    EmitAssemblyAction action;
    ci->ExecuteAction(action);
  } break;
  default:
    return {false, "unhandled action"};
  }
  return {true, std::move(dc.message)};
}

int main(int argc, char *argv[]) {
  auto [ok, err] = compile(argc, argv);
  llvm::errs() << err;
}
eof

Building the code with CMake

Let's write a CMakeLists.txt that links against the needed Clang and LLVM libraries.

cat > CMakeLists.txt <<'eof'
cmake_minimum_required(VERSION 3.16)
project(cc)
find_package(LLVM REQUIRED CONFIG)
find_package(Clang REQUIRED CONFIG)

include_directories(${LLVM_INCLUDE_DIRS} ${CLANG_INCLUDE_DIRS})
add_executable(cc main.cc)

if(NOT LLVM_ENABLE_RTTI)
  target_compile_options(cc PRIVATE -fno-rtti)
endif()

if(CLANG_LINK_CLANG_DYLIB)
  target_link_libraries(cc PRIVATE clang-cpp)
else()
  target_link_libraries(cc PRIVATE
    clangAST
    clangBasic
    clangCodeGen
    clangDriver
    clangFrontend
    clangLex
    clangParse
    clangSema
  )
endif()

if(LLVM_LINK_LLVM_DYLIB)
  target_link_libraries(cc PRIVATE LLVM)
else()
  target_link_libraries(cc PRIVATE LLVMOption LLVMSupport LLVMTarget
    LLVMX86AsmParser LLVMX86CodeGen LLVMX86Desc LLVMX86Info)
endif()
eof

We need an LLVM and Clang installation that provides both lib/cmake/llvm/LLVMConfig.cmake and lib/cmake/clang/ClangConfig.cmake. You can grab these from system packages (dev versions may be required) or build LLVM yourself; I'll skip the detailed steps here. For a DIY build, use:

# cmake ... -DLLVM_ENABLE_PROJECTS='clang'

ninja -C out/stable clang-cmake-exports clang

No install step is needed. Next, create a build directory with the CMake configuration above:

cmake -S. -Bout/debug -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=$HOME/Stable/bin/clang++ -DCMAKE_PREFIX_PATH="$HOME/llvm/out/stable"
ninja -C out/debug

I've set a prebuilt Clang as CMAKE_CXX_COMPILER, just a habit of mine. llvm-project isn't guaranteed to build warning-free with GCC, since GCC -Wall -Wextra has many false positives and LLVM developers avoid cluttering the codebase.

% echo 'void f() {}' > a.cc
% out/debug/cc -S a.cc && head -n 5 a.s
.file "a.cc"
.text
.globl _Z1fv # -- Begin function _Z1fv
.p2align 4
.type _Z1fv,@function
% out/debug/cc -c a.cc && file a.o
a.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

Anonymous files

The input source file and the output ELF file are stored in the filesystem. We could create a temporary file and delete it with a RAII class llvm::FileRemover:

std::error_code ec = llvm::sys::fs::createTemporaryFile("clang", "cc", fdIn, tempPath);
llvm::raw_fd_stream osIn(fdIn, /*ShouldClose=*/true);
llvm::FileRemover remover(tempPath);

On Linux, we could utilize memfd_create to create a file in RAM with a volatile backing storage.

int fdIn = memfd_create("input", MFD_CLOEXEC);
if (fdIn < 0)
  return {"", "failed to create input memfd"};
int fdOut = memfd_create("output", MFD_CLOEXEC);
if (fdOut < 0) {
  close(fdIn);
  return {"", "failed to create output memfd"};
}

std::string pathIn = "/proc/self/fd/" + std::to_string(fdIn);
std::string pathOut = "/proc/self/fd/" + std::to_string(fdOut);

// clang -c -xc++ /proc/self/fd/3 -o /proc/self/fd/4
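
To see why the /proc/self/fd/N path works for a path-based consumer such as the driver, here is a small Linux-only sketch. `roundTripThroughMemfd` is a hypothetical helper; it assumes glibc 2.27 or newer for the memfd_create wrapper and a mounted /proc.

```cpp
#include <fcntl.h>
#include <string>
#include <sys/mman.h> // memfd_create (Linux-specific)
#include <unistd.h>

// Write `data` into an anonymous in-memory file, then reopen it by path
// through /proc/self/fd/N and read it back. Returns "" on failure.
std::string roundTripThroughMemfd(const std::string &data) {
  int fd = memfd_create("input", MFD_CLOEXEC);
  if (fd < 0)
    return "";
  std::string result;
  if (write(fd, data.data(), data.size()) ==
      static_cast<ssize_t>(data.size())) {
    // Opening the /proc path creates a new open file description, so the
    // read below starts at offset 0 regardless of the writer's offset.
    std::string path = "/proc/self/fd/" + std::to_string(fd);
    int fd2 = open(path.c_str(), O_RDONLY);
    if (fd2 >= 0) {
      char buf[4096];
      ssize_t n;
      while ((n = read(fd2, buf, sizeof buf)) > 0)
        result.append(buf, static_cast<size_t>(n));
      close(fd2);
    }
  }
  close(fd);
  return result;
}
```

The same idea applies in the other direction: after the compile step writes to the output path, the object file bytes can be read from `fdOut` after seeking back to the start.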

LLVMInitialize*

To generate x86 code, we need a few LLVM X86 libraries defined by llvm/lib/Target/X86/**/CMakeLists.txt files.

LLVMInitializeX86AsmPrinter();
LLVMInitializeX86Target();
LLVMInitializeX86TargetInfo();
LLVMInitializeX86TargetMC();

If inline assembly is used, we will also need the AsmParser library:

LLVMInitializeX86AsmParser();

We could also call LLVMInitializeAll* functions instead, which initialize all supported targets (build-time LLVM_TARGETS_TO_BUILD).

Here are some notes about the LLVM X86 libraries:

  • LLVMX86Info: llvm/lib/Target/X86/TargetInfo/
  • LLVMX86Desc: llvm/lib/Target/X86/MCTargetDesc/ (depends on LLVMX86Info)
  • LLVMX86AsmParser: llvm/lib/Target/X86/AsmParser (depends on LLVMX86Info and LLVMX86Desc)
  • LLVMX86CodeGen: llvm/lib/Target/X86/ (depends on LLVMX86Info and LLVMX86Desc)

EmitAssembly and EmitObj

The code supports two frontend actions, EmitAssembly (-S) and EmitObj (-c).

You could also utilize the API in clang/include/clang/FrontendTool/Utils.h, but that would pull in another library clangFrontendTool (different from clangFrontend).

Diagnostics

The diagnostics system is quite complex. We have DiagnosticConsumer, DiagnosticsEngine, and DiagnosticOptions.

DiagnosticsEngine
├─ DiagnosticIDs (defines diagnostics)
├─ SourceManager (provides locations)
├─ DiagnosticOptions (configures output)
└─ DiagnosticConsumer (handles output)
   └─ Diagnostic (individual message)

We define a simple DiagnosticConsumer that handles notes, warnings, errors, and fatal errors. When macro expansion comes into play, we report two key locations:

  • The physical location (fileLoc), where the expanded token triggers an issue, matching Clang's error line, and
  • The spelling location within the macro's replacement list (sm.getSpellingLoc(loc)).

Although Clang also highlights intermediate locations for chained expansions, our simple approach offers a solid approximation.

% cat a.h
#define FOO(x) x + 1
% cat a.cc
#include "a.h"
#define BAR FOO
void f() {
int y = BAR("abc");
}
% out/debug/cc -c -Wall a.cc
a.cc:4:11: warning: adding 'int' to a string does not append to the string
./a.h:1:18: note: expanded from macro
a.cc:4:11: note: use array indexing to silence this warning
./a.h:1:18: note: expanded from macro
a.cc:4:7: error: cannot initialize a variable of type 'int' with an rvalue of type 'const char *'
% clang -c -Wall a.cc
a.cc:4:11: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
4 | int y = BAR("abc");
| ^~~~~~~~~~
a.cc:2:13: note: expanded from macro 'BAR'
2 | #define BAR FOO
| ^
./a.h:1:18: note: expanded from macro 'FOO'
1 | #define FOO(x) x + 1
| ~~^~~
a.cc:4:11: note: use array indexing to silence this warning
a.cc:2:13: note: expanded from macro 'BAR'
2 | #define BAR FOO
| ^
./a.h:1:18: note: expanded from macro 'FOO'
1 | #define FOO(x) x + 1
| ^
a.cc:4:7: error: cannot initialize a variable of type 'int' with an rvalue of type 'const char *'
4 | int y = BAR("abc");
| ^ ~~~~~~~~~~
1 warning and 1 error generated.

We call a convenience function, CompilerInstance::ExecuteAction, which wraps lower-level API like BeginSource, Execute, and EndSource. However, it will print "1 warning and 1 error generated." unless we set ShowCarets to false.

clang::createInvocation

clang::createInvocation, renamed from createInvocationFromCommandLine in 2022, combines clang::Driver::BuildCompilation and clang::CompilerInvocation::CreateFromArgs. While it saves a few lines for certain tasks, it lacks the flexibility we need for our specific use cases.

Migrating comments to giscus

Followed this guide: https://www.patrickthurmond.com/blog/2023/12/11/commenting-is-available-now-thanks-to-giscus

Add the following to layout/_partial/article.ejs

<% if (!index && post.comments) { %>
<section class="giscus"></section>
<script src="https://giscus.app/client.js"
data-repo="MaskRay/maskray.me"
data-repo-id="FILL IT UP"
data-category="Blog Post Comments"
data-category-id="FILL IT UP"
data-mapping="pathname"
data-strict="0"
data-reactions-enabled="1"
data-emit-metadata="0"
data-input-position="bottom"
data-theme="preferred_color_scheme"
data-lang="en"
data-loading="lazy"
crossorigin="anonymous"
async>
</script>
<% } %>

Unfortunately, comments from Disqus have not been migrated yet. If you've left comments in the past, thank you. Apologies, they are now gone.

While you can create GitHub Discussions via the GraphQL API, I haven't found a solution that works out of the box. https://www.davidangulo.xyz/posts/dirty-ruby-script-to-migrate-comments-from-disqus-to-giscus/ provides a Ruby solution, which is promising but no longer works.

Failed to define value method for :name, because EnterpriseOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
Failed to define value method for :name, because EnvironmentOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
Failed to define value method for :name, because LabelOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
...
.local/share/gem/ruby/3.3.0/gems/graphql-client-0.25.0/lib/graphql/client.rb:338:in `query': wrong number of arguments (given 2, expected 1) (ArgumentError)
from g.rb:42:in `create_discussion'

lld 20 ELF changes

LLVM 20 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/20.x/lld/docs/ReleaseNotes.rst. I've meticulously reviewed nearly all the patches that are not authored by me. I'll delve into some of the key changes.

  • -z nosectionheader has been implemented to omit the section header table. The operation is similar to llvm-objcopy --strip-sections. (#101286)
  • --randomize-section-padding=<seed> is introduced to insert random padding between input sections and at the start of each segment. This can be used to control measurement bias in A/B experiments. (#117653)
  • The reproduce tarball created with --reproduce= now excludes directories specified in the --dependency-file argument (used by Ninja). This resolves an error where non-existent directories could cause issues when invoking ld.lld @response.txt.
  • --symbol-ordering-file= and call graph profile can now be used together.
  • When --call-graph-ordering-file= is specified, .llvm.call-graph-profile sections in relocatable files are no longer used.
  • --lto-basic-block-sections=labels is deprecated in favor of --lto-basic-block-address-map. (#110697)
  • In non-relocatable links, a .note.GNU-stack section with the SHF_EXECINSTR flag is now rejected unless -z execstack is specified. (#124068)
  • In relocatable links, the sh_entsize member of a SHF_MERGE section with relocations is now respected in the output.
  • Quoted names can now be used in output section phdr, memory region names, OVERLAY, the LHS of --defsym, and INSERT AFTER.
  • Section CLASS linker script syntax binds input sections to named classes, which are referenced later one or more times. This provides access to the automatic spilling mechanism of --enable-non-contiguous-regions without globally changing the semantics of section matching. It also independently increases the expressive power of linker scripts. (#95323)
  • INCLUDE cycle detection has been fixed. A linker script can now be included twice.
  • The archivename: syntax when matching input sections is now supported. (#119293)
  • To support Arm v6-M, short thunks using B.w are no longer generated. (#118111)
  • For AArch64, BTI-aware long branch thunks can now be created to a destination function without a BTI instruction. (#108989) (#116402)
  • Relocations related to GOT and TLSDESC for the AArch64 Pointer Authentication ABI are now supported.
  • Supported relocation types for x86-64 target:
    • R_X86_64_CODE_4_GOTPCRELX (#109783) (#116737)
    • R_X86_64_CODE_4_GOTTPOFF (#116634)
    • R_X86_64_CODE_4_GOTPC32_TLSDESC (#116909)
    • R_X86_64_CODE_6_GOTTPOFF (#117675)
  • Supported relocation types for LoongArch target: R_LARCH_TLS_{LD,GD,DESC}_PCREL20_S2. (#100105)

Linker scripts

The CLASS keyword, which separates section matching and referring, is a noteworthy new feature to the linker script support. Here is the GNU ld feature request.

Section layout

If --symbol-ordering-file= is specified, the sections it names are placed first. In LLD 20, SHT_LLVM_CALL_GRAPH_PROFILE sections in relocatable files are still used for other sections.

The next release will support options --bp-compression-sort=both and --bp-startup-sort=function --irpgo-profile=a.profdata, which improve Lempel-Ziv compression and reduce page faults during program startup for mobile applications.

.dynsym computation

The purpose of Symbol::includeInDynsym was somewhat ambiguous, as it was used both to determine if a symbol should be exported to .dynsym and to conservatively suppress transformations in other contexts like MarkLive and ICF. LLD 20 clarifies this by introducing Symbol::isExported specifically for indicating whether a defined symbol should be exported. All previous uses of Symbol::includeInDynsym have been updated to use Symbol::isExported instead. The old confusing Symbol::exportDynamic has been removed.

A special case within Symbol::includeInDynsym checked for isUndefWeak() && ctx.arg.noDynamicLinker. (This could be generalized to isUndefined() && ctx.arg.noDynamicLinker, as non-weak undefined symbols led to errors. Nonetheless, noDynamicLinker has been removed to improve consistency.) This condition ensures that undefined symbols are not included in .dynsym for statically linked ET_DYN executables (created with clang -static-pie).

This condition has been generalized in LLD 20 to (ctx.arg.shared || !ctx.sharedFiles.empty()) && (sym->isUndefined() || sym->isExported). This means undefined symbols are excluded from .dynsym in both ld.lld -pie a.o and ld.lld -pie --no-dynamic-linker a.o, but not ld.lld -pie a.o b.so. This change brings LLD's behavior more in line with GNU ld.
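
The generalized condition can be checked against the three command lines with a tiny model. This is a sketch, not lld's code: `LinkConfig`, `Sym`, and `includeInDynsym` are hypothetical stand-ins for `ctx.arg.shared`, `ctx.sharedFiles`, and the symbol flags quoted above.

```cpp
// Hypothetical link-wide state.
struct LinkConfig {
  bool shared;         // -shared was given
  bool hasSharedFiles; // at least one .so is among the inputs
};

// Hypothetical per-symbol state.
struct Sym {
  bool isUndefined;
  bool isExported;
};

// Mirrors: (shared || !sharedFiles.empty()) && (isUndefined || isExported)
bool includeInDynsym(const LinkConfig &c, const Sym &s) {
  return (c.shared || c.hasSharedFiles) && (s.isUndefined || s.isExported);
}
```

With `ld.lld -pie a.o` (no shared inputs) an undefined symbol is excluded; adding `b.so` flips `hasSharedFiles` and the same symbol is included.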

Symbol::isPreemptible, indicating whether a symbol could be bound to another component, was calculated before relocation scanning and, in LLD 19, also during Identical Code Folding (ICF). In LLD 20, the ICF-related calculation has been moved to the symbol versioning parsing stage.

In LLD 20, isExported and isPreemptible are computed in the following passes.

  • Scan input files, interleaved with symbol resolution: set isExported when defined or referenced by shared objects
  • Clear isExported if influenced by --exclude-libs
  • parseVersionAndComputeIsPreemptible
    • Clear isExported if localized due to hidden visibility.
    • For undefined symbols, compute isPreemptible
    • For defined symbols in relocatable files, or bitcode files when !ltoCanOmit, set isExported and compute isPreemptible
  • compileBitcodeFiles
  • Scan LTO compiled relocatable files
  • Clear isExported if influenced by --exclude-libs
  • finalizeSections: recompute isPreemptible

isPreemptible and isExported determine whether a symbol should be exported to .dynsym.

for (Symbol *sym : ctx.symtab->getSymbols()) {
  if (!sym->isUsedInRegularObj || !includeInSymtab(ctx, *sym))
    continue;
  if (!ctx.arg.relocatable)
    sym->binding = sym->computeBinding(ctx);
  if (ctx.in.symTab)
    ctx.in.symTab->addSymbol(sym);

  // computeBinding might localize a linker-synthesized hidden symbol
  // that was considered exported.
  if ((sym->isExported || sym->isPreemptible) && !sym->isLocal()) {
    ctx.partitions[sym->partition - 1].dynSymTab->addSymbol(sym);
    if (auto *file = dyn_cast<SharedFile>(sym->file))
      if (file->isNeeded && !sym->isUndefined())
        addVerneed(ctx, *sym);
  }
}

Link: lld 19 ELF changes

Natural loops

A dominator tree can be used to compute natural loops.

  • For every node H in a post-order traversal of the dominator tree (or the original CFG), find all predecessors that are dominated by H. This identifies all back edges.
  • Each back edge T->H identifies a natural loop with H as the header.
    • Perform a flood fill starting from T in the reversed CFG (from the back edge's source toward the header).
    • All visited nodes reachable from the root belong to the natural loop associated with the back edge. These nodes are guaranteed to be reachable from H due to the dominator property.
    • Visited nodes unreachable from the root should be ignored.
    • Loops associated with visited nodes are considered subloops.

Here is a C++ implementation:

#include <cstdio>
#include <deque>
#include <numeric>
#include <vector>
using namespace std;

vector<vector<int>> e, ee, edom;
vector<int> dfn, dfn2, rdfn, uf, best, sdom, idom;
int tick;

void dfs(int u) {
  dfn[u] = tick;
  rdfn[tick++] = u;
  for (int v : e[u])
    if (dfn[v] < 0) {
      uf[v] = u;
      dfs(v);
    }
}

int eval(int v, int cur) {
  if (dfn[v] <= cur)
    return v;
  int u = uf[v], r = eval(u, cur);
  if (dfn[best[u]] < dfn[best[v]])
    best[v] = best[u];
  return uf[v] = r;
}

void semiNca(int n, int r) {
  idom.assign(n, -1);
  dfn.assign(n, -1);
  rdfn.resize(n); // initial values are unused
  uf.resize(n); // initial values are unused
  sdom.resize(n); // initial values are unused
  tick = 0;
  dfs(r);
  best.resize(n);
  iota(best.begin(), best.end(), 0);
  for (int i = tick; --i; ) {
    int v = rdfn[i];
    sdom[v] = v;
    for (int u : ee[v])
      if (~dfn[u]) {
        eval(u, i);
        if (dfn[best[u]] < dfn[sdom[v]])
          sdom[v] = best[u];
      }
    best[v] = sdom[v];
    idom[v] = uf[v];
  }
  edom.assign(n, vector<int>());
  for (int i = 1; i < tick; i++) {
    int v = rdfn[i];
    while (dfn[idom[v]] > dfn[sdom[v]])
      idom[v] = idom[idom[v]];
    edom[idom[v]].push_back(v);
  }
}

struct Loop {
  int idx, header;
  Loop *parent = nullptr, *child = nullptr, *next = nullptr;
  vector<int> nodes;
};
deque<Loop> loops;

void postorder(int u) {
  dfn[u] = tick;
  for (int v : edom[u])
    if (dfn[v] < 0)
      postorder(v);
  rdfn[tick++] = u;
  dfn2[u] = tick;
}

void identifyLoops(int n, int r) {
  vector<int> worklist;
  vector<Loop *> to_loop(n);
  dfn.assign(n, -1);
  dfn2.assign(n, -1);
  tick = 0;
  postorder(r);
  loops.clear();
  for (int i = 0; i < tick; i++) {
    int header = rdfn[i];
    for (int u : ee[header])
      if (dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header])
        worklist.push_back(u);
    if (worklist.empty())
      continue;
    loops.push_back(Loop{(int)loops.size(), header});
    Loop *lp = &loops.back();
    while (worklist.size()) {
      int v = worklist.back();
      worklist.pop_back();
      if (!to_loop[v]) {
        if (dfn[v] < 0) // Skip unreachable node
          continue;
        // Find a node not in a loop.
        to_loop[v] = lp;
        lp->nodes.push_back(v);
        if (v == header)
          continue;
        for (int u : ee[v])
          worklist.push_back(u);
      } else {
        // Find a subloop.
        Loop *sub = to_loop[v];
        while (sub->parent)
          sub = sub->parent;
        if (sub == lp)
          continue;
        sub->parent = lp;
        sub->next = lp->child;
        lp->child = sub;
        for (int u : ee[sub->header])
          if (to_loop[u] != sub)
            worklist.push_back(u);
      }
    }
  }
}

int main() {
  int n, m;
  scanf("%d%d", &n, &m);
  e.resize(n);
  ee.resize(n);
  for (int i = 0; i < m; i++) {
    int u, v;
    scanf("%d%d", &u, &v);
    e[u].push_back(v);
    ee[v].push_back(u);
  }
  semiNca(n, 0);
  for (int i = 0; i < n; i++)
    printf("%d: %d\n", i, idom[i]);

  identifyLoops(n, 0);
  for (Loop &lp : loops) {
    printf("loop %d:", lp.idx);
    for (int v : lp.nodes)
      printf(" %d", v);
    for (Loop *c = lp.child; c; c = c->next)
      printf(" (loop %d)", c->idx);
    puts("");
  }
}

The code iterates over the dominator tree in post-order. Alternatively, a post-order traversal of the original control flow graph could be used.
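
The back-edge test `dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header]` is an ancestor check on the dominator tree using two counters recorded during the post-order walk. The following standalone sketch isolates that idea; `TreeOrder` and `dominates` are hypothetical names, and the tree is given directly as child lists.

```cpp
#include <vector>

// Records, for each tree node, the post-order counter when the node is
// entered (in) and right after it is finished (out), like dfn/dfn2 above.
struct TreeOrder {
  std::vector<int> in, out;
  int tick = 0;

  TreeOrder(const std::vector<std::vector<int>> &children, int root)
      : in(children.size(), -1), out(children.size(), -1) {
    visit(children, root);
  }

  void visit(const std::vector<std::vector<int>> &children, int u) {
    in[u] = tick;
    for (int v : children[u])
      visit(children, v);
    ++tick; // u receives the next post-order number
    out[u] = tick;
  }

  // True if a is an ancestor of b or a == b. For a dominator tree this
  // means a dominates b. Nodes never visited (in == -1) are rejected.
  bool dominates(int a, int b) const {
    return in[a] >= 0 && in[b] >= 0 && in[a] <= in[b] && out[b] <= out[a];
  }
};
```

The subtree of a node occupies a contiguous range of post-order numbers, so interval containment answers the dominance query in constant time after one traversal.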

worklist may contain duplicate elements. This is acceptable. You could also deduplicate elements.

Importantly, the header predecessor of a subloop can be another subloop.

In the final loops array, parent loops are listed after their child loops.

This example examines multiple subtle details: a self-loop (node 6), an unreachable node (node 8), and a scenario where the header predecessor of one subloop (nodes 2 and 3) leads to another subloop (nodes 4 and 5).

9 12
0 1
1 2
1 7
2 3
2 4
3 2
8 3
4 5
4 6
5 4
6 1
6 6

Use awk 'BEGIN{print "digraph G{"} NR>1{print $1"->"$2} END{print "}"}' to generate a Graphviz dot file.

Natural loops

A dominator tree can beused to compute natural loops.

  • For every node H in a post-order traversal of thedominator tree (or the original CFG), find all predecessors that aredominated by H. This identifies all back edges.
  • Each back edge T->H identifies a natural loop withH as the header.
    • Perform a flood fill starting from T in the reverseddominator tree (from exiting block to header)
    • All visited nodes reachable from the root belong to the natural loopassociated with the back edge. These nodes are guaranteed to bereachable from H due to the dominator property.
    • Visited nodes unreachable from the root should be ignored.
    • Loops associated with visited nodes are considered subloops.

Here is an C++ implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
#include <cstdio>
#include <deque>
#include <numeric>
#include <vector>
using namespace std;

vector<vector<int>> e, ee, edom;
vector<int> dfn, dfn2, rdfn, uf, best, sdom, idom;
int tick;

void dfs(int u) {
  dfn[u] = tick;
  rdfn[tick++] = u;
  for (int v : e[u])
    if (dfn[v] < 0) {
      uf[v] = u;
      dfs(v);
    }
}

int eval(int v, int cur) {
  if (dfn[v] <= cur)
    return v;
  int u = uf[v], r = eval(u, cur);
  if (dfn[best[u]] < dfn[best[v]])
    best[v] = best[u];
  return uf[v] = r;
}

void semiNca(int n, int r) {
  idom.assign(n, -1);
  dfn.assign(n, -1);
  rdfn.resize(n); // initial values are unused
  uf.resize(n); // initial values are unused
  sdom.resize(n); // initial values are unused
  tick = 0;
  dfs(r);
  best.resize(n);
  iota(best.begin(), best.end(), 0);
  for (int i = tick; --i; ) {
    int v = rdfn[i];
    sdom[v] = v;
    for (int u : ee[v])
      if (~dfn[u]) {
        eval(u, i);
        if (dfn[best[u]] < dfn[sdom[v]])
          sdom[v] = best[u];
      }
    best[v] = sdom[v];
    idom[v] = uf[v];
  }
  edom.assign(n, vector<int>());
  for (int i = 1; i < tick; i++) {
    int v = rdfn[i];
    while (dfn[idom[v]] > dfn[sdom[v]])
      idom[v] = idom[idom[v]];
    edom[idom[v]].push_back(v);
  }
}

struct Loop {
  int idx, header;
  Loop *parent = nullptr, *child = nullptr, *next = nullptr;
  vector<int> nodes;
};
deque<Loop> loops;

void postorder(int u) {
  dfn[u] = tick;
  for (int v : edom[u])
    if (dfn[v] < 0)
      postorder(v);
  rdfn[tick++] = u;
  dfn2[u] = tick;
}

void identifyLoops(int n, int r) {
  vector<int> worklist;
  vector<Loop *> to_loop(n);
  dfn.assign(n, -1);
  dfn2.assign(n, -1);
  tick = 0;
  postorder(r);
  loops.clear();
  for (int i = 0; i < tick; i++) {
    int header = rdfn[i];
    for (int u : ee[header])
      if (dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header])
        worklist.push_back(u);
    if (worklist.empty())
      continue;
    loops.push_back(Loop{(int)loops.size(), header});
    Loop *lp = &loops.back();
    while (worklist.size()) {
      int v = worklist.back();
      worklist.pop_back();
      if (!to_loop[v]) {
        if (dfn[v] < 0) // Skip unreachable node
          continue;
        // Find a node not in a loop.
        to_loop[v] = lp;
        lp->nodes.push_back(v);
        if (v == header)
          continue;
        for (int u : ee[v])
          worklist.push_back(u);
      } else {
        // Find a subloop.
        Loop *sub = to_loop[v];
        while (sub->parent)
          sub = sub->parent;
        if (sub == lp)
          continue;
        sub->parent = lp;
        sub->next = lp->child;
        lp->child = sub;
        for (int u : ee[sub->header])
          if (to_loop[u] != sub)
            worklist.push_back(u);
      }
    }
  }
}

int main() {
  int n, m;
  scanf("%d%d", &n, &m);
  e.resize(n);
  ee.resize(n);
  for (int i = 0; i < m; i++) {
    int u, v;
    scanf("%d%d", &u, &v);
    e[u].push_back(v);
    ee[v].push_back(u);
  }
  semiNca(n, 0);
  for (int i = 0; i < n; i++)
    printf("%d: %d\n", i, idom[i]);

  identifyLoops(n, 0);
  for (Loop &lp : loops) {
    printf("loop %d:", lp.idx);
    for (int v : lp.nodes)
      printf(" %d", v);
    for (Loop *c = lp.child; c; c = c->next)
      printf(" (loop %d)", c->idx);
    puts("");
  }
}

The code iterates over the dominator tree in post-order. Alternatively, a post-order traversal of the original control flow graph could be used.

worklist may contain duplicate elements. This is acceptable. You could also deduplicate elements.

Importantly, the header predecessor of a subloop can be another subloop.

In the final loops array, parent loops are listed after their child loops.

This example examines multiple subtle details: a self-loop (node 6), an unreachable node (node 8), and a scenario where the header predecessor of one subloop (nodes 2 and 3) leads to another subloop (nodes 4 and 5).

9 12
0 1
1 2
1 7
2 3
2 4
3 2
8 3
4 5
4 6
5 4
6 1
6 6

Use awk 'BEGIN{print "digraph G{"} NR>1{print $1"->"$2} END{print "}"}' to generate a graphviz dot file.

Understanding and improving Clang -ftime-report

Clang provides a few options to generate timing reports. Among them, -ftime-report and -ftime-trace can be used to analyze the performance of Clang's internal passes.

  • -fproc-stat-report records time and memory on spawned processes (ld, and gas if -fno-integrated-as).
  • -ftime-trace, introduced in 2019, generates Clang timing information in the Chrome Trace Event format (JSON). The format supports nested events, providing a rich view of the front end.
  • -ftime-report: The option name is borrowed from GCC.

This post focuses on the traditional -ftime-report, which uses a line-based textual format.

Understanding -ftime-report output

The output consists of information about multiple timer groups. The last group spans the largest interval and encompasses timing data from other groups.

Up to Clang 19, the last group is called "Clang front-end time report". You would see something like the following.

% clang -c -w -ftime-report ~/Dev/testsuite/sqlite3.i
...
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2993 ( 71.5%) 0.1069 ( 93.5%) 0.4062 ( 76.3%) 0.4066 ( 76.2%) Code Generation Time
0.1190 ( 28.5%) 0.0074 ( 6.5%) 0.1264 ( 23.7%) 0.1270 ( 23.8%) LLVM IR Generation Time
0.4183 (100.0%) 0.1143 (100.0%) 0.5326 (100.0%) 0.5336 (100.0%) Total
...
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.7780 seconds (0.7788 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Clang front-end timer
0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Total

The "Clang front-end timer" timer measured the time spent in clang::FrontendAction::Execute, which includes lexing, parsing, semantic analysis, LLVM IR generation, optimization, and machine code generation. However, "Code Generation Time" and "LLVM IR Generation Time" belonged to the default timer group "Miscellaneous Ungrouped Timers". This caused confusion for many users. For example, https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/ elaborates on the issues.

To address the ambiguity, I revamped the output in Clang 20.

...
===-------------------------------------------------------------------------===
Clang time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.7685 seconds (0.7686 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2798 ( 42.4%) 0.0966 ( 89.6%) 0.3765 ( 49.0%) 0.3768 ( 49.0%) Machine code generation
0.2399 ( 36.3%) 0.0045 ( 4.2%) 0.2445 ( 31.8%) 0.2442 ( 31.8%) Front end
0.1179 ( 17.8%) 0.0067 ( 6.2%) 0.1246 ( 16.2%) 0.1246 ( 16.2%) LLVM IR generation
0.0230 ( 3.5%) 0.0000 ( 0.0%) 0.0230 ( 3.0%) 0.0230 ( 3.0%) Optimizer
0.6606 (100.0%) 0.1079 (100.0%) 0.7685 (100.0%) 0.7686 (100.0%) Total

The last group has been renamed and changed to cover a longer interval within the invocation. It provides timing information for four stages:

  • Front end: Includes lexing, parsing, semantic analysis, and miscellaneous tasks not captured by the subsequent timers.
  • LLVM IR generation: The time spent in generating LLVM IR.
  • LLVM IR optimization (reported as "Optimizer"): The time consumed by LLVM's IR optimization pipeline.
  • Machine code generation: The time taken to generate machine code or assembly from the optimized IR.

The -ftime-report output further elaborates on these stages through additional groups:

  • "Pass execution timing report" (first instance): A subset of the "Optimizer" group, providing detailed timing for individual optimization passes.
  • "Analysis execution timing report": A subset of the first "Pass execution timing report". In LLVM's new pass manager, analyses are executed as part of pass invocations.
  • "Pass execution timing report" (second instance): A subset of the "Machine code generation" group. (This group's name should be updated once the legacy pass manager is no longer used for IR optimization.)
  • "Instruction Selection and Scheduling": This group appears when SelectionDAG is utilized and is part of the "Instruction Selection" timer within the second "Pass execution timing report".

Examples:

"Pass execution timing report" (first instance)

===-------------------------------------------------------------------------===
Pass execution timing report
===-------------------------------------------------------------------------===
Total Execution Time: 3.0009 seconds (3.0016 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.9626 ( 32.7%) 0.0162 ( 26.6%) 0.9788 ( 32.6%) 0.9790 ( 32.6%) InstCombinePass
0.3203 ( 10.9%) 0.0056 ( 9.2%) 0.3259 ( 10.9%) 0.3263 ( 10.9%) InlinerPass
0.3123 ( 10.6%) 0.0068 ( 11.1%) 0.3190 ( 10.6%) 0.3187 ( 10.6%) SimplifyCFGPass
...

When -ftime-report=per-run-pass is specified, a timer is created for each pass object. This can result in significant output, especially for modules with numerous functions, as each pass will be reported multiple times.

Clang internals

As clang -### -c -ftime-report shows, clangDriver forwards -ftime-report to Clang cc1. Within cc1, this option sets the codegen flag clang::CodeGenOptions::TimePasses. This flag enables the use of llvm::Timer objects to measure the execution time of specific code blocks.

From Clang 20 onwards, the placement of the timers can be understood through the following call tree.

cc1_main
  ExecuteCompilerInvocation // "Front end" minus the following timers
    ... all kinds of initialization
    CompilerInstance::ExecuteAction
      FrontendAction::BeginSourceFile
      FrontendAction::Execute
        FrontendAction::ExecutionAction
          ASTFrontendAction::ExecuteAction
            ParseAST
              BackendConsumer::HandleTranslationUnit
                clang::emitBackendOutput
                  EmitAssemblyHelper::emitAssembly
                    RunOptimizationPipeline // "Optimizer"
                    RunCodegenPipeline // "Machine code generation"
      FrontendAction::EndSourceFile

The measured interval does not cover the whole invocation. integrated cc1 clang -c -ftime-report a.c

LLVM internals

llvm/lib/Support/Timer.cpp implements the timer feature. A Timer belongs to a TimerGroup. Timer::startTimer and Timer::stopTimer generate a TimeRecord. In clang/tools/driver/cc1_main.cpp, llvm::TimerGroup::printAll(llvm::errs()); dumps the TimerGroup and TimeRecord information to stderr.

There are a few cl::opt options:

  • sort-timers (default: true): sort the timers in a group in descending wall time.
  • track-memory: record increments or decrements in malloc statistics. In glibc 2.33 and above, this utilizes mallinfo2::uordblks.
  • info-output-file: dump output to the specified file.

Examples:

clang -c -ftime-report -mllvm -sort-timers=0 a.c
clang -c -ftime-report -mllvm=-sort-timers=0 a.c

The cl::opt option -time-passes can be used with the LLVM internal tools opt and llc, e.g.

opt -S -passes='default<O2>' -time-passes < a.ll
llc -time-passes < a.ll

On Apple platforms, LLVM_SUPPORT_XCODE_SIGNPOSTS=on builds enable os_signpost for startTimer/stopTimer.

The -ftime-report system has a significant limitation: it doesn't support nested timers. Although adding more timer groups might seem like a solution, the resulting output lacks any hierarchical structure, making it difficult to understand.

2024 year in review

As always, I mainly worked in the toolchain field.

Blogging

I have been busy creating posts, authoring a total of 31 blog posts (including this one). 7 posts resonated on Hacker News, garnering over 50 points. (https://news.ycombinator.com/from?site=maskray.me).

I have also revised many posts initially written between 2020 and 2024.

Mastodon: https://hachyderm.io/@meowray

GCC

I made 5 commits to the project, including the addition of the x86 inline asm constraint "Ws". You can read more about that in my earlier post Raw symbol names in inline assembly.

I believe that modernizing code review and test infrastructure will enhance the contributor experience and attract more contributors.

llvm-project

  • Reviewed numerous patches. query is:pr created:>2024-01-01 reviewed-by:MaskRay => "989 Closed"
  • Official maintainer status on the MC layer and binary utilities
  • My involvement with LLVM 18 and 19

Key Points:

  • TODO
  • Added a script update_test_body.py to generate elaborated IR and assembly tests (#89026)
  • MC
    • Made some MC and assembler improvements in LLVM 19
    • Fixed some intrusive changes to the generic code due to AIX and z/OS.
    • Made llvm-mc better as an assembler and disassembler
  • Light ELF
    • Implemented a compact relocation format for ELF
  • AArch64 mapping symbol size optimization
  • Enabled StackSafetyAnalysis for AddressSanitizer to remove instrumentations on stack-allocated variables that are guaranteed to be safe from memory access bugs
    • Bail out if MemIntrinsic length is -1
    • Bail out when calling ifunc
  • Added the Clang cc1 option --output-asm-variant= and cleaned up internals of its friends (x86-asm-syntax).
  • llvm/ADT/Hashing.h stability

llvm/ADT/Hashing.h stability

To facilitate improvements, llvm/ADT/Hashing.h promised to be non-deterministic so that users could not depend on exact hash values. However, the values were actually deterministic unless set_fixed_execution_hash_seed was called. A lot of internal code incorrectly relied on the stability of hash_value/hash_combine/hash_combine_range. I have fixed them and landed https://github.com/llvm/llvm-project/pull/96282 to make the hash value non-deterministic in LLVM_ENABLE_ABI_BREAKING_CHECKS builds.

lld/ELF

lld/ELF is quite stable. I have made some maintenance changes. As usual, I wrote the ELF port's release notes for the two releases. See lld 18 ELF changes and lld 19 ELF changes for detail.

Linux kernel

Contributed 4 commits.

ccls

I finally removed support for LLVM 7, 8, and 9. The latest release https://github.com/MaskRay/ccls/releases/tag/0.20241108 has some nice features.

  • didOpen: sort index requests. When you open A/B/foo.cc, files under "A/B/" and "A/" will be prioritized during the initial indexing process, leading to a quicker response time.
  • Support for the older LLVM versions 7, 8, and 9 has been dropped.
  • LSP semantic tokens are now supported. See the usage guide https://maskray.me/blog/2024-10-20-ccls-and-lsp-semantic-tokens (including rainbow semantic highlighting).
  • textDocument/switchSourceHeader (LSP extension) is now supported.

Misc

Reported 12 feature requests or bugs to binutils.

  • objdump -R: dump SHT_RELR relocations?
  • gas arm aarch64: missing mapping symbols $d in the absence of alignment directives
  • gas: Extend .loc directive to emit a label
  • Compressed .strtab and .symtab
  • gas: Support \+ in .rept/.irp/.irpc directives
  • ld: Add CLASS to allow separate section matching and referring
  • gas/ld: Implicit addends for non-code sections
  • binutils: Support CREL relocation format
  • ld arm: global/weak non-hidden symbols referenced by R_ARM_FUNCDESC are unnecessarily exported
  • ld arm: fdpic link segfaults on R_ARM_GOTOFFFUNCDESC referencing a hidden symbol
  • ld arm: fdpic link may have null pointer dereference in allocate_dynrelocs_for_symbol
  • objcopy: add --prefix-symbols-remove

Reported 2 feature requests to glibc

  • Feature request: special static-pie capable of loading the interpreter from a relative path
  • rtld: Support DT_CREL relocation format

Skipping boring functions in debuggers

In debuggers, stepping into a function with arguments that involve function calls may step into the nested function calls, even if they are simple and uninteresting, such as those found in the C++ STL.

GDB

Consider the following example:

#include <cstdio>
#include <memory>
#include <vector>
using namespace std;

void foo(int i, int j) {
  printf("%d %d\n", i, j);
}

int main() {
  auto i = make_unique<int>(3);
  vector v{1,2};
  foo(*i, v.back()); // step into
}

When GDB stops at the foo call, the step (s) command will step into std::vector::back and std::unique_ptr::operator*. While you can execute finish (fin) and then execute s again, it's time-consuming and distracting, especially when dealing with complex argument expressions.

% g++ -g a.cc -o a
% gdb ./a
...
(gdb) s
std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
1235 back() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
0x00005555555566f8 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $1 = (__gnu_cxx::__alloc_traits<std::allocator<int>, int>::value_type &) @0x55555556c2d4: 2
(gdb) s
std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
447 __glibcxx_assert(get() != pointer());
(gdb) fin
Run till exit from #0 std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
0x0000555555556706 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $2 = (int &) @0x55555556c2b0: 3
(gdb) s
foo (i=3, j=2) at a.cc:7
7 printf("%d %d\n", i, j);

This problem was tracked as a feature request in 2003: https://sourceware.org/bugzilla/show_bug.cgi?id=8287. Fortunately, GDB provides the skip command to skip functions that match a regex or filenames that match a glob (GDB 7.12 feature). You can skip all demangled function names that start with std::.

skip -rfu ^std::

Alternatively, you can execute skip -gfi /usr/include/c++/*/bits/* to skip these libstdc++ files.

Important note:

The skip command's file matching behavior uses the fnmatch function with the FNM_FILE_NAME flag. This means the wildcard character (*) won't match slashes. So, skip -gfi /usr/* won't exclude /usr/include/c++/14.2.1/bits/stl_vector.h.

I proposed to drop the FNM_FILE_NAME flag. With GDB 17, I will be able to skip a project directory with

skip -gfi */include/llvm/ADT/*

instead of

skip -gfi /home/ray/llvm/llvm/include/llvm/ADT/*

User functions called by skipped functions

When a function (let's call it "A") is skipped during debugging, any user-defined functions that are called by "A" will also be skipped.

For example, consider the following code snippet:

std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), predicate)) {
}

If std::all_of is skipped due to a skip command, predicate called within std::all_of will also be skipped when you execute s at the if statement.

LLDB

By default, LLDB avoids stepping into functions whose names start with std:: when you use the s (step, thread step-in) command. This behavior is controlled by a setting:

(lldb) settings show target.process.thread.step-avoid-regexp
target.process.thread.step-avoid-regexp (regex) = ^std::
(lldb) set sh target.process.thread.step-avoid-libraries
target.process.thread.step-avoid-libraries (file-list) =

target.process.thread.step-avoid-libraries can be used to skip functions defined in a library.

While the command settings set is long, you can shorten it to set set.

Visual Studio

Visual Studio provides a debugging feature Just My Code that automatically steps over calls to system, framework, and other non-user code.

It also supports a Step Into Specific command, which seems interesting.

The implementation inserts a call to __CheckForDebuggerJustMyCode at the start of every user function. The function (void __CheckForDebuggerJustMyCode(const char *flag)) takes a global variable defined in the .msvcjmc section and determines whether the debugger should stop.

This LLDB feature request has a nice description: https://github.com/llvm/llvm-project/issues/61152.

For the all_of example, the feature can possibly allow the debugger to stop at test.

std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), test)) {
}

Fuchsia zxdb

The Fuchsia debugger "zxdb" provides a command "ss" similar to Visual Studio's "Step Into Specific".

[zxdb] ss
1 std::string::string
2 MyClass::MyClass
3 HelperFunctionCall
4 MyClass::~MyClass
5 std::string::~string
quit
>

Exporting Tweets

On https://x.com/settings/, click More -> Settings and privacy -> Download an archive of your data. Wait for a message from x.com: "@XXX your X data is ready". Download the archive.

cp data/tweets.js tweets.ts

Change the first line from window.YTD.tweets.part0 = [ to let part0 = [, and append

import { unescape } from "@std/html/entities";

let out = part0.map(tw => [new Date(tw.tweet.created_at), tw.tweet.full_text])
out.sort((a,b) => a[0] - b[0])

let yy0 = 0, mm0 = 0, str = ''
for (let i=0, j=0; i<=out.length; i++) {
  let d = i<out.length ? out[i][0] : new Date('9999-12-31')
  let yy = d.getYear()+1900, mm = d.getMonth()+1
  if (yy0 != yy) {
    if (str.length) {
      try {
        Deno.mkdirSync(String(yy0))
      } catch (e) {
      }
      Deno.writeTextFileSync(`${yy0}/index.md`, str)
    }
    yy0 = yy
    mm0 = 0
    str = `# ${yy0}\n`
    if (i == out.length) break
  }
  if (mm0 != mm) {
    str += `\n## ${yy}-${String(mm).padStart(2,'0')}\n`
    mm0 = mm
  }
  str += `\n${unescape(out[i][1]).replace(/(http(s)?:[-/.\w]+)/, "<$1>")}\n`
}

Then run deno run --allow-write=. tweets.ts

% cat 2022/index.md
# 2022

## 2022-01

tweet0

tweet1

## 2022-02

...

tweet0

tweet1

Simplifying disassembly with LLVM tools

Both compiler developers and security researchers have built disassemblers. They often prioritize different aspects. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.

For quick disassembly tasks, rizin provides a convenient command-line interface.

% rz-asm -a x86 -b 64 -d 4829c390
sub rbx, rax
nop

-a x86 can be omitted.

llvm-mc

Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.

However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires explicitly prefixing hexadecimal byte values with 0x:

% echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
.text
sub rbx, rax
nop

Let's break down the options used in this command:

  • --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
  • --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
  • --cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.

I have contributed patches to remove .text and allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:

% echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
sub rbx, rax
nop

You can further simplify this by creating a bash/zsh function. bash and zsh's "here string" feature provides a clean way to specify stdin.

disasm() {
  llvm-mc --cdis --hex --output-asm-variant=1 <<< $@
}

% disasm 4829c390
sub rbx, rax
nop
% disasm $'4829 c3\n# comment\n90'
sub rbx, rax
nop

The --hex option conveniently ignores whitespace and #-style comments within the input.

Atomic blocks

llvm-mc handles decoding failures by skipping a number of bytes, as determined by the target-specific llvm::MCDisassembler::getInstruction. To treat a sequence of bytes as a single unit during disassembly, enclose them within [].

% echo 'f995ab99f995 ab99' | llvm-mc --triple=riscv64 --cdis --hex
<stdin>:1:1: warning: invalid instruction encoding
f995ab99f995 ab99
^
<stdin>:1:5: warning: invalid instruction encoding
f995ab99f995 ab99
^
<stdin>:1:14: warning: invalid instruction encoding
f995ab99f995 ab99
^
<stdin>:1:16: warning: invalid instruction encoding
f995ab99f995 ab99
^
% echo '[f995ab99][f995 ab99]' | llvm-mc --triple=riscv64 --cdis --hex
<stdin>:1:2: warning: invalid instruction encoding
[f995ab99][f995 ab99]
^
<stdin>:1:12: warning: invalid instruction encoding
[f995ab99][f995 ab99]
^

llvm-mc can also function as an assembler:

% echo 'li t3, 42' | llvm-mc -show-encoding --triple=riscv64
li t3, 42 # encoding: [0x13,0x0e,0xa0,0x02]

(I've contributed a change to LLVM 20 that removes the previously printed .text directive.)

llvm-objdump

For address information, llvm-mc falls short. We need to turn to llvm-objdump to get that detail. Here is a little fish script that takes raw hex bytes as input, converts them to a binary format (xxd -r -p), and then creates an ELF relocatable file (llvm-objcopy -I binary) targeting the x86-64 architecture. Finally, llvm-objdump with the -D flag disassembles the data section (.data) containing the converted binary.

#!/usr/bin/env fish
llvm-objdump -D -j .data (echo $argv | xxd -r -p | llvm-objcopy -I binary -O elf64-x86-64 - - | psub) | sed '1,/<_binary__stdin__start>:/d'

Here is a more feature-rich script that supports multiple architectures:

#!/usr/bin/env fish
argparse a/arch= att r -- $argv; or return 1
if test -z "$_flag_arch"; set _flag_arch x86_64; end
set opt --triple=$_flag_arch
if test -z "$_flag_att" && string match -rq 'i.86|x86_64' $_flag_arch; set -a opt -M intel; end
if test -n "$_flag_r"; set -a opt --no-leading-addr; set -a opt --no-show-raw-insn; end

switch $_flag_arch
  case arm; set bfdname elf32-littlearm
  case aarch64; set bfdname elf64-littleaarch64
  case ppc32; set bfdname elf32-powerpc
  case ppc32le; set bfdname elf32-powerpcle
  case ppc64; set bfdname elf64-powerpc
  case ppc64le; set bfdname elf64-powerpcle
  case riscv32; set bfdname elf32-littleriscv
  case riscv64; set bfdname elf64-littleriscv
  case 'i?86'; set bfdname elf32-i386
  case x86_64; set bfdname elf64-x86-64
  case '*'; echo unknown arch >&2; return 1
end
llvm-objdump -D -j .data $opt (echo $argv | xxd -r -p | llvm-objcopy -I binary -O $bfdname - - | psub) | sed '1,/<_binary__stdin__start>:/d'

% ./disasm e8 00000000c3 e800000000 c3
0: e8 00 00 00 00 call 0x5 <_binary__stdin__start+0x5>
5: c3 ret
6: e8 00 00 00 00 call 0xb <_binary__stdin__start+0xb>
b: c3 ret
% ./disasm -r e8 00000000c3 e800000000 c3
call 0x5 <_binary__stdin__start+0x5>
ret
call 0xb <_binary__stdin__start+0xb>
ret
% ./disasm -a riscv64 1300 0000
0: 00000013 nop

Summary

  • Assembler: llvm-mc --show-encoding
  • Disassembler: llvm-mc --cdis --hex
  • Disassembler with address information: xxd -r -p, llvm-objcopy, and llvm-objdump -D -j .data


clang-format and single-line statements

The Google C++ Style is widely adopted by projects. It contains a brace omission guideline in Looping and branching statements:

For historical reasons, we allow one exception to the above rules: the curly braces for the controlled statement or the line breaks inside the curly braces may be omitted if as a result the entire statement appears on either a single line (in which case there is a space between the closing parenthesis and the controlled statement) or on two lines (in which case there is a line break after the closing parenthesis and there are no braces).

// OK - fits on one line.
if (x == kFoo) { return new Foo(); }

// OK - braces are optional in this case.
if (x == kFoo) return new Foo();

// OK - condition fits on one line, body fits on another.
if (x == kBar)
  Bar(arg1, arg2, arg3);

In clang-format's predefined Google style for C++, there are two related style options:

% clang-format --dump-config --style=Google | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: WithoutElse
AllowShortLoopsOnASingleLine: true

The two options cause clang-format to aggressively join lines for the following code:

for (int x : a)
  foo(x);

while (cond())
  foo(x);

if (x)
  foo(x);

As a heavy debugger user, I find this behavior cumbersome.

// clang-format --style=Google
#include <vector>
void foo(int v) {}
int main() {
  std::vector<int> a{1, 2, 3};
  for (int x : a) foo(x); // breakpoint
}

When GDB stops at the for loop, how can I step into the loop body? Unfortunately, it's not simple.

If I run step, GDB will dive into the implementation detail of the range-based for loop. It will stop at the std::vector::begin function. Stepping out and executing step again will stop at the std::vector::end function. Stepping out and executing step another time will stop at the operator!= function of the iterator type. Here is an interaction example with GDB:

(gdb) n
5 for (int x : a) foo(v);
(gdb) s
std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873
873 begin() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873
0x00005555555561d5 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $1 = 1
(gdb) s
std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893
893 end() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893
0x00005555555561e5 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $2 = 0
(gdb) s
__gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235
1235 { return __lhs.base() != __rhs.base(); }
(gdb) fin
Run till exit from #0 __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235
0x0000555555556225 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $3 = true
(gdb) s
__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091
1091 { return *_M_current; }
(gdb) fin
Run till exit from #0 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091
0x00005555555561f7 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $4 = (int &) @0x55555556b2b0: 1

You can see that this can significantly hinder the debugging process, as it forces the user to delve into uninteresting function calls of the range-based for loop.

In contrast, when the loop body is on the next line, we can just run next to skip the three uninteresting function calls:

for (int x : a) // next
  foo(x);       // step

The AllowShortIfStatementsOnASingleLine style option is similar. While convenient for simple scenarios, it can sometimes hinder debuggability.

For the following code, it's not easy to skip the c() and d() function calls if you just want to step into foo(v).

if (c() && d()) foo(v);

Many developers, mindful of potential goto fail-like issues, often opt to include braces in their code. clang-format's default style can further reinforce this practice.

// clang-format does not join lines.
if (v) {
  foo(v);
}
for (int x : a) {
  foo(x);
}

Other predefined styles

clang-format's Chromium style is a variant of the Google style and does not have the aforementioned problem. The LLVM style, and many styles derived from it, do not have the problem either.

% clang-format --dump-config --style=Chromium | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
% clang-format --dump-config --style=LLVM | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false

A comparative look at other languages

Go, Odin, and Rust require {} for if statements but omit (), striking a balance between clarity and conciseness. C/C++'s required () makes opt-in braces feel a bit verbose.

C3 and Jai, similar to C++, make {} optional.

Removing global state from LLD

LLD, the LLVM linker, is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). Designed as a standalone program, the code base relies heavily on global state, making it less than ideal for library integration. As outlined in RFC: Revisiting LLD-as-a-library design, two main hurdles exist:

  • Fatal errors: they exit the process without returning control to the caller. This was actually addressed for most scenarios in 2020 by utilizing llvm::sys::Process::Exit(val, /*NoCleanup=*/true) and CrashRecoveryContext (longjmp under the hood).
  • Global variable conflicts: shared global variables do not allow two concurrent invocations.

I understand that calling a linker API could be convenient, especially when you want to avoid shipping another executable (which can be large when you link against LLVM statically). However, I believe that invoking LLD as a separate process remains the recommended approach. There are several advantages:

  • Build system control: Build systems gain greater control over scheduling and resource allocation for LLD. In an edit-compile-link cycle, the link could need more resources and threading is more useful.
  • Better parallelism management
  • Global state isolation: LLVM's global state (primarily cl::opt and ManagedStatic) is isolated.

While spawning a new process offers build system benefits, the issue of global state usage within LLD remains a concern. This is a factor to consider, especially for advanced use cases. Here are the global variables in the LLD 15 code base.

% rg '^extern [^(]* \w+;' lld/ELF
lld/ELF/SyntheticSections.h
1290:extern InStruct in;

lld/ELF/Symbols.h
51:extern SmallVector<SymbolAux, 0> symAux;

lld/ELF/SymbolTable.h
87:extern std::unique_ptr<SymbolTable> symtab;

lld/ELF/InputSection.h
33:extern std::vector<Partition> partitions;
403:extern SmallVector<InputSectionBase *, 0> inputSections;
408:extern llvm::DenseSet<std::pair<const Symbol *, uint64_t>> ppc64noTocRelax;

lld/ELF/OutputSections.h
156:extern llvm::SmallVector<OutputSection *, 0> outputSections;

lld/ELF/InputFiles.h
43:extern std::unique_ptr<llvm::TarWriter> tar;

lld/ELF/Driver.h
23:extern std::unique_ptr<class LinkerDriver> driver;

lld/ELF/LinkerScript.h
366:extern std::unique_ptr<LinkerScript> script;

lld/ELF/Config.h
372:extern std::unique_ptr<Configuration> config;
406:extern std::unique_ptr<Ctx> ctx;

Some global states exist as static member variables.

Cleaning up global variables

LLD has been undergoing a transformation to reduce its reliance on global variables. This improves its suitability for library integration.

  • In 2020, [LLD][COFF] Cover usage of LLD as a library enabled running the LLD driver multiple times even if there is a fatal error.
  • In 2021, global variables were removed from lld/Common.
  • The COFF port followed suit, eliminating most of its global variables.

Inspired by these advancements, I conceived a plan to eliminate global variables from the ELF port. In 2022, as part of the work to enable parallel section initialization, I introduced a class, struct Ctx, to lld/ELF/Config.h. Here is my plan:

  • Global variables will be migrated into Ctx.
  • Functions will be modified to accept a new Ctx &ctx parameter.
  • The previously global variable lld::elf::ctx will be transformed into a local variable within lld::elf::link.
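
The three steps can be illustrated with a minimal sketch. It uses toy types rather than LLD's real ones: this Ctx, addOutputSection, and runLink are hypothetical stand-ins, and only the shape of the migration is meant to carry over.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Step 1: state that used to live in global variables moves into a context
// struct. This is a toy stand-in, not lld::elf::Ctx.
struct Ctx {
  std::vector<std::string> outputSections; // formerly a global vector
  unsigned errorCount = 0;                 // formerly a global counter
};

// Step 2: helpers receive the context explicitly instead of reading globals.
void addOutputSection(Ctx &ctx, const std::string &name) {
  if (name.empty()) {
    ++ctx.errorCount;
    return;
  }
  ctx.outputSections.push_back(name);
}

// Step 3: the context is a local variable of the entry point, so every
// invocation starts from freshly constructed state and nothing needs resetting.
std::size_t runLink(const std::vector<std::string> &names) {
  Ctx ctx;
  for (const std::string &name : names)
    addOutputSection(ctx, name);
  assert(ctx.outputSections.size() + ctx.errorCount == names.size());
  return ctx.outputSections.size();
}
```

Calling runLink twice in a row gives independent results, because no state survives the first call.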

Encapsulating global variables into Ctx

Over the past two and a half years, I have migrated global variables into the Ctx class, e.g.:

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 590c19e6d88d..915c4d94e870 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -382,2 +382,10 @@ struct Ctx {
std::atomic<bool> hasSympart{false};
+ // A tuple of (reference, extractedFile, sym). Used by --why-extract=.
+ SmallVector<std::tuple<std::string, const InputFile *, const Symbol &>, 0>
+ whyExtractRecords;
+ // A mapping from a symbol to an InputFile referencing it backward. Used by
+ // --warn-backrefs.
+ llvm::DenseMap<const Symbol *,
+ std::pair<const InputFile *, const InputFile *>>
+ backwardReferences;
};
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8315d43c776e..2ab698c91b01 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1776,3 +1776,3 @@ static void handleUndefined(Symbol *sym, const char *option) {
if (!config->whyExtract.empty())
- driver->whyExtract.emplace_back(option, sym->file, *sym);
+ ctx->whyExtractRecords.emplace_back(option, sym->file, *sym);
}
@@ -1812,3 +1812,3 @@ static void handleLibcall(StringRef name) {

-void LinkerDriver::writeArchiveStats() const {
+static void writeArchiveStats() {
if (config->printArchiveStats.empty())
@@ -1834,3 +1834,3 @@ void LinkerDriver::writeArchiveStats() const {
++extracted[CachedHashStringRef(file->archiveName)];
- for (std::pair<StringRef, unsigned> f : archiveFiles) {
+ for (std::pair<StringRef, unsigned> f : driver->archiveFiles) {
unsigned &v = extracted[CachedHashString(f.first)];

I did not do anything with the global variables in early 2024; the work was resumed in July 2024. I moved TarWriter, SymbolAux, Out, ElfSym, outputSections, etc. into Ctx.

struct Ctx {
  Config arg;
  LinkerDriver driver;
  LinkerScript *script;
  std::unique_ptr<TargetInfo> target;
  ...
};

The config variable, used to store command-line options, was pervasive throughout lld/ELF. To enhance code clarity and maintainability, I renamed it to ctx.arg (following mold's naming).

I've removed other instances of static storage variables throughout lld/ELF, e.g.

  • static member LinkerDriver::nextGroupId
  • static member SharedFile::vernauxNum
  • sectionMap in lld/ELF/Arch/ARM.cpp

Passing Ctx &ctx as parameters

The subsequent phase involved adding Ctx &ctx as a parameter to numerous functions and classes, gradually eliminating references to the global ctx.

I incorporated Ctx &ctx as a member variable into a few classes (e.g. SyntheticSection, OutputSection) to minimize the modifications to member functions. This approach was not suitable for Symbol and InputSection, since even a single extra word per object could increase memory consumption significantly.

// Writer.cpp
template <class ELFT> class Writer {
public:
  LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)

  Writer(Ctx &ctx) : ctx(ctx), buffer(ctx.e.outputBuffer) {}
  ...

template <class ELFT> void elf::writeResult(Ctx &ctx) {
  Writer<ELFT>(ctx).run();
}
...

bool elf::includeInSymtab(Ctx &ctx, const Symbol &b) {
  if (auto *d = dyn_cast<Defined>(&b)) {
    // Always include absolute symbols.
    SectionBase *sec = d->section;
    if (!sec)
      return true;
    assert(sec->isLive());

    if (auto *s = dyn_cast<MergeInputSection>(sec))
      return s->getSectionPiece(d->value).live;
    return true;
  }
  return b.used || !ctx.arg.gcSections;
}
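
The memory concern for numerous objects such as Symbol and InputSection can be made concrete with a rough sketch. The types below are hypothetical two-word objects, not LLD's classes; the point is only that a stored reference costs at least one pointer-sized word per object, while a parameter costs nothing per object.

```cpp
#include <cstddef>
#include <cstdint>

// Toy context; not lld::elf::Ctx.
struct Ctx {
  bool gcSections = false;
};

// A small, very numerous object (think of a symbol): two words of payload.
struct SmallSymbol {
  std::uint64_t value;
  std::uint64_t size;
};

// The same payload plus a stored context reference.
struct SmallSymbolWithCtx {
  Ctx &ctx;
  std::uint64_t value;
  std::uint64_t size;
};

// Extra bytes needed to store `count` objects once the reference is added.
std::size_t extraBytes(std::size_t count) {
  return count * (sizeof(SmallSymbolWithCtx) - sizeof(SmallSymbol));
}

// The alternative for hot types: pass the context at the call site.
bool isKept(Ctx &ctx, const SmallSymbol &sym) {
  return sym.size != 0 || !ctx.gcSections;
}
```

On a typical 64-bit target the reference adds 8 bytes to a 16-byte object, so a link with tens of millions of such objects would pay hundreds of megabytes for the convenience.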

Eliminating the global ctx variable

Once the global ctx variable's reference count reached zero, it was time to remove it entirely. I implemented the change on November 16, 2024.

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 72feeb9d49cb..a9b7a98e5b54 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -539,4 +539,2 @@ struct InStruct {
std::unique_ptr<SymtabShndxSection> symTabShndx;
-
- void reset();
};
@@ -664,3 +662,2 @@ struct Ctx {
Ctx();
- void reset();

@@ -671,4 +668,2 @@ struct Ctx {

-LLVM_LIBRARY_VISIBILITY extern Ctx ctx;
-
// The first two elements of versionDefinitions represent VER_NDX_LOCAL and
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 334dfc0e3ba1..631051c27381 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -81,4 +81,2 @@ using namespace lld::elf;

-Ctx elf::ctx;
-
static void setConfigs(Ctx &ctx, opt::InputArgList &args);
@@ -165,2 +114,3 @@ bool link(ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
llvm::raw_ostream &stderrOS, bool exitEarly, bool disableOutput) {
+ Ctx ctx;
// This driver-specific context will be freed later by unsafeLldMain().
@@ -169,7 +119,2 @@ bool link(ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
context->e.initialize(stdoutOS, stderrOS, exitEarly, disableOutput);
- context->e.cleanupCallback = []() {
- Ctx &ctx = elf::ctx;
- ctx.reset();
- ctx.partitions.emplace_back(ctx);
- };
context->e.logName = args::getFilenameWithoutExe(args[0]);

Previously, cleanupCallback was essential for resetting the global ctx when lld::elf::link was invoked multiple times. With the removal of the global variable, this callback is no longer necessary. We can now rely on the constructor to initialize Ctx and avoid the need for a reset function.

Removing global state from lld/Common

While significant progress has been made in lld/ELF, lld/Common needs a lot of work as well. A lot of shared utility code (diagnostics, bump allocator) utilizes the global lld::context().

/// Returns the default error handler.
ErrorHandler &errorHandler();

void error(const Twine &msg);
void error(const Twine &msg, ErrorTag tag, ArrayRef<StringRef> args);
[[noreturn]] void fatal(const Twine &msg);
void log(const Twine &msg);
void message(const Twine &msg, llvm::raw_ostream &s = outs());
void warn(const Twine &msg);
uint64_t errorCount();

Although thread-local variables are an option, worker threads spawned by llvm/lib/Support/Parallel.cpp don't inherit their values from the main thread. Given our direct access to Ctx &ctx, we can leverage context-aware APIs as replacements.
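
The thread-local limitation is easy to reproduce in isolation. The sketch below uses plain std::thread rather than LLVM's parallel utilities, and currentCtx is a hypothetical variable invented for the demonstration.

```cpp
#include <thread>

// Toy context; not lld::elf::Ctx.
struct Ctx {
  int errorLimit = 20;
};

// Hypothetical thread-local "current context" pointer.
thread_local Ctx *currentCtx = nullptr;

// Returns whether a freshly spawned thread sees the pointer set by its creator.
bool workerSeesThreadLocal(Ctx &ctx) {
  currentCtx = &ctx; // set on the calling thread only
  bool seen = true;
  std::thread worker([&seen] { seen = currentCtx != nullptr; });
  worker.join();
  currentCtx = nullptr;
  return seen; // false: the worker has its own, still-null copy
}

// Passing the context explicitly works on any thread.
int limitSeenByWorker(Ctx &ctx) {
  int limit = 0;
  std::thread worker([&] { limit = ctx.errorLimit; });
  worker.join();
  return limit;
}
```

A thread-local design would therefore need every worker to be seeded with the pointer before running any task, which is extra machinery that an explicit Ctx & parameter avoids.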

https://github.com/llvm/llvm-project/pull/112319 introduced context-aware diagnostic utilities:

  • log("xxx") => Log(ctx) << "xxx"
  • message("xxx") => Msg(ctx) << "xxx"
  • warn("xxx") => Warn(ctx) << "xxx"
  • errorOrWarn(toString(f) + "xxx") => Err(ctx) << f << "xxx"
  • error(toString(f) + "xxx") => ErrAlways(ctx) << f << "xxx"
  • fatal("xxx") => Fatal(ctx) << "xxx"
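
To show how a call site of the form Warn(ctx) << "xxx" can work, here is a toy sketch of a stream-style diagnostic object. It is not LLD's implementation: Diag, DiagLevel, and this Ctx are hypothetical, and the real utilities differ in detail. The idea is that a temporary collects the message and reports it to the context when it is destroyed at the end of the statement.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Toy context owning the diagnostic sink and counters (not lld::elf::Ctx).
struct Ctx {
  std::vector<std::string> diagnostics;
  unsigned errorCount = 0;
};

enum class DiagLevel { Log, Warn, Err };

// Collects a message via operator<< and reports it from the destructor.
class Diag {
public:
  Diag(Ctx &ctx, DiagLevel level) : ctx(ctx), level(level) {}
  Diag(const Diag &) = delete;
  Diag &operator=(const Diag &) = delete;
  ~Diag() {
    static const char *const prefix[] = {"log: ", "warning: ", "error: "};
    if (level == DiagLevel::Err)
      ++ctx.errorCount;
    ctx.diagnostics.push_back(prefix[static_cast<int>(level)] + os.str());
  }
  template <class T> Diag &operator<<(const T &v) {
    os << v;
    return *this;
  }

private:
  Ctx &ctx;
  DiagLevel level;
  std::ostringstream os;
};

// Thin wrappers that give each level its own call-site spelling.
struct Log : Diag {
  explicit Log(Ctx &ctx) : Diag(ctx, DiagLevel::Log) {}
};
struct Warn : Diag {
  explicit Warn(Ctx &ctx) : Diag(ctx, DiagLevel::Warn) {}
};
struct Err : Diag {
  explicit Err(Ctx &ctx) : Diag(ctx, DiagLevel::Err) {}
};
```

With this sketch, Err(ctx) << "undefined symbol: " << name; appends one entry to ctx.diagnostics and bumps ctx.errorCount, without touching any global handler.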

As of Nov 16, 2024, I have eliminated log/warn/error/fatal from lld/ELF.

The underlying functions lld::ErrorHandler::fatal, and lld::ErrorHandler::error when the error limit is hit and exitEarly is true, call exitLld(1).

This transformation eliminates a lot of code size overhead due to llvm::Twine. Even in the simplest Twine(123) case, the generated code needs a stack object to hold the value and a Twine kind.

lld::make from lld/include/lld/Common/Memory.h is an allocation function that uses the global context. When the ownership is clear, std::make_unique might be a better choice.

Guideline:

  • Avoid lld::saver
  • Avoid void message(const Twine &msg, llvm::raw_ostream &s = outs());, which utilizes lld::outs()
  • Avoid lld::make from lld/include/lld/Common/Memory.h
  • Avoid fatal errors in a half-initialized object, e.g. a fatal error in a base class constructor (ELFFileBase::init) ([LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures)

Global state in LLVM

LTO link jobs utilize LLVM. Understanding its global state is crucial.

While LLVM allows for multiple LLVMContext instances to be allocated and used concurrently, it's important to note that these instances share certain global states, such as cl::opt and ManagedStatic. Specifically, it's not possible to run two concurrent LLVM compilations (including LTO link jobs) with distinct sets of cl::opt option values. To link with distinct cl::opt values, even after removing LLD's global state, you'll need to spawn a new LLD process.

Any proposal that moves away from global state seems to complicate cl::opt usage, making it impractical.

LLD also utilizes functions from llvm/Support/Parallel.h for parallelism. These functions rely on global state like getDefaultExecutor and llvm::parallel::strategy. Ongoing work by Alexandre Ganea aims to make these functions context-aware. (It was nice to meet you in person at the LLVM Developers' Meeting last month.)

Supported library usage scenarios

You can repeatedly call lld::lldMain from lld/Common/Driver.h. If fatal has been invoked, it will not be safe to call lld::lldMain again in certain rare scenarios. Running lld::lldMain concurrently in two threads is not supported.

The command LLD_IN_TEST=3 lld-link ... runs the link process three times, but only the final invocation outputs diagnostics to stdout/stderr. lld/test/lit.cfg.py has configured the COFF port to run tests twice ([lld] Add test suite mode for running LLD main twice). Other ports need work to make this mode work.

Keeping pace with LLVM: compatibility strategies

LLVM's C++ API doesn't offer a stability guarantee. This means function signatures can change or be removed between versions, forcing projects to adapt.

On the other hand, LLVM has an extensive API surface. When a library like llvm/lib/Y relies on functionality from another library, the API is often exported in header files under llvm/include/llvm/X/, even if it is not intended to be user-facing.

To be compatible with multiple LLVM versions, many projects rely on #if directives based on the LLVM_VERSION_MAJOR macro. This post explores the specific techniques used by ccls to ensure compatibility with LLVM versions 7 to 19. For the latest release (ccls 0.20241108), support for LLVM versions 7 to 9 has been discontinued.

Given the tight coupling between LLVM and Clang, the LLVM_VERSION_MAJOR macro can be used for version detection of both. There's no need to check CLANG_VERSION_MAJOR.


Changed namespaces

In Oct 2018, https://reviews.llvm.org/D52783 moved the namespace clang::vfs to llvm::vfs. To remain compatible, I renamed clang::vfs uses and added a conditional namespace alias:

#if LLVM_VERSION_MAJOR < 8
// D52783 Lift VFS from clang to llvm
namespace llvm {
namespace vfs = clang::vfs;
}
#endif

Removed functions

In March 2019, https://reviews.llvm.org/D59377 removed the member variable VirtualFileSystem and removed setVirtualFileSystem. To adapt to this change, ccls employs an #if.

#if LLVM_VERSION_MAJOR >= 9 // rC357037
Clang->createFileManager(FS);
#else
Clang->setVirtualFileSystem(FS);
Clang->createFileManager();
#endif

Changed function parameters

In April 2020, the LLVM monorepo integrated a new subproject: flang. flang developers made many changes to clangDriver to reuse it for flang. https://reviews.llvm.org/D86089 changed the constructor clang::driver::Driver. I added:

#if LLVM_VERSION_MAJOR < 12 // llvmorg-12-init-5498-g257b29715bb
driver::Driver d(args[0], llvm::sys::getDefaultTargetTriple(), *diags, vfs);
#else
driver::Driver d(args[0], llvm::sys::getDefaultTargetTriple(), *diags, "ccls", vfs);
#endif

In November 2020, https://reviews.llvm.org/D90890 changed an argument of ComputePreambleBounds from const llvm::MemoryBuffer *Buffer to const llvm::MemoryBufferRef &Buffer.

std::unique_ptr<llvm::MemoryBuffer> buf =
llvm::MemoryBuffer::getMemBuffer(content);
#if LLVM_VERSION_MAJOR >= 12 // llvmorg-12-init-11522-g4c55c3b66de
auto bounds = ComputePreambleBounds(*ci.getLangOpts(), *buf, 0);
#else
auto bounds = ComputePreambleBounds(*ci.getLangOpts(), buf.get(), 0);
#endif

https://reviews.llvm.org/D91297 made a similar change and I adapted to it similarly.

In Jan 2022, https://reviews.llvm.org/D116317 added a new parameter bool Braced to CodeCompleteConsumer::ProcessOverloadCandidates.

  void ProcessOverloadCandidates(Sema &s, unsigned currentArg,
                                 OverloadCandidate *candidates,
                                 unsigned numCandidates
#if LLVM_VERSION_MAJOR >= 8
                                 ,
                                 SourceLocation openParLoc
#endif
#if LLVM_VERSION_MAJOR >= 14
                                 ,
                                 bool braced
#endif
                                 ) override {

In late 2022 and early 2023, there were many changes to migrate from llvm::Optional to std::optional.

#if LLVM_VERSION_MAJOR >= 16 // llvmorg-16-init-12589-ge748db0f7f09
std::array<std::optional<StringRef>, 3>
#else
std::array<Optional<StringRef>, 3>
#endif
    redir{StringRef(stdinPath), StringRef(path), StringRef()};
std::vector<StringRef> args{g_config->compilationDatabaseCommand, root};
if (sys::ExecuteAndWait(args[0], args, {}, redir, 0, 0, &err_msg) < 0) {

In Sep 2023, https://github.com/llvm/llvm-project/pull/65647 changed CompilerInvocationRefBase to CompilerInvocationBase. I duplicated the code under an #if:

#if LLVM_VERSION_MAJOR >= 18
ci->getLangOpts().SpellChecking = false;
ci->getLangOpts().RecoveryAST = true;
ci->getLangOpts().RecoveryASTType = true;
#else
ci->getLangOpts()->SpellChecking = false;
#if LLVM_VERSION_MAJOR >= 11
ci->getLangOpts()->RecoveryAST = true;
ci->getLangOpts()->RecoveryASTType = true;
#endif
#endif

In April 2024, https://github.com/llvm/llvm-project/pull/89548/ removed llvm::StringRef::startswith in favor of starts_with. starts_with has been available since Oct 2022 and startswith had been deprecated. I added the following snippet:

#if LLVM_VERSION_MAJOR >= 19
#define startswith starts_with
#define endswith ends_with
#endif

It's important to note that the converse approach

#define starts_with startswith
#define ends_with endswith

could break code that calls std::string_view::starts_with.
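
The direction of the startswith shim can be tried out without LLVM. In this sketch, ToyStringRef is a hypothetical stand-in for a library type whose newer releases only provide starts_with; it is not llvm::StringRef.

```cpp
#include <string>
#include <utility>

// Stand-in for a library type that only offers the new spelling.
class ToyStringRef {
public:
  explicit ToyStringRef(std::string s) : data(std::move(s)) {}
  bool starts_with(const std::string &prefix) const {
    return data.compare(0, prefix.size(), prefix) == 0;
  }

private:
  std::string data;
};

// Map the old spelling to the new one. Only the legacy name is rewritten, so
// standard library members that are spelled starts_with are left untouched.
#define startswith starts_with

// Code written against the old API still compiles.
bool isHeaderPath(const ToyStringRef &path) {
  return path.startswith("include/");
}
```

Rewriting the old name is the safer choice because nothing else is expected to be called startswith, whereas starts_with is also a name used by the standard library.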

Changed enumerators

In November 2023, https://github.com/llvm/llvm-project/pull/71160 changed an unnamed enumeration to a scoped enumeration. To keep the following snippet compiling,

switch (tag_d->getTagKind()) {
case TTK_Struct:
  tag = "struct";
  break;
case TTK_Interface:
  tag = "__interface";
  break;
case TTK_Union:
  tag = "union";
  break;
case TTK_Class:
  tag = "class";
  break;
case TTK_Enum:
  tag = "enum";
  break;
}

I introduced macros.

#if LLVM_VERSION_MAJOR >= 18 // llvmorg-18-init-10631-gedd690b02e16
#define TTK_Class TagTypeKind::Class
#define TTK_Enum TagTypeKind::Enum
#define TTK_Interface TagTypeKind::Interface
#define TTK_Struct TagTypeKind::Struct
#define TTK_Union TagTypeKind::Union
#endif
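
The same idea can be exercised as a self-contained sketch. TOY_VERSION_MAJOR and the enum definitions below are invented for the demonstration; a real project would test LLVM_VERSION_MAJOR and use Clang's own TagTypeKind.

```cpp
#include <string>

// Pretend version macro standing in for LLVM_VERSION_MAJOR.
#ifndef TOY_VERSION_MAJOR
#define TOY_VERSION_MAJOR 18
#endif

#if TOY_VERSION_MAJOR >= 18
// Newer "library": a scoped enumeration.
enum class TagTypeKind { Struct, Interface, Union, Class, Enum };
// Compatibility macros so that the old enumerator spellings keep compiling.
#define TTK_Struct TagTypeKind::Struct
#define TTK_Interface TagTypeKind::Interface
#define TTK_Union TagTypeKind::Union
#define TTK_Class TagTypeKind::Class
#define TTK_Enum TagTypeKind::Enum
#else
// Older "library": an unscoped enumeration with prefixed enumerators.
enum TagTypeKind { TTK_Struct, TTK_Interface, TTK_Union, TTK_Class, TTK_Enum };
#endif

// Client code is written once, against the old spellings.
std::string tagName(TagTypeKind kind) {
  switch (kind) {
  case TTK_Struct:
    return "struct";
  case TTK_Interface:
    return "__interface";
  case TTK_Union:
    return "union";
  case TTK_Class:
    return "class";
  case TTK_Enum:
    return "enum";
  }
  return "";
}
```

Building with -DTOY_VERSION_MAJOR=17 selects the older definition, and tagName compiles unchanged in both configurations.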

In April 2024, https://github.com/llvm/llvm-project/pull/89639 renamed an enumerator. I have made the following adaptation:

#if LLVM_VERSION_MAJOR >= 19 // llvmorg-19-init-9465-g39adc8f42329
case BuiltinType::ArraySection:
#else
case BuiltinType::OMPArraySection:
return "<OpenMP array section type>";
#endif
return "<array section type>";

Build system changes

In Dec 2022, https://reviews.llvm.org/D137838 added a new LLVM library LLVMTargetParser. I adjusted ccls's CMakeLists.txt:

target_link_libraries(ccls PRIVATE LLVMOption LLVMSupport)
if(LLVM_VERSION_MAJOR GREATER_EQUAL 16) # llvmorg-16-init-15123-gf09cf34d0062
  target_link_libraries(ccls PRIVATE LLVMTargetParser)
endif()

Summary

The above examples illustrate how to adapt to changes in the LLVM and Clang APIs. It's important to remember that API changes are a natural part of software development, and testing with different releases is crucial for maintaining compatibility with a wide range of LLVM versions.

When introducing new interfaces, we should pay a lot of attention to reducing the chance that the interface will be changed in a way that causes disruption downstream. That said, changes are normal. When an API change is justified, do it.

Downstream projects should be mindful of the stability guarantees of different LLVM APIs. Some APIs may be more prone to change than others. It's essential to write code in a way that can easily adapt to changes in the LLVM API.

LLVM C API

While LLVM offers a C API with an effort made towards compatibility, its capabilities often fall short.

Clang provides a C API called libclang. While highly stable, libclang's limited functionality makes it unsuitable for many tasks.

In 2018, when creating ccls (a fork of cquery), I encountered multiple limitations in libclang's ability to handle code completion and indexing. This led to rewriting the relevant code to leverage the Clang C++ API for a more comprehensive solution. The following commits offer insights into how the C API and the mostly equivalent but better C++ API work:

  • First draft: replace libclang indexer with clangIndex
  • Use Clang C++ for completion and diagnostics