In my previous post, Relocation Generation in Assemblers, I explored some key concepts behind LLVM’s integrated assemblers. This post dives into recent improvements I’ve made to refine that system.
The LLVM integrated assembler handles fixups and relocatable expressions as distinct entities. Relocatable expressions, in particular, are encoded using the MCValue class, which originally had the following members:
RefKind acts as an optional relocation specifier, though only a handful of targets actually use it.
SymA represents an optional symbol reference (the addend).
SymB represents another optional symbol reference (the subtrahend).
Cst holds a constant value.
While functional, this design had its flaws. For one, the way relocation specifiers were encoded varied across architectures:
Targets like COFF, Mach-O, and ELF's PowerPC, SystemZ, and X86 embed the relocation specifier within MCSymbolRefExpr *SymA as part of SubclassData.
Conversely, ELF targets such as AArch64, MIPS, and RISC-V store it as a target-specific subclass of MCTargetExpr, and convert it to MCValue::RefKind during MCValue::evaluateAsRelocatable.
Another issue was with SymB. Despite being typed as const MCSymbolRefExpr *, its MCSymbolRefExpr::VariantKind field went unused. This is because expressions like add - sub@got are not relocatable.
Over the weekend, I tackled these inconsistencies and reworked the representation into something cleaner.
This updated design not only aligns more closely with the concept of relocatable expressions but also shaves off some compile time in LLVM. The ambiguous RefKind has been renamed to Specifier for clarity. Additionally, targets that previously encoded the relocation specifier within MCSymbolRefExpr (rather than using MCTargetExpr) can now access it directly via MCValue::Specifier.
To support this change, I made a few adjustments:
Introduced getAddSym and getSubSym methods, returning const MCSymbol *, as replacements for getSymA and getSymB.
Eliminated dependencies on the old accessors, MCValue::getSymA and MCValue::getSymB.
Reworked the expression folding code that handles + and -.
Some targets relied on PC-relative fixups with explicit specifiers forcing relocations. I have defined MCAsmBackend::shouldForceRelocation for SystemZ and cleaned up ARM and PowerPC.
Changed the type of SymA and SymB to const MCSymbol *.
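To make the change concrete, here is a minimal sketch contrasting the two layouts. It is a simplification rather than the actual LLVM source: the forward declaration of MCSymbolRefExpr and the MCSymbol struct are stand-ins so the snippet compiles on its own, the names OldMCValue and NewMCValue are hypothetical, and only the members and accessors mentioned in this post are shown.

```cpp
#include <cstdint>

// Stand-ins so the sketch compiles without LLVM headers (assumptions).
class MCSymbolRefExpr;
struct MCSymbol {
  const char *Name;
};

// Before: symbol terms are MCSymbolRefExpr pointers, and the specifier may
// live either in RefKind or inside the SymA expression, depending on target.
struct OldMCValue {
  const MCSymbolRefExpr *SymA = nullptr; // optional addend
  const MCSymbolRefExpr *SymB = nullptr; // optional subtrahend
  int64_t Cst = 0;                       // constant term
  uint32_t RefKind = 0;                  // optional relocation specifier
};

// After: plain symbols plus a single Specifier field.
struct NewMCValue {
  const MCSymbol *SymA = nullptr;
  const MCSymbol *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;

  const MCSymbol *getAddSym() const { return SymA; }
  const MCSymbol *getSubSym() const { return SymB; }
  // No symbol terms left: the value is a plain constant.
  bool isAbsolute() const { return !SymA && !SymB; }
};
```

With the second layout, code that only needs the symbol no longer has to unwrap an expression node first.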
Mach-O assembler support in LLVM has accumulated significant technical debt, impacting both target-specific and generic code. One particularly nagging issue was the const SectionAddrMap *Addrs parameter in MCExpr::evaluateAs* functions. This parameter existed to handle cross-section label differences, primarily for generating (compact) unwind information in Mach-O. A typical example of this can be seen in assembly like:
The SectionAddrMap *Addrs parameter always felt like a clunky workaround to me. It wasn’t until I dug into the Mach-O AArch64 object writer that I realized this hack wasn't necessary for that writer. This discovery prompted a cleanup effort to remove the dependency on SectionAddrMap for ARM and X86 and eliminate the parameter:
[MC,MachO] Replace SectionAddrMap workaround with cleaner variable handling
MCExpr: Remove unused SectionAddrMap workaround
While I was at it, I also tidied up MCSymbolRefExpr by removing the clunky HasSubsectionsViaSymbolsBit, further simplifying the codebase.
This post explores how GNU Assembler and LLVM integrated assembler generate relocations, an important step in generating a relocatable file. Relocations identify parts of instructions or data that cannot be fully determined during assembly because they depend on the final memory layout, which is only established at link time or load time. These are essentially placeholders that will be filled in (typically with absolute addresses or PC-relative offsets) during the linking process.
Relocation generation: the basics
Symbol references are the primary candidates for relocations. For instance, in the x86-64 instruction movl sym(%rip), %eax (GNU syntax), the assembler calculates the displacement between the program counter (PC) and sym. This distance affects the instruction's encoding and typically triggers an R_X86_64_PC32 relocation, unless sym is a local symbol defined within the current section.
Both the GNU assembler and LLVM integrated assembler utilize multiple passes during assembly, with several key phases relevant to relocation generation:
Parsing phase
During parsing, the assembler builds section fragments that contain instructions and other directives. It parses each instruction into its opcode (e.g., movl) and operands (e.g., sym(%rip), %eax). It identifies registers, immediate values (like 3 in movl $3, %eax), and expressions.
Expressions can be constants, symbol references (like sym), or unary and binary operators (-sym, sym0-sym1). Those unresolvable at parse time, which are potential relocation candidates, turn into "fixups". These often skip immediate operand range checks, as shown here:
    % echo 'addi a0, a0, 2048' | llvm-mc -triple=riscv64
    <stdin>:1:14: error: operand must be a symbol with %lo/%pcrel_lo/%tprel_lo modifier or an integer in the range [-2048, 2047]
    addi a0, a0, 2048
                 ^
    % echo 'addi a0, a0, %lo(x)' | llvm-mc -triple riscv64 -show-encoding
    addi a0, a0, %lo(x) # encoding: [0x13,0x05,0bAAAA0101,A]
    # fixup A - offset: 0, value: %lo(x), kind: fixup_riscv_lo12_i
A fixup ties to a specific location (an offset within a fragment), with its value being the expression (which must eventually evaluate to a relocatable expression).
Meanwhile, the assembler tracks defined and referenced symbols, and for ELF, it tracks symbol bindings (STB_LOCAL, STB_GLOBAL, STB_WEAK) from directives like .globl, .weak, or the rarely used .local.
Section layout phase
After parsing, the assembler arranges each section by assigning precise offsets to its fragments (instructions, data, or other directives such as .line and .uleb128). It calculates sizes and adjusts for alignment. This phase finalizes symbol offsets (e.g., start: at offset 0x10) while leaving external ones for the linker.
This phase, which employs a fixed-point iteration, is quite complex. I won't go into details, but you might find Clang's -O0 output: branch displacement and size increase interesting.
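For a rough feel of what such a fixed-point iteration does, here is a toy sketch. It is not how either assembler is implemented: the Fragment struct and layoutSection function are hypothetical, and the only sizes modeled are an x86-style short jump (2 bytes, 8-bit displacement) and long jump (5 bytes, 32-bit displacement).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy fragment: either fixed-size data or a jump to the start of another
// fragment. A jump starts in its short form and may grow to its long form.
struct Fragment {
  bool IsJump = false;
  uint32_t Size = 0; // current size in bytes
  size_t Target = 0; // index of the target fragment (jumps only)
};

constexpr uint32_t ShortJump = 2; // opcode + 8-bit displacement
constexpr uint32_t LongJump = 5;  // opcode + 32-bit displacement

// Assigns an offset to every fragment, grows short jumps whose target is out
// of range, and repeats until no fragment changes size. Returns the start
// offset of each fragment, with the total section size as the last element.
std::vector<uint32_t> layoutSection(std::vector<Fragment> &Frags) {
  std::vector<uint32_t> Offsets(Frags.size() + 1, 0);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (size_t I = 0; I < Frags.size(); ++I)
      Offsets[I + 1] = Offsets[I] + Frags[I].Size;
    for (size_t I = 0; I < Frags.size(); ++I) {
      Fragment &F = Frags[I];
      if (!F.IsJump || F.Size != ShortJump)
        continue;
      // The displacement is relative to the end of the jump instruction.
      int64_t Disp = int64_t(Offsets[F.Target]) - int64_t(Offsets[I + 1]);
      if (Disp < -128 || Disp > 127) {
        F.Size = LongJump; // sizes only grow, so the loop terminates
        Changed = true;
      }
    }
  }
  return Offsets;
}
```

Growing one jump shifts every later fragment, which can push another jump out of range; that is why a single pass is not enough and the loop runs until nothing changes.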
Relocation decision phase
Then the assembler evaluates each fixup to determine if it can be resolved directly or requires a relocation entry. This process starts by attempting to convert fixups into relocatable expressions.
Evaluating relocatable expressions
In their most general form, relocatable expressions follow the pattern relocation_specifier(sym_a - sym_b + offset), where
relocation_specifier is optional. I will explain this concept later.
sym_a is a symbol reference (the "addend")
sym_b is an optional symbol reference (the "subtrahend")
offset is a constant value
Most common cases involve only sym_a or offset (e.g., movl sym(%rip), %eax or movl $3, %eax). Only a few target architectures support the subtrahend term (sym_b). Notable examples include AVR and RISC-V, as explored in The dark side of RISC-V linker relaxation.
Attempting to use unsupported expression forms will result in assembly errors:
    % echo -e 'movl a+b, %eax\nmovl a-b, %eax' | clang -c -xassembler -
    <stdin>:1:1: error: expected relocatable expression
    movl a+b, %eax
    ^
    <stdin>:2:1: error: symbol 'b' can not be undefined in a subtraction expression
    movl a-b, %eax
    ^
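The shape check behind the first error can be sketched as a small folding routine. This is a toy model, not assembler source: RelocValue and fold are hypothetical names, symbols are plain strings, and the sketch only decides whether a sum or difference still fits the sym_a - sym_b + offset form. Target rules, such as the undefined-subtrahend error above, are a separate check that it does not model.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Toy model of sym_a - sym_b + offset; an empty string means "no symbol".
struct RelocValue {
  std::string AddSym;
  std::string SubSym;
  int64_t Offset = 0;
};

// Folds L + R (or L - R when IsSub is set). Returns std::nullopt when the
// result would need two added or two subtracted symbols.
std::optional<RelocValue> fold(RelocValue L, RelocValue R, bool IsSub) {
  if (IsSub) {
    // Subtracting R is the same as adding its negation.
    std::swap(R.AddSym, R.SubSym);
    R.Offset = -R.Offset;
  }
  if (!L.AddSym.empty() && !R.AddSym.empty())
    return std::nullopt;
  if (!L.SubSym.empty() && !R.SubSym.empty())
    return std::nullopt;
  RelocValue Res;
  Res.AddSym = L.AddSym.empty() ? R.AddSym : L.AddSym;
  Res.SubSym = L.SubSym.empty() ? R.SubSym : L.SubSym;
  Res.Offset = L.Offset + R.Offset;
  return Res;
}
```

Here a+b is rejected because both operands contribute an added symbol, while a-b and a+4 keep the relocatable shape.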
PC-relative fixups
PC-relative fixups compute their values as sym_a + offset - current_location. (I’ve skipped - sym_b, since no target I know permits a subtrahend here.)
When sym_a is a local symbol defined within the current section, these PC-relative fixups evaluate to constants. But if sym_a is a global or weak symbol in the same section, a relocation entry is generated. This ensures ELF symbol interposition stays in play.
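The decision described above can be written down as a small predicate. This is a simplified sketch under stated assumptions: Symbol, Binding, and resolvePCRel are hypothetical names, sections are identified by an integer index, and current_location is taken to be the fixup's own offset (real targets may measure from elsewhere, such as the end of the instruction).

```cpp
#include <cstdint>
#include <optional>

enum class Binding { Local, Global, Weak };

struct Symbol {
  Binding Bind;
  int Section;     // index of the defining section, or -1 if undefined
  uint64_t Offset; // offset within that section
};

// Returns sym_a + offset - current_location when it is a constant known at
// assembly time, or std::nullopt when a relocation entry is required.
std::optional<int64_t> resolvePCRel(const Symbol &SymA, int64_t Offset,
                                    int FixupSection, uint64_t FixupOffset) {
  bool SameSection = SymA.Section >= 0 && SymA.Section == FixupSection;
  // Global and weak symbols may be interposed, so they keep a relocation
  // even when defined in the same section.
  if (!SameSection || SymA.Bind != Binding::Local)
    return std::nullopt;
  return int64_t(SymA.Offset) + Offset - int64_t(FixupOffset);
}
```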
Resolution Outcomes
The assembler's evaluation of fixups leads to one of three outcomes:
Error: When the expression isn't supported.
Resolved fixups: When the fixup evaluates to a constant, the assembler updates the relevant bits in the instruction directly. No relocation entry is needed.
There are target-specific exceptions that make the fixup unresolved. In AArch64 adrp x0, l0; l0:, the immediate might be either 0 or 1, dependent on the instruction address. In RISC-V, linker relaxation might make fixups unresolved.
Unresolved fixups: When the fixup evaluates to a relocatable expression but not a constant, the assembler
Generates an appropriate relocation (offset, type, symbol, addend).
For targets that use RELA, usually zeros out the bits in the instruction field that will be modified by the linker.
For targets that use REL, leaves the addend in the instruction field.
If the referenced symbol is defined and local, and the relocation type is not in exceptions (gas tc_fix_adjustable), the relocation references the section symbol instead of the local symbol.
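The steps for an unresolved fixup can be sketched as follows. This is an illustrative model only: Symbol, Emitted, and emitReloc are hypothetical names, the Adjustable flag stands in for a target hook such as gas tc_fix_adjustable, and relocation types and field encodings are left out.

```cpp
#include <cstdint>
#include <string>

struct Symbol {
  std::string Name;
  bool DefinedLocal;       // defined in this file with local binding
  std::string SectionName; // defining section (when DefinedLocal)
  uint64_t Value;          // offset within that section
};

struct Emitted {
  std::string RelocSymbol; // symbol the relocation entry references
  int64_t RelocAddend;     // explicit addend (RELA); unused and 0 for REL
  int64_t FieldValue;      // what is left in the instruction or data field
};

// Models the outcome for an unresolved fixup.
Emitted emitReloc(const Symbol &Sym, int64_t Addend, bool IsRela,
                  bool Adjustable) {
  std::string Target = Sym.Name;
  if (Sym.DefinedLocal && Adjustable) {
    // Reference the section symbol and fold the symbol's offset into the
    // addend, so the local symbol itself is not needed by the relocation.
    Target = Sym.SectionName;
    Addend += int64_t(Sym.Value);
  }
  if (IsRela)
    return {Target, Addend, 0}; // addend in the record, field zeroed
  return {Target, 0, Addend};   // implicit addend stays in the field
}
```

For a local label at offset 0x10 in .text referenced with addend 4, the RELA form yields a relocation against .text with addend 0x14 and a zeroed field, while the REL form leaves 0x14 in the field.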
If you are interested in relocation representations in different object file formats, please check out my post Exploring object file formats.
(a-.)(%rip) would probably be more semantically correct but is not adopted by GNU Assembler.
Relocation specifiers
Relocation specifiers guide the assembler on how to resolve and encode expressions into instructions. They specify details like:
Whether to reference the symbol itself, its Procedure Linkage Table (PLT) entry, or its Global Offset Table (GOT) entry.
Which part of a symbol's address to use (e.g., lower or upper bits).
Whether to use an absolute address or a PC-relative one.
This concept appears across various architectures but with inconsistent terminology. The Arm architecture refers to elements like :lo12: and :lower16: as "relocation specifiers". IBM's AIX documentation also uses this term. Many GNU Binutils target documents simply call these "modifiers", while AVR documentation uses "relocatable expression modifiers".
Picking the right term was tricky. "Relocatable expression modifier" nails the idea of tweaking relocatable expressions but feels overly verbose. "Relocation modifier", though concise, suggests adjustments happen during the linker's relocation step rather than the assembler's expression evaluation. I landed on "relocation specifier" as the winner. It's clear, aligns with Arm and IBM’s usage, and fits the assembler's role seamlessly.
For example, RISC-V addi can be used with either an absolute address or a PC-relative address. Relocation specifiers %lo and %pcrel_lo differentiate the two uses. Similarly, %hi, %pcrel_hi, and %got_pcrel_hi differentiate the uses of lui and auipc.
    # Position-dependent code (PDC) - absolute addressing
    lui a0, %hi(var)                     # Load upper immediate with high bits of symbol address
    addi a0, a0, %lo(var)                # Add lower 12 bits of symbol address

    # Position-independent code (PIC) - PC-relative addressing
    auipc a0, %pcrel_hi(var)             # Add upper PC-relative offset to PC
    addi a0, a0, %pcrel_lo(.Lpcrel_hi1)  # Add lower 12 bits of PC-relative offset

    # Position-independent code via Global Offset Table (GOT)
    auipc a0, %got_pcrel_hi(var)         # Calculate address of GOT entry relative to PC
    ld a0, %pcrel_lo(.Lpcrel_hi1)(a0)    # Load var's address from GOT
Why use %hi with lui if it's always paired? It's about clarity and explicitness. %hi ensures consistency with %lo and cleanly distinguishes it from %pcrel_hi. Since both lui and auipc share the U-type instruction format, tying relocation specifiers to formats rather than specific instructions is a smart, flexible design choice.
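To see how the %hi and %lo halves fit together, here is a small arithmetic sketch for a 32-bit absolute address. The function names hi20, lo12, and luiAddi are hypothetical; the point is that the 12-bit immediate of addi is sign-extended, so the upper part has to be rounded by adding 0x800 before shifting.

```cpp
#include <cstdint>

// Upper 20 bits for lui, rounded so that adding the sign-extended low part
// reproduces the address.
uint32_t hi20(uint32_t Addr) { return ((Addr + 0x800) >> 12) & 0xfffff; }

// Low 12 bits, sign-extended the way addi interprets its immediate.
int32_t lo12(uint32_t Addr) {
  return int32_t((Addr & 0xfff) ^ 0x800) - 0x800;
}

// What lui followed by addi computes, modulo 2^32.
uint32_t luiAddi(uint32_t Hi, int32_t Lo) {
  return (Hi << 12) + uint32_t(Lo);
}
```

For 0x12345fff, the low part is -1, so the high part becomes 0x12346 rather than 0x12345, and 0x12346000 - 1 gives back the original address.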
Relocation specifier flavors
Assemblers use various syntaxes for relocation specifiers, reflecting architectural quirks and historical conventions. Below, we explore the main flavors, their usage across architectures, and some of their peculiarities.
expr@specifier
This is likely the most widespread syntax, adopted by many binutils targets, including ARC, C-SKY, Power, M68K, SuperH, SystemZ, and x86, among others. It's also used in Mach-O object files, e.g., adrp x8, _bar@GOTPAGE.
This suffix style puts the specifier after an @. It's intuitive: think sym@got. In PowerPC, operators can get elaborate, such as sym@toc@l(9). Here, @toc@l is a single, indivisible operator (not two separate @ pieces) indicating a TOC-relative reference with a low 16-bit extraction.
Parsing is loose: while both expr@specifier+expr and expr+expr@specifier are accepted (by many targets), conceptually it's just specifier(expr+expr). For example, x86 accepts sym@got+4 or sym+4@got, but don't misread: @got applies to sym+4, not just sym.
%specifier(expr)
MIPS, SPARC, RISC-V, and LoongArch favor this prefix style, wrapping the expression in parentheses for clarity. In MIPS, parentheses are optional, and operators can nest.
expr(specifier)
A simpler suffix style, this is used by AArch32 for data directives. It's less common but straightforward, placing the operator in parentheses after the expression.
    .word sym(gotoff)
    .long f(FUNCDESC)

    .long f(got)+3 // allowed by GNU assembler and LLVM integrated assembler, but probably not used in the wild
:specifier:expr
AArch32 and AArch64 adopt this colon-framed prefix notation, avoiding the confusion that parentheses might introduce.
    // AArch32
    movw r0, :lower16:x

    // AArch64
    add x8, x8, :lo12:sym

    adrp x0, :got:var
    ldr x0, [x0, :got_lo12:var]
Applying this syntax to data directives, however, could create parsing ambiguity. In both GNU Assembler and LLVM, .word :plt:fun would be interpreted as .word: plt: fun, treating .word and plt as labels, rather than achieving the intended meaning.
Recommendation
For new architectures, I'd suggest adopting %specifier(expr), and never using @specifier. The % symbol works seamlessly with data directives, and during operand parsing, the parser can simply peek at the first token to check for a relocation specifier.
I favor %specifier(expr) over %specifier expr because it provides clearer scoping, especially in data directives with multiple operands, such as .long %lo(a), %lo(b).
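(In GNU Assembler's altmacro mode, a macro argument that starts with % is evaluated as an expression:
    .altmacro
    .macro m arg; .long \arg; .endm
    .data; m %(1+2)
)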
Inelegance
RISC-V favors %specifier(expr) but clings to call sym@plt for legacy reasons.
AArch64 uses :specifier:expr, yet R_AARCH64_PLT32 (.word foo@plt - .) and PAuth ABI (.quad (g + 7)@AUTH(ia,0)) cannot use : after data directives due to parsing ambiguity. https://github.com/llvm/llvm-project/issues/132570
TLS symbols
When a symbol is defined in a section with the SHF_TLS flag (Thread-Local Storage), GNU assembler assigns it the type STT_TLS in the symbol table. For undefined TLS symbols, the process differs: GCC and Clang don’t emit explicit labels. Instead, assemblers identify these symbols through TLS-specific relocation specifiers in the code, deduce their thread-local nature, and set their type to STT_TLS accordingly.
    // AArch64
    add x8, x8, :tprel_hi12:tls

    // x86
    movl %fs:tls@TPOFF, %eax
Composed relocations
Most instructions trigger zero or one relocation, but some generate two. Often, one acts as a marker, paired with a standard relocation. For example:
PPC64 bl __tls_get_addr(x@tlsgd) pairs a marker R_PPC64_TLSGD with R_PPC64_REL24
PPC64's link-time GOT-indirect to PC-relative optimization (with Power10's prefixed instruction) generates a R_PPC64_PCREL_OPT relocation following a GOT relocation. https://reviews.llvm.org/D79864
Mach-O scattered relocations for label differences.
These marker cases tie into "composed relocations", as outlined in the Generic ABI:
If multiple consecutive relocation records are applied to the same relocation location (r_offset), they are composed instead of being applied independently, as described above. By consecutive, we mean that the relocation records are contiguous within a single relocation section. By composed, we mean that the standard application described above is modified as follows:
In all but the last relocation operation of a composed sequence, the result of the relocation expression is retained, rather than having part extracted and placed in the relocated field. The result is retained at full pointer precision of the applicable ABI processor supplement.
In all but the first relocation operation of a composed sequence, the addend used is the retained result of the previous relocation operation, rather than that implied by the relocation type.
Note that a consequence of the above rules is that the location specified by a relocation type is relevant for the first element of a composed sequence (and then only for relocation records that do not contain an explicit addend field) and for the last element, where the location determines where the relocated value will be placed. For all other relocation operands in a composed sequence, the location specified is ignored.
An ABI processor supplement may specify individual relocation types that always stop a composition sequence, or always start a new one.
Implicit addends
ELF SHT_REL and Mach-O utilize implicit addends. TODO
R_MIPS_HI16 (https://reviews.llvm.org/D101773)
GNU Assembler internals
GNU Assembler utilizes struct fix to represent both the fixup and the relocatable expression.
    struct fix {
      ...
      /* NULL or Symbol whose value we add in. */
      symbolS *fx_addsy;

      /* NULL or Symbol whose value we subtract. */
      symbolS *fx_subsy;

      /* Absolute number we add in. */
      valueT fx_offset;
    };
The relocation specifier is part of the instruction instead of part of struct fix. Targets have different internal representations of instructions.
The 2002 message stage one of gas reloc rewrite describes the passes.
In PPC, the result of @l and @ha can be either signed or unsigned, determined by the instruction opcode.
In md_apply_fix, TLS-related relocation specifiers call S_SET_THREAD_LOCAL (fixP->fx_addsy);.
LLVM internals
LLVM integrated assembler encodes fixups and relocatable expressions separately.
    class MCFixup {
      /// The value to put into the fixup location. The exact interpretation of the
      /// expression is target dependent, usually it will be one of the operands to
      /// an instruction or an assembler directive.
      const MCExpr *Value = nullptr;

      /// The byte index of start of the relocation inside the MCFragment.
      uint32_t Offset = 0;

      /// The target dependent kind of fixup item this is. The kind is used to
      /// determine how the operand value should be encoded into the instruction.
      MCFixupKind Kind = FK_NONE;

      /// The source location which gave rise to the fixup, if any.
      SMLoc Loc;
    };
MCValue represents a relocatable expression with:
Specifier as an optional relocation specifier (named RefKind before LLVM 21)
SymA as an optional symbol reference (addend)
SymB as an optional symbol reference (subtrahend)
Cst as a constant value
This mirrors the relocatable expression concept, but RefKind, added in 2014 for AArch64, remains rare among targets. (I've recently cleaned up some targets. For instance, I migrated PowerPC's @l and @ha folding to use RefKind.)
AArch64 implements a clean approach to select the relocation type. It dispatches on the fixup kind (an operand within a specific instruction format), then refines it with the relocation specifier.
    // AArch64ELFObjectWriter::getRelocType
    unsigned Kind = Fixup.getTargetKind();
    switch (Kind) {
    // Handle generic MCFixupKind.
    case FK_Data_1:
    case FK_Data_2:
      ...

    // Handle target-specific MCFixupKind.
    case AArch64::fixup_aarch64_add_imm12:
      if (RefKind == AArch64MCExpr::VK_DTPREL_HI12)
        return R_CLS(TLSLD_ADD_DTPREL_HI12);
      if (RefKind == AArch64MCExpr::VK_TPREL_HI12)
        return R_CLS(TLSLE_ADD_TPREL_HI12);
      ...
    }
MCSymbolRefExpr issues
The expression structure follows a traditional object-oriented hierarchy:
    MCExpr
      MCConstantExpr: Value
      MCSymbolRefExpr: VariantKind, Symbol
      MCUnaryExpr: Op, Expr
      MCBinaryExpr: Op, LHS, RHS
      MCTargetExpr
        AArch64MCExpr: VariantKind, Expr
MCSymbolRefExpr::VariantKind enumerates the relocation specifiers, but it's a poor fit:
Other expressions, like MCConstantExpr (e.g., PPC 4@l) and MCBinaryExpr (e.g., PPC (a+1)@l), also need it.
Semantics blur when folding expressions with @, which is unavoidable when @ can occur at any position within the full expression.
The generic MCSymbolRefExpr lacks target-specific hooks, cluttering the interface with any target-specific logic.
Consider what happens with addition or subtraction:
Here, the specifier attaches only to the LHS, leaving the full result uncovered. This awkward design demands workarounds.
Parsing a+4@got exposes clumsiness. After AsmParser::parseExpression processes a+4, it detects @got and retrofits it onto MCSymbolRefExpr(a), which feels hacked together.
PowerPC's @l and @ha optimization needs PPCAsmParser::extractModifierFromExpr and PPCAsmParser::applyModifierToExpr to convert a MCSymbolRefExpr to a PPCMCExpr.
Many targets (e.g., X86) use MCValue::getAccessVariant to grab LHS's specifier, though MCValue::RefKind would be cleaner.
Worse, MCSymbolRefExpr is a leaky abstraction that is accessed widely in backend code, which introduces another problem: while MCBinaryExpr with a constant RHS mimics MCSymbolRefExpr semantically, code often handles only the latter.
MCTargetExpr encoding relocation specifiers
MCTargetExpr subclasses, as used by AArch64 and RISC-V, offer a cleaner approach to encode relocation specifiers. We should limit MCTargetExpr to top-level use to encode one single relocation and avoid its inclusion as a subexpression.
MCSymbolRefExpr::VariantKind, as the legacy way to encode relocation specifiers, should be completely removed (probably in a distant future as many cleanups are required).
Our long-term goal is to migrate MCValue to use MCSymbol pointers instead of MCSymbolRefExpr pointers.
In LLVM's assembly parser library (LLVMMCParser), the parsing of expr@specifier was supported for all targets until I updated it to be an opt-in feature in March 2025.
AsmParser's @specifier parsing is suboptimal, necessitating lexer workarounds.
The @ symbol can appear after a symbol or an expression (via parseExpression) and may occur multiple times within a single operand, making it challenging to validate and reject invalid cases.
In the GNU Assembler, COFF targets permit @ within identifier names, and MinGW supports constructs like .long ext24@secrel32. It appears that a recognized suffix is treated as a specifier, while an unrecognized suffix results in a symbol that includes the @.
The PowerPC AsmParser (llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp) parses an operand and then calls PPCAsmParser::extractSpecifier to extract the optional @ specifier. When the @ specifier is detected and removed, it generates a PPCMCExpr. This functionality is currently implemented for @l and @ha, and it would be beneficial to extend this to include all specifiers.
AsmPrinter
In llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp, AsmPrinter::lowerConstant outlines how LLVM handles the emission of a global variable initializer. When processing ConstantExpr elements, this function may generate data directives in the assembly code that involve differences between symbols.
One significant use case for this intricate code is clang++ -fexperimental-relative-c++-abi-vtables. This feature produces a PC-relative relocation that points to either the PLT (Procedure Linkage Table) entry of a function or the function symbol directly.
This post describes how to compile a single C++ source file to an object file with the Clang API. Here is the code. It behaves like a simplified clang executable that handles -c and -S.
    auto invoc = std::make_unique<CompilerInvocation>();
    CompilerInvocation::CreateFromArgs(*invoc, ccArgs, *diags);
    auto ci = std::make_unique<CompilerInstance>();
    ci->setInvocation(std::move(invoc));
    ci->createDiagnostics(*fs, &dc, false);
    // Disable CompilerInstance::printDiagnosticStats, which might display "2 warnings generated."
    ci->getDiagnostics().getDiagnosticOptions().ShowCarets = false;
    ci->createFileManager(fs);
    ci->createSourceManager(ci->getFileManager());

    // Clang calls BuryPointer on the internal AST and CodeGen-related elements like TargetMachine.
    // This will cause memory leaks if `compile` is executed many times.
    ci->getCodeGenOpts().DisableFree = false;
    ci->getFrontendOpts().DisableFree = false;
We need an LLVM and Clang installation that provides both lib/cmake/llvm/LLVMConfig.cmake and lib/cmake/clang/ClangConfig.cmake. You can grab these from system packages (dev versions may be required) or build LLVM yourself. I'll skip the detailed steps here. For a DIY build, use:
    # cmake ... -DLLVM_ENABLE_PROJECTS='clang'

    ninja -C out/stable clang-cmake-exports clang
No install step is needed. Next, create a build directory with the CMake configuration above:
I've set a prebuilt Clang as CMAKE_CXX_COMPILER, just a habit of mine. llvm-project isn't guaranteed to build warning-free with GCC, since GCC -Wall -Wextra has many false positives and LLVM developers avoid cluttering the codebase.
    % echo 'void f() {}' > a.cc
    % out/debug/cc -S a.cc && head -n 5 a.s
    .file "a.cc"
    .text
    .globl _Z1fv # -- Begin function _Z1fv
    .p2align 4
    .type _Z1fv,@function
    % out/debug/cc -c a.cc && file a.o
    a.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Anonymous files
The input source file and the output ELF file are stored in the filesystem. We could create a temporary file and delete it with a RAII class llvm::FileRemover:
LLVMX86AsmParser: llvm/lib/Target/X86/AsmParser (depends on LLVMX86Info and LLVMX86Desc)
LLVMX86CodeGen: llvm/lib/Target/X86/ (depends on LLVMX86Info and LLVMX86Desc)
EmitAssembly and EmitObj
The code supports two frontend actions, EmitAssembly (-S) and EmitObj (-c).
You could also utilize the API in clang/include/clang/FrontendTool/Utils.h, but that would pull in another library clangFrontendTool (different from clangFrontend).
Diagnostics
The diagnostics system is quite complex. We have DiagnosticConsumer, DiagnosticsEngine, and DiagnosticOptions.
We define a simple DiagnosticConsumer that handles notes, warnings, errors, and fatal errors. When macro expansion comes into play, we report two key locations:
The physical location (fileLoc), where the expanded token triggers an issue (matching Clang's error line), and
The spelling location within the macro's replacement list (sm.getSpellingLoc(loc)).
Although Clang also highlights intermediate locations for chained expansions, our simple approach offers a solid approximation.
    % cat a.h
    #define FOO(x) x + 1
    % cat a.cc
    #include "a.h"
    #define BAR FOO
    void f() {
      int y = BAR("abc");
    }
    % out/debug/cc -c -Wall a.cc
    a.cc:4:11: warning: adding 'int' to a string does not append to the string
    ./a.h:1:18: note: expanded from macro
    a.cc:4:11: note: use array indexing to silence this warning
    ./a.h:1:18: note: expanded from macro
    a.cc:4:7: error: cannot initialize a variable of type 'int' with an rvalue of type 'const char *'
    % clang -c -Wall a.cc
    a.cc:4:11: warning: adding 'int' to a string does not append to the string [-Wstring-plus-int]
        4 |   int y = BAR("abc");
          |           ^~~~~~~~~~
    a.cc:2:13: note: expanded from macro 'BAR'
        2 | #define BAR FOO
          |             ^
    ./a.h:1:18: note: expanded from macro 'FOO'
        1 | #define FOO(x) x + 1
          |                ~~^~~
    a.cc:4:11: note: use array indexing to silence this warning
    a.cc:2:13: note: expanded from macro 'BAR'
        2 | #define BAR FOO
          |             ^
    ./a.h:1:18: note: expanded from macro 'FOO'
        1 | #define FOO(x) x + 1
          |                  ^
    a.cc:4:7: error: cannot initialize a variable of type 'int' with an rvalue of type 'const char *'
        4 |   int y = BAR("abc");
          |       ^   ~~~~~~~~~~
    1 warning and 1 error generated.
We call a convenience function CompilerInstance::ExecuteAction, which wraps lower-level API like BeginSource, Execute, and EndSource. However, it will print 1 warning and 1 error generated. unless we set ShowCarets to false.
clang::createInvocation
clang::createInvocation, renamed from createInvocationFromCommandLine in 2022, combines clang::Driver::BuildCompilation and clang::CompilerInvocation::CreateFromArgs. While it saves a few lines for certain tasks, it lacks the flexibility we need for our specific use cases.
Followed this guide: https://www.patrickthurmond.com/blog/2023/12/11/commenting-is-available-now-thanks-to-giscus
Add the following to layout/_partial/article.ejs
    <% if (!index && post.comments) { %>
      <section class="giscus"></section>
      <script src="https://giscus.app/client.js"
          data-repo="MaskRay/maskray.me"
          data-repo-id="FILL IT UP"
          data-category="Blog Post Comments"
          data-category-id="FILL IT UP"
          data-mapping="pathname"
          data-strict="0"
          data-reactions-enabled="1"
          data-emit-metadata="0"
          data-input-position="bottom"
          data-theme="preferred_color_scheme"
          data-lang="en"
          data-loading="lazy"
          crossorigin="anonymous"
          async>
      </script>
    <% } %>
Unfortunately comments from Disqus have not been migrated yet. If you've left comments in the past, thank you. Apologies they are now gone.
While you can create GitHub Discussions via GraphQL API, I haven't found a solution that works out of the box. https://www.davidangulo.xyz/posts/dirty-ruby-script-to-migrate-comments-from-disqus-to-giscus/ provides a Ruby solution, which is promising but no longer works.
    Failed to define value method for :name, because EnterpriseOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
    Failed to define value method for :name, because EnvironmentOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
    Failed to define value method for :name, because LabelOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
    ...
    .local/share/gem/ruby/3.3.0/gems/graphql-client-0.25.0/lib/graphql/client.rb:338:in `query': wrong number of arguments (given 2, expected 1) (ArgumentError)
            from g.rb:42:in `create_discussion'
LLVM 20 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/20.x/lld/docs/ReleaseNotes.rst. I've meticulously reviewed nearly all the patches that are not authored by me. I'll delve into some of the key changes.
-z nosectionheader has been implemented to omit the section header table. The operation is similar to llvm-objcopy --strip-sections. (#101286)
--randomize-section-padding=<seed> is introduced to insert random padding between input sections and at the start of each segment. This can be used to control measurement bias in A/B experiments. (#117653)
The reproduce tarball created with --reproduce= now excludes directories specified in the --dependency-file argument (used by Ninja). This resolves an error where non-existent directories could cause issues when invoking ld.lld @response.txt.
--symbol-ordering-file= and call graph profile can now be used together.
When --call-graph-ordering-file= is specified, .llvm.call-graph-profile sections in relocatable files are no longer used.
--lto-basic-block-sections=labels is deprecated in favor of --lto-basic-block-address-map. (#110697)
In non-relocatable links, a .note.GNU-stack section with the SHF_EXECINSTR flag is now rejected unless -z execstack is specified. (#124068)
In relocatable links, the sh_entsize member of a SHF_MERGE section with relocations is now respected in the output.
Quoted names can now be used in output section phdr, memory region names, OVERLAY, the LHS of --defsym, and INSERT AFTER.
Section CLASS linker script syntax binds input sections to named classes, which are referenced later one or more times. This provides access to the automatic spilling mechanism of --enable-non-contiguous-regions without globally changing the semantics of section matching. It also independently increases the expressive power of linker scripts. (#95323)
INCLUDE cycle detection has been fixed. A linker script can now be included twice.
The archivename: syntax when matching input sections is now supported. (#119293)
To support Arm v6-M, short thunks using B.w are no longer generated. (#118111)
For AArch64, BTI-aware long branch thunks can now be created to a destination function without a BTI instruction. (#108989) (#116402)
Relocations related to GOT and TLSDESC for the AArch64 Pointer Authentication ABI are now supported.
Supported relocation types for x86-64 target:
R_X86_64_CODE_4_GOTPCRELX (#109783) (#116737)
R_X86_64_CODE_4_GOTTPOFF (#116634)
R_X86_64_CODE_4_GOTPC32_TLSDESC (#116909)
R_X86_64_CODE_6_GOTTPOFF (#117675)
Supported relocation types for LoongArch target: R_LARCH_TLS_{LD,GD,DESC}_PCREL20_S2. (#100105)
Linker scripts
The CLASS keyword, which separates section matching and referring, is a noteworthy new feature in the linker script support. Here is the GNU ld feature request.
Section layout
If --symbol-ordering-file= is specified, the sections it names are placed first. In LLD 20, SHT_LLVM_CALL_GRAPH_PROFILE sections in relocatable files are still used for other sections.
The next release will support the options --bp-compression-sort=both and --bp-startup-sort=function --irpgo-profile=a.profdata, which improve Lempel-Ziv compression and reduce page faults during program startup for mobile applications.
.dynsym computation
The purpose of Symbol::includeInDynsym was somewhat ambiguous, as it was used both to determine if a symbol should be exported to .dynsym and to conservatively suppress transformations in other contexts like MarkLive and ICF. LLD 20 clarifies this by introducing Symbol::isExported specifically for indicating whether a defined symbol should be exported. All previous uses of Symbol::includeInDynsym have been updated to use Symbol::isExported instead. The old confusing Symbol::exportDynamic has been removed.
A special case within Symbol::includeInDynsym checked for isUndefWeak() && ctx.arg.noDynamicLinker. (This could be generalized to isUndefined() && ctx.arg.noDynamicLinker, as non-weak undefined symbols led to errors.) This condition ensures that undefined symbols are not included in .dynsym for statically linked ET_DYN executables (created with clang -static-pie).
This condition has been generalized in LLD 20 to (ctx.arg.shared || !ctx.sharedFiles.empty()) && (sym->isUndefined() || sym->isExported). This means undefined symbols are excluded from .dynsym in both ld.lld -pie a.o and ld.lld -pie --no-dynamic-linker a.o, but not ld.lld -pie a.o b.so. This change brings LLD's behavior more in line with GNU ld.
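The generalized condition can be read as a small predicate. The following is a minimal sketch rather than LLD's code: ToySymbol, ToyConfig, and passesGeneralizedCondition are hypothetical names, and only the boolean structure mirrors the expression quoted above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relevant bits of lld's Symbol and Ctx.
struct ToySymbol {
  bool isUndefined = false;
  bool isExported = false;
};

struct ToyConfig {
  bool shared = false;          // models ctx.arg.shared (-shared)
  bool hasSharedFiles = false;  // models !ctx.sharedFiles.empty()
};

// Mirrors: (shared || !sharedFiles.empty()) && (isUndefined() || isExported)
bool passesGeneralizedCondition(const ToyConfig &cfg, const ToySymbol &sym) {
  return (cfg.shared || cfg.hasSharedFiles) &&
         (sym.isUndefined || sym.isExported);
}
```

With no -shared and no shared object on the command line (the ld.lld -pie a.o case), the predicate is false for an undefined symbol. Adding b.so sets hasSharedFiles and makes it true.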
Symbol::isPreemptible, indicating whether a symbol could be bound to another component, was calculated before relocation scanning and, in LLD 19, also during Identical Code Folding (ICF). In LLD 20, the ICF-related calculation has been moved to the symbol versioning parsing stage.
In LLD 20, isExported and isPreemptible are computed in the following passes.
Scan input files, interleaved with symbol resolution: set isExported when defined or referenced by shared objects
Clear isExported if influenced by --exclude-libs
parseVersionAndComputeIsPreemptible
Clear isExported if localized due to hidden visibility.
For undefined symbols, compute isPreemptible
For defined symbols in relocatable files, or bitcode files when !ltoCanOmit, set isExported and compute isPreemptible
compileBitcodeFiles
Scan LTO compiled relocatable files
Clear isExported if influenced by --exclude-libs
finalizeSections: recompute isPreemptible
isPreemptible and isExported determine whether a symbol should be exported to .dynsym.
for (Symbol *sym : ctx.symtab->getSymbols()) {
  if (!sym->isUsedInRegularObj || !includeInSymtab(ctx, *sym))
    continue;
  if (!ctx.arg.relocatable)
    sym->binding = sym->computeBinding(ctx);
  if (ctx.in.symTab)
    ctx.in.symTab->addSymbol(sym);
  // computeBinding might localize a linker-synthesized hidden symbol
  // that was considered exported.
  if ((sym->isExported || sym->isPreemptible) && !sym->isLocal()) {
    ctx.partitions[sym->partition - 1].dynSymTab->addSymbol(sym);
    if (auto *file = dyn_cast<SharedFile>(sym->file))
      if (file->isNeeded && !sym->isUndefined())
        addVerneed(ctx, *sym);
  }
}
For every node H in a post-order traversal of the dominator tree (or the original CFG), find all predecessors that are dominated by H. This identifies all back edges.
Each back edge T->H identifies a natural loop with H as the header.
Perform a flood fill starting from T in the reversed dominator tree (from exiting block to header).
All visited nodes reachable from the root belong to the natural loop associated with the back edge. These nodes are guaranteed to be reachable from H due to the dominator property.
Visited nodes unreachable from the root should be ignored.
Loops associated with visited nodes are considered subloops.
vector<vector<int>> e, ee, edom;
vector<int> dfn, dfn2, rdfn, uf, best, sdom, idom;
int tick;

void dfs(int u) {
  dfn[u] = tick;
  rdfn[tick++] = u;
  for (int v : e[u])
    if (dfn[v] < 0) {
      uf[v] = u;
      dfs(v);
    }
}

int eval(int v, int cur) {
  if (dfn[v] <= cur)
    return v;
  int u = uf[v], r = eval(u, cur);
  if (dfn[best[u]] < dfn[best[v]])
    best[v] = best[u];
  return uf[v] = r;
}

void semiNca(int n, int r) {
  idom.assign(n, -1);
  dfn.assign(n, -1);
  rdfn.resize(n); // initial values are unused
  uf.resize(n);   // initial values are unused
  sdom.resize(n); // initial values are unused
  tick = 0;
  dfs(r);
  best.resize(n);
  iota(best.begin(), best.end(), 0);
  for (int i = tick; --i; ) {
    int v = rdfn[i];
    sdom[v] = v;
    for (int u : ee[v])
      if (~dfn[u]) {
        eval(u, i);
        if (dfn[best[u]] < dfn[sdom[v]])
          sdom[v] = best[u];
      }
    best[v] = sdom[v];
    idom[v] = uf[v];
  }
  edom.assign(n, vector<int>());
  for (int i = 1; i < tick; i++) {
    int v = rdfn[i];
    while (dfn[idom[v]] > dfn[sdom[v]])
      idom[v] = idom[idom[v]];
    edom[idom[v]].push_back(v);
  }
}

void postorder(int u) {
  dfn[u] = tick;
  for (int v : edom[u])
    if (dfn[v] < 0)
      postorder(v);
  rdfn[tick++] = u;
  dfn2[u] = tick;
}

void identifyLoops(int n, int r) {
  vector<int> worklist;
  vector<Loop *> to_loop(n);
  dfn.assign(n, -1);
  dfn2.assign(n, -1);
  tick = 0;
  postorder(r);
  loops.clear();
  for (int i = 0; i < tick; i++) {
    int header = rdfn[i];
    // A predecessor dominated by header forms a back edge.
    for (int u : ee[header])
      if (dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header])
        worklist.push_back(u);
    if (worklist.empty())
      continue;
    loops.push_back(Loop{(int)loops.size(), header});
    Loop *lp = &loops.back();
    while (worklist.size()) {
      int v = worklist.back();
      worklist.pop_back();
      if (!to_loop[v]) {
        if (dfn[v] < 0) // Skip unreachable node
          continue;
        // Find a node not in a loop.
        to_loop[v] = lp;
        lp->nodes.push_back(v);
        if (v == header)
          continue;
        for (int u : ee[v])
          worklist.push_back(u);
      } else {
        // Find a subloop.
        Loop *sub = to_loop[v];
        while (sub->parent)
          sub = sub->parent;
        if (sub == lp)
          continue;
        sub->parent = lp;
        sub->next = lp->child;
        lp->child = sub;
        for (int u : ee[sub->header])
          if (to_loop[u] != sub)
            worklist.push_back(u);
      }
    }
  }
}

int main() {
  int n, m;
  scanf("%d%d", &n, &m);
  e.resize(n);
  ee.resize(n);
  for (int i = 0; i < m; i++) {
    int u, v;
    scanf("%d%d", &u, &v);
    e[u].push_back(v);
    ee[v].push_back(u);
  }
  semiNca(n, 0);
  for (int i = 0; i < n; i++)
    printf("%d: %d\n", i, idom[i]);

  identifyLoops(n, 0);
  for (Loop &lp : loops) {
    printf("loop %d:", lp.idx);
    for (int v : lp.nodes)
      printf(" %d", v);
    for (Loop *c = lp.child; c; c = c->next)
      printf(" (loop %d)", c->idx);
    puts("");
  }
}
The code iterates over the dominator tree in post-order. Alternatively, a post-order traversal of the original control flow graph could be used.
worklist may contain duplicate elements. This is acceptable. You could also deduplicate elements.
Importantly, the header predecessor of a subloop can be another subloop.
In the final loops array, parent loops are listed after their child loops.
This example examines multiple subtle details: a self-loop (node 6), an unreachable node (node 8), and a scenario where the header predecessor of one subloop (nodes 2 and 3) leads to another subloop (nodes 4 and 5).
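The back-edge test in identifyLoops relies on a standard trick: in a tree traversal, node a is an ancestor of node b exactly when b's numbering interval nests inside a's. Below is a minimal standalone sketch of that check on a toy dominator tree. DomIntervals and its members are hypothetical names for illustration, and the numbering follows the same scheme as postorder above (counter read on entry, incremented on exit).

```cpp
#include <cassert>
#include <vector>

// Toy dominator tree: children[u] lists the nodes immediately dominated by u.
struct DomIntervals {
  std::vector<int> in, out;  // interval [in[u], out[u]) in post-order numbers
  int tick = 0;

  DomIntervals(const std::vector<std::vector<int>> &children, int root)
      : in(children.size(), -1), out(children.size(), -1) {
    visit(children, root);
  }

  void visit(const std::vector<std::vector<int>> &children, int u) {
    in[u] = tick;  // number of nodes finished before entering u
    for (int v : children[u])
      visit(children, v);
    out[u] = ++tick;  // u itself is now finished
  }

  // a dominates b iff b's interval nests inside a's.
  // Nodes never reached from the root dominate nothing and are dominated by nothing.
  bool dominates(int a, int b) const {
    if (in[a] < 0 || in[b] < 0)
      return false;
    return in[a] <= in[b] && out[b] <= out[a];
  }
};
```

A CFG edge T->H is then a back edge when dominates(H, T) holds, which is the condition the loop over ee[header] evaluates.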
Clang provides a few options to generate timing reports. Among them, -ftime-report and -ftime-trace can be used to analyze the performance of Clang's internal passes.
-fproc-stat-report records time and memory on spawned processes (ld, and gas if -fno-integrated-as).
-ftime-trace, introduced in 2019, generates Clang timing information in the Chrome Trace Event format (JSON). The format supports nested events, providing a rich view of the front end.
-ftime-report: The option name is borrowed from GCC.
This post focuses on the traditional -ftime-report, which uses a line-based textual format.
Understanding -ftime-report output
The output consists of information about multiple timer groups. The last group spans the largest interval and encompasses timing data from other groups.
Up to Clang 19, the last group is called "Clang front-end time report". You would see something like the following.
The "Clang front-end timer" timer measured the time spent in clang::FrontendAction::Execute, which includes lexing, parsing, semantic analysis, LLVM IR generation, optimization, and machine code generation. However, "Code Generation Time" and "LLVM IR Generation Time" belonged to the default timer group "Miscellaneous Ungrouped Timers". This caused confusion for many users. For example, https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/ elaborates on the issues.
To address the ambiguity, I revamped the output in Clang 20.
...
===-------------------------------------------------------------------------===
                               Clang time report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.7685 seconds (0.7686 wall clock)
The last group has been renamed and changed to cover a longer interval within the invocation. It provides timing information for four stages:
Front end: Includes lexing, parsing, semantic analysis, and miscellaneous tasks not captured by the subsequent timers.
LLVM IR generation: The time spent in generating LLVM IR.
LLVM IR optimization: The time consumed by LLVM's IR optimization pipeline.
Machine code generation: The time taken to generate machine code or assembly from the optimized IR.
The -ftime-report output further elaborates on these stages through additional groups:
"Pass execution timing report" (first instance): A subset of the "Optimizer" group, providing detailed timing for individual optimization passes.
"Analysis execution timing report": A subset of the first "Pass execution timing report". In LLVM's new pass manager, analyses are executed as part of pass invocations.
"Pass execution timing report" (second instance): A subset of the "Machine code generation" group. (This group's name should be updated once the legacy pass manager is no longer used for IR optimization.)
"Instruction Selection and Scheduling": This group appears when SelectionDAG is utilized and is part of the "Instruction Selection" timer within the second "Pass execution timing report".
When -ftime-report=per-run-pass is specified, a timer is created for each pass object. This can result in significant output, especially for modules with numerous functions, as each pass will be reported multiple times.
Clang internals
As clang -### -c -ftime-report shows, clangDriver forwards -ftime-report to Clang cc1. Within cc1, this option sets the codegen flag clang::CodeGenOptions::TimePasses. This flag enables the use of llvm::Timer objects to measure the execution time of specific code blocks.
From Clang 20 onwards, the placement of the timers can be understood through the following call tree.
cc1_main
  ExecuteCompilerInvocation              // "Front end" minus the following timers
    ... all kinds of initialization
    CompilerInstance::ExecuteAction
      FrontendAction::BeginSourceFile
      FrontendAction::Execute
        FrontendAction::ExecuteAction
          ASTFrontendAction::ExecuteAction
            ParseAST
              BackendConsumer::HandleTranslationUnit
                clang::emitBackendOutput
                  EmitAssemblyHelper::emitAssembly
                    RunOptimizationPipeline // "Optimizer"
                    RunCodegenPipeline      // "Machine code generation"
      FrontendAction::EndSourceFile
The measured interval does not cover the whole invocation. integratedcc1 clang -c -ftime-report a.c
LLVM internals
llvm/lib/Support/Timer.cpp implements the timer feature. A Timer belongs to a TimerGroup. Timer::startTimer and Timer::stopTimer generate a TimeRecord. In clang/tools/driver/cc1_main.cpp, llvm::TimerGroup::printAll(llvm::errs()); dumps the TimerGroup and TimeRecord information to stderr.
There are a few cl::opt options:
sort-timers (default: true): sort the timers in a group in descending wall time.
track-memory: record increments or decrements in malloc statistics. In glibc 2.33 and above, this utilizes mallinfo2::uordblks.
info-output-file: dump output to the specified file.
On Apple platforms, LLVM_SUPPORT_XCODE_SIGNPOSTS=on builds enable os_signpost for startTimer/stopTimer.
The -ftime-report system has a significant limitation: it doesn't support nested timers. Although adding more timer groups might seem like a solution, the resulting output lacks any hierarchical structure, making it difficult to understand.
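To make the limitation concrete, here is a toy model of a flat timer group. It is only a sketch under stated assumptions: ToyTimerGroup and ToyTimer are made-up names and do not reproduce the llvm::Timer or llvm::TimerGroup interfaces. The point is that each timer only accumulates seconds into a flat record list, so when one timer runs inside another, nothing records that relationship.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only; these are not the llvm::Timer / llvm::TimerGroup classes.
struct ToyTimerGroup {
  struct Record {
    std::string name;
    double seconds;
  };
  std::vector<Record> records;  // flat list: no parent/child links
};

class ToyTimer {
  using Clock = std::chrono::steady_clock;
  ToyTimerGroup &group;
  std::size_t index;  // position of this timer's record in the group
  Clock::time_point begin;
  bool running = false;

public:
  ToyTimer(ToyTimerGroup &g, std::string name)
      : group(g), index(g.records.size()) {
    g.records.push_back({std::move(name), 0.0});
  }
  void start() {
    assert(!running);
    running = true;
    begin = Clock::now();
  }
  void stop() {
    assert(running);
    running = false;
    std::chrono::duration<double> elapsed = Clock::now() - begin;
    group.records[index].seconds += elapsed.count();
  }
};
```

If a "Front end" timer is started, an "LLVM IR generation" timer runs to completion, and then the first is stopped, the group ends up with two sibling records. The inner time is silently included in the outer one, and a reader of the report has to know that from context.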
I have been busy creating posts, authoring a total of 31 blog posts (including this one). 7 posts resonated on Hacker News, garnering over 50 points. (https://news.ycombinator.com/from?site=maskray.me).
I have also revised many posts initially written between 2020 and 2024.
I made 5 commits to the project, including the addition of the x86 inline asm constraint "Ws". You can read more about that in my earlier post Raw symbol names in inline assembly.
I believe that modernizing code review and test infrastructure will enhance the contributor experience and attract more contributors.
Official maintainer status on the MC layer and binary utilities
My involvement with LLVM 18 and 19
Key Points:
TODO
Added a script update_test_body.py to generate elaborated IR and assembly tests (#89026)
MC
Made some MC and assembler improvements in LLVM 19
Fixed some intrusive changes to the generic code due to AIX and z/OS.
Made llvm-mc better as an assembler and disassembler
Light ELF
Implemented a compact relocation format for ELF
AArch64 mapping symbol size optimization
Enabled StackSafetyAnalysis for AddressSanitizer to remove instrumentations on stack-allocated variables that are guaranteed to be safe from memory access bugs
Bail out if MemIntrinsic length is -1
Bail out when calling ifunc
Added the Clang cc1 option --output-asm-variant= and cleaned up internals of its friends (x86-asm-syntax).
llvm/ADT/Hashing.h stability
To facilitate improvements, llvm/ADT/Hashing.h promised to be non-deterministic so that users could not depend on exact hash values. However, the values were actually deterministic unless set_fixed_execution_hash_seed was called. A lot of internal code incorrectly relied on the stability of hash_value/hash_combine/hash_combine_range. I have fixed them and landed https://github.com/llvm/llvm-project/pull/96282 to make the hash value non-deterministic in LLVM_ENABLE_ABI_BREAKING_CHECKS builds.
lld/ELF
lld/ELF is quite stable. I have made some maintenance changes. As usual, I wrote the ELF port's release notes for the two releases. See lld 18 ELF changes and lld 19 ELF changes for detail.
Linux kernel
Contributed 4 commits.
ccls
I finally removed support for LLVM 7, 8, and 9. The latest release https://github.com/MaskRay/ccls/releases/tag/0.20241108 has some nice features.
didOpen: sort index requests. When you open A/B/foo.cc, files under "A/B/" and "A/" will be prioritized during the initial indexing process, leading to a quicker response time.
Support for the older LLVM versions 7, 8, and 9 has been dropped.
LSP semantic tokens are now supported. See the usage guide https://maskray.me/blog/2024-10-20-ccls-and-lsp-semantic-tokens (including rainbow semantic highlighting).
textDocument/switchSourceHeader (LSP extension) is now supported.
Misc
Reported 12 feature requests or bugs to binutils.
objdump -R: dump SHT_RELR relocations?
gas arm aarch64: missing mapping symbols $d in the absence of alignment directives
gas: Extend .loc directive to emit a label
Compressed .strtab and .symtab
gas: Support \+ in .rept/.irp/.irpc directives
ld: Add CLASS to allow separate section matching and referring
gas/ld: Implicit addends for non-code sections
binutils: Support CREL relocation format
ld arm: global/weak non-hidden symbols referenced by R_ARM_FUNCDESC are unnecessarily exported
ld arm: fdpic link segfaults on R_ARM_GOTOFFFUNCDESC referencing a hidden symbol
ld arm: fdpic link may have null pointer dereference in allocate_dynrelocs_for_symbol
objcopy: add --prefix-symbols-remove
Reported 2 feature requests to glibc
Feature request: special static-pie capable of loading the interpreter from a relative path
In debuggers, stepping into a function with arguments that involve function calls may step into the nested function calls, even if they are simple and uninteresting, such as those found in the C++ STL.
int main() {
  auto i = make_unique<int>(3);
  vector v{1,2};
  foo(*i, v.back()); // step into
}
When GDB stops at the foo call, the step (s) command will step into std::vector::back and std::unique_ptr::operator*. While you can execute finish (fin) and then execute s again, it's time-consuming and distracting, especially when dealing with complex argument expressions.
% g++ -g a.cc -o a
% gdb ./a
...
(gdb) s
std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
1235 back() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
0x00005555555566f8 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $1 = (__gnu_cxx::__alloc_traits<std::allocator<int>, int>::value_type &) @0x55555556c2d4: 2
(gdb) s
std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
447 __glibcxx_assert(get() != pointer());
(gdb) fin
Run till exit from #0 std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
0x0000555555556706 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $2 = (int &) @0x55555556c2b0: 3
(gdb) s
foo (i=3, j=2) at a.cc:7
7 printf("%d %d\n", i, j);
This problem was tracked as a feature request in 2003: https://sourceware.org/bugzilla/show_bug.cgi?id=8287. Fortunately, GDB provides the skip command to skip functions that match a regex or filenames that match a glob (GDB 7.12 feature). You can skip all demangled function names that start with std::.
skip -rfu ^std::
Alternatively, you can execute skip -gfi /usr/include/c++/*/bits/* to skip these libstdc++ files.
Important note:
The skip command's file matching behavior uses the fnmatch function with the FNM_FILE_NAME flag. This means the wildcard character (*) won't match slashes. So, skip -gfi /usr/* won't exclude /usr/include/c++/14.2.1/bits/stl_vector.h.
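The effect of the flag can be reproduced outside GDB with the POSIX fnmatch function. This is a small sketch, not GDB's code: globMatches is a hypothetical helper, and it uses the portable spelling FNM_PATHNAME, for which glibc provides FNM_FILE_NAME as a synonym.

```cpp
#include <cassert>
#include <fnmatch.h>

// Returns true when `path` matches `glob`. With pathnameMode set, a `*` in the
// glob does not match `/` (FNM_PATHNAME, which glibc also spells FNM_FILE_NAME).
bool globMatches(const char *glob, const char *path, bool pathnameMode) {
  return fnmatch(glob, path, pathnameMode ? FNM_PATHNAME : 0) == 0;
}
```

With pathnameMode enabled, "/usr/*" does not match the stl_vector.h path above, while "/usr/include/c++/*/bits/*" does because every slash is spelled out. Without the flag, "/usr/*" matches as well.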
I proposed to drop the FNM_FILE_NAME flag. With GDB 17, I will be able to skip a project directory with
skip -gfi */include/llvm/ADT/*
instead of
skip -gfi /home/ray/llvm/llvm/include/llvm/ADT/*
User functions called by skipped functions
When a function (let's call it "A") is skipped during debugging, any user-defined functions that are called by "A" will also be skipped.
For example, consider the following code snippet:
std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), predicate)) {
}
If std::all_of is skipped due to a skip command, predicate called within std::all_of will also be skipped when you execute s at the if statement.
LLDB
By default, LLDB avoids stepping into functions whose names start with std:: when you use the s (step, thread step-in) command. This behavior is controlled by a setting:
(lldb) settings show target.process.thread.step-avoid-regexp
target.process.thread.step-avoid-regexp (regex) = ^std::
(lldb) set sh target.process.thread.step-avoid-libraries
target.process.thread.step-avoid-libraries (file-list) =
target.process.thread.step-avoid-libraries can be used to skip functions defined in a library.
While the command settings set is long, you can shorten it to set set.
Visual Studio
Visual Studio provides a debugging feature Just My Code that automatically steps over calls to system, framework, and other non-user code.
It also supports a Step Into Specific command, which seems interesting.
The implementation inserts a call to __CheckForDebuggerJustMyCode at the start of every user function. The function (void __CheckForDebuggerJustMyCode(const char *flag)) takes a global variable defined in the .msvcjmc section and determines whether the debugger should stop.
This LLDB feature request has a nice description: https://github.com/llvm/llvm-project/issues/61152.
For the all_of example, the feature can possibly allow the debugger to stop at test.
std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), test)) {
}
Fuchsia zxdb
The Fuchsia debugger "zxdb" provides a command "ss" similar to Visual Studio's "Step Into Specific".
On https://x.com/settings/, click More -> Settings and privacy -> Download an archive of your data. Wait for a message from x.com: "@XXX your X data is ready". Download the archive.
cp data/tweets.js tweets.ts
Change the first line from window.YTD.tweets.part0 = [ to let part0 = [, and append
Both compiler developers and security researchers have builtdisassemblers. They often prioritize different aspects. Compilertoolchains, benefiting from direct contributions from CPU vendors, tendto offer more accurate and robust decoding. Security-focused tools, onthe other hand, often excel in user interface design.
For quick disassembly tasks, rizin provides a convenient command-line interface.
% rz-asm -a x86 -b 64 -d 4829c390
sub rbx, rax
nop
-a x86 can be omitted.
llvm-mc
Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.
However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires explicitly prefixing hexadecimal byte values with 0x:
Let's break down the options used in this command:
--triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
--output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
--cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.
I have contributed patches to remove .text and allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:
% disasm 4829c390
sub rbx, rax
nop
% disasm $'4829 c3\n# comment\n90'
sub rbx, rax
nop
The --hex option conveniently ignores whitespace and #-style comments within the input.
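As an illustration of what such input handling involves, here is a minimal sketch of a hex-to-bytes parser. It is not llvm-mc's implementation: parseHexBytes is a hypothetical helper, and it assumes that whitespace may appear anywhere between digits and that a # comment runs to the end of its line.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not llvm-mc's code: decode pairs of hex digits,
// skipping whitespace and '#' comments that run to the end of the line.
std::vector<std::uint8_t> parseHexBytes(const std::string &text) {
  std::vector<std::uint8_t> bytes;
  int high = -1;  // pending high nibble, or -1 if none
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (c == '#') {
      // Advance to the newline; the loop increment then steps past it.
      while (i < text.size() && text[i] != '\n')
        ++i;
      continue;
    }
    if (std::isspace(c))
      continue;
    assert(std::isxdigit(c) && "unexpected character in hex input");
    int nibble = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
    if (high < 0) {
      high = nibble;
    } else {
      bytes.push_back(static_cast<std::uint8_t>(high << 4 | nibble));
      high = -1;
    }
  }
  assert(high < 0 && "odd number of hex digits");
  return bytes;
}
```

Feeding it the second input from the session above ("4829 c3", a comment line, then "90") yields the four bytes 48 29 c3 90, the same byte sequence as the unbroken 4829c390.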
Atomic blocks
llvm-mc handles decoding failures by skipping a number of bytes, as determined by the target-specific llvm::MCDisassembler::getInstruction. To treat a sequence of bytes as a single unit during disassembly, enclose them within [].
(I've contributed a change to LLVM 20 that removes the previously printed .text directive.)
llvm-objdump
For address information, llvm-mc falls short. We need to turn to llvm-objdump to get that detail. Here is a little fish script that takes raw hex bytes as input, converts them to a binary format (xxd -r -p), and then creates an ELF relocatable file (llvm-objcopy -I binary) targeting the x86-64 architecture. Finally, llvm-objdump with the -D flag disassembles the data section (.data) containing the converted binary.
#!/usr/bin/env fish
argparse a/arch= att r -- $argv; or return 1
if test -z "$_flag_arch"; set _flag_arch x86_64; end
set opt --triple=$_flag_arch
if test -z "$_flag_att" && string match -rq 'i.86|x86_64' $_flag_arch; set -a opt -M intel; end
if test -n "$_flag_r"; set -a opt --no-leading-addr; set -a opt --no-show-raw-insn; end

switch $_flag_arch
  case arm; set bfdname elf32-littlearm
  case aarch64; set bfdname elf64-littleaarch64
  case ppc32; set bfdname elf32-powerpc
  case ppc32le; set bfdname elf32-powerpcle
  case ppc64; set bfdname elf64-powerpc
  case ppc64le; set bfdname elf64-powerpcle
  case riscv32; set bfdname elf32-littleriscv
  case riscv64; set bfdname elf64-littleriscv
  case 'i?86'; set bfdname elf32-i386
  case x86_64; set bfdname elf64-x86-64
  case '*'; echo unknown arch >&2; return 1
end
llvm-objdump -D -j .data $opt (echo $argv | xxd -r -p | llvm-objcopy -I binary -O $bfdname - - | psub) | sed '1,/<_binary__stdin__start>:/d'
The Google C++ Style is widely adopted by projects. It contains a brace omission guideline in Looping and branching statements:
For historical reasons, we allow one exception to the above rules: the curly braces for the controlled statement or the line breaks inside the curly braces may be omitted if as a result the entire statement appears on either a single line (in which case there is a space between the closing parenthesis and the controlled statement) or on two lines (in which case there is a line break after the closing parenthesis and there are no braces).
// OK - fits on one line.
if (x == kFoo) { return new Foo(); }

// OK - braces are optional in this case.
if (x == kFoo) return new Foo();

// OK - condition fits on one line, body fits on another.
if (x == kBar)
  Bar(arg1, arg2, arg3);
In clang-format's predefined Google style for C++, there are two related style options:
The two options cause clang-format to aggressively join lines for the following code:
for (int x : a)
  foo(x);

while (cond())
  foo(x);

if (x)
  foo(x);
As a heavy debugger user, I find this behavior cumbersome.
1 2 3 4 5 6 7
// clang-format --style=Google #include<vector> voidfoo(int v){} intmain(){ std::vector<int> a{1, 2, 3}; for (int x : a) foo(x); // breakpoint }
When GDB stops at the for loop, how can I step into theloop body? Unfortunately, it's not simple.
If I run step, GDB will dive into the implementationdetail of the range-based for loop. It will stop at thestd::vector::begin function. Stepping out and executingstep again will stop at the std::vector::endfunction. Stepping out and executing step another time willstop at the operator!= function of the iterator type. Hereis an interaction example with GDB:
(gdb) n 5 for (int x : a) foo(v); (gdb) s std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873 873 begin() _GLIBCXX_NOEXCEPT (gdb) fin Run till exit from #0 std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873 0x00005555555561d5 in main () at a.cc:5 5 for (int x : a) foo(v); Value returned is $1 = 1 (gdb) s std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893 893 end() _GLIBCXX_NOEXCEPT (gdb) fin Run till exit from #0 std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893 0x00005555555561e5 in main () at a.cc:5 5 for (int x : a) foo(v); Value returned is $2 = 0 (gdb) s __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235 1235 { return __lhs.base() != __rhs.base(); } (gdb) fin Run till exit from #0 __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235 0x0000555555556225 in main () at a.cc:5 5 for (int x : a) foo(v); Value returned is $3 = true (gdb) s __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091 1091 { return *_M_current; } (gdb) fin Run till exit from #0 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091 0x00005555555561f7 in main () at a.cc:5 5 for (int x : a) foo(v); Value returned is $4 = (int &) @0x55555556b2b0: 1
You can see that this can significantly hinder the debugging process, as it forces the user to delve into uninteresting function calls of the range-based for loop.
In contrast, when the loop body is on the next line, we can just run next to skip the three uninteresting function calls:
for (int x : a) // next
  foo(x);       // step
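To see why step lands in begin, end, operator!=, and operator*, it may help to write out roughly what a range-based for loop expands to. The sketch below approximates the expansion; the function names sumRangeFor and sumDesugared are hypothetical and exist only for illustration.

```cpp
#include <vector>

// Range-based for loop, as written in source code.
int sumRangeFor(const std::vector<int> &a) {
  int sum = 0;
  for (int x : a)
    sum += x;
  return sum;
}

// Approximately what the compiler generates for the loop above. In an
// unoptimized build, begin(), end(), operator!=, operator*, and operator++
// are real function calls, and all of them belong to the `for` line.
int sumDesugared(const std::vector<int> &a) {
  int sum = 0;
  auto &&range = a;
  auto it = range.begin();   // a `step` on the `for` line can stop here
  auto last = range.end();   // ... and here
  for (; it != last; ++it) { // operator!= and operator++
    int x = *it;             // operator*
    sum += x;
  }
  return sum;
}
```

When the body shares a line with the for statement, all of these calls sit on the same source line as the body, which is why step visits them before reaching the call you care about.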
The AllowShortIfStatementsOnASingleLine style option is similar. While convenient for simple scenarios, it can sometimes hinder debuggability.
For the following code, it's not easy to skip the c() and d() function calls if you just want to step into foo(v).
if (c() && d()) foo(v);
Many developers, mindful of potential goto fail-like issues, often opt to include braces in their code. clang-format's default style can further reinforce this practice.
// clang-format does not join lines.
if (v) {
  foo(v);
}
for (int x : a) {
  foo(x);
}
Other predefined styles
clang-format's Chromium style is a variant of the Google style and does not have the aforementioned problem. The LLVM style, and many styles derived from it, do not have the problem either.
Go, Odin, and Rust require {} for if statements but omit (), striking a balance between clarity and conciseness. C/C++'s required () makes opt-in braces feel a bit verbose.
LLD, the LLVM linker, is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). Designed as a standalone program, the code base relies heavily on global state, making it less than ideal for library integration. As outlined in RFC: Revisiting LLD-as-a-library design, two main hurdles exist:
Fatal errors: they exit the process without returning control to the caller. This was actually addressed for most scenarios in 2020 by utilizing llvm::sys::Process::Exit(val, /*NoCleanup=*/true) and CrashRecoveryContext (longjmp under the hood).
Global variable conflicts: shared global variables do not allow two concurrent invocations.
I understand that calling a linker API could be convenient, especially when you want to avoid shipping another executable (which can be large when you link against LLVM statically). However, I believe that invoking LLD as a separate process remains the recommended approach. There are several advantages:
Build system control: Build systems gain greater control over scheduling and resource allocation for LLD. In an edit-compile-link cycle, the link could need more resources and threading is more useful.
Better parallelism management
Global state isolation: LLVM's global state (primarily cl::opt and ManagedStatic) is isolated.
While spawning a new process offers build system benefits, the issue of global state usage within LLD remains a concern. This is a factor to consider, especially for advanced use cases. Here are global variables in the LLD 15 code base.
In 2021, global variables were removed from lld/Common.
The COFF port followed suit, eliminating most of its global variables.
Inspired by these advancements, I conceived a plan to eliminate global variables from the ELF port. In 2022, as part of the work to enable parallel section initialization, I introduced struct Ctx in lld/ELF/Config.h. Here is my plan:
Global variables will be migrated into Ctx.
Functions will be modified to accept a new Ctx &ctx parameter.
The previously global variable lld::elf::ctx will be transformed into a local variable within lld::elf::link.
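The three steps can be sketched in a few lines. This is a minimal illustration, not LLD's actual code: the Ctx members, reportUndefined, and runLink below are hypothetical stand-ins chosen for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for lld::elf::Ctx.
// Step 1: state that used to live in global variables becomes members.
struct Ctx {
  std::vector<std::string> undefinedSymbols;
  unsigned errorCount = 0;
};

// Step 2: helpers take the context explicitly instead of reading a global.
static void reportUndefined(Ctx &ctx, const std::string &name) {
  ctx.undefinedSymbols.push_back(name);
  ++ctx.errorCount;
}

// Step 3: the context is a local variable of the entry point, so every
// invocation starts from a freshly constructed state.
unsigned runLink(const std::vector<std::string> &undefined) {
  Ctx ctx;
  for (const std::string &name : undefined)
    reportUndefined(ctx, name);
  return ctx.errorCount;
}
```

Because each call constructs its own Ctx, a second call does not observe state left behind by the first one.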
Encapsulating global variables into Ctx
Over the past two and a half years, I have migrated global variables into the Ctx class, e.g.
diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 590c19e6d88d..915c4d94e870 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -382,2 +382,10 @@ struct Ctx {
   std::atomic<bool> hasSympart{false};
+  // A tuple of (reference, extractedFile, sym). Used by --why-extract=.
+  SmallVector<std::tuple<std::string, const InputFile *, const Symbol &>, 0>
+      whyExtractRecords;
+  // A mapping from a symbol to an InputFile referencing it backward. Used by
+  // --warn-backrefs.
+  llvm::DenseMap<const Symbol *,
+                 std::pair<const InputFile *, const InputFile *>>
+      backwardReferences;
 };
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8315d43c776e..2ab698c91b01 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1776,3 +1776,3 @@ static void handleUndefined(Symbol *sym, const char *option) {
   if (!config->whyExtract.empty())
-    driver->whyExtract.emplace_back(option, sym->file, *sym);
+    ctx->whyExtractRecords.emplace_back(option, sym->file, *sym);
 }
@@ -1812,3 +1812,3 @@ static void handleLibcall(StringRef name) {

-void LinkerDriver::writeArchiveStats() const {
+static void writeArchiveStats() {
   if (config->printArchiveStats.empty())
@@ -1834,3 +1834,3 @@ void LinkerDriver::writeArchiveStats() const {
       ++extracted[CachedHashStringRef(file->archiveName)];
-  for (std::pair<StringRef, unsigned> f : archiveFiles) {
+  for (std::pair<StringRef, unsigned> f : driver->archiveFiles) {
     unsigned &v = extracted[CachedHashString(f.first)];
I did not do anything with the global variables in 2023. The work was resumed in July 2024. I moved TarWriter, SymbolAux, Out, ElfSym, outputSections, etc. into Ctx.
The config variable, used to store command-line options, was pervasive throughout lld/ELF. To enhance code clarity and maintainability, I renamed it to ctx.arg (mold naming).
I've removed other instances of static storage variables throughout lld/ELF, e.g.
static member LinkerDriver::nextGroupId
static member SharedFile::vernauxNum
sectionMap in lld/ELF/Arch/ARM.cpp
Passing Ctx &ctx as parameters
The subsequent phase involved adding Ctx &ctx as a parameter to numerous functions and classes, gradually eliminating references to the global ctx.
I incorporated Ctx &ctx as a member variable into a few classes (e.g. SyntheticSection, OutputSection) to minimize the modifications to member functions. This approach was not suitable for Symbol and InputSection, since even a single word could increase memory consumption significantly.
-LLVM_LIBRARY_VISIBILITY extern Ctx ctx;
-
 // The first two elements of versionDefinitions represent VER_NDX_LOCAL and
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 334dfc0e3ba1..631051c27381 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -81,4 +81,2 @@ using namespace lld::elf;
Previously, cleanupCallback was essential for resetting the global ctx when lld::elf::link was invoked multiple times. With the removal of the global variable, this callback is no longer necessary. We can now rely on the constructor to initialize Ctx and avoid the need for a reset function.
Removing global state from lld/Common
While significant progress has been made in lld/ELF, lld/Common needs a lot of work as well. A lot of shared utility code (diagnostics, bump allocator) utilizes the global lld::context().
/// Returns the default error handler.
ErrorHandler &errorHandler();
Although thread-local variables are an option, worker threads spawned by llvm/lib/Support/Parallel.cpp don't inherit their values from the main thread. Given our direct access to Ctx &ctx, we can leverage context-aware APIs as replacements.
errorOrWarn(toString(f) + "xxx") => Err(ctx) << f << "xxx"
error(toString(f) + "xxx") => ErrAlways(ctx) << f << "xxx"
fatal("xxx") => Fatal(ctx) << "xxx"
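A stream-style diagnostic API of this shape can be built from a small temporary object that buffers the message and hands it to the context when it is destroyed. The sketch below is an illustration under stated assumptions, not LLD's implementation: Ctx, Diag, and this Err function are hypothetical and greatly simplified.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical context that owns the diagnostics instead of a global handler.
struct Ctx {
  std::vector<std::string> errors;
};

// Temporary object returned by Err(ctx). It accumulates the message through
// operator<< and reports it to the context in its destructor, i.e. at the end
// of the full expression that created it.
class Diag {
public:
  explicit Diag(Ctx &ctx) : ctx(ctx) {}
  Diag(Diag &&other) : ctx(other.ctx), os(std::move(other.os)) {
    other.active = false; // the moved-from object must not report
  }
  ~Diag() {
    if (active)
      ctx.errors.push_back(os.str());
  }
  template <typename T> Diag &operator<<(const T &v) {
    os << v;
    return *this;
  }

private:
  Ctx &ctx;
  std::ostringstream os;
  bool active = true;
};

inline Diag Err(Ctx &ctx) { return Diag(ctx); }
```

A call such as Err(ctx) << "a.o" << ": undefined symbol " << 42; appends one complete message to ctx.errors without building an intermediate concatenated string at the call site.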
As of Nov 16, 2024, I have eliminated log/warn/error/fatal from lld/ELF.
The underlying functions lld::ErrorHandler::fatal, and lld::ErrorHandler::error when the error limit is hit and exitEarly is true, call exitLld(1).
This transformation eliminates a lot of code size overhead due to llvm::Twine. Even in the simplest Twine(123) case, the generated code needs a stack object to hold the value and a Twine kind.
lld::make from lld/include/lld/Common/Memory.h is an allocation function that uses the global context. When the ownership is clear, std::make_unique might be a better choice.
Avoid lld::make from lld/include/lld/Common/Memory.h
Avoid fatal errors in a half-initialized object, e.g. a fatal error in a base class constructor (ELFFileBase::init) ([LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures)
Global state in LLVM
LTO link jobs utilize LLVM. Understanding its global state is crucial.
While LLVM allows for multiple LLVMContext instances to be allocated and used concurrently, it's important to note that these instances share certain global states, such as cl::opt and ManagedStatic. Specifically, it's not possible to run two concurrent LLVM compilations (including LTO link jobs) with distinct sets of cl::opt option values. To link with distinct cl::opt values, even after removing LLD's global state, you'll need to spawn a new LLD process.
Any proposal that moves away from global state seems to complicate cl::opt usage, making it impractical.
LLD also utilizes functions from llvm/Support/Parallel.h for parallelism. These functions rely on global state like getDefaultExecutor and llvm::parallel::strategy. Ongoing work by Alexandre Ganea aims to make these functions context-aware. (It was nice to meet you in person at the LLVM Developers' Meeting last month.)
Supported library usage scenarios
You can repeatedly call lld::lldMain from lld/Common/Driver.h. If fatal has been invoked, it will not be safe to call lld::lldMain again in certain rare scenarios. Running lld::lldMain concurrently in two threads is not supported.
The command LLD_IN_TEST=3 lld-link ... runs the link process three times, but only the final invocation outputs diagnostics to stdout/stderr. lld/test/lit.cfg.py has configured the COFF port to run tests twice ([lld] Add test suite mode for running LLD main twice). Other ports need work to make this mode work.
LLVM's C++ API doesn't offer a stability guarantee. This means function signatures can change or be removed between versions, forcing projects to adapt.
On the other hand, LLVM has an extensive API surface. When a library like llvm/lib/Y relies on functionality from another library, the API is often exported in header files under llvm/include/llvm/X/, even if it is not intended to be user-facing.
To be compatible with multiple LLVM versions, many projects rely on #if directives based on the LLVM_VERSION_MAJOR macro. This post explores the specific techniques used by ccls to ensure compatibility with LLVM versions 7 to 19. For the latest release (ccls 0.20241108), support for LLVM versions 7 to 9 has been discontinued.
Given the tight coupling between LLVM and Clang, the LLVM_VERSION_MAJOR macro can be used for version detection of both. There's no need to check CLANG_VERSION_MAJOR.
Changed namespaces
In Oct 2018, https://reviews.llvm.org/D52783 moved the namespace clang::vfs to llvm::vfs. To remain compatible, I renamed clang::vfs uses and added a conditional namespace alias:
In March 2019, https://reviews.llvm.org/D59377 removed the member variable VirtualFileSystem and removed setVirtualFileSystem. To adapt to this change, ccls employs an #if.
In April 2020, the LLVM monorepo integrated a new subproject: flang. flang developers made many changes to clangDriver to reuse it for flang. https://reviews.llvm.org/D86089 changed the constructor clang::driver::Driver. I added
In November 2020, https://reviews.llvm.org/D90890 changed an argument of ComputePreambleBounds from const llvm::MemoryBuffer *Buffer to const llvm::MemoryBufferRef &Buffer.
In April 2024, https://github.com/llvm/llvm-project/pull/89548/ removed llvm::StringRef::startswith in favor of starts_with. starts_with has been available since Oct 2022 and startswith had been deprecated. I added the following snippet:
could break code that calls std::string_view::starts_with.
Changed enumerators
In November 2023, https://github.com/llvm/llvm-project/pull/71160 changed an unnamed enumeration to a scoped enumeration. To keep the following snippet compiling,
switch (tag_d->getTagKind()) {
case TTK_Struct:
  tag = "struct";
  break;
case TTK_Interface:
  tag = "__interface";
  break;
case TTK_Union:
  tag = "union";
  break;
case TTK_Class:
  tag = "class";
  break;
case TTK_Enum:
  tag = "enum";
  break;
}
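One common way to keep such a switch compiling on both sides of an enumerator rename is to select the spelling with #if. The sketch below is self-contained, so it mocks the Clang enumeration instead of including Clang headers: MOCK_LLVM_VERSION_MAJOR, the mock TagTypeKind, the TTK macro, and tagName are all hypothetical, and the cutoff of 18 is an assumption based on the date of the change. It is not necessarily what ccls does.

```cpp
#include <string>

// Stand-in for LLVM_VERSION_MAJOR; a real project would get the macro from
// the LLVM headers instead of defining it.
#ifndef MOCK_LLVM_VERSION_MAJOR
#define MOCK_LLVM_VERSION_MAJOR 18
#endif

#if MOCK_LLVM_VERSION_MAJOR >= 18
// Mock of the newer API: a scoped enumeration.
enum class TagTypeKind { Struct, Interface, Union, Class, Enum };
#define TTK(name) TagTypeKind::name
#else
// Mock of the older API: unscoped enumerators with a TTK_ prefix.
enum TagTypeKind { TTK_Struct, TTK_Interface, TTK_Union, TTK_Class, TTK_Enum };
#define TTK(name) TTK_##name
#endif

// The switch body is written once; only the enumerator spelling varies.
std::string tagName(TagTypeKind kind) {
  switch (kind) {
  case TTK(Struct):
    return "struct";
  case TTK(Interface):
    return "__interface";
  case TTK(Union):
    return "union";
  case TTK(Class):
    return "class";
  case TTK(Enum):
    return "enum";
  }
  return "";
}
```

The version check lives in one place, so dropping support for the older releases later means deleting a single #else branch.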
The above examples illustrate how to adapt to changes in the LLVM and Clang APIs. It's important to remember that API changes are a natural part of software development, and testing with different releases is crucial for maintaining compatibility with a wide range of LLVM versions.
When introducing new interfaces, we should pay a lot of attention to reducing the chance that the interface will be changed in a way that causes disruption downstream. That said, changes are normal. When an API change is justified, do it.
Downstream projects should be mindful of the stability guarantees of different LLVM APIs. Some APIs may be more prone to change than others. It's essential to write code in a way that can easily adapt to changes in the LLVM API.
LLVM C API
While LLVM offers a C API with an effort made towards compatibility, its capabilities often fall short.
Clang provides a C API called libclang. While highly stable, libclang's limited functionality makes it unsuitable for many tasks.
In 2018, when creating ccls (a fork of cquery), I encountered multiple limitations in libclang's ability to handle code completion and indexing. This led to rewriting the relevant code to leverage the Clang C++ API for a more comprehensive solution. The following commits offer insights into how the C API and the mostly equivalent but better C++ API work:
First draft: replace libclang indexer with clangIndex
After migrating from Vim to Emacs as my primary C++ editor in 2015, I switched from Vim to Neovim for miscellaneous non-C++ tasks as it is more convenient in a terminal. Customizing the editor with a language you are comfortable with is important. I found myself increasingly drawn to Neovim's terminal-based simplicity for various tasks. Recently, I've refined my Neovim setup to the point where I can confidently migrate my entire C++ workflow away from Emacs.
This post explores the key improvements I've made to achieve this transition. My focus is on code navigation.
Key mapping
I've implemented custom functions that simplify key mappings.
local function map(mode, lhs, rhs, opts)
  local options = {}
  if opts then
    if type(opts) == 'string' then
      opts = {desc = opts}
    end
    options = vim.tbl_extend('force', options, opts)
  end
  vim.keymap.set(mode, lhs, rhs, options)
end
local function nmap(lhs, rhs, opts)
  map('n', lhs, rhs, opts)
end
local function tmap(lhs, rhs, opts)
  map('t', lhs, rhs, opts)
end
I've swapped ; and : for easier access to Ex commands, especially since leap.nvim renders ; less useful for repeating ftFT.
Like many developers, I spend significantly more time reading code than writing it. Efficiently navigating definitions and references is crucial for productivity.
While the built-in LSP client's C-] is functional (see :h lsp-defaults tagfunc), I found it less convenient. Many Emacs and Neovim configurations advocate for gd. However, both G and D are placed on the left half of the QWERTY keyboard, making it slow to press them using the left hand.
For years, I relied on M-j to quickly jump to definitions.
To avoid a conflict with my recent zellij change (I adopted M-hjkl for pane navigation), I've reassigned J to trigger definition jumps. Although I've lost the original J (join lines) functionality, vJ provides a suitable workaround.
After making an LSP-based jump, the jump list can quickly fill with irrelevant entries as I navigate the codebase. Thankfully, Telescope's LSP functionality sets push_tagstack_on_edit to push an entry to the tag stack (see :h tag-stack). To efficiently return to my previous position, I've mapped H to :pop and L to :tag.
I utilize xn and xp to find the next or previous reference. The implementation, copied from LazyVim, only works with references within the current file. I want to enable the xn map to automatically transition to the next file when reaching the last reference in the current file.
While using Emacs, I created a hydra with x as the prefix key to cycle through next references. Unfortunately, I haven't been able to replicate this behavior in Neovim.
-- This does not work.
local Hydra = require('hydra')
Hydra({
  name = 'lsp xref',
  mode = 'n',
  body = 'x',
  heads = {
    {'n', function() M.lsp.words.jump(1) end},
    {'p', function() M.lsp.words.jump(-1) end},
    { "q", nil, { exit = true, nowait = true } },
  },
})
Movement
I use leap.nvim to quickly jump to specific identifiers (s{char1}{char2}), followed by telescope.nvim to explore definitions and references. Sometimes, I use the following binding:
I've implemented rainbow semantic highlighting using ccls. Please refer to ccls and LSP Semantic Tokens for my setup.
Other LSP features
I have configured the CursorHold event to trigger textDocument/documentHighlight. When using Emacs, lsp-ui-doc automatically requests textDocument/hover, which I now lose.
Additionally, the LspAttach and BufEnter events trigger textDocument/codeLens.
Window navigation
While I've been content with the traditional C-w + hjkl mapping for years, I've recently opted for the more efficient C-hjkl approach.
To accommodate this change, I've shifted my tmux prefix key from C-l to C-Space. Consequently, I've also adjusted my input method toggling from C-Space to C-S-Space.
Debugging
For C++ debugging, I primarily rely on cgdb. I find it superior to GDB's single-key mode and significantly more user-friendly than LLDB's gui command.
cgdb --args ./a.out args

rr record ./a.out args
rr replay -d cgdb
I typically arrange Neovim and cgdb side-by-side in tmux or zellij. During single-stepping, when encountering interesting code snippets, I often need to manually input filenames into Neovim. While Telescope aids in this process, automatic file and line updates would be ideal.
Given these considerations, nvim-dap appears to be a promising solution. However, I haven't yet determined the configuration for integrating rr with nvim-dap.
Live grep
Telescope's extension telescope-fzf-native is useful.
I've defined mappings to streamline directory and project-wide searches using Telescope's live grep functionality:
Additionally, I've mapped M-n to insert the word under the cursor, mimicking Emacs Ivy's M-n (ivy-next-history-element) behavior.
Task runner
I use overseer.nvim to run build commands like ninja -C /tmp/Debug llc llvm-mc. This plugin allows me to view build errors directly in Neovim's quickfix window.
Following LazyVim, I use <leader>oo to run builds and <leader>ow to toggle the overseer window. To navigate errors, I use trouble.nvim with the ]q and [q keys.
nmap('<leader>oo', '<cmd>OverseerRun<cr>')
nmap('<leader>ow', '<cmd>OverseerToggle<cr>')
nmap('[q', function()
  if require('trouble').is_open() then
    require('trouble').prev({ skip_groups = true, jump = true })
  else
    local ok, err = pcall(vim.cmd.cprev)
    if not ok then
      vim.notify(err, vim.log.levels.ERROR)
    end
  end
end)
nmap(']q', function()
  if require('trouble').is_open() then
    require('trouble').next({ skip_groups = true, jump = true })
  else
    local ok, err = pcall(vim.cmd.cnext)
    if not ok then
      vim.notify(err, vim.log.levels.ERROR)
    end
  end
end)
Reducing reliance on terminal multiplexer
As https://rutar.org/writing/from-vim-and-tmux-to-neovim/ nicely summarizes, running Neovim under tmux has some annoyances. I've been experimenting with reducing my reliance on zellij. Instead, I'll make more use of Neovim's terminal functionality.
toggleterm.nvim is a particularly useful plugin that allows me to easily split windows, open terminals, and hide them when not in use.
The default command <C-\><C-n> (switch to the Normal mode) is clumsy. I've mapped it to <C-s> (useless feature pause transmission, fwd-i-search in zsh).
tmap('<C-s>', '<C-\\><C-n>')
-- Binding C-/ doesn't work in tmux/zellij
map({'n', 't'}, '<C-/>', '<cmd>ToggleTerm<cr>')
-- This actually binds C-/ in tmux/zellij
map({'n', 't'}, '<C-_>', '<cmd>ToggleTerm<cr>')
neovim-remote allows me to open files without starting a nested Neovim process.
I use mini.sessions to manage sessions.
Config switcher
Neovim's NVIM_APPNAME feature is fantastic for exploring pre-configured distributions to get inspiration.
Lua
Neovim embraces Lua 5.1 as a preferred scripting language. While Lua's syntax is lightweight and easy to learn, it doesn't shy away from convenience features like func 'arg' and func {a=42}.
LuaJIT offers exceptional performance.
LuaJIT with the JIT enabled is much faster than all of the other languages benchmarked, including Wren, because Mike Pall is a robot from the future. -- wren.io
This translates into noticeably smoother editing with LSP, especially for hefty C++ files – a significant advantage over Emacs. With Emacs, I've always felt that editing a large C++ file is slow.
Non-default local variables and 1-based indexing (shared with languages like Awk and Julia) are annoyances that I can live with when using a configuration language. So far, I've only needed index-sensitive looping in one specific location.
-- For LSP semantic tokens
for type, colors in pairs(all_colors) do
  for i = 1,#colors do
    vim.api.nvim_set_hl(0, string.format('@lsp.typemod.%s.id%s.cpp', type, i-1), {fg=colors[i]})
  end
end
Dual-role keys
I utilize the software keyboard remapper kanata to make some keys act both as normal keys and as modifiers. I have followed the guide https://shom.dev/start/using-kanata-to-remap-any-keyboard/ as the official configuration guide is intimidating.
~/.config/kanatta/config.kbd is my current configuration. A simplified version is provided below:
(defsrc
  tab  q w e r t y u i o p [
  caps a s d f g h j k l ; '
  lsft z x c v b n m , . / rsft
)
(deflayer default
  @tab _ _ _ _ _ _ _ _ _ _ _
  @cap @a @s @d @f _ _ @j @k @l @; _
  _ _ _ _ _ _ _ _ _ _ _ _
)
(deflayer extend
  _ _ _ _ lrld _ _ C-S-tab C-tab _ _ _
  _ _ _ _ _ _ left down up rght _ _
  _ _ _ _ _ _ home pgdn pgup end _ _
)

(defchordsv2
  (j k) esc 100 all-released ()
  (k l) = 100 all-released ()
  (j l) S-= 100 all-released ()
  (l ;) - 100 all-released ()
)
I've spent countless hours writing and reading C++ code. For many years, Emacs has been my primary editor, and I leverage ccls' (my C++ language server) rainbow semantic highlighting feature.
The feature relies on two custom notification messages, $ccls/publishSemanticHighlight and $ccls/publishSkippedRanges. $ccls/publishSemanticHighlight provides a list of symbols, each with kind information (function, type, or variable) of itself and its semantic parent (e.g. a member function's parent is a class), storage duration, and a list of ranges.
struct CclsSemanticHighlightSymbol {
  int id = 0;
  SymbolKind parentKind;
  SymbolKind kind;
  uint8_t storage;
  std::vector<std::pair<int, int>> ranges;

  std::vector<lsRange> lsRanges; // Only used by vscode-ccls
};
An editor can use consistent colors to highlight different occurrences of a symbol. Different colors can be assigned to different symbols.
Tobias Pisani created emacs-cquery (the predecessor to emacs-ccls) in Nov 2017. Despite not being a fan of Emacs Lisp, I added the rainbow semantic highlighting feature for my own use in early 2018. My setup also relied heavily on these two settings:
Bolding and underlining variables of static storage duration
Key symbol properties (member, static) were visually prominent in my Emacs environment.
My Emacs hacking days are a distant memory – beyond basic configuration tweaks, I haven't touched elisp code since 2018. As my Elisp skills faded, I increasingly turned to Neovim for various editing tasks. Naturally, I wanted to migrate my C++ development workflow to Neovim as well. However, a major hurdle emerged: Neovim lacked the beloved rainbow highlighting I enjoyed in Emacs.
Thankfully, Neovim supports "semantic tokens" from LSP 3.16, a standardized approach adopted by many editors.
I've made changes to ccls (available on a branch; PR) to support semantic tokens. This involves adapting the $ccls/publishSemanticHighlight code to additionally support textDocument/semanticTokens/full and textDocument/semanticTokens/range.
I utilize a few token modifiers (static, classScope, functionScope, namespaceScope) for highlighting:
vim.cmd([[
hi @lsp.mod.classScope.cpp gui=italic
hi @lsp.mod.static.cpp gui=bold
hi @lsp.typemod.variable.namespaceScope.cpp gui=bold,underline
]])
treesitter, tokyonight-moon
While this approach is a significant improvement over relying solely on nvim-treesitter, I'm still eager to implement rainbow semantic tokens. Although LSP semantic tokens don't directly distinguish symbols, we can create custom modifiers to achieve similar results.
In the user-provided initialization options, I set highlight.rainbow to 10.
ccls assigns the same modifier ID to tokens belonging to the same symbol, aiming for unique IDs for different symbols. While we only have a few predefined IDs (each linked to a specific color), there's a slight possibility of collisions. However, this is uncommon and generally acceptable.
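The idea can be sketched as follows. This is a hypothetical illustration rather than the code in ccls: rainbowModifier and the modulo mapping are assumptions, with the rainbow parameter playing the role of highlight.rainbow.

```cpp
#include <string>

// Map a symbol's numeric id to one of `rainbow` modifier names ("id0" to
// "id9" when rainbow is 10). Every occurrence of a symbol gets the same
// modifier; two distinct symbols collide when their ids are congruent
// modulo `rainbow`.
std::string rainbowModifier(unsigned symbolId, unsigned rainbow) {
  if (rainbow == 0)
    return ""; // rainbow highlighting disabled
  return "id" + std::to_string(symbolId % rainbow);
}
```

With a scheme like this, symbols 3 and 13 would share the modifier id3 when rainbow is 10, which is the kind of collision described above.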
For a token with type variable, Neovim's built-in LSP plugin assigns a highlight group @lsp.typemod.variable.id$i.cpp where $i is an integer between 0 and 9. This allows us to customize a unique foreground color for each modifier ID.
local func_colors = {
  '#e5b124', '#927754', '#eb992c', '#e2bf8f', '#d67c17',
  '#88651e', '#e4b953', '#a36526', '#b28927', '#d69855',
}
local type_colors = {
  '#e1afc3', '#d533bb', '#9b677f', '#e350b6', '#a04360',
  '#dd82bc', '#de3864', '#ad3f87', '#dd7a90', '#e0438a',
}
local param_colors = {
  '#e5b124', '#927754', '#eb992c', '#e2bf8f', '#d67c17',
  '#88651e', '#e4b953', '#a36526', '#b28927', '#d69855',
}
local var_colors = {
  '#429921', '#58c1a4', '#5ec648', '#36815b', '#83c65d',
  '#419b2f', '#43cc71', '#7eb769', '#58bf89', '#3e9f4a',
}
local all_colors = {
  class = type_colors,
  constructor = func_colors,
  enum = type_colors,
  enumMember = var_colors,
  field = var_colors,
  ['function'] = func_colors,
  method = func_colors,
  parameter = param_colors,
  struct = type_colors,
  typeAlias = type_colors,
  typeParameter = type_colors,
  variable = var_colors
}
for type, colors in pairs(all_colors) do
  for i = 1,#colors do
    for _, lang in pairs({'c', 'cpp'}) do
      vim.api.nvim_set_hl(0, string.format('@lsp.typemod.%s.id%s.%s', type, i-1, lang), {fg=colors[i]})
    end
  end
end

vim.cmd([[
hi @lsp.mod.classScope.cpp gui=italic
hi @lsp.mod.static.cpp gui=bold
hi @lsp.typemod.variable.namespaceScope.cpp gui=bold,underline
]])
Now, let's analyze the C++ code above using this configuration.
tokyonight-moon
While the results are visually pleasing, I need help implementing code lens functionality.
Inactive code highlighting
Inactive code regions (skipped ranges in Clang) are typically displayed in grey. While this can be helpful for identifying unused code, it can sometimes hinder understanding the details. I simply disabled the inactive code feature.
#ifdef X
... // colorful
#else
... // normal instead of grey
#endif
Refresh
When opening a large project, the initial indexing or cache loading process can be time-consuming, often leading to empty lists of semantic tokens for the initially opened files. While ccls prioritizes indexing these files, it's unclear how to notify the client to refresh the files. The existing workspace/semanticTokens/refresh request, unfortunately, doesn't accept text document parameters.
In contrast, with $ccls/publishSemanticHighlight, ccls proactively sends the notification after an index update (see main_OnIndexed).
  // Update indexed content, skipped ranges, and semantic highlighting.
  if (update->files_def_update) {
    auto &def_u = *update->files_def_update;
    if (WorkingFile *wfile = wfiles->getFile(def_u.first.path)) {
      wfile->setIndexContent(g_config->index.onChange ? wfile->buffer_content
                                                      : def_u.second);
      QueryFile &file = db->files[update->file_id];
      // Publish notifications to the file.
      emitSkippedRanges(wfile, file);
      emitSemanticHighlight(db, wfile, file);
      // But how do we send a workspace/semanticTokens/refresh request?????
    }
  }
}
While the semantic token request supports partial results in the specification, Neovim does not implement them. Even if it did, I believe a notification message with a text document parameter would be a more efficient and direct approach.
export interface SemanticTokensParams extends WorkDoneProgressParams,
	PartialResultParams {
	/**
	 * The text document.
	 */
	textDocument: TextDocumentIdentifier;
}
Other clients
emacs-ccls
Once this feature branch is merged, Emacs users can simply remove the following lines:
(setq lsp-semantic-tokens-enable t)
(defface lsp-face-semhl-namespace-scope
  '((t :weight bold)) "highlight for namespace scope symbols" :group 'lsp-semantic-tokens)
(cl-loop for color in '("#429921" "#58c1a4" "#5ec648" "#36815b" "#83c65d"
                        "#417b2f" "#43cc71" "#7eb769" "#58bf89" "#3e9f4a")
         for i = 0 then (1+ i)
         do (custom-declare-face (intern (format "lsp-face-semhl-id%d" i))
                                 `((t :foreground ,color))
                                 "" :group 'lsp-semantic-tokens))
(setq lsp-semantic-token-modifier-faces
      `(("declaration" . lsp-face-semhl-interface)
        ("definition" . lsp-face-semhl-definition)
        ("implementation" . lsp-face-semhl-implementation)
        ("readonly" . lsp-face-semhl-constant)
        ("static" . lsp-face-semhl-static)
        ("deprecated" . lsp-face-semhl-deprecated)
        ("abstract" . lsp-face-semhl-keyword)
        ("async" . lsp-face-semhl-macro)
        ("modification" . lsp-face-semhl-operator)
        ("documentation" . lsp-face-semhl-comment)
        ("defaultLibrary" . lsp-face-semhl-default-library)
        ("classScope" . lsp-face-semhl-member)
        ("namespaceScope" . lsp-face-semhl-namespace-scope)
        ,@(cl-loop for i from 0 to 10 collect
                   (cons (format "id%d" i)
                         (intern (format "lsp-face-semhl-id%d" i))))
        ))
vscode-ccls
We require assistance to eliminate the $ccls/publishSemanticHighlight feature and adopt built-in semantic tokens support. Due to the lack of active maintenance for vscode-ccls, I'm unable to maintain this plugin for an editor I don't frequently use.
Misc
I use a trick to switch ccls builds without changing editor configurations.
[sanitizer] Reject unsupported -static at link time
__asan_register_elf_globals: properly check the "no instrumented global variable" case
[asan,test] Disable _FORTIFY_SOURCE test incompatible with glibc 2.40
LLVM binary utilities
[llvm-readobj,ELF] Support --decompress/-z
[llvm-objcopy] Improve help messages
[llvm-readelf] Print a blank line for the first hex/string dump
[llvm-objcopy] Add --compress-sections
[llvm-readelf] Print more information for RELR
Hashing
I optimized the bit mixer used by llvm::DenseMap<std::pair<X, Y>> and llvm::DenseMap<std::tuple<X...>>. llvm/ADT/Hashing.h, used by StringRef hashing and DenseMap, was supposed to be non-deterministic. Despite this, a lot of code relied on a specific iteration order. I made multiple fixes across the code base and landed [Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS to improve test coverage (e.g. assertion builds) and ensure future flexibility to replace the algorithm.
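To illustrate why a seed exposes hidden dependencies on iteration order, here is a minimal sketch of a seeded hash for a pair of integers. It is not the algorithm in llvm/ADT/Hashing.h: mix and hashPair are hypothetical names, and the mixer is the well-known 64-bit finalizer from MurmurHash3, chosen only because it is short.

```cpp
#include <cstdint>
#include <utility>

// 64-bit finalizer from MurmurHash3 (fmix64), used here as a generic mixer.
// Each step (xor-shift, multiplication by an odd constant) is invertible, so
// distinct inputs always produce distinct outputs.
inline uint64_t mix(uint64_t h) {
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// Hash a pair by packing both halves into one word and folding in a seed.
// A different seed yields a different hash for the same key, so the bucket a
// key lands in, and therefore the iteration order, changes with the seed.
inline uint64_t hashPair(std::pair<uint32_t, uint32_t> p, uint64_t seed) {
  uint64_t packed = (uint64_t(p.first) << 32) | p.second;
  return mix(packed ^ seed);
}
```

Code that silently depends on iterating a hash table in one particular order keeps working while the seed is fixed, and starts producing different output as soon as the seed varies between runs, which is what makes such bugs visible in tests.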
I optimized DenseMap::{find,erase}, yielding compile time improvement.
Optimizations to the bit mixer in Hashing.h and the DenseMap code have yielded significant benefits, reducing both compile time and code size. This suggests there's further potential for improvement in this area.
However, the reduced code size also highlights the potentially significant code size increase when considering faster unordered map implementations like boost::unordered_flat_map, Abseil's SwissTable, and Folly's F14. While these libraries may offer better performance, they often come with a significant increase in code complexity and size.
Introducing a new container alongside DenseMap to selectively replace performance-critical instances could lead to substantial code modifications. This approach requires careful consideration to balance potential performance gains with the additional complexity.
NumericalStabilitySanitizer
NumericalStabilitySanitizer is a new feature for the 19.x releases. I have made many changes on the compiler-rt part.
Options used by the LLVM integrated assembler are currently handled in an ad-hoc way. There is deduplication with and without LTO. Eventually we might want to adopt TableGen for these -Wa, options.
I reviewed a wide range of patches, including areas like ADT/Support, binary utilities, MC, lld, clangDriver, LTO, sanitizers, LoongArch, RISC-V, and new features like NumericalStabilitySanitizer and RealTimeSanitizer.
To quantify my involvement, a search for patches I commented on (repo:llvm/llvm-project is:pr -author:MaskRay commenter:MaskRay created:>2024-01-23) yields 780 results.