LLVM integrated assembler: Improving MCExpr and MCValue
In my previous post,
The LLVM integrated assembler handles fixups and relocatable expressions as distinct entities. Relocatable expressions, in particular, are encoded using the `MCValue` class, which originally looked like this:
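```cpp
class MCValue {
  const MCSymbolRefExpr *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t RefKind = 0;
};
```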
In this structure:
- `RefKind` acts as an optional relocation specifier, though only a handful of targets actually use it.
- `SymA` represents an optional symbol reference (the addend).
- `SymB` represents another optional symbol reference (the subtrahend).
- `Cst` holds a constant value.
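To make the roles of the four fields concrete, here is a minimal sketch that models a value of the form `SymA - SymB + Cst`. `ToySymbolRef`, `ToyValue`, and `toString` are hypothetical stand-ins made up for illustration, not LLVM classes; only the field names follow the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for a symbol reference expression (NOT the LLVM class).
struct ToySymbolRef {
  std::string Name;  // referenced symbol
  uint16_t Kind = 0; // specifier packed into the reference (hypothetical)
};

// Mirrors the four fields described above: SymA - SymB + Cst, plus RefKind.
struct ToyValue {
  const ToySymbolRef *SymA = nullptr; // addend symbol, may be null
  const ToySymbolRef *SymB = nullptr; // subtrahend symbol, may be null
  int64_t Cst = 0;                    // constant part
  uint32_t RefKind = 0;               // optional relocation specifier

  // A value with no symbols is a plain constant.
  bool isAbsolute() const { return !SymA && !SymB; }
};

// Render the value roughly the way it would be written in assembly.
std::string toString(const ToyValue &V) {
  std::string S;
  if (V.SymA)
    S += V.SymA->Name;
  if (V.SymB)
    S += "-" + V.SymB->Name;
  if (V.Cst != 0 || S.empty()) {
    if (V.Cst >= 0 && !S.empty())
      S += "+";
    S += std::to_string(V.Cst);
  }
  return S;
}
```

With `SymA` naming `a`, `SymB` naming `b`, and `Cst` set to 4, `toString` yields `a-b+4`; with no symbols at all, the value is just its constant.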
While functional, this design had its flaws. For one, the way relocation specifiers were encoded varied across architectures:

- Targets like COFF, Mach-O, and ELF's PowerPC, SystemZ, and X86 embed the relocation specifier within `MCSymbolRefExpr *SymA` as part of `SubclassData`.
- Conversely, ELF targets such as AArch64, MIPS, and RISC-V store it as a target-specific subclass of `MCTargetExpr`, and convert it to `MCValue::RefKind` during `MCExpr::evaluateAsRelocatable`.
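The practical consequence of the two encodings is that generic code had to look in two places for the specifier. The sketch below is a deliberately simplified model of that situation. All types, the `Spec_*` values, and the `specifierOf` helper are assumptions made up for illustration; LLVM did not have a single helper like this.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical specifier values, for illustration only.
enum ToySpecifier : uint16_t { Spec_None = 0, Spec_GOT = 1, Spec_LO12 = 2 };

// Toy symbol reference that can carry a specifier itself.
struct OldSymRef {
  std::string Sym;
  uint16_t Kind = Spec_None;
};

// Toy version of the old value: specifier may sit in SymA->Kind or RefKind.
struct OldValue {
  const OldSymRef *SymA = nullptr;
  uint32_t RefKind = 0;
};

// Style 1: the specifier stays on the symbol reference; RefKind is unused.
OldValue evalPacked(const OldSymRef &Ref) {
  OldValue V;
  V.SymA = &Ref;
  return V;
}

// Style 2: a target-specific wrapper holds the specifier and wraps a plain
// reference; evaluation copies the specifier into RefKind.
struct WrapperExpr {
  uint16_t Spec = Spec_None;
  const OldSymRef *Inner = nullptr;
};

OldValue evalWrapped(const WrapperExpr &E) {
  OldValue V;
  V.SymA = E.Inner;
  V.RefKind = E.Spec;
  return V;
}

// A generic consumer has to check both locations.
uint32_t specifierOf(const OldValue &V) {
  if (V.RefKind)
    return V.RefKind;
  return V.SymA ? V.SymA->Kind : 0;
}
```

Both styles end up describing "symbol plus specifier", but the information lives in different fields, which is the inconsistency described above.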
Another issue was with `SymB`. Despite being typed as `const MCSymbolRefExpr *`, its `MCSymbolRefExpr::VariantKind` field went unused. This is because expressions like `add - sub@got` are not relocatable.
Over the weekend, I tackled these inconsistencies and reworked the representation into something cleaner:
```cpp
class MCValue {
  const MCSymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};
```
This updated design not only aligns more closely with the concept of relocatable expressions but also shaves off some compile time in LLVM. The ambiguous `RefKind` has been renamed to `Specifier` for clarity. Additionally, targets that previously encoded the relocation specifier within `MCSymbolRefExpr` (rather than using `MCTargetExpr`) can now access it directly via `MCValue::Specifier`.
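As a rough picture of the reworked shape, the following sketch stores plain symbol pointers and keeps the specifier on the value itself. `ToySymbol` and `ToyMCValue` are hypothetical stand-ins, and the `get` factory signature is an assumption for this example; only the accessor names `getAddSym`, `getSubSym`, and `getSpecifier` come from this post.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for a symbol (NOT the LLVM class).
struct ToySymbol {
  std::string Name;
};

// Toy value in the reworked shape: symbols are plain pointers and the
// specifier is a property of the whole value.
class ToyMCValue {
  const ToySymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;

public:
  // Hypothetical factory; the parameter order is an assumption.
  static ToyMCValue get(const ToySymbol *A, const ToySymbol *B, int64_t C,
                        uint32_t Spec = 0) {
    ToyMCValue R;
    R.SymA = A;
    R.SymB = B;
    R.Cst = C;
    R.Specifier = Spec;
    return R;
  }

  const ToySymbol *getAddSym() const { return SymA; } // addend symbol
  const ToySymbol *getSubSym() const { return SymB; } // subtrahend symbol
  int64_t getConstant() const { return Cst; }
  uint32_t getSpecifier() const { return Specifier; }
  bool isAbsolute() const { return !SymA && !SymB; }
};
```

Because the subtrahend is now a bare symbol pointer, there is no slot where a specifier on `SymB` could even be stored, which matches the observation that `add - sub@got` is not relocatable.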
To support this change, I made a few adjustments:
- Introduced `getAddSym` and `getSubSym` methods, returning `const MCSymbol *`, as replacements for `getSymA` and `getSymB`.
- Eliminated dependencies on the old accessors, `MCValue::getSymA` and `MCValue::getSymB`.
- Reworked the expression folding code that handles `+` and `-`
- Stored the `const MCSymbolRefExpr *SymA` specifier at `MCValue::Specifier`
- Some targets relied on PC-relative fixups with explicit specifiers forcing relocations. I have defined `MCAsmBackend::shouldForceRelocation` for SystemZ and cleaned up ARM and PowerPC
- Changed the type of `SymA` and `SymB` to `const MCSymbol *`
- Replaced the temporary `getSymSpecifier` with `getSpecifier`
- Replaced the legacy `getAccessVariant` with `getSpecifier`
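The folding of `+` and `-` mentioned in the list above can be pictured with a small sketch. It combines two values of the form `Add - Sub + Cst` and fails when the result would need more than one addend or more than one subtrahend. This is a simplified model under my own assumptions, not the LLVM implementation: `Sym`, `Val`, and `fold` are hypothetical names, and the sketch ignores specifiers and the case where two different symbols in the same section fold to a constant.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

struct Sym {
  std::string Name;
};

// Toy relocatable value: Add - Sub + Cst.
struct Val {
  const Sym *Add = nullptr;
  const Sym *Sub = nullptr;
  int64_t Cst = 0;
};

// Fold L + R, or L - R when Negate is true. Returns false if the result
// cannot be expressed with at most one addend and one subtrahend symbol.
bool fold(Val L, Val R, bool Negate, Val &Res) {
  if (Negate) {
    // -(Add - Sub + Cst) == Sub - Add - Cst
    std::swap(R.Add, R.Sub);
    R.Cst = -R.Cst;
  }
  // The same symbol with opposite signs cancels out.
  if (L.Add && L.Add == R.Sub) {
    L.Add = nullptr;
    R.Sub = nullptr;
  }
  if (R.Add && R.Add == L.Sub) {
    R.Add = nullptr;
    L.Sub = nullptr;
  }
  // Two addends or two subtrahends cannot be represented.
  if ((L.Add && R.Add) || (L.Sub && R.Sub))
    return false;
  Res.Add = L.Add ? L.Add : R.Add;
  Res.Sub = L.Sub ? L.Sub : R.Sub;
  Res.Cst = L.Cst + R.Cst;
  return true;
}
```

For example, `(a + 4) - (b + 1)` folds to `a - b + 3`, `(a + 4) - a` folds to the constant 4, and `a + b` is rejected.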
Streamlining Mach-O support
Mach-O assembler support in LLVM has accumulated significant technical debt, impacting both target-specific and generic code. One particularly nagging issue was the `const SectionAddrMap *Addrs` parameter in `MCExpr::evaluateAs*` functions. This parameter existed to handle cross-section label differences, primarily for generating (compact) unwind information in Mach-O. A typical example of this can be seen in assembly like:
```asm
.section __TEXT,__text,regular,pure_instructions
```
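To illustrate what a section address map contributes when evaluating a cross-section label difference, here is a minimal sketch. It is a toy model under my own assumptions: `ToyLabel`, `ToySectionAddrMap`, and `labelDifference` are hypothetical, sections are keyed by name rather than by section object, and the section names in the usage note are invented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy label: a section name plus an offset within that section.
struct ToyLabel {
  std::string Section;
  uint64_t Offset;
};

// Toy address map: section name -> assigned start address.
using ToySectionAddrMap = std::map<std::string, uint64_t>;

// Try to compute A - B as a constant. Labels in the same section need no
// extra information. Labels in different sections can only be resolved
// when an address map supplies both section start addresses.
bool labelDifference(const ToyLabel &A, const ToyLabel &B,
                     const ToySectionAddrMap *Addrs, int64_t &Res) {
  if (A.Section == B.Section) {
    Res = static_cast<int64_t>(A.Offset - B.Offset);
    return true;
  }
  if (!Addrs)
    return false;
  auto IA = Addrs->find(A.Section);
  auto IB = Addrs->find(B.Section);
  if (IA == Addrs->end() || IB == Addrs->end())
    return false;
  Res = static_cast<int64_t>((IA->second + A.Offset) -
                             (IB->second + B.Offset));
  return true;
}
```

Without a map, a difference between a label in `__eh_frame` and one in `__text` cannot be folded; once both sections have known addresses, it becomes a plain constant. Threading such a map through every evaluation call is what made the parameter feel intrusive.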
The `SectionAddrMap *Addrs` parameter always felt like a clunky workaround to me. It wasn’t until I dug into the code that I was able to replace the `SectionAddrMap` workaround for ARM and X86 and eliminate the parameter:
- [MC,MachO] Replace SectionAddrMap workaround with cleaner variable handling
- MCExpr: Remove unused SectionAddrMap workaround
While I was at it, I also tidied up `MCSymbolRefExpr` by removing `HasSubsectionsViaSymbolsBit`, further simplifying the codebase.