LLVM integrated assembler: Improving MCExpr and MCValue

By MaskRay
April 6, 2025, 15:00

In my previous post, Relocation Generation in Assemblers, I explored some key concepts behind LLVM's integrated assemblers. This post dives into recent improvements I've made to refine that system.

The LLVM integrated assembler handles fixups and relocatable expressions as distinct entities. Relocatable expressions, in particular, are encoded using the MCValue class, which originally looked like this:

class MCValue {
  const MCSymbolRefExpr *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t RefKind = 0;
};

In this structure:

  • RefKind acts as an optional relocation specifier, though only a handful of targets actually use it.
  • SymA represents an optional symbol reference (the addend).
  • SymB represents another optional symbol reference (the subtrahend).
  • Cst holds a constant value.
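To make the roles of these fields concrete, here is a minimal sketch of what such a value denotes: `SymA - SymB + Cst`. It does not use LLVM headers. `SymRef`, `OldValue`, and `resolve` are hypothetical stand-ins invented for illustration, and the sketch assumes every symbol address is already known, which is generally not the case while assembling.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a symbol reference (the real class is
// MCSymbolRefExpr); only the name is modeled here.
struct SymRef {
  std::string Name;
};

// Mirrors the field layout of the original MCValue described above.
struct OldValue {
  const SymRef *SymA = nullptr; // optional added symbol
  const SymRef *SymB = nullptr; // optional subtracted symbol
  int64_t Cst = 0;              // constant part
  uint32_t RefKind = 0;         // optional relocation specifier (unused here)
};

// Assumed helper: computes SymA - SymB + Cst from a table of known
// symbol addresses. Throws std::out_of_range for an unknown symbol.
int64_t resolve(const OldValue &V,
                const std::map<std::string, int64_t> &Addr) {
  int64_t Result = V.Cst;
  if (V.SymA)
    Result += Addr.at(V.SymA->Name);
  if (V.SymB)
    Result -= Addr.at(V.SymB->Name);
  return Result;
}
```

When one of the symbols cannot be resolved at assembly time, the assembler has to emit a relocation instead of a number, which is why the three parts are kept separate.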

While functional, this design had its flaws. For one, the way relocation specifiers were encoded varied across architectures:

  • Targets like COFF, Mach-O, and ELF's PowerPC, SystemZ, and X86 embed the relocation specifier within MCSymbolRefExpr *SymA as part of SubclassData.
  • Conversely, ELF targets such as AArch64, MIPS, and RISC-V store it as a target-specific subclass of MCTargetExpr, and convert it to MCValue::RefKind during MCValue::evaluateAsRelocatable.

Another issue was with SymB. Despite being typed as const MCSymbolRefExpr *, its MCSymbolRefExpr::VariantKind field went unused. This is because expressions like add - sub@got are not relocatable.

Over the weekend, I tackled these inconsistencies and reworked the representation into something cleaner:

class MCValue {
  const MCSymbol *SymA = nullptr, *SymB = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0;
};

This updated design not only aligns more closely with the concept of relocatable expressions but also shaves off some compile time in LLVM. The ambiguous RefKind has been renamed to Specifier for clarity. Additionally, targets that previously encoded the relocation specifier within MCSymbolRefExpr (rather than using MCTargetExpr) can now access it directly via MCValue::Specifier.

To support this change, I made a few adjustments:

  • Introduced getAddSym and getSubSym methods, returning const MCSymbol *, as replacements for getSymA and getSymB.
  • Eliminated dependencies on the old accessors, MCValue::getSymA and MCValue::getSymB.
  • Reworked the expression folding code that handles + and -.
  • Stored the const MCSymbolRefExpr *SymA specifier at MCValue::Specifier.
  • Some targets relied on PC-relative fixups with explicit specifiers forcing relocations. I have defined MCAsmBackend::shouldForceRelocation for SystemZ and cleaned up ARM and PowerPC.
  • Changed the type of SymA and SymB to const MCSymbol *.
  • Replaced the temporary getSymSpecifier with getSpecifier.
  • Replaced the legacy getAccessVariant with getSpecifier.
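The following is a toy sketch of what folding + and - over values of the new shape can look like. It is not LLVM's implementation. `Symbol`, `Value`, `foldAdd`, and `foldSub` are hypothetical names, only the accessor names getAddSym and getSubSym come from the list above, and the cancellation rule is a deliberate simplification: it only cancels a symbol against an identical symbol, whereas a real assembler can also fold differences between distinct symbols whose offsets are known.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for llvm::MCSymbol; only pointer identity matters.
struct Symbol {
  std::string Name;
};

// Simplified model of the reworked layout: AddSym - SubSym + Cst.
struct Value {
  const Symbol *AddSym = nullptr;
  const Symbol *SubSym = nullptr;
  int64_t Cst = 0;
  uint32_t Specifier = 0; // relocation specifier; not propagated by this toy

  const Symbol *getAddSym() const { return AddSym; }
  const Symbol *getSubSym() const { return SubSym; }
};

// Folds L + R into Res. Returns false (leaving Res untouched) when the sum
// would need more than one added or more than one subtracted symbol, i.e.
// when it no longer fits the AddSym - SubSym + Cst form.
bool foldAdd(const Value &L, const Value &R, Value &Res) {
  const Symbol *Adds[2] = {L.AddSym, R.AddSym};
  const Symbol *Subs[2] = {L.SubSym, R.SubSym};
  // Cancel each added symbol against an identical subtracted symbol.
  for (const Symbol *&A : Adds)
    for (const Symbol *&S : Subs)
      if (A && A == S)
        A = S = nullptr;
  if ((Adds[0] && Adds[1]) || (Subs[0] && Subs[1]))
    return false;
  Res.AddSym = Adds[0] ? Adds[0] : Adds[1];
  Res.SubSym = Subs[0] ? Subs[0] : Subs[1];
  Res.Cst = L.Cst + R.Cst;
  return true;
}

// Folds L - R by negating R (swap the symbols, negate the constant).
bool foldSub(const Value &L, const Value &R, Value &Res) {
  Value Neg;
  Neg.AddSym = R.SubSym;
  Neg.SubSym = R.AddSym;
  Neg.Cst = -R.Cst;
  return foldAdd(L, Neg, Res);
}
```

In this model, `(a + 4) - b` folds to a value with an added symbol `a`, a subtracted symbol `b`, and constant 4, while `a + b` is rejected because it would need two added symbols. Plain symbol pointers are enough for this kind of bookkeeping, which is the motivation for dropping MCSymbolRefExpr from the two symbol fields.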

Streamlining Mach-O support

Mach-O assembler support in LLVM has accumulated significant technical debt, impacting both target-specific and generic code. One particularly nagging issue was the const SectionAddrMap *Addrs parameter in MCExpr::evaluateAs* functions. This parameter existed to handle cross-section label differences, primarily for generating (compact) unwind information in Mach-O. A typical example of this can be seen in assembly like:

        .section        __TEXT,__text,regular,pure_instructions
Leh_func_begin0:
        .section        __TEXT,__eh_frame,coalesced,no_toc+strip_static_syms+live_support
Ltmp3:
Ltmp4 = Leh_func_begin0-Ltmp3
        .long   Ltmp4
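To see why a difference like Leh_func_begin0-Ltmp3 needs section addresses, here is a minimal sketch of the arithmetic. It is an illustration rather than LLVM code. `Label`, `SectionAddrs`, and `labelDifference` are hypothetical names, sections are keyed by string for simplicity, and any addresses passed in are made up by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of a label: the section it lives in plus its offset
// from the start of that section.
struct Label {
  std::string Section;
  uint64_t Offset;
};

// Assumed shape of a section address map: section name -> start address.
using SectionAddrs = std::map<std::string, uint64_t>;

// Computes A - B for labels that may live in different sections. Offsets
// alone are not comparable across sections, so each label is first turned
// into an absolute address using its section's start address.
int64_t labelDifference(const Label &A, const Label &B,
                        const SectionAddrs &Addrs) {
  uint64_t AddrA = Addrs.at(A.Section) + A.Offset;
  uint64_t AddrB = Addrs.at(B.Section) + B.Offset;
  return static_cast<int64_t>(AddrA - AddrB);
}
```

Within a single section the start address cancels out and the offsets suffice. Across sections it does not, which is why the evaluation functions needed some source of section addresses.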

The SectionAddrMap *Addrs parameter always felt like a clunky workaround to me. It wasn't until I dug into the Mach-O AArch64 object writer that I realized this hack wasn't necessary for that writer. This discovery prompted a cleanup effort to remove the dependency on SectionAddrMap for ARM and X86 and eliminate the parameter:

  • [MC,MachO] Replace SectionAddrMap workaround with cleaner variable handling
  • MCExpr: Remove unused SectionAddrMap workaround

While I was at it, I also tidied up MCSymbolRefExpr by removing the clunky HasSubsectionsViaSymbolsBit, further simplifying the codebase.
