LLVM integrated assembler: Improving sections and symbols

My previous post, LLVM integrated assembler: Improving expressions and relocations, delved into enhancements made to LLVM's expression resolving and relocation generation. This post covers recent refinements to MC, focusing on sections and symbols.

Sections

Sections are named, contiguous blocks of code or data within an object file. They allow you to logically group related parts of your program. The assembler places code and data into these sections as it processes the source file.

class MCSection {
  ...
  enum SectionVariant {
    SV_COFF = 0,
    SV_ELF,
    SV_GOFF,
    SV_MachO,
    SV_Wasm,
    SV_XCOFF,
    SV_SPIRV,
    SV_DXContainer,
  };

In LLVM 20, the MCSection class used an enum called SectionVariant to differentiate between various object file formats, such as ELF, Mach-O, and COFF. This enum has since been removed. The format-specific subclasses of MCSection are used in contexts where the section type is known at compile time, such as in MCStreamer and MCObjectTargetWriter, so a runtime discriminator is unnecessary. This change eliminates the need for runtime type information (RTTI) checks, simplifying the codebase and improving efficiency.
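As a rough illustration of why the discriminator can go away, the sketch below shows a format-specific streamer that only ever creates sections of its own format and can therefore downcast statically. The names `SectionBase`, `SectionELFSketch`, and `ELFStreamerSketch` are hypothetical stand-ins and do not mirror the real LLVM classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a format-independent section base class.
// Note that it stores no per-format discriminator enum.
class SectionBase {
public:
  explicit SectionBase(std::string Name) : Name(std::move(Name)) {}
  const std::string &getName() const { return Name; }

private:
  std::string Name;
};

// Hypothetical ELF-specific subclass carrying ELF-only attributes.
class SectionELFSketch : public SectionBase {
public:
  SectionELFSketch(std::string Name, uint32_t Type, uint32_t Flags)
      : SectionBase(std::move(Name)), Type(Type), Flags(Flags) {}
  uint32_t getType() const { return Type; }
  uint32_t getFlags() const { return Flags; }

private:
  uint32_t Type;
  uint32_t Flags;
};

// An ELF-only streamer creates every section it later switches to, so the
// concrete type is known at compile time and no runtime tag is consulted.
class ELFStreamerSketch {
public:
  SectionELFSketch &createSection(std::string Name, uint32_t Type,
                                  uint32_t Flags) {
    Sections.push_back(
        std::make_unique<SectionELFSketch>(std::move(Name), Type, Flags));
    return *Sections.back();
  }

  void switchSection(SectionBase &S) { Current = &S; }

  // Static downcast: valid because this streamer only handles ELF sections.
  const SectionELFSketch &currentELFSection() const {
    assert(Current && "no current section");
    return static_cast<const SectionELFSketch &>(*Current);
  }

private:
  std::vector<std::unique_ptr<SectionELFSketch>> Sections;
  SectionBase *Current = nullptr;
};
```

The trade-off is that correctness of the cast rests on an invariant of the surrounding code (an ELF streamer never sees a non-ELF section) rather than on a checked tag.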

Additionally, the storage for fragments' fixups (adjustments to addresses and offsets) has been moved into the MCSection class.
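One way to picture section-owned fixup storage is a single vector in the section, with each fragment holding only an index range into it. The sketch below assumes that layout; the type and field names (`FixupSketch`, `FragmentSketch`, `SectionWithFixups`, `FixupStart`, `FixupEnd`) are illustrative assumptions, not LLVM's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified fixup record.
struct FixupSketch {
  uint32_t Offset; // offset within the fragment's contents
  uint32_t Kind;   // target-specific fixup kind
};

// A fragment no longer owns a container of fixups. It only records a
// half-open index range [FixupStart, FixupEnd) into the section's storage.
struct FragmentSketch {
  uint32_t FixupStart = 0;
  uint32_t FixupEnd = 0;
};

class SectionWithFixups {
public:
  // Appends a new, empty fragment and returns its index.
  std::size_t addFragment() {
    Fragments.push_back(FragmentSketch{});
    return Fragments.size() - 1;
  }

  // Appends a fixup to the most recently added fragment. Restricting
  // appends to the last fragment keeps each fragment's range contiguous.
  void addFixup(FixupSketch F) {
    assert(!Fragments.empty() && "add a fragment first");
    FragmentSketch &Last = Fragments.back();
    if (Last.FixupStart == Last.FixupEnd)
      Last.FixupStart = static_cast<uint32_t>(FixupStorage.size());
    FixupStorage.push_back(F);
    Last.FixupEnd = static_cast<uint32_t>(FixupStorage.size());
  }

  std::size_t numFixups(std::size_t FragIdx) const {
    const FragmentSketch &F = Fragments.at(FragIdx);
    return F.FixupEnd - F.FixupStart;
  }

  const FixupSketch &getFixup(std::size_t FragIdx, std::size_t I) const {
    const FragmentSketch &F = Fragments.at(FragIdx);
    assert(I < numFixups(FragIdx) && "fixup index out of range");
    return FixupStorage[F.FixupStart + I];
  }

private:
  std::vector<FragmentSketch> Fragments;
  std::vector<FixupSketch> FixupStorage; // shared by all fragments
};
```

With this layout a fragment without fixups costs two 32-bit integers instead of a container object, and all fixups of a section sit in one allocation.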

Symbols

Symbols are names that represent memory addresses or values.

class MCSymbol {
protected:
  /// The kind of the symbol. If it is any value other than unset then this
  /// class is actually one of the appropriate subclasses of MCSymbol.
  enum SymbolKind {
    SymbolKindUnset,
    SymbolKindCOFF,
    SymbolKindELF,
    SymbolKindGOFF,
    SymbolKindMachO,
    SymbolKindWasm,
    SymbolKindXCOFF,
  };

  /// A symbol can contain an Offset, or Value, or be Common, but never more
  /// than one of these.
  enum Contents : uint8_t {
    SymContentsUnset,
    SymContentsOffset,
    SymContentsVariable,
    SymContentsCommon,
    SymContentsTargetCommon, // Index stores the section index
  };

Similar to sections, the MCSymbol class also used a discriminator enum, SymbolKind, to distinguish between object file formats. This enum has also been removed.

Furthermore, the MCSymbol class had an enum Contents to specify the kind of symbol. This name was a bit confusing, so it has been renamed to enum Kind for clarity.

A special enumerator, SymContentsTargetCommon, which was used by AMDGPU for a specific type of common symbol, has also been removed. The functionality it provided is now handled by updating ELFObjectWriter to respect the symbol's section index (SHN_AMDGPU_LDS for this special AMDGPU symbol).
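The section-index handling can be sketched roughly as follows: a common symbol gets SHN_COMMON unless it already carries an explicit reserved index, in which case the writer keeps that index. `CommonSymbolSketch` and `sectionIndexFor` are hypothetical names, and the logic is an assumption about the general shape of the check rather than the actual ELFObjectWriter code. The constant values are those defined by the ELF specification and LLVM's ELF.h.

```cpp
#include <cassert>
#include <cstdint>

// Reserved ELF section indices.
constexpr uint16_t SHN_UNDEF = 0;
constexpr uint16_t SHN_AMDGPU_LDS = 0xff00; // AMDGPU-specific (SHN_LOPROC)
constexpr uint16_t SHN_COMMON = 0xfff2;

// Hypothetical, simplified symbol. SectionIndex == 0 means "not set".
struct CommonSymbolSketch {
  bool IsCommon = false;
  uint16_t SectionIndex = 0;
};

// Chooses the st_shndx value for a symbol. Only the common-symbol path is
// modeled; every other symbol is reported as undefined in this sketch.
uint16_t sectionIndexFor(const CommonSymbolSketch &Sym) {
  if (!Sym.IsCommon)
    return SHN_UNDEF;
  // Respect an explicit index set by the target, otherwise use SHN_COMMON.
  return Sym.SectionIndex != 0 ? Sym.SectionIndex : SHN_COMMON;
}
```

Under this scheme the target expresses the special case through data (the index) instead of a dedicated enumerator, so the generic symbol class needs no AMDGPU-specific state.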

sizeof(MCSymbol) has been reduced to 24 bytes on 64-bit systems.

The previous blog post, LLVM integrated assembler: Improving expressions and relocations, describes other changes:

  • The MCSymbol::IsUsed flag was a workaround for detecting a subset of invalid reassignments and has been removed.
  • The MCSymbol::IsResolving flag has been added to detect cyclic dependencies of equated symbols.
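The idea behind an "is resolving" flag can be shown with a small model: mark a symbol while its definition is being evaluated, and treat re-entering a marked symbol as a cycle. The sketch below is a simplified stand-in in which every equated symbol is either a constant or `Target + Addend`. `EquatedSymbol`, `SymbolTable`, and `evaluate` are hypothetical and far simpler than MC's expression evaluation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of an equated symbol: a constant when Target is
// empty, otherwise the value of Target plus Addend.
struct EquatedSymbol {
  std::string Target;
  int64_t Addend = 0;
  bool IsResolving = false; // set only while this symbol is being evaluated
};

using SymbolTable = std::map<std::string, EquatedSymbol>;

// Evaluates Name. Returns true and stores the value in Result on success.
// Returns false if a symbol is undefined or the definitions form a cycle.
bool evaluate(SymbolTable &Syms, const std::string &Name, int64_t &Result) {
  auto It = Syms.find(Name);
  if (It == Syms.end())
    return false; // undefined symbol
  EquatedSymbol &Sym = It->second;
  if (Sym.Target.empty()) {
    Result = Sym.Addend;
    return true;
  }
  if (Sym.IsResolving)
    return false; // re-entered while still resolving: cyclic dependency
  Sym.IsResolving = true;
  int64_t Base = 0;
  bool Ok = evaluate(Syms, Sym.Target, Base);
  Sym.IsResolving = false; // always clear the flag, on success or failure
  if (Ok)
    Result = Base + Sym.Addend;
  return Ok;
}
```

For example, with `a = 10` and `b = a + 2`, evaluating `b` yields 12, while a pair such as `x = y` and `y = x` is rejected instead of recursing without bound.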