Understanding alignment - from source to object file

Author: MaskRay

August 24, 2025, 15:00

Alignment refers to the practice of placing data or code at memory addresses that are multiples of a specific value, typically a power of 2. This is usually done to meet the requirements of the programming language, ABI, or the underlying hardware. Misaligned memory accesses might be expensive or might cause traps on certain architectures.

This blog post explores how alignment is represented and managed as C++ code is transformed through the compilation pipeline: from source code to LLVM IR, assembly, and finally the object file. We'll focus on alignment for both variables and functions.

Alignment in C++ source code

C++ [basic.align] specifies:

Object types have alignment requirements ([basic.fundamental], [basic.compound]) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier ([dcl.align]). Attempting to create an object ([intro.object]) in storage that does not meet the alignment requirements of the object's type is undefined behavior.

alignas can be used to request a stricter alignment. [dcl.align] specifies:

An alignment-specifier may be applied to a variable or to a class data member, but it shall not be applied to a bit-field, a function parameter, or an exception-declaration ([except.handle]). An alignment-specifier may also be applied to the declaration of a class (in an elaborated-type-specifier ([dcl.type.elab]) or class-head ([class]), respectively). An alignment-specifier with an ellipsis is a pack expansion ([temp.variadic]).

Example:

alignas(16) int i0;
struct alignas(8) S {};
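As a minimal sketch of how these requirements can be observed from code, `alignof` reports the alignment of a type and `alignas` raises it. The names `Plain`, `Over`, `buf`, and `isAligned` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

struct Plain { char c; int i; };      // alignment comes from the int member
struct alignas(16) Over { char c; };  // over-aligned with alignas

// alignof reports the alignment requirement of a type.
static_assert(alignof(Plain) == alignof(int), "strictest member wins");
static_assert(alignof(Over) == 16, "alignas raised the alignment");
// sizeof is a multiple of the alignment, so Over is padded to 16 bytes.
static_assert(sizeof(Over) == 16, "size is rounded up to the alignment");

// Runtime check: is the address a multiple of `align`?
inline bool isAligned(const void *p, std::size_t align) {
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

alignas(32) unsigned char buf[64];    // a variable with stricter alignment
```

With these definitions, `isAligned(buf, 32)` holds, while `buf + 1` is an odd address and is not even 2-byte aligned.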

If the strictest alignas on a declaration is weaker than the alignment it would have without any alignas specifiers, the program is ill-formed.

% echo 'alignas(2) int v;' | clang -fsyntax-only -xc++ -
<stdin>:1:1: error: requested alignment is less than minimum alignment of 4 for type 'int'
    1 | alignas(2) int v;
      | ^
1 error generated.

However, the GNU extension __attribute__((aligned(1))) can request a weaker alignment.

typedef int32_t __attribute__((aligned(1))) unaligned_int32_t;

Further reading: What is the Strict Aliasing Rule and Why do we care?

LLVM IR representation

In the LLVM Intermediate Representation (IR), both global variables and functions can have an align attribute to specify their required alignment.

Global variable alignment:

An explicit alignment may be specified for a global, which must be a power of 2. If not present, or if the alignment is set to zero, the alignment of the global is set by the target to whatever it feels convenient. If an explicit alignment is specified, the global is forced to have exactly that alignment. Targets and optimizers are not allowed to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array, alignment padding would break this iteration. For TLS variables, the module flag MaxTLSAlign, if present, limits the alignment to the given value. Optimizers are not allowed to impose a stronger alignment on these variables. The maximum alignment is 1 << 32.

Function alignment:

An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.

A backend can override this with a preferred function alignment (STI->getTargetLowering()->getPrefFunctionAlignment()), if that is larger than the specified align value. (https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019/3)

In addition, align can be used in parameter attributes to decorate a pointer or vector of pointers.

LLVM back end representation

Global variables: AsmPrinter::emitGlobalVariable determines the alignment for global variables based on a set of nuanced rules:

With an explicit alignment (explicit):
- If the variable has a section attribute, return explicit.
- Otherwise, compute a preferred alignment for the data layout (getPrefTypeAlign, referred to as pref). Return pref < explicit ? explicit : max(explicit, getABITypeAlign).

Without an explicit alignment: return getPrefTypeAlign.

getPrefTypeAlign employs a heuristic for global variable definitions: if the variable's size exceeds 16 bytes and the preferred alignment is less than 16 bytes, it sets the alignment to 16 bytes. This heuristic balances performance and memory efficiency for common cases, though it may not be optimal for all scenarios. (See Preferred alignment of globals > 16 bytes in 2012.)
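The rules above can be condensed into one small function. This is only a sketch of the decision logic as described here, not the LLVM implementation: `GlobalInfo` and `globalAlignment` are hypothetical names, the inputs that LLVM derives from the data layout are passed in as plain numbers, and applying the 16-byte heuristic only when no explicit alignment is present is an assumption of this model.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical inputs; in LLVM they come from the global and the data layout.
struct GlobalInfo {
  std::optional<unsigned> explicitAlign; // explicit alignment, if any
  bool hasSection;                       // has a section attribute
  unsigned prefTypeAlign;                // preferred alignment of the type
  unsigned abiTypeAlign;                 // ABI alignment of the type
  unsigned sizeInBytes;                  // size of the variable definition
};

unsigned globalAlignment(const GlobalInfo &g) {
  if (g.explicitAlign) {
    const unsigned e = *g.explicitAlign;
    if (g.hasSection)
      return e; // honor the explicit value exactly
    return g.prefTypeAlign < e ? e : std::max(e, g.abiTypeAlign);
  }
  // No explicit alignment: preferred alignment, raised to 16 for large
  // definitions (assumption: the heuristic is modeled only on this path).
  unsigned pref = g.prefTypeAlign;
  if (g.sizeInBytes > 16 && pref < 16)
    pref = 16;
  return pref;
}
```

For example, a 4-byte variable with explicit alignment 2 and ABI alignment 4 still ends up 4-byte aligned unless it has a section attribute, in which case the explicit value 2 is used.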

For assembly output, AsmPrinter emits .p2align (power of 2 alignment) directives with a zero fill value (i.e. the padding bytes are zeros).

% echo 'int v0;' | clang --target=x86_64 -S -xc - -o -
        .file   "-"
        .type   v0,@object                      # @v0
        .bss
        .globl  v0
        .p2align        2, 0x0
v0:
        .long   0                               # 0x0
        .size   v0, 4
...

Functions: For functions, AsmPrinter::emitFunctionHeader emits alignment directives based on the machine function's alignment settings.

void MachineFunction::init() {
...
  Alignment = STI.getTargetLowering()->getMinFunctionAlignment();

  // FIXME: Shouldn't use pref alignment if explicit alignment is set on F.
  if (!F.hasOptSize())
    Alignment = std::max(Alignment,
                         STI.getTargetLowering()->getPrefFunctionAlignment());

- Start with the subtarget's minimum function alignment.
- If the function is not optimized for size (i.e. not compiled with -Os or -Oz), take the maximum of the minimum alignment and the preferred alignment. For example, X86TargetLowering sets the preferred function alignment to 16.

% echo 'void f(){} [[gnu::aligned(32)]] void g(){}' | clang --target=x86_64 -S -xc - -o -
        .file   "-"
        .text
        .globl  f                               # -- Begin function f
        .p2align        4
        .type   f,@function
f:                                      # @f
...
        .globl  g                               # -- Begin function g
        .p2align        5
        .type   g,@function
g:                                      # @g

The emitted .p2align directives omit the fill value argument: for code sections, this space is filled with no-op instructions.

Assembly representation

GNU Assembler supports multiple alignment directives:

- .p2align 3: align to 2**3
- .balign 8: align to 8
- .align 8: this is identical to .balign on some targets and .p2align on the others.

Clang supports "direct object emission" (clang -c typically bypasses a separate assembler): the LLVM AsmPrinter directly uses the MCObjectStreamer API. This allows Clang to emit the machine code directly into the object file, bypassing the need to parse and interpret alignment directives and instructions from a text-based assembly file.

These alignment directives have an optional third argument: the maximum number of bytes to skip. If doing the alignment would require skipping more bytes than the specified maximum, the alignment is not done at all. GCC's -falign-functions=m:n utilizes this feature.
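The effect of the max-skip operand can be modeled with a few lines of arithmetic. The sketch below is a model, not assembler code: `applyP2Align` is a hypothetical helper, and treating a max-skip of 0 as "no limit" (standing in for an omitted operand) is an assumption of the model.

```cpp
#include <cstdint>

// Model of `.p2align log2Align, , maxSkip` at section offset `offset`.
// Returns the offset after the directive. maxSkip == 0 means "no limit".
constexpr std::uint64_t applyP2Align(std::uint64_t offset, unsigned log2Align,
                                     std::uint64_t maxSkip = 0) {
  const std::uint64_t align = std::uint64_t{1} << log2Align;
  const std::uint64_t padding = (align - offset % align) % align;
  if (maxSkip != 0 && padding > maxSkip)
    return offset; // too much padding needed: the alignment is not done
  return offset + padding;
}

static_assert(applyP2Align(5, 4) == 16, "pads 11 bytes");
static_assert(applyP2Align(5, 4, 3) == 5, "11 > 3, so nothing is skipped");
static_assert(applyP2Align(13, 4, 3) == 16, "3 <= 3, so the alignment is done");
static_assert(applyP2Align(32, 4, 3) == 32, "already aligned");
```

The second and third assertions show the all-or-nothing behavior: the directive either reaches the full boundary or emits no padding at all.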

Object file format

In an object file, the section alignment is determined by the strictest alignment directive present in that section. The assembler sets the section's overall alignment to the maximum of all these directives, as if an implicit directive were at the start.

.section .text.a,"ax"
# implicit alignment max(4, 8)

.long 0
.balign 4
.long 0
.balign 8

This alignment is stored in the sh_addralign field within the ELF section header table. You can inspect this value using tools such as readelf -WS (llvm-readelf -S) or objdump -h (llvm-objdump -h).

Linker considerations

The linker combines multiple object files into a single executable. When it maps input sections from each object file into output sections in the final executable, it ensures that section alignments specified in the object files are preserved.

How the linker handles section alignment

Output section alignment: This is the maximum sh_addralign value among all its contributing input sections. This ensures the strictest alignment requirements are met.

Section placement: The linker also uses input sh_addralign information to position each input section within the output section. As illustrated in the following example, each input section (like a.o:.text.f or b.o:.text) is aligned according to its sh_addralign value before being placed sequentially.

output .text
  # align to sh_addralign(a.o:.text). No-op if this is the first section without any preceding DOT assignment or data command.
  a.o:.text
  # align to sh_addralign(a.o:.text.f)
  a.o:.text.f
  # align to sh_addralign(b.o:.text)
  b.o:.text
  # align to sh_addralign(b.o:.text.g)
  b.o:.text.g
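The placement rule can be sketched as a simple loop. This is an illustrative model rather than any linker's actual code: `InputSection`, `Layout`, and `layOut` are hypothetical names, the output section is assumed to start at offset 0, and linker script features are ignored.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct InputSection {
  std::uint64_t size;
  std::uint64_t addralign; // sh_addralign, a power of 2 (0 is treated as 1)
};

struct Layout {
  std::vector<std::uint64_t> offsets; // offset of each input section
  std::uint64_t size = 0;             // output section size
  std::uint64_t addralign = 1;        // output section alignment
};

Layout layOut(const std::vector<InputSection> &inputs) {
  Layout out;
  for (const InputSection &s : inputs) {
    const std::uint64_t a = std::max<std::uint64_t>(s.addralign, 1);
    out.addralign = std::max(out.addralign, a); // maximum over all inputs
    out.size = (out.size + a - 1) & ~(a - 1);   // round up to a multiple of a
    out.offsets.push_back(out.size);
    out.size += s.size;
  }
  return out;
}
```

For inputs with (size, alignment) of (5, 4), (3, 16), and (8, 8), the model places them at offsets 0, 16, and 24, and the output section gets alignment 16 and size 32.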

Linker script control: A linker script can override the default alignment behavior. The ALIGN keyword enforces a stricter alignment. For example, .text : ALIGN(32) { ... } aligns the section to at least a 32-byte boundary. This is often done to optimize for specific hardware or for memory mapping requirements.

The SUBALIGN keyword on an output section overrides the input section alignments.

Padding: To achieve the required alignment, the linker may insert padding between sections or before the first input section (if there is a gap after the output section start). The fill value is determined by the following rules:

- If specified, use the =fillexp output section attribute (within an output section description).
- If a non-code section, use zero.
- Otherwise, use a trap or no-op instruction.

Padding and section reordering

Linkers typically preserve the order of input sections from object files. To minimize the padding required between sections, linker scripts can use a SORT_BY_ALIGNMENT keyword to arrange input sections in descending order of their alignment requirements. Similarly, GNU ld supports --sort-common to sort COMMON symbols by decreasing alignment.
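To see why descending alignment order reduces padding, here is a small model that counts padding bytes before and after sorting. `Sec`, `totalPadding`, and `sortByAlignment` are hypothetical names, and the use of a stable sort (keeping input order among sections of equal alignment) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sec { std::uint64_t size, align; }; // align is a power of 2

// Total padding when sections are placed sequentially starting at offset 0.
std::uint64_t totalPadding(const std::vector<Sec> &secs) {
  std::uint64_t off = 0, pad = 0;
  for (const Sec &s : secs) {
    const std::uint64_t aligned = (off + s.align - 1) & ~(s.align - 1);
    pad += aligned - off;
    off = aligned + s.size;
  }
  return pad;
}

// Sort by decreasing alignment, keeping input order among equals.
std::vector<Sec> sortByAlignment(std::vector<Sec> secs) {
  std::stable_sort(secs.begin(), secs.end(),
                   [](const Sec &a, const Sec &b) { return a.align > b.align; });
  return secs;
}
```

For sections with (size, alignment) of (1, 1), (8, 8), (1, 1), (16, 16), the input order needs 7 + 15 = 22 bytes of padding, while the sorted order needs none.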

While this sorting can reduce wasted space, modern linking strategies often prioritize other factors, such as cache locality (for performance) and data similarity (for Lempel–Ziv compression ratio), which can conflict with sorting by alignment. (Search for --bp-compression-sort= in Explain GNU style linker options.)

ABI compliance

Some platforms have special rules. For example,

- On SystemZ, the larl (load address relative long) instruction cannot generate odd addresses. To prevent GOT indirection, compilers ensure that symbols are at least aligned by 2. (Toolchain notes on z/Architecture)
- On AIX, the default alignment mode is power: for double and long double, the first member of this data type is aligned according to its natural alignment value; subsequent members of the aggregate are aligned on 4-byte boundaries. (https://reviews.llvm.org/D79719)
- z/OS caps the maximum alignment of static storage variables to 16. (https://reviews.llvm.org/D98864)

The standard representation of the Itanium C++ ABI requires member function pointers to be even, to distinguish between virtual and non-virtual functions.

In the standard representation, a member function pointer for a virtual function is represented with ptr set to 1 plus the function's v-table entry offset (in bytes), converted to a function pointer as if by reinterpret_cast<fnptr_t>(uintfnptr_t(1 + offset)), where uintfnptr_t is an unsigned integer of the same size as fnptr_t.

Conceptually, a pointer to member function is a tuple:

- A function pointer or virtual table index, discriminated by the least significant bit
- A displacement to apply to the this pointer
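The tuple and its discriminator bit can be modeled as follows. This is a hypothetical illustration of the encoding described above, using plain integers instead of real function pointers; `MemFnPtr` and the helper names are not part of any ABI header.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the two-word representation; not the compiler's type.
struct MemFnPtr {
  std::uintptr_t ptr; // function address, or 1 + v-table offset in bytes
  std::ptrdiff_t adj; // displacement applied to `this`
};

constexpr MemFnPtr makeNonVirtual(std::uintptr_t fnAddr, std::ptrdiff_t adj) {
  return {fnAddr, adj}; // fnAddr must be even for the scheme to work
}
constexpr MemFnPtr makeVirtual(std::uintptr_t offset, std::ptrdiff_t adj) {
  return {1 + offset, adj}; // offsets are even, so ptr becomes odd
}
// The least significant bit of ptr is the discriminator.
constexpr bool isVirtual(const MemFnPtr &p) { return (p.ptr & 1) != 0; }
constexpr std::uintptr_t vtableOffset(const MemFnPtr &p) { return p.ptr - 1; }
```

In this model, a non-virtual function placed at an odd address such as 0x401001 would be misread as a virtual one, which is why functions need at least 2-byte alignment under this representation.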

Due to the least significant bit discriminator, member functions need a stricter alignment even if __attribute__((aligned(1))) is specified:

virtual void bar1() __attribute__((aligned(1)));

Side note: check out MSVC C++ ABI Member Function Pointers for a comparison with the MSVC C++ ABI.

Architecture considerations

Contemporary architectures generally support unaligned memory access, likely with very small performance penalties. However, some implementations might restrict or penalize unaligned accesses heavily, or require specific handling. Even on architectures supporting unaligned access, atomic operations might still require alignment.

- On AArch64, a bit in the system control register sctlr_el1 enables alignment checking.
- On x86, if the AM bit is set in the CR0 register and the AC bit is set in the EFLAGS register, alignment checking of user-mode data accesses is enabled.

Linux's RISC-V port supports prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS); to enable strict alignment.

clang -fsanitize=alignment can detect misaligned memory access. Check out my write-up.

In 1989, US Patent 4814976, which covers "RISC computer with unaligned reference handling and method for the same" (4 instructions: lwl, lwr, swl, and swr), was granted to MIPS Computer Systems Inc. It caused a barrier for other RISC processors; see The Lexra Story.

Almost every microprocessor in the world can emulate the functionality of unaligned loads and stores in software. MIPS Technologies did not invent that. By any reasonable interpretation of the MIPS Technologies' patent, Lexra did not infringe. In mid-2001 Lexra received a ruling from the USPTO that all claims in the lawsuit were invalid because of prior art in an IBM CISC patent. However, MIPS Technologies appealed the USPTO ruling in Federal court, adding to Lexra's legal costs and hurting its sales. That forced Lexra into an unfavorable settlement. The patent expired on December 23, 2006 at which point it became legal for anybody to implement the complete MIPS-I instruction set, including unaligned loads and stores.

Aligning code for performance

GCC offers a family of performance-tuning options named -falign-*, which instruct the compiler to align certain code segments to specific memory boundaries. These options might improve performance by preventing certain instructions from crossing cache line boundaries (or instruction fetch boundaries), which can otherwise cause an extra cache miss.

- -falign-functions=n: Align functions.
- -falign-labels=n: Align branch targets.
- -falign-jumps=n: Align branch targets, for branch targets where the targets can only be reached by jumping.
- -falign-loops=n: Align the beginning of loops.

Important considerations

Inefficiency with small functions: Aligning small functions can be inefficient and may not be worth the overhead. To address this, GCC introduced -flimit-function-alignment in 2016. The option sets the .p2align directive's max-skip operand to the estimated function size minus one.

% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 | grep p2align
        .p2align 4
% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 -flimit-function-alignment | grep p2align
        .p2align 4,,3

The max-skip operand, if present, is evaluated at parse time, so you cannot do:

.p2align 4, , b-a
a:
  nop
b:

In LLVM, the x86 backend does not implement TargetInstrInfo::getInstSizeInBytes, making it challenging to implement -flimit-function-alignment.

Cold code: These options don't apply to cold functions. To ensure that cold functions are also aligned, use -fmin-function-alignment=n instead.

Benchmarking: Aligning functions can make benchmarks more reliable. For example, on x86-64, a hot function less than 32 bytes might be placed in a way that uses one or two cache lines (determined by function_addr % cache_line_size), making benchmark results noisy. Using -falign-functions=32 can ensure the function always occupies a single cache line, leading to more consistent performance measurements.
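The dependence on function_addr % cache_line_size can be made concrete with a short calculation. The helper below is a hypothetical illustration, and the 64-byte line size is an assumption (it is common on x86-64 but not stated above).

```cpp
#include <cstdint>

// Number of cache lines touched by `size` bytes starting at `addr`.
// Requires size > 0; lineSize defaults to an assumed 64 bytes.
constexpr std::uint64_t cacheLinesSpanned(std::uint64_t addr,
                                          std::uint64_t size,
                                          std::uint64_t lineSize = 64) {
  return (addr % lineSize + size - 1) / lineSize + 1;
}

static_assert(cacheLinesSpanned(0x1000, 30) == 1, "starts a line");
static_assert(cacheLinesSpanned(0x1030, 30) == 2, "straddles a boundary");
static_assert(cacheLinesSpanned(0x1020, 32) == 1, "32-byte aligned, fits");
```

A 30-byte function therefore uses one or two lines depending on where it lands, while any function of at most 32 bytes placed on a 32-byte boundary stays within one 64-byte line.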

LLVM notes: In clang/lib/CodeGen/CodeGenModule.cpp, -falign-functions=N sets the alignment if a function does not have the gnu::aligned attribute.

A low-overhead loop (also called a zero-overhead loop) is a hardware-assisted looping mechanism found in many processor architectures, particularly digital signal processors (DSPs). The processor includes dedicated registers that store the loop start address, loop end address, and loop count. A hardware loop typically consists of three components:

- Loop setup instruction: Sets the loop end address and iteration count
- Loop body: Contains the actual instructions to be repeated
- Loop end instruction: Jumps back to the loop body if further iterations are required

Here is an example from Arm v8.1-M low-overhead branch extension.

1:
  dls lr, Rn    // Setup loop with count in Rn
  ...           // Loop body instructions
2:
  le lr, 1b     // Loop end - branch back to label 1 if needed

To minimize the number of cache lines used by the loop body, ideally the loop body (the instruction immediately following DLS) should be aligned to a 64-byte boundary. However, GNU Assembler lacks a directive to specify alignment like "align DLS to a multiple of 64 plus 60 bytes." Inserting an alignment after the DLS is counterproductive, as it would introduce unwanted NOP instructions at the beginning of the loop body, negating the performance benefits of the low-overhead loop mechanism.

It would be desirable to simulate the functionality with

.org ((.+4+63) & -64) - 4  // ensure that .+4 is aligned to a 64-byte boundary

but this complex expression involves bitwise AND and is not a relocatable expression. The LLVM integrated assembler would report "expected absolute expression", while GNU Assembler has a similar error.
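The arithmetic behind that expression can be checked in isolation. The sketch below evaluates it for a given location counter value; `orgTarget` is a hypothetical helper, and `dot` stands for the current value of `.` in the assembler.

```cpp
#include <cstdint>

// Value of ((. + 4 + 63) & -64) - 4 for a location counter value `dot`.
// In 64-bit two's complement, -64 is the same mask as ~63.
constexpr std::uint64_t orgTarget(std::uint64_t dot) {
  return ((dot + 4 + 63) & ~std::uint64_t{63}) - 4;
}

static_assert(orgTarget(0) == 60, "pad so that the next 4 bytes end at 64");
static_assert(orgTarget(60) == 60, "already in position, no padding");
static_assert(orgTarget(61) == 124, "move on to the next 64-byte block");
```

The result is the smallest address at or after `dot` that is congruent to 60 modulo 64, so the 4 bytes placed there end exactly on a 64-byte boundary.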

A potential solution would be to extend the alignment directives with an optional offset parameter:

# Align to 64-byte boundary with 60-byte offset, using NOP padding in code sections
.balign 64, , , 60

# Same alignment with offset, but skip at most 16 bytes of padding
.balign 64, , 16, 60

Xtensa's LOOP instructions have a similar alignment requirement, but I am not familiar with the details. The GNU Assembler implements the special alignment as a machine-dependent fragment. (https://sourceware.org/binutils/docs/as/Xtensa-Automatic-Alignment.html)

LLVM integrated assembler: Improving sections and symbols

Author: MaskRay

August 17, 2025, 15:00

My previous post, LLVM integrated assembler: Improving expressions and relocations, delved into enhancements made to LLVM's expression resolving and relocation generation. This post covers recent refinements to MC, focusing on sections and symbols.

Sections

Sections are named, contiguous blocks of code or data within an object file. They allow you to logically group related parts of your program. The assembler places code and data into these sections as it processes the source file.

class MCSection {
...
  enum SectionVariant {
    SV_COFF = 0,
    SV_ELF,
    SV_GOFF,
    SV_MachO,
    SV_Wasm,
    SV_XCOFF,
    SV_SPIRV,
    SV_DXContainer,
  };

In LLVM 20, the MCSection class used an enum called SectionVariant to differentiate between various object file formats, such as ELF, Mach-O, and COFF. This enum has been removed. The format-specific subclasses are used in contexts where the section type is known at compile time, such as in MCStreamer and MCObjectTargetWriter, so the change eliminates the need for runtime type information (RTTI) checks, simplifying the codebase and improving efficiency.

Additionally, the storage for fragments' fixups (adjustments to addresses and offsets) has been moved into the MCSection class.

Symbols

Symbols are names that represent memory addresses or values.

class MCSymbol {
protected:
  /// The kind of the symbol.  If it is any value other than unset then this
  /// class is actually one of the appropriate subclasses of MCSymbol.
  enum SymbolKind {
    SymbolKindUnset,
    SymbolKindCOFF,
    SymbolKindELF,
    SymbolKindGOFF,
    SymbolKindMachO,
    SymbolKindWasm,
    SymbolKindXCOFF,
  };

  /// A symbol can contain an Offset, or Value, or be Common, but never more
  /// than one of these.
  enum Contents : uint8_t {
    SymContentsUnset,
    SymContentsOffset,
    SymContentsVariable,
    SymContentsCommon,
    SymContentsTargetCommon, // Index stores the section index
  };

Similar to sections, the MCSymbol class also used a discriminator enum, SymbolKind, to distinguish between object file formats. This enum has also been removed.

Furthermore, the MCSymbol class had an enum Contents to specify the kind of symbol. This name was a bit confusing, so it has been renamed to enum Kind for clarity.

- regular symbol
- equated symbol
- common symbol

A special enumerator, SymContentsTargetCommon, which was used by AMDGPU for a specific type of common symbol, has also been removed. The functionality it provided is now handled by updating ELFObjectWriter to respect the symbol's section index (SHN_AMDGPU_LDS for this special AMDGPU symbol).

sizeof(MCSymbol) has been reduced to 24 bytes on 64-bit systems.

The previous blog post, LLVM integrated assembler: Improving expressions and relocations, describes other changes:

- The MCSymbol::IsUsed flag was a workaround for detecting a subset of invalid reassignments and is removed.
- The MCSymbol::IsResolving flag is added to detect cyclic dependencies of equated symbols.
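The general idea of an "is resolving" flag can be illustrated with a toy model. This is not LLVM's code: `Sym` and `resolve` are hypothetical, and an equated symbol is simplified to "another symbol plus a constant". The flag marks symbols that are currently being evaluated, so reaching one of them again means the definitions form a cycle.

```cpp
#include <optional>

// Hypothetical, simplified symbol: either defined with a value, or equated
// to another symbol plus a constant (like `a = b + 4`).
struct Sym {
  std::optional<long> value; // set for regular symbols
  Sym *target = nullptr;     // set for equated symbols
  long addend = 0;
  bool isResolving = false;  // true while this symbol is being evaluated
};

// Returns the value, or std::nullopt for undefined symbols and cycles.
std::optional<long> resolve(Sym &s) {
  if (s.value)
    return s.value;
  if (!s.target || s.isResolving)
    return std::nullopt;     // undefined, or we came back to `s`: a cycle
  s.isResolving = true;
  const std::optional<long> v = resolve(*s.target);
  s.isResolving = false;     // clear the flag once evaluation finishes
  if (v)
    return *v + s.addend;
  return std::nullopt;
}
```

With `a = base + 4` and `base` defined as 100, `resolve(a)` yields 104; with `x = y` and `y = x`, the flag stops the recursion and the result is empty.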