Remarks on SFrame

By MaskRay

September 28, 2025, 15:00

The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for efficient stack unwinding. By trading some functionality and flexibility for compactness, SFrame achieves significantly smaller size while maintaining the essential unwinding capabilities needed by profilers.

SFrame focuses on three fundamental elements for each function:

Canonical Frame Address (CFA): The base address for stack frame calculations
Return address
Frame pointer

An .sframe section follows a straightforward layout:

Header: Contains metadata and offset information
Auxiliary header (optional): Reserved for future extensions
Function Descriptor Entries (FDEs): Array describing each function
Frame Row Entries (FREs): Arrays of unwinding information per function

struct [[gnu::packed]] sframe_header {
  struct {
    uint16_t sfp_magic;
    uint8_t sfp_version;
    uint8_t sfp_flags;
  } sfh_preamble;
  uint8_t sfh_abi_arch;
  int8_t sfh_cfa_fixed_fp_offset;
  // Used by x86-64 to define the return address slot relative to CFA
  int8_t sfh_cfa_fixed_ra_offset;
  // Size in bytes of the auxiliary header, allowing extensibility
  uint8_t sfh_auxhdr_len;
  // Numbers of FDEs and FREs
  uint32_t sfh_num_fdes;
  uint32_t sfh_num_fres;
  // Size in bytes of FREs
  uint32_t sfh_fre_len;
  // Offsets in bytes of FDEs and FREs
  uint32_t sfh_fdeoff;
  uint32_t sfh_freoff;
};

While magic numbers are a popular choice for file formats, they deviate from established ELF conventions, which simply use the section type for distinction.

The version field resembles the version fields found in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means we'll probably need to keep producers and consumers evolving in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore unsupported low-version input pieces, providing more flexibility in handling version mismatches.
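To make the role of the preamble concrete, here is a minimal sketch of how a consumer might use the magic number both to sanity-check a section and to detect that it was produced for the other byte order. The value 0xdee2 is assumed to match SFRAME_MAGIC in the binutils headers; classify_preamble and PreambleKind are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class PreambleKind { Native, ByteSwapped, Invalid };

// Assumed SFrame magic value (SFRAME_MAGIC in binutils' sframe.h).
constexpr uint16_t kSFrameMagic = 0xdee2;

// Hypothetical helper: inspect the first two bytes of a .sframe element.
inline PreambleKind classify_preamble(const void *section) {
  uint16_t magic;
  std::memcpy(&magic, section, sizeof magic);  // avoids an unaligned load
  if (magic == kSFrameMagic)
    return PreambleKind::Native;
  // The same two bytes read in the opposite byte order.
  if (magic == uint16_t((kSFrameMagic << 8) | (kSFrameMagic >> 8)))
    return PreambleKind::ByteSwapped;
  return PreambleKind::Invalid;
}
```

A section type alone would cover the "is this SFrame?" question; in this sketch the magic number's only extra contribution is the byte-order hint.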

Data structures

Function Descriptor Entries (FDEs)

Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.

struct [[gnu::packed]] sframe_func_desc_entry {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  // bits 0-3 fretype: sfre_start_address type
  // bit 4 fdetype: SFRAME_FDE_TYPE_PCINC or SFRAME_FDE_TYPE_PCMASK
  // bit 5 pauth_key: (AArch64 only) the signing key for the return address
  uint8_t sfde_func_info;
  // The size of the repetitive code block for SFRAME_FDE_TYPE_PCMASK; used by .plt
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};

The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.

It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.

The padding field sfde_func_padding2 represents unnecessary overhead in modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
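As a rough illustration of what the two suggestions above would save, the following sketch combines them into one hypothetical layout. compact_fde and compact_fdes_needed are made-up names and this layout is not part of any SFrame version; it only shows that narrowing the count to 16 bits and dropping the padding shrinks each entry from 20 bytes to 16.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compact FDE: uint16_t row count, no padding field.
struct [[gnu::packed]] compact_fde {
  int32_t func_start_address;
  uint32_t func_size;
  uint32_t func_start_fre_off;
  uint16_t func_num_fres;  // narrowed from 32 bits
  uint8_t func_info;
  uint8_t func_rep_size;
};
static_assert(sizeof(compact_fde) == 16, "4 bytes smaller than the current 20");

// How many compact FDEs a code range with num_fres rows would need if
// oversized ranges were split at the 65535-row limit.
constexpr uint32_t compact_fdes_needed(uint32_t num_fres) {
  return num_fres == 0
             ? 1
             : uint32_t((uint64_t(num_fres) + 65534) / 65535);
}
```

Only code ranges with more than 65535 rows would pay for the narrower field, by being split into additional entries.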

To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
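The lookup that fixed-size entries make possible can be sketched as follows. This is a minimal illustration, not the libsframe implementation: it assumes the FDE array is sorted, that sfde_func_start_address holds an offset relative to the address of the field itself, and that fde_base is the run-time address of the first FDE. The helper names fde_start and find_fde are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct [[gnu::packed]] sframe_func_desc_entry {
  int32_t sfde_func_start_address;
  uint32_t sfde_func_size;
  uint32_t sfde_func_start_fre_off;
  uint32_t sfde_func_num_fres;
  uint8_t sfde_func_info;
  uint8_t sfde_func_rep_size;
  uint16_t sfde_func_padding2;
};

// Absolute start address of fdes[i] (assumption: field-relative encoding;
// the start address is the first field, so its address is the entry's).
inline uint64_t fde_start(const sframe_func_desc_entry *fdes,
                          uint64_t fde_base, size_t i) {
  uint64_t field_addr = fde_base + i * sizeof(sframe_func_desc_entry);
  return field_addr + uint64_t(int64_t(fdes[i].sfde_func_start_address));
}

// Index of the FDE whose code range contains pc, or -1 if none does.
inline ptrdiff_t find_fde(const sframe_func_desc_entry *fdes, size_t n,
                          uint64_t fde_base, uint64_t pc) {
  size_t lo = 0, hi = n;
  while (lo < hi) {  // find the first FDE whose start is greater than pc
    size_t mid = lo + (hi - lo) / 2;
    if (fde_start(fdes, fde_base, mid) <= pc)
      lo = mid + 1;
    else
      hi = mid;
  }
  if (lo == 0)
    return -1;  // pc precedes every function
  size_t i = lo - 1;
  if (pc - fde_start(fdes, fde_base, i) >= fdes[i].sfde_func_size)
    return -1;  // pc falls in a gap between functions
  return ptrdiff_t(i);
}
```

Indexing by i * sizeof(sframe_func_desc_entry) is exactly what a variable-length encoding would take away.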

Frame Row Entries (FREs)

Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.

template <class AddrType>
struct [[gnu::packed]] sframe_frame_row_entry {
  // If the fdetype is SFRAME_FDE_TYPE_PCINC, this is an offset relative to sfde_func_start_address
  AddrType sfre_start_address;
  // bit 0 fre_cfa_base_reg_id: define BASE_REG as either FP or SP
  // bits 1-4 fre_offset_count: typically 1 to 3, describing CFA, FP, and RA
  // bits 5-6 fre_offset_size: byte size of offset entries (1, 2, or 4 bytes)
  sframe_fre_info sfre_info;
};

Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (uint8_t, uint16_t, or uint32_t), allowing optimal space usage based on stack frame sizes.
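Because each row has its own size, a consumer has to decode the info bytes to step from one FRE to the next. The sketch below follows the bit layout in the struct comments above and additionally assumes that size codes 0, 1, 2 mean 1, 2, 4 bytes, that bit 0 set means the CFA base is SP, and that the data is little-endian. The helper names are hypothetical, and bounds checking is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Width of sfre_start_address, from bits 0-3 of sfde_func_info.
constexpr unsigned addr_bytes(uint8_t fde_info) { return 1u << (fde_info & 0xf); }
// Fields of sfre_info.
constexpr bool cfa_base_is_sp(uint8_t fre_info) { return (fre_info & 1) != 0; }
constexpr unsigned offset_count(uint8_t fre_info) { return (fre_info >> 1) & 0xf; }
constexpr unsigned offset_bytes(uint8_t fre_info) { return 1u << ((fre_info >> 5) & 3); }

// Encoded size of one FRE: start address + info byte + trailing offsets.
constexpr size_t fre_size(uint8_t fde_info, uint8_t fre_info) {
  return addr_bytes(fde_info) + 1 + offset_count(fre_info) * offset_bytes(fre_info);
}

// Linear scan for SFRAME_FDE_TYPE_PCINC: byte offset of the last FRE whose
// start address is <= pc_off (pc minus function start), or -1 if none.
inline ptrdiff_t find_fre(const uint8_t *fres, uint32_t num_fres,
                          uint8_t fde_info, uint32_t pc_off) {
  const unsigned ab = addr_bytes(fde_info);
  ptrdiff_t found = -1;
  size_t pos = 0;
  for (uint32_t i = 0; i < num_fres; i++) {
    uint32_t start = 0;
    for (unsigned k = 0; k < ab; k++)  // little-endian start address
      start |= uint32_t(fres[pos + k]) << (8 * k);
    if (start > pc_off)
      break;
    found = ptrdiff_t(pos);
    pos += fre_size(fde_info, fres[pos + ab]);
  }
  return found;
}
```

Unlike the FDE array, the rows of a function cannot be binary searched without first materializing their offsets, which is the price of the compact encoding.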

Architecture-specific stack offsets

SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.

x86-64

The x86-64 implementation takes advantage of the architecture's predictable stack layout:

First offset: Encodes CFA as BASE_REG + offset
Second offset (if present): Encodes FP as CFA + offset
Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)

AArch64

AArch64's more flexible calling conventions require explicit return address tracking:

First offset: Encodes CFA as BASE_REG + offset
Second offset: Encodes return address as CFA + offset
Third offset (if present): Encodes FP as CFA + offset

The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
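The x86-64 and AArch64 rules above can be put side by side in a small sketch that computes where the caller's return address and frame pointer are saved. Regs, FrameSlots, and the locate_* functions are hypothetical names. The offsets are treated as signed values already widened to 32 bits, and a missing AArch64 return-address offset is taken to mean the value is still in the link register; both are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Regs { uint64_t sp, fp; };
struct FrameSlots {
  uint64_t cfa = 0;
  bool has_ra = false;  // false: return address not saved on the stack
  uint64_t ra_slot = 0;
  bool has_fp = false;  // false: caller's FP is still in the register
  uint64_t fp_slot = 0;
};

inline uint64_t apply_offset(uint64_t base, int32_t off) {
  return base + uint64_t(int64_t(off));
}

// x86-64: offsets are {CFA, [FP]}; RA comes from the header's fixed offset.
inline FrameSlots locate_x86_64(Regs r, bool base_is_sp,
                                const std::vector<int32_t> &offs,
                                int8_t fixed_ra_offset) {
  FrameSlots s;
  s.cfa = apply_offset(base_is_sp ? r.sp : r.fp, offs.at(0));
  s.has_ra = true;
  s.ra_slot = apply_offset(s.cfa, fixed_ra_offset);
  if (offs.size() >= 2) {
    s.has_fp = true;
    s.fp_slot = apply_offset(s.cfa, offs[1]);
  }
  return s;
}

// AArch64: offsets are {CFA, RA, [FP]}.
inline FrameSlots locate_aarch64(Regs r, bool base_is_sp,
                                 const std::vector<int32_t> &offs) {
  FrameSlots s;
  s.cfa = apply_offset(base_is_sp ? r.sp : r.fp, offs.at(0));
  if (offs.size() >= 2) {
    s.has_ra = true;
    s.ra_slot = apply_offset(s.cfa, offs[1]);
  }
  if (offs.size() >= 3) {
    s.has_fp = true;
    s.fp_slot = apply_offset(s.cfa, offs[2]);
  }
  return s;
}
```

For example, right after push %rbp on x86-64 a row of {16, -16} with a fixed RA offset of -8 places the CFA at SP+16, the return address at CFA-8, and the saved frame pointer at CFA-16.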

s390x

TODO

`.eh_frame` and `.sframe`

SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:

Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
Replacing CIE pointers with direct FDE-to-FRE references
Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
Storing start addresses instead of address ranges, whereas .eh_frame stores address ranges
Start addresses in a small function use 1 or 2 byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4).
Hard-coding stack offsets rather than using flexible register specifications

However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.

SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.

In practice, executables and shared objects will likely contain all three sections:

.eh_frame: Complete unwinding information for exception handling
.eh_frame_hdr: Fast lookup table for .eh_frame
.sframe: Compact unwinding information for profilers

The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.

Large text section support

The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.

However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:

.ltext          // Large text section
.lrodata        // Large read-only data
.rodata         // Regular read-only data
// .eh_frame and .sframe position
.text           // Regular text section
.data
.bss
.ldata          // Large data
.lbss           // Large BSS
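The constraint amounts to a range check on the distance between the field and the function it describes. A minimal sketch, where fits_pcrel32 is a hypothetical helper name and the addresses in the usage note are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// True if func_addr can be encoded as a signed 32-bit offset from field_addr.
constexpr bool fits_pcrel32(uint64_t field_addr, uint64_t func_addr) {
  return int64_t(func_addr - field_addr) >= INT32_MIN &&
         int64_t(func_addr - field_addr) <= INT32_MAX;
}
```

With .sframe at 0x1'0000'0000, a function in .text at 0x1'0010'0000 is comfortably in range, while a function in .ltext placed 3GB lower at 0x4000'0000 is not, since the distance exceeds 2GB.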

Linking and execution views

SFrame employs a unified indexed format across both relocatable files (linking view) and executable files (execution view). While this design consistency appears elegant, it introduces significant complications in toolchain implementation.

Currently, Binutils enforces a single-element structure within each .sframe section, regardless of whether it resides in a relocatable object or final executable. This approach differs from DWARF sections, which support multiple concatenated elements, each with its own header and body.

This design choice stems from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The kernel's SFrame support expects each module to contain a single indexed format for efficient runtime processing. Consequently, GNU ld merges all input .sframe sections into a single indexed element, even when producing relocatable files. This behavior deviates from standard relocatable linking conventions that suppress synthetic section finalization.

The fundamental design issue lies in making linker merging mandatory. For optimal portability, unwinders should support multiple-element structures within a .sframe section. When a linker builds an index for .sframe, it should be viewed as an optimization that relieves the unwinder from constructing its own index at runtime. This index construction should remain optional rather than required. While the SFRAME_F_FDE_SORTED flag can be cleared to permit unsorted FDEs, current unwinder implementations do not seem to support multiple elements in a single section.
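To show that supporting concatenated elements need not be complicated, here is a sketch of how an unwinder could split a .sframe section into its elements using only header fields. It is hypothetical: no current consumer is known to work this way. It assumes the FRE sub-section ends each element, that sfh_freoff is measured from the end of the header plus auxiliary header, that the magic is 0xdee2, and that the data is in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct [[gnu::packed]] sframe_header {
  struct {
    uint16_t sfp_magic;
    uint8_t sfp_version;
    uint8_t sfp_flags;
  } sfh_preamble;
  uint8_t sfh_abi_arch;
  int8_t sfh_cfa_fixed_fp_offset;
  int8_t sfh_cfa_fixed_ra_offset;
  uint8_t sfh_auxhdr_len;
  uint32_t sfh_num_fdes;
  uint32_t sfh_num_fres;
  uint32_t sfh_fre_len;
  uint32_t sfh_fdeoff;
  uint32_t sfh_freoff;
};

// Returns the start offset of every element, or an empty vector if the
// section is malformed (bad magic or an element running past the end).
inline std::vector<size_t> split_elements(const uint8_t *data, size_t size) {
  std::vector<size_t> starts;
  size_t pos = 0;
  while (pos < size) {
    if (size - pos < sizeof(sframe_header))
      return {};
    sframe_header h;
    std::memcpy(&h, data + pos, sizeof h);
    if (h.sfh_preamble.sfp_magic != 0xdee2)  // assumed SFRAME_MAGIC
      return {};
    // header + auxiliary header + everything up to the end of the FREs
    uint64_t len = sizeof h + h.sfh_auxhdr_len + uint64_t(h.sfh_freoff) +
                   h.sfh_fre_len;
    if (len > size - pos)
      return {};
    starts.push_back(pos);
    pos += size_t(len);
  }
  return starts;
}
```

An unwinder could then run its per-element lookup over each returned offset, or build its own sorted index once at load time.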

A future version should distinguish between linking and execution views:

Linking view: Assemblers produce a simpler format, omitting index-specific metadata fields
Linkers concatenate .sframe input sections by default, consistent with DWARF and other metadata sections
A new --sframe-index option enables linkers to synthesize a .sframe_idx section containing the indexed format, analogous to --gdb-index and --debug-names. The linker builds .sframe_idx from input .sframe sections. To support the Linux kernel workflow (ld -r for kernel modules), ld -r --sframe-index must also generate the indexed format.
Linker scripts control placement using: .sframe_idx : { *(.sframe_idx) }. From the linker perspective, .sframe input sections have been replaced by the linker-synthesized .sframe_idx. This output section description places the .sframe_idx into the .sframe_idx output section.

The linking view could omit index-specific metadata fields such as sfh_num_fdes, sfh_num_fres, sfh_fdeoff, and sfh_freoff.

The .debug_pubnames/.gdb_index design provides an excellent model for separate linking and execution views. While DWARF v5's .debug_names unifies both views at the cost of larger linking formats, it represents a reasonable tradeoff since relocatable files contain only a single .debug_names section, and debuggers can efficiently load sections with concatenated name tables.

Section group compliance issues

The current monolithic .sframe design creates ELF specification violations when dealing with COMDAT section groups. GNU Assembler generates a single .sframe section containing relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.

This violates the ELF section group rule, which states:

A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

The problem manifests when inline functions are deduplicated:

cat > a.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fa = inl;
eof
cat > b.cc <<'eof'
[[gnu::noinline]] inline int inl() { return 0; }
auto *fb = inl;
eof
~/opt/gcc-15/bin/g++ -Wa,--gsframe -c a.cc b.cc

Linkers correctly reject this violation:

% ld.lld a.o b.o
ld.lld: error: relocation refers to a discarded section: .text._Z3inlv
>>> defined in b.o
>>> referenced by b.cc
>>>               b.o:(.sframe+0x1c)

% gold a.o b.o
b.o(.sframe+0x1c): error: relocation refers to local symbol ".text._Z3inlv" [2], which is defined in a discarded section
  section group signature: "inl()"
  prevailing definition is from a.o

(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)

Some linkers don't implement this error check. A separate issue arises with garbage collection: by default, an unreferenced .sframe section will be discarded. If the linker implements a workaround to force-retain .sframe, it might inadvertently retain all text sections referenced by .sframe, even those that would otherwise be garbage collected.

The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.

This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my section header reduction proposal.

Linker relaxation considerations

Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences linker relaxation on architectures like RISC-V and LoongArch.

If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.

Linker complexity

Endianness considerations

The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.

The endianness discussion in "The future of 32-bit support in the kernel" reinforces my belief in preferring universal little-endian for new formats. A universal little-endian approach would reduce implementation complexity by eliminating the need for:

Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
Template-based abstractions such as template <class Endian> that must wrap every data access function

Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
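The difference between the two styles can be sketched as follows. The Endian enum and the two-argument read32 are hypothetical stand-ins for whatever configuration object a real tool would thread through its readers; only the single-argument read32le corresponds to the simpler style argued for here.

```cpp
#include <cassert>
#include <cstdint>

// Style 1: byte order is a run-time property passed to every read.
enum class Endian { Little, Big };
inline uint32_t read32(Endian e, const uint8_t *p) {
  return e == Endian::Little
             ? uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
                   uint32_t(p[3]) << 24
             : uint32_t(p[3]) | uint32_t(p[2]) << 8 | uint32_t(p[1]) << 16 |
                   uint32_t(p[0]) << 24;
}

// Style 2: the format is always little-endian, so readers need no context.
inline uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}
```

Both functions produce the same result on any host. The saving is in the callers, which no longer need access to a configuration object or an Endian template parameter.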

This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.

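// Unaligned loads with and without a byte swap, for comparing the generated code at 16, 32, and 64 bits.
#define WIDTH(x) \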
typedef __UINT##x##_TYPE__ [[gnu::aligned(1)]] uint##x; \
uint##x load_inc##x(uint##x *p) { return *p+1; } \
uint##x load_bswap_inc##x(uint##x *p) { return __builtin_bswap##x(*p)+1; }; \
uint##x load_eq##x(uint##x *p) { return *p==3; } \
uint##x load_bswap_eq##x(uint##x *p) { return __builtin_bswap##x(*p)==3; }; \

WIDTH(16);
WIDTH(32);
WIDTH(64);

However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.

Questioned benefits

SFrame's primary value proposition centers on enabling frame pointer omission while preserving unwinding capabilities. In scenarios where users already omit leaf frame pointers, SFrame could theoretically allow switching from -fno-omit-frame-pointer -momit-leaf-frame-pointer to -fomit-frame-pointer -momit-leaf-frame-pointer. This benefit appears most significant on x86-64, which has limited general-purpose registers (without APX). Performance analyses show mixed results: some studies claim frame pointers degrade performance by less than 1%, while others suggest 1-2%. However, this argument overlooks a critical tradeoff: SFrame unwinding itself performs worse than frame pointer unwinding, potentially negating any performance gains from register availability.

Another claimed advantage is SFrame's ability to provide coverage in function prologues and epilogues, where frame-pointer-based unwinding may miss frames. Yet this overlooks a straightforward alternative: frame pointer unwinding can be enhanced to detect prologue and epilogue patterns by disassembling instructions at the program counter. No comparative analysis exists between this enhancement approach and SFrame's solution.

SFrame also faces a practical consideration: the .sframe section likely requires kernel page-in during unwinding, while the process stack is more likely already resident in physical memory.

Looking ahead, hardware-assisted unwinding through features like x86 Shadow Stack and AArch64 Guarded Control Stack may reshape the entire landscape, potentially reducing the relevance of metadata-based unwinding formats.

Summary

SFrame represents a pragmatic approach to stack unwinding that achieves significant size reductions by trading flexibility for compactness. Its design presents several implementation challenges that merit consideration for future versions:

The unified linking/execution view complicates toolchain implementation without clear benefits
Section group compliance issues create significant concerns for linker developers
Limited large text section support restricts deployment in modern binaries
Uncertainty remains about SFrame's viability as a complete .eh_frame replacement

Beyond these implementation concerns, SFrame faces broader ecosystem challenges. As Ian Rogers noted in LWN, system-wide profiling encounters limitations when system calls haven't transitioned to user code, BPF helpers may return placeholder values, and JIT compilers require additional SFrame support.

The format's future also depends on evolving unwinding strategies. Frame pointer unwinding could potentially be enhanced to detect prologue and epilogue patterns, though comprehensive comparisons with SFrame remain absent from current literature.
