Remarks on SFrame
The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for profilers' stack unwinding. SFrame achieves some size reduction by trading functionality and flexibility for compactness and eliminating .eh_frame CIE/FDE overhead, though its stack offsets are less compact than .eh_frame's CFI instructions (bytecode design). However, it remains significantly larger than highly compact unwinding schemes such as
SFrame focuses on three fundamental elements for each function:
- Canonical Frame Address (CFA): The base address for stack frame calculations
- Return address
- Frame pointer
An .sframe section follows a straightforward layout:
- Header: Contains metadata and offset information
- Auxiliary header (optional): Reserved for future extensions
- Function Descriptor Entries (FDEs): Array describing each function
- Frame Row Entries (FREs): Arrays of unwinding information per function
struct [[gnu::packed]] sframe_header {
While magic numbers are a popular choice for file formats, they deviate from established ELF conventions, which simply use the section type for distinction.
The version field resembles similar fields in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means we'll probably need to keep producers and consumers evolving in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore unsupported low-version input pieces, providing more flexibility in handling version mismatches.
Data structures
Function Descriptor Entries (FDEs)
Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.
struct [[gnu::packed]] sframe_func_desc_entry {
The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.
It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.
The padding field sfde_func_padding2 represents unnecessary overhead in modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
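To make the fixed-size requirement concrete, here is a minimal sketch of the lookup an unwinder performs. The `Fde` struct and `findFde` are hypothetical simplifications of mine: only the start address and size are kept, and the start address is assumed to be already resolved into the same coordinate space as the program counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified descriptor: just the fields the lookup needs.
struct Fde {
  int32_t start;  // function start (already resolved relative to a common base)
  uint32_t size;  // function size in bytes
};

// Return the entry whose [start, start + size) range covers `pc`, or nullptr.
// Because every entry has the same size, the table is a plain array and
// std::upper_bound can index into it directly.
inline const Fde *findFde(const Fde *fdes, size_t n, int64_t pc) {
  assert(std::is_sorted(fdes, fdes + n, [](const Fde &a, const Fde &b) {
    return a.start < b.start;
  }));
  // First entry that starts after pc; the candidate is the one before it.
  const Fde *it = std::upper_bound(
      fdes, fdes + n, pc,
      [](int64_t value, const Fde &f) { return value < f.start; });
  if (it == fdes)
    return nullptr;  // pc precedes every function
  --it;
  // pc >= it->start holds here, so the difference is non-negative.
  return pc - it->start < it->size ? it : nullptr;
}
```

With a variable-length encoding, locating the middle entry would require decoding every entry before it, which is what rules out schemes like PrefixVarInt for this table.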
Frame Row Entries (FREs)
Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.
template <class AddrType>
Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (uint8_t, uint16_t, or uint32_t), allowing optimal space usage based on stack frame sizes.
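A rough sketch of how a consumer might read the trailing offsets follows. Several things here are my assumptions rather than statements about the format: the size code is taken to map 0, 1, 2 to widths of 1, 2, 4 bytes; the data is little-endian; and the stored values are sign-extended, since the FP and RA save slots normally sit below the CFA. `readOffsets` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read `count` trailing offsets of the width selected by `sizeCode`.
// Assumed mapping (not taken from the text above): 0 -> 1 byte,
// 1 -> 2 bytes, 2 -> 4 bytes. Returns nullopt on a bad code or short input.
inline std::optional<std::vector<int32_t>>
readOffsets(const uint8_t *p, size_t avail, unsigned sizeCode, unsigned count) {
  assert(p != nullptr || count == 0);
  if (sizeCode > 2)
    return std::nullopt;
  const size_t width = size_t(1) << sizeCode;
  if (count > avail / width)
    return std::nullopt;  // truncated input
  std::vector<int32_t> out;
  out.reserve(count);
  for (unsigned i = 0; i < count; ++i, p += width) {
    // Assemble the little-endian value one byte at a time.
    uint32_t v = 0;
    for (size_t b = 0; b < width; ++b)
      v |= uint32_t(p[b]) << (8 * b);
    // Sign-extend from `width` bytes without relying on signed shifts.
    int64_t s = v;
    if ((v >> (8 * width - 1)) & 1)
      s -= int64_t(1) << (8 * width);
    out.push_back(static_cast<int32_t>(s));
  }
  return out;
}
```

A function whose frame fits in a single byte of offset pays one byte per offset; only functions with very large frames need the 4-byte form.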
Architecture-specific stack offsets
SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.
x86-64
The x86-64 implementation takes advantage of the architecture's predictable stack layout:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset (if present): Encodes FP as CFA + offset
- Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)
AArch64
AArch64's more flexible calling conventions require explicit return address tracking:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset: Encodes return address as CFA + offset
- Third offset (if present): Encodes FP as CFA + offset
The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
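The two interpretations can be put side by side in a short sketch. Everything named here (`Fre`, `Regs`, `FrameInfo`, `describeFrame`) is hypothetical and only mirrors the lists above; the function computes the addresses of the save slots and does not load from them.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Arch { X86_64, AArch64 };

struct Fre {             // hypothetical decoded frame row entry
  bool cfaBaseIsFp;      // BASE_REG: frame pointer if true, else stack pointer
  int32_t offsets[3];    // decoded stack offsets, in encoding order
  unsigned numOffsets;   // how many of `offsets` are present (1..3)
};

struct Regs { uint64_t sp, fp; };

struct FrameInfo {
  uint64_t cfa;
  std::optional<uint64_t> raSlot;  // where the return address is saved
  std::optional<uint64_t> fpSlot;  // where the caller's FP is saved
};

// Add a signed offset to an address using well-defined unsigned wraparound.
inline uint64_t offsetFrom(uint64_t base, int64_t off) {
  return base + static_cast<uint64_t>(off);
}

inline FrameInfo describeFrame(Arch arch, const Fre &fre, const Regs &r,
                               int8_t fixedRaOffset) {
  assert(fre.numOffsets >= 1 && fre.numOffsets <= 3);
  FrameInfo f{};
  // First offset on both architectures: CFA = BASE_REG + offset.
  f.cfa = offsetFrom(fre.cfaBaseIsFp ? r.fp : r.sp, fre.offsets[0]);
  if (arch == Arch::X86_64) {
    // RA location is implicit: CFA + sfh_cfa_fixed_ra_offset from the header.
    f.raSlot = offsetFrom(f.cfa, fixedRaOffset);
    if (fre.numOffsets >= 2)
      f.fpSlot = offsetFrom(f.cfa, fre.offsets[1]);
  } else {
    // AArch64: RA is tracked explicitly; FP follows as the third offset.
    if (fre.numOffsets >= 2)
      f.raSlot = offsetFrom(f.cfa, fre.offsets[1]);
    if (fre.numOffsets >= 3)
      f.fpSlot = offsetFrom(f.cfa, fre.offsets[2]);
  }
  return f;
}
```

For a conventional x86-64 frame after `push rbp; mov rbp, rsp`, the row would be FP-based with offsets 16 and -16 and a header RA offset of -8.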
s390x
TODO
.eh_frame and .sframe
SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:
- Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
- Replacing CIE pointers with direct FDE-to-FRE references
- Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
- Storing start addresses instead of .eh_frame-style address ranges
- Start addresses in a small function use 1- or 2-byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4)
- Hard-coding stack offsets rather than using flexible register specifications
However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.
SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.
In practice, executables and shared objects will likely contain all three sections:
- .eh_frame: Complete unwinding information for exception handling
- .eh_frame_hdr: Fast lookup table for .eh_frame
- .sframe: Compact unwinding information for profilers
The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.
Large text section support
The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.
However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:
.ltext // Large text section
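The range limit is easy to state in code. The sketch below is illustrative only: `encodeRel32` and `decodeRel32` are hypothetical names, and the addresses are whatever the linker has assigned to the field and to the function.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Resolve a stored field-relative offset back to an absolute address.
inline uint64_t decodeRel32(uint64_t fieldAddr, int32_t rel) {
  return fieldAddr + static_cast<uint64_t>(static_cast<int64_t>(rel));
}

// Try to encode the distance from the field to the function as a signed
// 32-bit value. Returns nullopt when the function is out of reach.
inline std::optional<int32_t> encodeRel32(uint64_t fieldAddr,
                                          uint64_t funcAddr) {
  // Modular subtraction reinterpreted as signed yields the signed distance.
  const int64_t delta = static_cast<int64_t>(funcAddr - fieldAddr);
  if (delta < std::numeric_limits<int32_t>::min() ||
      delta > std::numeric_limits<int32_t>::max())
    return std::nullopt;
  const int32_t rel = static_cast<int32_t>(delta);
  assert(decodeRel32(fieldAddr, rel) == funcAddr);  // round-trip sanity check
  return rel;
}
```

A function placed 2 GiB or more past the field, as can happen when .ltext is far from .sframe, makes `encodeRel32` return no value; a linker in that position has to report an overflow.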
Linking and execution views
SFrame employs a unified indexed format across both relocatable files (linking view) and executable files (execution view). While this design simplifies stack tracers, it introduces significant complications in toolchain implementation.
Currently, Binutils enforces a single-element structure within each .sframe section, regardless of whether it resides in a relocatable object or final executable. This approach differs from DWARF sections, which support multiple concatenated elements, each with its own header and body.
This design choice stems from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The pending SFrame support for linux-perf expects each module to contain a single indexed format for efficient runtime processing. Consequently, GNU ld merges all input .sframe sections into a single indexed element, even when producing relocatable files. This behavior deviates from standard
For optimal portability, unwinders should support multiple-element structures within a .sframe section. When a linker builds an index for .sframe, it should be viewed as an optimization that relieves the unwinder from constructing its own index at runtime. This index construction should remain optional rather than required. While the SFRAME_F_FDE_SORTED flag can be cleared to permit unsorted FDEs, current unwinder implementations do not seem to support multiple elements in a single section.
The fundamental design issue lies in making linker merging mandatory rather than optional. LLVM supports numerous well-behaved metadata sections (__asan_globals, .stack_sizes, __patchable_function_entries, __llvm_prf_cnts, __sancov_bools, __llvm_covmap, __llvm_gcov_ctr_section, .llvmcmd, and llvm_offload_entries) that concatenate without issues. SFrame stands apart as the only metadata section demanding version-specific merging as default linker behavior, creating unprecedented maintenance burden.
For optimal portability, we must support object files from diverse origins, not just those built from a single toolchain. In environments where almost everything is built from source with a single toolchain offering strong SFrame support, forcing default-on index building may be acceptable. However, we must also accommodate environments with prebuilt object files using older SFrame versions, or toolchains that don't support old formats.
A future version should distinguish between linking and execution views:
- Linking view: Assemblers produce a simpler format, omitting index-specific metadata fields
- Linkers concatenate .sframe input sections by default, consistent with DWARF and other metadata sections
- A new --sframe-index option enables linkers to synthesize a .sframe_idx section containing the indexed format, analogous to --gdb-index and --debug-names. The linker builds .sframe_idx from input .sframe sections. To support the Linux kernel workflow (ld -r for kernel modules), ld -r --sframe-index must also generate the indexed format.
- Linker scripts control placement using: .sframe_idx : { *(.sframe_idx) }. From the linker perspective, .sframe input sections have been replaced by the linker-synthesized .sframe_idx. This output section description places the .sframe_idx into the .sframe_idx output section.
The linking view could omit index-specific metadata fields such as sfh_num_fdes, sfh_num_fres, sfh_fdeoff, and sfh_freoff.
The .debug_pubnames/.gdb_index design provides an excellent model for separate linking and execution views. While DWARF v5's .debug_names unifies both views at the cost of larger linking formats, it represents a reasonable tradeoff since relocatable files contain only a single .debug_names section, and debuggers can efficiently load sections with concatenated name tables.
Version compatibility challenges
The current design creates significant version compatibility problems. When a linker supports v3 but encounters object files with v2 .sframe sections, it faces impossible choices:
- Discard v2 sections: Silently losing functionality
- Report errors: Breaking builds with mixed-version object files
- Concatenate sections: Currently unsupported by unwinders
- Upgrade v2 to v3: Requires maintaining version-specific merge logic for every version
This differs fundamentally from reading a format: each version needs version-specific merging logic in every linker. Consider the scenario where v2 uses layout A, v3 uses layout B, and v4 uses layout C. A linker receiving objects with all three versions must produce coherent output with proper indexing while maintaining version-specific merge logic for each.
Real-world mixing scenarios include:
- Third-party vendor libraries built with older toolchains
- Users linking against prebuilt libraries from different sources
- Users who don't need SFrame but must handle prebuilt libraries with older versions
- Users updating their linker to a newer version that drops legacy SFrame support
The only feasible approach for handling version mismatches is to concatenate .sframe sections by default, with consumers requesting indices explicitly via --sframe-index. When --sframe-index is used, the linker can report warnings for unrecognized versions while gracefully handling mixed scenarios through concatenation.
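The policy in the previous paragraph is small enough to sketch. This is only an illustration of the proposal, not the behavior of any existing linker: `planSFrameOutput`, `Plan`, and the `wantIndex` parameter (standing in for the proposed --sframe-index option) are hypothetical, and the linker is assumed to index exactly one version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Output { Concatenated, Indexed };

struct Plan {
  Output output;
  std::vector<std::string> warnings;
};

// `versions` holds the SFrame version of each input section, in input order.
// `supported` is the single version this hypothetical linker knows how to index.
inline Plan planSFrameOutput(const std::vector<uint8_t> &versions,
                             uint8_t supported, bool wantIndex) {
  assert(!versions.empty());
  Plan plan{Output::Concatenated, {}};
  if (!wantIndex)
    return plan;  // default: concatenate without interpreting the contents
  bool allIndexable = true;
  for (size_t i = 0; i < versions.size(); ++i) {
    if (versions[i] != supported) {
      allIndexable = false;
      plan.warnings.push_back("input " + std::to_string(i) +
                              ": SFrame version " +
                              std::to_string(versions[i]) +
                              " not indexed, falling back to concatenation");
    }
  }
  if (allIndexable)
    plan.output = Output::Indexed;
  return plan;
}
```

The point of the design is that the default path never needs to understand any version of the format, so an old or new input cannot break the link.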
Section group compliance issues
The current monolithic .sframe design creates ELF specification violations when a .sframe section contains relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.
This violates the ELF section group rule, which states:
A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
The problem manifests when inline functions are deduplicated:
cat > a.cc <<'eof'
Linkers correctly reject this violation:
% ld.lld a.o b.o
(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)
Some linkers don't implement this error check. A separate issue arises with garbage collection: by default, an unreferenced .sframe section will be discarded. If the linker implements a workaround to force-retain .sframe, it might inadvertently retain all text sections referenced by .sframe, even those that would otherwise be garbage collected.
The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.
This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my
Linker relaxation considerations
Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences
If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.
Linker complexity
SFrame introduces unprecedented complexity compared to other metadata formats. The requirement for version-specific merging as default behavior creates maintenance burden unmatched by any other loadable metadata section.
The opt-in --sframe-index approach would provide several benefits:
- Linkers can support basic .sframe handling (concatenation) without implementing full index-building logic
- Mixed-version scenarios degrade gracefully to concatenation rather than failing
- Consistent with .debug_names precedent, which offers optional indexing
- Distros can roll out SFrame support incrementally without requiring all linkers to support index building immediately
- The format can mature and prove its value before committing to complex default behavior
Stack unwinders already need to support .sframe sections across the main executable and all shared objects. Supporting multiple concatenated elements within a single .sframe section presents no fundamental technical barrier. Runtime merging support is necessary for optimal portability (e.g., old linkers), and a library can be provided to share code between GNU ld and the kernel.
In lld/ELF, linker-created sections are called synthetic sections. No synthetic section requires version-specific merging. Even the quite complex .gdb_index section doesn't require this, proving its stability after years of use.
Endianness considerations
The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.
The endianness discussion in
- Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
- Template-based abstractions such as template <class Endian> that must wrap every data access function
Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.
#define WIDTH(x) \
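For reference, a host-independent `read32le(p)` of the kind mentioned above can be written in a few lines. This is a generic sketch, not code from any particular toolchain; it works on unaligned pointers and produces the same result on little-endian and big-endian hosts.

```cpp
#include <cassert>
#include <cstdint>

// Read a 32-bit little-endian value from a possibly unaligned pointer.
// Assembling the value from individual bytes makes the result independent
// of the host byte order; compilers typically fold this into a single load
// (plus a byte swap on big-endian hosts).
inline uint32_t read32le(const void *p) {
  assert(p != nullptr);
  const auto *b = static_cast<const unsigned char *>(p);
  return uint32_t(b[0]) | uint32_t(b[1]) << 8 | uint32_t(b[2]) << 16 |
         uint32_t(b[3]) << 24;
}
```

With a single fixed byte order, no `config` parameter or `Endian` template argument has to be threaded through the readers.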
However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.
Alternative: linker-generated section
There is an alternative design that makes the assembler's .sframe section unnecessary.
This design allows the linker to merge and optimize the .eh_frame section as usual (which requires CIE and FDE boundary information). The linker then analyzes the CFI instructions (a step previously unneeded) and generates the .sframe section. Since the linker only reads the stable .eh_frame and produces .sframe, there's no mix-and-match concern.
CFI instruction decoding introduces additional complexity. However, this is balanced by the architectural advantage of centralizing the logic; it avoids scattering processing code (similar to SHF_MERGE and .eh_frame) across the linker code. However, this fine-grained knowledge of the format may expose the linker to more frequent updates, a serious risk given that the linker's foundational role in the build process demands exceptional stability and robustness.
A more cautious intermediate strategy could leverage existing Linux distribution post-processing tools, modifying them to append .sframe sections to executable and shared object files. While this introduces more friction than native linker support and requires integration into package build systems, it offers several advantages:
- Allows .sframe format experimentation without imposing linker complexity
- Provides time for the format to mature and prove its value
- Enables testing across diverse userspace packages before committing to linker integration
- Post-link tools can optimize and even overwrite sections in-place
- For cases where optimization significantly shrinks the section, .sframe can be placed at the end of the file (similar to BOLT moving .rodata)
However, this approach faces practical challenges. Post-processing adds build complexity, particularly with features like build-ids and read-only file systems. The success of .gdb_index, where linker support (--gdb-index) proved more popular than post-link tools, suggests that native linker support eventually becomes necessary for widespread adoption. The key question is timing: should linker integration be the starting point or the outcome of proven stability?
SHF_ALLOC considerations
The .sframe section carries the SHF_ALLOC flag, meaning it's loaded as part of the program's read-only data segment. This design choice creates tradeoffs:
With SHF_ALLOC:
- .sframe contributes to initial read-only data segment consumption
- Can be accessed directly as part of the memory-mapped area
- No runtime mmap cost for tracers
Without SHF_ALLOC:
- No upfront memory cost
- Tracers must open the file and mmap the section on demand
- Runtime cost may not amortize well for frequent tracing
Analysis of 337 files in /usr/bin and /usr/lib/x86_64-linux-gnu/ shows .eh_frame typically consumes 5.2% (median: 5.1%) of file size:
EH_Frame size distribution:
If .sframe size is comparable to .eh_frame, this represents significant overhead for applications that never use stack tracing, likely the majority of users. Most users will not need stack trace features, raising the question of whether having .sframe always loaded is an acceptable overhead for distributions shipping it by default.
perf supports .debug_frame (tools/perf/util/unwind-libunwind-local.c), which does not have SHF_ALLOC. While there's a difference between status quo and what's optimal, the non-SHF_ALLOC approach deserves consideration for scenarios where runtime tracing overhead can be amortized or where memory footprint matters more than immediate access.
Kernel challenges
The .sframe section may not be resident in physical memory. SFrame proposers are attempting to defer user stack traces until syscall boundaries.
Ian Rogers points out that BPF programs can no longer simply stack trace user code. This change breaks stack trace deduplication, a commonly used BPF primitive.
Summary
SFrame represents a pragmatic approach to stack unwinding that achieves size reductions by trading flexibility for compactness. Its design presents several implementation challenges that merit consideration for future versions:
- The unified linking/execution view complicates toolchain implementation without clear benefits
- Section group compliance issues create significant concerns for linker developers
- Limited large text section support restricts deployment in modern binaries
- Uncertainty remains about SFrame's viability as a complete .eh_frame replacement
Beyond these implementation concerns, SFrame faces broader ecosystem challenges.
Questioned benefits
Even setting aside the technical implementation challenges, SFrame's fundamental value proposition warrants scrutiny.
SFrame's primary benefit centers on enabling frame pointer omission while preserving unwinding capabilities. In scenarios where users already omit leaf frame pointers, SFrame could theoretically allow switching from -fno-omit-frame-pointer -momit-leaf-frame-pointer to -fomit-frame-pointer -momit-leaf-frame-pointer. This benefit appears most significant on x86-64, which has limited general-purpose registers (without APX). Performance analyses show mixed results: some studies claim frame pointers degrade performance by less than 1%, while others suggest 1-2%. However, this argument overlooks a critical tradeoff: SFrame unwinding itself performs worse than frame pointer unwinding, potentially negating any performance gains from register availability.
Another claimed advantage is SFrame's ability to provide coverage in function prologues and epilogues, where frame-pointer-based unwinding may miss frames. Yet this overlooks a straightforward alternative: frame pointer unwinding can be enhanced to detect prologue and epilogue patterns by disassembling instructions at the program counter. No comparative analysis exists between this enhancement approach and SFrame's solution.
SFrame also faces a practical consideration: the .sframe section likely requires kernel page-in during unwinding, while the process stack is more likely already resident in physical memory. As Ian Rogers noted in LWN, system-wide profiling encounters limitations when system calls haven't transitioned to user code, BPF helpers may return placeholder values, and JIT compilers require additional SFrame support.
Looking ahead, hardware-assisted unwinding through features like x86 Shadow Stack and AArch64 Guarded Control Stack may reshape the entire landscape, potentially reducing the relevance of metadata-based unwinding formats. Meanwhile, compact unwinding schemes like .eh_frame. There is a feature request for a compact information for AArch64
If we proceed, here is how to do it right
According to
To ensure rapid SFrame evolution without compatibility concerns, a better approach is to build a library that parses .eh_frame and generates SFrame. The Linux kernel can then use this library (in objtool?) to generate SFrame for vmlinux and modules. Relying on assembler/linker output for this critical metadata format requires a level of stability that is currently concerning.
The ongoing maintenance implications warrant particular attention. Observing the binutils mailing list reveals a significant volume of SFrame commits. Most linker features stabilize quickly after initial implementation, but SFrame appears to require continued evolution. Given the linker's foundational role in the build process, which demands exceptional stability and robustness, the long-term maintenance burden deserves careful consideration.
Early integration into GNU toolchain has provided valuable feedback for format evolution, but this comes at the cost of coupling the format's maturity to linker stability. The SFrame GNU toolchain developers exhibit a