Remarks on SFrame
The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for profilers' stack unwinding. SFrame achieves some size reduction by trading functionality and flexibility for compactness and eliminating .eh_frame CIE/FDE overhead, though its stack offsets are less compact than .eh_frame's CFI instructions (bytecode design). However, it remains significantly larger than highly compact unwinding schemes such as
SFrame focuses on three fundamental elements for each function:
- Canonical Frame Address (CFA): The base address for stack frame calculations
- Return address
- Frame pointer
An .sframe section follows a straightforward layout:
- Header: Contains metadata and offset information
- Auxiliary header (optional): Reserved for future extensions
- Function Descriptor Entries (FDEs): Array describing each function
- Frame Row Entries (FREs): Arrays of unwinding information per function
struct [[gnu::packed]] sframe_header {
While a magic number is a popular choice for file formats, it deviates from established ELF conventions, which simply use the section type for distinction.
The version field resembles the version fields in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means we'll probably need to keep producers and consumers evolving in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore unsupported low-version input pieces, providing more flexibility in handling version mismatches.
Data structures
Function Descriptor Entries (FDEs)
Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.
struct [[gnu::packed]] sframe_func_desc_entry {
The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.
It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.
The padding field sfde_func_padding2 represents unnecessary overhead in modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
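The lookup that fixed-size FDEs make possible can be sketched as follows. This is a minimal illustration, not the SFrame reference implementation: the `Fde` struct and `findFde` are hypothetical names, and the struct holds already-resolved addresses rather than the on-disk fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified FDE: only the fields needed for lookup.
struct Fde {
  uint64_t start;  // resolved start address of the code range
  uint32_t size;   // size of the code range in bytes
};

// Returns the index of the FDE whose range contains pc, or -1 if none does.
// Precondition: fdes is sorted by start address.
int findFde(const std::vector<Fde> &fdes, uint64_t pc) {
  assert(std::is_sorted(
      fdes.begin(), fdes.end(),
      [](const Fde &a, const Fde &b) { return a.start < b.start; }));
  // Find the first entry with start > pc; the candidate is the one before it.
  auto it = std::upper_bound(
      fdes.begin(), fdes.end(), pc,
      [](uint64_t value, const Fde &f) { return value < f.start; });
  if (it == fdes.begin())
    return -1;  // pc is below the first range
  --it;
  if (pc - it->start >= it->size)
    return -1;  // pc falls in a gap after this range
  return static_cast<int>(it - fdes.begin());
}
```

Because every entry has the same size, the array can be indexed directly and the search costs O(log n) comparisons. A variable-length encoding would force a linear scan or a separate index.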
Frame Row Entries (FREs)
Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.
template <class AddrType>
Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (uint8_t, uint16_t, or uint32_t), allowing optimal space usage based on stack frame sizes.
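Decoding one trailing offset of a given width might look like the sketch below. The helper name `readOffset` is hypothetical. The code assumes little-endian data and assumes the offsets are interpreted as signed values, since saved-register slots typically lie below the CFA; neither assumption comes from the struct definitions above.

```cpp
#include <cassert>
#include <cstdint>

// Reads one little-endian stack offset of `size` bytes (1, 2, or 4) and
// sign-extends it to 32 bits. The signed interpretation is an assumption.
int32_t readOffset(const uint8_t *p, unsigned size) {
  assert(size == 1 || size == 2 || size == 4);
  uint32_t v = 0;
  for (unsigned i = 0; i < size; ++i)
    v |= static_cast<uint32_t>(p[i]) << (8 * i);
  int64_t s = v;
  // If the top bit of the stored value is set, subtract 2^(8*size).
  if ((v >> (8 * size - 1)) & 1)
    s -= int64_t{1} << (8 * size);
  return static_cast<int32_t>(s);
}
```

A frame whose offsets all fit in one byte then pays one byte per offset, while larger frames pay two or four bytes per offset.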
Architecture-specific stack offsets
SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.
x86-64
The x86-64 implementation takes advantage of the architecture's predictable stack layout:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset (if present): Encodes FP as CFA + offset
- Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)
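The three rules above can be sketched as a small computation. The types `Row`, `Regs`, and `FrameSlots` and the function `locateX86_64` are hypothetical names for illustration. The sketch reads "FP as CFA + offset" and "return address as CFA + fixed offset" as the stack locations where those values are saved, which is an interpretation of the shorthand rather than something the list states.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical decoded row for one PC range (not the on-disk FRE layout).
struct Row {
  bool cfaBaseIsFp;                 // true: BASE_REG is FP, false: SP
  int32_t cfaOffset;                // first offset
  std::optional<int32_t> fpOffset;  // second offset, if present
};

struct Regs {
  uint64_t sp, fp;
};

struct FrameSlots {
  uint64_t cfa = 0;
  uint64_t raAddr = 0;             // stack address holding the return address
  std::optional<uint64_t> fpAddr;  // stack address holding the saved FP
};

// Adds a signed 32-bit offset to an address with wrap-around arithmetic.
uint64_t addSigned(uint64_t base, int32_t off) {
  return base + static_cast<uint64_t>(static_cast<int64_t>(off));
}

// fixedRaOffset plays the role of the header's sfh_cfa_fixed_ra_offset.
FrameSlots locateX86_64(const Row &row, const Regs &regs,
                        int32_t fixedRaOffset) {
  FrameSlots out;
  out.cfa = addSigned(row.cfaBaseIsFp ? regs.fp : regs.sp, row.cfaOffset);
  out.raAddr = addSigned(out.cfa, fixedRaOffset);
  if (row.fpOffset)
    out.fpAddr = addSigned(out.cfa, *row.fpOffset);
  return out;
}
```

With SP = 0x7000, a CFA offset of 16, and a fixed return-address offset of -8, the CFA is 0x7010 and the return address is read from 0x7008.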
AArch64
AArch64's more flexible calling conventions require explicit return address tracking:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset: Encodes return address as CFA + offset
- Third offset (if present): Encodes FP as CFA + offset
The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
s390x
TODO
.eh_frame and .sframe
SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:
- Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
- Replacing CIE pointers with direct FDE-to-FRE references
- Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
- Storing start addresses instead of address ranges; .eh_frame FDEs store an address range
- Start addresses in a small function use 1- or 2-byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4)
- Hard-coding stack offsets rather than using flexible register specifications
However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.
SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.
In practice, executables and shared objects will likely contain all three sections:
- .eh_frame: Complete unwinding information for exception handling
- .eh_frame_hdr: Fast lookup table for .eh_frame
- .sframe: Compact unwinding information for profilers
The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.
Large text section support
The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.
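The ±2GB constraint amounts to a range check on the distance between the field and the function it references. The sketch below uses a hypothetical helper `reachable32` and made-up addresses to show where the limit falls.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if a field located at fieldAddr can reference target with a signed
// 32-bit offset relative to the field's own location.
bool reachable32(uint64_t fieldAddr, uint64_t target) {
  // Unsigned subtraction wraps; reinterpreting as signed gives the distance.
  int64_t delta = static_cast<int64_t>(target - fieldAddr);
  return delta >= std::numeric_limits<int32_t>::min() &&
         delta <= std::numeric_limits<int32_t>::max();
}
```

A text section placed more than 2GB away from .sframe in either direction fails this check, so such a function cannot be described by the 32-bit field.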
However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:
.ltext // Large text section
Object file format design issues
Mandatory index building problems
Currently, Binutils enforces a single-element structure within each .sframe section, regardless of whether it resides in a relocatable object or final executable. While the SFRAME_F_FDE_SORTED flag can be cleared to permit unsorted FDEs, proposed unwinder implementations for the Linux kernel do not seem to support multiple elements in a single section. The design choice makes linker merging mandatory rather than optional.
This design choice stems from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The pending SFrame support for linux-perf expects each module to contain a single indexed format for efficient runtime processing. Consequently, GNU ld merges all input .sframe sections into a single indexed element, even when producing relocatable files. This behavior deviates from standard
This approach differs from almost every other metadata section, which supports multiple concatenated elements, each with its own header and body. LLVM supports numerous well-behaved metadata sections (__asan_globals, .stack_sizes, __patchable_function_entries, __llvm_prf_cnts, __sancov_bools, __llvm_covmap, __llvm_gcov_ctr_section, .llvmcmd, and llvm_offload_entries) that concatenate without issues. SFrame stands apart as the only metadata section demanding version-specific merging as default linker behavior, creating an unprecedented maintenance burden. For optimal portability, unwinders should support multiple-element structures within a .sframe section.
For optimal portability, we must support object files from diverse origins, not just those built from a single toolchain. In environments where almost everything is built from source with a single toolchain offering strong SFrame support, forcing default-on index building may be acceptable. However, we must also accommodate environments with prebuilt object files using older SFrame versions, or toolchains that don't support old formats. I believe unwinders should support multiple-element structures within a .sframe section. When a linker builds an index for .sframe, it should be viewed as an optimization that relieves the unwinder from constructing its own index at runtime. This index construction should remain optional rather than required.
Section group compliance and garbage collection issues
GNU Assembler generates a single .sframe section containing relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.
This creates ELF specification violations when a referenced text section is discarded by the
A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
The problem manifests when inline functions are deduplicated:
cat > a.cc <<'eof'
Linkers correctly reject this violation:
% ld.lld a.o b.o
(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)
Some linkers don't implement this error check. A separate issue arises with garbage collection: by default, an unreferenced .sframe section will be discarded. If the linker implements a workaround to force-retain .sframe, it might inadvertently retain all text sections referenced by .sframe, even those that would otherwise be garbage collected.
The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.
This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my
Version compatibility challenges
The current design creates significant version compatibility problems. When a linker only supports v3 but encounters object files with v2 .sframe sections, it faces impossible choices:
- Discard v2 sections: Silently losing functionality
- Report errors: Breaking builds with mixed-version object files
- Concatenate sections: Currently unsupported by unwinders
- Upgrade v2 to v3: Requires maintaining version-specific merge logic for every version
This differs fundamentally from reading a format: each version needs version-specific merging logic in every linker. Consider the scenario where v2 uses layout A, v3 uses layout B, and v4 uses layout C. A linker receiving objects with all three versions must produce coherent output with proper indexing while maintaining version-specific merge logic for each.
Real-world mixing scenarios include:
- Third-party vendor libraries built with older toolchains
- Users linking against prebuilt libraries from different sources
- Users who don't need SFrame but must handle prebuilt libraries with older versions
- Users updating their linker to a newer version that drops legacy SFrame support
Most users will not need stack tracing features. This may change eventually, but that will take many years. In the meantime, they must accept unneeded information while handling the resulting compatibility issues.
Requiring version-specific merging as default behavior would create a maintenance burden unmatched by any other loadable metadata section.
Proposed format separation
A future version should distinguish between linking and execution views to resolve the compatibility and maintenance challenges outlined above. This separation has precedent in existing debug formats: .debug_pubnames/.gdb_index provides an excellent model for separate linking and execution views. DWARF v5's .debug_names takes a different approach, unifying both views at the cost of larger linking formats. That is a reasonable tradeoff since relocatable files contain only a single .debug_names section, and debuggers can efficiently load sections with concatenated name tables.
For SFrame, the separation would work as follows:
Separate linking format. Assemblers produce a simpler format, omitting index-specific metadata fields such as sfh_num_fdes, sfh_num_fres, sfh_fdeoff, and sfh_freoff.
Default concatenation behavior. Linkers concatenate .sframe input sections by default, consistent with DWARF and other metadata sections. Linkers can handle mixed-version scenarios gracefully without requiring version-specific merge logic, eliminating the impossible maintenance burden of keeping version-specific merge logic for every SFrame version in every linker implementation. Distributions can roll out SFrame support incrementally without requiring all linkers to support index building immediately.
The unwinder implementation cost is manageable. Stack unwinders already need to support .sframe sections across the main executable and all shared objects. Supporting multiple concatenated elements within a single .sframe section presents no fundamental technical barrier. This is a one-time implementation cost that provides forward and backward compatibility.
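To give a sense of what walking concatenated elements involves, here is a minimal sketch. The `ElemHeader` layout, the `bodySize` field, the `kMagic` value, and `splitElements` are all hypothetical and chosen for illustration; this is not the real sframe_header, and a real unwinder would derive each element's size from that header's own offset and length fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical element header, NOT the real sframe_header layout.
struct ElemHeader {
  uint16_t magic;
  uint8_t version;
  uint8_t flags;
  uint32_t bodySize;  // bytes following this header (assumed field)
};

struct Element {
  size_t offset;      // offset of the element within the section
  uint8_t version;
  uint32_t bodySize;
};

constexpr uint16_t kMagic = 0xdee2;  // magic value assumed for this sketch

// Splits a section into its concatenated elements, reading headers in host
// byte order. Stops at the first malformed or truncated element.
std::vector<Element> splitElements(const uint8_t *data, size_t size) {
  std::vector<Element> out;
  size_t off = 0;
  while (size - off >= sizeof(ElemHeader)) {
    ElemHeader h;
    std::memcpy(&h, data + off, sizeof h);  // avoids unaligned access
    if (h.magic != kMagic || h.bodySize > size - off - sizeof h)
      break;
    out.push_back({off, h.version, h.bodySize});
    off += sizeof h + h.bodySize;
  }
  return out;
}
```

Each element carries its own version, so an unwinder built this way can skip elements whose version it does not recognize and still use the rest.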
Optional index construction. When the opt-in option --sframe-index is requested, the linker builds an index from recognized versions while reporting warnings for unrecognized ones. This is analogous to --gdb-index and --debug-names.
With this approach, the linker builds .sframe_idx from input .sframe sections. To support the Linux kernel workflow (ld -r for kernel modules), ld -r --sframe-index must also generate the indexed format.
The index construction happens before section matching in linker scripts. The output section description .sframe_idx : { *(.sframe_idx) } places the synthesized .sframe_idx into the .sframe_idx output section. .sframe input sections have been replaced by the linker-synthesized .sframe_idx, so we don't write *(.sframe).
Alternative: Deriving SFrame from .eh_frame
An alternative approach could eliminate the need for assemblers to generate .sframe sections directly. Instead, the linker would merge and optimize .eh_frame as usual (which requires CIE and FDE boundary information), then derive .sframe (or .sframe_idx) from the optimized .eh_frame.
This approach offers a significant advantage: since the linker only reads the stable .eh_frame format and produces .sframe or .sframe_idx as output, version compatibility concerns disappear entirely.
While CFI instruction decoding introduces additional complexity (a step previously unneeded), this is balanced by the architectural advantage of centralizing the conversion logic. Rather than scattering format-specific processing code throughout the linker (similar to how SHF_MERGE and .eh_frame require special internal representations), the transformation logic remains localized.
The counterargument centers on maintenance burden. This fine-grained knowledge of the SFrame format may expose the linker to more frequent updates as the format evolves. That is a serious risk, given that the linker's foundational role in the build process demands exceptional stability and robustness.
Post-processing alternative
A more cautious intermediate strategy could leverage existing Linux distribution post-processing tools, modifying them to append .sframe sections to executable and shared object files after linking completes. While this introduces more friction than native linker support and requires integration into package build systems, it offers several compelling advantages:
- Allows .sframe format experimentation without imposing linker complexity
- Provides time for the format to mature and prove its value before committing to linker integration
- Enables testing across diverse userspace packages in real-world scenarios
- Post-link tools can optimize and even overwrite sections in-place without linker constraints
- For cases where optimization significantly shrinks the section, .sframe can be placed at the end of the file (similar to BOLT moving .rodata)
However, this approach faces practical challenges. Post-processing adds build complexity, particularly with features like build-ids and read-only file systems. The success of .gdb_index, where linker support (--gdb-index) proved more popular than post-link tools, suggests that native linker support eventually becomes necessary for widespread adoption.
The key question is timing: should linker integration be the starting point or the outcome of proven stability?
SHF_ALLOC considerations
The .sframe section carries the SHF_ALLOC flag, meaning it's loaded as part of the program's read-only data segment. This design choice creates tradeoffs:
With SHF_ALLOC:
- .sframe contributes to initial read-only data segment consumption
- Can be accessed directly as part of the memory-mapped area
- No runtime mmap cost for tracers
Without SHF_ALLOC:
- No upfront memory cost
- Tracers must open the file and mmap the section on demand
- Runtime cost may not amortize well for frequent tracing
Analysis of 337 files in /usr/bin and /usr/lib/x86_64-linux-gnu/ shows .eh_frame typically consumes 5.2% (median: 5.1%) of file size:
EH_Frame size distribution:
If .sframe size is comparable to .eh_frame, this represents significant overhead for applications that never use stack tracing, likely the majority of users. Most users will not need stack trace features, raising the question of whether having .sframe always loaded is an acceptable overhead for distributions shipping it by default.
perf supports .debug_frame (tools/perf/util/unwind-libunwind-local.c), which does not have SHF_ALLOC. While there's a difference between status quo and what's optimal, the non-SHF_ALLOC approach deserves consideration for scenarios where runtime tracing overhead can be amortized or where memory footprint matters more than immediate access.
Kernel challenges
The .sframe section may not be resident in physical memory. SFrame proposers are attempting to defer user stack traces until syscall boundaries.
Ian Rogers points out that BPF programs can no longer simply stack trace user code. This change breaks stack trace deduplication, a commonly used BPF primitive.
Miscellaneous minor considerations
Linker relaxation considerations:
Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences
If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.
Endianness considerations:
The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.
The endianness discussion in
- Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
- Template-based abstractions such as template <class Endian> that must wrap every data access function
Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
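A fixed little-endian reader of this kind can be written without any reference to host or target byte order. The sketch below is one possible shape for read32le and a 16-bit sibling; the exact signatures are an assumption, not taken from any particular linker.

```cpp
#include <cassert>
#include <cstdint>

// Reads a 16-bit little-endian value from an arbitrarily aligned pointer.
uint16_t read16le(const void *p) {
  const uint8_t *b = static_cast<const uint8_t *>(p);
  return static_cast<uint16_t>(b[0] | (b[1] << 8));
}

// Reads a 32-bit little-endian value from an arbitrarily aligned pointer.
// The value is assembled byte by byte, so the result is the same on
// little-endian and big-endian hosts.
uint32_t read32le(const void *p) {
  const uint8_t *b = static_cast<const uint8_t *>(p);
  return static_cast<uint32_t>(b[0]) | static_cast<uint32_t>(b[1]) << 8 |
         static_cast<uint32_t>(b[2]) << 16 | static_cast<uint32_t>(b[3]) << 24;
}
```

No configuration object or template parameter has to be threaded through the callers, which is the simplification argued for here.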
This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.
#define WIDTH(x) \
However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.
Summary
SFrame represents a pragmatic approach to stack unwinding that achieves size reductions by trading flexibility for compactness. Its design presents several implementation challenges that merit consideration for future versions:
- The unified linking/execution view complicates toolchain implementation without clear benefits
- Section group compliance issues create significant concerns for linker developers
- Limited large text section support restricts deployment in modern binaries
- Uncertainty remains about SFrame's viability as a complete .eh_frame replacement
Beyond these implementation concerns, SFrame faces broader ecosystem challenges.
Questioned benefits
Even setting aside the technical implementation challenges, SFrame's fundamental value proposition warrants scrutiny.
SFrame's primary benefit centers on enabling frame pointer omission while preserving unwinding capabilities. In scenarios where users already omit leaf frame pointers, SFrame could theoretically allow switching from -fno-omit-frame-pointer -momit-leaf-frame-pointer to -fomit-frame-pointer -momit-leaf-frame-pointer. This benefit appears most significant on x86-64, which has limited general-purpose registers (without APX). Performance analyses show mixed results: some studies claim frame pointers degrade performance by less than 1%, while others suggest 1-2%. However, this argument overlooks a critical tradeoff: SFrame unwinding itself performs worse than frame pointer unwinding, potentially negating any performance gains from register availability.
Another claimed advantage is SFrame's ability to provide coverage in function prologues and epilogues, where frame-pointer-based unwinding may miss frames. Yet this overlooks a straightforward alternative: frame pointer unwinding can be enhanced to detect prologue and epilogue patterns by disassembling instructions at the program counter. No comparative analysis exists between this enhancement approach and SFrame's solution.
SFrame also faces a practical consideration: the .sframe section likely requires kernel page-in during unwinding, while the process stack is more likely already resident in physical memory. As Ian Rogers noted in LWN, system-wide profiling encounters limitations when system calls haven't transitioned to user code, BPF helpers may return placeholder values, and JIT compilers require additional SFrame support.
Looking ahead, hardware-assisted unwinding through features like x86 Shadow Stack and AArch64 Guarded Control Stack may reshape the entire landscape, potentially reducing the relevance of metadata-based unwinding formats. Meanwhile, compact unwinding schemes like .eh_frame. There is a feature request for a compact information for AArch64
If we proceed, here is how to do it right
According to
To ensure rapid SFrame evolution without compatibility concerns, a better approach is to build a library that parses .eh_frame and generates SFrame. The Linux kernel can then use this library (in objtool?) to generate SFrame for vmlinux and modules. Relying on assembler/linker output for this critical metadata format requires a level of stability that is currently concerning.
The ongoing maintenance implications warrant particular attention. Observing the binutils mailing list reveals a significant volume of SFrame commits. Most linker features stabilize quickly after initial implementation, but SFrame appears to require continued evolution. Given the linker's foundational role in the build process, which demands exceptional stability and robustness, the long-term maintenance burden deserves careful consideration.
Early integration into GNU toolchain has provided valuable feedback for format evolution, but this comes at the cost of coupling the format's maturity to linker stability. The SFrame GNU toolchain developers exhibit a