Remarks on SFrame
The .sframe format is a lightweight alternative to .eh_frame and .eh_frame_hdr designed for efficient stack unwinding. By trading some functionality and flexibility for compactness, SFrame achieves significantly smaller size while maintaining the essential unwinding capabilities needed by profilers.
SFrame focuses on three fundamental elements for each function:
- Canonical Frame Address (CFA): The base address for stack frame calculations
- Return address
- Frame pointer
An .sframe section follows a straightforward layout:
- Header: Contains metadata and offset information
- Auxiliary header (optional): Reserved for future extensions
- Function Descriptor Entries (FDEs): Array describing each function
- Frame Row Entries (FREs): Arrays of unwinding information per function
struct [[gnu::packed]] sframe_header {
While magic numbers are a popular choice for file formats, they deviate from established ELF conventions, which simply use the section type for distinction.
The version field resembles similar fields in DWARF section headers. SFrame will likely evolve over time, unlike ELF's more stable control structures. This means we'll probably need to keep producers and consumers evolving in lockstep, which creates a stronger case for internal versioning. An internal version field would allow linkers to upgrade or ignore unsupported low-version input pieces, providing more flexibility in handling version mismatches.
Data structures
Function Descriptor Entries (FDEs)
Function Descriptor Entries serve as the bridge between functions and their unwinding information. Each FDE describes a function's location and provides a direct link to its corresponding Frame Row Entries (FREs), which contain the actual unwinding data.
struct [[gnu::packed]] sframe_func_desc_entry {
The current design has room for optimization. The sfde_func_num_fres field uses a full 32 bits, which is wasteful for most functions. We could use uint16_t instead, requiring exceptionally large functions to be split across multiple FDEs.
It's important to note that SFrame's function concept represents code ranges rather than logical program functions. This distinction becomes particularly relevant with compiler optimizations like hot-cold splitting, where a single logical function may span multiple non-contiguous code ranges, each requiring its own FDE.
The padding field sfde_func_padding2 represents unnecessary overhead on modern architectures where unaligned memory access performs efficiently, making the alignment benefits negligible.
To enable binary search on sfde_func_start_address, FDEs must maintain a fixed size, which precludes the use of variable-length integer encodings like PrefixVarInt.
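To make the fixed-size requirement concrete, here is a minimal sketch of the lookup a consumer might perform. The Fde struct below is a simplified stand-in, not the real sframe_func_desc_entry: its field names are hypothetical, and it assumes the start addresses have already been resolved to absolute PCs and sorted.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for an FDE: only what the lookup needs.
struct Fde {
  uint64_t start;  // resolved function start address (assumed absolute)
  uint32_t size;   // length of the code range in bytes
};

// Returns the FDE whose code range covers pc, or nullptr if none does.
// Requires fdes to be sorted by start address.
inline const Fde *findFde(const std::vector<Fde> &fdes, uint64_t pc) {
  // Find the first entry that starts after pc; the candidate precedes it.
  auto it = std::upper_bound(
      fdes.begin(), fdes.end(), pc,
      [](uint64_t addr, const Fde &f) { return addr < f.start; });
  if (it == fdes.begin())
    return nullptr;
  --it;
  // pc may still fall in a gap between two code ranges.
  return pc - it->start < it->size ? &*it : nullptr;
}
```

Random access into the array is what makes the O(log n) search possible; with variable-length entries the consumer would need a separate index or a linear scan.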
Frame Row Entries (FREs)
Frame Row Entries contain the actual unwinding information for specific program counter ranges within a function. The template design allows for different address sizes based on the function's characteristics.
template <class AddrType>
Each FRE contains variable-length stack offsets stored as trailing data. The fre_offset_size field determines whether offsets use 1, 2, or 4 bytes (uint8_t, uint16_t, or uint32_t), allowing optimal space usage based on stack frame sizes.
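A rough sketch of how a consumer might read one trailing offset follows. The helper name readOffset is hypothetical, and two things are assumptions of this sketch rather than statements about the format: the width is passed in bytes instead of as the encoded fre_offset_size value, and the stored bytes are sign-extended, since FP and return address offsets relative to the CFA are typically negative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads one little-endian stack offset of `width` bytes (1, 2, or 4) and
// sign-extends it to 32 bits. Hypothetical helper, not a binutils API.
inline int32_t readOffset(const uint8_t *p, size_t width) {
  assert(width == 1 || width == 2 || width == 4);
  uint32_t v = 0;
  for (size_t i = 0; i < width; ++i)
    v |= uint32_t(p[i]) << (8 * i);
  // Sign-extend from the top bit of the stored width.
  if (width < 4 && (v & (1u << (8 * width - 1))))
    v |= ~0u << (8 * width);
  return int32_t(v);
}
```

For example, the single byte 0xf0 decodes to -16, which would otherwise need four bytes in a fixed-width encoding.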
Architecture-specific stack offsets
SFrame adapts to different processor architectures by varying its offset encoding to match their respective calling conventions and architectural constraints.
x86-64
The x86-64 implementation takes advantage of the architecture's predictable stack layout:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset (if present): Encodes FP as CFA + offset
- Return address: Computed implicitly as CFA + sfh_cfa_fixed_ra_offset (using the header field)
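The three rules above can be sketched as a small function. This is only an illustration: FrameSlots and applyFreX86_64 are hypothetical names, the results are stack addresses that hold the saved values (an unwinder would then load from them), and the -8 used in the usage note is the conventional x86-64 distance from the CFA to the return address.

```cpp
#include <cstdint>

// Stack locations recovered for one frame (hypothetical type).
struct FrameSlots {
  uint64_t cfa;     // canonical frame address
  uint64_t raSlot;  // stack address holding the return address
  bool hasFp;       // false when the FRE carries no FP offset
  uint64_t fpSlot;  // stack address holding the saved frame pointer
};

// baseReg is the current value of the base register (SP or FP).
// fpOffset is nullptr when the FRE has no second offset.
inline FrameSlots applyFreX86_64(uint64_t baseReg, int32_t cfaOffset,
                                 const int32_t *fpOffset,
                                 int32_t cfaFixedRaOffset) {
  FrameSlots f{};
  f.cfa = baseReg + int64_t(cfaOffset);          // CFA = BASE_REG + offset
  f.raSlot = f.cfa + int64_t(cfaFixedRaOffset);  // RA from the header field
  if (fpOffset) {
    f.hasFp = true;
    f.fpSlot = f.cfa + int64_t(*fpOffset);       // FP = CFA + offset
  }
  return f;
}
```

Right after push %rbp with rsp = 0x7000, offsets of 16 (CFA) and -16 (FP) with a fixed RA offset of -8 give CFA = 0x7010, a return address slot at 0x7008, and a saved FP slot at 0x7000.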
AArch64
AArch64's more flexible calling conventions require explicit return address tracking:
- First offset: Encodes CFA as BASE_REG + offset
- Second offset: Encodes return address as CFA + offset
- Third offset (if present): Encodes FP as CFA + offset
The explicit return address encoding accommodates AArch64's variable stack layouts and link register usage patterns.
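For comparison, a sketch of the AArch64 interpretation, following the list above literally: it accepts exactly two or three offsets and treats any other count as out of scope. FrameSlotsA64 and applyFreAArch64 are hypothetical names, and the results are again stack addresses holding the saved values.

```cpp
#include <cstdint>
#include <vector>

// Stack locations recovered for one AArch64 frame (hypothetical type).
struct FrameSlotsA64 {
  uint64_t cfa;     // canonical frame address
  uint64_t raSlot;  // stack address holding the saved return address
  bool hasFp;       // false when no third offset is present
  uint64_t fpSlot;  // stack address holding the saved frame pointer
};

// offsets is {cfa, ra} or {cfa, ra, fp}; returns false for any other count.
inline bool applyFreAArch64(uint64_t baseReg,
                            const std::vector<int32_t> &offsets,
                            FrameSlotsA64 &out) {
  if (offsets.size() != 2 && offsets.size() != 3)
    return false;
  out = FrameSlotsA64{};
  out.cfa = baseReg + int64_t(offsets[0]);     // CFA = BASE_REG + offset
  out.raSlot = out.cfa + int64_t(offsets[1]);  // RA = CFA + offset
  if (offsets.size() == 3) {
    out.hasFp = true;
    out.fpSlot = out.cfa + int64_t(offsets[2]);  // FP = CFA + offset
  }
  return true;
}
```

After a prologue such as stp x29, x30, [sp, #-32]! with sp = 0x8000, the offsets {32, -24, -32} yield CFA = 0x8020, a return address slot at 0x8008, and a saved FP slot at 0x8000.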
s390x
TODO
.eh_frame and .sframe
SFrame reduces size compared to .eh_frame plus .eh_frame_hdr by:
- Eliminating .eh_frame_hdr through sorted sfde_func_start_address fields
- Replacing CIE pointers with direct FDE-to-FRE references
- Using variable-width sfre_start_address fields (1 or 2 bytes) for small functions
- Storing start addresses instead of address ranges, whereas .eh_frame stores address ranges
- Start addresses in a small function use 1 or 2 byte fields, more efficient than .eh_frame initial_location, which needs at least 4 bytes (DW_EH_PE_sdata4).
- Hard-coding stack offsets rather than using flexible register specifications
However, the bytecode design of .eh_frame can sometimes be more efficient than .sframe, as demonstrated on x86-64.
SFrame serves as a specialized complement to .eh_frame rather than a complete replacement. The current version does not include personality routines, Language Specific Data Area (LSDA) information, or the ability to encode extra callee-saved registers. While these constraints make SFrame ideal for profilers and debuggers, they prevent it from supporting C++ exception handling, where libstdc++/libc++abi requires the full .eh_frame feature set.
In practice, executables and shared objects will likely contain all three sections:
- .eh_frame: Complete unwinding information for exception handling
- .eh_frame_hdr: Fast lookup table for .eh_frame
- .sframe: Compact unwinding information for profilers
The auxiliary header, currently unused, provides a pathway for future enhancements. It could potentially accommodate .eh_frame augmentation data such as personality routines, language-specific data areas (LSDAs), and signal frame handling, bridging some of the current functionality gaps.
Large text section support
The sfde_func_start_address field uses a signed 32-bit offset to reference functions, providing a ±2GB addressing range from the field's location. This signed encoding offers flexibility in section ordering: .sframe can be placed either before or after text sections.
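The range limit can be expressed as a simple check. The sketch below uses a hypothetical helper name and assumes, as described above, that the displacement is measured from the address of the field itself.

```cpp
#include <cstdint>
#include <limits>

// True if a function at funcAddr is reachable from a signed 32-bit field
// located at fieldAddr. Hypothetical helper for illustration only.
inline bool fitsSfdeStartAddress(uint64_t fieldAddr, uint64_t funcAddr) {
  // Unsigned subtraction wraps, so the cast yields the signed displacement.
  int64_t delta = int64_t(funcAddr - fieldAddr);
  return delta >= std::numeric_limits<int32_t>::min() &&
         delta <= std::numeric_limits<int32_t>::max();
}
```

A function 0x7fffffff bytes above the field still fits, one byte further does not, and the same holds symmetrically for functions placed below the field.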
However, this approach faces limitations with large binaries, particularly when LLVM generates .ltext sections for x86-64. The typical section layout creates significant gaps between .sframe and .ltext:
.ltext // Large text section
Linking and execution views
SFrame employs a unified indexed format across both relocatable files (linking view) and executable files (execution view). While this design consistency appears elegant, it introduces significant complications in toolchain implementation.
Currently, Binutils enforces a single-element structure within each .sframe section, regardless of whether it resides in a relocatable object or final executable. This approach differs from DWARF sections, which support multiple concatenated elements, each with its own header and body.
This design choice stems from Linux kernel requirements, where kernel modules are relocatable files created with ld -r. The kernel's SFrame support expects each module to contain a single indexed format for efficient runtime processing. Consequently, GNU ld merges all input .sframe sections into a single indexed element, even when producing relocatable files. This behavior deviates from standard relocatable linking conventions that suppress synthetic section finalization.
The fundamental design issue lies in making linker merging mandatory. For optimal portability, unwinders should support multiple-element structures within a .sframe section. When a linker builds an index for .sframe, it should be viewed as an optimization that relieves the unwinder from constructing its own index at runtime. This index construction should remain optional rather than required. While the SFRAME_F_FDE_SORTED flag can be cleared to permit unsorted FDEs, current unwinder implementations do not seem to support multiple elements in a single section.
A future version should distinguish between linking and execution views:
- Linking view: Assemblers produce a simpler format, omitting index-specific metadata fields
- Linkers concatenate .sframe input sections by default, consistent with DWARF and other metadata sections
- A new --sframe-index option enables linkers to synthesize a .sframe_idx section containing the indexed format, analogous to --gdb-index and --debug-names. The linker builds .sframe_idx from input .sframe sections. To support the Linux kernel workflow (ld -r for kernel modules), ld -r --sframe-index must also generate the indexed format.
- Linker scripts control placement using: .sframe_idx : { *(.sframe_idx) }. From the linker perspective, .sframe input sections have been replaced by the linker-synthesized .sframe_idx. This output section description places the .sframe_idx into the .sframe_idx output section.
The linking view could omit index-specific metadata fields such as sfh_num_fdes, sfh_num_fres, sfh_fdeoff, and sfh_freoff.
The .debug_pubnames/.gdb_index design provides an excellent model for separate linking and execution views. While DWARF v5's .debug_names unifies both views at the cost of larger linking formats, it represents a reasonable tradeoff since relocatable files contain only a single .debug_names section, and debuggers can efficiently load sections with concatenated name tables.
Section group compliance issues
The current monolithic .sframe design creates ELF specification violations when a .sframe section contains relocations to STB_LOCAL symbols from multiple text sections, including those in different section groups.
This violates the ELF section group rule, which states:
A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
The problem manifests when inline functions are deduplicated:
cat > a.cc <<'eof'
Linkers correctly reject this violation:
% ld.lld a.o b.o
(In 2020, I reported a similar issue for GCC -fpatchable-function-entry=.)
Some linkers don't implement this error check. A separate issue arises with garbage collection: by default, an unreferenced .sframe section will be discarded. If the linker implements a workaround to force-retain .sframe, it might inadvertently retain all text sections referenced by .sframe, even those that would otherwise be garbage collected.
The solution requires restructuring the assembler's output strategy. Instead of creating a monolithic .sframe section, the assembler should generate individual SFrame sections corresponding to each text section. When a text section belongs to a COMDAT group, its associated SFrame section must join the same group. For standalone text sections, the SHF_LINK_ORDER flag should establish the proper association.
This approach would create multiple SFrame sections within relocatable files, making the size optimization benefits of a simplified linking view format even more compelling. While this comes with the overhead of additional section headers (where each Elf64_Shdr consumes 64 bytes), it's a cost we should pay to be a good ELF citizen. This reinforces the value of my
Linker relaxation considerations
Since .sframe carries the SHF_ALLOC flag, it affects text section addresses and consequently influences
If variable-length encoding is introduced to the format, .sframe would behave as an address-dependent section similar to .relr.dyn. However, this dependency should not pose significant implementation challenges.
Linker complexity
Endianness considerations
The SFrame format currently supports endianness variants, which complicates toolchain implementation. While runtime consumers typically target a single endianness, development tools must handle both variants to support cross-compilation workflows.
The endianness discussion in
- Endianness-aware function calls like read32le(config, p) where config->endian specifies the object file's byte order
- Template-based abstractions such as template <class Endian> that must wrap every data access function
Instead, toolchain code could use straightforward calls like read32le(p), streamlining both implementation and maintenance.
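As an illustration of how small such a helper can be, here is one portable way to write read32le. This is a sketch rather than any particular toolchain's implementation; it works byte by byte, so it needs no configuration object and behaves the same on little-endian and big-endian hosts.

```cpp
#include <cstdint>

// Reads a 32-bit little-endian value from an arbitrarily aligned pointer.
// The result does not depend on the byte order of the host.
inline uint32_t read32le(const uint8_t *p) {
  return uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 |
         uint32_t(p[3]) << 24;
}
```

Optimizing compilers generally recognize this pattern and lower it to a single load, with a byte-reversing load or swap on big-endian targets.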
This approach remains efficient even on big-endian architectures like IBM z/Architecture and POWER. z/Architecture's LOAD REVERSED instructions, for instance, handle byte swapping with minimal overhead, often requiring no additional instructions beyond normal loads. While slight performance differences may exist compared to native endian operations, the toolchain simplification benefits generally outweigh these concerns.
#define WIDTH(x) \
However, I understand that my opinion is probably not popular within the object file format community and faces resistance from stakeholders with significant big-endian investments.
Questioned benefits
SFrame's primary value proposition centers on enabling frame pointer omission while preserving unwinding capabilities. In scenarios where users already omit leaf frame pointers, SFrame could theoretically allow switching from -fno-omit-frame-pointer -momit-leaf-frame-pointer to -fomit-frame-pointer -momit-leaf-frame-pointer. This benefit appears most significant on x86-64, which has limited general-purpose registers (without APX). Performance analyses show mixed results: some studies claim frame pointers degrade performance by less than 1%, while others suggest 1-2%. However, this argument overlooks a critical tradeoff: SFrame unwinding itself performs worse than frame pointer unwinding, potentially negating any performance gains from register availability.
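To show why frame pointer unwinding is hard to beat on speed, here is a minimal sketch of the chain walk over a simulated stack. The Memory map and function name are hypothetical scaffolding for the example; the layout assumed is the conventional x86-64 one, where [fp] holds the caller's frame pointer and [fp + 8] holds the return address.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using Memory = std::map<uint64_t, uint64_t>;  // simulated 8-byte stack slots

// Collects return addresses by following the saved frame pointer chain.
inline std::vector<uint64_t> walkFramePointers(const Memory &mem, uint64_t fp,
                                               size_t maxDepth = 64) {
  std::vector<uint64_t> returnAddrs;
  while (fp != 0 && returnAddrs.size() < maxDepth) {
    auto savedFp = mem.find(fp);
    auto ra = mem.find(fp + 8);
    if (savedFp == mem.end() || ra == mem.end())
      break;  // unreadable slot: stop rather than guess
    returnAddrs.push_back(ra->second);
    if (savedFp->second <= fp)
      break;  // caller frames must live at higher addresses
    fp = savedFp->second;
  }
  return returnAddrs;
}
```

Each frame costs two loads from the stack and no table lookup, whereas an SFrame unwinder has to locate the FDE and the matching FRE before it can compute the same two addresses.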
Another claimed advantage is SFrame's ability to provide coverage in function prologues and epilogues, where frame-pointer-based unwinding may miss frames. Yet this overlooks a straightforward alternative: frame pointer unwinding can be enhanced to detect prologue and epilogue patterns by disassembling instructions at the program counter. No comparative analysis exists between this enhancement approach and SFrame's solution.
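A rough sketch of what such pattern detection could look like on x86-64 follows. It is a heuristic of my own for illustration, not an existing unwinder's logic: it only recognizes the canonical push %rbp (0x55) followed by mov %rsp,%rbp (0x48 0x89 0xe5) prologue and a ret (0xc3) at the program counter, and ignores variations such as a leading endbr64 or the alternative mov encoding.

```cpp
#include <cstddef>
#include <cstdint>

enum class RaLocation {
  UseFramePointer,  // rbp chain is valid: return address at [rbp + 8]
  AtSp,             // return address at [rsp]
  AtSpPlus8,        // return address at [rsp + 8]
};

// code/size describe the function's bytes; pc is an offset into them and is
// assumed to point at an instruction boundary.
inline RaLocation classifyPc(const uint8_t *code, size_t size, size_t pc) {
  auto movRspRbpAt = [&](size_t i) {
    return i + 3 <= size && code[i] == 0x48 && code[i + 1] == 0x89 &&
           code[i + 2] == 0xe5;
  };
  // About to execute push %rbp: nothing has been pushed yet.
  if (pc < size && code[pc] == 0x55 && movRspRbpAt(pc + 1))
    return RaLocation::AtSp;
  // About to execute mov %rsp,%rbp: the old rbp sits below the return address.
  if (pc >= 1 && code[pc - 1] == 0x55 && movRspRbpAt(pc))
    return RaLocation::AtSpPlus8;
  // About to execute ret: rbp was already restored by pop %rbp or leave.
  if (pc < size && code[pc] == 0xc3)
    return RaLocation::AtSp;
  return RaLocation::UseFramePointer;
}
```

In the two AtSp and AtSpPlus8 cases rbp still belongs to the caller, so the unwinder would read the return address relative to rsp and then continue the ordinary frame pointer walk from the unchanged rbp.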
SFrame also faces a practical consideration: the .sframe section likely requires kernel page-in during unwinding, while the process stack is more likely already resident in physical memory.
Looking ahead, hardware-assisted unwinding through features like x86 Shadow Stack and AArch64 Guarded Control Stack may reshape the entire landscape, potentially reducing the relevance of metadata-based unwinding formats.
Summary
SFrame represents a pragmatic approach to stack unwinding that achieves significant size reductions by trading flexibility for compactness. Its design presents several implementation challenges that merit consideration for future versions.
- The unified linking/execution view complicates toolchain implementation without clear benefits
- Section group compliance issues create significant concerns for linker developers
- Limited large text section support restricts deployment in modern binaries
- Uncertainty remains about SFrame's viability as a complete .eh_frame replacement
Beyond these implementation concerns, SFrame faces broader ecosystem challenges. As Ian Rogers noted in
The format's future also depends on evolving unwinding strategies. Frame pointer unwinding could potentially be enhanced to detect prologue and epilogue patterns, though comprehensive comparisons with SFrame remain absent from current literature.