Stack walking: space and time trade-offs
On most Linux platforms (except AArch32, which uses .ARM.exidx), DWARF .eh_frame is required for C++ exception handling and stack unwinding to restore callee-saved registers. While .eh_frame can be used for call trace recording, it is often criticized for its runtime overhead. As an alternative, developers can enable frame pointers, or adopt SFrame, a newer format designed specifically for profiling. This article examines the size overhead of enabling non-DWARF stack walking mechanisms when building several LLVM executables.
Runtime performance analysis will be added in a future update.
Stack walking mechanisms
Here is a survey of mechanisms available for x86-64:
- Frame pointers: fast but costs a register
- DWARF .eh_frame: comprehensive but slower, supports additional features like C++ exception handling
- SFrame: a new format being developed, profiling only. .eh_frame is still needed for debugging and C++ exception handling. Check out Remarks on SFrame for details.
- x86 Last Branch Record (LBR): Skylake increased the LBR stack size to 32. Supported by AMD Zen 4 as Last Branch Record Extension Version 2 (LbrExtV2)
- Apple's Compact Unwinding Format: This has llvm, lld/MachO, and libunwind implementation. Supports x86-64 and AArch64. This can mostly replace DWARF CFI, but some entries need DWARF escape.
- OpenVMS's Compact Unwinding Format: This modifies Apple's Compact Unwinding Format.
Space overhead analysis
Frame pointer size impact
For most architectures, GCC defaults to -fomit-frame-pointer in -O compilation to free up a register for general use. To enable frame pointers, specify -fno-omit-frame-pointer, which reserves the frame pointer register (e.g., rbp on x86-64) and emits push/pop instructions in function prologues/epilogues.
For leaf functions (those that don't call other functions), while the frame pointer register should still be reserved for consistency, the push/pop operations are often unnecessary. Compilers provide -momit-leaf-frame-pointer (with target-specific defaults) to reduce code size.
The viability of this optimization depends on the target architecture:
- On AArch64, the return address is available in the link register (X30). The immediate caller can be retrieved by inspecting X30, so -momit-leaf-frame-pointer does not compromise unwinding.
- On x86-64, after the prologue instructions execute, the return address is stored at RSP plus an offset. An unwinder needs to know the stack frame size to retrieve the return address, or it must utilize DWARF information for the leaf frame and then switch to the FP chain for parent frames.
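To make the x86-64 case concrete, here is a minimal Python sketch of a frame-pointer unwinder running over a simulated stack. The function name `walk_fp_chain`, the addresses, and the `main -> caller -> leaf` call chain are all hypothetical and chosen only for illustration; a real unwinder reads the registers and memory of a live thread.

```python
# Toy model of x86-64 frame-pointer unwinding. All addresses are made up.
# With frame pointers, each frame stores [saved RBP][return address], and
# RBP points at the saved-RBP slot of the current frame.

def walk_fp_chain(memory, pc, rbp):
    """Return the code addresses found by following the RBP chain."""
    trace = [pc]
    while rbp != 0:
        trace.append(memory[rbp + 8])  # return address sits above the saved RBP
        rbp = memory[rbp]              # saved RBP links to the caller's frame
    return trace

# Call chain: _start -> main -> caller -> leaf (code addresses inside each).
IN_START, IN_MAIN, IN_CALLER, IN_LEAF = 0x1000, 0x2000, 0x3000, 0x4000

memory = {
    0x7FF0: 0,          # main's saved RBP (end of chain)
    0x7FF8: IN_START,   # main returns into _start
    0x7FE0: 0x7FF0,     # caller's saved RBP -> main's frame
    0x7FE8: IN_MAIN,    # caller returns into main
    0x7FD0: 0x7FE0,     # leaf's saved RBP (only meaningful if leaf sets up FP)
    0x7FD8: IN_CALLER,  # leaf returns into caller
}

# Leaf sets up a frame pointer: RBP points at leaf's own frame.
full = walk_fp_chain(memory, IN_LEAF, 0x7FD0)
# Leaf omits the frame pointer: RBP still points at caller's frame.
leaf_omitted = walk_fp_chain(memory, IN_LEAF, 0x7FE0)

print([hex(a) for a in full])          # ['0x4000', '0x3000', '0x2000', '0x1000']
print([hex(a) for a in leaf_omitted])  # ['0x4000', '0x2000', '0x1000']
```

In the second trace the immediate caller is missing: the walker jumps from the leaf straight to main. That is the gap described for x86-64, whereas an AArch64 unwinder could recover the caller from X30.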
Beyond this architectural consideration, there are additional practical reasons to use -momit-leaf-frame-pointer on x86-64:
- Many hand-written assembly implementations (including numerous glibc functions) don't establish frame pointers, creating gaps in the frame pointer chain anyway.
- In the prologue sequence push rbp; mov rbp, rsp, after the first instruction executes, RBP does not yet reference the current stack frame. When shrink-wrapping optimizations are enabled, the instruction region where RBP still holds the old value becomes larger, increasing the window where the frame pointer is unreliable.
Given these trade-offs, three common configurations have emerged:
- omitting FP: -fomit-frame-pointer -momit-leaf-frame-pointer (smallest overhead)
- reserving FP, but removing FP push/pop for leaf functions: -fno-omit-frame-pointer -momit-leaf-frame-pointer (frame pointer chain omitting the leaf frame)
- reserving FP: -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer (complete frame pointer chain, largest overhead)
The size impact varies significantly by program. Here's a section_size.rb that compares section sizes:
% ~/Dev/unwind-info-size-analyzer/section_size.rb /tmp/out/custom-{none,nonleaf,all}/bin/{llvm-mc,opt}
For instance, llvm-mc is dominated by read-only data, making the relative .text percentage quite small, so frame pointer impact on the VM size is minimal. ("VM size" is a metric used by bloaty, representing the total p_memsz size of PT_LOAD segments, excluding [...].) llvm-mc grows larger as more functions set up the frame pointer chain. However, opt actually becomes smaller when -fno-omit-frame-pointer is enabled, a counterintuitive result that warrants explanation.
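As a rough illustration of the "VM size" notion, the following Python sketch sums p_memsz over the PT_LOAD program headers of a little-endian ELF64 file. It is only an approximation under those assumptions and does not try to reproduce bloaty's exact accounting; the function name `pt_load_memsz` is made up for this example.

```python
import struct
import sys

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header, 56 bytes

def pt_load_memsz(data):
    """Sum p_memsz over all PT_LOAD program headers of an ELF64 LE image."""
    fields = EHDR.unpack_from(data, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    total = 0
    for i in range(e_phnum):
        # Fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        p_type, *_, p_memsz, _p_align = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            total += p_memsz
    return total

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, pt_load_memsz(f.read()))
```

Running the script on the binaries being compared gives a quick cross-check of how much of a size change lands in loadable segments.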
Without a frame pointer, the compiler uses RSP-relative addressing to access stack objects. When using the register-indirect + disp8/disp32 addressing mode, RSP needs an extra SIB byte while RBP doesn't. For larger functions accessing many local variables, the savings from shorter RBP-relative encodings can outweigh the additional push rbp; mov rbp, rsp; pop rbp instructions in the prologues/epilogues.
% echo 'mov rax, [rsp+8]; mov rax, [rbp-8]' | /tmp/Rel/bin/llvm-mc -x86-asm-syntax=intel -output-asm-variant=1 -show-encoding
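The one-byte difference can also be reproduced by hand. The Python sketch below encodes only the `mov rax, [base + disp8]` form for the two base registers discussed here; `encode_mov_rax_mem` is a hypothetical helper written for this illustration, not part of any assembler.

```python
def encode_mov_rax_mem(base, disp8):
    """Encode `mov rax, [base + disp8]` for base in {"rsp", "rbp"}."""
    rex_w, opcode = 0x48, 0x8B                 # REX.W prefix, MOV r64, r/m64
    reg_rax = 0b000
    rm = {"rsp": 0b100, "rbp": 0b101}[base]
    modrm = (0b01 << 6) | (reg_rax << 3) | rm  # mod=01: an 8-bit displacement follows
    out = [rex_w, opcode, modrm]
    if rm == 0b100:
        # rm=100 does not select RSP directly; it means "a SIB byte follows".
        # SIB: scale=00, index=100 (no index), base=100 (RSP) -> 0x24
        out.append(0x24)
    out.append(disp8 & 0xFF)                   # two's complement disp8
    return bytes(out)

print(encode_mov_rax_mem("rsp", 8).hex(" "))   # 48 8b 44 24 08
print(encode_mov_rax_mem("rbp", -8).hex(" "))  # 48 8b 45 f8
```

The RSP-based load takes five bytes and the RBP-based load takes four, so every stack access with a small displacement saves one byte when RBP is the base.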
SFrame vs .eh_frame
Oracle is advocating for SFrame adoption in Linux distributions. The SFrame implementation is handled by the assembler and linker rather than the compiler. Let's build the latest binutils-gdb to test it.
Building test program
We'll use the clang compiler from
There are still issues related to garbage collection (-Wl,--gc-sections).
--- i/llvm/cmake/modules/AddLLVM.cmake
configure-llvm custom-sframe -DLLVM_TARGETS_TO_BUILD=host -DLLVM_ENABLE_PROJECTS='clang' -DLLVM_ENABLE_UNWIND_TABLES=on -DLLVM_ENABLE_LLD=off -DCMAKE_{EXE,SHARED}_LINKER_FLAGS=-fuse-ld=bfd -DCMAKE_C_COMPILER=$HOME/opt/gcc-15/bin/gcc -DCMAKE_CXX_COMPILER=$HOME/opt/gcc-15/bin/g++ -DCMAKE_C_FLAGS="-B$HOME/opt/binutils/bin -Wa,--gsframe" -DCMAKE_CXX_FLAGS="-B$HOME/opt/binutils/bin -Wa,--gsframe"
% ~/Dev/bloaty/out/release/bloaty /tmp/out/custom-sframe/bin/clang
The results show that .sframe (8.87 MiB) is approximately 10% larger than the combined size of .eh_frame and .eh_frame_hdr (7.07 + 0.99 = 8.06 MiB). While SFrame is designed for efficiency during stack walking, it carries a non-trivial space overhead compared to traditional DWARF unwind information.
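The 10% figure follows directly from the section sizes quoted above. A small Python check, using only those three numbers (the helper name `relative_overhead` is made up for this example):

```python
def relative_overhead(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0

eh_frame = 7.07      # MiB, from the bloaty output discussed above
eh_frame_hdr = 0.99  # MiB
sframe = 8.87        # MiB

combined = eh_frame + eh_frame_hdr
print(f"combined .eh_frame + .eh_frame_hdr: {combined:.2f} MiB")
print(f".sframe overhead: {relative_overhead(sframe, combined):.1f}%")
```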
SFrame vs FP
Having examined SFrame's overhead compared to .eh_frame, let's now compare the two primary approaches for non-hardware-assisted stack walking.
- Frame pointer approach: Reserve FP but omit push/pop for leaf functions
  g++ -fno-omit-frame-pointer -momit-leaf-frame-pointer
- SFrame approach: Omit FP and use SFrame metadata
  g++ -fomit-frame-pointer -momit-leaf-frame-pointer -Wa,--gsframe
To conduct a fair comparison, we build LLVM executables using both approaches with both Clang and GCC compilers. The following script configures and builds test binaries with each combination:
#!/bin/zsh
The results reveal interesting differences between compiler implementations:
% ~/Dev/unwind-info-size-analyzer/section_size.rb /tmp/out/custom-{fp,sframe,fp-gcc,sframe-gcc}/bin/{llvm-mc,opt}
- SFrame incurs a significant VM size increase.
- GCC-built binaries are significantly larger than their Clang counterparts, probably due to more aggressive inlining or vectorization strategies.
With Clang-built binaries, the frame pointer configuration produces a smaller opt executable (55.6 MiB) compared to the SFrame configuration (62.5 MiB). This reinforces our earlier observation that RBP addressing can be more compact than RSP-relative addressing for large functions with frequent local variable accesses.
Assembly comparison reveals that functions using RBP and RSP addressing produce quite similar code.
In contrast, GCC-built binaries show the opposite trend: the frame pointer version of opt (70.0 MiB) is smaller than the SFrame version (76.2 MiB).
The generated assembly differs significantly between omit-FP and non-omit-FP builds. I have compared symbol sizes between the two GCC builds:
nvim -d =(/tmp/Rel/bin/llvm-nm -U --size-sort /tmp/out/custom-fp-gcc/bin/llvm-mc) =(/tmp/Rel/bin/llvm-nm -U --size-sort /tmp/out/custom-sframe-gcc/bin/llvm-mc)
Many functions, such as _ZN4llvm15ELFObjectWriter24executePostLayoutBindingEv, have significantly more instructions in the keep-FP build. This suggests that GCC's frame pointer code generation may not be as optimized as its default omit-FP path.
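A side-by-side diff works for browsing, but ranking symbols by how much they grew can be scripted. The Python sketch below assumes each input line has the shape `<hex size> <type> <name>`; that layout is an assumption, so check it against the actual output of your nm before relying on it. The helper names and the sample symbols `foo` and `bar` with their sizes are made up for illustration.

```python
def parse_sizes(nm_text):
    """Map symbol name -> size, assuming lines like '<hex size> <type> <name>'."""
    sizes = {}
    for line in nm_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        size_hex, _sym_type, name = parts
        try:
            sizes[name] = int(size_hex, 16)
        except ValueError:
            continue
    return sizes

def largest_growth(before, after, top=10):
    """Symbols present in both maps, sorted by size increase (after - before)."""
    common = before.keys() & after.keys()
    deltas = sorted(((after[n] - before[n], n) for n in common), reverse=True)
    return deltas[:top]

# Made-up sample input: omit-FP build first, keep-FP build second.
omit_fp = "0000000000000010 T foo\n0000000000000040 T bar\n"
keep_fp = "0000000000000018 T foo\n0000000000000090 T bar\n"

for delta, name in largest_growth(parse_sizes(omit_fp), parse_sizes(keep_fp)):
    print(f"{delta:+6d} {name}")
```

Feeding the two nm outputs through this kind of script surfaces the functions with the largest keep-FP growth first, which are the ones worth inspecting in the disassembly.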
Runtime performance analysis
TODO
perf record overhead with EH
perf record overhead with FP
Summary
This article examines the space overhead of different stack walking mechanisms when building LLVM executables.
Frame pointer configurations: Enabling frame pointers (-fno-omit-frame-pointer) can paradoxically reduce x86-64 binary size when stack object accesses are frequent. This occurs because RBP-relative addressing produces more compact encodings than RSP-relative addressing, which requires an extra SIB byte. The savings from shorter instructions can outweigh the prologue/epilogue overhead.
SFrame vs .eh_frame: For the x86-64 clang executable, SFrame metadata is approximately 10% larger than the combined size of .eh_frame and .eh_frame_hdr. Given the significant VM size overhead and the lack of clear advantages over established alternatives, I am skeptical about SFrame's viability as the future of stack walking for userspace programs. While SFrame will receive a major revision V3 in the upcoming months, it needs to achieve substantial size reductions comparable to existing compact unwinding schemes to justify its adoption over frame pointers. I hope interested folks can implement something similar to macOS's compact unwind descriptors (with x86-64 support) and OpenVMS's.
GCC's frame pointer code generation appears less optimized than its default omit-frame-pointer path, as evidenced by substantial differences in generated assembly.
Runtime performance analysis remains to be conducted to complete the trade-off evaluation.
Appendix: configure-llvm
This script specifies common options when configuring llvm-project:
- -DCMAKE_CXX_ARCHIVE_CREATE="$HOME/Stable/bin/llvm-ar qc --thin <TARGET> <OBJECTS>" -DCMAKE_CXX_ARCHIVE_FINISH=:: Use thin archives to reduce disk usage
- -DLLVM_TARGETS_TO_BUILD=host: Build a single target
- -DCLANG_ENABLE_OBJC_REWRITER=off -DCLANG_ENABLE_STATIC_ANALYZER=off: Disable less popular components
- -DLLVM_ENABLE_PLUGINS=off -DCLANG_PLUGIN_SUPPORT=off: Disable -Wl,--export-dynamic, preventing large .dynsym and .dynstr sections
Appendix: My SFrame build
mkdir -p out/release && cd out/release
gcc -B$HOME/opt/binutils/bin and clang -B$HOME/opt/binutils/bin -fno-integrated-as will use as and ld from the install directory.
Appendix: Scripts
Ruby scripts used by this post are available at