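Interestingly, amid the mix of approval and concern, the focus of the discussion has quietly shifted: from "Why pay for Skip when KMP is free?" to "Would you rather write Swift or Kotlin?" This shift matters. It means Skip has moved the competition from "business model" to "language ecosystem", and the competitor is no longer "one small company" but "the entire Swift community". In this contest between languages, Skip has cleverly (or perhaps unintentionally) recast itself from "protagonist" to "infrastructure".
Skip started out because its founders saw a business opportunity and expected sufficient commercial returns. Without that expectation, relying on official or community efforts alone, Swift might not have advanced on other platforms as quickly. This path closely mirrors The Browser Company's push to bring Swift to Windows in order to build the Arc browser.
As a Swift developer, I sincerely hope Skip's change of course achieves what it is aiming for. Open source is an experiment in trust and an investment in the ecosystem. If you also want Swift to have more possibilities beyond iOS, consider becoming an individual sponsor ($10/month): it supports Skip, and it is also a vote for the whole Swift cross-platform ecosystem. How far an open-source Skip can go depends on how many people are willing to move from bystander to participant.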
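Swift 6 introduced many new concurrency features and keywords. Many of them rarely come up in day-to-day development, but when you hit a specific scenario without understanding these concepts, you can get stuck even with AI assistance. This article uses a real concurrency problem encountered while writing tests to show how @isolated(any) and the #isolation macro let a function inherit its caller's isolation domain, so that the compiler can infer the closure's execution context automatically.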
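Swift has been C-compatible from birth, but calling C APIs directly is often unpleasant: developers face raw pointers, manual memory management, and function names that don't follow Swift conventions. To get a "native" experience, they have to write and maintain tedious wrapper layers. In this article, Doug Gregor explains in detail how to use API Notes and Clang attributes to "guide" the Swift compiler into generating Swift-style interfaces without modifying the C library's implementation.
The significance of this improvement is that it moves the cost of "wrapping a C library" from the logic layer (writing Swift glue code) to the declaration layer (writing API Notes or annotating headers). This is crucial for the adoption of Embedded Swift, because embedded development relies heavily on the existing C library ecosystem. As the toolchain matures, developers may only need to add a few annotations to a C library to call it from Swift as if it were a native library. The article also provides a regex-based automation script that generates API Notes in bulk for well-structured C headers.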
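Most iOS developers are probably unfamiliar with Yocto, but it is the de facto standard in embedded Linux: it gives precise control over every component included in the system. As Embedded Swift advances, integrating Swift into the Yocto ecosystem has become a key question. Jesse L. Zamora demonstrates in detail how to use the recently revived meta-swift layer to build a Swift runtime environment on a Yocto system. Using the Raspberry Pi Zero 2 as an example, the article walks through the complete workflow: Docker environment setup, the Yocto build, flashing the image, and running on the device.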
✅ didStartProvisionalNavigation
✅ decidePolicyForNavigationAction
❌ didFailProvisionalNavigation: "A server with the specified hostname could not be found."
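🔴 Type 2: Failure after commit
Triggered method: didFailNavigation:withError:
Typical causes:
Invalid or expired SSL/TLS certificate (blocked by iOS by default)
The server drops the connection mid-transfer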
✅ didStartProvisionalNavigation
✅ decidePolicyForNavigationAction
✅ didCommitNavigation
❌ didFailNavigation: "The certificate for this server is invalid."
/* rl is locked, rlm is locked on entrance and exit */
static Boolean __CFRunLoopDoSources0(CFRunLoopRef rl, CFRunLoopModeRef rlm, Boolean stopAfterHandle) __attribute__((noinline));
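Apple introduced the new any keyword in Swift 5.6 and strengthened it in Swift 5.7. In the Galactic Federation, this is known as the ultimate liberation of "existential types". It means we can now mix heterogeneous data far more freely, like tossing a lightsaber (TextFile) and a force shield (ShapeFile) into the same warehouse.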
// Add to a specific mode
CFRunLoopPerformBlock(runLoop, kCFRunLoopDefaultMode, ^{
    NSLog(@"Execute in default mode only");
});
// Add to the common modes
CFRunLoopPerformBlock(runLoop, kCFRunLoopCommonModes, ^{
    NSLog(@"Execute in all common modes");
});
Matching rules:
Exact match: block.mode == currentMode
Common modes match: block.mode == kCFRunLoopCommonModes && currentMode ∈ commonModes
/* rl is locked, rlm is locked on entrance and exit */
static void __CFRunLoopDoObservers(CFRunLoopRef, CFRunLoopModeRef, CFRunLoopActivity) __attribute__((noinline));
Branch instructions on most architectures use PC-relative addressing with a limited range. When the target is too far away, the branch becomes "out of range" and requires special handling.
Consider a large binary where main() at address 0x10000 calls foo() at address 0x8010000, exactly 128MiB away. On AArch64, the bl instruction reaches from -128MiB to +128MiB-4, so this call cannot be encoded directly. Without proper handling, the linker would fail with an error like "relocation out of range." The toolchain must handle this transparently to produce correct executables.
This article explores how compilers, assemblers, and linkers work together to solve the long branch problem.
Compiler (IR to assembly): Handles branches within a function that exceed the range of conditional branch instructions
Assembler (assembly to relocatable file): Handles branches within a section where the distance is known at assembly time
Linker: Handles cross-section and cross-object branches discovered during final layout
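To make the main()/foo() example concrete, here is a minimal Python sketch of the range check. It assumes the AArch64 bl encoding (a signed 26-bit immediate counted in 4-byte words) and, for simplicity, that the call instruction sits at main's address; the helper name bl_reaches is made up for illustration.

```python
# AArch64 `bl` holds a signed 26-bit immediate counted in 4-byte words,
# so the encodable byte offsets are [-2**27, 2**27 - 4].
BL_MIN = -(1 << 27)
BL_MAX = (1 << 27) - 4


def bl_reaches(pc: int, target: int) -> bool:
    """Return True if a `bl` at address `pc` can encode a branch to `target`."""
    offset = target - pc
    return offset % 4 == 0 and BL_MIN <= offset <= BL_MAX


main_addr = 0x10000
foo_addr = 0x8010000
print(hex(foo_addr - main_addr))            # 0x8000000
print(bl_reaches(main_addr, foo_addr))      # False
print(bl_reaches(main_addr, foo_addr - 4))  # True
```

The distance is 0x8000000 bytes, one instruction past the largest forward offset, which is why the call needs help from the toolchain.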
Branch range limitations
Different architectures have different branch range limitations. Here's a quick comparison of unconditional / conditional branch ranges:
| Architecture | Cond | Uncond | Call | Notes |
| --- | --- | --- | --- | --- |
| AArch64 | ±1MiB | ±128MiB | ±128MiB | Thunks |
| AArch32 (A32) | ±32MiB | ±32MiB | ±32MiB | Thunks, interworking |
| AArch32 (T32) | ±1MiB | ±16MiB | ±16MiB | Thunks, interworking |
| LoongArch | ±128KiB | ±128MiB | ±128MiB | Linker relaxation |
| M68k (68020+) | ±2GiB | ±2GiB | ±2GiB | Assembler picks size |
| MIPS (pre-R6) | ±128KiB | ±128KiB (b offset) | ±128KiB (bal offset) | In -fno-pic code, pseudo-absolute j/jal can be used for a 256MiB region. |
| MIPS R6 | ±128KiB | ±128MiB | ±128MiB | |
| PowerPC64 | ±32KiB | ±32MiB | ±32MiB | Thunks |
| RISC-V | ±4KiB | ±1MiB | ±1MiB | Linker relaxation |
| SPARC | ±1MiB | ±8MiB | ±2GiB | No thunks needed |
| SuperH | ±256B | ±4KiB | ±4KiB | Use register-indirect if needed |
| x86-64 | ±2GiB | ±2GiB | ±2GiB | Large code model changes call sequence |
| Xtensa | ±2KiB | ±128KiB | ±512KiB | Linker relaxation |
| z/Architecture | ±64KiB | ±4GiB | ±4GiB | No thunks needed |
The following subsections provide detailed per-architecture information, including relocation types relevant for linker implementation.
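Most of the ranges in the comparison follow directly from the width of the immediate field and the instruction alignment. As a rough sketch (the helper name branch_reach and the selection of encodings are my own choice for illustration), the arithmetic looks like this in Python:

```python
KiB, MiB = 1 << 10, 1 << 20


def branch_reach(imm_bits, scale):
    """Byte range covered by a signed `imm_bits`-bit immediate that is
    multiplied by `scale` (the instruction alignment) before being added to PC."""
    lo = -(1 << (imm_bits - 1)) * scale
    hi = ((1 << (imm_bits - 1)) - 1) * scale
    return lo, hi


# (label, immediate bits, scale in bytes)
encodings = [
    ("AArch64 b/bl", 26, 4),        # ±128MiB
    ("AArch64 b.cond", 19, 4),      # ±1MiB
    ("PowerPC64 b/bl", 24, 4),      # ±32MiB
    ("PowerPC64 bc", 14, 4),        # ±32KiB
    ("RISC-V jal", 20, 2),          # ±1MiB
    ("RISC-V beq", 12, 2),          # ±4KiB
    ("z/Architecture BRC", 16, 2),  # ±64KiB
    ("x86-64 jmp rel8", 8, 1),      # -128 to +127
]
for name, bits, scale in encodings:
    lo, hi = branch_reach(bits, scale)
    print(f"{name:20} [{lo:+d}, {hi:+d}]")
```

Note that the "±" notation is slightly generous on the positive side: the largest forward offset is always one instruction unit short of the nominal limit.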
AArch32
In A32 state:
Branch (b/b<cond>), conditional branch and link (bl<cond>) (R_ARM_JUMP24): ±32MiB
Unconditional branch and link (bl/blx, R_ARM_CALL): ±32MiB
Note: R_ARM_CALL is for unconditional bl/blx which can be relaxed to BLX inline; R_ARM_JUMP24 is for branches which require a veneer for interworking.
The compiler's BranchRelaxation pass handles out-of-range conditional branches by inverting the condition and inserting an unconditional branch. The AArch64 assembler does not perform branch relaxation; out-of-range branches produce linker errors if not handled by the compiler.
Medium range call (pcaddu12i+jirl, R_LARCH_CALL30): ±2GiB
Long range call (pcaddu18i+jirl, R_LARCH_CALL36): ±128GiB
M68k
Short branch (Bcc.B/BRA.B/BSR.B): ±128 bytes (8-bit displacement)
Word branch (Bcc.W/BRA.W/BSR.W): ±32KiB (16-bit displacement)
Long branch (Bcc.L/BRA.L/BSR.L, 68020+): ±2GiB (32-bit displacement)
GNU Assembler provides pseudo opcodes (jbsr, jra, jXX) that "automatically expand to the shortest instruction capable of reaching the target". For example, jeq .L0 emits one of beq.b, beq.w, and beq.l depending on the displacement.
With the long forms available on 68020 and later, M68k doesn't need linker range extension thunks.
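The size selection behind a pseudo opcode such as jeq can be sketched in a few lines of Python. This is only an illustration of the idea, not GNU Assembler's code: the function name pick_branch_suffix is hypothetical, and the displacement is assumed to be measured the way the CPU does, from the branch instruction's address plus 2.

```python
def pick_branch_suffix(disp: int) -> str:
    """Pick the shortest 68020+ branch form for a byte displacement."""
    # In the 8-bit displacement field, 0x00 and 0xFF are escape values that
    # select the 16-bit and 32-bit forms, so 0 and -1 need a longer encoding.
    if -128 <= disp <= 127 and disp not in (0, -1):
        return ".b"
    if -32768 <= disp <= 32767:
        return ".w"
    if -(1 << 31) <= disp <= (1 << 31) - 1:
        return ".l"
    raise ValueError("displacement does not fit in 32 bits")


for d in (10, 0, 200, -40000):
    print(d, "beq" + pick_branch_suffix(d))
```

A real assembler has to repeat this choice while laying out the section, because growing one branch can move the targets of others.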
Pseudo-absolute jump/call (j/jal, R_MIPS_26): branch within the current 256MiB region, only suitable for -fno-pic code. Deprecated in R6 in favor of bc/balc
The Go compiler emits a single jal for calls and relies on its linker to generate trampolines when the target is out of range.
In contrast, GCC and Clang emit auipc+jalr and rely on linker relaxation to shrink the sequence when possible.
The jal range (±1MiB) is notably smaller than other RISC architectures (AArch64 ±128MiB, PowerPC64 ±32MiB, LoongArch ±128MiB). This limits the effectiveness of linker relaxation ("start large and shrink"), and leads to frequent trampolines when the compiler optimistically emits jal ("start small and grow").
SPARC
Compare and branch (cxbe, R_SPARC_5): ±64 bytes
Conditional branch (bcc, R_SPARC_WDISP19): ±1MiB
Unconditional branch (b, R_SPARC_WDISP22): ±8MiB
call (R_SPARC_WDISP30/R_SPARC_WPLT30): ±2GiB
With ±2GiB range for call, SPARC doesn't need range extension thunks in practice. Note: lld does not implement range extension thunks for SPARC.
SuperH
SuperH uses fixed-width 16-bit instructions, which limits branch ranges.
Branch to subroutine (bsr): ±4KiB (12-bit displacement)
For longer distances, register-indirect branches (braf/bsrf) are used. The compiler inverts conditions and emits these when targets exceed the short ranges.
SuperH is supported by GCC and binutils, but not by LLVM.
Xtensa
Xtensa uses variable-length instructions: 16-bit (narrow, .n suffix) and 24-bit (standard).
Narrow conditional branch (beqz.n/bnez.n, 16-bit): -28 to +35 bytes (6-bit signed + 4)
Conditional branch (compare two registers) (beq/bne/blt/bge/etc, 24-bit): ±256 bytes
Conditional branch (compare with zero) (beqz/bnez/bltz/bgez, 24-bit): ±2KiB
Unconditional jump (j, 24-bit): ±128KiB
Call (call0/call4/call8/call12, 24-bit): ±512KiB
The assembler performs branch relaxation: when a conditional branch target is too far, it inverts the condition and inserts a j instruction.
Per https://www.sourceware.org/binutils/docs/as/Xtensa-Call-Relaxation.html, for calls, GNU Assembler pessimistically generates indirect sequences (l32r+callx8) when the target distance is unknown. GNU ld then performs linker relaxation.
x86-64
Short conditional jump (Jcc rel8): -128 to +127 bytes
Short unconditional jump (JMP rel8): -128 to +127 bytes
Near conditional jump (Jcc rel32): ±2GiB
Near unconditional jump (JMP rel32): ±2GiB
With a ±2GiB range for near jumps, x86-64 rarely encounters out-of-range branches in practice. That said, Google and Meta Platforms deploy mostly statically linked executables on x86-64 production servers and have run into the huge executable problem for certain configurations.
z/Architecture
Short conditional branch (BRC, R_390_PC16DBL): ±64KiB (16-bit halfword displacement)
Long conditional branch (BRCL, R_390_PC32DBL): ±4GiB (32-bit halfword displacement)
Short call (BRAS, R_390_PC16DBL): ±64KiB
Long call (BRASL, R_390_PC32DBL): ±4GiB
With ±4GiB range for long forms, z/Architecture doesn't need linker range extension thunks. LLVM's SystemZLongBranch pass relaxes short branches (BRC/BRAS) to long forms (BRCL/BRASL) when targets are out of range.
Compiler: branch range handling
Conditional branch instructions usually have shorter ranges than unconditional ones, making them less suitable for linker thunks (as we will explore later). Compilers typically keep conditional branch targets within the same section, allowing the compiler to handle out-of-range cases via branch relaxation.
Within a function, conditional branches may still go out of range. The compiler measures branch distances and relaxes out-of-range branches by inverting the condition and inserting an unconditional branch:
# Before relaxation (out of range)
  beq a0, a1, .Lfar_target   # ±4KiB range on RISC-V

# After relaxation
  bne a0, a1, .Lskip         # Inverted condition, short range
  j .Lfar_target             # Unconditional jump, ±1MiB range
.Lskip:
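Because relaxing one branch makes the function larger, it can push another branch out of range, so the measurement has to be repeated until nothing changes. The following Python sketch is a toy model of that fixed-point loop, not LLVM's implementation; the item format ("label", "pad", "beq") and the function name relax_branches are assumptions made for illustration.

```python
COND_RANGE = 4096  # RISC-V conditional branches reach [-4096, +4094] bytes


def relax_branches(items):
    """items: list of ("label", name), ("pad", nbytes) or ("beq", target).
    Returns the set of indices of branches that had to be relaxed."""
    relaxed = set()
    while True:
        # Lay out the function with the current choice of branch forms.
        offsets, labels, pos = [], {}, 0
        for i, (kind, arg) in enumerate(items):
            offsets.append(pos)
            if kind == "label":
                labels[arg] = pos
            elif kind == "pad":
                pos += arg
            else:  # a branch: 4 bytes, or 8 once rewritten as bne + j
                pos += 8 if i in relaxed else 4
        # Relax every short branch whose target is now out of reach.
        changed = False
        for i, (kind, arg) in enumerate(items):
            if kind == "beq" and i not in relaxed:
                disp = labels[arg] - offsets[i]
                if not -COND_RANGE <= disp <= COND_RANGE - 2:
                    relaxed.add(i)
                    changed = True
        if not changed:
            return relaxed


func = [
    ("beq", "a"),    # index 0: exactly at the +4094 limit at first
    ("beq", "b"),    # index 1: clearly out of range
    ("pad", 4086),
    ("label", "a"),
    ("pad", 4096),
    ("label", "b"),
]
print(relax_branches(func))  # {0, 1}
```

In this example the first branch only goes out of range in the second iteration, after the branch behind it has grown by four bytes.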
Some architectures have conditional branch instructions that compare with an immediate, with even shorter ranges due to encoding additional immediates. For example, AArch64's cbz/cbnz (compare and branch if zero/non-zero) and tbz/tbnz (test bit and branch) have only ±32KiB range. RISC-V Zibi beqi/bnei have ±4KiB range. The compiler handles these in a similar way:
// Before relaxation (cbz has ±32KiB range)
  cbz w0, far

// After relaxation
  cbnz w0, .Lskip   // Inverted condition
  b far             // Unconditional branch, ±128MiB range
.Lskip:
An Intel employee contributed https://reviews.llvm.org/D41634 (in 2017) to handle the case where inverting a branch condition is impossible. This is for an out-of-tree backend. As of Jan 2026 there is no in-tree test for this code path.
In LLVM, this is handled by the BranchRelaxation pass, which runs just before AsmPrinter. Different backends have their own implementations:
BranchRelaxation: AArch64, AMDGPU, AVR, RISC-V
HexagonBranchRelaxation: Hexagon
PPCBranchSelector: PowerPC
SystemZLongBranch: SystemZ
MipsBranchExpansion: MIPS
MSP430BSel: MSP430
The generic BranchRelaxation pass computes block sizes and offsets, then iterates until all branches are in range. For conditional branches, it tries to invert the condition and insert an unconditional branch. For unconditional branches that are still out of range, it calls TargetInstrInfo::insertIndirectBranch to emit an indirect jump sequence (e.g., adrp+add+br on AArch64) or a long jump sequence (e.g., pseudo jump on RISC-V).
Note: The size estimates may be inaccurate due to inline assembly. LLVM uses heuristics to estimate inline assembly sizes, but for certain assembly constructs the size is not precisely known at compile time.
Unconditional branches and calls can target different sections since they have larger ranges. If the target is out of reach, the linker can insert thunks to extend the range.
For x86-64, the large code model uses multiple instructions for calls and jumps to support text sections larger than 2GiB (see Relocation overflow and code models: x86-64 large code model). This is a pessimization if the callee ends up being within reach. Google and Meta Platforms have interest in allowing range extension thunks as a replacement for the multiple instructions.
Assembler: instruction relaxation
The assembler converts assembly to machine code. When the target of a branch is within the same section and the distance is known at assembly time, the assembler can select the appropriate encoding. This is distinct from linker thunks, which handle cross-section or cross-object references where distances aren't known until link time.
Assembler instruction relaxation handles two cases (see Clang -O0 output: branch displacement and size increase for examples):
Span-dependent instructions: Select an appropriate encoding based on displacement.
On x86, a short jump (jmp rel8) can be relaxed to a near jump (jmp rel32) when the target is far.
On RISC-V, beqz may be assembled to the 2-byte c.beqz when the displacement fits within ±256 bytes.
Conditional branch transform: Invert the condition and insert an unconditional branch. On RISC-V, a blt might be relaxed to bge plus an unconditional branch.
The assembler uses an iterative layout algorithm that alternates between fragment offset assignment and relaxation until all fragments become legalized. See Integrated assembler improvements in LLVM 19 for implementation details.
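For a single span-dependent instruction, the encoding choice is easy to show. The Python sketch below encodes an x86 jmp given the addresses of the instruction and its target, picking the 2-byte rel8 form when the displacement fits and the 5-byte rel32 form otherwise. The function name encode_jmp is made up; a real assembler does not know final addresses up front and makes this decision inside the iterative layout loop.

```python
import struct


def encode_jmp(src: int, dst: int) -> bytes:
    """Encode an x86 `jmp` located at `src` that transfers control to `dst`.

    x86 displacements are relative to the end of the jump instruction,
    so the instruction's own size feeds back into the displacement.
    """
    short_disp = dst - (src + 2)   # jmp rel8 is 2 bytes: EB ib
    if -128 <= short_disp <= 127:
        return b"\xeb" + struct.pack("<b", short_disp)
    near_disp = dst - (src + 5)    # jmp rel32 is 5 bytes: E9 id
    return b"\xe9" + struct.pack("<i", near_disp)


print(encode_jmp(0x100, 0x110).hex())   # eb0e
print(encode_jmp(0x100, 0x1100).hex())  # e9fb0f0000
```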
Linker: range extension thunks
When the linker resolves relocations, it may discover that a branch target is out of range. At this point, the instruction encoding is fixed, so the linker cannot simply change the instruction. Instead, it generates range extension thunks (also called veneers, branch stubs, or trampolines).
A thunk is a small piece of linker-generated code that can reach the actual target using a longer sequence of instructions. The original branch is redirected to the thunk, which then jumps to the real destination.
Range extension thunks are one type of linker-generated thunk. Other types include:
ARM interworking veneers: Switch between ARM and Thumb instruction sets (see Linker notes on AArch32)
MIPS LA25 thunks: Enable PIC and non-PIC code interoperability (see Toolchain notes on MIPS)
PowerPC64 TOC/NOTOC thunks: Handle calls between functions using different TOC pointer conventions (see Linker notes on Power ISA)
Short range vs long range thunks
A short range thunk (see lld/ELF's AArch64 implementation) contains just a single branch instruction. Since it uses a branch, its reach is also limited by the branch range: it can only extend coverage by one branch distance. For targets further away, multiple short range thunks can be chained, or a long range thunk with address computation must be used.
Long range thunks use indirection and can jump to (practically) arbitrary locations.
// Short range thunk: single branch, 4 bytes
__AArch64AbsLongThunk_dst:
  b dst                         // ±128MiB range

; Long range thunk (ARMv7, position-independent): compute the address, then branch indirectly
__ARMV7PILongThunk_dst:
  movw ip, :lower16:(dst - .)   ; ip = intra-procedure-call scratch register
  movt ip, :upper16:(dst - .)
  add ip, ip, pc
  bx ip
PowerPC64 ELFv2 (see Linker notes on Power ISA):
__long_branch_dst:
  addis 12, 2, .branch_lt@ha   # Load high bits from branch lookup table
  ld 12, .branch_lt@l(12)      # Load target address
  mtctr 12                     # Move to count register
  bctr                         # Branch to count register
Thunk impact on debugging and profiling
Thunks are transparent at the source level but visible in low-level tools:
Stack traces: May show thunk symbols (e.g., __AArch64ADRPThunk_foo) between caller and callee
Profilers: Samples may attribute time to thunk code; some profilers aggregate thunk time with the target function
Disassembly: objdump or llvm-objdump will show thunk sections interspersed with regular code
Code size: Each thunk adds bytes; large binaries may have thousands of thunks
lld/ELF's thunk creation algorithm
lld/ELF uses a multi-pass algorithm in finalizeAddressDependentContent:
assignAddresses();
for (pass = 0; pass < 30; ++pass) {
  if (pass == 0)
    createInitialThunkSections();  // pre-create empty ThunkSections
  bool changed = false;
  for (rel : all_relocations) {
    if (pass > 0 && normalizeExistingThunk(rel))
      continue;  // existing thunk still in range
    if (!needsThunk(rel)) continue;
    Thunk *t = getOrCreateThunk(rel);
    ts = findOrCreateThunkSection(rel, src);
    ts->addThunk(t);
    rel.sym = t->getThunkTargetSym();  // redirect
    changed = true;
  }
  mergeThunks();  // insert ThunkSections into output
  if (!changed) break;
  assignAddresses();  // recalculate with new thunks
}
Key details:
Multi-pass: Iterates until convergence (max 30 passes). Adding thunks changes addresses, potentially putting previously-in-range calls out of range.
Pre-allocated ThunkSections: On pass 0, createInitialThunkSections places empty ThunkSections at regular intervals (thunkSectionSpacing). For AArch64: 128 MiB - 0x30000 ≈ 127.8 MiB.
Thunk reuse: getThunk returns existing thunk if one exists for the same target; normalizeExistingThunk checks if a previously-created thunk is still in range.
ThunkSection placement: getISDThunkSec finds a ThunkSection within branch range of the call site, or creates one adjacent to the calling InputSection.
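The reason for the multi-pass structure is easiest to see in a small simulation. The Python sketch below is a deliberately simplified toy, not lld's algorithm: the branch range and thunk size are tiny made-up constants, distances are measured from section start to section start, and every thunk is placed directly after the calling section and assumed to reach any target. The names layout and create_thunks are hypothetical.

```python
BRANCH_RANGE = 128  # toy branch range in bytes (AArch64 would be 128MiB)
THUNK_SIZE = 16     # toy size of one long range thunk


def layout(sections, thunks):
    """Assign addresses; the thunks of section i sit right after section i."""
    addr, pos = {}, 0
    for i, (name, size) in enumerate(sections):
        addr[name] = pos
        pos += size + THUNK_SIZE * len(thunks[i])
    return addr


def create_thunks(sections, calls, max_passes=30):
    """calls: list of (caller_section_index, callee_name).
    Returns (thunks per caller section, number of passes used)."""
    thunks = [set() for _ in sections]
    for n in range(1, max_passes + 1):
        addr = layout(sections, thunks)
        changed = False
        for src, callee in calls:
            if callee in thunks[src]:
                continue  # already redirected to a thunk
            caller_addr = addr[sections[src][0]]
            if abs(addr[callee] - caller_addr) > BRANCH_RANGE:
                thunks[src].add(callee)
                changed = True
        if not changed:
            return thunks, n
    raise RuntimeError("thunk creation did not converge")


sections = [("a", 40), ("b", 80), ("c", 40), ("d", 8), ("e", 8)]
calls = [(0, "c"), (0, "e")]
thunks, passes = create_thunks(sections, calls)
print(sorted(thunks[0]), passes)  # ['c', 'e'] 3
```

In the first pass only the call to e is out of range. The thunk created for it shifts c from offset 120 to 136, so the call to c needs its own thunk in the second pass, and the third pass confirms that the layout is stable.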
lld/MachO's thunk creation algorithm
lld/MachO uses a single-pass algorithm in TextOutputSection::finalize:
for (callIdx = 0; callIdx < inputs.size(); ++callIdx) {
  // Finalize sections within forward branch range (minus slop)
  while (finalIdx < endIdx && fits_in_range(inputs[finalIdx]))
    finalizeOne(inputs[finalIdx++]);

  // Process branch relocations in this section
  for (Relocation &r : reverse(isec->relocs)) {
    if (!isBranchReloc(r)) continue;
    if (targetInRange(r)) continue;
    if (existingThunkInRange(r)) { reuse it; continue; }
    // Create new thunk and finalize it
    createThunk(r);
  }
}
Key differences from lld/ELF:
Single pass: Addresses are assigned monotonically and never revisited
Slop reservation: Reserves slopScale * thunkSize bytes (default: 256 × 12 = 3072 bytes on ARM64) to leave room for future thunks
Thunk naming: <function>.thunk.<sequence> where sequence increments per target
Thunk starvation problem: If many consecutive branches need thunks, each thunk (12 bytes) consumes slop faster than call sites (4 bytes apart) advance. The test lld/test/MachO/arm64-thunk-starvation.s demonstrates this edge case. Mitigation is increasing --slop-scale, but pathological cases with hundreds of consecutive out-of-range callees can still fail.
mold's thunk creation algorithm
mold uses a two-pass approach:
Pessimistically over-allocate thunks. Out-of-section relocations and relocations that refer to a section not yet assigned an address are pessimistically assumed to need thunks (requires_thunk(ctx, isec, rel, first_pass) when first_pass=true).
Then remove unnecessary ones.
Linker pass ordering:
compute_section_sizes() calls create_range_extension_thunks(); final section addresses are NOT yet known
set_osec_offsets() assigns section addresses
remove_redundant_thunks() is called AFTER addresses are known; it removes thunks that turned out to be unneeded for out-of-section relocations
Rerun set_osec_offsets()
Pass 1 (create_range_extension_thunks): Process sections in batches using a sliding window. The window tracks four positions:
Sections:  [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] ...
            ^       ^       ^               ^
            A       B       C               D
            |       |_______|               |
            |         batch                 |
            |                               |
         earliest                         thunk
         reachable                      placement
         from C
[B, C) = current batch of sections to process (size ≤ branch_distance/5)
A = earliest section still reachable from C (for thunk expiration)
D = where to place the thunk (furthest point reachable from B)
// Simplified from OutputSection<E>::create_range_extension_thunks
while (b < sections.size()) {
  // Advance D: find furthest point where thunk is reachable from B
  while (d < size && thunk_at_d_reachable_from_b)
    assign_address(sections[d++]);

  // Compute batch [B, C)
  c = b + 1;
  while (c < d && sections[c] < sections[b] + batch_size)
    c++;

  // Advance A: expire thunks no longer reachable
  while (a < b && sections[a] + branch_distance < sections[c])
    a++;
  // Expire thunk groups before A: clear symbol flags.
  for (; t < thunks.size() && thunks[t].offset < sections[a]; t++)
    for (sym in thunks[t].symbols)
      sym->flags = 0;

  // Scan [B,C) relocations. If a symbol is not assigned to a thunk group yet,
  // assign it to the new thunk group at D.
  auto &thunk = thunks.emplace_back(new Thunk(offset));
  parallel_for(b, c, [&](i64 i) {
    for (rel in sections[i].relocs) {
      if (requires_thunk(rel)) {
        Symbol &sym = rel.symbol;
        if (!sym.flags.test_and_set()) {  // atomic: skip if already set
          lock_guard lock(mu);
          thunk.symbols.push_back(&sym);
        }
      }
    }
  });
  offset += thunk.size();
  b = c;  // Move to next batch
}
Pass 2 (remove_redundant_thunks): After final addresses are known, remove thunk entries for symbols actually in range.
Key characteristics:
Pessimistic over-allocation: Assumes all out-of-section calls need thunks; safe to shrink later
Batch size: branch_distance/5 (25.6 MiB for AArch64, 3.2 MiB for AArch32)
Parallelism: Uses TBB for parallel relocation scanning within each batch
Single branch range: Uses one conservative branch_distance per architecture. For AArch32, uses ±16 MiB (Thumb limit) for all branches, whereas lld/ELF uses ±32 MiB for A32 branches.
Thunk size not accounted in D-advancement: The actual thunk group size is unknown when advancing D, so the end of a large thunk group may be unreachable from the beginning of the batch.
No convergence loop: Single forward pass for address assignment, no risk of non-convergence
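The batching step on its own is simple enough to sketch. The Python snippet below mirrors only the "compute batch [B, C)" loop from the pseudocode above, without the D bound, thunk expiration, or parallel scanning; make_batches is a hypothetical name and the offsets are toy values.

```python
MiB = 1 << 20


def make_batches(offsets, branch_distance):
    """Split sections (given by ascending start offsets) into [b, c) batches
    whose span stays under branch_distance / 5."""
    batch_size = branch_distance // 5
    batches, b = [], 0
    while b < len(offsets):
        c = b + 1
        while c < len(offsets) and offsets[c] < offsets[b] + batch_size:
            c += 1
        batches.append((b, c))
        b = c
    return batches


# Toy layout: six sections 10 bytes apart, branch distance 100 (batch size 20).
print(make_batches([0, 10, 20, 30, 40, 50], branch_distance=100))
# [(0, 2), (2, 4), (4, 6)]

# Real batch sizes for the two ranges mentioned above, in MiB.
print(128 * MiB / 5 / MiB, 16 * MiB / 5 / MiB)
```

Keeping batches much smaller than the branch distance is what lets one thunk group placed at D serve every section in the batch.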
GNU ld's thunk creation algorithm
Each port implements the algorithm on its own. There is no code sharing.
GNU ld's AArch64 port (bfd/elfnn-aarch64.c) uses an iterative algorithm but with a single stub type and no lookup table.
for (;;) {
  stub_changed = false;
  _bfd_aarch64_add_call_stub_entries(&stub_changed, ...);
  if (!stub_changed)
    return true;
  _bfd_aarch64_resize_stubs(htab);
  layout_sections_again();
}
GNU ld's ppc64 port (bfd/elf64-ppc.c) uses an iterative multi-pass algorithm with a branch lookup table (.branch_lt) for long-range stubs.
Section grouping: Sections are grouped by stub_group_size (~28-30 MiB default); each group gets one stub section. For 14-bit conditional branches (R_PPC64_REL14, ±32KiB range), group size is reduced by 1024x.
while (1) {
  // Scan all relocations in all input sections
  for (input_bfd; section; irela) {
    // Only process branch relocations (R_PPC64_REL24, R_PPC64_REL14, etc.)
    stub_type = ppc_type_of_stub(section, irela, ...);
    if (stub_type == ppc_stub_none)
      continue;
    // Create or merge stub entry
    stub_entry = ppc_add_stub(...);
  }

  // Size all stubs, potentially upgrading long_branch to plt_branch
  bfd_hash_traverse(&stub_hash_table, ppc_size_one_stub, ...);

  // Check for convergence
  if (!stub_changed && all_sizes_stable)
    break;

  // Re-layout sections
  layout_sections_again();
}
Convergence control:
STUB_SHRINK_ITER = 20 (PR28827): After 20 iterations, stub sections only grow (prevents oscillation)
Convergence when: !stub_changed && all section sizes stable
Stub type upgrade: ppc_type_of_stub() initially returns ppc_stub_long_branch for out-of-range branches. Later, ppc_size_one_stub() checks if the stub's branch can reach; if not, it upgrades to ppc_stub_plt_branch and allocates an 8-byte entry in .branch_lt.
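The upgrade decision boils down to one range check per stub. Here is a hedged Python sketch of that check: the stub kind names come from the text above, while the function size_stub, its parameters, and the list standing in for .branch_lt are made up for illustration and do not reflect the actual bfd data structures.

```python
# PowerPC `b` holds a signed 24-bit immediate counted in 4-byte words.
B_MIN, B_MAX = -(1 << 25), (1 << 25) - 4


def size_stub(stub_addr, target_addr, branch_lt):
    """Decide the stub kind; `branch_lt` is a list of 8-byte table slots."""
    offset = target_addr - stub_addr
    if B_MIN <= offset <= B_MAX:
        return "ppc_stub_long_branch"  # the stub's own `b` reaches the target
    branch_lt.append(target_addr)      # the target address goes into the table
    return "ppc_stub_plt_branch"


table = []
print(size_stub(0x10000000, 0x11000000, table))  # ppc_stub_long_branch
print(size_stub(0x10000000, 0x20000000, table))  # ppc_stub_plt_branch
print(len(table) * 8)                            # 8
```

Because an upgraded stub is larger and adds a table entry, the decision changes section sizes, which is why it sits inside the re-layout loop.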
Comparing linker thunk algorithms
| Aspect | lld/ELF | lld/MachO | mold | GNU ld ppc64 |
| --- | --- | --- | --- | --- |
| Passes | Multi (max 30) | Single | Two | Multi (no shrinking after 20) |
| Strategy | Iterative refinement | Sliding window | Sliding window | Iterative refinement |
| Thunk placement | Pre-allocated intervals | Inline with slop | Batch intervals | Per stub-group |
Linker relaxation
Some architectures take a different approach: instead of only expanding branches, the linker can also shrink instruction sequences when the target is close enough. RISC-V and LoongArch both use this technique. See The dark side of RISC-V linker relaxation for a deeper dive into the complexities and tradeoffs.
Consider a function call using the call pseudo-instruction, which expands to auipc + jalr:
# Before linking (8 bytes)
call ext
# Expands to:
#   auipc ra, %pcrel_hi(ext)
#   jalr ra, ra, %pcrel_lo(ext)
If ext is within ±1MiB, the linker can relax this to:
# After relaxation (4 bytes)
jal ext
This is enabled by R_RISCV_RELAX relocations that accompany R_RISCV_CALL relocations. The R_RISCV_RELAX relocation signals to the linker that this instruction sequence is a candidate for shrinking.
When the linker deletes instructions, it must also adjust:
Subsequent instruction offsets within the section
Symbol addresses
Other relocations that reference affected locations
Alignment directives (R_RISCV_ALIGN)
This makes RISC-V linker relaxation more complex than thunk insertion, but it provides code size benefits that other architectures cannot achieve at link time.
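The bookkeeping after deleting bytes can be illustrated with a small Python sketch. It is a toy model under loud assumptions: one section, every call is an 8-byte auipc+jalr pair, R_RISCV_ALIGN is ignored, and the decision is made once on the original layout (deleting bytes never moves a call and its target further apart in this model). The function name relax_calls and the data shapes are hypothetical.

```python
JAL_MIN, JAL_MAX = -(1 << 20), (1 << 20) - 2  # reach of a single jal


def relax_calls(calls, symbols):
    """calls: list of (offset, target_symbol); symbols: dict name -> offset.
    Returns (new_calls, new_symbols); each new call is (offset, target, size)."""
    # Decide which 8-byte sequences can shrink to a 4-byte jal.
    shrink = [JAL_MIN <= symbols[t] - off <= JAL_MAX for off, t in calls]

    def adjust(addr):
        # Each relaxed call that ends at or before addr removed 4 bytes before it.
        removed = sum(1 for (off, _), s in zip(calls, shrink) if s and off + 8 <= addr)
        return addr - 4 * removed

    new_calls = [(adjust(off), t, 4 if s else 8) for (off, t), s in zip(calls, shrink)]
    new_symbols = {name: adjust(addr) for name, addr in symbols.items()}
    return new_calls, new_symbols


calls = [(0, "near"), (8, "far")]
symbols = {"near": 16, "far": 16 + (1 << 21)}
new_calls, new_symbols = relax_calls(calls, symbols)
print(new_calls)    # [(0, 'near', 4), (4, 'far', 8)]
print(new_symbols)  # {'near': 12, 'far': 2097164}
```

The first call shrinks, so the second call and both symbols move down by four bytes, while the second call keeps its 8-byte form because its target is about 2MiB away.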
LoongArch uses a similar approach. A pcaddu18i+jirl sequence (R_LARCH_CALL36, ±128GiB range) can be relaxed to a single bl instruction (R_LARCH_B26, ±128MiB range) when the target is close enough.
Diagnosing out-of-range errors
When you encounter a "relocation out of range" error, check the linker diagnostic and locate the relocatable file and function. Determine how the function call is lowered in assembly.
Summary
Handling long branches requires coordination across the toolchain:
| Stage | Technique | Example |
| --- | --- | --- |
| Compiler | Branch relaxation pass | Invert condition + add unconditional jump |
| Assembler | Instruction relaxation | Invert condition + add unconditional jump |
| Linker | Range extension thunks | Generate trampolines |
| Linker | Linker relaxation | Shrink auipc+jalr to jal (RISC-V) |
The linker's thunk generation is particularly important for large programs where function calls may exceed branch ranges. Different linkers use different algorithms with various tradeoffs between complexity, optimality, and robustness.
Linker relaxation approaches adopted by RISC-V and LoongArch are an alternative that avoids range extension thunks but introduces other complexities.
Branch instructions on most architectures use PC-relative addressingwith a limited range. When the target is too far away, the branchbecomes "out of range" and requires special handling.
Consider a large binary where main() at address 0x10000calls foo() at address 0x8010000-over 128MiB away. OnAArch64, the bl instruction can only reach ±128MiB, so thiscall cannot be encoded directly. Without proper handling, the linkerwould fail with an error like "relocation out of range." The toolchainmust handle this transparently to produce correct executables.
This article explores how compilers, assemblers, and linkers worktogether to solve the long branch problem.
Compiler (IR to assembly): Handles branches within a function thatexceed the range of conditional branch instructions
Assembler (assembly to relocatable file): Handles branches within asection where the distance is known at assembly time
Linker: Handles cross-section and cross-object branches discoveredduring final layout
Branch range limitations
Different architectures have different branch range limitations.Here's a quick comparison of unconditional branch/call ranges:
Architecture
Unconditional Branch
Conditional Branch
Notes
AArch64
±128MiB
±1MiB
Range extension thunks
AArch32 (A32)
±32MiB
±32MiB
Range extension and interworking veneers
AArch32 (T32)
±16MiB
±1MiB
Thumb has shorter ranges
PowerPC64
±32MiB
±32KiB
Range extension and TOC/NOTOC interworking thunks
RISC-V
±1MiB (jal)
±4KiB
Linker relaxation
x86-64
±2GiB
±2GiB
Code models or thunk extension
The following subsections provide detailed per-architectureinformation, including relocation types relevant for linkerimplementation.
AArch32
In A32 state:
Branch (b/b<cond>), conditionalbranch and link (bl<cond>)(R_ARM_JUMP24): ±32MiB
Unconditional branch and link (bl/blx,R_ARM_CALL): ±32MiB
Note: R_ARM_CALL is for unconditionalbl/blx which can be relaxed to BLX inline;R_ARM_JUMP24 is for branches which require a veneer forinterworking.
Note: lld does not implement range extension thunks for SPARC.
x86-64
Short conditional jump (Jcc rel8): -128 to +127bytes
Short unconditional jump (JMP rel8): -128 to +127bytes
Near conditional jump (Jcc rel32): ±2GiB
Near unconditional jump (JMP rel32): ±2GiB
With a ±2GiB range for near jumps, x86-64 rarely encountersout-of-range branches in practice. A single text section would need toexceed 2GiB before thunks become necessary. For this reason, mostlinkers (including lld) do not implement range extension thunks forx86-64.
Compiler: branch relaxation
The compiler typically generates branches using a form with a largerange. However, certain conditional branches may still go out of rangewithin a function.
The compiler measures branch distances and relaxes out-of-rangebranches. In LLVM, this is handled by the BranchRelaxationpass, which runs just before AsmPrinter.
Different backends have their own implementations:
BranchRelaxation: AArch64, AMDGPU, AVR, RISC-V
HexagonBranchRelaxation: Hexagon
PPCBranchSelector: PowerPC
SystemZLongBranch: SystemZ
MipsBranchExpansion: MIPS
MSP430BSel: MSP430
For a conditional branch that is out of range, the pass typicallyinverts the condition and inserts an unconditional branch:
1 2 3 4 5 6 7
# Before relaxation (out of range) beq .Lfar_target # ±4KiB range on RISC-V
# After relaxation bne .Lskip # Inverted condition, short range j .Lfar_target # Unconditional jump, ±1MiB range .Lskip:
Assembler: instructionrelaxation
The assembler converts assembly to machine code. When the target of abranch is within the same section and the distance is known at assemblytime, the assembler can select the appropriate encoding. This isdistinct from linker thunks, which handle cross-section or cross-objectreferences where distances aren't known until link time.
Assembler instruction relaxation handles two cases (see Clang-O0 output: branch displacement and size increase for examples):
Span-dependent instructions: Select a largerencoding when the displacement exceeds the range of the smallerencoding. For x86, a short jump (jmp rel8) can be relaxedto a near jump (jmp rel32).
Conditional branch transform: Invert the conditionand insert an unconditional branch. On RISC-V, a blt mightbe relaxed to bge plus an unconditional branch.
The assembler uses an iterative layout algorithm that alternatesbetween fragment offset assignment and relaxation until all fragmentsbecome legalized. See Integratedassembler improvements in LLVM 19 for implementation details.
Linker: range extensionthunks
When the linker resolves relocations, it may discover that a branchtarget is out of range. At this point, the instruction encoding isfixed, so the linker cannot simply change the instruction. Instead, itgenerates range extension thunks (also called veneers,branch stubs, or trampolines).
A thunk is a small piece of linker-generated code that can reach theactual target using a longer sequence of instructions. The originalbranch is redirected to the thunk, which then jumps to the realdestination.
Range extension thunks are one type of linker-generated thunk. Othertypes include:
ARM interworking veneers: Switch between ARM andThumb instruction sets (see Linker notes onAArch32)
MIPS LA25 thunks: Enable PIC and non-PIC codeinteroperability (see Toolchain notes onMIPS)
PowerPC64 TOC/NOTOC thunks: Handle calls betweenfunctions using different TOC pointer conventions (see Linker notes on PowerISA)
Short range vs long rangethunks
A short range thunk (see lld/ELF's AArch64implementation) contains just a single branch instruction. Since ituses a branch, its reach is also limited by the branch range—it can onlyextend coverage by one branch distance. For targets further away,multiple short range thunks can be chained, or a long range thunk withaddress computation must be used.
Long range thunks use indirection and can jump to (practically)arbitrary locations.
1 2 3 4 5 6 7 8 9
// Short range thunk: single branch, 4 bytes __AArch64AbsLongThunk_dst: b dst // ±128MiB range
__ARMV7PILongThunk_dst: movw ip, :lower16:(dst - .) ; ip = intra-procedure-call scratch register movt ip, :upper16:(dst - .) add ip, ip, pc bx ip
PowerPC64 ELFv2 (see Linker notes on PowerISA):
1 2 3 4 5
__long_branch_dst: addis 12, 2, .branch_lt@ha # Load high bits from branch lookup table ld 12, .branch_lt@l(12) # Load target address mtctr 12 # Move to count register bctr # Branch to count register
Thunk impact on debugging and profiling
Thunks are transparent at the source level but visible in low-level tools:
Stack traces: May show thunk symbols (e.g., __AArch64ADRPThunk_foo) between caller and callee
Profilers: Samples may attribute time to thunk code; some profilers aggregate thunk time with the target function
Disassembly: objdump or llvm-objdump will show thunk sections interspersed with regular code
Code size: Each thunk adds bytes; large binaries may have thousands of thunks
lld/ELF's thunk creation algorithm
lld/ELF uses a multi-pass algorithm in finalizeAddressDependentContent:
assignAddresses();
for (pass = 0; pass < 30; ++pass) {
  if (pass == 0)
    createInitialThunkSections();  // pre-create empty ThunkSections
  bool changed = false;
  for (rel : all_relocations) {
    if (pass > 0 && normalizeExistingThunk(rel))
      continue;  // existing thunk still in range
    if (!needsThunk(rel)) continue;
    Thunk *t = getOrCreateThunk(rel);
    ts = findOrCreateThunkSection(rel, src);
    ts->addThunk(t);
    rel.sym = t->getThunkTargetSym();  // redirect
    changed = true;
  }
  mergeThunks();  // insert ThunkSections into output
  if (!changed) break;
  assignAddresses();  // recalculate with new thunks
}
Key details:
Multi-pass: Iterates until convergence (max 30 passes). Adding thunks changes addresses, potentially putting previously-in-range calls out of range.
Pre-allocated ThunkSections: On pass 0, createInitialThunkSections places empty ThunkSections at regular intervals (thunkSectionSpacing). For AArch64: 128 MiB - 0x30000 ≈ 127.8 MiB.
Thunk reuse: getThunk returns existing thunk if one exists for the same target; normalizeExistingThunk checks if a previously-created thunk is still in range.
ThunkSection placement: getISDThunkSec finds a ThunkSection within branch range of the call site, or creates one adjacent to the calling InputSection.
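Why more than one pass can be needed is easiest to see in a toy model. The C++ sketch below is not lld's code: `Layout`, `Call`, and `createThunks` are hypothetical, thunks are assumed to be long range (so a thunk always solves the call that requested it), and each thunk is simply placed right after the calling section. It only demonstrates that inserting a thunk shifts later addresses and can push another call out of range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Call {
  size_t from, to;  // section indices of caller and callee
  bool viaThunk;    // already redirected to a thunk?
};

struct Layout {
  std::vector<uint64_t> sizes;        // section sizes in bytes
  std::vector<unsigned> thunksAfter;  // thunks placed right after section i
  std::vector<uint64_t> addr;         // start address of each section

  void assignAddresses(uint64_t thunkSize) {
    uint64_t cur = 0;
    addr.resize(sizes.size());
    for (size_t i = 0; i < sizes.size(); ++i) {
      addr[i] = cur;
      cur += sizes[i] + thunksAfter[i] * thunkSize;
    }
  }
};

// Iterate until no new thunk is needed. Returns the number of thunks
// created, or -1 if the loop did not converge within 30 passes.
int createThunks(Layout &l, std::vector<Call> &calls, uint64_t range,
                 uint64_t thunkSize) {
  l.thunksAfter.assign(l.sizes.size(), 0u);
  l.assignAddresses(thunkSize);
  int created = 0;
  for (int pass = 0; pass < 30; ++pass) {
    bool changed = false;
    for (Call &c : calls) {
      if (c.viaThunk) continue;  // already handled in an earlier pass
      uint64_t a = l.addr[c.from], b = l.addr[c.to];
      uint64_t dist = a > b ? a - b : b - a;
      if (dist <= range) continue;  // reachable directly
      ++l.thunksAfter[c.from];      // place a thunk next to the caller
      c.viaThunk = true;
      ++created;
      changed = true;
    }
    if (!changed) return created;
    l.assignAddresses(thunkSize);  // new thunks shift later sections
  }
  return -1;
}
```

With section sizes {50, 48, 60, 10}, a range of 100, and 12-byte thunks, the call from section 1 to section 3 needs a thunk in the first pass; that thunk moves section 2 far enough that the call from section 0 to section 2 needs one in the second pass.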
lld/MachO's thunk creation algorithm
lld/MachO uses a single-pass algorithm in TextOutputSection::finalize:
for (callIdx = 0; callIdx < inputs.size(); ++callIdx) {
  // Finalize sections within forward branch range (minus slop)
  while (finalIdx < endIdx && fits_in_range(inputs[finalIdx]))
    finalizeOne(inputs[finalIdx++]);

  // Process branch relocations in this section
  isec = inputs[callIdx];
  for (Relocation &r : reverse(isec->relocs)) {
    if (!isBranchReloc(r)) continue;
    if (targetInRange(r)) continue;
    if (existingThunkInRange(r)) { reuse it; continue; }
    // Create new thunk and finalize it
    createThunk(r);
  }
}
Key differences from lld/ELF:
Single pass: Addresses are assigned monotonically and never revisited
Slop reservation: Reserves slopScale * thunkSize bytes (default: 256 × 12 = 3072 bytes on ARM64) to leave room for future thunks
Thunk naming: <function>.thunk.<sequence> where sequence increments per target
Thunk starvation problem: If many consecutive branches need thunks, each thunk (12 bytes) consumes slop faster than call sites (4 bytes apart) advance. The test lld/test/MachO/arm64-thunk-starvation.s demonstrates this edge case. Mitigation is increasing --slop-scale, but pathological cases with hundreds of consecutive out-of-range callees can still fail.
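The starvation arithmetic can be sketched with a deliberately crude model. This is not how lld/MachO tracks slop internally; it only encodes the sentence above: every call that needs a new thunk costs 12 bytes of slop, while moving on to the next call site (4 bytes later) frees up 4 bytes. All names here are hypothetical.

```cpp
#include <cstdint>

const uint64_t kThunkSize = 12;  // ARM64 thunk size, from the text
const uint64_t kCallSize = 4;    // distance between consecutive call sites

// Reserved slop in bytes for a given slop scale.
uint64_t slopBytes(uint64_t slopScale) { return slopScale * kThunkSize; }

// Crude model: how many back-to-back calls, each needing its own new
// thunk, can be served before the reserved slop runs out.
uint64_t callsBeforeStarvation(uint64_t slopScale) {
  uint64_t slop = slopBytes(slopScale);
  uint64_t n = 0;
  while (slop + kCallSize >= kThunkSize) {
    slop = slop + kCallSize - kThunkSize;  // net loss of 8 bytes per call
    ++n;
  }
  return n;
}
```

Under this model the default scale of 256 reserves 3072 bytes and is exhausted after a few hundred consecutive thunk-needing calls, and doubling the scale doubles that number, which matches the qualitative claim that a larger slop scale delays but does not eliminate the failure.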
mold's thunk creation algorithm
mold uses a two-pass approach: first pessimistically over-allocate thunks, then remove unnecessary ones.
Intuition: It's safe to allocate thunk space and later shrink it, but unsafe to add thunks after addresses are assigned (would create gaps breaking existing references).
Pass 1 (create_range_extension_thunks): Process sections in batches using a sliding window. The window tracks four positions:
Sections:  [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] ...
            ^       ^       ^               ^
            A       B       C               D
            |       |_______|               |
            |         batch                 |
            |                               |
         earliest                         thunk
         reachable                      placement
         from C
[B, C) = current batch of sections to process (size ≤ branch_distance/5)
A = earliest section still reachable from C (for thunk expiration)
D = where to place the thunk (furthest point reachable from B)
// Simplified from OutputSection<E>::create_range_extension_thunks
while (b < sections.size()) {
  // Advance D: find furthest point where thunk is reachable from B
  while (d < size && thunk_at_d_reachable_from_b)
    assign_address(sections[d++]);

  // Compute batch [B, C)
  c = b + 1;
  while (c < d && sections[c] < sections[b] + batch_size)
    c++;

  // Advance A: expire thunks no longer reachable
  while (a < b && sections[a] + branch_distance < sections[c])
    a++;
  // Expire thunk groups before A: clear symbol flags.
  for (; t < thunks.size() && thunks[t].offset < sections[a]; t++)
    for (sym in thunks[t].symbols)
      sym->flags = 0;

  // Scan [B,C) relocations. If a symbol is not assigned to a thunk group yet,
  // assign it to the new thunk group at D.
  auto &thunk = thunks.emplace_back(newThunk(offset));
  parallel_for(b, c, [&](i64 i) {
    for (rel in sections[i].relocs) {
      if (requires_thunk(rel)) {
        Symbol &sym = rel.symbol;
        if (!sym.flags.test_and_set()) {  // atomic: skip if already set
          lock_guard lock(mu);
          thunk.symbols.push_back(&sym);
        }
      }
    }
  });
  offset += thunk.size();
  b = c;  // Move to next batch
}
Pass 2 (remove_redundant_thunks): After final addresses are known, remove thunk entries for symbols actually in range.
Key characteristics:
Pessimistic over-allocation: Assumes all out-of-section calls need thunks; safe to shrink later
Batch size: branch_distance/5 (25.6 MiB for AArch64, 3.2 MiB for AArch32)
Parallelism: Uses TBB for parallel relocation scanning within each batch
Single branch range: Uses one conservative branch_distance per architecture. For AArch32, uses ±16 MiB (Thumb limit) for all branches, whereas lld/ELF uses ±32 MiB for A32 branches.
Thunk size not accounted in D-advancement: The actual thunk group size is unknown when advancing D, so the end of a large thunk group may be unreachable from the beginning of the batch.
No convergence loop: Single forward pass for address assignment, no risk of non-convergence
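The second pass and the batch-size figures can be illustrated with a short C++ sketch. This is not mold's implementation: `ThunkEntry`, `removeRedundant`, and `batchSize` are hypothetical, and the rule "keep an entry if at least one call site cannot reach the target directly" is my assumption about what "actually in range" means once final addresses are known.

```cpp
#include <cstdint>
#include <vector>

struct ThunkEntry {
  uint64_t targetAddr;              // final address of the callee
  std::vector<uint64_t> callSites;  // final addresses of the branches
};

bool inRange(uint64_t from, uint64_t to, uint64_t branchDistance) {
  uint64_t d = from > to ? from - to : to - from;
  return d < branchDistance;
}

// Pass 2 sketch: drop entries whose target every call site can reach directly.
std::vector<ThunkEntry> removeRedundant(const std::vector<ThunkEntry> &entries,
                                        uint64_t branchDistance) {
  std::vector<ThunkEntry> kept;
  for (const ThunkEntry &e : entries) {
    bool needed = false;
    for (uint64_t site : e.callSites)
      if (!inRange(site, e.targetAddr, branchDistance)) {
        needed = true;
        break;
      }
    if (needed) kept.push_back(e);
  }
  return kept;
}

// Batch size used in pass 1, per the text: one fifth of the branch distance.
uint64_t batchSize(uint64_t branchDistance) { return branchDistance / 5; }
```

Plugging in 128 MiB and 16 MiB for `branchDistance` reproduces the 25.6 MiB and 3.2 MiB batch sizes quoted above (truncated to whole bytes).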
Comparing thunk algorithms
| Aspect | lld/ELF | lld/MachO | mold |
| --- | --- | --- | --- |
| Passes | Multi-pass (max 30) | Single-pass | Two-pass |
| Strategy | Iterative refinement | Greedy | Greedy |
| Thunk placement | Pre-allocated at intervals | Inline with slop reservation | Batch-based at intervals |
| Convergence | Always (bounded iterations) | Almost always | Almost always |
| Range handling | Per-relocation type | Single conservative range | Single conservative range |
| Parallelism | Sequential | Sequential | Parallel (TBB) |
Linker relaxation (RISC-V)
RISC-V takes a different approach: instead of only expanding branches, it can also shrink instruction sequences when the target is close enough.
Consider a function call using the call pseudo-instruction, which expands to auipc + jalr:
# Before linking (8 bytes)
call ext
# Expands to:
#   auipc ra, %pcrel_hi(ext)
#   jalr  ra, ra, %pcrel_lo(ext)
If ext is within ±1 MiB, the linker can relax this to:
# After relaxation (4 bytes)
jal ext
This is enabled by R_RISCV_RELAX relocations that accompany R_RISCV_CALL relocations. The R_RISCV_RELAX relocation signals to the linker that this instruction sequence is a candidate for shrinking.
When the linker deletes instructions, it must also adjust:
Subsequent instruction offsets within the section
Symbol addresses
Other relocations that reference affected locations
Alignment directives (R_RISCV_ALIGN)
This makes RISC-V linker relaxation more complex than thunk insertion, but it provides code size benefits that other architectures cannot achieve at link time.
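Two pieces of this bookkeeping are simple enough to sketch in C++: the range test that decides whether `auipc` + `jalr` may become a single `jal`, and the address adjustment applied to everything located after a deleted instruction. `fitsJal` and `adjusted` are hypothetical helpers for illustration; the sketch assumes every relaxation deletes exactly 4 bytes and ignores R_RISCV_ALIGN handling.

```cpp
#include <cstdint>
#include <vector>

// jal encodes a 21-bit signed, 2-byte-aligned offset: [-1 MiB, 1 MiB).
bool fitsJal(int64_t disp) {
  const int64_t limit = int64_t{1} << 20;
  return disp >= -limit && disp < limit && disp % 2 == 0;
}

// New address of `addr` after 4 bytes were deleted at each offset listed in
// `deletions` (the slot freed when an 8-byte call shrinks to a 4-byte jal).
// Only deletions located before `addr` move it.
uint64_t adjusted(uint64_t addr, const std::vector<uint64_t> &deletions) {
  uint64_t shift = 0;
  for (uint64_t d : deletions)
    if (d < addr) shift += 4;
  return addr - shift;
}
```

Symbol values and relocation offsets would all be passed through something like `adjusted`, which is why a single deletion can ripple through the rest of the section.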
Diagnosing out-of-range errors
When you encounter a "relocation out of range" error, here are some diagnostic steps:
Check the error message: lld reports the source location, relocation type, and the distance. For example:
ld.lld: error: a.o:(.text+0x1000): relocation R_AARCH64_CALL26 out of range: 150000000 is not in [-134217728, 134217727]
Use --verbose or -Map: Generate a link map to see section layout and identify which sections are far apart.
Consider -ffunction-sections: Splitting functions into separate sections gives the linker more flexibility in placement, potentially reducing distances.
Check for large data in .text: Embedded data (jump tables, constant pools) can push functions apart. Some compilers have options to place these elsewhere.
LTO considerations: Link-time optimization can dramatically change code layout. If thunk-related issues appear only with LTO, the optimizer may be creating larger functions or different inlining decisions.
Summary
Handling long branches requires coordination across the toolchain:
| Stage | Technique | Example |
| --- | --- | --- |
| Compiler | Branch relaxation pass | Invert condition + add unconditional jump |
| Assembler | Instruction relaxation | Short jump to near jump |
| Linker | Range extension thunks | Generate trampolines |
| Linker | Linker relaxation | Shrink auipc+jalr to jal (RISC-V) |
The linker's thunk generation is particularly important for large programs where cross-compilation-unit calls may exceed branch ranges. Different linkers use different algorithms with various tradeoffs between complexity, optimality, and robustness.
RISC-V's linker relaxation is unique in that it can both expand and shrink code, optimizing for both correctness and code size.
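@JonyFang: This video takes an in-depth look at three core strategies for making AI agents (such as Codex GPT 5.2) genuinely improve iOS/macOS development efficiency:
Build Scripts: a standardized build process lets the AI understand and reproduce your build environment
Make Build Failures Obvious: improve how error messages are presented so the AI can quickly locate the root cause
Give Your Agent Eyes: this is the core part. Let the AI "see" the app's runtime state rather than only read the code
Most valuable takeaway: the author highlights an often-overlooked point, namely that AI coding assistants need to understand not only code logic but, more importantly, the app's runtime state. Tools such as Peekaboo let the AI obtain visual feedback (screenshots, UI hierarchy, and so on), so it can provide more precise diagnoses and code suggestions. This "observability-first" approach forms an interesting contrast with the traditional code-review workflow, and it is worth a look for any team trying to integrate AI tools deeply into its development process.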
Class
├─ isa
├─ superclass
├─ cache
├─ method list
├─ property list
├─ protocol list
├─ ivar list
├─ class_rw_t / class_ro_t
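└─ Meta Class
Let's go through them one by one.
II. isa: the class's "identity pointer"
1. What isa is
isa is a pointer
An object's isa → Class
A class's isa → Meta Class
instance ──isa──▶ Class ──isa──▶ Meta Class
Since arm64:
isa is a non-pointer isa
The bits not needed for the class pointer store:
Reference count information
The weak-reference flag
Whether the object has associated objects
But the logical semantics are unchanged.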
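The idea of a non-pointer isa can be sketched as plain bit packing. The C++ below uses a hypothetical, simplified layout chosen for this illustration (flag bits in the low three bits, the class pointer in bits 3..35, an inline retain count from bit 45 up); the real objc4 layout differs between platforms and runtime versions, so none of these constants should be read as the runtime's actual values.

```cpp
#include <cstdint>

// Hypothetical layout, for illustration only.
const uint64_t kNonPointerBit = 1ull << 0;  // isa carries extra information
const uint64_t kHasAssocBit = 1ull << 1;    // object has associated objects
const uint64_t kWeakBit = 1ull << 2;        // object has been weakly referenced
const uint64_t kClassMask = 0x0000000ffffffff8ull;  // class pointer, bits 3..35
const unsigned kExtraRcShift = 45;          // inline retain count, bits 45..63

uint64_t packIsa(uint64_t cls, bool hasAssoc, bool weak, uint64_t extraRc) {
  uint64_t flags = kNonPointerBit;
  if (hasAssoc) flags |= kHasAssocBit;
  if (weak) flags |= kWeakBit;
  return flags | (cls & kClassMask) | (extraRc << kExtraRcShift);
}

// The logical meaning is unchanged: masking recovers the Class pointer.
uint64_t classOf(uint64_t isa) { return isa & kClassMask; }
bool hasAssociatedObjects(uint64_t isa) { return (isa & kHasAssocBit) != 0; }
bool isWeaklyReferenced(uint64_t isa) { return (isa & kWeakBit) != 0; }
uint64_t extraRetainCount(uint64_t isa) { return isa >> kExtraRcShift; }
```

Reading the class always goes through a mask, which is why the packed value can carry flags and a retain count without changing what "isa points to the class" means.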
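III. cache: the performance core of method calls
1. What cache is
cache is a SEL → IMP mapping table
It lives in the Class
It is used to speed up method lookup
cache
├─ bucket[SEL → IMP]
└─ mask / occupied
2. Where cache sits in method lookup
objc_msgSend lookup order:
1️⃣ cache
2️⃣ method list
3️⃣ superclass → repeat 1 and 2
cache is always the first stop.
3. When the cache is filled
cache is populated lazily
On the first call of a method:
cache miss
The IMP is found in the method list
It is written into the cache
Afterwards, for the same SEL:
Direct cache hit
4. Why doesn't cache distinguish between classes?
The cache key is:
SEL
But each cache belongs to one specific Class.
Therefore:
A.foo → A's cache
B.foo → B's cache
Even when the SEL is the same, they do not interfere with each other.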
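The lookup order and the lazy cache fill can be modeled in a few lines of C++. This is a toy model rather than the runtime's data structures: `SEL` is a string, `IMP` is a plain function pointer, the cache is a hash map instead of a bucket array with mask / occupied, and `Class` and `lookUp` are hypothetical names.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using SEL = std::string;  // stand-in for a selector
using IMP = int (*)();    // stand-in for an implementation pointer

struct Class {
  Class *superclass = nullptr;
  std::vector<std::pair<SEL, IMP>> methods;  // method list
  std::unordered_map<SEL, IMP> cache;        // per-class SEL -> IMP cache
  int cacheHits = 0;
};

IMP lookUp(Class *cls, const SEL &sel) {
  for (Class *c = cls; c != nullptr; c = c->superclass) {
    auto hit = c->cache.find(sel);  // 1. cache
    if (hit != c->cache.end()) {
      IMP imp = hit->second;
      ++c->cacheHits;
      if (c != cls) cls->cache[sel] = imp;  // remember it for the receiver
      return imp;
    }
    for (const auto &m : c->methods)  // 2. method list
      if (m.first == sel) {
        cls->cache[sel] = m.second;   // lazy fill of the receiver's cache
        return m.second;
      }
  }  // 3. superclass: repeat 1 and 2
  return nullptr;  // the real runtime would continue with message forwarding
}
```

The first `lookUp` of a selector walks the method lists and writes the result into the receiver class's own cache; the second call for the same selector returns from step 1. Because each `Class` owns its cache, the same selector string in two classes never collides.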
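IV. method list: the "raw data source" for methods
1. What the method list is
The method list is an array
Each entry is a method_t
method_t
├─ SEL name
├─ IMP imp
└─ const char *types
These are the familiar three elements:
SEL + IMP + Type Encoding
2. Where the method list comes from
The method list is merged from the following parts:
Methods implemented by the class itself
Methods in Categories
⚠️ Category methods:
Are loaded later and inserted at the front
So they can override (shadow) the original methods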
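"Loaded later, inserted at the front" is what makes a Category method win. A minimal C++ model, assuming lookup is a first-match scan over one flat list (the real runtime keeps several lists and may search within them differently); `Method`, `attachCategory`, and `findMethod` are hypothetical names:

```cpp
#include <string>
#include <vector>

struct Method {
  std::string sel;      // selector name
  std::string impName;  // stand-in for the IMP, just a label here
};

// Attaching a category puts its methods in front of the existing ones.
void attachCategory(std::vector<Method> &list,
                    const std::vector<Method> &category) {
  list.insert(list.begin(), category.begin(), category.end());
}

// First match wins, so a category method shadows the class's own method.
const Method *findMethod(const std::vector<Method> &list,
                         const std::string &sel) {
  for (const Method &m : list)
    if (m.sel == sel) return &m;
  return nullptr;
}
```

Note that the original method is not removed by this; it is still in the list, just behind the Category's entry, which is why "shadow" is a more precise word than "replace".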
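V. property list: declaration information for properties
1. What the property list is
The property list stores declaration information
Not ivars
Not getter / setter implementations
objc_property_t
├─ name
└─ attributes (copy, nonatomic, strong ...)
2. What the property list is used for
Runtime reflection
KVC / KVO
Automatic serialization / ORM
But note:
Method calls do not depend on the property list at all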
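VI. ivar list: the actual layout of instance variables
1. What the ivar list is
The ivar list describes:
Member variables
Memory offsets
Types
ivar_t
├─ name
├─ type
└─ offset
2. ivar list and object memory
instance memory
├─ isa
├─ ivar1
├─ ivar2
The ivar list determines the object's memory layout
Subclass ivars are appended after the superclass's ivars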
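How offsets follow from this rule can be sketched in C++. The model below is an assumption-laden simplification, not the runtime's algorithm: it assumes a 64-bit target where isa occupies the first 8 bytes, every ivar is aligned to its own size, and the instance size is rounded up to 8. `Ivar`, `alignUp`, and `layoutIvars` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Ivar {
  std::string name;
  size_t size;    // size in bytes (must be non-zero)
  size_t offset;  // filled in by layoutIvars
};

size_t alignUp(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Assign offsets starting at `start`: 8 for a root class (right after isa),
// or the superclass's instance size for a subclass.
// Returns the resulting instance size.
size_t layoutIvars(std::vector<Ivar> &ivars, size_t start) {
  size_t cur = start;
  for (Ivar &iv : ivars) {
    cur = alignUp(cur, iv.size);  // assumption: alignment equals size
    iv.offset = cur;
    cur += iv.size;
  }
  return alignUp(cur, 8);
}
```

Laying out a root class first and then passing its returned size as `start` for the subclass reproduces the picture above: isa, then the superclass's ivars, then the subclass's.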
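VII. protocol list: protocol information
1. What the protocol list is
Stores the protocols the class conforms to
Contains:
Required methods
Optional methods
Mainly used for:
conformsToProtocol:
Runtime queries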
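VIII. class_rw_t / class_ro_t: the writable and read-only areas
1. class_ro_t (read-only)
Determined at compile time
Stores:
The original method list
ivar list
property list
2. class_rw_t (writable)
Generated dynamically at runtime
Stores:
Category methods
Dynamically added methods
This is the fundamental reason a Category can "modify class behavior".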
IX. Meta Class: where class methods live
1. What the Meta Class is
Class methods are not stored in the Class
They are stored in the Meta Class's method list
[Class foo]
→ looks up the Meta Class's cache / method list
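The symmetry between instance methods and class methods is easy to show in a toy C++ model: both are "look up starting from the receiver's isa", and only the receiver differs. `Class`, `Object`, and the helper functions are hypothetical names, and caches are left out for brevity.

```cpp
#include <string>
#include <vector>

struct Class {
  Class *isa = nullptr;         // for a class, this points at its Meta Class
  Class *superclass = nullptr;
  std::vector<std::string> methods;  // selectors implemented at this level
};

struct Object {
  Class *isa;  // for an instance, this points at its Class
};

// Shared lookup: walk the method lists from `start` up the superclass chain.
bool responds(const Class *start, const std::string &sel) {
  for (const Class *c = start; c != nullptr; c = c->superclass)
    for (const std::string &m : c->methods)
      if (m == sel) return true;
  return false;
}

// [obj foo]: start at the object's isa, i.e. the Class.
bool instanceResponds(const Object &obj, const std::string &sel) {
  return responds(obj.isa, sel);
}

// [Class foo]: start at the class's isa, i.e. the Meta Class.
bool classResponds(const Class &cls, const std::string &sel) {
  return responds(cls.isa, sel);
}
```

In this model a selector stored on the Meta Class is found by `classResponds` but not by `instanceResponds`, and vice versa for one stored on the Class.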
X. A complete Runtime structure diagram (logical)
instance
└─ isa → Class
├─ isa → Meta Class
├─ superclass
├─ cache
├─ method list
├─ property list
├─ ivar list
├─ protocol list
└─ class_rw_t / class_ro_t