Today — January 26, 2026 · iOS

Long branches in compilers, assemblers, and linkers

Author: MaskRay
January 25, 2026 16:00

Branch instructions on most architectures use PC-relative addressing with a limited range. When the target is too far away, the branch becomes "out of range" and requires special handling.

Consider a large binary where main() at address 0x10000 calls foo() at address 0x8010000, 128MiB away. On AArch64, the bl instruction can only reach ±128MiB, so this call cannot be encoded directly. Without proper handling, the linker would fail with an error like "relocation out of range." The toolchain must handle this transparently to produce correct executables.

This article explores how compilers, assemblers, and linkers work together to solve the long branch problem.

  • Compiler (IR to assembly): Handles branches within a function that exceed the range of conditional branch instructions
  • Assembler (assembly to relocatable file): Handles branches within a section where the distance is known at assembly time
  • Linker: Handles cross-section and cross-object branches discovered during final layout

Branch range limitations

Different architectures have different branch range limitations. Here's a quick comparison of unconditional branch/call ranges:

Architecture | Unconditional Branch | Conditional Branch | Notes
AArch64 | ±128MiB | ±1MiB | Range extension thunks
AArch32 (A32) | ±32MiB | ±32MiB | Range extension and interworking veneers
AArch32 (T32) | ±16MiB | ±1MiB | Thumb has shorter ranges
LoongArch | ±128MiB | ±128KiB | Linker relaxation
PowerPC64 | ±32MiB | ±32KiB | Range extension and TOC/NOTOC interworking thunks
RISC-V | ±1MiB (jal) | ±4KiB | Linker relaxation
x86-64 | ±2GiB | ±2GiB | Code models or thunk extension

The following subsections provide detailed per-architecture information, including relocation types relevant for linker implementation.

AArch32

In A32 state:

  • Branch (b/b<cond>), conditional branch and link (bl<cond>) (R_ARM_JUMP24): ±32MiB
  • Unconditional branch and link (bl/blx, R_ARM_CALL): ±32MiB

Note: R_ARM_CALL is for unconditional bl/blx, which can be relaxed to BLX inline; R_ARM_JUMP24 is for branches, which require a veneer for interworking.
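As a minimal sketch (hypothetical Thumb callee; GNU assembler syntax), the distinction matters because the linker can rewrite the call instruction itself but not a plain branch:

  bl  thumb_func    @ R_ARM_CALL: the linker may patch BL to BLX for interworking
  b   thumb_func    @ R_ARM_JUMP24: B cannot be rewritten in place, so an interworking veneer is needed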

In T32 state (Thumb state pre-ARMv8):

  • Conditional branch (b<cond>, R_ARM_THM_JUMP8): ±256 bytes
  • Short unconditional branch (b, R_ARM_THM_JUMP11): ±2KiB
  • ARMv5T branch and link (bl/blx, R_ARM_THM_CALL): ±4MiB
  • ARMv6T2 wide conditional branch (b<cond>.w, R_ARM_THM_JUMP19): ±1MiB
  • ARMv6T2 wide branch (b.w, R_ARM_THM_JUMP24): ±16MiB
  • ARMv6T2 wide branch and link (bl/blx, R_ARM_THM_CALL): ±16MiB. R_ARM_THM_CALL can be relaxed to BLX.

AArch64

  • Test bit and branch (tbz/tbnz, R_AARCH64_TSTBR14): ±32KiB
  • Compare and branch (cbz/cbnz, R_AARCH64_CONDBR19): ±1MiB
  • Conditional branches (b.<cond>, R_AARCH64_CONDBR19): ±1MiB
  • Unconditional branches (b/bl, R_AARCH64_JUMP26/R_AARCH64_CALL26): ±128MiB

The compiler's BranchRelaxation pass handles out-of-range conditional branches by inverting the condition and inserting an unconditional branch. The AArch64 assembler does not perform branch relaxation; out-of-range branches produce linker errors if not handled by the compiler.
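For example, a sketch of what the pass does when a tbz target drifts beyond ±32KiB (hypothetical labels; the generic cbz example later shows the same transform):

  // before: tbz w0, #3, far_target   -- beyond tbz's ±32KiB reach
  tbnz w0, #3, .Lskip       // inverted test, only needs to skip one instruction
  b    far_target           // unconditional branch, ±128MiB
.Lskip: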

LoongArch

  • Conditional branches (beq/bne/blt/bge/bltu/bgeu, R_LARCH_B16): ±128KiB (18-bit signed)
  • Compare-to-zero branches (beqz/bnez, R_LARCH_B21): ±4MiB (23-bit signed)
  • Unconditional branch/call (b/bl, R_LARCH_B26): ±128MiB (28-bit signed)
  • Medium range call (pcaddu12i+jirl, R_LARCH_CALL30): ±2GiB
  • Long range call (pcaddu18i+jirl, R_LARCH_CALL36): ±128GiB

MIPS

  • Conditional branches (beq/bne/bgez/bltz/etc, R_MIPS_PC16): ±128KiB
  • Jump/call (j/jal, R_MIPS_26): pseudo-absolute branch within the current 256MiB region, only suitable for -fno-pic code. Deprecated in R6 in favor of bc/balc

microMIPS 16-bit instructions (removed in Release 6):

  • Conditional branch (beqz16, R_MICROMIPS_PC7_S1): ±128 bytes
  • Unconditional branch (b16, R_MICROMIPS_PC10_S1): ±1KiB

MIPS Release 6:

  • Unconditional branch, compact (bc16, unclear toolchain implementation): ±1KiB
  • Compare and branch, compact (beqc/bnec/bltc/bgec/etc, R_MIPS_PC16): ±128KiB
  • Compare register to zero and branch, compact (beqzc/bnezc/etc, R_MIPS_PC21_S2): ±4MiB
  • Branch (and link), compact (bc/balc, R_MIPS_PC26_S2): ±128MiB

LLVM's MipsBranchExpansion pass handles out-of-range branches.

lld implements LA25 thunks for MIPS PIC/non-PIC interoperability, but not range extension thunks.

PowerPC

  • Conditional branch (bc/bcl, R_PPC64_REL14): ±32KiB
  • Unconditional branch (b/bl, R_PPC64_REL24/R_PPC64_REL24_NOTOC): ±32MiB

GCC-generated code relies on linker thunks. However, the legacy -mlongcall option can be used to generate long code sequences.

RISC-V

  • Compressed c.beqz: ±256 bytes
  • Compressed c.jal: ±2KiB
  • jalr (I-type immediate): ±2KiB
  • Conditional branches (beq/bne/blt/bge/bltu/bgeu, B-type immediate): ±4KiB
  • jal (J-type immediate, PseudoBR): ±1MiB (notably smaller than other RISC architectures: AArch64 ±128MiB, PowerPC64 ±32MiB, LoongArch ±128MiB)
  • PseudoJump (using auipc + jalr): ±2GiB
  • beqi/bnei (Zibi extension, 5-bit compare immediate (1 to 31 and -1)): ±4KiB

Qualcomm uC Branch Immediate extension (Xqcibi):

  • qc.beqi/qc.bnei/qc.blti/qc.bgei/qc.bltui/qc.bgeui (32-bit, 5-bit compare immediate): ±4KiB
  • qc.e.beqi/qc.e.bnei/qc.e.blti/qc.e.bgei/qc.e.bltui/qc.e.bgeui (48-bit, 16-bit compare immediate): ±4KiB

Qualcomm uC Long Branch extension (Xqcilb):

  • qc.e.j/qc.e.jal (48-bit, R_RISCV_VENDOR(QUALCOMM)+R_RISCV_QC_E_CALL_PLT): ±2GiB

For function calls:

  • The Go compiler emits a single jal for calls and relies on its linker to generate trampolines when the target is out of range.
  • In contrast, GCC and Clang emit auipc+jalr and rely on linker relaxation to shrink the sequence when possible.

The jal range (±1MiB) is notably smaller than other RISC architectures (AArch64 ±128MiB, PowerPC64 ±32MiB, LoongArch ±128MiB). This limits the effectiveness of linker relaxation ("start large and shrink"), and leads to frequent trampolines when the compiler optimistically emits jal ("start small and grow").
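A side-by-side sketch of the two call strategies (ext is a hypothetical external function):

  # Go-style: optimistic 4-byte call; the linker adds a trampoline if ext is beyond ±1MiB
  jal   ra, ext

  # GCC/Clang-style: pessimistic 8-byte call; linker relaxation may shrink it back to jal
.L0:
  auipc ra, %pcrel_hi(ext)
  jalr  ra, %pcrel_lo(.L0)(ra)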

SPARC

  • Compare and branch (cxbe, R_SPARC_5): ±64 bytes
  • Conditional branch (bcc, R_SPARC_WDISP19): ±1MiB
  • Unconditional branch (b, R_SPARC_WDISP22): ±8MiB
  • call (R_SPARC_WDISP30/R_SPARC_WPLT30): ±2GiB

With a ±2GiB range for call, SPARC doesn't need range extension thunks in practice.

SuperH

  • Conditional branch (bf/bt): ±256 bytes
  • Unconditional branch (bra): ±4KiB
  • Branch to subroutine (bsr): ±4KiB

The very short range for conditional branches (±256 bytes) requires the compiler to invert the condition and generate register-indirect braf/bsrf for longer distances. SuperH is not supported by LLVM.

Xtensa

  • Narrow conditional branch (beqz.n/bnez.n): -28 to +35 bytes (6-bit signed + 4)
  • Conditional branch, compare two registers (beq/bne/blt/bge/etc): ±256 bytes
  • Conditional branch, compare with zero (beqz/bnez/bltz/bgez): ±2KiB
  • Unconditional jump (j): ±128KiB
  • Call (call0/call4/call8/call12): ±512KiB

The assembler performs branch relaxation: when a conditional branch target is too far, it inverts the condition and inserts a j instruction.

Per https://www.sourceware.org/binutils/docs/as/Xtensa-Call-Relaxation.html, for calls, GNU Assembler pessimistically generates indirect sequences (l32r+callx8) when the target distance is unknown. GNU ld then performs linker relaxation.

x86-64

  • Short conditional jump (Jcc rel8): -128 to +127 bytes
  • Short unconditional jump (JMP rel8): -128 to +127 bytes
  • Near conditional jump (Jcc rel32): ±2GiB
  • Near unconditional jump (JMP rel32): ±2GiB

With a ±2GiB range for near jumps, x86-64 rarely encounters out-of-range branches in practice. That said, Google and Meta Platforms deploy mostly statically linked executables on x86-64 production servers and have run into the huge executable problem for certain configurations.

Compiler: branch range handling

Conditional branch instructions usually have shorter ranges than unconditional ones, making them less suitable for linker thunks (as we will explore later). Compilers typically keep conditional branch targets within the same section, allowing the compiler to handle out-of-range cases via branch relaxation.

Within a function, conditional branches may still go out of range. The compiler measures branch distances and relaxes out-of-range branches by inverting the condition and inserting an unconditional branch:

# Before relaxation (out of range)
  beq .Lfar_target          # ±4KiB range on RISC-V

# After relaxation
  bne .Lskip                # Inverted condition, short range
  j .Lfar_target            # Unconditional jump, ±1MiB range
.Lskip:

Some architectures have conditional branch instructions that compare with an immediate, with even shorter ranges due to encoding additional immediates. For example, AArch64's tbz/tbnz (test bit and branch) have only ±32KiB range, while cbz/cbnz (compare and branch if zero/non-zero) have ±1MiB. RISC-V Zibi beqi/bnei have ±4KiB range. The compiler handles these in a similar way:

// Before relaxation (cbz has ±1MiB range)
  cbz w0, far

// After relaxation
  cbnz w0, .Lskip           // Inverted condition
  b far                     // Unconditional branch, ±128MiB range
.Lskip:

An Intel employee contributed https://reviews.llvm.org/D41634 (in 2017) to handle the case when inversion of a branch condition is impossible. This is for an out-of-tree backend. As of Jan 2026 there is no in-tree test for this code path.

In LLVM, this is handled by the BranchRelaxation pass, which runs just before AsmPrinter. Different backends have their own implementations:

  • BranchRelaxation: AArch64, AMDGPU, AVR, RISC-V
  • HexagonBranchRelaxation: Hexagon
  • PPCBranchSelector: PowerPC
  • SystemZLongBranch: SystemZ
  • MipsBranchExpansion: MIPS
  • MSP430BSel: MSP430

The generic BranchRelaxation pass computes block sizes and offsets, then iterates until all branches are in range. For conditional branches, it tries to invert the condition and insert an unconditional branch. For unconditional branches that are still out of range, it calls TargetInstrInfo::insertIndirectBranch to emit an indirect jump sequence (e.g., adrp+add+br on AArch64) or a long jump sequence (e.g., pseudo jump on RISC-V).

Unconditional branches and calls can target different sections since they have larger ranges. If the target is out of reach, the linker can insert thunks to extend the range.

For x86-64, the large code model uses multiple instructions for calls and jumps to support text sections larger than 2GiB (see Relocation overflow and code models: x86-64 large code model). This is a pessimization if the callee ends up being within reach. Google and Meta Platforms have interest in allowing range extension thunks as a replacement for the multiple instructions.
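A schematic of the large-code-model call sequence (register choice and PIC handling vary by compiler; shown only to contrast with the 5-byte call rel32):

  movabs $callee, %r11      # 10 bytes: materialize the full 64-bit address (R_X86_64_64)
  call   *%r11              # indirect call, no ±2GiB limit
  # vs. small/medium code model:
  # call callee              # 5 bytes, rel32, limited to ±2GiB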

Assembler: instruction relaxation

The assembler converts assembly to machine code. When the target of a branch is within the same section and the distance is known at assembly time, the assembler can select the appropriate encoding. This is distinct from linker thunks, which handle cross-section or cross-object references where distances aren't known until link time.

Assembler instruction relaxation handles two cases (see Clang -O0 output: branch displacement and size increase for examples):

  • Span-dependent instructions: Select an appropriate encoding based on displacement.
    • On x86, a short jump (jmp rel8) can be relaxed to a near jump (jmp rel32) when the target is far.
    • On RISC-V, beqz may be assembled to the 2-byte c.beqz when the displacement fits within ±256 bytes.
  • Conditional branch transform: Invert the condition and insert an unconditional branch. On RISC-V, a blt might be relaxed to bge plus an unconditional branch.
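A small x86-64 sketch of the first case (labels are made up; the assembler picks the encoding from the displacement it computes during layout):

  jmp .Lnear          # short form, 2 bytes (EB rel8)
.Lnear:
  jmp .Lfar           # .Lfar is more than 127 bytes ahead, so the near form is used (E9 rel32)
  .zero 200
.Lfar:
  nop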

The assembler uses an iterative layout algorithm that alternates between fragment offset assignment and relaxation until all fragments become legalized. See Integrated assembler improvements in LLVM 19 for implementation details.

Linker: range extension thunks

When the linker resolves relocations, it may discover that a branch target is out of range. At this point, the instruction encoding is fixed, so the linker cannot simply change the instruction. Instead, it generates range extension thunks (also called veneers, branch stubs, or trampolines).

A thunk is a small piece of linker-generated code that can reach the actual target using a longer sequence of instructions. The original branch is redirected to the thunk, which then jumps to the real destination.

Range extension thunks are one type of linker-generated thunk. Other types include:

  • ARM interworking veneers: Switch between ARM and Thumb instruction sets (see Linker notes on AArch32)
  • MIPS LA25 thunks: Enable PIC and non-PIC code interoperability (see Toolchain notes on MIPS)
  • PowerPC64 TOC/NOTOC thunks: Handle calls between functions using different TOC pointer conventions (see Linker notes on PowerISA)

Short range vs long range thunks

A short range thunk (see lld/ELF's AArch64 implementation) contains just a single branch instruction. Since it uses a branch, its reach is also limited by the branch range—it can only extend coverage by one branch distance. For targets further away, multiple short range thunks can be chained, or a long range thunk with address computation must be used.

Long range thunks use indirection and can jump to (practically) arbitrary locations.

// Short range thunk: single branch, 4 bytes
__AArch64AbsLongThunk_dst:
  b dst                     // ±128MiB range

// Long range thunk: address computation, 12 bytes
__AArch64ADRPThunk_dst:
  adrp x16, dst             // Load page address (±4GiB range)
  add x16, x16, :lo12:dst   // Add page offset
  br x16                    // Indirect branch

Thunk examples

AArch32 (PIC) (see Linker notes on AArch32):

__ARMV7PILongThunk_dst:
  movw ip, :lower16:(dst - .)   ; ip = intra-procedure-call scratch register
  movt ip, :upper16:(dst - .)
  add ip, ip, pc
  bx ip

PowerPC64 ELFv2 (see Linker notes on PowerISA):

__long_branch_dst:
  addis 12, 2, .branch_lt@ha    # Load high bits from branch lookup table
  ld 12, .branch_lt@l(12)       # Load target address
  mtctr 12                      # Move to count register
  bctr                          # Branch to count register

Thunk impact on debugging and profiling

Thunks are transparent at the source level but visible in low-level tools:

  • Stack traces: May show thunk symbols (e.g., __AArch64ADRPThunk_foo) between caller and callee
  • Profilers: Samples may attribute time to thunk code; some profilers aggregate thunk time with the target function
  • Disassembly: objdump or llvm-objdump will show thunk sections interspersed with regular code
  • Code size: Each thunk adds bytes; large binaries may have thousands of thunks

lld/ELF's thunk creation algorithm

lld/ELF uses a multi-pass algorithm in finalizeAddressDependentContent:

assignAddresses();
for (pass = 0; pass < 30; ++pass) {
  if (pass == 0)
    createInitialThunkSections(); // pre-create empty ThunkSections
  bool changed = false;
  for (relocation : all_relocations) {
    if (pass > 0 && normalizeExistingThunk(rel))
      continue; // existing thunk still in range
    if (!needsThunk(rel)) continue;
    Thunk *t = getOrCreateThunk(rel);
    ts = findOrCreateThunkSection(rel, src);
    ts->addThunk(t);
    rel.sym = t->getThunkTargetSym(); // redirect
    changed = true;
  }
  mergeThunks(); // insert ThunkSections into output
  if (!changed) break;
  assignAddresses(); // recalculate with new thunks
}

Key details:

  • Multi-pass: Iterates until convergence (max 30 passes). Adding thunks changes addresses, potentially putting previously-in-range calls out of range.
  • Pre-allocated ThunkSections: On pass 0, createInitialThunkSections places empty ThunkSections at regular intervals (thunkSectionSpacing). For AArch64: 128 MiB - 0x30000 ≈ 127.8 MiB.
  • Thunk reuse: getThunk returns an existing thunk if one exists for the same target; normalizeExistingThunk checks if a previously-created thunk is still in range.
  • ThunkSection placement: getISDThunkSec finds a ThunkSection within branch range of the call site, or creates one adjacent to the calling InputSection.

lld/MachO's thunk creation algorithm

lld/MachO uses a single-pass algorithm in TextOutputSection::finalize:

for (callIdx = 0; callIdx < inputs.size(); ++callIdx) {
  // Finalize sections within forward branch range (minus slop)
  while (finalIdx < endIdx && fits_in_range(inputs[finalIdx]))
    finalizeOne(inputs[finalIdx++]);

  // Process branch relocations in this section
  for (Relocation &r : reverse(isec->relocs)) {
    if (!isBranchReloc(r)) continue;
    if (targetInRange(r)) continue;
    if (existingThunkInRange(r)) { reuse it; continue; }
    // Create new thunk and finalize it
    createThunk(r);
  }
}

Key differences from lld/ELF:

  • Single pass: Addresses are assigned monotonically and never revisited
  • Slop reservation: Reserves slopScale * thunkSize bytes (default: 256 × 12 = 3072 bytes on ARM64) to leave room for future thunks
  • Thunk naming: <function>.thunk.<sequence> where sequence increments per target

Thunk starvation problem: If many consecutive branches need thunks, each thunk (12 bytes) consumes slop faster than call sites (4 bytes apart) advance. The test lld/test/MachO/arm64-thunk-starvation.s demonstrates this edge case. The mitigation is increasing --slop-scale, but pathological cases with hundreds of consecutive out-of-range callees can still fail.

mold's thunk creation algorithm

mold uses a two-pass approach:

  • Pessimistically over-allocate thunks: out-of-section relocations, and relocations referencing a section that has not been assigned an address yet, are pessimistically assumed to need thunks (requires_thunk(ctx, isec, rel, first_pass) with first_pass=true).
  • Then remove unnecessary ones.

Linker pass ordering:

  • compute_section_sizes() calls create_range_extension_thunks() — final section addresses are NOT yet known
  • set_osec_offsets() assigns section addresses
  • remove_redundant_thunks() is called AFTER addresses are known — it removes thunks that out-of-section relocations turned out not to need
  • Rerun set_osec_offsets()

Pass 1 (create_range_extension_thunks): Process sections in batches using a sliding window. The window tracks four positions:

Sections:   [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] ...
             ^       ^       ^                   ^
             A       B       C                   D
             |       |_______|                   |
             |         batch                     |
             |                                   |
          earliest                             thunk
          reachable                          placement
           from C
  • [B, C) = current batch of sections to process (size ≤ branch_distance/5)
  • A = earliest section still reachable from C (for thunk expiration)
  • D = where to place the thunk (furthest point reachable from B)
// Simplified from OutputSection<E>::create_range_extension_thunks
while (b < sections.size()) {
  // Advance D: find furthest point where thunk is reachable from B
  while (d < size && thunk_at_d_reachable_from_b)
    assign_address(sections[d++]);

  // Compute batch [B, C)
  c = b + 1;
  while (c < d && sections[c] < sections[b] + batch_size) c++;

  // Advance A: expire thunks no longer reachable
  while (a < b && sections[a] + branch_distance < sections[c]) a++;
  // Expire thunk groups before A: clear symbol flags.
  for (; t < thunks.size() && thunks[t].offset < sections[a]; t++)
    for (sym in thunks[t].symbols) sym->flags = 0;

  // Scan [B,C) relocations. If a symbol is not assigned to a thunk group yet,
  // assign it to the new thunk group at D.
  auto &thunk = thunks.emplace_back(new Thunk(offset));
  parallel_for(b, c, [&](i64 i) {
    for (rel in sections[i].relocs) {
      if (requires_thunk(rel)) {
        Symbol &sym = rel.symbol;
        if (!sym.flags.test_and_set()) { // atomic: skip if already set
          lock_guard lock(mu);
          thunk.symbols.push_back(&sym);
        }
      }
    }
  });
  offset += thunk.size();
  b = c; // Move to next batch
}

Pass 2 (remove_redundant_thunks): After final addresses are known, remove thunk entries for symbols actually in range.

Key characteristics:

  • Pessimistic over-allocation: Assumes all out-of-section calls need thunks; safe to shrink later
  • Batch size: branch_distance/5 (25.6 MiB for AArch64, 3.2 MiB for AArch32)
  • Parallelism: Uses TBB for parallel relocation scanning within each batch
  • Single branch range: Uses one conservative branch_distance per architecture. For AArch32, uses ±16 MiB (Thumb limit) for all branches, whereas lld/ELF uses ±32 MiB for A32 branches.
  • Thunk size not accounted in D-advancement: The actual thunk group size is unknown when advancing D, so the end of a large thunk group may be unreachable from the beginning of the batch.
  • No convergence loop: Single forward pass for address assignment, no risk of non-convergence

GNU ld's thunk creation algorithm

Each port implements the algorithm on its own; there is no code sharing.

GNU ld's AArch64 port (bfd/elfnn-aarch64.c) uses an iterative algorithm, but with a single stub type and no lookup table.

Main iteration loop (elfNN_aarch64_size_stubs()):

group_sections(htab, stub_group_size, ...);  // Default: 127 MiB
layout_sections_again();

for (;;) {
  stub_changed = false;
  _bfd_aarch64_add_call_stub_entries(&stub_changed, ...);
  if (!stub_changed)
    return true;
  _bfd_aarch64_resize_stubs(htab);
  layout_sections_again();
}

GNU ld's ppc64 port (bfd/elf64-ppc.c) uses an iterative multi-pass algorithm with a branch lookup table (.branch_lt) for long-range stubs.

Section grouping: Sections are grouped by stub_group_size (~28-30 MiB default); each group gets one stub section. For 14-bit conditional branches (R_PPC64_REL14, ±32KiB range), the group size is reduced by 1024x.

Main iteration loop (ppc64_elf_size_stubs()):

while (1) {
  // Scan all relocations in all input sections
  for (input_bfd; section; irela) {
    // Only process branch relocations (R_PPC64_REL24, R_PPC64_REL14, etc.)
    stub_type = ppc_type_of_stub(section, irela, ...);
    if (stub_type == ppc_stub_none)
      continue;
    // Create or merge stub entry
    stub_entry = ppc_add_stub(...);
  }

  // Size all stubs, potentially upgrading long_branch to plt_branch
  bfd_hash_traverse(&stub_hash_table, ppc_size_one_stub, ...);

  // Check for convergence
  if (!stub_changed && all_sizes_stable)
    break;

  // Re-layout sections
  layout_sections_again();
}

Convergence control:

  • STUB_SHRINK_ITER = 20 (PR28827): After 20 iterations, stub sections only grow (prevents oscillation)
  • Convergence when: !stub_changed && all section sizes stable

Stub type upgrade: ppc_type_of_stub() initially returns ppc_stub_long_branch for out-of-range branches. Later, ppc_size_one_stub() checks if the stub's branch can reach; if not, it upgrades to ppc_stub_plt_branch and allocates an 8-byte entry in .branch_lt.

Comparing linker thunk algorithms

Aspect | lld/ELF | lld/MachO | mold | GNU ld ppc64
Passes | Multi (max 30) | Single | Two | Multi (shrink after 20)
Strategy | Iterative refinement | Sliding window | Sliding window | Iterative refinement
Thunk placement | Pre-allocated intervals | Inline with slop | Batch intervals | Per stub-group

Linker relaxation (RISC-V)

The RISC-V ports of GCC and Clang take a different approach: instead of only expanding branches, the linker can also shrink instruction sequences when the target is close enough. See The dark side of RISC-V linker relaxation for a deeper dive into the complexities and tradeoffs.

Consider a function call using the call pseudo-instruction, which expands to auipc + jalr:

# Before linking (8 bytes)
call ext
# Expands to:
# auipc ra, %pcrel_hi(ext)
# jalr ra, ra, %pcrel_lo(ext)

If ext is within ±1MiB, the linker can relax this to:

# After relaxation (4 bytes)
jal ext

This is enabled by R_RISCV_RELAX relocations that accompany R_RISCV_CALL relocations. The R_RISCV_RELAX relocation signals to the linker that this instruction sequence is a candidate for shrinking.

Example object code before linking:

0000000000000006 <foo>:
       6: 97 00 00 00   auipc ra, 0
                 R_RISCV_CALL ext
                 R_RISCV_RELAX *ABS*
       a: e7 80 00 00   jalr ra
       e: 97 00 00 00   auipc ra, 0
                 R_RISCV_CALL ext
                 R_RISCV_RELAX *ABS*
      12: e7 80 00 00   jalr ra

After linking with relaxation enabled, the 8-byte auipc+jalr pairs become 4-byte jal instructions:

0000000000000244 <foo>:
     244: 41 11         addi sp, sp, -16
     246: 06 e4         sd ra, 8(sp)
     248: ef 00 80 01   jal ext
     24c: ef 00 40 01   jal ext
     250: ef 00 00 01   jal ext

When the linker deletes instructions, it must also adjust:

  • Subsequent instruction offsets within the section
  • Symbol addresses
  • Other relocations that reference affected locations
  • Alignment directives (R_RISCV_ALIGN)
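
As a minimal sketch (not actual lld or mold code), much of this bookkeeping boils down to an offset-remapping step applied to everything that lives after a deleted range:

#include <cstdint>
#include <utility>
#include <vector>

// Map an offset within a relaxed section to its post-deletion offset.
// deletedRanges holds (start, size) pairs sorted by start; symbol values and
// relocation offsets within the section are rewritten with this mapping.
uint64_t remapOffset(uint64_t old,
                     const std::vector<std::pair<uint64_t, uint64_t>> &deletedRanges) {
  uint64_t removed = 0;
  for (auto [start, size] : deletedRanges) {
    if (old < start)
      break;
    removed += size; // every deleted byte before `old` shifts it down
  }
  return old - removed;
}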

This makes RISC-V linker relaxation more complex than thunk insertion, but it provides code size benefits that other architectures cannot achieve at link time.

Diagnosing out-of-range errors

When you encounter a "relocation out of range" error, check the linker diagnostic and locate the relocatable file and function. Determine how the function call is lowered in assembly.
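
For example, an lld diagnostic names the object file, section offset, relocation type, and the distance that did not fit:

ld.lld: error: a.o:(.text+0x1000): relocation R_AARCH64_CALL26 out of range: 150000000 is not in [-134217728, 134217727]

A link map (-Map) or -ffunction-sections can then help identify which sections ended up far apart.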

Summary

Handling long branches requires coordination across the toolchain:

Stage | Technique | Example
Compiler | Branch relaxation pass | Invert condition + add unconditional jump
Assembler | Instruction relaxation | Invert condition + add unconditional jump
Linker | Range extension thunks | Generate trampolines
Linker | Linker relaxation | Shrink auipc+jalr to jal (RISC-V)

The linker's thunk generation is particularly important for large programs where function calls may exceed branch ranges. Different linkers use different algorithms with various tradeoffs between complexity, optimality, and robustness.

The linker relaxation approach adopted by RISC-V and LoongArch is an alternative that avoids range extension thunks but introduces other complexities.

Related

Handling long branches

Yesterday — January 25, 2026 · iOS

老司机 iOS 周报 #363 | 2026-01-26

Author: ChengzhiHuang
January 25, 2026 20:23

ios-weekly
老司机 iOS 周报: presenting only information that is worth your time.

You can also pitch in on this project: if you come across valuable information, articles, tools, and so on, file them in our Issues and we will handle them as soon as possible. Remember to include the reason for your recommendation. Suggestions and feedback are also welcome in Issues.

Articles

🐕 Mastering UITableViewDiffableDataSource: a modern iOS list development guide from getting started to refactoring

@阿权: This article centers on UITableViewDiffableDataSource, the modern iOS list API. The core idea is to replace the traditional data-source/delegate pattern, solving pain points in list development such as crashes and inconsistent state, and it ends with a lightweight toolkit, DiffableDataSourceKit, that simplifies calling the system APIs. The main points:

  1. Using the UITableViewDiffableDataSource API, data state is managed through declarative "snapshots"; the system computes and performs UI update animations automatically, solving once and for all the crashes caused by data and UI state getting out of sync in the traditional pattern.
  2. The whole article is built around a music playlist app: starting from removing the Storyboard and defining data models that conform to Hashable, it walks you through initializing the data source and populating data step by step.
  3. The article also explains in detail:
    • Customizing diverse cells: using traditional Auto Layout as well as modern configuration with UIContentConfiguration on iOS 14+.
    • Implementing core interactions: drag-to-reorder, swipe-to-delete, and handling interactions through cell delegate events.
    • Handling complex logic: in particular, how to use the model's Hashable conformance to achieve in-place refresh rather than replacement.

Beyond the UITableViewDiffableDataSource covered in the article, to make good use of these techniques it is worth watching a few WWDC sessions:

  1. WWDC19 - 215 - Advances in Collection View Layout
  2. WWDC19 - 220 - Advances in UI Data Sources
  3. WWDC20 - 10026 - Lists in UICollectionView
  4. WWDC20 - 10027 - Modern cell configuration

Also, rather than hosting data in UITableView, Apple actually recommends implementing lists with UICollectionView, and even provides an enhanced cell for it.

With the App Store's mandatory move to builds from Xcode 26, many apps have finally dropped iOS 12 and iOS 13, making this the best moment to upgrade your project architecture with these (no longer so new) technologies.

Beyond the APIs themselves, we should also pay attention to some shifts and trends in architecture and design patterns:

  1. Declarative. The new APIs lean on declarative syntax where logic is defined at construction time: layout, styling, and related logic are specified up front rather than configured later through assorted stored properties, greatly reducing the state developers have to maintain. For example, UICollectionViewCompositionalLayout configures the layout at initialization, layer by layer, through Items, nestable Groups, and Sections.
  2. Data-driven. Declarative APIs usually serve data-driven UI, because what they define is not the final object but an up-front configuration data structure. For example, cells expose a Configuration struct, and UI updates are performed by configuring and reassigning that struct rather than manipulating the view directly. Similarly, UIButton offers a Configuration struct for configuring its UI. At a deeper level, the configuration data and views that drive the UI become reusable and painless to migrate: the configuration types of UITableViewCell and UICollectionViewCell and their associated subviews are shared, so a custom cell can focus on its custom Configuration, and the same view style can be applied across different containers.
  3. Data binding. Indices are replaced with ids, or even concrete business types. The UITableView and UICollectionView APIs used to revolve around indices (IndexPath): data (DataSource), layout (CollectionViewLayout), and presentation (Cell, ReusableView) were separated, yet they all had to communicate through indices. That simplified coupling and communication between modules, but since data is dynamic in most business scenarios, an index is only a transient state; it is easy to use the wrong one, leading at best to incorrect display and at worst to crashes. The most significant milestone of DiffableDataSource is that it eliminates indices and binds concrete business models directly to cells.
  4. Type safety. No more Any/AnyObject; a concrete type is bound directly, typically passed in through generics.
  5. Use the wheel instead of reinventing it. The system APIs are now flexible enough that you use instances directly instead of subclassing to customize logic. The old habit was to rewrite implementations at the slightest friction and rebuild controls and layouts from scratch; UIButton and UICollectionViewLayout are two typical cases. In recent years the system APIs have kept increasing their flexibility and ease of use: UIButton ships many flexible ready-to-use styles, so developers only need to tweak a Configuration to get the desired effect, while UICollectionViewCompositionalLayout builds sufficiently complex layouts out of Items, Groups, and Sections. Another data point confirming this trend: in iOS 26, only the official controls and navigation frameworks get the full Liquid Glass interactions.

Architecture evolves mainly to improve development efficiency and reduce mistakes. With a sound, efficient architecture, as business requirements grow more complex, the calling code does not grow linearly with that complexity; it gradually shrinks.

🐎 The Dart team explains again why it abandoned macros and turned to optimizing build_runner, and how that differs from Kotlin

@Crazy: This article explains why the Dart team abandoned macros in favor of optimizing build_runner. Before reading it, you should first understand what macro programming is. The article walks through the approaches Dart tried while implementing macros and the thinking behind them; the reasons for giving up boil down to three:

  1. Compilation gets stuck in a chicken-and-egg deadlock
  2. With a dual-front-end toolchain, supporting macros would cause an explosion of work and a performance disaster
  3. Even if it shipped, it would be neither here nor there: it could not replace build_runner, so extending build_runner directly is the better path

The article ends by comparing Kotlin's Compiler Plugins and KSP with Swift Macros; overall, build_runner still has a long way to go.

🐕 @_exported import VS public import

@AidenRao: Swift 6 brings all-new access-level control for imports. What is the difference between @_exported import and the public import we are familiar with? In short, public import only declares a module as part of your public API, and users still need to import it themselves, whereas @_exported import fully "absorbs" the dependency's symbols so callers need not care about the underlying dependency. The article compares the intent and use cases of the two in depth and gives a clear recommendation: prefer the officially supported public import in day-to-day development, and only consider @_exported when wrapping an SDK or building an umbrella module, where you want to simplify imports for your users.

🐕 MVVM + Reducer Pattern

@含笑饮砒霜: This article is about combining the MVVM architecture with the Reducer pattern to make state management in iOS apps more controllable and maintainable. The author points out that under complex state, traditional MVVM tends toward scattered state mutations that are hard to trace, leading to difficult debugging, implicit state transitions, and race conditions. The Reducer pattern (inspired by Redux/TCA) makes state changes more predictable and testable through a single source of truth, explicit actions, and a pure reduce function. The article suggests introducing a reducer locally inside the ViewModel, routing all state changes through a single reduce(state, action), and treating side effects (such as async tasks) as effects, achieving clearer, traceable, easily unit-tested state handling while preserving the clean layering of MVVM and the domain layer, without depending strongly on any particular framework.

🐢 Breaking down Agentic Coding from first principles: from theory to practice

@Cooper Chen: Starting from first principles, the article systematically unpacks the underlying logic and engineering realities behind Agentic Coding, clearing up a common misconception: the efficiency bottleneck is not that the context window is too small, but how we collaborate with the AI. Starting from LLMs' autoregressive generation and the attention mechanism, the author analyzes problems coding agents commonly hit in long tasks, such as drifting off course, forgetting, and settling for local optima, and argues these are not tool defects but inevitable consequences of how the models work.

The most valuable part of the article is how it turns theoretical constraints into actionable engineering practice: control context quality by working in short conversations on single tasks; steer agent behavior with structured configuration files and tool design; and improve system stability with mechanisms like prompt caching, the agent loop, and context compression. Going further, the author proposes the key idea of "Compounding Engineering": instead of treating AI as a one-off tool, use documentation, conventions, tests, and reviews to deposit each experience into the system's long-term memory.

The article's final takeaway is clear: AI programming is not magic but a collaboration skill that requires deliberate practice. Only when you truly understand the model's limits, and constrain and amplify it with engineering methods, can AI evolve from "able to write code" into "a reliable programming collaborator."

🐎 Universal Links At Scale: The Challenges Nobody Talks About

@Damien: The article exposes the hidden complexity of Universal Links at scale: the AASA file has no JSON schema validation, so mistakes fail silently; Apple's CDN caching delays fixes; and Apple's peculiar wildcard syntax and substitutionVariables have no off-the-shelf tooling. The author proposes a complete scheme of schema validation in CI, CDN synchronization checks, custom regex parsing, and staging-environment testing, and open-sourced a Swift CLI tool for end-to-end automated validation.

🐕 How I use Codex GPT 5.2 with Xcode (My Complete Workflow)

@JonyFang: This video digs into three core strategies for making AI agents (such as Codex GPT 5.2) genuinely improve iOS/macOS development efficiency:

  1. Build scripts: standardize the build process so the AI can understand and reproduce your build environment
  2. Make build failures obvious: improve how errors are presented so the AI can quickly locate the root cause
  3. Give your agent eyes: the most essential part, letting the AI "see" the app's runtime state rather than only reading code

The most valuable point: the author stresses an often-overlooked issue, that an AI coding assistant needs to understand not just code logic but the app's runtime state. With tools like Peekaboo, the AI can receive visual feedback (screenshots, the UI hierarchy, and so on) and offer more precise diagnoses and code suggestions. This "observability-first" mindset makes an interesting contrast with traditional code-review workflows and is worth studying for any team integrating AI tools deeply into its development process.

The video is about 49 minutes long and is suitable for iOS/macOS developers who want to systematically improve their AI-assisted development efficiency.

Tools

🐎 Skip Is Now Free and Open Source

@Crazy: The Skip framework is now officially free and open source. Development started in 2023, giving it a three-year history. Its goal is to let developers build high-quality mobile apps for both iOS and Android from a single Swift and SwiftUI codebase, without accepting the compromises that have existed ever since cross-platform tools appeared. Because Skip compiles to Kotlin and Compose, runtime efficiency is very high compared with other cross-platform approaches, and you get to write Swift. Now that it is free and open source, mobile developers have one more cross-platform option to choose from.

Referrals

The "Reliable iOS Referrals" collection is being updated again, listing positions that are confirmed to be hiring recently, for your reference.

For details see https://www.yuque.com/iosalliance/article/bhutav (contact iTDriverr if you have hiring needs).

Follow us

We are 老司机技术周报, a tech account that keeps pursuing high-quality iOS content. Follow us!

Follow 老司机技术周报 and reply "2024" to receive the 2024 and earlier internal digests.

RSS subscription is also supported: https://github.com/SwiftOldDriver/iOS-Weekly/releases.atom

Notes

🚧 means a specific tool is required; 🌟 means editor's pick

Estimated reading time: 🐎 quick read (1-10 mins); 🐕 medium (10-20 mins); 🐢 slow (20+ mins)

Earlier · iOS

A few solo board games

Author: 云风
January 24, 2026 21:08

Last month I spent quite a lot of time on dotAge. I really enjoy the feeling of countering known risks through careful calculation and planning. Since dotAge has a strong Euro-style board game feel, I tried some solo board games with similar design elements.

The closest experiences I found are Voidfall (2023) and Spirit Island (2017). Because Spirit Island is older, has an official digital version on Steam, and ranks higher overall on BGG, I have spent the most time on it.

Both games feature deterministic combat: no dice or other random elements intervene in battle. Before a fight starts, the player already knows the outcome exactly; combat is just one part of the plan, a matter of how much cost to pay for how much gain. And although Spirit Island is card-driven, it completely removes randomness from drawing cards; the only randomness is when new cards (new powers) are added from the market. Once a card enters your deck, when and which cards can be played is entirely within your planning. That is very close to the experience of planning against crises in dotAge.

Spirit Island's setting feels a lot like the movie Avatar: the island's spirits channel their power through the natives to drive off foreign colonizers. Each round cycles, in a fixed order, through growing and unleashing the spirits' power (the player's actions) and the invaders' (the system's crises) exploring, building, and ravaging. The location of an invader's explore has slight randomness, but over the following two rounds they build and ravage in that same place under fixed rules (the crisis the player must handle). Playing as a spirit, you can wait until the ravage step and clear the threat on that land, which gives you two rounds to prepare; or you can nip the invaders in the bud before they build, which leaves less time to prepare but often costs less; or you can absorb the loss for now and focus your strength elsewhere or grow your powers faster. The game offers a genuinely rich set of strategic choices.

There are not many power cards: each spirit has only a few unique starting cards, and every other power is shared by all spirits for players to combine freely. Whenever you choose to grow, you pick one of four random cards. Unlike deck-builders with huge card pools, this game has few cards overall, and every one of them matters. You usually play only one or two cards per round; by the time you can play three or even four in a round (which is rare), you are already in the late game executing your winning plan. Paying each card's cost with energy looks similar to deck-building games at first glance, but it feels quite different in play: in Spirit Island, unspent energy is not discarded at the end of the round, it carries over with no cap. From a planning perspective, it is more like budgeting your energy across the whole game and playing the very few cards of each round with precision. Because reclaiming played cards is not random, you have to make explicit choices between growing your energy and reclaiming your powers; choosing the sequence of powers becomes part of the precise planning.

In dotAge the map must be planned: you weigh which building to put on each tile to maximize chained effects. In Spirit Island, each power card provides some elements, and the combination of elements activated in the same round boosts the powers themselves. I find the two mechanisms achieve the same end by different means. When thinking about game design, influenced by dotAge and Dawnmaker, I had always assumed you need to work with board positions to express building combos. Only after playing Spirit Island did I realize that cards alone, without any board layout, can deliver a similar experience: playing certain power cards together in one round gives a single power an extra boost, and such combinations can be very rich. Removing random draws gives the player full control over the combinations in their deck; and with so few cards in total and a very limited number of plays per round (constrained by both the play limit and energy), you are forced to make trade-offs. That is very much like laying out buildings in dotAge's cramped map: put this here, and that building over there loses its bonus.

Constrained by being a board game, though, Spirit Island still plays very differently from dotAge, and the higher the difficulty the more obvious the difference (I have played, and beaten, it at several difficulty levels). A board game has to keep rounds short and the pace brisk, which greatly reduces the tolerance for planning mistakes. A game of dotAge can last a whole day, and even at very high difficulty it forgives small errors. A video game can include many more elements and let the machine run the rules, so each individual numeric relationship can stay simple and direct. Spirit Island instead has to express the diversity of complex plans in very few actions, so the true effect of the powers can feel rather opaque: the card text itself is not complicated, but understanding the design logic behind each power and making accurate decisions in play is much harder.

At standard difficulty it took me a dozen plays before I truly won a game of Spirit Island, and each further notch of difficulty feels like a big jump in challenge. By contrast, in dotAge I grasped the game and won on my second run, and even hard mode was not particularly frustrating. Raising Spirit Island's difficulty still makes me nervous; I do not feel in control. And to this day I have not dared to try playing two or more spirits at once, which is simply too brain-burning. No wonder the physical game is played as multiplayer co-op rather than one player controlling two spirits.


Structurally, Voidfall is a bit closer to dotAge. It has no combat against an opponent at all; it is purely a score race: as long as you score faster than the system's rules demand, you win. dotAge is almost exactly this framework: the player accumulates points in four areas (disease, fear, temperature, and nature) to resist the four kinds of crises the system generates, preparing before each crisis arrives, that is, building up the ability to produce points in the corresponding area.

But neither Spirit Island nor Voidfall has the worker-placement mechanism that matters most in dotAge. Mechanically, dotAge is more like a digital Agricola (2007). Agricola is such a classic that almost every board gamer has played it, so I will not introduce it here. Voidfall, on the other hand, is a fairly new game and worth a brief mention. It has no official digital version, but there is a mod for it in Tabletop Simulator.

Somewhat like dotAge's four areas, in Voidfall the player can choose agendas along four directions: military, economy, technology, and politics. Once you have the corresponding agenda cards, you can roughly settle on a scoring route, and different routes also shape how that particular game plays out.

A board game cannot run too long, and Voidfall has only three acts; each act has an event card that steers how players score. The effects of these events are predictable, much like the prophecies in dotAge. The three acts also resemble dotAge's changing seasons and approaching apocalypse: rules control the pace and clearly separate what to do in each phase of the game, first production and building, then expansion and fighting, and finally maximizing your score.

I have not played it very carefully yet, but from a first shallow impression I quite like it. I will try it more in the coming days.


I am not actually particularly partial to deterministic combat; I also very much like dice-based risk management.

A couple of years ago I paid close attention to ISS Vanguard (2022), and recently I also played (on Tabletop Simulator) Robinson Crusoe: Adventures on the Cursed Island (2012) and Civolution (2024). These games are all very heavy and hard to describe in a few sentences, and I have not played them much, so I will not go into them here.

Incidentally, Friday (2011), another solo board game on the Robinson Crusoe desert-island survival theme, is a very good lightweight game. If you do not want to spend too much time on heavy games, it is well worth a play. It is a rather special deck-building game whose mechanics, unusually, focus on slimming the deck down: the player mostly thinks about how to efficiently trim the inefficient cards out of the starting deck.

It is easy to pick up (the rules take about five minutes to read), setup cost is minimal (just one set of cards), yet it is quite hard: it took me roughly 20 plays to find the trick to winning. The Chinese edition (星期五) is available on Taobao; recommended.

Personal finance study notes (1): everyone must understand investing themselves

Author: 唐巧
January 24, 2026 08:20

Preface

I plan to systematically organize what I have learned about investing over the past few years. On the one hand, summarizing deepens my own understanding of investing; on the other, I want to share it with readers who also want to learn.

Although my daughter is still in primary school, I signed her up for a financial-literacy class for primary schoolers. She is very interested in money management, and through this series I also want to share her dad's journey of learning to invest.

This is the first article in the series, and its theme is that everyone must understand investing themselves.

Cases around me

I was born in the 1980s, and I have to say that era lacked education in personal finance and financial literacy. As a result, I find that most people around me do not have strong money-management skills.

Here are a few real examples from friends around me.

Friend A:

He puts everything he earns into bank time deposits or Yu'ebao. These days the yield is very low, only a bit over one percent, but he is very timid and afraid that other products would lose money, so he will not touch them.

Friend B:

Friend B has bought many funds, but he is timid and puts only 1,000-5,000 yuan into each, so his account holds dozens of funds. He cannot keep track of them all and has no idea how to manage them.

The only upside: whenever any sector rallies, one of his funds catches it, which greatly eases his fear of missing out (FOMO).

Friend C:

This friend used to work at Kuaishou. In the heyday of P2P lending he put all his savings into P2P; the platforms blew up and he lost heavily.

Friend D:

This friend heard from another friend that a certain stock was being pumped by a market operator and would surge, believed it and bought in, and ended up losing 90%.

Friend E:

One of Friend E's university classmates sells insurance in Hong Kong, so on that classmate's recommendation he bought a lot of insurance there. Five years later he found the returns were far below what had been promised at the start. Only then did he read the contract and discover that the projected returns were not guaranteed. But surrendering now would return only a small fraction of the principal, so he has no choice but to grit his teeth and keep paying every year.

Only understanding enables you to hold effectively

After these stories, do you have friends like this around you?

When I talk with some friends, I ask them: why not learn about investing first and only then act? Many answer that it is too specialized, and specialized things should be left to professionals.

Then I ask back: if you buy a fund run by a professional, where does your faith in him come from? You cannot really judge the monthly reports he sends; you can only choose to trust him.

Most of the time, what you actually trust is his past performance. If the fund has been profitable, or has beaten the market, for three or five years running, you keep holding it and may even buy more.

If it loses money for several years in a row, or takes a big loss in one year, you start doubting it and may redeem it.

Your confidence really comes from past performance. How is that fundamentally different from retail investors chasing rallies and dumping on dips?

During the stretches when your holdings keep falling, can you sleep well? If you do not understand what you hold, obviously not.

That is why I say everyone must understand investing.

Only when you deeply understand what you have bought can you hold it with confidence when it falls, or even buy the dip, and still sleep at night.

Summary

Everyone must understand personal finance: bank time-deposit rates are too low, while every other product requires deep understanding before you can hold it for the long term.

Moreover, society is full of products like P2P and the silver-tongued salespeople who push them. They tempt us constantly, and without the ability to tell good from bad we could lose the money we worked a lifetime to earn.

That is all.

Maintaining shadow branches for GitHub PRs

Author: MaskRay
January 22, 2026 16:00

I've created pr-shadow with vibe coding, a tool that maintains a shadow branch for GitHub pull requests (PRs) that never requires force-pushing. This addresses pain points I described in Reflections on LLVM's switch to GitHub pull requests#Patch evolution.

The problem

GitHub structures pull requests around branches, enforcing a branch-centric workflow. There are multiple problems when you force-push a branch after a rebase:

  • The UI displays "force-pushed the BB branch from X to Y". Clicking "compare" shows git diff X..Y, which includes unrelated upstream commits—not the actual patch difference. For a project like LLVM with 100+ commits daily, this makes the comparison essentially useless.
  • Inline comments may become "outdated" or misplaced after force pushes.
  • If your commit message references an issue or another PR, each force push creates a new link on the referenced page, cluttering it with duplicate mentions. (Adding backticks around the link text works around this, but it's not ideal.)

These difficulties lead to recommendations favoring less flexible workflows that only append commits (including merge commits) and discourage rebases. However, this means working with an outdated base, and switching between the main branch and PR branches causes numerous rebuilds, which is especially painful for large repositories like llvm-project.

git switch main; git pull; ninja -C build

# Switching to a feature branch with an outdated base requires numerous rebuilds.
git switch feature0
git merge origin/main # I prefer `git rebase main` to remove merge commits, which clutter the history
ninja -C out/release

# Switching to another feature branch with an outdated base requires numerous rebuilds.
git switch feature1
git merge origin/main
ninja -C out/release

# Listing fixup commits ignoring upstream merges requires the clumsy --first-parent.
git log --first-parent

In a large repository, avoiding rebases isn't realistic—other commits frequently modify nearby lines, and rebasing is often the only way to discover that your patch needs adjustments due to interactions with other landed changes.

In 2022, GitHub introduced "Pull request title and description" for squash merging. This means updating the final commit message requires editing via the web UI. I prefer editing the local commit message and syncing the PR description from it.

The solution

After updating my main branch, before switching to a feature branch, I always run

git rebase main feature

to minimize the number of modified files. To avoid the force-push problems, I use pr-shadow to maintain a shadow PR branch (e.g., pr/feature) that only receives fast-forward commits (including merge commits).

I work freely on my local branch (rebase, amend, squash), then sync to the PR branch using git commit-tree to create a commit with the same tree but parented to the previous PR HEAD.

Local branch (feature)     PR branch (pr/feature)
A                          A (init)
|                          |
B (amend)                  C1 "Fix bug"
|                          |
C (rebase)                 C2 "Address review"

Reviewers see clean diffs between C1 and C2, even though the underlying commits were rewritten.

When a rebase is detected (git merge-base with main/master changed), the new PR commit is created as a merge commit with the new merge-base as the second parent. GitHub displays these as "condensed" merges, preserving the diff view for reviewers.
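
A minimal sketch of the underlying git plumbing (illustrative branch names; not pr-shadow's exact implementation):

# Sync: reuse feature's tree, but parent the commit to pr/feature's current HEAD.
tree=$(git rev-parse feature^{tree})
new=$(git commit-tree "$tree" -p pr/feature -m "Address review")
git update-ref refs/heads/pr/feature "$new"

# After rebasing feature onto main, also record the new merge-base as a second parent.
base=$(git merge-base main feature)
new=$(git commit-tree "$tree" -p pr/feature -p "$base" -m "Rebase onto main")
git update-ref refs/heads/pr/feature "$new"

The PR branch then only ever moves forward, so GitHub never shows a force-push.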

Usage

# Initialize and create PR
git switch -c feature
edit && git commit -m feature

# Set `git merge-base origin/main feature` as the initial base. Push to pr/feature and open a GitHub PR.
prs init
# Same but create a draft PR. Repeated `init`s are rejected.
prs init --draft

# Work locally (rebase, amend, etc.)
git fetch origin main:main
git rebase main
git commit --amend

# Sync to PR
prs push "Rebase and fix bug"
# Force push if remote diverged due to messing with pr/feature directly.
prs push --force "Rewrite"

# Update PR title/body from local commit message.
prs desc

# Run gh commands on the PR.
prs gh view
prs gh checks

The tool supports both fork-based workflows (pushing to your fork) and same-repo workflows (for branches like user/<name>/feature). It also works with GitHub Enterprise, auto-detecting the host from the repository URL.

Related work

The name "prs" is a tribute to spr, which implements asimilar shadow branch concept. However, spr pushes user branches to themain repository rather than a personal fork. While necessary for stackedpull requests, this approach is discouraged for single PRs as itclutters the upstream repository. pr-shadow avoids this by pushing toyour fork by default.

I owe an apology to folks who receiveusers/MaskRay/feature branches (if they use the defaultfetch = +refs/heads/*:refs/remotes/origin/* to receive userbranches). I had been abusing spr for a long time after LLVM'sGitHub transition to avoid unnecessary rebuilds when switchingbetween the main branch and PR branches.

Additionally, spr embeds a PR URL in commit messages (e.g.,Pull Request: https://github.com/llvm/llvm-project/pull/150816),which can cause downstream forks to add unwanted backlinks to theoriginal PR.

If I need stacked pull requests, I will probably use pr-shadow withthe base patch and just rebase stacked ones - it's unclear how sprhandles stacked PRs.
