Today — January 27, 2026 · iOS

[Repost] Understanding Skills in One Article: A Complete Guide from Concept to Practice

Author: wyanassert
January 27, 2026, 11:05

Original article link

Agents are evolving from "chatbots" into "capable right-hand assistants," and Skills are the key catalyst of that evolution.

Have you ever been driven to distraction by an Agent that "won't listen," "executes chaotically," or "claims it has no tools"?

This article will help you understand Skills — the "advanced skill packs" that make an Agent reliable, controllable, and reusable.

We will cover what Skills are and how they work, move on to how to write a good Skill, recommend useful community resources, and then use Skills in TRAE to implement a real scenario.

Whether you are a developer or an everyday user, you will find the tricks that make your Agent finally "get it."


Have you experienced (or are you experiencing) these "Agent-taming" breakdown moments?

  • Rules ignored: You write thousands of words in Agent.md, yet the Agent acts as if it never saw them — read, no reply.
  • Execution out of control: You polish countless prompts, yet the Agent still runs around like a headless fly, disorganized and chaotic.
  • Tools gone missing: You integrate a powerful MCP tool library, yet the Agent shrugs and says it has "no tools," leaving you baffled.

If these scenes feel familiar, don't give up just yet. The answer that ends this chaos may well be Skills.

What Are Skills


The concept of "Skills" was first introduced by Anthropic as a capability-extension mechanism for its Claude models. Simply put, it lets users add custom functionality and tools to Claude. As the practice matured and gained broad community adoption, Skills have become a standard extension convention supported by most Agent development tools and IDEs.

A Skill usually lives as a folder containing three things: an instruction manual (SKILL.md), a set of scripts (Script), and some reference material (Reference).
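As a rough sketch (the layout and file names below are illustrative, not an official template), such a folder might look like this:

pdf-tables/                     # the Skill's name
├── SKILL.md                    # metadata + instructions for the Agent
├── scripts/
│   └── extract_tables.py       # hypothetical helper script the Agent can run
└── references/
    └── table-format-notes.md   # background material, read only when needed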

You can think of a Skill as a packaged "skill kit." It bundles the domain knowledge, workflow, tools, and best practices needed for a specific task. When the AI faces a matching request, it can execute autonomously and methodically, like a seasoned expert.

In one sentence: if the Agent is a brain with great potential, Skills are a set of reusable "advanced martial-arts manuals" for that brain. With them, the Agent goes from a generalist that "knows a little about everything" to an expert that "does everything well" in a specific domain.

How Skills Work


📚 Official documentation: Agent Skills

Skill architecture: progressive loading

The design of Skills is clever: they run in a sandbox environment where the model can access the file system and execute bash commands (think of these as computer-operation instructions). In that environment, each Skill is simply a folder. The Agent, like someone comfortable at a computer, reads files and runs scripts from the command line, then uses the results to complete the task you assigned. This load-on-demand architecture makes a Skill a toolbox that is both powerful and efficient.

To balance effectiveness and efficiency, Skills use a clever three-level loading mechanism:

Level 1: Metadata (always loaded)

The metadata is the Skill's "business card": it contains a name and a description, defined in YAML. When Claude starts, it loads the metadata of every installed Skill, so it knows what each Skill does and when to use it. Because metadata is lightweight, you can install many Skills without worrying about filling up the context window.
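As a minimal sketch, this metadata is the YAML front matter at the top of SKILL.md — the name and description fields are the documented ones, while the values here are invented for illustration:

---
name: pdf-tables
description: Extract tables from PDF files and export them as CSV. Use this skill whenever the user asks to pull tabular data out of a PDF.
---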

Level 2: Instructions (loaded when triggered)

The body of the SKILL.md file is the instruction document, containing workflows, best practices, and operating guides. Only when a user request matches the description in a Skill's metadata does Claude read the document with a bash command and load its contents into the context. This triggered loading ensures that only the relevant detailed instructions consume tokens.

Level 3: Resources and code (loaded on demand)

A Skill can also bundle deeper resources, such as more detailed documents (FORMS.md), executable scripts (.py), or reference material (API docs, database schemas, and so on). Claude reads or executes these files via bash only when needed, and script code itself never enters the context. As a result, a Skill can bundle a large amount of information at almost no additional context cost.

How Skills are invoked: from understanding intent to reliable execution

So how does an Agent intelligently select and execute a Skill? The whole process works like an experienced assistant handling a job:

  1. Intent matching (find the right person): The Agent first listens to your request, then quickly scans the "business-card holder" (the metadata) of all its Skills for the best match.
  2. Reading the manual (understand how to do it): Once a Skill is chosen, the Agent opens its "operating manual" (SKILL.md) and studies the detailed steps and caveats.
  3. Executing on demand (get to work): Following the manual, the Agent starts working, pulling scripts or tools out of the "toolbox" whenever needed.
  4. Reporting back (mission accomplished): When the task is done, the Agent reports the final result, or asks you for guidance when it runs into trouble.

Skills vs. Other Concepts


To see the unique value of Skills more clearly, it helps to compare them with two easily confused concepts — quick commands (Command) and atomic tools (MCP). A kitchen analogy makes it easy to understand:

We also list a few other features that are often confused, for comparison.

📚 Official blog post: Skills explained: How Skills compares to prompts, Projects, MCP, and subagents

What Makes a Good Skill: From "Usable" to "Excellent"


Good Skills vs Bad Skills

How to write good Skills

  1. Atomicity: Stick to a single responsibility. Each Skill should be like a building block — small and focused on one concrete problem, so it is easy to reuse and compose later.
  2. Give examples (Few-Shot Prompting): This is the most critical point. Rather than explaining at length, give a few clear input/output examples (see the sketch after this list). Examples are powerful: the model instantly grasps the format, style, and behavior you want.
  3. Set the rules (Structured Instructions):
    1. Define the role: give it a clear expert persona, e.g. "You are a senior market analyst."
    2. Break down the steps: split the workflow into concrete step-by-step instructions that guide its "thinking."
    3. Draw red lines: state explicitly what it must not do, to keep it from hallucinating freely.
  4. Design the interface (Interface Design): As with a software API, explicitly define the Skill's input parameters and output format (for example, always output JSON or Markdown). This lets other programs call and integrate your Skill reliably.
  5. Iterate diligently (Iterative Refinement): Treat a Skill like a product. Watch for unsatisfying "bad cases" in real use, turn them into new rules or counter-examples, and add them to the Skill definition so it keeps evolving — smarter and more dependable over time.
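As an illustration of points 2 and 3 only (a hypothetical "market analyst" skill body, echoing the persona example above — not an official template), the examples and guardrails might look like this inside SKILL.md:

## Role
You are a senior market analyst. Produce concise, data-backed summaries.

## Steps
1. Read the data the user provides.
2. Compute period-over-period changes before drawing any conclusion.
3. Output a Markdown report with exactly these sections: Overview, Key Changes, Risks.

## Example
Input: "Q3 revenue 1.2M, Q2 revenue 1.0M"
Output: "Revenue grew 20% quarter over quarter (1.0M → 1.2M). ..."

## Do not
- Do not invent numbers that are not present in the input.
- Do not change the output section names.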

📚 Official best-practices guide: Skill authoring best practices

Popular Community Skills


Just getting started with Skills and not sure where to begin? Start with these popular Skills the community has accumulated — look for inspiration, or reuse them directly in your workflow.

Official Skills from Claude

📚 Official Skills repository: https://github.com/anthropics/skills

Studying Claude's official Skills repository is the fastest way to learn Skills best practices and to start building your own.

How do I use the official Skills quickly?
Most official Skills can be downloaded directly, or cloned locally with Git. In tools such as TRAE, you generally just place the Skill folders in the designated Skills directory, then restart or refresh the Agent, and it will automatically detect and load the new capabilities. See the tool's documentation for specifics.
For more detail, see the section below: Getting started quickly in TRAE

List of official Claude Skills

Other community best practices

Getting Started Quickly in TRAE


Theory only goes so far — let's try it ourselves. We will first cover how to create and apply a Skill in TRAE SOLO, then use Feishu-doc-based Spec Coding as an example of how Skills can quickly solve a real problem.

Creating a Skill

Method 1: Create directly in Settings

TRAE lets you create a Skill quickly from the settings page.

Press the keyboard shortcut Cmd + / Ctrl + to open the settings panel.

In the left-hand panel, find the "Rules & Skills" option.

Locate the Skills section and click the "Create" button on the right.

You will see a simple creation form with three elements: Skill name, Skill description, and Skill body. As an example, let's create a Skill for "making git commits according to a convention" — fill in the fields and click "Confirm."
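Purely as an illustration of those three fields (the values below are made up; adapt them to your own commit convention), the form might be filled in roughly like this:

Skill name:        conventional-commit
Skill description: Write git commit messages in Conventional Commits style. Use when the user asks to commit staged changes.
Skill body:        Run `git diff --staged` first and summarize the change, then commit with a message of the form "<type>(<scope>): <subject>", e.g. "fix(login): handle empty password".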

Fill in the content we need and click "Confirm."

Method 2: Parse a SKILL.md directly

In the current project, create a directory .trae/Skills/xxx, import the folder you need, then chat with TRAE and the Skill is ready to use.

You can see that it has been imported successfully under "Settings - Rules & Skills."

Method 3: Create in a conversation

TRAE ships with a built-in Skills-creator Skill; you can simply ask TRAE in a conversation to create the Skill you need.

Using a Skill

Using a skill in TRAE is easy: once the skills you need are loaded, just describe your request in everyday language in the chat box.

  • For example, type "design a tech-styled login page for me" and the system will automatically invoke the "frontend-design" skill.
  • For example, type "extract all the tables from this PDF" and the system will automatically invoke the "document-Skills/pdf" skill.
  • For example, type "convert this technical document into a Feishu doc" and the system will automatically invoke the "using-feishu-doc" skill.

The system analyzes your request, loads the skill documentation, and then guides you through the task step by step!

A Practical Scenario

Remember the problems mentioned in the introduction? For instance, the project rules file (project_rules) has a character limit; or, even when the root rules file explicitly says "in situation X, read file Y," the Agent still fails to follow it when executing a task.

The root cause is that Rules are static from the Agent's point of view: they are loaded into the context all at once when the task starts, which both takes up space and lacks flexibility. Skills, by contrast, load progressively and dynamically, which solves exactly this problem. So we can break those previously complex rule scenarios back down into individual skills.

Next, let's walk through a simple Feishu-doc-based "Spec Coding" workflow to see how skills solve the problem in practice.

What is Spec Coding?

Spec Coding advocates "think first, act later": it drives AI development by defining an executable requirements specification in detail. Its process includes writing the "requirements analysis, technical design, task breakdown" documents, and finally having the AI complete the coding according to the specification. This step-by-step workflow ensures every step has a basis, converting requirements into code accurately.

Let me analyze this scenario

As mentioned above, the development process is divided into four key stages, so we need to produce the Feishu documents for "requirements analysis, technical design, and task breakdown," plus the final code implementation. To do that, we need different skills to cover document writing in the different scenarios, and we need to teach the Agent how to use the Feishu tools for collaborative authoring.

Let's now design and implement the Skills described above.

Multi-role expert Skills

By implementing multiple role Skills, we create several deliverable process documents that constrain the subsequent coding and provide sufficient, explicit context for it; each Skill focuses on doing one thing.

  • Let's work out the design in more detail below

According to the table above, we now have a rough idea of how the Skills we need should be implemented.

  • This is only an example — you can follow the Skill-creation tutorial above to build and debug this set of multi-role Skills yourself. As noted earlier, a good Skill is refined gradually in practice and improved continuously through real-scenario invocations.

Feishu Doc Skill

The Feishu document format is a superset of Markdown. The purpose of this Skill is to teach the Agent Feishu's document syntax so it can write properly formatted md files. By constraining the Agent's behavior and making full use of reading and writing Feishu document comments, we get a multi-person review process: users raise suggestions as comments in the Feishu doc, the Agent re-reads the document and the comments, refines the document accordingly, and a collaborative documentation workflow emerges.

Spec Coding Skill

We have now implemented several role Skills and one functional Skill, but in practice we also need a skill that coordinates the whole and enables the division of labor. It combines the skills above, tells the agent the overall spec coding process, and handles the composition and scheduling of the tool skills and role skills.

With that we can quickly assemble a spec coding workflow and complete basic development. Following the same logic, you can also use skills to re-create the community's spec coding practices (such as SpecKit, OpenSpec, and so on).

Summary

The scenario above used two styles of Skill (role-type and tool-type). Using the dynamic loading mechanism of Skills (instead of loading fixed rules all at once), we decomposed a complex task; through the division of labor among different role skills we avoided the chaos of one Agent doing everything; and by closing the collaboration loop with Feishu docs we connected the last step of human-machine interaction. Together this effectively solves the problems of an Agent that "won't listen, executes chaotically, and lacks tools," turning the AI from a "conversation assistant" into a "trustworthy doer," and achieving efficient, precise, collaborative delivery from requirements to code.

Q & A | Common Questions


Why doesn't my Skill take effect, or why doesn't it behave as expected?

Nine times out of ten, your "business card" (the description) is poorly written.

Remember, the Agent decides "when to use which Skill" by reading the Skill's description. If your description is vague, too jargon-heavy, or too thin, the Agent will struggle to understand what you mean and simply won't invoke the Skill when it should. A clear, accurate description written in plain language is critical to whether a Skill works at all.
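A hedged before-and-after sketch (wording invented for illustration):

# Too vague — the Agent has almost nothing to match a request against:
description: Helps with documents.

# Clearer — says what it does and when it should be triggered:
description: Fill out PDF forms and extract their fields. Use when the user uploads a PDF form or asks to fill in or read a PDF form.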

Does the choice of large language model (LLM) affect how well Skills work?

Yes, but in different ways.

  • A more capable model mainly affects the ability to "select" and "orchestrate" skills. It understands your real intent more accurately and picks the most suitable skill (or combination of skills) to solve the problem. Its advantage lies in strategy.
  • The skill itself determines the floor and the stability of task execution. Once a skill is selected, the workflow and code defined inside it are fixed and run consistently. So how well a skill is written directly determines whether the concrete task gets done well.

Are Skills a silver bullet? Is there anything they are not good at?

Of course not. The main strength of Skills is handling tasks with a clear process and clear boundaries. In the following cases they may not be the best choice:

  • Tasks that require high creativity: writing an emotionally rich poem, or designing a brand-new logo. Such work relies more on the model's own "inspiration."
  • Complex strategy games that require real-time, dynamic decisions: for example, making trading decisions in a fast-moving financial market.
  • Plain knowledge Q&A or open-ended chat: if you just want to ask "Who were the three masters of the Renaissance?", ask the model directly — no need to bring out the heavy artillery of Skills.

I found a community Skill that works great. Can I modify it to fit my own needs?

Absolutely — we strongly encourage it!

Most shared Skills can be "forked" (copied) and further developed. You can treat a general-purpose Skill as a template: copy it into your own workspace, then modify its logic or parameters to fit your own business needs. This matters a lot for co-building the ecosystem and reusing knowledge.

Closing | Let the Agent Become a True "Doer"


The arrival of Skills builds the key technical bridge for AI to move from "conversational assistant" to "trustworthy executor." It packages domain knowledge, workflows, and tool-invocation logic in a structured way, fixing the chaos of ignored rules and runaway execution, and making the AI's output controllable, trustworthy, and efficient.

The core value of Skills:

  • Solving real pain points precisely: The three-level loading mechanism (metadata → instructions → resources) strikes an excellent balance between functional depth and context efficiency — avoiding wasted tokens while keeping execution precise — and gives the Agent dynamic context loading.
  • Ecosystem enablement, lower barriers: Both official and community sources provide rich resources (the official Claude repository, the SkillsMP marketplace, and so on), letting ordinary users stand on the shoulders of giants and quickly reuse mature capabilities.

Skills are not a silver bullet, but their advantage on deterministic, process-driven tasks is hard to replace. As model capabilities improve and the Skill ecosystem matures, we can expect more cross-domain, composable Skills — taking AI from a generalist that "knows a bit of everything" to an expert collaborator that "does everything well."

Why not start today: create your first Skill, package your best domain expertise into a reusable capability, and let AI become an amplifier that extends your professional value.

Yesterday — January 26, 2026 · iOS

Skip Goes Open Source: A Big Bet on Selling "Trust" Instead of "Tools" - Fatbobman's Swift Weekly #120

Author: Fatbobman
January 26, 2026, 22:00

Skip Tools recently announced that it is going fully free and open-sourcing its core engine, skipstone. This means Skip has completely changed how it operates: from "selling a product" to "selling services plus community sponsorship." The change reflects both the resignation of being forced to adjust a poorly performing business model and the boldness of the Skip team — proactively seeking change and a breakthrough at a time when AI dominates and the developer-tools landscape has hardened.

Understanding WKWebView in Depth: Delegate Methods and the Execution Order of the WKWebView Lifecycle

January 26, 2026, 14:58
In iOS development, `WKWebView` is the core component for building hybrid apps. Built on the modern WebKit engine, it offers excellent performance and strong security, but its complex lifecycle also confuses many developers — especially when a page fails to load: at which stage does the error callback actually fire?

[AI Video Generator] The First Big Purge of the New Year!

Author: iOS研究院
January 26, 2026, 10:30

Background

It looked like an ordinary weekend, but for developers of AI Video Generator apps the sky fell in.

Good news: the competitors' apps are all gone!

Bad news: ours are gone too!

Searching with the keyword "Video" turned up 21 related products — some listed only in overseas markets, others distributed in all regions.


So the wave of removals is not limited to particular countries or regions. Most likely these apps collectively tripped Apple's sweep. (In a mock French-gangster accent: "I demand to check the cards!")


Random spot checks

From the many products flagged as removed, two apps were picked at random, looking only at their App Store screenshots.


From the App Store screenshots alone, you can clearly see social-style marketing images full of suggestive, revealing content — and basically all of the flagged apps have it!

Nearly every randomly sampled product had this kind of problem to some degree; the store screenshots alone are full of borderline content. Did the social-app crowd collectively switch lanes?

Why they were removed

To better explain this wave of removals, I matched it against the content clauses in the Developer review guidelines.

Unsurprisingly, it is 1.1.4 — Overtly sexual or pornographic material, defined as "explicit descriptions or displays of sexual organs or activities intended to stimulate erotic rather than aesthetic or emotional feelings," including "hookup" dating apps and other apps that may contain pornography or be used to facilitate prostitution, or human trafficking and exploitation.

Of course, this concerted action most likely means Apple upgraded its algorithms (or ran a content-moderation sweep), cleaning up the App Store with a simple, blunt, one-size-fits-all cut.

Follow the rules and you'll sleep soundly. Finally, best of luck to everyone — may your builds pass review tonight!

P.S. For more articles, follow the WeChat account of the same name. By iOS研究院

Long branches in compilers, assemblers, and linkers

Author: MaskRay
January 25, 2026, 16:00

Branch instructions on most architectures use PC-relative addressing with a limited range. When the target is too far away, the branch becomes "out of range" and requires special handling.

Consider a large binary where main() at address 0x10000 calls foo() at address 0x8010000, over 128MiB away. On AArch64, the bl instruction can only reach ±128MiB, so this call cannot be encoded directly. Without proper handling, the linker would fail with an error like "relocation out of range." The toolchain must handle this transparently to produce correct executables.

This article explores how compilers, assemblers, and linkers work together to solve the long branch problem.

  • Compiler (IR to assembly): Handles branches within a function that exceed the range of conditional branch instructions
  • Assembler (assembly to relocatable file): Handles branches within a section where the distance is known at assembly time
  • Linker: Handles cross-section and cross-object branches discovered during final layout

Branch range limitations

Different architectures have different branch range limitations. Here's a quick comparison of unconditional / conditional branch ranges:

Architecture     Cond      Uncond                Call                   Notes
AArch64          ±1MiB     ±128MiB               ±128MiB                Thunks
AArch32 (A32)    ±32MiB    ±32MiB                ±32MiB                 Thunks, interworking
AArch32 (T32)    ±1MiB     ±16MiB                ±16MiB                 Thunks, interworking
LoongArch        ±128KiB   ±128MiB               ±128MiB                Linker relaxation
M68k (68020+)    ±2GiB     ±2GiB                 ±2GiB                  Assembler picks size
MIPS (pre-R6)    ±128KiB   ±128KiB (b offset)    ±128KiB (bal offset)   In -fno-pic code, pseudo-absolute j/jal can be used for a 256MiB region.
MIPS R6          ±128KiB   ±128MiB               ±128MiB
PowerPC64        ±32KiB    ±32MiB                ±32MiB                 Thunks
RISC-V           ±4KiB     ±1MiB                 ±1MiB                  Linker relaxation
SPARC            ±1MiB     ±8MiB                 ±2GiB                  No thunks needed
SuperH           ±256B     ±4KiB                 ±4KiB                  Use register-indirect if needed
x86-64           ±2GiB     ±2GiB                 ±2GiB                  Large code model changes call sequence
Xtensa           ±2KiB     ±128KiB               ±512KiB                Linker relaxation
z/Architecture   ±64KiB    ±4GiB                 ±4GiB                  No thunks needed

The following subsections provide detailed per-architecture information, including relocation types relevant for linker implementation.

AArch32

In A32 state:

  • Branch (b/b<cond>), conditional branch and link (bl<cond>) (R_ARM_JUMP24): ±32MiB
  • Unconditional branch and link (bl/blx, R_ARM_CALL): ±32MiB

Note: R_ARM_CALL is for unconditional bl/blx which can be relaxed to BLX inline; R_ARM_JUMP24 is for branches which require a veneer for interworking.

In T32 state (Thumb state pre-ARMv8):

  • Conditional branch (b<cond>, R_ARM_THM_JUMP8): ±256 bytes
  • Short unconditional branch (b, R_ARM_THM_JUMP11): ±2KiB
  • ARMv5T branch and link (bl/blx, R_ARM_THM_CALL): ±4MiB
  • ARMv6T2 wide conditional branch (b<cond>.w, R_ARM_THM_JUMP19): ±1MiB
  • ARMv6T2 wide branch (b.w, R_ARM_THM_JUMP24): ±16MiB
  • ARMv6T2 wide branch and link (bl/blx, R_ARM_THM_CALL): ±16MiB. R_ARM_THM_CALL can be relaxed to BLX.

AArch64

  • Test bit and branch (tbz/tbnz, R_AARCH64_TSTBR14): ±32KiB
  • Compare and branch (cbz/cbnz, R_AARCH64_CONDBR19): ±1MiB
  • Conditional branches (b.<cond>, R_AARCH64_CONDBR19): ±1MiB
  • Unconditional branches (b/bl, R_AARCH64_JUMP26/R_AARCH64_CALL26): ±128MiB

The compiler's BranchRelaxation pass handles out-of-range conditional branches by inverting the condition and inserting an unconditional branch. The AArch64 assembler does not perform branch relaxation; out-of-range branches produce linker errors if not handled by the compiler.

LoongArch

  • Conditional branches (beq/bne/blt/bge/bltu/bgeu, R_LARCH_B16): ±128KiB (18-bit signed)
  • Compare-to-zero branches (beqz/bnez, R_LARCH_B21): ±4MiB (23-bit signed)
  • Unconditional branch/call (b/bl, R_LARCH_B26): ±128MiB (28-bit signed)
  • Medium range call (pcaddu12i+jirl, R_LARCH_CALL30): ±2GiB
  • Long range call (pcaddu18i+jirl, R_LARCH_CALL36): ±128GiB

M68k

  • Short branch (Bcc.B/BRA.B/BSR.B): ±128 bytes (8-bit displacement)
  • Word branch (Bcc.W/BRA.W/BSR.W): ±32KiB (16-bit displacement)
  • Long branch (Bcc.L/BRA.L/BSR.L, 68020+): ±2GiB (32-bit displacement)

GNU Assembler provides pseudo opcodes (jbsr, jra, jXX) that "automatically expand to the shortest instruction capable of reaching the target". For example, jeq .L0 emits one of beq.b, beq.w, and beq.l depending on the displacement.

With the long forms available on 68020 and later, M68k doesn't need linker range extension thunks.

MIPS

  • Conditional branches (beq/bne/bgez/bltz/etc, R_MIPS_PC16): ±128KiB
  • PC-relative jump (b offset (bgez $zero, offset)): ±128KiB
  • PC-relative call (bal offset (bgezal $zero, offset)): ±128KiB
  • Pseudo-absolute jump/call (j/jal, R_MIPS_26): branch within the current 256MiB region, only suitable for -fno-pic code. Deprecated in R6 in favor of bc/balc

16-bit instructions removed in Release 6:

  • Conditional branch (beqz16, R_MICROMIPS_PC7_S1): ±128 bytes
  • Unconditional branch (b16, R_MICROMIPS_PC10_S1): ±1KiB

MIPS Release 6:

  • Unconditional branch, compact (bc16, unclear toolchain implementation): ±1KiB
  • Compare and branch, compact (beqc/bnec/bltc/bgec/etc, R_MIPS_PC16): ±128KiB
  • Compare register to zero and branch, compact (beqzc/bnezc/etc, R_MIPS_PC21_S2): ±4MiB
  • Branch (and link), compact (bc/balc, R_MIPS_PC26_S2): ±128MiB

LLVM's MipsBranchExpansion pass handles out-of-range branches.

lld implements LA25 thunks for MIPS PIC/non-PIC interoperability, but not range extension thunks.

GCC's mips port added -mlong-calls in 1993-03.

PowerPC

  • Conditional branch (bc/bcl, R_PPC64_REL14): ±32KiB
  • Unconditional branch (b/bl, R_PPC64_REL24/R_PPC64_REL24_NOTOC): ±32MiB

GCC-generated code relies on linker thunks. However, the legacy -mlongcall option can be used to generate long code sequences.

RISC-V

  • Compressed c.beqz: ±256 bytes
  • Compressed c.jal: ±2KiB
  • jalr (I-type immediate): ±2KiB
  • Conditional branches (beq/bne/blt/bge/bltu/bgeu, B-type immediate): ±4KiB
  • jal (J-type immediate, PseudoBR): ±1MiB (notably smaller than other RISC architectures: AArch64 ±128MiB, PowerPC64 ±32MiB, LoongArch ±128MiB)
  • PseudoJump (using auipc + jalr): ±2GiB
  • beqi/bnei (Zibi extension, 5-bit compare immediate (1 to 31 and -1)): ±4KiB

Qualcomm uC Branch Immediate extension (Xqcibi):

  • qc.beqi/qc.bnei/qc.blti/qc.bgei/qc.bltui/qc.bgeui (32-bit, 5-bit compare immediate): ±4KiB
  • qc.e.beqi/qc.e.bnei/qc.e.blti/qc.e.bgei/qc.e.bltui/qc.e.bgeui (48-bit, 16-bit compare immediate): ±4KiB

Qualcomm uC Long Branch extension (Xqcilb):

  • qc.e.j/qc.e.jal (48-bit, R_RISCV_VENDOR(QUALCOMM)+R_RISCV_QC_E_CALL_PLT): ±2GiB

For function calls:

  • The Go compiler emits a single jal for calls and relies on its linker to generate trampolines when the target is out of range.
  • In contrast, GCC and Clang emit auipc+jalr and rely on linker relaxation to shrink the sequence when possible.

The jal range (±1MiB) is notably smaller than other RISC architectures (AArch64 ±128MiB, PowerPC64 ±32MiB, LoongArch ±128MiB). This limits the effectiveness of linker relaxation ("start large and shrink"), and leads to frequent trampolines when the compiler optimistically emits jal ("start small and grow").

SPARC

  • Compare and branch (cxbe, R_SPARC_5): ±64 bytes
  • Conditional branch (bcc, R_SPARC_WDISP19): ±1MiB
  • Unconditional branch (b, R_SPARC_WDISP22): ±8MiB
  • call (R_SPARC_WDISP30/R_SPARC_WPLT30): ±2GiB

With ±2GiB range for call, SPARC doesn't need range extension thunks in practice.

SuperH

SuperH uses fixed-width 16-bit instructions, which limits branch ranges.

  • Conditional branch (bf/bt): ±256 bytes (8-bit displacement)
  • Unconditional branch (bra): ±4KiB (12-bit displacement)
  • Branch to subroutine (bsr): ±4KiB (12-bit displacement)

For longer distances, register-indirect branches (braf/bsrf) are used. The compiler inverts conditions and emits these when targets exceed the short ranges.

SuperH is supported by GCC and binutils, but not by LLVM.

Xtensa

Xtensa uses variable-length instructions: 16-bit (narrow, .n suffix) and 24-bit (standard).

  • Narrow conditional branch (beqz.n/bnez.n, 16-bit): -28 to +35 bytes (6-bit signed + 4)
  • Conditional branch (compare two registers) (beq/bne/blt/bge/etc, 24-bit): ±256 bytes
  • Conditional branch (compare with zero) (beqz/bnez/bltz/bgez, 24-bit): ±2KiB
  • Unconditional jump (j, 24-bit): ±128KiB
  • Call (call0/call4/call8/call12, 24-bit): ±512KiB

The assembler performs branch relaxation: when a conditional branch target is too far, it inverts the condition and inserts a j instruction.

Per https://www.sourceware.org/binutils/docs/as/Xtensa-Call-Relaxation.html, for calls, GNU Assembler pessimistically generates indirect sequences (l32r+callx8) when the target distance is unknown. GNU ld then performs linker relaxation.

x86-64

  • Short conditional jump (Jcc rel8): -128 to +127 bytes
  • Short unconditional jump (JMP rel8): -128 to +127 bytes
  • Near conditional jump (Jcc rel32): ±2GiB
  • Near unconditional jump (JMP rel32): ±2GiB

With a ±2GiB range for near jumps, x86-64 rarely encounters out-of-range branches in practice. That said, Google and Meta Platforms deploy mostly statically linked executables on x86-64 production servers and have run into the huge executable problem for certain configurations.

z/Architecture

  • Short conditional branch (BRC, R_390_PC16DBL): ±64KiB (16-bit halfword displacement)
  • Long conditional branch (BRCL, R_390_PC32DBL): ±4GiB (32-bit halfword displacement)
  • Short call (BRAS, R_390_PC16DBL): ±64KiB
  • Long call (BRASL, R_390_PC32DBL): ±4GiB

With ±4GiB range for long forms, z/Architecture doesn't need linker range extension thunks. LLVM's SystemZLongBranch pass relaxes short branches (BRC/BRAS) to long forms (BRCL/BRASL) when targets are out of range.

Compiler: branch range handling

Conditional branch instructions usually have shorter ranges than unconditional ones, making them less suitable for linker thunks (as we will explore later). Compilers typically keep conditional branch targets within the same section, allowing the compiler to handle out-of-range cases via branch relaxation.

Within a function, conditional branches may still go out of range. The compiler measures branch distances and relaxes out-of-range branches by inverting the condition and inserting an unconditional branch:

# Before relaxation (out of range)
beq .Lfar_target # ±4KiB range on RISC-V

# After relaxation
bne .Lskip # Inverted condition, short range
j .Lfar_target # Unconditional jump, ±1MiB range
.Lskip:

Some architectures have conditional branch instructions that compare with an immediate, with even shorter ranges due to encoding additional immediates. For example, AArch64's cbz/cbnz (compare and branch if zero/non-zero) have only a ±1MiB range, and tbz/tbnz (test bit and branch) only ±32KiB. RISC-V Zibi beqi/bnei have a ±4KiB range. The compiler handles these in a similar way:

// Before relaxation (cbz has ±1MiB range)
cbz w0, far

// After relaxation
cbnz w0, .Lskip // Inverted condition
b far // Unconditional branch, ±128MiB range
.Lskip:

An Intel employee contributed https://reviews.llvm.org/D41634 (in 2017) to handle the case when inversion of a branch condition is impossible. This is for an out-of-tree backend. As of Jan 2026 there is no in-tree test for this code path.

In LLVM, this is handled by the BranchRelaxation pass, which runs just before AsmPrinter. Different backends have their own implementations:

  • BranchRelaxation: AArch64, AMDGPU, AVR, RISC-V
  • HexagonBranchRelaxation: Hexagon
  • PPCBranchSelector: PowerPC
  • SystemZLongBranch: SystemZ
  • MipsBranchExpansion: MIPS
  • MSP430BSel: MSP430

The generic BranchRelaxation pass computes block sizes and offsets, then iterates until all branches are in range. For conditional branches, it tries to invert the condition and insert an unconditional branch. For unconditional branches that are still out of range, it calls TargetInstrInfo::insertIndirectBranch to emit an indirect jump sequence (e.g., adrp+add+br on AArch64) or a long jump sequence (e.g., pseudo jump on RISC-V).

Note: The size estimates may be inaccurate due to inline assembly. LLVM uses heuristics to estimate inline assembly sizes, but for certain assembly constructs the size is not precisely known at compile time.

Unconditional branches and calls can target different sections since they have larger ranges. If the target is out of reach, the linker can insert thunks to extend the range.

For x86-64, the large code model uses multiple instructions for calls and jumps to support text sections larger than 2GiB (see Relocation overflow and code models: x86-64 large code model). This is a pessimization if the callee ends up being within reach. Google and Meta Platforms have interest in allowing range extension thunks as a replacement for the multiple instructions.
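For reference, a sketch of the difference (illustrative only; the scratch register and exact relocations vary by compiler, code model, and position independence):

# Small/medium code model: one direct PC-relative call, ±2GiB reach
call    foo

# Large code model: materialize a 64-bit absolute address, then call indirectly
movabs  $foo, %r11            # R_X86_64_64, no range limit
call    *%r11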

Assembler: instruction relaxation

The assembler converts assembly to machine code. When the target of a branch is within the same section and the distance is known at assembly time, the assembler can select the appropriate encoding. This is distinct from linker thunks, which handle cross-section or cross-object references where distances aren't known until link time.

Assembler instruction relaxation handles two cases (see Clang -O0 output: branch displacement and size increase for examples):

  • Span-dependent instructions: Select an appropriate encoding based on displacement.
    • On x86, a short jump (jmp rel8) can be relaxed to a near jump (jmp rel32) when the target is far.
    • On RISC-V, beqz may be assembled to the 2-byte c.beqz when the displacement fits within ±256 bytes.
  • Conditional branch transform: Invert the condition and insert an unconditional branch. On RISC-V, a blt might be relaxed to bge plus an unconditional branch (see the sketch after this list).
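A sketch of both cases (hypothetical labels; x86-64 encodings shown for the first):

# Span-dependent selection on x86: the assembler picks the encoding that fits
jmp   .Lnear                  # EB rel8, 2 bytes, if the target is within ±128 bytes
jmp   .Lfar                   # E9 rel32, 5 bytes, otherwise

# Conditional branch transform on RISC-V when .Lfar exceeds the ±4KiB B-type range
bge   a0, a1, .Lskip          # inverted condition of "blt a0, a1, .Lfar"
j     .Lfar
.Lskip: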

The assembler uses an iterative layout algorithm that alternates between fragment offset assignment and relaxation until all fragments become legalized. See Integrated assembler improvements in LLVM 19 for implementation details.

Linker: range extension thunks

When the linker resolves relocations, it may discover that a branch target is out of range. At this point, the instruction encoding is fixed, so the linker cannot simply change the instruction. Instead, it generates range extension thunks (also called veneers, branch stubs, or trampolines).

A thunk is a small piece of linker-generated code that can reach the actual target using a longer sequence of instructions. The original branch is redirected to the thunk, which then jumps to the real destination.

Range extension thunks are one type of linker-generated thunk. Other types include:

  • ARM interworking veneers: Switch between ARM and Thumb instruction sets (see Linker notes on AArch32)
  • MIPS LA25 thunks: Enable PIC and non-PIC code interoperability (see Toolchain notes on MIPS)
  • PowerPC64 TOC/NOTOC thunks: Handle calls between functions using different TOC pointer conventions (see Linker notes on PowerISA)

Short range vs long range thunks

A short range thunk (see lld/ELF's AArch64 implementation) contains just a single branch instruction. Since it uses a branch, its reach is also limited by the branch range — it can only extend coverage by one branch distance. For targets further away, multiple short range thunks can be chained, or a long range thunk with address computation must be used.

Long range thunks use indirection and can jump to (practically) arbitrary locations.

// Short range thunk: single branch, 4 bytes
__AArch64AbsLongThunk_dst:
b dst // ±128MiB range

// Long range thunk: address computation, 12 bytes
__AArch64ADRPThunk_dst:
adrp x16, dst // Load page address (±4GiB range)
add x16, x16, :lo12:dst // Add page offset
br x16 // Indirect branch

Thunk examples

AArch32 (PIC) (see Linker notes on AArch32):

__ARMV7PILongThunk_dst:
movw ip, :lower16:(dst - .) ; ip = intra-procedure-call scratch register
movt ip, :upper16:(dst - .)
add ip, ip, pc
bx ip

PowerPC64 ELFv2 (see Linker notes on PowerISA):

__long_branch_dst:
addis 12, 2, .branch_lt@ha # Load high bits from branch lookup table
ld 12, .branch_lt@l(12) # Load target address
mtctr 12 # Move to count register
bctr # Branch to count register

Thunk impact on debugging and profiling

Thunks are transparent at the source level but visible in low-level tools:

  • Stack traces: May show thunk symbols (e.g., __AArch64ADRPThunk_foo) between caller and callee
  • Profilers: Samples may attribute time to thunk code; some profilers aggregate thunk time with the target function
  • Disassembly: objdump or llvm-objdump will show thunk sections interspersed with regular code
  • Code size: Each thunk adds bytes; large binaries may have thousands of thunks

lld/ELF's thunk creation algorithm

lld/ELF uses a multi-pass algorithm in finalizeAddressDependentContent:

assignAddresses();
for (pass = 0; pass < 30; ++pass) {
  if (pass == 0)
    createInitialThunkSections(); // pre-create empty ThunkSections
  bool changed = false;
  for (relocation : all_relocations) {
    if (pass > 0 && normalizeExistingThunk(rel))
      continue; // existing thunk still in range
    if (!needsThunk(rel)) continue;
    Thunk *t = getOrCreateThunk(rel);
    ts = findOrCreateThunkSection(rel, src);
    ts->addThunk(t);
    rel.sym = t->getThunkTargetSym(); // redirect
    changed = true;
  }
  mergeThunks(); // insert ThunkSections into output
  if (!changed) break;
  assignAddresses(); // recalculate with new thunks
}

Key details:

  • Multi-pass: Iterates until convergence (max 30 passes). Adding thunks changes addresses, potentially putting previously-in-range calls out of range.
  • Pre-allocated ThunkSections: On pass 0, createInitialThunkSections places empty ThunkSections at regular intervals (thunkSectionSpacing). For AArch64: 128 MiB - 0x30000 ≈ 127.8 MiB.
  • Thunk reuse: getThunk returns an existing thunk if one exists for the same target; normalizeExistingThunk checks if a previously-created thunk is still in range.
  • ThunkSection placement: getISDThunkSec finds a ThunkSection within branch range of the call site, or creates one adjacent to the calling InputSection.

lld/MachO's thunk creation algorithm

lld/MachO uses a single-pass algorithm in TextOutputSection::finalize:

for (callIdx = 0; callIdx < inputs.size(); ++callIdx) {
  // Finalize sections within forward branch range (minus slop)
  while (finalIdx < endIdx && fits_in_range(inputs[finalIdx]))
    finalizeOne(inputs[finalIdx++]);

  // Process branch relocations in this section
  for (Relocation &r : reverse(isec->relocs)) {
    if (!isBranchReloc(r)) continue;
    if (targetInRange(r)) continue;
    if (existingThunkInRange(r)) { reuse it; continue; }
    // Create new thunk and finalize it
    createThunk(r);
  }
}

Key differences from lld/ELF:

  • Single pass: Addresses are assigned monotonically and never revisited
  • Slop reservation: Reserves slopScale * thunkSize bytes (default: 256 × 12 = 3072 bytes on ARM64) to leave room for future thunks
  • Thunk naming: <function>.thunk.<sequence> where sequence increments per target

Thunk starvation problem: If many consecutive branches need thunks, each thunk (12 bytes) consumes slop faster than call sites (4 bytes apart) advance. The test lld/test/MachO/arm64-thunk-starvation.s demonstrates this edge case. The mitigation is increasing --slop-scale, but pathological cases with hundreds of consecutive out-of-range callees can still fail.

mold's thunk creation algorithm

mold uses a two-pass approach:

  • Pessimistically over-allocate thunks. Out-of-section relocations and relocations referencing a section that has not yet been assigned an address pessimistically need thunks (requires_thunk(ctx, isec, rel, first_pass) with first_pass=true).
  • Then remove unnecessary ones.

Linker pass ordering:

  • compute_section_sizes() calls create_range_extension_thunks() — final section addresses are NOT yet known
  • set_osec_offsets() assigns section addresses
  • remove_redundant_thunks() is called AFTER addresses are known — it drops thunks that the out-of-section relocations turn out not to need
  • Rerun set_osec_offsets()

Pass 1 (create_range_extension_thunks): Process sections in batches using a sliding window. The window tracks four positions:

Sections:  [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] ...
            ^           ^       ^                   ^
            A           B       C                   D
            |           |_______|                   |
            |              batch                    |
            |                                       |
        earliest                                  thunk
        reachable                               placement
         from C
  • [B, C) = current batch of sections to process (size ≤ branch_distance/5)
  • A = earliest section still reachable from C (for thunk expiration)
  • D = where to place the thunk (furthest point reachable from B)
// Simplified from OutputSection<E>::create_range_extension_thunks
while (b < sections.size()) {
  // Advance D: find furthest point where thunk is reachable from B
  while (d < size && thunk_at_d_reachable_from_b)
    assign_address(sections[d++]);

  // Compute batch [B, C)
  c = b + 1;
  while (c < d && sections[c] < sections[b] + batch_size) c++;

  // Advance A: expire thunks no longer reachable
  while (a < b && sections[a] + branch_distance < sections[c]) a++;
  // Expire thunk groups before A: clear symbol flags.
  for (; t < thunks.size() && thunks[t].offset < sections[a]; t++)
    for (sym in thunks[t].symbols) sym->flags = 0;

  // Scan [B,C) relocations. If a symbol is not assigned to a thunk group yet,
  // assign it to the new thunk group at D.
  auto &thunk = thunks.emplace_back(new Thunk(offset));
  parallel_for(b, c, [&](i64 i) {
    for (rel in sections[i].relocs) {
      if (requires_thunk(rel)) {
        Symbol &sym = rel.symbol;
        if (!sym.flags.test_and_set()) { // atomic: skip if already set
          lock_guard lock(mu);
          thunk.symbols.push_back(&sym);
        }
      }
    }
  });
  offset += thunk.size();
  b = c; // Move to next batch
}

Pass 2 (remove_redundant_thunks): After final addresses are known, remove thunk entries for symbols actually in range.

Key characteristics:

  • Pessimistic over-allocation: Assumes all out-of-section calls need thunks; safe to shrink later
  • Batch size: branch_distance/5 (25.6 MiB for AArch64, 3.2 MiB for AArch32)
  • Parallelism: Uses TBB for parallel relocation scanning within each batch
  • Single branch range: Uses one conservative branch_distance per architecture. For AArch32, uses ±16 MiB (Thumb limit) for all branches, whereas lld/ELF uses ±32 MiB for A32 branches.
  • Thunk size not accounted in D-advancement: The actual thunk group size is unknown when advancing D, so the end of a large thunk group may be unreachable from the beginning of the batch.
  • No convergence loop: Single forward pass for address assignment, no risk of non-convergence

GNU ld's thunk creation algorithm

Each port implements the algorithm on its own. There is no code sharing.

GNU ld's AArch64 port (bfd/elfnn-aarch64.c) uses an iterative algorithm but with a single stub type and no lookup table.

Main iteration loop (elfNN_aarch64_size_stubs()):

group_sections(htab, stub_group_size, ...);  // Default: 127 MiB
layout_sections_again();

for (;;) {
  stub_changed = false;
  _bfd_aarch64_add_call_stub_entries(&stub_changed, ...);
  if (!stub_changed)
    return true;
  _bfd_aarch64_resize_stubs(htab);
  layout_sections_again();
}

GNU ld's ppc64 port (bfd/elf64-ppc.c) uses an iterative multi-pass algorithm with a branch lookup table (.branch_lt) for long-range stubs.

Section grouping: Sections are grouped by stub_group_size (~28-30 MiB default); each group gets one stub section. For 14-bit conditional branches (R_PPC64_REL14, ±32KiB range), group size is reduced by 1024x.

Main iteration loop (ppc64_elf_size_stubs()):

while (1) {
  // Scan all relocations in all input sections
  for (input_bfd; section; irela) {
    // Only process branch relocations (R_PPC64_REL24, R_PPC64_REL14, etc.)
    stub_type = ppc_type_of_stub(section, irela, ...);
    if (stub_type == ppc_stub_none)
      continue;
    // Create or merge stub entry
    stub_entry = ppc_add_stub(...);
  }

  // Size all stubs, potentially upgrading long_branch to plt_branch
  bfd_hash_traverse(&stub_hash_table, ppc_size_one_stub, ...);

  // Check for convergence
  if (!stub_changed && all_sizes_stable)
    break;

  // Re-layout sections
  layout_sections_again();
}

Convergence control:

  • STUB_SHRINK_ITER = 20 (PR28827): After 20 iterations, stub sections only grow (prevents oscillation)
  • Convergence when: !stub_changed && all section sizes stable

Stub type upgrade: ppc_type_of_stub() initially returns ppc_stub_long_branch for out-of-range branches. Later, ppc_size_one_stub() checks if the stub's branch can reach; if not, it upgrades to ppc_stub_plt_branch and allocates an 8-byte entry in .branch_lt.

Comparing linker thunk algorithms

Aspect           lld/ELF                  lld/MachO         mold              GNU ld ppc64
Passes           Multi (max 30)           Single            Two               Multi (shrink after 20)
Strategy         Iterative refinement     Sliding window    Sliding window    Iterative refinement
Thunk placement  Pre-allocated intervals  Inline with slop  Batch intervals   Per stub-group

Linker relaxation

Some architectures take a different approach: instead of only expanding branches, the linker can also shrink instruction sequences when the target is close enough. RISC-V and LoongArch both use this technique. See The dark side of RISC-V linker relaxation for a deeper dive into the complexities and tradeoffs.

Consider a function call using the call pseudo-instruction, which expands to auipc + jalr:

# Before linking (8 bytes)
call ext
# Expands to:
# auipc ra, %pcrel_hi(ext)
# jalr ra, ra, %pcrel_lo(ext)

If ext is within ±1MiB, the linker can relax this to:

# After relaxation (4 bytes)
jal ext

This is enabled by R_RISCV_RELAX relocations that accompany R_RISCV_CALL relocations. The R_RISCV_RELAX relocation signals to the linker that this instruction sequence is a candidate for shrinking.

Example object code before linking:

0000000000000006 <foo>:
6: 97 00 00 00 auipc ra, 0
R_RISCV_CALL ext
R_RISCV_RELAX *ABS*
a: e7 80 00 00 jalr ra
e: 97 00 00 00 auipc ra, 0
R_RISCV_CALL ext
R_RISCV_RELAX *ABS*
12: e7 80 00 00 jalr ra

After linking with relaxation enabled, the 8-byte auipc+jalr pairs become 4-byte jal instructions:

0000000000000244 <foo>:
244: 41 11 addi sp, sp, -16
246: 06 e4 sd ra, 8(sp)
248: ef 00 80 01 jal ext
24c: ef 00 40 01 jal ext
250: ef 00 00 01 jal ext

When the linker deletes instructions, it must also adjust:

  • Subsequent instruction offsets within the section
  • Symbol addresses
  • Other relocations that reference affected locations
  • Alignment directives (R_RISCV_ALIGN)

This makes RISC-V linker relaxation more complex than thunk insertion, but it provides code size benefits that other architectures cannot achieve at link time.

LoongArch uses a similar approach. A pcaddu18i+jirl sequence (R_LARCH_CALL36, ±128GiB range) can be relaxed to a single bl instruction (R_LARCH_B26, ±128MiB range) when the target is close enough.
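A sketch analogous to the RISC-V example above (relocation operands and registers simplified; not exact assembler syntax):

# Before relaxation: medium code model call, R_LARCH_CALL36 (8 bytes)
pcaddu18i  $ra, 0
jirl       $ra, $ra, 0

# After relaxation, when ext is within ±128MiB: R_LARCH_B26 (4 bytes)
bl         ext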

Diagnosing out-of-range errors

When you encounter a "relocation out of range" error, check the linker diagnostic and locate the relocatable file and function, then determine how the function call is lowered in assembly. Generating a link map (e.g. with -Map) can also help identify which sections ended up far apart.
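lld, for example, reports the relocation type, the computed distance, and the permitted range:

ld.lld: error: a.o:(.text+0x1000): relocation R_AARCH64_CALL26 out of range: 150000000 is not in [-134217728, 134217727]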

Summary

Handling long branches requires coordination across the toolchain:

Stage      Technique               Example
Compiler   Branch relaxation pass  Invert condition + add unconditional jump
Assembler  Instruction relaxation  Invert condition + add unconditional jump
Linker     Range extension thunks  Generate trampolines
Linker     Linker relaxation       Shrink auipc+jalr to jal (RISC-V)

The linker's thunk generation is particularly important for large programs where function calls may exceed branch ranges. Different linkers use different algorithms with various tradeoffs between complexity, optimality, and robustness.

The linker relaxation approach adopted by RISC-V and LoongArch is an alternative that avoids range extension thunks but introduces other complexities.

Related

Handling long branches


Earlier · iOS

老司机 iOS 周报 #363 | 2026-01-26

Author: ChengzhiHuang
January 25, 2026, 20:23

老司机 iOS 周报 — bringing you only information that matters.

You can contribute to this project too: if you find valuable information, articles, or tools, submit them to us in the Issues and we will handle them as soon as possible. Remember to include your reason for recommending them. Suggestions and feedback are also welcome in the Issues.

Articles

🐕 Mastering UITableViewDiffableDataSource — A Modern iOS List Development Guide from Getting Started to Refactoring

@阿权: This article centers on UITableViewDiffableDataSource, the modern list API on iOS. Its core goal is to replace the traditional data-source/delegate pattern and eliminate pain points such as crashes and inconsistent state in list development; it ends with a lightweight toolkit, DiffableDataSourceKit, that simplifies calling the system API. The main content:

  1. Use the UITableViewDiffableDataSource API to manage data state through declarative "snapshots"; the system computes and performs the UI update animations automatically, solving once and for all the crashes caused by data and UI state getting out of sync under the traditional pattern.
  2. The whole article is built around a music-playlist app, starting from removing the Storyboard and defining data models that conform to Hashable, then walking step by step through initializing the data source and populating data.
  3. The article also covers in detail:
    • Customizing diverse cells: both traditional Auto Layout and modern configuration with UIContentConfiguration on iOS 14+.
    • Implementing core interactions: drag-to-reorder, swipe-to-delete, and handling interactions via cell delegate events.
    • Handling complex logic: in particular, how to use the model's Hashable conformance to achieve "in-place refresh" rather than replacement.

Beyond UITableViewDiffableDataSource itself, these WWDC sessions are worth watching to use the techniques well:

  1. WWDC19 - 215 - Advances in Collection View Layout
  2. WWDC19 - 220 - Advances in UI Data Sources
  3. WWDC20 - 10026 - Lists in UICollectionView
  4. WWDC20 - 10027 - Modern cell configuration

Also, rather than using UITableView to host data, Apple actually recommends UICollectionView for lists, and even provides an enhanced cell for it.

With the App Store's Xcode 26 requirement forcing an upgrade, many apps have finally dropped iOS 12 and iOS 13 — a perfect moment to upgrade the project architecture with these (no longer so new) technologies.

Beyond the API itself, we should also note some shifts and trends in architecture and design patterns:

  1. Declarative. The new APIs favor declarative syntax where logic is defined at construction time: layout, style, and related logic are specified up front instead of being configured later through assorted stored properties. This greatly reduces the state developers have to maintain. For example, UICollectionViewCompositionalLayout configures layout layer by layer at initialization through Item, nestable Group, and Section.
  2. Data-driven. Declarative design usually serves data-driven design, because what is declared is not the final object but an up-front configuration data structure. For example, cells provide a Configuration struct, and UI updates happen by configuring and reassigning the struct rather than manipulating the view directly; UIButton likewise provides a Configuration struct for configuring its UI. At a deeper level, the configuration data that drives the UI — and even the views — can be reused and migrated painlessly: the configuration types of UITableViewCell / UICollectionViewCell and their associated subviews are shared, so a custom cell can focus on its custom Configuration and the same visual style can be applied in various containers.
  3. Data binding. Replace indexes with ids, or even concrete business types. The old UITableView / UICollectionView APIs all revolved around IndexPath: data (DataSource), layout (CollectionViewLayout), and presentation (Cell, ReusableView) were separated but still had to exchange information through indexes. That simplified coupling and communication between modules, but since data is dynamic in most business scenarios, an index is only a transient state — easy to misuse, causing anything from wrong content to crashes. The true milestone of DiffableDataSource is that it removes indexes and binds concrete business models directly to cells.
  4. Type safety. No more Any/AnyObject — bind a concrete type directly, typically passed in through generics.
  5. Use the wheel instead of reinventing it. The system APIs are now flexible enough to use directly rather than subclassing for custom logic. The old instinct was to rewrite everything — custom controls, custom layouts; UIButton and UICollectionViewLayout are two classic cases. In recent years the system APIs have become far more flexible and usable: UIButton offers many ready-to-use styles and only needs a tweak to its Configuration to achieve the desired effect, while UICollectionViewCompositionalLayout builds arbitrarily complex layouts from Item, Group, and Section. Another sign of the trend: in iOS 26, only the official controls and navigation frameworks get the full Liquid Glass interactions.

Architectural evolution generally aims to improve productivity and reduce mistakes. With a sound, efficient architecture, as business requirements grow more complex, the calling code does not grow linearly with that complexity — it gradually shrinks.

🐎 Dart Officially Explains Again Why It Abandoned Macros in Favor of Optimizing build_runner — and How That Differs from Kotlin

@Crazy: This article explains why the Dart team dropped macros and chose to optimize build_runner instead. Before reading it, it helps to understand what macro programming is. The article walks through the approaches and trade-offs Dart explored while implementing macros; the reasons for giving up boil down to three:

  1. Compilation gets stuck in a "chicken-and-egg" deadlock.
  2. The dual-front-end toolchain means macro support would cause an explosion of work plus a performance disaster.
  3. Even if it shipped, it would be neither here nor there: it couldn't replace build_runner, so extending build_runner's capabilities directly is the better move.

The article closes by comparing the gap with Kotlin's Compiler Plugins and KSP and with Swift's Swift Macros; overall, build_runner still has a long way to go.

🐕 @_exported import VS public import

@AidenRao: Swift 6 brings access-level control for imports. How does public import differ from @_exported import? In short, public import merely declares a module to be part of your public API — users still have to import it themselves — whereas @_exported import fully "absorbs" the dependency's symbols, so callers never have to care about the underlying dependency. The article compares the intent and use cases of the two in depth and gives a clear recommendation: prefer the officially supported public import in day-to-day development, and only consider @_exported when wrapping an SDK or building an umbrella module, where you want to simplify imports for your users.

🐕 MVVM + Reducer Pattern

@含笑饮砒霜: This article is about combining the MVVM architecture with the Reducer pattern to make state management in iOS apps more controllable and maintainable. The author points out that under complex state, traditional MVVM tends toward scattered, hard-to-trace state mutations, leading to difficult debugging, implicit state transitions, and race conditions. The Reducer pattern (inspired by Redux/TCA) makes state changes predictable and testable through "a single source of state + explicit actions + a pure reduce function." The article suggests introducing a reducer locally inside the ViewModel, routing all state changes through a single reduce(state, action), and treating side effects (such as async tasks) as effects — yielding clearer, traceable, easily unit-tested behavior while preserving the clean layering of MVVM and the domain layer, without depending on any particular framework.

🐢 Breaking Down Agentic Coding from First Principles: From Theory to Practice

@Cooper Chen: Starting from first principles, this article systematically dissects the underlying logic and engineering reality behind agentic coding, and clears up a common misconception: the efficiency bottleneck is not that the context window is too small, but how we collaborate with the AI. Starting from the LLM's autoregressive generation and attention mechanism, the author analyzes the "drifting off course," "forgetting," and "local optimum" problems coding agents commonly hit on long tasks, and argues these are not tool defects but inevitable consequences of how the models work.

The most valuable part is how it turns theoretical constraints into actionable engineering practice: control context quality with a "short conversations, single task" working style; steer agent behavior with structured configuration files and tool design; and improve system stability with prompt caching, agent loops, and context compression. Going further, the author proposes the key idea of "compounding engineering" — don't treat AI as a one-off tool; instead use documentation, conventions, tests, and reviews to turn every experience into the system's long-term memory.

The takeaway is clear: AI programming is not magic but a collaboration skill that requires deliberate practice. Only when you truly understand the model's limits and use engineering methods to constrain and amplify it can AI evolve from "able to write code" into "a reliable programming collaborator."

🐎 Universal Links At Scale: The Challenges Nobody Talks About

@Damien: The article reveals the hidden complexity of Universal Links at scale: the AASA file has no JSON schema validation, so failures are silent; Apple's CDN caching delays make fixes slow to take effect; and Apple's peculiar wildcard syntax and substitutionVariables have no off-the-shelf tooling. The author proposes a complete solution — schema validation in CI, CDN sync checks, custom regex parsing, and staging-environment testing — and open-sources a Swift CLI tool that automates validation end to end.

🐕 How I use Codex GPT 5.2 with Xcode (My Complete Workflow)

@JonyFang: This video dives into three core strategies for making an AI agent (such as Codex GPT 5.2) genuinely improve iOS/macOS development efficiency:

  1. Build scripts: standardize the build process so the AI can understand and reproduce your build environment.
  2. Make build failures obvious: improve how error messages are surfaced so the AI can quickly locate the root cause.
  3. Give your agent "eyes": the most important part — let the AI "see" the app's runtime state instead of only reading code.

The most valuable point: the author highlights an often overlooked issue — an AI coding assistant needs to understand not just the code logic but also the app's runtime state. With tools such as Peekaboo, the AI can obtain visual feedback (screenshots, view hierarchies, and so on) and offer more precise diagnoses and code suggestions. This observability-first mindset makes an interesting contrast with the traditional code-review workflow, and is worth a look for any team trying to integrate AI tools deeply into their development process.

The video is about 49 minutes long, suited to iOS/macOS developers who want to systematically improve their AI-assisted development efficiency.

Tools

🐎 Skip Is Now Free and Open Source

@Crazy: The Skip framework is now officially free and open source. Development started in 2023, so it has three years of history. Its goal is to let developers build high-quality iOS and Android apps from a single Swift and SwiftUI codebase — without accepting the compromises that have existed "since cross-platform tools were born." Because Skip compiles to Kotlin and Compose, its runtime efficiency is very high; compared with other cross-platform approaches it is efficient and uses Swift. Now that it is free and open source, mobile developers have one more cross-platform option to choose from.

Referrals

The "iOS trusted referrals" column is back, with a list of positions that are actively hiring at the moment, for your reference.

For details, see https://www.yuque.com/iosalliance/article/bhutav (if you have hiring needs, contact iTDriverr).

Follow Us

We are 老司机技术周报, a tech account that keeps pursuing high-quality iOS content — follow us!

Follow-and-get: follow 【老司机技术周报】 and reply "2024" to receive the 2024 (and earlier) member digests.

RSS is also supported: https://github.com/SwiftOldDriver/iOS-Weekly/releases.atom

Notes

🚧 means a specific tool is required; 🌟 means editors' pick

Estimated reading time: 🐎 a quick read (1 - 10 mins); 🐕 medium (10 - 20 mins); 🐢 long (20+ mins)
