Migrating comments to giscus

Author: MaskRay
February 17, 2025 16:00

Followed this guide: https://www.patrickthurmond.com/blog/2023/12/11/commenting-is-available-now-thanks-to-giscus

Add the following to layout/_partial/article.ejs

<% if (!index && post.comments) { %>
  <section class="giscus"></section>
  <script src="https://giscus.app/client.js"
      data-repo="MaskRay/maskray.me"
      data-repo-id="FILL IT UP"
      data-category="Blog Post Comments"
      data-category-id="FILL IT UP"
      data-mapping="pathname"
      data-strict="0"
      data-reactions-enabled="1"
      data-emit-metadata="0"
      data-input-position="bottom"
      data-theme="preferred_color_scheme"
      data-lang="en"
      data-loading="lazy"
      crossorigin="anonymous"
      async>
  </script>
<% } %>

Unfortunately, comments from Disqus have not been migrated yet. If you've left comments in the past, thank you. Apologies that they are now gone.

While you can create GitHub Discussions via the GraphQL API, I haven't found a solution that works out of the box. https://www.davidangulo.xyz/posts/dirty-ruby-script-to-migrate-comments-from-disqus-to-giscus/ provides a Ruby solution, which is promising but no longer works.

Failed to define value method for :name, because EnterpriseOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
Failed to define value method for :name, because EnvironmentOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
Failed to define value method for :name, because LabelOrderField already responds to that method. Use `value_method:` to override the method name or `value_method: false` to disable Enum value method generation.
...
.local/share/gem/ruby/3.3.0/gems/graphql-client-0.25.0/lib/graphql/client.rb:338:in `query': wrong number of arguments (given 2, expected 1) (ArgumentError)
from g.rb:42:in `create_discussion'
lld 20 ELF changes

Author: MaskRay
February 2, 2025 16:00

LLVM 20 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/20.x/lld/docs/ReleaseNotes.rst. I've meticulously reviewed nearly all the patches that are not authored by me. I'll delve into some of the key changes.

  • -z nosectionheader has been implemented to omit the section header table. The operation is similar to llvm-objcopy --strip-sections. (#101286)
  • --randomize-section-padding=<seed> is introduced to insert random padding between input sections and at the start of each segment. This can be used to control measurement bias in A/B experiments. (#117653)
  • The reproduce tarball created with --reproduce= now excludes directories specified in the --dependency-file argument (used by Ninja). This resolves an error where non-existent directories could cause issues when invoking ld.lld @response.txt.
  • --symbol-ordering-file= and call graph profile can now be used together.
  • When --call-graph-ordering-file= is specified, .llvm.call-graph-profile sections in relocatable files are no longer used.
  • --lto-basic-block-sections=labels is deprecated in favor of --lto-basic-block-address-map. (#110697)
  • In non-relocatable links, a .note.GNU-stack section with the SHF_EXECINSTR flag is now rejected unless -z execstack is specified. (#124068)
  • In relocatable links, the sh_entsize member of a SHF_MERGE section with relocations is now respected in the output.
  • Quoted names can now be used in output section phdr, memory region names, OVERLAY, the LHS of --defsym, and INSERT AFTER.
  • Section CLASS linker script syntax binds input sections to named classes, which are referenced later one or more times. This provides access to the automatic spilling mechanism of --enable-non-contiguous-regions without globally changing the semantics of section matching. It also independently increases the expressive power of linker scripts. (#95323)
  • INCLUDE cycle detection has been fixed. A linker script can now be included twice.
  • The archivename: syntax when matching input sections is now supported. (#119293)
  • To support Arm v6-M, short thunks using B.w are no longer generated. (#118111)
  • For AArch64, BTI-aware long branch thunks can now be created to a destination function without a BTI instruction. (#108989) (#116402)
  • Relocations related to GOT and TLSDESC for the AArch64 Pointer Authentication ABI are now supported.
  • Supported relocation types for x86-64 target:
    • R_X86_64_CODE_4_GOTPCRELX (#109783) (#116737)
    • R_X86_64_CODE_4_GOTTPOFF (#116634)
    • R_X86_64_CODE_4_GOTPC32_TLSDESC (#116909)
    • R_X86_64_CODE_6_GOTTPOFF (#117675)
  • Supported relocation types for LoongArch target: R_LARCH_TLS_{LD,GD,DESC}_PCREL20_S2. (#100105)

Linker scripts

The CLASS keyword, which separates section matching and referring, is a noteworthy new feature to the linker script support. Here is the GNU ld feature request.

Section layout

If --symbol-ordering-file= is specified, the sections it names are placed first. In LLD 20, SHT_LLVM_CALL_GRAPH_PROFILE sections in relocatable files are still used for other sections.

The next release will support the options --bp-compression-sort=both and --bp-startup-sort=function --irpgo-profile=a.profdata, which improve Lempel-Ziv compression and reduce page faults during program startup for mobile applications.

.dynsym computation

The purpose of Symbol::includeInDynsym was somewhat ambiguous, as it was used both to determine if a symbol should be exported to .dynsym and to conservatively suppress transformations in other contexts like MarkLive and ICF. LLD 20 clarifies this by introducing Symbol::isExported specifically for indicating whether a defined symbol should be exported. All previous uses of Symbol::includeInDynsym have been updated to use Symbol::isExported instead. The old confusing Symbol::exportDynamic has been removed.

A special case within Symbol::includeInDynsym checked for isUndefWeak() && ctx.arg.noDynamicLinker. (This could be generalized to isUndefined() && ctx.arg.noDynamicLinker, as non-weak undefined symbols led to errors.) This condition ensures that undefined symbols are not included in .dynsym for statically linked ET_DYN executables (created with clang -static-pie).

This condition has been generalized in LLD 20 to (ctx.arg.shared || !ctx.sharedFiles.empty()) && (sym->isUndefined() || sym->isExported). This means undefined symbols are excluded from .dynsym in both ld.lld -pie a.o and ld.lld -pie --no-dynamic-linker a.o, but not ld.lld -pie a.o b.so. This change brings LLD's behavior more in line with GNU ld.
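The generalized condition can be modeled as a small predicate. This is a standalone sketch, not lld source code: LinkConfig, Sym, and conditionHolds are hypothetical names that stand in for ctx.arg.shared, ctx.sharedFiles, sym->isUndefined(), and sym->isExported.

```cpp
// Standalone model of the LLD 20 condition quoted above; not lld source code.
struct LinkConfig {
  bool shared = false;         // models ctx.arg.shared (-shared)
  bool hasSharedFiles = false; // models !ctx.sharedFiles.empty()
};

struct Sym {
  bool undefined = false; // models sym->isUndefined()
  bool exported = false;  // models sym->isExported
};

// Returns true when the modeled condition holds for the symbol.
bool conditionHolds(const LinkConfig &cfg, const Sym &sym) {
  return (cfg.shared || cfg.hasSharedFiles) &&
         (sym.undefined || sym.exported);
}
```

In this model, an undefined symbol satisfies the condition only when -shared is used or a shared object takes part in the link, which corresponds to the difference between ld.lld -pie a.o and ld.lld -pie a.o b.so.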

Symbol::isPreemptible, indicating whether a symbol could be bound to another component, is computed along with isExported. This is computed in two places: during symbol versioning handling, and before relocation scanning. In LLD 19, computeIsPreemptible is called during Identical Code Folding (ICF).

In LLD 20, a symbol is exported to .dynsym when ((sym->isExported || sym->isPreemptible) && !sym->isLocal()) is true.

for (Symbol *sym : ctx.symtab->getSymbols()) {
  if (!sym->isUsedInRegularObj || !includeInSymtab(ctx, *sym))
    continue;
  if (!ctx.arg.relocatable)
    sym->binding = sym->computeBinding(ctx);
  if (ctx.in.symTab)
    ctx.in.symTab->addSymbol(sym);

  // computeBinding might localize a linker-synthesized hidden symbol
  // that was considered exported.
  if ((sym->isExported || sym->isPreemptible) && !sym->isLocal()) {
    ctx.partitions[sym->partition - 1].dynSymTab->addSymbol(sym);
    if (auto *file = dyn_cast<SharedFile>(sym->file))
      if (file->isNeeded && !sym->isUndefined())
        addVerneed(ctx, *sym);
  }
}

Symbol::isPreemptible, indicating whether a symbol could be bound to another component, was calculated before relocation scanning and, in LLD 19, also during Identical Code Folding (ICF). In LLD 20, the ICF-related calculation has been moved to the symbol versioning parsing stage.

Link: lld 19 ELF changes

Natural loops

Author: MaskRay
January 20, 2025 13:00

A dominator tree can be used to compute natural loops.

  • For every node H in a post-order traversal of the dominator tree (or the original CFG), find all predecessors that are dominated by H. This identifies all back edges.
  • Each back edge T->H identifies a natural loop with H as the header.
    • Perform a flood fill starting from T in the reversed CFG (following predecessor edges from T toward the header).
    • All visited nodes reachable from the root belong to the natural loop associated with the back edge. These nodes are guaranteed to be reachable from H due to the dominator property.
    • Visited nodes unreachable from the root should be ignored.
    • Loops associated with visited nodes are considered subloops.

Here is a C++ implementation:

#include <cstdio>
#include <deque>
#include <numeric>
#include <vector>
using namespace std;

vector<vector<int>> e, ee, edom;
vector<int> dfn, dfn2, rdfn, uf, best, sdom, idom;
int tick;

void dfs(int u) {
  dfn[u] = tick;
  rdfn[tick++] = u;
  for (int v : e[u])
    if (dfn[v] < 0) {
      uf[v] = u;
      dfs(v);
    }
}

int eval(int v, int cur) {
  if (dfn[v] <= cur)
    return v;
  int u = uf[v], r = eval(u, cur);
  if (dfn[best[u]] < dfn[best[v]])
    best[v] = best[u];
  return uf[v] = r;
}

void semiNca(int n, int r) {
  idom.assign(n, -1);
  dfn.assign(n, -1);
  rdfn.resize(n); // initial values are unused
  uf.resize(n); // initial values are unused
  sdom.resize(n); // initial values are unused
  tick = 0;
  dfs(r);
  best.resize(n);
  iota(best.begin(), best.end(), 0);
  for (int i = tick; --i; ) {
    int v = rdfn[i];
    sdom[v] = v;
    for (int u : ee[v])
      if (~dfn[u]) {
        eval(u, i);
        if (dfn[best[u]] < dfn[sdom[v]])
          sdom[v] = best[u];
      }
    best[v] = sdom[v];
    idom[v] = uf[v];
  }
  edom.assign(n, vector<int>());
  for (int i = 1; i < tick; i++) {
    int v = rdfn[i];
    while (dfn[idom[v]] > dfn[sdom[v]])
      idom[v] = idom[idom[v]];
    edom[idom[v]].push_back(v);
  }
}

struct Loop {
  int idx, header;
  Loop *parent = nullptr, *child = nullptr, *next = nullptr;
  vector<int> nodes;
};
deque<Loop> loops;

void postorder(int u) {
  dfn[u] = tick;
  for (int v : edom[u])
    if (dfn[v] < 0)
      postorder(v);
  rdfn[tick++] = u;
  dfn2[u] = tick;
}

void identifyLoops(int n, int r) {
  vector<int> worklist;
  vector<Loop *> to_loop(n);
  dfn.assign(n, -1);
  dfn2.assign(n, -1);
  tick = 0;
  postorder(r);
  loops.clear();
  for (int i = 0; i < tick; i++) {
    int header = rdfn[i];
    for (int u : ee[header])
      if (dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header])
        worklist.push_back(u);
    if (worklist.empty())
      continue;
    loops.push_back(Loop{(int)loops.size(), header});
    Loop *lp = &loops.back();
    while (worklist.size()) {
      int v = worklist.back();
      worklist.pop_back();
      if (!to_loop[v]) {
        if (dfn[v] < 0) // Skip unreachable node
          continue;
        // Find a node not in a loop.
        to_loop[v] = lp;
        lp->nodes.push_back(v);
        if (v == header)
          continue;
        for (int u : ee[v])
          worklist.push_back(u);
      } else {
        // Find a subloop.
        Loop *sub = to_loop[v];
        while (sub->parent)
          sub = sub->parent;
        if (sub == lp)
          continue;
        sub->parent = lp;
        sub->next = lp->child;
        lp->child = sub;
        for (int u : ee[sub->header])
          if (to_loop[u] != sub)
            worklist.push_back(u);
      }
    }
  }
}

int main() {
  int n, m;
  scanf("%d%d", &n, &m);
  e.resize(n);
  ee.resize(n);
  for (int i = 0; i < m; i++) {
    int u, v;
    scanf("%d%d", &u, &v);
    e[u].push_back(v);
    ee[v].push_back(u);
  }
  semiNca(n, 0);
  for (int i = 0; i < n; i++)
    printf("%d: %d\n", i, idom[i]);

  identifyLoops(n, 0);
  for (Loop &lp : loops) {
    printf("loop %d:", lp.idx);
    for (int v : lp.nodes)
      printf(" %d", v);
    for (Loop *c = lp.child; c; c = c->next)
      printf(" (loop %d)", c->idx);
    puts("");
  }
}

The code iterates over the dominator tree in post-order. Alternatively, a post-order traversal of the original control flow graph could be used.
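The back-edge test in identifyLoops, dfn[header] <= dfn[u] && dfn2[u] <= dfn2[header], is an interval containment check on the dominator tree. The following is a minimal standalone sketch of the same idea with a simplified numbering scheme. TreeTimes, computeTimes, and dominates are illustrative names that do not appear in the code above.

```cpp
#include <vector>

// Entry/exit timestamps from a DFS over a tree given as child lists.
struct TreeTimes {
  std::vector<int> in, out;
  int tick = 0;
};

void number(const std::vector<std::vector<int>> &children, int u,
            TreeTimes &t) {
  t.in[u] = t.tick++;
  for (int v : children[u])
    number(children, v, t);
  t.out[u] = t.tick++;
}

TreeTimes computeTimes(const std::vector<std::vector<int>> &children,
                       int root) {
  TreeTimes t;
  t.in.assign(children.size(), -1);
  t.out.assign(children.size(), -1);
  number(children, root, t);
  return t;
}

// a dominates b iff b's [in, out] interval is nested inside a's, i.e. a is
// an ancestor of b (or a == b) in the dominator tree. Nodes that were never
// visited (unreachable from the root) are treated as unrelated to everything.
bool dominates(const TreeTimes &t, int a, int b) {
  if (t.in[a] < 0 || t.in[b] < 0)
    return false;
  return t.in[a] <= t.in[b] && t.out[b] <= t.out[a];
}
```

An edge T->H is then a back edge exactly when dominates(t, H, T) holds. A self-loop qualifies because every reachable node dominates itself.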

worklist may contain duplicate elements. This is acceptable. You could also deduplicate elements.

Importantly, the header predecessor of a subloop can be another subloop.

In the final loops array, parent loops are listed after their child loops.

This example examines multiple subtle details: a self-loop (node 6), an unreachable node (node 8), and a scenario where the header predecessor of one subloop (nodes 2 and 3) leads to another subloop (nodes 4 and 5).

9 12
0 1
1 2
1 7
2 3
2 4
3 2
8 3
4 5
4 6
5 4
6 1
6 6

Use awk 'BEGIN{print "digraph G{"} NR>1{print $1"->"$2} END{print "}"}' to generate a graphviz dot file.
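If awk is not at hand, the same conversion can be sketched in C++. toDot is a hypothetical helper name. It assumes the input format used by the program above: "n m" on the first line followed by m "u v" pairs.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads "n m" followed by m "u v" pairs and returns a Graphviz digraph,
// one "u->v" line per edge, like the awk one-liner.
std::string toDot(std::istream &in) {
  int n, m;
  std::ostringstream out;
  out << "digraph G{\n";
  if (in >> n >> m) {
    int u, v;
    for (int i = 0; i < m && (in >> u >> v); i++)
      out << u << "->" << v << "\n";
  }
  out << "}\n";
  return out.str();
}
```

Calling toDot(std::cin) and writing the result to a .dot file gives an input suitable for the dot command.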

Understanding and improving Clang -ftime-report

Author: MaskRay
January 12, 2025 16:00

Clang provides a few options to generate timing reports. Among them, -ftime-report and -ftime-trace can be used to analyze the performance of Clang's internal passes.

  • -fproc-stat-report records time and memory on spawned processes (ld, and gas if -fno-integrated-as).
  • -ftime-trace, introduced in 2019, generates Clang timing information in the Chrome Trace Event format (JSON). The format supports nested events, providing a rich view of the front end.
  • -ftime-report: The option name is borrowed from GCC.

This post focuses on the traditional -ftime-report, which uses a line-based textual format.

Understanding -ftime-report output

The output consists of information about multiple timer groups. The last group spans the largest interval and encompasses timing data from other groups.

Up to Clang 19, the last group is called "Clang front-end time report". You would see something like the following.

% clang -c -w -ftime-report ~/Dev/testsuite/sqlite3.i
...
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2993 ( 71.5%) 0.1069 ( 93.5%) 0.4062 ( 76.3%) 0.4066 ( 76.2%) Code Generation Time
0.1190 ( 28.5%) 0.0074 ( 6.5%) 0.1264 ( 23.7%) 0.1270 ( 23.8%) LLVM IR Generation Time
0.4183 (100.0%) 0.1143 (100.0%) 0.5326 (100.0%) 0.5336 (100.0%) Total
...
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.7780 seconds (0.7788 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Clang front-end timer
0.6538 (100.0%) 0.1241 (100.0%) 0.7780 (100.0%) 0.7788 (100.0%) Total

The "Clang front-end timer" timer measured the time spent in clang::FrontendAction::Execute, which includes lexing, parsing, semantic analysis, LLVM IR generation, optimization, and machine code generation. However, "Code Generation Time" and "LLVM IR Generation Time" belonged to the default timer group "Miscellaneous Ungrouped Timers". This caused confusion for many users. For example, https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/ elaborates on the issues.

To address the ambiguity, I revamped the output in Clang 20.

...
===-------------------------------------------------------------------------===
Clang time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.7685 seconds (0.7686 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2798 ( 42.4%) 0.0966 ( 89.6%) 0.3765 ( 49.0%) 0.3768 ( 49.0%) Machine code generation
0.2399 ( 36.3%) 0.0045 ( 4.2%) 0.2445 ( 31.8%) 0.2442 ( 31.8%) Front end
0.1179 ( 17.8%) 0.0067 ( 6.2%) 0.1246 ( 16.2%) 0.1246 ( 16.2%) LLVM IR generation
0.0230 ( 3.5%) 0.0000 ( 0.0%) 0.0230 ( 3.0%) 0.0230 ( 3.0%) Optimizer
0.6606 (100.0%) 0.1079 (100.0%) 0.7685 (100.0%) 0.7686 (100.0%) Total

The last group has been renamed and changed to cover a longer interval within the invocation. It provides timing information for four stages:

  • Front end: Includes lexing, parsing, semantic analysis, and miscellaneous tasks not captured by the subsequent timers.
  • LLVM IR generation: The time spent in generating LLVM IR.
  • Optimizer: The time consumed by LLVM's IR optimization pipeline.
  • Machine code generation: The time taken to generate machine code or assembly from the optimized IR.

The -ftime-report output further elaborates on these stages through additional groups:

  • "Pass execution timing report" (first instance): A subset of the "Optimizer" group, providing detailed timing for individual optimization passes.
  • "Analysis execution timing report": A subset of the first "Pass execution timing report". In LLVM's new pass manager, analyses are executed as part of pass invocations.
  • "Pass execution timing report" (second instance): A subset of the "Machine code generation" group. (This group's name should be updated once the legacy pass manager is no longer used for IR optimization.)
  • "Instruction Selection and Scheduling": This group appears when SelectionDAG is utilized and is part of the "Instruction Selection" timer within the second "Pass execution timing report".

Examples:

"Pass execution timing report" (first instance)

===-------------------------------------------------------------------------===
Pass execution timing report
===-------------------------------------------------------------------------===
Total Execution Time: 3.0009 seconds (3.0016 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.9626 ( 32.7%) 0.0162 ( 26.6%) 0.9788 ( 32.6%) 0.9790 ( 32.6%) InstCombinePass
0.3203 ( 10.9%) 0.0056 ( 9.2%) 0.3259 ( 10.9%) 0.3263 ( 10.9%) InlinerPass
0.3123 ( 10.6%) 0.0068 ( 11.1%) 0.3190 ( 10.6%) 0.3187 ( 10.6%) SimplifyCFGPass
...

When -ftime-report=per-run-pass is specified, a timer is created for each pass object. This can result in significant output, especially for modules with numerous functions, as each pass will be reported multiple times.

Clang internals

As clang -### -c -ftime-report shows, clangDriver forwards -ftime-report to Clang cc1. Within cc1, this option sets the codegen flag clang::CodeGenOptions::TimePasses. This flag enables the uses of llvm::Timer objects to measure the execution time of specific code blocks.

From Clang 20 onwards, the placement of the timers can be understood through the following call tree.

cc1_main
  ExecuteCompilerInvocation // "Front end" minus the following timers
    ... all kinds of initialization
    CompilerInstance::ExecuteAction
      FrontendAction::BeginSourceFile
      FrontendAction::Execute
        FrontendAction::ExecuteAction
          ASTFrontendAction::ExecuteAction
            ParseAST
              BackendConsumer::HandleTranslationUnit
                clang::emitBackendOutput
                  EmitAssemblyHelper::emitAssembly
                    RunOptimizationPipeline // "Optimizer"
                    RunCodegenPipeline // "Machine code generation"
      FrontendAction::EndSourceFile

The measured interval does not cover the whole invocation (integrated cc1: clang -c -ftime-report a.c).

LLVM internals

llvm/lib/Support/Timer.cpp implements the timer feature. A Timer belongs to a TimerGroup. Timer::startTimer and Timer::stopTimer generate a TimeRecord. In clang/tools/driver/cc1_main.cpp, llvm::TimerGroup::printAll(llvm::errs()); dumps the TimerGroup and TimeRecord information to stderr.
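The relationship between these three concepts can be illustrated with a simplified model. This is a sketch only: the structs below reuse the names Timer, TimerGroup, and TimeRecord for readability, but they are not LLVM's classes, and record, sorted, and totalWall are hypothetical helpers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Simplified stand-ins; the real llvm::Timer/TimerGroup/TimeRecord differ.
struct TimeRecord {
  double wall; // wall time in seconds
};

struct Timer {
  std::string name;
  TimeRecord total; // accumulated over all start/stop intervals
};

struct TimerGroup {
  std::string name;
  std::vector<Timer> timers;

  // Accumulates one measured interval into the named timer, creating the
  // timer on first use.
  void record(const std::string &timerName, double wallSeconds) {
    for (Timer &t : timers)
      if (t.name == timerName) {
        t.total.wall += wallSeconds;
        return;
      }
    timers.push_back(Timer{timerName, TimeRecord{wallSeconds}});
  }

  // Returns the timers ordered by descending wall time, which is the order
  // a flat report would list them in.
  std::vector<Timer> sorted() const {
    std::vector<Timer> result = timers;
    std::stable_sort(result.begin(), result.end(),
                     [](const Timer &a, const Timer &b) {
                       return a.total.wall > b.total.wall;
                     });
    return result;
  }

  // Sum of all timers in the group (the "Total" row of a report).
  double totalWall() const {
    double sum = 0;
    for (const Timer &t : timers)
      sum += t.total.wall;
    return sum;
  }
};
```

The model is flat on purpose: a group is just a list of named totals, with no parent/child relation between timers.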

There are a few cl::opt options:

  • sort-timers (default: true): sort the timers in a group in descending wall time.
  • track-memory: record increments or decrements in malloc statistics. In glibc 2.33 and above, this utilizes mallinfo2::uordblks.
  • info-output-file: dump output to the specified file.

Examples:

clang -c -ftime-report -mllvm -sort-timers=0 a.c
clang -c -ftime-report -mllvm=-sort-timers=0 a.c

The cl::opt option -time-passes can be used with the LLVM internal tools opt and llc, e.g.

opt -S -passes='default<O2>' -time-passes < a.ll
llc -time-passes < a.ll

On Apple platforms, LLVM_SUPPORT_XCODE_SIGNPOSTS=on builds enable os_signpost for startTimer/stopTimer.

The -ftime-report system has a significant limitation: it doesn't support nested timers. Although adding more timer groups might seem like a solution, the resulting output lacks any hierarchical structure, making it difficult to understand.

2024 in review

Author: MaskRay
December 31, 2024 16:00

As usual, I mostly worked on toolchains.

Blogging

I have been busy creating posts, authoring a total of 31 blog posts (including this one). 7 posts resonated on Hacker News, garnering over 50 points. (https://news.ycombinator.com/from?site=maskray.me).

I have also revised many posts initially written between 2020 and 2024.

Mastodon: https://hachyderm.io/@meowray

GCC

I made 5 commits to the project, including the addition of the x86 inline asm constraint "Ws". You can read more about that in my earlier post Raw symbol names in inline assembly.

I believe that modernizing code review and test infrastructure will enhance the contributor experience and attract more contributors.

llvm-project

  • Reviewed numerous patches. The query is:pr created:>2024-01-01 reviewed-by:MaskRay => "989 Closed"
  • Official maintainer status on the MC layer and binary utilities
  • My involvement with LLVM 18 and 19

Key Points:

  • TODO
  • Added a script update_test_body.py to generate elaborated IR and assembly tests (#89026)
  • MC
    • Made some MC and assembler improvements in LLVM 19
    • Fixed some intrusive changes to the generic code due to AIX and z/OS.
    • Made llvm-mc better as an assembler and disassembler
  • Light ELF
    • Implemented a compact relocation format for ELF
  • AArch64 mapping symbol size optimization
  • Enabled StackSafetyAnalysis for AddressSanitizer to remove instrumentations on stack-allocated variables that are guaranteed to be safe from memory access bugs
    • Bail out if MemIntrinsic length is -1
    • Bail out when calling ifunc
  • Added the Clang cc1 option --output-asm-variant= and cleaned up internals of its friends (x86-asm-syntax).
  • llvm/ADT/Hashing.h stability

llvm/ADT/Hashing.h stability

To facilitate improvements, llvm/ADT/Hashing.h promised to be non-deterministic so that users could not depend on exact hash values. However, the values were actually deterministic unless set_fixed_execution_hash_seed was called. A lot of internal code incorrectly relied on the stability of hash_value/hash_combine/hash_combine_range. I have fixed them and landed https://github.com/llvm/llvm-project/pull/96282 to make the hash value non-deterministic in LLVM_ENABLE_ABI_BREAKING_CHECKS builds.
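The kind of dependency that breaks under a per-process seed can be shown with a toy hash. This sketch does not use llvm/ADT/Hashing.h: seededHash is a hypothetical FNV-1a style function with an explicit seed parameter, and stableOrder is a hypothetical helper showing the usual remedy.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Toy seeded hash (FNV-1a style mixing); illustrative only, not LLVM's hash.
std::uint64_t seededHash(const std::string &s, std::uint64_t seed) {
  std::uint64_t h = 14695981039346656037ULL ^ seed;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Deterministic output: order by the keys themselves, never by hash value,
// so the result does not change when the seed changes.
std::vector<std::string> stableOrder(std::vector<std::string> names) {
  std::sort(names.begin(), names.end());
  return names;
}
```

The same string hashes to different values under different seeds, so anything derived from the raw value, such as output ordered by hash, changes from run to run. Ordering by key avoids that.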

lld/ELF

lld/ELF is quite stable. I have made some maintenance changes. As usual, I wrote the ELF port's release notes for the two releases. See lld 18 ELF changes and lld 19 ELF changes for detail.

Linux kernel

Contributed 4 commits.

ccls

I finally removed support for LLVM 7, 8, and 9. The latest release https://github.com/MaskRay/ccls/releases/tag/0.20241108 has some nice features.

  • didOpen: sort index requests. When you open A/B/foo.cc, files under "A/B/" and "A/" will be prioritized during the initial indexing process, leading to a quicker response time.
  • Support for the older LLVM versions 7, 8, and 9 has been dropped.
  • LSP semantic tokens are now supported. See the usage guide https://maskray.me/blog/2024-10-20-ccls-and-lsp-semantic-tokens (including rainbow semantic highlighting).
  • textDocument/switchSourceHeader (LSP extension) is now supported.
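The didOpen prioritization can be pictured as ranking pending files by how many leading directories they share with the opened file. The sketch below is one possible way to do that and is not taken from the ccls source. sharedDirs and prioritize are hypothetical names, and paths are assumed to use '/' as the separator.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Number of complete leading directory components shared by two paths.
// Only '/' characters inside the common prefix are counted, so "A/B" and
// "A/Bc" share just "A/".
int sharedDirs(const std::string &a, const std::string &b) {
  int shared = 0;
  for (std::size_t i = 0; i < a.size() && i < b.size() && a[i] == b[i]; i++)
    if (a[i] == '/')
      shared++;
  return shared;
}

// Stable sort so that paths sharing more directories with `opened` come
// first; ties keep their original relative order.
std::vector<std::string> prioritize(std::vector<std::string> pending,
                                    const std::string &opened) {
  std::stable_sort(pending.begin(), pending.end(),
                   [&](const std::string &x, const std::string &y) {
                     return sharedDirs(x, opened) > sharedDirs(y, opened);
                   });
  return pending;
}
```

With A/B/foo.cc opened, files under A/B/ sort ahead of files directly under A/, which in turn sort ahead of unrelated files.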

Misc

Reported 12 feature requests or bugs to binutils.

  • objdump -R: dump SHT_RELR relocations?
  • gas arm aarch64: missing mapping symbols $d in the absence of alignment directives
  • gas: Extend .loc directive to emit a label
  • Compressed .strtab and .symtab
  • gas: Support \+ in .rept/.irp/.irpc directives
  • ld: Add CLASS to allow separate section matching and referring
  • gas/ld: Implicit addends for non-code sections
  • binutils: Support CREL relocation format
  • ld arm: global/weak non-hidden symbols referenced by R_ARM_FUNCDESC are unnecessarily exported
  • ld arm: fdpic link segfaults on R_ARM_GOTOFFFUNCDESC referencing a hidden symbol
  • ld arm: fdpic link may have null pointer dereference in allocate_dynrelocs_for_symbol
  • objcopy: add --prefix-symbols-remove

Reported 2 feature requests to glibc

  • Feature request: special static-pie capable of loading the interpreter from a relative path
  • rtld: Support DT_CREL relocation format

Skipping boring functions in debuggers

Author: MaskRay
December 30, 2024 16:00

In debuggers, stepping into a function with arguments that involve function calls may step into the nested function calls, even if they are simple and uninteresting, such as those found in the C++ STL.

GDB

Consider the following example:

#include <cstdio>
#include <memory>
#include <vector>
using namespace std;

void foo(int i, int j) {
  printf("%d %d\n", i, j);
}

int main() {
  auto i = make_unique<int>(3);
  vector v{1,2};
  foo(*i, v.back()); // step into
}

When GDB stops at the foo call, the step (s) command will step into std::vector::back and std::unique_ptr::operator*. While you can execute finish (fin) and then execute s again, it's time-consuming and distracting, especially when dealing with complex argument expressions.

% g++ -g a.cc -o a
% gdb ./a
...
(gdb) s
std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
1235 back() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::back (this=0x7fffffffddd0) at /usr/include/c++/14.2.1/bits/stl_vector.h:1235
0x00005555555566f8 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $1 = (__gnu_cxx::__alloc_traits<std::allocator<int>, int>::value_type &) @0x55555556c2d4: 2
(gdb) s
std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
447 __glibcxx_assert(get() != pointer());
(gdb) fin
Run till exit from #0 std::unique_ptr<int, std::default_delete<int> >::operator* (this=0x7fffffffddc0) at /usr/include/c++/14.2.1/bits/unique_ptr.h:447
0x0000555555556706 in main () at a.cc:13
13 foo(*i, v.back());
Value returned is $2 = (int &) @0x55555556c2b0: 3
(gdb) s
foo (i=3, j=2) at a.cc:7
7 printf("%d %d\n", i, j);

This problem was tracked as a feature request in 2003: https://sourceware.org/bugzilla/show_bug.cgi?id=8287. Fortunately, GDB provides the skip command to skip functions that match a regex or filenames that match a glob (GDB 7.12 feature). You can skip all demangled function names that start with std::.

skip -rfu ^std::

Alternatively, you can execute skip -gfi /usr/include/c++/*/bits/* to skip these libstdc++ files.
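To see which demangled names an anchored pattern such as ^std:: would cover, a quick check with std::regex can help. This is only an approximation: GDB uses its own regex engine, whose syntax may differ from the ECMAScript flavor of std::regex for more complex patterns. wouldSkip is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// Returns true if the demangled function name matches the pattern anywhere;
// a leading ^ anchors the match to the start of the name.
bool wouldSkip(const std::string &demangledName,
               const std::string &pattern = "^std::") {
  return std::regex_search(demangledName, std::regex(pattern));
}
```

Here std::vector<int, std::allocator<int> >::back matches, while foo and a name that merely contains std:: in the middle, such as mylib::std::helper, do not.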

Important note:

The skip command's file matching uses the fnmatch function with the FNM_FILE_NAME flag. This means the wildcard character (*) won't match slashes. So, skip -gfi /usr/* won't exclude /usr/include/c++/14.2.1/bits/stl_vector.h.
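
To see the slash behavior outside GDB, here is a minimal sketch that calls fnmatch directly. The helper name globMatches is made up for illustration. FNM_PATHNAME is the POSIX spelling of the flag, and glibc defines FNM_FILE_NAME as a synonym. The sketch only demonstrates the matching rule; it is not GDB's code.

```cpp
#include <fnmatch.h>

// Returns true if `path` matches the glob `pattern`.
// When `slashesAreSpecial` is set, FNM_PATHNAME is passed, so `*` and `?`
// cannot match '/'. (glibc's FNM_FILE_NAME is another name for this flag.)
// globMatches is a hypothetical helper, not part of GDB.
bool globMatches(const char *pattern, const char *path,
                 bool slashesAreSpecial) {
  int flags = slashesAreSpecial ? FNM_PATHNAME : 0;
  return fnmatch(pattern, path, flags) == 0;
}
```

With the flag set, "/usr/*" does not match /usr/include/c++/14.2.1/bits/stl_vector.h, while "/usr/include/c++/*/bits/*" does. Without the flag, "/usr/*" matches as well.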

I proposed to drop the FNM_FILE_NAME flag. With GDB 17, I will be able to skip a project directory with

skip -gfi */include/llvm/ADT/*

instead of

skip -gfi /home/ray/llvm/llvm/include/llvm/ADT/*

User functions called by skipped functions

When a function (let's call it "A") is skipped during debugging, any user-defined functions that are called by "A" will also be skipped.

For example, consider the following code snippet:

std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), predicate)) {
}

If std::all_of is skipped due to a skip command, predicate called within std::all_of will also be skipped when you execute s at the if statement.
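
A complete file for trying this out might look like the following sketch. The predicate body and the allPositive wrapper are assumptions added for illustration. Since skip only affects stepping, an explicit breakpoint (break predicate) should still stop inside the callback.

```cpp
#include <algorithm>
#include <vector>

// User-defined callback (body invented for this sketch). With
// `skip -rfu ^std::` active, `s` at the call site below does not stop here,
// because the direct caller, std::all_of, is skipped.
bool predicate(int x) { return x > 0; }

// Hypothetical wrapper around the snippet from the text.
bool allPositive(const std::vector<int> &a) {
  return std::all_of(a.begin(), a.end(), predicate); // execute `s` here
}
```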

LLDB

By default, LLDB avoids stepping into functions whose names start with std:: when you use the s (step, thread step-in) command. This behavior is controlled by a setting:

(lldb) settings show target.process.thread.step-avoid-regexp
target.process.thread.step-avoid-regexp (regex) = ^std::
(lldb) set sh target.process.thread.step-avoid-libraries
target.process.thread.step-avoid-libraries (file-list) =

target.process.thread.step-avoid-libraries can be used to skip functions defined in a library.

While the command settings set is long, you can shorten it to set set.

Visual Studio

Visual Studio provides a debugging feature, Just My Code, that automatically steps over calls to system, framework, and other non-user code.

It also supports a Step Into Specific command, which seems interesting.

The implementation inserts a call to __CheckForDebuggerJustMyCode at the start of every user function. The function (void __CheckForDebuggerJustMyCode(const char *flag)) takes a global variable defined in the .msvcjmc section and determines whether the debugger should stop.

This LLDB feature request has a nice description: https://github.com/llvm/llvm-project/issues/61152.

For the all_of example, the feature can possibly allow the debugger to stop at test.

std::vector<int> a{1, 2};
if (std::all_of(a.begin(), a.end(), test)) {
}
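
The idea can be modeled in plain C++. The sketch below is a toy simulation, not MSVC's implementation: every name in it (jmc_sketch, debuggerIsStepping, stopsRequested, flagForThisFile, checkForDebugger, stopsDuringStepInto) is invented, and a counter stands in for "the debugger regains control". It shows why instrumenting only user functions lets a step-into pass through std::all_of and still land in test.

```cpp
#include <algorithm>
#include <vector>

namespace jmc_sketch {
// Hypothetical stand-ins; none of these are real MSVC symbols.
bool debuggerIsStepping = false; // a debugger would set this during a step
int stopsRequested = 0;          // models "debugger regains control"
char flagForThisFile = 1;        // models the per-file flag in .msvcjmc

// Models __CheckForDebuggerJustMyCode(flag): report a stop only when the
// file is marked as user code and a step is in progress.
void checkForDebugger(const char *flag) {
  if (*flag != 0 && debuggerIsStepping)
    ++stopsRequested;
}

// A user function as an instrumenting compiler would emit it.
bool test(int x) {
  checkForDebugger(&flagForThisFile); // inserted at function entry
  return x > 0;
}

// Library code (std::all_of) carries no check, so only the calls to
// test() are counted.
int stopsDuringStepInto() {
  debuggerIsStepping = true;
  stopsRequested = 0;
  std::vector<int> a{1, 2};
  if (std::all_of(a.begin(), a.end(), test)) {
  }
  debuggerIsStepping = false;
  return stopsRequested;
}
} // namespace jmc_sketch
```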

Fuchsia zxdb

The Fuchsia debugger "zxdb" provides a command "ss" similar to Visual Studio's "Step Into Specific".

[zxdb] ss
1 std::string::string
2 MyClass::MyClass
3 HelperFunctionCall
4 MyClass::~MyClass
5 std::string::~string
quit
>

Exporting Tweets

Author: MaskRay
December 25, 2024, 16:00

On https://x.com/settings/, click More -> Settings and privacy -> Download an archive of your data. Wait for a message from x.com: "@XXX your X data is ready". Download the archive.

cp data/tweets.js tweets.ts

Change the first line from window.YTD.tweets.part0 = [ to let part0 = [, and append

import { unescape } from "@std/html/entities";

// Pair each tweet with its timestamp and sort chronologically.
let out = part0.map(tw => [new Date(tw.tweet.created_at), tw.tweet.full_text])
out.sort((a,b) => a[0] - b[0])

let yy0 = 0, mm0 = 0, str = ''
// Run one extra iteration with a sentinel date so the last year is flushed.
for (let i=0; i<=out.length; i++) {
  let d = i<out.length ? out[i][0] : new Date('9999-12-31')
  let yy = d.getFullYear(), mm = d.getMonth()+1
  if (yy0 != yy) {
    // The year changed: write the finished year to <year>/index.md.
    if (str.length) {
      Deno.mkdirSync(String(yy0), { recursive: true })
      Deno.writeTextFileSync(`${yy0}/index.md`, str)
    }
    yy0 = yy
    mm0 = 0
    str = `# ${yy0}\n`
    if (i == out.length) break
  }
  if (mm0 != mm) {
    str += `\n## ${yy}-${String(mm).padStart(2,'0')}\n`
    mm0 = mm
  }
  // Unescape HTML entities and wrap URLs in <> so Markdown autolinks them.
  str += `\n${unescape(out[i][1]).replace(/(https?:[-/.\w]+)/g, "<$1>")}\n`
}

Then run deno run --allow-write=. tweets.ts

% cat 2022/index.md
# 2022

## 2022-01

tweet0

tweet1

## 2022-02

...

tweet0

tweet1

Simplifying disassembly with LLVM tools

Author: MaskRay
December 22, 2024, 16:00

Both compiler developers and security researchers have built disassemblers. They often prioritize different aspects. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.

For quick disassembly tasks, rizin provides a convenient command-line interface.

% rz-asm -a x86 -b 64 -d 4829c390
sub rbx, rax
nop

-a x86 can be omitted.

llvm-mc

Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.

However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires explicitly prefixing hexadecimal byte values with 0x:

% echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
.text
sub rbx, rax
nop

Let's break down the options used in this command:

  • --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
  • --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
  • --cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.

I have contributed patches to remove .text and allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:

% echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
sub rbx, rax
nop

You can further simplify this by creating a bash/zsh function. bash and zsh's "here string" feature provides a clean way to specify stdin.

disasm() {
  llvm-mc --cdis --hex --output-asm-variant=1 <<< "$*"
}

% disasm 4829c390
sub rbx, rax
nop
% disasm $'4829 c3\n# comment\n90'
sub rbx, rax
nop

The --hex option conveniently ignores whitespace and #-style comments within the input.
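
To make the accepted input shape concrete, here is a small stand-alone sketch that turns such text into bytes. It is not llvm-mc's parser: parseHexBytes and hexValue are hypothetical names, the error handling is an assumption, and it does not handle the [] grouping.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if `c` is not a hex digit.
static int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Parses pairs of hex digits into bytes, skipping whitespace and '#'
// comments that run to the end of the line. Returns nullopt on a stray
// character or an odd number of digits (a choice made for this sketch).
std::optional<std::vector<std::uint8_t>>
parseHexBytes(const std::string &text) {
  std::vector<std::uint8_t> bytes;
  int pending = -1; // high nibble waiting for its partner, or -1
  for (std::size_t i = 0; i < text.size(); ++i) {
    char c = text[i];
    if (c == '#') { // skip the rest of the line
      while (i < text.size() && text[i] != '\n') ++i;
      continue;
    }
    if (std::isspace(static_cast<unsigned char>(c))) continue;
    int v = hexValue(c);
    if (v < 0) return std::nullopt;
    if (pending < 0) {
      pending = v;
    } else {
      bytes.push_back(static_cast<std::uint8_t>(pending * 16 + v));
      pending = -1;
    }
  }
  if (pending >= 0) return std::nullopt; // odd number of digits
  return bytes;
}
```

For the input used above, "4829 c3\n# comment\n90", the sketch yields the four bytes 48 29 c3 90.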

Atomic blocks

llvm-mc handles decoding failures by skipping a number of bytes, as determined by the target-specific llvm::MCDisassembler::getInstruction. To treat a sequence of bytes as a single unit during disassembly, enclose them within [].

% echo 'f995ab99f995 ab99' | llvm-mc --triple=riscv64 --cdis --hex
<stdin>:1:1: warning: invalid instruction encoding
f995ab99f995 ab99
^
<stdin>:1:5: warning: invalid instruction encoding
f995ab99f995 ab99
    ^
<stdin>:1:14: warning: invalid instruction encoding
f995ab99f995 ab99
             ^
<stdin>:1:16: warning: invalid instruction encoding
f995ab99f995 ab99
               ^
% echo '[f995ab99][f995 ab99]' | llvm-mc --triple=riscv64 --cdis --hex
<stdin>:1:2: warning: invalid instruction encoding
[f995ab99][f995 ab99]
 ^
<stdin>:1:12: warning: invalid instruction encoding
[f995ab99][f995 ab99]
           ^

llvm-mc can also function as an assembler:

% echo 'li t3, 42' | llvm-mc -show-encoding --triple=riscv64
li t3, 42 # encoding: [0x13,0x0e,0xa0,0x02]

(I've contributed a change to LLVM 20 that removes the previously printed .text directive.)

llvm-objdump

For address information, llvm-mc falls short. We need to turn to llvm-objdump to get that detail. Here is a little fish script that takes raw hex bytes as input, converts them to a binary format (xxd -r -p), and then creates an ELF relocatable file (llvm-objcopy -I binary) targeting the x86-64 architecture. Finally, llvm-objdump with the -D flag disassembles the data section (.data) containing the converted binary.

#!/usr/bin/env fish
llvm-objdump -D -j .data (echo $argv | xxd -r -p | llvm-objcopy -I binary -O elf64-x86-64 - - | psub) | sed '1,/<_binary__stdin__start>:/d'

Here is a more feature-rich script that supports multiple architectures:

#!/usr/bin/env fish
argparse a/arch= att r -- $argv; or return 1
if test -z "$_flag_arch"; set _flag_arch x86_64; end
set opt --triple=$_flag_arch
if test -z "$_flag_att" && string match -rq 'i.86|x86_64' $_flag_arch; set -a opt -M intel; end
if test -n "$_flag_r"; set -a opt --no-leading-addr; set -a opt --no-show-raw-insn; end

switch $_flag_arch
case arm; set bfdname elf32-littlearm
case aarch64; set bfdname elf64-littleaarch64
case ppc32; set bfdname elf32-powerpc
case ppc32le; set bfdname elf32-powerpcle
case ppc64; set bfdname elf64-powerpc
case ppc64le; set bfdname elf64-powerpcle
case riscv32; set bfdname elf32-littleriscv
case riscv64; set bfdname elf64-littleriscv
case 'i?86'; set bfdname elf32-i386
case x86_64; set bfdname elf64-x86-64
case '*'; echo unknown arch >&2; return 1
end
llvm-objdump -D -j .data $opt (echo $argv | xxd -r -p | llvm-objcopy -I binary -O $bfdname - - | psub) | sed '1,/<_binary__stdin__start>:/d'

% ./disasm e8 00000000c3 e800000000 c3
0: e8 00 00 00 00 call 0x5 <_binary__stdin__start+0x5>
5: c3 ret
6: e8 00 00 00 00 call 0xb <_binary__stdin__start+0xb>
b: c3 ret
% ./disasm -r e8 00000000c3 e800000000 c3
call 0x5 <_binary__stdin__start+0x5>
ret
call 0xb <_binary__stdin__start+0xb>
ret
% ./disasm -a riscv64 1300 0000
0: 00000013 nop

Summary

  • Assembler: llvm-mc --show-encoding
  • Disassembler: llvm-mc --cdis --hex
  • Disassembler with address information: xxd -r -p, llvm-objcopy, and llvm-objdump -D -j .data

Simplifying disassembly with llvm-mc

Author: MaskRay
December 22, 2024, 16:00

Both compiler developers and security researchers have built disassemblers. They often prioritize different aspects. Compiler toolchains, benefiting from direct contributions from CPU vendors, tend to offer more accurate and robust decoding. Security-focused tools, on the other hand, often excel in user interface design.

For quick disassembly tasks, rizin provides a convenient command-line interface.

% rz-asm -a x86 -b 64 -d 4829c390
sub rbx, rax
nop

-a x86 can be omitted.

Within the LLVM ecosystem, llvm-objdump serves as a drop-in replacement for the traditional GNU objdump, leveraging instruction information from LLVM's TableGen files (llvm/lib/Target/*/*.td). Another LLVM tool, llvm-mc, was originally designed for internal testing of the Machine Code (MC) layer, particularly the assembler and disassembler components. There are numerous RUN: llvm-mc ... tests within llvm/test/MC. Despite its internal origins, llvm-mc is often distributed as part of the LLVM toolset, making it accessible to users.

However, using llvm-mc for simple disassembly tasks can be cumbersome. It requires explicitly prefixing hexadecimal byte values with 0x:

% echo 0x48 0x29 0xc3 0x90 | llvm-mc --triple=x86_64 --cdis --output-asm-variant=1
.text
sub rbx, rax
nop

Let's break down the options used in this command:

  • --triple=x86_64: This specifies the target architecture. If your LLVM build's default target triple is already x86_64-*-*, this option can be omitted.
  • --output-asm-variant=1: LLVM, like GCC, defaults to AT&T syntax for x86 assembly. This option switches to the Intel syntax. See lhmouse/mcfgthread/wiki/Intel-syntax if you prefer the Intel syntax in compiler toolchains.
  • --cdis: Introduced in LLVM 18, this option enables colored disassembly. In older LLVM versions, you have to use --disassemble.

I have contributed patches to remove .text and allow disassembling raw bytes without the 0x prefix. You can now use the --hex option:

% echo 4829c390 | llvm-mc --cdis --hex --output-asm-variant=1
sub rbx, rax
nop

You can further simplify this by creating a shell alias:

alias disasm="llvm-mc --cdis --hex --output-asm-variant=1"

bash and zsh's "here string" feature provides a clean way to specify stdin.

% disasm <<< 4829c390
sub rbx, rax
nop
% disasm <<< $'4829 c3\n# comment\n90'
sub rbx, rax
nop

The --hex option conveniently ignores whitespace and #-style comments within the input.


clang-format and single-line statements

Author: MaskRay
December 1, 2024, 16:00

The Google C++ Style is widely adopted by projects. It contains a brace omission guideline in Looping and branching statements:

For historical reasons, we allow one exception to the above rules: the curly braces for the controlled statement or the line breaks inside the curly braces may be omitted if as a result the entire statement appears on either a single line (in which case there is a space between the closing parenthesis and the controlled statement) or on two lines (in which case there is a line break after the closing parenthesis and there are no braces).

// OK - fits on one line.
if (x == kFoo) { return new Foo(); }

// OK - braces are optional in this case.
if (x == kFoo) return new Foo();

// OK - condition fits on one line, body fits on another.
if (x == kBar)
  Bar(arg1, arg2, arg3);

In clang-format's predefined Google style for C++, there are two related style options:

% clang-format --dump-config --style=Google | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: WithoutElse
AllowShortLoopsOnASingleLine: true

The two options cause clang-format to aggressively join lines for the following code:

for (int x : a)
  foo(x);

while (cond())
  foo(x);

if (x)
  foo(x);

As a heavy debugger user, I find this behavior cumbersome.

// clang-format --style=Google
#include <vector>
void foo(int v) {}
int main() {
  std::vector<int> a{1, 2, 3};
  for (int x : a) foo(x); // breakpoint
}

When GDB stops at the for loop, how can I step into the loop body? Unfortunately, it's not simple.

If I run step, GDB will dive into the implementation detail of the range-based for loop. It will stop at the std::vector::begin function. Stepping out and executing step again will stop at the std::vector::end function. Stepping out and executing step another time will stop at the operator!= function of the iterator type. Here is an interaction example with GDB:

(gdb) n
5 for (int x : a) foo(v);
(gdb) s
std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873
873 begin() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::begin (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:873
0x00005555555561d5 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $1 = 1
(gdb) s
std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893
893 end() _GLIBCXX_NOEXCEPT
(gdb) fin
Run till exit from #0 std::vector<int, std::allocator<int> >::end (this=0x7fffffffdcc0) at /usr/include/c++/14.2.1/bits/stl_vector.h:893
0x00005555555561e5 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $2 = 0
(gdb) s
__gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235
1235 { return __lhs.base() != __rhs.base(); }
(gdb) fin
Run till exit from #0 __gnu_cxx::operator!=<int*, std::vector<int, std::allocator<int> > > (__lhs=1, __rhs=0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1235
0x0000555555556225 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $3 = true
(gdb) s
__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091
1091 { return *_M_current; }
(gdb) fin
Run till exit from #0 __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator* (this=0x7fffffffdca0) at /usr/include/c++/14.2.1/bits/stl_iterator.h:1091
0x00005555555561f7 in main () at a.cc:5
5 for (int x : a) foo(v);
Value returned is $4 = (int &) @0x55555556b2b0: 1

You can see that this can significantly hinder the debugging process, as it forces the user to delve into uninteresting function calls of the range-based for loop.

In contrast, when the loop body is on the next line, we can just run next to skip the three uninteresting function calls:

for (int x : a) // next
  foo(x); // step

The AllowShortIfStatementsOnASingleLine style option is similar. While convenient for simple scenarios, it can sometimes hinder debuggability.

For the following code, it's not easy to skip the c() and d() function calls if you just want to step into foo(v).

if (c() && d()) foo(v);

Many developers, mindful of potential goto fail-like issues, often opt to include braces in their code. clang-format's default style can further reinforce this practice.

// clang-format does not join lines.
if (v) {
  foo(v);
}
for (int x : a) {
  foo(x);
}

Other predefined styles

clang-format's Chromium style is a variant of the Google style and does not have the aforementioned problem. The LLVM style, and many styles derived from it, do not have the problem either.

% clang-format --dump-config --style=Chromium | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
% clang-format --dump-config --style=LLVM | grep -E 'AllowShort(If|Loop)'
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
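
For a project that wants to stay on the Google style but keep loop and if bodies on their own lines, one possible .clang-format is sketched below. It simply overrides the two options with the values that the Chromium and LLVM styles report above; whether that suits a given code base is a judgment call.

```yaml
# .clang-format (a sketch: Google style with the two joining options disabled)
BasedOnStyle: Google
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
```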

A comparative look at other languages

Go, Odin, and Rust require {} for if statements but omit (), striking a balance between clarity and conciseness. C/C++'s required () makes opt-in braces feel a bit verbose.

C3 and Jai, similar to C++, make {} optional.

Removing global state from LLD

Author: MaskRay
November 17, 2024, 16:00

LLD, the LLVM linker, is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). Designed as a standalone program, the code base relies heavily on global state, making it less than ideal for library integration. As outlined in RFC: Revisiting LLD-as-a-library design, two main hurdles exist:

  • Fatal errors: they exit the process without returning control to the caller. This was actually addressed for most scenarios in 2020 by utilizing llvm::sys::Process::Exit(val, /*NoCleanup=*/true) and CrashRecoveryContext (longjmp under the hood).
  • Global variable conflicts: shared global variables do not allow two concurrent invocations.

I understand that calling a linker API could be convenient, especially when you want to avoid shipping another executable (which can be large when you link against LLVM statically). However, I believe that invoking LLD as a separate process remains the recommended approach. There are several advantages:

  • Build system control: Build systems gain greater control over scheduling and resource allocation for LLD. In an edit-compile-link cycle, the link could need more resources and threading is more useful.
  • Better parallelism management
  • Global state isolation: LLVM's global state (primarily cl::opt and ManagedStatic) is isolated.

While spawning a new process offers build system benefits, the issue of global state usage within LLD remains a concern. This is a factor to consider, especially for advanced use cases. Here are global variables in the LLD 15 code base.

% rg '^extern [^(]* \w+;' lld/ELF
lld/ELF/SyntheticSections.h
1290:extern InStruct in;

lld/ELF/Symbols.h
51:extern SmallVector<SymbolAux, 0> symAux;

lld/ELF/SymbolTable.h
87:extern std::unique_ptr<SymbolTable> symtab;

lld/ELF/InputSection.h
33:extern std::vector<Partition> partitions;
403:extern SmallVector<InputSectionBase *, 0> inputSections;
408:extern llvm::DenseSet<std::pair<const Symbol *, uint64_t>> ppc64noTocRelax;

lld/ELF/OutputSections.h
156:extern llvm::SmallVector<OutputSection *, 0> outputSections;

lld/ELF/InputFiles.h
43:extern std::unique_ptr<llvm::TarWriter> tar;

lld/ELF/Driver.h
23:extern std::unique_ptr<class LinkerDriver> driver;

lld/ELF/LinkerScript.h
366:extern std::unique_ptr<LinkerScript> script;

lld/ELF/Config.h
372:extern std::unique_ptr<Configuration> config;
406:extern std::unique_ptr<Ctx> ctx;

Some global states exist as static member variables.

Cleaning up global variables

LLD has been undergoing a transformation to reduce its reliance on global variables. This improves its suitability for library integration.

  • In 2020, [LLD][COFF] Cover usage of LLD as a library enabled running the LLD driver multiple times even if there is a fatal error.
  • In 2021, global variables were removed from lld/Common.
  • The COFF port followed suit, eliminating most of its global variables.

Inspired by these advancements, I conceived a plan to eliminate global variables from the ELF port. In 2022, as part of the work to enable parallel section initialization, I introduced struct Ctx in lld/ELF/Config.h. Here is my plan:

  • Global variables will be migrated into Ctx.
  • Functions will be modified to accept a new Ctx &ctx parameter.
  • The previously global variable lld::elf::ctx will be transformed into a local variable within lld::elf::link.
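
The shape of this plan can be shown with a toy that has nothing to do with LLD's real data structures. In the sketch below, the namespace ctx_sketch, the two fields, and both functions are hypothetical; the point is only that state lives in a struct, helpers receive it by reference, and the entry point owns it as a local.

```cpp
#include <string>
#include <vector>

namespace ctx_sketch {
// Per-link state that would otherwise live in global variables.
// The fields are invented for this sketch.
struct Ctx {
  std::vector<std::string> outputSections;
  unsigned errorCount = 0;
};

// Helpers receive the context explicitly instead of reaching for a global.
void addOutputSection(Ctx &ctx, const std::string &name) {
  if (name.empty()) {
    ++ctx.errorCount;
    return;
  }
  ctx.outputSections.push_back(name);
}

// Entry point: the context is a local variable, so every call starts from
// a freshly constructed state and no reset step is needed between calls.
Ctx link(const std::vector<std::string> &sectionNames) {
  Ctx ctx;
  for (const std::string &name : sectionNames)
    addOutputSection(ctx, name);
  return ctx;
}
} // namespace ctx_sketch
```

Calling ctx_sketch::link twice yields two independent results; an error recorded during one call does not leak into the other.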

Encapsulating global variables into Ctx

Over the past two and a half years, I have migrated global variables into the Ctx class, e.g.

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 590c19e6d88d..915c4d94e870 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -382,2 +382,10 @@ struct Ctx {
std::atomic<bool> hasSympart{false};
+ // A tuple of (reference, extractedFile, sym). Used by --why-extract=.
+ SmallVector<std::tuple<std::string, const InputFile *, const Symbol &>, 0>
+ whyExtractRecords;
+ // A mapping from a symbol to an InputFile referencing it backward. Used by
+ // --warn-backrefs.
+ llvm::DenseMap<const Symbol *,
+ std::pair<const InputFile *, const InputFile *>>
+ backwardReferences;
};
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8315d43c776e..2ab698c91b01 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1776,3 +1776,3 @@ static void handleUndefined(Symbol *sym, const char *option) {
if (!config->whyExtract.empty())
- driver->whyExtract.emplace_back(option, sym->file, *sym);
+ ctx->whyExtractRecords.emplace_back(option, sym->file, *sym);
}
@@ -1812,3 +1812,3 @@ static void handleLibcall(StringRef name) {

-void LinkerDriver::writeArchiveStats() const {
+static void writeArchiveStats() {
if (config->printArchiveStats.empty())
@@ -1834,3 +1834,3 @@ void LinkerDriver::writeArchiveStats() const {
++extracted[CachedHashStringRef(file->archiveName)];
- for (std::pair<StringRef, unsigned> f : archiveFiles) {
+ for (std::pair<StringRef, unsigned> f : driver->archiveFiles) {
unsigned &v = extracted[CachedHashString(f.first)];

I did not touch the global variables in 2023. The work resumed in July 2024. I moved TarWriter, SymbolAux, Out, ElfSym, outputSections, etc. into Ctx.

struct Ctx {
  Config arg;
  LinkerDriver driver;
  LinkerScript *script;
  std::unique_ptr<TargetInfo> target;
  ...
};

The config variable, used to store command-line options, was pervasive throughout lld/ELF. To enhance code clarity and maintainability, I renamed it to ctx.arg (mold naming).

I've removed other instances of static storage variables throughout lld/ELF, e.g.

  • static member LinkerDriver::nextGroupId
  • static member SharedFile::vernauxNum
  • sectionMap in lld/ELF/Arch/ARM.cpp

Passing Ctx &ctx as parameters

The subsequent phase involved adding Ctx &ctx as a parameter to numerous functions and classes, gradually eliminating references to the global ctx.

I incorporated Ctx &ctx as a member variable to a few classes (e.g. SyntheticSection, OutputSection) to minimize the modifications to member functions. This approach was not suitable for Symbol and InputSection, since even a single word could increase memory consumption significantly.

// Writer.cpp
template <class ELFT> class Writer {
public:
  LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)

  Writer(Ctx &ctx) : ctx(ctx), buffer(ctx.e.outputBuffer) {}
...

template <class ELFT> void elf::writeResult(Ctx &ctx) {
  Writer<ELFT>(ctx).run();
}
...

bool elf::includeInSymtab(Ctx &ctx, const Symbol &b) {
  if (auto *d = dyn_cast<Defined>(&b)) {
    // Always include absolute symbols.
    SectionBase *sec = d->section;
    if (!sec)
      return true;
    assert(sec->isLive());

    if (auto *s = dyn_cast<MergeInputSection>(sec))
      return s->getSectionPiece(d->value).live;
    return true;
  }
  return b.used || !ctx.arg.gcSections;
}

Eliminating the global ctx variable

Once the global ctx variable's reference count reached zero, it was time to remove it entirely. I implemented the change on November 16, 2024.

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 72feeb9d49cb..a9b7a98e5b54 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -539,4 +539,2 @@ struct InStruct {
std::unique_ptr<SymtabShndxSection> symTabShndx;
-
- void reset();
};
@@ -664,3 +662,2 @@ struct Ctx {
Ctx();
- void reset();

@@ -671,4 +668,2 @@ struct Ctx {

-LLVM_LIBRARY_VISIBILITY extern Ctx ctx;
-
// The first two elements of versionDefinitions represent VER_NDX_LOCAL and
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 334dfc0e3ba1..631051c27381 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -81,4 +81,2 @@ using namespace lld::elf;

-Ctx elf::ctx;
-
static void setConfigs(Ctx &ctx, opt::InputArgList &args);
@@ -165,2 +114,3 @@ bool link(ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
llvm::raw_ostream &stderrOS, bool exitEarly, bool disableOutput) {
+ Ctx ctx;
// This driver-specific context will be freed later by unsafeLldMain().
@@ -169,7 +119,2 @@ bool link(ArrayRef<const char *> args, llvm::raw_ostream &stdoutOS,
context->e.initialize(stdoutOS, stderrOS, exitEarly, disableOutput);
- context->e.cleanupCallback = []() {
- Ctx &ctx = elf::ctx;
- ctx.reset();
- ctx.partitions.emplace_back(ctx);
- };
context->e.logName = args::getFilenameWithoutExe(args[0]);

Previously, cleanupCallback was essential for resetting the global ctx when lld::elf::link was invoked multiple times. With the removal of the global variable, this callback is no longer necessary. We can now rely on the constructor to initialize Ctx and avoid the need for a reset function.

Removing global state from lld/Common

While significant progress has been made to lld/ELF, lld/Common needs a lot of work as well. A lot of shared utility code (diagnostics, bump allocator) utilizes the global lld::context().

/// Returns the default error handler.
ErrorHandler &errorHandler();

void error(const Twine &msg);
void error(const Twine &msg, ErrorTag tag, ArrayRef<StringRef> args);
[[noreturn]] void fatal(const Twine &msg);
void log(const Twine &msg);
void message(const Twine &msg, llvm::raw_ostream &s = outs());
void warn(const Twine &msg);
uint64_t errorCount();

Although thread-local variables are an option, worker threads spawned by llvm/lib/Support/Parallel.cpp don't inherit their values from the main thread. Given our direct access to Ctx &ctx, we can leverage context-aware APIs as replacements.
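
The thread-local limitation is easy to reproduce in isolation. The sketch below uses a plain std::thread as a stand-in for a worker thread (it does not use LLVM's parallel library), and the names currentContextId and valueSeenByNewThread are invented for illustration.

```cpp
#include <thread>

// Per-thread slot standing in for a thread-local "current context".
// Each thread gets its own copy, initialized to 0.
thread_local int currentContextId = 0;

// Sets the slot on the calling thread, then reports what a newly started
// thread observes in its own copy of the variable.
int valueSeenByNewThread(int idForCallingThread) {
  currentContextId = idForCallingThread;
  int seen = -1;
  std::thread worker([&seen] { seen = currentContextId; });
  worker.join();
  return seen;
}
```

The new thread sees the initial value 0 rather than the value assigned by the caller, which is why passing Ctx &ctx explicitly is the more dependable route.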

https://github.com/llvm/llvm-project/pull/112319 introduced context-aware diagnostic utilities:

  • log("xxx") => Log(ctx) << "xxx"
  • message("xxx") => Msg(ctx) << "xxx"
  • warn("xxx") => Warn(ctx) << "xxx"
  • errorOrWarn(toString(f) + "xxx") => Err(ctx) << f << "xxx"
  • error(toString(f) + "xxx") => ErrAlways(ctx) << f << "xxx"
  • fatal("xxx") => Fatal(ctx) << "xxx"
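
One common way to get this call shape is a temporary object that collects pieces through operator<< and reports the finished line from its destructor. The sketch below illustrates that pattern only; it is not LLD's implementation, and the names in namespace diag_sketch (Ctx, Diag, and the toy Warn and Err) are stand-ins invented here.

```cpp
#include <sstream>
#include <string>
#include <vector>

namespace diag_sketch {
// Hypothetical per-link context that owns the diagnostics it receives.
struct Ctx {
  std::vector<std::string> diagnostics;
};

// Collects message pieces via operator<< and hands the finished line to the
// context when the temporary is destroyed at the end of the statement.
class Diag {
public:
  Diag(Ctx &ctx, const char *prefix) : ctx(ctx) { os << prefix; }
  Diag(const Diag &) = delete;
  Diag &operator=(const Diag &) = delete;
  ~Diag() { ctx.diagnostics.push_back(os.str()); }

  template <typename T> Diag &operator<<(const T &value) {
    os << value;
    return *this;
  }

private:
  Ctx &ctx;
  std::ostringstream os;
};

// Toy counterparts of the Warn(ctx) / Err(ctx) call shape.
struct Warn : Diag {
  explicit Warn(Ctx &ctx) : Diag(ctx, "warning: ") {}
};
struct Err : Diag {
  explicit Err(Ctx &ctx) : Diag(ctx, "error: ") {}
};
} // namespace diag_sketch
```

A statement such as diag_sketch::Warn(ctx) << "section " << 3; appends a single line to ctx.diagnostics, and no string concatenation happens at the call site.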

As of Nov 16, 2024, I have eliminated log/warn/error/fatal from lld/ELF.

The underlying functions lld::ErrorHandler::fatal, and lld::ErrorHandler::error when the error limit is hit and exitEarly is true, call exitLld(1).

This transformation eliminates a lot of code size overhead due to llvm::Twine. Even in the simplest Twine(123) case, the generated code needs a stack object to hold the value and a Twine kind.

lld::make from lld/include/lld/Common/Memory.h is an allocation function that uses the global context. When the ownership is clear, std::make_unique might be a better choice.

Guideline:

  • Avoid lld::saver
  • Avoid void message(const Twine &msg, llvm::raw_ostream &s = outs());, which utilizes lld::outs()
  • Avoid lld::make from lld/include/lld/Common/Memory.h
  • Avoid fatal error in a half-initialized object, e.g. fatal error in a base class constructor (ELFFileBase::init) ([LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures)

Global state in LLVM

LTO link jobs utilize LLVM. Understanding its global state is crucial.

While LLVM allows for multiple LLVMContext instances to be allocated and used concurrently, it's important to note that these instances share certain global states, such as cl::opt and ManagedStatic. Specifically, it's not possible to run two concurrent LLVM compilations (including LTO link jobs) with distinct sets of cl::opt option values. To link with distinct cl::opt values, even after removing LLD's global state, you'll need to spawn a new LLD process.

Any proposal that moves away from global state seems to complicate cl::opt usage, making it impractical.

LLD also utilizes functions from llvm/Support/Parallel.h for parallelism. These functions rely on global state like getDefaultExecutor and llvm::parallel::strategy. Ongoing work by Alexandre Ganea aims to make these functions context-aware. (It was nice to meet you in person at the LLVM Developers' Meeting last month.)

Supported library usage scenarios

You can repeatedly call lld::lldMain from lld/Common/Driver.h. If fatal has been invoked, it will not be safe to call lld::lldMain again in certain rare scenarios. Running lld::lldMain concurrently in two threads is not supported.

The command LLD_IN_TEST=3 lld-link ... runs the link process three times, but only the final invocation outputs diagnostics to stdout/stderr. lld/test/lit.cfg.py has configured the COFF port to run tests twice ([lld] Add test suite mode for running LLD main twice). Other ports need work to make this mode work.

Keeping pace with LLVM: compatibility strategies

Author: MaskRay
November 10, 2024, 16:00

LLVM's C++ API doesn't offer a stability guarantee. This means function signatures can change or be removed between versions, forcing projects to adapt.

On the other hand, LLVM has an extensive API surface. When a library like llvm/lib/Y relies on functionality from another library, the API is often exported in header files under llvm/include/llvm/X/, even if it is not intended to be user-facing.

To be compatible with multiple LLVM versions, many projects rely on #if directives based on the LLVM_VERSION_MAJOR macro. This post explores the specific techniques used by ccls to ensure compatibility with LLVM versions 7 to 19. For the latest release (ccls 0.20241108), support for LLVM versions 7 to 9 has been discontinued.

Given the tight coupling between LLVM and Clang, the LLVM_VERSION_MAJOR macro can be used for version detection of both. There's no need to check CLANG_VERSION_MAJOR.


Changed namespaces

In Oct 2018, https://reviews.llvm.org/D52783 moved the namespace clang::vfs to llvm::vfs. To remain compatible, I renamed clang::vfs uses and added a conditional namespace alias:

#if LLVM_VERSION_MAJOR < 8
// D52783 Lift VFS from clang to llvm
namespace llvm {
namespace vfs = clang::vfs;
}
#endif

Removed functions

In March 2019, https://reviews.llvm.org/D59377 removed the member variable VirtualFileSystem and removed setVirtualFileSystem. To adapt to this change, ccls employs an #if.

#if LLVM_VERSION_MAJOR >= 9 // rC357037
  Clang->createFileManager(FS);
#else
  Clang->setVirtualFileSystem(FS);
  Clang->createFileManager();
#endif

Changed function parameters

In April 2020, the LLVM monorepo integrated a new subproject: flang. The flang developers made many changes to clangDriver to reuse it for flang. https://reviews.llvm.org/D86089 changed the constructor clang::driver::Driver. I added the following:

#if LLVM_VERSION_MAJOR < 12 // llvmorg-12-init-5498-g257b29715bb
driver::Driver d(args[0], llvm::sys::getDefaultTargetTriple(), *diags, vfs);
#else
driver::Driver d(args[0], llvm::sys::getDefaultTargetTriple(), *diags, "ccls", vfs);
#endif

In November 2020, https://reviews.llvm.org/D90890 changed an argument of ComputePreambleBounds from const llvm::MemoryBuffer *Buffer to const llvm::MemoryBufferRef &Buffer.

std::unique_ptr<llvm::MemoryBuffer> buf =
llvm::MemoryBuffer::getMemBuffer(content);
#if LLVM_VERSION_MAJOR >= 12 // llvmorg-12-init-11522-g4c55c3b66de
auto bounds = ComputePreambleBounds(*ci.getLangOpts(), *buf, 0);
#else
auto bounds = ComputePreambleBounds(*ci.getLangOpts(), buf.get(), 0);
#endif

https://reviews.llvm.org/D91297 made a similar change, and I adapted to it in the same way.

In Jan 2022, https://reviews.llvm.org/D116317 added a new parameter bool Braced to CodeCompleteConsumer::ProcessOverloadCandidates.

  void ProcessOverloadCandidates(Sema &s, unsigned currentArg,
OverloadCandidate *candidates,
unsigned numCandidates
#if LLVM_VERSION_MAJOR >= 8
,
SourceLocation openParLoc
#endif
#if LLVM_VERSION_MAJOR >= 14
,
bool braced
#endif
) override {
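
One detail worth spelling out is why the override keyword matters here: if upstream changes the virtual function's parameter list and the #if ladder is not updated, the build fails instead of silently declaring an unrelated function. A minimal sketch, assuming a hypothetical DEMO_VERSION_MAJOR macro and a mock base class (MockConsumerBase is not a Clang type):

```cpp
#include <cassert>

// Stand-in for LLVM_VERSION_MAJOR (hypothetical); try 7, 8, or 14.
#define DEMO_VERSION_MAJOR 14

// Mock of an upstream base class whose virtual function gained a parameter
// in "version 8" and another one in "version 14".
struct MockConsumerBase {
  virtual ~MockConsumerBase() = default;
  virtual int process(unsigned currentArg
#if DEMO_VERSION_MAJOR >= 8
                      ,
                      int openParLoc
#endif
#if DEMO_VERSION_MAJOR >= 14
                      ,
                      bool braced
#endif
                      ) = 0;
};

// Downstream subclass: the same #if ladder keeps the signature in sync, and
// `override` turns any mismatch into a compile error.
struct MyConsumer : MockConsumerBase {
  int process(unsigned currentArg
#if DEMO_VERSION_MAJOR >= 8
              ,
              int openParLoc
#endif
#if DEMO_VERSION_MAJOR >= 14
              ,
              bool braced
#endif
              ) override {
    int result = static_cast<int>(currentArg) * 10;
#if DEMO_VERSION_MAJOR >= 8
    (void)openParLoc; // unused in this sketch
#endif
#if DEMO_VERSION_MAJOR >= 14
    if (braced)
      result += 1;
#endif
    return result;
  }
};

// Call sites need the same version ladder as the declaration.
inline int callProcess(MockConsumerBase &c) {
#if DEMO_VERSION_MAJOR >= 14
  return c.process(2, 0, true);
#elif DEMO_VERSION_MAJOR >= 8
  return c.process(2, 0);
#else
  return c.process(2);
#endif
}
```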

In late 2022 and early 2023, there were many changes to migrate from llvm::Optional to std::optional.

#if LLVM_VERSION_MAJOR >= 16 // llvmorg-16-init-12589-ge748db0f7f09
std::array<std::optional<StringRef>, 3>
#else
std::array<Optional<StringRef>, 3>
#endif
redir{StringRef(stdinPath), StringRef(path), StringRef()};
std::vector<StringRef> args{g_config->compilationDatabaseCommand, root};
if (sys::ExecuteAndWait(args[0], args, {}, redir, 0, 0, &err_msg) < 0) {
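
When the optional type appears in many places, repeating the #if at each use gets noisy. One alternative (not what the snippet above does) is a single alias template chosen by version. The sketch below assumes C++17, a hypothetical DEMO_VERSION_MAJOR macro, and a tiny legacy::Optional class standing in for the pre-16 llvm::Optional.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Stand-in for LLVM_VERSION_MAJOR (hypothetical).
#define DEMO_VERSION_MAJOR 16

namespace legacy {
// Minimal stand-in for an older Optional-like class (hypothetical).
template <typename T> class Optional {
  bool has = false;
  T val{};

public:
  Optional() = default;
  Optional(T v) : has(true), val(std::move(v)) {}
  explicit operator bool() const { return has; }
  const T &operator*() const { return val; }
};
} // namespace legacy

// The version check lives in exactly one place.
#if DEMO_VERSION_MAJOR >= 16
template <typename T> using Opt = std::optional<T>;
#else
template <typename T> using Opt = legacy::Optional<T>;
#endif

// Client code is identical for both versions.
inline std::string firstRedirect() {
  std::array<Opt<std::string>, 3> redir{std::string("in.txt"),
                                        std::string("out.txt"),
                                        Opt<std::string>()};
  return redir[0] ? *redir[0] : std::string();
}
```

The trade-off is one more project-local name (Opt here) that readers have to learn.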

In Sep 2023, https://github.com/llvm/llvm-project/pull/65647 changed CompilerInvocationRefBase to CompilerInvocationBase. I duplicated the code under an #if:

#if LLVM_VERSION_MAJOR >= 18
ci->getLangOpts().SpellChecking = false;
ci->getLangOpts().RecoveryAST = true;
ci->getLangOpts().RecoveryASTType = true;
#else
ci->getLangOpts()->SpellChecking = false;
#if LLVM_VERSION_MAJOR >= 11
ci->getLangOpts()->RecoveryAST = true;
ci->getLangOpts()->RecoveryASTType = true;
#endif
#endif

In April 2024, https://github.com/llvm/llvm-project/pull/89548/ removed llvm::StringRef::startswith in favor of starts_with. starts_with has been available since Oct 2022 and startswith had been deprecated. I added the following snippet:

#if LLVM_VERSION_MAJOR >= 19
#define startswith starts_with
#define endswith ends_with
#endif

It's important to note that the converse approach

#define starts_with startswith
#define ends_with endswith

could break code that calls std::string_view::starts_with.
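
A macro rewrites every matching identifier, which is what makes the direction of the #define matter. A free wrapper function avoids the problem altogether, at the cost of touching every call site. This is a sketch of that alternative, not what ccls does; MockStringRef, startsWith, and DEMO_VERSION_MAJOR are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for LLVM_VERSION_MAJOR (hypothetical).
#define DEMO_VERSION_MAJOR 19

// Mock string type standing in for llvm::StringRef. Depending on the
// "version", it offers only one of the two spellings.
struct MockStringRef {
  const char *data;
  std::size_t size;
  MockStringRef(const char *s) : data(s), size(std::strlen(s)) {}
#if DEMO_VERSION_MAJOR >= 19
  bool starts_with(MockStringRef p) const {
    return size >= p.size && std::memcmp(data, p.data, p.size) == 0;
  }
#else
  bool startswith(MockStringRef p) const {
    return size >= p.size && std::memcmp(data, p.data, p.size) == 0;
  }
#endif
};

// One wrapper hides the rename; no identifier is rewritten by the
// preprocessor, so unrelated starts_with members are unaffected.
inline bool startsWith(MockStringRef s, MockStringRef prefix) {
#if DEMO_VERSION_MAJOR >= 19
  return s.starts_with(prefix);
#else
  return s.startswith(prefix);
#endif
}
```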

Changed enumerators

In November 2023, https://github.com/llvm/llvm-project/pull/71160 changed an unnamed enumeration to a scoped enumeration. To keep the following snippet compiling,

switch (tag_d->getTagKind()) {
case TTK_Struct:
tag = "struct";
break;
case TTK_Interface:
tag = "__interface";
break;
case TTK_Union:
tag = "union";
break;
case TTK_Class:
tag = "class";
break;
case TTK_Enum:
tag = "enum";
break;
}

I introduced macros.

#if LLVM_VERSION_MAJOR >= 18 // llvmorg-18-init-10631-gedd690b02e16
#define TTK_Class TagTypeKind::Class
#define TTK_Enum TagTypeKind::Enum
#define TTK_Interface TagTypeKind::Interface
#define TTK_Struct TagTypeKind::Struct
#define TTK_Union TagTypeKind::Union
#endif
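
If macros feel too blunt, constexpr constants can serve the same purpose, because a constexpr variable of the enumeration type is usable as a case label. The following sketch illustrates that alternative with a mock TagTypeKind and a hypothetical DEMO_VERSION_MAJOR macro; it is not taken from ccls.

```cpp
#include <cassert>
#include <string>

// Stand-in for LLVM_VERSION_MAJOR (hypothetical).
#define DEMO_VERSION_MAJOR 18

#if DEMO_VERSION_MAJOR >= 18
// Mock of the newer scoped enumeration.
enum class TagTypeKind { Struct, Interface, Union, Class, Enum };
// Compatibility constants carrying the old unscoped spellings.
constexpr TagTypeKind TTK_Struct = TagTypeKind::Struct;
constexpr TagTypeKind TTK_Interface = TagTypeKind::Interface;
constexpr TagTypeKind TTK_Union = TagTypeKind::Union;
constexpr TagTypeKind TTK_Class = TagTypeKind::Class;
constexpr TagTypeKind TTK_Enum = TagTypeKind::Enum;
#else
// Mock of the older unscoped enumeration.
enum TagTypeKind { TTK_Struct, TTK_Interface, TTK_Union, TTK_Class, TTK_Enum };
#endif

// The switch compiles unchanged against either definition.
inline std::string tagName(TagTypeKind kind) {
  switch (kind) {
  case TTK_Struct:
    return "struct";
  case TTK_Interface:
    return "__interface";
  case TTK_Union:
    return "union";
  case TTK_Class:
    return "class";
  case TTK_Enum:
    return "enum";
  }
  return "";
}
```

Unlike macros, these constants respect namespaces and scoping, though they would need to live in a namespace where the old names do not already exist.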

In April 2024, https://github.com/llvm/llvm-project/pull/89639 renamed an enumerator. I have made the following adaptation:

#if LLVM_VERSION_MAJOR >= 19 // llvmorg-19-init-9465-g39adc8f42329
case BuiltinType::ArraySection:
#else
case BuiltinType::OMPArraySection:
return "<OpenMP array section type>";
#endif
return "<array section type>";

Build system changes

In Dec 2022, https://reviews.llvm.org/D137838 added a new LLVM library LLVMTargetParser. I adjusted ccls's CMakeLists.txt:

target_link_libraries(ccls PRIVATE LLVMOption LLVMSupport)
if(LLVM_VERSION_MAJOR GREATER_EQUAL 16) # llvmorg-16-init-15123-gf09cf34d0062
target_link_libraries(ccls PRIVATE LLVMTargetParser)
endif()

Summary

The above examples illustrate how to adapt to changes in the LLVM and Clang APIs. It's important to remember that API changes are a natural part of software development, and testing with different releases is crucial for maintaining compatibility with a wide range of LLVM versions.

When introducing new interfaces, we should take care to reduce the chance that the interface will later be changed in a way that disrupts downstream projects. That said, changes are normal. When an API change is justified, do it.

Downstream projects should be mindful of the stability guarantees of different LLVM APIs. Some APIs may be more prone to change than others. It's essential to write code in a way that can easily adapt to changes in the LLVM API.

LLVM C API

While LLVM offers a C API with an effort made towards compatibility, its capabilities often fall short.

Clang provides a C API called libclang. While highly stable, libclang's limited functionality makes it unsuitable for many tasks.

In 2018, when creating ccls (a fork of cquery), I encountered multiple limitations in libclang's ability to handle code completion and indexing. This led to rewriting the relevant code to leverage the Clang C++ API for a more comprehensive solution. The following commits offer insights into how the C API and the mostly equivalent but better C++ API work:

  • First draft: replace libclang indexer with clangIndex
  • Use Clang C++ for completion and diagnostics

Tinkering with Neovim

By MaskRay
November 2, 2024 15:00

After migrating from Vim to Emacs as my primary C++ editor in 2015, I switched from Vim to Neovim for miscellaneous non-C++ tasks as it is more convenient in a terminal. Customizing the editor with a language you are comfortable with is important. I found myself increasingly drawn to Neovim's terminal-based simplicity for various tasks. Recently, I've refined my Neovim setup to the point where I can confidently migrate my entire C++ workflow away from Emacs.

This post explores the key improvements I've made to achieve this transition. My focus is on code navigation.

Key mapping

I've implemented custom functions that simplify key mappings.

local function map(mode, lhs, rhs, opts)
  local options = {}
  if opts then
    if type(opts) == 'string' then
      opts = {desc = opts}
    end
    options = vim.tbl_extend('force', options, opts)
  end
  vim.keymap.set(mode, lhs, rhs, options)
end
local function nmap(lhs, rhs, opts)
  map('n', lhs, rhs, opts)
end
local function tmap(lhs, rhs, opts)
  map('t', lhs, rhs, opts)
end

I've swapped ; and : for easier access to Ex commands, especially since leap.nvim renders ; less useful for repeating ftFT.

map({'n', 'x'}, ':', ';')
map({'n', 'x'}, ';', ':')

Cross references

Like many developers, I spend significantly more time reading code than writing it. Efficiently navigating definitions and references is crucial for productivity.

While the built-in LSP client's C-] is functional (see :h lsp-defaults tagfunc), I found it less convenient. Many Emacs and Neovim configurations advocate for gd. However, both G and D are placed on the left half of the QWERTY keyboard, making it slow to press them using the left hand.

For years, I relied on M-j to quickly jump to definitions.

To avoid a conflict with my recent zellij change (I adopted M-hjkl for pane navigation), I've reassigned J to trigger definition jumps. Although I've lost the original J (join lines) functionality, vJ provides a suitable workaround.

nmap('J', '<cmd>Telescope lsp_definitions<cr>', 'Definitions')
nmap('M', '<cmd>Telescope lsp_references<CR>', 'References')

After making an LSP-based jump, the jump list can quickly fill with irrelevant entries as I navigate the codebase. Thankfully, Telescope's LSP functionality sets push_tagstack_on_edit to push an entry to the tag stack (see :h tag-stack). To efficiently return to my previous position, I've mapped H to :pop and L to :tag.

nmap('H', '<cmd>pop<cr>', 'Tag stack backward')
nmap('L', '<cmd>tag<cr>', 'Tag stack forward')

I've adopted x as a prefix key for cross-referencing extensions. dl provides a suitable alternative for x's original functionality.

nmap('x', '<Nop>')
nmap('xB', '<cmd>CclsBaseHierarchy<cr>')
nmap('xC', '<cmd>CclsOutgoingCalls<cr>', 'callee')
nmap('xD', '<cmd>CclsDerivedHierarchy<cr>')
nmap('xM', '<cmd>CclsMemberHierarchy<cr>', 'member')
nmap('xb', '<cmd>CclsBase<cr>')
nmap('xc', '<cmd>CclsIncomingCalls<cr>', 'caller')
nmap('xd', '<cmd>CclsDerived<cr>')
nmap('xi', '<cmd>lua vim.lsp.buf.implementation()<cr>', 'Implementation')
nmap('xm', '<cmd>CclsMember<cr>', 'member')
nmap('xn', function() M.lsp.words.jump(vim.v.count1) end, 'Next reference')
nmap('xp', function() M.lsp.words.jump(-vim.v.count1) end, 'Prev reference')
nmap('xt', '<cmd>lua vim.lsp.buf.type_definition()<cr>', 'Type definition')
nmap('xv', '<cmd>CclsVars<cr>', 'vars')

I utilize xn and xp to find the next or previous reference. The implementation, copied from LazyVim, only works with references within the current file. I want to enable the xn map to automatically transition to the next file when reaching the last reference in the current file.

While using Emacs, I created a hydra with x as the prefix key to cycle through next references. Unfortunately, I haven't been able to replicate this behavior in Neovim.

-- This does not work.
local Hydra = require('hydra')
Hydra({
  name = 'lsp xref',
  mode = 'n',
  body = 'x',
  heads = {
    {'n', function() M.lsp.words.jump(1) end},
    {'p', function() M.lsp.words.jump(-1) end},
    { "q", nil, { exit = true, nowait = true } },
  },
})

Movement

I use leap.nvim to quickly jump to specific identifiers (s{char1}{char2}), followed by telescope.nvim to explore definitions and references. Sometimes, I use the following binding:

nmap('U', function()
require'hop'.hint_words()
require'telescope.builtin'.lsp_definitions()
end, 'Hop+definition')

Semantic highlighting

I've implemented rainbow semantic highlighting using ccls. Please refer to ccls and LSP Semantic Tokens for my setup.

Other LSP features

I have configured the CursorHold event to trigger textDocument/documentHighlight. When using Emacs, lsp-ui-doc automatically requests textDocument/hover, which I now lose.

Additionally, the LspAttach and BufEnter events trigger textDocument/codeLens.

Window navigation

While I've been content with the traditional C-w + hjkl mapping for years, I've recently opted for the more efficient C-hjkl approach.

nmap('<C-h>', '<C-w>h')
nmap('<C-j>', '<C-w>j')
nmap('<C-k>', '<C-w>k')
nmap('<C-l>', '<C-w>l')

tmap('<C-h>', '<cmd>wincmd h<cr>')
tmap('<C-j>', '<cmd>wincmd j<cr>')
tmap('<C-k>', '<cmd>wincmd k<cr>')
tmap('<C-l>', '<cmd>wincmd l<cr>')

The keys mirror my pane navigation preferences in tmux and zellij, where I utilize M-hjkl.

# tmux select pane or window
bind -n M-h if -F '#{pane_at_left}' 'select-window -p' 'select-pane -L'
bind -n M-j if -F '#{pane_at_bottom}' 'select-window -p' 'select-pane -D'
bind -n M-k if -F '#{pane_at_top}' 'select-window -n' 'select-pane -U'
bind -n M-l if -F '#{pane_at_right}' 'select-window -n' 'select-pane -R'

// zellij M-hjkl
keybinds {
normal clear-defaults=true {
...
bind "Alt h" { MoveFocusOrTab "Left"; }
bind "Alt j" { MoveFocus "Down"; }
bind "Alt k" { MoveFocus "Up"; }
bind "Alt l" { MoveFocusOrTab "Right"; }
}
}

To accommodate this change, I've shifted my tmux prefix key from C-l to C-Space. Consequently, I've also adjusted my input method toggling from C-Space to C-S-Space.

Debugging

For C++ debugging, I primarily rely on cgdb. I find it superior to GDB's single-key mode and significantly more user-friendly than LLDB's gui command.

cgdb --args ./a.out args

rr record ./a.out args
rr replay -d cgdb

I typically arrange Neovim and cgdb side-by-side in tmux or zellij. During single-stepping, when encountering interesting code snippets, I often need to manually input filenames into Neovim. While Telescope aids in this process, automatic file and line updates would be ideal.

Given these considerations, nvim-dap appears to be a promising solution. However, I haven't yet determined the configuration for integrating rr with nvim-dap.

Live grep

Telescope's extension telescope-fzf-native is useful.

I've defined mappings to streamline directory and project-wide searches using Telescope's live grep functionality:

nmap('<leader>sd', '<cmd>lua require("telescope.builtin").live_grep({cwd=vim.fn.expand("%:p:h")})<cr>', 'Search directory')
nmap('<leader>sp', '<cmd>lua require("telescope.builtin").live_grep({cwd=MyProject()})<cr>', 'Search project')

Additionally, I've mapped M-n to insert the word under the cursor, mimicking Emacs Ivy's M-n (ivy-next-history-element) behavior.

Task runner

I use overseer.nvim to run build commands like ninja -C /tmp/Debug llc llvm-mc. This plugin allows me to view build errors directly in Neovim's quickfix window.

Following LazyVim, I use <leader>oo to run builds and <leader>ow to toggle the overseer window. To navigate errors, I use trouble.nvim with the ]q and [q keys.

nmap('<leader>oo', '<cmd>OverseerRun<cr>')
nmap('<leader>ow', '<cmd>OverseerToggle<cr>')

nmap('[q', function()
if require('trouble').is_open() then
require('trouble').prev({ skip_groups = true, jump = true })
else
local ok, err = pcall(vim.cmd.cprev)
if not ok then
vim.notify(err, vim.log.levels.ERROR)
end
end
end)
nmap(']q', function()
if require('trouble').is_open() then
require('trouble').next({ skip_groups = true, jump = true })
else
local ok, err = pcall(vim.cmd.cnext)
if not ok then
vim.notify(err, vim.log.levels.ERROR)
end
end
end)

Reducing reliance on terminal multiplexer

As https://rutar.org/writing/from-vim-and-tmux-to-neovim/ nicely summarizes, running Neovim under tmux has some annoyances. I've been experimenting with reducing my reliance on zellij. Instead, I'll make more use of Neovim's terminal functionality.

toggleterm.nvim is a particularly useful plugin that allows me to easily split windows, open terminals, and hide them when not in use.

The default command <C-\><C-n> (switch to the Normal mode) is clumsy. I've mapped it to <C-s> (otherwise a useless feature: pause transmission, or fwd-i-search in zsh).

nmap('<leader>tf', function() require'toggleterm'.toggle(vim.v.count, nil, MyProject(), 'float', nil) end)
nmap('<leader>th', function() require'toggleterm'.toggle(vim.v.count, 10, MyProject(), 'horizontal', nil) end)
nmap('<leader>tv', function() require'toggleterm'.toggle(vim.v.count, 80, MyProject(), 'vertical', nil) end)

tmap('<C-s>', '<C-\\><C-n>')
-- Binding C-/ doesn't work in tmux/zellij
map({'n', 't'}, '<C-/>', '<cmd>ToggleTerm<cr>')
-- This actually binds C-/ in tmux/zellij
map({'n', 't'}, '<C-_>', '<cmd>ToggleTerm<cr>')

neovim-remote allows me to open files without starting a nested Neovim process.

I use mini.sessions to manage sessions.

Config switcher

Neovim's NVIM_APPNAME feature is fantastic for exploring pre-configured distributions to get inspiration.

Lua

Neovim embraces Lua 5.1 as a preferred scripting language. While Lua's syntax is lightweight and easy to learn, it doesn't shy away from convenience features like func 'arg' and func {a=42}.

LuaJIT offers exceptional performance.

LuaJIT with the JIT enabled is much faster than all of the other languages benchmarked, including Wren, because Mike Pall is a robot from the future. -- wren.io

This translates into noticeably smoother editing with LSP, especially for hefty C++ files, a significant advantage over Emacs. With Emacs, I've always felt that editing a large C++ file is slow.

The non-default local variables and 1-based indexing (shared with languages like Awk and Julia) are annoyances that I can live with when using a configuration language. So far, I've only needed index-sensitive looping in one specific location.

-- For LSP semantic tokens
for type, colors in pairs(all_colors) do
for i = 1,#colors do
vim.api.nvim_set_hl(0, string.format('@lsp.typemod.%s.id%s.cpp', type, i-1), {fg=colors[i]})
end
end

Dual-role keys

I utilize the software keyboard remapper kanata to make some keys act both as normal keys and as modifiers. I have followed the guide https://shom.dev/start/using-kanata-to-remap-any-keyboard/ as the official configuration guide is intimidating.

~/.config/kanatta/config.kbd is my current configuration. A simplified version is provided below:

(defcfg
concurrent-tap-hold yes
log-layer-changes no
process-unmapped-keys yes
)
(defvar
tt 200 ;; tap-time
ht 160 ;; hold-time
)

(defalias
tab (tap-hold $tt $ht tab (layer-while-held extend))
cap (tap-hold $tt $ht esc lctl)
;; cap (tap-hold $tt $hold-time esc (layer-while-held vim-nav))
a (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht a lmet) break)
s (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht s lalt) break)
d (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht d lctl) break)
f (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht f lsft) break)
j (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold-release-timeout $tt $ht j rsft j) break)
k (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold-release-timeout $tt $ht k rctl k) break)
l (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht l ralt) break)
; (switch ((key-timing 1 less-than $tt)) _ break () (tap-hold $tt $ht ; rmet) break)
)

(defsrc
tab q w e r t y u i o p [
caps a s d f g h j k l ; '
lsft z x c v b n m , . / rsft
)
(deflayer default
@tab _ _ _ _ _ _ _ _ _ _ _
@cap @a @s @d @f _ _ @j @k @l @; _
_ _ _ _ _ _ _ _ _ _ _ _
)
(deflayer extend
_ _ _ _ lrld _ _ C-S-tab C-tab _ _ _
_ _ _ _ _ _ left down up rght _ _
_ _ _ _ _ _ home pgdn pgup end _ _
)

(defchordsv2
(j k ) esc 100 all-released ()
( k l ) = 100 all-released ()
(j l ) S-= 100 all-released ()
( l ;) - 100 all-released ()
)

ccls and LSP Semantic Tokens

By MaskRay
October 20, 2024 15:00

I've spent countless hours writing and reading C++ code. For many years, Emacs has been my primary editor, and I leverage the rainbow semantic highlighting feature of ccls (my C++ language server).

The feature relies on two custom notification messages, $ccls/publishSemanticHighlight and $ccls/publishSkippedRanges. $ccls/publishSemanticHighlight provides a list of symbols, each with kind information (function, type, or variable) of itself and its semantic parent (e.g. a member function's parent is a class), storage duration, and a list of ranges.

struct CclsSemanticHighlightSymbol {
int id = 0;
SymbolKind parentKind;
SymbolKind kind;
uint8_t storage;
std::vector<std::pair<int, int>> ranges;

std::vector<lsRange> lsRanges; // Only used by vscode-ccls
};

struct CclsSemanticHighlight {
DocumentUri uri;
std::vector<CclsSemanticHighlightSymbol> symbols;
};

An editor can use consistent colors to highlight different occurrences of a symbol. Different colors can be assigned to different symbols.

Tobias Pisani created emacs-cquery (the predecessor to emacs-ccls) in Nov 2017. Despite not being a fan of Emacs Lisp, I added the rainbow semantic highlighting feature for my own use in early 2018. My setup also relied heavily on these two settings:

  • Bolding and underlining variables of static storage duration
  • Italicizing member functions and variables

(setq ccls-sem-highlight-method 'font-lock)
(ccls-use-default-rainbow-sem-highlight)

Key symbol properties (member, static) were visually prominent in my Emacs environment.

My Emacs hacking days are a distant memory: beyond basic configuration tweaks, I haven't touched elisp code since 2018. As my Elisp skills faded, I increasingly turned to Neovim for various editing tasks. Naturally, I wanted to migrate my C++ development workflow to Neovim as well. However, a major hurdle emerged: Neovim lacked the beloved rainbow highlighting I enjoyed in Emacs.

Thankfully, Neovim supports "semantic tokens" from LSP 3.16, a standardized approach adopted by many editors.

I've made changes to ccls (available on a branch; PR) to support semantic tokens. This involves adapting the $ccls/publishSemanticHighlight code to additionally support textDocument/semanticTokens/full and textDocument/semanticTokens/range.
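
For readers unfamiliar with the wire format: the LSP specification encodes the tokens of a document as a flat integer array, five integers per token (line delta, start character delta, length, token type index, and a bit set of token modifiers), with positions expressed relative to the previous token. The sketch below illustrates that encoding only. It is not code from ccls, and the Token struct and encodeSemanticTokens name are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One highlighted range with absolute, 0-based positions (hypothetical type).
struct Token {
  uint32_t line;
  uint32_t startChar;
  uint32_t length;
  uint32_t type;      // index into the legend's tokenTypes
  uint32_t modifiers; // bit set indexed by the legend's tokenModifiers
};

// Produce the relative encoding: 5 integers per token.
inline std::vector<uint32_t> encodeSemanticTokens(std::vector<Token> tokens) {
  // Tokens must be ordered by position before computing deltas.
  std::sort(tokens.begin(), tokens.end(), [](const Token &a, const Token &b) {
    return a.line != b.line ? a.line < b.line : a.startChar < b.startChar;
  });
  std::vector<uint32_t> data;
  data.reserve(tokens.size() * 5);
  uint32_t prevLine = 0, prevStart = 0;
  for (const Token &t : tokens) {
    uint32_t deltaLine = t.line - prevLine;
    // The start is relative to the previous token only on the same line.
    uint32_t deltaStart =
        deltaLine == 0 ? t.startChar - prevStart : t.startChar;
    data.insert(data.end(),
                {deltaLine, deltaStart, t.length, t.type, t.modifiers});
    prevLine = t.line;
    prevStart = t.startChar;
  }
  return data;
}
```

Because the modifiers field is a bit set, a single token can carry several modifiers at once, which is what makes the custom-modifier approach described below possible.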

I utilize a few token modifiers (static, classScope, functionScope, namespaceScope) for highlighting:

vim.cmd([[
hi @lsp.mod.classScope.cpp gui=italic
hi @lsp.mod.static.cpp gui=bold
hi @lsp.typemod.variable.namespaceScope.cpp gui=bold,underline
]])

While this approach is a significant improvement over relying solely on nvim-treesitter, I'm still eager to implement rainbow semantic tokens. Although LSP semantic tokens don't directly distinguish symbols, we can create custom modifiers to achieve similar results.

tokenModifiers: {
"declaration", "definition", "static", ...

"id0", "id1", ... "id9",
}

In the user-provided initialization options, I set highlight.rainbow to 10.

ccls assigns the same modifier ID to tokens belonging to the same symbol, aiming for unique IDs for different symbols. Because we only have a few predefined IDs (each linked to a specific color), collisions are possible. However, they are uncommon and generally acceptable.
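
To make the collision remark concrete, here is one way such an assignment could work. This is an assumption for illustration, not the scheme ccls actually uses: a stable per-symbol ID is reduced modulo the number of rainbow slots, and the result selects one of the id0..id9 modifier bits. The function names and the three-entry legend prefix are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical legend: a few fixed modifiers followed by id0..id{rainbow-1}.
inline std::vector<std::string> makeModifierLegend(unsigned rainbow) {
  std::vector<std::string> legend = {"declaration", "definition", "static"};
  for (unsigned i = 0; i < rainbow; ++i)
    legend.push_back("id" + std::to_string(i));
  return legend;
}

// Map a stable symbol ID to one bit of the tokenModifiers bit set.
// firstIdIndex is the position of "id0" in the legend.
inline uint32_t rainbowModifierBit(uint64_t symbolId, unsigned rainbow,
                                   unsigned firstIdIndex) {
  unsigned slot = static_cast<unsigned>(symbolId % rainbow);
  return uint32_t(1) << (firstIdIndex + slot);
}
```

With 10 slots, symbols 7 and 17 land on the same bit and therefore the same color, while every occurrence of a given symbol always gets the same one.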

For a token with type variable, Neovim's built-in LSP plugin assigns a highlight group @lsp.typemod.variable.id$i.cpp where $i is an integer between 0 and 9. This allows us to customize a unique foreground color for each modifier ID.

local func_colors = {
'#e5b124', '#927754', '#eb992c', '#e2bf8f', '#d67c17',
'#88651e', '#e4b953', '#a36526', '#b28927', '#d69855',
}
local type_colors = {
'#e1afc3', '#d533bb', '#9b677f', '#e350b6', '#a04360',
'#dd82bc', '#de3864', '#ad3f87', '#dd7a90', '#e0438a',
}
local param_colors = {
'#e5b124', '#927754', '#eb992c', '#e2bf8f', '#d67c17',
'#88651e', '#e4b953', '#a36526', '#b28927', '#d69855',
}
local var_colors = {
'#429921', '#58c1a4', '#5ec648', '#36815b', '#83c65d',
'#419b2f', '#43cc71', '#7eb769', '#58bf89', '#3e9f4a',
}
local all_colors = {
class = type_colors,
constructor = func_colors,
enum = type_colors,
enumMember = var_colors,
field = var_colors,
['function'] = func_colors,
method = func_colors,
parameter = param_colors,
struct = type_colors,
typeAlias = type_colors,
typeParameter = type_colors,
variable = var_colors
}
for type, colors in pairs(all_colors) do
for i = 1,#colors do
for _, lang in pairs({'c', 'cpp'}) do
vim.api.nvim_set_hl(0, string.format('@lsp.typemod.%s.id%s.%s', type, i-1, lang), {fg=colors[i]})
end
end
end

vim.cmd([[
hi @lsp.mod.classScope.cpp gui=italic
hi @lsp.mod.static.cpp gui=bold
hi @lsp.typemod.variable.namespaceScope.cpp gui=bold,underline
]])

Now, let's analyze the C++ code above using this configuration.

While the results are visually pleasing, I need help implementing code lens functionality.

Inactive code highlighting

Inactive code regions (skipped ranges in Clang) are typically displayed in grey. While this can be helpful for identifying unused code, it can sometimes hinder understanding the details. I simply disabled the inactive code feature.

#ifdef X
... // colorful
#else
... // normal instead of grey
#endif

Refresh

When opening a large project, the initial indexing or cache loading process can be time-consuming, often leading to empty lists of semantic tokens for the initially opened files. While ccls prioritizes indexing these files, it's unclear how to notify the client to refresh the files. The existing workspace/semanticTokens/refresh request, unfortunately, doesn't accept text document parameters.

In contrast, with $ccls/publishSemanticHighlight, ccls proactively sends the notification after an index update (see main_OnIndexed).

void main_OnIndexed(DB *db, WorkingFiles *wfiles, IndexUpdate *update) {
...

db->applyIndexUpdate(update);

// Update indexed content, skipped ranges, and semantic highlighting.
if (update->files_def_update) {
auto &def_u = *update->files_def_update;
if (WorkingFile *wfile = wfiles->getFile(def_u.first.path)) {
wfile->setIndexContent(g_config->index.onChange ? wfile->buffer_content
: def_u.second);
QueryFile &file = db->files[update->file_id];
// Publish notifications to the file.
emitSkippedRanges(wfile, file);
emitSemanticHighlight(db, wfile, file);
// But how do we send a workspace/semanticTokens/refresh request?????
}
}
}

While the semantic token request supports partial results in the specification, Neovim lacks this implementation. Even if it were implemented, I believe a notification message with a text document parameter would be a more efficient and direct approach.

export interface SemanticTokensParams extends WorkDoneProgressParams,
PartialResultParams {
/**
* The text document.
*/
textDocument: TextDocumentIdentifier;
}

Other clients

emacs-ccls

Once this feature branch is merged, Emacs users can simply remove the following lines:

(setq ccls-sem-highlight-method 'font-lock)
(ccls-use-default-rainbow-sem-highlight)

How to change lsp-semantic-token-modifier-faces to support rainbow semantic tokens in lsp-mode and emacs-ccls?

The general approach is similar to the following, but we need a feature from lsp-mode (https://github.com/emacs-lsp/lsp-mode/issues/4590).

(setq lsp-semantic-tokens-enable t)
(defface lsp-face-semhl-namespace-scope
'((t :weight bold)) "highlight for namespace scope symbols" :group 'lsp-semantic-tokens)
(cl-loop for color in '("#429921" "#58c1a4" "#5ec648" "#36815b" "#83c65d"
"#417b2f" "#43cc71" "#7eb769" "#58bf89" "#3e9f4a")
for i = 0 then (1+ i)
do (custom-declare-face (intern (format "lsp-face-semhl-id%d" i))
`((t :foreground ,color))
"" :group 'lsp-semantic-tokens))
(setq lsp-semantic-token-modifier-faces
`(("declaration" . lsp-face-semhl-interface)
("definition" . lsp-face-semhl-definition)
("implementation" . lsp-face-semhl-implementation)
("readonly" . lsp-face-semhl-constant)
("static" . lsp-face-semhl-static)
("deprecated" . lsp-face-semhl-deprecated)
("abstract" . lsp-face-semhl-keyword)
("async" . lsp-face-semhl-macro)
("modification" . lsp-face-semhl-operator)
("documentation" . lsp-face-semhl-comment)
("defaultLibrary" . lsp-face-semhl-default-library)
("classScope" . lsp-face-semhl-member)
("namespaceScope" . lsp-face-semhl-namespace-scope)
,@(cl-loop for i from 0 below 10
collect (cons (format "id%d" i)
(intern (format "lsp-face-semhl-id%d" i))))
))

vscode-ccls

We require assistance to eliminate the $ccls/publishSemanticHighlight feature and adopt built-in semantic tokens support. Due to the lack of active maintenance for vscode-ccls, I'm unable to maintain this plugin for an editor I don't frequently use.

Misc

I use a trick to switch ccls builds without changing editor configurations.

#!/bin/zsh
#export CCLS_TRACEME=s
export LD_PRELOAD=/usr/lib/libmimalloc.so

type=
[[ -f /tmp/ccls-build ]] && type=$(</tmp/ccls-build)

case $type in
strace)
exec strace -s999 -e read,write -o /tmp/strace.log -f ~/ccls/out/debug/ccls --log-file=/tmp/cc.log -v=1 "$@";;
debug)
exec ~/ccls/out/debug/ccls --log-file=/tmp/cc.log -v=2 "$@";;
release)
exec ~/ccls/out/release/ccls --log-file=/tmp/cc.log -v=1 "$@";;
*)
exec /usr/bin/ccls --log-file=/tmp/cc.log -v=1 "$@";;
esac

Usage:

echo debug > /tmp/ccls-build
nvim # out/debug/ccls is now used

My involvement with LLVM 19

By MaskRay
August 18, 2024 15:00

LLVM 19.1 will soon be released. This post provides a summary of my contributions in this release cycle to record my learning progress.

LLVM binary utilities

  • [llvm-readobj,ELF] Support --decompress/-z
  • [llvm-objcopy] Improve help messages
  • [llvm-readelf] Print a blank line for the first hex/string dump
  • [llvm-objcopy] Add --compress-sections
  • [llvm-readelf] Print more information for RELR

Hashing

I optimized the bit mixer used by llvm::DenseMap<std::pair<X, Y>> and llvm::DenseMap<std::tuple<X...>>. llvm/ADT/Hashing.h, used by StringRef hashing and DenseMap, was supposed to be non-deterministic. Despite this, a lot of code relied on a specific iteration order. I made multiple fixes across the code base and landed [Hashing] Use a non-deterministic seed if LLVM_ENABLE_ABI_BREAKING_CHECKS to improve test coverage (e.g. assertion builds) and ensure future flexibility to replace the algorithm.

The change yields a noticeable code size reduction:

# old
movq _ZN4llvm7hashing6detail19fixed_seed_overrideE@GOTPCREL(%rip), %rax
movq (%rax), %rax
testq %rax, %rax
movabsq $-49064778989728563, %rcx # imm = 0xFF51AFD7ED558CCD
cmoveq %rcx, %rax

# new
movabsq $-49064778989728563, %rcx

... and a significant compile time improvement.
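
The old and new instruction sequences shown earlier correspond roughly to the following C++ shapes. This is a simplified rendering for illustration rather than the actual Hashing.h source; only the name fixed_seed_override and the constant are taken from the assembly, and oldSeed, newSeed, and kSeedPrime are made-up names.

```cpp
#include <cassert>
#include <cstdint>

// A mutable global that could override the seed; zero means "no override".
uint64_t fixed_seed_override = 0;

// The constant visible in the movabsq instruction.
constexpr uint64_t kSeedPrime = 0xff51afd7ed558ccdULL;

// Old shape: every use loads the global, tests it, and conditionally selects
// the constant (the load/testq/cmoveq sequence).
inline uint64_t oldSeed() {
  return fixed_seed_override ? fixed_seed_override : kSeedPrime;
}

// New shape: the seed is a compile-time constant, so a single immediate
// move suffices.
constexpr uint64_t newSeed() { return kSeedPrime; }
```

Since hashing code is inlined at many call sites, removing a few instructions per use adds up across the code base.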

I optimized DenseMap::{find,erase}, yielding compile time improvement.

Optimizations to the bit mixer in Hashing.h and the DenseMap code have yielded significant benefits, reducing both compile time and code size. This suggests there's further potential for improvement in this area.

However, the reduced code size also highlights potential significant code size increase when considering faster unordered map implementations like boost::unordered_flat_map, Abseil's SwissTable, and Folly's F14. While these libraries may offer better performance, they often come with a significant increase in code complexity and size.

Introducing a new container alongside DenseMap to selectively replace performance-critical instances could lead to substantial code modifications. This approach requires careful consideration to balance potential performance gains with the additional complexity.

NumericalStabilitySanitizer

NumericalStabilitySanitizer is a new feature for the 19.x releases. I have made many changes on the compiler-rt part.

Clang

Driver maintenance

Options used by the LLVM integrated assembler are currently handled in an ad-hoc way. There is deduplication with and without LTO. Eventually we might want to adopt TableGen for these -Wa, options.

Others:

Code review

I reviewed a wide range of patches, including areas like ADT/Support, binary utilities, MC, lld, clangDriver, LTO, sanitizers, LoongArch, RISC-V, and new features like NumericalStabilitySanitizer and RealTimeSanitizer.

To quantify my involvement, a search for patches I commented on (repo:llvm/llvm-project is:pr -author:MaskRay commenter:MaskRay created:>2024-01-23) yields 780 results.

Link: My involvement with LLVM 18

lld 19 ELF changes

By MaskRay
August 4, 2024 15:00

LLVM 19 will be released. As usual, I maintain lld/ELF and have added some notes to https://github.com/llvm/llvm-project/blob/release/19.x/lld/docs/ReleaseNotes.rst. I've meticulously reviewed nearly all the patches that are not authored by me. I'll delve into some of the key changes.

  • Experimental CREL relocations with explicit addends are now supported using the temporary section type code 0x40000020 (clang -c -Wa,--crel,--allow-experimental-crel). LLVM will change the code and break compatibility (Clang and lld of different versions are not guaranteed to cooperate, unlike other features). CREL with implicit addends are not supported. (#98115)
  • EI_OSABI in the output is now inferred from input object files. (#97144)
  • --compress-sections <section-glob>={none,zlib,zstd}[:level] is added to compress matched output sections without the SHF_ALLOC flag. (#84855) (#90567)
  • The default compression level for zlib is now independent of linker optimization level (Z_BEST_SPEED).
  • zstd compression parallelism no longer requiresZSTD_MULITHREAD build.
  • GNU_PROPERTY_AARCH64_FEATURE_PAUTH notes,R_AARCH64_AUTH_ABS64 andR_AARCH64_AUTH_RELATIVE relocations are now supported. (#72714)
  • --no-allow-shlib-undefined now rejects non-exporteddefinitions in the def-hidden.so ref.so case. (#86777)
  • --debug-names is added to create a merged.debug_names index from input .debug_namessections. Type units are not handled yet. (#86508)
  • --enable-non-contiguous-regions option allowsautomatically packing input sections into memory regions byautomatically spilling to later matches if a region would overflow. Thisreduces the toil of manually packing regions (typical for embedded). Italso makes full LTO feasible in such cases, since IR merging currentlyprevents the linker script from referring to input files. (#90007)
  • --default-script/-dT is implemented tospecify a default script that is processed if--script/-T is not specified. (#89327)
  • --force-group-allocation is implemented to discardSHT_GROUP sections and combine relocation sections if theirrelocated section group members are placed to the same output section.(#94704)
  • --build-id now defaults to generating a 20-byte digest("sha1") instead of 8-byte ("fast"). This improves compatibility withRPM packaging tools. (#93943)
  • -z lrodata-after-bss is implemented to place.lrodata after .bss. (#81224)
  • --export-dynamic no longer creates dynamic sections for-no-pie static linking.
  • --lto-emit-asm is now added as the canonical spelling of --plugin-opt=emit-asm.
  • --lto-emit-llvm now uses the pre-codegen module. (#97480)
  • When AArch64 PAuth is enabled, -z pack-relative-relocs now encodes R_AARCH64_AUTH_RELATIVE relocations in .rela.auth.dyn. (#96496)
  • -z gcs and -z gcs-report are now supported for AArch64 Guarded Control Stack extension.
  • -r now forces -Bstatic.
  • Thumb2 PLT is now supported for Cortex-M processors. (#93644)
  • DW_EH_PE_sdata4 of addresses larger than 0x80000000 is now supported for MIPS32. (#92438)
  • Certain unknown section types are rejected. (#85173)
  • For PROVIDE(lhs = rhs) PROVIDE(rhs = ...), lhs is now defined only if rhs is needed. (#74771) (#87530)
  • OUTPUT_FORMAT(binary) is now supported. (#98837)
  • NOCROSSREFS and NOCROSSREFS_TO commands are now supported to prohibit cross references between certain output sections. (#98773)
  • Orphan placement is refined to prefer the last similar section when its rank <= orphan's rank. (#94099) Non-alloc orphan sections are now placed at the end. (#94519)
  • R_X86_64_REX_GOTPCRELX of the addq form is no longer incorrectly optimized when the address is larger than 0x80000000.

CREL

I've developed CREL (compact relocations) to reduce relocatable file size tremendously for LLVM 19. LLD now supports CREL with explicit addends. Clang and lld of different versions are not guaranteed to cooperate, unlike other features.

See Integrated assembler improvements in LLVM 19 for details.

--compress-sections

The --compress-sections option has been enhanced. You can choose between zlib and zstd for compression, along with specifying the desired compression level. Looking ahead, zlib is deprecated in favor of zstd. While zstd offers additional tuning options, we only provide the compression level.

My post Compressed arbitrary sections analyzes potential use cases.

Orphan sections

My post Understanding orphan sections explains the changes in detail.

Linker scripts

There are quite a few enhancements to the linker script support. NOCROSSREFS and --enable-non-contiguous-regions are noteworthy new features. There is now an increasing demand for features for embedded programming.

The world of embedded programming is a fascinating mix of open and closed ecosystems. Developers of proprietary hardware and closed-source software are increasingly interested in migrating their toolchains to the LLVM Linker (LLD). The allure of faster link speeds, a clean codebase, and seamless LTO integration is undeniable. However, as LLD's maintainer, I must tread carefully. While accommodating these users is nice for LLD's growth, incorporating custom linker extensions risks compromising the project's code quality and maintainability. Striking the right balance between flexibility and code integrity is essential to ensure LLD remains a robust and efficient linker for a wide range of users.

GNU ld also supports extensions for embedded programming. I categorize these extensions into two groups: mature and experimental. Many of the established extensions exhibit well-defined semantics and have been incorporated into LLD. However, some newer extensions in GNU ld appear less thoughtfully designed and inflexible.

When considering a specific extension, we should prioritize practical needs over arbitrary adherence to GNU ld's implementation. If compelling reasons justify a particular feature and GNU ld's approach proves restrictive, we should feel empowered to innovate within LLD.

Conversely, when developing new extensions, it's essential to engage with the broader community. I often submit feature requests to GNU ld to inform decisions we are going to make. I believe this collaborative approach fosters knowledge sharing.


There is no performance-specific change.

In the future, we should refactor RelocationScanner::scanOne to make Arch/*.cpp drive the relocation process, removing the virtual function overhead.

Link: lld 18 ELF changes

Mapping symbols: rethinking for efficiency

Author: MaskRay
July 21, 2024 15:00

In object files, certain code patterns embed data within instructions or transitions occur between instruction sets. This can create hurdles for disassemblers, which might misinterpret data as code, resulting in inaccurate output. Furthermore, code written for one instruction set could be incorrectly disassembled as another. To address these issues, some architectures (Arm, C-SKY, NDS32, RISC-V, etc) define mapping symbols to explicitly denote state transition. Let's explore this concept using an AArch32 code example:

.text
adr r0, .LJTI0_0
vldr d0, .LCPI0_0
bl thumb_callee
.LBB0_1:
nop
.LBB0_2:
nop

.LCPI0_0:
.long 3367254360 @ double 1.234
.long 1072938614

nop

.LJTI0_0:
.long .LBB0_1
.long .LBB0_2

.thumb
.type thumb_callee, %function
thumb_callee:
bx lr

Jump tables (.LJTI0_0): Jump tables can reside in either data or text sections, each with its trade-offs. Here we see a jump table in the text section (MachineJumpTableInfo::EK_Inline in LLVM), allowing a single instruction to take its address. Other architectures generally prefer to place jump tables in data sections. While avoiding data in code, RISC architectures typically require two instructions to materialize the address, since text/data distance can be pretty large.

Constant pool (.LCPI0_0): The vldr instruction loads an 8-byte floating-point literal (the two .long words encoding the double 1.234) to the SIMD&FP register d0.

ISA transition: This code blends A32 and T32 instructions (the latter used in thumb_callee).

In these cases, a dumb disassembler might treat data as code and try disassembling them as instructions. Assemblers create mapping symbols to assist disassemblers. For this example, the assembled object file looks like the following:

$a:
...

$d:
.long 3367254360 @ double 1.234
.long 1072938614

$a:
nop

$d:
.long .LBB0_1
.long .LBB0_2

$t:
thumb_callee:
bx lr

Toolchain

Now, let's delve into how mapping symbols are managed within thetoolchain.

Disassemblers

llvm-objdump sorts symbols, including mapping symbols, relative to the current section, presenting interleaved labels and instructions. Mapping symbols act as signals for the disassembler to switch states.

% llvm-objdump -d --triple=armv7 --show-all-symbols a.o

a.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <$a.0>:
0: e28f0018 add r0, pc, #24
4: ed9f0b02 vldr d0, [pc, #8] @ 0x14
8: ebfffffe bl 0x8 @ imm = #-0x8
c: e320f000 hint #0x0
10: e320f000 hint #0x0

00000014 <$d.1>:
14: 58 39 b4 c8 .word 0xc8b43958
18: 76 be f3 3f .word 0x3ff3be76

0000001c <$a.2>:
1c: e320f000 nop

00000020 <$d.3>:
20: 0c 00 00 00 .word 0x0000000c
24: 10 00 00 00 .word 0x00000010

00000028 <$t.4>:
00000028 <thumb_callee>:
28: 4770 bx lr

I changed llvm-objdump 18 to not display mapping symbols as labels unless --show-all-symbols is specified.
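To make the state-switching idea concrete, here is a minimal Python sketch of how a disassembler could look up the state for an address. The symbol list is copied from the listing above; the function names and the state labels are my own illustration, not llvm-objdump's implementation.

```python
from bisect import bisect_right

# (address, name) pairs copied from the llvm-objdump listing above.
MAPPING_SYMBOLS = [
    (0x00, "$a.0"),
    (0x14, "$d.1"),
    (0x1C, "$a.2"),
    (0x20, "$d.3"),
    (0x28, "$t.4"),
]

# Illustrative labels for the states denoted by each mapping symbol prefix.
STATE = {"$a": "A32", "$t": "T32", "$d": "data", "$x": "A64"}


def state_of(name):
    # "$a.0" and "$a" denote the same state; the ".<any>" suffix is ignored.
    return STATE[name.split(".", 1)[0]]


def build_state_map(symbols):
    # sorted() is stable, so symbols sharing an address keep their input order.
    ordered = sorted(symbols, key=lambda s: s[0])
    addrs = [addr for addr, _ in ordered]
    states = [state_of(name) for _, name in ordered]
    return addrs, states


def lookup(addrs, states, addr):
    # The governing mapping symbol is the last one at or before addr.
    i = bisect_right(addrs, addr) - 1
    return states[i] if i >= 0 else None


addrs, states = build_state_map(MAPPING_SYMBOLS)
for addr in (0x0, 0x14, 0x18, 0x1C, 0x24, 0x28):
    print(hex(addr), lookup(addrs, states, addr))
```

The lookup treats each mapping symbol as the start of a half-open range that extends to the next mapping symbol, which is all a disassembler needs to decide between decoding an instruction and dumping a data word.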

nm

Both llvm-nm and GNU nm typically conceal mapping symbols alongside STT_FILE and STT_SECTION symbols. However, you can reveal these special symbols using the --special-syms option.

% cat a.s
foo:
bl thumb_callee
.long 42
.thumb
thumb_callee:
bx lr
% clang --target=arm-linux-gnueabi -c a.s
% llvm-nm a.o
00000000 t foo
00000008 t thumb_callee
% llvm-nm --special-syms a.o
00000000 t $a.0
00000004 t $d.1
00000008 t $t.2
00000000 t foo
00000008 t thumb_callee

GNU nm behaves similarly, but with a slight quirk. If the default BFD target isn't AArch32, mapping symbols are displayed even without --special-syms.

% arm-linux-gnueabi-nm a.o
00000000 t foo
00000008 t thumb_callee
% nm a.o
00000000 t $a.0
00000004 t $d.1
00000008 t $t.2
00000000 t foo
00000008 t thumb_callee

Symbolizers

Mapping symbols, being non-unique and lacking descriptive names, are intentionally omitted by symbolizers like addr2line and llvm-symbolizer. Their primary role lies in guiding the disassembly process rather than providing human-readable context.

Size problem: symbol table bloat

While mapping symbols are useful, they can significantly inflate the symbol table, particularly in 64-bit architectures (sizeof(Elf64_Sym) == 24) with larger programs. This issue becomes more pronounced when using -ffunction-sections -fdata-sections, which generates numerous small sections.

% cat a.c
void f0() {}
void f1() {}
void f2() {}
int d1 = 1;
int d2 = 2;
% clang -c --target=aarch64 -ffunction-sections -fdata-sections a.c
% llvm-objdump -d --show-all-symbols a.o # GNU objdump --show-all-symbols does not display mapping symbols

a.o: file format elf64-littleaarch64

Disassembly of section .text.f0:

0000000000000000 <$x>:
0000000000000000 <f0>:
0: d65f03c0 ret

Disassembly of section .text.f1:

0000000000000000 <$x>:
0000000000000000 <f1>:
0: d65f03c0 ret

Disassembly of section .text.f2:

0000000000000000 <$x>:
0000000000000000 <f2>:
0: d65f03c0 ret
% llvm-readelf -sX a.o

Symbol table '.symtab' contains 16 entries:
Num: Value Size Type Bind Vis+Other Ndx(SecName) Name [+ Version Info]
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS a.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 (.text.f0) .text.f0
3: 0000000000000000 0 NOTYPE LOCAL DEFAULT 3 (.text.f0) $x
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 (.text.f1) .text.f1
5: 0000000000000000 0 NOTYPE LOCAL DEFAULT 4 (.text.f1) $x
6: 0000000000000000 0 SECTION LOCAL DEFAULT 5 (.text.f2) .text.f2
7: 0000000000000000 0 NOTYPE LOCAL DEFAULT 5 (.text.f2) $x
8: 0000000000000000 0 NOTYPE LOCAL DEFAULT 6 (.data) $d
9: 0000000000000000 0 NOTYPE LOCAL DEFAULT 7 (.comment) $d
10: 0000000000000000 0 NOTYPE LOCAL DEFAULT 9 (.eh_frame) $d
11: 0000000000000000 4 FUNC GLOBAL DEFAULT 3 (.text.f0) f0
12: 0000000000000000 4 FUNC GLOBAL DEFAULT 4 (.text.f1) f1
13: 0000000000000000 4 FUNC GLOBAL DEFAULT 5 (.text.f2) f2
14: 0000000000000000 4 OBJECT GLOBAL DEFAULT 6 (.data) d1
15: 0000000000000004 4 OBJECT GLOBAL DEFAULT 6 (.data) d2

Except the trivial cases (e.g. empty section), in both GNU assembler and LLVM integrated assembler's AArch64 ports:

  • A non-text section (data, debug, etc) almost always starts with an initial $d.
  • A text section almost always starts with an initial $x. The ABI requires a mapping symbol at offset 0.

The behaviors ensure that each function or data symbol has a corresponding mapping symbol, while extra mapping symbols might occur in rare cases. Therefore, the proportion of mapping symbols in the output symbol table usually exceeds 50%.

Most text sections have 2 or 3 symbols:

  • A STT_FUNC symbol.
  • A STT_SECTION symbol due to a reference from .eh_frame. This symbol is absent with -fno-asynchronous-unwind-tables.
  • A $x mapping symbol.

During the linking process, the linker combines input sections and eliminates STT_SECTION symbols.
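A rough back-of-the-envelope model of the cost, written as a Python sketch. It assumes one text section per function, each contributing one STT_FUNC symbol and one $x, and it ignores data symbols and the null symbol. The function names and the 100,000-function figure are made up for illustration.

```python
import struct

# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
# st_value (u64), st_size (u64). "<" disables padding, so the size is exact.
ELF64_SYM = struct.Struct("<IBBHQQ")


def symtab_bytes(num_symbols):
    # Size of a .symtab holding num_symbols entries.
    return num_symbols * ELF64_SYM.size


def mapping_fraction(functions, data_regions_in_text=0):
    # Simplified model of a linked .symtab: every function has one FUNC
    # symbol and one initial $x; every data-in-code region adds a $d/$x pair.
    mapping = functions + 2 * data_regions_in_text
    total = functions + mapping
    return mapping / total


print(ELF64_SYM.size)               # size of one Elf64_Sym entry
print(symtab_bytes(100_000))        # bytes spent on 100,000 $x symbols alone
print(mapping_fraction(100_000))    # share of entries that are mapping symbols
print(mapping_fraction(100_000, data_regions_in_text=500))
```

Under this model the share starts at one half and only grows once data-in-code regions appear, which matches the "usually exceeds 50%" observation.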

Note: LLVM integrated assemblers used to create unique $x.<digit> due to an assembler limitation. I have updated LLVM 19 to drop .<digit> suffixes.

In LLVM's ARM port, data sections do not have mapping symbols, unless there are A32 or T32 instructions (D30724).

Alternative mapping symbol scheme

I have proposed an alternative scheme to address the size concern.

  • Text sections: Assume an implicit $x at offset 0. Add an ending $x if the final data isn't instructions.
  • Non-text sections: Assume an implicit $d at offset 0. Add an ending $d only if the final data isn't data directives.

This approach eliminates most mapping symbols while ensuring correct disassembly. Here is an illustrated assembler example:

.section .text.f0,"ax"
ret
// emit $d
.long 42
// emit $x. Without this, .text.f1 might be interpreted as data.

.section .text.f1,"ax"
ret

The ending mapping symbol is to ensure the subsequent section in the linker output starts with the desired state. The data in code case is extremely rare for AArch64 as jump tables are placed in .rodata.
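The two schemes can be compared with a small Python model. This is a sketch of the rules as stated above, not the LLVM patch itself. A section is modeled as a list of (kind, size) fragments, and the function name and fragment representation are hypothetical.

```python
def mapping_symbols(section_kind, fragments, implicit):
    """Return the (offset, symbol) pairs an assembler would emit for a section.

    section_kind: "text" or "data".
    fragments: list of (kind, size) with kind "insn" or "data".
    implicit: False for the traditional scheme, True for the alternative one.
    """
    default = "$x" if section_kind == "text" else "$d"
    # Traditional: no assumed state, so the first fragment forces a symbol.
    # Alternative: the section kind implies the state at offset 0.
    state = default if implicit else None
    out, offset = [], 0
    for kind, size in fragments:
        want = "$x" if kind == "insn" else "$d"
        if want != state:
            out.append((offset, want))
            state = want
        offset += size
    # Alternative scheme: restore the implicit state at the end so that the
    # next section in the linker output starts in the expected state.
    if implicit and state != default:
        out.append((offset, default))
    return out


f0 = [("insn", 4), ("data", 4)]   # .text.f0: ret; .long 42
f1 = [("insn", 4)]                # .text.f1: ret
for implicit in (False, True):
    print(implicit,
          mapping_symbols("text", f0, implicit),
          mapping_symbols("text", f1, implicit))
```

For the common case of a function with no embedded data (f1), the alternative scheme emits nothing at all, which is where the savings come from.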

Impressive results

I have developed LLVM patches to add an opt-in option clang -Wa,-mmapsyms=implicit. Experiments with a Clang build using this alternative scheme have shown impressive results, eliminating over 50% of symbol table entries.

.o size   | build | scheme
261345384 | a64-0 | standard
260443728 | a64-1 | optimizing $d
254106784 | a64-2 | optimizing both $d and $x

% bloaty a64-2/bin/clang -- a64-0/bin/clang
FILE SIZE VM SIZE
-------------- --------------
-5.4% -1.13Mi [ = ] 0 .strtab
-50.9% -4.09Mi [ = ] 0 .symtab
-4.0% -5.22Mi [ = ] 0 TOTAL

However, omitting a mapping symbol at offset 0 for sections with instructions is currently non-conformant. An ABI update has been requested to address this, though an update is unlikely in the near term due to the lack of GNU toolchain support and interoperability concerns.

I'll elaborate on the interoperability concerns below. Note that they're unlikely to impact a majority of users.

For instance, if a text section with trailing data is assembled using the traditional behavior, the last mapping symbol will be $d. When linked with another text section assembled using the new behavior (lacking an initial $x), disassemblers might misinterpret the start of the latter section as data.

Similarly, linker scripts that combine non-text and text sections could lead to text sections appearing in a data state.

SECTIONS {
...
mix : { *(.data.*) *(.text.foo) }
}

However, many developers would classify these scenarios as error conditions.

A text section may rarely start with data directives (e.g., -fsanitize=function, LLVM prefix data). When the linker combines two such sections, the ending $x of the first section and the initial $d of the second might have the same address.

.section .text.0, "ax"
// $d
.word 0
// $x this

.section .text.1, "ax"
// $d may have the same address
.word 0
// $x

In a straightforward implementation, symbols are stable-sorted by address and the last symbol at an address wins. Ideally we want $d $x $d $x. If the sections are in different files, a linker that respects input order will naturally achieve this. If they're in the same file, the assembler should output $d $x $d $x instead of $d $d $x $x. This works if .text.0 precedes .text.1 in the linker output, but the other section order might be unexpected. In the worst case where the linker's section order mismatches the assembler's section order (--symbol-ordering-file=, --call-graph-profile-sort, linker scripts), the initial data directives could be mistakenly identified as code. But the following code won't, making this an acceptable risk for certain users.
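The tie-breaking rule is easy to see in a few lines of Python. This sketch assumes .text.0 occupies addresses 0 to 4 and .text.1 starts at address 4 in the linker output; the addresses and the function name are illustrative.

```python
def resolve(symbols):
    # Stable-sort by address only; among symbols that share an address, the
    # one appearing last in the symbol table overrides the earlier ones.
    state_at = {}
    for addr, name in sorted(symbols, key=lambda s: s[0]):
        state_at[addr] = name
    return state_at


# Symbol table order "$d $x $d $x": each section's symbols stay together.
good = [(0, "$d"), (4, "$x"), (4, "$d"), (8, "$x")]
# Symbol table order "$d $d $x $x": all $d symbols first, then all $x.
bad = [(0, "$d"), (4, "$d"), (4, "$x"), (8, "$x")]

print(resolve(good))  # address 4 resolves to $d: .text.1 starts with data
print(resolve(bad))   # address 4 resolves to $x: the data word is misread
```

Both tables contain the same four symbols; only their relative order at address 4 differs, and that alone decides whether the leading data word of .text.1 is disassembled correctly.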

Teaching linkers to scan and insert missing mapping symbols is technically possible but inelegant and impacts performance. There's a strong emphasis on the philosophy of "smart format, dumb linker," which favors keeping the format itself intelligent and minimizing the complexity of the linker.

Ultimately, the proposed alternative scheme effectively addresses symbol table bloat, but requires careful consideration for compliance and interoperability. With this optimization enabled, the remaining symbols would primarily stem from range extension thunks, prebuilt libraries, or highly specialized assembly code.

Mapping symbols for range extension thunks

When lld creates an AArch64 range extension thunk, it defines a $x symbol to signify the A64 state. This symbol is only relevant when the preceding section ends with the data state, a scenario that's only possible with the traditional assembler behavior.

Given the infrequency of range extension thunks, the $x symbol overhead is generally tolerable.

Peculiar alignment behavior in GNU assembler

In contrast to LLVM's integrated assembler, which restricts state transitions to instructions and data directives, GNU assembler introduces additional state transitions for alignments. These alignments can be either implicit (arising from alignment requirements) or explicit (specified through directives). This behavior has led to some interesting edge cases and bug fixes over time. (See related code beside [PATCH][GAS][AARCH64] Fix "align directive causes MAP_DATA symbol to be lost" https://sourceware.org/bugzilla/show_bug.cgi?id=20364)

.section .foo1,"a"
// no $d
.word 0

.section .foo2,"a"
// $d
.balign 4
.word 0

.section .foo3,"a"
// $d
.word 0
// $a
nop

In the example, .foo1 only contains data directives and there is no $d. However, .foo2 includes an alignment directive, triggering the creation of a $d symbol. Interestingly, .foo3 starts with data but ends with an instruction, necessitating both a $d and an $a mapping symbol.

It's worth noting that DWARF sections, typically generated by the compiler, don't include explicit alignment directives. They behave similarly to the .foo1 example and lack an associated $d mapping symbol.

AArch32 ld --be8

The BE-8 mode (byte-invariant addressing big-endian mode) requires the linker to convert big-endian code to little-endian. This is implemented by scanning mapping symbols. See Linker notes on AArch32#--be8 for context.
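As a rough illustration of that scan, here is a Python sketch that reverses A32 instructions as 4-byte words and T32 instructions as 2-byte halfwords while leaving $d ranges untouched. It is a simplified model under those assumptions, not lld's code, and the function name is hypothetical.

```python
def to_be8(section, mapping_symbols):
    """Byte-swap code ranges of a big-endian section, leaving data alone.

    section: bytes as emitted by a big-endian assembler.
    mapping_symbols: (offset, name) pairs sorted by offset.
    """
    out = bytearray(section)
    # Append a sentinel so that the last range ends at the section end.
    bounds = list(mapping_symbols) + [(len(section), None)]
    for (start, name), (end, _) in zip(bounds, bounds[1:]):
        # A32 code is swapped per word, T32 code per halfword.
        width = {"$a": 4, "$t": 2}.get(name.split(".")[0])
        if width is None:
            continue  # $d: data stays big-endian
        for off in range(start, end - width + 1, width):
            out[off:off + width] = out[off:off + width][::-1]
    return bytes(out)


# nop (A32), .long 42, bx lr (T32), all in big-endian byte order.
section = bytes.fromhex("e320f000" "0000002a" "4770")
symbols = [(0, "$a"), (4, "$d"), (8, "$t")]
print(to_be8(section, symbols).hex())
```

The data word 0x0000002a keeps its big-endian layout, while both instructions end up in little-endian byte order.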

RISC-V ISA extension

RISC-V mapping symbols are similar to AArch64, but with a notable extension:

$x<ISA>       | Start of a sequence of instructions with <ISA> extension.
$x<ISA>.<any>

The alternative scheme for optimizing symbol table size can be adapted to accommodate RISC-V's $x<ISA> symbols. The approach remains the same: add an ending $x<ISA> only if the final data in a text section doesn't belong to the desired ISA.

This adaptation works seamlessly as long as all relocatable files provided to the linker share the same baseline ISA. However, in scenarios where the relocatable files are more heterogeneous, a crucial question arises: which state should be restored at section end? Would the subsequent section in the linker output be compiled with different ISA extensions?

Technically, we could teach linkers to insert $x symbols, but scanning each input text section isn't elegant.

Mach-O LC_DATA_IN_CODE load command

In contrast to ELF's symbol pair approach, Mach-O employs the LC_DATA_IN_CODE load command to store non-instruction ranges within code sections. This method is remarkably compact, with each entry requiring only 8 bytes. ELF, on the other hand, needs two symbols ($d and $x) per data region, consuming 48 bytes (in ELFCLASS64) in the symbol table.

struct data_in_code_entry {
uint32_t offset; /* from mach_header to start of data range*/
uint16_t length; /* number of bytes in data range */
uint16_t kind; /* a DICE_KIND_* value */
};

In llvm-project, the possible kind values are defined in llvm/include/llvm/BinaryFormat/MachO.h. I recently refactored the generic MCAssembler to place this Mach-O specific thing, alongside others, to MachObjectWriter.

enum DataRegionType {
// Constants for the "kind" field in a data_in_code_entry structure
DICE_KIND_DATA = 1u,
DICE_KIND_JUMP_TABLE8 = 2u,
DICE_KIND_JUMP_TABLE16 = 3u,
DICE_KIND_JUMP_TABLE32 = 4u,
DICE_KIND_ABS_JUMP_TABLE32 = 5u
};
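The 8-byte versus 48-byte comparison can be checked with Python's struct module. The field layouts follow the data_in_code_entry definition above and the standard Elf64_Sym layout. The example offset and length are made up, and little-endian encoding is assumed.

```python
import struct

# data_in_code_entry: offset (u32), length (u16), kind (u16).
DATA_IN_CODE_ENTRY = struct.Struct("<IHH")
# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
# st_value (u64), st_size (u64).
ELF64_SYM = struct.Struct("<IBBHQQ")

DICE_KIND_DATA = 1  # value from the DataRegionType enum above

# Describe an 8-byte data region at file offset 0x14 (illustrative values).
entry = DATA_IN_CODE_ENTRY.pack(0x14, 8, DICE_KIND_DATA)
macho_cost = len(entry)
# ELF marks the same region with a $d at its start and a $x at its end.
elf_cost = 2 * ELF64_SYM.size

print(macho_cost, elf_cost)
```

The ELF figure does not even count the .strtab bytes for the symbol names, although "$d" and "$x" strings can be shared across all mapping symbols.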

Achieving Mach-O's efficiency in ELF

Given ELF's symbol table bloat due to the st_size member (my previous analysis), how can it attain Mach-O's level of efficiency? Instead of introducing a new format, we can leverage the standard ELF feature: SHF_COMPRESSED.

Both .symtab and .strtab lack the SHF_ALLOC flag, making them eligible for compression without requiring any changes to the ELF specification.

  • LLVM discussion
  • A feature request has already been submitted to binutils to explore this possibility.

The implementation within LLVM shouldn't be overly complex, and I'm more than willing to contribute if there's interest from the community.
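To get a feel for what this would look like, here is a Python sketch that wraps a synthetic .symtab in an Elf64_Chdr header followed by a zlib stream, the layout SHF_COMPRESSED sections use. The symbol table contents are fabricated (1000 identical-looking local symbols in different sections), so the ratio it prints says nothing about real binaries, and the helper names are hypothetical.

```python
import struct
import zlib

ELF64_SYM = struct.Struct("<IBBHQQ")
# Elf64_Chdr: ch_type (u32), ch_reserved (u32), ch_size (u64), ch_addralign (u64).
ELF64_CHDR = struct.Struct("<IIQQ")
ELFCOMPRESS_ZLIB = 1


def compress_section(data, addralign=8):
    # Section content becomes: compression header, then the zlib stream.
    header = ELF64_CHDR.pack(ELFCOMPRESS_ZLIB, 0, len(data), addralign)
    return header + zlib.compress(data)


def decompress_section(blob):
    ch_type, _reserved, ch_size, _addralign = ELF64_CHDR.unpack_from(blob)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError("unsupported compression type")
    data = zlib.decompress(blob[ELF64_CHDR.size:])
    if len(data) != ch_size:
        raise ValueError("ch_size does not match the decompressed size")
    return data


# Synthetic table: 1000 local NOTYPE symbols sharing one name offset,
# each defined at offset 0 of a different section index.
symtab = b"".join(ELF64_SYM.pack(1, 0, 0, 3 + i, 0, 0) for i in range(1000))
blob = compress_section(symtab)
print(len(symtab), len(blob))
```

A consumer would have to decompress the table before indexing into it, which is the main cost to weigh against the size reduction.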

Linker compatibility and the "User-Agent" problem

Author: MaskRay
July 7, 2024 15:00

The output of ld.lld -v includes a message "compatible with GNU linkers" to address the detection mechanism used by GNU Libtool. This problem is described by Software compatibility and our own "User-Agent" problem.

The latest m4/libtool.m4 continues to rely on a GNU check.

[AC_CACHE_CHECK([if the linker ($LD) is GNU ld], lt_cv_prog_gnu_ld,
[# I'd rather use --version here, but apparently some GNU lds only accept -v.
case `$LD -v 2>&1 </dev/null` in
*GNU* | *'with BFD'*)
lt_cv_prog_gnu_ld=yes
;;
*)
lt_cv_prog_gnu_ld=no
;;
esac])
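The shell case patterns above translate directly to glob matching. This Python sketch mimics the check to show why the "compatible with GNU linkers" phrase matters. The function name is hypothetical and the sample strings are illustrative first lines, not captured tool output.

```python
from fnmatch import fnmatchcase


def libtool_thinks_gnu_ld(v_output):
    # Mirrors: case `$LD -v` in *GNU* | *'with BFD'*) ... esac
    return fnmatchcase(v_output, "*GNU*") or fnmatchcase(v_output, "*with BFD*")


for line in (
    "GNU ld (GNU Binutils) 2.42.0",
    "LLD 19.0.0 (compatible with GNU linkers)",
    "LLD 19.0.0",
):
    print(libtool_thinks_gnu_ld(line), line)
```

Without the compatibility phrase, an LLD version line would fall through to the lt_cv_prog_gnu_ld=no branch.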

Check-based configuration can be a valuable tool, ensuring software remains functional in the future. However, this example highlights how overly specific checks can lead to unintended consequences.

If Libtool needs to check whether certain options are available, it can utilize -v.

% ld.bfd -v --whole-archive
GNU ld (GNU Binutils) 2.42.0
% ld.bfd -v --whole-archivex; echo $?
GNU ld (GNU Binutils) 2.42.0
ld.bfd: unrecognized option '--whole-archivex'
ld.bfd: use the --help option for usage information
1

This blog post explores more forms of the "User-Agent" problem exposed by an LLD patch changing the version message format.

LLD supports many object file formats. It largely emulates the behavior of GNU ld for ELF, while emulating the behavior of MSVC link.exe for PE/COFF. Previously, LLD's ELF port displayed the version information like this:

% /tmp/out/custom2/bin/ld.lld --version
LLD 19.0.0 (compatible with GNU linkers)

A recent patch (llvm-project#97323) changed it to one of the following formats, depending on the build-time variable LLVM_APPEND_VC_REV:

With LLVM_APPEND_VC_REV=on:

% /tmp/out/custom2/bin/ld.lld --version
LLD 19.0.0 (git@github.com:llvm/llvm-project.git 0f9fbbb63cfcd2069441aa2ebef622c9716f8dbb), compatible with GNU linkers

With LLVM_APPEND_VC_REV=off:

% /tmp/out/custom2/bin/ld.lld --version
LLD 19.0.0, compatible with GNU linkers

Meson

In Meson, mesonbuild/linkers/detect.py:guess_win_linker checks the --version output to determine whether the LLD invocation is for ELF or PE/COFF. It performed an overly strict check "(compatible with GNU linkers)", which failed when the parentheses were stripped by #97323.

# mesonbuild/linkers/detect.py
if 'LLD' in o.split('\n', maxsplit=1)[0]:
    if '(compatible with GNU linkers)' in o:
        return linkers.LLVMDynamicLinker(
            compiler, for_machine, comp_class.LINKER_PREFIX,
            override, version=search_version(o))
    elif not invoked_directly:
        return linkers.ClangClDynamicLinker(
            for_machine, override, exelist=compiler, prefix=comp_class.LINKER_PREFIX,
            version=search_version(o), direct=False, machine=None)

The latest Meson has loosened the check (meson#13383).
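I have not reproduced the Meson change here. As an assumption, a loosened check could simply drop the parentheses from the substring test. The Python sketch below compares the strict and loosened forms against the three version formats shown earlier; the function names are hypothetical.

```python
def strict_check(o):
    # The original test, which requires the parenthesized phrase.
    return "(compatible with GNU linkers)" in o


def loosened_check(o):
    # Assumed loosened form: accept the phrase with or without parentheses.
    return "compatible with GNU linkers" in o


OUTPUTS = [
    "LLD 19.0.0 (compatible with GNU linkers)",
    "LLD 19.0.0 (git@github.com:llvm/llvm-project.git "
    "0f9fbbb63cfcd2069441aa2ebef622c9716f8dbb), compatible with GNU linkers",
    "LLD 19.0.0, compatible with GNU linkers",
]

for o in OUTPUTS:
    print(strict_check(o), loosened_check(o))
```

Only the first format satisfies the strict test, which is why the two new formats were classified as the PE/COFF flavor.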

It seems that the linker detection has a larger problem that --target= is not taken into account with Clang (#6662).

Linux kernel

The Linux kernel's scripts/ld-version.sh script detects linker versions. Introduced in 2014, it initially checked for GNU ld compatibility with GCC LTO (though LTO support remains unmerged). It was later revamped to handle LLD versions as well. While it can handle suffixes like 2.34-4.fc32, it struggles with versions containing a comma suffix (19.0.0,).

% scripts/ld-version.sh /tmp/out/custom2/bin/ld.lld
scripts/ld-version.sh: line 19: 10000 * 19 + 100 * 0 + 0,: syntax error: operand expected (error token is ",")

The script extracts the version string from the --version output and parses it as major.minor.patch.

# Get the first line of the --version output.
IFS='
'
set -- $(LC_ALL=C "$@" --version)

# Split the line on spaces.
IFS=' '
set -- $1

...

# Some distributions append a package release number, as in 2.34-4.fc32
# Trim the hyphen and any characters that follow.
version=${version%-*}

To support suffixes starting with either - or a comma, the script will employ a POSIX shell trick utilizing the "Remove Largest Suffix Pattern" feature:

version=${version%%[!0-9.]*}
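For readers less familiar with shell parameter expansion, here is a Python equivalent of the trimming step followed by the 10000 * major + 100 * minor + patch computation seen in the error message above. Treating a missing minor or patch component as 0 is my assumption, and the function name is hypothetical.

```python
import re


def canonical_version(token):
    # Analogue of ${version%%[!0-9.]*}: removing the largest suffix that
    # starts with a character other than a digit or dot leaves the longest
    # leading run of digits and dots.
    numeric = re.match(r"[0-9.]*", token).group(0)
    # Pad with empty components so that "2.34" parses as 2.34.0.
    parts = (numeric.split(".") + ["", ""])[:3]
    major, minor, patch = (int(p) if p else 0 for p in parts)
    return 10000 * major + 100 * minor + patch


for token in ("19.0.0,", "2.34-4.fc32", "2.42.0"):
    print(token, canonical_version(token))
```

Because the pattern is anchored on the first character that cannot be part of a version number, it covers both the hyphen and the comma suffix without listing each separator.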

More fun with versions

llvm-nm and llvm-objcopy also claim GNU compatibility.

% /tmp/Rel/bin/llvm-nm --version
llvm-nm, compatible with GNU nm
LLVM (http://llvm.org/):
LLVM version 19.0.0git
Optimized build with assertions.
% /tmp/Rel/bin/llvm-objcopy --version
llvm-objcopy, compatible with GNU objcopy
LLVM (http://llvm.org/):
LLVM version 19.0.0git
Optimized build with assertions.

Ever wondered what the subtle differences are between -v, -V, and --version when using GNU ld? Let's break it down:

  • --version skips linker input processing and displays brief copyright information.
  • -v and -V keep processing command line arguments and performing a linking step. This behavior gives an easy way to check whether an option is supported.
  • -V goes a step further than -v by including a list of supported BFD emulations alongside the version information.

Prior to September 2022, -V in ld.lld used to be an alias for --version. This caused issues when using gcc -v -fuse-ld=lld on certain targets like *-freebsd and powerpc-*: gcc passes -V to the linker, expecting it to process the input files and complete the linking step. However, ld.lld's behavior with -V skipped this process.

I made an adjustment by making -V an alias for -v instead. This ensures that gcc -v -fuse-ld=lld performs the linking step.

GCC has a similar -v and --version behavior, but -V does not exist.

Clang's GNU driver emulates GCC 4.2.1, but you can change the version with -fgnuc-version=.

% clang -E -dM -xc /dev/null | grep GNU
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_STDC_INLINE__ 1
#define __GNUC__ 4
% clang -E -dM -xc /dev/null -fgnuc-version=5.3.2 | grep GNU
#define __GNUC_MINOR__ 3
#define __GNUC_PATCHLEVEL__ 2
#define __GNUC_STDC_INLINE__ 1
#define __GNUC__ 5