Getting Started with State Machine Parsers

Juejin Frontend

Author: 风骨

June 26, 2025, 11:14

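Hi everyone, I'm 风骨. I recently used a state machine parser in a project, so I'm sharing the topic today in the hope that it helps some of you solve similar problems in your own projects.

Let's look at state machines from a frontend perspective and get a handle on their design ideas and use cases.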

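1. Understanding State Machines

A state machine, in full a finite state machine (FSM), is a way of designing programs: analyze the requirements, define up front the states that can occur, then start from the initial state and run that state's handler function to do the state's work and determine the next state. Repeat this until the input is exhausted.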

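In content-parsing scenarios, the core concepts of a state machine are:

State: the states that need to be handled; a state machine has at least two. Taking the <button> tag as an example, three states can be defined: the left angle bracket <, the tag name button, and the right angle bracket >;
State handler: each state has a handler function that does the work of that state. For example, in the tag-name state it collects the tag name button;
Transition: the rule for moving from one state to another. The logic can live in the state handler, which determines the next state. For example, the state that follows the left angle bracket < is tag-name parsing.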
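To make these three concepts concrete, here is a minimal sketch for the <button> example. The state names (Start, TagName, Done), the handlers table, and the readTagName function are illustrative assumptions and are not the code of the parsers built later in this article.

```typescript
// Hypothetical three-state machine that reads the tag name out of "<button>".
type State = "Start" | "TagName" | "Done"

interface Context {
  state: State
  tagName: string
}

// One handler per state: each does the state's work and picks the next state.
const handlers: Record<State, (ctx: Context, char: string) => void> = {
  Start(ctx, char) {
    if (char !== "<") throw new Error(`Expected "<" but got "${char}"`)
    ctx.state = "TagName" // transition: after "<" comes the tag name
  },
  TagName(ctx, char) {
    if (char === ">") {
      ctx.state = "Done" // transition: ">" ends the tag
    } else {
      ctx.tagName += char // work of this state: collect the name
    }
  },
  Done() {
    // Final state: any remaining characters are ignored in this sketch.
  },
}

function readTagName(input: string): string {
  const ctx: Context = { state: "Start", tagName: "" }
  for (let i = 0; i < input.length; i++) {
    handlers[ctx.state](ctx, input[i])
  }
  return ctx.tagName
}

console.log(readTagName("<button>")) // "button"
```

Adding a state in this layout means adding one entry to the handlers table; the driving loop in readTagName stays unchanged.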

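Using a state machine has these advantages:

It encapsulates the transition rules of each state, which keeps the code clear and maintainable and avoids deeply nested if/else branches;
It is highly extensible: adding a state does not touch the core logic; you only add a new state handler function.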

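In frontend development, the state machine approach is often applied to code parsers, for example:

Lexers, similar to how Babel works: parsing a string of code into an array of tokens;
Streaming content extraction: parsing content that AIGC generates as a stream.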

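2. Applying It to a Lexer

2.1 Requirements

The first stage of Babel's compilation process is parsing, which consists of two sub-stages: lexical analysis and syntax analysis. Our requirement is to implement lexical analysis with a state machine.

Take the following JSX code as an example: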

<h1 id="title"><span>hello</span>world</h1>

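Lexical analysis splits it into an array of tokens. The output looks like this:

PS: a token is one of the fragments that lexical analysis splits the source code into; it can be a tag symbol, a name, an attribute, and so on.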

[
  { type: 'LeftParentheses', value: '<' },
  { type: 'JSXIdentifier', value: 'h1' },
  { type: 'AttributeKey', value: 'id' },
  { type: 'AttributeValue', value: '"title"' },
  { type: 'RightParentheses', value: '>' },
  { type: 'LeftParentheses', value: '<' },
  { type: 'JSXIdentifier', value: 'span' },
  { type: 'RightParentheses', value: '>' },
  { type: 'JSXText', value: 'hello' },
  { type: 'LeftParentheses', value: '<' },
  { type: 'BackSlash', value: '/' },
  { type: 'JSXIdentifier', value: 'span' },
  { type: 'RightParentheses', value: '>' },
  { type: 'JSXText', value: 'world' },
  { type: 'LeftParentheses', value: '<' },
  { type: 'BackSlash', value: '/' },
  { type: 'JSXIdentifier', value: 'h1' },
  { type: 'RightParentheses', value: '>' }
]

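2.2 Design Walkthrough

First, based on the input JSX, work out which states we need. The initial state is Start, and the other states map essentially one-to-one to token types: each token type stands for one kind of state.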

enum ParserState {
  Start = "Start", // start
  LeftParentheses = "LeftParentheses", // <
  RightParentheses = "RightParentheses", // >
  JSXIdentifier = "JSXIdentifier", // identifier (tag name)
  AttributeKey = "AttributeKey", // attribute key
  AttributeValue = "AttributeValue", // attribute value
  JSXText = "JSXText", // text content
  BackSlash = "BackSlash", // the slash / of a closing tag (a forward slash, despite the name)
}

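Next, think about how to organize the handling of these states: iterate over every character and let the handler for the current state do the work. The Parser skeleton can therefore be set up like this: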

type Token = { type: ParserState | ""; value: string }

class Parser {
  private tokens: Token[] = []
  private currentToken: Token = { type: "", value: "" }
  // Current state
  private state: ParserState = ParserState.Start
  // Map from each state to its handler
  private handlers: Record<ParserState, (char: string) => void> = {
    [ParserState.Start]: this.handleStart,
    // ... handlers for the other states
  }

  constructor() {}

  private handleStart(char: string) {...}

  public parse(input: string) {
    // Iterate over every character and hand it to the handler for the current state
    for (let char of input) {
      const handler = this.handlers[this.state]
      if (handler) {
        handler.call(this, char)
      } else {
        throw new Error(`Unknown state: ${this.state}`)
      }
    }
    return this.tokens;
  }
}

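Where:

state holds the current state;
handlers holds the handler function for each state.

Finally, let's be clear about what each state handler does. It has two jobs:

Do the work of the current state: create a token;
Based on the character char, work out how to proceed, i.e. determine the next state.

For example, the initial state is state = "Start", and its handler has to match the < character:

1) Create the token for <
2) Set the next state to state = "LeftParentheses"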

private handleStart(char: string) {
  if (char === "<") {
    // The work of this state: emit the token for <
    this.emit({ type: ParserState.LeftParentheses, value: "<" })
    // Determine the next state
    this.state = ParserState.LeftParentheses
  } else {
    // The first character is not <, so throw an error
    throw new Error(`The first character must be <, but got ${char}`)
  }
}

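Moving on, the handler for state = "LeftParentheses" checks whether the character is an ordinary character (a letter or digit); if it is, it belongs to a tag name such as h1.

1) Record the current character
2) Set the next state to state = "JSXIdentifier"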

const LETTERS = /[A-Za-z0-9]/
private handleLeftParentheses(char: string) {
  // If the character is a letter or digit, start collecting the tag name (identifier state)
  if (LETTERS.test(char)) {
    this.currentToken.type = ParserState.JSXIdentifier
    this.currentToken.value += char
    this.state = ParserState.JSXIdentifier
  }
}

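Moving on again, the handler for state = "JSXIdentifier" does the following:

1) If char is an ordinary character (a letter or digit), just record it; the state does not change, so the next character is handled by the same handler;
2) If char is a space, the tag name is complete:
- 1) Create the token for the tag name;
- 2) Set the next state to state = "AttributeKey"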

private handleJSXIdentifier(char: string) {
  if (LETTERS.test(char)) {
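    this.currentToken.value += char // keep collecting the identifier; the state stays the same
  } else if (char === " ") {
    // A space while collecting the identifier ends the tag name; move on to the attribute-key state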
    this.emit(this.currentToken)
    this.state = ParserState.AttributeKey
  }
}

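At this point the tokens for <h1 in the target code have been produced, and the rest follows the same pattern. In short, we keep working out what may come next: a tag name follows <, an attribute name may follow the tag name, = may follow the attribute name, an attribute value may follow =, and so on.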
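The "what may come next" reasoning can also be written down as data. The sketch below lists, for each state, the states its handler can move to, derived by reading the handlers of the full implementation in 2.3 (including its extra TryLeaveAttribute state). The LexerState type, the transitions table, and the canTransition helper are illustrative names that the article's Parser does not define.

```typescript
// Hypothetical transition table mirroring the handlers of the lexer in 2.3.
type LexerState =
  | "Start"
  | "LeftParentheses"
  | "RightParentheses"
  | "JSXIdentifier"
  | "AttributeKey"
  | "AttributeValue"
  | "TryLeaveAttribute"
  | "JSXText"
  | "BackSlash"

const transitions: Record<LexerState, LexerState[]> = {
  Start: ["LeftParentheses"], // "<"
  LeftParentheses: ["JSXIdentifier", "BackSlash"], // tag name or "/"
  JSXIdentifier: ["AttributeKey", "RightParentheses"], // space or ">"
  AttributeKey: ["AttributeValue"], // "="
  AttributeValue: ["TryLeaveAttribute"], // closing quote
  TryLeaveAttribute: ["AttributeKey", "RightParentheses"], // space or ">"
  RightParentheses: ["LeftParentheses", "JSXText"], // "<" or text
  JSXText: ["LeftParentheses"], // "<"
  BackSlash: ["JSXIdentifier"], // closing tag name
}

// Returns true when the handler of `from` can switch the parser to `to`.
function canTransition(from: LexerState, to: LexerState): boolean {
  return transitions[from].indexOf(to) !== -1
}

console.log(canTransition("LeftParentheses", "BackSlash")) // true
console.log(canTransition("JSXText", "AttributeKey")) // false
```

A table like this is only documentation here; the article's parser encodes the same rules inside its handler functions.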

2.3 Full Implementation

// Parser states; some enum values also serve as token types (some map one-to-one)
enum ParserState {
  Start = "Start", // start
  LeftParentheses = "LeftParentheses", // <
  RightParentheses = "RightParentheses", // >
  JSXIdentifier = "JSXIdentifier", // identifier (tag name)
  AttributeKey = "AttributeKey", // attribute key
  AttributeValue = "AttributeValue", // attribute value
  TryLeaveAttribute = "TryLeaveAttribute", // try to leave the attribute; leaves if no further attribute follows
  JSXText = "JSXText", // text content
  BackSlash = "BackSlash", // the slash / of a closing tag (a forward slash, despite the name)
}

// Regex matching a letter or digit
const LETTERS = /[A-Za-z0-9]/

type Token = { type: ParserState | ""; value: string }

class Parser {
  private tokens: Token[] = []
  private currentToken: Token = { type: "", value: "" }
  // Current state
  private state: ParserState = ParserState.Start
  // Map from each state to its handler
  private handlers: Record<ParserState, (char: string) => void> = {
    [ParserState.Start]: this.handleStart,
    [ParserState.LeftParentheses]: this.handleLeftParentheses,
    [ParserState.RightParentheses]: this.handleRightParentheses,
    [ParserState.JSXIdentifier]: this.handleJSXIdentifier,
    [ParserState.AttributeKey]: this.handleAttributeKey,
    [ParserState.AttributeValue]: this.handleAttributeValue,
    [ParserState.TryLeaveAttribute]: this.handleTryLeaveAttribute,
    [ParserState.JSXText]: this.handleJSXText,
    [ParserState.BackSlash]: this.handleBackSlash,
  }

  constructor() {}

  private emit(token: Token) {
    this.tokens.push({ ...token }) // append a copy to tokens
    this.currentToken.type = this.currentToken.value = ""
  }

  private handleStart(char: string) {
    if (char === "<") {
      // The work of this state: emit the token for <
      this.emit({ type: ParserState.LeftParentheses, value: "<" })
      // Determine the next state
      this.state = ParserState.LeftParentheses
    } else {
      // The first character is not <, so throw an error
      throw new Error(`The first character must be <, but got ${char}`)
    }
  }

  private handleLeftParentheses(char: string) {
    // If the character is a letter or digit, start collecting the tag name (identifier state)
    if (LETTERS.test(char)) {
      this.currentToken.type = ParserState.JSXIdentifier
      this.currentToken.value += char
      this.state = ParserState.JSXIdentifier
    } else if (char === "/") {
      // Closing tag, e.g. </h1>
      this.emit({ type: ParserState.BackSlash, value: "/" })
      this.state = ParserState.BackSlash
    }
  }

  private handleJSXIdentifier(char: string) {
    if (LETTERS.test(char)) {
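      this.currentToken.value += char // keep collecting the identifier; the state stays the same
    } else if (char === " ") {
      // A space while collecting the identifier ends the tag name; move on to the attribute-key state
      this.emit(this.currentToken)
      this.state = ParserState.AttributeKey
    } else if (char === ">") {
      // This tag has no attributes to process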
      this.emit(this.currentToken)
      this.emit({ type: ParserState.RightParentheses, value: ">" })
      this.state = ParserState.RightParentheses
    }
  }

  private handleAttributeKey(char: string) {
    if (LETTERS.test(char)) {
      this.currentToken.type = ParserState.AttributeKey
      this.currentToken.value += char // keep collecting the attribute key; the state stays the same
    } else if (char === "=") {
      this.emit(this.currentToken)
      this.state = ParserState.AttributeValue
    }
  }

  private handleAttributeValue(char: string) {
    if (!this.currentToken.value && char === '"') {
      this.currentToken.type = ParserState.AttributeValue
      this.currentToken.value = '"'
    } else if (LETTERS.test(char)) {
      this.currentToken.value += char
    } else if (char === '"') {
      // The attribute value has ended; store the token
      this.currentToken.value += '"'
      this.emit(this.currentToken)
      this.state = ParserState.TryLeaveAttribute // try to leave the attribute; leaves if no further attribute follows
    }
  }

  private handleTryLeaveAttribute(char: string) {
    if (char === " ") {
      // A space means another attribute follows; go back to the attribute-key state
      this.state = ParserState.AttributeKey
    } else if (char === ">") {
      // The opening tag has ended
      this.emit({ type: ParserState.RightParentheses, value: ">" })
      this.state = ParserState.RightParentheses
    }
  }

  private handleRightParentheses(char: string) {
    // On <, enter the tag-start state
    if (char === "<") {
      this.emit({ type: ParserState.LeftParentheses, value: "<" })
      this.state = ParserState.LeftParentheses
    } else {
      // Otherwise treat it as plain text, e.g. world
      this.currentToken.type = ParserState.JSXText
      this.currentToken.value += char
      this.state = ParserState.JSXText
    }
  }

  private handleJSXText(char: string) {
    if (LETTERS.test(char)) {
      this.currentToken.value += char
    } else if (char === "<") {
      // Reached a tag after the text (a sibling tag or a closing tag)
      this.emit(this.currentToken) // { type: JSXText, value: 'world' }
      this.emit({ type: ParserState.LeftParentheses, value: "<" })
      this.state = ParserState.LeftParentheses
    }
  }

  private handleBackSlash(char: string) {
    if (LETTERS.test(char)) {
      this.currentToken.type = ParserState.JSXIdentifier
      this.currentToken.value += char
      this.state = ParserState.JSXIdentifier
    }
  }

  public parse(input: string) {
    // Iterate over every character and hand it to the handler for the current state
    for (let char of input) {
      const handler = this.handlers[this.state]
      if (handler) {
        handler.call(this, char)
      } else {
        throw new Error(`Unknown state: ${this.state}`)
      }
    }
    return this.tokens
  }
}

Let's test it; the output should match the expected result shown in 2.1.

const sourceCode = `<h1 id="title"><span>hello</span>world</h1>`
const parser = new Parser()
console.log(parser.parse(sourceCode))
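As a hedged illustration of how such a token array might be consumed afterwards, the sketch below checks that every opening tag has a matching closing tag. It is far simpler than a real syntax analysis stage and is not part of the article's lexer; the isBalanced function and the sampleTokens data are assumptions made for this example, and the token type names are the ones from the expected output in 2.1.

```typescript
// Hypothetical consumer of the lexer output: a tag-balance check using a stack.
type LexToken = { type: string; value: string }

function isBalanced(tokens: LexToken[]): boolean {
  const openTags: string[] = []
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i].type !== "LeftParentheses") continue
    const next = tokens[i + 1]
    if (next && next.type === "BackSlash") {
      // Closing tag: "<", "/", name. It must match the most recent open tag.
      const name = tokens[i + 2]
      if (!name || openTags.pop() !== name.value) return false
    } else if (next && next.type === "JSXIdentifier") {
      // Opening tag: remember its name.
      openTags.push(next.value)
    } else {
      return false
    }
  }
  return openTags.length === 0
}

// Tokens for <span>hello</span>, in the shape shown in 2.1.
const sampleTokens: LexToken[] = [
  { type: "LeftParentheses", value: "<" },
  { type: "JSXIdentifier", value: "span" },
  { type: "RightParentheses", value: ">" },
  { type: "JSXText", value: "hello" },
  { type: "LeftParentheses", value: "<" },
  { type: "BackSlash", value: "/" },
  { type: "JSXIdentifier", value: "span" },
  { type: "RightParentheses", value: ">" },
]

console.log(isBalanced(sampleTokens)) // true
```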

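3. Applying It to Streaming Content Extraction

3.1 Requirements

AIGC applications are now very common, and AI Q&A interfaces generally output content as a stream.

Suppose we have a platform where AI generates component code, and we need to extract the code blocks from the streamed content and render them into a CodeIDE on the page in real time. How can we implement this?

PS: for a complete piece of content, regular expression matching would do. But here the content arrives as a stream, a few characters at a time, so how do we handle that?

For example, the AI streams the following content: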

The generated components code is as follows:

<ComponentFile fileName="App.tsx" isEntryFile="true">
  import Button from "./Button";
  export const App = () => {
    return (
      <Button>按钮</Button>
    )
  }
</ComponentFile>
<ComponentFile fileName="Button.tsx">
  export const Button = ({ children }) => {
    return (
      <button>{children}</button>
    )
  }
</ComponentFile>

The content contains two components: App and Button
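For comparison, if the whole response above were already available as one string, a regular expression would be enough, as the PS notes. The sketch below shows that non-streaming case; the extractComponentFiles function and the ExtractedFile shape are hypothetical names chosen for this example, and it assumes attribute values are always double-quoted and never contain ">".

```typescript
// Hypothetical non-streaming extraction: works only on complete content.
interface ExtractedFile {
  fileName: string
  isEntryFile: boolean
  content: string
}

function extractComponentFiles(text: string): ExtractedFile[] {
  const files: ExtractedFile[] = []
  // Group 1: raw attribute text, group 2: everything up to the closing tag.
  const tagPattern = /<ComponentFile\b([^>]*)>([\s\S]*?)<\/ComponentFile>/g
  let match: RegExpExecArray | null
  while ((match = tagPattern.exec(text)) !== null) {
    const attrs: Record<string, string> = {}
    const attrPattern = /(\w+)="([^"]*)"/g
    let attr: RegExpExecArray | null
    while ((attr = attrPattern.exec(match[1])) !== null) {
      attrs[attr[1]] = attr[2]
    }
    files.push({
      fileName: attrs.fileName || "",
      isEntryFile: attrs.isEntryFile === "true",
      content: match[2],
    })
  }
  return files
}

const complete = `<ComponentFile fileName="App.tsx" isEntryFile="true">A<Button/></ComponentFile>
<ComponentFile fileName="Button.tsx">B</ComponentFile>`
// Logs the two file names: App.tsx and Button.tsx
console.log(extractComponentFiles(complete).map((f) => f.fileName))
```

This approach cannot report anything until the closing tag has arrived, which is why the streaming case needs the character-by-character state machine described next.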

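While the content is streaming, we want the parser to expose the following matches to the outside through event callbacks:

When <ComponentFile fileName="App.tsx" isEntryFile="true"> is matched, fire the onOpenTag event and expose tagName along with attributes such as fileName and isEntryFile;
When code inside the <ComponentFile> tag is matched, fire the onContent event and expose the parsed code;
When </ComponentFile> is matched, fire the onCloseTag event.

The code that calls the parser looks like this: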

let currentFile: {
  name: string
  isEntryFile: boolean
  content: string
} | null = null

const parser = new StreamParser();

parser.onOpenTag = function ({ name, attrs }) {
  if (name === "ComponentFile") {
    const fileName = attrs.fileName as string
    const isEntryFile = attrs.isEntryFile === "true"
    // Define the component file structure
    currentFile = {
      name: fileName,
      isEntryFile,
      content: "",
    }
    console.log("onOpenTag", name, attrs)
  }
}

parser.onContent = function ({ name }, text) {
  if (name === "ComponentFile" && currentFile) {
    // Collect the file content
    currentFile.content += text
    // TODO... custom file handling, e.g. render the code content into the CodeIDE
  }
}

parser.onCloseTag = function ({ name }) {
  if (name === "ComponentFile" && currentFile) {
    console.log("onCloseTag", name, currentFile)
    // TODO... custom file handling
    currentFile = null
  }
}

Sample printed output:

onOpenTag ComponentFile { fileName: 'App.tsx', isEntryFile: 'true' }
onCloseTag ComponentFile {
  name: 'App.tsx',
  isEntryFile: true,
  content: '\n' +
    '  import Button from "./Button";\n' +
    '  export const App = () => {\n' +
    '    return (\n' +
    '      <Button>按钮</Button>\n' +
    '    )\n' +
    '  }\n'
}
onOpenTag ComponentFile { fileName: 'Button.tsx' }
onCloseTag ComponentFile {
  name: 'Button.tsx',
  isEntryFile: false,
  content: '\n' +
    '  export const Button = ({ children }) => {\n' +
    '    return (\n' +
    '      <button>{children}</button>\n' +
    '    )\n' +
    '  }\n'
}

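3.2 Design Walkthrough

First, based on the streamed input, work out roughly which states we need. In this scenario the initial state is TEXT, which handles ordinary text (such as the text at the beginning and end). The other states are used to match the <ComponentFile> tag: the tag symbols < and >, the tag name, attribute names, attribute values, the code content, and so on.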

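// Parser state enum
enum ParserState {
  TEXT, // 1) initial state: ordinary text outside tags (such as the text at the beginning and end)
  TAG_OPEN, // 2) tag start: just saw <
  TAG_NAME, // 3) parsing the tag name, e.g. ComponentFile
  ATTR_NAME_START, // 4) about to parse an attribute name
  ATTR_NAME, // 5) parsing an attribute name, e.g. fileName
  ATTR_VALUE_START, // 6) attribute value start: waiting for " or '
  ATTR_VALUE, // 7) parsing an attribute value, e.g. App.tsx
  CONTENT, // 8) the code content to extract, e.g. import Button from "./Button";
  CONTENT_POTENTIAL_END, // 9) saw a possible closing tag inside the code content
  CLOSING_TAG_OPEN, // 10) closing tag: saw </
  CLOSING_TAG_NAME, // 11) parsing the closing tag name
  SELF_CLOSING_START, // 12) self-closing tag start: saw /
}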

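Next, think about how to parse the streamed content: append incoming chunks to a buffer as they arrive, and use a position pointer to walk through the characters in the buffer.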

class StreamParser {
  private buffer: string = "" // buffer holding the content still to be parsed
  private position: number = 0 // current parse position in the buffer
  // ...

  // Current state
  private state: ParserState = ParserState.TEXT
  // Map from each state to its handler
  private handlers: Record<ParserState, (char: string) => void> = {
    [ParserState.TEXT]: this.handleTextState,
    // ...
  }
  
  public write(chunk: string) {
    this.buffer += chunk // append to the buffer
    this.parseBuffer()
  }

  private parseBuffer() {
    while (this.position < this.buffer.length) {
      const char = this.buffer[this.position]
      const handler = this.handlers[this.state]
      if (handler) {
        handler.call(this, char)
      } else {
        throw new Error(`Unknown state: ${this.state}`)
      }
      // Move on to the next character
      this.position++
    }

    // If any content has been collected, notify the onContent callback
    this.sendPendingContent()
    // ...
  }
}

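Finally, implement the state handlers: each one does the work of its state and determines the next state.

For example, the initial state is state = "TEXT". Its handler matches the < character and sets the next state to state = "TAG_OPEN", which begins parsing the <ComponentFile> tag: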

private handleTextState(char: string) {
  if (char === "<") {
    // Start of a new tag
    this.state = ParserState.TAG_OPEN
  }
  // In the text state, all other characters (outside code blocks) are ignored
}

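The rest of the flow is largely the same as in the lexer: match the tag name, attribute names, attribute values, and so on in turn. The additional logic is that the onOpenTag event fires once an opening tag (such as <ComponentFile>) has been fully matched, and the onCloseTag event fires once a closing tag (such as </ComponentFile>) has been fully matched.

Finally, let's focus on the onContent event and how content is matched:

The parser's main purpose is to extract the component code in <ComponentFile>component code</ComponentFile>. Once an opening tag with tagName = ComponentFile has been matched, the parser enters the ParserState.CONTENT state and starts collecting content.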

private parseAsContentTags: string[] = ["ComponentFile"]

handleOpenTag() {
  const tagName = this.currentTagName
  // ...
  // Check whether this is a <ComponentFile> tag
  if (this.parseAsContentTags.includes(tagName)) {
    this.state = ParserState.CONTENT // start collecting content
  } else {
    this.state = ParserState.TEXT
  }
}

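Nested tags encountered during collection (such as <Button>) do not enter the tag-parsing flow; they are simply appended to the content. The ParserState.CONTENT state ends only when the </ComponentFile> tag is matched, at which point the complete code inside the <ComponentFile> tag has been collected.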

private handleContentState(char: string) {
  if (char === "<") {
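    // Enter the potential-closing-tag state
    this.sendPendingContent()
    this.pendingClosingTag = "<" // start collecting the possible closing tag
    this.potentialEndTagMatchPos = 1 // "<" is matched; the next character should be "/"
    this.state = ParserState.CONTENT_POTENTIAL_END // switch state
  } else {
    // Keep accumulating content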
    this.pendingContent += char
  }
}

private handleContentPotentialEndState(char: string) {
  // Pseudocode: matchedClosingTag stands for the closing tag matched so far
  if (matchedClosingTag === "</ComponentFile>") {
    this.onCloseTag?.(curTagData)
    this.state = ParserState.TEXT // back to the initial text state
  }
}

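As the code above shows, onContent can fire at two points: 1) when the current buffer has been fully parsed, and 2) each time a < is encountered while parsing content, because it may be the start of the closing tag.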
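The tricky part of this step is that the closing tag can be split across chunks. The following self-contained sketch isolates just that idea: it matches the end tag one character at a time and keeps the match position between write calls. EndTagScanner and its members are hypothetical names, not the article's StreamParser, and unlike the article's code it flushes content only at the end of each chunk. The simple restart on mismatch relies on the assumption that "<" occurs only at the start of the end tag.

```typescript
// Hypothetical helper: collects content until endTag, even if endTag is split across chunks.
class EndTagScanner {
  private readonly endTag: string
  private readonly onContent: (text: string) => void
  private matched = 0 // number of endTag characters matched so far
  private pending = "" // content collected but not yet reported
  public closed = false // becomes true once endTag has been fully matched

  constructor(endTag: string, onContent: (text: string) => void) {
    this.endTag = endTag
    this.onContent = onContent
  }

  public write(chunk: string): void {
    for (let i = 0; i < chunk.length && !this.closed; i++) {
      const char = chunk[i]
      if (char === this.endTag[this.matched]) {
        this.matched++
        if (this.matched === this.endTag.length) this.closed = true
      } else {
        // Mismatch: the partial match was ordinary content after all.
        this.pending += this.endTag.slice(0, this.matched)
        // The current character may itself start a new match.
        this.matched = char === this.endTag[0] ? 1 : 0
        if (this.matched === 0) this.pending += char
      }
    }
    this.flush()
  }

  private flush(): void {
    if (!this.pending) return
    this.onContent(this.pending)
    this.pending = ""
  }
}

const pieces: string[] = []
const scanner = new EndTagScanner("</ComponentFile>", (text) => {
  pieces.push(text)
})
for (const chunk of ["ab<", "/Comp", "x</Compon", "entFile>tail"]) {
  scanner.write(chunk)
}
console.log(pieces.join(""), scanner.closed) // "ab</Compx" true
```

Because the match position lives on the instance rather than in a local variable, a chunk boundary in the middle of the closing tag needs no special handling.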

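3.3 Full Implementation

// Parser state enum
enum ParserState {
  TEXT, // 1) initial state: ordinary text outside tags (such as the text at the beginning and end)
  TAG_OPEN, // 2) tag start: just saw <
  TAG_NAME, // 3) parsing the tag name, e.g. ComponentFile
  ATTR_NAME_START, // 4) about to parse an attribute name
  ATTR_NAME, // 5) parsing an attribute name, e.g. fileName
  ATTR_VALUE_START, // 6) attribute value start: waiting for " or '
  ATTR_VALUE, // 7) parsing an attribute value, e.g. App.tsx
  CONTENT, // 8) the code content to extract, e.g. import Button from "./Button";
  CONTENT_POTENTIAL_END, // 9) saw a possible closing tag inside the code content
  CLOSING_TAG_OPEN, // 10) closing tag: saw </
  CLOSING_TAG_NAME, // 11) parsing the closing tag name
  SELF_CLOSING_START, // 12) self-closing tag start: saw /
}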

interface TagData {
  name: string
  attrs: Record<string, string>
}

class StreamParser {
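  private buffer: string = "" // buffer holding the content still to be parsed
  private position: number = 0 // current parse position in the buffer
  private tagStack: TagData[] = [] // stack of the tags that are currently open
  private currentTagName: string = "" // tag name being parsed
  private currentAttrs: TagData["attrs"] = {}
  private currentAttrName: string = "" // attribute name being parsed
  private currentAttrValue: string = "" // attribute value being parsed
  private attrQuoteChar: string = "" // quote character that opened the current attribute value, used to detect its end

  // Content tags: only the content of tags in this list is collected and exposed as raw content through the onContent event
  private parseAsContentTags: string[] = ["ComponentFile"]
  // Potential closing tag collected so far (not yet complete)
  private pendingClosingTag: string = ""
  // Raw content that has not been sent yet
  private pendingContent: string = ""
  // Current match position within the potential closing tag
  private potentialEndTagMatchPos: number = 0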

  // Event handlers
  public onOpenTag: ((tagData: TagData) => void) | null = null
  public onCloseTag: ((tagData: TagData) => void) | null = null
  public onContent: ((tagData: TagData, content: string) => void) | null = null

  // Current state
  private state: ParserState = ParserState.TEXT
  // Map from each state to its handler
  private handlers: Record<ParserState, (char: string) => void> = {
    [ParserState.TEXT]: this.handleTextState,
    [ParserState.TAG_OPEN]: this.handleTagOpenState,
    [ParserState.TAG_NAME]: this.handleTagNameState,
    [ParserState.CLOSING_TAG_OPEN]: this.handleClosingTagOpenState,
    [ParserState.CLOSING_TAG_NAME]: this.handleClosingTagNameState,
    [ParserState.ATTR_NAME_START]: this.handleAttrNameStartState,
    [ParserState.ATTR_NAME]: this.handleAttrNameState,
    [ParserState.ATTR_VALUE_START]: this.handleAttrValueStartState,
    [ParserState.ATTR_VALUE]: this.handleAttrValueState,
    [ParserState.SELF_CLOSING_START]: this.handleSelfClosingStartState,
    [ParserState.CONTENT]: this.handleContentState,
    [ParserState.CONTENT_POTENTIAL_END]: this.handleContentPotentialEndState,
  }

  public write(chunk: string) {
    this.buffer += chunk // append to the buffer
    this.parseBuffer()
  }

  private parseBuffer() {
    while (this.position < this.buffer.length) {
      const char = this.buffer[this.position]
      const handler = this.handlers[this.state]
      if (handler) {
        handler.call(this, char)
      } else {
        throw new Error(`Unknown state: ${this.state}`)
      }
      // Move on to the next character
      this.position++
    }

    // If any content has been collected, notify the onContent callback
    this.sendPendingContent()

    // All characters are processed; reset the buffer
    if (this.position >= this.buffer.length) {
      this.buffer = ""
      this.position = 0
    }
  }

  private getCurrentHandlingTagData(): TagData {
    return this.tagStack[this.tagStack.length - 1]
  }

  // Helper: whether the character is whitespace
  private isWhitespace(char: string): boolean {
    return char === " " || char === "\t" || char === "\n" || char === "\r"
  }

  private isValidNameChar(char: string): boolean {
    return /[A-Za-z0-9]/.test(char)
  }

  private sendPendingContent() {
    if (this.state !== ParserState.CONTENT || !this.pendingContent) return
    this.onContent?.(this.getCurrentHandlingTagData(), this.pendingContent)
    this.pendingContent = ""
  }

  private resetCurrentTagData(): void {
    this.currentTagName = ""
    this.currentAttrs = {}
    this.currentAttrName = ""
    this.currentAttrValue = ""
    this.attrQuoteChar = ""
  }

  private handleTextState(char: string) {
    if (char === "<") {
      // Start of a new tag
      this.state = ParserState.TAG_OPEN
    }
    // In the text state, all other characters (outside code blocks) are ignored
  }

  private handleTagOpenState(char: string) {
    if (char === "/") {
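      // This is a closing tag </tag>
      this.state = ParserState.CLOSING_TAG_OPEN
    } else if (this.isValidNameChar(char)) {
      // Start collecting the tag name
      this.currentTagName = char
      this.state = ParserState.TAG_NAME
    } else {
      // "<" must be followed by a tag name or "/"; anything else is not a tag,
      // so go back to the text state instead of staying stuck in TAG_OPEN
      this.state = ParserState.TEXT
    }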
  }

  private handleTagNameState(char: string) {
    if (this.isWhitespace(char)) {
      // Whitespace after the tag name: get ready to parse attributes
      this.state = ParserState.ATTR_NAME_START
    } else if (char === ">") {
      // The tag ends with no attributes
      this.handleOpenTag()
    } else if (char === "/") {
      // Possibly a self-closing tag
      this.state = ParserState.SELF_CLOSING_START
    } else {
      // Keep collecting the tag name
      this.currentTagName += char
    }
  }

  private handleAttrNameStartState(char: string) {
    if (this.isValidNameChar(char)) {
      // Start collecting the attribute name
      this.currentAttrName = char
      this.state = ParserState.ATTR_NAME
    } else if (char === ">") {
      // No more attributes; the tag ends
      this.handleOpenTag()
    } else if (char === "/") {
      // Self-closing tag
      this.state = ParserState.SELF_CLOSING_START
    }
    // Ignore extra whitespace
  }

  private handleAttrNameState(char: string) {
    if (char === "=") {
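      // Reached =, so the attribute name is complete
      this.state = ParserState.ATTR_VALUE_START
    } else if (char === ">") {
      // Boolean attribute with no value
      this.currentAttrs[this.currentAttrName] = "true"
      this.handleOpenTag()
    } else if (char === "/") {
      // Boolean attribute right before a self-closing tag; store "true" as in the branch above
      this.currentAttrs[this.currentAttrName] = "true"
      this.state = ParserState.SELF_CLOSING_START
    } else {
      // Keep collecting the attribute name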
      this.currentAttrName += char
    }
  }

  private handleAttrValueStartState(char: string) {
    if (char === '"' || char === "'") {
      // The attribute value starts
      this.attrQuoteChar = char
      this.currentAttrValue = ""
      this.state = ParserState.ATTR_VALUE
    }
    // Ignore whitespace between = and the quote
  }

  private handleAttrValueState(char: string) {
    if (this.attrQuoteChar && char === this.attrQuoteChar) {
      // Closing quote: the attribute value is complete
      this.currentAttrs[this.currentAttrName] = this.currentAttrValue
      this.currentAttrName = ""
      this.currentAttrValue = ""
      this.state = ParserState.ATTR_NAME_START
    } else {
      // Keep collecting the attribute value
      this.currentAttrValue += char
    }
  }

  private handleClosingTagOpenState(char: string) {
    if (this.isValidNameChar(char)) {
      // Start collecting the closing tag name
      this.currentTagName = char
      this.state = ParserState.CLOSING_TAG_NAME
    }
  }

  private handleClosingTagNameState(char: string) {
    if (char === ">") {
      // The closing tag is complete
      this.handleCloseTag()
      this.currentTagName = ""
    } else if (!this.isWhitespace(char)) {
      // Keep collecting the tag name
      this.currentTagName += char
    }
    // Ignore whitespace between the closing tag name and >
  }

  private handleSelfClosingStartState(char: string): void {
    if (char === ">") {
      // Handle the self-closing tag
      const tagData: TagData = {
        name: this.currentTagName,
        attrs: this.currentAttrs,
      }
      // Fire both the open-tag and close-tag callbacks
      this.onOpenTag?.(tagData)
      this.onCloseTag?.(tagData)
      this.resetCurrentTagData()
      this.state = ParserState.TEXT
    }
  }

  private handleContentState(char: string) {
    if (char === "<") {
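      // Enter the potential-closing-tag state
      this.sendPendingContent()
      this.pendingClosingTag = "<" // start collecting the possible closing tag
      this.potentialEndTagMatchPos = 1 // "<" is matched; the next character should be "/"
      this.state = ParserState.CONTENT_POTENTIAL_END // switch state
    } else {
      // Keep accumulating content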
      this.pendingContent += char
    }
  }

  private handleContentPotentialEndState(char: string) {
    const curTagData = this.getCurrentHandlingTagData()

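    // Match the potential closing tag one character at a time
    const expectedEndTag = `</${curTagData.name}>` // the expected closing tag

    // Check whether the current character matches the expected one
    if (char === expectedEndTag[this.potentialEndTagMatchPos]) {
      // Match: advance the match position
      this.pendingClosingTag += char
      this.potentialEndTagMatchPos++

      // Check whether the closing tag has been matched completely
      if (this.potentialEndTagMatchPos === expectedEndTag.length) {
        // Complete match: fire the close-tag callback and reset the tag data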
        this.onCloseTag?.(curTagData)
        this.resetCurrentTagData()

        // Remove it from the tag stack
        if (
          this.tagStack.length > 0 &&
          this.tagStack[this.tagStack.length - 1].name === curTagData.name
        ) {
          this.tagStack.pop()
        }

        // Check whether the parent tag is a raw-content tag
        if (this.tagStack.length > 0) {
          const parentTag = this.tagStack[this.tagStack.length - 1]
          if (this.parseAsContentTags.includes(parentTag.name)) {
            this.state = ParserState.CONTENT
          } else {
            this.state = ParserState.TEXT
          }
        } else {
          this.state = ParserState.TEXT
        }

        // Reset the match state
        this.pendingClosingTag = ""
        this.potentialEndTagMatchPos = 0
      }
    } else {
      // No match: go back to the CONTENT state
      // Treat the collected pendingClosingTag plus the current char as content
      this.pendingContent += this.pendingClosingTag + char
      // Reset the match state
      this.pendingClosingTag = ""
      this.potentialEndTagMatchPos = 0
      this.state = ParserState.CONTENT
    }
  }

  handleOpenTag() {
    const tagName = this.currentTagName
    const tagData: TagData = { name: tagName, attrs: this.currentAttrs }
    // Fire the open-tag callback
    this.onOpenTag?.(tagData)
    // Push onto the tag stack
    this.tagStack.push(tagData)
    // Reset the data of the current tag
    this.currentTagName = ""
    this.currentAttrs = {}
    // Check whether this is a raw-content tag
    if (this.parseAsContentTags.includes(tagName)) {
      this.state = ParserState.CONTENT // start collecting content
    } else {
      this.state = ParserState.TEXT
    }
  }

  handleCloseTag() {
    const tagName = this.currentTagName
    // Fire the close-tag callback
    this.onCloseTag?.({ name: tagName, attrs: this.currentAttrs })
    this.resetCurrentTagData()
    // Remove it from the tag stack
    if (
      this.tagStack.length > 0 &&
      this.tagStack[this.tagStack.length - 1].name === tagName
    ) {
      this.tagStack.pop()
    }
    // Check whether the parent tag is a raw-content tag
    if (this.tagStack.length > 0) {
      const parentTag = this.tagStack[this.tagStack.length - 1]
      if (this.parseAsContentTags.includes(parentTag.name)) {
        this.state = ParserState.CONTENT
      }
    }
    if (this.state !== ParserState.CONTENT) {
      this.state = ParserState.TEXT
    }
  }
}

Let's use a timer to simulate AI-generated streaming content; the output should match the expected result shown in 3.1.

let currentFile: {
  name: string
  isEntryFile: boolean
  content: string
} | null = null

const parser = new StreamParser()

parser.onOpenTag = function ({ name, attrs }) {
  if (name === "ComponentFile") {
    const fileName = attrs.fileName as string
    const isEntryFile = attrs.isEntryFile === "true"
    // Define the component file structure
    currentFile = {
      name: fileName,
      isEntryFile,
      content: "",
    }
    console.log("onOpenTag", name, attrs)
  }
}

parser.onContent = function ({ name }, text) {
  if (name === "ComponentFile" && currentFile) {
    // Collect the file content
    currentFile.content += text
    // console.log("onContent", name, text)
  }
}

parser.onCloseTag = function ({ name }) {
  if (name === "ComponentFile" && currentFile) {
    console.log("onCloseTag", name, currentFile)
    // TODO... custom file handling
    currentFile = null
  }
}

const content = `
The generated components code is as follows:

<ComponentFile fileName="App.tsx" isEntryFile>
  import Button from "./Button";
  export const App = () => {
    return (
      <Button>按钮</Button>
    )
  }
</ComponentFile>
<ComponentFile fileName="Button.tsx">
  export const Button = ({ children }) => {
    return (
      <button>{children}</button>
    )
  }
</ComponentFile>

The content contains two components: App and Button
`

let index = 0
function typeWriter() {
  if (index < content.length) {
    const text = content.slice(index, index + 10)
    index += 10
    parser.write(text)
    setTimeout(typeWriter, 100)
  }
}
typeWriter()

Wrapping Up

Thanks for reading! If you found this article useful, a like would be much appreciated~
