Reading the Claude Code Source: the Read Tool and the @-Mention Attachment Mechanism

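Claude Code reads all files through FileReadTool. When the user @-mentions a file or directory, it is first converted into an attachment and then, just before the request is sent to the model, rebuilt into a set of synthetic context messages.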

`Read`

Schema

Input:

z.strictObject({
  file_path: z.string().describe('The absolute path to the file to read'),
  offset: z.number().int().nonnegative().optional(),
  limit: z.number().int().positive().optional(),
  pages: z.string().optional(),
})

Output:

z.discriminatedUnion('type', [
  z.object({ type: z.literal('text'), ... }),
  z.object({ type: z.literal('image'), ... }),
  z.object({ type: z.literal('notebook'), ... }),
  z.object({ type: z.literal('pdf'), ... }),
  z.object({ type: z.literal('parts'), ... }),
  z.object({ type: z.literal('file_unchanged'), ... }),
])

The file_unchanged variant is the dedup stub that Read returns to save context tokens.

Prompt

Read is the single entry point for viewing files: text, images, PDFs, and notebooks all go through it.

Usage:
- The file_path parameter must be an absolute path, not a relative path
- By default, it reads up to ${MAX_LINES_TO_READ} lines starting from the beginning of the file${maxSizeInstruction}
${offsetInstruction}
${lineFormat}
- This tool allows Claude Code to read images (eg PNG, JPG, etc). When reading an image file the contents are presented visually as Claude Code is a multimodal LLM.
- This tool can read PDF files (.pdf). For large PDFs (more than 10 pages), you MUST provide the pages parameter to read specific page ranges (e.g., pages: "1-5"). Reading a large PDF without the pages parameter will fail. Maximum 20 pages per request.
- This tool can read Jupyter notebooks (.ipynb files) and returns all cells with their outputs, combining code, text, and visualizations.
- This tool can only read files, not directories. To read a directory, use an ls command via the ${BASH_TOOL_NAME} tool.
- You will regularly be asked to read screenshots. If the user provides a path to a screenshot, ALWAYS use this tool to view the file at the path. This tool will work with all temporary file paths.
- If you read a file that exists but has empty contents you will receive a system reminder warning in place of file contents.

By default it reads at most 2000 lines:

export const MAX_LINES_TO_READ = 2000

The system prompt repeatedly reminds the model to read files with Read rather than through the shell.

`To read files use ${FILE_READ_TOOL_NAME} instead of cat, head, tail, or sed`,

Bash prompt:

`Read files: Use ${FILE_READ_TOOL_NAME} (NOT cat/head/tail)`,

PowerShell prompt:

- Read files: Use ${FILE_READ_TOOL_NAME} (NOT Get-Content)

Input validation

The PDF page range is validated first:

// Validate pages parameter (pure string parsing, no I/O)

Next the path is expanded and checked against the deny rules:

// Path expansion + deny rule check (no I/O)

On Windows there is dedicated protection for UNC paths:

// SECURITY: UNC path check (no I/O) — defer filesystem operations
// until after user grants permission to prevent NTLM credential leaks

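In other words, for a path like \\server\share\..., no filesystem operation happens until the user has granted permission.

After that, binary files are rejected by extension, with PDFs and images exempted: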

// Binary extension check (string check on extension only, no I/O).
// PDF, images, and SVG are excluded - this tool renders them natively.

Finally, device files that would hang the read are blocked:

// Block specific device files that would hang (infinite output or blocking input).
// This is a path-based check with no I/O — safe special files like /dev/null are allowed.

The pattern is clear: run the cheap checks and the security checks first, and touch the disk as late as possible.
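The ordering above can be sketched as a chain of pure string checks. This is a minimal sketch, not the real FileReadTool code: `precheckRead`, the `Precheck` result shape, and both lists are hypothetical, and the deny-rule step is left out because the article does not show it.

```typescript
// Sketch only: names, lists, and result shape are assumptions.
type Precheck =
  | { kind: "ok" }
  | { kind: "defer"; reason: string } // let permission handling run before any I/O
  | { kind: "reject"; reason: string };

// Illustrative lists. PDFs and images are simply absent from the binary list.
const BINARY_EXTENSIONS = new Set([".exe", ".zip", ".so", ".bin"]);
const BLOCKED_DEVICES = new Set(["/dev/zero", "/dev/random", "/dev/stdin"]);

function extensionOf(path: string): string {
  const lastSep = Math.max(path.lastIndexOf("/"), path.lastIndexOf("\\"));
  const base = path.slice(lastSep + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Every step is string inspection; nothing here opens, stats, or resolves the path.
function precheckRead(filePath: string, pages?: string): Precheck {
  if (pages !== undefined && !/^\d+(-\d+)?$/.test(pages)) {
    return { kind: "reject", reason: `invalid pages value: ${pages}` };
  }
  if (filePath.startsWith("\\\\") || filePath.startsWith("//")) {
    return { kind: "defer", reason: "UNC path: no filesystem access before permission" };
  }
  if (BINARY_EXTENSIONS.has(extensionOf(filePath))) {
    return { kind: "reject", reason: "binary file" };
  }
  if (BLOCKED_DEVICES.has(filePath)) {
    return { kind: "reject", reason: "device file would hang" };
  }
  return { kind: "ok" };
}

console.log(precheckRead("\\\\server\\share\\notes.txt").kind); // "defer"
```

Because none of these checks performs I/O, they are safe to run before the permission prompt, which is the point of the UNC comment quoted above.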

Dedup comes first

Content that has already been read is not sent again, which saves prompt tokens.

// Dedup: if we've already read this exact range and the file hasn't
// changed on disk, return a stub instead of re-sending the full content.
// The earlier Read tool_result is still in context — two full copies
// waste cache_creation tokens on every subsequent turn. BQ proxy shows
// ~18% of Read calls are same-file collisions (up to 2.64% of fleet
// cache_creation). Only applies to text/notebook reads — images/PDFs
// aren't cached in readFileState so won't match here.

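In other words, Claude Code remembers, for each earlier read:

which file
which offset / limit
the timestamp at that time

If the next call asks for the same file and the same range, and the mtime on disk has not changed, the content is not put into the context again. Read returns this instead: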

type: 'file_unchanged'

The stub text is:

'File unchanged since last read. The content from the earlier Read tool_result in this conversation is still current — refer to that instead of re-reading.'
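The dedup decision can be modeled in a few lines. This is a simplified sketch under stated assumptions: `readWithDedup`, `ReadRecord`, and the `load` callback are hypothetical names, and the current mtime is passed in as a parameter so the example stays free of filesystem calls.

```typescript
// Sketch only: a simplified model of the dedup decision, not the real implementation.
interface ReadRecord {
  timestamp: number; // floored mtime at the time of the read
  offset: number | undefined;
  limit: number | undefined;
}

type ReadResult =
  | { type: "text"; content: string }
  | { type: "file_unchanged" };

function readWithDedup(
  state: Map<string, ReadRecord>,
  path: string,
  offset: number | undefined,
  limit: number | undefined,
  mtimeMs: number, // current mtime on disk, supplied by the caller
  load: () => string, // hypothetical loader for the requested range
): ReadResult {
  const mtime = Math.floor(mtimeMs);
  const prev = state.get(path);
  // Same file, same range, same mtime: the earlier tool_result is still valid.
  if (prev && prev.offset === offset && prev.limit === limit && prev.timestamp === mtime) {
    return { type: "file_unchanged" };
  }
  const content = load();
  state.set(path, { timestamp: mtime, offset, limit });
  return { type: "text", content };
}
```

A changed mtime or a different offset / limit misses the check and triggers a normal read, which then overwrites the record for that path.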

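Dispatch by file type

notebook

.ipynb files take a dedicated branch that parses out the cells. If the notebook is too large, Read fails with an error that suggests reading only part of it: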

`Notebook content (${formatFileSize(cellsJsonBytes)}) exceeds maximum allowed size (${formatFileSize(maxSizeBytes)}). ` +
  `Use ${BASH_TOOL_NAME} with jq to read specific portions:\n` +
  `  cat "${file_path}" | jq '.cells[:20]' # First 20 cells\n` +
  `  cat "${file_path}" | jq '.cells[100:120]' # Cells 100-120\n` +
  `  cat "${file_path}" | jq '.cells | length' # Count total cells\n` +
  `  cat "${file_path}" | jq '.cells[] | select(.cell_type=="code") | .source' # All code sources`

image

Images rely on the model's native multimodal support: the file is sent as a base64 content block of type image.

/**
 * Reads an image file and applies token-based compression if needed.
 * Reads the file ONCE, then applies standard resize. If the result exceeds
 * the token limit, applies aggressive compression from the same buffer.
 */

// Read file ONCE — capped to maxBytes to avoid OOM on huge files

PDF

PDFs have two paths.

If the caller passes pages, the specified pages are extracted and converted to images:

// Extracted page images are read and sent as image blocks in mapToolResultToAPIMessage

If pages is omitted and the PDF has too many pages, the call fails:

`This PDF has ${pageCount} pages, which is too many to read at once. ` +
  `Use the pages parameter to read specific page ranges (e.g., pages: "1-5"). ` +
  `Maximum ${PDF_MAX_PAGES_PER_READ} pages per request.`

A PDF read without pages is sent whole, using the upload API supported on Anthropic's server side.

text

Two limits apply: a byte cap and a token cap.

/**
 * Read tool output limits.  Two caps apply to text reads:
 *
 *   | limit         | default | checks                    | cost          | on overflow     |
 *   |---------------|---------|---------------------------|---------------|-----------------|
 *   | maxSizeBytes  | 256 KB  | TOTAL FILE SIZE (not out) | 1 stat        | throws pre-read |
 *   | maxTokens     | 25000   | actual output tokens      | API roundtrip | throws post-read|
 *
 * Known mismatch: maxSizeBytes gates on total file size, not the slice.
 * Tested truncating instead of throwing for explicit-limit reads that
 * exceed the byte cap (#21841, Mar 2026).  Reverted: tool error rate
 * dropped but mean tokens rose — the throw path yields a ~100-byte error
 * tool-result while truncation yields ~25K tokens of content at the cap.
 */

export const DEFAULT_MAX_OUTPUT_TOKENS = 25000
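The two caps in the table differ in when they can be checked. A minimal sketch of that ordering follows, with the 256 KB and 25000 defaults taken from the comment above. `readTextWithCaps`, `readSlice`, `countTokens`, and `roughTokens` are hypothetical names, and the 4-characters-per-token estimator is only a stand-in for what the comment describes as an API roundtrip.

```typescript
// Sketch only: the tokenizer and error messages are stand-ins.
const MAX_SIZE_BYTES = 256 * 1024;
const MAX_TOKENS = 25000;

function readTextWithCaps(
  totalFileSizeBytes: number,
  readSlice: () => string,
  countTokens: (text: string) => number,
): string {
  // Cap 1: one stat, compared against the whole file, before any content is read.
  if (totalFileSizeBytes > MAX_SIZE_BYTES) {
    throw new Error(`file is ${totalFileSizeBytes} bytes, over the ${MAX_SIZE_BYTES} byte cap`);
  }
  const text = readSlice();
  // Cap 2: compared against the actual output, so it is only known after reading.
  const tokens = countTokens(text);
  if (tokens > MAX_TOKENS) {
    throw new Error(`output is ${tokens} tokens, over the ${MAX_TOKENS} token cap`);
  }
  return text;
}

// Crude stand-in estimator (about 4 characters per token).
const roughTokens = (text: string): number => Math.ceil(text.length / 4);
```

The first cap costs almost nothing but is coarse, since it looks at total file size rather than the requested slice, which is the mismatch the source comment acknowledges.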

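The actual read:

Small files are read into memory in one piece and then split into lines, which is faster
Large files, pipes, and device files are streamed, keeping only the requested lines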

// readFileInRange — line-oriented file reader with two code paths
//
// Returns lines [offset, offset + maxLines) from a file.
//
// Fast path (regular files < 10 MB):
//   Opens the file, stats the fd, reads the whole file with readFile(),
//   then splits lines in memory.  This avoids the per-chunk async overhead
//   of createReadStream and is ~2x faster for typical source files.
//
// Streaming path (large files, pipes, devices, etc.):
//   Uses createReadStream with manual indexOf('\n') scanning.  Content is
//   only accumulated for lines inside the requested range — lines outside
//   the range are counted (for totalLines) but discarded, so reading line
//   1 of a 100 GB file won't balloon RSS.

The threshold:

const FAST_PATH_MAX_SIZE = 10 * 1024 * 1024 // 10 MB

The streaming path also handles the case of a large file with one extremely long line:

// Only keep the trailing fragment when inside the selected range.
// Outside the range we just count newlines — discarding prevents
// unbounded memory growth on huge single-line files.
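The streaming idea can be shown without any I/O by feeding chunks from an array. This is a sketch of the technique the comments describe, not the real readFileInRange: `readRangeFromChunks` is a hypothetical name, offsets are 0-based to match the `[offset, offset + maxLines)` notation, and `\r\n` handling is omitted.

```typescript
// Sketch only: counts every line but keeps text only for lines inside the range.
function readRangeFromChunks(
  chunks: Iterable<string>,
  offset: number,
  maxLines: number,
): { lines: string[]; totalLines: number } {
  const lines: string[] = [];
  let lineIndex = 0;
  let partial = ""; // fragment of the current line, kept only while in range
  let trailing = false; // true if the current unfinished line has any characters
  const inRange = (): boolean => lineIndex >= offset && lineIndex < offset + maxLines;

  for (const chunk of chunks) {
    let start = 0;
    let nl: number;
    while ((nl = chunk.indexOf("\n", start)) !== -1) {
      if (inRange()) lines.push(partial + chunk.slice(start, nl));
      partial = "";
      trailing = false;
      lineIndex++;
      start = nl + 1;
    }
    if (start < chunk.length) {
      trailing = true;
      // Outside the range the fragment is dropped, so memory stays bounded.
      if (inRange()) partial += chunk.slice(start);
    }
  }
  // A final line without a terminating newline still counts as a line.
  if (trailing) {
    if (inRange()) lines.push(partial);
    lineIndex++;
  }
  return { lines, totalLines: lineIndex };
}

console.log(readRangeFromChunks(["a\nb", "b\nc\n", "d"], 1, 2));
// { lines: [ 'bb', 'c' ], totalLines: 4 }
```

Note how the line "bb" is split across two chunks: the fragment is carried over only because line 1 is inside the requested range.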

`Read` records state for later edits

After reading text or a notebook, Read writes an entry into readFileState:

readFileState.set(fullFilePath, {
  content,
  timestamp: Math.floor(mtimeMs),
  offset,
  limit,
})

The cache entry type:

export type FileState = {
  content: string
  timestamp: number
  offset: number | undefined
  limit: number | undefined
  isPartialView?: boolean
}

The comment on isPartialView matters too:

// True when this entry was populated by auto-injection (e.g. CLAUDE.md) and
// the injected content did not match disk (stripped HTML comments, stripped
// frontmatter, truncated MEMORY.md). The model has only seen a partial view;
// Edit/Write must require an explicit Read first.

In other words, auto-injected content does not necessarily count as having read the file.

The cache is also a size-limited LRU:

export const READ_FILE_STATE_CACHE_SIZE = 100

const DEFAULT_MAX_CACHE_SIZE_BYTES = 25 * 1024 * 1024
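The two constants suggest a cache bounded both by entry count and by total size. How the real cache implements this is not shown in the excerpt, so the following is only an illustrative Map-based LRU: `createSizedLru` and `CachedFile` are hypothetical, and string length is used as a rough approximation of byte size.

```typescript
// Sketch only: a Map-based LRU with an entry cap and an approximate byte cap.
interface CachedFile {
  content: string;
}

function createSizedLru<V extends CachedFile>(maxEntries: number, maxBytes: number) {
  const entries = new Map<string, V>(); // Map iterates in insertion order: oldest first
  let bytes = 0;
  const sizeOf = (v: V): number => v.content.length; // UTF-16 units, not true bytes

  function remove(key: string): void {
    const v = entries.get(key);
    if (v !== undefined) {
      bytes -= sizeOf(v);
      entries.delete(key);
    }
  }

  return {
    get(key: string): V | undefined {
      const v = entries.get(key);
      if (v !== undefined) {
        entries.delete(key); // re-insert to mark as most recently used
        entries.set(key, v);
      }
      return v;
    },
    set(key: string, value: V): void {
      remove(key);
      entries.set(key, value);
      bytes += sizeOf(value);
      // Evict oldest entries until both limits hold; always keep the newest entry.
      while (entries.size > 1 && (entries.size > maxEntries || bytes > maxBytes)) {
        const oldest = entries.keys().next().value;
        if (oldest === undefined) break;
        remove(oldest);
      }
    },
    has: (key: string): boolean => entries.has(key),
  };
}
```

One consequence worth keeping in mind: once an entry is evicted, a later Edit on that file would fail the "has been read" check and require a fresh Read.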

So `Read` is also a precondition for `Edit/Write`

Edit prompt:

- You must use your `Read` tool at least once in the conversation before editing. This tool will error if you attempt an edit without reading the file.

Write prompt:

- If this is an existing file, you MUST use the Read tool first to read the file's contents. This tool will fail if you did not read the file first.

The implementation really does enforce this.

FileEditTool:

const readTimestamp = toolUseContext.readFileState.get(fullFilePath)
if (!readTimestamp || readTimestamp.isPartialView) {
  return {
    result: false,
    behavior: 'ask',
    message:
      'File has not been read yet. Read it first before writing to it.',
    ...
  }
}

And:

'File has been modified since read, either by the user or by a linter. Read it again before attempting to write it.'

FileWriteTool applies the same logic.
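Both checks can be put together in one small function. The first branch mirrors the FileEditTool excerpt above. The second branch is an assumption: the article quotes only the error message, so comparing the current mtime against the recorded timestamp is a plausible reading rather than the confirmed logic. `checkEditAllowed` is a hypothetical name.

```typescript
// Sketch only: the staleness comparison is assumed, not taken from the source.
interface FileState {
  content: string;
  timestamp: number;
  offset: number | undefined;
  limit: number | undefined;
  isPartialView?: boolean;
}

type EditCheck = { result: true } | { result: false; message: string };

function checkEditAllowed(
  readFileState: Map<string, FileState>,
  fullFilePath: string,
  currentMtimeMs: number, // mtime on disk right now, supplied by the caller
): EditCheck {
  const entry = readFileState.get(fullFilePath);
  // Never read, or only seen through a partial auto-injected view.
  if (!entry || entry.isPartialView) {
    return {
      result: false,
      message: "File has not been read yet. Read it first before writing to it.",
    };
  }
  // Changed on disk after the recorded read (assumed comparison).
  if (Math.floor(currentMtimeMs) > entry.timestamp) {
    return {
      result: false,
      message:
        "File has been modified since read, either by the user or by a linter. Read it again before attempting to write it.",
    };
  }
  return { result: true };
}
```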

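`@` mentions

The second part covers @file, @directory, @pdf, and @image mentions.

Extract @... from the user input with a regex
Convert it into an attachment
Before calling the API, rebuild it into context messages the model can understand

Extracting `@...` from the input

The extraction function: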

export function extractAtMentionedFiles(content: string): string[] {
  // Extract filenames mentioned with @ symbol, including line range syntax: @file.txt#L10-20
  // Also supports quoted paths for files with spaces: @"my/file with spaces.txt"

There are two regexes:

const quotedAtMentionRegex = /(^|\s)@"([^"]+)"/g
const regularAtMentionRegex = /(^|\s)@([^\s]+)\b/g

So all of these are supported:

@foo.ts
@"my dir/foo bar.ts"
@foo.ts#L10-20

The line-range parser:

// Parse mentions like "file.txt#L10-20", "file.txt#heading", or just "file.txt"
// Supports line ranges (#L10, #L10-20) and strips non-line-range fragments (#heading)

The implementation:

const match = mention.match(/^([^#]+)(?:#L(\d+)(?:-(\d+))?)?(?:#[^#]*)?$/)

That is:

#L10
#L10-20

become a concrete offset/limit, while a fragment such as #heading is stripped.
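To see the three regexes working together, here is a small end-to-end sketch. The patterns are copied from the excerpts above, but the surrounding functions are simplified. `extractAtMentions` and `parseMention` are illustrative names, and the limit formula follows the generateFileAttachment call shown later in this article. How a bare #L10 maps to a limit is not visible in the excerpt, so this sketch leaves limit undefined in that case.

```typescript
// Sketch only: regexes are from the article, the wrappers are assumptions.
function extractAtMentions(content: string): string[] {
  const quoted = /(^|\s)@"([^"]+)"/g;
  const regular = /(^|\s)@([^\s]+)\b/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = quoted.exec(content)) !== null) {
    const hit = m[2];
    if (hit !== undefined) found.push(hit);
  }
  while ((m = regular.exec(content)) !== null) {
    const hit = m[2];
    // The regular pattern also matches @"..., which the quoted pattern already handled.
    if (hit !== undefined && !hit.startsWith('"')) found.push(hit);
  }
  return found;
}

function parseMention(mention: string): {
  path: string;
  offset: number | undefined;
  limit: number | undefined;
} {
  const match = mention.match(/^([^#]+)(?:#L(\d+)(?:-(\d+))?)?(?:#[^#]*)?$/);
  if (match === null) return { path: mention, offset: undefined, limit: undefined };
  const startText = match[2];
  const endText = match[3];
  const lineStart = startText !== undefined ? parseInt(startText, 10) : undefined;
  const lineEnd = endText !== undefined ? parseInt(endText, 10) : undefined;
  return {
    path: match[1] ?? mention,
    offset: lineStart,
    limit: lineEnd && lineStart ? lineEnd - lineStart + 1 : undefined,
  };
}

const mentions = extractAtMentions('see @foo.ts#L10-20 and @"my dir/foo bar.ts" now');
console.log(mentions.map(parseMention));
```

For the input above, the quoted path comes out without a range, and `foo.ts#L10-20` becomes path `foo.ts` with offset 10 and limit 11.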

Handling `@directory`: a synthetic `ls`

This is equivalent to running ls automatically.

If the path turns out to be a directory:

if (stats.isDirectory()) {
  const entries = await readdir(absoluteFilename, { withFileTypes: true })
  const MAX_DIR_ENTRIES = 1000
  ...
  return {
    type: 'directory' as const,
    path: absoluteFilename,
    content: stdout,
    displayPath: relative(getCwd(), absoluteFilename),
  }
}

Later, when attachments are serialized, a directory attachment is rebuilt into:

createToolUseMessage(BashTool.name, {
  command: `ls ${quote([attachment.path])}`,
  description: `Lists files in ${attachment.path}`,
}),
createToolResultMessage(BashTool, {
  stdout: attachment.content,
  stderr: '',
  interrupted: false,
}),

Handling `@file`: through `generateFileAttachment`

The regular-file branch calls:

generateFileAttachment(
  absoluteFilename,
  toolUseContext,
  'tengu_at_mention_extracting_filename_success',
  'tengu_at_mention_extracting_filename_error',
  'at-mention',
  {
    offset: lineStart,
    limit: lineEnd && lineStart ? lineEnd - lineStart + 1 : undefined,
  },
)

The doc comment on generateFileAttachment() is explicit:

/**
 * Generates a file attachment by reading a file with proper validation and truncation.
 * This is the core file reading logic shared between @-mentioned files and post-compact restoration.
 */

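In other words:

@-mentioning a file
restoring file context after a compact

These two share the same underlying logic.

`@file` also handles large files and truncation

generateFileAttachment() does not blindly inline the whole file.

First, the large-file check: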

// Check file size before attempting to read (skip for PDFs — they have their own size/page handling below)

A very large non-PDF file may be skipped outright in at-mention mode.

If the actual FileReadTool.call hits either of these errors:

error instanceof MaxFileReadTokenExceededError ||
error instanceof FileTooLargeError

it falls back to reading only the first MAX_LINES_TO_READ lines:

// Read only the first MAX_LINES_TO_READ lines for files that are too large

It then adds a meta note that only the model sees:

`Note: The file ${attachment.filename} was too large and has been truncated to the first ${MAX_LINES_TO_READ} lines. Don't tell the user about this truncation. Use ${FileReadTool.name} to read more of the file if you need.`
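The fallback amounts to a catch-and-retry with an explicit line limit. A minimal sketch follows, assuming the limited retry succeeds. `readForAttachment` and the `read` callback are hypothetical, and the error class here is a local stand-in that only shares its name with the real one.

```typescript
// Sketch only: error class and reader signature are stand-ins for the real ones.
class FileTooLargeError extends Error {}

const MAX_LINES_TO_READ = 2000;

function readForAttachment(
  read: (limit?: number) => string, // hypothetical reader; throws FileTooLargeError when over a cap
): { content: string; truncated: boolean } {
  try {
    return { content: read(), truncated: false };
  } catch (error) {
    if (error instanceof FileTooLargeError) {
      // Retry with an explicit line limit and remember that the view is partial.
      return { content: read(MAX_LINES_TO_READ), truncated: true };
    }
    throw error; // anything else is a real failure
  }
}
```

The `truncated` flag is what would drive the extra meta note, so the model knows it has only seen the head of the file.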

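`@file` may send nothing if the file was already read

As with the dedup described above, readFileState is checked first, and a file that was already read is not read again.

`@image`: still an image block in the end

An @-mentioned image path also goes through generateFileAttachment -> FileReadTool.call.

If the result is an image, messages.ts rebuilds the attachment into: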

createToolUseMessage(FileReadTool.name, {
  file_path: attachment.filename,
}),
createToolResultMessage(FileReadTool, fileContent),

createToolResultMessage() has a special branch for images:

// If the result contains image content blocks, preserve them as is
if (
  Array.isArray(result.content) &&
  result.content.some(block => block.type === 'image')
) {
  return createUserMessage({
    content: result.content as ContentBlockParam[],
    isMeta: true,
  })
}

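So when an @-mentioned image finally enters the model context, it is still a real image content block, not a text description.

`@PDF`: a reference for large PDFs, a file attachment for small ones

Large PDFs first.

generateFileAttachment() first calls: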

tryGetPDFReference(filename)

Its doc comment:

/**
 * Check if a PDF file should be represented as a lightweight reference
 * instead of being inlined. Returns a PDFReferenceAttachment for large PDFs
 * (more than PDF_AT_MENTION_INLINE_THRESHOLD pages), or null otherwise.
 */

If there are too many pages, it returns:

{
  type: 'pdf_reference',
  filename,
  pageCount: effectivePageCount,
  fileSize: stats.size,
  displayPath: relative(getCwd(), filename),
}

messages.ts later turns this into a meta note:

`PDF file: ${attachment.filename} (${attachment.pageCount} pages, ${formatFileSize(attachment.fileSize)}). ` +
`This PDF is too large to read all at once. You MUST use the ${FILE_READ_TOOL_NAME} tool with the pages parameter ` +
`to read specific page ranges (e.g., pages: "1-5"). Do NOT call ${FILE_READ_TOOL_NAME} without the pages parameter ` +
`or it will fail. Start by reading the first few pages to understand the structure, then read more as needed. ` +
`Maximum 20 pages per request.`

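In other words, @-mentioning a large PDF does not inline its content. The model gets only a reference and must then call Read page by page itself.

A small PDF continues through the regular generateFileAttachment -> FileReadTool.call path.

One implementation detail deserves separate mention:

When Read is called directly on a PDF, FileReadTool.callInner() adds an extra document block through newMessages
On the attachment rebuild path, what messages.ts shows is a rebuilt tool_use + tool_result pair
The comment there says: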

// PDFs are handled via supplementalContent in the tool result

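So the design intent appears to be that the attachment path keeps this supplemental content too. But looking at messages.ts alone, it does not construct a document block as explicitly as direct tool execution does, and the actual assembly path is more roundabout than the direct Read chain.

For a write-up, I think this is best stated conservatively:

A direct Read of a PDF: the source explicitly sends an extra document block
@-mentioning a large PDF: explicitly sends only pdf_reference
@-mentioning a small PDF: reuses the file attachment logic. The design goal is to give the model context close to what Read provides, but the code path that adds the document block is less visible than the one for paginated page images

Attachments are not appended to the user text; they are reordered before the API call

In processTextPrompt(), the initial result is: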

return {
  messages: [userMessage, ...attachmentMessages],
  shouldQuery: true,
}

So the user's original prompt and the attachments are separate messages.

Before the actual API call, they are reordered once more:

/**
 * Reorders messages so that attachments bubble up until they hit either:
 * - A tool call result (user message with tool_result content)
 * - Any assistant message
 */

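So the @-mention mechanism boils down to:

Extract @path from the text
Turn it into an attachment
Rebuild the attachment into model context
Before the API call, bubble it up again until it hits a boundary

It is not a simple string substitution.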
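The bubbling rule from the doc comment can be sketched with a backward pass over the message list. This is one way to implement the described behavior, not necessarily how the source does it: `Msg` is a simplified message shape and `reorderAttachments` is a hypothetical name.

```typescript
// Sketch only: simplified message shape; boundaries are assistant messages
// and user messages that carry a tool result.
type Msg =
  | { type: "user"; text: string; isToolResult?: boolean }
  | { type: "assistant"; text: string }
  | { type: "attachment"; name: string };

function reorderAttachments(messages: Msg[]): Msg[] {
  const reversed: Msg[] = []; // built back-to-front, flipped at the end
  let pending: Msg[] = []; // attachments still bubbling upward
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m === undefined) continue;
    if (m.type === "attachment") {
      pending.push(m);
      continue;
    }
    const isBoundary =
      m.type === "assistant" || (m.type === "user" && m.isToolResult === true);
    if (isBoundary && pending.length > 0) {
      // Attachments stop just below the boundary message.
      reversed.push(...pending);
      pending = [];
    }
    reversed.push(m);
  }
  // No boundary above: attachments rise to the very top.
  reversed.push(...pending);
  return reversed.reverse();
}

const ordered = reorderAttachments([
  { type: "assistant", text: "A" },
  { type: "user", text: "U" },
  { type: "attachment", name: "X" },
  { type: "attachment", name: "Y" },
]);
console.log(ordered.map((m) => (m.type === "attachment" ? m.name : m.text)).join(","));
// A,X,Y,U
```

In the example, the two attachments pass over the plain user message and stop below the assistant message, keeping their relative order.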

bridge / web uploads also end up as `@path`

This part is interesting too.

For files uploaded from the web composer, the comment in bridge/inboundAttachments.ts is direct:

/**
 * Resolve file_uuid attachments on inbound bridge user messages.
 *
 * Web composer uploads via cookie-authed /api/{org}/upload, sends file_uuid
 * alongside the message. Here we fetch each via GET /api/oauth/files/{uuid}/content
 * (oauth-authed, same store), write to ~/.claude/uploads/{sessionId}/, and
 * return @path refs to prepend. Claude's Read tool takes it from there.
 */

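In other words, remote uploads do not go through a separate cloud attachment input protocol. Instead:

They are first downloaded to ~/.claude/uploads/{sessionId}/
They are then prepended as @"absolute path"
From there they reuse the same @-mention -> attachment -> Read chain

The design is tidy: uploaded files, manual @file mentions, and IDE file mentions all converge on the same entry point wherever possible.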
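The last step of that chain is plain string work. A tiny sketch of the prepend follows, with a hypothetical `prependAtRefs` helper and a made-up local path. The quoted form is used so that paths containing spaces still match the quoted @-mention regex.

```typescript
// Sketch only: helper name and example path are illustrative.
function prependAtRefs(message: string, localPaths: string[]): string {
  if (localPaths.length === 0) return message;
  const refs = localPaths.map((p) => `@"${p}"`).join(" ");
  return `${refs} ${message}`;
}

console.log(prependAtRefs("summarize this", ["/home/me/.claude/uploads/s1/report 1.pdf"]));
// @"/home/me/.claude/uploads/s1/report 1.pdf" summarize this
```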

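Summary

On the surface:

Read looks like a cat with line numbers
@file looks like path completion

In the source, though, both are part of the context management machinery.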

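What the Read layer does:

Provides one entry point for reading text / images / PDFs / notebooks
Runs security prechecks
Enforces a two-level budget of bytes and tokens
Dedups, so the same content is not pushed back into the context
Dispatches serialization by file type
Writes read state into readFileState so later Edit/Write calls can check consistency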

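What the @ layer does:

Parses paths, quoted paths, and line ranges
Distinguishes directories / regular files / PDFs / images
Wraps content as attachments
Rebuilds them into tool-like context the model can consume
Does not inline large PDFs, forcing paginated Read calls afterwards
Skips files already in context instead of resending them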

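So, more precisely:

Read is the standard entry point through which file content reaches the model in Claude Code
@-mention is the attachment mechanism that automatically wires user-specified file context into the conversation

Source files referenced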

src/tools/FileReadTool/prompt.ts
src/tools/FileReadTool/limits.ts
src/tools/FileReadTool/FileReadTool.ts
src/utils/readFileInRange.ts
src/utils/fileStateCache.ts
src/utils/attachments.ts
src/utils/messages.ts
src/utils/processUserInput/processTextPrompt.ts
src/bridge/inboundAttachments.ts
