LLM Wiki

Karpathy's LLM Wiki: build and maintain a persistent, interlinked Markdown knowledge base. Ingest sources, query the compiled knowledge, and lint for consistency.

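Skill Metadata

| Field | Value |
|---|---|
| Source | Bundled (installed by default) |
| Path | `skills/research/llm-wiki` |
| Version | `2.1.0` |
| Author | Hermes Agent |
| License | MIT |
| Tags | `wiki`, `knowledge-base`, `research`, `notes`, `markdown`, `rag-alternative` |
| Related skills | `obsidian`, `arxiv` |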

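Reference: Full SKILL.md

Info

The following is the complete skill definition that Hermes loads when this skill is triggered. These are the instructions the agent sees when the skill is active.

Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked Markdown files. Based on Andrej Karpathy's LLM Wiki pattern.

Unlike traditional RAG, which rediscovers knowledge from scratch on every query, the wiki compiles knowledge once and keeps it current. Cross-references are already in place, contradictions are already flagged, and the synthesis reflects everything that has been ingested.

Division of labor: The human curates sources and directs analysis. The agent summarizes, cross-references, files, and maintains consistency.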

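When to Activate This Skill

Use this skill when the user:

- Asks to create, build, or start a wiki or knowledge base
- Asks to ingest, add, or process a source into their wiki
- Asks a question and an existing wiki is present at the configured path
- Asks to lint, audit, or health-check their wiki
- Mentions their wiki, knowledge base, or "notes" in a research context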

Wiki Location

Location: Set via the WIKI_PATH environment variable (for example, in ~/.hermes/.env).

If unset, it defaults to ~/wiki.

WIKI="${WIKI_PATH:-$HOME/wiki}"

The wiki is just a directory of Markdown files: open it in Obsidian, VS Code, or any editor. No database and no special tooling are required.

Architecture: Three Layers

wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated yearly)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arxiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping

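Layer 1, Raw Sources: Immutable. The agent reads these files but never modifies them.

Layer 2, The Wiki: Agent-owned Markdown files, created, updated, and cross-referenced by the agent.

Layer 3, The Schema: SCHEMA.md defines the structure, conventions, and tag taxonomy.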

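Resuming an Existing Wiki (CRITICAL: do this every session)

When the user has an existing wiki, always orient yourself before doing anything else:

① Read SCHEMA.md to understand the domain, conventions, and tag taxonomy.

② Read index.md to learn which pages exist and what their summaries say.

③ Scan the recent log.md: read the last 20-30 entries to understand recent activity.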

WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>

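Only after orientation should you ingest, query, or lint. This prevents:

- Creating duplicate pages for entities that already exist
- Missing cross-references to existing content
- Contradicting the schema's conventions
- Repeating work that is already logged

For large wikis (100+ pages), also run a quick search_files for the topic at hand before creating anything new.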
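
The log scan in step ③ can be sketched in a few lines of Python. This is only an illustration, assuming each log entry starts with a heading in the `## [YYYY-MM-DD] action | subject` format from the log.md template; the function name `recent_log_entries` is hypothetical and not part of the skill.

```python
import re

# Matches log headings of the form "## [YYYY-MM-DD] action | subject".
ENTRY_HEADER = re.compile(r"^## \[(\d{4}-\d{2}-\d{2})\] (\w+) \| (.+)$")


def recent_log_entries(log_text, limit=30):
    """Return the last `limit` (date, action, subject) tuples found in log.md text."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY_HEADER.match(line.strip())
        if match:
            entries.append(match.groups())
    # Guard against limit <= 0, where a negative slice would return everything.
    return entries[-limit:] if limit > 0 else []
```

Detail bullets under each heading are ignored here; a fuller version could attach them to their entry.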

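Initializing a New Wiki

When the user asks to create or start a wiki:

1. Determine the wiki path (from the $WIKI_PATH environment variable, or ask the user; default ~/wiki)
2. Create the directory structure above
3. Ask the user what domain the wiki covers, and be specific
4. Write a SCHEMA.md customized to the domain (see the template below)
5. Write an initial index.md with sectioned headers
6. Write an initial log.md with a creation entry
7. Confirm the wiki is ready and suggest first sources to ingest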
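
As a rough sketch of step 2, the directory layout from the architecture section could be created like this. The function name `scaffold_wiki` and the placeholder file contents are assumptions for illustration; in practice the agent writes a real SCHEMA.md, index.md, and log.md from the templates.

```python
from pathlib import Path

# Directory layout taken from the architecture tree above.
WIKI_DIRS = [
    "raw/articles", "raw/papers", "raw/transcripts", "raw/assets",
    "entities", "concepts", "comparisons", "queries",
]


def scaffold_wiki(root):
    """Create the wiki directories and placeholder top-level files under `root`."""
    root = Path(root)
    created = []
    for rel in WIKI_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Only create the top-level files if missing, so re-running never overwrites them.
    for name in ("SCHEMA.md", "index.md", "log.md"):
        target = root / name
        if not target.exists():
            target.write_text(f"<!-- placeholder for {name} -->\n", encoding="utf-8")
    return created
```

Making the function idempotent means it is safe to call against an existing wiki without touching files the agent has already written.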

SCHEMA.md Template

Adapt this to the user's domain. The schema constrains agent behavior and ensures consistency:

# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
  at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
  claim back without re-reading the whole raw file. Optional on single-source pages where the
  `sources:` frontmatter is enough.

## Frontmatter
  ```yaml
  ---
  title: Page Title
  created: YYYY-MM-DD
  updated: YYYY-MM-DD
  type: entity | concept | comparison | query | summary
  tags: [from taxonomy below]
  sources: [raw/articles/source-name.md]
  # Optional quality signals:
  confidence: high | medium | low        # how well-supported the claims are
  contested: true                        # set when the page has unresolved contradictions
  contradictions: [other-page-slug]      # other pages this one conflicts with
  ---
  ```

confidence and contested are optional but recommended for opinion-heavy or fast-moving topics. Lint surfaces contested: true and confidence: low pages for review so weak claims don't silently harden into accepted wiki fact.

raw/ Frontmatter

Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:

---
source_url: https://example.com/article   # original URL, if applicable
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---

The sha256: lets a future re-ingest of the same URL skip processing when content is unchanged, and flag drift when it has changed. Compute over the body only (everything after the closing ---), not the frontmatter itself.
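
A minimal sketch of the body-only hash, assuming the file is UTF-8 and that the body is everything after the closing `---` line, including any blank line that follows it. The helper names `split_frontmatter`, `body_sha256`, and `has_drifted` are hypothetical.

```python
import hashlib


def split_frontmatter(text):
    """Split file text into (frontmatter, body); frontmatter is '' if absent."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[: i + 1]), "".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat the whole file as body


def body_sha256(text):
    """Hex SHA-256 of the body only, so frontmatter edits never change the digest."""
    _, body = split_frontmatter(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def has_drifted(text, stored_digest):
    """True when the current body no longer matches the stored sha256 value."""
    return body_sha256(text) != stored_digest
```

Because only the body is hashed, updating the `ingested:` date on a re-ingest does not register as drift.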

Tag Taxonomy

[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:

Models: model, architecture, benchmark, training
People/Orgs: person, company, lab, open-source
Techniques: optimization, fine-tuning, inference, alignment, data
Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed, add it here first, then use it. This prevents tag sprawl.
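
The taxonomy rule can be checked mechanically. The sketch below assumes tags are written inline as `tags: [a, b]`, as in the frontmatter template, and that the taxonomy has already been loaded into a Python set; `page_tags` and `unknown_tags` are hypothetical helper names.

```python
import re

# Matches an inline YAML list such as "tags: [model, benchmark]".
TAGS_LINE = re.compile(r"^tags:\s*\[(.*?)\]\s*$", re.MULTILINE)


def page_tags(page_text):
    """Return the tags from the first inline `tags: [...]` line in the page text."""
    match = TAGS_LINE.search(page_text)
    if not match:
        return []
    return [tag.strip() for tag in match.group(1).split(",") if tag.strip()]


def unknown_tags(page_text, taxonomy):
    """Return the page's tags that are missing from the taxonomy set."""
    return [tag for tag in page_tags(page_text) if tag not in taxonomy]
```

This does not handle block-style YAML lists; a real YAML parser would be the safer choice if pages use that form.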

Page Thresholds

Create a page when an entity/concept appears in 2+ sources OR is central to one source
Add to existing page when a source mentions something already covered
DON'T create a page for passing mentions, minor details, or things outside the domain
Split a page when it exceeds ~200 lines — break into sub-topics with cross-links
Archive a page when its content is fully superseded — move to _archive/, remove from index

Entity Pages

One page per notable entity. Include:

Overview / what it is
Key facts and dates
Relationships to other entities ([[wikilinks]])
Source references

Concept Pages

One page per concept or topic. Include:

Definition / explanation
Current state of knowledge
Open questions or debates
Related concepts ([[wikilinks]])

Comparison Pages

Side-by-side analyses. Include:

What is being compared and why
Dimensions of comparison (table format preferred)
Verdict or synthesis
Sources

Update Policy

When new information conflicts with existing content:

Check the dates — newer sources generally supersede older ones
If genuinely contradictory, note both positions with dates and sources
Mark the contradiction in frontmatter: contradictions: [page-name]
Flag for user review in the lint report

index.md Template

The index is sectioned by type. Each entry is one line: wikilink + summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries
```

Scaling rule: When any section exceeds 50 entries, split it into sub-sections by first letter or sub-domain. When the index exceeds 200 entries in total, create a _meta/topic-map.md that groups pages by theme for faster navigation.

log.md Template

# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
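
Appending an entry in this format might look like the following sketch. The action names come from the template above; the function names and the `day` parameter are assumptions added for illustration.

```python
from datetime import date

# Action names listed in the log.md template.
ACTIONS = {"ingest", "update", "query", "lint", "create", "archive", "delete"}


def format_log_entry(action, subject, details, day=None):
    """Build one log entry: a dated heading followed by one bullet per detail."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    day = day or date.today()
    lines = [f"## [{day.isoformat()}] {action} | {subject}"]
    lines += [f"- {detail}" for detail in details]
    return "\n".join(lines) + "\n"


def append_log_entry(log_path, entry):
    """Append an entry to log.md; the file is only ever opened in append mode."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("\n" + entry)
```

Opening the file in append mode keeps the log append-only by construction.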

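Core Operations

1. Ingest

When the user provides a source (URL, file, or pasted text), integrate it into the wiki:

① Capture the raw source:

- URL → use web_extract to get Markdown, save to raw/articles/
- PDF → use web_extract (handles PDFs), save to raw/papers/
- Pasted text → save to the appropriate raw/ subdirectory
- Use a descriptive file name: raw/articles/karpathy-llm-wiki-2026.md
- Add the raw frontmatter (source_url, ingested, and the sha256 of the body). On re-ingest of the same URL, recompute the sha256 and compare it to the stored value: skip if identical, flag drift and update if different. This is cheap enough to do on every re-ingest and catches silent source changes.

② Discuss takeaways with the user: what is interesting and what matters for the domain. (Skip this step in automated or cron contexts and proceed directly.)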

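③ Check what already exists: search index.md and use search_files to find existing pages for the entities and concepts mentioned. This is the difference between a wiki that grows and a pile of duplicates.

④ Write or update wiki pages:

- New entities/concepts: Create a page only if it meets the Page Thresholds in SCHEMA.md (mentioned in 2+ sources, or central to one source)
- Existing pages: Add new information, update facts, and bump the updated date. When new information contradicts existing content, follow the Update Policy.
- Cross-references: Every new or updated page must link to at least 2 other pages via [[wikilinks]]. Check that existing pages link back.
- Tags: Only use tags from the taxonomy in SCHEMA.md
- Provenance: On pages that synthesize 3+ sources, append ^[raw/articles/source.md] markers to paragraphs whose claims trace to a specific source.
- Confidence: For opinion-heavy, fast-moving, or single-source claims, set confidence: medium or low in the frontmatter. Do not mark a page high unless its claims are well supported across multiple sources.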

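⑤ Update navigation:

- Add new pages to index.md under the correct section, in alphabetical order
- Update the "Total pages" count and "Last updated" date in the index header
- Append to log.md: ## [YYYY-MM-DD] ingest | Source Title
- List every file created or updated in the log entry

⑥ Report what changed: list every file created or updated for the user.

A single source can trigger updates across 5-15 wiki pages. This is normal and expected: it is the compounding effect.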
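
The index update in step ⑤ can be sketched as a pure text transformation. The entry format `- [[slug]] - summary` is an assumption (the template only says "wikilink + summary"), and `add_index_entry` is a hypothetical helper name.

```python
def add_index_entry(index_text, section, slug, summary):
    """Insert an entry under `## section`, keeping that section's entries sorted."""
    lines = index_text.splitlines()
    header = f"## {section}"
    if header not in lines:
        raise ValueError(f"section not found: {section}")
    start = lines.index(header) + 1
    end = start
    # The section runs until the next "## " heading or the end of the file.
    while end < len(lines) and not lines[end].startswith("## "):
        end += 1
    body = lines[start:end]
    entries = [line for line in body if line.startswith("- [[")]
    # Keep comments and other non-entry lines at the top of the section.
    others = [line for line in body if line.strip() and not line.startswith("- [[")]
    entries.append(f"- [[{slug}]] - {summary}")
    # set() drops exact duplicates only; a changed summary yields a second line.
    entries = sorted(set(entries), key=str.lower)
    return "\n".join(lines[:start] + others + entries + [""] + lines[end:]) + "\n"
```

The "Total pages" count and "Last updated" date in the header are left untouched here and would need a separate update.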

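2. Query

When the user asks a question about the wiki's domain:

① Read index.md to identify relevant pages.

② For wikis with 100+ pages, also run search_files for key terms across all .md files; the index alone may miss relevant content.

③ Read the relevant pages with read_file.

④ Synthesize an answer from the compiled knowledge. Cite the wiki pages you drew from: "Based on [[page-a]] and [[page-b]]..."

⑤ File valuable answers back: if the answer is a substantial comparison, deep dive, or novel synthesis, create a page in queries/ or comparisons/. Do not file trivial lookups, only answers that would be painful to re-derive.

⑥ Update log.md with the query and whether it was filed.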

3. Lint (Health Check)

When the user asks to lint, health-check, or audit the wiki:

① Orphan pages: Find pages that no other page links to via inbound [[wikilinks]].

# Use execute_code for this — programmatic scan across all wiki pages
import os, re
from collections import defaultdict
wiki = "<WIKI_PATH>"
# Scan all .md files in entities/, concepts/, comparisons/, queries/
# Extract all [[wikilinks]] — build inbound link map
# Pages with zero inbound links are orphans
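
One way to fill in the outline above is sketched below; it also covers the broken-link check in step ②. It assumes that a wikilink target equals the file name without `.md`, that aliases use `[[page|label]]`, and that `![[...]]` embeds are not page links. The function name `scan_links` is hypothetical.

```python
import re
from pathlib import Path

# Capture the link target, stopping at an alias "|" or heading "#"; skip "![[...]]" embeds.
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")
PAGE_DIRS = ("entities", "concepts", "comparisons", "queries")


def scan_links(wiki):
    """Return (orphans, broken): pages with no inbound links, and missing targets per page."""
    wiki = Path(wiki)
    pages = {}
    for folder in PAGE_DIRS:
        if not (wiki / folder).is_dir():
            continue
        for path in sorted((wiki / folder).glob("*.md")):
            pages[path.stem] = path.read_text(encoding="utf-8")

    inbound = {slug: set() for slug in pages}
    broken = {}
    for slug, text in pages.items():
        for target in {match.strip() for match in WIKILINK.findall(text)}:
            if target == slug:
                continue  # a self-link does not count as an inbound link
            if target in inbound:
                inbound[target].add(slug)
            else:
                broken.setdefault(slug, []).append(target)

    orphans = sorted(slug for slug, sources in inbound.items() if not sources)
    return orphans, {slug: sorted(targets) for slug, targets in broken.items()}
```

Pages with the same file name in different folders would collide in this sketch; a wiki that allows that would need to key pages by relative path instead.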

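② Broken wikilinks: Find [[links]] that point to pages that do not exist.

③ Index completeness: Every wiki page should appear in index.md. Compare the filesystem against the index entries.

④ Frontmatter validation: Every wiki page must have all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.

⑤ Stale content: Pages whose updated date is more than 90 days older than the most recent source that mentions the same entities.

⑥ Contradictions: Pages on the same topic with conflicting claims. Look for pages that share tags or entities but state different facts. Surface every page with contested: true or contradictions: frontmatter for user review.

⑦ Quality signals: List pages with confidence: low, plus any page that cites only a single source but has no confidence field set. These are candidates for finding corroboration or for downgrading to confidence: medium.

⑧ Source drift: For each file in raw/ with a sha256: frontmatter field, recompute the hash and flag mismatches. A mismatch means the raw file was edited (which should not happen, since raw/ is immutable) or was ingested from a URL that has since changed. This is not a hard error, but it is worth reporting.

⑨ Page size: Flag pages over 200 lines; these are candidates for splitting.

⑩ Tag audit: List all tags in use and flag any that are not in the SCHEMA.md taxonomy.

⑪ Log rotation: If log.md exceeds 500 entries, rotate it.

⑫ Report findings with specific file paths and suggested actions, grouped by severity (broken links > orphans > source drift > contested pages > stale content > style issues).

⑬ Append to log.md: ## [YYYY-MM-DD] lint | N issues found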
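
The severity grouping in step ⑫ could be sketched as follows. The kind labels (`broken-link`, `orphan`, and so on) and the plain-text report layout are assumptions for illustration; only their ordering comes from the step above.

```python
# Severity order from step 12, most severe first. The label strings are hypothetical.
SEVERITY = ["broken-link", "orphan", "source-drift", "contested", "stale", "style"]


def build_report(findings):
    """Group (kind, path, suggested_action) findings by severity into a text report."""
    rank = {kind: position for position, kind in enumerate(SEVERITY)}
    # Unknown kinds sort after all known ones, then by kind and path for stable output.
    ordered = sorted(findings, key=lambda f: (rank.get(f[0], len(SEVERITY)), f[0], f[1]))
    lines = []
    current = None
    for kind, path, action in ordered:
        if kind != current:
            lines.append(f"[{kind}]")
            current = kind
        lines.append(f"  {path}: {action}")
    return "\n".join(lines)
```

The number of findings passed in is the `N` for the `lint | N issues found` log entry.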

Working with the Wiki

Searching

# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>

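Bulk Ingest

When ingesting multiple sources at once, batch the updates:

1. Read all sources first
2. Identify all entities and concepts across all sources
3. Check existing pages for all of them (one search pass, not N)
4. Create or update pages in one pass (avoids redundant updates)
5. Update index.md once at the end
6. Write a single log entry covering the batch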

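Obsidian Integration

The wiki directory works as an Obsidian vault out of the box:

- [[wikilinks]] render as clickable links
- Graph View visualizes the knowledge network
- YAML frontmatter powers Dataview queries
- The raw/assets/ folder holds images referenced via ![[image.png]]

For best results:

- Set Obsidian's attachment folder to raw/assets/
- Enable "Wikilinks" in Obsidian settings (usually on by default)
- Install the Dataview plugin for queries like TABLE tags FROM "entities" WHERE contains(tags, "company")

If you use the Obsidian skill alongside this one, set OBSIDIAN_VAULT_PATH to the same directory as the wiki path.

Obsidian Headless (servers and headless machines)

On machines without a display, use obsidian-headless instead of the desktop app. It syncs the vault via Obsidian Sync without a GUI, which suits an agent that writes to the wiki on a server while Obsidian desktop reads it on another device.

Setup: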

# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground — use systemd for background)
ob sync --continuous

Continuous background sync via systemd:

# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target

systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER

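This lets the agent write to ~/wiki on a server while you browse the same vault in Obsidian on your laptop or phone; changes appear within seconds.

Common Pitfalls

- Never modify files in raw/: sources are immutable. Corrections go in wiki pages.
- Always orient first: read SCHEMA + index + recent log before any operation in a new session. Skipping this causes duplicates and missed cross-references.
- Always update index.md and log.md: skipping this degrades the wiki. They are the navigational backbone.
- Don't create pages for passing mentions: follow the Page Thresholds in SCHEMA.md. A name that appears once in a footnote does not warrant an entity page.
- Don't create pages without cross-references: isolated pages are invisible. Every page must link to at least 2 other pages.
- Frontmatter is required: it enables search, filtering, and staleness detection.
- Tags must come from the taxonomy: freeform tags decay into noise. Add new tags to SCHEMA.md first, then use them.
- Keep pages scannable: a wiki page should be readable in 30 seconds. Split pages over 200 lines and move detailed analysis to dedicated deep-dive pages.
- Ask before mass updates: if an ingest would touch 10+ existing pages, confirm the scope with the user first.
- Rotate the log: when log.md exceeds 500 entries, rename it to log-YYYY.md and start fresh. The agent should check the log size during lint.
- Handle contradictions explicitly: don't silently overwrite. Note both claims with dates, mark them in the frontmatter, and flag them for user review.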
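
The log rotation rule can be sketched as below. It assumes an entry is any line starting with `## [`, and it refuses to overwrite an existing archive; `count_entries`, `rotate_log_if_needed`, and the minimal fresh-log header are illustrative choices, not part of the skill.

```python
from pathlib import Path


def count_entries(log_text):
    """Count log entries, assuming each one starts with a '## [' heading."""
    return sum(1 for line in log_text.splitlines() if line.startswith("## ["))


def rotate_log_if_needed(wiki, year, max_entries=500):
    """Rename log.md to log-<year>.md when it exceeds max_entries; return True if rotated."""
    log = Path(wiki) / "log.md"
    if count_entries(log.read_text(encoding="utf-8")) <= max_entries:
        return False
    archive = log.with_name(f"log-{year}.md")
    if archive.exists():
        # Never clobber an earlier archive; leave the decision to the user.
        raise FileExistsError(f"{archive} already exists")
    log.rename(archive)
    # Start a fresh log; a real run would write the full log.md template header.
    log.write_text("# Wiki Log\n", encoding="utf-8")
    return True
```

The fresh log would normally also get a first entry recording the rotation itself.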

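Related Tools

llm-wiki-compiler is a Node.js CLI that compiles sources into a concept wiki, inspired by the same Karpathy pattern. It is Obsidian-compatible, so users who want a scheduled or CLI-driven compile pipeline can point it at the same vault this skill maintains. The trade-off: it owns page generation (replacing the agent's judgment on page creation) and is tuned for small corpora. Use this skill when you want agent-in-the-loop curation; use llmwiki when you want batch compilation of a source directory.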
