LLM Wiki

Karpathy's LLM Wiki: build and maintain a persistent, interlinked Markdown knowledge base. Ingest sources, query the compiled knowledge, and run consistency checks.

Skill metadata


Source	Bundled (installed by default)
Path	`skills/research/llm-wiki`
Version	`2.1.0`
Author	Hermes Agent
License	MIT
Tags	`wiki`, `knowledge-base`, `research`, `notes`, `markdown`, `rag-alternative`
Related skills	`obsidian`, `arxiv`

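Reference: full SKILL.md

Info

The following is the complete skill definition that Hermes loads when this skill is triggered. These are the instructions the agent sees while the skill is active.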

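Karpathy's LLM Wiki

Build and maintain a persistent, compounding knowledge base as interlinked Markdown files. Based on Andrej Karpathy's LLM Wiki pattern.

Unlike traditional RAG, which rediscovers knowledge from scratch on every query, the wiki compiles knowledge once and keeps it current. Cross-references already exist. Contradictions are already flagged. The synthesis reflects everything that has been ingested.

Division of labor: the human curates sources and directs the analysis. The agent summarizes, cross-references, files, and keeps everything consistent.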

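When to activate this skill

Use this skill when the user:

Asks to create, build, or start a wiki or knowledge base
Asks to ingest, add, or process sources into their wiki
Asks a question while an existing wiki is present at the configured path
Asks to lint, audit, or health-check their wiki
Mentions their wiki, knowledge base, or "notes" in a research context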

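Wiki location

Location: set via the WIKI_PATH environment variable (for example in ~/.hermes/.env).

If it is not set, it defaults to ~/wiki.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
```

The wiki is just a directory of Markdown files. Open it in Obsidian, VS Code, or any editor. No database and no special tooling are required.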

Architecture: three layers

```text
wiki/
├── SCHEMA.md           # Conventions, structure rules, domain config
├── index.md            # Sectioned content catalog with one-line summaries
├── log.md              # Chronological action log (append-only, rotated yearly)
├── raw/                # Layer 1: Immutable source material
│   ├── articles/       # Web articles, clippings
│   ├── papers/         # PDFs, arxiv papers
│   ├── transcripts/    # Meeting notes, interviews
│   └── assets/         # Images, diagrams referenced by sources
├── entities/           # Layer 2: Entity pages (people, orgs, products, models)
├── concepts/           # Layer 2: Concept/topic pages
├── comparisons/        # Layer 2: Side-by-side analyses
└── queries/            # Layer 2: Filed query results worth keeping
```

Layer 1, raw sources: immutable. The agent reads these files but never modifies them.
Layer 2, the wiki: Markdown files owned by the agent, which creates, updates, and cross-references them.
Layer 3, the schema: SCHEMA.md defines the structure, conventions, and tag taxonomy.

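Resuming an existing wiki (critical: do this at the start of every session)

When the user has an existing wiki, always orient yourself before doing anything else:

① Read SCHEMA.md to understand the domain, conventions, and tag taxonomy.
② Read index.md to learn which pages exist and what each one summarizes.
③ Scan the recent part of log.md: read the last 20-30 entries to understand recent activity.

```bash
WIKI="${WIKI_PATH:-$HOME/wiki}"
# Orientation reads at session start
read_file "$WIKI/SCHEMA.md"
read_file "$WIKI/index.md"
read_file "$WIKI/log.md" offset=<last 30 lines>
```

Only after orienting should you ingest, query, or lint. This prevents:

Creating duplicate pages for entities that already exist
Missing cross-references to existing content
Contradicting the schema's conventions
Repeating work that is already logged

For large wikis (100+ pages), also run a quick search_files on the current topic before creating any new content.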

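Initializing a new wiki

When the user asks to create or start a wiki:

Determine the wiki path (from the $WIKI_PATH environment variable, or ask the user; default ~/wiki)
Create the directory structure above
Ask the user which domain the wiki covers, and pin it down specifically
Write a SCHEMA.md tailored to that domain (see the template below)
Write an initial index.md with section headers
Write an initial log.md with a creation entry
Confirm the wiki is ready and suggest the first sources to ingest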
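A minimal Python sketch of the scaffolding steps above, for cases where the agent runs code (for example through execute_code). The function name `init_wiki`, the exact index header, and the wording of the first log entry are illustrative assumptions modeled on the templates below; SCHEMA.md is deliberately not written, because it has to be tailored to the domain. Existing files are never overwritten.

```python
from datetime import date
from pathlib import Path

# Directory layout from the three-layer architecture (hypothetical constant name).
WIKI_DIRS = (
    "raw/articles", "raw/papers", "raw/transcripts", "raw/assets",
    "entities", "concepts", "comparisons", "queries",
)

def init_wiki(root, domain):
    """Create the directory skeleton plus an initial index.md and log.md.

    Existing files are left untouched, so running it twice is harmless.
    """
    wiki = Path(root).expanduser()
    for sub in WIKI_DIRS:
        (wiki / sub).mkdir(parents=True, exist_ok=True)
    today = date.today().isoformat()
    index = wiki / "index.md"
    if not index.exists():
        index.write_text(
            "# Wiki Index\n\n"
            "> Content catalog. Every wiki page listed under its type with a one-line summary.\n"
            f"> Last updated: {today} | Total pages: 0\n\n"
            "## Entities\n\n## Concepts\n\n## Comparisons\n\n## Queries\n",
            encoding="utf-8",
        )
    log = wiki / "log.md"
    if not log.exists():
        log.write_text(
            "# Wiki Log\n\n"
            "> Chronological record of all wiki actions. Append-only.\n\n"
            f"## [{today}] create | Wiki initialized\n"
            f"- Domain: {domain}\n",
            encoding="utf-8",
        )
    return wiki
```

For example, `init_wiki(os.environ.get("WIKI_PATH") or "~/wiki", "AI/ML research")` mirrors the `${WIKI_PATH:-$HOME/wiki}` default.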

SCHEMA.md template

Adapt it to the user's domain. The schema constrains agent behavior and ensures consistency:

# Wiki Schema

## Domain
[What this wiki covers — e.g., "AI/ML research", "personal health", "startup intelligence"]

## Conventions
- File names: lowercase, hyphens, no spaces (e.g., `transformer-architecture.md`)
- Every wiki page starts with YAML frontmatter (see below)
- Use `[[wikilinks]]` to link between pages (minimum 2 outbound links per page)
- When updating a page, always bump the `updated` date
- Every new page must be added to `index.md` under the correct section
- Every action must be appended to `log.md`
- **Provenance markers:** On pages that synthesize 3+ sources, append `^[raw/articles/source-file.md]`
  at the end of paragraphs whose claims come from a specific source. This lets a reader trace each
  claim back without re-reading the whole raw file. Optional on single-source pages where the
  `sources:` frontmatter is enough.

## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | summary
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low        # how well-supported the claims are
contested: true                        # set when the page has unresolved contradictions
contradictions: [other-page-slug]      # other pages this one conflicts with
---
```

`confidence` and `contested` are optional but recommended for opinion-heavy or fast-moving topics. Lint surfaces `contested: true` and `confidence: low` pages for review so weak claims don't silently harden into accepted wiki fact.

raw/ Frontmatter

Raw sources ALSO get a small frontmatter block so re-ingests can detect drift:

```yaml
---
source_url: https://example.com/article   # original URL (if applicable)
ingested: YYYY-MM-DD
sha256: <hex digest of the raw content below the frontmatter>
---
```

The sha256: lets a future re-ingest of the same URL skip processing when content is unchanged, and flag drift when it has changed. Compute over the body only (everything after the closing ---), not the frontmatter itself.

Tag Taxonomy

[Define 10-20 top-level tags for the domain. Add new tags here BEFORE using them.]

Example for AI/ML:

Models: model, architecture, benchmark, training
People/Orgs: person, company, lab, open-source
Techniques: optimization, fine-tuning, inference, alignment, data
Meta: comparison, timeline, controversy, prediction

Rule: every tag on a page must appear in this taxonomy. If a new tag is needed, add it here first, then use it. This prevents tag sprawl.

Page Thresholds

Create a page when an entity/concept appears in 2+ sources OR is central to one source
Add to existing page when a source mentions something already covered
DON'T create a page for passing mentions, minor details, or things outside the domain
Split a page when it exceeds ~200 lines — break into sub-topics with cross-links
Archive a page when its content is fully superseded — move to _archive/, remove from index

Entity Pages

One page per notable entity. Include:

Overview / what it is
Key facts and dates
Relationships to other entities ([[wikilinks]])
Source references

Concept Pages

One page per concept or topic. Include:

Definition / explanation
Current state of knowledge
Open questions or debates
Related concepts ([[wikilinks]])

Comparison Pages

Side-by-side analyses. Include:

What is being compared and why
Dimensions of comparison (table format preferred)
Verdict or synthesis
Sources

Update Policy

When new information conflicts with existing content:

Check the dates — newer sources generally supersede older ones
If genuinely contradictory, note both positions with dates and sources
Mark the contradiction in frontmatter: contradictions: [page-name]
Flag for user review in the lint report

index.md template

The index is sectioned by type. Each entry is one line: a wikilink plus a summary.

```markdown
# Wiki Index

> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: YYYY-MM-DD | Total pages: N

## Entities
<!-- Alphabetical within section -->

## Concepts

## Comparisons

## Queries
```

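Scaling rule: when any section exceeds 50 entries, split it into subsections by first letter or by subdomain. When the index exceeds 200 entries in total, create a _meta/topic-map.md that groups pages by theme for faster navigation.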

log.md template

```markdown
# Wiki Log

> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: ingest, update, query, lint, create, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.

## [YYYY-MM-DD] create | Wiki initialized
- Domain: [domain]
- Structure created with SCHEMA.md, index.md, log.md
```

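Core operations

1. Ingest

When the user provides a source (URL, file, pasted text), integrate it into the wiki:

① Capture the raw source:

URL → use web_extract to fetch it as Markdown and save it to raw/articles/
PDF → use web_extract (it handles PDFs) and save it to raw/papers/
Pasted text → save it to the appropriate raw/ subdirectory
Use descriptive file names: raw/articles/karpathy-llm-wiki-2026.md
Add raw frontmatter (source_url, ingested, and the sha256 of the body). When re-ingesting the same URL, recompute the sha256 and compare it with the stored value: skip if they match, flag drift and update if they differ. This is cheap enough to do on every re-ingest, and it catches silent source changes.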
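The hash-and-compare rule can be sketched as follows. This is a minimal sketch, not the skill's actual tooling: the helper names are hypothetical, the frontmatter is assumed to be flat `key: value` lines between `---` markers at the top of the file, and the digest covers the UTF-8 bytes of everything after the closing `---` line, as the raw/ frontmatter rules require.

```python
import hashlib
import re

# Opening '---' line, frontmatter lines, closing '---' line (plus its newline).
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)

def split_frontmatter(text):
    """Return (frontmatter dict, body). Only flat 'key: value' lines are understood."""
    m = FRONTMATTER_RE.match(text)
    if not m:
        return {}, text
    meta = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Drop a trailing YAML comment (' # ...') but keep '#' inside URLs.
            meta[key.strip()] = re.split(r"\s+#", value, maxsplit=1)[0].strip()
    return meta, text[m.end():]

def body_sha256(body):
    """Hex SHA-256 of the body's UTF-8 bytes (the frontmatter itself is excluded)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def reingest_action(stored_raw, fresh_body):
    """Hypothetical decision helper: 'new', 'skip' (unchanged) or 'drift' (changed)."""
    if stored_raw is None:
        return "new"
    meta, _ = split_frontmatter(stored_raw)
    if meta.get("sha256") == body_sha256(fresh_body):
        return "skip"
    return "drift"  # flag it, then update the raw file and the affected wiki pages
```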

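② Discuss the key takeaways with the user: what is interesting and what matters for the domain. (In automated or cron contexts, skip this step and continue directly.)

③ Check existing content: search index.md and use search_files to find existing pages for the entities and concepts mentioned. This is what separates a growing wiki from a pile of duplicates.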

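④ Write or update wiki pages:

New entities/concepts: create a page only when the page threshold in SCHEMA.md is met (mentioned in 2+ sources, or central to one source)
Existing pages: add the new information, update facts, and bump the updated date. When new information contradicts existing content, follow the Update Policy.
Cross-references: every new or updated page must link to at least 2 other pages via [[wikilinks]]. Check whether existing pages link back.
Tags: use only tags from the SCHEMA.md taxonomy
Provenance: on pages that synthesize 3+ sources, append a ^[raw/articles/source.md] marker to each paragraph whose claims trace back to a specific source.
Confidence: for opinion-heavy, fast-moving, or single-source claims, set confidence: medium or low in the frontmatter. Do not mark a page high unless its claims are well supported across multiple sources.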
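A small check for the cross-reference rule, as a sketch. The regular expression is an assumption about link syntax: it reads the target of `[[page]]`, `[[page|alias]]` and `[[page#heading]]`, skips image embeds such as `![[image.png]]`, and does not count a page linking to itself. `outbound_links` and `meets_crossref_rule` are hypothetical helper names.

```python
import re

# Target of [[page]], [[page|alias]] or [[page#heading]]; '![[...]]' embeds are skipped.
WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\]|#]+)")

def outbound_links(page_text, own_slug):
    """Distinct wikilink targets on a page, ignoring links to the page itself."""
    targets = {t.strip() for t in WIKILINK_RE.findall(page_text)}
    targets.discard(own_slug)
    return targets

def meets_crossref_rule(page_text, own_slug, minimum=2):
    """True if the page links to at least `minimum` other pages."""
    return len(outbound_links(page_text, own_slug)) >= minimum
```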

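⑤ Update navigation:

Add new pages to index.md under the correct section, in alphabetical order
Update the Total pages count and the Last updated date in the index header
Append to log.md: ## [YYYY-MM-DD] ingest | Source Title
List every created or updated file in the log entry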
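The two index edits in this step can be sketched as plain text transformations. Assumptions, since the template does not pin them down: each entry is a single line of the form `- [[slug]] summary`, section headings are `## <Section>` as in the index.md template, and the header line reads `> Last updated: ... | Total pages: N`. Both helpers are hypothetical.

```python
import re
from datetime import date

HEADER_RE = re.compile(r"^> Last updated: .*$", re.MULTILINE)
ENTRY_RE = re.compile(r"^- \[\[", re.MULTILINE)  # assumed entry shape: "- [[slug]] summary"

def add_index_entry(index_text, section, slug, summary):
    """Insert or replace '- [[slug]] summary' under '## <section>', keeping entries alphabetical."""
    lines = index_text.rstrip("\n").split("\n")
    start = lines.index(f"## {section}") + 1  # ValueError if the section heading is missing
    end = next((i for i in range(start, len(lines)) if lines[i].startswith("## ")), len(lines))
    body = lines[start:end]
    # Keep non-entry lines such as the '<!-- Alphabetical within section -->' comment on top.
    others = [line for line in body if line.strip() and not line.startswith("- [[")]
    entries = [line for line in body
               if line.startswith("- [[") and not line.startswith(f"- [[{slug}]]")]
    entries.append(f"- [[{slug}]] {summary}")
    entries.sort(key=str.lower)
    spacer = [""] if end < len(lines) else []  # blank line before the next section
    return "\n".join(lines[:start] + others + entries + spacer + lines[end:]) + "\n"

def refresh_header(index_text, today=None):
    """Rewrite the 'Last updated | Total pages' header from the current entries."""
    today = today or date.today().isoformat()
    total = len(ENTRY_RE.findall(index_text))
    return HEADER_RE.sub(f"> Last updated: {today} | Total pages: {total}", index_text, count=1)
```

Calling `refresh_header` after each batch of `add_index_entry` calls keeps the count derived from the entries themselves rather than maintained by hand.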

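⑥ Report what changed: list every created or updated file for the user.

A single source can trigger updates to 5-15 wiki pages. This is normal and expected: it is the compounding effect at work.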

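2. Query

When the user asks a question about the wiki's domain:

① Read index.md to identify the relevant pages.
② For wikis with 100+ pages, also run search_files for the key terms across all .md files; the index alone may miss relevant content.
③ Read the relevant pages with read_file.
④ Synthesize an answer from the compiled knowledge. Cite the wiki pages you drew on: "Based on [[page-a]] and [[page-b]]..."
⑤ File valuable answers: if the answer is a substantial comparison, a deep dive, or a novel synthesis, create a page in queries/ or comparisons/. Don't file trivial lookups; file only answers that would be painful to re-derive.
⑥ Update log.md with the query and whether it was filed.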

3. Lint (health check)

When the user asks to lint, health-check, or audit the wiki:

① Orphan pages: find pages that no other page links to with an inbound [[wikilink]].

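```python
# Use execute_code for this: programmatic scan across all wiki pages
import os, re
from collections import defaultdict
wiki = os.path.expanduser(os.environ.get("WIKI_PATH") or "~/wiki")  # same default as ${WIKI_PATH:-$HOME/wiki}
link_re = re.compile(r"(?<!!)\[\[([^\]|#]+)")  # target of [[page]], [[page|alias]] or [[page#heading]]
pages, inbound = set(), defaultdict(set)  # inbound: target slug -> pages linking to it
for sub in ("entities", "concepts", "comparisons", "queries"):  # scan all .md files in these dirs
    for root, _, files in os.walk(os.path.join(wiki, sub)):
        for name in [f for f in files if f.endswith(".md")]:
            pages.add(name[:-3])
            with open(os.path.join(root, name), encoding="utf-8") as fh:
                for target in link_re.findall(fh.read()):
                    inbound[target.strip().split("/")[-1]].add(name[:-3])
print(sorted(p for p in pages if not inbound[p] - {p}))  # zero inbound links (self-links ignored) = orphan
```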

② Broken wikilinks: find [[links]] that point to pages that don't exist.
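A sketch of the broken-link check under stated assumptions: a target counts as resolved if it matches any Markdown file's name (Obsidian-style) or its wiki-relative path without `.md`, and only index.md plus the four layer-2 directories are scanned, so the literal `[[wikilinks]]` in SCHEMA.md and any brackets inside raw/ sources are not reported. `find_broken_links` is a hypothetical name.

```python
import re
from pathlib import Path

WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\]|#]+)")  # skips ![[image]] embeds
PAGE_DIRS = ("entities", "concepts", "comparisons", "queries")

def find_broken_links(wiki_root):
    """Map each scanned page (wiki-relative path) to link targets that match no file."""
    root = Path(wiki_root).expanduser()
    every_md = list(root.rglob("*.md"))
    known = {p.stem for p in every_md}  # Obsidian-style: [[name]] matches name.md anywhere
    known |= {p.relative_to(root).with_suffix("").as_posix() for p in every_md}  # [[dir/name]]
    scanned = [root / "index.md"]
    for d in PAGE_DIRS:
        if (root / d).is_dir():
            scanned += sorted((root / d).rglob("*.md"))
    broken = {}
    for page in scanned:
        if not page.is_file():
            continue
        targets = {t.strip() for t in WIKILINK_RE.findall(page.read_text(encoding="utf-8"))}
        missing = sorted(targets - known)
        if missing:
            broken[page.relative_to(root).as_posix()] = missing
    return broken
```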

③ Index completeness: every wiki page should appear in index.md. Compare the files on disk against the index entries.
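Index completeness reduces to a set difference between page files on disk and link targets in index.md. A minimal sketch, assuming index entries are wikilinks and that pages are identified by file name (a path-style link such as `[[concepts/attention]]` is reduced to its last segment); `index_gaps` is a hypothetical name.

```python
import re
from pathlib import Path

PAGE_DIRS = ("entities", "concepts", "comparisons", "queries")
WIKILINK_RE = re.compile(r"(?<!!)\[\[([^\]|#]+)")

def index_gaps(wiki_root):
    """Return (pages missing from index.md, index entries with no matching page file)."""
    root = Path(wiki_root).expanduser()
    on_disk = {p.stem for d in PAGE_DIRS if (root / d).is_dir() for p in (root / d).rglob("*.md")}
    index_text = (root / "index.md").read_text(encoding="utf-8")
    listed = {t.strip().split("/")[-1] for t in WIKILINK_RE.findall(index_text)}
    return sorted(on_disk - listed), sorted(listed - on_disk)
```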

④ Frontmatter validation: every wiki page must contain all required fields (title, created, updated, type, tags, sources). Tags must be in the taxonomy.
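A sketch of frontmatter validation. It uses a deliberately minimal parser (flat `key: value` lines, `[a, b]` lists, trailing `# comments` dropped) instead of a YAML library, so it is an assumption-laden stand-in rather than a full YAML implementation; a real setup might use PyYAML. The taxonomy is passed in as a set of tag names, and the helper names are hypothetical.

```python
import re

REQUIRED = ("title", "created", "updated", "type", "tags", "sources")
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)

def parse_frontmatter(text):
    """Minimal parser: flat 'key: value' lines, '[a, b]' values become lists. Not full YAML."""
    m = FRONTMATTER_RE.match(text)
    if not m:
        return None
    meta = {}
    for line in m.group(1).splitlines():
        if line.lstrip().startswith("#") or ":" not in line:
            continue  # comment lines such as '# Optional quality signals:'
        key, _, value = line.partition(":")
        value = re.split(r"\s+#", value, maxsplit=1)[0].strip()
        if value.startswith("[") and value.endswith("]"):
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        meta[key.strip()] = value
    return meta

def frontmatter_problems(text, taxonomy):
    """List human-readable problems: missing/empty required fields and unknown tags."""
    meta = parse_frontmatter(text)
    if meta is None:
        return ["missing frontmatter"]
    problems = [f"missing field: {f}" for f in REQUIRED if meta.get(f) in (None, "", [])]
    tags = meta.get("tags", [])
    if isinstance(tags, str):  # e.g. 'tags: model' written without brackets
        tags = [tags] if tags else []
    problems += [f"tag not in taxonomy: {t}" for t in tags if t not in taxonomy]
    return problems
```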

⑤ Stale content: pages whose updated date is more than 90 days older than the newest source that mentions the same entity.
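Deciding staleness requires knowing which sources mention a page's entity. A minimal sketch, assuming the agent has already collected each page's `updated` date and each raw file's `ingested` date, and using a crude case-insensitive substring match on hypothetical per-page name variants as the "mentions" test; a real check might reuse the entity list gathered during ingest instead.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)

def stale_pages(pages, sources):
    """pages: {slug: (updated_date, [names for the entity])}
    sources: {raw_path: (ingested_date, body_text)}
    Returns sorted [(slug, newest_mentioning_source, lag_in_days)] for lags over 90 days."""
    flagged = []
    for slug, (updated, names) in pages.items():
        mentioning = [(ingested, path) for path, (ingested, body) in sources.items()
                      if any(n.lower() in body.lower() for n in names)]
        if not mentioning:
            continue  # no source mentions this entity, so nothing to compare against
        newest, path = max(mentioning)
        if newest - updated > STALE_AFTER:
            flagged.append((slug, path, (newest - updated).days))
    return sorted(flagged)
```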

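⑥ Contradictions: pages on the same topic with conflicting claims. Look for pages that share tags or entities but state different facts. Surface every page with contested: true or a contradictions: field in its frontmatter for user review.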

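⑦ Quality signals: list pages with confidence: low, plus any page that cites only a single source but has no confidence field. These are candidates for finding corroboration or for downgrading to confidence: medium.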
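The frontmatter side of checks ⑥ and ⑦ can be sketched as one function. Assumptions: the frontmatter has already been parsed into a dict in which `sources` and `contradictions` are lists, and `review_flags` is a hypothetical helper. It only surfaces pages flagged in frontmatter; spotting conflicting claims between pages that share tags still requires the agent to read them.

```python
def review_flags(meta):
    """Reasons to surface a page for user review, from its parsed frontmatter dict."""
    reasons = []
    if str(meta.get("contested", "")).lower() == "true":
        reasons.append("contested")
    contradictions = meta.get("contradictions") or []
    if isinstance(contradictions, str):
        contradictions = [contradictions]
    if contradictions:
        reasons.append("contradicts: " + ", ".join(contradictions))
    confidence = meta.get("confidence")
    if confidence == "low":
        reasons.append("low confidence")
    sources = meta.get("sources") or []
    if isinstance(sources, str):
        sources = [sources]
    if len(sources) == 1 and confidence is None:
        reasons.append("single source, no confidence field")
    return reasons
```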

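⑧ Source drift: for every file in raw/ with sha256: frontmatter, recompute the hash and flag mismatches. A mismatch means the raw file was edited (which should not happen, since raw/ is immutable) or was ingested from a URL whose content has since changed. This is not a hard error, but it is worth reporting.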
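Drift detection is the re-ingest hash check run over the whole raw/ tree. A sketch under the raw/ frontmatter rules: the digest is SHA-256 over the UTF-8 body after the closing `---`, stored as 64 hex characters on a `sha256:` line. Files without that line are skipped rather than reported, and `drifted_raw_files` is a hypothetical name.

```python
import hashlib
import re
from pathlib import Path

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)
SHA_RE = re.compile(r"^sha256:\s*([0-9a-fA-F]{64})\s*$", re.MULTILINE)

def drifted_raw_files(wiki_root):
    """Return raw/ files whose stored sha256 no longer matches the body after the frontmatter."""
    root = Path(wiki_root).expanduser()
    if not (root / "raw").is_dir():
        return []
    drifted = []
    for path in sorted((root / "raw").rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        fm = FRONTMATTER_RE.match(text)
        stored = SHA_RE.search(fm.group(1)) if fm else None
        if not stored:
            continue  # no sha256 frontmatter: nothing to compare
        body = text[fm.end():]
        if hashlib.sha256(body.encode("utf-8")).hexdigest() != stored.group(1).lower():
            drifted.append(path.relative_to(root).as_posix())
    return drifted
```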

⑨ Page size: flag pages longer than 200 lines; they are candidates for splitting.
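A sketch of the size check, counting lines with `str.splitlines()` and scanning only the layer-2 page directories. The 200-line limit comes from the page thresholds; `oversized_pages` is a hypothetical name.

```python
from pathlib import Path

PAGE_DIRS = ("entities", "concepts", "comparisons", "queries")

def oversized_pages(wiki_root, max_lines=200):
    """Return (relative path, line count) for pages longer than max_lines, largest first."""
    root = Path(wiki_root).expanduser()
    found = []
    for d in PAGE_DIRS:
        if not (root / d).is_dir():
            continue
        for page in (root / d).rglob("*.md"):
            n = len(page.read_text(encoding="utf-8").splitlines())
            if n > max_lines:
                found.append((page.relative_to(root).as_posix(), n))
    return sorted(found, key=lambda item: (-item[1], item[0]))
```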

⑩ Tag audit: list every tag in use and flag any tag that is not in the SCHEMA.md taxonomy.
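A sketch of the tag audit. Where the taxonomy lives inside SCHEMA.md is an assumption here: the parser expects lines such as `- Models: model, architecture` under a `## Tag Taxonomy` heading, mirroring the AI/ML example in the template. The per-page tag lists are assumed to come from parsed frontmatter, and both function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def parse_taxonomy(schema_text):
    """Collect tags from '- Category: tag, tag' lines under a '## Tag Taxonomy' heading."""
    section = re.search(r"^## Tag Taxonomy\s*$(.*?)(?=^## |\Z)", schema_text, re.MULTILINE | re.DOTALL)
    tags = set()
    for line in (section.group(1) if section else "").splitlines():
        m = re.match(r"\s*[-*]\s*[^:]+:\s*(.+)$", line)
        if m:
            tags.update(t.strip() for t in m.group(1).split(",") if t.strip())
    return tags

def audit_tags(page_tags, taxonomy):
    """page_tags: {page: [tags]}. Returns (usage counts, {unknown tag: [pages using it]})."""
    usage = Counter(t for tags in page_tags.values() for t in tags)
    unknown = defaultdict(list)
    for page, tags in sorted(page_tags.items()):
        for t in tags:
            if t not in taxonomy:
                unknown[t].append(page)
    return usage, dict(unknown)
```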

⑪ Log rotation: if log.md exceeds 500 entries, rotate it.
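Rotation as described in the log.md template (rename to log-YYYY.md, start fresh) can be sketched as below. Assumptions: an entry is a line starting with `## [YYYY-MM-DD]`, the archive is named after the current year, the fresh log keeps the original title and format notes, and an existing archive with the same name is treated as an error rather than overwritten. `rotate_log_if_needed` is a hypothetical name.

```python
import re
from datetime import date
from pathlib import Path

ENTRY_RE = re.compile(r"^## \[\d{4}-\d{2}-\d{2}\]", re.MULTILINE)  # "## [YYYY-MM-DD] action | subject"

def rotate_log_if_needed(wiki_root, today=None, max_entries=500):
    """If log.md has more than max_entries entries, rename it to log-YYYY.md and start fresh.

    Returns the archive path, or None when no rotation was needed.
    """
    root = Path(wiki_root).expanduser()
    log = root / "log.md"
    text = log.read_text(encoding="utf-8")
    if len(ENTRY_RE.findall(text)) <= max_entries:
        return None
    today = today or date.today()
    archive = root / f"log-{today.year}.md"
    if archive.exists():
        raise FileExistsError(f"{archive} already exists; choose a name by hand")
    log.rename(archive)
    # Keep the title and format notes that precede the first entry.
    preamble = text[:ENTRY_RE.search(text).start()].rstrip() or "# Wiki Log"
    log.write_text(preamble + "\n", encoding="utf-8")
    return archive
```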

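⑫ Report the findings with concrete file paths and suggested actions, grouped by severity (broken links > orphan pages > source drift > contested pages > stale content > style issues).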
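A sketch of the report grouping, assuming each finding has already been reduced to a `(category, path, suggested_action)` tuple. The category names are chosen here for illustration, and `format_report` is a hypothetical name. Unknown categories fall into a trailing `other` group instead of being dropped.

```python
SEVERITY = ("broken link", "orphan page", "source drift", "contested page", "stale content", "style")

def format_report(findings):
    """findings: iterable of (category, path, suggested_action). Most severe group first."""
    groups = {}
    for category, path, action in findings:
        key = category if category in SEVERITY else "other"
        groups.setdefault(key, []).append((path, action))
    lines = []
    for key in (*SEVERITY, "other"):
        if key in groups:
            lines.append(f"{key} ({len(groups[key])})")
            lines += [f"  {path}: {action}" for path, action in sorted(groups[key])]
    return "\n".join(lines)
```

The total number of findings is also the N for the lint entry appended to log.md.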

⑬ Append to log.md: ## [YYYY-MM-DD] lint | N issues found

Working with the wiki

Search

```bash
# Find pages by content
search_files "transformer" path="$WIKI" file_glob="*.md"

# Find pages by filename
search_files "*.md" target="files" path="$WIKI"

# Find pages by tag
search_files "tags:.*alignment" path="$WIKI" file_glob="*.md"

# Recent activity
read_file "$WIKI/log.md" offset=<last 20 lines>
```

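Batch ingest

When ingesting several sources at once, batch the updates:

Read all sources first
Identify the entities and concepts across all sources
Check for existing pages for them (one search, not N)
Create or update pages in a single pass (avoiding redundant updates)
Update index.md once at the end
Write a single log entry covering the whole batch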

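Obsidian integration

The wiki directory works as an Obsidian vault out of the box:

[[wikilinks]] render as clickable links
Graph View visualizes the knowledge network
YAML frontmatter powers Dataview queries
The raw/assets/ folder holds images referenced via ![[image.png]]

For best results:

Set Obsidian's attachment folder to raw/assets/
Enable "Wikilinks" in Obsidian settings (usually on by default)
Install the Dataview plugin for queries such as TABLE tags FROM "entities" WHERE contains(tags, "company")

If you use the Obsidian skill alongside this one, set OBSIDIAN_VAULT_PATH to the same directory as the wiki path.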

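Headless Obsidian (servers and machines without a display)

On machines without a display, use obsidian-headless instead of the desktop app. It syncs the vault through Obsidian Sync without a GUI, which makes it a good fit when an agent on a server writes to the wiki while Obsidian desktop reads it on another device.

Setup: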

```bash
# Requires Node.js 22+
npm install -g obsidian-headless

# Login (requires Obsidian account with Sync subscription)
ob login --email <email> --password '<password>'

# Create a remote vault for the wiki
ob sync-create-remote --name "LLM Wiki"

# Connect the wiki directory to the vault
cd ~/wiki
ob sync-setup --vault "<vault-id>"

# Initial sync
ob sync

# Continuous sync (foreground; use systemd for background)
ob sync --continuous
```

Continuous background sync via systemd:

```ini
# ~/.config/systemd/user/obsidian-wiki-sync.service
[Unit]
Description=Obsidian LLM Wiki Sync
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/path/to/ob sync --continuous
WorkingDirectory=/home/user/wiki
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
```

```bash
systemctl --user daemon-reload
systemctl --user enable --now obsidian-wiki-sync
# Enable linger so sync survives logout:
sudo loginctl enable-linger $USER
```

This lets the agent write to ~/wiki on a server while you browse the same vault in Obsidian on your laptop or phone; changes show up within seconds.

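Common pitfalls

Never modify files in raw/: sources are immutable. Corrections belong in wiki pages.
Always orient first: in a new session, read SCHEMA + index + recent log before doing anything. Skipping this leads to duplicates and missed cross-references.
Always update index.md and log.md: skipping this makes the wiki degrade. They are the navigational backbone.
Don't create pages for passing mentions: follow the page thresholds in SCHEMA.md. A name that appears once in a footnote does not warrant an entity page.
Don't create pages without cross-references: orphaned pages are invisible. Every page must link to at least 2 other pages.
Frontmatter is required: it enables search, filtering, and staleness detection.
Tags must come from the taxonomy: free-form tags decay into noise. Add new tags to SCHEMA.md first, then use them.
Keep pages scannable: a wiki page should be readable in 30 seconds. Split pages longer than 200 lines, and move detailed analysis to dedicated deep-dive pages.
Ask before mass updates: if a single ingest would touch 10+ existing pages, confirm the scope with the user first.
Rotate the log: when log.md exceeds 500 entries, rename it to log-YYYY.md and start fresh. The agent should check the log size during lint.
Handle contradictions explicitly: don't silently overwrite. Record both claims with their dates, mark them in frontmatter, and flag them for user review.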

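Related tools

llm-wiki-compiler is a Node.js CLI that compiles sources into a concept wiki inspired by the same Karpathy idea. It is Obsidian-compatible, so users who want a scheduled or CLI-driven compile pipeline can point it at the same vault this skill maintains. The trade-off: it owns page generation (replacing the agent's judgment about which pages to create) and is optimized for small corpora. Use this skill when you want the agent involved in curation; use llmwiki when you want to batch-compile a directory of sources.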
