Dspy

使用聲明式編程構建複雜的 AI 系統，自動優化提示詞，並利用 DSPy（斯坦福 NLP 的系統化語言模型編程框架）創建模塊化的 RAG 系統和智能體。

技能元數據


來源	捆綁（默認安裝）
路徑	`skills/mlops/research/dspy`
版本	`1.0.0`
作者	Orchestra Research
許可證	MIT
依賴項	`dspy`, `openai`, `anthropic`
標籤	`Prompt Engineering`, `DSPy`, `Declarative Programming`, `RAG`, `Agents`, `Prompt Optimization`, `LM Programming`, `Stanford NLP`, `Automatic Optimization`, `Modular AI`

參考：完整 SKILL.md

信息

以下是 Hermes 在觸發此技能時加載的完整技能定義。這是技能激活時智能體看到的指令。

DSPy：聲明式語言模型編程

何時使用此技能

在需要執行以下操作時使用 DSPy：

構建複雜的 AI 系統，包含多個組件和工作流
以聲明式方式對語言模型 (LM) 進行編程，而非手動進行提示詞工程
使用數據驅動方法自動優化提示詞
創建可維護且可移植的模塊化 AI 管道
通過優化器系統地改進模型輸出
構建具有更高可靠性的 RAG 系統、智能體或分類器

GitHub Stars: 22,000+ | 創建者: Stanford NLP

安裝

# Stable release
pip install dspy

# Latest development version
pip install git+https://github.com/stanfordnlp/dspy.git

# With specific LM providers
pip install dspy[openai]        # OpenAI
pip install dspy[anthropic]     # Anthropic Claude
pip install dspy[all]           # All providers

快速入門

基本示例：問答

import dspy

# Configure your language model
lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Define a signature (input → output)
class QA(dspy.Signature):
    """Answer questions with short factual answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Create a module
qa = dspy.Predict(QA)

# Use it
response = qa(question="What is the capital of France?")
print(response.answer)  # "Paris"

思維鏈推理

import dspy

lm = dspy.Claude(model="claude-sonnet-4-5-20250929")
dspy.settings.configure(lm=lm)

# Use ChainOfThought for better reasoning
class MathProblem(dspy.Signature):
    """Solve math word problems."""
    problem = dspy.InputField()
    answer = dspy.OutputField(desc="numerical answer")

# ChainOfThought generates reasoning steps automatically
cot = dspy.ChainOfThought(MathProblem)

response = cot(problem="If John has 5 apples and gives 2 to Mary, how many does he have?")
print(response.rationale)  # Shows reasoning steps
print(response.answer)     # "3"

核心概念

1. 簽名 (Signatures)

簽名定義了 AI 任務的結構（輸入 → 輸出）：

# Inline signature (simple)
qa = dspy.Predict("question -> answer")

# Class signature (detailed)
class Summarize(dspy.Signature):
    """Summarize text into key points."""
    text = dspy.InputField()
    summary = dspy.OutputField(desc="bullet points, 3-5 items")

summarizer = dspy.ChainOfThought(Summarize)

何時使用每種方式：

Inline（內聯）：快速原型設計，簡單任務
Class（類）：複雜任務，類型提示，更好的文檔說明

2. 模塊 (Modules)

模塊是將輸入轉換為輸出的可重用組件：

dspy.Predict

基礎預測模塊：

predictor = dspy.Predict("context, question -> answer")
result = predictor(context="Paris is the capital of France",
                   question="What is the capital?")

dspy.ChainOfThought

在回答之前生成推理步驟：

cot = dspy.ChainOfThought("question -> answer")
result = cot(question="Why is the sky blue?")
print(result.rationale)  # Reasoning steps
print(result.answer)     # Final answer

dspy.ReAct

帶有工具的智能體式推理：

from dspy.predict import ReAct

class SearchQA(dspy.Signature):
    """Answer questions using search."""
    question = dspy.InputField()
    answer = dspy.OutputField()

def search_tool(query: str) -> str:
    """Search Wikipedia."""
    # Your search implementation
    return results

react = ReAct(SearchQA, tools=[search_tool])
result = react(question="When was Python created?")

dspy.ProgramOfThought

生成並執行代碼以進行推理：

pot = dspy.ProgramOfThought("question -> answer")
result = pot(question="What is 15% of 240?")
# Generates: answer = 240 * 0.15

3. 優化器 (Optimizers)

優化器使用訓練數據自動改進你的模塊：

BootstrapFewShot

從示例中學習：

from dspy.teleprompt import BootstrapFewShot

# Training data
trainset = [
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
    dspy.Example(question="What is 3+5?", answer="8").with_inputs("question"),
]

# Define metric
def validate_answer(example, pred, trace=None):
    return example.answer == pred.answer

# Optimize
optimizer = BootstrapFewShot(metric=validate_answer, max_bootstrapped_demos=3)
optimized_qa = optimizer.compile(qa, trainset=trainset)

# Now optimized_qa performs better!

MIPRO (Most Important Prompt Optimization)

迭代式改進提示詞：

from dspy.teleprompt import MIPRO

optimizer = MIPRO(
    metric=validate_answer,
    num_candidates=10,
    init_temperature=1.0
)

optimized_cot = optimizer.compile(
    cot,
    trainset=trainset,
    num_trials=100
)

BootstrapFinetune

創建用於模型微調的數據集：

from dspy.teleprompt import BootstrapFinetune

optimizer = BootstrapFinetune(metric=validate_answer)
optimized_module = optimizer.compile(qa, trainset=trainset)

# Exports training data for fine-tuning

4. 構建複雜系統

多階段管道

import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Stage 1: Generate search query
        search_query = self.generate_query(question=question).search_query

        # Stage 2: Retrieve context
        passages = self.retrieve(search_query).passages
        context = "\n".join(passages)

        # Stage 3: Generate answer
        answer = self.generate_answer(context=context, question=question).answer
        return dspy.Prediction(answer=answer, context=context)

# Use the pipeline
qa_system = MultiHopQA()
result = qa_system(question="Who wrote the book that inspired the movie Blade Runner?")

帶優化的 RAG 系統

import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM

# Configure retriever
retriever = ChromadbRM(
    collection_name="documents",
    persist_directory="./chroma_db"
)

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Create and optimize
rag = RAG()

# Optimize with training data
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=validate_answer)
optimized_rag = optimizer.compile(rag, trainset=trainset)

LM 提供商配置

Anthropic Claude

import dspy

lm = dspy.Claude(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key",  # Or set ANTHROPIC_API_KEY env var
    max_tokens=1000,
    temperature=0.7
)
dspy.settings.configure(lm=lm)

OpenAI

lm = dspy.OpenAI(
    model="gpt-4",
    api_key="your-api-key",
    max_tokens=1000
)
dspy.settings.configure(lm=lm)

本地模型 (Ollama)

lm = dspy.OllamaLocal(
    model="llama3.1",
    base_url="http://localhost:11434"
)
dspy.settings.configure(lm=lm)

多個模型

# Different models for different tasks
cheap_lm = dspy.OpenAI(model="gpt-3.5-turbo")
strong_lm = dspy.Claude(model="claude-sonnet-4-5-20250929")

# Use cheap model for retrieval, strong model for reasoning
with dspy.settings.context(lm=cheap_lm):
    context = retriever(question)

with dspy.settings.context(lm=strong_lm):
    answer = generator(context=context, question=question)

常見模式

模式 1：結構化輸出

from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
    occupation: str = Field(description="Current job")

class ExtractPerson(dspy.Signature):
    """Extract person information from text."""
    text = dspy.InputField()
    person: PersonInfo = dspy.OutputField()

extractor = dspy.TypedPredictor(ExtractPerson)
result = extractor(text="John Doe is a 35-year-old software engineer.")
print(result.person.name)  # "John Doe"
print(result.person.age)   # 35

模式 2：斷言驅動的優化

import dspy
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

class MathQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.solve = dspy.ChainOfThought("problem -> solution: float")

    def forward(self, problem):
        solution = self.solve(problem=problem).solution

        # Assert solution is numeric
        dspy.Assert(
            isinstance(float(solution), float),
            "Solution must be a number",
            backtrack=backtrack_handler
        )

        return dspy.Prediction(solution=solution)

模式 3：自一致性

import dspy
from collections import Counter

class ConsistentQA(dspy.Module):
    def __init__(self, num_samples=5):
        super().__init__()
        self.qa = dspy.ChainOfThought("question -> answer")
        self.num_samples = num_samples

    def forward(self, question):
        # Generate multiple answers
        answers = []
        for _ in range(self.num_samples):
            result = self.qa(question=question)
            answers.append(result.answer)

        # Return most common answer
        most_common = Counter(answers).most_common(1)[0][0]
        return dspy.Prediction(answer=most_common)

模式 4：帶重排序的檢索

class RerankedRAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=10)
        self.rerank = dspy.Predict("question, passage -> relevance_score: float")
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        # Retrieve candidates
        passages = self.retrieve(question).passages

        # Rerank passages
        scored = []
        for passage in passages:
            score = float(self.rerank(question=question, passage=passage).relevance_score)
            scored.append((score, passage))

        # Take top 3
        top_passages = [p for _, p in sorted(scored, reverse=True)[:3]]
        context = "\n\n".join(top_passages)

        # Generate answer
        return self.answer(context=context, question=question)

評估與指標

自定義指標

def exact_match(example, pred, trace=None):
    """Exact match metric."""
    return example.answer.lower() == pred.answer.lower()

def f1_score(example, pred, trace=None):
    """F1 score for text overlap."""
    pred_tokens = set(pred.answer.lower().split())
    gold_tokens = set(example.answer.lower().split())

    if not pred_tokens:
        return 0.0

    precision = len(pred_tokens & gold_tokens) / len(pred_tokens)
    recall = len(pred_tokens & gold_tokens) / len(gold_tokens)

    if precision + recall == 0:
        return 0.0

    return 2 * (precision * recall) / (precision + recall)

評估

from dspy.evaluate import Evaluate

# Create evaluator
evaluator = Evaluate(
    devset=testset,
    metric=exact_match,
    num_threads=4,
    display_progress=True
)

# Evaluate model
score = evaluator(qa_system)
print(f"Accuracy: {score}")

# Compare optimized vs unoptimized
score_before = evaluator(qa)
score_after = evaluator(optimized_qa)
print(f"Improvement: {score_after - score_before:.2%}")

最佳實踐

1. 從簡單開始，迭代開發

# Start with Predict
qa = dspy.Predict("question -> answer")

# Add reasoning if needed
qa = dspy.ChainOfThought("question -> answer")

# Add optimization when you have data
optimized_qa = optimizer.compile(qa, trainset=data)

2. 使用描述性簽名

# ❌ Bad: Vague
class Task(dspy.Signature):
    input = dspy.InputField()
    output = dspy.OutputField()

# ✅ Good: Descriptive
class SummarizeArticle(dspy.Signature):
    """Summarize news articles into 3-5 key points."""
    article = dspy.InputField(desc="full article text")
    summary = dspy.OutputField(desc="bullet points, 3-5 items")

3. 使用代表性數據進行優化

# Create diverse training examples
trainset = [
    dspy.Example(question="factual", answer="...).with_inputs("question"),
    dspy.Example(question="reasoning", answer="...").with_inputs("question"),
    dspy.Example(question="calculation", answer="...").with_inputs("question"),
]

# Use validation set for metric
def metric(example, pred, trace=None):
    return example.answer in pred.answer

4. 保存和加載優化後的模型

# Save
optimized_qa.save("models/qa_v1.json")

# Load
loaded_qa = dspy.ChainOfThought("question -> answer")
loaded_qa.load("models/qa_v1.json")

5. 監控與調試

# Enable tracing
dspy.settings.configure(lm=lm, trace=[])

# Run prediction
result = qa(question="...")

# Inspect trace
for call in dspy.settings.trace:
    print(f"Prompt: {call['prompt']}")
    print(f"Response: {call['response']}")

與其他方法的比較

特性	手動提示詞工程	LangChain	DSPy
提示詞工程	手動	手動	自動
優化	試錯法	無	數據驅動
模塊化	低	中	高
類型安全	否	有限	是（簽名）
可移植性	低	中	高
學習曲線	低	中	中-高

何時選擇 DSPy：

你擁有訓練數據或可以生成訓練數據
你需要系統性地改進提示詞
你正在構建複雜的多階段系統
你希望跨不同的語言模型進行優化

何時選擇替代方案：

快速原型設計（手動提示詞工程）
使用現有工具的簡單鏈式調用（LangChain）
需要自定義優化邏輯

資源

文檔: https://dspy.ai
GitHub: https://github.com/stanfordnlp/dspy (22k+ stars)
Discord: https://discord.gg/XCGy2WDCQB
Twitter: @DSPyOSS
論文: "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines"

另請參閱

references/modules.md - 詳細模塊指南（Predict、ChainOfThought、ReAct、ProgramOfThought）
references/optimizers.md - 優化算法（BootstrapFewShot、MIPRO、BootstrapFinetune）
references/examples.md - 實際示例（RAG、智能體、分類器）

技能元數據​

參考：完整 SKILL.md​

DSPy：聲明式語言模型編程

何時使用此技能​

安裝​

快速入門​

基本示例：問答​

思維鏈推理​

核心概念​

1. 簽名 (Signatures)​

2. 模塊 (Modules)​

dspy.Predict​

dspy.ChainOfThought​

dspy.ReAct​

dspy.ProgramOfThought​

3. 優化器 (Optimizers)​

BootstrapFewShot​

MIPRO (Most Important Prompt Optimization)​

BootstrapFinetune​

4. 構建複雜系統​

多階段管道​

帶優化的 RAG 系統​

LM 提供商配置​

Anthropic Claude​

OpenAI​

本地模型 (Ollama)​

多個模型​

常見模式​

模式 1：結構化輸出​

模式 2：斷言驅動的優化​

模式 3：自一致性​

模式 4：帶重排序的檢索​

評估與指標​

自定義指標​

評估​

最佳實踐​

1. 從簡單開始，迭代開發​

2. 使用描述性簽名​

3. 使用代表性數據進行優化​

4. 保存和加載優化後的模型​

5. 監控與調試​

與其他方法的比較​

資源​

另請參閱​