Outlines

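Guarantee valid JSON/XML/code structure during generation, get type-safe outputs with Pydantic models, run on local models (Transformers, vLLM), and maximize inference speed with Outlines, the structured generation library from dottxt.ai.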

Skill metadata

Source	Bundled (installed by default)
Path	`skills/mlops/inference/outlines`
Version	`1.0.0`
Author	Orchestra Research
License	MIT
Dependencies	`outlines`, `transformers`, `vllm`, `pydantic`
Tags	`Prompt Engineering`, `Outlines`, `Structured Generation`, `JSON Schema`, `Pydantic`, `Local Models`, `Grammar-Based Generation`, `vLLM`, `Transformers`, `Type Safety`

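Reference: full SKILL.md

Info

The following is the complete skill definition that Hermes loads when this skill is triggered. These are the instructions the agent sees while the skill is active.

Outlines: Structured Text Generation

When to use this skill

Use Outlines when you need to:

Guarantee valid JSON/XML/code structure during generation
Get type-safe outputs with Pydantic models
Run local models (Transformers, llama.cpp, vLLM)
Maximize inference speed with zero-overhead structured generation
Generate output directly against a JSON schema
Control token sampling at the grammar level

GitHub stars: 8,000+ | From: dottxt.ai (formerly .txt)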

Installation

# Base installation
pip install outlines

# With specific backends
pip install outlines transformers  # Hugging Face models
pip install outlines llama-cpp-python  # llama.cpp
pip install outlines vllm  # vLLM for high-throughput

Quick start

Basic example: classification

import outlines
from typing import Literal

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment)  # "positive" (guaranteed one of these)

Using Pydantic models

from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name)   # "John Doe"
print(user.age)    # 30
print(user.email)  # "john@example.com"

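Core concepts

1. Constrained token sampling

Outlines uses a finite-state machine (FSM) to constrain token generation at the logit level.

How it works:

Convert the schema (JSON Schema or Pydantic model) into a regular expression; a regex constraint is used as given
Compile the regular expression into a finite-state machine (FSM) over the model's vocabulary
At every generation step, mask out the tokens the FSM does not allow
Fast-forward when only one token is valid

Advantages:

Zero overhead: filtering happens at the token level
Speedup: fast-forward along deterministic paths
Guaranteed validity: output that violates the schema cannot be produced

from pydantic import BaseModel
import outlines

# Pydantic model -> JSON schema -> regex -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> regular expression
# 3. Regular expression -> FSM
# 4. FSM filters tokens during generation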

generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")
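
To make the token-filtering step concrete, here is a toy sketch in plain Python. It does not use Outlines: the eight-token vocabulary, the hand-written state machine for the pattern `[0-9]{1,3}`, the scripted logits, and the helper names `next_state`, `mask_logits`, and `constrained_greedy_decode` are all assumptions made for illustration. The idea it shows is the one described above: before picking a token, every token the state machine forbids has its logit set to negative infinity.

```python
import math
from typing import Optional

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["0", "1", "42", "7", "-", "a", "cat", "<eos>"]
MAX_DIGITS = 3  # pattern being enforced: [0-9]{1,3}
END = -1        # marker for the accepting end state


def next_state(state: int, token: str) -> Optional[int]:
    """Return the state after consuming `token`, or None if it is forbidden.

    The state is simply the number of digits emitted so far.
    """
    if token == "<eos>":
        return END if state >= 1 else None  # need at least one digit first
    if token and all(ch in "0123456789" for ch in token):
        new_state = state + len(token)
        return new_state if new_state <= MAX_DIGITS else None
    return None


def mask_logits(logits: list[float], state: int) -> list[float]:
    """Set the logit of every forbidden token to -inf."""
    return [
        logit if next_state(state, tok) is not None else -math.inf
        for tok, logit in zip(VOCAB, logits)
    ]


def constrained_greedy_decode(logits_per_step: list[list[float]]) -> str:
    """Greedy decoding in which each step only considers allowed tokens."""
    state, output = 0, ""
    for logits in logits_per_step:
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        token = VOCAB[best]
        state = next_state(state, token)
        if state == END:
            break
        output += token
    return output


# Scripted "model" scores: the raw preference is often an invalid token.
fake_logits = [
    [0.1, 0.2, 1.5, 0.3, 2.0, 0.0, 3.0, 0.5],  # "cat" scores highest, "42" is the best valid token
    [0.1, 0.2, 2.5, 0.9, 2.0, 0.0, 3.0, 0.5],  # "42" would exceed three digits, so "7" wins
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 9.0],  # three digits emitted: only <eos> is allowed
]
result = constrained_greedy_decode(fake_logits)
print(result)  # 427
```

In the last step only one token is allowed, which is the situation where a real implementation can skip the model call entirely and fast-forward.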

2. Structured generators

Outlines provides a dedicated generator for each output type.

Choice generator

# Multiple choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: One of the three choices

JSON generator

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product))  # <class '__main__.Product'>

Regex generator

# Generate text matching regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match pattern)
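
The same pattern can be checked independently with Python's standard `re` module, which is handy as a cheap assertion in tests around generated output. This sketch is not part of Outlines; `PHONE_PATTERN` and `is_phone_number` are names made up for illustration. It uses `re.fullmatch` because the constraint applies to the whole output, not just a prefix of it.

```python
import re

# Same pattern as in the regex generator example above.
PHONE_PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"


def is_phone_number(text: str) -> bool:
    """True only if the entire string matches the pattern."""
    return re.fullmatch(PHONE_PATTERN, text) is not None


print(is_phone_number("555-123-4567"))   # True
print(is_phone_number("555-123-45678"))  # False: one digit too many
print(is_phone_number("call 555-123-4567"))  # False: extra leading text
```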

Integer/float generators

# Generate specific numeric types
int_generator = outlines.generate.integer(model)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.float(model)
price = float_generator("Product price:")  # Guaranteed float

3. Model backends

Outlines supports several local and API-based backends.

Transformers (Hugging Face)

import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator
generator = outlines.generate.json(model, YourModel)

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)

vLLM (high throughput)

# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)

OpenAI (limited support)

# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: Some features limited with API models
generator = outlines.generate.json(model, YourModel)

4. Pydantic integration

Outlines offers first-class Pydantic support with automatic schema conversion.

Basic models

from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count)  # Guaranteed > 0

Nested models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # "New York"

Enums and literals

from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of enum values
    priority: Literal["low", "medium", "high"]  # Must be one of literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
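
The `str` mixin in `class Status(str, Enum)` is what lets enum members behave like their plain string values, for example when comparing against a string or serializing to JSON. A small standard-library sketch shows the effect; it involves no model and no Outlines call, and it redefines `Status` locally so that it runs on its own.

```python
import json
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


status = Status("approved")        # look up a member by its value
print(status is Status.APPROVED)   # True
print(status == "approved")        # True, thanks to the str mixin
print(json.dumps({"status": status}))  # {"status": "approved"}

# A value outside the enum is rejected.
try:
    Status("archived")
except ValueError as exc:
    print(f"rejected: {exc}")
```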

Common patterns

Pattern 1: data extraction

from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")

Pattern 2: classification

from typing import Literal

from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With confidence
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")

Pattern 3: structured forms

from pydantic import BaseModel
import outlines

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests)  # ["hiking", "photography", "cooking"]

Pattern 4: multi-entity extraction

from typing import Literal

from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")

Pattern 5: code generation

from pydantic import BaseModel
import outlines

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(f'    """{func.docstring}"""')
print(f"    {func.body}")
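
Because `body` is an arbitrary string, it may span several lines, and printing it behind a single four-space prefix indents only the first one. The sketch below uses the standard library to indent every line and to check that the assembled source parses. `FunctionSpec` is a hypothetical dataclass standing in for the `PythonFunction` instance returned by the generator, `render_function` is an illustrative helper, and the factorial body is hand-written sample data, not model output.

```python
import ast
import textwrap
from dataclasses import dataclass


@dataclass
class FunctionSpec:
    """Stand-in with the same fields as the PythonFunction model."""
    function_name: str
    parameters: list[str]
    docstring: str
    body: str


def render_function(spec: FunctionSpec) -> str:
    """Assemble source code, indenting every line of the docstring and body."""
    header = f"def {spec.function_name}({', '.join(spec.parameters)}):"
    doc = textwrap.indent(f'"""{spec.docstring}"""', "    ")
    body = textwrap.indent(textwrap.dedent(spec.body).strip("\n"), "    ")
    return f"{header}\n{doc}\n{body}\n"


spec = FunctionSpec(
    function_name="factorial",
    parameters=["n"],
    docstring="Return n! for a non-negative integer n.",
    body="result = 1\nfor i in range(2, n + 1):\n    result *= i\nreturn result",
)

source = render_function(spec)
ast.parse(source)  # raises SyntaxError if the assembled code is malformed
print(source)
```

Parsing with `ast.parse` only confirms the syntax. The JSON constraint guarantees the shape of the record, not that the generated body is correct Python, so a check like this is still worth running on real model output.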

Pattern 6: batch processing

from pydantic import BaseModel
import outlines

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts."""
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")

Backend configuration

Transformers

import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,         # Context window
    n_gpu_layers=35,    # GPU layers
    n_threads=8         # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)

vLLM (production)

# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)

Best practices

1. Use specific types

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float  # Not str
    quantity: int  # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as string
class Product(BaseModel):
    name: str
    price: str  # Should be float
    quantity: str  # Should be int

2. Add constraints

from pydantic import BaseModel, Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str

3. Use enums for categories

# ✅ Good: Enum for fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything

4. Provide context in the prompt

# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."
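
One way to keep prompts in that shape consistently is a small template helper. `build_extraction_prompt` below is a hypothetical helper, not an Outlines function; it only reproduces the instruction / text / answer-cue layout of the good example.

```python
def build_extraction_prompt(instruction: str, text: str, answer_label: str) -> str:
    """Lay out an instruction, the source text, and a cue for the answer."""
    return (
        f"{instruction.strip()}\n"
        f"Text: {text.strip()}\n"
        f"{answer_label.strip()}:\n"
    )


prompt = build_extraction_prompt(
    "Extract product information from the following text.",
    "iPhone 15 Pro costs $999 and is currently in stock.",
    "Product",
)
print(prompt)
```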

5. Handle optional fields

from typing import Optional

from pydantic import BaseModel

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str  # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None  # Optional
    tags: list[str] = []  # Default empty list

# Can succeed even if author/date missing

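Comparison with alternatives

Feature	Outlines	Instructor	Guidance	LMQL
Pydantic support	✅ Native	✅ Native	❌ No	❌ No
JSON Schema	✅ Yes	✅ Yes	⚠️ Limited	✅ Yes
Regex constraints	✅ Yes	❌ No	✅ Yes	✅ Yes
Local models	✅ Full	⚠️ Limited	✅ Full	✅ Full
API models	⚠️ Limited	✅ Full	✅ Full	✅ Full
Zero overhead	✅ Yes	❌ No	⚠️ Partial	✅ Yes
Automatic retries	❌ No	✅ Yes	❌ No	❌ No
Learning curve	Low	Low	Low	High

When to choose Outlines:

You run local models (Transformers, llama.cpp, vLLM)
You need maximum inference speed
You want Pydantic model support
You need zero-overhead structured generation
You want control over the token sampling process

When to choose an alternative:

Instructor: you need API models with automatic retries
Guidance: you need token healing and complex workflows
LMQL: you prefer a declarative query syntax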

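Performance characteristics

Speed:

Zero overhead: structured generation runs as fast as unconstrained generation
Fast-forward optimization: deterministic tokens are skipped
1.2-2x faster than generate-then-validate approaches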
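
For contrast, the generate-then-validate approach usually takes the form of a retry loop: generate free text, try to parse it, and ask again on failure. The sketch below simulates that with a scripted stand-in for the model, so it runs without any LLM. `fake_model`, `generate_with_retries`, and the sample replies are all invented for illustration. Every failed attempt costs a full generation, which is the overhead that constrained decoding avoids.

```python
import json
from typing import Callable


def generate_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> tuple[dict, int]:
    """Call `generate` until its output parses as JSON with the required keys.

    Returns the parsed object and the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: pay for another generation
        if isinstance(data, dict) and required_keys.issubset(data):
            return data, attempt
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Scripted replies standing in for an unconstrained model.
replies = iter([
    'Sure! Here is the JSON: {"name": "Alice"',  # not parseable
    '{"name": "Alice"}',                          # parseable, but "age" is missing
    '{"name": "Alice", "age": 25}',               # valid
])


def fake_model(prompt: str) -> str:
    return next(replies)


data, attempts = generate_with_retries(
    fake_model, "Generate person: Alice, 25", {"name", "age"}
)
print(data, attempts)  # {'name': 'Alice', 'age': 25} 3
```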

Memory:

The FSM for each schema is compiled once and cached
Minimal runtime overhead
Pairs with vLLM for high throughput

Accuracy:

100% valid output (guaranteed by the FSM)
No retry loops needed
Deterministic token filtering

Resources

Documentation: https://outlines-dev.github.io/outlines
GitHub: https://github.com/outlines-dev/outlines (8k+ stars)
Discord: https://discord.gg/R9DSu34mGd
Blog: https://blog.dottxt.co

See also

references/json_generation.md - Comprehensive JSON and Pydantic patterns
references/backends.md - Backend-specific configuration
references/examples.md - Production-ready examples
