
Outlines

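Guarantee valid JSON/XML/code structure during generation, get type-safe outputs from Pydantic models, run local models (Transformers, vLLM), and maximize inference speed with Outlines, the structured generation library from dottxt.ai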

Skill metadata

Source: Bundled (installed by default)
Path: skills/mlops/inference/outlines
Version: 1.0.0
Author: Orchestra Research
License: MIT
Dependencies: outlines, transformers, vllm, pydantic
Tags: Prompt Engineering, Outlines, Structured Generation, JSON Schema, Pydantic, Local Models, Grammar-Based Generation, vLLM, Transformers, Type Safety

Reference: full SKILL.md

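The following is the complete skill definition that Hermes loads when this skill is triggered. These are the instructions the agent sees while the skill is active.

Outlines: Structured Text Generation

When to use this skill

Use Outlines when you need to:

  • Guarantee valid JSON/XML/code structure during generation
  • Get type-safe outputs from Pydantic models
  • Run local models (Transformers, llama.cpp, vLLM)
  • Maximize inference speed with zero-overhead structured generation
  • Generate output that automatically conforms to a JSON schema
  • Control token sampling at the grammar level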

GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)

Installation

# Base installation
pip install outlines

# With specific backends
pip install outlines transformers # Hugging Face models
pip install outlines llama-cpp-python # llama.cpp
pip install outlines vllm # vLLM for high-throughput

Quick start

Basic example: classification

import outlines
from typing import Literal

# Load model
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate with type constraint
prompt = "Sentiment of 'This product is amazing!': "
generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
sentiment = generator(prompt)

print(sentiment) # "positive" (guaranteed one of these)

Using Pydantic models

from pydantic import BaseModel
import outlines

class User(BaseModel):
    name: str
    age: int
    email: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generate structured output
prompt = "Extract user: John Doe, 30 years old, john@example.com"
generator = outlines.generate.json(model, User)
user = generator(prompt)

print(user.name) # "John Doe"
print(user.age) # 30
print(user.email) # "john@example.com"

Core concepts

1. Constrained token sampling

Outlines uses a finite-state machine (FSM) to constrain token generation at the logit level.
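To make the idea concrete, here is a toy, stdlib-only sketch of logit masking driven by a hand-built automaton. It is not Outlines' implementation: the constraint (one or more digits, a dot, two digits, e.g. a price like `12.50`), the tiny vocabulary, and the fixed fake logits standing in for a model are all assumptions chosen for illustration.

```python
import math

DIGITS = set("0123456789")
ACCEPTING = {"done"}


def step(state, ch):
    """Advance a character-level DFA for: digits, '.', exactly two digits.
    Returns None when `ch` is not allowed in `state`."""
    if state == "start":
        return "int" if ch in DIGITS else None
    if state == "int":
        if ch in DIGITS:
            return "int"
        return "dot" if ch == "." else None
    if state == "dot":
        return "frac1" if ch in DIGITS else None
    if state == "frac1":
        return "done" if ch in DIGITS else None
    return None  # "done" accepts no further characters


def walk(state, token):
    """Feed every character of a (multi-character) token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def mask_logits(logits, state, vocab):
    """Set the logit of every token that would leave the DFA dead to -inf."""
    return [logit if walk(state, tok) is not None else -math.inf
            for tok, logit in zip(vocab, logits)]


def greedy_decode(logits, vocab, max_steps=10):
    """Greedy decoding where one fixed logit vector stands in for the model."""
    state, out = "start", ""
    for _ in range(max_steps):
        if state in ACCEPTING:
            break
        masked = mask_logits(logits, state, vocab)
        if all(m == -math.inf for m in masked):
            raise RuntimeError("no token can continue the constraint")
        best = max(range(len(vocab)), key=lambda i: masked[i])
        out += vocab[best]
        state = walk(state, vocab[best])
    return out


VOCAB = ["1", "12", ".", ".5", "50", "5", "abc", " ", "0"]
# Unconstrained, this fake "model" would pick "abc" (highest logit) every time.
LOGITS = [0.1, 2.0, 0.5, 3.0, 0.3, 0.2, 9.0, 5.0, 0.0]

print(greedy_decode(LOGITS, VOCAB))  # 12.55
```

Re-walking the whole vocabulary at every step is only for readability. Production implementations such as Outlines precompute the allowed-token set for each automaton state ahead of time, which is what keeps the per-step cost low.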

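How it works:

  1. Convert the schema (JSON Schema or Pydantic model) into a regular expression; a regex constraint is used as-is
  2. Compile the regular expression into a finite-state machine (FSM)
  3. At each generation step, mask out tokens that have no valid transition from the current FSM state
  4. Fast-forward when only one valid token remains

Benefits:

  • Zero overhead: the set of allowed tokens for each FSM state is precomputed, so filtering at each step is a lookup
  • Speedup: deterministic stretches of the output are fast-forwarded
  • Guaranteed validity: output that violates the schema cannot be produced

import outlines
from pydantic import BaseModel

# Pydantic model -> JSON schema -> regex -> FSM
class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Behind the scenes:
# 1. Person -> JSON schema
# 2. JSON schema -> regular expression
# 3. Regular expression -> FSM
# 4. FSM filters tokens during generation

generator = outlines.generate.json(model, Person)
result = generator("Generate person: Alice, 25")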
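Turning a schema into a regular expression can be illustrated with a deliberately tiny converter. This sketch handles only a flat object whose properties are all required strings or integers, allows no whitespace, and ignores string escapes; the regex fragments and the helper name `object_schema_to_regex` are assumptions for illustration, not what Outlines actually emits.

```python
import json
import re

# Illustrative regex fragments for two JSON types (simplified assumptions).
STRING = r'"[^"\\]*"'             # a JSON string without escape sequences
INTEGER = r"-?(?:0|[1-9][0-9]*)"  # a JSON integer without leading zeros
TYPE_TO_REGEX = {"string": STRING, "integer": INTEGER}


def object_schema_to_regex(schema: dict) -> str:
    """Regex for a flat JSON object: every property required, emitted in
    schema order, no optional whitespace."""
    parts = []
    for name, prop in schema["properties"].items():
        key = re.escape(json.dumps(name))  # the quoted key, regex-escaped
        parts.append(key + ":" + TYPE_TO_REGEX[prop["type"]])
    return r"\{" + ",".join(parts) + r"\}"


# Roughly the shape of Person.model_json_schema() ("title" keys omitted).
PERSON_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

pattern = object_schema_to_regex(PERSON_SCHEMA)
candidate = '{"name":"Alice","age":25}'
if re.fullmatch(pattern, candidate):
    print(json.loads(candidate))  # {'name': 'Alice', 'age': 25}
```

Any string that fully matches the pattern is guaranteed to parse as JSON with the right field types, which is why an automaton built from the regex can guarantee schema-valid output.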

2. Structured generators

Outlines provides dedicated generators for different output types.

Choice generator

# Multiple choice selection
generator = outlines.generate.choice(
    model,
    ["positive", "negative", "neutral"]
)

sentiment = generator("Review: This is great!")
# Result: One of the three choices
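A choice constraint amounts to "the text so far must be a prefix of one of the options". The toy sketch below applies that rule over a made-up sub-word vocabulary (real tokenizers split words differently); the alternation regex at the end is one natural way to express the same constraint, assumed here rather than taken from Outlines' source.

```python
import re

CHOICES = ["positive", "negative", "neutral"]
# Hypothetical sub-word vocabulary, for illustration only.
VOCAB = ["pos", "itive", "neg", "ative", "neu", "tral", "ne", "u", "gative", "great", "!"]


def allowed_next_tokens(prefix, vocab=VOCAB, choices=CHOICES):
    """Tokens that keep prefix + token a prefix of at least one choice."""
    return [t for t in vocab if any(c.startswith(prefix + t) for c in choices)]


def is_complete(text, choices=CHOICES):
    """Generation may stop once the text equals one of the choices."""
    return text in choices


# The same constraint written as a regex alternation.
CHOICE_REGEX = "|".join(re.escape(c) for c in CHOICES)

print(allowed_next_tokens(""))     # ['pos', 'neg', 'neu', 'ne']
print(allowed_next_tokens("ne"))   # ['u', 'gative']
print(allowed_next_tokens("neu"))  # ['tral']
```

Once the text equals a full choice, no token can extend it, so generation has nowhere to go but stop.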

JSON generator

from pydantic import BaseModel

class Product(BaseModel):
    name: str
    price: float
    in_stock: bool

# Generate valid JSON matching schema
generator = outlines.generate.json(model, Product)
product = generator("Extract: iPhone 15, $999, available")

# Guaranteed valid Product instance
print(type(product)) # <class '__main__.Product'>

Regex generator

# Generate text matching regex
generator = outlines.generate.regex(
    model,
    r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
)

phone = generator("Generate phone number:")
# Result: "555-123-4567" (guaranteed to match pattern)
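For a fixed-length pattern like the phone number above, the automaton is simply a chain of states, one per character position. This stdlib sketch (not Outlines' code; a general regex needs a proper regex-to-DFA compiler) shows which characters each state allows and which states leave only a single option.

```python
import re
import string

PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # the phone pattern from the example

DIGIT = frozenset(string.digits)
DASH = frozenset("-")
# One state per character position: ddd-ddd-dddd (12 positions)
SLOTS = [DIGIT] * 3 + [DASH] + [DIGIT] * 3 + [DASH] + [DIGIT] * 4


def allowed_chars(state):
    """Characters permitted in `state` (= number of characters emitted so far)."""
    return SLOTS[state] if state < len(SLOTS) else frozenset()


def accepts(text):
    """True if `text` walks the chain of states all the way to the end."""
    return len(text) == len(SLOTS) and all(ch in SLOTS[i] for i, ch in enumerate(text))


def forced_states():
    """States where exactly one character is legal (fast-forward candidates)."""
    return [i for i, cls in enumerate(SLOTS) if len(cls) == 1]


print(forced_states())  # [3, 7]
for sample in ["555-123-4567", "5551234567"]:
    # The chain automaton and Python's regex engine should agree.
    print(sample, accepts(sample), bool(re.fullmatch(PATTERN, sample)))
```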

Integer/float generators

# Generate specific numeric types via type-constrained generation
int_generator = outlines.generate.format(model, int)
age = int_generator("Person's age:")  # Guaranteed integer

float_generator = outlines.generate.format(model, float)
price = float_generator("Product price:")  # Guaranteed float
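Integer and float constraints boil down to regular expressions over the textual form of a number. The patterns below are illustrative assumptions (optional sign, no leading zeros, optional exponent) and may differ from the ones Outlines uses; the point is that once the text fully matches, the cast to `int` or `float` cannot fail.

```python
import re

# Illustrative patterns (assumptions): optional sign, no leading zeros,
# optional fraction and exponent for floats.
INT_RE = re.compile(r"[+-]?(?:0|[1-9][0-9]*)")
FLOAT_RE = re.compile(r"[+-]?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_constrained(text, kind):
    """Cast text to `kind` (int or float) after checking it fully matches the pattern."""
    pattern = INT_RE if kind is int else FLOAT_RE
    if not pattern.fullmatch(text):
        raise ValueError(f"{text!r} does not match the {kind.__name__} pattern")
    return kind(text)


print(parse_constrained("42", int))      # 42
print(parse_constrained("-3.5", float))  # -3.5
print(parse_constrained("1e3", float))   # 1000.0
```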

3. Model backends

Outlines supports several local and API-based backends.

Transformers (Hugging Face)

import outlines

# Load from Hugging Face
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda"  # Or "cpu"
)

# Use with any generator (YourModel stands for any Pydantic model class)
generator = outlines.generate.json(model, YourModel)

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
    n_gpu_layers=35
)

generator = outlines.generate.json(model, YourModel)

vLLM (high throughput)

# For production deployments
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2  # Multi-GPU
)

generator = outlines.generate.json(model, YourModel)

OpenAI (limited support)

# Basic OpenAI support
model = outlines.models.openai(
    "gpt-4o-mini",
    api_key="your-api-key"
)

# Note: Some features limited with API models
generator = outlines.generate.json(model, YourModel)

4. Pydantic integration

Outlines provides first-class Pydantic support with automatic schema conversion.

Basic model

import outlines
from pydantic import BaseModel, Field

class Article(BaseModel):
    title: str = Field(description="Article title")
    author: str = Field(description="Author name")
    word_count: int = Field(description="Number of words", gt=0)
    tags: list[str] = Field(description="List of tags")

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Article)

article = generator("Generate article about AI")
print(article.title)
print(article.word_count) # Guaranteed > 0

Nested models

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model

generator = outlines.generate.json(model, Person)
person = generator("Generate person in New York")

print(person.address.city)  # e.g. "New York"

Enums and literals

from enum import Enum
from typing import Literal

class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

class Application(BaseModel):
    applicant: str
    status: Status  # Must be one of enum values
    priority: Literal["low", "medium", "high"]  # Must be one of literals

generator = outlines.generate.json(model, Application)
app = generator("Generate application")

print(app.status) # Status.PENDING (or APPROVED/REJECTED)
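Enum and `Literal` fields work because a closed set of strings is easy to express as a constraint: in JSON Schema they become an `enum`, and an enum can be matched by an alternation of quoted literals. A stdlib sketch of that last step; `json_string_alternation` is a hypothetical helper, not an Outlines function, and `Status` is redefined so the block runs on its own.

```python
import json
import re
from enum import Enum
from typing import Literal, get_args


class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


def json_string_alternation(values):
    """Regex matching exactly one of `values`, written as a JSON string literal."""
    return '"(?:' + "|".join(re.escape(v) for v in values) + ')"'


status_re = json_string_alternation([s.value for s in Status])
priority_re = json_string_alternation(get_args(Literal["low", "medium", "high"]))

raw = '"approved"'  # what a constrained model might emit for the status field
if re.fullmatch(status_re, raw):
    print(Status(json.loads(raw)) is Status.APPROVED)  # True
```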

Common patterns

Pattern 1: Data extraction

from pydantic import BaseModel
import outlines

class CompanyInfo(BaseModel):
    name: str
    founded_year: int
    industry: str
    employees: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, CompanyInfo)

text = """
Apple Inc. was founded in 1976 in the technology industry.
The company employs approximately 164,000 people worldwide.
"""

prompt = f"Extract company information:\n{text}\n\nCompany:"
company = generator(prompt)

print(f"Name: {company.name}")
print(f"Founded: {company.founded_year}")
print(f"Industry: {company.industry}")
print(f"Employees: {company.employees}")

Pattern 2: Classification

from typing import Literal
from pydantic import BaseModel
import outlines

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Binary classification
generator = outlines.generate.choice(model, ["spam", "not_spam"])
result = generator("Email: Buy now! 50% off!")

# Multi-class classification
categories = ["technology", "business", "sports", "entertainment"]
category_gen = outlines.generate.choice(model, categories)
category = category_gen("Article: Apple announces new iPhone...")

# With a confidence score (a number the model generates, not a calibrated probability)
class Classification(BaseModel):
    label: Literal["positive", "negative", "neutral"]
    confidence: float

classifier = outlines.generate.json(model, Classification)
result = classifier("Review: This product is okay, nothing special")

Pattern 3: Structured forms

class UserProfile(BaseModel):
    full_name: str
    age: int
    email: str
    phone: str
    country: str
    interests: list[str]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, UserProfile)

prompt = """
Extract user profile from:
Name: Alice Johnson
Age: 28
Email: alice@example.com
Phone: 555-0123
Country: USA
Interests: hiking, photography, cooking
"""

profile = generator(prompt)
print(profile.full_name)
print(profile.interests) # ["hiking", "photography", "cooking"]

Pattern 4: Multi-entity extraction

from typing import Literal
from pydantic import BaseModel
import outlines

class Entity(BaseModel):
    name: str
    type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

class DocumentEntities(BaseModel):
    entities: list[Entity]

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, DocumentEntities)

text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
prompt = f"Extract entities from: {text}"

result = generator(prompt)
for entity in result.entities:
    print(f"{entity.name} ({entity.type})")

Pattern 5: Code generation

import textwrap

class PythonFunction(BaseModel):
    function_name: str
    parameters: list[str]
    docstring: str
    body: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, PythonFunction)

prompt = "Generate a Python function to calculate factorial"
func = generator(prompt)

# Indent every line of the docstring and body, not just the first one
print(f"def {func.function_name}({', '.join(func.parameters)}):")
print(textwrap.indent(f'"""{func.docstring}"""', "    "))
print(textwrap.indent(func.body, "    "))
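The JSON schema constrains only the structure of the response; the `body` field is still free text and may not be valid Python. A cheap post-check with the standard library's `ast` module is one option. In this sketch a dataclass stands in for the Pydantic model (so it runs without Outlines), and `render` and `is_valid_python` are hypothetical helper names.

```python
import ast
import textwrap
from dataclasses import dataclass


@dataclass
class PythonFunction:  # stand-in for the Pydantic model of the same name
    function_name: str
    parameters: list
    docstring: str
    body: str


def render(func: PythonFunction) -> str:
    """Assemble a function definition, indenting every line of the body."""
    header = f"def {func.function_name}({', '.join(func.parameters)}):"
    doc = textwrap.indent(f'"""{func.docstring}"""', "    ")
    body = textwrap.indent(func.body, "    ")
    return "\n".join([header, doc, body])


def is_valid_python(source: str) -> bool:
    """True if the source parses; says nothing about whether it is correct."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


func = PythonFunction(
    function_name="factorial",
    parameters=["n"],
    docstring="Return n! for a non-negative integer n.",
    body="if n <= 1:\n    return 1\nreturn n * factorial(n - 1)",
)
source = render(func)
print(is_valid_python(source))  # True
```

Parsing only proves the code is syntactically valid; running tests against the generated function is still the only way to check its behavior.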

Pattern 6: Batch processing

def batch_extract(texts: list[str], schema: type[BaseModel]):
    """Extract structured data from multiple texts, one prompt at a time."""
    # Note: the model is reloaded on every call; load it once outside
    # this function if you call batch_extract repeatedly.
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, schema)

    results = []
    for text in texts:
        result = generator(f"Extract from: {text}")
        results.append(result)

    return results

class Person(BaseModel):
    name: str
    age: int

texts = [
    "John is 30 years old",
    "Alice is 25 years old",
    "Bob is 40 years old"
]

people = batch_extract(texts, Person)
for person in people:
    print(f"{person.name}: {person.age}")

Backend configuration

Transformers

import outlines

# Basic usage
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# GPU configuration
model = outlines.models.transformers(
    "microsoft/Phi-3-mini-4k-instruct",
    device="cuda",
    model_kwargs={"torch_dtype": "float16"}
)

# Popular models
model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

llama.cpp

# Load GGUF model
model = outlines.models.llamacpp(
    "./models/llama-3.1-8b.Q4_K_M.gguf",
    n_ctx=4096,  # Context window
    n_gpu_layers=35,  # GPU layers
    n_threads=8  # CPU threads
)

# Full GPU offload
model = outlines.models.llamacpp(
    "./models/model.gguf",
    n_gpu_layers=-1  # All layers on GPU
)

vLLM (production)

# Single GPU
model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

# Multi-GPU
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4  # 4 GPUs
)

# With quantization (requires a checkpoint already quantized with AWQ or GPTQ)
model = outlines.models.vllm(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization="awq"  # Or "gptq"
)

Best practices

1. Use specific types

# ✅ Good: Specific types
class Product(BaseModel):
    name: str
    price: float  # Not str
    quantity: int  # Not str
    in_stock: bool  # Not str

# ❌ Bad: Everything as string
class Product(BaseModel):
    name: str
    price: str  # Should be float
    quantity: str  # Should be int

2. Add constraints

from pydantic import Field

# ✅ Good: With constraints
class User(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=120)
    email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

# ❌ Bad: No constraints
class User(BaseModel):
    name: str
    age: int
    email: str

3. Use enums for categories

from enum import Enum

# ✅ Good: Enum for fixed set
class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

class Task(BaseModel):
    title: str
    priority: Priority

# ❌ Bad: Free-form string
class Task(BaseModel):
    title: str
    priority: str  # Can be anything

4. Provide context in the prompt

# ✅ Good: Clear context
prompt = """
Extract product information from the following text.
Text: iPhone 15 Pro costs $999 and is currently in stock.
Product:
"""

# ❌ Bad: Minimal context
prompt = "iPhone 15 Pro costs $999 and is currently in stock."

5. Handle optional fields

from typing import Optional

# ✅ Good: Optional fields for incomplete data
class Article(BaseModel):
    title: str  # Required
    author: Optional[str] = None  # Optional
    date: Optional[str] = None  # Optional
    tags: list[str] = []  # Default empty list

# Can succeed even if author/date missing

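Comparison with alternatives

Feature | Outlines | Instructor | Guidance | LMQL
--- | --- | --- | --- | ---
Pydantic support | ✅ Native | ✅ Native | ❌ No | ❌ No
JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes
Regex constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes
Local models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full
API models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full
Zero overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes
Automatic retries | ❌ No | ✅ Yes | ❌ No | ❌ No
Learning curve | | | |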

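When to choose Outlines:

  • You use local models (Transformers, llama.cpp, vLLM)
  • You need maximum inference speed
  • You want Pydantic model support
  • You need zero-overhead structured generation
  • You want control over the token sampling process

When to choose an alternative:

  • Instructor: you need API models with automatic retries
  • Guidance: you need token healing and complex workflows
  • LMQL: you prefer a declarative query syntax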

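Performance characteristics

Speed:

  • Zero overhead: structured generation runs as fast as unconstrained generation
  • Fast-forward optimization: deterministic tokens are emitted without sampling
  • 1.2-2x faster than generate-then-validate approaches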
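The fast-forward idea can be sketched without a model: walk the constraint, and whenever exactly one continuation is legal, append it without asking the model. Here the constraint is a character-level date pattern (`dddd-dd-dd`), `pick` is a hypothetical stand-in for the model, and the count of model calls shows the saving. Real tokenizers work on multi-character tokens, so the actual saving depends on how the forced text tokenizes.

```python
import string

DIGIT = frozenset(string.digits)
DASH = frozenset("-")
# Toy constraint for a date such as 2024-05-17: dddd-dd-dd
SLOTS = [DIGIT] * 4 + [DASH] + [DIGIT] * 2 + [DASH] + [DIGIT] * 2


def generate(pick, slots=SLOTS):
    """Fill each slot, calling `pick` only when more than one character is legal."""
    out, model_calls = [], 0
    for allowed in slots:
        if len(allowed) == 1:
            out.append(next(iter(allowed)))  # fast-forward: the choice is forced
        else:
            model_calls += 1
            out.append(pick(sorted(allowed)))
    return "".join(out), model_calls


text, calls = generate(lambda options: options[-1])  # stand-in "model" always picks "9"
print(text, calls)  # 9999-99-99 8
```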

Memory:

  • The FSM for each schema is compiled once and cached
  • Minimal runtime overhead
  • Pairs with vLLM for high-throughput serving

Accuracy:

  • 100% valid output (guaranteed by the FSM)
  • No retry loops needed
  • Deterministic token filtering

Resources

See also

  • references/json_generation.md - comprehensive JSON and Pydantic patterns
  • references/backends.md - backend-specific configuration
  • references/examples.md - production-ready examples