Feishu group messages in the daily report pipeline: implementation plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

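Goal: Add Feishu groups to the existing daily report pipeline as a second source. A new standalone bot/feishu-bot/ directory handles extraction, and the final single detailed.md contains both WeChat and Feishu content.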

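Architecture: On the Feishu side, a daily batch job calls the Open API to fetch the previous day's messages and writes them to bot/feishu-bot/data/daily/<date>.feishu.json. Its schema strictly matches the WeChat side's <date>.json. The only change to bot/wechat-bot/scripts/generate_report.py is a new _load_daily(date_str) function that concatenates the groups lists of the two JSON files. _display_source / dedupe_highlights / sanitize_highlights / the prompt are all left unchanged.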
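Here is a minimal sketch of that concatenation step. It assumes both files have a top-level "groups" list and that any other top-level keys should come from the WeChat file. `load_daily` and its directory parameters are hypothetical stand-ins for the real `_load_daily(date_str)`, which would resolve the paths itself:

```python
import json
from pathlib import Path


def load_daily(date_str: str, wechat_dir: Path, feishu_dir: Path) -> dict:
    """Merge one day's WeChat and Feishu extracts into a single dict.

    Assumed layout: <wechat_dir>/<date>.json and <feishu_dir>/<date>.feishu.json,
    each holding a top-level "groups" list. A missing file contributes nothing.
    """
    sources = [
        wechat_dir / f"{date_str}.json",
        feishu_dir / f"{date_str}.feishu.json",
    ]
    merged: dict = {"groups": []}
    for i, path in enumerate(sources):
        if not path.exists():
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        if i == 0:
            # Keep non-group metadata from the WeChat file only (assumption).
            merged.update({k: v for k, v in data.items() if k != "groups"})
        merged["groups"].extend(data.get("groups") or [])
    return merged
```

Downstream code sees only one groups list. Dedupe, sanitizing and rendering therefore need no source-specific logic.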

Tech Stack: Python 3.9+, depending only on requests + python-dotenv. The official Feishu SDK is not used. Tests use unittest (standard library) with offline JSON fixtures and never call the live API.

Reference Spec: docs/superpowers/specs/2026-05-01-feishu-group-extraction-design.md


File Structure

New:

bot/feishu-bot/
├── README.md                     # usage + Feishu custom app setup steps
├── CLAUDE.md                     # quick context for AI assistants
├── .env.example                  # FEISHU_APP_ID / SECRET / CHAT_IDS / GROUP_LABELS
├── .gitignore                    # data/, .env
├── scripts/
│   ├── _feishu.py                # client + token + decoders (single source of truth)
│   ├── extract_day.py            # CLI, mirrors the WeChat side
│   └── inventory.py              # lists the groups the bot is in and their last-7-day message counts
├── data/
│   └── daily/<date>.feishu.json  # extraction output, gitignored
└── tests/
    ├── __init__.py
    ├── fixtures/
    │   ├── text_simple.json
    │   ├── post_simple.json
    │   ├── post_with_links.json
    │   ├── share_chat.json
    │   ├── share_user.json
    │   ├── file.json
    │   ├── sticker.json
    │   ├── messages_page1.json
    │   └── messages_page2.json
    ├── test_decoders.py
    ├── test_client.py
    ├── test_pagination.py
    ├── test_extract_day.py
    └── test_load_daily.py

Modified:

  • bot/wechat-bot/scripts/generate_report.py: factor out _load_daily(), about +25 lines

All paths in each task's "Files" section are relative to the repository root.


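Pre-conditions (manual steps by the user, outside the scope of automation)

These actions must be done by a person on the Feishu Open Platform. The plan cannot execute them. Code tasks can proceed with unit tests before real credentials are available. Task 17 (end-to-end verification), however, requires the three steps below to be completed first.

  1. Create a custom app: Feishu Open Platform → Create custom app (企业自建应用) → enable the Bot capability.
  2. Request permission scopes: select im:message:readonly, im:chat:readonly, im:chat.member:read and contact:user.id:readonly. Submit them and wait for approval.
  3. Get credentials and add the bot to the group: copy app_id / app_secret into .env, then add the bot to the Hermes Feishu group.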

Task 1: Project directory skeleton

Files:

  • Create: bot/feishu-bot/.gitignore

  • Create: bot/feishu-bot/.env.example

  • Create: bot/feishu-bot/scripts/ (no .gitkeep placeholder needed; _feishu.py is added in the next task)

  • Create: bot/feishu-bot/tests/__init__.py

  • Create: bot/feishu-bot/tests/fixtures/.gitkeep

  • Step 1: Write .gitignore

Create bot/feishu-bot/.gitignore:

.env
.env.*
!.env.example

# Python
__pycache__/
*.pyc
*.pyo
.venv/
venv/

# Extracted Feishu data — sensitive, never commit
data/

# macOS
.DS_Store
  • Step 2: Write .env.example

Create bot/feishu-bot/.env.example:

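# Feishu custom app credentials
FEISHU_APP_ID=cli_xxxxxxxx
FEISHU_APP_SECRET=xxxxxxxxxxxx

# chat_ids of the groups to monitor, comma-separated
FEISHU_CHAT_IDS=oc_xxxxxxxxxxxx

# Optional: explicitly map a chat_id to the group-name prefix shown in the daily report
# Separate entries with ";"; within one entry, separate chat_id and label with "="
# If unset, the label defaults to "Hermes Agent 中文社区飞书群 <N>", where N is the
# chat's 1-based position in FEISHU_CHAT_IDS
# FEISHU_GROUP_LABELS=oc_xxx=Hermes Agent 中文社区飞书群 1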
  • Step 3: Create the directory skeleton
mkdir -p bot/feishu-bot/scripts bot/feishu-bot/tests/fixtures bot/feishu-bot/data/daily
touch bot/feishu-bot/tests/__init__.py
touch bot/feishu-bot/tests/fixtures/.gitkeep
  • Step 4: Commit
git add bot/feishu-bot/.gitignore bot/feishu-bot/.env.example \
bot/feishu-bot/tests/__init__.py bot/feishu-bot/tests/fixtures/.gitkeep
git commit -m "feat(feishu-bot): 项目目录骨架"

Task 2: Message decoder for the text type

Files:

  • Create: bot/feishu-bot/scripts/_feishu.py
  • Create: bot/feishu-bot/tests/fixtures/text_simple.json
  • Create: bot/feishu-bot/tests/test_decoders.py

Background: In each message returned by Feishu im/v1/messages, body.content is a JSON string that wraps a type-specific structure. The text type is the simplest: {"text": "你好"}

  • Step 1: Write the fixture

Create bot/feishu-bot/tests/fixtures/text_simple.json:

{
  "message_id": "om_aaa111",
  "create_time": "1714492800000",
  "msg_type": "text",
  "sender": {"id": "ou_aaa", "id_type": "open_id"},
  "body": {"content": "{\"text\":\"deepseek 怎么样\"}"}
}

Note: create_time is a millisecond-precision string (which is what the Feishu API actually returns), and body.content is also a string.
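To check what the fixture's timestamp should decode to, divide by 1000 and format the result in Asia/Shanghai. `ms_to_clock` is a throwaway helper for illustration only and is not part of the plan:

```python
import datetime as dt
from zoneinfo import ZoneInfo

TZ = ZoneInfo("Asia/Shanghai")


def ms_to_clock(create_time: str) -> tuple:
    """Millisecond string -> (unix seconds, HH:MM:SS in Asia/Shanghai)."""
    ts = int(create_time) // 1000
    return ts, dt.datetime.fromtimestamp(ts, TZ).strftime("%H:%M:%S")


print(ms_to_clock("1714492800000"))  # (1714492800, '00:00:00')
```

1714492800 is 2024-04-30 16:00:00 UTC, which is midnight on 2024-05-01 in UTC+8. A message at that instant therefore has the local time 00:00:00.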

  • Step 2: Write the failing test

Create bot/feishu-bot/tests/test_decoders.py:

import json
import sys
import unittest
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
from _feishu import decode_message  # noqa: E402

FIXTURES = Path(__file__).parent / "fixtures"


def load_fixture(name: str) -> dict:
    return json.loads((FIXTURES / name).read_text(encoding="utf-8"))


class TestDecodeText(unittest.TestCase):
    def test_simple_text(self):
        raw = load_fixture("text_simple.json")
        msg = decode_message(raw)
        self.assertIsNotNone(msg)
        self.assertEqual(msg["type"], "text")
        self.assertEqual(msg["sender_wxid"], "ou_aaa")
        self.assertEqual(msg["sender_name"], "")
        self.assertEqual(msg["text"], "deepseek 怎么样")
        # ts is an int in seconds (converted from the millisecond string)
        self.assertEqual(msg["ts"], 1714492800)
        # time is HH:MM:SS in Asia/Shanghai; 1714492800 is 2024-05-01 00:00:00 there
        self.assertEqual(msg["time"], "00:00:00")


if __name__ == "__main__":
    unittest.main()
  • Step 3: Run the test and confirm it fails
cd bot/feishu-bot
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: the test module fails to import (ModuleNotFoundError: No module named '_feishu', because scripts/_feishu.py does not exist yet)

  • Step 4: Write the minimal implementation

Create bot/feishu-bot/scripts/_feishu.py:

"""Feishu Open API client + message decoders.

Single source of truth for everything in this directory.
"""
from __future__ import annotations

import datetime as dt
import json
from zoneinfo import ZoneInfo

TZ = ZoneInfo("Asia/Shanghai")


def _ts_seconds(create_time: str | int) -> int:
    """Feishu API returns create_time as milliseconds (string or int)."""
    return int(create_time) // 1000


def _decode_text(content: dict) -> str:
    return (content.get("text") or "").strip()


def decode_message(raw: dict) -> dict | None:
    """Decode one Feishu message envelope into our internal schema.

    Returns None for unsupported / noise types (image, sticker, audio, etc.)
    and for envelopes whose body.content is not a JSON object.
    """
    msg_type = raw.get("msg_type")
    body_raw = (raw.get("body") or {}).get("content") or "{}"
    try:
        # body.content is itself a JSON-encoded string
        content = json.loads(body_raw) if isinstance(body_raw, str) else body_raw
    except json.JSONDecodeError:
        return None
    if not isinstance(content, dict):
        return None

    if msg_type == "text":
        text = _decode_text(content)
    else:
        return None

    if not text:
        return None

    ts = _ts_seconds(raw.get("create_time") or 0)
    sender_id = ((raw.get("sender") or {}).get("id")) or ""
    return {
        "ts": ts,
        "time": dt.datetime.fromtimestamp(ts, TZ).strftime("%H:%M:%S"),
        "sender_wxid": sender_id,  # WeChat-side field name; holds the Feishu open_id
        "sender_name": "",
        "type": msg_type,
        "text": text,
    }
  • Step 5: Run the test and confirm it passes
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: PASS

  • Step 6: Commit
git add bot/feishu-bot/scripts/_feishu.py \
bot/feishu-bot/tests/test_decoders.py \
bot/feishu-bot/tests/fixtures/text_simple.json
git commit -m "feat(feishu-bot): 解码 text 消息(含 ts/time 转换)"

Task 3: Message decoder for post (rich text)

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Create: bot/feishu-bot/tests/fixtures/post_simple.json
  • Create: bot/feishu-bot/tests/fixtures/post_with_links.json
  • Modify: bot/feishu-bot/tests/test_decoders.py

Background: The content of a Feishu post message looks like this:

{"title":"标题","content":[[{"tag":"text","text":"段一"},{"tag":"a","href":"https://x","text":"链接"}],[{"tag":"text","text":"段二"}]]}

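content is an array of paragraph arrays. Each outer item is one paragraph, and each inner item is an inline element within that paragraph (text / a / at / img / emotion).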

  • Step 1: Write two fixtures

Create bot/feishu-bot/tests/fixtures/post_simple.json:

{
  "message_id": "om_post1",
  "create_time": "1714492810000",
  "msg_type": "post",
  "sender": {"id": "ou_bbb", "id_type": "open_id"},
  "body": {"content": "{\"title\":\"周报\",\"content\":[[{\"tag\":\"text\",\"text\":\"本周完成 X\"}],[{\"tag\":\"text\",\"text\":\"下周计划 Y\"}]]}"}
}

Create bot/feishu-bot/tests/fixtures/post_with_links.json:

{
  "message_id": "om_post2",
  "create_time": "1714492820000",
  "msg_type": "post",
  "sender": {"id": "ou_ccc", "id_type": "open_id"},
  "body": {"content": "{\"title\":\"\",\"content\":[[{\"tag\":\"text\",\"text\":\"看这个 \"},{\"tag\":\"a\",\"href\":\"https://example.com\",\"text\":\"博客\"},{\"tag\":\"text\",\"text\":\" 还可以\"}],[{\"tag\":\"at\",\"user_id\":\"ou_xxx\",\"user_name\":\"张三\"},{\"tag\":\"text\",\"text\":\" 你怎么看\"}]]}"}
}
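Hand-escaping the nested body.content string is error-prone. A small helper can build envelopes and let json.dumps handle the inner encoding. `make_envelope` is hypothetical, and its field set mirrors the fixtures above:

```python
import json


def make_envelope(message_id: str, create_ms: int, msg_type: str,
                  sender_open_id: str, content: dict) -> dict:
    """Build a fixture envelope whose body.content is a JSON string, like the API's."""
    return {
        "message_id": message_id,
        "create_time": str(create_ms),
        "msg_type": msg_type,
        "sender": {"id": sender_open_id, "id_type": "open_id"},
        "body": {"content": json.dumps(content, ensure_ascii=False)},
    }


env = make_envelope("om_post1", 1714492810000, "post", "ou_bbb", {
    "title": "周报",
    "content": [[{"tag": "text", "text": "本周完成 X"}],
                [{"tag": "text", "text": "下周计划 Y"}]],
})
print(json.dumps(env, ensure_ascii=False, indent=2))
```

Redirecting the printed output into tests/fixtures/ produces a fixture that is equivalent once parsed. Only the whitespace inside the content string differs.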
  • Step 2: Add test cases

Append to bot/feishu-bot/tests/test_decoders.py:

class TestDecodePost(unittest.TestCase):
    def test_post_with_title(self):
        raw = load_fixture("post_simple.json")
        msg = decode_message(raw)
        self.assertEqual(msg["type"], "post")
        self.assertEqual(msg["sender_wxid"], "ou_bbb")
        # A blank line separates the title from the body, and paragraphs from each other
        self.assertEqual(msg["text"], "周报\n\n本周完成 X\n\n下周计划 Y")

    def test_post_with_link_and_at(self):
        raw = load_fixture("post_with_links.json")
        msg = decode_message(raw)
        # Links keep the [text](url) form; mentions keep @name
        self.assertEqual(
            msg["text"],
            "看这个 [博客](https://example.com) 还可以\n\n@张三 你怎么看",
        )
  • Step 3: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_decoders.TestDecodePost -v

Expected: FAIL/ERROR (decode_message does not handle post yet and returns None)

  • Step 4: Implement _decode_post

Add to bot/feishu-bot/scripts/_feishu.py:

def _decode_post_element(el: dict) -> str:
    """One inline element inside a post paragraph."""
    tag = el.get("tag")
    if tag == "text":
        return el.get("text") or ""
    if tag == "a":
        text = el.get("text") or ""
        href = el.get("href") or ""
        return f"[{text}]({href})" if href else text
    if tag == "at":
        # user_name may be empty (the mentioned user is not in the group); fall back to user_id
        return "@" + (el.get("user_name") or el.get("user_id") or "")
    if tag == "img":
        return "[图片]"
    if tag == "emotion":
        return ""
    if tag == "media":
        return "[媒体]"
    if tag == "file":
        return "[文件]"
    return ""


def _decode_post(content: dict) -> str:
    """Flatten Feishu post content into plain text with links and ats."""
    title = (content.get("title") or "").strip()
    paragraphs = content.get("content") or []
    rendered = []
    for para in paragraphs:
        if not isinstance(para, list):
            continue
        line = "".join(_decode_post_element(el) for el in para if isinstance(el, dict))
        line = line.strip()
        if line:
            rendered.append(line)
    body = "\n\n".join(rendered)
    if title and body:
        return f"{title}\n\n{body}"
    return title or body

Modify the decode_message dispatch:

    if msg_type == "text":
        text = _decode_text(content)
    elif msg_type in ("post", "post_v2"):
        text = _decode_post(content)
    else:
        return None
  • Step 5: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: 3 tests, all PASS

  • Step 6: Commit
git add bot/feishu-bot/scripts/_feishu.py \
bot/feishu-bot/tests/test_decoders.py \
bot/feishu-bot/tests/fixtures/post_simple.json \
bot/feishu-bot/tests/fixtures/post_with_links.json
git commit -m "feat(feishu-bot): 解码 post 富文本(保留链接/@/段落分隔)"

Task 4: Message decoder for share_chat / share_user

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Create: bot/feishu-bot/tests/fixtures/share_chat.json
  • Create: bot/feishu-bot/tests/fixtures/share_user.json
  • Modify: bot/feishu-bot/tests/test_decoders.py

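Background: The content of a share_chat message looks like {"chatId":"oc_xxx"}, and a share_user message looks like {"userId":"ou_xxx"}. Neither type carries a name or URL, so the client needs a second lookup to fill one in. To keep the decoder a pure function, decode_message takes an optional resolver callback, and the caller decides how to fill in names.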
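A caller might build such a resolver by dispatching on kind to two lookup functions. When a lookup fails or returns nothing, the resolver returns None and the decoder falls back to the raw ID. This is a sketch under several assumptions: `make_resolver` is hypothetical, the user lookup has no counterpart in this plan, and catching RuntimeError assumes the client raises that type on API errors:

```python
from typing import Callable, Optional

ShareResolver = Callable[[str, str], Optional[str]]


def make_resolver(lookup_chat: Callable[[str], str],
                  lookup_user: Callable[[str], str]) -> ShareResolver:
    """Turn two name lookups into a (kind, ref_id) -> name-or-None resolver."""
    def resolve(kind: str, ref_id: str) -> Optional[str]:
        lookup = {"chat": lookup_chat, "user": lookup_user}.get(kind)
        if lookup is None:
            return None
        try:
            return lookup(ref_id) or None  # empty name -> decoder shows the raw id
        except RuntimeError:
            return None  # an API failure degrades one message instead of aborting the day
    return resolve
```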

  • Step 1: Write two fixtures

Create bot/feishu-bot/tests/fixtures/share_chat.json:

{
  "message_id": "om_share1",
  "create_time": "1714492830000",
  "msg_type": "share_chat",
  "sender": {"id": "ou_ddd", "id_type": "open_id"},
  "body": {"content": "{\"chatId\":\"oc_target_group\"}"}
}

Create bot/feishu-bot/tests/fixtures/share_user.json:

{
  "message_id": "om_share2",
  "create_time": "1714492840000",
  "msg_type": "share_user",
  "sender": {"id": "ou_eee", "id_type": "open_id"},
  "body": {"content": "{\"userId\":\"ou_target_user\"}"}
}
  • Step 2: Add test cases

Append to bot/feishu-bot/tests/test_decoders.py:

class TestDecodeShare(unittest.TestCase):
    def test_share_chat_with_resolver(self):
        raw = load_fixture("share_chat.json")
        resolver = lambda kind, ref_id: f"群名(假){ref_id}" if kind == "chat" else None
        msg = decode_message(raw, resolver=resolver)
        self.assertEqual(msg["type"], "share")
        self.assertEqual(msg["text"], "[转发链接] 群名(假)oc_target_group")

    def test_share_chat_without_resolver_degrades(self):
        raw = load_fixture("share_chat.json")
        msg = decode_message(raw)
        self.assertEqual(msg["text"], "[转发链接] oc_target_group")

    def test_share_user_with_resolver(self):
        raw = load_fixture("share_user.json")
        resolver = lambda kind, ref_id: "李四" if kind == "user" else None
        msg = decode_message(raw, resolver=resolver)
        self.assertEqual(msg["type"], "share")
        self.assertEqual(msg["text"], "[转发名片] 李四")
  • Step 3: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_decoders.TestDecodeShare -v

Expected: FAIL

  • Step 4: Implement share decoding + the resolver parameter

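Add the helpers below to bot/feishu-bot/scripts/_feishu.py and replace decode_message. Move the typing import up with the other imports at the top of the file: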

from typing import Callable, Optional

# Optional[...] rather than "str | None": this alias is evaluated at runtime,
# and "|" between types only works at runtime on Python 3.10+.
ShareResolver = Callable[[str, str], Optional[str]]


def _decode_share_chat(content: dict, resolver: ShareResolver | None) -> str:
    chat_id = content.get("chatId") or content.get("chat_id") or ""
    name = resolver("chat", chat_id) if (resolver and chat_id) else None
    return f"[转发链接] {name or chat_id}"


def _decode_share_user(content: dict, resolver: ShareResolver | None) -> str:
    user_id = content.get("userId") or content.get("user_id") or ""
    name = resolver("user", user_id) if (resolver and user_id) else None
    return f"[转发名片] {name or user_id}"


def decode_message(raw: dict, resolver: ShareResolver | None = None) -> dict | None:
    msg_type = raw.get("msg_type")
    body_raw = (raw.get("body") or {}).get("content") or "{}"
    try:
        content = json.loads(body_raw) if isinstance(body_raw, str) else body_raw
    except json.JSONDecodeError:
        return None
    if not isinstance(content, dict):
        return None

    if msg_type == "text":
        text = _decode_text(content)
        out_type = "text"
    elif msg_type in ("post", "post_v2"):
        text = _decode_post(content)
        out_type = "post"
    elif msg_type == "share_chat":
        text = _decode_share_chat(content, resolver)
        out_type = "share"
    elif msg_type == "share_user":
        text = _decode_share_user(content, resolver)
        out_type = "share"
    else:
        return None

    if not text:
        return None

    ts = _ts_seconds(raw.get("create_time") or 0)
    sender_id = ((raw.get("sender") or {}).get("id")) or ""
    return {
        "ts": ts,
        "time": dt.datetime.fromtimestamp(ts, TZ).strftime("%H:%M:%S"),
        "sender_wxid": sender_id,
        "sender_name": "",
        "type": out_type,
        "text": text,
    }

Note: The first version used msg_type itself as the output type. It is now mapped explicitly, because share_chat / share_user are both normalized to share (and post_v2 to post).
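If the list of supported types keeps growing, the mapping could live in one table instead of being spread across the if/elif branches. Below is a sketch of that alternative; `OUT_TYPE` and `normalize_type` are hypothetical names. Each newly supported type, such as file later on, would add one row, but its text extraction would still need its own branch:

```python
from typing import Optional

# Hypothetical table: supported Feishu msg_type -> normalized "type" in our schema.
OUT_TYPE = {
    "text": "text",
    "post": "post",
    "post_v2": "post",
    "share_chat": "share",
    "share_user": "share",
}


def normalize_type(msg_type: Optional[str]) -> Optional[str]:
    """Normalized type for a supported msg_type, or None if the digest drops it."""
    return OUT_TYPE.get(msg_type) if msg_type else None
```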

  • Step 5: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: 6 tests, all PASS

  • Step 6: Commit
git add bot/feishu-bot/scripts/_feishu.py \
bot/feishu-bot/tests/test_decoders.py \
bot/feishu-bot/tests/fixtures/share_chat.json \
bot/feishu-bot/tests/fixtures/share_user.json
git commit -m "feat(feishu-bot): 解码 share_chat / share_user(含 resolver 回调)"

Task 5: Message decoder for file + unknown types

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py

  • Create: bot/feishu-bot/tests/fixtures/file.json

  • Create: bot/feishu-bot/tests/fixtures/sticker.json

  • Modify: bot/feishu-bot/tests/test_decoders.py

  • Step 1: Write two fixtures

Create bot/feishu-bot/tests/fixtures/file.json:

{
  "message_id": "om_file1",
  "create_time": "1714492850000",
  "msg_type": "file",
  "sender": {"id": "ou_fff", "id_type": "open_id"},
  "body": {"content": "{\"file_key\":\"file_xxx\",\"file_name\":\"演示文档.pdf\"}"}
}

Create bot/feishu-bot/tests/fixtures/sticker.json:

{
  "message_id": "om_stk1",
  "create_time": "1714492860000",
  "msg_type": "sticker",
  "sender": {"id": "ou_ggg", "id_type": "open_id"},
  "body": {"content": "{\"file_key\":\"sticker_xxx\"}"}
}
  • Step 2: Add test cases

Append to bot/feishu-bot/tests/test_decoders.py:

class TestDecodeFile(unittest.TestCase):
    def test_file_keeps_filename(self):
        raw = load_fixture("file.json")
        msg = decode_message(raw)
        self.assertEqual(msg["type"], "file")
        self.assertEqual(msg["text"], "[文件] 演示文档.pdf")


class TestDecodeUnknown(unittest.TestCase):
    def test_sticker_dropped(self):
        raw = load_fixture("sticker.json")
        msg = decode_message(raw)
        self.assertIsNone(msg)

    def test_garbage_content_dropped(self):
        raw = {
            "msg_type": "text",
            "create_time": "1714492870000",
            "sender": {"id": "ou_x"},
            "body": {"content": "not json"},
        }
        self.assertIsNone(decode_message(raw))
  • Step 3: Run the tests; confirm the file test fails and the unknown-type tests already pass
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: the file test fails. The sticker / garbage tests already pass, because the current dispatch falls through to else: return None.

  • Step 4: Implement _decode_file

Modify bot/feishu-bot/scripts/_feishu.py:

def _decode_file(content: dict) -> str:
    name = (content.get("file_name") or "").strip()
    return f"[文件] {name}" if name else ""

Add to dispatch in decode_message:

    elif msg_type == "file":
        text = _decode_file(content)
        out_type = "file"
  • Step 5: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_decoders -v

Expected: 9 tests, all PASS

  • Step 6: Commit
git add bot/feishu-bot/scripts/_feishu.py \
bot/feishu-bot/tests/test_decoders.py \
bot/feishu-bot/tests/fixtures/file.json \
bot/feishu-bot/tests/fixtures/sticker.json
git commit -m "feat(feishu-bot): 解码 file(保留文件名)+ 未知类型 drop"

Task 6: HTTP client skeleton + tenant_access_token

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Create: bot/feishu-bot/tests/test_client.py

Background: The Feishu authentication flow is:

POST https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal
Body: {"app_id":"...","app_secret":"..."}
Resp: {"code":0, "tenant_access_token":"t-xxx", "expire": 7200}

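Subsequent business calls add the header Authorization: Bearer <token>. When the token has expired, the API returns code == 99991663, and the client must refresh the token and retry.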
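The plan caches the token for the whole run and relies on code 99991663 to trigger a refresh. If runs ever get long, the expire field in the token response could drive the refresh instead. Below is a sketch of that alternative. `TokenCache` is hypothetical, and the 300-second safety margin is an assumption:

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache a token until `margin` seconds before its `expire` window runs out.

    fetch() must return (token, expire_seconds); clock defaults to time.monotonic.
    """

    def __init__(self, fetch: Callable[[], Tuple[str, int]],
                 margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._margin = margin
        self._clock = clock
        self._token: Optional[str] = None
        self._deadline = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._deadline:
            token, expire = self._fetch()
            self._token = token
            self._deadline = self._clock() + expire - self._margin
        return self._token

    def invalidate(self) -> None:
        """Call when the API reports an expired token so the next get() refetches."""
        self._token = None
```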

FeishuClient accepts an optional transport callable with the signature (method, url, *, headers=None, params=None, json=None) -> (status_code, json_body). Production uses requests, and tests use an in-memory fake.
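Any callable with this shape can serve as the transport. That makes it possible to wrap the real transport, for example to capture live responses once and save them as offline fixtures. Below is a minimal sketch. `recording` is a hypothetical helper, and it deliberately keeps headers out of the log because they carry the bearer token:

```python
from typing import Any, Callable, Dict, List, Tuple

Transport = Callable[..., Tuple[int, dict]]


def recording(transport: Transport, log: List[Dict[str, Any]]) -> Transport:
    """Wrap a transport so each call's request and response are appended to `log`."""
    def wrapped(method, url, *, headers=None, params=None, json=None):
        status, body = transport(method, url, headers=headers, params=params, json=json)
        # Headers are not logged: they contain the Authorization token.
        log.append({"method": method, "url": url, "params": params,
                    "status": status, "body": body})
        return status, body
    return wrapped
```

A real run could pass `transport=recording(_default_transport, log)` and dump `log` to JSON afterwards.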

  • Step 1: Write the tests

Create bot/feishu-bot/tests/test_client.py:

import sys
import unittest
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
from _feishu import FeishuClient  # noqa: E402


class FakeTransport:
    """Records calls + returns canned responses by URL match."""

    def __init__(self):
        self.calls: list[dict] = []
        self.responses: dict[str, list[tuple[int, dict]]] = {}

    def respond(self, url_substring: str, status: int, body: dict):
        # Responses registered for the same substring are returned in FIFO order
        self.responses.setdefault(url_substring, []).append((status, body))

    def __call__(self, method, url, *, headers=None, params=None, json=None):
        self.calls.append({
            "method": method, "url": url,
            "headers": headers or {}, "params": params or {}, "json": json,
        })
        for sub, queue in self.responses.items():
            if sub in url and queue:
                return queue.pop(0)
        raise AssertionError(f"unexpected request: {method} {url}")


class TestTokenFetch(unittest.TestCase):
    def test_fetch_token_first_call(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        c = FeishuClient("cli_a", "secret_a", transport=t)

        token = c._get_token()

        self.assertEqual(token, "t-abc")
        self.assertEqual(t.calls[0]["json"], {"app_id": "cli_a", "app_secret": "secret_a"})

    def test_token_cached_within_run(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        c = FeishuClient("cli_a", "secret_a", transport=t)

        c._get_token()
        c._get_token()

        self.assertEqual(len(t.calls), 1)

    def test_token_fetch_failure_raises(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {"code": 1234, "msg": "bad app"})
        c = FeishuClient("cli_a", "secret_a", transport=t)

        with self.assertRaises(RuntimeError):
            c._get_token()


if __name__ == "__main__":
    unittest.main()
  • Step 2: Run the tests and confirm they fail
cd bot/feishu-bot
/usr/bin/python3 -m unittest tests.test_client -v

Expected: FAIL (FeishuClient does not exist yet)

  • Step 3: Implement FeishuClient + _get_token

Add to bot/feishu-bot/scripts/_feishu.py (put import time with the other imports at the top of the file):

import time

FEISHU_API = "https://open.feishu.cn/open-apis"
TOKEN_URL = f"{FEISHU_API}/auth/v3/tenant_access_token/internal"


def _default_transport(method: str, url: str, *, headers=None, params=None, json=None):
    """Real HTTP transport using `requests`. Imported lazily so unit tests
    don't need the dependency installed.
    """
    import requests  # local import so test env without requests still works
    resp = requests.request(
        method, url, headers=headers, params=params, json=json, timeout=30
    )
    try:
        body = resp.json()
    except ValueError:
        body = {}
    return resp.status_code, body


class FeishuClient:
    def __init__(self, app_id: str, app_secret: str, *, transport=None):
        self.app_id = app_id
        self.app_secret = app_secret
        self._transport = transport or _default_transport
        self._token: str | None = None

    def _get_token(self) -> str:
        """Return the cached tenant_access_token, fetching it on first use."""
        if self._token:
            return self._token
        status, body = self._transport(
            "POST", TOKEN_URL,
            json={"app_id": self.app_id, "app_secret": self.app_secret},
        )
        if status != 200 or body.get("code") != 0:
            raise RuntimeError(
                f"Feishu token fetch failed: status={status} body={body}"
            )
        self._token = body.get("tenant_access_token")
        if not self._token:
            raise RuntimeError(f"Feishu token response missing token: {body}")
        return self._token
  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_client -v

Expected: 3 tests PASS

  • Step 5: Commit
git add bot/feishu-bot/scripts/_feishu.py bot/feishu-bot/tests/test_client.py
git commit -m "feat(feishu-bot): FeishuClient 骨架 + tenant_access_token 拉取"

Task 7: Generic request + retry on token expiry

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Modify: bot/feishu-bot/tests/test_client.py

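Background: Every subsequent business request must carry the token. Wrap this logic in _request(method, path, params=None, json=None). It attaches the token automatically and calls the transport. If the response has code == 99991663 (token expired), it refreshes the token once and retries. On 5xx, 429 or network errors, it backs off exponentially, with up to 3 attempts in total.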

  • Step 1: Add tests

Append to bot/feishu-bot/tests/test_client.py:

class TestRequest(unittest.TestCase):
    def test_token_attached(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        t.respond("im/v1/messages", 200, {"code": 0, "data": {"items": []}})
        c = FeishuClient("cli", "sec", transport=t)

        c._request("GET", "/im/v1/messages", params={"x": 1})

        # second call (the messages one) should have Authorization header
        msg_call = t.calls[1]
        self.assertEqual(msg_call["headers"]["Authorization"], "Bearer t-abc")
        self.assertEqual(msg_call["params"], {"x": 1})

    def test_token_refresh_on_99991663(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-old", "expire": 7200,
        })
        # First business call returns expired-token error
        t.respond("im/v1/messages", 200, {"code": 99991663, "msg": "token expired"})
        # After refresh, second token + second business call succeed
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-new", "expire": 7200,
        })
        t.respond("im/v1/messages", 200, {"code": 0, "data": {"items": []}})
        c = FeishuClient("cli", "sec", transport=t)

        body = c._request("GET", "/im/v1/messages")

        self.assertEqual(body, {"code": 0, "data": {"items": []}})
        # 4 calls total: token, messages(fail), token(refresh), messages(retry)
        self.assertEqual(len(t.calls), 4)
        self.assertEqual(t.calls[3]["headers"]["Authorization"], "Bearer t-new")

    def test_5xx_retried_then_succeeds(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        t.respond("im/v1/messages", 503, {})
        t.respond("im/v1/messages", 200, {"code": 0, "data": {}})
        c = FeishuClient("cli", "sec", transport=t)
        # speed up retry in test
        c._retry_base_delay = 0

        body = c._request("GET", "/im/v1/messages")

        self.assertEqual(body["code"], 0)
        self.assertEqual(len(t.calls), 3)

    def test_5xx_exhausted_raises(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        for _ in range(3):
            t.respond("im/v1/messages", 503, {})
        c = FeishuClient("cli", "sec", transport=t)
        c._retry_base_delay = 0

        with self.assertRaises(RuntimeError):
            c._request("GET", "/im/v1/messages")
  • Step 2: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_client.TestRequest -v

Expected: FAIL (_request is not implemented yet)

  • Step 3: Implement _request

Add to FeishuClient in bot/feishu-bot/scripts/_feishu.py:

class FeishuClient:
    # ... existing code ...

    _max_retries = 3
    _retry_base_delay = 2.0  # tests override to 0

    def _request(self, method: str, path: str, *, params=None, json=None) -> dict:
        url = FEISHU_API + path if path.startswith("/") else f"{FEISHU_API}/{path}"

        for attempt in range(self._max_retries):
            headers = {"Authorization": f"Bearer {self._get_token()}"}
            try:
                status, body = self._transport(
                    method, url, headers=headers, params=params, json=json,
                )
            except OSError as exc:
                # Network failure: requests.RequestException subclasses OSError (IOError).
                if attempt + 1 == self._max_retries:
                    raise RuntimeError(
                        f"Feishu {method} {path} network error after {self._max_retries} attempts: {exc}"
                    ) from exc
                time.sleep(self._retry_base_delay * (2 ** attempt))
                continue

            # Token expired: clear the cached token so the next iteration's
            # _get_token() fetches a fresh one. The body code is checked regardless
            # of HTTP status in case the error arrives with a non-200 status.
            # This consumes one retry slot, which is fine because refresh is rare.
            if isinstance(body, dict) and body.get("code") == 99991663:
                self._token = None
                continue

            if status >= 500:
                if attempt + 1 == self._max_retries:
                    raise RuntimeError(f"Feishu {method} {path} 5xx after {self._max_retries} attempts: status={status}")
                time.sleep(self._retry_base_delay * (2 ** attempt))
                continue

            if status == 429:
                if attempt + 1 == self._max_retries:
                    raise RuntimeError(f"Feishu {method} {path} 429 rate limited after {self._max_retries} attempts")
                time.sleep(self._retry_base_delay * (2 ** attempt))
                continue

            if status != 200 or body.get("code") not in (0, None):
                raise RuntimeError(
                    f"Feishu {method} {path} failed: status={status} body={body}"
                )

            return body

        # Reached only when the final attempt hit an expired token.
        raise RuntimeError(f"Feishu {method} {path}: retries exhausted after token refresh")

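Note: In Python, continue inside for attempt in range(...) advances to the next iteration, so a token-expired retry consumes one attempt slot. That is acceptable, because token expiry is rare and 3 attempts are available. If expiry on the final attempt ever exhausts the retries (the last raise above), consider switching to a while loop.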
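Below is a sketch of that while-loop variant. Token refreshes get their own small budget, so they never use up a 5xx/429 attempt. The names (`call_with_retry`, `send`, `refresh_token`, `max_refreshes`) are hypothetical, and the backoff schedule matches the plan's base * 2**n:

```python
import time
from typing import Callable, Tuple

TOKEN_EXPIRED = 99991663  # code the plan treats as "token expired"


def call_with_retry(send: Callable[[], Tuple[int, dict]],
                    refresh_token: Callable[[], None],
                    max_retries: int = 3,
                    base_delay: float = 2.0,
                    max_refreshes: int = 1) -> dict:
    """Retry 5xx/429 with exponential backoff; token refreshes have a separate budget."""
    attempts = 0
    refreshes = 0
    while True:
        status, body = send()
        if (isinstance(body, dict) and body.get("code") == TOKEN_EXPIRED
                and refreshes < max_refreshes):
            refreshes += 1
            refresh_token()
            continue  # does not count against max_retries
        if status >= 500 or status == 429:
            attempts += 1
            if attempts >= max_retries:
                raise RuntimeError(f"gave up after {attempts} attempts: status={status}")
            time.sleep(base_delay * 2 ** (attempts - 1))
            continue
        if status != 200 or body.get("code") not in (0, None):
            raise RuntimeError(f"request failed: status={status} body={body}")
        return body
```

A second expiry beyond the refresh budget falls through to the final check and raises, so the loop cannot spin forever.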

  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_client -v

Expected: 7 tests PASS

  • Step 5: Commit
git add bot/feishu-bot/scripts/_feishu.py bot/feishu-bot/tests/test_client.py
git commit -m "feat(feishu-bot): _request 通用请求(含 token 刷新 + 5xx/429 退避重试)"

Task 8: Paginated group message fetching with iter_messages

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Create: bot/feishu-bot/tests/test_pagination.py

Background: Parameters of Feishu GET /open-apis/im/v1/messages:

  • container_id_type=chat
  • container_id=oc_xxx
  • start_time=<unix seconds, as a string>
  • end_time=<unix seconds, as a string>
  • page_size=50
  • page_token=... (returned by the previous response)
  • sort_type=ByCreateTimeAsc or ByCreateTimeDesc

Response:

{"code":0,"data":{"items":[...], "has_more":true, "page_token":"next_xxx"}}

We fetch in ascending time order and keep paging until every message in the window has been retrieved.
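Fetching everything means looping until has_more is false. The sketch below is a defensive variant that adds a page cap and detects repeated tokens. It assumes a repeated or endless page_token should be treated as an error rather than followed. `iter_pages` and its fetch callback are hypothetical:

```python
from typing import Callable, Iterator, Optional, Tuple

Page = Tuple[list, bool, str]  # (items, has_more, page_token)


def iter_pages(fetch: Callable[[Optional[str]], Page],
               max_pages: int = 10_000) -> Iterator[dict]:
    """Yield items across pages; fetch(None) returns the first page."""
    seen = set()
    token: Optional[str] = None
    for _ in range(max_pages):
        items, has_more, next_token = fetch(token)
        yield from items
        if not has_more or not next_token:
            return
        if next_token in seen:
            raise RuntimeError(f"page_token repeated: {next_token}")
        seen.add(next_token)
        token = next_token
    raise RuntimeError(f"more than {max_pages} pages; aborting")
```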

  • Step 1: Write the tests

Create bot/feishu-bot/tests/test_pagination.py:

import sys
import unittest
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
from _feishu import FeishuClient  # noqa: E402
from tests.test_client import FakeTransport  # reuse the harness


class TestIterMessages(unittest.TestCase):
    def setUp(self):
        self.t = FakeTransport()
        self.t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t-abc", "expire": 7200,
        })
        self.c = FeishuClient("cli", "sec", transport=self.t)

    def test_single_page(self):
        self.t.respond("im/v1/messages", 200, {
            "code": 0,
            "data": {
                "items": [{"message_id": "m1"}, {"message_id": "m2"}],
                "has_more": False,
                "page_token": "",
            },
        })

        out = list(self.c.iter_messages("oc_x", 1714492800, 1714579200))

        self.assertEqual([m["message_id"] for m in out], ["m1", "m2"])

    def test_two_pages(self):
        self.t.respond("im/v1/messages", 200, {
            "code": 0,
            "data": {
                "items": [{"message_id": "m1"}],
                "has_more": True,
                "page_token": "tok2",
            },
        })
        self.t.respond("im/v1/messages", 200, {
            "code": 0,
            "data": {
                "items": [{"message_id": "m2"}],
                "has_more": False,
                "page_token": "",
            },
        })

        out = list(self.c.iter_messages("oc_x", 1, 100))

        self.assertEqual([m["message_id"] for m in out], ["m1", "m2"])
        # second call must carry page_token=tok2
        msg_calls = [c for c in self.t.calls if "im/v1/messages" in c["url"]]
        self.assertEqual(msg_calls[1]["params"].get("page_token"), "tok2")

    def test_passes_window(self):
        self.t.respond("im/v1/messages", 200, {
            "code": 0,
            "data": {"items": [], "has_more": False, "page_token": ""},
        })
        list(self.c.iter_messages("oc_x", 1714492800, 1714579200))

        first = [c for c in self.t.calls if "im/v1/messages" in c["url"]][0]
        self.assertEqual(first["params"]["container_id"], "oc_x")
        self.assertEqual(first["params"]["container_id_type"], "chat")
        self.assertEqual(first["params"]["start_time"], "1714492800")
        self.assertEqual(first["params"]["end_time"], "1714579200")
        self.assertEqual(first["params"]["sort_type"], "ByCreateTimeAsc")
  • Step 2: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_pagination -v

Expected: FAIL (iter_messages is not implemented yet)

  • Step 3: Implement iter_messages

Add to FeishuClient in bot/feishu-bot/scripts/_feishu.py:

    def iter_messages(self, chat_id: str, start_ts: int, end_ts: int):
        """Yield raw message envelopes within [start_ts, end_ts), ascending."""
        page_token = ""
        while True:
            params = {
                "container_id_type": "chat",
                "container_id": chat_id,
                "start_time": str(start_ts),
                "end_time": str(end_ts),
                "page_size": "50",
                "sort_type": "ByCreateTimeAsc",
            }
            if page_token:
                params["page_token"] = page_token
            body = self._request("GET", "/im/v1/messages", params=params)
            data = body.get("data") or {}
            for item in data.get("items") or []:
                yield item
            if not data.get("has_more"):
                return
            page_token = data.get("page_token") or ""
            if not page_token:
                return
  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_pagination -v

Expected: 3 tests PASS

  • Step 5: Commit
git add bot/feishu-bot/scripts/_feishu.py bot/feishu-bot/tests/test_pagination.py
git commit -m "feat(feishu-bot): iter_messages 分页拉取(时间升序)"

Task 9: get_chat_name + group name cache

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Modify: bot/feishu-bot/tests/test_client.py

Background: GET /open-apis/im/v1/chats/:chat_id → {"code":0,"data":{"name":"...","chat_id":"oc_x"}}

The share_chat resolver needs this too. A plain in-memory dict that lives for one script run is enough as a cache.
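A bounded LRU cache can be used instead of a plain, unbounded dict by wrapping an uncached fetcher with functools.lru_cache. `cached_lookup` is a hypothetical helper. Exceptions are not cached by lru_cache, so a failing chat is retried on the next call:

```python
import functools
from typing import Callable


def cached_lookup(fetch: Callable[[str], str], maxsize: int = 1024) -> Callable[[str], str]:
    """Memoize a chat-name fetcher for one run, keeping at most `maxsize` entries."""
    return functools.lru_cache(maxsize=maxsize)(fetch)
```

Wrapping a function per client instance, rather than decorating a method at class level, avoids lru_cache holding a reference to every client it has seen.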

  • Step 1: Add the test

Append to bot/feishu-bot/tests/test_client.py:

class TestChatName(unittest.TestCase):
    def test_get_chat_name_cached(self):
        t = FakeTransport()
        t.respond("tenant_access_token", 200, {
            "code": 0, "tenant_access_token": "t", "expire": 7200,
        })
        t.respond("im/v1/chats/oc_x", 200, {
            "code": 0, "data": {"name": "Hermes 中文社区"},
        })
        c = FeishuClient("a", "b", transport=t)

        n1 = c.get_chat_name("oc_x")
        n2 = c.get_chat_name("oc_x")

        self.assertEqual(n1, "Hermes 中文社区")
        self.assertEqual(n2, "Hermes 中文社区")
        chat_calls = [x for x in t.calls if "im/v1/chats/oc_x" in x["url"]]
        self.assertEqual(len(chat_calls), 1)  # cached
  • Step 2: Run the test and confirm it fails
/usr/bin/python3 -m unittest tests.test_client.TestChatName -v

Expected: FAIL (get_chat_name is not implemented yet)

  • Step 3: Implement get_chat_name

Add to FeishuClient __init__:

        self._chat_name_cache: dict[str, str] = {}

Add method:

    def get_chat_name(self, chat_id: str) -> str:
        if chat_id in self._chat_name_cache:
            return self._chat_name_cache[chat_id]
        body = self._request("GET", f"/im/v1/chats/{chat_id}")
        name = ((body.get("data") or {}).get("name") or "").strip()
        self._chat_name_cache[chat_id] = name
        return name
  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_client -v

Expected: 8 tests PASS

  • Step 5: Commit
git add bot/feishu-bot/scripts/_feishu.py bot/feishu-bot/tests/test_client.py
git commit -m "feat(feishu-bot): get_chat_name 带本次运行内缓存"

Task 10: Group label resolution with resolve_group_label

Files:

  • Modify: bot/feishu-bot/scripts/_feishu.py
  • Modify: bot/feishu-bot/tests/test_client.py

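Background: When the user sets FEISHU_GROUP_LABELS=oc_x=Hermes Agent 中文社区飞书群 1;oc_y=..., the explicit mapping is used. For chats not in the mapping, the label falls back to "Hermes Agent 中文社区飞书群 N", where N is the chat's position in FEISHU_CHAT_IDS. Both functions are pure.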
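The fallback numbering depends on the order of FEISHU_CHAT_IDS, so that value needs to be parsed predictably too. Below is a minimal sketch. `parse_chat_ids` is a hypothetical helper, and dropping duplicates while keeping first-seen order is an assumption:

```python
import os
from typing import List, Optional


def parse_chat_ids(raw: Optional[str]) -> List[str]:
    """Split a comma-separated FEISHU_CHAT_IDS value; drop blanks and duplicates, keep order."""
    out: List[str] = []
    for part in (raw or "").split(","):
        cid = part.strip()
        if cid and cid not in out:
            out.append(cid)
    return out


chat_ids = parse_chat_ids(os.environ.get("FEISHU_CHAT_IDS"))
```

Keeping the first occurrence means a chat's index, and therefore its default label number, does not shift when a duplicate ID is appended by mistake.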

  • Step 1: Add tests

Append to bot/feishu-bot/tests/test_client.py:

from _feishu import resolve_group_label, parse_group_labels  # noqa: E402


class TestGroupLabels(unittest.TestCase):
    def test_parse_empty(self):
        self.assertEqual(parse_group_labels(""), {})
        self.assertEqual(parse_group_labels(None), {})

    def test_parse_single(self):
        self.assertEqual(
            parse_group_labels("oc_x=Hermes Agent 中文社区飞书群 1"),
            {"oc_x": "Hermes Agent 中文社区飞书群 1"},
        )

    def test_parse_multi(self):
        self.assertEqual(
            parse_group_labels("oc_x=群A;oc_y=群B"),
            {"oc_x": "群A", "oc_y": "群B"},
        )

    def test_parse_strips_whitespace(self):
        self.assertEqual(
            parse_group_labels(" oc_x = 群A ; oc_y = 群B "),
            {"oc_x": "群A", "oc_y": "群B"},
        )

    def test_resolve_explicit(self):
        labels = {"oc_x": "群A"}
        self.assertEqual(
            resolve_group_label("oc_x", chat_ids=["oc_x", "oc_y"], labels=labels),
            "群A",
        )

    def test_resolve_default_by_index(self):
        self.assertEqual(
            resolve_group_label("oc_y", chat_ids=["oc_x", "oc_y"], labels={}),
            "Hermes Agent 中文社区飞书群 2",
        )

    def test_resolve_unknown_chat_falls_back_to_id(self):
        self.assertEqual(
            resolve_group_label("oc_z", chat_ids=["oc_x", "oc_y"], labels={}),
            "Hermes Agent 中文社区飞书群 oc_z",
        )
  • Step 2: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_client.TestGroupLabels -v

Expected: FAIL (parse_group_labels / resolve_group_label are not implemented yet)

  • Step 3: Implement both functions

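Add to bot/feishu-bot/scripts/_feishu.py (on Python 3.9, the str | None annotations below require from __future__ import annotations at the top of the module):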

def parse_group_labels(raw: str | None) -> dict[str, str]:
    """Parse FEISHU_GROUP_LABELS env: "oc_x=Label A;oc_y=Label B"."""
    out: dict[str, str] = {}
    if not raw:
        return out
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue
        k, v = pair.split("=", 1)
        k = k.strip()
        v = v.strip()
        if k and v:
            out[k] = v
    return out


def resolve_group_label(
    chat_id: str, *, chat_ids: list[str], labels: dict[str, str]
) -> str:
    """Resolve display name for a Feishu chat in the daily digest.

    Priority:
      1. Explicit FEISHU_GROUP_LABELS mapping
      2. "Hermes Agent 中文社区飞书群 <index>" using order in FEISHU_CHAT_IDS
      3. "Hermes Agent 中文社区飞书群 <chat_id>" if not in FEISHU_CHAT_IDS
    """
    if chat_id in labels:
        return labels[chat_id]
    try:
        idx = chat_ids.index(chat_id) + 1
        return f"Hermes Agent 中文社区飞书群 {idx}"
    except ValueError:
        return f"Hermes Agent 中文社区飞书群 {chat_id}"
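To see how the two environment variables combine, the sketch below pairs condensed copies of both functions with a hypothetical build_label_map helper (not part of the plan). It assumes FEISHU_CHAT_IDS is comma-separated, as extract_day.py's main() splits it in Task 12, while FEISHU_GROUP_LABELS uses semicolons.

```python
from typing import Dict, List, Optional


def parse_group_labels(raw: Optional[str]) -> Dict[str, str]:
    # Condensed equivalent of the _feishu.py version: "oc_x=Label A;oc_y=Label B".
    out: Dict[str, str] = {}
    for pair in (raw or "").split(";"):
        key, sep, value = pair.partition("=")  # split on the first "=" only
        if sep and key.strip() and value.strip():
            out[key.strip()] = value.strip()
    return out


def resolve_group_label(chat_id: str, *, chat_ids: List[str], labels: Dict[str, str]) -> str:
    # Same priority order as the _feishu.py version.
    if chat_id in labels:
        return labels[chat_id]
    if chat_id in chat_ids:
        return f"Hermes Agent 中文社区飞书群 {chat_ids.index(chat_id) + 1}"
    return f"Hermes Agent 中文社区飞书群 {chat_id}"


def build_label_map(chat_ids_raw: str, labels_raw: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: FEISHU_CHAT_IDS + FEISHU_GROUP_LABELS -> {chat_id: label}."""
    chat_ids = [c.strip() for c in chat_ids_raw.split(",") if c.strip()]
    labels = parse_group_labels(labels_raw)
    return {cid: resolve_group_label(cid, chat_ids=chat_ids, labels=labels) for cid in chat_ids}


mapping = build_label_map("oc_a, oc_b ,oc_c", "oc_b=Core group; bad-entry ;oc_zz=Not listed")
# mapping == {
#     "oc_a": "Hermes Agent 中文社区飞书群 1",
#     "oc_b": "Core group",
#     "oc_c": "Hermes Agent 中文社区飞书群 3",
# }
```

Two consequences are visible here: numbering follows the position in FEISHU_CHAT_IDS, so labeling oc_b does not renumber oc_c; and a label for a chat that is not in FEISHU_CHAT_IDS (oc_zz) is never used, because _extract_one_day only iterates FEISHU_CHAT_IDS.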
  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_client -v

Expected: 15 tests PASS

  • Step 5: Commit
git add bot/feishu-bot/scripts/_feishu.py bot/feishu-bot/tests/test_client.py
git commit -m "feat(feishu-bot): group label resolution (FEISHU_GROUP_LABELS + positional fallback)"

Task 11: extract_day.py (day_bounds + weekend backfill)

Files:

  • Create: bot/feishu-bot/scripts/extract_day.py
  • Create: bot/feishu-bot/tests/test_extract_day.py

Background: This task covers only the date-handling logic and makes no API calls. The unit-testable parts are factored out as pure functions.

  • Step 1: Write the tests

Create bot/feishu-bot/tests/test_extract_day.py:

import datetime as dt
import sys
import unittest
from pathlib import Path
from zoneinfo import ZoneInfo

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
from extract_day import day_bounds, expand_dates  # noqa: E402

TZ = ZoneInfo("Asia/Shanghai")


class TestDayBounds(unittest.TestCase):
    def test_one_day_window(self):
        start, end = day_bounds("2026-04-30")
        # 2026-04-30 00:00:00 Asia/Shanghai (UTC+8) = 2026-04-29 16:00:00 UTC = 1777478400
        self.assertEqual(start, 1777478400)
        self.assertEqual(end, 1777564800)
        self.assertEqual(end - start, 86400)


class TestExpandDates(unittest.TestCase):
    def test_explicit_date_no_expansion(self):
        # An explicit date is never expanded, even on a Monday (2026-04-27 is a Monday)
        self.assertEqual(
            expand_dates(explicit_date="2026-04-27", today=dt.date(2026, 4, 27)),
            ["2026-04-27"],
        )

    def test_no_explicit_normal_day(self):
        # today=2026-04-30 (Thursday) → just yesterday
        self.assertEqual(
            expand_dates(explicit_date=None, today=dt.date(2026, 4, 30)),
            ["2026-04-29"],
        )

    def test_no_explicit_monday_backfills_weekend(self):
        # 2026-05-04 is a Monday: the run pulls Sat (05-02), Sun (05-03) and Mon (05-04),
        # matching wechat-bot/scripts/extract_day.py, where a Monday run pulls Sat/Sun/Mon.
        self.assertEqual(
            expand_dates(explicit_date=None, today=dt.date(2026, 5, 4)),
            ["2026-05-02", "2026-05-03", "2026-05-04"],
        )
  • Step 2: Run the tests and confirm they fail
/usr/bin/python3 -m unittest tests.test_extract_day -v

Expected: FAIL (extract_day does not exist yet)

  • Step 3: Write the extract_day.py skeleton

Create bot/feishu-bot/scripts/extract_day.py:

#!/usr/bin/env python3
"""Pull one day of Feishu group messages → bot/feishu-bot/data/daily/<date>.feishu.json.

Schema is aligned with bot/wechat-bot/data/daily/<date>.json so generate_report.py
can merge them by simply concatenating the `groups` list.

Usage:
    python3 scripts/extract_day.py                  # yesterday (Asia/Shanghai)
    python3 scripts/extract_day.py 2026-04-30       # explicit date
    python3 scripts/extract_day.py --dry-run        # don't write file
    python3 scripts/extract_day.py --no-overwrite   # exit if file exists
"""
from __future__ import annotations

import argparse
import datetime as dt
import json
import os
import sys
from pathlib import Path
from zoneinfo import ZoneInfo

from dotenv import load_dotenv

# Local imports (scripts dir on sys.path via the same trick as tests use)
sys.path.insert(0, str(Path(__file__).resolve().parent))
from _feishu import (  # noqa: E402
    FeishuClient,
    decode_message,
    parse_group_labels,
    resolve_group_label,
)

ROOT = Path(__file__).resolve().parent.parent
OUT_DIR = ROOT / "data/daily"
TZ = ZoneInfo("Asia/Shanghai")


def day_bounds(date_str: str) -> tuple[int, int]:
    """[start, end) unix seconds for the given YYYY-MM-DD in Asia/Shanghai."""
    d = dt.datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=TZ)
    return int(d.timestamp()), int((d + dt.timedelta(days=1)).timestamp())


def expand_dates(*, explicit_date: str | None, today: dt.date) -> list[str]:
    """Decide which dates to extract this run.

    Behavior matches bot/wechat-bot/scripts/extract_day.py:
      - explicit_date given           → just that date
      - no explicit + today is Monday → backfill Sat/Sun/Mon
      - otherwise                     → just yesterday
    """
    if explicit_date:
        return [explicit_date]
    if today.weekday() == 0:  # Monday
        return [
            (today - dt.timedelta(days=2)).strftime("%Y-%m-%d"),
            (today - dt.timedelta(days=1)).strftime("%Y-%m-%d"),
            today.strftime("%Y-%m-%d"),
        ]
    return [(today - dt.timedelta(days=1)).strftime("%Y-%m-%d")]


def main():  # pragma: no cover — CLI; covered by manual smoke
    ap = argparse.ArgumentParser()
    ap.add_argument("date", nargs="?", help="YYYY-MM-DD (default: yesterday)")
    ap.add_argument("--dry-run", action="store_true")
    ap.add_argument("--no-overwrite", action="store_true")
    args = ap.parse_args()

    today = dt.datetime.now(TZ).date()
    dates = expand_dates(explicit_date=args.date, today=today)
    print(f"[*] dates to extract: {dates}")
    # Real implementation in next task.


if __name__ == "__main__":  # pragma: no cover
    main()

Note that main() is only a placeholder; the next task fills it in.
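Hand-typed epoch constants are easy to get wrong by a whole year, so it can help to derive the values in test_one_day_window (and the weekdays the TestExpandDates cases rely on) directly. The sketch mirrors day_bounds under a stated assumption: it swaps ZoneInfo("Asia/Shanghai") for a fixed UTC+8 offset, which gives the same result for present-day dates because China has not observed DST since 1991, and avoids needing tzdata just for this check. day_bounds_fixed is a hypothetical name.

```python
import datetime as dt
from typing import Tuple

# Fixed UTC+8 offset; equivalent to Asia/Shanghai for present-day dates.
CST = dt.timezone(dt.timedelta(hours=8))


def day_bounds_fixed(date_str: str) -> Tuple[int, int]:
    """[start, end) unix seconds for YYYY-MM-DD at UTC+8 (mirrors day_bounds)."""
    d = dt.datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=CST)
    return int(d.timestamp()), int((d + dt.timedelta(days=1)).timestamp())


print(day_bounds_fixed("2026-04-30"))  # (1777478400, 1777564800)
print(day_bounds_fixed("2024-04-30"))  # (1714406400, 1714492800), two years earlier

# Weekday of each date used in TestExpandDates (Monday == 0).
for s in ("2026-04-27", "2026-04-30", "2026-05-04"):
    print(s, dt.datetime.strptime(s, "%Y-%m-%d").date().weekday())
# 2026-04-27 0 / 2026-04-30 3 / 2026-05-04 0
```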

  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_extract_day -v

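Expected: 4 tests PASS. On a Monday, expand_dates returns three dates, matching the WeChat side; an explicitly passed date, even a Monday, takes the explicit branch and yields only that date.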
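Because expand_dates picks the window from the run date alone, it can help to look at one week of default (no-argument) runs side by side. The sketch copies the function verbatim and prints what a run would pull on each day starting Monday 2026-05-04. Which weekdays the job is actually scheduled on is decided outside this function.

```python
import datetime as dt
from typing import Dict, List, Optional


def expand_dates(*, explicit_date: Optional[str], today: dt.date) -> List[str]:
    # Copy of expand_dates from extract_day.py.
    if explicit_date:
        return [explicit_date]
    if today.weekday() == 0:  # Monday
        return [
            (today - dt.timedelta(days=2)).strftime("%Y-%m-%d"),
            (today - dt.timedelta(days=1)).strftime("%Y-%m-%d"),
            today.strftime("%Y-%m-%d"),
        ]
    return [(today - dt.timedelta(days=1)).strftime("%Y-%m-%d")]


monday = dt.date(2026, 5, 4)
schedule: Dict[str, List[str]] = {}
for offset in range(7):
    today = monday + dt.timedelta(days=offset)
    schedule[today.isoformat()] = expand_dates(explicit_date=None, today=today)
    print(today.isoformat(), "->", schedule[today.isoformat()])
```

The output shows that a Monday run pulls Monday while that day is still in progress, and Tuesday's run pulls Monday again; it also shows that Friday (2026-05-08) is pulled only by a Saturday run.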

  • Step 5: Commit
git add bot/feishu-bot/scripts/extract_day.py bot/feishu-bot/tests/test_extract_day.py
git commit -m "feat(feishu-bot): extract_day skeleton + day_bounds / expand_dates unit tests"

Task 12: extract_day.py (main flow + writing the JSON file)

Files:

  • Modify: bot/feishu-bot/scripts/extract_day.py

Background:_feishu 的客户端拼起来:每个 chat_id → iter_messagesdecode_message(带 share resolver)→ 收集到列表 → 写文件。

This step adds no new unit tests because it is glue code; it is verified by the manual end-to-end test in Task 17.

  • Step 1: Implement _extract_one_day + _write_daily_json

Replace main() and add helpers in bot/feishu-bot/scripts/extract_day.py:

def _make_share_resolver(client: FeishuClient):
    """Build a resolver(kind, ref_id) -> name for share messages."""
    def resolver(kind: str, ref_id: str) -> str | None:
        if kind == "chat":
            try:
                return client.get_chat_name(ref_id) or None
            except Exception:
                return None
        # share_user resolution would need contact API; skip for v1
        return None
    return resolver


def _extract_one_day(
    client: FeishuClient,
    date_str: str,
    chat_ids: list[str],
    labels: dict[str, str],
) -> dict:
    start, end = day_bounds(date_str)
    resolver = _make_share_resolver(client)

    groups = []
    for chat_id in chat_ids:
        chat_name = ""
        try:
            chat_name = client.get_chat_name(chat_id)
        except Exception as e:
            print(f" ⚠️ get_chat_name({chat_id}) failed: {e}", file=sys.stderr)

        messages = []
        for raw in client.iter_messages(chat_id, start, end):
            decoded = decode_message(raw, resolver=resolver)
            if decoded is not None:
                messages.append(decoded)

        groups.append({
            "group_id": chat_id,
            "group_name": resolve_group_label(chat_id, chat_ids=chat_ids, labels=labels),
            "platform": "feishu",
            "chat_name": chat_name,
            "message_count": len(messages),
            "messages": messages,
        })

    return {
        "date": date_str,
        "tz": "Asia/Shanghai",
        "platform": "feishu",
        "window_start": start,
        "window_end": end,
        "groups": groups,
    }


def _write_daily_json(data: dict, out_path: Path, no_overwrite: bool) -> None:
    if out_path.exists() and no_overwrite:
        print(f"[-] {out_path} exists and --no-overwrite given; skipping.", file=sys.stderr)
        sys.exit(2)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    total = sum(g["message_count"] for g in data["groups"])
    print(f"[{data['date']}] {total} messages across {len(data['groups'])} groups -> {out_path}")


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("date", nargs="?", help="YYYY-MM-DD (default: yesterday)")
    ap.add_argument("--dry-run", action="store_true")
    ap.add_argument("--no-overwrite", action="store_true")
    args = ap.parse_args()

    load_dotenv(ROOT / ".env")
    app_id = os.environ.get("FEISHU_APP_ID")
    app_secret = os.environ.get("FEISHU_APP_SECRET")
    chat_ids_raw = os.environ.get("FEISHU_CHAT_IDS") or ""
    labels = parse_group_labels(os.environ.get("FEISHU_GROUP_LABELS"))

    if not app_id or not app_secret:
        print("[-] FEISHU_APP_ID / FEISHU_APP_SECRET missing in .env", file=sys.stderr)
        sys.exit(1)
    chat_ids = [c.strip() for c in chat_ids_raw.split(",") if c.strip()]
    if not chat_ids:
        print("[-] FEISHU_CHAT_IDS empty in .env", file=sys.stderr)
        sys.exit(1)

    today = dt.datetime.now(TZ).date()
    dates = expand_dates(explicit_date=args.date, today=today)
    print(f"[*] dates to extract: {dates}")
    print(f"[*] chats: {chat_ids}")

    client = FeishuClient(app_id, app_secret)

    for date_str in dates:
        data = _extract_one_day(client, date_str, chat_ids, labels)
        if args.dry_run:
            print(f"[dry-run] would write {OUT_DIR / f'{date_str}.feishu.json'}")
            continue
        _write_daily_json(data, OUT_DIR / f"{date_str}.feishu.json", args.no_overwrite)
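_write_daily_json passes ensure_ascii=False and encoding="utf-8". The short standard-library demo below (made-up payload) shows what those two arguments change: by default json.dumps turns every non-ASCII character into a \uXXXX escape, which makes the Chinese group names unreadable when the file is opened by hand (as in Task 17, Step 2). Both forms decode to the same object, and writing with an explicit encoding keeps the file UTF-8 regardless of the platform's default encoding.

```python
import json
import tempfile
from pathlib import Path

# Made-up payload; the group_name format follows resolve_group_label.
payload = {"groups": [{"group_id": "oc_x", "group_name": "Hermes Agent 中文社区飞书群 1"}]}

escaped = json.dumps(payload)                       # default: non-ASCII becomes \uXXXX
readable = json.dumps(payload, ensure_ascii=False)  # characters are kept as-is

# Both texts decode to the same object; only the on-disk text differs.
assert json.loads(escaped) == json.loads(readable) == payload
assert "\\u" in escaped and "中文" not in escaped
assert "中文" in readable

# With non-ASCII text in the string, an explicit encoding keeps the file UTF-8
# independent of the OS default encoding.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "2026-04-30.feishu.json"
    out.write_text(readable, encoding="utf-8")
    assert json.loads(out.read_text(encoding="utf-8")) == payload
```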
  • Step 2: Rerun the full test suite to confirm nothing broke
cd bot/feishu-bot
/usr/bin/python3 -m unittest discover tests -v

Expected: all PASS (~19 tests in total)

  • Step 3: Commit
git add bot/feishu-bot/scripts/extract_day.py
git commit -m "feat(feishu-bot): extract_day main flow + JSON output"

Task 13: inventory.py (diagnostic tool)

Files:

  • Create: bot/feishu-bot/scripts/inventory.py

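Background: Modeled on bot/wechat-bot/scripts/inventory.py: list the groups the bot is in, plus each group's message count over the last 7 days. This is a diagnostic script, so it has no unit tests; a manual end-to-end run is enough.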

  • Step 1: Write the script

Create bot/feishu-bot/scripts/inventory.py:

#!/usr/bin/env python3
"""List groups the bot belongs to + 7-day message counts.

Usage:
    python3 scripts/inventory.py   # default: print to stdout
"""
from __future__ import annotations

import datetime as dt
import os
import sys
from pathlib import Path
from zoneinfo import ZoneInfo

from dotenv import load_dotenv

sys.path.insert(0, str(Path(__file__).resolve().parent))
from _feishu import FeishuClient  # noqa: E402

ROOT = Path(__file__).resolve().parent.parent
TZ = ZoneInfo("Asia/Shanghai")


def main():
    load_dotenv(ROOT / ".env")
    app_id = os.environ.get("FEISHU_APP_ID")
    app_secret = os.environ.get("FEISHU_APP_SECRET")
    if not app_id or not app_secret:
        print("[-] FEISHU_APP_ID / FEISHU_APP_SECRET missing in .env", file=sys.stderr)
        sys.exit(1)

    client = FeishuClient(app_id, app_secret)

    # List the chats the bot is in (first page only, up to 50 chats).
    body = client._request("GET", "/im/v1/chats", params={"page_size": "50"})
    chats = (body.get("data") or {}).get("items") or []
    if not chats:
        print("Bot is in 0 chats. Add the bot to a group (@-mention it there), then rerun.")
        return

    now = dt.datetime.now(TZ)
    seven_days_ago = now - dt.timedelta(days=7)
    start = int(seven_days_ago.timestamp())
    end = int(now.timestamp())

    print(f"Bot is in {len(chats)} chats:")
    for c in chats:
        chat_id = c.get("chat_id") or ""
        name = c.get("name") or "(unnamed)"
        # Cheap count: pull all 7 days of messages and just count them.
        try:
            count = sum(1 for _ in client.iter_messages(chat_id, start, end))
        except Exception as e:
            print(f" - {name} ({chat_id}): error {e}")
            continue
        print(f" - {name} ({chat_id}): {count} messages / last 7 days")


if __name__ == "__main__":
    main()
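inventory.py requests only the first page of /im/v1/chats (page_size=50), so a bot in more than 50 chats would see a truncated list. Below is a minimal sketch of a generic pager, assuming the usual Feishu list-response shape (data.items, data.has_more, data.page_token) and that the token goes back as a page_token query parameter. iter_all_items is a hypothetical helper, and fetch stands in for a call such as client._request("GET", "/im/v1/chats", params=p); the demo runs offline against two fake pages.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_items(fetch: Callable[[Dict[str, str]], dict], page_size: int = 50) -> Iterator[dict]:
    """Yield every item of a Feishu-style paginated list endpoint (hypothetical helper)."""
    params: Dict[str, str] = {"page_size": str(page_size)}
    while True:
        data = (fetch(params) or {}).get("data") or {}
        yield from data.get("items") or []
        token = data.get("page_token")
        if not data.get("has_more") or not token:
            return
        # Next request: same page size plus the token returned by this page.
        params = {"page_size": str(page_size), "page_token": token}


# Offline demo: two fake pages keyed by the page_token that requests them.
FAKE_PAGES = {
    None: {"data": {"items": [{"chat_id": "oc_a"}, {"chat_id": "oc_b"}],
                    "has_more": True, "page_token": "t1"}},
    "t1": {"data": {"items": [{"chat_id": "oc_c"}], "has_more": False}},
}
requests_seen: List[Dict[str, str]] = []


def fake_fetch(params: Dict[str, str]) -> dict:
    requests_seen.append(dict(params))
    return FAKE_PAGES[params.get("page_token")]


chat_ids = [c["chat_id"] for c in iter_all_items(fake_fetch, page_size=2)]
print(chat_ids)  # ['oc_a', 'oc_b', 'oc_c']
```

In inventory.py this would replace the single request with something like chats = list(iter_all_items(lambda p: client._request("GET", "/im/v1/chats", params=p))), assuming _request accepts params= as it does in the script above.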
  • Step 2: Run it once to check that the imports resolve
cd bot/feishu-bot
/usr/bin/python3 scripts/inventory.py --help 2>&1 | head -5

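Expected: inventory.py does not use argparse, so --help is ignored and the script goes straight into main(); without credentials it exits early. This only verifies that the imports do not fail. You should see "FEISHU_APP_ID / FEISHU_APP_SECRET missing". (If .env already holds real credentials, this run calls the live API instead.)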

  • Step 3: Commit
git add bot/feishu-bot/scripts/inventory.py
git commit -m "feat(feishu-bot): inventory diagnostic script (lists the bot's groups + 7-day message counts)"

Task 14: Modify generate_report.py (extract _load_daily())

Files:

  • Modify: bot/wechat-bot/scripts/generate_report.py
  • Create: bot/wechat-bot/tests/__init__.py
  • Create: bot/wechat-bot/tests/test_load_daily.py

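Background: This is the only change to existing code. At the top of _run_single_day, the "read the daily JSON" step is factored out into _load_daily(date_str) -> dict | None, which reads both files and concatenates their groups lists.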

  • Step 1: Write the tests

Create bot/wechat-bot/tests/__init__.py (empty file).

Create bot/wechat-bot/tests/test_load_daily.py:

import json
import sys
import tempfile
import unittest
from pathlib import Path
from unittest.mock import patch

sys.path.insert(0, str(Path(__file__).resolve().parent.parent / "scripts"))
import generate_report as gr  # noqa: E402


class TestLoadDaily(unittest.TestCase):
    def setUp(self):
        # Build a temp tree mirroring bot/{wechat-bot,feishu-bot}/data/daily
        self.tmp = tempfile.TemporaryDirectory()
        self.bot_dir = Path(self.tmp.name) / "bot"
        self.wechat_daily = self.bot_dir / "wechat-bot" / "data" / "daily"
        self.feishu_daily = self.bot_dir / "feishu-bot" / "data" / "daily"
        self.wechat_daily.mkdir(parents=True)
        self.feishu_daily.mkdir(parents=True)

        # Patch DAILY_DIR + ROOT to point inside the temp tree.
        # ROOT in generate_report is bot/wechat-bot.
        self._patches = [
            patch.object(gr, "ROOT", self.bot_dir / "wechat-bot"),
            patch.object(gr, "DAILY_DIR", self.wechat_daily),
        ]
        for p in self._patches:
            p.start()

    def tearDown(self):
        for p in self._patches:
            p.stop()
        self.tmp.cleanup()

    def _write(self, path: Path, data: dict):
        path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

    def test_neither_file_returns_none(self):
        self.assertIsNone(gr._load_daily("2026-04-30"))

    def test_only_wechat(self):
        self._write(self.wechat_daily / "2026-04-30.json", {
            "date": "2026-04-30", "tz": "Asia/Shanghai",
            "groups": [{"group_id": "wx1", "group_name": "Hermes Agent 中文社区 1",
                        "message_count": 10, "messages": []}],
        })
        data = gr._load_daily("2026-04-30")
        self.assertIsNotNone(data)
        self.assertEqual(len(data["groups"]), 1)
        self.assertEqual(data["groups"][0]["group_id"], "wx1")

    def test_only_feishu(self):
        self._write(self.feishu_daily / "2026-04-30.feishu.json", {
            "date": "2026-04-30", "tz": "Asia/Shanghai", "platform": "feishu",
            "groups": [{"group_id": "oc_x", "group_name": "Hermes Agent 中文社区飞书群 1",
                        "platform": "feishu", "message_count": 5, "messages": []}],
        })
        data = gr._load_daily("2026-04-30")
        self.assertIsNotNone(data)
        self.assertEqual(len(data["groups"]), 1)
        self.assertEqual(data["groups"][0]["platform"], "feishu")

    def test_both_files_concatenate_groups(self):
        self._write(self.wechat_daily / "2026-04-30.json", {
            "date": "2026-04-30", "tz": "Asia/Shanghai",
            "groups": [
                {"group_id": "wx1", "group_name": "Hermes Agent 中文社区 1",
                 "message_count": 10, "messages": []},
                {"group_id": "wx2", "group_name": "Hermes Agent 中文社区 2",
                 "message_count": 7, "messages": []},
            ],
        })
        self._write(self.feishu_daily / "2026-04-30.feishu.json", {
            "date": "2026-04-30", "tz": "Asia/Shanghai", "platform": "feishu",
            "groups": [{"group_id": "oc_x", "group_name": "Hermes Agent 中文社区飞书群 1",
                        "platform": "feishu", "message_count": 5, "messages": []}],
        })
        data = gr._load_daily("2026-04-30")
        self.assertEqual(len(data["groups"]), 3)
        ids = [g["group_id"] for g in data["groups"]]
        self.assertEqual(ids, ["wx1", "wx2", "oc_x"])


if __name__ == "__main__":
    unittest.main()
  • Step 2: Run the tests and confirm they fail
cd bot/wechat-bot
/usr/bin/python3 -m unittest tests.test_load_daily -v

Expected: FAIL (_load_daily does not exist yet)

  • Step 3: Implement _load_daily

Modify bot/wechat-bot/scripts/generate_report.py:

Add the following new function right after the existing imports + constants block (just after DEFAULT_BASE_URL):

# The return annotation is quoted so it also works on Python 3.9 without
# `from __future__ import annotations` (`dict | None` needs 3.10+ at runtime).
def _load_daily(date_str: str) -> "dict | None":
    """Merge WeChat + Feishu daily extracts for one date into a single payload.

    Reads:
      - bot/wechat-bot/data/daily/<date>.json         (WeChat side)
      - bot/feishu-bot/data/daily/<date>.feishu.json  (Feishu side)

    Returns None if neither exists. Otherwise concatenates `groups` lists from
    each file in (wechat, feishu) order. Top-level metadata (date, tz, …) comes
    from the first file present (the WeChat file when both exist).

    Schema is identical between sides: see
    docs/superpowers/specs/2026-05-01-feishu-group-extraction-design.md §5.1.
    """
    wechat_path = DAILY_DIR / f"{date_str}.json"
    feishu_path = ROOT.parent / "feishu-bot" / "data" / "daily" / f"{date_str}.feishu.json"

    parts: list[dict] = []
    for p in (wechat_path, feishu_path):
        if not p.exists():
            continue
        try:
            parts.append(json.loads(p.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            print(f"[-] {p} is not valid JSON, skipping", file=sys.stderr)

    if not parts:
        return None

    base = dict(parts[0])
    base["groups"] = []
    for p in parts:
        base["groups"].extend(p.get("groups", []))
    return base
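The merge at the end of _load_daily does not depend on file I/O, so it can be examined as a pure function over the parsed JSON dicts. merge_daily_parts below is a hypothetical extraction of those lines, not part of the plan. It shows the (WeChat, Feishu) order and that top-level keys come from the first part only: the top-level "platform": "feishu" key survives only when Feishu is the sole source, so code that needs the platform is safer reading it from each group.

```python
from typing import List, Optional


def merge_daily_parts(parts: List[dict]) -> Optional[dict]:
    """Hypothetical pure version of the merge step at the end of _load_daily."""
    if not parts:
        return None
    base = dict(parts[0])   # shallow copy: top-level keys from the first part only
    base["groups"] = []     # fresh list, so the input dicts are not mutated
    for p in parts:
        base["groups"].extend(p.get("groups", []))
    return base


wechat = {"date": "2026-04-30", "tz": "Asia/Shanghai",
          "groups": [{"group_id": "wx1"}, {"group_id": "wx2"}]}
feishu = {"date": "2026-04-30", "tz": "Asia/Shanghai", "platform": "feishu",
          "groups": [{"group_id": "oc_x", "platform": "feishu"}]}

both = merge_daily_parts([wechat, feishu])
only_feishu = merge_daily_parts([feishu])
print([g["group_id"] for g in both["groups"]])  # ['wx1', 'wx2', 'oc_x']
print("platform" in both)                       # False
print(only_feishu["platform"])                  # feishu
```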

Modify _run_single_day to use it:

Find:

def _run_single_day(args, date_str: str):
    daily_path = DAILY_DIR / f"{date_str}.json"
    if not daily_path.exists():
        print(f"[-] {daily_path} not found. Run scripts/extract_day.py {date_str} first.", file=sys.stderr)
        return

    data = json.loads(daily_path.read_text(encoding="utf-8"))

Replace with (the rest of the function, which consumes data, stays unchanged):

def _run_single_day(args, date_str: str):
    data = _load_daily(date_str)
    if data is None:
        wechat_path = DAILY_DIR / f"{date_str}.json"
        feishu_path = ROOT.parent / "feishu-bot" / "data" / "daily" / f"{date_str}.feishu.json"
        print(
            f"[-] No daily data for {date_str}. Looked at:\n"
            f"    {wechat_path}\n"
            f"    {feishu_path}\n"
            f"    Run extract_day.py first.",
            file=sys.stderr,
        )
        return
  • Step 4: Run the tests and confirm they pass
/usr/bin/python3 -m unittest tests.test_load_daily -v

Expected: 4 tests PASS

  • Step 5: Run a WeChat-side smoke test (if vendor/decrypted data is available)
# No LLM calls; only exercises the data-loading path
/usr/bin/python3 scripts/generate_report.py 2026-04-30 --dry-run

Expected: prints "[*] 2026-04-30: N groups pass threshold", with N matching the existing data. If neither file exists for 2026-04-30, it prints the new "Looked at: ..." message instead; that is expected and not a regression.

  • Step 6: Commit
git add bot/wechat-bot/scripts/generate_report.py \
bot/wechat-bot/tests/__init__.py \
bot/wechat-bot/tests/test_load_daily.py
git commit -m "feat(generate_report): _load_daily merges the WeChat + Feishu daily extracts"

Task 15: README + CLAUDE.md

Files:

  • Create: bot/feishu-bot/README.md

  • Create: bot/feishu-bot/CLAUDE.md

  • Step 1: Write README.md

Create bot/feishu-bot/README.md:

# feishu-bot

Batch-pulls each day's messages from the Hermes Agent Feishu groups into `bot/feishu-bot/data/daily/<date>.feishu.json`. Downstream, `bot/wechat-bot/scripts/generate_report.py` merges this JSON with the WeChat-side `<date>.json` to produce a single daily report.

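## One-time setup

1. Feishu Open Platform → create a custom enterprise app (企业自建应用)
2. Enable the bot capability and request these permission scopes:
   - `im:message:readonly`
   - `im:chat:readonly`
   - `im:chat.member:read`
   - `contact:user.id:readonly`
3. Submit for approval (team-internal only)
4. Copy `app_id` / `app_secret` into `bot/feishu-bot/.env` (see `.env.example`)
5. In the Feishu group, @-mention the bot to add it to the group
6. Put each group's `chat_id` into `FEISHU_CHAT_IDS` (comma-separated); `inventory.py` lists the groups the bot is in, with their IDs:
   ```bash
   /usr/bin/python3 bot/feishu-bot/scripts/inventory.py
   ```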

## Daily run

```bash
# Default: pull yesterday (Asia/Shanghai)
/usr/bin/python3 bot/feishu-bot/scripts/extract_day.py

# Specific date
/usr/bin/python3 bot/feishu-bot/scripts/extract_day.py 2026-04-30

# Fetch but don't write; print what would be written
/usr/bin/python3 bot/feishu-bot/scripts/extract_day.py --dry-run

# Exit (code 2) if the output file already exists
/usr/bin/python3 bot/feishu-bot/scripts/extract_day.py --no-overwrite
```

Then run the existing generate_report:

```bash
cd bot/wechat-bot
/usr/bin/python3 scripts/generate_report.py 2026-04-30
```

`generate_report.py` reads both:

- `bot/wechat-bot/data/daily/2026-04-30.json`
- `bot/feishu-bot/data/daily/2026-04-30.feishu.json`

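The output is still a single `bot/wechat-bot/data/reports/<model>/2026-04-30.detailed.md`.

## Weekends

Aligned with the WeChat side: no daily report on weekends. Running `extract_day.py` on a Monday without a date automatically backfills Sat/Sun/Mon.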

## Tests

```bash
cd bot/feishu-bot
/usr/bin/python3 -m unittest discover tests -v
```

Tests never call the live Feishu API; they use offline fixtures and an in-memory transport.
  • Step 2: Write CLAUDE.md

Create bot/feishu-bot/CLAUDE.md:

# CLAUDE.md

This file provides guidance to Claude Code when working in `bot/feishu-bot/`.

## What this is

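Second information source: batch-pulls one day of Feishu group messages into `data/daily/<date>.feishu.json`, with a schema strictly aligned with `bot/wechat-bot/data/daily/<date>.json`. Downstream, `bot/wechat-bot/scripts/generate_report.py` merges both sides; **there is no separate report generation, prompt, or poster here**.

Design doc: `docs/superpowers/specs/2026-05-01-feishu-group-extraction-design.md`

## Key files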

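- `scripts/_feishu.py`: single source of truth
  - `FeishuClient`: tenant_access_token, backoff on 5xx/429, token refresh on expiry
  - `iter_messages(chat_id, start_ts, end_ts)`: paginated, ascending order
  - `decode_message(raw, resolver=None)`: decodes text / post / share_chat / share_user / file; returns `None` for any other type
  - `parse_group_labels(raw)` / `resolve_group_label(...)`: group names from `FEISHU_GROUP_LABELS`, or a positional fallback
- `scripts/extract_day.py`: CLI; `day_bounds` and `expand_dates` are pure functions; the main flow is not unit-tested
- `scripts/inventory.py`: diagnostics; lists the bot's groups + 7-day message counts

## Seam with the WeChat side

`generate_report.py:_load_daily(date_str)` reads both JSON files and concatenates their `groups` lists. Identical schemas are what let this step work with zero adaptation; any schema drift breaks the merge:

- `groups` is a **list**
- Message field names: `ts` / `time` / `sender_wxid` / `sender_name` / `text`
- Feishu messages store the `open_id` under the `sender_wxid` key; `sender_name` is left as an empty string

## Test boundaries

- Decoders / client / pagination / date bounds / group label parsing: all unit-tested
- `inventory.py` and `extract_day.py` main flows: not unit-tested; covered by manual smoke runs (they hit the live API)

## Risks

- Feishu group renamed → pin the display name with an explicit `FEISHU_GROUP_LABELS` mapping
- Bot removed from a group → the API returns 403; extract fails with a non-zero exit code
- Tokens are not cached on disk: each run fetches a new one, so no credentials are persisted
- The dedupe key in prune_report.py must stay in sync with generate_report.py (not changed in this plan, but keep it in mind)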
  • Step 3: Commit
git add bot/feishu-bot/README.md bot/feishu-bot/CLAUDE.md
git commit -m "docs(feishu-bot): README + CLAUDE.md"

Task 16: Full test suite + repo-root .gitignore safety net

Files:

  • Modify: .gitignore (repo root), only if the check shows the paths are not already covered

Background: The repo-root .gitignore should already cover data/ and similar paths. This step is only a safety check that bot/feishu-bot/data/ and .env cannot be committed.

  • Step 1: Check what the root .gitignore covers
cd /Users/claw/Documents/GithubProjects/hermes-cn-v1
cat .gitignore | head -40
git check-ignore bot/feishu-bot/data/daily/test.feishu.json bot/feishu-bot/.env

Expected output (if covered): git check-ignore prints both paths.

  • Step 2: Add rules if the root .gitignore does not cover them

Only do this if Step 1 did not print both paths (in particular, if bot/feishu-bot/.env produced no output):

# Append at the end, leaving existing rules untouched
cat >> .gitignore <<'EOF'

# Feishu bot
bot/feishu-bot/data/
bot/feishu-bot/.env
bot/feishu-bot/.env.*
!bot/feishu-bot/.env.example
EOF

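bot/feishu-bot/.gitignore (created in Task 1) already covers the same paths. Git applies nested .gitignore files even when git add . is run from the repo root, so the root-level rules are only a belt-and-braces measure; add them only when Step 1 shows the paths are not ignored.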

  • Step 3: Run the full unit test suites on both sides
cd bot/feishu-bot
/usr/bin/python3 -m unittest discover tests -v

cd ../wechat-bot
/usr/bin/python3 -m unittest discover tests -v

Expected: ~19 tests PASS on the Feishu side; 4 tests PASS on the WeChat side (test_load_daily)

  • Step 4: Commit (only if Step 2 changed the root .gitignore)
git add .gitignore
git commit -m "chore: root .gitignore safety net for bot/feishu-bot/{data,.env}"

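Task 17: Manual end-to-end verification

Prerequisite: the three Pre-conditions steps are done (app created, permissions approved, bot added to the group, and .env filled in with the real app_id / app_secret / chat_id).

This step cannot be automated: it has to call the live Feishu API, so the user decides when to run it.

  • Step 1: Check connectivity with inventory
cd bot/feishu-bot
/usr/bin/python3 scripts/inventory.py

Expected: at least 1 group is listed and its name is read correctly. 0 groups → the bot has not joined any group; 401/403 → wrong app_secret, or the permissions are not yet approved.

  • Step 2: Single-day extraction

Use yesterday as the target date (the script's default; a Monday run also backfills Sat/Sun/Mon):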

/usr/bin/python3 scripts/extract_day.py

Expected:

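  • stdout shows [YYYY-MM-DD] N messages across 1 groups -> /...
  • data/daily/YYYY-MM-DD.feishu.json exists
  • When opened: groups is a list, and the first group has platform: "feishu" and message_count > 0 (if there were messages that day)

If message_count = 0 although the group had messages: im:message:readonly has not taken effect, the scope was not approved, or the messages fall outside the time window.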
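The checks above can also be scripted. Below is a minimal sketch of a checker for a <date>.feishu.json payload; check_feishu_daily is a hypothetical helper. It takes the required message keys from the list in bot/feishu-bot/CLAUDE.md (Task 15) and assumes messages are stored in ascending ts order, because iter_messages pages in ascending order. The sample messages use made-up values, not real decode_message output.

```python
from typing import List

# Message keys listed in bot/feishu-bot/CLAUDE.md (Task 15) as the shared schema.
REQUIRED_MESSAGE_KEYS = ("ts", "time", "sender_wxid", "sender_name", "text")


def check_feishu_daily(data: dict) -> List[str]:
    """Hypothetical checker for a <date>.feishu.json payload; returns problems found."""
    groups = data.get("groups")
    if not isinstance(groups, list):
        return ["top-level 'groups' is not a list"]
    problems: List[str] = []
    for i, g in enumerate(groups):
        where = f"groups[{i}] ({g.get('group_id', '?')})"
        if g.get("platform") != "feishu":
            problems.append(f"{where}: platform is {g.get('platform')!r}")
        msgs = g.get("messages")
        if not isinstance(msgs, list):
            problems.append(f"{where}: messages is not a list")
            continue
        if g.get("message_count") != len(msgs):
            problems.append(f"{where}: message_count {g.get('message_count')} != {len(msgs)}")
        for j, m in enumerate(msgs):
            missing = [k for k in REQUIRED_MESSAGE_KEYS if k not in m]
            if missing:
                problems.append(f"{where} messages[{j}]: missing {missing}")
        # iter_messages pages in ascending order, so ts should never decrease.
        ts = [m["ts"] for m in msgs if "ts" in m]
        if ts != sorted(ts):
            problems.append(f"{where}: messages are not in ascending ts order")
    return problems


# Illustrative values only; real ts/time formats come from decode_message.
msg = {"ts": 100, "time": "t", "sender_wxid": "ou_1", "sender_name": "", "text": "hi"}
good = {"groups": [{"group_id": "oc_x", "platform": "feishu", "message_count": 1, "messages": [msg]}]}
bad = {"groups": [{"group_id": "oc_y", "platform": "feishu", "message_count": 3,
                   "messages": [dict(msg, ts=200), {"ts": 150, "text": "x"}]}]}

print(check_feishu_daily(good))  # []
for problem in check_feishu_daily(bad):
    print(problem)
```

To check a real extract, pass it json.loads(Path("data/daily/YYYY-MM-DD.feishu.json").read_text(encoding="utf-8")); an empty list means every check passed.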

  • Step 3: dry-run + no-overwrite
/usr/bin/python3 scripts/extract_day.py --dry-run
/usr/bin/python3 scripts/extract_day.py --no-overwrite  # should exit 2

Expected: the first command prints without writing a file; the second exits with code 2, and stderr contains "exists and --no-overwrite given".

  • Step 4: Run generate_report on one day with --dry-run
cd ../wechat-bot
/usr/bin/python3 scripts/generate_report.py YYYY-MM-DD --dry-run

Expected: stdout lists N+M groups (N WeChat + M Feishu), and every Feishu group name starts with "Hermes Agent 中文社区飞书群".

  • Step 5: Full run (consumes LLM quota)
/usr/bin/python3 scripts/generate_report.py YYYY-MM-DD

Expected:

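  • bot/wechat-bot/data/reports/<model>/YYYY-MM-DD.detailed.md contains entries sourced from the Feishu group

  • In the source labels, the Feishu group appears as "Hermes Agent 中文社区飞书群 1" and WeChat groups as "Hermes Agent 中文社区微信群 N"

  • If a topic was discussed on both platforms, the deduped entry's source line reads **来源**:Hermes Agent 中文社区微信群 3 / Hermes Agent 中文社区飞书群 1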

  • Step 6: Render the poster to check the downstream output

cd ../..
pnpm wechat-summary:render -- YYYY-MM-DD

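Expected: bot/wechat-summary-bot/output/YYYY-MM-DD.png includes the Feishu group entries, and the layout is not visually broken.

  • Step 7: Sign-off

Once everything above checks out, the end-to-end task can be closed. This step has no commit; passing verification is the end of it.

If something fails: go back to the Task that owns the failing part (decoding errors → Task 3-5; pagination errors → Task 8; merge errors → Task 14), add a unit test that reproduces the failure, then fix it.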


Self-Review

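Coverage check against each section of the design doc (docs/superpowers/specs/2026-05-01-feishu-group-extraction-design.md):

  • §3 Overall architecture → Task 11-12 (extract_day) + Task 14 (generate_report changes)
  • §4 Feishu custom app & permissions → Pre-conditions (manual)
  • §5.1 schema → Task 12 output format + Task 14 test assertions
  • §5.2 Implementation notes → Task 6-9 client + Task 2-5 decoders + Task 11 (day_bounds + weekend backfill) + Task 12 main flow
  • §6.1 _load_daily → Task 14
  • §6.2 _display_source unchanged → implicit; no task needed (the existing regex is untouched)
  • §6.3 LLM input preprocessing stays as-is → no task needed
  • §6.4 dedupe unchanged → no task needed
  • §7 Directory structure → Task 1, 13, 15
  • §8 Configuration & credentials → Task 1 (.env.example) + Task 16 (gitignore safety net)
  • §9 Error handling & idempotency → Task 7 (retries) + Task 12 (--no-overwrite) + Task 4-5 (unknown types dropped)
  • §10 Tests → Task 2-11, Task 14
  • §11 Risks and trade-offs → README/CLAUDE.md (Task 15)

Placeholder scan: no TBD / TODO / "add appropriate error handling".

Type consistency: the decode_message return fields (ts / time / sender_wxid / sender_name / type / text) are defined in Task 2 and used consistently in Tasks 3-5 and 12. The FeishuClient method signatures (_get_token / _request / iter_messages / get_chat_name) are defined in Tasks 6-9 and called consistently in Tasks 12-13.

Scope check: this is a single implementation plan; its three logical phases (decoders, client, CLI/integration) run in sequence, with no independent sub-projects.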

Plan ready.