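LiveKit Agent Framework Learning Roadmap
Goal: Serious, in-depth study to become the go-to LiveKit specialist, the person people ask about anything LiveKit.
Environment assumptions: local Python → LiveKit Cloud (Langfuse observability)
Documentation source: LiveKit Agents Documentation
Design philosophy
Overview-to-detail breakdown: in each phase, grasp the big picture first, then move into the details.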
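Phase 0: Fundamentals → What LiveKit is
Phase 1: Conceptual model → Room/Participant/Track/Agent
Phase 2: Big picture → Bird's-eye view of the Agents framework
Phase 3: Infrastructure layer → How Worker/Job/Dispatch work
Phase 4: Implementation layer → The core: Session/Logic/Tools/Nodes
Phase 5: Model layer → Choosing and configuring STT/LLM/TTS/Realtime
Phase 6: Production operations → Deploy/Observability/Langfuse
Phase 7: Frontend layer → RoomIO/camera/screen sharing/ByteStreams (consult at implementation time)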
Legend
- ⬜ Not started
- 🔄 In progress
- ✅ Done
Phase 0: LiveKit fundamentals (grasp the overview)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | LiveKit Basics Overview | https://docs.livekit.io/intro/basics/ | LiveKit Basics Overview |
Phase 1: Conceptual model (core building blocks)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Rooms, Participants, and Tracks | https://docs.livekit.io/intro/basics/rooms-participants-tracks/ | Rooms, Participants, and Tracks |
Phase 2: Agents framework big picture (bird's-eye view)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Agents Framework Introduction | https://docs.livekit.io/agents/ | Agents Framework Introduction |
| ✅ | Voice AI Quickstart | https://docs.livekit.io/agents/start/voice-ai-quickstart/ | Voice AI Quickstart |
| ✅ | Multimodality Overview | https://docs.livekit.io/agents/multimodality/ | - |
| ✅ | Text and Transcriptions | https://docs.livekit.io/agents/multimodality/text/ | Text and Transcriptions |
| ✅ | Agent Speech and Audio | https://docs.livekit.io/agents/multimodality/audio/ | Agent Speech and Audio |
| ✅ | Vision | https://docs.livekit.io/agents/multimodality/vision/ | Vision |
Phase 3: Agent Server (infrastructure layer)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Server Lifecycle | https://docs.livekit.io/agents/server/lifecycle/ | Server Lifecycle |
| ✅ | Job Lifecycle | https://docs.livekit.io/agents/server/job/ | Job Lifecycle |
| ✅ | Agent Dispatch | https://docs.livekit.io/agents/server/agent-dispatch/ | Agent Dispatch |
| ✅ | Server Startup Modes | https://docs.livekit.io/agents/server/startup-modes/ | Server Startup Modes |
Phase 4: Logic & Structure (core implementation layer)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Logic Overview | https://docs.livekit.io/agents/logic/ | Logic and Structure Overview |
| ✅ | Agent Session | https://docs.livekit.io/agents/logic/sessions/ | Agent Session |
| ✅ | Workflows | https://docs.livekit.io/agents/logic/workflows/ | Workflows |
| ✅ | Agents & Handoffs | https://docs.livekit.io/agents/logic/agents-handoffs/ | Agents and handoffs |
| ✅ | Tasks & Task Groups (Python only) | https://docs.livekit.io/agents/logic/tasks/ | Tasks and Task Groups |
| ⬜ | Tool Definition & Use | https://docs.livekit.io/agents/logic/tools/ | - |
| ⬜ | Turn Detection & Interruptions | https://docs.livekit.io/agents/logic/turns/ | - |
| ⬜ | Pipeline Nodes & Hooks | https://docs.livekit.io/agents/logic/nodes/ | - |
| ⬜ | External Data & RAG | https://docs.livekit.io/agents/logic/external-data/ | - |
Phase 4.5: Sample Code Deep Dive (reading real implementations)
To turn an understanding of agent design into code, read near-production samples step by step.
| Status | Sample | URL | Note |
|---|---|---|---|
| ⬜ | Drive-thru sample | https://github.com/livekit/agents/blob/main/examples/drive-thru/agent.py | - |
| ⬜ | Front-desk sample | https://github.com/livekit/agents/blob/main/examples/frontdesk/frontdesk_agent.py | - |
| ⬜ | Medical Office sample | https://github.com/livekit-examples/python-agents-examples/blob/main/complex-agents/medical_office_triage/triage.py | - |
| ⬜ | Restaurant sample | https://github.com/livekit/agents/blob/main/examples/voice_agents/restaurant_agent.py | - |
| ⬜ | Prebuilt Tasks Deep Dive | https://docs.livekit.io/agents/prebuilt/tasks/ | - |
| ⬜ | Prebuilt Tools Deep Dive | https://docs.livekit.io/agents/prebuilt/tools/ | - |
Phase 5: Models (model selection and configuration)
| Status | Document | URL | Note |
|---|---|---|---|
| ⬜ | Models Overview | https://docs.livekit.io/agents/models/ | - |
| ⬜ | STT Models | https://docs.livekit.io/agents/models/stt/ | - |
| ⬜ | LLM Models | https://docs.livekit.io/agents/models/llm/ | - |
| ⬜ | TTS Models | https://docs.livekit.io/agents/models/tts/ | - |
| ⬜ | Realtime Models | https://docs.livekit.io/agents/models/realtime/ | - |
Phase 6: Deploy & Observe (production operations)
| Status | Document | URL | Note |
|---|---|---|---|
| ⬜ | Agent Deployment Overview | https://docs.livekit.io/deploy/agents/ | - |
| ⬜ | Observability Overview | https://docs.livekit.io/deploy/observability/ | - |
| ⬜ | Agent Insights (LiveKit Cloud) | https://docs.livekit.io/deploy/observability/insights/ | - |
| ⬜ | Data Hooks & OpenTelemetry (Langfuse) | https://docs.livekit.io/deploy/observability/data/ | - |
Phase 7: Frontend implementation (transport layer)
Knowledge needed to connect the agent to a real frontend once the agent itself is implemented.
| Status | Document | URL | Note |
|---|---|---|---|
| ⬜ | RoomIO Overview | https://docs.livekit.io/home/client/tracks/ | - |
| ⬜ | Camera and Microphone | https://docs.livekit.io/home/client/tracks/camera-microphone/ | - |
| ⬜ | Screen Sharing | https://docs.livekit.io/home/client/tracks/screenshare/ | - |
| ⬜ | Byte Streams | https://docs.livekit.io/home/client/data/byte-streams/ | - |
Next steps
Continue Phase 4 (Logic & Structure, core implementation layer).
Next document: Tool Definition & Use → https://docs.livekit.io/agents/logic/tools/
Notes created so far
| Created | Phase | Note type | Title |
|---|---|---|---|
| 2026-02-28 | - | SourceNote | LiveKit Agents Documentation |
| 2026-02-28 | - | StructureNote | This note |
| 2026-02-28 | Phase 0 | LiteratureNote | LiveKit Basics Overview |
| 2026-02-28 | Phase 1 | LiteratureNote | Rooms, Participants, and Tracks |
| 2026-02-28 | Phase 1 | LiteratureNote | Webhooks and Events |
| 2026-02-28 | Phase 2 | LiteratureNote | Agents Framework Introduction |
| 2026-03-01 | Phase 2 | LiteratureNote | Voice AI Quickstart |
| 2026-03-01 | Phase 2 | LiteratureNote | Text and Transcriptions |
| 2026-03-01 | Phase 2 | LiteratureNote | Agent Speech and Audio |
| 2026-03-02 | Phase 2 | LiteratureNote | Vision |
| 2026-03-03 | Phase 3 | LiteratureNote | Server Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Job Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Agent Dispatch |
| 2026-03-06 | Phase 3 | LiteratureNote | Server Startup Modes |
| 2026-03-07 | Phase 4 | LiteratureNote | Logic and Structure Overview |
| 2026-03-07 | Phase 4 | LiteratureNote | Agent Session |
| 2026-03-07 | Phase 4 | LiteratureNote | RoomIO (Agent Session Context) |
| 2026-03-08 | Phase 4 | LiteratureNote | Workflows |
| 2026-03-09 | Phase 4 | LiteratureNote | Agents and handoffs |
| 2026-03-10 | Phase 4 | LiteratureNote | Tasks and Task Groups |