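LiveKit Agent Framework Learning Roadmap
Goal: serious study aimed at becoming the go-to specialist for anything LiveKit.
Environment: local Python → LiveKit Cloud (Langfuse observability)
Documentation source: LiveKit Agents Documentation
Design philosophy
Overview first, then a breakdown into details. In each phase, grasp the big picture before moving to the specifics.
Phase 0: Basic concepts → What LiveKit is
Phase 1: Conceptual model → Room/Participant/Track/Agent
Phase 2: Big picture → Bird's-eye view of the Agents framework
Phase 3: Infrastructure layer → How Worker/Job/Dispatch work
Phase 4: Implementation layer → The Session/Logic/Tools/Nodes core
Phase 5: Model layer → Choosing and configuring STT/LLM/TTS/Realtime
Phase 6: Production operations → Deploy/Observability/Langfuse
Phase 6.5: Reading implementations → Cement the design with official and real-world samples
Phase 7: Transport layer → Implementing RPC/Data channels/Byte streams
Phase 8: Frontend layer → RoomIO/camera/screen sharing/screen integration (consult when implementing)
Legend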
- ⬜ Not started
- 🔄 In progress
- ✅ Done
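Because every phase table puts one of these markers in its first column, progress can be tallied mechanically. The following is a minimal sketch, assuming the tables stay in the pipe-delimited layout used in this note; the function name `tally_statuses` and the label strings are hypothetical names chosen for illustration.

```python
from collections import Counter

# Status markers from the legend; the label strings are this sketch's own naming.
STATUS_LABELS = {"⬜": "not_started", "🔄": "in_progress", "✅": "done"}


def tally_statuses(table_text: str) -> Counter:
    """Count status markers found in the first cell of each table row."""
    counts: Counter = Counter()
    for line in table_text.splitlines():
        # Drop the outer pipes, then split the row into trimmed cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Header and separator rows are skipped because their first cell
        # is not one of the legend markers.
        if cells and cells[0] in STATUS_LABELS:
            counts[STATUS_LABELS[cells[0]]] += 1
    return counts


sample = """\
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Server Lifecycle | https://docs.livekit.io/agents/server/lifecycle/ | Server Lifecycle |
| ⬜ | Text streams | https://docs.livekit.io/home/client/data/text-streams/ | - |
"""
print(dict(tally_statuses(sample)))
```

Feeding the text of a single phase table to the function gives a per-phase count; feeding the whole note gives the overall total.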
Phase 0: LiveKit Basic Concepts (get the overview)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | LiveKit Basics Overview | https://docs.livekit.io/intro/basics/ | LiveKit Basics Overview |
Phase 1: Conceptual Model (core building blocks)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Rooms, Participants, and Tracks | https://docs.livekit.io/intro/basics/rooms-participants-tracks/ | Rooms, Participants, and Tracks |
Phase 2: Agents Framework Big Picture (bird's-eye view)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Agents Framework Introduction | https://docs.livekit.io/agents/ | Agents Framework Introduction |
| ✅ | Voice AI Quickstart | https://docs.livekit.io/agents/start/voice-ai-quickstart/ | Voice AI Quickstart |
| ✅ | Multimodality Overview | https://docs.livekit.io/agents/multimodality/ | - |
| ✅ | Text and Transcriptions | https://docs.livekit.io/agents/multimodality/text/ | Text and Transcriptions |
| ✅ | Agent Speech and Audio | https://docs.livekit.io/agents/multimodality/audio/ | Agent Speech and Audio |
| ✅ | Audio Customization (Cached TTS / Pronunciation / Volume) | https://docs.livekit.io/agents/multimodality/audio/customization/ | Audio Customization |
| ✅ | Vision | https://docs.livekit.io/agents/multimodality/vision/ | Vision |
Phase 3: Agent Server (infrastructure layer)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Server Lifecycle | https://docs.livekit.io/agents/server/lifecycle/ | Server Lifecycle |
| ✅ | Job Lifecycle | https://docs.livekit.io/agents/server/job/ | Job Lifecycle |
| ✅ | Agent Dispatch | https://docs.livekit.io/agents/server/agent-dispatch/ | Agent Dispatch |
| ✅ | Server Startup Modes | https://docs.livekit.io/agents/server/startup-modes/ | Server Startup Modes |
Phase 4: Logic & Structure (core implementation layer)
Phase 5: Models (model selection and configuration)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | LiveKit Inference Overview | https://docs.livekit.io/agents/models/inference/ | LiveKit Inference Overview |
| ✅ | Introducing LiveKit Inference (Blog) | https://livekit.com/blog/introducing-livekit-inference | Introducing LiveKit Inference (Blog) |
| ✅ | Models Overview | https://docs.livekit.io/agents/models/ | Models Overview |
| ✅ | STT Models | https://docs.livekit.io/agents/models/stt/ | STT Models Overview |
| ✅ | LLM Models | https://docs.livekit.io/agents/models/llm/ | LLM Models Overview |
| ✅ | TTS Models | https://docs.livekit.io/agents/models/tts/ | TTS Models Overview |
| ✅ | Realtime Models | https://docs.livekit.io/agents/models/realtime/ | Realtime Models Overview |
| ✅ | Virtual Avatars | https://docs.livekit.io/agents/models/avatar/ | Virtual Avatar Models Overview |
Phase 6: Deploy & Observe (production operations)
| Status | Document | URL | Note |
|---|---|---|---|
| ✅ | Agent Deployment Overview | https://docs.livekit.io/deploy/agents/ | - |
| ✅ | Agent Deployment Quickstart | https://docs.livekit.io/deploy/agents/quickstart/ | Agent Deployment Quickstart |
| ✅ | Deployment Management | https://docs.livekit.io/deploy/agents/managing-deployments/ | Deployment Management |
| ✅ | Secrets Management | https://docs.livekit.io/deploy/agents/secrets/ | Secrets Management |
| ✅ | Logs | https://docs.livekit.io/deploy/agents/logs/ | - |
| ✅ | Log Drains | https://docs.livekit.io/deploy/agents/log-drains/ | Log Drains |
| ✅ | Builds and Dockerfiles | https://docs.livekit.io/deploy/agents/builds/ | Builds and Dockerfiles |
| ✅ | Self-hosted Deployments | https://docs.livekit.io/deploy/custom/deployments/ | Self-hosted Deployments |
| ✅ | Observability Overview | https://docs.livekit.io/deploy/observability/ | Observability Overview |
| ✅ | Agent Insights (LiveKit Cloud) | https://docs.livekit.io/deploy/observability/insights/ | Agent Insights in LiveKit Cloud |
| ✅ | Data Hooks & OpenTelemetry (Langfuse) | https://docs.livekit.io/deploy/observability/data/ | Data Hooks |
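Phase 6.5: Sample Code Deep Dive (reading implementations)
To turn the understanding of agent design into code, read samples that are close to real-world use, one step at a time.
Phase 7: Transport (data integration layer)
The data channel layer that connects the agent to the frontend and backend. The focus is on RPC and stream-based communication.
| Status | Document | URL | Note |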
|---|---|---|---|
| ⬜ | RPC (Realtime function calling) | https://docs.livekit.io/transport/data/rpc/ | - |
| ⬜ | Data packets | https://docs.livekit.io/home/client/data/packets/ | - |
| ⬜ | Text streams | https://docs.livekit.io/home/client/data/text-streams/ | - |
| ⬜ | Byte streams | https://docs.livekit.io/home/client/data/byte-streams/ | - |
Phase 8: Frontend Implementation (UI/media layer)
What is needed to connect a real UI and media input once the agent is implemented.
| Status | Document | URL | Note |
|---|---|---|---|
| ⬜ | RoomIO Overview | https://docs.livekit.io/home/client/tracks/ | - |
| ⬜ | Camera and Microphone | https://docs.livekit.io/home/client/tracks/camera-microphone/ | - |
| ⬜ | Screen Sharing | https://docs.livekit.io/home/client/tracks/screenshare/ | - |
Next steps
Phase 6 is complete. Move on to Phase 6.5 (Sample Code Deep Dive).
Next document: Front-desk sample → https://github.com/livekit/agents/blob/main/examples/frontdesk/frontdesk_agent.py
Notes created so far
| Created | Phase | Note type | Title |
|---|---|---|---|
| 2026-02-28 | - | SourceNote | LiveKit Agents Documentation |
| 2026-02-28 | - | StructureNote | This note |
| 2026-02-28 | Phase 0 | LiteratureNote | LiveKit Basics Overview |
| 2026-02-28 | Phase 1 | LiteratureNote | Rooms, Participants, and Tracks |
| 2026-02-28 | Phase 1 | LiteratureNote | Webhooks and Events |
| 2026-02-28 | Phase 2 | LiteratureNote | Agents Framework Introduction |
| 2026-03-01 | Phase 2 | LiteratureNote | Voice AI Quickstart |
| 2026-03-01 | Phase 2 | LiteratureNote | Text and Transcriptions |
| 2026-03-01 | Phase 2 | LiteratureNote | Agent Speech and Audio |
| 2026-03-02 | Phase 2 | LiteratureNote | Vision |
| 2026-03-03 | Phase 3 | LiteratureNote | Server Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Job Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Agent Dispatch |
| 2026-03-06 | Phase 3 | LiteratureNote | Server Startup Modes |
| 2026-03-07 | Phase 4 | LiteratureNote | Logic and Structure Overview |
| 2026-03-07 | Phase 4 | LiteratureNote | Agent Session |
| 2026-03-07 | Phase 4 | LiteratureNote | RoomIO (Agent Session Context) |
| 2026-03-08 | Phase 4 | LiteratureNote | Workflows |
| 2026-03-09 | Phase 4 | LiteratureNote | Agents and handoffs |
| 2026-03-10 | Phase 4 | LiteratureNote | Tasks and Task Groups |
| 2026-03-14 | Phase 4 | LiteratureNote | Function Tool Definition |
| 2026-03-14 | Phase 4 | LiteratureNote | Model Context Protocol (MCP) |
| 2026-03-14 | Phase 4 | LiteratureNote | Forwarding to the frontend (RPC) |
| 2026-03-22 | Phase 4 | LiteratureNote | Silero VAD plugin |
| 2026-03-22 | Phase 4 | LiteratureNote | TurnHandlingOptions Reference |
| 2026-03-22 | Phase 4 | LiteratureNote | Events and error handling |
| 2026-03-22 | Phase 4 | LiteratureNote | External Data and RAG |
| 2026-03-29 | Phase 4 | LiteratureNote | Gemini Provider Tools |
| 2026-03-29 | Phase 4 | LiteratureNote | xAI Provider Tools and Realtime Model |
| 2026-04-08 | Phase 5 | LiteratureNote | LiveKit Inference Overview |
| 2026-04-08 | Phase 5 | LiteratureNote | Introducing LiveKit Inference (Blog) |
| 2026-04-08 | Phase 5 | LiteratureNote | Models Overview |
| 2026-04-08 | Phase 5 | LiteratureNote | STT Models Overview |
| 2026-04-09 | Phase 5 | LiteratureNote | LLM Models Overview |
| 2026-04-11 | Phase 5 | LiteratureNote | TTS Models Overview |
| 2026-04-12 | Phase 5 | LiteratureNote | Realtime Models Overview |
| 2026-04-12 | Phase 5 | LiteratureNote | Virtual Avatar Models Overview |