LiveKit Agent Framework Learning Roadmap

Goal: Serious, all-in study to become the specialist people come to with any LiveKit question.

Environment assumptions: local Python → LiveKit Cloud (Langfuse observability)

Documentation source: LiveKit Agents Documentation


Design Philosophy

Overview-to-detail breakdown: in each phase, grasp the big picture first, then move into the details.

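Phase 0: Fundamentals     → What LiveKit is
Phase 1: Concept model    → Room/Participant/Track/Agent
Phase 2: Big picture      → Bird's-eye view of the Agents framework
Phase 3: Infrastructure   → How Worker/Job/Dispatch work
Phase 4: Implementation   → The Session/Logic/Tools/Nodes core
Phase 5: Models           → Choosing and configuring STT/LLM/TTS/Realtime
Phase 6: Production       → Deploy/Observability/Langfuse
Phase 6.5: Code reading   → Cement the design with official and real-world samples
Phase 7: Transport        → Implementing RPC/Data channels/Byte streams
Phase 8: Frontend         → RoomIO/camera/screen sharing/UI integration (consult during implementation)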

Legend

  • ⬜ Not started
  • 🔄 In progress
  • ✅ Done

Phase 0: LiveKit Fundamentals (Get the Overview)

| Status | Document | URL | Note |
| --- | --- | --- | --- |
|  | LiveKit Basics Overview | https://docs.livekit.io/intro/basics/ | LiveKit Basics Overview |

Phase 1: Concept Model (Core Building Blocks)

| Status | Document | URL | Note |
| --- | --- | --- | --- |
|  | Rooms, Participants, and Tracks | https://docs.livekit.io/intro/basics/rooms-participants-tracks/ | Rooms, Participants, and Tracks |

Phase 2: Agents Framework Big Picture (Bird's-Eye View)


Phase 3: Agent Server (Infrastructure Layer)


Phase 4: Logic & Structure (Core Implementation Layer)

| Status | Document | URL | Note |
| --- | --- | --- | --- |
|  | Logic Overview | https://docs.livekit.io/agents/logic/ | Logic and Structure Overview |
|  | Agent Session | https://docs.livekit.io/agents/logic/sessions/ | Agent Session |
|  | Workflows | https://docs.livekit.io/agents/logic/workflows/ | Workflows |
|  | Agents & Handoffs | https://docs.livekit.io/agents/logic/agents-handoffs/ | Agents and handoffs |
|  | Tasks & Task Groups (Python only) | https://docs.livekit.io/agents/logic/tasks/ | Tasks and Task Groups |
|  | Tool Definition & Use (overview, new structure) | https://docs.livekit.io/agents/logic/tools/ | Function Tool Definition / Model Context Protocol (MCP) / Forwarding to the frontend (RPC) |
|  | Function Tool Definition | https://docs.livekit.io/agents/logic/tools/definition/ | Function Tool Definition |
|  | Model Context Protocol (MCP) | https://docs.livekit.io/agents/logic/tools/mcp/ | Model Context Protocol (MCP) |
|  | Forwarding to the Frontend (RPC) | https://docs.livekit.io/agents/logic/tools/forwarding/ | Forwarding to the frontend (RPC) |
|  | Pipeline Nodes & Hooks | https://docs.livekit.io/agents/logic/nodes/ | Pipeline Nodes and Hooks |
|  | Turn Detection & Interruptions | https://docs.livekit.io/agents/logic/turns/ | Turn Detection and Interruptions |
|  | Turn detection methods (child page) | https://docs.livekit.io/agents/build/turns/turn-detector/ | LiveKit Turn Detector Plugin |
|  | Adaptive interruption handling | https://docs.livekit.io/agents/logic/turns/adaptive-interruption-handling/ | Adaptive Interruption Handling |
|  | Silero VAD plugin | https://docs.livekit.io/agents/logic/turns/vad/ | Silero VAD plugin |
|  | TurnHandlingOptions reference | https://docs.livekit.io/reference/agents/turn-handling-options/ | TurnHandlingOptions Reference |
|  | Events reference | https://docs.livekit.io/reference/agents/events/ | Events and error handling |
|  | External Data & RAG | https://docs.livekit.io/agents/build/external-data/ | External Data and RAG |
|  | Tool definition & use: Provider tools overview (supplementary) | https://docs.livekit.io/agents/logic/tools/#provider-tools | - |
|  | Gemini Provider Tools (supplementary) | https://docs.livekit.io/agents/models/llm/gemini/#provider-tools | Gemini Provider Tools |
|  | xAI Provider Tools (supplementary) | https://docs.livekit.io/agents/models/realtime/plugins/xai/ | xAI Provider Tools and Realtime Model |

Phase 5: Models (Model Selection and Configuration)


Phase 6: Deploy & Observe (Production Operations)


Phase 6.5: Sample Code Deep Dive (Reading Implementations)

To turn an understanding of agent design into working code, read samples that are close to real-world use, step by step.

| Status | Sample | URL | Note |
| --- | --- | --- | --- |
|  | Front-desk sample | https://github.com/livekit/agents/blob/main/examples/frontdesk/frontdesk_agent.py | Front-desk sample |
|  | Restaurant sample | https://github.com/livekit/agents/blob/main/examples/voice_agents/restaurant_agent.py | - |
|  | Drive-thru sample | https://github.com/livekit/agents/blob/main/examples/drive-thru/agent.py | - |
|  | Structured Output | https://github.com/livekit/agents/blob/main/examples/voice_agents/structured_output.py | - |
|  | Use of enum (tool argument design) | https://github.com/livekit/agents/blob/main/examples/voice_agents/annotated_tool_args.py | - |
|  | Dynamic tool creation (exposing tools dynamically) | https://github.com/livekit/agents/blob/main/examples/voice_agents/dynamic_tool_creation.py | - |
|  | LlamaIndex RAG sample | https://github.com/livekit/agents/tree/main/examples/voice_agents/llamaindex-rag | - |
|  | Keyword Detection | https://github.com/livekit-examples/python-agents-examples/tree/main/docs/examples/keyword-detection | - |
|  | LLM Content Filter | https://github.com/livekit-examples/python-agents-examples/tree/main/docs/examples/llm_powered_content_filter | - |
|  | Speedup Output Audio | https://github.com/livekit/agents/blob/main/examples/voice_agents/speedup_output_audio.py | - |
|  | Prebuilt Tasks Deep Dive | https://docs.livekit.io/agents/prebuilt/tasks/ | - |
|  | Prebuilt Tools Deep Dive | https://docs.livekit.io/agents/prebuilt/tools/ | - |
|  | Medical Office sample | https://github.com/livekit-examples/python-agents-examples/blob/main/complex-agents/medical_office_triage/triage.py | - |
|  | Shopify Voice Shopper sample | https://github.com/livekit-examples/python-agents-examples/tree/main/complex-agents/shopify-voice-shopper | - |
|  | Chain-of-thought agent | https://github.com/livekit-examples/agent-demos/tree/main/chain-of-thought-tts | - |

Phase 7: Transport (Data Integration Layer)

The data channel layer that connects the Agent to the frontend and backend. The focus here is on RPC and stream-based communication.


Phase 8: Frontend Implementation (UI/Media Layer)

Knowledge needed to connect a real UI and media inputs once the agent is implemented.

| Status | Document | URL | Note |
| --- | --- | --- | --- |
|  | RoomIO Overview | https://docs.livekit.io/home/client/tracks/ | - |
|  | Camera and Microphone | https://docs.livekit.io/home/client/tracks/camera-microphone/ | - |
|  | Screen Sharing | https://docs.livekit.io/home/client/tracks/screenshare/ | - |

Next Steps

Phase 6 is complete. Proceed to Phase 6.5 (Sample Code Deep Dive).

Next document: Front-desk sample (https://github.com/livekit/agents/blob/main/examples/frontdesk/frontdesk_agent.py)


Notes Created So Far

| Created | Phase | Note type | Title |
| --- | --- | --- | --- |
| 2026-02-28 | - | SourceNote | LiveKit Agents Documentation |
| 2026-02-28 | - | StructureNote | This note |
| 2026-02-28 | Phase 0 | LiteratureNote | LiveKit Basics Overview |
| 2026-02-28 | Phase 1 | LiteratureNote | Rooms, Participants, and Tracks |
| 2026-02-28 | Phase 1 | LiteratureNote | Webhooks and Events |
| 2026-02-28 | Phase 2 | LiteratureNote | Agents Framework Introduction |
| 2026-03-01 | Phase 2 | LiteratureNote | Voice AI Quickstart |
| 2026-03-01 | Phase 2 | LiteratureNote | Text and Transcriptions |
| 2026-03-01 | Phase 2 | LiteratureNote | Agent Speech and Audio |
| 2026-03-02 | Phase 2 | LiteratureNote | Vision |
| 2026-03-03 | Phase 3 | LiteratureNote | Server Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Job Lifecycle |
| 2026-03-05 | Phase 3 | LiteratureNote | Agent Dispatch |
| 2026-03-06 | Phase 3 | LiteratureNote | Server Startup Modes |
| 2026-03-07 | Phase 4 | LiteratureNote | Logic and Structure Overview |
| 2026-03-07 | Phase 4 | LiteratureNote | Agent Session |
| 2026-03-07 | Phase 4 | LiteratureNote | RoomIO (Agent Session Context) |
| 2026-03-08 | Phase 4 | LiteratureNote | Workflows |
| 2026-03-09 | Phase 4 | LiteratureNote | Agents and handoffs |
| 2026-03-10 | Phase 4 | LiteratureNote | Tasks and Task Groups |
| 2026-03-14 | Phase 4 | LiteratureNote | Function Tool Definition |
| 2026-03-14 | Phase 4 | LiteratureNote | Model Context Protocol (MCP) |
| 2026-03-14 | Phase 4 | LiteratureNote | Forwarding to the frontend (RPC) |
| 2026-03-22 | Phase 4 | LiteratureNote | Silero VAD plugin |
| 2026-03-22 | Phase 4 | LiteratureNote | TurnHandlingOptions Reference |
| 2026-03-22 | Phase 4 | LiteratureNote | Events and error handling |
| 2026-03-22 | Phase 4 | LiteratureNote | External Data and RAG |
| 2026-03-29 | Phase 4 | LiteratureNote | Gemini Provider Tools |
| 2026-03-29 | Phase 4 | LiteratureNote | xAI Provider Tools and Realtime Model |
| 2026-04-08 | Phase 5 | LiteratureNote | LiveKit Inference Overview |
| 2026-04-08 | Phase 5 | LiteratureNote | Introducing LiveKit Inference (Blog) |
| 2026-04-08 | Phase 5 | LiteratureNote | Models Overview |
| 2026-04-08 | Phase 5 | LiteratureNote | STT Models Overview |
| 2026-04-09 | Phase 5 | LiteratureNote | LLM Models Overview |
| 2026-04-11 | Phase 5 | LiteratureNote | TTS Models Overview |
| 2026-04-12 | Phase 5 | LiteratureNote | Realtime Models Overview |
| 2026-04-12 | Phase 5 | LiteratureNote | Virtual Avatar Models Overview |