# SpeechKit v0.42

> Agent entrypoint for SpeechKit Server, Framework API, and MCP integration.

SpeechKit is a Windows-first voice framework with three strict modes:
Dictation, Assist, and Voice Agent. The v0.42 line adds the embeddable SDK surface
for wake-word, Assist, Voice-Companion, Event-Bus, and TTS integrations while
keeping the API-first server, Go client, CLI, and MCP server surfaces.
Hands-Free is an activation and voice-output layer for those modes, not a
fourth mode.

Use these links first:

- Full agent context: https://speechkit.cc/llms-full.txt
- Copy-paste snippets (curl/TS/Python/Go): https://speechkit.cc/llms-snippets.txt
- Technical Getting Started: https://speechkit.cc/getting-started/technical.md
- Agent / Prompt Getting Started: https://speechkit.cc/getting-started/agents.md
- Server install guide: https://speechkit.cc/install/server.md
- Standalone server installer: https://speechkit.cc/install-server.sh
- Browser Docker Compose example: https://speechkit.cc/install-server/docker-compose.example.yml
- Browser config example: https://speechkit.cc/install-server/config.browser.example.toml
- SpeechKit MCP guide: https://speechkit.cc/mcp/speechkit-mcp.md
- OpenAPI: https://speechkit.cc/api/openapi.v1.yaml
- Voice Agent AsyncAPI: https://speechkit.cc/api/asyncapi.v1.yaml
- One-shot manifest schema: https://speechkit.cc/schemas/speechkit-one-shot-manifest.schema.json
- One-shot functional result schema: https://speechkit.cc/schemas/speechkit-one-shot-functional-result.schema.json

Stable one-shot prompt Markdown:

- Tri-mode web demo: https://speechkit.cc/getting-started/agents/tri-mode-web-demo.md
- Voice game moderator: https://speechkit.cc/getting-started/agents/voice-game-moderator.md
- Android memo app: https://speechkit.cc/getting-started/agents/android-memo-app.md
- Go framework integration: https://speechkit.cc/getting-started/agents/go-framework-integration.md

Generate a working starter project:

- `speechkit-cli init --list`: see embedded templates.
- `speechkit-cli init --template browser-dictation-react my-app`: scaffold a Vite + React + TypeScript dictation app.
- Or via MCP: `speechkit_scaffold_integration` returns the same starter files in-band without writing to the host.
- Native Android memo showcase prompt: create a real Gradle Android project with `app/src/main/AndroidManifest.xml`, Kotlin/Java sources, configurable `SPEECHKIT_SERVER_URL` and token settings, and a fresh Docker Compose SpeechKit Server. Wire Dictation to `/v1/dictation/transcribe` and Assist to `/v1/assist/process`, and add a `verifySpeechKitLive` Gradle task that writes `speechkit-one-shot-functional-result.json` with `status=pass`, `manifest_file=speechkit-one-shot-manifest.json`, and `checked_via_app=true`.
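
The three result fields named in the Android prompt can be checked mechanically. The following is a minimal sketch, not the project's own verifier: it assumes the fields sit at the top level of `speechkit-one-shot-functional-result.json` and that `status` and `manifest_file` are strings while `checked_via_app` is a boolean. The helper name `checkFunctionalResult` is hypothetical; the published result schema remains the authority.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionalResult mirrors only the three fields named in the prompt above.
// Assumption: they are top-level keys; any other fields are ignored here.
type functionalResult struct {
	Status        string `json:"status"`
	ManifestFile  string `json:"manifest_file"`
	CheckedViaApp bool   `json:"checked_via_app"`
}

// checkFunctionalResult (hypothetical helper) returns nil only when all
// three fields carry the values the prompt asks for.
func checkFunctionalResult(raw []byte) error {
	var r functionalResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return fmt.Errorf("decode result: %w", err)
	}
	if r.Status != "pass" {
		return fmt.Errorf("status = %q, want %q", r.Status, "pass")
	}
	if r.ManifestFile != "speechkit-one-shot-manifest.json" {
		return fmt.Errorf("manifest_file = %q, want speechkit-one-shot-manifest.json", r.ManifestFile)
	}
	if !r.CheckedViaApp {
		return fmt.Errorf("checked_via_app is not true")
	}
	return nil
}

func main() {
	sample := []byte(`{"status":"pass","manifest_file":"speechkit-one-shot-manifest.json","checked_via_app":true}`)
	if err := checkFunctionalResult(sample); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("result looks valid")
}
```

In a real project the bytes would come from the file the `verifySpeechKitLive` task writes rather than from an inline sample.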

Embed a Voice Agent into your own app:

- Reference: `examples/voice-agent/game-instructor/` (TOML preset + Go embedder + README).
- One prompt: "Build a 15-minute voice-agent game instructor into my app using SpeechKit. Use `pkg/speechkit/client` (CreateVoiceAgentSession + DialVoiceAgent), reuse the persona/role/sequence IDs `game-instructor` / `game-moderator` / `game-flow-15min`, and feed PCM 16 kHz S16LE mono via SendAudio."
- Server-side: seed the persona/role/sequence via the TOML at `examples/voice-agent/game-instructor/config.toml` and run `speechkit-server` with `GOOGLE_AI_API_KEY` set. Idle timeout caps each session at 15 min by default.
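
The audio format in the prompt above (PCM, 16 kHz, S16LE, mono) is easy to get wrong when the host captures floating-point samples. The following is a minimal sketch of the conversion and framing step, using only the Go standard library. The 20 ms frame size and the helper names `floatToS16LE` and `chunkPCM` are assumptions made for illustration; the sketch does not call `SendAudio`, whose exact signature is not shown here.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	sampleRate     = 16000 // 16 kHz mono, as named in the prompt above
	bytesPerSample = 2     // S16LE: signed 16-bit, little-endian
)

// floatToS16LE (hypothetical helper) clamps samples to [-1, 1] and encodes
// them as signed 16-bit little-endian PCM.
func floatToS16LE(samples []float32) []byte {
	out := make([]byte, len(samples)*bytesPerSample)
	for i, s := range samples {
		if s > 1 {
			s = 1
		} else if s < -1 {
			s = -1
		}
		v := int16(s * 32767)
		binary.LittleEndian.PutUint16(out[i*bytesPerSample:], uint16(v))
	}
	return out
}

// chunkPCM (hypothetical helper) splits PCM bytes into frames of frameMs
// milliseconds. The final frame may be shorter than the others.
func chunkPCM(pcm []byte, frameMs int) [][]byte {
	frameBytes := sampleRate * bytesPerSample * frameMs / 1000
	if frameBytes <= 0 {
		return nil
	}
	var frames [][]byte
	for start := 0; start < len(pcm); start += frameBytes {
		end := start + frameBytes
		if end > len(pcm) {
			end = len(pcm)
		}
		frames = append(frames, pcm[start:end])
	}
	return frames
}

func main() {
	silence := make([]float32, 800) // 50 ms of silence at 16 kHz
	pcm := floatToS16LE(silence)
	for i, frame := range chunkPCM(pcm, 20) { // 20 ms frames are an assumption
		fmt.Printf("frame %d: %d bytes\n", i, len(frame))
	}
}
```

Each resulting frame is what a host would hand to the client's audio-send call, one frame at a time.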

Embed SpeechKit as a Go framework:

- `docs/voice-companion.md`: Hands-Free target model, component imports, and single-prompt recipes.
- `examples/embed-companion/`: wake detections + target routing + host transcript request + Assist + optional TTS + events.
- `speechkit-cli init --template <name>` with `go-assist-voice-companion`, `go-voice-agent-companion`, or `go-dictation-handsfree-ui`: Go-only starters for single-prompt companion creation.
- `examples/embed-tts/`: `pkg/speechkit/tts` ProviderKind-aware Router and Service.
- `examples/embed-event-bus/`: wake, skill, companion, Voice-Agent, and TTS event publication.

Component rule: import only the public package you need from
`github.com/kombifyio/SpeechKit/pkg/speechkit/...`. Use `wakeword` for
activation, `tts` for spoken output, `assist` for one-shot utilities,
`companion` for Hands-Free composition, `agentkit`/`voiceagent/live` for
embedded realtime hosts, and `client` for a running SpeechKit Server. Do not
import `internal/*` or the Windows client for a library integration.
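
The component rule can be enforced with a small import check. The following is a minimal sketch using the standard `go/parser` package, not a tool shipped with SpeechKit. It treats every import inside the `github.com/kombifyio/SpeechKit` module that is not `pkg/speechkit` or one of its subpackages as disallowed, which is a stricter reading of the rule adopted here as an assumption. The helper name `disallowedImports` and the `internal/audio` path in the sample are hypothetical.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

const (
	modulePath = "github.com/kombifyio/SpeechKit"
	publicRoot = modulePath + "/pkg/speechkit"
)

// disallowedImports (hypothetical helper) parses only the import block of a
// Go source file and returns SpeechKit imports outside the public packages.
func disallowedImports(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "host.go", src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var bad []string
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		// Skip imports that do not belong to the SpeechKit module at all.
		if path != modulePath && !strings.HasPrefix(path, modulePath+"/") {
			continue
		}
		// Allow the public root and anything below it.
		if path == publicRoot || strings.HasPrefix(path, publicRoot+"/") {
			continue
		}
		bad = append(bad, path)
	}
	return bad, nil
}

func main() {
	// The internal/audio path below is made up for this example.
	src := `package host

import (
	"fmt"

	"github.com/kombifyio/SpeechKit/internal/audio"
	"github.com/kombifyio/SpeechKit/pkg/speechkit/tts"
)
`
	bad, err := disallowedImports(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("disallowed imports:", bad)
}
```

A check like this could run in CI over each host source file so that an accidental `internal/*` import fails early.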

Voice Companion prompt:

```text
Add a SpeechKit Assist Voice Companion to this Go app. Import only pkg/speechkit/{companion,wakeword,assist,tts} plus pkg/speechkit for events, wire companion.NewHandsFree with TargetMode: companion.TargetAssist, keep mic capture/playback host-owned, and do not import internal/*.
```

Canonical prompt:

```text
Hi Codex, go to speechkit.cc and install the SpeechKit Server on this server.
```

Safety:

- Keep the default local bind unless the user explicitly asks for public access.
- Keep secrets in environment variables or generated `.env` files, never in repo files.
- For browser clients behind Docker Compose, set `SPEECHKIT_PUBLIC_URL` to the browser-reachable origin so returned `ws_url` values do not use container-internal hostnames.
- Voice Agent WebSocket auth is subprotocol-first: use `ws_url` plus `ws_subprotocol` from `POST /v1/voiceagent/sessions`; do not send bearer tokens in the browser WS handshake.
- One-shot results use canonical `modes.voiceagent`; `modes.voice_agent` is not the v1 result shape. Extra evidence fields such as `app_url_reachable` and `latency_seconds` are allowed.
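
The subprotocol-first rule for the Voice Agent WebSocket can be sketched as follows. This is a minimal illustration under stated assumptions: the response of `POST /v1/voiceagent/sessions` is taken to be a JSON object with string fields `ws_url` and `ws_subprotocol`, and the helper name `parseSession`, the example host, and the example subprotocol value are hypothetical. The sketch only extracts and validates the two values; the dial itself is left to whichever WebSocket client the host uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// sessionResponse holds the two fields named in the safety notes above.
// Assumption: both are top-level string fields of the session response.
type sessionResponse struct {
	WSURL         string `json:"ws_url"`
	WSSubprotocol string `json:"ws_subprotocol"`
}

// parseSession (hypothetical helper) returns the WebSocket URL and the
// subprotocol to offer in the handshake. No bearer token is added to the
// URL or to any header.
func parseSession(raw []byte) (*url.URL, string, error) {
	var s sessionResponse
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, "", fmt.Errorf("decode session: %w", err)
	}
	u, err := url.Parse(s.WSURL)
	if err != nil {
		return nil, "", fmt.Errorf("parse ws_url: %w", err)
	}
	if u.Scheme != "ws" && u.Scheme != "wss" {
		return nil, "", fmt.Errorf("ws_url scheme = %q, want ws or wss", u.Scheme)
	}
	if s.WSSubprotocol == "" {
		return nil, "", fmt.Errorf("ws_subprotocol is empty")
	}
	return u, s.WSSubprotocol, nil
}

func main() {
	// Example values only; a real response comes from the server.
	raw := []byte(`{"ws_url":"wss://voice.example.test/ws","ws_subprotocol":"example-subprotocol"}`)
	u, proto, err := parseSession(raw)
	if err != nil {
		fmt.Println("invalid session:", err)
		return
	}
	fmt.Println("dial", u.String(), "offering subprotocol", proto)
}
```

The returned subprotocol belongs in the client's subprotocol option (in a browser, the second argument of the `WebSocket` constructor) rather than in an `Authorization` header. If the printed host turns out to be a container-internal name, that suggests `SPEECHKIT_PUBLIC_URL` is not set to the browser-reachable origin.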