AI
Local AI services running on the kontti server.
| Service | Purpose |
|---|---|
| Ollama | Local LLM inference server |
| Open WebUI | Web interface for Ollama |
| SearXNG | Private, self-hosted search engine |
| Qdrant | Vector database |
| Home Assistant MCP | MCP server for Home Assistant |
Ollama
Ollama runs local language models and exposes them via a REST API on port 11434. It runs on the AMD GPU using the Vulkan backend.
[Container]
Image=docker.io/ollama/ollama:latest
AddDevice=/dev/kfd
AddDevice=/dev/dri
# Vulkan GPU backend
Environment=OLLAMA_VULKAN=1
# GFX version override required for this GPU generation
Environment=HSA_OVERRIDE_GFX_VERSION=11.0.2
PublishPort=11434:11434
The HSA_OVERRIDE_GFX_VERSION override is needed because Ollama's ROCm support doesn't yet recognise the GPU's actual GFX version. Without it, Ollama falls back to CPU inference.
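As a quick check that the REST API on port 11434 is reachable, a request to Ollama's `/api/generate` endpoint can be sketched as follows. This is a minimal sketch, not part of the deployment: the `localhost` address assumes the script runs on the server itself, and the model name `llama3.2` is an assumption that should be replaced with whichever model has actually been pulled.

```python
import json
import urllib.request

# Assumption: the script runs on the same host that publishes port 11434.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="llama3.2", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for a single JSON object instead of a chunked stream.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3.2", timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the container running, `print(generate("Say hello in one short sentence."))` should print the model's reply. A very slow reply may be a hint that inference has fallen back to the CPU.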
Open WebUI
Open WebUI provides a ChatGPT-like web interface for Ollama. It connects to Ollama's API and supports model selection, conversation history, and document uploads.
SearXNG
SearXNG is a self-hosted metasearch engine. It aggregates results from multiple upstream search engines without tracking searches or passing identifying user data to third parties. It uses Valkey (a Redis-compatible store) for caching.
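SearXNG can also be queried programmatically through its `/search` endpoint. The sketch below assumes that the `json` output format has been enabled under `search.formats` in `settings.yml` (requests for a format that is not enabled are rejected) and that the instance is reachable at `http://localhost:8080`; both the address and the port are assumptions, not values taken from this setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust host and port to wherever the SearXNG container is published.
SEARXNG_URL = "http://localhost:8080"


def build_search_url(query, base_url=SEARXNG_URL):
    """Build a SearXNG search URL that requests JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def search(query, limit=5, timeout=30):
    """Return (title, url) pairs for the first few results."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as response:
        results = json.loads(response.read()).get("results", [])
    return [(r.get("title"), r.get("url")) for r in results[:limit]]
```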
Qdrant
Qdrant is a vector database used for semantic search over the Plex media library. The plex-sync script embeds Plex library metadata with Ollama and stores the vectors in Qdrant, enabling natural language search over local media.
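The embed-and-store flow can be sketched roughly as follows. This is not the actual plex-sync script, only an illustration under stated assumptions: the collection name `plex_media` and the embedding model `nomic-embed-text` are hypothetical, Qdrant is assumed to listen on its default REST port 6333, and the collection is assumed to exist already with a vector size matching the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
QDRANT_URL = "http://localhost:6333"  # Qdrant's default REST port (assumed here)
COLLECTION = "plex_media"             # hypothetical collection name
EMBED_MODEL = "nomic-embed-text"      # hypothetical embedding model


def _send_json(url, payload, method="POST", timeout=60):
    """Send a JSON body and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def embed(text):
    """Ask Ollama for an embedding vector for one piece of text."""
    reply = _send_json(f"{OLLAMA_URL}/api/embed", {"model": EMBED_MODEL, "input": text})
    return reply["embeddings"][0]


def build_point(point_id, vector, metadata):
    """Shape one Qdrant point; IDs must be unsigned integers or UUID strings."""
    return {"id": point_id, "vector": vector, "payload": metadata}


def upsert(points):
    """Insert or update points in the collection, waiting for the write to apply."""
    url = f"{QDRANT_URL}/collections/{COLLECTION}/points?wait=true"
    return _send_json(url, {"points": points}, method="PUT")


def search(query, limit=5):
    """Embed a natural language query and return (score, payload) pairs."""
    body = {"vector": embed(query), "limit": limit, "with_payload": True}
    reply = _send_json(f"{QDRANT_URL}/collections/{COLLECTION}/points/search", body)
    return [(hit["score"], hit["payload"]) for hit in reply["result"]]
```

A sync pass would then call something like `upsert([build_point(1, embed(summary), {"title": title})])` for each library item, and `search("space films from the eighties")` would return the closest matches. The same model must be used for indexing and querying, since vectors from different models are not comparable.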
Home Assistant MCP
Home Assistant runs the official Model Context Protocol add-on, which exposes the smart home as an MCP server. This allows AI assistants to query and control Home Assistant: reading sensor states, triggering automations, and interacting with devices. The server is not currently connected to any client.
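For orientation, MCP messages are JSON-RPC 2.0 objects, and a client discovers and invokes tools with the `tools/list` and `tools/call` methods. The sketch below only builds such messages; the transport, authentication, and the initial `initialize` handshake are omitted, and the tool name and arguments in the usage note are hypothetical, since the real ones are whatever the server reports in its `tools/list` reply.

```python
import itertools

# JSON-RPC requests carry an id so replies can be matched to requests.
_request_ids = itertools.count(1)


def build_mcp_request(method, params=None):
    """Build a JSON-RPC 2.0 request object for an MCP method."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request():
    """Ask the server which tools it exposes."""
    return build_mcp_request("tools/list")


def call_tool_request(name, arguments):
    """Invoke one tool by name with a dictionary of arguments."""
    return build_mcp_request("tools/call", {"name": name, "arguments": arguments})
```

A client would first send `list_tools_request()`, then something like `call_tool_request("example_tool", {"name": "kitchen light"})` using a tool name taken from the reply.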