AI

Local AI services running on the kontti server.

Service             Purpose
Ollama              Local LLM inference server
Open WebUI          Web interface for Ollama
SearXNG             Private, self-hosted search engine
Qdrant              Vector database
Home Assistant MCP  MCP server for Home Assistant

Ollama

Ollama runs local language models and exposes them via a REST API on port 11434. It runs on the AMD GPU using the Vulkan backend.

ollama.container (excerpt)
[Container]
Image=docker.io/ollama/ollama:latest
AddDevice=/dev/kfd
AddDevice=/dev/dri

# Vulkan GPU backend
Environment=OLLAMA_VULKAN=1
# GFX version override required for this GPU generation
Environment=HSA_OVERRIDE_GFX_VERSION=11.0.2

PublishPort=11434:11434

The HSA_OVERRIDE_GFX_VERSION override is needed because Ollama's ROCm support does not yet recognise the GPU's actual GFX version. Without it, Ollama falls back to CPU inference.
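
As a quick check that the REST API on port 11434 is answering, the sketch below sends a single non-streaming prompt to Ollama's /api/generate endpoint using only the Python standard library. The host name (localhost) and the model name (llama3.2) are assumptions for illustration; substitute the server's address and a model that has actually been pulled.

```python
import json
import urllib.request

# Assumption: Ollama is reachable on localhost; replace with the server's address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a non-streaming POST for Ollama's /api/generate endpoint.

    The default model name is a placeholder, not one this setup is known to use.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Calling generate("Say hello in one sentence.") should return the model's reply as a string. A slow first response usually just means the model is still being loaded into GPU memory.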

Open WebUI

Open WebUI provides a ChatGPT-like web interface for Ollama. It connects to Ollama's API and supports model selection, conversation history, and document uploads.


SearXNG

SearXNG is a self-hosted metasearch engine. It aggregates results from multiple search engines without tracking queries or building a user profile. It uses Valkey (a Redis-compatible store) for caching.
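
Other tools can query SearXNG programmatically through its JSON output. The sketch below is a minimal example under a few assumptions: the instance listens on localhost port 8080 (the port is not stated here), and JSON has been enabled under search.formats in settings.yml, since otherwise SearXNG rejects format=json requests.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address and port of the SearXNG instance; adjust to the real setup.
SEARXNG_URL = "http://localhost:8080"


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Return the /search URL that asks SearXNG for JSON results."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def search(query: str, limit: int = 5, timeout: float = 30.0) -> list[dict]:
    """Fetch results and keep only the title and URL of the first few."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        results = json.load(resp).get("results", [])
    return [{"title": r.get("title"), "url": r.get("url")} for r in results[:limit]]
```

For example, search("podman quadlet") returns up to five dictionaries with title and url keys.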


Qdrant

Qdrant is a vector database used for semantic search over the Plex media library. The plex-sync script embeds Plex library metadata with Ollama and stores the vectors in Qdrant, enabling natural language search over local media.
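
The embed-and-store flow described above can be sketched roughly as follows. This is not the actual plex-sync script, only an illustration of the idea using Ollama's /api/embed endpoint and Qdrant's REST API. The collection name, embedding model, payload fields, and the Qdrant port (6333, its default) are all assumptions, and the collection is assumed to already exist with a vector size matching the embedding model.

```python
import json
import urllib.request
import uuid

OLLAMA_URL = "http://localhost:11434"
QDRANT_URL = "http://localhost:6333"  # assumption: Qdrant's default REST port
COLLECTION = "plex_media"             # hypothetical collection name
EMBED_MODEL = "nomic-embed-text"      # hypothetical embedding model


def _send_json(url: str, payload: dict, method: str = "POST", timeout: float = 60.0) -> dict:
    """Send a JSON body and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def embed(text: str) -> list[float]:
    """Ask Ollama for an embedding vector for one piece of text."""
    reply = _send_json(f"{OLLAMA_URL}/api/embed", {"model": EMBED_MODEL, "input": text})
    return reply["embeddings"][0]


def build_point(rating_key: str, title: str, summary: str, vector: list[float]) -> dict:
    """Build a Qdrant point; the ID is a UUID derived from the Plex rating key,
    so re-syncing the same item overwrites it instead of duplicating it."""
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"plex:{rating_key}"))
    payload = {"rating_key": rating_key, "title": title, "summary": summary}
    return {"id": point_id, "vector": vector, "payload": payload}


def upsert(points: list[dict]) -> None:
    """Insert or update points in the collection."""
    url = f"{QDRANT_URL}/collections/{COLLECTION}/points?wait=true"
    _send_json(url, {"points": points}, method="PUT")


def search(query: str, limit: int = 5) -> list[tuple[float, str]]:
    """Embed a natural language query and return (score, title) pairs."""
    body = {"vector": embed(query), "limit": limit, "with_payload": True}
    url = f"{QDRANT_URL}/collections/{COLLECTION}/points/search"
    hits = _send_json(url, body)["result"]
    return [(hit["score"], hit["payload"]["title"]) for hit in hits]
```

A sync pass would call embed() on each item's title and summary, wrap the result with build_point(), and hand batches to upsert(). A query such as search("heist film set in space") then goes through the same embedding model, so query and library vectors are comparable.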


Home Assistant MCP

Home Assistant runs the official Model Context Protocol add-on, which exposes the smart home as an MCP server. This allows AI assistants to query and control Home Assistant: reading sensor states, triggering automations, and interacting with devices. It is not currently connected to any client.