AI

Local AI services running on the kontti server.

Service	Port	Purpose
Ollama	11434	Local LLM inference server
Open WebUI	3001	Web interface for Ollama
SearXNG	8888	Private, self-hosted search engine
Qdrant	6333 / 6334	Vector database (REST / gRPC)
Home Assistant MCP	—	MCP server for Home Assistant (add-on on the HA host)
Media Server MCP	8085	MCP server for the Plex media library

Ollama

Ollama runs local language models and exposes them via a REST API on port 11434. It runs on the AMD GPU using the Vulkan backend.

ollama.container (excerpt)

[Container]
Image=docker.io/ollama/ollama:latest
AddDevice=/dev/kfd
AddDevice=/dev/dri

# Vulkan GPU backend
Environment=OLLAMA_VULKAN=1
# GFX version override required for this GPU generation
Environment=HSA_OVERRIDE_GFX_VERSION=11.0.2

PublishPort=11434:11434

The HSA_OVERRIDE_GFX_VERSION override is a ROCm-side setting, set here alongside the Vulkan flag. It is needed because Ollama's ROCm support doesn't yet recognise the GPU's actual GFX version; without it, Ollama falls back to CPU inference.
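A quick way to confirm that the REST API is reachable is to ask Ollama which models it has pulled. The following is a minimal sketch rather than anything from the actual setup: it assumes the server is reached as localhost (only the port comes from the configuration above), and the helper names are hypothetical. It uses Ollama's GET /api/tags endpoint and only the Python standard library.

```python
import json
import urllib.request

# Assumption: the API is queried from the host itself. Only the port (11434)
# comes from the configuration above.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_response: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_models(base_url: str = OLLAMA_URL, timeout: float = 10.0) -> list[str]:
    """Ask the Ollama server which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_models() against a running server should return a list that includes the embedding model used further down, if it has been pulled.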

Open WebUI

Open WebUI provides a ChatGPT-like web interface for Ollama. It connects to Ollama's API and supports model selection, conversation history, and document uploads.

SearXNG

SearXNG is a self-hosted meta search engine. It aggregates results from multiple sources without tracking searches or sending data to third parties. It uses Valkey (a Redis-compatible store) for caching.

Qdrant

Qdrant is a vector database. Here it stores embeddings of the Plex media library so the catalogue can be queried in natural language rather than by exact title — see plex-sync below for how the data gets in.

plex-sync

plex-sync is a small custom Python service that turns the Plex library into a searchable vector index. It's the piece that connects three otherwise-separate local services — Ollama, Qdrant, and Plex — into a semantic search pipeline that runs entirely on-prem, with no cloud API calls and no library metadata leaving the network.

How it works

graph LR
    plex[Plex API<br/>movies & shows] -->|descriptive text| sync[plex-sync]
    sync -->|embed| ollama[Ollama<br/>nomic-embed-text]
    ollama -->|768-dim vector| qdrant[(Qdrant<br/>plex_movies / plex_shows)]
    qdrant -->|search| mcp[Media Server MCP]
    mcp -->|tools| owui[Open WebUI]

Fetch — the script reads every movie and TV library from the Plex API.
Describe — each item is flattened into a compact text block (title, original title, year, genres, countries, ratings, studio, director, top cast, summary). Embedding the description rather than just the title is what makes fuzzy, plot-based queries work.
Embed — the text is sent to Ollama's nomic-embed-text model, producing a 768-dimensional vector.
Store — the vector and a rich payload (ratings, runtime, genres, view stats…) are upserted into Qdrant, into separate plex_movies and plex_shows collections.
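The Describe, Embed and Store steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual plex-sync code: the item dictionary keys, the field labels, the five-actor cast cut-off and the helper names are all hypothetical, and the standard library's urllib stands in for requests to keep the example dependency-free. The embedding call uses Ollama's /api/embed endpoint; the point shape (id, vector, payload) follows Qdrant's point format.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: host; the port is from this page
EMBED_MODEL = "nomic-embed-text"


def describe(item: dict) -> str:
    """Flatten one library item into a compact text block for embedding."""
    fields = [
        ("Title", item.get("title")),
        ("Original title", item.get("original_title")),
        ("Year", item.get("year")),
        ("Genres", ", ".join(item.get("genres", []))),
        ("Director", item.get("director")),
        ("Cast", ", ".join(item.get("cast", [])[:5])),  # hypothetical cut-off
        ("Summary", item.get("summary")),
    ]
    # Skip empty fields so missing metadata does not add noise to the text.
    return "\n".join(f"{label}: {value}" for label, value in fields if value)


def embed(text: str, base_url: str = OLLAMA_URL) -> list[float]:
    """Request one embedding vector from Ollama's /api/embed endpoint."""
    body = json.dumps({"model": EMBED_MODEL, "input": text}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # The response carries one vector per input; a single string was sent.
        return json.load(resp)["embeddings"][0]


def to_point(plex_id: int, vector: list[float], payload: dict) -> dict:
    """Shape one item as a Qdrant point ready to be upserted."""
    return {"id": plex_id, "vector": vector, "payload": payload}
```

Keeping describe() a pure function of the item makes its output stable between runs, which matters as soon as that text is used to detect changes.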

Incremental by design

Re-embedding the whole library every night would be wasteful. Each item's description is hashed (MD5), and that hash is stored alongside the vector. On the next run the script checks Qdrant for an existing point with the same plex_id and the same content hash — if both match, nothing has changed and the item is skipped. Only new or edited titles reach Ollama, so a daily sync over a large library costs a handful of embeddings rather than thousands.
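The skip logic can be sketched in a few lines. This is a simplified illustration, not the actual script: a plain dictionary mapping plex_id to the stored hash stands in for the Qdrant lookup, and the function names are hypothetical.

```python
import hashlib


def content_hash(description: str) -> str:
    """MD5 of the description text, used only as a change detector."""
    data = description.encode("utf-8")
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


def ids_to_embed(current: dict[int, str], stored_hashes: dict[int, str]) -> list[int]:
    """Return the plex_ids whose description is new or has changed.

    current maps plex_id to the freshly built description; stored_hashes
    maps plex_id to the hash saved with the vector on a previous run
    (a stand-in for querying Qdrant).
    """
    return [
        plex_id
        for plex_id, description in current.items()
        if stored_hashes.get(plex_id) != content_hash(description)
    ]
```

MD5 is adequate here because the hash only has to notice edits, not resist deliberate collisions.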

Packaging and scheduling

plex-sync builds its own minimal container (python:3.14-slim, the current stable Python release at the time of writing, plus qdrant-client and requests) from a Quadlet .build file, so the image is produced on the host rather than pulled from a registry. A systemd timer runs it once a day at 04:00 with a randomised delay and Persistent=true, so a run missed while the server was down is caught up on the next boot. The HTTP client retries with exponential backoff, since Plex and Ollama can both be briefly busy.
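Exponential backoff itself is simple to express. The helper below is a generic sketch, not the retry code plex-sync actually uses: the name with_retries, the attempt count and the base delay are assumptions. It catches OSError, which covers the standard library's connection and timeout errors (and, as far as I know, the exception hierarchy of requests as well).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on OSError with delays of 1x, 2x, 4x... base_delay."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

The sleep parameter is injectable so the schedule can be checked without actually waiting.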

The result feeds the Media Server MCP and Open WebUI: a query like "bleak Nordic crime dramas" resolves against vectors instead of a literal title match.
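On the query side, the search boils down to embedding the question with the same model and asking Qdrant for the nearest vectors. The sketch below covers the Qdrant half only and is an assumption-laden illustration, not how the Media Server MCP is implemented: it talks to Qdrant's REST search endpoint on port 6333, assumes localhost as the host, and assumes the payload carries a title field. The helper names are hypothetical.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # assumption: host; REST port from the table


def search_body(vector: list[float], limit: int = 5) -> dict:
    """Build the JSON body for a nearest-neighbour search."""
    return {"vector": vector, "limit": limit, "with_payload": True}


def search(
    collection: str,
    vector: list[float],
    limit: int = 5,
    base_url: str = QDRANT_URL,
) -> list[dict]:
    """POST a query vector to Qdrant and return the scored hits."""
    req = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/search",
        data=json.dumps(search_body(vector, limit)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]


def titles(hits: list[dict]) -> list[str]:
    """Format hits as 'Title (score)'; the title payload key is an assumption."""
    return [f'{hit["payload"].get("title", "?")} ({hit["score"]:.2f})' for hit in hits]
```

With a 768-dimensional query vector from nomic-embed-text, search("plex_movies", vector) would return the closest movies by similarity score.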

Home Assistant MCP

Home Assistant runs the official Model Context Protocol add-on, which exposes the smart home as an MCP server. This allows AI assistants to query and control Home Assistant: reading sensor states, triggering automations, and interacting with devices. In this setup the MCP client has been Claude.

Media Server MCP

Media Server MCP is a self-hosted MCP server that exposes the Plex media library to AI assistants. It runs in a Deno container on port 8085 and provides tools for searching the library, looking up metadata and generating watch recommendations through natural language.

My experience

Ollama's main job so far has been generating embeddings for the plex-sync pipeline — feeding Plex metadata into Qdrant via the nomic-embed-text model. I tested larger language models locally, but the GPU doesn't have enough power for a good experience.

I've tested the semantic Plex search through Open WebUI using Gemini models, and it works reasonably well, but it has remained more of an experiment. Open WebUI in general is still at the experimentation stage. I'm waiting for local language models and affordable server hardware to mature before I expect to use them seriously.

The MCP servers have worked well with Claude — they help with things like Home Assistant configuration and media library management. Sometimes Claude prefers to use the application's API directly, since the MCP servers don't cover every operation.