How I Gave an AI Full Control of My Homelab (And Why MCP Changes Everything)

There's a moment in every homelab journey where you realize you've built something complex enough that managing it becomes a project in itself. A Proxmox cluster here, a Kubernetes deployment there, Terraform state to wrangle, DNS records to update, monitoring to configure, secrets to rotate. Each system has its own API, its own CLI, its own mental model. The cognitive overhead compounds.
What if your AI assistant could just... do it all? Not generate commands for you to copy-paste. Actually do it. Check the cluster health. Deploy a container. Update DNS. Rotate a secret. Create a monitoring alert. All through a single, unified interface.
That's what I built. And the enabling technology is something called MCP — the Model Context Protocol.
What is MCP?
MCP is an open protocol that lets AI models interact with external systems through structured tool definitions. Think of it as a universal adapter layer: you define what tools are available (with their inputs, outputs, and descriptions), and the AI model can discover and invoke them as part of a conversation.
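To make the idea concrete, here is a minimal sketch of what a tool definition looks like conceptually: a name, a description the model reads, a JSON Schema for the inputs, and a handler that does the work. The shapes and names here are illustrative, not the actual MCP SDK API.

```typescript
// Conceptual sketch of an MCP-style tool. Names are illustrative.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the inputs
  handler: (args: Record<string, unknown>) => Promise<string>;
}

const clusterStatusTool: ToolDefinition = {
  name: "get_cluster_status",
  description:
    "Return health and resource utilization for every node in the cluster.",
  inputSchema: {
    type: "object",
    properties: {
      node: { type: "string", description: "Optional node name filter" },
    },
  },
  handler: async (args) => {
    // A real server would call the hypervisor API here; this is a stub.
    const node = (args.node as string) ?? "all";
    return JSON.stringify({ node, status: "ok" });
  },
};

// The protocol layer discovers tools by name and invokes their handlers.
async function invoke(tool: ToolDefinition, args: Record<string, unknown>) {
  return tool.handler(args);
}
```

The description field is doing real work here: it is what the model reads when deciding which tool to call, which is why tool design matters so much later in this post.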
The key insight is that MCP isn't just about giving an AI access to an API. It's about giving it context. When a tool returns the current state of your Kubernetes cluster, or the output of a Terraform plan, the AI can reason about that state and decide what to do next. It turns a chatbot into an operator.
The Architecture: One Server, Twelve Services
I built a single MCP server — a TypeScript application — that acts as a unified control plane for my entire homelab. It exposes over a hundred tools across twelve different service integrations:
Virtualization — Full lifecycle management of VMs and containers across a multi-node cluster. Node health, resource utilization, live migration.
Kubernetes — Pod management, log retrieval, resource inspection, rolling restarts, scaling, manifest application. Everything you'd normally do with kubectl, but through natural language.
Infrastructure as Code — Terraform plan, apply (with a human-in-the-loop confirmation workflow), state inspection, and even HCL block-level editing. The server can modify infrastructure definitions, show you the diff, and apply it — all within the conversation flow.
Networking — Device inventory, client lists, firewall rules, VLAN configuration, port forwarding rules. Read access to the full network topology.
Storage — NAS health monitoring, disk SMART status, storage pool utilization, shared folder management.
Secret Management — KV store operations for credential lifecycle. No more SSH-ing into a pod to read or rotate a secret.
DNS & Service Management — Remote systemd unit control and Docker container management across any host via SSH tunneling.
Monitoring — Uptime Kuma monitor creation, tagging, pause/resume. When I deploy a new service, the AI creates the corresponding health checks automatically.
Git & CI/CD — Repository management, issue tracking, workflow dispatch, build logs. The full development lifecycle without leaving the conversation.
Documentation — Page and database operations for keeping infrastructure documentation in sync with reality.
Media — Because what's a homelab without a media stack? Search, discover, and request content through the same interface.
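Structurally, each of the twelve integrations above is just a module exporting its tools, and the server flattens them into one registry. A hedged sketch of that aggregation, with hypothetical tool names standing in for the real ones:

```typescript
// Hypothetical sketch: each integration exports its tools, and the server
// aggregates them into the single tool list MCP exposes.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;
interface Tool {
  name: string;
  description: string;
  handler: ToolHandler;
}

const integrations: Record<string, Tool[]> = {
  kubernetes: [
    {
      name: "k8s_get_pods",
      description: "List pods in a namespace.",
      handler: async () => "[]",
    },
  ],
  dns: [
    {
      name: "dns_upsert_record",
      description: "Create or update a DNS record.",
      handler: async () => "ok",
    },
  ],
  // ...the other ten integrations follow the same shape
};

// One flat list, one unified control plane.
const allTools: Tool[] = Object.values(integrations).flat();
```

Keeping each integration self-contained is what makes it cheap to add the thirteenth.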
Why This Matters: The Compound Effect
Any single integration is useful but not revolutionary. You could write a script that checks your cluster health. You could use the Terraform CLI directly. The power emerges from the combination.
Consider a real workflow: decommissioning a service. Without MCP, this means:
Remove the Terraform resource definition
Run terraform plan to verify
Apply the changes
Remove the DNS record
Apply DNS changes
Update monitoring (remove health checks)
Update documentation
Commit and push everything
With MCP, I describe the intent: "decommission the ollama service." The AI has enough context to execute every step, asking for confirmation at the destructive boundaries. It edits the Terraform files, runs the plan, shows me the diff, waits for my approval, applies, removes DNS, updates monitoring, updates the documentation page, commits, and pushes. What used to be a 30-minute checklist becomes a two-minute conversation.
Safety by Design
Giving an AI write access to production infrastructure sounds terrifying. It should. The design accounts for this in several ways:
Confirmation workflows — Terraform applies use a token-based two-step process. The AI requests an apply, receives a time-limited confirmation token, and must present it back to actually execute. This creates a natural pause for human review.
Read/write separation — Every tool is clearly categorized. Read operations flow freely; write operations are gated.
Behavioral guardrails — The AI operates under strict rules: never modify critical systems without explicit approval, always plan before applying, always verify before acting. These aren't suggestions — they're enforced constraints in the system prompt and through hook scripts that intercept dangerous operations before they execute.
Audit trail — Every tool invocation is logged. Every Terraform apply auto-commits the changes to version control. There's always a breadcrumb trail back to what changed and why.
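The token-based two-step apply is simple enough to sketch in a few lines. This is a minimal illustration of the pattern, not the server's actual implementation; the TTL and shapes are assumptions.

```typescript
import { randomUUID } from "node:crypto";

// Pending applies keyed by single-use confirmation token.
const pendingApplies = new Map<string, { plan: string; expires: number }>();
const TOKEN_TTL_MS = 5 * 60 * 1000; // assumed: tokens expire after 5 minutes

// Step 1: the AI requests an apply and gets back a token. It must show the
// plan to the human and echo the token back to proceed.
function requestApply(planOutput: string): string {
  const token = randomUUID();
  pendingApplies.set(token, {
    plan: planOutput,
    expires: Date.now() + TOKEN_TTL_MS,
  });
  return token;
}

// Step 2: only a valid, unexpired, unused token actually executes.
function confirmApply(token: string): string {
  const entry = pendingApplies.get(token);
  if (!entry) throw new Error("Unknown confirmation token");
  if (Date.now() > entry.expires) {
    pendingApplies.delete(token);
    throw new Error("Confirmation token expired; re-run the plan");
  }
  pendingApplies.delete(token); // single use
  // A real server would shell out to `terraform apply` here.
  return `applying plan:\n${entry.plan}`;
}
```

The pause between the two calls is the whole point: the human reviews the plan while the token sits unredeemed.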
The Developer Experience
This runs through Claude Code, Anthropic's CLI tool. The experience is remarkably fluid. I open a terminal, describe what I want in plain language, and watch the AI orchestrate across systems. It reads current state, reasons about dependencies, and executes in the right order.
The MCP server itself is straightforward to build. Each service integration is a self-contained module: a set of Zod-validated tool definitions with handler functions that call the underlying APIs. Adding a new integration — say, a new monitoring service or a database backup tool — is a matter of defining the tools and registering them. The protocol handles discovery, validation, and execution.
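Here's roughly what "define the tools and register them" looks like. The real server uses Zod for input validation; to keep this sketch dependency-free, the validation step is hand-rolled, and the tool name and dispatch function are hypothetical.

```typescript
// Sketch of tool registration and dispatch. The actual server validates with
// Zod schemas; this stands in a hand-rolled validator to stay self-contained.
interface RegisteredTool {
  name: string;
  description: string;
  validate: (args: unknown) => Record<string, unknown>;
  handler: (args: Record<string, unknown>) => Promise<string>;
}

const registry = new Map<string, RegisteredTool>();

function registerTool(tool: RegisteredTool) {
  registry.set(tool.name, tool);
}

registerTool({
  name: "uptime_create_monitor", // hypothetical name
  description: "Create an HTTP health check for a service URL.",
  validate: (args) => {
    const a = args as Record<string, unknown>;
    if (typeof a?.url !== "string" || !a.url.startsWith("http")) {
      throw new Error("url must be an http(s) URL");
    }
    return a;
  },
  handler: async (args) => `created monitor for ${args.url}`,
});

// Dispatch: validate, then call — the protocol layer does this for every tool.
async function callTool(name: string, rawArgs: unknown): Promise<string> {
  const tool = registry.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.handler(tool.validate(rawArgs));
}
```

Because validation sits in front of every handler, a malformed model request fails loudly instead of reaching the underlying API.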
What surprised me most is how natural it feels. There's no mode-switching between "talking to an AI" and "operating infrastructure." It's the same conversation. I can ask "what's the current CPU utilization across the cluster?" and follow up with "scale up that service" without context loss.
What I Learned
Building this taught me a few things worth sharing:
Tool design matters more than you'd think. The descriptions you write for each tool directly affect how well the AI uses them. Vague descriptions lead to wrong tool choices. Precise, opinionated descriptions lead to surprisingly good autonomous behavior.
The confirmation pattern is essential. Any system that gives an AI write access to infrastructure needs a human checkpoint for destructive operations. The token-based confirmation workflow has saved me from several unintended changes.
State reconciliation is a real problem. When your virtualization platform supports live migration and HA failover, the infrastructure-as-code state can drift. Building automatic reconciliation into the MCP server (detecting drift and fixing state before planning) eliminated an entire class of errors.
SSH tunneling is underrated. For services that don't have HTTP APIs (systemd, Docker on remote hosts), SSH-based tool execution is simple, secure, and reliable. No need to expose additional network surfaces.
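The SSH-based approach for systemd is straightforward to sketch. This is an assumed shape, not the server's code: the whitelist contents and host name are placeholders, and the important details are that the unit name is checked against an explicit allow-list and passed as a separate argv element rather than interpolated into a shell string.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Explicit whitelist of manageable units — placeholder entries for illustration.
const ALLOWED_UNITS = new Set(["pihole-FTL", "docker"]);

async function remoteSystemctl(
  host: string,
  action: "status" | "restart",
  unit: string
): Promise<string> {
  if (!ALLOWED_UNITS.has(unit)) {
    throw new Error(`unit not whitelisted: ${unit}`);
  }
  // BatchMode avoids interactive prompts; each argument is its own argv
  // element, so nothing is ever interpolated into a shell string.
  const { stdout } = await run("ssh", [
    "-o",
    "BatchMode=yes",
    host,
    "systemctl",
    action,
    unit,
  ]);
  return stdout;
}
```

The whitelist check runs before anything touches the network, so a model asking for an unexpected unit gets an error back instead of a shell.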
Looking Forward
MCP is still early, but the trajectory is clear. As AI models get better at multi-step reasoning and tool orchestration, the value of having a well-structured tool layer increases. The protocol itself is simple — the complexity lives in the tools you build and the guardrails you put around them.
For anyone running a homelab (or managing infrastructure at any scale), I'd encourage experimenting with MCP. Start small — a read-only integration with your monitoring stack or your virtualization API. Once you see an AI reason about your actual infrastructure state and suggest the right next action, you'll understand why this feels like a fundamental shift in how we interact with systems.
The future of infrastructure management isn't better dashboards or smarter alerting. It's conversational. And with MCP, we're closer than you might think.