How I Wired a Self-Hosted AI Agent to My Supabase Backend

Architecture notes on running a local Hermes model, giving it Supabase tools, and letting it autonomously query and update production data.

Milin Paul

@milinpaul · Jun 10, 2026 · 8 min read

Most agent tutorials reach for OpenAI. There are good reasons for that — the API is stable, the function-calling interface is well-documented, and the models are genuinely capable. But if you’re building something where the data is sensitive, or where you want to control costs at the inference level, self-hosting is worth the setup cost.

I spent a few weeks wiring a local Hermes 2.5 Pro instance to a Supabase backend. Here’s what I learned.

Why self-hosted?

Two reasons, in order of importance.

Privacy. The agent I built reads and writes customer data. Sending that to a third-party inference provider, even one with strong data agreements, wasn’t something I wanted to explain to stakeholders. Running inference locally means the data never leaves our network.

Cost predictability. At the volume I was running (a few hundred agent invocations per day), the token cost on hosted APIs was significant. A one-time investment in a machine running Ollama costs the same whether you run 100 or 10,000 queries a day.

The tradeoff: setup complexity, hardware dependency, and model capability. Hermes 2.5 Pro is very good at function calling. It’s not GPT-4. For the task I was building — structured data queries against a known schema — it was good enough.

The architecture

The setup is three components:

Ollama running Hermes 2.5 Pro on a local machine (16GB VRAM, runs the 8B model comfortably)
A Python agent loop that handles tool dispatch and conversation state
Supabase as the database, accessed via the Python client library

The agent receives a task in natural language, reasons about which tools to use, calls them in sequence, and returns a structured result. The Supabase client is exposed as a set of callable tools.

Here’s the agent loop in simplified form:

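import json

import ollama


# Tool and find_tool are defined elsewhere: a Tool exposes name, schema, and
# execute; find_tool returns the tool whose name matches.
def run_agent(task: str, tools: list[Tool]) -> str:
    messages = [{"role": "user", "content": task}]

    while True:
        # Send the full conversation plus the tool schemas on every turn.
        response = ollama.chat(
            model="hermes2pro",
            messages=messages,
            tools=[t.schema for t in tools],
        )

        message = response["message"]
        messages.append(message)

        if not message.get("tool_calls"):
            # No tool calls: the agent is done, so return the final response.
            return message["content"]

        # Execute tool calls and feed results back
        for call in message["tool_calls"]:
            tool = find_tool(tools, call["function"]["name"])
            result = tool.execute(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                # Serialize as JSON, which is what the tool description promises.
                "content": json.dumps(result, default=str),
            })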

The loop runs until the model produces a response without tool calls. That’s the signal that it’s done reasoning.

Defining Supabase tools

I wrapped the Supabase Python client in a thin tool layer. Each tool has a name, a description, a JSON schema for its parameters, and an execute method.
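
The agent loop above uses a `Tool` type and a `find_tool` helper without showing them. A minimal sketch of what they could look like follows. The `Protocol` definition, the `KeyError` on an unknown name, and the toy `EchoTool` are assumptions for illustration, not the code from the project.

```python
from typing import Any, Protocol


class Tool(Protocol):
    """Structural type: any object with these four members counts as a tool."""

    name: str
    description: str
    schema: dict[str, Any]

    def execute(self, args: dict[str, Any]) -> Any: ...


def find_tool(tools: list[Tool], name: str) -> Tool:
    """Return the tool registered under `name`, or raise if there is none."""
    for tool in tools:
        if tool.name == name:
            return tool
    raise KeyError(f"Unknown tool: {name}")


# A toy tool used only to exercise the lookup; it touches no database.
class EchoTool:
    name = "echo"
    description = "Return the arguments unchanged."
    schema = {
        "type": "function",
        "function": {
            "name": "echo",
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }

    def execute(self, args: dict[str, Any]) -> Any:
        return args


tool = find_tool([EchoTool()], "echo")
echo_result = tool.execute({"table": "expenses"})
```

Using a structural type means a tool class does not have to inherit from anything; it only has to provide the four members.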

Here’s the query tool:

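import os

from supabase import Client, create_client

# Assumption: the project URL and key are provided via these environment variables.
supabase: Client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])


class QueryTool:
    name = "query_table"
    description = "Query a Supabase table with optional filters. Returns rows as JSON."
    schema = {
        "type": "function",
        "function": {
            "name": "query_table",
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "table": {"type": "string", "description": "Table name"},
                    "filters": {
                        "type": "object",
                        "description": "Key-value pairs to filter by (equality)",
                    },
                    "limit": {"type": "integer", "default": 10},
                },
                "required": ["table"],
            },
        },
    }

    def execute(self, args: dict) -> list[dict]:
        query = supabase.table(args["table"]).select("*")
        # The model may omit filters or send null, so fall back to no filters.
        for key, value in (args.get("filters") or {}).items():
            query = query.eq(key, value)
        # Same fallback for limit: a missing or null value means the default of 10.
        query = query.limit(args.get("limit") or 10)
        result = query.execute()
        return result.data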

The description matters. Hermes is remarkably sensitive to how you describe what a tool does. Vague descriptions produce inconsistent tool selection. Specific descriptions — including what data is returned and in what format — produce reliable behavior.

Failure modes I hit

Hallucinated table names. The model would sometimes call query_table with a table name that didn’t exist. The fix: inject the schema into the system prompt. A short description of available tables and their columns prevents most of this.

system_prompt = """
You are an assistant with access to a Supabase database.
Available tables:
- expenses (id, user_id, amount, category, created_at)
- users (id, email, created_at)
- budgets (id, user_id, category, limit_amount)

Use the provided tools to answer questions about this data.
"""
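
The simplified loop shown earlier starts the conversation with the user task only, so the prompt has to be added as a leading message with the `system` role. The sketch below shows one way to do that, generating the table list from a single schema map. The `SCHEMA` dict, `build_system_prompt`, `initial_messages`, and `check_table` are hypothetical names. The `check_table` guard is an extra safeguard that goes beyond the fix described above: it gives a tool a readable error to hand back to the model when a table name is still wrong.

```python
from typing import Optional

# Hypothetical schema map; the tables and columns mirror the prompt above.
SCHEMA = {
    "expenses": ["id", "user_id", "amount", "category", "created_at"],
    "users": ["id", "email", "created_at"],
    "budgets": ["id", "user_id", "category", "limit_amount"],
}


def build_system_prompt(schema: dict[str, list[str]]) -> str:
    """Render the table list so the prompt cannot drift from the schema map."""
    table_lines = "\n".join(
        f"- {table} ({', '.join(columns)})" for table, columns in schema.items()
    )
    return (
        "You are an assistant with access to a Supabase database.\n"
        "Available tables:\n"
        f"{table_lines}\n\n"
        "Use the provided tools to answer questions about this data."
    )


def initial_messages(task: str, schema: dict[str, list[str]]) -> list[dict[str, str]]:
    """Put the schema-bearing system message ahead of the user's task."""
    return [
        {"role": "system", "content": build_system_prompt(schema)},
        {"role": "user", "content": task},
    ]


def check_table(table: str, schema: dict[str, list[str]]) -> Optional[str]:
    """Return an error string for unknown tables, or None if the name is valid."""
    if table in schema:
        return None
    return f"Unknown table '{table}'. Valid tables: {', '.join(schema)}"


messages = initial_messages("How much did user 42 spend on food?", SCHEMA)
```

Returning the error as a tool result, rather than raising, gives the model a chance to correct the table name on its next turn.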

Infinite loops. Early versions of the loop would sometimes get stuck calling the same tool repeatedly with slightly different arguments. The fix: add a maximum iteration count and return an error if the agent exceeds it. 10 iterations is generous for most tasks; anything beyond that is usually a reasoning failure.
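
A minimal sketch of that iteration cap follows. To keep it runnable without a model, the inference call is passed in as a `chat` function and the tools as a plain name-to-callable dict; both are assumptions of this sketch, as are `AgentLoopError` and the scripted stand-in model. It raises an exception where the text says "return an error"; either works as long as the caller can tell the two outcomes apart.

```python
import json
from typing import Any, Callable

Message = dict[str, Any]
# Hypothetical signature: takes the conversation so far, returns the model's message.
ChatFn = Callable[[list[Message]], Message]


class AgentLoopError(RuntimeError):
    """Raised when the agent fails to finish within the iteration budget."""


def run_agent_bounded(
    task: str,
    tools: dict[str, Callable[[dict[str, Any]], Any]],
    chat: ChatFn,
    max_iterations: int = 10,
) -> str:
    messages: list[Message] = [{"role": "user", "content": task}]

    for _ in range(max_iterations):
        message = chat(messages)
        messages.append(message)

        tool_calls = message.get("tool_calls")
        if not tool_calls:
            return message["content"]  # no tool calls: the agent is done

        for call in tool_calls:
            name = call["function"]["name"]
            args = call["function"]["arguments"]
            if name in tools:
                result = tools[name](args)
            else:
                # Report the bad name to the model rather than crashing the loop.
                result = {"error": f"unknown tool: {name}"}
            messages.append(
                {"role": "tool", "content": json.dumps(result, default=str)}
            )

    raise AgentLoopError(f"no final answer after {max_iterations} iterations")


# Scripted stand-in for the model: one tool call, then a final answer.
def scripted_chat(messages: list[Message]) -> Message:
    if not any(m["role"] == "tool" for m in messages):
        call = {"function": {"name": "query_table", "arguments": {"table": "users"}}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": "Found 1 user."}


answer = run_agent_bounded(
    "How many users are there?",
    {"query_table": lambda args: [{"id": 1}]},
    scripted_chat,
)
```

In real use, `chat` would wrap the `ollama.chat` call from the loop shown earlier and return `response["message"]`.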

Wrong filter semantics. The agent would correctly identify that it needed to filter, but apply the filter to the wrong field. This was a model capability issue, not a tooling issue. Adding examples to the system prompt helped significantly.
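
One way to add such examples is to keep them as data and render them into a text block that is appended to the system prompt. The question and argument pairs below are hypothetical, chosen to contrast `user_id` on `expenses` with `id` on `users`; they are not the examples used in the project.

```python
import json

# Hypothetical examples; replace them with question/call pairs from your own logs.
FEW_SHOT_EXAMPLES = [
    (
        "What has user 42 spent on groceries?",
        {"table": "expenses", "filters": {"user_id": 42, "category": "groceries"}},
    ),
    (
        "What is the account email for user 42?",
        {"table": "users", "filters": {"id": 42}},
    ),
]


def render_examples(examples: list[tuple[str, dict]]) -> str:
    """Format each pair as a question followed by the query_table arguments."""
    lines = ["Examples of correct query_table arguments:"]
    for question, arguments in examples:
        lines.append(f"Q: {question}")
        lines.append(f"query_table: {json.dumps(arguments)}")
    return "\n".join(lines)


examples_block = render_examples(FEW_SHOT_EXAMPLES)
```

Keeping the examples as data makes it easy to add a new pair whenever a wrong-field mistake shows up in the logs.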

What worked well

The Hermes model’s function-calling reliability surprised me. For well-defined tasks — “find all expenses over $50 in the last 30 days”, “summarize the top spending categories for user X” — it was accurate on the first try more than 85% of the time.

The Ollama local inference was faster than I expected. On an RTX 3090, the 8B model produces tokens at ~80 tok/s. For an agentic loop that runs 3–5 tool calls before producing a final answer, the latency is under 10 seconds end-to-end. That’s usable.
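
As a rough consistency check on those numbers, the sketch below works out how many generated tokens each model turn can use within a 10-second budget. It assumes one tool call per model turn and ignores prompt processing and database time, so it is an upper bound and not a measurement.

```python
TOKENS_PER_SECOND = 80  # generation speed quoted above
LATENCY_BUDGET_S = 10   # end-to-end target quoted above


def tokens_per_turn(tool_calls: int) -> float:
    """Average generated tokens available per model turn within the budget.

    A run with n tool calls needs n + 1 model turns: one per tool call plus
    the final answer. Prompt processing and database time are ignored here.
    """
    total_tokens = TOKENS_PER_SECOND * LATENCY_BUDGET_S
    return total_tokens / (tool_calls + 1)


budget_3_calls = tokens_per_turn(3)  # 800 tokens over 4 turns
budget_5_calls = tokens_per_turn(5)  # 800 tokens over 6 turns
```

Under these assumptions each turn has somewhere between about 130 and 200 tokens to work with, which is enough for a tool call with a few arguments or a short final answer.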

Supabase’s Python client is a good fit for this use case. The fluent query builder maps naturally to the kinds of structured queries an agent wants to run. Row Level Security (RLS) policies mean I can give the agent credentials with scoped access: it can only read and write the tables I explicitly allow. This only holds if the agent connects with a key that is subject to RLS; the service_role key bypasses those policies.

Should you do this?

Self-hosting an agent is a meaningful investment. You’re taking on the operations burden of a running inference server, the debugging burden of an opaque model, and the engineering burden of a custom tool layer.

That said, if privacy is a hard requirement, or if you need cost certainty at scale, the investment pays off quickly. The architecture I described here runs reliably. The failure modes are understood and mitigated. And it doesn’t phone home.

If you’re starting with hosted models and want to migrate later, the tool schema format I used here is compatible with OpenAI’s function calling interface — the same tool definitions work with both. That makes migration straightforward when the time comes.

Milin Paul

@milinpaul

Lead Software Engineer at EverestEngineering. I write about practical AI systems, engineering architecture, and what actually works in production.
