The self-hosted open-source LLM pitch sounds compelling. Own your model. No per-token costs. Train it on your data. Run it on your hardware. Total control.
Then you do the math.
## The Critical Requirement: Tool Use
Our AI assistant doesn't write poetry. It parses natural language into structured operations. "Order a Bergara for Richard at 15% margin" needs to resolve into the right handler, the right parameters, the right sequence of API calls. That's tool use — and it's where frontier models are significantly ahead of open-source alternatives.
A model that gets the answer right 90% of the time is unusable for operational software. When a dealer says "invoice that order," you need 100% reliability on intent parsing. One misrouted command is a compliance issue, a financial error, or a lost customer. We tested open-source models. The tool-use reliability gap wasn't close.
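The reliability requirement shapes the architecture: parsed intents are validated against a strict registry before anything executes, so a misparse fails loudly instead of misrouting. A minimal sketch, assuming hypothetical handler names and a flat parameter schema (the real registry is more elaborate):

```python
from dataclasses import dataclass

# Hypothetical handler registry; tool names and parameters are illustrative.
HANDLERS = {
    "create_order": {"required": {"product", "dealer", "margin_pct"}},
    "create_invoice": {"required": {"order_id"}},
}

@dataclass
class ParsedIntent:
    tool: str
    args: dict

def dispatch(intent: ParsedIntent) -> str:
    """Validate the model's parse before executing; reject, never guess."""
    spec = HANDLERS.get(intent.tool)
    if spec is None:
        raise ValueError(f"unknown tool: {intent.tool}")
    missing = spec["required"] - intent.args.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return f"routed to {intent.tool}"
```

So "Order a Bergara for Richard at 15% margin" only succeeds if the model emitted `create_order` with all three parameters; anything else raises instead of reaching a handler.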
## The GPU Math
| Approach | Monthly Cost | Notes |
|---|---|---|
| Self-hosted (A100 instance) | $1,500 – $3,000 | 24/7 for availability. Plus DevOps time. |
| API (Claude via Bedrock) | $100 – $250 | At startup scale (50 dealers). Pay per token. |
| Break-even point | — | Hundreds of active tenants. Years away. |
The break-even math is clear: you need hundreds of active tenants running thousands of daily conversations before self-hosting makes economic sense. At startup scale, API costs are a rounding error compared to GPU hosting.
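The arithmetic behind that claim, using the table's midpoints and the simplifying assumption that API cost scales roughly linearly with tenant count:

```python
# Back-of-envelope break-even from the table's midpoints.
# Assumption: API spend grows roughly linearly with active tenants.
gpu_monthly = 2_250                      # midpoint of $1,500–$3,000 A100 estimate
api_monthly_at_50 = 175                  # midpoint of $100–$250 at 50 dealers
api_per_tenant = api_monthly_at_50 / 50  # ≈ $3.50 per tenant per month

break_even_tenants = gpu_monthly / api_per_tenant
print(round(break_even_tenants))  # ≈ 643 tenants before the GPU pays for itself
```

Even before counting DevOps time, the crossover sits at well over ten times current scale.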
## What "Training" Actually Means
Most people who say "train it on your data" mean one of three things, and only one of them is free:
- System prompts — Free. Works on any model. You write detailed instructions, include examples, describe your domain. No GPU required.
- Fine-tuning — $500–$5K per run. Requires training data you probably don't have yet. Improves specific task performance. Worth it at scale, not at launch.
- Pre-training — $100K+. Building a model from scratch. Never the right answer for a vertical SaaS startup.
What most people mean by "training" is just a good system prompt. You don't need a GPU for that.
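What that looks like in practice is a block of text, not a training run. A sketch with illustrative wording (not the production prompt):

```python
# "Training" at launch is a detailed system prompt plus a few-shot example.
# The wording below is illustrative, not the production prompt.
SYSTEM_PROMPT = """\
You are the assistant for a firearms dealer management platform.
Parse every user request into exactly one tool call.
Domain terms: "margin" means dealer margin as a percentage of list price.

Example:
  User: Order a Bergara for Richard at 15% margin
  Tool: create_order(product="Bergara", dealer="Richard", margin_pct=15)

If a request does not map cleanly to a tool, ask for clarification
instead of guessing."""
```

Iterating on this string is the whole feedback loop: edit, test against real dealer phrasing, repeat. No GPU, no training data pipeline.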
## The Bedrock Advantage
AWS Bedrock collapses the vendor relationship. Single AWS bill. IAM authentication — the same identity system that manages your entire infrastructure. Same region as your database and application servers. No separate API key management, no second vendor relationship, no additional compliance surface.
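The "no separate API key" point is visible in the request itself. With boto3's `bedrock-runtime` client, credentials come from ambient IAM, so the payload carries no secret at all. A sketch of the Converse API request shape (model ID left as a parameter; helper name is ours):

```python
def build_converse_request(model_id: str, system_prompt: str, user_text: str) -> dict:
    """Request body for Bedrock's Converse API. Note what's absent:
    no API key — auth is the same IAM identity the rest of the stack uses."""
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
    }

# With boto3, the call would look like:
#   client = boto3.client("bedrock-runtime")  # region and creds from IAM config
#   response = client.converse(**build_converse_request(...))
```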
For a solo founder, every hour spent wrangling GPU instances, debugging model outputs, and managing inference infrastructure is an hour not spent shipping features for customers. The API lets you treat AI as a utility, like a database or a CDN. You call it when you need it and don't think about it otherwise.
## The Hybrid Option Stays Open
The obvious future optimization is to use a cheap model for simple lookups and a frontier model for complex workflows. But you need real usage data to make that call, not speculation.
The architecture supports model switching. The AI assistant talks to a port interface, not to Claude directly. Tomorrow we could route simple queries to a smaller model and complex tool-use workflows to the frontier model. But that optimization requires usage patterns we don't have yet. Premature optimization of AI infrastructure is the same trap as premature optimization of code — you're solving a problem you've imagined, not one you've measured.
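The port interface makes the future routing decision a one-function change. A minimal sketch, with an assumed interface name and a placeholder routing rule we would only tune against measured usage:

```python
from typing import Protocol

class ChatModel(Protocol):
    """Port interface: the assistant depends on this, never on a vendor SDK."""
    def complete(self, prompt: str) -> str: ...

def pick_model(query: str, uses_tools: bool,
               cheap: ChatModel, frontier: ChatModel) -> ChatModel:
    # Hypothetical routing rule: tool-use workflows always get the frontier
    # model; the length threshold is a placeholder until usage data exists.
    return frontier if uses_tools or len(query) > 200 else cheap
```

Swapping Claude for a smaller model, or splitting traffic between them, touches this function and the adapter behind `ChatModel` — nothing else in the assistant changes.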
Ship with the best available model. Collect real usage data. Optimize when the data tells you to, not when the Hacker News comments tell you to.