Started at USD 2400/mo. Down to USD 720/mo with same throughput.
1. Aggressive prompt caching — saved 40%.
2. Cheaper model for the easy 80% of requests, expensive only for the long-tail. Saved 25%.
3. Hard cap on max output tokens (was unbounded, now 800). Saved 5%.
No quality regression. Took two days.
Discussion (0)
Full answers are for Plus members.
Plus members get the full thread + every other premium community. Create a free account first, then upgrade in one click.