The Complete Guide to Claude Cost Optimization with MoltBot and Telegram
A comprehensive guide to reducing AI costs by up to 83% through strategic model selection, prompt caching, and smart configuration
"Price is what you pay. Value is what you get." - Warren Buffett
TL;DR
The Problem: You're hitting Claude subscription rate limits or paying overage charges.
The Solution: Reduce costs by 60-83% through:
- ✅ Use Sonnet 4.5, not Opus (40% per-token savings; Opus can cost 2.7x more in practice)
- ✅ Enable prompt caching (90% savings on repeated content, automatic in MoltBot)
- ✅ Use Telegram for conversations (automatic session management maximizes cache hits)
- ✅ Enable built-in hooks (session-memory, command-logger, boot-md all save costs)
- ✅ Batch related questions (3+ messages per conversation to profit from caching)
- ✅ Use /new strategically when switching topics (saves context to memory files)
Quick Setup (30 minutes):
# Install MoltBot
curl -fsSL https://clawd.bot/install.sh | bash
clawdbot onboard --install-daemon
# Authenticate with Claude subscription
claude setup-token
clawdbot models auth paste-token --provider anthropic
# Verify caching enabled (automatic)
clawdbot models list
Expected Results: $50/month → $8.50/month on 10M tokens (83% reduction)
Read if: You want the complete setup guide, economics analysis, troubleshooting, and real-world examples.
Table of Contents
- Introduction
- Understanding Claude Models & Pricing
- Prompt Caching: The 90% Cost Reducer
- Setting Up MoltBot with Claude Code Subscription
- Telegram Integration for Seamless Caching
- Hooks & Automation for Cost Savings
- Cost Optimization Strategies
- Real-World Savings Examples
- Monitoring & Troubleshooting
- Conclusion
Introduction
If you're using Claude Code with a subscription and hitting your rate limits (or worse, paying for overage), you're probably looking for ways to reduce costs without sacrificing quality. This guide covers everything you need to know about optimizing Claude costs through:
- Model selection (Sonnet vs Opus)
- Prompt caching (90% savings on repeated content)
- MoltBot integration (personal AI assistant on Telegram)
- Smart configuration (hooks, session management, automation)
By the end, you'll have a setup that can reduce costs by 60-83% compared to unoptimized usage.
Understanding Claude Models & Pricing
Available Models (2026)
| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| Haiku 4.5 | $1/M tokens | $5/M tokens | Simple tasks, classification |
| Sonnet 4.5 | $3/M tokens | $15/M tokens | General-purpose daily work |
| Opus 4.5 | $5/M tokens | $25/M tokens | Complex reasoning, critical tasks |
The Sonnet vs Opus Decision
Key insight: Opus costs 67% more per token than Sonnet, but the real cost difference is even larger because:
- Per-token premium: Opus is $5 vs Sonnet's $3 (67% more)
- Token consumption: Opus uses ~60% MORE tokens to complete the same tasks
- Combined effect: Opus can cost 2.7x more in practice
Example calculation:
- Task requires 10M tokens with Sonnet
- Same task uses 16M tokens with Opus
- Sonnet cost: 10M × $3 = $30
- Opus cost: 16M × $5 = $80
- Actual premium: 167%
Recommendation
Use Sonnet 4.5 for 95% of tasks. Only switch to Opus when:
- Sonnet fails repeatedly on the specific task
- You need maximum reasoning for critical decisions
- Cost is secondary to accuracy
Prompt Caching: The 90% Cost Reducer
Prompt caching is Claude's most powerful cost optimization feature, allowing you to cache static content and pay just 10% of the normal price on subsequent reads.
How It Works
Claude caches portions of your prompts that don't change:
- System prompts
- Conversation history
- Tool/function definitions
- Reference documents
Pricing Structure
| Type | Cost (Sonnet 4.5) | vs Normal |
|---|---|---|
| Normal input | $3/M tokens | Baseline |
| Cache write (5 min) | $3.75/M tokens | +25% |
| Cache write (1 hour) | $6/M tokens | +100% |
| Cache read | $0.30/M tokens | -90% |
Break-Even Analysis
By the third request, caching has already paid for itself:
Request 1: Cache write = $6/M
Request 2: Cache read = $0.30/M
Request 3: Cache read = $0.30/M
Total: $6.60 vs $9 without cache = Save $2.40
By the 10th request:
Without cache: 10 × $3 = $30/M
With cache: $6 + (9 × $0.30) = $8.70/M
Savings: $21.30 (71% reduction)
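These running totals can be sanity-checked with a quick awk one-liner, using the rates from the pricing table above ($3/M normal input, $6/M one-hour cache write, $0.30/M cache read). N is the number of requests reusing the same 1M-token prefix:

```shell
#!/bin/sh
# Cumulative input cost of N requests over a shared 1M-token cached prefix.
N=10
awk -v n="$N" 'BEGIN {
  no_cache = n * 3.00               # every request pays the full input rate
  cached   = 6.00 + (n - 1) * 0.30  # first request writes the cache, the rest read
  printf "requests=%d  no-cache=$%.2f  cached=$%.2f  saved=$%.2f\n",
         n, no_cache, cached, no_cache - cached
}'
```

For N=10 this prints the same figures as above: $30.00 without caching, $8.70 with, $21.30 saved.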
Cache Duration Options
Claude offers only two cache durations:
| Duration | Write Cost | When to Use |
|---|---|---|
| 5 minutes | 1.25x | Very frequent use (every 1-5 min) |
| 1 hour | 2x | Regular use (every 5-60 min) |
No 12-hour option exists - 1 hour is the maximum.
Choosing the Right Duration
For typical Telegram usage (messages every 15-60 minutes):
- ✅ 1-hour cache captures most conversations
- ❌ 5-minute cache would expire between messages
Recommendation: Use 1-hour cache (default in MoltBot).
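To see why duration matters, compare the two options for a hypothetical 10-message conversation with 15-minute gaps over a 1M-token shared prefix, at the write/read rates in the table above:

```shell
#!/bin/sh
# A 5-minute cache expires between 15-minute-apart messages, so every request
# pays the 1.25x write rate; a 1-hour cache stays warm after one 2x write.
awk 'BEGIN {
  five_min = 10 * 3.75          # ten fresh cache writes at $3.75/M
  one_hour = 6.00 + 9 * 0.30    # one $6/M write, then nine $0.30/M reads
  printf "5-min cache:  $%.2f per M-token prefix\n", five_min
  printf "1-hour cache: $%.2f per M-token prefix\n", one_hour
}'
```

That is $37.50 versus $8.70 for the same conversation, which is why 1-hour is the right default for chat-style usage.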
Setting Up MoltBot with Claude Code Subscription
MoltBot is a personal AI assistant that bridges messaging platforms (Telegram, WhatsApp, Discord) to Claude. Here's how to set it up with your Claude Code subscription.
Prerequisites
- Claude Code subscription (Pro/Max/Code)
- Node.js ≥22
- Telegram account (for messaging integration)
Installation
# Install MoltBot globally
curl -fsSL https://clawd.bot/install.sh | bash
# Run onboarding wizard
clawdbot onboard --install-daemon
Authentication with Claude Code
You have three options:
Option 1: Setup Token (Recommended)
# Generate setup token from Claude Code CLI
claude setup-token
# Configure MoltBot
clawdbot models auth paste-token --provider anthropic
# Paste the token when prompted
Option 2: Manual OAuth Token (If setup-token fails)
If claude setup-token fails in non-interactive environments (SSH, WSL):
- Extract OAuth token from Claude Code credentials:
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
- Update MoltBot's auth profile:
nano ~/.clawdbot/agents/main/agent/auth-profiles.json
- Update the anthropic:default token field:
"anthropic:default": {
"type": "token",
"provider": "anthropic",
"token": "sk-ant-oat01-YOUR_ACCESS_TOKEN_HERE"
}
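The manual edit can also be scripted. This is a sketch, assuming the two JSON layouts shown above (Claude Code's .credentials.json and MoltBot's auth-profiles.json with anthropic:default at the top level) and a jq install; adjust the jq path if your profile nests entries differently:

```shell
#!/bin/sh
# Copy the Claude Code OAuth token into MoltBot's auth profile in one step.
# Assumes the file layouts shown above; both paths are checked before writing.
CREDS="$HOME/.claude/.credentials.json"
PROFILE="$HOME/.clawdbot/agents/main/agent/auth-profiles.json"
if [ -f "$CREDS" ] && [ -f "$PROFILE" ]; then
  TOKEN=$(jq -r '.claudeAiOauth.accessToken' "$CREDS")
  jq --arg t "$TOKEN" '."anthropic:default".token = $t' "$PROFILE" \
    > "$PROFILE.tmp" && mv "$PROFILE.tmp" "$PROFILE"
  echo "auth profile updated"
else
  echo "credentials or auth profile not found, nothing changed" >&2
fi
```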
Option 3: Re-sync Claude Code OAuth
If you're logged into Claude Code CLI on the same machine:
clawdbot models status
This syncs the token automatically.
Configuration
1. Set Default Model to Sonnet
Edit ~/.clawdbot/clawdbot.json:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": {}
}
}
}
}
2. Verify Prompt Caching is Enabled
MoltBot v2026.1.24+ has prompt caching enabled by default. Verify:
# Test with two messages in same session
clawdbot agent --local --session-id test -m "Hello!"
clawdbot agent --local --session-id test -m "What's 2+2?"
# Second message should be faster (cache hit)
Verification
# Check models
clawdbot models list
# Should show: anthropic/claude-sonnet-4-5 (default)
# Check authentication
clawdbot channels list
# Should show Auth: yes for anthropic
# Test agent
clawdbot agent --local --session-id test -m "Test message"
Telegram Integration for Seamless Caching
Telegram provides the perfect interface for MoltBot with automatic session management that maximizes cache benefits.
How Telegram Sessions Work
MoltBot automatically manages sessions for Telegram:
- Each Telegram chat = separate session
- DM conversations = one session per user
- Group chats = one session per group
- Group topics/threads = separate sessions (:topic:<threadId>)
Why This Matters for Caching
Same Telegram chat = same session = cache hits!
10:00 AM - "Help me brainstorm ideas" → Cache write: $6/M
10:15 AM - "Tell me more about idea #3" → Cache read: $0.30/M
10:45 AM - "Create a plan for that" → Cache read: $0.30/M
Within 1 hour in the same chat:
- Total: $6.60/M
- vs no cache: $9/M
- Savings: $2.40 (27%)
Setting Up Telegram Bot
- Create a bot with @BotFather
- Copy the bot token
- Configure in ~/.clawdbot/clawdbot.json:
{
"channels": {
"telegram": {
"enabled": true,
"botToken": "YOUR_BOT_TOKEN_HERE",
"dmPolicy": "pairing",
"streamMode": "partial"
}
}
}
- Start the gateway:
clawdbot gateway
- Message your bot on Telegram!
Best Practices for Cache Benefits
✅ Do This
- Keep conversations in same chat
- Continue threads instead of starting new chats
- Related topics in one conversation
- Message within 1-hour windows
- Active conversations get maximum benefit
- Cache expires after 1 hour idle
- Use /new strategically
- Save context before switching topics
- Prevents bloated conversation history
❌ Don't Do This
- Don't scatter topics across chats
- Each new chat = new session = cache miss
- Don't restart unnecessarily
- Let cache expire naturally
- Restarts invalidate all caches
- Don't switch between DM and groups
- DM = one session
- Group = different session
- No cache carryover
Example: 10-Message Conversation
Scenario: Marketing discussion over 30 minutes
Message 1: "I need marketing ideas" → Write: $6/M
Messages 2-10: Follow-up questions → Read: $0.30/M each
Total: $6 + (9 × $0.30) = $8.70/M
vs no cache: 10 × $3 = $30/M
Savings: $21.30 (71%)
Hooks & Automation for Cost Savings
MoltBot includes built-in hooks that automate cost-saving behaviors.
Available Hooks
clawdbot hooks list
Output:
✓ ready 🚀 boot-md - Run BOOT.md on gateway startup
✓ ready 📝 command-logger - Log all command events to audit file
✓ ready 💾 session-memory - Save session context on /new command
Hook #1: session-memory (Most Important)
What it does: Saves conversation context to memory files when you issue /new
Cost benefit: Prevents re-processing identical conversation history
Savings: 30-50% reduction on continuation conversations
How to use:
[Long conversation about Project X]
You: /new
Bot: [Saves context to ~/.clawdbot/agents/main/memory/2026-01-27-project-x.md]
You: "What were we discussing about Project X?"
Bot: [Recalls from memory file instead of full transcript replay]
Why it saves money:
- Without memory: Bot replays full transcript (e.g., 10K tokens)
- With memory: Bot reads summary (e.g., 500 tokens)
- Savings: 95% token reduction on context retrieval
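The arithmetic behind that claim, using the example figures above (10K-token transcript versus 500-token summary, at Sonnet's $3/M input rate):

```shell
#!/bin/sh
# Context restoration cost: full transcript replay vs memory-file summary.
awk 'BEGIN {
  rate = 3.00 / 1000000          # dollars per input token at $3/M
  transcript = 10000 * rate      # replaying the whole conversation
  summary    = 500 * rate        # reading the saved memory file
  printf "transcript replay: $%.4f\n", transcript
  printf "memory summary:    $%.4f\n", summary
  printf "reduction:         %.0f%%\n", (1 - summary / transcript) * 100
}'
```

The per-retrieval dollar amounts are tiny, but the 95% reduction compounds across every continuation conversation.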
Hook #2: command-logger
What it does: Logs all commands to ~/.clawdbot/logs/commands.log
Cost benefit: Track usage patterns to identify cost drivers
Usage:
# See most active sessions
cat ~/.clawdbot/logs/commands.log | jq '.sessionKey' | sort | uniq -c | sort -rn | head -10
# Count total commands (proxy for token usage)
wc -l ~/.clawdbot/logs/commands.log
Hook #3: boot-md
What it does: Runs initialization instructions once at gateway startup
Cost benefit: Setup logic runs once, not per-message
Configuration: Create ~/.clawdbot/BOOT.md with startup instructions
Enabling/Disabling Hooks
All cost-saving hooks are enabled by default in modern MoltBot versions.
To manually control:
# Enable a hook
clawdbot hooks enable session-memory
# Disable a hook
clawdbot hooks disable command-logger
Cost Optimization Strategies
Strategy #1: Model Selection
Rule: Use the cheapest model that works.
| Task Type | Model | Cost/M |
|---|---|---|
| Simple classification, Q&A | Haiku 4.5 | $1/$5 |
| General development, writing | Sonnet 4.5 | $3/$15 |
| Complex reasoning, critical decisions | Opus 4.5 | $5/$25 |
Decision tree:
Does Haiku work? → Use Haiku
↓ No
Does Sonnet work? → Use Sonnet
↓ No
Use Opus (and reconsider if task really needs AI)
Strategy #2: Message Batching
Instead of:
You: "What's 2+2?"
You: "What's 3+3?"
You: "What's 4+4?"
Do this:
You: "Calculate: 2+2, 3+3, and 4+4"
Savings:
- 3 API calls → 1 API call
- 3 cache writes → 1 cache write
- Better cache utilization
Strategy #3: Strategic /new Usage
Use /new to:
- Save current context to memory
- Start fresh conversation
- Avoid bloated conversation history
Best practice:
Topic 1: Marketing ideas (10 messages)
You: /new ← Save to memory
Topic 2: Code review (8 messages)
You: /new ← Save to memory
Back to Topic 1: "What were those marketing ideas?"
Bot: ← Retrieves from memory file (cheap!)
Strategy #4: Concise Prompts
- Remove unnecessary verbosity
- Be specific, not wordy
- Use system prompts to enforce brevity
Example:
❌ Verbose (200 tokens):
I would really appreciate it if you could please help me understand
the concept of machine learning in a way that's easy to understand
for someone who doesn't have a technical background...
✅ Concise (20 tokens):
Explain machine learning for non-technical audience
Strategy #5: Long Context Optimization
For conversations that need long context:
# Use 1M context window (Sonnet only, for Console/API users)
/model anthropic.claude-sonnet-4-5-20250929-v1:0[1m]
Pricing:
- ≤200K tokens: $3/M input
- >200K tokens: $6/M input
Strategy #6: Batch API (50% Discount)
For non-urgent requests (API users only):
Pricing:
- Normal: $3 input / $15 output
- Batch: $1.50 input / $7.50 output
Can combine with caching for up to 95% total savings:
Batch + 90% cache hit = $0.15 input + $7.50 output
vs normal = $3 input + $15 output
Savings: 95% on input, 50% on output
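Per the figures above, the two discounts stack multiplicatively on the input side, which a quick check confirms:

```shell
#!/bin/sh
# Batch API halves both rates; a cache read then pays 10% of the batched
# input rate. Output tokens get only the batch discount.
awk 'BEGIN {
  input = 3.00; output = 15.00
  batch_in  = input * 0.50              # $1.50/M batched input
  batch_out = output * 0.50             # $7.50/M batched output
  cached_in = batch_in * 0.10           # $0.15/M on cache hits
  printf "input:  $%.2f -> $%.2f (%.0f%% off)\n", input, cached_in, (1 - cached_in / input) * 100
  printf "output: $%.2f -> $%.2f (50%% off)\n", output, batch_out
}'
```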
Real-World Savings Examples
Example 1: Customer Support Bot
Scenario: Bot with 50K token product manual serving 1,000 queries/day
Without optimization:
50K manual × 1,000 queries = 50M tokens/day
Cost: 50M tokens × $3/M = $150/day = $4,500/month
With caching:
Day 1: Cache write = 50K × $6/M = $0.30
Days 1-30: Cache reads = 50K × 1,000 × 30 = 1.5B tokens × $0.30/M = $450
Total: $450.30/month
Savings: $4,050/month (90%)
Example 2: Personal Telegram Assistant
Scenario: 100 messages/day, averaging 5K tokens each
Without optimization (using Opus):
Input: 100 × 5K = 500K tokens/day
Cost: 500K × $5 = $2.50/day = $75/month
With optimization (Sonnet + 70% cache hits):
Cache writes: 30% × 500K × $6 = $0.90/day
Cache reads: 70% × 500K × $0.30 = $0.105/day
Total: $1.005/day = $30/month
Savings: $45/month (60%)
Example 3: Development Assistant
Scenario: 50 coding sessions/month, 20K tokens each
Without optimization:
50 sessions × 20K = 1M tokens/month
Cost: 1M × $3 = $3/month
With caching (cache write on each session's first message):
Cache writes: 50 × 20K × $6/M = $6
Cache reads (assume 5 follow-ups per session): 250 messages × 5K avg × $0.30/M = $0.375
Total: $6.375/month
Caveat: the no-cache figure above counts only the session openers; adding the same follow-ups at $3/M brings it to $6.75, so caching still wins, barely. For true one-shot sessions, though, the $6/M write rate is double the normal input price with no reads to recoup it.
Key insight: Caching benefits multi-turn conversations, not single queries.
Summary: Combined Savings
Starting point: Unoptimized Opus usage
- Opus 4.5: $5/M input
- No caching
- No batching
- Verbose prompts
10M tokens/month scenario:
| Optimization | Cost | Savings |
|---|---|---|
| Baseline (Opus, no cache) | $50 | 0% |
| Switch to Sonnet | $30 | 40% |
| Add caching (70% hit rate) | $11.10 | 78% |
| Add message batching | $9.50 | 81% |
| Concise prompts | $8.50 | 83% |
Total savings: 83% ($41.50/month on 10M tokens)
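The first three rows of the table can be reproduced directly. Note that the caching row prices the 30% misses at the normal $3/M rate; the cache-write premium on those misses would add slightly more, so treat $11.10 as a floor:

```shell
#!/bin/sh
# Monthly input cost for 10M tokens under the first three table rows.
awk 'BEGIN {
  m = 10                                       # millions of tokens per month
  opus   = m * 5.00
  sonnet = m * 3.00
  cached = m * (0.30 * 3.00 + 0.70 * 0.30)     # 70% hits read at $0.30/M
  printf "Opus baseline:  $%.2f\n", opus
  printf "Sonnet:         $%.2f (%.0f%% saved)\n", sonnet, (1 - sonnet / opus) * 100
  printf "Sonnet + cache: $%.2f (%.0f%% saved)\n", cached, (1 - cached / opus) * 100
}'
```

This reproduces $50.00, $30.00 (40% saved), and $11.10 (78% saved); the batching and concise-prompt rows are behavioral estimates rather than fixed rates.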
Monitoring & Troubleshooting
Monitoring Your Costs
Claude Code Subscription
Check usage at: https://claude.ai/settings/billing
Monitor:
- Rate limit usage (% consumed)
- Overage charges (pay-per-use)
- Quota reset times
MoltBot Usage
# Check rate limits and quotas
clawdbot channels list
# View session information
clawdbot agent --local --session-id test -m "/status"
# Check command logs
tail -50 ~/.clawdbot/logs/commands.log | jq '.'
# View memory files
ls -lh ~/.clawdbot/agents/main/memory/
Estimating Token Usage
Rule of thumb:
- Average message: ~100 tokens
- With conversation context: ~500-1000 tokens/message
- With memory retrieval: ~200-300 tokens/message
Calculator:
# Rough estimate: 1 token ≈ 4 characters
echo "Your message text" | wc -c
# Divide by 4 for token estimate
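The heuristic above can be wrapped into a small helper that also prices the estimate. estimate_cost is a hypothetical function for illustration, using the same 1 token ≈ 4 characters rule of thumb and Sonnet's $3/M input rate; real tokenizers will differ somewhat:

```shell
#!/bin/sh
# Rough token and input-cost estimate for a prompt file.
estimate_cost() {
  chars=$(wc -c < "$1")
  awk -v c="$chars" 'BEGIN {
    tokens = c / 4                             # ~4 characters per token
    printf "%d chars ≈ %d tokens ≈ $%.6f input\n", c, tokens, tokens * 3.00 / 1000000
  }'
}
# Usage: estimate_cost prompt.txt
```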
Common Issues
Issue: Cache Not Working
Symptoms: Every message feels like first message (slow, expensive)
Diagnosis:
# Check session ID is consistent
clawdbot agent --local --session-id test -m "Message 1"
clawdbot agent --local --session-id test -m "Message 2"
# Must use SAME session-id
Solutions:
- ✅ Use consistent session IDs
- ✅ Keep messages within 1 hour
- ✅ Don't change system prompts
- ✅ Restart gateway if needed:
clawdbot gateway
Issue: High Costs Despite Optimization
Diagnosis:
# Check which model is actually running
clawdbot models list
# Review command logs for patterns
cat ~/.clawdbot/logs/commands.log | jq -r '.action' | sort | uniq -c | sort -rn
Common causes:
- ❌ Using Opus instead of Sonnet
- ❌ Verbose prompts (ask for concise responses)
- ❌ High output token usage (set length limits)
- ❌ Too many one-shot queries (no cache benefit)
Solutions:
- Verify Sonnet is default model
- Add to system prompt: "Be concise. Prioritize clarity over completeness."
- Ask for bullet points instead of paragraphs
- Batch related questions
Issue: Token Expiration
Symptoms: "HTTP 401 authentication_error: Invalid bearer token"
Solution:
# Option 1: Generate new setup token
claude setup-token
clawdbot models auth paste-token --provider anthropic
# Option 2: Manual OAuth refresh
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
# Copy token and update ~/.clawdbot/agents/main/agent/auth-profiles.json
Conclusion
By combining the strategies in this guide, you can achieve 60-83% cost reduction on Claude usage:
Quick Wins (Immediate Impact)
- ✅ Use Sonnet 4.5 instead of Opus (40% base savings)
- ✅ Enable prompt caching (automatic in MoltBot)
- ✅ Use Telegram for automatic session management
- ✅ Enable all hooks (session-memory, command-logger, boot-md)
Behavioral Changes (Ongoing Savings)
- 💡 Use /new regularly when switching topics
- 💡 Batch related questions in single messages
- 💡 Keep conversations in same Telegram thread
- 💡 Write concise prompts (specific, not verbose)
Monitoring (Stay Optimized)
- 📊 Review billing weekly at claude.ai/settings/billing
- 📊 Check command logs for usage patterns
- 📊 Verify model selection with clawdbot models list
- 📊 Monitor memory files to confirm /new is working
Expected Results
With full optimization on 10M tokens/month:
| Metric | Before | After | Savings |
|---|---|---|---|
| Model | Opus 4.5 | Sonnet 4.5 | 40% |
| Caching | Disabled | 70% hit rate | 30% |
| Batching | Single queries | Combined | 5% |
| Prompts | Verbose | Concise | 5% |
| Total | $50/month | $8.50/month | 83% |
Final Thoughts
Cost optimization isn't about sacrificing quality - it's about being strategic:
- Sonnet 4.5 is excellent for 95% of tasks
- Prompt caching is automatic once configured
- Telegram provides perfect UX for cache benefits
- Small behavioral changes compound into major savings
The configuration time (30-60 minutes) pays for itself in the first month for any serious Claude user.
Resources
Documentation
- Claude Prompt Caching
- Claude API Pricing
- Claude Code Model Config
- MoltBot Documentation
- MoltBot Hooks
- MoltBot Telegram Setup
Community
- MoltBot Discord (check repo for invite)
- Claude Community
Last updated: January 26, 2026
MoltBot version: 2026.1.24-3
Claude Code version: 2.1.19