The Complete Guide to Claude Cost Optimization with MoltBot and Telegram


A comprehensive guide to reducing AI costs by up to 83% through strategic model selection, prompt caching, and smart configuration


"Price is what you pay. Value is what you get." - Warren Buffett

TL;DR

The Problem: You're hitting Claude subscription rate limits or paying overage charges.

The Solution: Reduce costs by 60-83% through:

  • Use Sonnet 4.5, not Opus (40% lower base price; in practice Opus can cost ~2.7x more)
  • Enable prompt caching (90% savings on repeated content, automatic in MoltBot)
  • Use Telegram for conversations (automatic session management maximizes cache hits)
  • Enable built-in hooks (session-memory, command-logger, boot-md all save costs)
  • Batch related questions (3+ messages per conversation to benefit from caching)
  • Use /new strategically when switching topics (saves context to memory files)

Quick Setup (30 minutes):

# Install MoltBot
curl -fsSL https://clawd.bot/install.sh | bash
clawdbot onboard --install-daemon

# Authenticate with Claude subscription
claude setup-token
clawdbot models auth paste-token --provider anthropic

# Verify caching enabled (automatic)
clawdbot models list

Expected Results: $50/month → $8.50/month on 10M tokens (83% reduction)

Read if: You want the complete setup guide, economics analysis, troubleshooting, and real-world examples.


Table of Contents

  1. Introduction
  2. Understanding Claude Models & Pricing
  3. Prompt Caching: The 90% Cost Reducer
  4. Setting Up MoltBot with Claude Code Subscription
  5. Telegram Integration for Seamless Caching
  6. Hooks & Automation for Cost Savings
  7. Cost Optimization Strategies
  8. Real-World Savings Examples
  9. Monitoring & Troubleshooting
  10. Conclusion

Introduction

If you're using Claude Code with a subscription and hitting your rate limits (or worse, paying for overage), you're probably looking for ways to reduce costs without sacrificing quality. This guide covers everything you need to know about optimizing Claude costs through:

  • Model selection (Sonnet vs Opus)
  • Prompt caching (90% savings on repeated content)
  • MoltBot integration (personal AI assistant on Telegram)
  • Smart configuration (hooks, session management, automation)

By the end, you'll have a setup that can reduce costs by 60-83% compared to unoptimized usage.


Understanding Claude Models & Pricing

Available Models (2026)

Model       Input Cost   Output Cost   Best For
Haiku 4.5   $1/M tokens  $5/M tokens   Simple tasks, classification
Sonnet 4.5  $3/M tokens  $15/M tokens  General-purpose daily work
Opus 4.5    $5/M tokens  $25/M tokens  Complex reasoning, critical tasks

The Sonnet vs Opus Decision

Key insight: Opus costs 67% more per token than Sonnet, but the real cost difference is even larger because:

  1. Per-token premium: Opus is $5 vs Sonnet's $3 (67% more)
  2. Token consumption: Opus uses ~60% MORE tokens to complete the same tasks
  3. Combined effect: Opus can cost 2.7x more in practice

Example calculation:

  • Task requires 10M tokens with Sonnet
  • Same task uses 16M tokens with Opus
  • Sonnet cost: 10M × $3 = $30
  • Opus cost: 16M × $5 = $80
  • Actual premium: 167%
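The arithmetic above can be sketched in a few lines of Python (the token counts are the guide's example figures, not measurements):

```python
# Effective cost of the same task on Sonnet vs Opus, input tokens only.

def task_cost(tokens_m: float, price_per_m: float) -> float:
    """Input-token cost in dollars: millions of tokens times $/M rate."""
    return tokens_m * price_per_m

sonnet = task_cost(10, 3.0)  # 10M tokens at $3/M
opus = task_cost(16, 5.0)    # same task, ~60% more tokens at $5/M

print(f"Sonnet:  ${sonnet:.0f}")             # $30
print(f"Opus:    ${opus:.0f}")               # $80
print(f"Premium: {opus / sonnet - 1:.0%}")   # 167%
```

The per-token premium (67%) and the extra token consumption (60%) multiply, which is why the practical gap is much larger than the price sheet suggests.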

Recommendation

Use Sonnet 4.5 for 95% of tasks. Only switch to Opus when:

  • Sonnet fails repeatedly on the specific task
  • You need maximum reasoning for critical decisions
  • Cost is secondary to accuracy

Prompt Caching: The 90% Cost Reducer

Prompt caching is Claude's most powerful cost optimization feature, allowing you to cache static content and pay just 10% of the normal price on subsequent reads.

How It Works

Claude caches portions of your prompts that don't change:

  • System prompts
  • Conversation history
  • Tool/function definitions
  • Reference documents

Pricing Structure

Type                  Cost (Sonnet 4.5)  vs Normal
Normal input          $3/M tokens        Baseline
Cache write (5 min)   $3.75/M tokens     +25%
Cache write (1 hour)  $6/M tokens        +100%
Cache read            $0.30/M tokens     -90%

Break-Even Analysis

After just 2-3 requests, caching pays for itself:

Request 1: Cache write = $6/M
Request 2: Cache read = $0.30/M
Request 3: Cache read = $0.30/M
Total: $6.60 vs $9 without cache = Save $2.40

By the 10th request:

Without cache: 10 × $3 = $30/M
With cache: $6 + (9 × $0.30) = $8.70/M
Savings: $21.30 (71% reduction)
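The cumulative cost curve can be sketched directly from the Sonnet 4.5 rates above (assuming 1M cached input tokens and the 1-hour cache write price):

```python
# Cumulative input cost over n requests, cached vs uncached.

NORMAL = 3.00        # $/M tokens, uncached input
CACHE_WRITE = 6.00   # $/M tokens, 1-hour cache write
CACHE_READ = 0.30    # $/M tokens, cache read

def cost_without_cache(n: int) -> float:
    return n * NORMAL

def cost_with_cache(n: int) -> float:
    # First request writes the cache; every later request reads it.
    return CACHE_WRITE + (n - 1) * CACHE_READ if n else 0.0

for n in (1, 2, 3, 10):
    print(n, cost_without_cache(n), round(cost_with_cache(n), 2))
# Caching pulls ahead at request 3 ($6.60 vs $9.00)
# and by request 10 saves 71% ($8.70 vs $30.00).
```

Try other values of n to see how quickly the write premium is amortized away.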

Cache Duration Options

Claude offers only two cache durations:

Duration   Write Cost  When to Use
5 minutes  1.25x       Very frequent use (every 1-5 min)
1 hour     2x          Regular use (every 5-60 min)

No 12-hour option exists - 1 hour is the maximum.

Choosing the Right Duration

For typical Telegram usage (messages every 15-60 minutes):

  • 1-hour cache captures most conversations
  • 5-minute cache would expire between messages

Recommendation: Use 1-hour cache (default in MoltBot).


Setting Up MoltBot with Claude Code Subscription

MoltBot is a personal AI assistant that bridges messaging platforms (Telegram, WhatsApp, Discord) to Claude. Here's how to set it up with your Claude Code subscription.

Prerequisites

  • Claude Code subscription (Pro/Max/Code)
  • Node.js ≥22
  • Telegram account (for messaging integration)

Installation

# Install MoltBot globally
curl -fsSL https://clawd.bot/install.sh | bash

# Run onboarding wizard
clawdbot onboard --install-daemon

Authentication with Claude Code

You have three options:

Option 1: Setup Token (Recommended)

# Generate setup token from Claude Code CLI
claude setup-token

# Configure MoltBot
clawdbot models auth paste-token --provider anthropic
# Paste the token when prompted

Option 2: Manual OAuth Token (If setup-token fails)

If claude setup-token fails in non-interactive environments (SSH, WSL):

  1. Extract OAuth token from Claude Code credentials:
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
  2. Update MoltBot's auth profile:
nano ~/.clawdbot/agents/main/agent/auth-profiles.json
  3. Update the anthropic:default token field:
"anthropic:default": {
  "type": "token",
  "provider": "anthropic",
  "token": "sk-ant-oat01-YOUR_ACCESS_TOKEN_HERE"
}

Option 3: Re-sync Claude Code OAuth

If you're logged into Claude Code CLI on the same machine:

clawdbot models status

This syncs the token automatically.

Configuration

1. Set Default Model to Sonnet

Edit ~/.clawdbot/clawdbot.json:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-5"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {}
      }
    }
  }
}

2. Verify Prompt Caching is Enabled

MoltBot v2026.1.24+ has prompt caching enabled by default. Verify:

# Test with two messages in same session
clawdbot agent --local --session-id test -m "Hello!"
clawdbot agent --local --session-id test -m "What's 2+2?"
# Second message should be faster (cache hit)

Verification

# Check models
clawdbot models list
# Should show: anthropic/claude-sonnet-4-5 (default)

# Check authentication
clawdbot channels list
# Should show Auth: yes for anthropic

# Test agent
clawdbot agent --local --session-id test -m "Test message"

Telegram Integration for Seamless Caching

Telegram provides the perfect interface for MoltBot with automatic session management that maximizes cache benefits.

How Telegram Sessions Work

MoltBot automatically manages sessions for Telegram:

  • Each Telegram chat = separate session
  • DM conversations = one session per user
  • Group chats = one session per group
  • Group topics/threads = separate sessions (:topic:<threadId>)

Why This Matters for Caching

Same Telegram chat = same session = cache hits!

10:00 AM - "Help me brainstorm ideas" → Cache write: $6/M
10:15 AM - "Tell me more about idea #3" → Cache read: $0.30/M
10:45 AM - "Create a plan for that" → Cache read: $0.30/M

Within 1 hour in the same chat:

  • Total: $6.60/M
  • vs no cache: $9/M
  • Savings: $2.40 (27%)

Setting Up Telegram Bot

  1. Create a bot with @BotFather
  2. Copy the bot token
  3. Configure in ~/.clawdbot/clawdbot.json:
{
  "channels": {
    "telegram": {
      "enabled": true,
      "botToken": "YOUR_BOT_TOKEN_HERE",
      "dmPolicy": "pairing",
      "streamMode": "partial"
    }
  }
}
  4. Start the gateway:
clawdbot gateway
  5. Message your bot on Telegram!

Best Practices for Cache Benefits

✅ Do This

  1. Keep conversations in same chat
    • Continue threads instead of starting new chats
    • Related topics in one conversation
  2. Message within 1-hour windows
    • Active conversations get maximum benefit
    • Cache expires after 1 hour idle
  3. Use /new strategically
    • Save context before switching topics
    • Prevents bloated conversation history

❌ Don't Do This

  1. Don't scatter topics across chats
    • Each new chat = new session = cache miss
  2. Don't restart unnecessarily
    • Let cache expire naturally
    • Restarts invalidate all caches
  3. Don't switch between DM and groups
    • DM = one session
    • Group = different session
    • No cache carryover

Example: 10-Message Conversation

Scenario: Marketing discussion over 30 minutes

Message 1: "I need marketing ideas" → Write: $6/M
Messages 2-10: Follow-up questions → Read: $0.30/M each

Total: $6 + (9 × $0.30) = $8.70/M
vs no cache: 10 × $3 = $30/M
Savings: $21.30 (71%)

Hooks & Automation for Cost Savings

MoltBot includes built-in hooks that automate cost-saving behaviors.

Available Hooks

clawdbot hooks list

Output:

✓ ready   🚀 boot-md        - Run BOOT.md on gateway startup
✓ ready   📝 command-logger - Log all command events to audit file
✓ ready   💾 session-memory - Save session context on /new command

Hook #1: session-memory (Most Important)

What it does: Saves conversation context to memory files when you issue /new

Cost benefit: Prevents re-processing identical conversation history

Savings: 30-50% reduction on continuation conversations

How to use:

[Long conversation about Project X]
You: /new
Bot: [Saves context to ~/.clawdbot/agents/main/memory/2026-01-27-project-x.md]
You: "What were we discussing about Project X?"
Bot: [Recalls from memory file instead of full transcript replay]

Why it saves money:

  • Without memory: Bot replays full transcript (e.g., 10K tokens)
  • With memory: Bot reads summary (e.g., 500 tokens)
  • Savings: 95% token reduction on context retrieval

Hook #2: command-logger

What it does: Logs all commands to ~/.clawdbot/logs/commands.log

Cost benefit: Track usage patterns to identify cost drivers

Usage:

# See most active sessions
cat ~/.clawdbot/logs/commands.log | jq '.sessionKey' | sort | uniq -c | sort -rn | head -10

# Count total commands (proxy for token usage)
wc -l ~/.clawdbot/logs/commands.log
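For anyone who prefers Python to jq, here is a rough equivalent of the first pipeline. It assumes the log is JSON Lines with a sessionKey field, as the jq query implies; adjust the field name if your log format differs:

```python
# Count commands per session in a JSONL log, busiest sessions first.
import json
from collections import Counter
from pathlib import Path

def top_sessions(log_path: str, n: int = 10):
    counts = Counter()
    for line in Path(log_path).expanduser().read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        counts[json.loads(line).get("sessionKey", "(unknown)")] += 1
    return counts.most_common(n)

# Example:
# for key, count in top_sessions("~/.clawdbot/logs/commands.log"):
#     print(f"{count:6d}  {key}")
```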

Hook #3: boot-md

What it does: Runs initialization instructions once at gateway startup

Cost benefit: Setup logic runs once, not per-message

Configuration: Create ~/.clawdbot/BOOT.md with startup instructions

Enabling/Disabling Hooks

All cost-saving hooks are enabled by default in modern MoltBot versions.

To manually control:

# Enable a hook
clawdbot hooks enable session-memory

# Disable a hook
clawdbot hooks disable command-logger

Cost Optimization Strategies

Strategy #1: Model Selection

Rule: Use the cheapest model that works.

Task Type                              Model       Cost/M (in/out)
Simple classification, Q&A             Haiku 4.5   $1/$5
General development, writing           Sonnet 4.5  $3/$15
Complex reasoning, critical decisions  Opus 4.5    $5/$25

Decision tree:

Does Haiku work? → Use Haiku
  ↓ No
Does Sonnet work? → Use Sonnet
  ↓ No
Use Opus (and reconsider if task really needs AI)
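For illustration only, the decision tree could be sketched as a naive router. The keyword list is an assumption for the sketch, and the Haiku/Opus model IDs are guesses modeled on the Sonnet ID shown earlier, not confirmed MoltBot identifiers:

```python
# Naive keyword router following the decision tree above (illustrative).
# Haiku/Opus IDs below are assumed by analogy; verify with `clawdbot models list`.
HAIKU = "anthropic/claude-haiku-4-5"    # cheapest tier
SONNET = "anthropic/claude-sonnet-4-5"  # sensible default
OPUS = "anthropic/claude-opus-4-5"      # last resort

def pick_model(task: str, sonnet_failed: bool = False) -> str:
    simple = ("classify", "extract", "yes/no", "summarize briefly")
    if any(k in task.lower() for k in simple):
        return HAIKU
    if sonnet_failed:  # escalate only after Sonnet has actually failed
        return OPUS
    return SONNET

print(pick_model("Classify this ticket as bug or feature"))          # haiku
print(pick_model("Refactor this module"))                            # sonnet
print(pick_model("Design the migration plan", sonnet_failed=True))   # opus
```

The point of the sketch is the escalation order, not the keywords: cheap first, escalate only on demonstrated failure.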

Strategy #2: Message Batching

Instead of:

You: "What's 2+2?"
You: "What's 3+3?"
You: "What's 4+4?"

Do this:

You: "Calculate: 2+2, 3+3, and 4+4"

Savings:

  • 3 API calls → 1 API call
  • 3 cache writes → 1 cache write
  • Better cache utilization

Strategy #3: Strategic /new Usage

Use /new to:

  • Save current context to memory
  • Start fresh conversation
  • Avoid bloated conversation history

Best practice:

Topic 1: Marketing ideas (10 messages)
You: /new  ← Save to memory

Topic 2: Code review (8 messages)
You: /new  ← Save to memory

Back to Topic 1: "What were those marketing ideas?"
Bot: ← Retrieves from memory file (cheap!)

Strategy #4: Concise Prompts

  • Remove unnecessary verbosity
  • Be specific, not wordy
  • Use system prompts to enforce brevity

Example:

Verbose (200 tokens):

I would really appreciate it if you could please help me understand
the concept of machine learning in a way that's easy to understand
for someone who doesn't have a technical background...

Concise (20 tokens):

Explain machine learning for non-technical audience

Strategy #5: Long Context Optimization

For conversations that need long context:

# Use 1M context window (Sonnet only, for Console/API users)
/model anthropic.claude-sonnet-4-5-20250929-v1:0[1m]

Pricing:

  • ≤200K tokens: $3/M input
  • >200K tokens: $6/M input

Strategy #6: Batch API (50% Discount)

For non-urgent requests (API users only):

Pricing:

  • Normal: $3 input / $15 output
  • Batch: $1.50 input / $7.50 output

Can combine with caching for up to 95% total savings:

Batch + 90% cache hit = $0.15 input + $7.50 output
vs normal = $3 input + $15 output
Savings: 95% on input, 50% on output
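Assuming the batch discount and the cache-read discount stack multiplicatively, as the figures above imply, the combined input rate works out as:

```python
# Combined batch + cache discount on Sonnet 4.5 input tokens.
INPUT = 3.00       # $/M tokens, normal input
BATCH = 0.5        # batch API halves the price
CACHE_READ = 0.1   # cache reads cost 10% of normal

batch_input = INPUT * BATCH                 # batch only
batch_cached = INPUT * BATCH * CACHE_READ   # batch + cache read

print(f"batch only:    ${batch_input:.2f}/M")    # $1.50/M
print(f"batch + cache: ${batch_cached:.2f}/M")   # $0.15/M
print(f"input savings: {1 - batch_cached / INPUT:.0%}")  # 95%
```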

Real-World Savings Examples

Example 1: Customer Support Bot

Scenario: Bot with 50K token product manual serving 1,000 queries/day

Without optimization:

50K manual × 1,000 queries = 50M tokens/day
Cost: 50M × $3 = $150/day = $4,500/month

With caching:

Day 1 cache write: 50K tokens × $6/M = $0.30
Days 1-30 cache reads: 50K × 1,000 queries × 30 days = 1.5B tokens × $0.30/M = $450
Total: $450.30/month (assuming steady traffic keeps the cache warm)
Savings: ~$4,050/month (90%)

Example 2: Personal Telegram Assistant

Scenario: 100 messages/day, averaging 5K tokens each

Without optimization (using Opus):

Input: 100 × 5K = 500K tokens/day
Cost: 500K × $5 = $2.50/day = $75/month

With optimization (Sonnet + 70% cache hits):

Cache writes: 30% × 500K = 150K tokens × $6/M = $0.90/day
Cache reads: 70% × 500K = 350K tokens × $0.30/M = $0.105/day
Total: ~$1.01/day ≈ $30/month
Savings: $45/month (60%)

Example 3: Development Assistant

Scenario: 50 coding sessions/month, 20K tokens each

Without optimization:

50 sessions × 20K = 1M tokens/month
Cost: 1M × $3 = $3/month

With optimization (cache within sessions):

New-session cache writes: 50 × 20K = 1M tokens × $6/M = $6
Cache reads (assume 5 follow-ups per session):
  250 messages × 5K avg = 1.25M tokens × $0.30/M = $0.375
Total: $6.375/month
Fair comparison: the same 2.25M tokens uncached would cost $6.75, so caching saves only $0.375 here - the write premium erases most of the gain on short sessions

Key insight: Caching benefits multi-turn conversations, not single queries.

Summary: Combined Savings

Starting point: Unoptimized Opus usage

  • Opus 4.5: $5/M input
  • No caching
  • No batching
  • Verbose prompts

10M tokens/month scenario:

Optimization                Cost    Savings
Baseline (Opus, no cache)   $50.00  0%
Switch to Sonnet            $30.00  40%
Add caching (70% hit rate)  $11.10  78%
Add message batching        $9.50   81%
Concise prompts             $8.50   83%

Total savings: 83% ($41.50/month on 10M tokens)
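As a sketch of where the first three rows come from (note: this prices cache misses at the normal input rate rather than the write premium, as the table does; the batching and concise-prompt rows are the guide's estimates, not derived):

```python
# Reproducing the first three rows of the savings table.
TOKENS_M = 10                              # 10M tokens/month
OPUS_RATE, SONNET_RATE, READ_RATE = 5.00, 3.00, 0.30  # $/M token
HIT = 0.70                                 # cache hit rate

baseline = TOKENS_M * OPUS_RATE            # $50
sonnet = TOKENS_M * SONNET_RATE            # $30
cached = TOKENS_M * ((1 - HIT) * SONNET_RATE + HIT * READ_RATE)  # $11.10

print(baseline, sonnet, round(cached, 2))
print(f"savings vs baseline: {1 - cached / baseline:.0%}")  # 78%
```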


Monitoring & Troubleshooting

Monitoring Your Costs

Claude Code Subscription

Check usage at: https://claude.ai/settings/billing

Monitor:

  • Rate limit usage (% consumed)
  • Overage charges (pay-per-use)
  • Quota reset times

MoltBot Usage

# Check rate limits and quotas
clawdbot channels list

# View session information
clawdbot agent --local --session-id test -m "/status"

# Check command logs
tail -50 ~/.clawdbot/logs/commands.log | jq '.'

# View memory files
ls -lh ~/.clawdbot/agents/main/memory/

Estimating Token Usage

Rule of thumb:

  • Average message: ~100 tokens
  • With conversation context: ~500-1000 tokens/message
  • With memory retrieval: ~200-300 tokens/message

Calculator:

# Rough estimate: 1 token ≈ 4 characters
echo "Your message text" | wc -c
# Divide by 4 for token estimate
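The same rule of thumb in Python; it is a ballpark only, since real tokenizers vary with language and content:

```python
# Rough token estimate using the ~4 characters/token rule of thumb.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

msg = "Explain machine learning for a non-technical audience"
print(estimate_tokens(msg))  # 13
```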

Common Issues

Issue: Cache Not Working

Symptoms: Every message feels like first message (slow, expensive)

Diagnosis:

# Check session ID is consistent
clawdbot agent --local --session-id test -m "Message 1"
clawdbot agent --local --session-id test -m "Message 2"
# Must use SAME session-id

Solutions:

  1. ✅ Use consistent session IDs
  2. ✅ Keep messages within 1 hour
  3. ✅ Don't change system prompts
  4. ✅ Restart gateway if needed: clawdbot gateway

Issue: High Costs Despite Optimization

Diagnosis:

# Check which model is actually running
clawdbot models list

# Review command logs for patterns
cat ~/.clawdbot/logs/commands.log | jq -r '.action' | sort | uniq -c | sort -rn

Common causes:

  1. ❌ Using Opus instead of Sonnet
  2. ❌ Verbose prompts (ask for concise responses)
  3. ❌ High output token usage (set length limits)
  4. ❌ Too many one-shot queries (no cache benefit)

Solutions:

  1. Verify Sonnet is default model
  2. Add to system prompt: "Be concise. Prioritize clarity over completeness."
  3. Ask for bullet points instead of paragraphs
  4. Batch related questions

Issue: Token Expiration

Symptoms: "HTTP 401 authentication_error: Invalid bearer token"

Solution:

# Option 1: Generate new setup token
claude setup-token
clawdbot models auth paste-token --provider anthropic

# Option 2: Manual OAuth refresh
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
# Copy token and update ~/.clawdbot/agents/main/agent/auth-profiles.json

Conclusion

By combining the strategies in this guide, you can achieve 60-83% cost reduction on Claude usage:

Quick Wins (Immediate Impact)

  1. Use Sonnet 4.5 instead of Opus (40% base savings)
  2. Enable prompt caching (automatic in MoltBot)
  3. Use Telegram for automatic session management
  4. Enable all hooks (session-memory, command-logger, boot-md)

Behavioral Changes (Ongoing Savings)

  1. 💡 Use /new regularly when switching topics
  2. 💡 Batch related questions in single messages
  3. 💡 Keep conversations in same Telegram thread
  4. 💡 Write concise prompts (specific, not verbose)

Monitoring (Stay Optimized)

  1. 📊 Review billing weekly at claude.ai/settings/billing
  2. 📊 Check command logs for usage patterns
  3. 📊 Verify model selection with clawdbot models list
  4. 📊 Monitor memory files to confirm /new working

Expected Results

With full optimization on 10M tokens/month:

Metric    Before          After         Savings
Model     Opus 4.5        Sonnet 4.5    40%
Caching   Disabled        70% hit rate  30%
Batching  Single queries  Combined      5%
Prompts   Verbose         Concise       5%
Total     $50/month       $8.50/month   83%

Final Thoughts

Cost optimization isn't about sacrificing quality - it's about being strategic:

  • Sonnet 4.5 is excellent for 95% of tasks
  • Prompt caching is automatic once configured
  • Telegram provides perfect UX for cache benefits
  • Small behavioral changes compound into major savings

The configuration time (30-60 minutes) pays for itself in the first month for any serious Claude user.



Last updated: January 26, 2026
MoltBot version: 2026.1.24-3
Claude Code version: 2.1.19