The Complete Guide to Claude Cost Optimization with MoltBot and Telegram
A comprehensive guide to reducing AI costs by up to 83% through strategic model selection, prompt caching, and smart configuration
"Price is what you pay. Value is what you get." - Warren Buffett
TL;DR
The Problem: You're hitting Claude subscription rate limits or paying overage charges.
The Solution: Reduce costs by 60-83% through:
- ✅ Use Sonnet 4.5, not Opus (40% per-token savings; Opus can cost 2.7x more in practice)
- ✅ Enable prompt caching (90% savings on repeated content, automatic in MoltBot)
- ✅ Use Telegram for conversations (automatic session management maximizes cache hits)
- ✅ Enable built-in hooks (session-memory, command-logger, boot-md all save costs)
- ✅ Batch related questions (3+ messages per conversation to profit from caching)
- ✅ Use /new strategically when switching topics (saves context to memory files)
Quick Setup (30 minutes):
# Install MoltBot
curl -fsSL https://clawd.bot/install.sh | bash
clawdbot onboard --install-daemon
# Authenticate with Claude subscription
claude setup-token
clawdbot models auth paste-token --provider anthropic
# Verify caching enabled (automatic)
clawdbot models list
Expected Results: $50/month → $8.50/month on 10M tokens (83% reduction)
Read if: You want the complete setup guide, economics analysis, troubleshooting, and real-world examples.
Table of Contents
- Introduction
- Understanding Claude Models & Pricing
- Prompt Caching: The 90% Cost Reducer
- Setting Up MoltBot with Claude Code Subscription
- Telegram Integration for Seamless Caching
- Hooks & Automation for Cost Savings
- Cost Optimization Strategies
- Real-World Savings Examples
- Monitoring & Troubleshooting
- Conclusion
Introduction
If you're using Claude Code with a subscription and hitting your rate limits (or worse, paying for overage), you're probably looking for ways to reduce costs without sacrificing quality. This guide covers everything you need to know about optimizing Claude costs through:
- Model selection (Sonnet vs Opus)
- Prompt caching (90% savings on repeated content)
- MoltBot integration (personal AI assistant on Telegram)
- Smart configuration (hooks, session management, automation)
By the end, you'll have a setup that can reduce costs by 60-83% compared to unoptimized usage.
Understanding Claude Models & Pricing
Available Models (2026)
| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| Haiku 4.5 | $1/M tokens | $5/M tokens | Simple tasks, classification |
| Sonnet 4.5 | $3/M tokens | $15/M tokens | General-purpose daily work |
| Opus 4.5 | $5/M tokens | $25/M tokens | Complex reasoning, critical tasks |
The Sonnet vs Opus Decision
Key insight: Opus costs 67% more per token than Sonnet, but the real cost difference is even larger because:
- Per-token premium: Opus is $5 vs Sonnet's $3 (67% more)
- Token consumption: Opus uses ~60% MORE tokens to complete the same tasks
- Combined effect: Opus can cost 2.7x more in practice
Example calculation:
- Task requires 10M tokens with Sonnet
- Same task uses 16M tokens with Opus
- Sonnet cost: 10M × $3 = $30
- Opus cost: 16M × $5 = $80
- Actual premium: 167%
Recommendation
Use Sonnet 4.5 for 95% of tasks. Only switch to Opus when:
- Sonnet fails repeatedly on the specific task
- You need maximum reasoning for critical decisions
- Cost is secondary to accuracy
Prompt Caching: The 90% Cost Reducer
Prompt caching is Claude's most powerful cost optimization feature, allowing you to cache static content and pay just 10% of the normal price on subsequent reads.
How It Works
Claude caches portions of your prompts that don't change:
- System prompts
- Conversation history
- Tool/function definitions
- Reference documents
Pricing Structure
| Type | Cost (Sonnet 4.5) | vs Normal |
|---|---|---|
| Normal input | $3/M tokens | Baseline |
| Cache write (5 min) | $3.75/M tokens | +25% |
| Cache write (1 hour) | $6/M tokens | +100% |
| Cache read | $0.30/M tokens | -90% |
Break-Even Analysis
By the third request, caching has already paid for itself:
Request 1: Cache write = $6/M
Request 2: Cache read = $0.30/M
Request 3: Cache read = $0.30/M
Total: $6.60 vs $9 without cache = Save $2.40
By the 10th request:
Without cache: 10 × $3 = $30/M
With cache: $6 + (9 × $0.30) = $8.70/M
Savings: $21.30 (71% reduction)
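These running totals can be sanity-checked with a quick awk one-liner, using the rates from the pricing table above ($3/M normal input, $6/M one-hour cache write, $0.30/M cache read). N is the number of requests reusing the same 1M-token prefix:

```shell
#!/bin/sh
# Cumulative input cost of N requests over a shared 1M-token cached prefix.
N=10
awk -v n="$N" 'BEGIN {
  no_cache = n * 3.00               # every request pays the full input rate
  cached   = 6.00 + (n - 1) * 0.30  # first request writes the cache, the rest read
  printf "requests=%d  no-cache=$%.2f  cached=$%.2f  saved=$%.2f\n",
         n, no_cache, cached, no_cache - cached
}'
```

For N=10 this prints the same figures as above: $30.00 without caching, $8.70 with, $21.30 saved.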
Cache Duration Options
Claude offers only two cache durations:
| Duration | Write Cost | When to Use |
|---|---|---|
| 5 minutes | 1.25x | Very frequent use (every 1-5 min) |
| 1 hour | 2x | Regular use (every 5-60 min) |
No 12-hour option exists - 1 hour is the maximum.
Choosing the Right Duration
For typical Telegram usage (messages every 15-60 minutes):
- ✅ 1-hour cache captures most conversations
- ❌ 5-minute cache would expire between messages
Recommendation: Use 1-hour cache (default in MoltBot).
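To see why duration matters, compare the two options for a hypothetical 10-message conversation with 15-minute gaps over a 1M-token shared prefix, at the write/read rates in the table above:

```shell
#!/bin/sh
# A 5-minute cache expires between 15-minute-apart messages, so every request
# pays the 1.25x write rate; a 1-hour cache stays warm after one 2x write.
awk 'BEGIN {
  five_min = 10 * 3.75          # ten fresh cache writes at $3.75/M
  one_hour = 6.00 + 9 * 0.30    # one $6/M write, then nine $0.30/M reads
  printf "5-min cache:  $%.2f per M-token prefix\n", five_min
  printf "1-hour cache: $%.2f per M-token prefix\n", one_hour
}'
```

That is $37.50 versus $8.70 for the same conversation, which is why 1-hour is the right default for chat-style usage.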
Setting Up MoltBot with Claude Code Subscription
MoltBot is a personal AI assistant that bridges messaging platforms (Telegram, WhatsApp, Discord) to Claude. Here's how to set it up with your Claude Code subscription.
Prerequisites
- Claude Code subscription (Pro/Max/Code)
- Node.js ≥22
- Telegram account (for messaging integration)
Installation
# Install MoltBot globally
curl -fsSL https://clawd.bot/install.sh | bash
# Run onboarding wizard
clawdbot onboard --install-daemon
Authentication with Claude Code
You have three options:
Option 1: Setup Token (Recommended)
# Generate setup token from Claude Code CLI
claude setup-token
# Configure MoltBot
clawdbot models auth paste-token --provider anthropic
# Paste the token when prompted
Option 2: Manual OAuth Token (If setup-token fails)
If claude setup-token fails in non-interactive environments (SSH, WSL):
- Extract OAuth token from Claude Code credentials:
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
- Update MoltBot's auth profile:
nano ~/.clawdbot/agents/main/agent/auth-profiles.json
- Update the anthropic:default token field:
"anthropic:default": {
"type": "token",
"provider": "anthropic",
"token": "sk-ant-oat01-YOUR_ACCESS_TOKEN_HERE"
}
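The manual edit can also be scripted. This is a sketch, assuming the two JSON layouts shown above (Claude Code's .credentials.json and MoltBot's auth-profiles.json with anthropic:default at the top level) and a jq install; adjust the jq path if your profile nests entries differently:

```shell
#!/bin/sh
# Copy the Claude Code OAuth token into MoltBot's auth profile in one step.
# Assumes the file layouts shown above; both paths are checked before writing.
CREDS="$HOME/.claude/.credentials.json"
PROFILE="$HOME/.clawdbot/agents/main/agent/auth-profiles.json"
if [ -f "$CREDS" ] && [ -f "$PROFILE" ]; then
  TOKEN=$(jq -r '.claudeAiOauth.accessToken' "$CREDS")
  jq --arg t "$TOKEN" '."anthropic:default".token = $t' "$PROFILE" \
    > "$PROFILE.tmp" && mv "$PROFILE.tmp" "$PROFILE"
  echo "auth profile updated"
else
  echo "credentials or auth profile not found, nothing changed" >&2
fi
```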
Option 3: Re-sync Claude Code OAuth
If you're logged into Claude Code CLI on the same machine:
clawdbot models status
This syncs the token automatically.
Configuration
1. Set Default Model to Sonnet
Edit ~/.clawdbot/clawdbot.json:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-5"
},
"models": {
"anthropic/claude-sonnet-4-5": {}
}
}
}
}
2. Verify Prompt Caching is Enabled
MoltBot v2026.1.24+ has prompt caching enabled by default. Verify:
# Test with two messages in same session
clawdbot agent --local --session-id test -m "Hello!"
clawdbot agent --local --session-id test -m "What's 2+2?"
# Second message should be faster (cache hit)
Verification
# Check models
clawdbot models list
# Should show: anthropic/claude-sonnet-4-5 (default)
# Check authentication
clawdbot channels list
# Should show Auth: yes for anthropic
# Test agent
clawdbot agent --local --session-id test -m "Test message"
Telegram Integration for Seamless Caching
Telegram provides the perfect interface for MoltBot with automatic session management that maximizes cache benefits.
How Telegram Sessions Work
MoltBot automatically manages sessions for Telegram:
- Each Telegram chat = separate session
- DM conversations = one session per user
- Group chats = one session per group
- Group topics/threads = separate sessions (:topic:<threadId>)
Why This Matters for Caching
Same Telegram chat = same session = cache hits!
10:00 AM - "Help me brainstorm ideas" → Cache write: $6/M
10:15 AM - "Tell me more about idea #3" → Cache read: $0.30/M
10:45 AM - "Create a plan for that" → Cache read: $0.30/M
Within 1 hour in the same chat:
- Total: $6.60/M
- vs no cache: $9/M
- Savings: $2.40 (27%)
Setting Up Telegram Bot
- Create a bot with @BotFather
- Copy the bot token
- Configure in ~/.clawdbot/clawdbot.json:
{
"channels": {
"telegram": {
"enabled": true,
"botToken": "YOUR_BOT_TOKEN_HERE",
"dmPolicy": "pairing",
"streamMode": "partial"
}
}
}
- Start the gateway:
clawdbot gateway
- Message your bot on Telegram!
Best Practices for Cache Benefits
✅ Do This
- Keep conversations in same chat
- Continue threads instead of starting new chats
- Related topics in one conversation
- Message within 1-hour windows
- Active conversations get maximum benefit
- Cache expires after 1 hour idle
- Use /new strategically
- Save context before switching topics
- Prevents bloated conversation history
❌ Don't Do This
- Don't scatter topics across chats
- Each new chat = new session = cache miss
- Don't restart unnecessarily
- Let cache expire naturally
- Restarts invalidate all caches
- Don't switch between DM and groups
- DM = one session
- Group = different session
- No cache carryover
Example: 10-Message Conversation
Scenario: Marketing discussion over 30 minutes
Message 1: "I need marketing ideas" → Write: $6/M
Messages 2-10: Follow-up questions → Read: $0.30/M each
Total: $6 + (9 × $0.30) = $8.70/M
vs no cache: 10 × $3 = $30/M
Savings: $21.30 (71%)
Hooks & Automation for Cost Savings
MoltBot includes built-in hooks that automate cost-saving behaviors.
Available Hooks
clawdbot hooks list
Output:
✓ ready 🚀 boot-md - Run BOOT.md on gateway startup
✓ ready 📝 command-logger - Log all command events to audit file
✓ ready 💾 session-memory - Save session context on /new command
Hook #1: session-memory (Most Important)
What it does: Saves conversation context to memory files when you issue /new
Cost benefit: Prevents re-processing identical conversation history
Savings: 30-50% reduction on continuation conversations
How to use:
[Long conversation about Project X]
You: /new
Bot: [Saves context to ~/.clawdbot/agents/main/memory/2026-01-27-project-x.md]
You: "What were we discussing about Project X?"
Bot: [Recalls from memory file instead of full transcript replay]
Why it saves money:
- Without memory: Bot replays full transcript (e.g., 10K tokens)
- With memory: Bot reads summary (e.g., 500 tokens)
- Savings: 95% token reduction on context retrieval
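The arithmetic behind that claim, using the example figures above (10K-token transcript versus 500-token summary, at Sonnet's $3/M input rate):

```shell
#!/bin/sh
# Context restoration cost: full transcript replay vs memory-file summary.
awk 'BEGIN {
  rate = 3.00 / 1000000          # dollars per input token at $3/M
  transcript = 10000 * rate      # replaying the whole conversation
  summary    = 500 * rate        # reading the saved memory file
  printf "transcript replay: $%.4f\n", transcript
  printf "memory summary:    $%.4f\n", summary
  printf "reduction:         %.0f%%\n", (1 - summary / transcript) * 100
}'
```

The per-retrieval dollar amounts are tiny, but the 95% reduction compounds across every continuation conversation.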
Hook #2: command-logger
What it does: Logs all commands to ~/.clawdbot/logs/commands.log
Cost benefit: Track usage patterns to identify cost drivers
Usage:
# See most active sessions
cat ~/.clawdbot/logs/commands.log | jq '.sessionKey' | sort | uniq -c | sort -rn | head -10
# Count total commands (proxy for token usage)
wc -l ~/.clawdbot/logs/commands.log
Hook #3: boot-md
What it does: Runs initialization instructions once at gateway startup
Cost benefit: Setup logic runs once, not per-message
Configuration: Create ~/.clawdbot/BOOT.md with startup instructions
Enabling/Disabling Hooks
All cost-saving hooks are enabled by default in modern MoltBot versions.
To manually control:
# Enable a hook
clawdbot hooks enable session-memory
# Disable a hook
clawdbot hooks disable command-logger
Cost Optimization Strategies
Strategy #1: Model Selection
Rule: Use the cheapest model that works.
| Task Type | Model | Cost/M |
|---|---|---|
| Simple classification, Q&A | Haiku 4.5 | $1/$5 |
| General development, writing | Sonnet 4.5 | $3/$15 |
| Complex reasoning, critical decisions | Opus 4.5 | $5/$25 |
Decision tree:
Does Haiku work? → Use Haiku
↓ No
Does Sonnet work? → Use Sonnet
↓ No
Use Opus (and reconsider if task really needs AI)
Strategy #2: Message Batching
Instead of:
You: "What's 2+2?"
You: "What's 3+3?"
You: "What's 4+4?"
Do this:
You: "Calculate: 2+2, 3+3, and 4+4"
Savings:
- 3 API calls → 1 API call
- 3 cache writes → 1 cache write
- Better cache utilization
Strategy #3: Strategic /new Usage
Use /new to:
- Save current context to memory
- Start fresh conversation
- Avoid bloated conversation history
Best practice:
Topic 1: Marketing ideas (10 messages)
You: /new ← Save to memory
Topic 2: Code review (8 messages)
You: /new ← Save to memory
Back to Topic 1: "What were those marketing ideas?"
Bot: ← Retrieves from memory file (cheap!)
Strategy #4: Concise Prompts
- Remove unnecessary verbosity
- Be specific, not wordy
- Use system prompts to enforce brevity
Example:
❌ Verbose (200 tokens):
I would really appreciate it if you could please help me understand
the concept of machine learning in a way that's easy to understand
for someone who doesn't have a technical background...
✅ Concise (20 tokens):
Explain machine learning for non-technical audience
Strategy #5: Long Context Optimization
For conversations that need long context:
# Use 1M context window (Sonnet only, for Console/API users)
/model anthropic.claude-sonnet-4-5-20250929-v1:0[1m]
Pricing:
- ≤200K tokens: $3/M input
- >200K tokens: $6/M input
Strategy #6: Batch API (50% Discount)
For non-urgent requests (API users only):
Pricing:
- Normal: $3 input / $15 output
- Batch: $1.50 input / $7.50 output
Can combine with caching for up to 95% total savings:
Batch + 90% cache hit = $0.15 input + $7.50 output
vs normal = $3 input + $15 output
Savings: 95% on input, 50% on output
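Per the figures above, the two discounts stack multiplicatively on the input side, which a quick check confirms:

```shell
#!/bin/sh
# Batch API halves both rates; a cache read then pays 10% of the batched
# input rate. Output tokens get only the batch discount.
awk 'BEGIN {
  input = 3.00; output = 15.00
  batch_in  = input * 0.50              # $1.50/M batched input
  batch_out = output * 0.50             # $7.50/M batched output
  cached_in = batch_in * 0.10           # $0.15/M on cache hits
  printf "input:  $%.2f -> $%.2f (%.0f%% off)\n", input, cached_in, (1 - cached_in / input) * 100
  printf "output: $%.2f -> $%.2f (50%% off)\n", output, batch_out
}'
```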
Real-World Savings Examples
Example 1: Customer Support Bot
Scenario: Bot with 50K token product manual serving 1,000 queries/day
Without optimization:
50K manual × 1,000 queries = 50M tokens/day
Cost: 50M tokens × $3/M = $150/day = $4,500/month
With caching:
Day 1: Cache write = 50K × $6/M = $0.30
Days 1-30: Cache reads = 50K × 1,000 × 30 = 1.5B tokens × $0.30/M = $450
Total: $450.30/month
Savings: $4,050/month (90%)
Example 2: Personal Telegram Assistant
Scenario: 100 messages/day, averaging 5K tokens each
Without optimization (using Opus):
Input: 100 × 5K = 500K tokens/day
Cost: 500K × $5 = $2.50/day = $75/month
With optimization (Sonnet + 70% cache hits):
Cache writes: 30% × 500K × $6 = $0.90/day
Cache reads: 70% × 500K × $0.30 = $0.105/day
Total: $1.005/day = $30/month
Savings: $45/month (60%)
Example 3: Development Assistant
Scenario: 50 coding sessions/month, 20K tokens each
Without optimization:
50 sessions × 20K = 1M tokens/month
Cost: 1M × $3 = $3/month
With caching (cache write on each session's first message):
Cache writes: 50 × 20K × $6/M = $6
Cache reads (assume 5 follow-ups per session): 250 messages × 5K avg × $0.30/M = $0.375
Total: $6.375/month
Caveat: the no-cache figure above counts only the session openers; adding the same follow-ups at $3/M brings it to $6.75, so caching still wins, barely. For true one-shot sessions, though, the $6/M write rate is double the normal input price with no reads to recoup it.
Key insight: Caching benefits multi-turn conversations, not single queries.
Summary: Combined Savings
Starting point: Unoptimized Opus usage
- Opus 4.5: $5/M input
- No caching
- No batching
- Verbose prompts
10M tokens/month scenario:
| Optimization | Cost | Savings |
|---|---|---|
| Baseline (Opus, no cache) | $50 | 0% |
| Switch to Sonnet | $30 | 40% |
| Add caching (70% hit rate) | $11.10 | 78% |
| Add message batching | $9.50 | 81% |
| Concise prompts | $8.50 | 83% |
Total savings: 83% ($41.50/month on 10M tokens)
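The first three rows of the table can be reproduced directly. Note that the caching row prices the 30% misses at the normal $3/M rate; the cache-write premium on those misses would add slightly more, so treat $11.10 as a floor:

```shell
#!/bin/sh
# Monthly input cost for 10M tokens under the first three table rows.
awk 'BEGIN {
  m = 10                                       # millions of tokens per month
  opus   = m * 5.00
  sonnet = m * 3.00
  cached = m * (0.30 * 3.00 + 0.70 * 0.30)     # 70% hits read at $0.30/M
  printf "Opus baseline:  $%.2f\n", opus
  printf "Sonnet:         $%.2f (%.0f%% saved)\n", sonnet, (1 - sonnet / opus) * 100
  printf "Sonnet + cache: $%.2f (%.0f%% saved)\n", cached, (1 - cached / opus) * 100
}'
```

This reproduces $50.00, $30.00 (40% saved), and $11.10 (78% saved); the batching and concise-prompt rows are behavioral estimates rather than fixed rates.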
Monitoring & Troubleshooting
Monitoring Your Costs
Claude Code Subscription
Check usage at: https://claude.ai/settings/billing
Monitor:
- Rate limit usage (% consumed)
- Overage charges (pay-per-use)
- Quota reset times
MoltBot Usage
# Check rate limits and quotas
clawdbot channels list
# View session information
clawdbot agent --local --session-id test -m "/status"
# Check command logs
tail -50 ~/.clawdbot/logs/commands.log | jq '.'
# View memory files
ls -lh ~/.clawdbot/agents/main/memory/
Estimating Token Usage
Rule of thumb:
- Average message: ~100 tokens
- With conversation context: ~500-1000 tokens/message
- With memory retrieval: ~200-300 tokens/message
Calculator:
# Rough estimate: 1 token ≈ 4 characters
echo "Your message text" | wc -c
# Divide by 4 for token estimate
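The heuristic above can be wrapped into a small helper that also prices the estimate. estimate_cost is a hypothetical function for illustration, using the same 1 token ≈ 4 characters rule of thumb and Sonnet's $3/M input rate; real tokenizers will differ somewhat:

```shell
#!/bin/sh
# Rough token and input-cost estimate for a prompt file.
estimate_cost() {
  chars=$(wc -c < "$1")
  awk -v c="$chars" 'BEGIN {
    tokens = c / 4                             # ~4 characters per token
    printf "%d chars ≈ %d tokens ≈ $%.6f input\n", c, tokens, tokens * 3.00 / 1000000
  }'
}
# Usage: estimate_cost prompt.txt
```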
Common Issues
Issue: Cache Not Working
Symptoms: Every message feels like first message (slow, expensive)
Diagnosis:
# Check session ID is consistent
clawdbot agent --local --session-id test -m "Message 1"
clawdbot agent --local --session-id test -m "Message 2"
# Must use SAME session-id
Solutions:
- ✅ Use consistent session IDs
- ✅ Keep messages within 1 hour
- ✅ Don't change system prompts
- ✅ Restart gateway if needed:
clawdbot gateway
Issue: High Costs Despite Optimization
Diagnosis:
# Check which model is actually running
clawdbot models list
# Review command logs for patterns
cat ~/.clawdbot/logs/commands.log | jq -r '.action' | sort | uniq -c | sort -rn
Common causes:
- ❌ Using Opus instead of Sonnet
- ❌ Verbose prompts (ask for concise responses)
- ❌ High output token usage (set length limits)
- ❌ Too many one-shot queries (no cache benefit)
Solutions:
- Verify Sonnet is default model
- Add to system prompt: "Be concise. Prioritize clarity over completeness."
- Ask for bullet points instead of paragraphs
- Batch related questions
Issue: Token Expiration
Symptoms: "HTTP 401 authentication_error: Invalid bearer token"
Solution:
# Option 1: Generate new setup token
claude setup-token
clawdbot models auth paste-token --provider anthropic
# Option 2: Manual OAuth refresh
cat ~/.claude/.credentials.json | jq -r '.claudeAiOauth.accessToken'
# Copy token and update ~/.clawdbot/agents/main/agent/auth-profiles.json
Conclusion
By combining the strategies in this guide, you can achieve 60-83% cost reduction on Claude usage:
Quick Wins (Immediate Impact)
- ✅ Use Sonnet 4.5 instead of Opus (40% base savings)
- ✅ Enable prompt caching (automatic in MoltBot)
- ✅ Use Telegram for automatic session management
- ✅ Enable all hooks (session-memory, command-logger, boot-md)
Behavioral Changes (Ongoing Savings)
- 💡 Use /new regularly when switching topics
- 💡 Batch related questions in single messages
- 💡 Keep conversations in same Telegram thread
- 💡 Write concise prompts (specific, not verbose)
Monitoring (Stay Optimized)
- 📊 Review billing weekly at claude.ai/settings/billing
- 📊 Check command logs for usage patterns
- 📊 Verify model selection with clawdbot models list
- 📊 Monitor memory files to confirm /new is working
Expected Results
With full optimization on 10M tokens/month:
| Metric | Before | After | Savings |
|---|---|---|---|
| Model | Opus 4.5 | Sonnet 4.5 | 40% |
| Caching | Disabled | 70% hit rate | 30% |
| Batching | Single queries | Combined | 5% |
| Prompts | Verbose | Concise | 5% |
| Total | $50/month | $8.50/month | 83% |
Final Thoughts
Cost optimization isn't about sacrificing quality - it's about being strategic:
- Sonnet 4.5 is excellent for 95% of tasks
- Prompt caching is automatic once configured
- Telegram provides perfect UX for cache benefits
- Small behavioral changes compound into major savings
The configuration time (30-60 minutes) pays for itself in the first month for any serious Claude user.
Resources
Documentation
- Claude Prompt Caching
- Claude API Pricing
- Claude Code Model Config
- MoltBot Documentation
- MoltBot Hooks
- MoltBot Telegram Setup
Community
- MoltBot Discord (check repo for invite)
- Claude Community
Last updated: January 26, 2026
MoltBot version: 2026.1.24-3
Claude Code version: 2.1.19