The Claude API lets you integrate Anthropic's Claude models into any application using a simple REST API or official SDKs for Python and TypeScript. You get an API key from console.anthropic.com, install the SDK with one command, and make your first AI-powered API call in under 5 minutes. This tutorial covers everything from setup to building a production-ready AI writing assistant.
All code examples use the official @anthropic-ai/sdk for TypeScript/Node.js. Python equivalents are structurally identical.
Getting Your API Key
Step 1: Create an Anthropic Account
Go to console.anthropic.com and sign up with your email or Google account. No credit card required for initial signup.
Step 2: Generate an API Key
Navigate to Settings > API Keys and click "Create Key." Give it a descriptive name (e.g., "my-ai-app-dev"). Copy the key immediately — it will not be shown again.
Step 3: Store It Securely
Add the key to your environment variables. Never commit API keys to git. Create a .env.local file in your project root.
# .env.local
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
Installing the SDK
TypeScript / Node.js
npm install @anthropic-ai/sdk
Python
pip install anthropic
The SDK automatically reads your ANTHROPIC_API_KEY environment variable. No manual configuration needed if the env var is set.
Your First API Call
Here is a minimal example that sends a message to Claude and prints the response. This is the foundation every AI app builds on.
TypeScript Example
// app/api/generate/route.ts
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
messages: [{
role: "user",
content: "Explain quantum computing in 3 sentences."
}]
});
// content blocks are a union type; narrow to "text" before accessing .text
const block = message.content[0];
if (block.type === "text") console.log(block.text);
Key Parameters:
- model — Which Claude model to use. Options: claude-haiku, claude-sonnet, claude-opus (with version dates).
- max_tokens — Maximum number of tokens in the response. 1 token is roughly 3/4 of a word.
- messages — Array of conversation messages with role ("user" or "assistant") and content.
System Prompts and Prompt Engineering
System prompts define Claude's behavior, personality, and constraints. They are the most powerful tool for controlling AI output quality. A well-written system prompt is the difference between a generic chatbot and a specialized AI assistant.
System Prompt Example
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 2048,
system: "You are a senior copywriter specializing in SaaS landing pages. Write concise, benefit-focused copy. Use short paragraphs. Never use buzzwords like 'revolutionary' or 'cutting-edge'. Always include a clear call to action.",
messages: [{
role: "user",
content: "Write a hero section for an AI email tool."
}]
});
Prompt Engineering Best Practices
- Be specific about format: If you want bullet points, say "respond in bullet points." If you want JSON, say "respond with valid JSON only."
- Define constraints: Tell Claude what NOT to do. "Do not include disclaimers. Do not start with 'Sure' or 'Of course.'"
- Give examples: Include 1-2 examples of ideal output in the system prompt. Claude follows examples better than abstract instructions.
- Set the persona: "You are a [role] with [years] of experience in [domain]" anchors Claude's responses in domain expertise.
- Use XML tags for structure: Wrap different parts of your prompt in tags like <context>, <instructions>, and <examples> for clarity.
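The XML-tag practice can be sketched as a small helper (buildPrompt and the tag names are illustrative, not part of the SDK):

```typescript
// Hypothetical helper: wraps each prompt section in an XML tag so Claude
// can tell context apart from instructions. Tag names are up to you.
function buildPrompt(sections: Record<string, string>): string {
  return Object.entries(sections)
    .map(([tag, body]) => `<${tag}>\n${body}\n</${tag}>`)
    .join("\n\n");
}

const prompt = buildPrompt({
  context: "Our product is an AI email tool for sales teams.",
  instructions: "Write a 2-sentence hero section. No buzzwords.",
});
console.log(prompt);
```

The resulting string goes into the `system` or user message exactly like the plain-text prompts in the examples above.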
Structured Output (JSON Mode)
When building applications, you often need Claude to return structured data (JSON) rather than prose. The most reliable method is to instruct Claude to return JSON in the system prompt and parse the response.
Structured Output Example
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
system: "Analyze the sentiment of the given text. Respond with ONLY valid JSON in this exact format: { \"sentiment\": \"positive\" | \"negative\" | \"neutral\", \"confidence\": 0.0-1.0, \"key_phrases\": [\"phrase1\", \"phrase2\"] }",
messages: [{
role: "user",
content: "The product exceeded my expectations. The UI is clean and the AI responses are incredibly fast."
}]
});
// content[0] is a union type; narrow to "text" before parsing
const block = message.content[0];
const result = block.type === "text" ? JSON.parse(block.text) : null;
// { sentiment: "positive", confidence: 0.95, key_phrases: ["exceeded expectations", "clean UI", "incredibly fast"] }
Pro Tip:
Prefill the assistant response to force JSON output. Add a final assistant message whose content is "{" so Claude continues from the opening brace. This eliminates any preamble text before the JSON.
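A minimal sketch of the prefill pattern (parsePrefilled is an illustrative helper, and the continuation below is simulated rather than a real API response):

```typescript
// With prefilling, Claude's reply continues from the "{" you supplied,
// so the returned text is missing the opening brace. Re-attach it
// before parsing.
function parsePrefilled(continuation: string): unknown {
  return JSON.parse("{" + continuation);
}

// The request would end with a trailing assistant turn:
//   messages: [
//     { role: "user", content: "Classify the sentiment..." },
//     { role: "assistant", content: "{" },  // prefill
//   ]

// Simulated continuation from Claude:
const continuation = ' "sentiment": "positive", "confidence": 0.95 }';
const result = parsePrefilled(continuation) as { sentiment: string };
console.log(result.sentiment); // "positive"
```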
Tool Use (Function Calling)
Tool use lets Claude call functions you define. Instead of guessing at information, Claude can look up real data, perform calculations, or trigger actions in your system. This is how you build AI agents that actually do things.
Tool Use Example: Weather Lookup
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 1024,
tools: [{
name: "get_weather",
description: "Get current weather for a city",
input_schema: {
type: "object",
properties: {
city: { type: "string" }
},
required: ["city"]
}
}],
messages: [{
role: "user",
content: "What's the weather in Tokyo?"
}]
});
// Claude responds with a tool_use block:
// { type: "tool_use", name: "get_weather", input: { city: "Tokyo" } }
// You execute the function, then send the result back
How Tool Use Works
- You define tools (functions) with names, descriptions, and parameter schemas
- Claude decides when to call a tool based on the user's request
- Claude returns a tool_use content block with the function name and arguments
- Your code executes the function and gets real data
- You send the result back to Claude in a tool_result message
- Claude incorporates the real data into its final response
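The execute-and-wrap steps can be sketched as a local dispatcher (the block types here are simplified stand-ins for the SDK's, and the weather data is canned; real code would use the SDK types and loop the tool_result back into messages.create):

```typescript
// Local stand-ins for the API's content-block shapes, so this sketch
// runs without the SDK.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

// Map of tool name -> handler. get_weather matches the example above;
// the canned response is illustrative.
const handlers: Record<string, (input: Record<string, unknown>) => string> = {
  get_weather: (input) => `Weather in ${input.city}: 18°C, clear`,
};

// Execute the requested tool and wrap the result in a tool_result
// block keyed by the tool_use id, ready to send back to Claude.
function runTool(block: ToolUseBlock): ToolResultBlock {
  const handler = handlers[block.name];
  if (!handler) throw new Error(`Unknown tool: ${block.name}`);
  return { type: "tool_result", tool_use_id: block.id, content: handler(block.input) };
}

const result = runTool({ type: "tool_use", id: "toolu_01", name: "get_weather", input: { city: "Tokyo" } });
// result is then included in the next user message's content array
```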
Streaming Responses
For user-facing applications, streaming is essential. Instead of waiting for the entire response to generate (which can take 5-15 seconds for long outputs), streaming sends tokens to the client as they are produced. The user sees text appear word by word, which feels much faster.
Streaming Example
const stream = client.messages.stream({
model: "claude-sonnet-4-20250514",
max_tokens: 2048,
messages: [{
role: "user",
content: "Write a product description for an AI writing tool."
}]
});
for await (const event of stream) {
if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
process.stdout.write(event.delta.text);
}
}
Using Streaming with Next.js API Routes
For web applications built with Next.js, return a ReadableStream from your API route and consume it on the frontend with the Vercel AI SDK's useChat hook or a manual fetch with stream reading. This gives users the ChatGPT-like typing effect.
The Vercel AI SDK (npm install ai) provides helper functions that handle streaming, parsing, and state management for React frontends.
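One way to bridge SDK stream events to a web ReadableStream is to keep only the text deltas before enqueueing them. A sketch (the StreamEvent type is a simplified stand-in for the SDK's event union, and the events array simulates a stream):

```typescript
// Simplified stand-in for the SDK's stream event union.
type StreamEvent =
  | { type: "content_block_delta"; delta: { type: "text_delta"; text: string } }
  | { type: "message_stop" };

// Pull only the text deltas out of a stream of events; in a route
// handler each chunk would be enqueued into a ReadableStream instead
// of concatenated.
function extractText(events: StreamEvent[]): string {
  return events
    .filter((e): e is Extract<StreamEvent, { type: "content_block_delta" }> => e.type === "content_block_delta")
    .map((e) => e.delta.text)
    .join("");
}

const events: StreamEvent[] = [
  { type: "content_block_delta", delta: { type: "text_delta", text: "Hello, " } },
  { type: "content_block_delta", delta: { type: "text_delta", text: "world." } },
  { type: "message_stop" },
];
console.log(extractText(events)); // "Hello, world."
```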
Cost Optimization
AI API costs can grow quickly if you are not strategic. Here are the most impactful optimizations, ranked by how much they save.
Prompt Caching
Cache your system prompt and any static context (documents, instructions, examples). Cached tokens cost 10% of regular input tokens. If your system prompt is 2,000 tokens and you make 1,000 requests/day, caching saves ~$5/day on Sonnet.
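That ~$5/day figure is easy to verify with a back-of-envelope sketch (it ignores the one-time cache-write surcharge, and assumes Sonnet's $3 per million input tokens):

```typescript
// Daily savings from caching: cached reads cost 10% of the normal
// input-token price, so you save 90% on every cached token.
function dailyCacheSavings(
  cachedTokens: number,      // tokens cached per request (e.g. system prompt)
  requestsPerDay: number,
  inputPricePerMTok: number, // e.g. 3 (dollars) for Sonnet
): number {
  const fullCost = (cachedTokens * requestsPerDay / 1_000_000) * inputPricePerMTok;
  return fullCost * 0.9; // 90% discount on cache reads
}

console.log(dailyCacheSavings(2000, 1000, 3)); // 5.4 -> roughly $5/day
```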
Model Routing
Use Haiku for classification, extraction, and simple tasks. Use Sonnet for generation and analysis. Reserve Opus for complex reasoning only. Most apps can route 70% of requests to Haiku, 25% to Sonnet, and 5% to Opus.
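As a sketch, routing can be a simple lookup from task type to model. The task categories and undated model names below are illustrative placeholders, not SDK constants:

```typescript
// Pick the cheapest model that can handle the task.
type Task = "classification" | "extraction" | "generation" | "analysis" | "complex_reasoning";

function pickModel(task: Task): string {
  switch (task) {
    case "classification":
    case "extraction":
      return "claude-haiku";   // cheap and fast: the high-volume 70%
    case "generation":
    case "analysis":
      return "claude-sonnet";  // balanced default: the middle 25%
    case "complex_reasoning":
      return "claude-opus";    // reserve for the hard 5%
  }
}
```

In production you would substitute full, dated model IDs and route based on your own request metadata.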
Limit Output Tokens
Set max_tokens to the minimum needed. If you need a one-sentence summary, set max_tokens to 100, not 4096. Output tokens cost 3-5x more than input tokens.
Trim Conversation History
For multi-turn conversations, do not send the entire history every time. Keep the last 5-10 messages and summarize older context into a concise system prompt addition. Long conversation histories multiply input costs on every request.
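A minimal sketch of the trimming step (the summarization of older turns into the system prompt is left out):

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Keep only the most recent messages; older turns would be summarized
// separately and folded into the system prompt.
function trimHistory(history: ChatMessage[], keep = 10): ChatMessage[] {
  return history.slice(-keep);
}

// A 30-turn conversation trimmed to its last 10 messages:
const history: ChatMessage[] = Array.from({ length: 30 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: `message ${i}`,
}));
console.log(trimHistory(history).length); // 10
```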
Building a Real Feature: AI Writing Assistant
Let's put it all together. Here is the architecture for a production-ready AI writing assistant that uses system prompts, streaming, and structured output.
Architecture Overview
- Frontend: React form with textarea, tone selector, and streaming output display
- API Route: Next.js route handler that validates input and calls Claude
- Claude API: Sonnet with system prompt defining writing style and constraints
- Database: Supabase stores user generations and preferences
The System Prompt for a Writing Assistant
const systemPrompt = `You are an expert copywriter.
The user will provide a topic and a tone.
Write compelling, clear content.
Rules:
- Use short paragraphs (2-3 sentences max)
- No buzzwords or filler phrases
- Lead with the most important point
- Match the requested tone exactly
- Include a call to action at the end
Available tones: professional, casual,
persuasive, educational, humorous`;
What You Can Build From Here:
- Add tone selection: Let users choose from professional, casual, persuasive, etc. Pass the selection in the user message.
- Add generation history: Store outputs in Supabase so users can revisit and reuse previous generations.
- Add editing: Let users highlight a section and ask Claude to rewrite just that part (pass the full text as context).
- Add templates: Pre-built system prompts for specific use cases (blog posts, emails, social media, product descriptions).
- Add export: Copy to clipboard, download as Markdown, or send directly to a CMS via API.
Frequently Asked Questions
How much does the Claude API cost?
Claude API pricing varies by model. Claude Haiku (fastest, cheapest) costs $0.25 per million input tokens and $1.25 per million output tokens. Claude Sonnet (balanced) costs $3 per million input tokens and $15 per million output tokens. Claude Opus (most capable) costs $15 per million input tokens and $75 per million output tokens. For reference, 1 million tokens is roughly 750,000 words. Most applications can run on Sonnet for under $10/month during development.
What is the difference between Claude Haiku, Sonnet, and Opus?
Haiku is the fastest and cheapest model, best for simple classification, extraction, and high-volume tasks where speed matters more than depth. Sonnet is the balanced model, ideal for most production applications including content generation, analysis, coding assistance, and customer support. Opus is the most capable model, best for complex reasoning, research, advanced coding, and tasks requiring deep understanding. Start with Sonnet for development, use Haiku for simple tasks in production, and reserve Opus for complex workflows.
What is the Claude API rate limit?
Rate limits depend on your usage tier. Tier 1 (new accounts, $5 credit purchase) allows 50 requests per minute and 40,000 tokens per minute. Tier 2 ($40+ spend) increases to 1,000 requests per minute. Tier 3 ($200+ spend) allows 2,000 requests per minute. Tier 4 ($400+ spend) allows 4,000 requests per minute. You can request limit increases by contacting Anthropic sales. For production apps, implement retry logic with exponential backoff to handle rate limit errors gracefully.
Can I use the Claude API for free?
Anthropic provides $5 in free credits when you create a new API account, which is enough for thousands of Haiku calls or hundreds of Sonnet calls. After credits are used, you need to add a payment method. There is no permanent free tier for the API, but the initial credits are enough to build and test an MVP. For ongoing development, expect to spend $5-20/month depending on usage volume.
How do I handle errors and retries with the Claude API?
The Anthropic SDK includes automatic retry logic for transient errors (429 rate limits, 500 server errors, 529 overloaded). You can configure max_retries when initializing the client. For 429 errors, the SDK reads the retry-after header and waits automatically. For production apps, also implement: request timeouts (set via the SDK timeout parameter), graceful fallback messages when the API is unavailable, and logging for failed requests so you can monitor reliability.
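If you wrap the API yourself (for instance around a fetch-based client), the standard backoff schedule can be sketched like this. The base delay, cap, and full-jitter strategy are arbitrary choices for the sketch, not SDK behavior:

```typescript
// Exponential backoff with full jitter: the delay window doubles each
// attempt, is capped, and a random component avoids thundering herds.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const window = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * window;
}

// When using the SDK directly, prefer its built-in retries instead,
// e.g. new Anthropic({ maxRetries: 4 }) at client construction.
```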
What is prompt caching and how do I use it?
Prompt caching lets you cache parts of your prompt (like system instructions or large documents) so they are not re-processed on every request. Cached tokens cost 90% less than regular input tokens. To use it, add cache_control markers to message blocks you want cached. The cache lasts 5 minutes and is automatically refreshed on each use. This is especially valuable when your system prompt is long or when multiple users query against the same document.
Can I fine-tune Claude models?
As of April 2026, Anthropic does not offer public fine-tuning for Claude models. Instead, you achieve customization through: (1) detailed system prompts that define behavior, tone, and constraints, (2) few-shot examples in your prompt showing the desired input/output format, (3) retrieval-augmented generation (RAG) to give Claude access to your specific data, and (4) tool use to let Claude call your APIs and databases. These approaches cover most use cases without fine-tuning.