
Getting started with the Claude API in Python

# Introduction

You want to add Claude to a Python application. Creating an account and making your first API call is straightforward, and the official documentation can get you from zero to a working application in minutes. The questions that come up next are often the more useful ones:

  • What does the response object contain?
  • How do you stream responses so that users can see the output as it is generated?
  • How do you structure prompts and handle responses in a production application?

The Claude Python SDK takes care of most API interactions. It provides typed response objects, built-in retries, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you will have a working foundation.

# Requirements and installation

You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. Adding $5 in credits is enough to work through everything in this article.

With those in place, install the SDK:

pip install anthropic

Never hard-code your API key into source files. Save it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

Or add it to a .env file in the root of the project if you use python-dotenv. The SDK reads ANTHROPIC_API_KEY from your environment automatically, so you don't need to pass it anywhere in your code.
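As a minimal sketch of the advice above, the snippet below checks for the key at startup so that a missing variable fails with a clear message instead of surfacing later as an authentication error. The helper name require_api_key is hypothetical and not part of the SDK; it only reads the environment.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this article; it is not an SDK function.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or add it to your .env file."
        )
    return key
```

If you rely on python-dotenv, call load_dotenv() before this check so the values from the .env file are present in the environment.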

# Making your first API call

The entry point for all interactions is client.messages.create(). Let's ask Claude to explain what a context window is, something you'll really need to understand as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and a "content" key.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

print(response.content[0].text)

The model field takes a model ID string. max_tokens is a hard ceiling on how many output tokens Claude will generate; the response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.
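The shape rules described above can be checked locally before a request is sent. This is a rough sketch that encodes only what this article states (dicts with "role" and "content" keys, first turn from the user); check_messages is a hypothetical helper, not an SDK function, and the API itself may enforce more than this.

```python
def check_messages(messages):
    """Validate the message shape described in this article.

    Hypothetical helper; it covers only the rules stated in the text.
    """
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            raise ValueError(f"message {i} needs 'role' and 'content' keys")
    if messages[0]["role"] != "user":
        raise ValueError("the first message must have role 'user'")
    return messages


# A valid single-turn list passes through unchanged.
check_messages([{"role": "user", "content": "In one sentence, what is a context window?"}])
```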

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.

# Understanding the Response Object

The return value of messages.create() is a typed Message object. It is worth inspecting the full structure before building anything on it.

Replace the print line in the previous example with:

print(response)

Running it prints the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type="message",
  role="assistant",
  content=[TextBlock(text="A context window is...", type="text")],
  model="claude-sonnet-5",
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

A few fields here are more important than they first appear. stop_reason tells you why Claude stopped generating. end_turn means that Claude finished on its own terms. If you see max_tokens, the response was cut off by your limit, and you may need to raise it or rethink the prompt.

The usage field reports both the input and output tokens of the request. This is how Anthropic calculates billing, and it's how you notice when a conversation is creeping too close to the model's context limit. content is a list; in standard text responses it has a single item, a TextBlock, so response.content[0].text is the idiomatic way to extract the text.
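To make these fields concrete, here is a small sketch that pulls the text, a truncation flag, and a token total out of a response. The summarize_response helper is hypothetical, and the SimpleNamespace object is only a stand-in that mirrors the fields shown above so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return the text plus a few diagnostics from a Message-like object."""
    # Join every text block instead of assuming there is exactly one.
    text = "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
    return {
        "text": text,
        # True when the reply was cut off by the max_tokens ceiling.
        "truncated": response.stop_reason == "max_tokens",
        "total_tokens": response.usage.input_tokens + response.usage.output_tokens,
    }


# Stand-in for a real response, using the values from the example above.
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)
summary = summarize_response(fake)
print(summary["truncated"], summary["total_tokens"])  # False 61
```

With a real call, you would pass the object returned by client.messages.create() in place of the stand-in.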

# Using System Prompts

System prompts let you give Claude a persistent role, set constraints, or provide context that applies throughout the conversation. You pass one as the top-level system parameter, separate from the messages list, not as a message itself.

Here we set Claude up as a code reviewer that responds only with Python code and skips generic explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

The system prompt sits above the conversation in Claude's context. It carries the same authority across all turns, so the persona instructions, formatting rules, and domain restrictions you set here carry over without you having to repeat them in every message.
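One way to keep the system prompt in a single place is to build the request arguments with a small function and reuse it for every call. This is only a sketch: build_request and REVIEWER_SYSTEM are hypothetical names, and the model ID and token limit are copied from the example above.

```python
REVIEWER_SYSTEM = (
    "You are a Python code reviewer. "
    "Respond only with corrected or improved Python code. "
    "Do not explain changes unless the user explicitly asks."
)


def build_request(user_text, system=REVIEWER_SYSTEM,
                  model="claude-sonnet-5", max_tokens=512):
    """Return keyword arguments for client.messages.create().

    The system prompt is a top-level field; it never appears in messages.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user_text}],
    }


request = build_request("def add(a, b): return a - b")
print(sorted(request))  # ['max_tokens', 'messages', 'model', 'system']
```

You would then call client.messages.create(**request), changing only the user text between requests.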

# Streaming Responses

For requests where Claude may take a few seconds to respond, streaming lets you display the text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields each piece of text in real time. Each chunk is a string fragment, not a full sentence. Pass end="" and flush=True to print() so the output appears continuously rather than buffered:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

The context manager ensures that the HTTP connection is closed cleanly when the block exits, even if an exception is raised during the stream. If you need a complete Message object after streaming — including token usage statistics — call stream.get_final_message() before closing the block.
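The pattern described above, printing chunks and then fetching the final message, might be wrapped up as follows. This is a sketch: stream_to_stdout is a hypothetical helper that expects the stream object from client.messages.stream() and has to be called inside the with block.

```python
def stream_to_stdout(stream):
    """Print chunks as they arrive, then return (full_text, final_message).

    Hypothetical helper; `stream` is the object bound by
    `with client.messages.stream(...) as stream`.
    """
    parts = []
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # newline after the stream ends
    # Fetch the complete Message while the stream is still open.
    return "".join(parts), stream.get_final_message()
```

Inside the with block you could then write text, message = stream_to_stdout(stream) and read message.usage afterwards.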

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory — typically 1.125x the current
size — copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.

# Next Steps

You now have the essential building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn about error handling, token usage, and multi-turn conversations. Because the API is stateless, you need to send the conversation history with each request. The SDK documentation shows the recommended method.
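As a rough illustration of what sending the history with each request means, the sketch below keeps a list of turns and appends to it on every exchange. It is one possible approach, not necessarily the one the SDK documentation recommends; add_turn and the send callable are hypothetical.

```python
def add_turn(history, user_text, send):
    """Append a user turn, send the full history, and record the reply.

    `send` is a hypothetical callable that takes the messages list and
    returns the assistant's reply as a string.
    """
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    return reply


# Stand-in for a real API call, so the sketch runs offline.
def fake_send(messages):
    return f"reply to turn {len(messages)}"


history = []
add_turn(history, "What is a context window?", fake_send)
add_turn(history, "How large is it?", fake_send)
print([m["role"] for m in history])  # ['user', 'assistant', 'user', 'assistant']
```

With the real client, send could wrap client.messages.create(...) and return response.content[0].text.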

The API reference includes features such as structured output and tool usage. Enjoy exploring!

Bala Priya C is an engineer and technical writer from India. She loves working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the engineering community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
