
Collecting real-time data with an API: A Hands-on Guide using Python

Photo by the Author

Getting started

The ability to gather high-quality, relevant data is still a fundamental skill for any data professional. While there are several ways to collect data, one of the most powerful and reliable methods is using APIs (Application Programming Interfaces). They act as bridges, allowing different software programs to communicate and share data seamlessly.

In this article, we'll break down the basics of using APIs for data collection – why they're important, how they work, and how to get started with them in Python.

What is an API?

An API (Application Programming Interface) is a set of rules and protocols that allows different software programs to communicate and exchange information efficiently.
Think of it like eating at a restaurant. Instead of talking directly to the chef, you place your order with the waiter. The waiter checks that the ingredients are available, relays the request to the kitchen, and brings your food back when it's ready.
An API works the same way: it receives your request for data, checks whether that data exists, and returns it if it does. It acts as a messenger between you and the data source.
When using an API, interactions typically include the following elements:

  • Client: The application or program that sends a request for data or functionality
  • Request: The structured message the client sends to the server, specifying what data or action it needs
  • Server: The program that processes the request and provides the requested information or performs an action
  • Response: The data or result the server sends back in a structured format, usually JSON or XML

Photo by the Author

This connection allows applications to share information or functionality, enabling tasks such as downloading data from a database or communicating with third-party services.
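To make the request/response cycle concrete, here is a minimal sketch that uses only the Python standard library. It assumes a hypothetical endpoint (api.example.com) and a hard-coded JSON string standing in for the server's answer, so nothing is actually sent over the network; it only illustrates the roles of the client, the request, and the response.

```python
import json
from urllib.parse import urlencode

# Client side: build a structured request (endpoint + query parameters).
# The endpoint is a hypothetical placeholder, not a real service.
endpoint = "https://api.example.com/users"
query = {"country": "us", "limit": 2}
request_url = f"{endpoint}?{urlencode(query)}"
print(request_url)  # https://api.example.com/users?country=us&limit=2

# Server side (simulated): the response body arrives as JSON text.
response_body = '{"results": [{"name": "Ana"}, {"name": "Ben"}]}'

# Client side again: parse the structured response into Python objects.
payload = json.loads(response_body)
names = [user["name"] for user in payload["results"]]
print(names)  # ['Ana', 'Ben']
```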

Why use APIs for data collection?

APIs offer several advantages for data collection:

  • Efficiency: They provide direct access to data, eliminating the need for manual data collection
  • Real-time access: APIs often deliver real-time information, which is important for time-sensitive analysis
  • Automation: Enables automated data retrieval processes, reducing human intervention and potential errors
  • Scalability: APIs can handle large volumes of requests, making them suitable for extensive data collection operations

Implementing API Calls in Python

Making a basic API call in Python is one of the easiest and most useful ways to get started with data collection. The popular requests library makes it easy to send HTTP requests and handle the responses.
To show how it works, we will use the Random User API, a free service that provides dummy user data in JSON format, perfect for testing and learning.
Here's a step-by-step guide to making your first API call in Python.

// Installing the libraries:

pip install requests pandas

// Importing the required libraries:

import requests
import pandas as pd

// Checking the documentation page:

Before making any requests, it is important to understand how the API works. This includes reviewing the available endpoints, parameters, and response structure. Start by visiting the Random User API documentation.

// Defining the API endpoint and parameters:

Based on the documentation, we can build a simple request. In this example, we fetch user data limited to users from the United States:

url="
params = {'nat': 'us'}
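When you pass params to requests.get(), the library URL-encodes the dictionary and appends it to the URL as a query string. The standard library's urllib.parse.urlencode performs the same kind of encoding, so it is a handy way to preview what will be sent. In this sketch the base URL is a hypothetical placeholder, and the extra "page" key is hypothetical too, included only to show how several parameters are joined.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; use the endpoint from the API documentation.
base_url = "https://api.example.com/"

# The filter from the text: only users with nationality "us".
params = {"nat": "us"}
query_string = urlencode(params)
full_url = f"{base_url}?{query_string}"
print(full_url)  # https://api.example.com/?nat=us

# Several parameters are joined with "&" (the "page" key is hypothetical).
more_params = {"nat": "us", "page": 3}
print(urlencode(more_params))  # nat=us&page=3
```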

// Making a GET request:

Use requests.get() with the URL and parameters:

response = requests.get(url, params=params)

// Handling the response:

Check whether the request was successful, then process the data:

if response.status_code == 200:
    data = response.json()
    # Process the data as needed
else:
    print(f"Error: {response.status_code}")
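HTTP status codes follow a convention: 2xx means success, 4xx points to a problem with the request, and 5xx points to a problem on the server. The helper below is a hypothetical sketch (it is not part of requests) that applies this convention to a status code and a raw body, so the logic can be tested without a network connection. With a real call you would pass response.status_code and response.text; requests also offers response.raise_for_status() as a built-in way to turn 4xx/5xx codes into exceptions.

```python
import json


def parse_api_response(status_code: int, body: str) -> dict:
    """Return the decoded JSON body, or raise a descriptive error.

    Hypothetical helper: it takes the status code and body text separately
    so it can be exercised without making a real HTTP call.
    """
    if 200 <= status_code < 300:
        return json.loads(body)
    if 400 <= status_code < 500:
        raise ValueError(f"Client error {status_code}: check the URL and parameters")
    if 500 <= status_code < 600:
        raise RuntimeError(f"Server error {status_code}: try again later")
    raise RuntimeError(f"Unexpected status code {status_code}")


# Usage with made-up inputs
print(parse_api_response(200, '{"results": []}'))  # {'results': []}
try:
    parse_api_response(404, "")
except ValueError as err:
    print(err)  # Client error 404: check the URL and parameters
```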

// Converting the data into a DataFrame:

To work with the data more easily, we can convert it into a pandas DataFrame:

data = response.json()
df = pd.json_normalize(data["results"])
df
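By default, pd.json_normalize() flattens nested JSON objects into columns whose names join the nested keys with a dot (for example, name.first). The plain-Python sketch below reproduces that idea for nested dictionaries, which shows where the DataFrame's column names come from. It is a simplified illustration, not pandas' implementation, and the sample record is made up, only loosely imitating one entry of data["results"].

```python
def flatten(record: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts, joining keys with `sep`; non-dict values are kept as-is."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# A made-up record, loosely shaped like one entry of data["results"]
sample_user = {
    "gender": "female",
    "name": {"first": "Jane", "last": "Doe"},
    "location": {"city": "Austin", "coordinates": {"latitude": "30.2", "longitude": "-97.7"}},
}

print(flatten(sample_user))
# {'gender': 'female', 'name.first': 'Jane', 'name.last': 'Doe', 'location.city': 'Austin',
#  'location.coordinates.latitude': '30.2', 'location.coordinates.longitude': '-97.7'}
```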

Now, let's move on to a real-world case.

Working with Eurostat API

Eurostat is the statistical office of the European Union. It provides high-quality, integrated statistics on topics such as the economy, population, the environment, industry, and tourism, covering all EU member states.

Through its API, Eurostat provides public access to its datasets in a machine-readable format, making it a valuable resource for data professionals, researchers, and developers interested in analyzing European-level data.

// Step 0: Understanding API details:

If you explore the Eurostat data section, you will find a navigation tree. Data of interest can be found in the following sections:

  • Detailed datasets: Complete Eurostat data in multi-dimensional format
  • Selected datasets: Simplified datasets with fewer indicators, in 2-3 dimensions
  • EU policies: Data organized by specific EU policy areas
  • Cross-cutting: Thematic data compiled from multiple sources

// Step 1: Checking the documentation:

Always start with the documentation. The Eurostat API guide describes the structure of the API, the available endpoints, and how to build valid requests.

The URL of the Eurostat API base

// Step 2: Making the first API call:

To make an API request in Python, the first step is to install and import the requests library, which we already covered in the previous example. After that, we can make a request using the demo dataset from the Eurostat documentation.

# We import the requests library
import requests

# Define the URL endpoint -> We use the demo URL in the Eurostat API documentation.
url = "

# Make the GET request
response = requests.get(url)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response

Pro tip: For simplicity, we can split the URL into a base URL and a dictionary of parameters that specifies what data we request from the API.

# We import the requests library
import requests

# Define the URL endpoint -> We use the demo URL in the Eurostat API documentation.
url = "

# Define the parameters -> We define the parameters to add in the URL.
params = {
    'lang': 'EN'  # Specify the language as English
}

# Make the GET request
response = requests.get(url, params=params)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response
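Splitting the request this way lets requests.get() combine the base URL and the params dictionary for you. The sketch below previews the resulting URL with urllib.parse and also appends a dataset code to the base URL, the pattern used from Step 3 onward. The base URL is a hypothetical placeholder, not the real Eurostat address, and build_request_url is a hypothetical helper; substitute the base URL given in the Eurostat API guide.

```python
from urllib.parse import urlencode

# Hypothetical placeholder for the base URL given in the Eurostat API guide.
BASE_URL = "https://api.example.org/statistics/data/"


def build_request_url(base_url: str, dataset: str, params: dict) -> str:
    """Append the dataset code to the base URL, then add the query string."""
    # Ensure exactly one slash between the base URL and the dataset code.
    url = base_url.rstrip("/") + "/" + dataset
    return f"{url}?{urlencode(params)}" if params else url


print(build_request_url(BASE_URL, "TOUR_OCC_ARN2", {"lang": "EN"}))
# https://api.example.org/statistics/data/TOUR_OCC_ARN2?lang=EN
```

Normalizing the slash guards against an easy-to-miss bug: plain base_url + dataset silently produces a broken URL if the base URL lacks its trailing /.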

// Step 3: Choosing which dataset to query:

Instead of using the demo dataset, you can choose any dataset from the Eurostat database. For example, let's query the dataset TOUR_OCC_ARN2, which contains tourism data.

# We import the requests library
import requests

# Define the base URL (from the Eurostat API documentation) and the dataset code.
base_url = "
dataset = "TOUR_OCC_ARN2"

url = base_url + dataset
# Define the parameters -> We define the parameters to add in the URL.
params = {
    'lang': 'EN'  # Specify the language as English
}

# Make the GET request -> we generate the request and obtain the response
response = requests.get(url, params=params)

# Print the status code and response data
print(f"Status Code: {response.status_code}")
print(response.json())  # Print the JSON response

// Step 4: Understanding the response

The Eurostat API returns data in JSON-stat format, a widely used format for exchanging statistical data. You can save the response to a file and inspect its structure:

import requests
import json

# Define the URL endpoint and dataset
base_url = "
dataset = "TOUR_OCC_ARN2"

url = base_url + dataset

# Define the parameters to add in the URL
params = {
    'lang': 'EN',  # Specify the language as English
    'time': 2019   # Only request data for the year 2019
}

# Make the GET request and obtain the response
response = requests.get(url, params=params)

# Check the status code and handle the response
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()

    # Generate a JSON file and write the response data into it
    with open("eurostat_response.json", "w") as json_file:
        json.dump(data, json_file, indent=4)  # Save JSON with pretty formatting

    print("JSON file 'eurostat_response.json' has been successfully created.")
else:
    print(f"Error: Received status code {response.status_code} from the API.")
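In JSON-stat, the observations live in a single "value" object, while "id" lists the dimension names in order and "size" gives the number of categories in each dimension. The hand-made skeleton below has that shape with invented dimensions and numbers; it is not real Eurostat output, but the same keys can be inspected in the saved eurostat_response.json.

```python
import json
import math

# Invented JSON-stat-style dataset (not real Eurostat output): geo (2) x time (2).
sample = json.loads("""
{
  "class": "dataset",
  "id": ["geo", "time"],
  "size": [2, 2],
  "dimension": {
    "geo": {"category": {"index": {"ES": 0, "FR": 1},
                         "label": {"ES": "Spain", "FR": "France"}}},
    "time": {"category": {"index": {"2018": 0, "2019": 1},
                          "label": {"2018": "2018", "2019": "2019"}}}
  },
  "value": {"0": 10.5, "1": 11.0, "3": 12.25}
}
""")

# "id" gives the dimension order and "size" the number of categories in each.
for dim, size in zip(sample["id"], sample["size"]):
    labels = sample["dimension"][dim]["category"]["label"]
    print(dim, size, list(labels.values()))
# geo 2 ['Spain', 'France']
# time 2 ['2018', '2019']

# "value" maps flat positions (as strings) to observations; missing cells can be absent.
total_cells = math.prod(sample["size"])
print(f"{len(sample['value'])} of {total_cells} cells have values")  # 3 of 4 cells have values
```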

// Step 5: Converting the response into actionable data:

Now that we have the data, we can save it in tabular form (CSV) to make the analysis easier.

import requests
import pandas as pd

# Step 1: Make the GET request to the Eurostat API
base_url = "
dataset = "TOUR_OCC_ARN2"  # Tourist accommodation statistics dataset
url = base_url + dataset
params = {'lang': 'EN'}  # Request data in English

# Make the API request
response = requests.get(url, params=params)

# Step 2: Check if the request was successful
if response.status_code == 200:
    data = response.json()

    # Step 3: Extract the dimensions and metadata
    dimensions = data['dimension']
    dimension_order = data['id']  # ['geo', 'time', 'unit', 'indic', etc.]

    # Extract labels for each dimension dynamically
    dimension_labels = {dim: dimensions[dim]['category']['label'] for dim in dimension_order}

    # Step 4: Determine the size of each dimension
    dimension_sizes = {dim: len(dimensions[dim]['category']['index']) for dim in dimension_order}

    # Step 5: Map each position to its category code for every dimension
    # 'category.index' maps code -> position, so sort the codes by that position
    # instead of relying on the key order of the label dictionary
    index_labels = {
        dim: sorted(dimensions[dim]['category']['index'],
                    key=dimensions[dim]['category']['index'].get)
        for dim in dimension_order
    }

    # Step 6: Create a list of rows for the CSV
    rows = []
    for key, value in data['value'].items():
        # `key` is a string like '123', we need to break it down into the corresponding labels
        index = int(key)  # Convert string index to integer

        # Calculate the indices for each dimension
        indices = {}
        for dim in reversed(dimension_order):
            dim_index = index % dimension_sizes[dim]
            indices[dim] = index_labels[dim][dim_index]
            index //= dimension_sizes[dim]

        # Construct a row with labels from all dimensions
        row = {f"{dim.capitalize()} Code": indices[dim] for dim in dimension_order}
        row.update({f"{dim.capitalize()} Name": dimension_labels[dim][indices[dim]] for dim in dimension_order})
        row["Value (Tourist Accommodations)"] = value
        rows.append(row)

    # Step 7: Create a DataFrame and save it as CSV
    if rows:
        df = pd.DataFrame(rows)
        csv_filename = "eurostat_tourist_accommodation.csv"
        df.to_csv(csv_filename, index=False)
        print(f"CSV file '{csv_filename}' has been successfully created.")
    else:
        print("No valid data to save as CSV.")
else:
    print(f"Error: Received status code {response.status_code} from the API.")
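The key step in the loop above is turning a flat position such as "4" back into one category per dimension. JSON-stat stores values in row-major order, so the last dimension listed in "id" varies fastest; peeling dimensions off from the end with divmod recovers each position. Here is a worked example with invented dimensions, geo (2 categories) by time (3 categories), independent of pandas and the network; unravel is a hypothetical helper name.

```python
def unravel(flat_index: int, dims: list, sizes: list) -> dict:
    """Convert a row-major flat index into a position for each dimension."""
    positions = {}
    # Walk from the last (fastest-varying) dimension to the first.
    for dim, size in zip(reversed(dims), reversed(sizes)):
        flat_index, positions[dim] = divmod(flat_index, size)
    return positions


dims = ["geo", "time"]
sizes = [2, 3]
codes = {"geo": ["ES", "FR"], "time": ["2017", "2018", "2019"]}  # ordered by position

for flat in [0, 4, 5]:
    pos = unravel(flat, dims, sizes)
    print(flat, {dim: codes[dim][pos[dim]] for dim in dims})
# 0 {'geo': 'ES', 'time': '2017'}
# 4 {'geo': 'FR', 'time': '2018'}
# 5 {'geo': 'FR', 'time': '2019'}
```

If NumPy is available, numpy.unravel_index(4, (2, 3)) returns the same positions (1 and 1). Either way, the position-to-code lists must follow the order given by each dimension's category index.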

// Step 6: Filtering the data

Imagine that we want to keep only the records corresponding to campsites, holiday accommodation, or hotels. We can filter the final table accordingly and get a pandas DataFrame we can work with.

# Check the unique values in the 'Nace_r2 Name' column
set(df["Nace_r2 Name"])

# List of options to filter
options = ['Camping grounds, recreational vehicle parks and trailer parks',
          'Holiday and other short-stay accommodation',
          'Hotels and similar accommodation']

# Filter the DataFrame based on whether the 'Nace_r2 Name' column values are in the options list
df = df[df["Nace_r2 Name"].isin(options)]
df

Best practices when working with an API

  • Read the documentation: Always check the official API documentation to understand the endpoints and parameters
  • Handle errors: Check status codes and handle failed requests gracefully
  • Respect rate limits: Avoid overloading the server by checking and staying within its rate limits
  • Secure authentication: If an API requires authentication, never expose your API keys in public code
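Handling errors and respecting rate limits are often combined in a retry loop with exponential backoff: if the server answers 429 (Too Many Requests) or a 5xx error, wait and try again with a growing delay. The sketch below is hypothetical: fetch stands for any function returning a (status_code, body) pair, such as a thin wrapper around requests.get, and the API key is read from an environment variable (the name EXAMPLE_API_KEY is made up) instead of being written into the code.

```python
import os
import time
from typing import Callable, Optional, Tuple

# Status codes worth retrying: rate limiting and transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


def get_with_retries(
    fetch: Callable[[Optional[str]], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` until it succeeds, backing off exponentially on retryable errors."""
    # Hypothetical variable name; keep real keys out of source code and version control.
    api_key = os.environ.get("EXAMPLE_API_KEY")
    for attempt in range(max_attempts):
        status, body = fetch(api_key)
        if 200 <= status < 300:
            return body
        if status not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"Request failed with status {status}")
        # Wait base_delay * 1, 2, 4, ... seconds before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# Simulated server: rate-limited once, then a server error, then success.
responses = iter([(429, ""), (503, ""), (200, '{"ok": true}')])
delays = []
result = get_with_retries(lambda key: next(responses), sleep=delays.append)
print(result, delays)  # {"ok": true} [1.0, 2.0]
```

Injecting sleep makes the backoff testable without real waiting. If an API sends a Retry-After header with a 429 response, waiting that long is usually better than a fixed schedule.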

Wrapping up

The Eurostat API is a powerful gateway to rich European statistics. By learning how to navigate its structure, query data, and interpret the responses, you can unlock critical information for analysis, research, or decision-making, right in your Python scripts.

You can find the corresponding code in my GitHub repository.

Josep Ferrer is an analytics engineer from Barcelona. He graduated in engineering physics and currently works in the data science field applied to human mobility. He is a part-time content creator focused on science and technology. Josep writes on all things AI, covering the application of the ongoing explosion in the field.
