5 Useful DIY Python Functions for JSON Parsing and Processing


Photo by the Author

# Introduction

Working with JSON in Python is often challenging. The basic json.loads() only gets you so far.

API responses, configuration files, and data exports often contain messy or poorly structured JSON. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks often come up in web scraping, API integration, and data processing. This article walks you through five practical functions for handling common JSON parsing and processing tasks.

You can find the code for these functions on GitHub.

# 1. Safely Extracting Nested Values

JSON objects are often nested several levels deep. Accessing deeply nested values with bracket notation quickly becomes cumbersome. If any key is missing, you get a KeyError.

Here's a function that lets you access nested values using dot notation, with a fallback for missing keys:

def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current

Let's test it with a complex nested structure:

# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "[email protected]",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")

Output:

Email: [email protected]
Theme: dark
First post: First Post
Age (default): 25

The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index.

The default parameter provides a fallback when any part of the path is missing. This prevents your code from crashing when working with incomplete or inconsistent JSON data from APIs.

This pattern is especially useful when processing API responses where some fields are optional or only exist under certain conditions.
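One caveat worth noting: because the function above treats a None result from .get() as a missing key, a key that is present with an explicit JSON null also falls back to default. If that distinction matters, a variant along the following lines is one option. This is a minimal sketch rather than part of the original code; the names get_nested_strict and _MISSING are hypothetical.

```python
_MISSING = object()  # hypothetical sentinel: marks "key not present"


def get_nested_strict(data, path, default=None):
    """Like get_nested_value, but an explicit null (None) counts as found."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            # The sentinel lets us tell "missing" apart from "present but None"
            current = current.get(key, _MISSING)
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
        if current is _MISSING:
            return default
    return current


record = {"user": {"nickname": None, "tags": ["a", "b"]}}

print(get_nested_strict(record, "user.nickname", default="n/a"))  # None
print(get_nested_strict(record, "user.age", default="n/a"))       # n/a
print(get_nested_strict(record, "user.tags.-1"))                  # b
```

The last call shows that negative list indexes work as well, since int("-1") is a valid Python index.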

# 2. Flattening Nested JSON into Single-Level Dictionaries

Machine learning models, CSV exports, and database inserts often require flat data structures. But API responses and configuration files use nested JSON. Converting nested objects into flat key-value pairs is a common task.

Here's a function that flattens nested JSON with a customizable separator:

def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)

Now let's flatten a complex nested structure:

# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")

Output:

product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value

The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building the flattened key by joining the parent key and the current key with the separator.

For lists, it uses the index as part of the key. This preserves the order and structure of list elements in the flattened output. The pattern reviews_0_rating tells you that this is the rating from the first review.

The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys, depending on your needs.

This function is especially useful when you need to convert JSON API responses into data frames or CSV rows, where each column needs a unique name.
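To make the CSV use case concrete, here is a rough sketch that writes flattened records with the standard library's csv.DictWriter. To stay self-contained it uses a compact flattener (named flatten here, a hypothetical stand-in that follows the same key scheme), and the sample records are made up for illustration.

```python
import csv
import io


def flatten(data, parent_key="", separator="_"):
    """Compact flattener for dicts and lists (same key scheme as above)."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    flat = {}
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        flat.update(flatten(value, new_key, separator))
    return flat


# Hypothetical records, as if returned by an API
records = [
    {"id": 1, "specs": {"cpu": "i7", "ram": "16GB"}},
    {"id": 2, "specs": {"cpu": "i5"}, "tags": ["sale"]},
]

rows = [flatten(record) for record in records]

# Collect the union of keys, since records may not share the same fields
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Records with different shapes produce different flattened keys, so the header is built from the union of all keys and missing cells are left empty through restval.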

# 3. Deep Merging Multiple JSON Objects

Configuration management often requires merging multiple JSON files that contain default settings, environment-specific settings, user preferences, and more. A simple dict.update() only handles the top level. You need a deep merge that recursively combines nested structures.
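A short illustration of the problem, using made-up settings: when two dictionaries share a top-level key, update() replaces the whole nested value, so any nested key that the override does not repeat is lost.

```python
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

shallow = dict(defaults)   # copy so the original stays intact
shallow.update(overrides)  # replaces the entire "database" value

print(shallow)
# {'database': {'host': 'prod-db.example.com'}}
print("port" in shallow["database"])  # False: the default port is gone
```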

Here's a function that deep merges JSON objects:

def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result

Let's try merging some sample configuration data:

import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)

print(json.dumps(merged, indent=2))

Output:

{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}

The function recursively merges nested dictionaries. If both base and override contain a dictionary under the same key, it merges those dictionaries instead of replacing one with the other. This preserves values that are not explicitly overridden.

Notice how database.port and database.timeout are kept from the default configuration, while database.host is overridden. The pool settings are merged at the nested level, so min and max are both updated.

The function also adds keys that do not exist in the base config, such as the monitoring section in the production override.

You can chain multiple merges to layer configurations:

final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)

This pattern is common in application configuration where you have defaults, environment-specific settings, and runtime overrides.
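For more than two layers, the nested calls can be folded with functools.reduce. The sketch below is self-contained, so it includes a compact merge with the same "override wins" rule; merge, merge_layers, and the sample layers are hypothetical names and data. It also copies values with copy.deepcopy so the result shares no nested objects with its inputs, which a shallow base.copy() does not guarantee.

```python
import copy
from functools import reduce


def merge(base, override):
    """Recursively merge two dicts; values from override win."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def merge_layers(*layers):
    """Merge any number of dicts from left to right; later layers win."""
    return reduce(merge, layers, {})


defaults = {"cache": {"enabled": True, "ttl": 300}, "logging": {"level": "INFO"}}
production = {"cache": {"ttl": 600}}
user_prefs = {"logging": {"level": "DEBUG"}}

final = merge_layers(defaults, production, user_prefs)
print(final)
# {'cache': {'enabled': True, 'ttl': 600}, 'logging': {'level': 'DEBUG'}}

# Mutating the result does not leak into the inputs
final["cache"]["enabled"] = False
print(defaults["cache"]["enabled"])  # True
```

The deep copies cost some extra work on large configurations, so this trade-off is worth revisiting if merge speed matters.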

# 4. Filtering JSON by Schema or Whitelist

APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you want only specific fields, or you need to remove sensitive data before logging.

Here's a function that filters the JSON to keep only the specified fields:

def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result

Let's filter a sample API response:

import json
# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "[email protected]",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "",  # URL elided in the source
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)

print(json.dumps(filtered, indent=2))

Output:

{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": ""
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}

The schema works like a whitelist. Setting a field to True includes it in the output. Using a nested dictionary lets you filter nested objects. The function recursively applies the schema to nested structures.

For lists, the schema applies to each item. In the example, the posts list is filtered so that each post includes only id, title, and views, while content and internal_score are excluded.

Notice that sensitive fields such as password_hash and private_notes do not appear in the output. This makes the function useful for sanitizing data before logging it or sending it to front-end applications.

You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that includes everything.
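As a rough illustration of per-view schemas, the sketch below applies two whitelists to the same record. It uses a simplified stand-in filter (keep_fields, a hypothetical name) that, unlike the function above, does not drop empty results; LIST_VIEW, DETAIL_VIEW, and the sample post are made up.

```python
def keep_fields(data, schema):
    """Simplified whitelist filter: True keeps a field, a dict filters deeper."""
    if isinstance(data, list):
        return [keep_fields(item, schema) for item in data]
    if not isinstance(data, dict):
        return data
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        if rule is True:
            result[key] = data[key]
        elif isinstance(rule, dict):
            result[key] = keep_fields(data[key], rule)
    return result


# Hypothetical schemas for two different views of the same data
LIST_VIEW = {"id": True, "title": True}
DETAIL_VIEW = {
    "id": True,
    "title": True,
    "author": {"name": True},
    "comments": {"text": True},
}

post = {
    "id": 1,
    "title": "Hello",
    "body": "Full text",
    "author": {"name": "Cayla", "email_verified": True},
    "comments": [{"text": "Nice", "ip": "10.0.0.1"}],
    "internal_score": 0.85,
}

print(keep_fields(post, LIST_VIEW))
# {'id': 1, 'title': 'Hello'}
print(keep_fields(post, DETAIL_VIEW))
# {'id': 1, 'title': 'Hello', 'author': {'name': 'Cayla'}, 'comments': [{'text': 'Nice'}]}
```

Keeping the schemas as plain dictionaries means they can live in configuration and be reviewed separately from the filtering code.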

# 5. Converting JSON to and from Dot Notation

Some systems use flat key-value stores, but you want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures helps achieve this.

Here is a pair of bidirectional conversion functions.

## Converting JSON to Dot Notation

def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items

## Converting Dot Notation to JSON

def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

        for i, part in enumerate(parts[:-1]):
            if part not in current:
                current[part] = {}
            current = current[part]

        current[parts[-1]] = value

    return result

Let's test the round-trip conversion:

import json
# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f"  {key} = {value}")

print("\n" + "="*50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)

print("Nested format:")
print(json.dumps(nested, indent=2))

Output:

Flat format:
  app.name = MyApp
  app.version = 1.0.0
  database.host = localhost
  database.credentials.username = admin
  database.credentials.password = secret
  features.analytics = True
  features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}

The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining keys with dots. Unlike the earlier flatten function, this one does not expand lists; a list is stored as a single value. It is optimized for key-value configuration data.

The dot_notation_to_json function reverses the process. It splits each key on dots and builds the nested structure, creating intermediate dictionaries as needed. The loop handles every part except the last one, creating the nesting levels. It then assigns the value to the final key.

This approach keeps your configuration readable and maintainable while working within the constraints of key-value systems.
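The round trip is lossless for typical configuration data, but two edge cases are worth checking before relying on it: empty dictionaries disappear during flattening, and keys that already contain a dot are split into extra levels on the way back. The sketch below demonstrates both with compact stand-ins for the two functions (to_dots and from_dots are hypothetical names, and the sample data is made up).

```python
def to_dots(data, parent_key=""):
    """Flatten nested dicts into dot-notation keys (lists stay as values)."""
    flat = {}
    for key, value in data.items():
        new_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(to_dots(value, new_key))
        else:
            flat[new_key] = value
    return flat


def from_dots(flat):
    """Rebuild nested dicts from dot-notation keys."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        current = nested
        for part in parents:
            current = current.setdefault(part, {})
        current[leaf] = value
    return nested


clean = {"app": {"name": "MyApp"}, "ports": [80, 443]}
with_empty = {"app": {"name": "MyApp"}, "plugins": {}}
with_dotted_key = {"hosts": {"example.com": "10.0.0.1"}}

print(from_dots(to_dots(clean)) == clean)                      # True
print(from_dots(to_dots(with_empty)) == with_empty)            # False
print(from_dots(to_dots(with_dotted_key)) == with_dotted_key)  # False
print(from_dots(to_dots(with_dotted_key)))
# {'hosts': {'example': {'com': '10.0.0.1'}}}
```

If your keys can contain dots, choosing a separator that never appears in them is one way to avoid the second problem.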

# Wrapping up

JSON processing goes beyond basic json.loads(). In most projects, you'll need tools to navigate nested structures, transform shapes, merge configurations, filter fields, and convert between formats.

The techniques in this article also transfer to other data processing tasks. You can adapt these patterns for XML, YAML, or custom data formats.

Start with the safe access function to prevent KeyError exceptions in your code. Add the others as you run into specific needs. Happy coding!

Bala Priya C is an engineer and technical writer from India. She loves working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the engineering community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
