Stop Writing Spaghetti if-else Chains: Parsing JSON with Python's match-case

If you work in data science, data engineering, or as a frontend/backend engineer, you deal with JSON. For professionals, basically only death, taxes, and JSON parsing are inevitable. The problem is that parsing JSON is often a pain in the ass.

Whether you're pulling data from a REST API, parsing logs, or reading configuration files, you eventually end up with a nested dictionary that you need to unravel. And let's be honest: the code we write to handle these dictionaries is often… ugly, to say the least.

We've all written a “spaghetti parser.” You know the one. It starts with a simple if statement, but then you need to check if a key exists. Then you need to check if the list inside that key is empty. After that you need to handle the error case.

Before you know it, you have a tower of if-elif-else statements that is difficult to read and difficult to maintain. Pipelines will eventually break on some unexpected edge case. Bad vibes all around!

Python 3.10, which came out a few years ago, introduced a feature that many data scientists have not yet adopted: structural pattern matching with match and case. It is often mistaken for a simple “switch” statement (like in C or Java), but it is much more powerful. It allows you to check the type and shape of your data, rather than just its value.

In this article, we'll look at how to replace your fragile dictionary checks with clean, readable patterns using match and case. I will focus on one specific use case that most of us are familiar with, rather than trying to give a comprehensive overview of everything you can do with match and case.


The Scenario: The “Mystery” API Response

Let's imagine a typical situation. You are polling an external API that you do not have full control over. Let's say, to make the setup concrete, that the API returns the status of a data processing job in JSON format. The API is inconsistent (as they often are).

It may return a success response:

{
    "status": 200,
    "data": {
        "job_id": 101,
        "result": ["file_a.csv", "file_b.csv"]
    }
}

Or an error response:

{
    "status": 500,
    "error": "Timeout",
    "retry_after": 30
}

Or maybe a weird legacy response that's just a list of IDs (because the API docs lie):

[101, 102, 103]

The Old Way: The if-else Pyramid of Doom

If you were writing this using standard Python control flow, you would likely end up with defensive code that looks like this:

def process_response(response):
    # Scenario 1: Standard Dictionary Response
    if isinstance(response, dict):
        status = response.get("status")
        
        if status == 200:
            # We have to be careful that 'data' actually exists
            data = response.get("data", {})
            results = data.get("result", [])
            print(f"Success! Processed {len(results)} files.")
            return results
        
        elif status == 500:
            error_msg = response.get("error", "Unknown Error")
            print(f"Failed with error: {error_msg}")
            return None
        
        else:
            print("Unknown status code received.")
            return None

    # Scenario 2: The Legacy List Response
    elif isinstance(response, list):
        print(f"Received legacy list with {len(response)} jobs.")
        return response
    
    # Scenario 3: Garbage Data
    else:
        print("Invalid response format.")
        return None

Why does the code above hurt my soul?

  • It mixes “what” with “how”: You are mixing business logic (“Success means status 200”) with type-checking tools like isinstance() and .get().
  • It's verbose: Much of the code exists only to verify that keys exist, so we avoid a KeyError.
  • It's hard to scan: To understand what constitutes a “success,” you must mentally parse several nested levels of indentation.

A Better Way: Structural Pattern Matching

Enter the match and case keywords.

Instead of asking questions like “Is this a dictionary? Does it have a key called status? Is that key 200?”, we can simply describe the shape of the data we want to handle. Python then tries to fit the data into that shape.

Here is the same logic rewritten with match and case:

def process_response_modern(response):
    match response:
        # Case 1: Success (Matches specific keys AND values)
        case {"status": 200, "data": {"result": results}}:
            print(f"Success! Processed {len(results)} files.")
            return results

        # Case 2: Error (Captures the error message and retry time)
        case {"status": 500, "error": msg, "retry_after": time}:
            print(f"Failed: {msg}. Retrying in {time}s...")
            return None

        # Case 3: Legacy List (Matches any non-empty list; element types are not checked)
        case [first, *rest]:
            print(f"Received legacy list starting with ID: {first}")
            return response

        # Case 4: Catch-all (The 'else' equivalent)
        case _:
            print("Invalid response format.")
            return None

Notice that it is a few lines shorter, but that is far from the only benefit.

Why Structural Pattern Matching Is Awesome

I can come up with at least three reasons why structural pattern matching with match and case improves the situation above.

1. Implicit Variable Unpacking

Notice what happened inside Case 1:

case {"status": 200, "data": {"result": results}}:

We didn't just check for the keys. We simultaneously checked that status is 200 and extracted the value of result into a variable named results.

We replaced data = response.get("data").get("result") with a simple variable placement. If the structure doesn't match (e.g., result is missing), this case is simply skipped. No KeyError, no crashes.

2. Pattern “Wildcards”

In Case 2, we used msg and time as placeholders:

case {"status": 500, "error": msg, "retry_after": time}:

This tells Python: I'm expecting a dictionary with status 500, and some value for each of the keys "error" and "retry_after". Whatever those values are, bind them to the variables msg and time so I can use them right away.

3. List Destructuring

In Case 3, we handled the list response:

case [first, *rest]:

This pattern matches any list with at least one element (more precisely, any sequence other than strings and bytes). It binds the first element to first and the remaining elements to rest. This is very useful for recursive algorithms or sequential processing.


Adding “Guards” for Extra Control

Sometimes, matching the structure is not enough. You want to match the structure only if a certain condition is met. You can do this by adding an if clause directly to the case.

Imagine that we only want to process the legacy list if it contains fewer than 10 items.

# first + rest together hold fewer than 10 items when len(rest) < 9
case [first, *rest] if len(rest) < 9:
    print(f"Processing small batch starting with {first}")

If the list is too long, this case is skipped, and the code moves on to the next case (or the catch-all _).

Conclusion

I'm not suggesting you replace every simple if statement with a match block. However, you should strongly consider using match and case when you are:

  1. Parsing API responses: As shown above, this is a killer use case.
  2. Handling polymorphic data: When a function can receive an int, a str, or a dict and needs to behave differently for each.
  3. Traversing ASTs or JSON trees: If you write scripts to scrape or clean messy web data.

As data professionals, our work is often 80% data cleaning and 20% modeling. Anything that makes the cleaning phase less error-prone and more readable is a big win for productivity.

Consider ditching the if-else spaghetti. Let match and case do the heavy lifting instead.

If you are interested in AI, data science, or data engineering, please follow me or connect on LinkedIn.
