5 Useful DIY Python Functions for Parsing Dates and Times

# Introduction

Parsing dates and times is one of those tasks that seems easy until you try it. Python's datetime module handles standard formats well, but real-world data is messy. User input, scraped web data, and legacy systems often throw curveballs.

This article walks you through five practical functions for common date and time parsing tasks. By the end, you'll understand how to build flexible parsers that handle the messy date formats you see in projects.

Link to the code on GitHub

# 1. Parsing Relative Time Strings

Social media apps, chat apps, and activity feeds display timestamps such as “5 minutes ago” or “2 days ago”. When you scrape or process this data, you need to convert these relative strings back into actual datetime objects.

Here is a function that handles relative time expressions:

from datetime import datetime, timedelta
import re

def parse_relative_time(time_string, reference_time=None):
    """
    Convert relative time strings to datetime objects.
    
    Examples: "2 hours ago", "3 days ago", "1 week ago"
    """
    if reference_time is None:
        reference_time = datetime.now()
    
    # Normalize the string
    time_string = time_string.lower().strip()
    
    # Pattern: number + time unit + "ago"
    pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'
    match = re.match(pattern, time_string)
    
    if not match:
        raise ValueError(f"Cannot parse: {time_string}")
    
    amount = int(match.group(1))
    unit = match.group(2)
    
    # Map units to timedelta kwargs
    unit_mapping = {
        'second': 'seconds',
        'minute': 'minutes',
        'hour': 'hours',
        'day': 'days',
        'week': 'weeks',
    }
    
    if unit in unit_mapping:
        delta_kwargs = {unit_mapping[unit]: amount}
        return reference_time - timedelta(**delta_kwargs)
    elif unit == 'month':
        # Approximate: 30 days per month
        return reference_time - timedelta(days=amount * 30)
    elif unit == 'year':
        # Approximate: 365 days per year
        return reference_time - timedelta(days=amount * 365)

The function uses a regular expression (regex) to extract the number and the time unit from the string. The pattern (\d+) captures one or more digits, and (second|minute|hour|day|week|month|year) matches the time unit. The s? makes the plural 's' optional, so both “hour” and “hours” work.
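To see what the pattern captures, it helps to run it on a few inputs by itself. The following is a minimal sketch, assuming the pattern is written as a raw string with the usual regex escapes (\d for digits, \s for whitespace); the sample strings are made up for illustration.

```python
import re

# Relative-time pattern: number, optional whitespace, unit, optional 's', "ago"
pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'

for text in ["2 hours ago", "1 hour ago", "15minutes ago", "in 2 hours"]:
    match = re.match(pattern, text)
    # groups() holds (amount, unit) when the text matches, otherwise match is None
    print(f"{text!r:16} -> {match.groups() if match else None}")
```

Because re.match anchors at the start of the string, a phrase such as “in 2 hours” yields no match, which is the case where the function raises ValueError.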

For units that timedelta supports directly (seconds through weeks), we create a timedelta and subtract it from the reference time. For months and years, we approximate using 30 and 365 days respectively. This isn't perfect, but it's good enough for most use cases.

The reference_time parameter lets you specify a different “now” for testing or when processing historical data.

Let's test it:
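If the 30-day approximation is too coarse for your data, one option is to step back whole calendar months instead. The sketch below is not part of the original function: subtract_months is a hypothetical helper that uses only the standard library and clamps the day when the target month is shorter.

```python
import calendar
from datetime import datetime

def subtract_months(moment, months):
    """Hypothetical helper: step back whole calendar months, clamping the day."""
    # Count months from year 0 so borrowing across years is plain arithmetic
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the target month's length (e.g. Mar 31 -> Feb 28/29)
    last_day = calendar.monthrange(year, month)[1]
    return moment.replace(year=year, month=month, day=min(moment.day, last_day))

reference = datetime(2026, 3, 31, 12, 0)
print(subtract_months(reference, 1))   # 2026-02-28 12:00:00
print(subtract_months(reference, 15))  # 2024-12-31 12:00:00
```

Whether clamping to the last day of the month is the right policy depends on the application, so treat it as one reasonable choice rather than the only one.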

result1 = parse_relative_time("2 hours ago")
result2 = parse_relative_time("3 days ago")
result3 = parse_relative_time("1 week ago")

print(f"2 hours ago: {result1}")
print(f"3 days ago: {result2}")
print(f"1 week ago: {result3}")

Output:

2 hours ago: 2026-01-06 12:09:34.584107
3 days ago: 2026-01-03 14:09:34.584504
1 week ago: 2025-12-30 14:09:34.584558

# 2. Extracting Dates from Natural Language Text

Sometimes you need to find dates buried in text: “The meeting is scheduled for January 15, 2026” or “Please respond by March 3”. Instead of parsing the entire sentence manually, you want to extract just the date.

Here is a function that finds and extracts dates from natural language text:

import re
from datetime import datetime

def extract_date_from_text(text, current_year=None):
    """
    Extract dates from natural language text.
    
    Handles formats like:
    - "January 15th, 2024"
    - "March 3rd"
    - "Dec 25th, 2023"
    """
    if current_year is None:
        current_year = datetime.now().year
    
    # Month names (full and abbreviated)
    months = {
        'january': 1, 'jan': 1,
        'february': 2, 'feb': 2,
        'march': 3, 'mar': 3,
        'april': 4, 'apr': 4,
        'may': 5,
        'june': 6, 'jun': 6,
        'july': 7, 'jul': 7,
        'august': 8, 'aug': 8,
        'september': 9, 'sep': 9, 'sept': 9,
        'october': 10, 'oct': 10,
        'november': 11, 'nov': 11,
        'december': 12, 'dec': 12
    }
    
    # Pattern: Month Day(st/nd/rd/th), Year (year optional)
    pattern = r'(january|jan|february|feb|march|mar|april|apr|may|june|jun|july|jul|august|aug|september|sep|sept|october|oct|november|nov|december|dec)\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?'
    
    matches = re.findall(pattern, text.lower())
    
    if not matches:
        return None
    
    # Take the first match
    month_str, day_str, year_str = matches[0]
    
    month = months[month_str]
    day = int(day_str)
    year = int(year_str) if year_str else current_year
    
    return datetime(year, month, day)

The function creates a dictionary mapping month names (both full and abbreviated) to their numeric values. The regex pattern matches a month name followed by a day number, with an optional ordinal suffix (st, nd, rd, th) and an optional year.

The (?:...) syntax creates a non-capturing group. This means we match the pattern but don't save it as a separate group. This is useful for optional components such as the ordinal suffix and the comma before the year.

If no year is provided, the function defaults to the current year. This makes sense because when someone mentions “March 3” in January, they are usually referring to the coming March, not the previous year.

Let's test it with various text formats:
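The effect of a non-capturing group is easiest to see by comparing two patterns side by side. This is a minimal sketch using a cut-down pattern that only knows “march”; the sample sentence is made up for illustration.

```python
import re

text = "due march 3rd, 2026"

# Non-capturing wrappers: only the day and the year come back
non_capturing = r'march\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?'
print(re.findall(non_capturing, text))   # [('3', '2026')]

# Capturing the suffix as well adds an extra element to every result tuple
capturing = r'march\s+(\d{1,2})(st|nd|rd|th)?(?:,?\s+(\d{4}))?'
print(re.findall(capturing, text))       # [('3', 'rd', '2026')]
```

Keeping the suffix out of the results is what lets the function unpack each match into exactly three values.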
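Defaulting to the current year has a blind spot: in late December, “January 5” would resolve to a date almost a year in the past. If your text mostly refers to upcoming dates, one possible policy is to pick the next occurrence instead. The sketch below assumes that forward-looking policy; resolve_year is a hypothetical helper, not part of the function above, and it does not handle February 29 in a non-leap year.

```python
from datetime import date

def resolve_year(month, day, today):
    """Hypothetical helper: return the year of the next occurrence of month/day."""
    candidate = date(today.year, month, day)
    # If that date has already passed this year, assume next year is meant
    if candidate < today:
        candidate = date(today.year + 1, month, day)
    return candidate.year

print(resolve_year(3, 3, date(2026, 1, 10)))    # 2026
print(resolve_year(1, 5, date(2025, 12, 20)))   # 2026
```

For data that describes past events, such as logs or archived posts, the opposite policy (most recent occurrence) is likely a better fit.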

text1 = "The meeting is scheduled for January 15th, 2026 at 3pm"
text2 = "Please respond by March 3rd"
text3 = "Deadline: Dec 25th, 2026"

date1 = extract_date_from_text(text1)
date2 = extract_date_from_text(text2)
date3 = extract_date_from_text(text3)

print(f"From '{text1}': {date1}")
print(f"From '{text2}': {date2}")
print(f"From '{text3}': {date3}")

Output:

From 'The meeting is scheduled for January 15th, 2026 at 3pm': 2026-01-15 00:00:00
From 'Please respond by March 3rd': 2026-03-03 00:00:00
From 'Deadline: Dec 25th, 2026': 2026-12-25 00:00:00

# 3. Parsing Variable Date Formats with Smart Detection

Real-world data comes in many formats. Writing separate parsers for each format is tedious. Instead, let's build a function that tries multiple formats automatically.

Here's a smart date parser that handles common formats:

from datetime import datetime

def parse_flexible_date(date_string):
    """
    Parse dates in multiple common formats.
    
    Tries various formats and returns the first match.
    """
    date_string = date_string.strip()
    
    # List of common date formats
    formats = [
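        '%Y-%m-%d',   # 2026-01-15
        '%Y/%m/%d',   # 2026/01/15
        '%d-%m-%Y',   # 15-01-2026
        '%d/%m/%Y',   # 15/01/2026
        '%m/%d/%Y',   # 01/15/2026
        '%d.%m.%Y',   # 15.01.2026
        '%Y%m%d',     # 20260115
        '%B %d, %Y',  # January 15, 2026
        '%b %d, %Y',  # Jan 15, 2026
        '%d %B %Y',   # 15 January 2026
        '%d %b %Y',   # 15 Jan 2026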
    ]
    
    # Try each format
    for fmt in formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue
    
    # If nothing worked, raise an error
    raise ValueError(f"Unable to parse date: {date_string}")

This function uses a brute-force approach. It tries each format until one works. The strptime function raises a ValueError if the date string doesn't match the format, so we catch that exception and move on to the next format.

The order of the formats matters. We put the International Organization for Standardization (ISO) format (%Y-%m-%d) first because it is the most common in technical contexts. Ambiguous formats like %d/%m/%Y and %m/%d/%Y come later. Of those two, %d/%m/%Y is tried first, so a string such as “01/02/2026” is read as 1 February. If you know your data consistently uses one, reorder the list to prioritize it.
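The ambiguity between day-first and month-first formats can be checked directly with strptime. This is a small sketch with made-up input strings, showing why the position of each format in the list decides the result.

```python
from datetime import datetime

ambiguous = "01/02/2026"

day_first = datetime.strptime(ambiguous, "%d/%m/%Y")
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

print(day_first.date())    # 2026-02-01
print(month_first.date())  # 2026-01-02

# A day above 12 removes the ambiguity: only one format can parse it
try:
    datetime.strptime("15/01/2026", "%m/%d/%Y")
except ValueError as error:
    print(f"month-first rejected: {error}")
```

Both formats accept the first string without complaint, so no amount of trial and error can tell which one the author intended. That knowledge has to come from the data source.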

Let's test it with various date formats:

# Test different formats
dates = [
    "2026-01-15",
    "15/01/2026",
    "01/15/2026",
    "15.01.2026",
    "20260115",
    "January 15, 2026",
    "15 Jan 2026"
]

for date_str in dates:
    parsed = parse_flexible_date(date_str)
    print(f"{date_str:20} -> {parsed}")

Output:

2026-01-15           -> 2026-01-15 00:00:00
15/01/2026           -> 2026-01-15 00:00:00
01/15/2026           -> 2026-01-15 00:00:00
15.01.2026           -> 2026-01-15 00:00:00
20260115             -> 2026-01-15 00:00:00
January 15, 2026     -> 2026-01-15 00:00:00
15 Jan 2026          -> 2026-01-15 00:00:00

This method isn't very efficient, but it's simple and handles most of the date formats you'll encounter.

# 4. Parsing Time Durations

Video players, workout trackers, and time-tracking apps display durations such as “1h 30m” or “2:45:30”. When parsing user input or scraped data, you need to convert these into timedelta objects for calculations.

Here is a function that parses common duration formats:

from datetime import timedelta
import re

def parse_duration(duration_string):
    """
    Parse duration strings into timedelta objects.
    
    Handles formats like:
    - "1h 30m 45s"
    - "2:45:30" (H:M:S)
    - "90 minutes"
    - "1.5 hours"
    """
    duration_string = duration_string.strip().lower()
    
    # Try colon format first (H:M:S or M:S)
    if ':' in duration_string:
        parts = duration_string.split(':')
        if len(parts) == 2:
            # M:S format
            minutes, seconds = map(int, parts)
            return timedelta(minutes=minutes, seconds=seconds)
        elif len(parts) == 3:
            # H:M:S format
            hours, minutes, seconds = map(int, parts)
            return timedelta(hours=hours, minutes=minutes, seconds=seconds)
    
    # Try unit-based format (1h 30m 45s)
    total_seconds = 0
    
    # Find hours
    hours_match = re.search(r'(\d+(?:\.\d+)?)\s*h(?:ours?)?', duration_string)
    if hours_match:
        total_seconds += float(hours_match.group(1)) * 3600
    
    # Find minutes
    minutes_match = re.search(r'(\d+(?:\.\d+)?)\s*m(?:in(?:ute)?s?)?', duration_string)
    if minutes_match:
        total_seconds += float(minutes_match.group(1)) * 60
    
    # Find seconds
    seconds_match = re.search(r'(\d+(?:\.\d+)?)\s*s(?:ec(?:ond)?s?)?', duration_string)
    if seconds_match:
        total_seconds += float(seconds_match.group(1))
    
    if total_seconds > 0:
        return timedelta(seconds=total_seconds)
    
    raise ValueError(f"Unable to parse duration: {duration_string}")

The function handles two main formats: colon-separated times and unit-based strings. For the colon format, we split on the colon and interpret the parts as hours, minutes, and seconds (or just minutes and seconds when there are only two parts).

For the unit-based format, we use three separate regex patterns to find hours, minutes, and seconds. The pattern (\d+(?:\.\d+)?) matches integers or decimals like “1.5”. The pattern \s*h(?:ours?)? matches “h”, “hour”, or “hours” with optional whitespace in front.

Each matched value is converted to seconds and added to the running total. This approach lets the function handle partial durations such as “45s” or “2h 15m” without requiring all units to be present.

Now let's test the function with various duration formats:
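The colon branch can also be written so that both layouts share one code path. The following is a sketch of that idea, not the article's implementation: parse_colon_duration is a hypothetical standalone variant that pads the parts on the left and rejects anything other than two or three fields.

```python
from datetime import timedelta

def parse_colon_duration(text):
    """Hypothetical variant of the colon branch: accepts M:S or H:M:S."""
    parts = [int(part) for part in text.strip().split(':')]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"Expected M:S or H:M:S, got: {text}")
    # Pad on the left so that M:S becomes 0:M:S
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_colon_duration("2:45:30"))  # 2:45:30
print(parse_colon_duration("5:07"))     # 0:05:07
```

Raising an error for four or more fields is a design choice made here for clarity. The function above instead falls through to the unit-based patterns in that case.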

durations = [
    "1h 30m 45s",
    "2:45:30",
    "90 minutes",
    "1.5 hours",
    "45s",
    "2h 15m"
]

for duration in durations:
    parsed = parse_duration(duration)
    print(f"{duration:15} -> {parsed}")

Output:

1h 30m 45s      -> 1:30:45
2:45:30         -> 2:45:30
90 minutes      -> 1:30:00
1.5 hours       -> 1:30:00
45s             -> 0:00:45
2h 15m          -> 2:15:00

# 5. Parsing ISO Week Dates

Some systems use ISO week dates instead of standard calendar dates. An ISO week date like “2026-W03-2” means “week 3 of 2026, day 2 (Tuesday)”. This format is common in business contexts where planning happens on a weekly basis.

Here is a function to parse ISO week dates:

from datetime import datetime, timedelta

def parse_iso_week_date(iso_week_string):
    """
    Parse ISO week date format: YYYY-Www-D
    
    Example: "2024-W03-2" = Week 3 of 2024, Tuesday
    
    ISO week numbering:
    - Week 1 is the week with the first Thursday of the year
    - Days are numbered 1 (Monday) through 7 (Sunday)
    """
    # Parse the format: YYYY-Www-D
    parts = iso_week_string.split('-')
    
    if len(parts) != 3 or not parts[1].startswith('W'):
        raise ValueError(f"Invalid ISO week format: {iso_week_string}")
    
    year = int(parts[0])
    week = int(parts[1][1:])  # Remove 'W' prefix
    day = int(parts[2])
    
    if not (1 <= week <= 53):
        raise ValueError(f"Week must be between 1 and 53: {week}")
    
    if not (1 <= day <= 7):
        raise ValueError(f"Day must be between 1 and 7: {day}")
    
    # Find January 4th (always in week 1)
    jan_4 = datetime(year, 1, 4)
    
    # Find Monday of week 1
    week_1_monday = jan_4 - timedelta(days=jan_4.weekday())
    
    # Calculate the target date
    target_date = week_1_monday + timedelta(weeks=week - 1, days=day - 1)
    
    return target_date

ISO week dates follow specific rules. Week 1 is defined as the week containing the first Thursday of the year. This means that week 1 may start in December of the previous year.

The function uses a reliable approach: find January 4 (which is always in week 1), then find the Monday of that week. From there, we add the appropriate number of weeks and days to reach the target date.

The expression jan_4.weekday() returns 0 for Monday through 6 for Sunday. Subtracting that many days from January 4 gives us the Monday of week 1. Then we add (week - 1) weeks and (day - 1) days to get the final date.

Let's test it:
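The standard library can confirm how ISO weeks straddle the year boundary. This short sketch uses date.isocalendar() on three dates picked for illustration around the start and end of 2026.

```python
from datetime import date

for day in [date(2025, 12, 29), date(2026, 1, 1), date(2027, 1, 1)]:
    iso_year, iso_week, iso_day = tuple(day.isocalendar())
    # Format as YYYY-Www-D, the same layout the parser expects
    print(f"{day} -> {iso_year}-W{iso_week:02d}-{iso_day}")
```

Running it shows that 29 December 2025 already belongs to week 1 of 2026, while 1 January 2027 still belongs to week 53 of 2026.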
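On Python 3.8 and later, the January 4 arithmetic can be cross-checked against date.fromisocalendar from the standard library. The sketch below repeats the same calculation in a hypothetical helper, monday_of_week_1, and compares the two results for a few sample values.

```python
from datetime import date, datetime, timedelta

def monday_of_week_1(year):
    """Hypothetical helper: back up from January 4 to the Monday of its week."""
    jan_4 = datetime(year, 1, 4)
    return jan_4 - timedelta(days=jan_4.weekday())

for year, week, day in [(2024, 3, 2), (2026, 1, 1), (2026, 53, 5)]:
    manual = monday_of_week_1(year) + timedelta(weeks=week - 1, days=day - 1)
    builtin = date.fromisocalendar(year, week, day)  # available since Python 3.8
    print(f"{year}-W{week:02d}-{day}: {manual.date()} vs {builtin}")
    assert manual.date() == builtin
```

One difference is worth knowing: date.fromisocalendar raises ValueError for week 53 in a year that has only 52 ISO weeks (2024, for example), whereas a plain range check of 1 to 53 lets that input through and returns a date in the following ISO year.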

# Test ISO week dates
iso_dates = [
    "2024-W01-1",  # Week 1, Monday
    "2024-W03-2",  # Week 3, Tuesday
    "2024-W10-5",  # Week 10, Friday
]

for iso_date in iso_dates:
    parsed = parse_iso_week_date(iso_date)
    print(f"{iso_date} -> {parsed.strftime('%Y-%m-%d (%A)')}")

Output:

2024-W01-1 -> 2024-01-01 (Monday)
2024-W03-2 -> 2024-01-16 (Tuesday)
2024-W10-5 -> 2024-03-08 (Friday)

This format is less common than regular dates, but when you do encounter it, having a parser ready saves valuable time.

# Wrapping up

The functions in this article use regex patterns, strptime format strings, and datetime arithmetic to handle variations in formatting. These techniques carry over to other parsing challenges, as you can adapt the patterns to custom date formats in your projects.

Building your own parsers helps you understand how date parsing works. If you run into a non-standard date format that standard libraries can't handle, you'll be ready to write a custom solution.

These functions are especially useful for small scripts, prototypes, and learning projects where adding large external dependencies would be overkill. Happy coding!

Bala Priya C is an engineer and technical writer from India. She loves working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the engineering community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
