Fundamentals of Debugging Python Problems

[Image: Fundamentals of Debugging Python Problems. Image by Author | Canva]

Have you ever run a Python script and immediately wished you hadn't pressed Enter?

Debugging in data science is not just a chore; it is a survival skill, especially when you are dealing with messy datasets or building prediction models that real people rely on.

In this article, we will explore the fundamentals of debugging, particularly in your data science workflows, using a real-life dataset from a DoorDash delivery project, and most importantly, how to debug like a pro.

DoorDash Delivery Duration Prediction: What Are We Dealing With?

[Image: Debugging Python problems in delivery duration prediction]

In this data project, DoorDash asked its data science candidates to predict the delivery duration. Let's first look at the dataset info. Here is the code:

Here is the output:

[Image: output showing the dataset info]

It looks like the delivery duration is not provided, so we have to calculate it ourselves. It is simple, but no worries if you are a beginner. Let's see how it can be calculated.

import pandas as pd

# Assuming historical_data is your DataFrame
# Parse both timestamp columns so they can be subtracted
historical_data["created_at"] = pd.to_datetime(historical_data["created_at"])
historical_data["actual_delivery_time"] = pd.to_datetime(historical_data["actual_delivery_time"])
# Delivery duration in seconds: delivery time minus order creation time
historical_data["actual_total_delivery_duration"] = (historical_data["actual_delivery_time"] - historical_data["created_at"]).dt.total_seconds()
historical_data.head()

Here is the head of the output; you can see the new actual_total_delivery_duration column.

[Image: head of the DataFrame with the delivery duration column]

Good, now we can start! But before that, here is the data dictionary for this dataset.

Columns in historical_data.csv

Time features:

  • market_id: A city or region in which DoorDash operates, e.g., Los Angeles, given in the data as an ID.
  • created_at: Timestamp in UTC when the order was submitted by the consumer to DoorDash.
  • actual_delivery_time: Timestamp in UTC when the order was delivered to the consumer.

Store features:

  • store_id: An ID representing the restaurant the order was submitted for.
  • store_primary_category: Cuisine category of the restaurant, e.g., Italian, Asian.
  • order_protocol: A store can receive orders from DoorDash through many modes. This field holds an ID denoting the protocol.

Order features:

  • total_items: Total number of items in the order.
  • subtotal: Total value of the order submitted (in cents).
  • num_distinct_items: Number of distinct items included in the order.
  • min_item_price: Price of the cheapest item in the order (in cents).
  • max_item_price: Price of the most expensive item in the order (in cents).

Market features:

Since DoorDash is a marketplace, we have information on the state of the marketplace when the order is placed, which can be used to estimate delivery time. The following features are values at the time of created_at (order submission time):

  • total_onshift_dashers: Number of available dashers who are within 10 miles of the store at the time of order creation.
  • total_busy_dashers: The subset of the above total_onshift_dashers who are currently working on an order.
  • total_outstanding_orders: Number of orders within 10 miles of this order that are currently being processed.

Predictions from other models:

We have predictions from other models for various stages of the delivery process that we can use:

  • estimated_order_place_duration: Estimated time for the restaurant to receive the order from DoorDash (in seconds).
  • estimated_store_to_consumer_driving_duration: Estimated travel time between the store and the consumer (in seconds).

Good, so let's get started!

Common Python Errors in Data Science Projects

In this section, we will walk through common debugging errors in a data science project, starting with reading the dataset and going all the way to the most important part: modeling.

Reading the Dataset: FileNotFoundError, Dtype Warning, and Fixes

Case 1: File Not Found (The Classic)

In data science, your first bug often greets you at read_csv. And not with a hello. Let's debug that exact moment together, line by line. Here is the code:

import os
import pandas as pd

try:
    df = pd.read_csv('Strata Questions/historical_data.csv')
    print(df.head(3))
except FileNotFoundError:
    # Show where Python is looking and what it can see there
    print("File not found. Here's where Python is looking:")
    print("Working directory:", os.getcwd())
    print("Available files:", os.listdir())
    raise  # re-raise with the original traceback

Here is the output.

[Image: output of the file-not-found check]

You don't just raise an error; you investigate it. This shows where the code thinks it is and what it sees around it. If your file is not on the list, now you know. No guessing. Just facts.

Replace the path with the full one, and voilà!

[Image: output after fixing the file path]
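One way to make the path less fragile is to build it explicitly and check it before reading. The sketch below is only an illustration: `resolve_data_path` is a hypothetical helper, and the directory and file names are placeholders rather than the project's real layout.

```python
from pathlib import Path
import tempfile


def resolve_data_path(base_dir, filename):
    """Return an absolute path to filename under base_dir, or raise with context."""
    path = (Path(base_dir) / filename).resolve()
    if not path.is_file():
        # List what is actually there so the mistake is visible at a glance
        parent = path.parent
        nearby = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found. Files in {parent}: {nearby}")
    return path


# Demo with a throwaway directory (placeholder names, not the real project layout)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "historical_data.csv").write_text("a,b\n1,2\n")
    found = resolve_data_path(tmp, "historical_data.csv")
    print("Resolved:", found)
    try:
        resolve_data_path(tmp, "missing.csv")
    except FileNotFoundError as e:
        missing_message = str(e)
        print(missing_message)
```

The error message carries the same facts as the try-except above (where Python looked and what it found), so you get them even when the exception surfaces far from the read call.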

Case 2: Dtype Misinterpretation (Python's Quietly Wrong Guess)

You load the dataset, but something is off. The bug hides inside your types.

# Assuming df is your loaded DataFrame
try:
    print("Column Types:\n", df.dtypes)
except Exception as e:
    print("Error reading dtypes:", e)

Here is the output.

[Image: output listing the column dtypes]
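If the guessed types turn out to be wrong, one option is to state them up front when reading the file. The sketch below uses a tiny made-up CSV in place of historical_data.csv, and the dtype chosen for each column is an assumption for illustration, not something the dataset dictates.

```python
import io
import pandas as pd

# Tiny stand-in for historical_data.csv (made-up rows, for illustration only)
csv_text = (
    "market_id,created_at,store_id,subtotal\n"
    "1,2015-02-06 22:24:17,1845,3441\n"
    ",2015-02-10 21:49:25,5477,1900\n"
)

sample = pd.read_csv(
    io.StringIO(csv_text),
    # Nullable Int64 keeps IDs as integers even when some are missing
    dtype={"market_id": "Int64", "store_id": "string", "subtotal": "int64"},
    # Parse timestamps while reading instead of in a later step
    parse_dates=["created_at"],
)
print(sample.dtypes)
```

Declaring the types this way means a surprising value fails loudly at load time rather than quietly changing a column's dtype.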

Case 3: Date Parsing (The Silent Saboteur)

We found out that we have to calculate the delivery duration first, and we did it this way.

try:
    # This code was shown earlier to calculate the delivery duration
    df["created_at"] = pd.to_datetime(df['created_at'])
    df["actual_delivery_time"] = pd.to_datetime(df['actual_delivery_time'])
    df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()
    print("Successfully calculated delivery duration and checked dtypes.")
    print("Relevant dtypes:\n", df[['created_at', 'actual_delivery_time', 'actual_total_delivery_duration']].dtypes)
except Exception as e:
    print("Error during date processing:", e)

Here is the output.

[Image: output of the date parsing step]

Good and professional! Now we avoid those red errors, which will lift our mood; I know seeing them can dampen your motivation.

Handling Missing Data: KeyErrors, NaNs, and Logical Pitfalls

Some bugs don't crash your code. They just give you the wrong results, silently, until you wonder why your model is trash.

This section digs into missing data: not just how to clean it, but how to debug it properly.

Case 1: KeyError (You Thought That Column Existed)

Here is our code.

try:
    print(df['store_rating'])
except KeyError as e:
    print("Column not found:", e)
    print("Here are the available columns:\n", df.columns.tolist())

Here is the output.

[Image: KeyError output listing the available columns]

The code didn't break because of logic; it broke because of an assumption. That's precisely where debugging lives. Always list your columns before accessing them blindly.
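To check that assumption before it bites, you can compare the columns you need against the ones you have. A minimal sketch, where `missing_columns` is a hypothetical helper and the frame is made up:

```python
import pandas as pd


def missing_columns(df, required):
    """Return the names in `required` that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Made-up frame standing in for the real data
demo = pd.DataFrame({"subtotal": [3441, 1900], "total_items": [4, 1]})

absent = missing_columns(demo, ["subtotal", "store_rating"])
if absent:
    print("Missing columns:", absent)
    print("Available columns:", demo.columns.tolist())
```

Running a check like this once at the top of a notebook turns a mid-pipeline KeyError into an early, readable message.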

Case 2: NaN Count (Missing Values You Didn't Expect)

You assume everything is clean. But real-world data always hides gaps. Let's check for them.

try:
    null_counts = df.isnull().sum()
    print("Nulls per column:\n", null_counts[null_counts > 0])
except Exception as e:
    print("Failed to inspect nulls:", e)

Here is the output.

[Image: output with null counts per column]

This exposes the silent troublemakers. Perhaps store_primary_category is missing in thousands of rows. Maybe timestamps failed conversion and are now NaT.

You wouldn't have known unless you checked. Debugging means confirming every assumption.
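Raw counts are easier to judge as a share of the rows. Here is a small sketch on made-up data (the values are invented, not taken from the DoorDash file) that ranks columns by the fraction of missing values:

```python
import pandas as pd

# Made-up rows with gaps in a text column and a timestamp column
demo = pd.DataFrame(
    {
        "store_primary_category": ["american", None, "mexican", None],
        "subtotal": [3441, 1900, 1900, 6900],
        "actual_delivery_time": pd.to_datetime(
            ["2015-02-06 23:27:16", None, "2015-01-22 21:09:09", "2015-02-03 22:13:00"]
        ),
    }
)

# Fraction of missing values per column, largest first (NaT counts as missing)
missing_share = demo.isna().mean().sort_values(ascending=False)
print(missing_share[missing_share > 0])
```

A column that is 50% empty calls for a different decision than one missing a handful of rows, and the share makes that visible immediately.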

Case 3: Logical Pitfalls (Missing Data That Isn't Actually Missing)

Let's say you try to filter orders where the subtotal is greater than 1,000,000, expecting hundreds of rows. But this gives you zero:

try:
    filtered = df[df['subtotal'] > 1000000]
    print("Rows with subtotal > 1,000,000:", filtered.shape[0])
except Exception as e:
    print("Filtering error:", e)

That's not a code error; it's a logic error. You expected high-value orders, but maybe none exist above that threshold. Debug it with a range check:

print("Subtotal range:", df['subtotal'].min(), "to", df['subtotal'].max())

Here is the output.

[Image: output showing the subtotal range]
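The same idea extends beyond min and max: looking at a few quantiles before picking a threshold shows where the data really sits. A minimal sketch with invented subtotals (in cents):

```python
import pandas as pd

# Made-up subtotals in cents
demo = pd.DataFrame({"subtotal": [3441, 1900, 6900, 3900, 12500]})

threshold = 1_000_000
# Median, upper tail, and maximum give a quick picture of the distribution
summary = demo["subtotal"].quantile([0.5, 0.9, 0.99, 1.0])
print(summary)

n_above = int((demo["subtotal"] > threshold).sum())
if n_above == 0:
    print(f"No rows above {threshold}; max is {demo['subtotal'].max()}")
```

If the threshold sits far above the maximum, the empty result is a units or expectation problem, not a filtering bug.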

Case 4: isna().sum() of Zero Doesn't Mean It's Clean

Even if isna().sum() shows zero, there might be dirty data, like whitespace or 'None' stored as a string. Run a more aggressive check:

try:
    fake_nulls = df[df['store_primary_category'].isin(['', ' ', 'None', None])]
    print("Rows with fake missing categories:", fake_nulls.shape[0])
except Exception as e:
    print("Fake missing value check failed:", e)

This catches hidden trash that isnull() missed.

[Image: output of the fake missing value check]
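Once you have found fake nulls, you will probably want to turn them into real ones so that isna() and dropna() see them. A minimal sketch on made-up data; the `FAKE_NULLS` set is an assumption and should be adapted to whatever placeholders your own data uses:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"store_primary_category": ["american", "", "  ", "None", None, "thai"]})

# Placeholder strings treated as missing here; extend the set for your own data
FAKE_NULLS = {"", "none", "null", "nan", "n/a"}

col = demo["store_primary_category"]
stripped = col.str.strip()
is_fake = stripped.str.lower().isin(FAKE_NULLS)

# Keep real values (trimmed), turn placeholders into NaN
demo["store_primary_category"] = stripped.mask(is_fake, np.nan)
print(demo["store_primary_category"].isna().sum())
```

After this step, the null count from Case 2 reflects the placeholders as well as the genuinely empty cells.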

Feature Engineering Glitches: TypeErrors, Date Parsing, and More

Feature engineering seems fun at first, until your new column breaks every model or throws a TypeError mid-pipeline. Here's how to debug that phase like someone who's been burned before.

Case 1: You Think You Can Divide, But You Can't

Let's create a new feature. If an error occurs, our try-except block will catch it.

try:
    df['value_per_item'] = df['subtotal'] / df['total_items']
    print("value_per_item created successfully")
except Exception as e:
    print("Error occurred:", e)

Here is the output.

[Image: output of creating value_per_item]

No errors? Good. But let's look closer.

print(df[['subtotal', 'total_items', 'value_per_item']].sample(3))

Here is the output.

[Image: sample rows with value_per_item]
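A clean run does not guarantee clean values. If total_items were ever 0, pandas would return inf (or NaN for 0/0) rather than raising. The sketch below uses made-up rows with a deliberate zero to show what that looks like; whether the real data contains such rows is something to check, not something assumed here.

```python
import numpy as np
import pandas as pd

# Made-up rows; the zeros in total_items are deliberate
demo = pd.DataFrame({"subtotal": [3441, 1900, 0], "total_items": [4, 0, 0]})
demo["value_per_item"] = demo["subtotal"] / demo["total_items"]

# Division by zero yields inf (or NaN for 0/0) instead of an exception
n_inf = np.isinf(demo["value_per_item"]).sum()
n_nan = demo["value_per_item"].isna().sum()
print("inf values:", n_inf, "| NaN values:", n_nan)

# One option: treat non-finite results as missing so later steps can handle them
demo["value_per_item"] = demo["value_per_item"].replace([np.inf, -np.inf], np.nan)
```

Counting inf and NaN right after creating a ratio feature is cheap, and it catches the kind of value that only shows up later as a confusing model error.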

Case 2: Date Parsing Gone Wrong

Now, converting your dtype is important, but what if you think everything was done correctly, yet problems persist?

# This is the standard way, but it can fail silently on mixed types
df["created_at"] = pd.to_datetime(df["created_at"])
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"])

You might think it's fine, but if your column has mixed types, it could fail silently or break your pipeline. That's why, instead of making the conversion directly, it's better to use a robust function.

import pandas as pd

def parse_date_debug(df, col):
    try:
        parsed = pd.to_datetime(df[col])
        print(f"[SUCCESS] '{col}' parsed successfully.")
        return parsed
    except Exception as e:
        print(f"[ERROR] Failed to parse '{col}':", e)
        # Find values that are present but not date-like
        unparsed = pd.to_datetime(df[col], errors="coerce").isna() & df[col].notna()
        non_datetimes = df.loc[unparsed, col].unique()
        print("Sample values causing issue:", non_datetimes[:5])
        raise

df["created_at"] = parse_date_debug(df, "created_at")
df["actual_delivery_time"] = parse_date_debug(df, "actual_delivery_time")

Here is the output.

[Image: output of the date parsing debug function]

This helps you trace the faulty rows when datetime parsing crashes.
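To see which rows are at fault rather than just which values, you can coerce the column and compare against the original. A minimal sketch on a made-up column with one bad entry and one genuinely missing value:

```python
import pandas as pd

# Made-up column with one bad entry and one genuinely missing value
demo = pd.DataFrame(
    {"created_at": ["2015-02-06 22:24:17", "not a date", None, "2015-01-22 20:39:28"]}
)

# errors="coerce" turns anything unparseable into NaT instead of raising
parsed = pd.to_datetime(demo["created_at"], errors="coerce")

# Rows that had a value but could not be parsed (excludes rows already missing)
bad_rows = demo.index[parsed.isna() & demo["created_at"].notna()].tolist()
print("Unparseable rows:", bad_rows)
print(demo.loc[bad_rows, "created_at"])
```

Separating "was empty" from "could not be parsed" matters, because the first is a data gap and the second is usually a formatting problem you can fix.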

Case 3: Naive Division That Could Mislead

df["busy_dashers_ratio"] = df["total_busy_dashers"] / df["total_onshift_dashers"]

This won't throw an error on our DataFrame, as the columns are already numeric. But here's the catch: some datasets fall back to object types even when the values look like numbers. That leads to:

  • Misleading ratios
  • Wrong model behavior
  • No warnings

Let's validate the types before computing, even if the operation won't throw an error.

import numpy as np
import pandas as pd

def create_ratio_debug(df, num_col, denom_col, new_col):
    num_type = df[num_col].dtype
    denom_type = df[denom_col].dtype

    # is_numeric_dtype also copes with pandas extension dtypes (category, string, ...)
    if not pd.api.types.is_numeric_dtype(num_type) or not pd.api.types.is_numeric_dtype(denom_type):
        print(f"[TYPE WARNING] '{num_col}' or '{denom_col}' is not numeric.")
        print(f"{num_col}: {num_type}, {denom_col}: {denom_type}")
        df[new_col] = np.nan
        return df

    # Warn about zeros in the denominator before dividing
    if (df[denom_col] == 0).any():
        print(f"[DIVISION WARNING] '{denom_col}' contains zeros.")

    df[new_col] = df[num_col] / df[denom_col]
    return df

df = create_ratio_debug(df, "total_busy_dashers", "total_onshift_dashers", "busy_dashers_ratio")

Here is the output.

[Image: output of the ratio debug function]

This gives visibility into potential division-by-zero issues and prevents silent bugs.
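A warning tells you the zeros are there; you still have to decide what to do with them. One possible choice, sketched below with made-up rows, is to treat a zero denominator as "ratio undefined" and store NaN. `safe_ratio` is a hypothetical helper, and NaN is only one reasonable policy.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator, denominator):
    """Divide two numeric Series, returning NaN wherever the denominator is 0."""
    return numerator / denominator.replace(0, np.nan)


# Made-up rows; two of them have no dashers on shift
demo = pd.DataFrame(
    {"total_busy_dashers": [14, 2, 0], "total_onshift_dashers": [33, 0, 0]}
)
demo["busy_dashers_ratio"] = safe_ratio(
    demo["total_busy_dashers"], demo["total_onshift_dashers"]
)
print(demo)
```

With NaN in place of inf, the missing-data checks from earlier in the article pick these rows up along with everything else.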

Modeling Mistakes: Shape Mismatches and Evaluation Confusion

Case 1: NaN Values in Features Crash the Model

Let's say we want to build a linear regression model. LinearRegression() does not support NaN values natively. If any row in X has a missing value, the model refuses to train.

Here is the code, which deliberately creates a shape mismatch to trigger an error:

from sklearn.linear_model import LinearRegression

X_train = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]].iloc[:-10]
y_train = df["actual_total_delivery_duration"].iloc[:-5] 
model = LinearRegression()
model.fit(X_train, y_train)

Here is the output.

[Image: error output from fitting the model]

Let's debug this issue. First, we check for NaNs.

print(X_train.isna().sum())

Here is the output.

[Image: NaN counts for X_train]

Good, let's check the other variable too.

print(y_train.isna().sum())

Here is the output.

[Image: NaN count for y_train]

Both the shape mismatch and the NaN values must be resolved. Here is the code to fix it.

from sklearn.linear_model import LinearRegression

# Re-align X and y to have the same length
X = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]]
y = df["actual_total_delivery_duration"]

# Step 1: Drop rows with NaN in features (X)
valid_X = X.dropna()

# Step 2: Align y to match the remaining indices of X
y_aligned = y.loc[valid_X.index]

# Step 3: Find indices where y is not NaN
valid_idx = y_aligned.dropna().index

# Step 4: Create final clean datasets
X_clean = valid_X.loc[valid_idx]
y_clean = y_aligned.loc[valid_idx]

model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")

And voilà! Here is the output.

[Image: output after training on the cleaned data]
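If the features and the target live in the same DataFrame, a shorter route to the same result is to drop incomplete rows once, before splitting into X and y. This is an alternative sketch on made-up rows, not a replacement for the step-by-step version above:

```python
import numpy as np
import pandas as pd

feature_cols = ["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]
target_col = "actual_total_delivery_duration"

# Made-up rows with gaps in both a feature and the target
demo = pd.DataFrame(
    {
        "estimated_order_place_duration": [446, 446, 251, 446],
        "estimated_store_to_consumer_driving_duration": [861.0, np.nan, 690.0, 289.0],
        "actual_total_delivery_duration": [3779.0, 4024.0, np.nan, 3075.0],
    }
)

# Drop rows missing any feature or the target in a single pass,
# so X and y stay aligned by construction
clean = demo.dropna(subset=feature_cols + [target_col])
X_clean = clean[feature_cols]
y_clean = clean[target_col]

assert len(X_clean) == len(y_clean)
print(X_clean.shape, y_clean.shape)
```

Because X and y are sliced from the same filtered frame, their indices cannot drift apart, which removes the mismatch at its source.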

Case 2: Object Columns (Dates) Crash the Model

Let's say you try to train a model using a timestamp like actual_delivery_time.

But, oh no, it's still an object or datetime type, and you accidentally mix it with numeric columns. Linear regression doesn't like that one bit.

from sklearn.linear_model import LinearRegression

X = df[["actual_delivery_time", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]

model = LinearRegression()
model.fit(X, y)

Here is the error:

[Image: error raised by the datetime column]

You are combining two incompatible data types in the X matrix:

  • One column (actual_delivery_time) is datetime64.
  • The other (estimated_order_place_duration) is int64.

Scikit-learn expects all features to be numeric so that it can convert X into a single numeric array. It can't handle mixed types like datetime and int. Let's solve it by converting the datetime column to a numeric representation (Unix timestamp).

# Ensure datetime columns are parsed correctly, coercing errors to NaT
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"], errors="coerce")
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")

# Recalculate duration in case of new NaNs
df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()

# Convert datetime to a numeric feature (Unix timestamp in seconds).
# Subtracting the epoch keeps NaT as NaN, so a later dropna() can remove those rows;
# a plain integer cast cannot represent NaT as missing.
# Assumes timezone-naive timestamps.
df["delivery_time_timestamp"] = (df["actual_delivery_time"] - pd.Timestamp("1970-01-01")).dt.total_seconds()

Good. Now that the dtypes are numeric, let's apply the ML model.

from sklearn.linear_model import LinearRegression

# Use the new numeric timestamp feature
X = df[["delivery_time_timestamp", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]

# Drop any remaining NaNs from our feature set and target
X_clean = X.dropna()
y_clean = y.loc[X_clean.index].dropna()
X_clean = X_clean.loc[y_clean.index]

model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")

Here is the output.

[Image: output after training with the numeric timestamp]

Good job!
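To catch this kind of column before the model does, you can ask pandas which features are not numeric. A minimal sketch, where `non_numeric_columns` is a hypothetical helper and the rows are made up:

```python
import pandas as pd


def non_numeric_columns(X):
    """Return the names of columns in X whose dtype is not numeric."""
    return X.select_dtypes(exclude="number").columns.tolist()


# Made-up feature frame mixing a datetime column with an integer column
demo = pd.DataFrame(
    {
        "actual_delivery_time": pd.to_datetime(["2015-02-06 23:27:16", "2015-02-10 22:56:29"]),
        "estimated_order_place_duration": [446, 446],
    }
)

offenders = non_numeric_columns(demo)
print("Non-numeric columns:", offenders)
```

Note that this check also flags boolean and object columns, so treat the result as a list of columns to look at rather than a list of errors.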

Final Thoughts: Debug Smarter, Not Harder

Model crashes don't always stem from complex bugs; sometimes it's just a stray NaN or an unconverted date column sneaking into your data pipeline.

Rather than wrestling with cryptic stack traces or tossing try-except blocks like darts in the dark, dig into your DataFrame early. Peek at .info(), check .isna().sum(), and don't shy away from .dtypes. These simple steps reveal hidden landmines before you even hit fit().
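Those checks are easy to bundle into one call that you run right after loading. The sketch below is one possible shape for it; `quick_health_check` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def quick_health_check(df):
    """Summarize dtype and missing-value count for every column."""
    return pd.DataFrame({"dtype": df.dtypes.astype(str), "n_missing": df.isna().sum()})


# Made-up frame: one unparsed timestamp column with a gap, one clean numeric column
demo = pd.DataFrame(
    {
        "created_at": ["2015-02-06 22:24:17", None],
        "subtotal": [3441, 1900],
    }
)
report = quick_health_check(demo)
print(report)
```

One table with a row per column makes it hard to miss a timestamp that is still stored as text or a feature that is mostly empty.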

I've shown you that even one overlooked object type or a sneaky missing value can sabotage a model. But with a sharper eye, cleaner prep, and intentional feature extraction, you'll shift from debugging reactively to building intelligently.

Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.
