The Basics of Debugging Python Problems


Ever run a Python script and immediately wished you hadn't pressed Enter?
Debugging in data science is not just a chore; it's a survival skill, particularly when you're dealing with messy datasets or building prediction models that real people rely on.
In this article, we will explore the basics of debugging, especially in your data science workflows, using a real-life dataset from a DoorDash delivery project, and most importantly, how to debug like a pro.
DoorDash Delivery Duration Prediction: What Are We Dealing With?


In this data project, DoorDash asked its data science candidates to predict the delivery duration. Let's first look at the dataset info. Here is the code:
Here is the output:

It seems they did not provide the delivery duration, so you have to calculate it yourself. It's simple, but no worries if you are a beginner. Let's see how it can be calculated.
import pandas as pd
from datetime import datetime
# Assuming historical_data is your DataFrame
historical_data["created_at"] = pd.to_datetime(historical_data['created_at'])
historical_data["actual_delivery_time"] = pd.to_datetime(historical_data['actual_delivery_time'])
historical_data["actual_total_delivery_duration"] = (historical_data["actual_delivery_time"] - historical_data["created_at"]).dt.total_seconds()
historical_data.head()
Here is the head of the result; you can see actual_total_delivery_duration.

Good, now we can get started! But before that, here is the data dictionary for this dataset.
Columns in historical_data.csv
Time features:
- market_id: A city/region in which DoorDash operates, e.g., Los Angeles, given in the data as an ID.
- created_at: Timestamp in UTC when the order was submitted by the consumer to DoorDash.
- actual_delivery_time: Timestamp in UTC when the order was delivered to the consumer.
Store features:
- store_id: An ID representing the restaurant the order was submitted for.
- store_primary_category: Cuisine category of the restaurant, e.g., Italian, Asian.
- order_protocol: A store can receive orders from DoorDash through many modes. This field represents an ID denoting the protocol.
Order features:
- total_items: Total number of items in the order.
- subtotal: Total value of the order submitted (in cents).
- num_distinct_items: Number of distinct items included in the order.
- min_item_price: Price of the least expensive item in the order (in cents).
- max_item_price: Price of the most expensive item in the order (in cents).
Market features:
Because DoorDash is a marketplace, we have information on the state of the marketplace when the order is placed, which can be used to estimate delivery time. The following features are values at the time of created_at (order submission time):
- total_onshift_dashers: Number of available dashers who are within 10 miles of the store at the time of order creation.
- total_busy_dashers: Subset of the above total_onshift_dashers who are currently working on an order.
- total_outstanding_orders: Number of orders within 10 miles of this order that are currently being processed.
Predictions from other models:
We have predictions from other models for various stages of the delivery process that we can use:
- estimated_order_place_duration: Estimated time for the restaurant to receive the order from DoorDash (in seconds).
- estimated_store_to_consumer_driving_duration: Estimated travel time between the store and the consumer (in seconds).
Great, so let's get started!
Common Python Errors in Data Science Projects

In this section, we will walk through common debugging errors in a data science project, starting with reading the dataset and going all the way to the most important part: modeling.
Reading the Dataset: FileNotFoundError, Dtype Warning, and Fixes
Case 1: File Not Found – The Classic
In data science, your first bug often greets you at read_csv. And not with a hello. Let's debug that exact moment together, line by line. Here is the code:
import pandas as pd
try:
    df = pd.read_csv('Strata Questions/historical_data.csv')
    df.head(3)
except FileNotFoundError as e:
    import os
    print("File not found. Here's where Python is looking:")
    print("Working directory:", os.getcwd())
    print("Available files:", os.listdir())
    raise e
Here is the output.

You don't just raise an error; you investigate. This shows where the code thinks it is and what it sees around it. If your file isn't in the list, now you know. No guessing. Just facts.
Replace the path with the full one, and voilà!
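If you'd rather not hard-code a path and hope, here is a minimal sketch of a loader that resolves the path first and fails with a message you can act on. The helper name load_csv_checked and the throwaway demo file are assumptions made for illustration; they are not part of the original project.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv_checked(path):
    """Read a CSV, failing early with a message that shows the resolved path."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(
            f"No file at {resolved} (working directory: {Path.cwd()})"
        )
    return pd.read_csv(resolved)


# Demo on a throwaway file so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "historical_data.csv"
    demo_path.write_text("market_id,subtotal\n1,3441\n2,1900\n")
    demo_df = load_csv_checked(demo_path)
    print(demo_df.shape)
```

The point of resolving first is that the error message shows the absolute path Python actually tried, which is usually enough to spot a wrong working directory.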

Case 2: Dtype Misinterpretation – Python's Quietly Wrong Guess
You load the dataset, but something's off. The bug is hiding inside your types.
# Assuming df is your loaded DataFrame
try:
    print("Column Types:\n", df.dtypes)
except Exception as e:
    print("Error reading dtypes:", e)
Here is the output.
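When a column that should be numeric shows up with a non-numeric dtype, it usually means a few stray strings slipped in. Here's a hedged sketch for finding them, using a made-up three-row CSV (the values are invented for illustration, not taken from the DoorDash data):

```python
import io

import pandas as pd

# Tiny made-up CSV: one bad value forces the whole column to a non-numeric dtype
raw = io.StringIO("store_id,subtotal\n1,3441\n2,unknown\n3,1900\n")
demo = pd.read_csv(raw)
print(demo.dtypes)

# Coerce to numbers; anything that cannot be parsed becomes NaN
as_numbers = pd.to_numeric(demo["subtotal"], errors="coerce")

# Rows that had a value but failed to parse are the culprits
culprits = demo.loc[as_numbers.isna() & demo["subtotal"].notna(), "subtotal"]
print("Unparseable values:", culprits.tolist())
```

Once you know which values are the problem, you can decide whether to clean them, treat them as missing, or fix the source file.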

Case 3: Date Parsing – The Silent Saboteur
We found out that we need to calculate the delivery duration first, and we did it this way.
try:
    # This code was shown earlier to calculate the delivery duration
    df["created_at"] = pd.to_datetime(df['created_at'])
    df["actual_delivery_time"] = pd.to_datetime(df['actual_delivery_time'])
    df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()
    print("Successfully calculated delivery duration and checked dtypes.")
    print("Relevant dtypes:\n", df[['created_at', 'actual_delivery_time', 'actual_total_delivery_duration']].dtypes)
except Exception as e:
    print("Error during date processing:", e)
Here is the output.
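The conversion succeeding doesn't mean the durations make sense. As a quick sanity check, you could count missing and negative durations right after computing them. The sketch below uses three invented rows, so the numbers are illustrative only:

```python
import pandas as pd

# Invented rows: a normal order, a missing delivery time,
# and a delivery recorded "before" the order was created
orders = pd.DataFrame({
    "created_at": ["2015-02-06 22:24:17", "2015-02-10 21:49:25", "2015-01-22 20:39:28"],
    "actual_delivery_time": ["2015-02-06 23:27:16", None, "2015-01-22 20:10:00"],
})
for col in ["created_at", "actual_delivery_time"]:
    orders[col] = pd.to_datetime(orders[col])

orders["duration_s"] = (
    orders["actual_delivery_time"] - orders["created_at"]
).dt.total_seconds()

n_missing = int(orders["duration_s"].isna().sum())
n_negative = int((orders["duration_s"] < 0).sum())
print(f"missing: {n_missing}, negative: {n_negative}")
```

Neither case raises an error, which is exactly why it's worth counting them before they reach a model.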

Clean and professional! Now we avoid those red error messages, which should lift our mood. I know seeing them can dampen your motivation.
Handling Missing Data: KeyErrors, NaNs, and Logical Pitfalls
Some bugs don't crash your code. They just give you the wrong results, silently, until you wonder why your model is trash.
This section digs into missing data: not just how to clean it, but how to debug it properly.
Case 1: KeyError – You Thought That Column Existed
Here is our code.
try:
    print(df['store_rating'])
except KeyError as e:
    print("Column not found:", e)
    print("Here are the available columns:\n", df.columns.tolist())
Here is the output.

The code didn't break because of logic; it broke because of an assumption. That's precisely where debugging lives. Always list your columns before accessing them blindly.
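One way to make that habit automatic is a small guard that checks for required columns up front and suggests close matches. This is only a sketch: require_columns is a hypothetical helper, and the toy DataFrame just borrows a few column names from the dataset.

```python
import difflib

import pandas as pd


def require_columns(df, required):
    """Raise a KeyError listing missing columns and near-miss names."""
    available = [str(c) for c in df.columns]
    missing = [c for c in required if c not in available]
    if missing:
        hints = {
            c: difflib.get_close_matches(c, available, n=3) for c in missing
        }
        raise KeyError(f"Missing columns: {missing}. Close matches: {hints}")


toy = pd.DataFrame(
    {"store_id": [1], "store_primary_category": ["italian"], "subtotal": [3441]}
)
require_columns(toy, ["subtotal"])  # passes silently

try:
    require_columns(toy, ["store_rating"])
except KeyError as err:
    print(err)
```

Calling it at the top of a pipeline step turns a confusing mid-run KeyError into a clear message before any work is done.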
Case 2: NaN Count – Missing Values You Didn't Expect
You assume everything is clean. But real-world data always hides gaps. Let's check for them.
try:
    null_counts = df.isnull().sum()
    print("Nulls per column:\n", null_counts[null_counts > 0])
except Exception as e:
    print("Failed to inspect nulls:", e)
Here is the output.

This exposes the silent troublemakers. Perhaps store_primary_category is missing in thousands of rows. Maybe timestamps failed conversion and are now NaT.
You wouldn't have known unless you checked. Debugging means confirming every assumption.
Case 3: Logical Pitfalls – Missing Data That Isn't Actually Missing
Let's say you try to filter orders where the subtotal is greater than 1,000,000, expecting hundreds of rows. But this gives you zero:
try:
    filtered = df[df['subtotal'] > 1000000]
    print("Rows with subtotal > 1,000,000:", filtered.shape[0])
except Exception as e:
    print("Filtering error:", e)
That's not a code error; it's a logic error. You expected high-value orders, but maybe none exist above that threshold. Debug it with a range check:
print("Subtotal range:", df['subtotal'].min(), "to", df['subtotal'].max())
Here is the output.
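Since subtotal is stored in cents, a threshold of 1,000,000 means orders above $10,000, which is a lot of takeout. A quick look at the distribution makes that kind of mismatch obvious. Here's a sketch with invented subtotals:

```python
import pandas as pd

# Invented order values, in cents
subtotals = pd.Series([3441, 1900, 6900, 12500, 2780], name="subtotal")
threshold = 1_000_000

# Translate the threshold into the unit you actually think in
print(f"Threshold in dollars: {threshold / 100:,.2f}")

# Compare it against where the data really sits
print(subtotals.quantile([0.5, 0.9, 0.99]))

share_above = float((subtotals > threshold).mean())
print(f"Share of orders above threshold: {share_above:.1%}")
```

If the threshold sits far beyond the 99th percentile, an empty filter result is the expected outcome, not a bug.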

Case 4: Zero from isna() Doesn't Mean It's Clean
Even if isna().sum() shows zero, there might still be dirty data, such as whitespace or 'None' stored as a string. Run a more aggressive check:
try:
    fake_nulls = df[df['store_primary_category'].isin(['', ' ', 'None', None])]
    print("Rows with fake missing categories:", fake_nulls.shape[0])
except Exception as e:
    print("Fake missing value check failed:", e)
This catches hidden junk that isnull() missed.
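Once you've found placeholder values, you'll probably want to turn them into real NaNs so that isna() tells the truth. Here's a minimal sketch, assuming the placeholders are empty strings, whitespace, and the literal string 'None' (your data may use others):

```python
import pandas as pd

# Invented column with a mix of real values and disguised gaps
demo = pd.DataFrame(
    {"store_primary_category": ["italian", "", "  ", "None", None, "asian"]}
)

before = int(demo["store_primary_category"].isna().sum())

# Strip whitespace, then blank out the known placeholder strings
stripped = demo["store_primary_category"].str.strip()
demo["store_primary_category"] = stripped.mask(stripped.isin(["", "None"]))

after = int(demo["store_primary_category"].isna().sum())
print(f"Missing before: {before}, after: {after}")
```

After this step, a single isna().sum() covers both the real gaps and the disguised ones.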

Feature Engineering Glitches: TypeErrors, Date Parsing, and More
Feature engineering seems fun at first, until your new column breaks every model or throws a TypeError mid-pipeline. Here's how to debug that phase like someone who's been burned before.
Case 1: You Think You Can Divide, But You Can't
Let's create a new feature. If an error occurs, our try-except block will catch it.
try:
    df['value_per_item'] = df['subtotal'] / df['total_items']
    print("value_per_item created successfully")
except Exception as e:
    print("Error occurred:", e)
Here is the output.

No errors? Good. But let's look closer.
print(df[['subtotal', 'total_items', 'value_per_item']].sample(3))
Here is the output.
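What a random sample like this can hide is division by zero. pandas doesn't raise on it; it quietly returns inf (or NaN for 0/0). Here's a sketch of how you might check, using invented rows:

```python
import numpy as np
import pandas as pd

# Invented orders, including two with zero items
orders = pd.DataFrame(
    {"subtotal": [3441, 1900, 0, 2500], "total_items": [4, 1, 0, 0]}
)
orders["value_per_item"] = orders["subtotal"] / orders["total_items"]

# Count the silent casualties of dividing by zero
n_inf = int(np.isinf(orders["value_per_item"]).sum())
n_nan = int(orders["value_per_item"].isna().sum())
print(f"inf values: {n_inf}, NaN values: {n_nan}")

# Safer variant: treat a zero denominator as missing before dividing
safe_denominator = orders["total_items"].where(orders["total_items"] != 0)
orders["value_per_item_safe"] = orders["subtotal"] / safe_denominator
print(orders)
```

The safe variant leaves you with plain NaNs, which the usual missing-value handling can deal with, instead of inf values that most models reject or mishandle.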

Case 2: Date Parsing Gone Wrong
Converting your dtypes is important, but what if you think everything was done correctly and problems still persist?
# This is the standard way, but it can fail silently on mixed types
df["created_at"] = pd.to_datetime(df["created_at"])
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"])
You might think this is fine, but if your column has mixed types, it could fail silently or break your pipeline. That's why, instead of converting directly, it's better to use a more robust function.
import pandas as pd
def parse_date_debug(df, col):
    try:
        parsed = pd.to_datetime(df[col])
        print(f"[SUCCESS] '{col}' parsed successfully.")
        return parsed
    except Exception as e:
        print(f"[ERROR] Failed to parse '{col}':", e)
        # Find non-date-like values to debug
        non_datetimes = df[pd.to_datetime(df[col], errors="coerce").isna()][col].unique()
        print("Sample values causing issue:", non_datetimes[:5])
        raise
df["created_at"] = parse_date_debug(df, "created_at")
df["actual_delivery_time"] = parse_date_debug(df, "actual_delivery_time")
Here is the output.

This helps you trace the faulty rows when datetime parsing crashes.
Case 3: Naive Division That Could Mislead
This won't throw an error on our DataFrame because the columns are already numeric. But here's the catch: some datasets sneak in object types even when the values look like numbers. That leads to:
- Misleading ratios
- Wrong model behavior
- No warnings
df["busy_dashers_ratio"] = df["total_busy_dashers"] / df["total_onshift_dashers"]
Let's validate the types before computing, even if the operation won't throw an error.
import numpy as np
def create_ratio_debug(df, num_col, denom_col, new_col):
    num_type = df[num_col].dtype
    denom_type = df[denom_col].dtype
    if not np.issubdtype(num_type, np.number) or not np.issubdtype(denom_type, np.number):
        print(f"[TYPE WARNING] '{num_col}' or '{denom_col}' is not numeric.")
        print(f"{num_col}: {num_type}, {denom_col}: {denom_type}")
        df[new_col] = np.nan
        return df
    if (df[denom_col] == 0).any():
        print(f"[DIVISION WARNING] '{denom_col}' contains zeros.")
    df[new_col] = df[num_col] / df[denom_col]
    return df
df = create_ratio_debug(df, "total_busy_dashers", "total_onshift_dashers", "busy_dashers_ratio")
Here is the output.

This gives visibility into potential division-by-zero issues and prevents silent bugs.
Modeling Mistakes: Shape Mismatch and Evaluation Confusion
Case 1: NaN Values in Features Cause the Model to Crash
Let's say we want to build a linear regression model. LinearRegression() does not support NaN values natively. If any row in X has a missing value, the model refuses to train.
Here is the code, which deliberately creates a shape mismatch to trigger an error:
from sklearn.linear_model import LinearRegression
X_train = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]].iloc[:-10]
y_train = df["actual_total_delivery_duration"].iloc[:-5]
model = LinearRegression()
model.fit(X_train, y_train)
Here is the output.

Let's debug this issue. First, we check for NaNs.
print(X_train.isna().sum())
Here is the output.

Good, let's check the other variable too.
print(y_train.isna().sum())
Here is the output.

The shape mismatch and the NaN values both need to be resolved. Here is the code to fix it.
from sklearn.linear_model import LinearRegression
# Re-align X and y to have the same length
X = df[["estimated_order_place_duration", "estimated_store_to_consumer_driving_duration"]]
y = df["actual_total_delivery_duration"]
# Step 1: Drop rows with NaN in features (X)
valid_X = X.dropna()
# Step 2: Align y to match the remaining indices of X
y_aligned = y.loc[valid_X.index]
# Step 3: Find indices where y is not NaN
valid_idx = y_aligned.dropna().index
# Step 4: Create final clean datasets
X_clean = valid_X.loc[valid_idx]
y_clean = y_aligned.loc[valid_idx]
model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")
And voilà! Here is the output.
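If step-by-step index alignment feels like a lot, an alternative is to drop incomplete rows from the DataFrame once, then slice the features and the target from the same cleaned frame. This is a different route to the same kind of result, shown here on an invented toy frame:

```python
import numpy as np
import pandas as pd

# Invented rows; only the first one is complete
toy = pd.DataFrame({
    "estimated_order_place_duration": [446, 446, np.nan, 251],
    "estimated_store_to_consumer_driving_duration": [861.0, np.nan, 690.0, 289.0],
    "actual_total_delivery_duration": [3779.0, 4024.0, 1781.0, np.nan],
})
features = [
    "estimated_order_place_duration",
    "estimated_store_to_consumer_driving_duration",
]
target = "actual_total_delivery_duration"

# One dropna over features + target keeps X and y row-aligned by construction
clean = toy.dropna(subset=features + [target])
X_clean, y_clean = clean[features], clean[target]

print(X_clean.shape, y_clean.shape)
```

Because both pieces come from the same filtered frame, their lengths and indices can't drift apart, which removes the shape-mismatch class of error entirely.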

Case 2: Object Columns (Dates) Crash the Model
Let's say you try to train a model using a timestamp like actual_delivery_time.
But, oh no, it's still an object or datetime type, and you've accidentally mixed it with numeric columns. Linear regression doesn't like that one bit.
from sklearn.linear_model import LinearRegression
X = df[["actual_delivery_time", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]
model = LinearRegression()
model.fit(X, y)
Here is the error:

You're combining two incompatible data types in the X matrix:
- One column (actual_delivery_time) is datetime64.
- Another (estimated_order_place_duration) is int64.
Scikit-learn expects every feature to be numeric so that it can build a single numeric array. It can't handle a mix of datetime and int columns. Let's solve it by converting the datetime column to a numeric representation (a Unix timestamp).
# Ensure datetime columns are parsed correctly, coercing errors to NaT
df["actual_delivery_time"] = pd.to_datetime(df["actual_delivery_time"], errors="coerce")
df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
# Recalculate duration in case of new NaNs
df["actual_total_delivery_duration"] = (df["actual_delivery_time"] - df["created_at"]).dt.total_seconds()
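# Convert datetime to a numeric feature (Unix timestamp in seconds).
# Subtracting the epoch keeps NaT as NaN, so a later dropna() can catch it;
# casting to int64 would hide NaT behind a sentinel integer instead.
# Assumes timezone-naive timestamps.
df["delivery_time_timestamp"] = (df["actual_delivery_time"] - pd.Timestamp("1970-01-01")).dt.total_seconds()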
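One thing to watch when turning timestamps into numbers: missing values (NaT) need to stay missing, otherwise dropna() has nothing to catch. Here's a hedged sketch of a conversion that preserves them by subtracting the Unix epoch. It assumes timezone-naive timestamps and uses invented values:

```python
import pandas as pd

# Invented values: one valid timestamp, one unparseable string, one gap
stamps = pd.to_datetime(
    pd.Series(["2015-02-06 22:24:17", "not a date", None]), errors="coerce"
)

# Subtracting the epoch yields timedeltas; NaT propagates and becomes NaN
epoch_seconds = (stamps - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)
print(epoch_seconds)

# Calendar parts such as the hour may be easier for a linear model to use
# than a raw timestamp
hour_of_day = stamps.dt.hour
print(hour_of_day)
```

The result is a float column in which unparseable or missing dates show up as NaN, so the usual missing-value checks still work.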
Good. Now that the dtypes are numeric, let's apply the ML model.
from sklearn.linear_model import LinearRegression
# Use the new numeric timestamp feature
X = df[["delivery_time_timestamp", "estimated_order_place_duration"]]
y = df["actual_total_delivery_duration"]
# Drop any remaining NaNs from our feature set and target
X_clean = X.dropna()
y_clean = y.loc[X_clean.index].dropna()
X_clean = X_clean.loc[y_clean.index]
model = LinearRegression()
model.fit(X_clean, y_clean)
print("✅ Model trained successfully!")
Here is the output.

Great job!
Final Thoughts: Debug Smarter, Not Harder
Model crashes don't always stem from complex bugs. Sometimes it's just a stray NaN or an unconverted date column sneaking into your data pipeline.
Rather than wrestling with cryptic stack traces or tossing try-except blocks around like darts in the dark, dig into your DataFrame early on. Peek at .info(), check .isna().sum(), and don't shy away from .dtypes. These simple steps uncover hidden landmines before you even hit fit().
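Those three checks are easy to bundle into one call. Here's a sketch of such a helper; quick_health_check is a hypothetical name, and the toy frame is invented:

```python
import numpy as np
import pandas as pd


def quick_health_check(df):
    """Summarize dtype, missing values, and cardinality for every column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),
    })


toy = pd.DataFrame({
    "subtotal": [3441, 1900, np.nan, 2780],
    "store_primary_category": ["italian", None, "asian", "asian"],
})
report = quick_health_check(toy)
print(report)
```

Running something like this right after loading gives you one table to scan for surprising dtypes and gaps before any feature engineering starts.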
I've shown you that even one overlooked object type or a sneaky missing value can sabotage a model. But with a sharper eye, cleaner prep, and intentional feature extraction, you'll shift from reactively fixing errors to building intelligently.
Nate Rosidi is a data scientist and works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.



