How to Build an End-to-End Data Science Workflow with Machine Learning, Interpretability, and Gemini AI Assistance

In this tutorial, we walk through an end-to-end data science workflow that combines traditional machine learning with Gemini. We begin by preparing and modeling the diabetes dataset, then move on to evaluation, feature importance, and partial dependence. Along the way, we bring in Gemini as our AI data scientist to explain results, answer exploratory questions, and highlight risks. In doing so, we build a predictive model while deepening our understanding and supporting decisions through natural-language interaction. Check out the Full Codes here.
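!pip install -qU google-generativeai scikit-learn matplotlib pandas numpy
from getpass import getpass
import os, json, numpy as np, pandas as pd, matplotlib.pyplot as plt
from IPython.display import display  # built into Jupyter/Colab; imported explicitly for clarity

# Prompt for the Gemini API key only if it is not already set in the environment
if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass("🔑 Enter your Gemini API key (hidden): ")

import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
LLM = genai.GenerativeModel("gemini-1.5-flash")

def ask_llm(prompt, sys=None):
    """Send a prompt to Gemini, optionally prefixed with a system-style instruction."""
    p = prompt if sys is None else f"System:\n{sys}\n\nUser:\n{prompt}"
    r = LLM.generate_content(p)
    # Normalize a missing or None text attribute to an empty string
    return (getattr(r, "text", "") or "").strip()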
from sklearn.datasets import load_diabetes
raw = load_diabetes(as_frame=True)
df = raw.frame.rename(columns={"target":"disease_progression"})
print("Shape:", df.shape); display(df.head())
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, QuantileTransformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import Pipeline

X = df.drop(columns=["disease_progression"]); y = df["disease_progression"]
num_cols = X.columns.tolist()

# Give the model two views of every feature: standardized values and rank-based (Gaussianized) values
pre = ColumnTransformer(
    [("scale", StandardScaler(), num_cols),
     ("rank", QuantileTransformer(n_quantiles=min(200, len(X)), output_distribution="normal"), num_cols)],
    remainder="drop", verbose_feature_names_out=False)
model = HistGradientBoostingRegressor(max_depth=3, learning_rate=0.07,
                                      l2_regularization=0.0, max_iter=500,
                                      early_stopping=True, validation_fraction=0.15)
pipe = Pipeline([("prep", pre), ("hgbt", model)])

# Hold out 20% for testing, then estimate RMSE with 5-fold CV on the training split only
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.20, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)
cv_mse = -cross_val_score(pipe, Xtr, ytr, scoring="neg_mean_squared_error", cv=cv).mean()
cv_rmse = float(cv_mse ** 0.5)
pipe.fit(Xtr, ytr)
We load the diabetes dataset, preprocess the features, and build a robust pipeline that combines scaling, quantile transformation, and gradient boosting. We split the data, run 5-fold cross-validation on the training split to estimate RMSE, and then fit the final model on the training set before evaluating how well it generalizes.
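To make the cross-validation step concrete, here is a dependency-free sketch of what `KFold(shuffle=True)` followed by `cross_val_score(..., scoring="neg_mean_squared_error")` does conceptually. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper names (`kfold_indices`, `cv_rmse`, `fit_line`, `fit_mean`) are hypothetical, and the shuffle uses Python's `random` module, so fold membership will not match scikit-learn's for the same seed. Like the notebook, it averages the per-fold MSEs first and takes the square root afterwards.

```python
import math
import random


def kfold_indices(n, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs after a single seeded shuffle of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % n_splits folds get one extra sample, as in scikit-learn's KFold
    sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size


def cv_rmse(xs, ys, fit, n_splits=5, seed=42):
    """Mean of per-fold MSEs, then a square root (same order as cv_mse ** 0.5)."""
    fold_mse = []
    for train, val in kfold_indices(len(xs), n_splits, seed):
        # Fit on the training folds only, then score on the held-out fold
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errs = [(ys[i] - predict(xs[i])) ** 2 for i in val]
        fold_mse.append(sum(errs) / len(errs))
    return math.sqrt(sum(fold_mse) / len(fold_mse))


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-feature least squares, y ~ a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


xs = list(range(20))
ys = [3 + 2 * x for x in xs]
print(f"mean baseline CV RMSE: {cv_rmse(xs, ys, fit_mean):.2f}")
print(f"linear fit CV RMSE:    {cv_rmse(xs, ys, fit_line):.2f}")
```

Comparing a model's CV RMSE against a trivial mean baseline, as the two prints do, is a quick sanity check that the pipeline learns anything at all.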
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Compare train and test error to spot overfitting; CV RMSE comes from the previous step
pred_tr = pipe.predict(Xtr); pred_te = pipe.predict(Xte)
rmse_tr = mean_squared_error(ytr, pred_tr) ** 0.5
rmse_te = mean_squared_error(yte, pred_te) ** 0.5
mae_te = mean_absolute_error(yte, pred_te)
r2_te = r2_score(yte, pred_te)
print(f"CV RMSE={cv_rmse:.2f} | Train RMSE={rmse_tr:.2f} | Test RMSE={rmse_te:.2f} | Test MAE={mae_te:.2f} | R²={r2_te:.3f}")
plt.figure(figsize=(5,4))
plt.scatter(pred_te, yte - pred_te, s=12)
plt.axhline(0, lw=1); plt.xlabel("Predicted"); plt.ylabel("Residual"); plt.title("Residuals (Test)")
plt.show()
from sklearn.inspection import permutation_importance
imp = permutation_importance(pipe, Xte, yte, scoring="neg_mean_squared_error", n_repeats=10, random_state=0)
imp_df = pd.DataFrame({"feature": X.columns, "importance": imp.importances_mean}).sort_values("importance", ascending=False)
display(imp_df.head(10))
plt.figure(figsize=(6,4))
top10 = imp_df.head(10).iloc[::-1]
plt.barh(top10["feature"], top10["importance"])
plt.title("Permutation Importance (Top 10)"); plt.xlabel("Δ(MSE)"); plt.tight_layout(); plt.show()
We evaluate the model by computing cross-validation, train, and test metrics, and we plot the test residuals to inspect its errors. We then compute permutation importance to identify which features drive the predictions most and display the top contributors in a bar chart.
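Permutation importance, as used above, asks how much the test error grows when one feature's values are shuffled while the fitted model stays fixed. With `scoring="neg_mean_squared_error"`, scikit-learn reports the baseline score minus the permuted score, which for a negated MSE equals the MSE increase, hence the `Δ(MSE)` axis label. The stdlib sketch below reproduces that idea, assuming a `predict` callable over plain lists of rows; `permutation_importance_sketch` and `mse` are hypothetical helpers, and the real `permutation_importance` also offers other scorers, parallelism, and per-repeat values.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled (higher = more important).

    `predict` maps a list of feature rows (lists of floats) to a list of predictions.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [r[j] for r in rows]
            rng.shuffle(col)
            # Copy each row, replacing only column j; the model is never refit
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, col)]
            deltas.append(mse(y, predict(permuted)) - baseline)
        importances.append(sum(deltas) / n_repeats)
    return importances


# Toy check: the target depends only on feature 0, so shuffling feature 1 costs nothing
rows = [[float(i), float((7 * i) % 11)] for i in range(30)]
model = lambda rs: [3.0 * r[0] for r in rs]
y = model(rows)
print(permutation_importance_sketch(model, rows, y))
```

Because the model is not refit, a feature that is correlated with another important feature can look less important than it is; this is a known limitation of the technique, not of this sketch.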
def compute_pdp(pipe, Xref: pd.DataFrame, feat: str, grid=40):
    """Manual partial dependence: mean prediction as `feat` sweeps its 5th-95th percentile range."""
    xs = np.linspace(np.percentile(Xref[feat], 5), np.percentile(Xref[feat], 95), grid)
    Xtmp = Xref.copy()
    ys = []
    for v in xs:
        Xtmp[feat] = v  # force every row to the same value of `feat`
        ys.append(pipe.predict(Xtmp).mean())
    return xs, np.array(ys)

top_feats = imp_df["feature"].head(3).tolist()
plt.figure(figsize=(6,4))
for f in top_feats:
    xs, ys = compute_pdp(pipe, Xte.copy(), f, grid=40)
    plt.plot(xs, ys, label=f)
plt.legend(); plt.xlabel("Feature value"); plt.ylabel("Predicted target"); plt.title("Manual PDP (Top 3)")
plt.tight_layout(); plt.show()
report_obj = {
"dataset": {"rows": int(df.shape[0]), "cols": int(df.shape[1]-1), "target": "disease_progression"},
"metrics": {"cv_rmse": float(cv_rmse), "train_rmse": float(rmse_tr),
"test_rmse": float(rmse_te), "test_mae": float(mae_te), "r2": float(r2_te)},
"top_importances": imp_df.head(10).to_dict(orient="records")
}
print(json.dumps(report_obj, indent=2))
sys_msg = ("You are a senior data scientist. Return: (1) ≤120-word executive summary, "
           "(2) key risks/assumptions bullets, (3) 5 prioritized next experiments w/ rationale, "
           "(4) quick-win feature engineering ideas as Python pseudocode.")
summary = ask_llm(f"Dataset + metrics + importances:\n{json.dumps(report_obj)}", sys=sys_msg)
print("\n📊 Gemini Executive Brief\n" + "-"*80 + f"\n{summary}\n")
We compute manual partial dependence for the top three features and visualize how the prediction changes as each one varies. We then assemble a compact JSON report with the dataset shape, metrics, and importances, and ask Gemini to produce an executive summary, key risks, next experiments, and quick-win feature-engineering ideas.
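The manual PDP above can be sketched without numpy or scikit-learn to show exactly what is averaged: for each grid value, every row's feature is overwritten with that value, the other columns are left as observed, and the predictions are averaged. The helper names `percentile` and `partial_dependence` are hypothetical; `percentile` uses linear interpolation, which is assumed to match numpy's default method, and the grid spans the 5th to 95th percentiles like `compute_pdp`.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def partial_dependence(predict, rows, feat, grid=5):
    """Average prediction as column `feat` sweeps an even grid over its 5th-95th percentiles."""
    if grid < 2:
        raise ValueError("grid must be at least 2")
    col = [r[feat] for r in rows]
    lo, hi = percentile(col, 5), percentile(col, 95)
    xs = [lo + (hi - lo) * k / (grid - 1) for k in range(grid)]
    ys = []
    for v in xs:
        # Force every row to the same value v; keep the other columns as observed
        forced = [r[:feat] + [v] + r[feat + 1:] for r in rows]
        preds = predict(forced)
        ys.append(sum(preds) / len(preds))
    return xs, ys


rows = [[float(i), float(i % 3)] for i in range(21)]
model = lambda rs: [2.0 * r[0] + r[1] ** 2 for r in rs]
xs, ys = partial_dependence(model, rows, feat=0)
print(xs)
print(ys)
```

For this additive toy model the PDP of feature 0 is a straight line with slope 2, shifted by the average of the other term (5/3 here). For a tree ensemble the curve is a step function, and when features are correlated the forced rows can fall outside the data distribution, so PDPs should be read as model behavior rather than causal effects.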
# Globals visible to generated code (Python still injects builtins automatically)
SAFE_GLOBALS = {"pd": pd, "np": np}

def run_generated_pandas(code: str, df_local: pd.DataFrame):
    # Coarse substring blocklist; a basic guardrail, not a real sandbox
    banned = ["__", "import", "open(", "exec(", "eval(", "os.", "sys.", "pd.read", "to_csv", "to_pickle", "to_sql"]
    if any(b in code for b in banned): raise ValueError("Unsafe code rejected.")
    loc = {"df": df_local.copy()}  # run against a copy so the original df is never mutated
    exec(code, SAFE_GLOBALS, loc)
    return {k: v for k, v in loc.items() if k not in ("df",)}

def eda_qa(question: str):
    prompt = f"""You are a Python+Pandas analyst. DataFrame `df` columns:
{list(df.columns)}. Write a SHORT pandas snippet (no comments/prints) that computes the answer to:
"{question}". Use only pd/np/df; assign the final result to a variable named `answer`."""
    code = ask_llm(prompt, sys="Return only code. No prose.")
    # The model may still wrap its code in Markdown fences; strip them before exec
    fence = "`" * 3
    code = code.replace(fence + "python", "").replace(fence, "").strip()
    try:
        out = run_generated_pandas(code, df)
        return code, out.get("answer", None)
    except Exception as e:
        return code, f"[Execution error: {e}]"
questions = [
"What is the Pearson correlation between BMI and disease_progression?",
"Show mean target by tertiles of BMI (low/med/high).",
"Which single feature correlates most with the target (absolute value)?"
]
for q in questions:
    code, ans = eda_qa(q)
    print("\nQ:", q, "\nCode:\n", code, "\nAnswer:\n", ans)
We build a lightweight, guarded execution helper for running Gemini-generated pandas code during exploratory data analysis. We then ask natural-language questions about correlations and feature relationships, let Gemini write the pandas snippets, and run them automatically to get direct answers from the dataset.
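The substring blocklist in `run_generated_pandas` is easy to trip in both directions: it rejects harmless code that merely contains a banned substring (a hypothetical column named `"important"` matches `"import"`), and it misses indirect calls such as `getattr(df, "to_" + "csv")`. One possible alternative, sketched here as an assumption rather than the tutorial's method, is to inspect the parsed syntax tree with Python's `ast` module. The rule sets and the `screen_generated_code` name are hypothetical choices, and neither approach is a true security boundary; in-process `exec` of untrusted code can never be made fully safe this way.

```python
import ast

# Hypothetical rule sets; tune them for your own threat model
BANNED_NAMES = {"open", "exec", "eval", "compile", "__import__", "getattr", "setattr",
                "delattr", "globals", "locals", "vars", "input", "os", "sys"}
BANNED_ATTRS = {"to_csv", "to_pickle", "to_sql", "to_parquet", "to_excel", "to_json"}


def screen_generated_code(code: str) -> list[str]:
    """Return reasons to reject `code`; an empty list means none of these rules fired."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("import statement")
        elif isinstance(node, ast.Name) and (node.id in BANNED_NAMES or node.id.startswith("__")):
            problems.append(f"banned name: {node.id}")
        elif isinstance(node, ast.Attribute):
            if node.attr.startswith("_"):
                problems.append(f"private/dunder attribute: {node.attr}")
            elif node.attr in BANNED_ATTRS or node.attr.startswith("read_"):
                problems.append(f"file I/O attribute: {node.attr}")
    return problems


ok = 'answer = df["important"].corr(df["disease_progression"])'
print(screen_generated_code(ok))  # no rule fires
print("import" in ok)             # the substring blocklist would reject this line
print(screen_generated_code("answer = getattr(df, 'to_' + 'csv')('x.csv')"))
```

Because the checks operate on syntax nodes rather than raw text, string contents and column names never trigger a rule, while names such as `getattr` are caught no matter how the attribute string is assembled.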
critique = ask_llm(
    f"""Metrics: {report_obj['metrics']}
Top importances: {report_obj['top_importances']}
Identify risks around leakage, overfitting, calibration, OOD robustness, and fairness (even proxy-only).
Propose quick checks (concise Python sketches)."""
)
print("\n🧪 Gemini Risk & Robustness Review\n" + "-"*80 + f"\n{critique}\n")
def what_if(pipe, Xref: pd.DataFrame, feat: str, delta: float = 0.05):
    """Change in prediction when `feat` is nudged by `delta` from the per-feature median row."""
    x0 = Xref.median(numeric_only=True).to_dict()
    x1, x2 = x0.copy(), x0.copy()
    if feat not in x1: return np.nan
    x2[feat] = x1[feat] + delta
    X1 = pd.DataFrame([x1], columns=X.columns)
    X2 = pd.DataFrame([x2], columns=X.columns)
    return float(pipe.predict(X2)[0] - pipe.predict(X1)[0])

for f in top_feats:
    print(f"Estimated Δtarget if {f} increases by +0.05 ≈ {what_if(pipe, Xte, f, 0.05):.2f}")
print("\n✅ Done: Train → Explain → Query with Gemini → Review risks → What-if analysis. "
      "Swap the dataset or tweak model params to extend this notebook.")
We ask Gemini to review the model for leakage, overfitting, calibration, out-of-distribution robustness, and fairness, and to suggest quick Python checks. We then run a simple what-if analysis that shows how a small change in each top feature shifts the prediction, which makes the model's behavior easier to interpret.
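The what-if step is a one-sided finite difference taken at a single reference point, the per-feature median row. The stdlib sketch below mirrors `what_if` under that reading; `what_if_delta` is a hypothetical name and the two toy models are assumptions chosen to show one caveat. Gradient-boosted trees are piecewise constant, so the estimated Δ can be zero or a sudden jump depending on whether the nudge crosses a split threshold. It may also help to know that scikit-learn's `load_diabetes` features are, by default, mean-centered and scaled so each column's sum of squares is 1, giving a per-feature standard deviation near 1/√442 ≈ 0.048; a +0.05 nudge is therefore roughly one standard deviation rather than a tiny perturbation.

```python
from statistics import median


def what_if_delta(predict, rows, feat, delta=0.05):
    """Prediction change when column `feat` is nudged by `delta` from the per-column median row."""
    base = [median(col) for col in zip(*rows)]  # median of each feature column
    bumped = list(base)
    bumped[feat] += delta
    p_base, p_bumped = predict([base, bumped])
    return p_bumped - p_base


rows = [[float(i), float(6 - i)] for i in range(7)]
linear = lambda rs: [4.0 * r[0] - 2.0 * r[1] + 1.0 for r in rs]
# A single split at x0 > 3, mimicking one tree threshold
stump = lambda rs: [10.0 if r[0] > 3.0 else 0.0 for r in rs]
print(what_if_delta(linear, rows, 0, 0.5))   # coefficient 4.0 times delta 0.5
print(what_if_delta(stump, rows, 0, 0.05))   # crosses the split
print(what_if_delta(stump, rows, 0, -0.05))  # stays on the same side
```

Trying several deltas of both signs, or several reference rows instead of only the median, gives a more honest picture for tree ensembles than a single +0.05 step.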
In conclusion, we see how to combine a machine learning pipeline with Gemini to make data science work more interactive and insightful. We train, evaluate, and interpret a model, then ask Gemini to summarize the findings, suggest improvements, and critique the risks. Through this journey, we build a workflow that delivers both predictive performance and interpretability, while benefiting from an AI collaborator throughout our data analysis process.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has more than 2 million monthly views, illustrating its popularity among readers.