5 Routine Data Science Tasks That ChatGPT Can Handle


Image by Author | Canva
According to a data science report, data scientists spend about 60% of their time cleaning and organizing data. These are routine, time-consuming tasks that ChatGPT can take over.
In this article, we will walk through five routine tasks ChatGPT can handle in a single conversation when given the right prompts, including data cleaning and preparation. We will use a real data project from Gett, a London black-cab app similar to Uber, used in its hiring process, to show how this works in practice.
Data Project Overview: Analyzing Failed Ride Orders from Gett
In this data project, Gett asks you to analyze failed ride orders by examining several key metrics to understand why some customers did not successfully get a vehicle.
Here is the description of the data.

Now, let's explore it by uploading the data to ChatGPT.
In the next five steps, we will walk through the routine ChatGPT tasks that can handle this data project. The steps are shown below.

Step 1: Data Exploration and Analysis
In data exploration, we use the same functions all the time, such as head(), info(), or describe().
When prompting ChatGPT, we will include these key functions explicitly. We will also attach the project description and paste the dataset.

We will use the prompt below. Just replace the text within square brackets with the project description. You can find the project description here:
Here is the data project description: [paste here]
Perform basic EDA, show head, info, and summary stats, missing values, and correlation heatmap.
Here is the output.

As you can see, ChatGPT summarizes the data by highlighting key columns, reporting missing values, and creating a correlation heatmap to check relationships.
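To make this concrete, below is a minimal sketch of the kind of EDA code ChatGPT typically generates and runs for a prompt like this. The filename data_orders.csv is an assumption for illustration, not part of the project description.

# Basic EDA sketch: preview, schema, summary stats, missing values, correlation heatmap
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data_orders.csv")  # hypothetical filename

print(df.head())        # first rows
df.info()               # column types and non-null counts
print(df.describe())    # summary statistics
print(df.isna().sum())  # missing values per column

# Correlation heatmap of the numeric columns only
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Correlation heatmap")
plt.tight_layout()
plt.show()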
Step 2: Data Cleaning
Some of the columns contain missing values.

Let's write a prompt to handle this.
Clean this dataset: identify and handle missing values appropriately (e.g., drop or impute based on context). Provide a summary of the cleaning steps.
Here is a summary of what ChatGPT did:

ChatGPT converted the datetime column, dropped invalid orders, and imputed the missing values in m_order_eta.
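Below is a rough sketch of what those cleaning steps look like in pandas, assuming the Gett column names (order_datetime, order_status_key, m_order_eta) and a median imputation; ChatGPT's exact code may differ.

# Cleaning sketch: parse dates, drop invalid orders, impute missing ETA
import pandas as pd

df = pd.read_csv("data_orders.csv")  # hypothetical filename

# Parse the datetime column; unparseable values become NaT
df["order_datetime"] = pd.to_datetime(df["order_datetime"], errors="coerce")

# Drop orders with an invalid timestamp or a missing status
df = df.dropna(subset=["order_datetime", "order_status_key"])

# Impute missing driver ETA values with the median
df["m_order_eta"] = df["m_order_eta"].fillna(df["m_order_eta"].median())

print(df.isna().sum())  # verify what is still missing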
Step 3: Generate Visualizations
To get the most out of your data, it is important to visualize the right things. Instead of producing random plots, we can guide ChatGPT by providing a source link, a technique called retrieval-augmented generation (RAG).
We will use this article. Here's the prompt:
Before generating visualizations, read this article on choosing the right plots for different data types and distributions: [LINK]. Then, show the most suitable visualizations for this dataset, explain why each was selected, and produce the plots in this chat by running code on the dataset.
Here is the output.

Here are the six different plots that ChatGPT produced.

ChatGPT also explains why each plot was selected and what it shows.
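For reference, here is a minimal sketch of two of the plot types such a guided prompt tends to produce: a histogram for a numeric column and a bar chart for a categorical one. The column names follow the Gett dataset and are assumptions for illustration.

# Visualization sketch: histogram of driver ETA and bar chart of order statuses
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data_orders.csv")  # hypothetical filename

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of the driver ETA
df["m_order_eta"].plot(kind="hist", bins=30, ax=axes[0], title="Driver ETA distribution")

# Bar chart: number of orders per status
df["order_status_key"].value_counts().plot(kind="bar", ax=axes[1], title="Orders per status")

plt.tight_layout()
plt.show()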
Step 4: Make Your Data Ready for Machine Learning
Now that we have handled the missing values and explored the data, the next step is to prepare the data for machine learning. This includes steps such as encoding categorical variables and scaling numerical features.
Here is our prompt.
Prepare this data for machine learning: encode categorical variables, scale numerical features, and return a clean dataset ready for modeling. Briefly describe each step.
Here is the output.

Your features are now encoded and scaled, so your data is ready for a machine learning model.
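As a reference point, here is a minimal sketch of those preparation steps with pandas and scikit-learn, assuming order_status_key is the target; the filename and column choices are assumptions for illustration.

# ML preparation sketch: one-hot encode categoricals, scale numeric features
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data_orders.csv")  # hypothetical filename, already cleaned

target = "order_status_key"
X = df.drop(columns=[target, "order_datetime"])  # drop the raw timestamp (or engineer features from it)
y = df[target]

# One-hot encode categorical columns, then scale the numeric ones
X = pd.get_dummies(X, drop_first=True)
numeric_cols = X.select_dtypes("number").columns
X[numeric_cols] = StandardScaler().fit_transform(X[numeric_cols])

print(X.head())  # model-ready feature matrix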
Step 5: Apply a Machine Learning Model
Let's move on to the machine learning model. We will use the following prompt template to apply a basic machine learning model.
Use this data to predict [target variable]. Apply [model type] and report evaluation metrics such as [accuracy, precision, recall, F1-score]. Use only 5 relevant features and explain your modeling steps.
Let's adapt this prompt to our project.
Use this data to predict order_status_key. Apply a multiclass classifier (e.g., random forest), and report test metrics such as accuracy, precision, recall, and F1-score. Use only 5 relevant features and explain your modeling steps.
Now, paste this into the ongoing conversation and review the result.
Here is the output.

As you can see, the model performs well, maybe even too well?
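For comparison, here is a minimal sketch of what such a modeling step looks like in scikit-learn; the five feature names are assumptions based on the Gett dataset, not necessarily the ones ChatGPT picked.

# Modeling sketch: random forest predicting order_status_key from 5 features
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("data_orders.csv")  # hypothetical filename, already cleaned

features = ["m_order_eta", "is_driver_assigned_key",
            "cancellations_time_in_seconds", "origin_longitude", "origin_latitude"]
X = df[features].fillna(0)
y = df["order_status_key"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Per-class precision, recall, F1, plus overall accuracy
print(classification_report(y_test, model.predict(X_test)))

If the metrics look suspiciously high, it is worth checking whether one of the chosen features leaks the target; for example, a cancellation-time column that is only populated for cancelled orders can make the model look better than it really is.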
Bonus: Gemini CLI
Google has introduced Gemini CLI, an open-source agent you can interact with from your terminal. You can install it using the code below. (The free tier allows 60 model requests per minute and 1,000 requests per day at no charge.)
Apart from ChatGPT, you can also use Gemini CLI to handle routine data science tasks, such as cleaning, exploration, and even building a dashboard that automates these tasks.
Gemini CLI provides a straightforward command-line interface and is available for free. Let's start by installing it using the code below.
sudo npm install -g @google/gemini-cli
After running the code above, open your terminal and type the following command to start building:
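gemini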
Once you run the command above, you will see Gemini CLI as shown on the screen below.

Gemini CLI allows you to edit code, ask questions, or build specific apps from your terminal. In this case, we will use Gemini CLI to create a Streamlit app that brings together everything we have done so far: EDA, cleaning, and visualization.
To create the Streamlit app, we will use a prompt that combines all of the steps, shown below.
This prompt asks for a Streamlit app that performs EDA, cleans the data, creates automatic visualizations, prepares the data for machine learning, and applies a machine learning model after the user selects a target variable.
Step 1 – Basic EDA:
• Display .head(), .info(), and .describe()
• Show missing values per column
• Show a correlation heatmap of the numeric columns
Step 2 – Data Cleaning:
• Find columns with missing values
• Handle the missing data (drop or impute)
• Show a summary of the actions taken
Step 3 – Automatic Visualization:
• Before plotting, apply these visualization principles:
• Use histograms for numeric distributions
• Use bar charts for categorical distributions
• Use box plots or violin plots to compare categories
• Use scatter plots for relationships
• Use correlation heatmaps for correlations
• Use line plots for time series (if applicable)
• Generate the most relevant plots for this dataset
• Explain why each plot was selected
Step 4 – Machine Learning Preparation:
• Encode categorical variables
• Scale numerical features
• Return a clean dataset ready for modeling
Step 5 – Apply a Machine Learning Model:
• Let the user select the target variable.
• Apply a multiclass classification model.
• Report evaluation metrics.
Each step should be displayed on a separate tab. Run the Streamlit app afterwards.
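For orientation, below is a minimal sketch of the tabbed Streamlit layout this prompt asks for; Gemini CLI generates its own, more complete version, and the filename data_orders.csv is an assumption for illustration.

# Streamlit skeleton sketch: one tab per step of the prompt
import pandas as pd
import streamlit as st

df = pd.read_csv("data_orders.csv")  # hypothetical filename

tabs = st.tabs(["EDA", "Cleaning", "Visualization", "ML Prep", "Model"])

with tabs[0]:
    st.subheader("Basic EDA")
    st.dataframe(df.head())
    st.write(df.describe())
    st.write("Missing values per column:", df.isna().sum())

with tabs[1]:
    st.subheader("Data Cleaning")
    cleaned = df.dropna()  # placeholder for the drop-or-impute logic
    st.write(f"Rows before cleaning: {len(df)}, after: {len(cleaned)}")

with tabs[2]:
    st.subheader("Automatic Visualization")
    numeric_col = st.selectbox("Numeric column", df.select_dtypes("number").columns)
    st.bar_chart(df[numeric_col].value_counts().sort_index())

with tabs[3]:
    st.subheader("Machine Learning Preparation")
    st.write(pd.get_dummies(cleaned, drop_first=True).head())

with tabs[4]:
    st.subheader("Model")
    target = st.selectbox("Target variable", df.columns)
    st.write(f"Selected target: {target}")  # model training and metrics would go here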
It will ask for your permission whenever it creates a directory or writes code in your environment.

After a few approval steps like this, the Streamlit app will be ready, as shown below.

Now, let's examine it.


Final Thoughts
In this article, we started by using ChatGPT to handle routine tasks such as data cleaning, exploration, and visualization. Next, we went one step further by preparing the dataset for machine learning and applying machine learning models.
Finally, we used Gemini CLI to create a Streamlit dashboard that runs all of these steps with just a few clicks.
To demonstrate all of this, we used a data project from Gett. Although AI cannot completely take over your work, you can use it to handle routine tasks and save yourself a lot of time.
Nate Rosidi is a data scientist and works in product strategy. He is also an adjunct professor teaching analytics, and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.



