5 Portfolio Mistakes That Keep Data Scientists From Getting Hired


Image by Author | Canva
A strong portfolio is often the difference between making it and breaking it. But what exactly makes a portfolio strong? Complex projects? Slick design? Impressive data visualizations? Yes and no. While a good portfolio needs these elements, they are so obvious that everyone knows you can't do without them.
However, many data scientists make mistakes when they try to go beyond that. As a result, they end up with a portfolio that ticks every box but is still not actually good.
The Framework
Here is a framework that will help you avoid common mistakes when building a great portfolio.
The Mistakes
Now let's talk about the portfolio mistakes and how you can avoid them using the framework.
// Mistake #1: Building projects just to tick a box
Most portfolios give the impression that their projects exist just to tick a box: Titanic survival, the Iris dataset, MNIST digits. You know, the usual stuff. Not only will you drown among thousands of similar portfolios, you will also show a lack of initiative and of interest in what you do. These are autopilot projects.
Fix: Start with domains you care about, e.g. nature, finance, music. When the topic interests you, you will dig deeper without even trying. If you are a sports fan, you can analyze efficiency in the NBA or pick another sports-themed practice project. A music fan might take a close look at playlists.
// Mistake #2: Using whatever data falls in your lap
Beginners tend to grab the first clean CSV they find. The problem is that real data science does not work that way.
Fix: You must show that you know how to find real data, access it, and wrangle it into a model-ready state. In your projects, use APIs (e.g. the Twitter/X API), open government datasets (e.g. Data.gov), and web sources (e.g. the Awesome Public Datasets list on GitHub). Use as many data sources as you reasonably can: clean the data, merge it into a single dataset, and prepare it for modeling.
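As a rough illustration of the "merge it into a single dataset" step, here is a minimal pandas sketch. The two small tables and their column names (`timestamp`, `demand_mw`, `temp_c`) are made up for this example; they stand in for data you would pull from an API and from a government portal, and the helper `combine_sources` is hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for two sources (e.g. an API response and a downloaded CSV)
demand = pd.DataFrame({
    "timestamp": ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00"],
    "demand_mw": [510.0, 495.0, 470.0],
})
weather = pd.DataFrame({
    "timestamp": ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"],
    "temp_c": [2.5, 2.1, 1.8],
})

def combine_sources(demand, weather):
    """Parse timestamps, join on them, and flag rows missing from either source."""
    demand = demand.assign(timestamp=pd.to_datetime(demand["timestamp"]))
    weather = weather.assign(timestamp=pd.to_datetime(weather["timestamp"]))
    # An outer join keeps unmatched rows; indicator=True adds a "_merge" column
    # saying whether each row came from the left table, the right table, or both
    merged = demand.merge(weather, on="timestamp", how="outer", indicator=True)
    return merged.sort_values("timestamp").reset_index(drop=True)

combined = combine_sources(demand, weather)
print(combined)
```

The outer join is a deliberate choice here: it keeps the gaps visible (an hour with weather but no demand reading, and the reverse), which is exactly the kind of messiness a portfolio project should show you can handle.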
// Mistake #3: Treating projects like Kaggle competitions
Kaggle competitions focus on optimizing a single metric. This is good for practice, but it does not reflect the real world. Accuracy is not the goal. You will have to make trade-offs between your model's performance and its real business or social impact.
Fix: Even if you use standard datasets from Kaggle, always take a different angle and frame the problem so that it has business or social value. For example, don't just classify fake vs. real news. Show which words or phrases signal a misleading article. Another example: don't just predict churn. Show that a 10% reduction in churn can save $2M in annual revenue.
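To make the churn example concrete, the arithmetic behind such a claim can be sketched in a few lines. All inputs below (customer count, revenue per customer, baseline churn rate) are hypothetical and were chosen only so that the result lands on the $2M figure; the function name is also just illustrative.

```python
def revenue_saved_by_churn_reduction(customers, annual_revenue_per_customer,
                                     churn_rate, relative_reduction):
    """Annual revenue retained when churn falls by `relative_reduction` (a fraction)."""
    churned_before = customers * churn_rate
    customers_retained = churned_before * relative_reduction
    return customers_retained * annual_revenue_per_customer

# Hypothetical inputs, not taken from any real company
saved = revenue_saved_by_churn_reduction(
    customers=100_000,
    annual_revenue_per_customer=1_000,
    churn_rate=0.20,          # 20% of customers leave each year
    relative_reduction=0.10,  # the model helps cut that churn by 10%
)
print(f"Estimated revenue retained: ${saved:,.0f}")
```

In a real project, each of these inputs should come from the data or be stated as an explicit assumption in the write-up.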
// Mistake #4: Showing only models, not the workflow
Many projects read like a sequence of Jupyter notebook cells: import libraries, then preprocessing, then fit models, and here is the accuracy. That is incomplete and boring. What's missing is how you handle the various project stages and why you make certain decisions.
Fix: Make your projects end-to-end. Show all stages, from data collection to deployment. Explain the reasoning behind important decisions, e.g. why you chose one model over another, or why you engineered a specific feature. Use tools such as Streamlit, Flask, or a Power BI dashboard so that others can use your work. All of this will make your projects look like problem-solving (e.g. Arch Desai's portfolio), not a code walkthrough (e.g. this one).
// Mistake #5: Ending at the model, not the action
Data scientists often stop at the technical level, e.g. reporting accuracy scores. Okay, but what do you do about it? You have to remember what the model is actually useful for. The model's technical performance is just one part of that; the other is its business or social impact.
Fix: End the project with a recommendation based on what you found. For example, “This model suggests prioritizing inspections of restaurants that serve high-risk cuisines.”
Project Example: Forecasting a City's Electricity Demand to Reduce Costs
In this section, I will walk through a sample project to show how the framework can be applied in practice.
Domain: The domain I have chosen is energy and sustainability. Living in a large city has made me realize how much cities struggle with high electricity demand during peak hours. More accurate forecasts can help utilities balance the grid, reduce costs, and cut outages.
Data: The main source can be the US Energy Information Administration (EIA). In addition, I could use a NOAA weather API.
Problem framing: Instead of framing the problem as “predict electricity demand over time,” I turn the technical problem into one about bills and cost savings.
Building end-to-end: The project will include these stages.
- Data cleaning: Handle missing hours, align timestamps, normalize weather variables.
- Feature engineering:
  - Lag features: Demand in the previous hours/days
  - Weather features: Temperature, humidity
  - Calendar features: Day of the week, holiday flag, major events
- Modeling:
- Deployment: For example, I could build a dashboard with a real-time 24-hour forecast and simulate “what-if” scenarios, such as industrial demand response.
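The feature engineering stage in the list above could look roughly like the following pandas sketch. It assumes an hourly frame indexed by timestamp with a `demand_mw` column; that column name, the helper `add_features`, and the synthetic data are all assumptions made for illustration. Weather features and a holiday flag would be joined in from their own sources and are left out here.

```python
import pandas as pd

def add_features(df):
    """Add lag and calendar features to an hourly demand frame indexed by timestamp."""
    out = df.copy()
    # Lag features: demand one hour and one day (24 hours) earlier
    out["demand_lag_1h"] = out["demand_mw"].shift(1)
    out["demand_lag_24h"] = out["demand_mw"].shift(24)
    # Calendar features: day of week (Monday=0) and a weekend flag
    out["day_of_week"] = out.index.dayofweek
    out["is_weekend"] = out["day_of_week"] >= 5
    # The first day has no 24-hour history, so those rows are dropped
    return out.dropna(subset=["demand_lag_24h"])

# Synthetic hourly data (made up): 48 hours starting on Friday, 2024-01-05
index = pd.date_range("2024-01-05 00:00", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({"demand_mw": [500.0 + i for i in range(48)]}, index=index)

features = add_features(hourly)
print(features.head())
```

Note that `shift` assumes the series has no missing hours, which is one reason the data cleaning stage has to come first.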
Action: We will not stop at “the model achieved a low RMSE.” Instead, let's give a business and social recommendation, e.g. “If the city incentivized large businesses to shift 5% of their peak-hour usage, it could save a projected $3.5M annually in grid costs.”
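A recommendation like this has to be backed by a transparent calculation. The sketch below shows one simple way to turn a 24-hour forecast into a savings estimate for a peak-shifting scenario. Every number in it (the forecast values, the peak window, both prices) is invented for illustration, so it will not reproduce the $3.5M figure, and `peak_shift_savings` is a hypothetical helper rather than an established method.

```python
def peak_shift_savings(hourly_demand_mwh, peak_hours, shift_fraction,
                       peak_price, offpeak_price):
    """Daily cost saved by moving `shift_fraction` of peak-hour energy to off-peak hours."""
    peak_energy = sum(d for hour, d in enumerate(hourly_demand_mwh) if hour in peak_hours)
    shifted_energy = peak_energy * shift_fraction
    # Shifted energy is still consumed, only at the cheaper off-peak price
    return shifted_energy * (peak_price - offpeak_price)

# Hypothetical forecast: 24 hourly values in MWh with an evening peak (hours 17 to 20)
forecast = [400] * 17 + [600] * 4 + [400] * 3

daily_savings = peak_shift_savings(
    forecast,
    peak_hours={17, 18, 19, 20},
    shift_fraction=0.05,  # the 5% shift from the recommendation
    peak_price=120,       # assumed price per MWh during peak hours
    offpeak_price=40,     # assumed price per MWh off-peak
)
print(f"Daily: ${daily_savings:,.0f}, annual: ${daily_savings * 365:,.0f}")
```

Stating the assumptions this explicitly lets a reader challenge the prices or the peak window without having to dig through the model code.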
Bonus: Resources
As a bonus, here are some suggestions for platforms you can use for practice and places to get data.
// Practice platforms
// Open data sources
// Real-time APIs
Conclusion
You probably noticed that none of the mistakes mentioned were technical. That is no accident; the biggest mistake is forgetting that a portfolio shows how you solve problems.
Focus on two things, presentation and problem-solving, and your portfolio will finally start to look like evidence that you can do the job.
Nate Rosidi is a data scientist who also works in product strategy. He is an adjunct professor teaching analytics and the founder of StrataScratch, a platform that helps data scientists prepare for their interviews with real interview questions from top companies. Nate writes about the latest trends in the job market, gives interview advice, shares data science projects, and covers everything SQL.