A Complete Beginner's Guide to Pandas Dataframes

Coming from a SQL background, learning Python for data analysis has been a bit of a challenge. The syntax is simple, true. However, the language and terminology are completely different. In SQL, you interact with databases, tables, and columns. In Python data analysis, however, your bread and butter will be data structures.

Data structures in Python are essentially objects for storing data. Python includes several built-in data structures, such as lists, tuples, sets, and dictionaries. All of these are used to store and manipulate data. Some are mutable (lists) and some are not (tuples). To learn more about Python data structures, I highly recommend the book “Python for Data Analysis” by Wes McKinney. I just started reading it, and I think it's stellar.

In this article, I will walk you through the basics of DataFrames in pandas and how to create them step by step.

Understanding the basics of arrays

There is a Python library called NumPy; you may have heard of it. It is mainly used for mathematical and numerical calculations. One of the things it offers is the ability to create arrays. You might be wondering: what the heck is an array?

An array is like a list, except it only stores values of the same data type. Lists, however, can store values of various data types (int, string, Boolean, etc.). Here is an example list:

my_list = [1, "hello", 3.14, True]

Lists are mutable. In other words, you can add and remove items.
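To make that concrete, here is a minimal sketch (the variable names are illustrative, not from any particular dataset) showing that items can be added to and removed from a list, while a tuple refuses the same kind of change:

```python
# A list can be modified after it is created
my_list = [1, "hello", 3.14, True]
my_list.append("new item")   # add an item to the end
my_list.remove("hello")      # remove the first matching item
print(my_list)

# A tuple cannot be modified; trying to do so raises a TypeError
my_tuple = (1, "hello", 3.14, True)
try:
    my_tuple[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)
```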

Back to arrays. In NumPy, arrays can be nested; these are called n-dimensional arrays. First, let's import the NumPy library.

import numpy as np

To create a basic array in NumPy, we use the np.array() function and pass our list to it.

arr = np.array([1, 2, 3, 4, 5])
arr

Here is the result:

array([1, 2, 3, 4, 5])

Let's check the type of the object:

type(arr)

We get the following type:

numpy.ndarray
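Besides the type of the array object itself, every array has a dtype that describes the single data type shared by all of its values. The sketch below is only an illustration (the example values are made up) of how NumPy converts mixed inputs to one common type:

```python
import numpy as np

# Every element of a NumPy array shares one data type (dtype)
int_arr = np.array([1, 2, 3, 4, 5])
print(int_arr.dtype)          # an integer dtype (exact width depends on platform)

# Mixing ints and floats: NumPy converts everything to float
float_arr = np.array([1, 2.5, 3])
print(float_arr.dtype)        # float64

# Mixing numbers and text: NumPy converts everything to strings
mixed_arr = np.array([1, "hello", 3.14])
print(mixed_arr.dtype.kind)   # "U" means Unicode string
```

This is why a list with mixed types stops being "mixed" once it is turned into an array.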

The cool thing about arrays is that you can do math on them. For example:

arr*2

Result:

array([ 2, 4, 6, 8, 10])

Very nice, right?
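For comparison, here is a small sketch of how the same operation behaves on a plain list, along with a few other element-wise operations. The variable names are just for illustration:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
my_list = [1, 2, 3, 4, 5]

# Multiplying an array multiplies every element
print(arr * 2)       # [ 2  4  6  8 10]

# Multiplying a list repeats the list instead
print(my_list * 2)   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Other arithmetic also works element by element
print(arr + 10)      # [11 12 13 14 15]
print(arr.sum())     # 15
print(arr.mean())    # 3.0
```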

Now that you know the basics of arrays in NumPy, let's dig deeper into n-dimensional arrays.

The array you see above is a 1-dimensional (1D) array. Also known as vectors, 1D arrays contain a sequence of values, like this: [1, 2, 3, 4, 5]

A 2-dimensional array (matrix) stores 1D arrays as its values. Similar to the rows of a table in SQL, each 1D array corresponds to one row of data. The output looks like a grid of values. For example:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr

Output:

array([[1, 2, 3],
       [4, 5, 6]])

A 3-dimensional array (tensor) stores 2D arrays (matrices) as its values. For example:

import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
arr

Output:

array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])

An array can have as many dimensions as your data needs. NumPy does impose an upper limit, but it is far beyond what you will typically use.
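If you are ever unsure how many dimensions an array has, NumPy can tell you. The sketch below reuses the 1D, 2D, and 3D examples from this section; the names vector, matrix, and tensor are only labels chosen for this illustration:

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
tensor = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

for name, a in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    # ndim is the number of dimensions; shape is the size along each one
    print(name, a.ndim, a.shape)
```

Here the matrix has shape (2, 3), meaning 2 rows and 3 columns, and the tensor has shape (2, 2, 3), meaning 2 matrices of that size.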

Creating a DataFrame from an array

Now that you've got the gist of arrays, let's create a DataFrame from one.

First, we have to import the pandas and NumPy libraries:

import pandas as pd
import numpy as np

Next, create our array:

data = np.array([[1, 4], [2, 5], [3, 6]])

Here, I have created a 2D array. A pandas DataFrame can only be built from 1D and 2D arrays. If you try to pass a 3D array, you will get an error.
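A quick way to see this limit for yourself is to try each case. This is a minimal sketch with made-up arrays; the exact wording of the error message may differ between pandas versions, so it is printed rather than assumed:

```python
import numpy as np
import pandas as pd

# A 1D array becomes a DataFrame with a single column
df_1d = pd.DataFrame(np.array([1, 2, 3]))
print(df_1d.shape)   # (3, 1)

# A 3D array is rejected with a ValueError
arr_3d = np.zeros((2, 2, 2))
try:
    pd.DataFrame(arr_3d)
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```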

Now that we have our array, let's pass it to our DataFrame. To create a DataFrame, use the pd.DataFrame() function.

# creating the DataFrame
df = pd.DataFrame(data)

# showing the DataFrame
df

Output:

   0  1
0  1  4
1  2  5
2  3  6

It looks great so far, but it needs a little formatting:

# creating a dataframe
df = pd.DataFrame(data, index=['row1', 'row2', 'row3'],
                  columns=['col1', 'col2'])

# showing the dataframe
df

Output:

      col1  col2
row1     1     4
row2     2     5
row3     3     6

Now that's better. All I did was name the rows using the index parameter and the columns using the columns parameter.
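Labels do not have to be set when the DataFrame is created. As a sketch of an alternative (not the approach used above), you can assign them afterwards and then use them to look up values:

```python
import numpy as np
import pandas as pd

data = np.array([[1, 4], [2, 5], [3, 6]])
df = pd.DataFrame(data)

# Assign new labels to an existing DataFrame
df.index = ["row1", "row2", "row3"]
df.columns = ["col1", "col2"]

# Labels can then be used to look up values with .loc
print(df.loc["row2", "col2"])     # 5

# rename() returns a new DataFrame with selected labels changed
df_renamed = df.rename(columns={"col1": "first"})
print(list(df_renamed.columns))   # ['first', 'col2']
```

The name "first" is just a placeholder for this example. Note that rename() leaves the original df untouched.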

And there you go, you have your DataFrame. It's that simple. Let's explore some other useful ways to create a DataFrame.

Creating a DataFrame from a dictionary

One of the built-in data structures offered by Python is the dictionary. Basically, dictionaries store key-value pairs, where all keys must be unique and immutable. A dictionary is written with curly brackets {}. Here is an example dictionary:

my_dict = {"name": "John", "age": 30}

Here, the keys are name and age, and the values are John and 30. Easy enough. Now, let's create a DataFrame from a dictionary.
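Before moving on, here is a short sketch of how key-value pairs behave in practice. The variable name person is made up for this illustration:

```python
person = {"name": "John", "age": 30}

# Look up a value by its key
print(person["name"])        # John

# Keys are unique: assigning to an existing key overwrites its value
person["age"] = 31
print(person)                # {'name': 'John', 'age': 31}

# Keys must be hashable (immutable), so a list cannot be a key
try:
    person[["a", "list"]] = "oops"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)      # False
```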

names = ["John", "David", "Jane", "Mary"]
age = [30, 27, 35, 23]

First, I created two lists to store multiple names and ages.

dict_names = {'Names': names, 'Age': age}

Next, I stored the lists in a dictionary and created keys for the names and ages.

# Creating the dataframe
df_names = pd.DataFrame(dict_names)
df_names

Above, we passed the dictionary we created to pd.DataFrame(). Here is the output:

   Names  Age
0   John   30
1  David   27
2   Jane   35
3   Mary   23

And there we go, we have a DataFrame created from a dictionary.

Creating a DataFrame from a CSV file
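As a rough sketch of what the dictionary route gives you, the snippet below rebuilds the same DataFrame and inspects it. It also shows one common pitfall, lists of unequal length; the shortened Age list is invented purely to trigger the error:

```python
import pandas as pd

names = ["John", "David", "Jane", "Mary"]
age = [30, 27, 35, 23]
df_names = pd.DataFrame({"Names": names, "Age": age})

# Each dictionary key became a column
print(list(df_names.columns))   # ['Names', 'Age']
print(df_names.shape)           # (4, 2)

# Selecting one column by its key returns a Series
print(df_names["Age"].max())    # 35

# Lists of different lengths cannot be combined into columns
try:
    pd.DataFrame({"Names": names, "Age": [30, 27]})
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)
```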

This is probably the method you will use the most. It is common practice to read CSV files into pandas when doing data analysis. It's similar to how you open spreadsheets in Excel or import data into SQL. In pandas, you read CSVs with the read_csv() function. Here is an example:

# reading the csv file
df_exams = pd.read_csv('StudentsPerformance.csv')
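If you do not have that file on hand, you can still try read_csv(). The sketch below feeds it a small in-memory CSV instead of a file on disk; the column names and values are hypothetical and are not the contents of StudentsPerformance.csv:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """student,math score,reading score
A,72,74
B,69,90
C,90,95
"""

# read_csv accepts a file path or any file-like object
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.head())   # first rows of the DataFrame
print(df_demo.shape)    # (3, 3)
```

The first line of the CSV is used for the column names, and each following line becomes one row.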

In some cases, you will have to copy the full file path and paste it in, like this:

pd.read_csv("C:datasuppliers lists — Sheet1.csv")

And there you go!
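On the subject of full file paths, here is a hedged sketch of reading a CSV from an explicit location. The folder, file name, and data are all made up for the demonstration, and a temporary folder is used so the example can run anywhere:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a throwaway CSV in a temporary folder (hypothetical file and data)
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "suppliers.csv"
    csv_path.write_text("supplier,city\nAcme,Paris\nGlobex,Berlin\n")

    # Pass the full path to read_csv
    df_suppliers = pd.read_csv(csv_path)

print(df_suppliers)

# On Windows, a raw string (r"...") keeps backslashes in a path literal intact
windows_style = r"C:\some_folder\my file.csv"   # hypothetical path, not read here
print(windows_style)
```

Without the r prefix, Python may treat a backslash followed by certain letters as an escape sequence, which is a common cause of "file not found" errors with pasted Windows paths.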

Wrapping up

Creating DataFrames in pandas might seem complicated, but it really isn't. In most cases, you'll probably be reading CSV files anyway. So don't sweat it. I hope you found this article helpful. I'd love to hear your thoughts in the comments. Thanks for reading!
