Pandas Basics

Pandas

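Pandas is a powerful toolkit for analyzing structured data, built on NumPy (which provides high-performance matrix operations). Pandas can import data from a variety of formats such as CSV, JSON, SQL, and Microsoft Excel. It supports operations on data such as merging, reshaping, and selection, as well as data cleaning and feature processing. Pandas is widely used for data analysis in academia, finance, statistics, and other fields.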
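As a minimal sketch of importing data, the example below reads a small CSV with `pd.read_csv`. The CSV content and column names are made up for illustration, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "fruit,num\napple,8\nbanana,2\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The first line of the CSV becomes the column names, and each following line becomes one row.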

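The main data structures in Pandas are Series (one-dimensional data) and DataFrame (two-dimensional data). These two structures are enough to handle most typical use cases in finance, statistics, social science, engineering, and other fields.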

Series

Creation

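import pandas as pd
# Create a Series from a dict: the keys become the index labels
fruits={"orange":2,"banana":8}
print(pd.Series(fruits))

import pandas as pd
# Create a Series from a list of values plus a list of index labels
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
print(series)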

Referencing data

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
# Slice by position: the first two elements
print(series[0:2])

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
# Select by a list of index labels
print(series[["dog","pig"]])

Reading the values and the index

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
series_values =series.values
series_index =series.index
print(series_values,series_index)

Adding elements

When adding elements to a Series, the data being added must itself be a Series.

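import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
# Series.append was removed in pandas 2.0; pd.concat is the replacement
# Method 1: build the new Series inline
series=pd.concat([series,pd.Series([12],index=["goose"])])
series=pd.concat([series,pd.Series({"orange":45})])
# Method 2: build the new Series first, then concatenate it
grape=pd.Series([1],index=["grape"])
series=pd.concat([series,grape])
print(series)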

Removing elements

An element is removed by specifying its index label in the Series.

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
series=series.drop("cat")
print(series)

Filtering

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
# Filter with a list of booleans, one per element
conditions=[True,False,True,False,False]
print(series[conditions])

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
# Filter with a condition: keep only the even values
print(series[series%2==0])
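Conditions can also be combined. The sketch below reuses the same sample data and joins boolean conditions with `&` (and) and `|` (or); the variable names are only illustrative.

```python
import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)

# Each comparison needs its own parentheses, because & and |
# bind more tightly than == and > in Python.
even_and_large = series[(series % 2 == 0) & (series > 2)]
small_or_large = series[(series < 2) | (series > 5)]

print(even_and_large)
print(small_or_large)
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the operators `&` and `|` are used here.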

Sorting

import pandas as pd
index =["apple","dog","cat","pig","orange"]
data =[8,2,1,3,2]
series=pd.Series(data,index=index)
print(series.sort_values())
print(series.sort_index())
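Both methods sort in ascending order by default. As a small sketch on the same sample data, passing `ascending=False` reverses the order; the variable names are illustrative.

```python
import pandas as pd

index = ["apple", "dog", "cat", "pig", "orange"]
data = [8, 2, 1, 3, 2]
series = pd.Series(data, index=index)

# Sort by value, largest first.
by_value_desc = series.sort_values(ascending=False)
# Sort by index label, in reverse alphabetical order.
by_label_desc = series.sort_index(ascending=False)

print(by_value_desc)
print(by_label_desc)
```

Both calls return a new Series and leave `series` itself unchanged.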

DataFrame

A DataFrame is a two-dimensional data structure that works like several Series bundled together.
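One way to see this relationship is to build a DataFrame from a dict of Series and then pull a single column back out. This is a minimal sketch with made-up data; the labels `a` and `b` are placeholders.

```python
import pandas as pd

num = pd.Series([1, 34], index=["a", "b"])
year = pd.Series([2000, 2023], index=["a", "b"])

# Each Series becomes one column; the shared index becomes the row labels.
df = pd.DataFrame({"num": num, "year": year})

# Selecting a single column gives back a Series.
col = df["num"]
print(type(col))
print(col)
```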

Creation

import pandas as pd
data={"fruits":["apple","orange","banana","peach"],
"num":[1,34,23,54],
"year":[2000,2023,2015,2045]}
df=pd.DataFrame(data)
print(df)

Setting the index and columns

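  • The index of a DataFrame variable df can be set by assigning to df.index a list whose length equals the number of rows.

  • The columns of df can be set by assigning to df.columns a list whose length equals the number of columns.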

import pandas as pd

index=["apple","orange","banana","strawberry","kiwifruit"]
data1=[10,5,8,12,3]
data2=[30,25,12,10,8]
series1=pd.Series(data1,index=index)
series2=pd.Series(data2,index=index)
df=pd.DataFrame([series1,series2])
print(df)
df.index=[1,2]
print("")
print(df)
print()
df.columns=[1,2,3,4,5]
print(df)

Adding rows

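To add new data to a DataFrame, older pandas versions call df.append(series, ignore_index=True) on a DataFrame variable df, passing the new row as a Series. DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.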
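A minimal sketch of adding a row with `pd.concat`, assuming a small made-up DataFrame. The row is still supplied as a Series and is turned into a one-row DataFrame first.

```python
import pandas as pd

df = pd.DataFrame({"fruits": ["apple", "orange"], "num": [1, 34]})

# The new row as a Series: its index labels match the column names.
new_row = pd.Series({"fruits": "banana", "num": 23})

# to_frame().T turns the Series into a one-row DataFrame.
# ignore_index=True renumbers the rows 0, 1, 2, ...
df = pd.concat([df, new_row.to_frame().T], ignore_index=True)
print(df)
```

Because the Series mixes strings and numbers, the resulting columns may end up with the generic object dtype; building the new row as a one-row DataFrame directly is an alternative when column types matter.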

Adding columns

To add a column, assign to df["new column"] on a DataFrame.
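A short sketch of column assignment, using made-up data; the column names `price` and `total` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"fruits": ["apple", "orange", "banana"], "num": [1, 34, 23]})

# Assign a list: its length must match the number of rows.
df["price"] = [100, 200, 150]

# A new column can also be computed from existing columns.
df["total"] = df["num"] * df["price"]
print(df)
```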

Referencing data