Goals:
Data import: Excel and CSV files
Data export
Basic statistics
Handling missing data
Data import
Data is the foundation of analysis. In practice, it comes from internal company data, web data, and open-source datasets.
| Method | Description |
| --- | --- |
| pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, ...) | Read a CSV file |
| pd.read_excel(io, sheet_name=0, names=None, index_col=None, usecols=None, ...) | Read an Excel file |
| pd.read_json(path_or_buf=None, orient=None, typ='frame', dtype=None, ...) | Read a JSON file |
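Reading an Excel file

    import pandas as pd
    import numpy as np

    # Path is relative to the working directory (Windows-style separator)
    fpath = r'data\test.xlsx'
    # Reads the first sheet by default (sheet_name=0)
    pdata = pd.read_excel(fpath)
    pdata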
Reading a CSV file

    fpath = r'data\GDP.csv'
    # The file is GBK-encoded, so the encoding must be given explicitly
    pdata = pd.read_csv(fpath, encoding='gbk')
    pdata
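The reason for passing `encoding='gbk'` can be seen with a small in-memory round trip. This is only a sketch: the two-row table is made-up placeholder data (not real GDP figures), and `io.BytesIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample table with Chinese text, encoded as GBK bytes
text = "国家,1990\n中国,1\n日本,2\n"
raw = text.encode("gbk")

# The same bytes are not valid UTF-8, which is the default read_csv assumes
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Telling pandas the real encoding lets it decode the text correctly
sample = pd.read_csv(io.BytesIO(raw), encoding="gbk")
print(utf8_ok)
print(sample)
```

If a CSV opens with garbled characters or a decoding error, the declared encoding is usually the first thing worth checking.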
Importing selected columns from a CSV

    fpath = r'data\GDP.csv'
    # usecols keeps only the listed columns
    pdata = pd.read_csv(fpath, usecols=['Country Name', '1990'], encoding='gbk')
    pdata
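As a small illustration of `usecols`, the sketch below selects columns once by name and once by position. The CSV text is a made-up stand-in held in memory with `io.StringIO`, so the column values are placeholders.

```python
import io
import pandas as pd

# Hypothetical three-column CSV held in memory
csv_text = "Country Name,1990,1991\nA,1,2\nB,3,4\n"

# Select columns by header name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Country Name", "1991"])

# Select the same columns by zero-based position
by_pos = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(by_name)
print(by_name.equals(by_pos))
```

Selecting by name tends to be more robust when the column order of the file may change; selecting by position helps when the file has no usable header.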
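Importing a CSV with a specified header row

    fpath = r'data\GDP.csv'
    # header=1 uses the second line of the file (zero-based index 1) as column names
    pdata = pd.read_csv(fpath, header=1, encoding='gbk')
    pdata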
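To make the effect of `header=1` concrete, here is a minimal sketch using an in-memory CSV whose first line is a title rather than column names. The content is invented for illustration.

```python
import io
import pandas as pd

# Hypothetical file: line 0 is a title, line 1 holds the real column names
csv_text = "GDP table (title line)\nCountry Name,1990,1991\nA,1,2\nB,3,4\n"

# header=1 takes column names from line 1; the line before it is discarded
df = pd.read_csv(io.StringIO(csv_text), header=1)
print(df.columns.tolist())
print(len(df))
```

The header index is zero-based, so `header=1` means "the second line", and data rows start on the line after it.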
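For a CSV with no header row, pass header=None

    # header=None treats every line as data; columns are numbered 0, 1, 2, ...
    pdata = pd.read_csv(fpath, header=None, encoding='gbk')
    pdata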
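With `header=None`, pandas numbers the columns itself. A common companion is the `names` parameter, which supplies labels for a headerless file. The sketch below assumes a made-up two-row CSV; the column labels are chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical headerless CSV: every line is data
csv_text = "A,1,2\nB,3,4\n"

# Without names, columns are labelled with integers
no_names = pd.read_csv(io.StringIO(csv_text), header=None)

# With names, the supplied labels become the column names
named = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Country Name", "1990", "1991"],
)

print(no_names.columns.tolist())
print(named.columns.tolist())
```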
Saving data

    import pandas as pd
    import numpy as np

    fpath = r'data\GDP.csv'
    csv_path1 = r'data\new_GDP_1.csv'
    csv_path2 = r'data\new_GDP_2.csv'
    csv_path3 = r'data\new_GDP_3.csv'
    pdata = pd.read_csv(fpath, encoding='gbk')

    # Write all columns, with the row index as the first column
    pdata.to_csv(csv_path1)

    # Write all columns without the row index
    pdata.to_csv(csv_path2, index=False)

    # Write only the 1990 and 1991 columns, without the row index
    pdata.to_csv(csv_path3, index=False, columns=['1990', '1991'])
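The difference between the three `to_csv` calls is easiest to see in the text they produce. The sketch below uses a small made-up DataFrame and calls `to_csv` without a path, which returns the CSV as a string instead of writing a file.

```python
import io
import pandas as pd

# Hypothetical data standing in for the GDP table
df = pd.DataFrame({"Country Name": ["A", "B"], "1990": [1, 3], "1991": [2, 4]})

with_index = df.to_csv()                                    # index written as an unnamed first column
without_index = df.to_csv(index=False)                      # index omitted
subset = df.to_csv(index=False, columns=["1990", "1991"])   # only the chosen columns

print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
print(subset.splitlines()[0])

# Reading back a file that was saved with the index yields an extra column
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())
```

Because the saved index comes back as an extra `Unnamed: 0` column on reload, `index=False` is often the more convenient choice when the index carries no information.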