{
"cells": [
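{
"cell_type": "markdown",
"id": "196a647a-6faa-4aee-a0bf-a345852251dd",
"metadata": {},
"source": [
"## pandas in Depth, Part 1\n",
"\n",
"pandas is an open-source Python library that supports the whole data analysis workflow. Its author, Wes McKinney, started developing it in 2008, with the main goal of providing a tool for analyzing and processing large amounts of data. pandas wraps a series of operations, from loading, reshaping, and cleaning data to pivoting and presenting it, and provides three core data types:\n",
"1. `Series`: a data series that represents one-dimensional data. Unlike a one-dimensional array, every value has a corresponding index label, and `Series` offers richer data-processing methods than `ndarray`.\n",
"2. `DataFrame`: a data frame (data table) that represents two-dimensional data. Compared with a two-dimensional array, a `DataFrame` has both a row index and a column index, and it provides more than 100 methods for processing data.\n",
"3. `Index`: provides the indexing service for `Series` and `DataFrame`."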
]
},
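{
"cell_type": "markdown",
"id": "added-core-types-note",
"metadata": {},
"source": [
"The relationship between the three types can be sketched with a tiny example. This is only an illustration: the labels and scores in the next cell are made up and are not part of this notebook's data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-core-types-demo",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data for illustration only\n",
"ser = pd.Series([95, 91, 69], index=['a', 'b', 'c'], name='score')\n",
"\n",
"# A Series is one-dimensional; its labels are stored in an Index object\n",
"print(isinstance(ser.index, pd.Index))\n",
"print(ser['b'])  # label-based access\n",
"\n",
"# A DataFrame is two-dimensional; each column is a Series that shares the row index\n",
"frame = pd.DataFrame({'score': ser, 'passed': ser >= 70})\n",
"print(frame.index.equals(ser.index))\n",
"print(isinstance(frame.columns, pd.Index))"
]
},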
{
"cell_type": "code",
"execution_count": 1,
"id": "eb84f909-921a-47da-87b1-61578c871422",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n",
"plt.rcParams['axes.unicode_minus'] = False\n",
"get_ipython().run_line_magic('config', \"InlineBackend.figure_format = 'svg'\")"
]
},
{
"cell_type": "markdown",
"id": "2102e83e-2a6d-47aa-b449-c058bea1a601",
"metadata": {},
"source": [
"### Creating DataFrame objects"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "87dbde08-dcab-4ede-a791-b56e11dd9115",
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(20)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "4c5b2767-2074-4cdf-b1ba-beff6f425942",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 95,  86,  75],\n",
"       [ 91,  88,  86],\n",
"       [ 69,  80,  71],\n",
"       [ 82,  67,  94],\n",
"       [ 92, 100,  81]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"stu_names = ['狄仁杰', '白起', '李元芳', '苏妲己', '孙尚香']\n",
"cou_names = ['语文', '数学', '英语']\n",
"scores_arr = np.random.randint(60, 101, (5, 3))\n",
"scores_arr"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "f8c2a6bf-ca5e-479d-ab63-f5c3620186e3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>语文</th>\n",
" <th>数学</th>\n",
" <th>英语</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>狄仁杰</th>\n",
" <td>95</td>\n",
" <td>86</td>\n",
" <td>75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>白起</th>\n",
" <td>91</td>\n",
" <td>88</td>\n",
" <td>86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>李元芳</th>\n",
" <td>69</td>\n",
" <td>80</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>苏妲己</th>\n",
" <td>82</td>\n",
" <td>67</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>孙尚香</th>\n",
" <td>92</td>\n",
" <td>100</td>\n",
" <td>81</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 语文 数学 英语\n",
"狄仁杰 95 86 75\n",
"白起 91 88 86\n",
"李元芳 69 80 71\n",
"苏妲己 82 67 94\n",
"孙尚香 92 100 81"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Method 1: build a DataFrame from a two-dimensional array\n",
"df1 = pd.DataFrame(data=scores_arr, columns=cou_names, index=stu_names)\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "baad5381-fb7d-4cc9-9288-a05d750144af",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['狄仁杰', '白起', '李元芳', '苏妲己', '孙尚香'], dtype='object')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Row index\n",
"df1.index"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d7f06b76-b60b-49cb-be72-adafb0978fca",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['语文', '数学', '英语'], dtype='object')"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Column index\n",
"df1.columns"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "13b1275d-77e5-4d5d-b227-19db3f4196fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 95,  86,  75],\n",
"       [ 91,  88,  86],\n",
"       [ 69,  80,  71],\n",
"       [ 82,  67,  94],\n",
"       [ 92, 100,  81]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Values as a two-dimensional array\n",
"df1.values"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "dbf5bb11-1600-4ae4-bc95-369bc8189c20",
"metadata": {},
"outputs": [],
"source": [
"scores_dict = {\n",
" '语文': [95, 91, 69, 82, 92],\n",
" '数学': [86, 88, 80, 67, 100],\n",
" '英语': [75, 86, 71, 94, 81]\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c300bbbd-329a-4852-bf76-78ce1de02b8f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>语文</th>\n",
" <th>数学</th>\n",
" <th>英语</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>狄仁杰</th>\n",
" <td>95</td>\n",
" <td>86</td>\n",
" <td>75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>白起</th>\n",
" <td>91</td>\n",
" <td>88</td>\n",
" <td>86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>李元芳</th>\n",
" <td>69</td>\n",
" <td>80</td>\n",
" <td>71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>苏妲己</th>\n",
" <td>82</td>\n",
" <td>67</td>\n",
" <td>94</td>\n",
" </tr>\n",
" <tr>\n",
" <th>孙尚香</th>\n",
" <td>92</td>\n",
" <td>100</td>\n",
" <td>81</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 语文 数学 英语\n",
"狄仁杰 95 86 75\n",
"白起 91 88 86\n",
"李元芳 69 80 71\n",
"苏妲己 82 67 94\n",
"孙尚香 92 100 81"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Method 2: build a DataFrame from a dict that maps column names to values\n",
"df2 = pd.DataFrame(data=scores_dict, index=stu_names)\n",
"df2"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "705c0de6-43ff-46c6-85d5-301743d18d43",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 5 entries, 狄仁杰 to 孙尚香\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype\n",
"--- ------ -------------- -----\n",
" 0 语文 5 non-null int64\n",
" 1 数学 5 non-null int64\n",
" 2 英语 5 non-null int64\n",
"dtypes: int64(3)\n",
"memory usage: 558.0 bytes\n"
]
}
],
"source": [
"# Show summary information; memory_usage='deep' also measures the contents of object data\n",
"df2.info(memory_usage='deep')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "71417ac2-8f4b-4950-9336-de6fbc1f5da4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>公示编号</th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>202300001</td>\n",
" <td>张浩</td>\n",
" <td>1977-02</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>202300002</td>\n",
" <td>冯云</td>\n",
" <td>1982-02</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>202300003</td>\n",
" <td>王天东</td>\n",
" <td>1975-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>202300004</td>\n",
" <td>陈军</td>\n",
" <td>1976-07</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>202300005</td>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5998</th>\n",
" <td>202305999</td>\n",
" <td>曹恰</td>\n",
" <td>1983-09</td>\n",
" <td>首都师范大学科德学院</td>\n",
" <td>109.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5999</th>\n",
" <td>202306000</td>\n",
" <td>罗佳</td>\n",
" <td>1981-05</td>\n",
" <td>厦门方胜众合企业服务有限公司海淀分公司</td>\n",
" <td>109.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6000</th>\n",
" <td>202306001</td>\n",
" <td>席盛代</td>\n",
" <td>1983-06</td>\n",
" <td>中国华能集团清洁能源技术研究院有限公司</td>\n",
" <td>109.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6001</th>\n",
" <td>202306002</td>\n",
" <td>彭芸芸</td>\n",
" <td>1981-09</td>\n",
" <td>北京汉杰凯德文化传播有限公司</td>\n",
" <td>109.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6002</th>\n",
" <td>202306003</td>\n",
" <td>张越</td>\n",
" <td>1982-01</td>\n",
" <td>大爱城投资控股有限公司</td>\n",
" <td>109.92</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6003 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" 公示编号 姓名 出生年月 单位名称 积分分值\n",
"0 202300001 张浩 1977-02 北京首钢股份有限公司 140.05\n",
"1 202300002 冯云 1982-02 中国人民解放军空军二十三厂 134.29\n",
"2 202300003 王天东 1975-01 中建二局第三建筑工程有限公司 133.63\n",
"3 202300004 陈军 1976-07 中建二局第三建筑工程有限公司 133.29\n",
"4 202300005 樊海瑞 1981-06 中国民生银行股份有限公司 132.46\n",
"... ... ... ... ... ...\n",
"5998 202305999 曹恰 1983-09 首都师范大学科德学院 109.92\n",
"5999 202306000 罗佳 1981-05 厦门方胜众合企业服务有限公司海淀分公司 109.92\n",
"6000 202306001 席盛代 1983-06 中国华能集团清洁能源技术研究院有限公司 109.92\n",
"6001 202306002 彭芸芸 1981-09 北京汉杰凯德文化传播有限公司 109.92\n",
"6002 202306003 张越 1982-01 大爱城投资控股有限公司 109.92\n",
"\n",
"[6003 rows x 5 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
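],
"source": [
"# Method 3: load data from a CSV file into a DataFrame\n",
"df3 = pd.read_csv(\n",
"    'res/2023年北京积分落户数据.csv',\n",
"    # encoding='utf-8',                        # character encoding of the file\n",
"    # sep='',                                  # field separator (default: comma)\n",
"    # delimiter='#',                           # alias for sep\n",
"    # header=0,                                # row that holds the column names\n",
"    # quotechar='\"',                           # character that wraps quoted strings (default: double quote)\n",
"    # index_col='公示编号',                     # column to use as the index\n",
"    # usecols=['公示编号', '姓名', '积分分值'],   # columns to load\n",
"    # nrows=10,                                # number of rows to load\n",
"    # skiprows=np.arange(1, 101),              # rows to skip (0-based line numbers)\n",
"    # true_values=['是', 'Yes', 'YES'],         # values to treat as boolean True\n",
"    # false_values=['否', 'No', 'NO'],          # values to treat as boolean False\n",
"    # na_values=['---', 'N/A'],                # values to treat as missing\n",
"    # iterator=True,                           # return an iterator instead of a DataFrame\n",
"    # chunksize=1000,                          # number of rows per chunk\n",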
")\n",
"df3"
]
},
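{
"cell_type": "markdown",
"id": "added-chunked-read-note",
"metadata": {},
"source": [
"The commented-out `iterator` and `chunksize` options in the previous cell make `pd.read_csv` return an iterator of DataFrames instead of one large table, which helps when a file does not fit comfortably in memory. The next cell is a minimal sketch, assuming a small in-memory CSV built with `io.StringIO` (made-up data, not the file loaded above)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-chunked-read-demo",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical CSV content: a header line plus seven data rows\n",
"csv_text = 'id,score\\n' + '\\n'.join(f'{i},{i * 10}' for i in range(1, 8))\n",
"\n",
"# With chunksize set, read_csv yields DataFrames of at most 3 rows each\n",
"chunk_sizes = []\n",
"score_total = 0\n",
"for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):\n",
"    chunk_sizes.append(len(chunk))\n",
"    score_total += chunk['score'].sum()\n",
"\n",
"print(chunk_sizes, score_total)"
]
},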
{
"cell_type": "code",
"execution_count": 12,
"id": "e9bd62fd-19d2-4ac1-97a6-3f6a0542e1df",
"metadata": {},
"outputs": [],
"source": [
"# %pip install openpyxl"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "cb3387b9-3402-4b25-a5d5-ff9690a1ac06",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351\n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016\n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 169 85 6320\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452\n",
"... ... ... ... ... ... ... ... ...\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435\n",
"\n",
"[1945 rows x 8 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Method 4: load data from an Excel file into a DataFrame\n",
"df6 = pd.read_excel(\n",
" 'res/2020年销售数据.xlsx',\n",
" sheet_name='data',\n",
")\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "d06abbd8-9a34-4ab3-a75c-76e3ed8eb36c",
"metadata": {},
"outputs": [],
"source": [
"# %pip install -U pymysql cryptography sqlalchemy"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "5aa0e35f-2a13-4c8e-a9fd-87b0bf72307e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Engine(mysql+pymysql://guest:***@47.109.26.237:3306/hrs)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Method 5: load data from a database server into a DataFrame\n",
"from sqlalchemy import create_engine\n",
"\n",
"# Connection URL: dialect+driver://username:password@host:port/database\n",
"engine = create_engine('mysql+pymysql://guest:Guest.618@47.109.26.237:3306/hrs')\n",
"engine"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "4b344f17-f5a1-4d7d-ad3c-ede4b122609c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" <tr>\n",
" <th>dno</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>运维部</td>\n",
" <td>深圳</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dname dloc\n",
"dno \n",
"10 会计部 北京\n",
"20 研发部 成都\n",
"30 销售部 重庆\n",
"40 运维部 深圳"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dept_df = pd.read_sql('tb_dept', engine, index_col='dno')\n",
"dept_df"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "c5d1ffa3-6962-4c26-ae92-a8d7bc7da0cb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" </tr>\n",
" <tr>\n",
" <th>eno</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1359</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>1800</td>\n",
" <td>200.0</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2056</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800.0</td>\n",
" <td>5000</td>\n",
" <td>1500.0</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3088</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056.0</td>\n",
" <td>3500</td>\n",
" <td>800.0</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3211</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3200</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3233</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3400</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3244</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088.0</td>\n",
" <td>3200</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3251</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>4000</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3344</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800.0</td>\n",
" <td>3000</td>\n",
" <td>800.0</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3577</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2200</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3588</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2500</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4466</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>2500</td>\n",
" <td>NaN</td>\n",
" <td>30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5234</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566.0</td>\n",
" <td>2000</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5566</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800.0</td>\n",
" <td>4000</td>\n",
" <td>1000.0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7800</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>9000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno\n",
"eno \n",
"1359 胡一刀 销售员 3344.0 1800 200.0 30\n",
"2056 乔峰 分析师 7800.0 5000 1500.0 20\n",
"3088 李莫愁 设计师 2056.0 3500 800.0 20\n",
"3211 张无忌 程序员 2056.0 3200 NaN 20\n",
"3233 丘处机 程序员 2056.0 3400 NaN 20\n",
"3244 欧阳锋 程序员 3088.0 3200 NaN 20\n",
"3251 张翠山 程序员 2056.0 4000 NaN 20\n",
"3344 黄蓉 销售主管 7800.0 3000 800.0 30\n",
"3577 杨过 会计 5566.0 2200 NaN 10\n",
"3588 朱九真 会计 5566.0 2500 NaN 10\n",
"4466 苗人凤 销售员 3344.0 2500 NaN 30\n",
"5234 郭靖 出纳 5566.0 2000 NaN 10\n",
"5566 宋远桥 会计师 7800.0 4000 1000.0 10\n",
"7800 张三丰 总裁 NaN 9000 1200.0 20"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emp_df1 = pd.read_sql('tb_emp', engine, index_col='eno')\n",
"emp_df1"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f84b6886-09d8-4f13-89cc-487574991dba",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" </tr>\n",
" <tr>\n",
" <th>eno</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>9500</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>50000</td>\n",
" <td>8000</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9600</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800.0</td>\n",
" <td>8000</td>\n",
" <td>600</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9700</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>60000</td>\n",
" <td>6000</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9800</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800.0</td>\n",
" <td>30000</td>\n",
" <td>5000</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9900</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800.0</td>\n",
" <td>10000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno\n",
"eno \n",
"9500 张三丰 总裁 NaN 50000 8000 20\n",
"9600 王大锤 程序员 9800.0 8000 600 20\n",
"9700 张三丰 总裁 NaN 60000 6000 20\n",
"9800 骆昊 架构师 7800.0 30000 5000 20\n",
"9900 陈小刀 分析师 9800.0 10000 1200 20"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"emp_df2 = pd.read_sql('tb_emp2', engine, index_col='eno')\n",
"emp_df2"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "c60e96d2-9a0d-4901-b39c-c31760de47a0",
"metadata": {},
"outputs": [],
"source": [
"# Dispose of the engine to close its pooled connections and release resources\n",
"engine.dispose()"
]
},
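{
"cell_type": "markdown",
"id": "added-read-sql-query-note",
"metadata": {},
"source": [
"`pd.read_sql` also accepts a SQL query in place of a table name, so filtering can happen inside the database. The next cell is a minimal sketch, assuming an in-memory SQLite database from the standard library and a hypothetical `tb_dept` table with made-up rows, so it runs without the remote MySQL server used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-read-sql-query-demo",
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n",
"import pandas as pd\n",
"\n",
"# Hypothetical table and rows for illustration only\n",
"conn = sqlite3.connect(':memory:')\n",
"conn.execute('CREATE TABLE tb_dept (dno INTEGER PRIMARY KEY, dname TEXT, dloc TEXT)')\n",
"conn.executemany(\n",
"    'INSERT INTO tb_dept VALUES (?, ?, ?)',\n",
"    [(10, 'Accounting', 'Beijing'), (20, 'R&D', 'Chengdu')]\n",
")\n",
"conn.commit()\n",
"\n",
"# Pass a parameterized query instead of a table name; dno becomes the row index\n",
"demo_df = pd.read_sql(\n",
"    'SELECT dno, dname, dloc FROM tb_dept WHERE dno >= ?',\n",
"    conn,\n",
"    index_col='dno',\n",
"    params=(20,)\n",
")\n",
"conn.close()\n",
"demo_df"
]
},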
{
"cell_type": "markdown",
"id": "12086a7a-c161-4753-9a8e-180f9e8b2edf",
"metadata": {},
"source": [
"### Inspecting the data"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "785e58f9-b3f7-49a6-affc-8caaa66cebf1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 1945 entries, 0 to 1944\n",
"Data columns (total 8 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 销售日期 1945 non-null datetime64[ns]\n",
" 1 销售区域 1945 non-null object \n",
" 2 销售渠道 1945 non-null object \n",
" 3 销售订单 1945 non-null object \n",
" 4 品牌 1945 non-null object \n",
" 5 售价 1945 non-null int64 \n",
" 6 销售数量 1945 non-null int64 \n",
" 7 直接成本 1945 non-null int64 \n",
"dtypes: datetime64[ns](1), int64(3), object(4)\n",
"memory usage: 121.7+ KB\n"
]
}
],
"source": [
"df6.info()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "fd8a9156-3939-430d-9738-60b3d8a95563",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351\n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016\n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 169 85 6320"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the first N rows\n",
"df6.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "b75ace23-9b92-4425-b58f-bcd81e8d72e7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get the last N rows\n",
"df6.tail(5)"
]
},
{
"cell_type": "markdown",
"id": "c2b2a909-0b40-473c-bb3f-85aca1925a19",
"metadata": {},
"source": [
"### Working with rows, columns, and cells"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "fe964b3b-7f51-4202-b528-f5102d9be9f0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2020-01-01\n",
"1 2020-01-01\n",
"2 2020-01-01\n",
"3 2020-01-01\n",
"4 2020-01-01\n",
" ... \n",
"1940 2020-12-30\n",
"1941 2020-12-30\n",
"1942 2020-12-31\n",
"1943 2020-12-31\n",
"1944 2020-12-31\n",
"Name: 销售日期, Length: 1945, dtype: datetime64[ns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Access a single column\n",
"df6['销售日期']"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "b2e5ccb3-4b97-4a02-8316-b1321390f286",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 拼多多\n",
"1 抖音\n",
"2 天猫\n",
"3 天猫\n",
"4 天猫\n",
" ... \n",
"1940 京东\n",
"1941 实体\n",
"1942 实体\n",
"1943 抖音\n",
"1944 天猫\n",
"Name: 销售渠道, Length: 1945, dtype: object"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.销售渠道"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "80ad78dc-4f47-4421-8478-ba7797350db4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 拼多多\n",
"1 抖音\n",
"2 天猫\n",
"3 天猫\n",
"4 天猫\n",
" ... \n",
"1940 京东\n",
"1941 实体\n",
"1942 实体\n",
"1943 抖音\n",
"1944 天猫\n",
"Name: 销售渠道, Length: 1945, dtype: object"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6['销售渠道']"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "7b970671-6f16-4e07-8666-715495de2832",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df6['销售日期'])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "2c9cb56b-6a2b-479e-8c57-c61683858387",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售渠道</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>拼多多</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>抖音</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>天猫</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>天猫</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>天猫</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>京东</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>实体</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>实体</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>抖音</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>天猫</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售渠道\n",
"0 拼多多\n",
"1 抖音\n",
"2 天猫\n",
"3 天猫\n",
"4 天猫\n",
"... ...\n",
"1940 京东\n",
"1941 实体\n",
"1942 实体\n",
"1943 抖音\n",
"1944 天猫\n",
"\n",
"[1945 rows x 1 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6[['销售渠道']]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "75730cd3-0459-4a62-97ee-e037256cc98a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df6[['销售渠道']])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "9e097e49-b762-4c9f-9d93-98abb1701d97",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>1016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>2452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>435</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 直接成本\n",
"0 2020-01-01 上海 3351\n",
"1 2020-01-01 上海 1016\n",
"2 2020-01-01 上海 6320\n",
"3 2020-01-01 上海 485\n",
"4 2020-01-01 上海 2452\n",
"... ... ... ...\n",
"1940 2020-12-30 北京 1560\n",
"1941 2020-12-30 福建 3028\n",
"1942 2020-12-31 福建 2277\n",
"1943 2020-12-31 福建 852\n",
"1944 2020-12-31 福建 435\n",
"\n",
"[1945 rows x 3 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 访问多个列 - 花式索引\n",
"df6[['销售日期', '销售区域', '直接成本']]"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "cf31a169-549e-4182-8206-789f97316115",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['销售订单', '品牌', '售价', '销售数量'], dtype='object')"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.columns[3:7]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "792713c0-13bc-4810-86cc-5f6f6ce78719",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>85</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 4 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售订单 品牌 售价 销售数量\n",
"0 182894-455 八匹马 99 83\n",
"1 205635-402 八匹马 219 29\n",
"2 205654-021 八匹马 169 85\n",
"3 205654-519 八匹马 169 14\n",
"4 377781-010 皮皮虾 249 61\n",
"... ... ... ... ...\n",
"1940 D89677 花花姑娘 269 26\n",
"1941 182719-050 八匹马 79 97\n",
"1942 G70083 花花姑娘 269 55\n",
"1943 211471-902/704 八匹马 59 59\n",
"1944 211807-050 八匹马 99 27\n",
"\n",
"[1945 rows x 4 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6[df6.columns[3:7]]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "02d43b17-15e3-44d5-844b-a50d365bf863",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"销售日期 2020-12-31 00:00:00\n",
"销售区域 福建\n",
"销售渠道 天猫\n",
"销售订单 211807-050\n",
"品牌 八匹马\n",
"售价 99\n",
"销售数量 27\n",
"直接成本 435\n",
"Name: 1944, dtype: object"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 访问行 - loc属性\n",
"df6.loc[1944]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "79da6932-f985-44dc-9f4b-e051e4749c65",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"销售日期 2020-12-31 00:00:00\n",
"销售区域 福建\n",
"销售渠道 天猫\n",
"销售订单 211807-050\n",
"品牌 八匹马\n",
"售价 99\n",
"销售数量 27\n",
"直接成本 435\n",
"Name: 1944, dtype: object"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.iloc[-1]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "6246b39b-7229-4e0f-af7b-0915e707492a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>529753-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>329</td>\n",
" <td>18</td>\n",
" <td>1839</td>\n",
" </tr>\n",
" <tr>\n",
" <th>58</th>\n",
" <td>2020-01-10</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>AWDH584-1</td>\n",
" <td>壁虎</td>\n",
" <td>299</td>\n",
" <td>14</td>\n",
" <td>1495</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1000</th>\n",
" <td>2020-05-29</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>G71332</td>\n",
" <td>花花姑娘</td>\n",
" <td>899</td>\n",
" <td>92</td>\n",
" <td>35120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1000</th>\n",
" <td>2020-05-29</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>G71332</td>\n",
" <td>花花姑娘</td>\n",
" <td>899</td>\n",
" <td>92</td>\n",
" <td>35120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1000</th>\n",
" <td>2020-05-29</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>G71332</td>\n",
" <td>花花姑娘</td>\n",
" <td>899</td>\n",
" <td>92</td>\n",
" <td>35120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1099</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>G70077</td>\n",
" <td>花花姑娘</td>\n",
" <td>329</td>\n",
" <td>38</td>\n",
" <td>2266</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351\n",
"100 2020-01-15 福建 天猫 529753-010 皮皮虾 329 18 1839\n",
"58 2020-01-10 北京 天猫 AWDH584-1 壁虎 299 14 1495\n",
"1000 2020-05-29 上海 天猫 G71332 花花姑娘 899 92 35120\n",
"1000 2020-05-29 上海 天猫 G71332 花花姑娘 899 92 35120\n",
"1000 2020-05-29 上海 天猫 G71332 花花姑娘 899 92 35120\n",
"1099 2020-06-17 上海 拼多多 G70077 花花姑娘 329 38 2266"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 访问多行 - 花式索引\n",
"df6.loc[[0, 100, 58, 1000, 1000, 1000, 1099]]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "77321324-0ca9-4c2e-a792-3c717189cb27",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>532500-011</td>\n",
" <td>皮皮虾</td>\n",
" <td>399</td>\n",
" <td>42</td>\n",
" <td>2771</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>京东</td>\n",
" <td>543179-011</td>\n",
" <td>皮皮虾</td>\n",
" <td>429</td>\n",
" <td>92</td>\n",
" <td>10216</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>543367-077</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>73</td>\n",
" <td>16161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>634872-021</td>\n",
" <td>皮皮虾</td>\n",
" <td>179</td>\n",
" <td>46</td>\n",
" <td>1322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>ADLG008-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>65</td>\n",
" <td>6154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>449794-494</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>98</td>\n",
" <td>9996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>543330-063</td>\n",
" <td>皮皮虾</td>\n",
" <td>549</td>\n",
" <td>32</td>\n",
" <td>3581</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>575088-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>399</td>\n",
" <td>40</td>\n",
" <td>4088</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>575107-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>449</td>\n",
" <td>32</td>\n",
" <td>4144</td>\n",
" </tr>\n",
" <tr>\n",
" <th>200</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>182721-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>85</td>\n",
" <td>3439</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"101 2020-01-15 福建 天猫 532500-011 皮皮虾 399 42 2771\n",
"102 2020-01-15 福建 京东 543179-011 皮皮虾 429 92 10216\n",
"103 2020-01-15 福建 实体 543367-077 皮皮虾 1199 73 16161\n",
"104 2020-01-15 福建 拼多多 634872-021 皮皮虾 179 46 1322\n",
"105 2020-01-15 福建 抖音 ADLG008-1 壁虎 239 65 6154\n",
".. ... ... ... ... ... ... ... ...\n",
"196 2020-01-26 福建 拼多多 449794-494 皮皮虾 249 98 9996\n",
"197 2020-01-26 福建 抖音 543330-063 皮皮虾 549 32 3581\n",
"198 2020-01-26 福建 天猫 575088-010 皮皮虾 399 40 4088\n",
"199 2020-01-26 福建 天猫 575107-010 皮皮虾 449 32 4144\n",
"200 2020-01-26 福建 天猫 182721-050 八匹马 99 85 3439\n",
"\n",
"[100 rows x 8 columns]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 访问多行 - 切片索引\n",
"df6.loc[101:200]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "5eb250eb-18e0-4181-a37a-dec55c633116",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>101</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>532500-011</td>\n",
" <td>皮皮虾</td>\n",
" <td>399</td>\n",
" <td>42</td>\n",
" <td>2771</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>京东</td>\n",
" <td>543179-011</td>\n",
" <td>皮皮虾</td>\n",
" <td>429</td>\n",
" <td>92</td>\n",
" <td>10216</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>543367-077</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>73</td>\n",
" <td>16161</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>634872-021</td>\n",
" <td>皮皮虾</td>\n",
" <td>179</td>\n",
" <td>46</td>\n",
" <td>1322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>105</th>\n",
" <td>2020-01-15</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>ADLG008-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>65</td>\n",
" <td>6154</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>195</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>449794-091</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>78</td>\n",
" <td>3424</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>449794-494</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>98</td>\n",
" <td>9996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>197</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>543330-063</td>\n",
" <td>皮皮虾</td>\n",
" <td>549</td>\n",
" <td>32</td>\n",
" <td>3581</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>575088-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>399</td>\n",
" <td>40</td>\n",
" <td>4088</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>2020-01-26</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>575107-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>449</td>\n",
" <td>32</td>\n",
" <td>4144</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>99 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"101 2020-01-15 福建 天猫 532500-011 皮皮虾 399 42 2771\n",
"102 2020-01-15 福建 京东 543179-011 皮皮虾 429 92 10216\n",
"103 2020-01-15 福建 实体 543367-077 皮皮虾 1199 73 16161\n",
"104 2020-01-15 福建 拼多多 634872-021 皮皮虾 179 46 1322\n",
"105 2020-01-15 福建 抖音 ADLG008-1 壁虎 239 65 6154\n",
".. ... ... ... ... ... ... ... ...\n",
"195 2020-01-26 福建 实体 449794-091 皮皮虾 249 78 3424\n",
"196 2020-01-26 福建 拼多多 449794-494 皮皮虾 249 98 9996\n",
"197 2020-01-26 福建 抖音 543330-063 皮皮虾 549 32 3581\n",
"198 2020-01-26 福建 天猫 575088-010 皮皮虾 399 40 4088\n",
"199 2020-01-26 福建 天猫 575107-010 皮皮虾 449 32 4144\n",
"\n",
"[99 rows x 8 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# df6[101:200]\n",
"df6.iloc[101:200]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "f2daddd7-3635-40b1-9416-c1137315948c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1849</th>\n",
" <td>2020-12-03</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>543458-452</td>\n",
" <td>皮皮虾</td>\n",
" <td>229</td>\n",
" <td>17</td>\n",
" <td>1041</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1848</th>\n",
" <td>2020-12-03</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>211894-021</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>76</td>\n",
" <td>3844</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1847</th>\n",
" <td>2020-12-02</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>22</td>\n",
" <td>731</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1846</th>\n",
" <td>2020-12-01</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>158609-477</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>80</td>\n",
" <td>2436</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1845</th>\n",
" <td>2020-12-01</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>G89395</td>\n",
" <td>花花姑娘</td>\n",
" <td>369</td>\n",
" <td>92</td>\n",
" <td>5291</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560\n",
"... ... ... ... ... ... ... ... ...\n",
"1849 2020-12-03 福建 抖音 543458-452 皮皮虾 229 17 1041\n",
"1848 2020-12-03 福建 实体 211894-021 八匹马 169 76 3844\n",
"1847 2020-12-02 北京 京东 182894-455 八匹马 99 22 731\n",
"1846 2020-12-01 北京 天猫 158609-477 八匹马 79 80 2436\n",
"1845 2020-12-01 北京 天猫 G89395 花花姑娘 369 92 5291\n",
"\n",
"[100 rows x 8 columns]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.iloc[-1:-101:-1]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "9321811f-e62b-4db5-a478-cdc0934f097b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"169"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 访问单元格\n",
"df6.at[2, '售价']"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "bd1670bc-0a13-457f-95f1-352a4d61b3a7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>999</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351\n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016\n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 999 85 6320\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452\n",
"... ... ... ... ... ... ... ... ...\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435\n",
"\n",
"[1945 rows x 8 columns]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.at[2, '售价'] = 999\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "7460ef03-3f45-4cc0-99a3-85039c2606b0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>888</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351\n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016\n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 888 85 6320\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452\n",
"... ... ... ... ... ... ... ... ...\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435\n",
"\n",
"[1945 rows x 8 columns]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.iat[2, -3] = 888\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "34c81da6-f58f-4c36-8596-004266e9374b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>季度</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" <td>8217</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" <td>6351</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>888</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" <td>75480</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" <td>3481</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" <td>2673</td>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本 销售额 季度 \\\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351 8217 1 \n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016 6351 1 \n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 888 85 6320 75480 1 \n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1 \n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1 \n",
"... ... ... ... ... ... ... ... ... ... .. \n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 4 \n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 4 \n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 4 \n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852 3481 4 \n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435 2673 4 \n",
"\n",
" 月份 \n",
"0 1 \n",
"1 1 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
"... .. \n",
"1940 12 \n",
"1941 12 \n",
"1942 12 \n",
"1943 12 \n",
"1944 12 \n",
"\n",
"[1945 rows x 11 columns]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Add columns\n",
"df6['销售额'] = df6['售价'] * df6['销售数量']\n",
"df6['季度'] = df6['销售日期'].dt.quarter\n",
"df6['月份'] = df6['销售日期'].dt.month\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "c3c60210-202d-4bd8-8804-1d657746b29c",
"metadata": {},
"outputs": [],
"source": [
"# Add rows - rarely meaningful in real-world work"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "6bf78f3d-05a2-4c7a-a0f0-fb6659f1bd6f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>182894-455</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>83</td>\n",
" <td>3351</td>\n",
" <td>8217</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>205635-402</td>\n",
" <td>八匹马</td>\n",
" <td>219</td>\n",
" <td>29</td>\n",
" <td>1016</td>\n",
" <td>6351</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>888</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" <td>75480</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1943</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>211471-902/704</td>\n",
" <td>八匹马</td>\n",
" <td>59</td>\n",
" <td>59</td>\n",
" <td>852</td>\n",
" <td>3481</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1944</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>211807-050</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>27</td>\n",
" <td>435</td>\n",
" <td>2673</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1945 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本 销售额 月份\n",
"0 2020-01-01 上海 拼多多 182894-455 八匹马 99 83 3351 8217 1\n",
"1 2020-01-01 上海 抖音 205635-402 八匹马 219 29 1016 6351 1\n",
"2 2020-01-01 上海 天猫 205654-021 八匹马 888 85 6320 75480 1\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1\n",
"... ... ... ... ... ... ... ... ... ... ..\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 12\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 12\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 12\n",
"1943 2020-12-31 福建 抖音 211471-902/704 八匹马 59 59 852 3481 12\n",
"1944 2020-12-31 福建 天猫 211807-050 八匹马 99 27 435 2673 12\n",
"\n",
"[1945 rows x 10 columns]"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
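],
"source": [
"# Drop a column\n",
"# inplace=False - the default - leaves the original object unchanged and returns a new, modified object\n",
"# inplace=True - modifies the DataFrame in place; the method returns None instead of a new object\n",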
"df6.drop(columns=['季度'], inplace=True)\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "cdf8cf10-5193-4c38-8fef-bc3d38a8a0a8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>543369-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>588685-002</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>AKLH641-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1938</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>588682-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1939</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>599007-513</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1939 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本 销售额 月份\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1\n",
"5 2020-01-02 上海 京东 543369-010 皮皮虾 799 68 15203 54332 1\n",
"6 2020-01-02 上海 拼多多 588685-002 皮皮虾 299 91 8008 27209 1\n",
"7 2020-01-03 上海 天猫 AKLH641-1 壁虎 239 82 4127 19598 1\n",
"... ... ... ... ... ... ... ... ... ... ..\n",
"1938 2020-12-29 北京 拼多多 588682-010 皮皮虾 269 50 4388 13450 12\n",
"1939 2020-12-29 北京 天猫 599007-513 皮皮虾 349 18 2466 6282 12\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 12\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 12\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1939 rows x 10 columns]"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop rows by index label\n",
"# df6.drop(index=[0, 1, 2, 100, 1944, 1943])\n",
"df6.drop(index=[0, 1, 2, 100, 1944, 1943], inplace=True)\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "1ddfe77d-aa92-4d6a-b2db-8469b1222ed3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>543369-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>588685-002</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>AKLH641-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1938</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>588682-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1939</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>599007-513</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1839 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本 销售额 月份\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1\n",
"5 2020-01-02 上海 京东 543369-010 皮皮虾 799 68 15203 54332 1\n",
"6 2020-01-02 上海 拼多多 588685-002 皮皮虾 299 91 8008 27209 1\n",
"7 2020-01-03 上海 天猫 AKLH641-1 壁虎 239 82 4127 19598 1\n",
"... ... ... ... ... ... ... ... ... ... ..\n",
"1938 2020-12-29 北京 拼多多 588682-010 皮皮虾 269 50 4388 13450 12\n",
"1939 2020-12-29 北京 天猫 599007-513 皮皮虾 349 18 2466 6282 12\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 12\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 12\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1839 rows x 10 columns]"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
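],
"source": [
"# Drop rows by position - df6.index[100:200] gives the labels of the rows at positions 100 to 199\n",
"df6.drop(index=df6.index[100:200], inplace=True)\n",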
"df6"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "8020bbb0-740e-496a-9224-fe3495a19c92",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>订单号</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>543369-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>588685-002</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>AKLH641-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1938</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>588682-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1939</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>599007-513</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1940</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1941</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1942</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1839 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 订单号 品牌 售价 销售数量 直接成本 销售额 月份\n",
"3 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1\n",
"4 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1\n",
"5 2020-01-02 上海 京东 543369-010 皮皮虾 799 68 15203 54332 1\n",
"6 2020-01-02 上海 拼多多 588685-002 皮皮虾 299 91 8008 27209 1\n",
"7 2020-01-03 上海 天猫 AKLH641-1 壁虎 239 82 4127 19598 1\n",
"... ... .. ... ... ... ... ... ... ... ..\n",
"1938 2020-12-29 北京 拼多多 588682-010 皮皮虾 269 50 4388 13450 12\n",
"1939 2020-12-29 北京 天猫 599007-513 皮皮虾 349 18 2466 6282 12\n",
"1940 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 12\n",
"1941 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 12\n",
"1942 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1839 rows x 10 columns]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rename columns\n",
"df6.rename(columns={'销售区域': '区域', '销售渠道': '渠道', '销售订单': '订单号'}, inplace=True)\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "d028d2be-0944-4b70-a3ea-f7d06cdd458f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>订单号</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-519</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>543369-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>588685-002</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>AKLH641-1</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1834</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>588682-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1835</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>599007-513</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1836</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>D89677</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1837</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>182719-050</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1838</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>G70083</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1839 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 订单号 品牌 售价 销售数量 直接成本 销售额 月份\n",
"0 2020-01-01 上海 天猫 205654-519 八匹马 169 14 485 2366 1\n",
"1 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452 15189 1\n",
"2 2020-01-02 上海 京东 543369-010 皮皮虾 799 68 15203 54332 1\n",
"3 2020-01-02 上海 拼多多 588685-002 皮皮虾 299 91 8008 27209 1\n",
"4 2020-01-03 上海 天猫 AKLH641-1 壁虎 239 82 4127 19598 1\n",
"... ... .. ... ... ... ... ... ... ... ..\n",
"1834 2020-12-29 北京 拼多多 588682-010 皮皮虾 269 50 4388 13450 12\n",
"1835 2020-12-29 北京 天猫 599007-513 皮皮虾 349 18 2466 6282 12\n",
"1836 2020-12-30 北京 京东 D89677 花花姑娘 269 26 1560 6994 12\n",
"1837 2020-12-30 福建 实体 182719-050 八匹马 79 97 3028 7663 12\n",
"1838 2020-12-31 福建 实体 G70083 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1839 rows x 10 columns]"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Reset the index\n",
"# drop=False - the default - the old index is kept as an ordinary column\n",
"# drop=True - the old index is discarded\n",
"df6.reset_index(drop=True, inplace=True)\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "cb55a518-f4bd-4fac-8554-4353c0798bc6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>205654-519</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>377781-010</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543369-010</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588685-002</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLH641-1</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588682-010</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>599007-513</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D89677</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182719-050</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G70083</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1839 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"205654-519 2020-01-01 上海 天猫 八匹马 169 14 485 2366 1\n",
"377781-010 2020-01-01 上海 天猫 皮皮虾 249 61 2452 15189 1\n",
"543369-010 2020-01-02 上海 京东 皮皮虾 799 68 15203 54332 1\n",
"588685-002 2020-01-02 上海 拼多多 皮皮虾 299 91 8008 27209 1\n",
"AKLH641-1 2020-01-03 上海 天猫 壁虎 239 82 4127 19598 1\n",
"... ... .. ... ... ... ... ... ... ..\n",
"588682-010 2020-12-29 北京 拼多多 皮皮虾 269 50 4388 13450 12\n",
"599007-513 2020-12-29 北京 天猫 皮皮虾 349 18 2466 6282 12\n",
"D89677 2020-12-30 北京 京东 花花姑娘 269 26 1560 6994 12\n",
"182719-050 2020-12-30 福建 实体 八匹马 79 97 3028 7663 12\n",
"G70083 2020-12-31 福建 实体 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1839 rows x 9 columns]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Set the index\n",
"df6.set_index('订单号', inplace=True)\n",
"df6"
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "101bd804-5a90-4cd3-a545-613df6d9b8e5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G70509</th>\n",
" <td>2020-02-03</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1499</td>\n",
" <td>89</td>\n",
" <td>52302</td>\n",
" <td>133411</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G72186</th>\n",
" <td>2020-04-11</td>\n",
" <td>江苏</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>88</td>\n",
" <td>18381</td>\n",
" <td>114312</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-04-12</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>88</td>\n",
" <td>25674</td>\n",
" <td>105512</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-06-08</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>80</td>\n",
" <td>29819</td>\n",
" <td>103920</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>577714-010</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>97</td>\n",
" <td>40884</td>\n",
" <td>116303</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-08-28</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>89</td>\n",
" <td>45442</td>\n",
" <td>106711</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-09-19</td>\n",
" <td>广东</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>93</td>\n",
" <td>34290</td>\n",
" <td>120807</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G70509 2020-02-03 北京 拼多多 花花姑娘 1499 89 52302 133411 2\n",
"G72186 2020-04-11 江苏 天猫 花花姑娘 1299 88 18381 114312 4\n",
"543367-077 2020-04-12 北京 拼多多 皮皮虾 1199 88 25674 105512 4\n",
"G68188 2020-06-08 北京 拼多多 花花姑娘 1299 80 29819 103920 6\n",
"577714-010 2020-06-17 上海 拼多多 皮皮虾 1199 97 40884 116303 6\n",
"543367-077 2020-08-28 上海 天猫 皮皮虾 1199 89 45442 106711 8\n",
"G68188 2020-09-19 广东 拼多多 花花姑娘 1299 93 34290 120807 9"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows - boolean indexing\n",
"df6[df6['销售额'] > 100000]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "64c83a43-fcb0-4ba1-9400-ae4a5b21715c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-06-08</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>80</td>\n",
" <td>29819</td>\n",
" <td>103920</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>577714-010</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>97</td>\n",
" <td>40884</td>\n",
" <td>116303</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G68188 2020-06-08 北京 拼多多 花花姑娘 1299 80 29819 103920 6\n",
"577714-010 2020-06-17 上海 拼多多 皮皮虾 1199 97 40884 116303 6"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6[(df6['销售额'] > 100000) & (df6['月份'] == 6)]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "22c01e56-b188-40f7-9e53-3a3d2f0bcb29",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G70509</th>\n",
" <td>2020-02-03</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1499</td>\n",
" <td>89</td>\n",
" <td>52302</td>\n",
" <td>133411</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G72186</th>\n",
" <td>2020-04-11</td>\n",
" <td>江苏</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>88</td>\n",
" <td>18381</td>\n",
" <td>114312</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-04-12</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>88</td>\n",
" <td>25674</td>\n",
" <td>105512</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>204396-900/021</th>\n",
" <td>2020-06-01</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>啊哟喂</td>\n",
" <td>199</td>\n",
" <td>55</td>\n",
" <td>4221</td>\n",
" <td>10945</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AHSJ008-2</th>\n",
" <td>2020-06-01</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>壁虎</td>\n",
" <td>139</td>\n",
" <td>61</td>\n",
" <td>3640</td>\n",
" <td>8479</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543179-011</th>\n",
" <td>2020-06-30</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>皮皮虾</td>\n",
" <td>429</td>\n",
" <td>74</td>\n",
" <td>11601</td>\n",
" <td>31746</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLH641-1</th>\n",
" <td>2020-06-30</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>67</td>\n",
" <td>3490</td>\n",
" <td>16013</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>158631-050</th>\n",
" <td>2020-06-30</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>89</td>\n",
" <td>1421</td>\n",
" <td>8811</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-08-28</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>89</td>\n",
" <td>45442</td>\n",
" <td>106711</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-09-19</td>\n",
" <td>广东</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>93</td>\n",
" <td>34290</td>\n",
" <td>120807</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>152 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G70509 2020-02-03 北京 拼多多 花花姑娘 1499 89 52302 133411 2\n",
"G72186 2020-04-11 江苏 天猫 花花姑娘 1299 88 18381 114312 4\n",
"543367-077 2020-04-12 北京 拼多多 皮皮虾 1199 88 25674 105512 4\n",
"204396-900/021 2020-06-01 北京 拼多多 啊哟喂 199 55 4221 10945 6\n",
"AHSJ008-2 2020-06-01 北京 天猫 壁虎 139 61 3640 8479 6\n",
"... ... .. ... ... ... ... ... ... ..\n",
"543179-011 2020-06-30 上海 京东 皮皮虾 429 74 11601 31746 6\n",
"AKLH641-1 2020-06-30 上海 实体 壁虎 239 67 3490 16013 6\n",
"158631-050 2020-06-30 北京 天猫 八匹马 99 89 1421 8811 6\n",
"543367-077 2020-08-28 上海 天猫 皮皮虾 1199 89 45442 106711 8\n",
"G68188 2020-09-19 广东 拼多多 花花姑娘 1299 93 34290 120807 9\n",
"\n",
"[152 rows x 9 columns]"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6[(df6['销售额'] > 100000) | (df6['月份'] == 6)]"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "5adb86b9-8b31-49cb-9292-94189f3714c5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G70509</th>\n",
" <td>2020-02-03</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1499</td>\n",
" <td>89</td>\n",
" <td>52302</td>\n",
" <td>133411</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G72186</th>\n",
" <td>2020-04-11</td>\n",
" <td>江苏</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>88</td>\n",
" <td>18381</td>\n",
" <td>114312</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-04-12</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>88</td>\n",
" <td>25674</td>\n",
" <td>105512</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-06-08</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>80</td>\n",
" <td>29819</td>\n",
" <td>103920</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>577714-010</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>97</td>\n",
" <td>40884</td>\n",
" <td>116303</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543367-077</th>\n",
" <td>2020-08-28</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>89</td>\n",
" <td>45442</td>\n",
" <td>106711</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-09-19</td>\n",
" <td>广东</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>93</td>\n",
" <td>34290</td>\n",
" <td>120807</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G70509 2020-02-03 北京 拼多多 花花姑娘 1499 89 52302 133411 2\n",
"G72186 2020-04-11 江苏 天猫 花花姑娘 1299 88 18381 114312 4\n",
"543367-077 2020-04-12 北京 拼多多 皮皮虾 1199 88 25674 105512 4\n",
"G68188 2020-06-08 北京 拼多多 花花姑娘 1299 80 29819 103920 6\n",
"577714-010 2020-06-17 上海 拼多多 皮皮虾 1199 97 40884 116303 6\n",
"543367-077 2020-08-28 上海 天猫 皮皮虾 1199 89 45442 106711 8\n",
"G68188 2020-09-19 广东 拼多多 花花姑娘 1299 93 34290 120807 9"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.query('销售额 > 100000')"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "b768afa0-7066-4a1d-8f10-b88386587388",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>D86056</th>\n",
" <td>2020-06-01</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>469</td>\n",
" <td>24</td>\n",
" <td>3445</td>\n",
" <td>11256</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543179-011</th>\n",
" <td>2020-06-02</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>皮皮虾</td>\n",
" <td>429</td>\n",
" <td>58</td>\n",
" <td>8002</td>\n",
" <td>24882</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLH651-2</th>\n",
" <td>2020-06-04</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>壁虎</td>\n",
" <td>299</td>\n",
" <td>78</td>\n",
" <td>3577</td>\n",
" <td>23322</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>F89396</th>\n",
" <td>2020-06-07</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>199</td>\n",
" <td>93</td>\n",
" <td>7370</td>\n",
" <td>18507</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>X23567</th>\n",
" <td>2020-06-09</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>429</td>\n",
" <td>46</td>\n",
" <td>6484</td>\n",
" <td>19734</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G71183</th>\n",
" <td>2020-06-10</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>369</td>\n",
" <td>93</td>\n",
" <td>9247</td>\n",
" <td>34317</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D89458</th>\n",
" <td>2020-06-11</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>299</td>\n",
" <td>85</td>\n",
" <td>6379</td>\n",
" <td>25415</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLJ034-3</th>\n",
" <td>2020-06-12</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>81</td>\n",
" <td>8048</td>\n",
" <td>19359</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AHSJ017-3</th>\n",
" <td>2020-06-13</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>壁虎</td>\n",
" <td>139</td>\n",
" <td>96</td>\n",
" <td>5892</td>\n",
" <td>13344</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182802-050</th>\n",
" <td>2020-06-15</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>199</td>\n",
" <td>26</td>\n",
" <td>1760</td>\n",
" <td>5174</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G70260</th>\n",
" <td>2020-06-15</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>329</td>\n",
" <td>15</td>\n",
" <td>1491</td>\n",
" <td>4935</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FT001-18-1763</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>699</td>\n",
" <td>98</td>\n",
" <td>25835</td>\n",
" <td>68502</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182717-001</th>\n",
" <td>2020-06-18</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>69</td>\n",
" <td>10</td>\n",
" <td>255</td>\n",
" <td>690</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>158631-050</th>\n",
" <td>2020-06-20</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>66</td>\n",
" <td>2670</td>\n",
" <td>6534</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D87692</th>\n",
" <td>2020-06-22</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>399</td>\n",
" <td>82</td>\n",
" <td>5058</td>\n",
" <td>32718</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>158636-050</th>\n",
" <td>2020-06-25</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>119</td>\n",
" <td>22</td>\n",
" <td>781</td>\n",
" <td>2618</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G80825</th>\n",
" <td>2020-06-28</td>\n",
" <td>北京</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>399</td>\n",
" <td>84</td>\n",
" <td>12260</td>\n",
" <td>33516</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>X12399</th>\n",
" <td>2020-06-29</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>329</td>\n",
" <td>83</td>\n",
" <td>4926</td>\n",
" <td>27307</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLH641-1</th>\n",
" <td>2020-06-30</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>67</td>\n",
" <td>3490</td>\n",
" <td>16013</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"D86056 2020-06-01 北京 实体 花花姑娘 469 24 3445 11256 6\n",
"543179-011 2020-06-02 北京 实体 皮皮虾 429 58 8002 24882 6\n",
"AKLH651-2 2020-06-04 北京 实体 壁虎 299 78 3577 23322 6\n",
"F89396 2020-06-07 福建 实体 花花姑娘 199 93 7370 18507 6\n",
"X23567 2020-06-09 上海 实体 花花姑娘 429 46 6484 19734 6\n",
"G71183 2020-06-10 北京 实体 花花姑娘 369 93 9247 34317 6\n",
"D89458 2020-06-11 北京 实体 花花姑娘 299 85 6379 25415 6\n",
"AKLJ034-3 2020-06-12 福建 实体 壁虎 239 81 8048 19359 6\n",
"AHSJ017-3 2020-06-13 福建 实体 壁虎 139 96 5892 13344 6\n",
"182802-050 2020-06-15 上海 实体 八匹马 199 26 1760 5174 6\n",
"G70260 2020-06-15 福建 实体 花花姑娘 329 15 1491 4935 6\n",
"FT001-18-1763 2020-06-17 上海 实体 八匹马 699 98 25835 68502 6\n",
"182717-001 2020-06-18 北京 实体 八匹马 69 10 255 690 6\n",
"158631-050 2020-06-20 福建 实体 八匹马 99 66 2670 6534 6\n",
"D87692 2020-06-22 福建 实体 花花姑娘 399 82 5058 32718 6\n",
"158636-050 2020-06-25 北京 实体 八匹马 119 22 781 2618 6\n",
"G80825 2020-06-28 北京 实体 花花姑娘 399 84 12260 33516 6\n",
"X12399 2020-06-29 上海 实体 花花姑娘 329 83 4926 27307 6\n",
"AKLH641-1 2020-06-30 上海 实体 壁虎 239 67 3490 16013 6"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.query('月份 == 6 and 渠道 == \"实体\"')"
]
},
{
"cell_type": "code",
"execution_count": 54,
"id": "2e57b21c-0565-4352-8924-de169497bce0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G68188</th>\n",
" <td>2020-06-08</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>1299</td>\n",
" <td>80</td>\n",
" <td>29819</td>\n",
" <td>103920</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>577714-010</th>\n",
" <td>2020-06-17</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>97</td>\n",
" <td>40884</td>\n",
" <td>116303</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G68188 2020-06-08 北京 拼多多 花花姑娘 1299 80 29819 103920 6\n",
"577714-010 2020-06-17 上海 拼多多 皮皮虾 1199 97 40884 116303 6"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.query('销售额 > 100000 and 月份 == 6')"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "7ef8ba56-5293-41b0-8208-85a0eed735e8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>205333-031</th>\n",
" <td>2020-12-21</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>98</td>\n",
" <td>6150</td>\n",
" <td>16562</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>F76717</th>\n",
" <td>2020-06-24</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>429</td>\n",
" <td>15</td>\n",
" <td>2403</td>\n",
" <td>6435</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>577714-010</th>\n",
" <td>2020-02-01</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>1199</td>\n",
" <td>55</td>\n",
" <td>22707</td>\n",
" <td>65945</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>F45562</th>\n",
" <td>2020-04-16</td>\n",
" <td>福建</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>599</td>\n",
" <td>90</td>\n",
" <td>14111</td>\n",
" <td>53910</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>211466-901/519</th>\n",
" <td>2020-01-29</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>八匹马</td>\n",
" <td>199</td>\n",
" <td>52</td>\n",
" <td>4651</td>\n",
" <td>10348</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>F76716</th>\n",
" <td>2020-04-20</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>429</td>\n",
" <td>70</td>\n",
" <td>11772</td>\n",
" <td>30030</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G69627</th>\n",
" <td>2020-06-09</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>999</td>\n",
" <td>36</td>\n",
" <td>14206</td>\n",
" <td>35964</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588670-010</th>\n",
" <td>2020-02-15</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>皮皮虾</td>\n",
" <td>499</td>\n",
" <td>75</td>\n",
" <td>15335</td>\n",
" <td>37425</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D86041</th>\n",
" <td>2020-03-11</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>399</td>\n",
" <td>40</td>\n",
" <td>5490</td>\n",
" <td>15960</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>204266-050</th>\n",
" <td>2020-09-26</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>啊哟喂</td>\n",
" <td>239</td>\n",
" <td>46</td>\n",
" <td>4116</td>\n",
" <td>10994</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>100 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"205333-031 2020-12-21 北京 京东 八匹马 169 98 6150 16562 12\n",
"F76717 2020-06-24 福建 天猫 花花姑娘 429 15 2403 6435 6\n",
"577714-010 2020-02-01 北京 天猫 皮皮虾 1199 55 22707 65945 2\n",
"F45562 2020-04-16 福建 天猫 花花姑娘 599 90 14111 53910 4\n",
"211466-901/519 2020-01-29 上海 天猫 八匹马 199 52 4651 10348 1\n",
"... ... .. .. ... ... ... ... ... ..\n",
"F76716 2020-04-20 上海 天猫 花花姑娘 429 70 11772 30030 4\n",
"G69627 2020-06-09 北京 天猫 花花姑娘 999 36 14206 35964 6\n",
"588670-010 2020-02-15 上海 抖音 皮皮虾 499 75 15335 37425 2\n",
"D86041 2020-03-11 上海 天猫 花花姑娘 399 40 5490 15960 3\n",
"204266-050 2020-09-26 上海 抖音 啊哟喂 239 46 4116 10994 9\n",
"\n",
"[100 rows x 9 columns]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Random sampling\n",
"df6.sample(n=100)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "bfcd52d7-eac4-4776-b0e3-a37e67e349f3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>G74904</th>\n",
" <td>2020-08-18</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>花花姑娘</td>\n",
" <td>499</td>\n",
" <td>88</td>\n",
" <td>15952</td>\n",
" <td>43912</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D89096</th>\n",
" <td>2020-03-29</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>399</td>\n",
" <td>73</td>\n",
" <td>13022</td>\n",
" <td>29127</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>F89399</th>\n",
" <td>2020-02-14</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>花花姑娘</td>\n",
" <td>499</td>\n",
" <td>46</td>\n",
" <td>5758</td>\n",
" <td>22954</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D87692</th>\n",
" <td>2020-10-29</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>花花姑娘</td>\n",
" <td>399</td>\n",
" <td>48</td>\n",
" <td>8074</td>\n",
" <td>19152</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>205301-477</th>\n",
" <td>2020-11-24</td>\n",
" <td>广东</td>\n",
" <td>京东</td>\n",
" <td>八匹马</td>\n",
" <td>199</td>\n",
" <td>47</td>\n",
" <td>1762</td>\n",
" <td>9353</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>543369-010</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>京东</td>\n",
" <td>皮皮虾</td>\n",
" <td>799</td>\n",
" <td>68</td>\n",
" <td>15203</td>\n",
" <td>54332</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G72212</th>\n",
" <td>2020-07-07</td>\n",
" <td>江苏</td>\n",
" <td>天猫</td>\n",
" <td>花花姑娘</td>\n",
" <td>899</td>\n",
" <td>45</td>\n",
" <td>6922</td>\n",
" <td>40455</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>479935-012</th>\n",
" <td>2020-08-02</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>25</td>\n",
" <td>2098</td>\n",
" <td>8725</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>480239-010</th>\n",
" <td>2020-09-02</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>36</td>\n",
" <td>4384</td>\n",
" <td>10764</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AWDH721-2</th>\n",
" <td>2020-07-04</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>壁虎</td>\n",
" <td>269</td>\n",
" <td>39</td>\n",
" <td>4597</td>\n",
" <td>10491</td>\n",
" <td>7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>92 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"G74904 2020-08-18 北京 京东 花花姑娘 499 88 15952 43912 8\n",
"D89096 2020-03-29 上海 拼多多 花花姑娘 399 73 13022 29127 3\n",
"F89399 2020-02-14 上海 抖音 花花姑娘 499 46 5758 22954 2\n",
"D87692 2020-10-29 福建 拼多多 花花姑娘 399 48 8074 19152 10\n",
"205301-477 2020-11-24 广东 京东 八匹马 199 47 1762 9353 11\n",
"... ... .. ... ... ... ... ... ... ..\n",
"543369-010 2020-01-02 上海 京东 皮皮虾 799 68 15203 54332 1\n",
"G72212 2020-07-07 江苏 天猫 花花姑娘 899 45 6922 40455 7\n",
"479935-012 2020-08-02 北京 京东 皮皮虾 349 25 2098 8725 8\n",
"480239-010 2020-09-02 福建 抖音 皮皮虾 299 36 4384 10764 9\n",
"AWDH721-2 2020-07-04 上海 抖音 壁虎 269 39 4597 10491 7\n",
"\n",
"[92 rows x 9 columns]"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df6.sample(frac=0.05)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "1c654ca8-3179-4fa2-9213-7d7029357342",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>销售区域</th>\n",
" <th>销售渠道</th>\n",
" <th>销售订单</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>205654-021</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>85</td>\n",
" <td>6320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>377781-010</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>FT001-N10</td>\n",
" <td>八匹马</td>\n",
" <td>699</td>\n",
" <td>50</td>\n",
" <td>8380</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020-01-04</td>\n",
" <td>上海</td>\n",
" <td>实体</td>\n",
" <td>FT001-N10</td>\n",
" <td>八匹马</td>\n",
" <td>699</td>\n",
" <td>15</td>\n",
" <td>2635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020-01-06</td>\n",
" <td>上海</td>\n",
" <td>抖音</td>\n",
" <td>G70357</td>\n",
" <td>花花姑娘</td>\n",
" <td>699</td>\n",
" <td>49</td>\n",
" <td>8809</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>2020-12-05</td>\n",
" <td>福建</td>\n",
" <td>抖音</td>\n",
" <td>G69924</td>\n",
" <td>花花姑娘</td>\n",
" <td>599</td>\n",
" <td>75</td>\n",
" <td>7057</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>2020-12-07</td>\n",
" <td>福建</td>\n",
" <td>拼多多</td>\n",
" <td>182898-258</td>\n",
" <td>八匹马</td>\n",
" <td>99</td>\n",
" <td>99</td>\n",
" <td>2506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>192</th>\n",
" <td>2020-12-10</td>\n",
" <td>北京</td>\n",
" <td>抖音</td>\n",
" <td>AKLJ041-2</td>\n",
" <td>壁虎</td>\n",
" <td>269</td>\n",
" <td>42</td>\n",
" <td>1746</td>\n",
" </tr>\n",
" <tr>\n",
" <th>193</th>\n",
" <td>2020-12-21</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>205333-031</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>98</td>\n",
" <td>6150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>194</th>\n",
" <td>2020-12-24</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>D88376</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>32</td>\n",
" <td>2006</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>195 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 销售区域 销售渠道 销售订单 品牌 售价 销售数量 直接成本\n",
"0 2020-01-01 上海 天猫 205654-021 八匹马 169 85 6320\n",
"1 2020-01-01 上海 天猫 377781-010 皮皮虾 249 61 2452\n",
"2 2020-01-03 上海 天猫 FT001-N10 八匹马 699 50 8380\n",
"3 2020-01-04 上海 实体 FT001-N10 八匹马 699 15 2635\n",
"4 2020-01-06 上海 抖音 G70357 花花姑娘 699 49 8809\n",
".. ... ... ... ... ... ... ... ...\n",
"190 2020-12-05 福建 抖音 G69924 花花姑娘 599 75 7057\n",
"191 2020-12-07 福建 拼多多 182898-258 八匹马 99 99 2506\n",
"192 2020-12-10 北京 抖音 AKLJ041-2 壁虎 269 42 1746\n",
"193 2020-12-21 北京 京东 205333-031 八匹马 169 98 6150\n",
"194 2020-12-24 福建 实体 D88376 花花姑娘 269 32 2006\n",
"\n",
"[195 rows x 8 columns]"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# replace=False: sample without replacement; randomly skip 90% of the 1945 data rows while reading\n",
"ignore_rows = np.random.choice(np.arange(1, 1946), size=int(1945 * 0.9), replace=False)\n",
"pd.read_excel(\n",
" 'res/2020年销售数据.xlsx',\n",
" sheet_name='data',\n",
" skiprows=ignore_rows\n",
")"
]
},
{
"cell_type": "markdown",
"id": "2037ed6a-d616-4c67-9f5d-ea517d6e1c6b",
"metadata": {},
"source": [
"### Reshaping data\n",
"\n",
"1. Concatenation (stacking data that share the same structure)\n",
"2. Merging (joining a fact table with dimension tables)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "d2184fd4-bd44-459f-bda4-6dc11c09c219",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(19, 6)"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Concatenate two DataFrames, similar to SQL UNION ALL (duplicate rows are kept)\n",
"all_emp_df = pd.concat([emp_df1, emp_df2])\n",
"all_emp_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "05bc65a1-42ac-463c-a089-08fb8dc60855",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>1800</td>\n",
" <td>200.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800.0</td>\n",
" <td>5000</td>\n",
" <td>1500.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056.0</td>\n",
" <td>3500</td>\n",
" <td>800.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3200</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3400</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088.0</td>\n",
" <td>3200</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>4000</td>\n",
" <td>NaN</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800.0</td>\n",
" <td>3000</td>\n",
" <td>800.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2200</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2500</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>2500</td>\n",
" <td>NaN</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566.0</td>\n",
" <td>2000</td>\n",
" <td>NaN</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800.0</td>\n",
" <td>4000</td>\n",
" <td>1000.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>9000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>50000</td>\n",
" <td>8000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800.0</td>\n",
" <td>8000</td>\n",
" <td>600.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>60000</td>\n",
" <td>6000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800.0</td>\n",
" <td>30000</td>\n",
" <td>5000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800.0</td>\n",
" <td>10000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344.0 1800 200.0 30 销售部 重庆\n",
"1 乔峰 分析师 7800.0 5000 1500.0 20 研发部 成都\n",
"2 李莫愁 设计师 2056.0 3500 800.0 20 研发部 成都\n",
"3 张无忌 程序员 2056.0 3200 NaN 20 研发部 成都\n",
"4 丘处机 程序员 2056.0 3400 NaN 20 研发部 成都\n",
"5 欧阳锋 程序员 3088.0 3200 NaN 20 研发部 成都\n",
"6 张翠山 程序员 2056.0 4000 NaN 20 研发部 成都\n",
"7 黄蓉 销售主管 7800.0 3000 800.0 30 销售部 重庆\n",
"8 杨过 会计 5566.0 2200 NaN 10 会计部 北京\n",
"9 朱九真 会计 5566.0 2500 NaN 10 会计部 北京\n",
"10 苗人凤 销售员 3344.0 2500 NaN 30 销售部 重庆\n",
"11 郭靖 出纳 5566.0 2000 NaN 10 会计部 北京\n",
"12 宋远桥 会计师 7800.0 4000 1000.0 10 会计部 北京\n",
"13 张三丰 总裁 NaN 9000 1200.0 20 研发部 成都\n",
"14 张三丰 总裁 NaN 50000 8000.0 20 研发部 成都\n",
"15 王大锤 程序员 9800.0 8000 600.0 20 研发部 成都\n",
"16 张三丰 总裁 NaN 60000 6000.0 20 研发部 成都\n",
"17 骆昊 架构师 7800.0 30000 5000.0 20 研发部 成都\n",
"18 陈小刀 分析师 9800.0 10000 1200.0 20 研发部 成都"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
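],
"source": [
"# Join tables: join the fact table with the dimension table, so the data can be grouped by a dimension and then aggregated\n",
"# Join two DataFrames (inner join, left outer join, right outer join, full outer join)\n",
"# how: the join type, e.g. inner, left, right, outer\n",
"# on: the column to join on; use left_on / right_on when the column names differ between the two tables\n",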
"all_emp_df = pd.merge(all_emp_df, dept_df, how='inner', on='dno')\n",
"all_emp_df"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "c6a3d52d-a04c-494d-9ee9-2dad9805b1c1",
"metadata": {},
"outputs": [],
"source": [
"# Exercise: the jobs directory contains several CSV files that share the same structure; concatenate the data from all of them into a single DataFrame\n",
"import os\n",
"\n",
"dfs = [pd.read_csv(os.path.join('res/jobs', filename))\n",
" for filename in os.listdir('res/jobs') \n",
" if filename.endswith('.csv')]\n",
"pd.concat(dfs, ignore_index=True).to_csv('res/all_jobs.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "6b9ad1e1-fe5d-45a0-8755-ac6720a32ba0",
"metadata": {},
"source": [
"### Data cleaning\n",
"\n",
"1. Missing values\n",
"2. Duplicate values\n",
"3. Outliers\n",
"4. Preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "45c835c4-559f-45f1-a501-70a8c12bbbb1",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 False False False False False False False False\n",
"1 False False False False False False False False\n",
"2 False False False False False False False False\n",
"3 False False False False True False False False\n",
"4 False False False False True False False False\n",
"5 False False False False True False False False\n",
"6 False False False False True False False False\n",
"7 False False False False False False False False\n",
"8 False False False False True False False False\n",
"9 False False False False True False False False\n",
"10 False False False False True False False False\n",
"11 False False False False True False False False\n",
"12 False False False False False False False False\n",
"13 False False True False False False False False\n",
"14 False False True False False False False False\n",
"15 False False False False False False False False\n",
"16 False False True False False False False False\n",
"17 False False False False False False False False\n",
"18 False False False False False False False False"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Identify missing values\n",
"all_emp_df.isna()"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "fd7fbdf8-ebf2-463b-ac3b-cdb24560873a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 True\n",
"4 True\n",
"5 True\n",
"6 True\n",
"7 False\n",
"8 True\n",
"9 True\n",
"10 True\n",
"11 True\n",
"12 False\n",
"13 False\n",
"14 False\n",
"15 False\n",
"16 False\n",
"17 False\n",
"18 False\n",
"Name: comm, dtype: bool"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# isnull() is an alias of isna(), so all_emp_df['comm'].isna() gives the same result\n",
"all_emp_df['comm'].isnull()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "a4f16d30-83e9-4761-92a1-780e85e721e1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 True\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 True\n",
"13 True\n",
"14 True\n",
"15 True\n",
"16 True\n",
"17 True\n",
"18 True\n",
"Name: comm, dtype: bool"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# notnull() is an alias of notna(), so all_emp_df['comm'].notna() gives the same result\n",
"all_emp_df['comm'].notnull()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "9f2a153d-ab4a-475e-9ee3-0d623a289f7f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"comm\n",
"True 11\n",
"False 8\n",
"Name: count, dtype: int64"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df['comm'].notna().value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 65,
"id": "5d388d57-fa1a-405b-880e-9316354a6f05",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>1800</td>\n",
" <td>200.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800.0</td>\n",
" <td>5000</td>\n",
" <td>1500.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056.0</td>\n",
" <td>3500</td>\n",
" <td>800.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800.0</td>\n",
" <td>3000</td>\n",
" <td>800.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800.0</td>\n",
" <td>4000</td>\n",
" <td>1000.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800.0</td>\n",
" <td>8000</td>\n",
" <td>600.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800.0</td>\n",
" <td>30000</td>\n",
" <td>5000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800.0</td>\n",
" <td>10000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344.0 1800 200.0 30 销售部 重庆\n",
"1 乔峰 分析师 7800.0 5000 1500.0 20 研发部 成都\n",
"2 李莫愁 设计师 2056.0 3500 800.0 20 研发部 成都\n",
"7 黄蓉 销售主管 7800.0 3000 800.0 30 销售部 重庆\n",
"12 宋远桥 会计师 7800.0 4000 1000.0 10 会计部 北京\n",
"15 王大锤 程序员 9800.0 8000 600.0 20 研发部 成都\n",
"17 骆昊 架构师 7800.0 30000 5000.0 20 研发部 成都\n",
"18 陈小刀 分析师 9800.0 10000 1200.0 20 研发部 成都"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop missing values: remove every row that contains at least one missing value\n",
"all_emp_df.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"id": "b40fa037-3fab-454e-a300-2e9dcf4b2b60",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>sal</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>1800</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>5000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>3500</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>3200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>3400</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>4000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>3000</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>2200</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>2500</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>2500</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>2000</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>4000</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>9000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>50000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>8000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>60000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>30000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>10000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job sal dno dname dloc\n",
"0 胡一刀 销售员 1800 30 销售部 重庆\n",
"1 乔峰 分析师 5000 20 研发部 成都\n",
"2 李莫愁 设计师 3500 20 研发部 成都\n",
"3 张无忌 程序员 3200 20 研发部 成都\n",
"4 丘处机 程序员 3400 20 研发部 成都\n",
"5 欧阳锋 程序员 3200 20 研发部 成都\n",
"6 张翠山 程序员 4000 20 研发部 成都\n",
"7 黄蓉 销售主管 3000 30 销售部 重庆\n",
"8 杨过 会计 2200 10 会计部 北京\n",
"9 朱九真 会计 2500 10 会计部 北京\n",
"10 苗人凤 销售员 2500 30 销售部 重庆\n",
"11 郭靖 出纳 2000 10 会计部 北京\n",
"12 宋远桥 会计师 4000 10 会计部 北京\n",
"13 张三丰 总裁 9000 20 研发部 成都\n",
"14 张三丰 总裁 50000 20 研发部 成都\n",
"15 王大锤 程序员 8000 20 研发部 成都\n",
"16 张三丰 总裁 60000 20 研发部 成都\n",
"17 骆昊 架构师 30000 20 研发部 成都\n",
"18 陈小刀 分析师 10000 20 研发部 成都"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# With axis=1, drop every column that contains at least one missing value\n",
"all_emp_df.dropna(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"id": "67ae21a1-7dc1-496b-85b5-013d79d25a63",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 3344.0\n",
"1 7800.0\n",
"2 2056.0\n",
"3 2056.0\n",
"4 2056.0\n",
"5 3088.0\n",
"6 2056.0\n",
"7 7800.0\n",
"8 5566.0\n",
"9 5566.0\n",
"10 3344.0\n",
"11 5566.0\n",
"12 7800.0\n",
"15 9800.0\n",
"17 7800.0\n",
"18 9800.0\n",
"Name: mgr, dtype: float64"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df.mgr.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"id": "66745379-9db7-42b0-ab6b-a55e870a515b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>1800</td>\n",
" <td>200.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800.0</td>\n",
" <td>5000</td>\n",
" <td>1500.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056.0</td>\n",
" <td>3500</td>\n",
" <td>800.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3200</td>\n",
" <td>0.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3400</td>\n",
" <td>0.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088.0</td>\n",
" <td>3200</td>\n",
" <td>0.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>4000</td>\n",
" <td>0.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800.0</td>\n",
" <td>3000</td>\n",
" <td>800.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2200</td>\n",
" <td>0.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2500</td>\n",
" <td>0.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>2500</td>\n",
" <td>0.0</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566.0</td>\n",
" <td>2000</td>\n",
" <td>0.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800.0</td>\n",
" <td>4000</td>\n",
" <td>1000.0</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>0.0</td>\n",
" <td>9000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>0.0</td>\n",
" <td>50000</td>\n",
" <td>8000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800.0</td>\n",
" <td>8000</td>\n",
" <td>600.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>0.0</td>\n",
" <td>60000</td>\n",
" <td>6000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800.0</td>\n",
" <td>30000</td>\n",
" <td>5000.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800.0</td>\n",
" <td>10000</td>\n",
" <td>1200.0</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344.0 1800 200.0 30 销售部 重庆\n",
"1 乔峰 分析师 7800.0 5000 1500.0 20 研发部 成都\n",
"2 李莫愁 设计师 2056.0 3500 800.0 20 研发部 成都\n",
"3 张无忌 程序员 2056.0 3200 0.0 20 研发部 成都\n",
"4 丘处机 程序员 2056.0 3400 0.0 20 研发部 成都\n",
"5 欧阳锋 程序员 3088.0 3200 0.0 20 研发部 成都\n",
"6 张翠山 程序员 2056.0 4000 0.0 20 研发部 成都\n",
"7 黄蓉 销售主管 7800.0 3000 800.0 30 销售部 重庆\n",
"8 杨过 会计 5566.0 2200 0.0 10 会计部 北京\n",
"9 朱九真 会计 5566.0 2500 0.0 10 会计部 北京\n",
"10 苗人凤 销售员 3344.0 2500 0.0 30 销售部 重庆\n",
"11 郭靖 出纳 5566.0 2000 0.0 10 会计部 北京\n",
"12 宋远桥 会计师 7800.0 4000 1000.0 10 会计部 北京\n",
"13 张三丰 总裁 0.0 9000 1200.0 20 研发部 成都\n",
"14 张三丰 总裁 0.0 50000 8000.0 20 研发部 成都\n",
"15 王大锤 程序员 9800.0 8000 600.0 20 研发部 成都\n",
"16 张三丰 总裁 0.0 60000 6000.0 20 研发部 成都\n",
"17 骆昊 架构师 7800.0 30000 5000.0 20 研发部 成都\n",
"18 陈小刀 分析师 9800.0 10000 1200.0 20 研发部 成都"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fill missing values with a constant\n",
"all_emp_df.fillna(0)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"id": "e1664115-30e0-4946-ae4b-c919bb319ddc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 200\n",
"1 1500\n",
"2 800\n",
"3 0\n",
"4 0\n",
"5 0\n",
"6 0\n",
"7 800\n",
"8 0\n",
"9 0\n",
"10 0\n",
"11 0\n",
"12 1000\n",
"13 1200\n",
"14 8000\n",
"15 600\n",
"16 6000\n",
"17 5000\n",
"18 1200\n",
"Name: comm, dtype: int64"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df.comm.fillna(0).astype('i8')"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "e1743531-66a2-42a4-8c28-ad268efc848c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 200.0\n",
"1 1500.0\n",
"2 800.0\n",
"3 800.0\n",
"4 800.0\n",
"5 800.0\n",
"6 800.0\n",
"7 800.0\n",
"8 1000.0\n",
"9 1000.0\n",
"10 1000.0\n",
"11 1000.0\n",
"12 1000.0\n",
"13 1200.0\n",
"14 8000.0\n",
"15 600.0\n",
"16 6000.0\n",
"17 5000.0\n",
"18 1200.0\n",
"Name: comm, dtype: float64"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Backward fill: fill each missing value with the nearest non-missing value below it\n",
"all_emp_df.comm.bfill()"
]
},
{
"cell_type": "code",
"execution_count": 71,
"id": "5fcef0a0-ff29-42bd-9955-5a97595390fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 200.0\n",
"1 1500.0\n",
"2 800.0\n",
"3 800.0\n",
"4 800.0\n",
"5 800.0\n",
"6 800.0\n",
"7 800.0\n",
"8 800.0\n",
"9 800.0\n",
"10 800.0\n",
"11 800.0\n",
"12 1000.0\n",
"13 1200.0\n",
"14 8000.0\n",
"15 600.0\n",
"16 6000.0\n",
"17 5000.0\n",
"18 1200.0\n",
"Name: comm, dtype: float64"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Forward fill: fill each missing value with the nearest non-missing value above it\n",
"all_emp_df.comm.ffill()"
]
},
{
"cell_type": "code",
"execution_count": 72,
"id": "eeeb9be3-802c-44e3-80a0-465aba1a485a",
"metadata": {},
"outputs": [],
"source": [
"# Fill missing values by interpolation (linear interpolation here)\n",
"all_emp_df['comm'] = all_emp_df.comm.interpolate(method='linear')"
]
},
{
"cell_type": "code",
"execution_count": 73,
"id": "f1f094c3-1cc2-4826-a04a-24150ea9cef8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>1800</td>\n",
" <td>200</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800.0</td>\n",
" <td>5000</td>\n",
" <td>1500</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056.0</td>\n",
" <td>3500</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>3400</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088.0</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056.0</td>\n",
" <td>4000</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800.0</td>\n",
" <td>3000</td>\n",
" <td>800</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2200</td>\n",
" <td>840</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566.0</td>\n",
" <td>2500</td>\n",
" <td>880</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344.0</td>\n",
" <td>2500</td>\n",
" <td>920</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566.0</td>\n",
" <td>2000</td>\n",
" <td>960</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800.0</td>\n",
" <td>4000</td>\n",
" <td>1000</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>9000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>50000</td>\n",
" <td>8000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800.0</td>\n",
" <td>8000</td>\n",
" <td>600</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>NaN</td>\n",
" <td>60000</td>\n",
" <td>6000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800.0</td>\n",
" <td>30000</td>\n",
" <td>5000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800.0</td>\n",
" <td>10000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344.0 1800 200 30 销售部 重庆\n",
"1 乔峰 分析师 7800.0 5000 1500 20 研发部 成都\n",
"2 李莫愁 设计师 2056.0 3500 800 20 研发部 成都\n",
"3 张无忌 程序员 2056.0 3200 800 20 研发部 成都\n",
"4 丘处机 程序员 2056.0 3400 800 20 研发部 成都\n",
"5 欧阳锋 程序员 3088.0 3200 800 20 研发部 成都\n",
"6 张翠山 程序员 2056.0 4000 800 20 研发部 成都\n",
"7 黄蓉 销售主管 7800.0 3000 800 30 销售部 重庆\n",
"8 杨过 会计 5566.0 2200 840 10 会计部 北京\n",
"9 朱九真 会计 5566.0 2500 880 10 会计部 北京\n",
"10 苗人凤 销售员 3344.0 2500 920 30 销售部 重庆\n",
"11 郭靖 出纳 5566.0 2000 960 10 会计部 北京\n",
"12 宋远桥 会计师 7800.0 4000 1000 10 会计部 北京\n",
"13 张三丰 总裁 NaN 9000 1200 20 研发部 成都\n",
"14 张三丰 总裁 NaN 50000 8000 20 研发部 成都\n",
"15 王大锤 程序员 9800.0 8000 600 20 研发部 成都\n",
"16 张三丰 总裁 NaN 60000 6000 20 研发部 成都\n",
"17 骆昊 架构师 7800.0 30000 5000 20 研发部 成都\n",
"18 陈小刀 分析师 9800.0 10000 1200 20 研发部 成都"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# comm has no missing values after interpolation, so it can be converted to 64-bit integers\n",
"all_emp_df['comm'] = all_emp_df.comm.astype('i8')\n",
"all_emp_df"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "a739242d-ebd2-42d2-9ec7-9a5939cbf74a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344</td>\n",
" <td>1800</td>\n",
" <td>200</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800</td>\n",
" <td>5000</td>\n",
" <td>1500</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056</td>\n",
" <td>3500</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>3400</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>4000</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800</td>\n",
" <td>3000</td>\n",
" <td>800</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566</td>\n",
" <td>2200</td>\n",
" <td>840</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566</td>\n",
" <td>2500</td>\n",
" <td>880</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344</td>\n",
" <td>2500</td>\n",
" <td>920</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566</td>\n",
" <td>2000</td>\n",
" <td>960</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800</td>\n",
" <td>4000</td>\n",
" <td>1000</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>-1</td>\n",
" <td>9000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>-1</td>\n",
" <td>50000</td>\n",
" <td>8000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800</td>\n",
" <td>8000</td>\n",
" <td>600</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>-1</td>\n",
" <td>60000</td>\n",
" <td>6000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800</td>\n",
" <td>30000</td>\n",
" <td>5000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800</td>\n",
" <td>10000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344 1800 200 30 销售部 重庆\n",
"1 乔峰 分析师 7800 5000 1500 20 研发部 成都\n",
"2 李莫愁 设计师 2056 3500 800 20 研发部 成都\n",
"3 张无忌 程序员 2056 3200 800 20 研发部 成都\n",
"4 丘处机 程序员 2056 3400 800 20 研发部 成都\n",
"5 欧阳锋 程序员 3088 3200 800 20 研发部 成都\n",
"6 张翠山 程序员 2056 4000 800 20 研发部 成都\n",
"7 黄蓉 销售主管 7800 3000 800 30 销售部 重庆\n",
"8 杨过 会计 5566 2200 840 10 会计部 北京\n",
"9 朱九真 会计 5566 2500 880 10 会计部 北京\n",
"10 苗人凤 销售员 3344 2500 920 30 销售部 重庆\n",
"11 郭靖 出纳 5566 2000 960 10 会计部 北京\n",
"12 宋远桥 会计师 7800 4000 1000 10 会计部 北京\n",
"13 张三丰 总裁 -1 9000 1200 20 研发部 成都\n",
"14 张三丰 总裁 -1 50000 8000 20 研发部 成都\n",
"15 王大锤 程序员 9800 8000 600 20 研发部 成都\n",
"16 张三丰 总裁 -1 60000 6000 20 研发部 成都\n",
"17 骆昊 架构师 7800 30000 5000 20 研发部 成都\n",
"18 陈小刀 分析师 9800 10000 1200 20 研发部 成都"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df['mgr'] = all_emp_df.mgr.fillna(-1).astype('i8')\n",
"all_emp_df"
]
},
{
"cell_type": "code",
"execution_count": 75,
"id": "cd376d13-2245-48b8-ba14-3315d4c48f9c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 False\n",
"14 True\n",
"15 False\n",
"16 True\n",
"17 False\n",
"18 False\n",
"Name: ename, dtype: bool"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Identify duplicate values\n",
"all_emp_df.ename.duplicated()"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "6e107a38-c5e8-4e5e-9e42-71481c54e0d1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 False\n",
"14 True\n",
"15 False\n",
"16 True\n",
"17 False\n",
"18 False\n",
"dtype: bool"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df.duplicated(['ename', 'job'])"
]
},
{
"cell_type": "code",
"execution_count": 77,
"id": "097eaaf2-1112-4e0f-b361-786bf91d6c1f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ename\n",
"张三丰 3\n",
"胡一刀 1\n",
"朱九真 1\n",
"骆昊 1\n",
"王大锤 1\n",
"宋远桥 1\n",
"郭靖 1\n",
"苗人凤 1\n",
"杨过 1\n",
"乔峰 1\n",
"黄蓉 1\n",
"张翠山 1\n",
"欧阳锋 1\n",
"丘处机 1\n",
"张无忌 1\n",
"李莫愁 1\n",
"陈小刀 1\n",
"Name: count, dtype: int64"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count how many times each value occurs\n",
"all_emp_df.ename.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 78,
"id": "6494bb56-7ac7-47df-a9f1-960b02586e31",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"job\n",
"程序员 5\n",
"总裁 3\n",
"销售员 2\n",
"分析师 2\n",
"会计 2\n",
"设计师 1\n",
"销售主管 1\n",
"出纳 1\n",
"会计师 1\n",
"架构师 1\n",
"Name: count, dtype: int64"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_emp_df.job.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "172e4d9a-63bd-44ca-98ea-e4614c8823ab",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"17"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count the number of distinct values\n",
"all_emp_df.ename.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "d6fa062c-d338-407f-8647-e84878a5642e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ename</th>\n",
" <th>job</th>\n",
" <th>mgr</th>\n",
" <th>sal</th>\n",
" <th>comm</th>\n",
" <th>dno</th>\n",
" <th>dname</th>\n",
" <th>dloc</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>胡一刀</td>\n",
" <td>销售员</td>\n",
" <td>3344</td>\n",
" <td>1800</td>\n",
" <td>200</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>乔峰</td>\n",
" <td>分析师</td>\n",
" <td>7800</td>\n",
" <td>5000</td>\n",
" <td>1500</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>李莫愁</td>\n",
" <td>设计师</td>\n",
" <td>2056</td>\n",
" <td>3500</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>张无忌</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>丘处机</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>3400</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>欧阳锋</td>\n",
" <td>程序员</td>\n",
" <td>3088</td>\n",
" <td>3200</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>张翠山</td>\n",
" <td>程序员</td>\n",
" <td>2056</td>\n",
" <td>4000</td>\n",
" <td>800</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>黄蓉</td>\n",
" <td>销售主管</td>\n",
" <td>7800</td>\n",
" <td>3000</td>\n",
" <td>800</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>杨过</td>\n",
" <td>会计</td>\n",
" <td>5566</td>\n",
" <td>2200</td>\n",
" <td>840</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>朱九真</td>\n",
" <td>会计</td>\n",
" <td>5566</td>\n",
" <td>2500</td>\n",
" <td>880</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>苗人凤</td>\n",
" <td>销售员</td>\n",
" <td>3344</td>\n",
" <td>2500</td>\n",
" <td>920</td>\n",
" <td>30</td>\n",
" <td>销售部</td>\n",
" <td>重庆</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>郭靖</td>\n",
" <td>出纳</td>\n",
" <td>5566</td>\n",
" <td>2000</td>\n",
" <td>960</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>宋远桥</td>\n",
" <td>会计师</td>\n",
" <td>7800</td>\n",
" <td>4000</td>\n",
" <td>1000</td>\n",
" <td>10</td>\n",
" <td>会计部</td>\n",
" <td>北京</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>王大锤</td>\n",
" <td>程序员</td>\n",
" <td>9800</td>\n",
" <td>8000</td>\n",
" <td>600</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>张三丰</td>\n",
" <td>总裁</td>\n",
" <td>-1</td>\n",
" <td>60000</td>\n",
" <td>6000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>骆昊</td>\n",
" <td>架构师</td>\n",
" <td>7800</td>\n",
" <td>30000</td>\n",
" <td>5000</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>陈小刀</td>\n",
" <td>分析师</td>\n",
" <td>9800</td>\n",
" <td>10000</td>\n",
" <td>1200</td>\n",
" <td>20</td>\n",
" <td>研发部</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ename job mgr sal comm dno dname dloc\n",
"0 胡一刀 销售员 3344 1800 200 30 销售部 重庆\n",
"1 乔峰 分析师 7800 5000 1500 20 研发部 成都\n",
"2 李莫愁 设计师 2056 3500 800 20 研发部 成都\n",
"3 张无忌 程序员 2056 3200 800 20 研发部 成都\n",
"4 丘处机 程序员 2056 3400 800 20 研发部 成都\n",
"5 欧阳锋 程序员 3088 3200 800 20 研发部 成都\n",
"6 张翠山 程序员 2056 4000 800 20 研发部 成都\n",
"7 黄蓉 销售主管 7800 3000 800 30 销售部 重庆\n",
"8 杨过 会计 5566 2200 840 10 会计部 北京\n",
"9 朱九真 会计 5566 2500 880 10 会计部 北京\n",
"10 苗人凤 销售员 3344 2500 920 30 销售部 重庆\n",
"11 郭靖 出纳 5566 2000 960 10 会计部 北京\n",
"12 宋远桥 会计师 7800 4000 1000 10 会计部 北京\n",
"15 王大锤 程序员 9800 8000 600 20 研发部 成都\n",
"16 张三丰 总裁 -1 60000 6000 20 研发部 成都\n",
"17 骆昊 架构师 7800 30000 5000 20 研发部 成都\n",
"18 陈小刀 分析师 9800 10000 1200 20 研发部 成都"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop duplicate rows\n",
"# keep='first' (the default) keeps the first occurrence; 'last' keeps the last one; False drops every duplicated row\n",
"all_emp_df.drop_duplicates(['ename', 'job'], keep='last', inplace=True)\n",
"all_emp_df"
]
},
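{
"cell_type": "markdown",
"id": "sketch-keep-param-md",
"metadata": {},
"source": [
"A minimal sketch of how the `keep` parameter changes the result of `drop_duplicates`. The frame `demo_df` and its values are hypothetical, made up for illustration only; they are not part of the employee data above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-keep-param-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: 'A' appears three times, 'B' once\n",
"demo_df = pd.DataFrame({'name': ['A', 'B', 'A', 'A'], 'value': [1, 2, 3, 4]})\n",
"\n",
"# Keep the first 'A' (row 0) and drop the later ones\n",
"kept_first = demo_df.drop_duplicates('name', keep='first')\n",
"# Keep the last 'A' (row 3) and drop the earlier ones\n",
"kept_last = demo_df.drop_duplicates('name', keep='last')\n",
"# Drop every row whose name is duplicated, leaving only 'B'\n",
"kept_none = demo_df.drop_duplicates('name', keep=False)\n",
"\n",
"kept_first, kept_last, kept_none"
]
},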
{
"cell_type": "code",
"execution_count": 81,
"id": "832a2ea2-6941-4364-b143-af7db9ff9701",
"metadata": {},
"outputs": [],
"source": [
"# Identify outliers\n",
"# IQR rule: a value is an outlier if data < Q1 - whis * IQR or data > Q3 + whis * IQR (whis defaults to 1.5)\n",
"\n",
"\n",
"def find_outliers_by_iqr(data, whis=1.5):\n",
"    q1, q3 = np.quantile(data, [0.25, 0.75])\n",
"    iqr = q3 - q1\n",
"    return data[(data < q1 - whis * iqr) | (data > q3 + whis * iqr)]"
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "1cd5d6aa-c60e-483e-995c-a627a0dfec15",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 83., 81., 89., 89., 76., 79., 78., 76., 79., 74., 89.,\n",
" 61., 90., 74., 68., 81., 81., 93., 69., 81., 76., 87.,\n",
" 80., 90., 72., 89., 72., 71., 93., 75., 75., 73., 85.,\n",
" 91., 96., 82., 74., 80., 72., 83., 72., 64., 83., 79.,\n",
" 78., 68., 68., 70., 68., 84., 120., 160., 200., 40., 20.,\n",
" -50.])"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp = np.random.normal(80, 8, 50).round(0)\n",
"temp = np.append(temp, [120, 160, 200, 40, 20, -50])\n",
"temp"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "2121dab4-0efc-4fcd-a5fe-67585552cb53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([120., 160., 200., 40., 20., -50.])"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_iqr(temp)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "da048825-3f88-4009-9db5-159e8e883b10",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([160., 200., 20., -50.])"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_iqr(temp, whis=3)"
]
},
{
"cell_type": "code",
"execution_count": 85,
"id": "0da7034b-2350-43ff-a6eb-9e7f4361bdee",
"metadata": {},
"outputs": [],
"source": [
"# z-score rule (the three-sigma rule, also known as the 68-95-99.7 rule)\n",
"\n",
"\n",
"def find_outliers_by_zscore(data, mul=3):\n",
"    mu, sigma = np.mean(data), np.std(data)\n",
"    zscore = (data - mu) / sigma\n",
"    return data[np.abs(zscore) > mul]"
]
},
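{
"cell_type": "markdown",
"id": "sketch-three-sigma-md",
"metadata": {},
"source": [
"The 68-95-99.7 rule behind the z-score threshold can be checked empirically. The sketch below is an illustration under assumed settings (a seeded generator, a standard normal sample of 100,000 points; the names `rng`, `sample` and `coverage` are hypothetical). It computes the fraction of points lying within 1, 2 and 3 standard deviations of the mean, which should come out close to 0.68, 0.95 and 0.997 for normally distributed data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-three-sigma-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Assumed setup: a seeded generator and a large standard normal sample\n",
"rng = np.random.default_rng(0)\n",
"sample = rng.normal(loc=0.0, scale=1.0, size=100_000)\n",
"zscores = (sample - sample.mean()) / sample.std()\n",
"\n",
"# Fraction of points within k standard deviations of the mean, for k = 1, 2, 3\n",
"coverage = {k: float(np.mean(np.abs(zscores) <= k)) for k in (1, 2, 3)}\n",
"coverage"
]
},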
{
"cell_type": "code",
"execution_count": 86,
"id": "e88616c0-a4d8-4fd8-9ec2-e761cb5ba056",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([200., -50.])"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_zscore(temp)"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "c902031c-2f78-4721-9734-5c5b0ca81650",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([160., 200., 20., -50.])"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_zscore(temp, mul=2)"
]
},
{
"cell_type": "code",
"execution_count": 88,
"id": "1e295014-d582-4e78-b5b9-6d9f0463ff8d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"订单号\n",
"G69924 23688\n",
"G70509 31935\n",
"G72204 26758\n",
"G70509 31594\n",
"G72186 30583\n",
"G70509 52302\n",
"G69631 32125\n",
"543369-010 29843\n",
"543367-077 31889\n",
"G69627 31028\n",
"G69645 23947\n",
"G72201 40327\n",
"G69631 26534\n",
"543367-077 25674\n",
"G71332 35120\n",
"588705-010 25502\n",
"543367-077 31375\n",
"G68188 29819\n",
"577714-010 40884\n",
"FT001-18-1763 25835\n",
"G72186 24770\n",
"G71330 29795\n",
"577714-010 27244\n",
"FT007-18-1763 25454\n",
"G69627 24634\n",
"G69627 23537\n",
"G72204 31613\n",
"543367-077 45442\n",
"G85411 22861\n",
"G68188 34290\n",
"AYMH063-1 29307\n",
"G69645 37782\n",
"Name: 直接成本, dtype: int64"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_zscore(df6.直接成本)"
]
},
{
"cell_type": "code",
"execution_count": 89,
"id": "97b98c82-fd09-42a9-8a75-a3e71ae10fbc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>销售日期</th>\n",
" <th>区域</th>\n",
" <th>渠道</th>\n",
" <th>品牌</th>\n",
" <th>售价</th>\n",
" <th>销售数量</th>\n",
" <th>直接成本</th>\n",
" <th>销售额</th>\n",
" <th>月份</th>\n",
" </tr>\n",
" <tr>\n",
" <th>订单号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>205654-519</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>八匹马</td>\n",
" <td>169</td>\n",
" <td>14</td>\n",
" <td>485</td>\n",
" <td>2366</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>377781-010</th>\n",
" <td>2020-01-01</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>249</td>\n",
" <td>61</td>\n",
" <td>2452</td>\n",
" <td>15189</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588685-002</th>\n",
" <td>2020-01-02</td>\n",
" <td>上海</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>299</td>\n",
" <td>91</td>\n",
" <td>8008</td>\n",
" <td>27209</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLH641-1</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>壁虎</td>\n",
" <td>239</td>\n",
" <td>82</td>\n",
" <td>4127</td>\n",
" <td>19598</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AKLJ013-4</th>\n",
" <td>2020-01-03</td>\n",
" <td>上海</td>\n",
" <td>天猫</td>\n",
" <td>壁虎</td>\n",
" <td>219</td>\n",
" <td>57</td>\n",
" <td>2315</td>\n",
" <td>12483</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>588682-010</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>拼多多</td>\n",
" <td>皮皮虾</td>\n",
" <td>269</td>\n",
" <td>50</td>\n",
" <td>4388</td>\n",
" <td>13450</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>599007-513</th>\n",
" <td>2020-12-29</td>\n",
" <td>北京</td>\n",
" <td>天猫</td>\n",
" <td>皮皮虾</td>\n",
" <td>349</td>\n",
" <td>18</td>\n",
" <td>2466</td>\n",
" <td>6282</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>D89677</th>\n",
" <td>2020-12-30</td>\n",
" <td>北京</td>\n",
" <td>京东</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>26</td>\n",
" <td>1560</td>\n",
" <td>6994</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182719-050</th>\n",
" <td>2020-12-30</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>八匹马</td>\n",
" <td>79</td>\n",
" <td>97</td>\n",
" <td>3028</td>\n",
" <td>7663</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>G70083</th>\n",
" <td>2020-12-31</td>\n",
" <td>福建</td>\n",
" <td>实体</td>\n",
" <td>花花姑娘</td>\n",
" <td>269</td>\n",
" <td>55</td>\n",
" <td>2277</td>\n",
" <td>14795</td>\n",
" <td>12</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1711 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 销售日期 区域 渠道 品牌 售价 销售数量 直接成本 销售额 月份\n",
"订单号 \n",
"205654-519 2020-01-01 上海 天猫 八匹马 169 14 485 2366 1\n",
"377781-010 2020-01-01 上海 天猫 皮皮虾 249 61 2452 15189 1\n",
"588685-002 2020-01-02 上海 拼多多 皮皮虾 299 91 8008 27209 1\n",
"AKLH641-1 2020-01-03 上海 天猫 壁虎 239 82 4127 19598 1\n",
"AKLJ013-4 2020-01-03 上海 天猫 壁虎 219 57 2315 12483 1\n",
"... ... .. ... ... ... ... ... ... ..\n",
"588682-010 2020-12-29 北京 拼多多 皮皮虾 269 50 4388 13450 12\n",
"599007-513 2020-12-29 北京 天猫 皮皮虾 349 18 2466 6282 12\n",
"D89677 2020-12-30 北京 京东 花花姑娘 269 26 1560 6994 12\n",
"182719-050 2020-12-30 福建 实体 八匹马 79 97 3028 7663 12\n",
"G70083 2020-12-31 福建 实体 花花姑娘 269 55 2277 14795 12\n",
"\n",
"[1711 rows x 9 columns]"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
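],
"source": [
"# Drop rows by the row index of the outliers\n",
"# Note: drop() matches index labels, so if labels repeat, every row sharing an outlier's label is removed\n",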
"df6.drop(index=find_outliers_by_zscore(df6.直接成本).index)"
]
},
{
"cell_type": "code",
"execution_count": 90,
"id": "0053ed12-c09f-4331-a6dd-487ff990c680",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"79.0"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"med_value = np.median(temp)\n",
"med_value"
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "f02c2985-1b07-4b1c-b248-aa1de9e98451",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([160., 200., 20., -50.])"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"find_outliers_by_zscore(temp, mul=2)"
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "485adc15-f39d-419b-9869-2b366f5d88ec",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, False, False, False, False, False,\n",
" False, False, False, False, False, False, False, False, False,\n",
" False, False, False, False, False, False, False, False, False,\n",
" False, False, False, False, False, False, False, False, False,\n",
" False, False, False, False, False, False, False, False, False,\n",
" False, False, False, False, False, False, True, True, False,\n",
" True, True])"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.isin(temp, find_outliers_by_zscore(temp, mul=2))"
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "ce92f242-1f0f-476e-ae85-91e1615783ef",
"metadata": {},
"outputs": [],
"source": [
"# Replace the outliers with the median (np.place modifies temp in place)\n",
"np.place(temp, np.isin(temp, find_outliers_by_zscore(temp, mul=2)), med_value)"
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "10b0b0bc-f98c-40fe-890f-976df9d9c52b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 83., 81., 89., 89., 76., 79., 78., 76., 79., 74., 89.,\n",
" 61., 90., 74., 68., 81., 81., 93., 69., 81., 76., 87.,\n",
" 80., 90., 72., 89., 72., 71., 93., 75., 75., 73., 85.,\n",
" 91., 96., 82., 74., 80., 72., 83., 72., 64., 83., 79.,\n",
" 78., 68., 68., 70., 68., 84., 120., 79., 79., 40., 79.,\n",
" 79.])"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"temp"
]
},
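{
"cell_type": "markdown",
"id": "sketch-nonmutating-replace-md",
"metadata": {},
"source": [
"Because `np.place` changes the array in place, re-running the replacement cell operates on data that has already been cleaned. One possible alternative, sketched here on hypothetical readings (the names `readings`, `is_outlier` and `cleaned` are illustrative), builds a boolean mask from the z-scores and uses `np.where` to return a new array, leaving the original untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-nonmutating-replace-code",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical readings with one obvious outlier at the end\n",
"readings = np.array([80., 82., 79., 81., 78., 83., 80., 79., 81., 82., 500.])\n",
"\n",
"# Flag values whose absolute z-score exceeds 2\n",
"mu, sigma = readings.mean(), readings.std()\n",
"is_outlier = np.abs((readings - mu) / sigma) > 2\n",
"\n",
"# np.where returns a new array: the median where flagged, the original value elsewhere\n",
"cleaned = np.where(is_outlier, np.median(readings), readings)\n",
"cleaned"
]
},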
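{
"cell_type": "markdown",
"id": "d970e838-42f2-44d0-8f2d-07ebbf6de2b0",
"metadata": {},
"source": [
"#### Case study 1: Cleaning and preprocessing job-posting data\n",
"\n",
"1. Load the data\n",
"2. Remove duplicates\n",
"3. Extract data\n",
"4. Split columns\n",
"5. Replace values\n",
"6. Filter the data"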
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "1ec417a9-457f-434e-96a6-f4fd35d75987",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>company_name</th>\n",
" <th>uri</th>\n",
" <th>salary</th>\n",
" <th>site</th>\n",
" <th>year</th>\n",
" <th>edu</th>\n",
" <th>job_name</th>\n",
" <th>city</th>\n",
" <th>pos_count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>软通动力集团</td>\n",
" <td>https://www.zhipin.com/job_detail/7ece55fbcbf7...</td>\n",
" <td>10-15K</td>\n",
" <td>成都 武侯区 草金立交</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>chengdu</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>思湃德</td>\n",
" <td>https://www.zhipin.com/job_detail/760b2b05535c...</td>\n",
" <td>20-40K</td>\n",
" <td>成都 双流区 华阳</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>chengdu</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>源码时代</td>\n",
" <td>https://www.zhipin.com/job_detail/9575f02d9a9f...</td>\n",
" <td>15-20K</td>\n",
" <td>成都 武侯区 石羊</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>python 讲师</td>\n",
" <td>chengdu</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>三源合众</td>\n",
" <td>https://www.zhipin.com/job_detail/912b6da8b12f...</td>\n",
" <td>6-10K</td>\n",
" <td>成都 武侯区 新会展</td>\n",
" <td>1年以内</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>chengdu</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>软通动力</td>\n",
" <td>https://www.zhipin.com/job_detail/c61ef9b261da...</td>\n",
" <td>8-13K</td>\n",
" <td>成都 武侯区 机投</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>chengdu</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>中佳业</td>\n",
" <td>https://www.zhipin.com/job_detail/4fbb387a96ff...</td>\n",
" <td>7-8K·13薪</td>\n",
" <td>成都 武侯区 肖家河</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>C++/Python</td>\n",
" <td>chengdu</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>知行天下</td>\n",
" <td>https://www.zhipin.com/job_detail/17fe77fdd3b1...</td>\n",
" <td>7-12K</td>\n",
" <td>成都 龙泉驿区 龙泉</td>\n",
" <td>1年以内</td>\n",
" <td>大专</td>\n",
" <td>Python讲师</td>\n",
" <td>chengdu</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>川大智胜</td>\n",
" <td>https://www.zhipin.com/job_detail/77186b69e915...</td>\n",
" <td>8-13K</td>\n",
" <td>成都 武侯区 保利花园</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>chengdu</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>电科荷福研究院</td>\n",
" <td>https://www.zhipin.com/job_detail/586d4207f3d7...</td>\n",
" <td>7-12K</td>\n",
" <td>成都 郫都区 高新西</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>python后端开发</td>\n",
" <td>chengdu</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>傲梦网络科技</td>\n",
" <td>https://www.zhipin.com/job_detail/dafa272932d0...</td>\n",
" <td>3-8K</td>\n",
" <td>成都 武侯区 高升桥</td>\n",
" <td>经验不限</td>\n",
" <td>本科</td>\n",
" <td>python线上试听课老师</td>\n",
" <td>chengdu</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" company_name uri salary \\\n",
"0 软通动力集团 https://www.zhipin.com/job_detail/7ece55fbcbf7... 10-15K \n",
"1 思湃德 https://www.zhipin.com/job_detail/760b2b05535c... 20-40K \n",
"2 源码时代 https://www.zhipin.com/job_detail/9575f02d9a9f... 15-20K \n",
"3 三源合众 https://www.zhipin.com/job_detail/912b6da8b12f... 6-10K \n",
"4 软通动力 https://www.zhipin.com/job_detail/c61ef9b261da... 8-13K \n",
"5 中佳业 https://www.zhipin.com/job_detail/4fbb387a96ff... 7-8K·13薪 \n",
"6 知行天下 https://www.zhipin.com/job_detail/17fe77fdd3b1... 7-12K \n",
"7 川大智胜 https://www.zhipin.com/job_detail/77186b69e915... 8-13K \n",
"8 电科荷福研究院 https://www.zhipin.com/job_detail/586d4207f3d7... 7-12K \n",
"9 傲梦网络科技 https://www.zhipin.com/job_detail/dafa272932d0... 3-8K \n",
"\n",
" site year edu job_name city pos_count \n",
"0 成都 武侯区 草金立交 1-3年 本科 python开发 chengdu 2 \n",
"1 成都 双流区 华阳 3-5年 本科 Python chengdu 5 \n",
"2 成都 武侯区 石羊 3-5年 大专 python 讲师 chengdu 3 \n",
"3 成都 武侯区 新会展 1年以内 本科 Python chengdu 1 \n",
"4 成都 武侯区 机投 1-3年 本科 python开发 chengdu 3 \n",
"5 成都 武侯区 肖家河 3-5年 大专 C++/Python chengdu 4 \n",
"6 成都 龙泉驿区 龙泉 1年以内 大专 Python讲师 chengdu 1 \n",
"7 成都 武侯区 保利花园 3-5年 本科 Python chengdu 1 \n",
"8 成都 郫都区 高新西 3-5年 本科 python后端开发 chengdu 6 \n",
"9 成都 武侯区 高升桥 经验不限 本科 python线上试听课老师 chengdu 6 "
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"jobs_df = pd.read_csv('res/all_jobs.csv')\n",
"jobs_df.head(10)"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "74e0e4a5-3c03-4617-9661-8cfa03b88fd7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(9777, 9)"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Deduplicate on the uri column\n",
"jobs_df.drop_duplicates('uri', inplace=True)\n",
"jobs_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "6cca7b8b-25f1-46b8-9946-34ba90f42116",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>company_name</th>\n",
" <th>uri</th>\n",
" <th>salary</th>\n",
" <th>site</th>\n",
" <th>year</th>\n",
" <th>edu</th>\n",
" <th>job_name</th>\n",
" <th>city</th>\n",
" <th>pos_count</th>\n",
" <th>salary_lower</th>\n",
" <th>salary_upper</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>软通动力集团</td>\n",
" <td>https://www.zhipin.com/job_detail/7ece55fbcbf7...</td>\n",
" <td>12.5</td>\n",
" <td>成都 武侯区 草金立交</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>chengdu</td>\n",
" <td>2</td>\n",
" <td>10</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>思湃德</td>\n",
" <td>https://www.zhipin.com/job_detail/760b2b05535c...</td>\n",
" <td>30.0</td>\n",
" <td>成都 双流区 华阳</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>chengdu</td>\n",
" <td>5</td>\n",
" <td>20</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>源码时代</td>\n",
" <td>https://www.zhipin.com/job_detail/9575f02d9a9f...</td>\n",
" <td>17.5</td>\n",
" <td>成都 武侯区 石羊</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>python 讲师</td>\n",
" <td>chengdu</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>三源合众</td>\n",
" <td>https://www.zhipin.com/job_detail/912b6da8b12f...</td>\n",
" <td>8.0</td>\n",
" <td>成都 武侯区 新会展</td>\n",
" <td>1年以内</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>chengdu</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>软通动力</td>\n",
" <td>https://www.zhipin.com/job_detail/c61ef9b261da...</td>\n",
" <td>10.5</td>\n",
" <td>成都 武侯区 机投</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>chengdu</td>\n",
" <td>3</td>\n",
" <td>8</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9820</th>\n",
" <td>公众智能</td>\n",
" <td>https://www.zhipin.com/job_detail/7b9c08dbce81...</td>\n",
" <td>9.0</td>\n",
" <td>西安</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>xian</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9821</th>\n",
" <td>微感</td>\n",
" <td>https://www.zhipin.com/job_detail/c7e99005528f...</td>\n",
" <td>9.0</td>\n",
" <td>西安 雁塔区 紫薇田园都市</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>产品经理</td>\n",
" <td>xian</td>\n",
" <td>4</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9822</th>\n",
" <td>巴斯光年</td>\n",
" <td>https://www.zhipin.com/job_detail/1045fe64f248...</td>\n",
" <td>15.0</td>\n",
" <td>西安 雁塔区 大雁塔</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>xian</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9823</th>\n",
" <td>西大华特科技</td>\n",
" <td>https://www.zhipin.com/job_detail/e3c21cc748e7...</td>\n",
" <td>6.5</td>\n",
" <td>西安 雁塔区 唐延路</td>\n",
" <td>1-3年</td>\n",
" <td>硕士</td>\n",
" <td>产品经理(农药)</td>\n",
" <td>xian</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9824</th>\n",
" <td>西安纯粹科技</td>\n",
" <td>https://www.zhipin.com/job_detail/09965129db3e...</td>\n",
" <td>4.5</td>\n",
" <td>西安 雁塔区 玫瑰大楼</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>xian</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>9777 rows × 11 columns</p>\n",
"</div>"
],
"text/plain": [
" company_name uri salary \\\n",
"0 软通动力集团 https://www.zhipin.com/job_detail/7ece55fbcbf7... 12.5 \n",
"1 思湃德 https://www.zhipin.com/job_detail/760b2b05535c... 30.0 \n",
"2 源码时代 https://www.zhipin.com/job_detail/9575f02d9a9f... 17.5 \n",
"3 三源合众 https://www.zhipin.com/job_detail/912b6da8b12f... 8.0 \n",
"4 软通动力 https://www.zhipin.com/job_detail/c61ef9b261da... 10.5 \n",
"... ... ... ... \n",
"9820 公众智能 https://www.zhipin.com/job_detail/7b9c08dbce81... 9.0 \n",
"9821 微感 https://www.zhipin.com/job_detail/c7e99005528f... 9.0 \n",
"9822 巴斯光年 https://www.zhipin.com/job_detail/1045fe64f248... 15.0 \n",
"9823 西大华特科技 https://www.zhipin.com/job_detail/e3c21cc748e7... 6.5 \n",
"9824 西安纯粹科技 https://www.zhipin.com/job_detail/09965129db3e... 4.5 \n",
"\n",
" site year edu job_name city pos_count salary_lower \\\n",
"0 成都 武侯区 草金立交 1-3年 本科 python开发 chengdu 2 10 \n",
"1 成都 双流区 华阳 3-5年 本科 Python chengdu 5 20 \n",
"2 成都 武侯区 石羊 3-5年 大专 python 讲师 chengdu 3 15 \n",
"3 成都 武侯区 新会展 1年以内 本科 Python chengdu 1 6 \n",
"4 成都 武侯区 机投 1-3年 本科 python开发 chengdu 3 8 \n",
"... ... ... .. ... ... ... ... \n",
"9820 西安 3-5年 本科 产品经理 xian 2 8 \n",
"9821 西安 雁塔区 紫薇田园都市 3-5年 大专 产品经理 xian 4 8 \n",
"9822 西安 雁塔区 大雁塔 3-5年 本科 产品经理 xian 6 10 \n",
"9823 西安 雁塔区 唐延路 1-3年 硕士 产品经理(农药) xian 6 5 \n",
"9824 西安 雁塔区 玫瑰大楼 1-3年 本科 产品经理 xian 5 3 \n",
"\n",
" salary_upper \n",
"0 15 \n",
"1 40 \n",
"2 20 \n",
"3 10 \n",
"4 13 \n",
"... ... \n",
"9820 10 \n",
"9821 10 \n",
"9822 20 \n",
"9823 8 \n",
"9824 6 \n",
"\n",
"[9777 rows x 11 columns]"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract the lower and upper salary bounds from the column with a regular expression\n",
"jobs_df[['salary_lower', 'salary_upper']] = jobs_df.salary.str.extract(r'(\\d+)-(\\d+)').astype('i8')\n",
"jobs_df['salary'] = (jobs_df.salary_lower + jobs_df.salary_upper) / 2\n",
"jobs_df"
]
},
{
"cell_type": "code",
"execution_count": 98,
"id": "ffaea2af-09f6-4577-9c0d-024966d6854f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>company_name</th>\n",
" <th>salary</th>\n",
" <th>site</th>\n",
" <th>year</th>\n",
" <th>edu</th>\n",
" <th>job_name</th>\n",
" <th>pos_count</th>\n",
" <th>salary_lower</th>\n",
" <th>salary_upper</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>软通动力集团</td>\n",
" <td>12.5</td>\n",
" <td>成都 武侯区 草金立交</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>2</td>\n",
" <td>10</td>\n",
" <td>15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>思湃德</td>\n",
" <td>30.0</td>\n",
" <td>成都 双流区 华阳</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>5</td>\n",
" <td>20</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>源码时代</td>\n",
" <td>17.5</td>\n",
" <td>成都 武侯区 石羊</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>python 讲师</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>三源合众</td>\n",
" <td>8.0</td>\n",
" <td>成都 武侯区 新会展</td>\n",
" <td>1年以内</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>软通动力</td>\n",
" <td>10.5</td>\n",
" <td>成都 武侯区 机投</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>3</td>\n",
" <td>8</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9820</th>\n",
" <td>公众智能</td>\n",
" <td>9.0</td>\n",
" <td>西安</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9821</th>\n",
" <td>微感</td>\n",
" <td>9.0</td>\n",
" <td>西安 雁塔区 紫薇田园都市</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>产品经理</td>\n",
" <td>4</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9822</th>\n",
" <td>巴斯光年</td>\n",
" <td>15.0</td>\n",
" <td>西安 雁塔区 大雁塔</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" <td>20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9823</th>\n",
" <td>西大华特科技</td>\n",
" <td>6.5</td>\n",
" <td>西安 雁塔区 唐延路</td>\n",
" <td>1-3年</td>\n",
" <td>硕士</td>\n",
" <td>产品经理(农药)</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9824</th>\n",
" <td>西安纯粹科技</td>\n",
" <td>4.5</td>\n",
" <td>西安 雁塔区 玫瑰大楼</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>9777 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" company_name salary site year edu job_name pos_count \\\n",
"0 软通动力集团 12.5 成都 武侯区 草金立交 1-3年 本科 python开发 2 \n",
"1 思湃德 30.0 成都 双流区 华阳 3-5年 本科 Python 5 \n",
"2 源码时代 17.5 成都 武侯区 石羊 3-5年 大专 python 讲师 3 \n",
"3 三源合众 8.0 成都 武侯区 新会展 1年以内 本科 Python 1 \n",
"4 软通动力 10.5 成都 武侯区 机投 1-3年 本科 python开发 3 \n",
"... ... ... ... ... .. ... ... \n",
"9820 公众智能 9.0 西安 3-5年 本科 产品经理 2 \n",
"9821 微感 9.0 西安 雁塔区 紫薇田园都市 3-5年 大专 产品经理 4 \n",
"9822 巴斯光年 15.0 西安 雁塔区 大雁塔 3-5年 本科 产品经理 6 \n",
"9823 西大华特科技 6.5 西安 雁塔区 唐延路 1-3年 硕士 产品经理(农药) 6 \n",
"9824 西安纯粹科技 4.5 西安 雁塔区 玫瑰大楼 1-3年 本科 产品经理 5 \n",
"\n",
" salary_lower salary_upper \n",
"0 10 15 \n",
"1 20 40 \n",
"2 15 20 \n",
"3 6 10 \n",
"4 8 13 \n",
"... ... ... \n",
"9820 8 10 \n",
"9821 8 10 \n",
"9822 10 20 \n",
"9823 5 8 \n",
"9824 3 6 \n",
"\n",
"[9777 rows x 9 columns]"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"jobs_df.drop(columns=['uri', 'city'], inplace=True)\n",
"jobs_df"
]
},
{
"cell_type": "code",
"execution_count": 99,
"id": "d9ba5998-ca1d-44c8-87ca-363356074dd5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>company_name</th>\n",
" <th>salary</th>\n",
" <th>year</th>\n",
" <th>edu</th>\n",
" <th>job_name</th>\n",
" <th>pos_count</th>\n",
" <th>salary_lower</th>\n",
" <th>salary_upper</th>\n",
" <th>city</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>软通动力集团</td>\n",
" <td>12.5</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>2</td>\n",
" <td>10</td>\n",
" <td>15</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>思湃德</td>\n",
" <td>30.0</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>5</td>\n",
" <td>20</td>\n",
" <td>40</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>源码时代</td>\n",
" <td>17.5</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>python 讲师</td>\n",
" <td>3</td>\n",
" <td>15</td>\n",
" <td>20</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>三源合众</td>\n",
" <td>8.0</td>\n",
" <td>1年以内</td>\n",
" <td>本科</td>\n",
" <td>Python</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>软通动力</td>\n",
" <td>10.5</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>python开发</td>\n",
" <td>3</td>\n",
" <td>8</td>\n",
" <td>13</td>\n",
" <td>成都</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9820</th>\n",
" <td>公众智能</td>\n",
" <td>9.0</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" <td>西安</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9821</th>\n",
" <td>微感</td>\n",
" <td>9.0</td>\n",
" <td>3-5年</td>\n",
" <td>大专</td>\n",
" <td>产品经理</td>\n",
" <td>4</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" <td>西安</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9822</th>\n",
" <td>巴斯光年</td>\n",
" <td>15.0</td>\n",
" <td>3-5年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>6</td>\n",
" <td>10</td>\n",
" <td>20</td>\n",
" <td>西安</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9823</th>\n",
" <td>西大华特科技</td>\n",
" <td>6.5</td>\n",
" <td>1-3年</td>\n",
" <td>硕士</td>\n",
" <td>产品经理(农药)</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" <td>8</td>\n",
" <td>西安</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9824</th>\n",
" <td>西安纯粹科技</td>\n",
" <td>4.5</td>\n",
" <td>1-3年</td>\n",
" <td>本科</td>\n",
" <td>产品经理</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" <td>西安</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>9777 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" company_name salary year edu job_name pos_count salary_lower \\\n",
"0 软通动力集团 12.5 1-3年 本科 python开发 2 10 \n",
"1 思湃德 30.0 3-5年 本科 Python 5 20 \n",
"2 源码时代 17.5 3-5年 大专 python 讲师 3 15 \n",
"3 三源合众 8.0 1年以内 本科 Python 1 6 \n",
"4 软通动力 10.5 1-3年 本科 python开发 3 8 \n",
"... ... ... ... .. ... ... ... \n",
"9820 公众智能 9.0 3-5年 本科 产品经理 2 8 \n",
"9821 微感 9.0 3-5年 大专 产品经理 4 8 \n",
"9822 巴斯光年 15.0 3-5年 本科 产品经理 6 10 \n",
"9823 西大华特科技 6.5 1-3年 硕士 产品经理(农药) 6 5 \n",
"9824 西安纯粹科技 4.5 1-3年 本科 产品经理 5 3 \n",
"\n",
" salary_upper city \n",
"0 15 成都 \n",
"1 40 成都 \n",
"2 20 成都 \n",
"3 10 成都 \n",
"4 13 成都 \n",
"... ... ... \n",
"9820 10 西安 \n",
"9821 10 西安 \n",
"9822 20 西安 \n",
"9823 8 西安 \n",
"9824 6 西安 \n",
"\n",
"[9777 rows x 9 columns]"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Split the site column on whitespace and keep the first piece as the city\n",
"jobs_df['city'] = jobs_df.site.str.split(expand=True)[0]\n",
"jobs_df.drop(columns='site', inplace=True)\n",
"jobs_df"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "933e9006-4f5e-4238-b6d9-940dfeb6caf1",
"metadata": {},
"outputs": [],
"source": [
"# Replace strings that match a regular expression\n",
"jobs_df['year'] = jobs_df.year.replace(r'5-10年|10年以上', '5年以上', regex=True)"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "d10a9c1c-a9d5-49e1-8fdf-a68b5bb3d59a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['1-3年', '3-5年', '1年以内', '经验不限', '5年以上', '应届生'], dtype=object)"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"jobs_df.year.unique()"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "d248e233-bac5-48d5-8a69-a1f04350867a",
"metadata": {},
"outputs": [],
"source": [
"jobs_df['edu'] = jobs_df.edu.replace(r'中专|高中', '学历不限', regex=True)\n",
"jobs_df['edu'] = jobs_df.edu.replace(r'硕士|博士', '研究生', regex=True)"
]
},
{
"cell_type": "code",
"execution_count": 103,
"id": "eec6fbd5-2355-4674-9e5d-7f47a5a808a2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['本科', '大专', '学历不限', '研究生'], dtype=object)"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"jobs_df.edu.unique()"
]
},
{
"cell_type": "code",
"execution_count": 104,
"id": "352b1921-aa2b-4016-af3e-02032b2a3935",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6487, 9)"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"jobs_df['job_name'] = jobs_df.job_name.str.lower()\n",
"jobs_df = jobs_df[jobs_df.job_name.str.contains('python|数据|产品|运营|data', regex=True)]\n",
"jobs_df.shape"
]
},
{
"cell_type": "code",
"execution_count": 105,
"id": "df370013-1278-48d2-9891-8647df3c5e15",
"metadata": {},
"outputs": [],
"source": [
"jobs_df.to_csv('res/cleand_jobs.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "8ee07676-737c-420e-b11a-235ff7f2c4c8",
"metadata": {},
"source": [
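"#### Case 2: Preprocessing the Beijing points-based settlement data\n",
"\n",
"1. Load the data\n",
"2. Process dates and times\n",
"3. Bin ages into age groups\n",
"4. Normalize the settlement points"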
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "1232d023-7591-47b3-b67b-4920642dd28d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" </tr>\n",
" <tr>\n",
" <th>公示编号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>202300001</th>\n",
" <td>张浩</td>\n",
" <td>1977-02</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300002</th>\n",
" <td>冯云</td>\n",
" <td>1982-02</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300003</th>\n",
" <td>王天东</td>\n",
" <td>1975-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300004</th>\n",
" <td>陈军</td>\n",
" <td>1976-07</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300005</th>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 姓名 出生年月 单位名称 积分分值\n",
"公示编号 \n",
"202300001 张浩 1977-02 北京首钢股份有限公司 140.05\n",
"202300002 冯云 1982-02 中国人民解放军空军二十三厂 134.29\n",
"202300003 王天东 1975-01 中建二局第三建筑工程有限公司 133.63\n",
"202300004 陈军 1976-07 中建二局第三建筑工程有限公司 133.29\n",
"202300005 樊海瑞 1981-06 中国民生银行股份有限公司 132.46"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"settle_df = pd.read_csv('res/2023年北京积分落户数据.csv', index_col='公示编号')\n",
"settle_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "734eb268-3ad7-4e67-9661-08328075992b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 6003 entries, 202300001 to 202306003\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 姓名 6003 non-null object \n",
" 1 出生年月 6003 non-null object \n",
" 2 单位名称 6003 non-null object \n",
" 3 积分分值 6003 non-null float64\n",
"dtypes: float64(1), object(3)\n",
"memory usage: 234.5+ KB\n"
]
}
],
"source": [
"settle_df.info()"
]
},
{
"cell_type": "code",
"execution_count": 108,
"id": "63698465-ddcd-430c-bd96-e78abaaebda3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 6003 entries, 202300001 to 202306003\n",
"Data columns (total 4 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 姓名 6003 non-null object \n",
" 1 出生年月 6003 non-null datetime64[ns]\n",
" 2 单位名称 6003 non-null object \n",
" 3 积分分值 6003 non-null float64 \n",
"dtypes: datetime64[ns](1), float64(1), object(2)\n",
"memory usage: 234.5+ KB\n"
]
}
],
"source": [
"# Convert the strings to datetimes\n",
"settle_df['出生年月'] = pd.to_datetime(settle_df['出生年月'])\n",
"settle_df.info()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "989c56c7-85fa-4180-9b86-5247a41cdbab",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" <th>年龄</th>\n",
" </tr>\n",
" <tr>\n",
" <th>公示编号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>202300001</th>\n",
" <td>张浩</td>\n",
" <td>1977-02-01</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" <td>45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300002</th>\n",
" <td>冯云</td>\n",
" <td>1982-02-01</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300003</th>\n",
" <td>王天东</td>\n",
" <td>1975-01-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" <td>48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300004</th>\n",
" <td>陈军</td>\n",
" <td>1976-07-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" <td>46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300005</th>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06-01</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" <td>41</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 姓名 出生年月 单位名称 积分分值 年龄\n",
"公示编号 \n",
"202300001 张浩 1977-02-01 北京首钢股份有限公司 140.05 45\n",
"202300002 冯云 1982-02-01 中国人民解放军空军二十三厂 134.29 40\n",
"202300003 王天东 1975-01-01 中建二局第三建筑工程有限公司 133.63 48\n",
"202300004 陈军 1976-07-01 中建二局第三建筑工程有限公司 133.29 46\n",
"202300005 樊海瑞 1981-06-01 中国民生银行股份有限公司 132.46 41"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Convert birth dates to ages in whole years (days // 365, as of 2023-01-01)\n",
"settle_df['年龄'] = (pd.to_datetime('2023-01-01') - settle_df.出生年月).dt.days // 365\n",
"settle_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "4191c7a2-19fd-4347-ac79-2371c8e59c10",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" <th>年龄</th>\n",
" <th>年龄段</th>\n",
" </tr>\n",
" <tr>\n",
" <th>公示编号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>202300001</th>\n",
" <td>张浩</td>\n",
" <td>1977-02-01</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" <td>45</td>\n",
" <td>45~49岁</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300002</th>\n",
" <td>冯云</td>\n",
" <td>1982-02-01</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" <td>40</td>\n",
" <td>40~44岁</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300003</th>\n",
" <td>王天东</td>\n",
" <td>1975-01-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" <td>48</td>\n",
" <td>45~49岁</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300004</th>\n",
" <td>陈军</td>\n",
" <td>1976-07-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" <td>46</td>\n",
" <td>45~49岁</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300005</th>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06-01</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 姓名 出生年月 单位名称 积分分值 年龄 年龄段\n",
"公示编号 \n",
"202300001 张浩 1977-02-01 北京首钢股份有限公司 140.05 45 45~49岁\n",
"202300002 冯云 1982-02-01 中国人民解放军空军二十三厂 134.29 40 40~44岁\n",
"202300003 王天东 1975-01-01 中建二局第三建筑工程有限公司 133.63 48 45~49岁\n",
"202300004 陈军 1976-07-01 中建二局第三建筑工程有限公司 133.29 46 45~49岁\n",
"202300005 樊海瑞 1981-06-01 中国民生银行股份有限公司 132.46 41 40~44岁"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Assign each age to an age group (binning / bucketing)\n",
"settle_df['年龄段'] = pd.cut(\n",
" settle_df.年龄,\n",
" bins=np.arange(35, 61, 5),\n",
" labels=['35~39岁', '40~44岁', '45~49岁', '50~54岁', '55~59岁'],\n",
" right=False\n",
")\n",
"settle_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "ea2e0c9b-0aa0-41d3-a52a-6926b797465c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"年龄段\n",
"40~44岁 4215\n",
"45~49岁 1053\n",
"35~39岁 681\n",
"50~54岁 34\n",
"55~59岁 20\n",
"Name: count, dtype: int64"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count how often each value occurs\n",
"temp = settle_df.年龄段.value_counts()\n",
"temp"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "30843274-b940-4527-92ed-97db86bb4ec7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0. , 0.39277201, 0.15816993, 1. ],\n",
" [0.18246828, 0.59332564, 0.30675894, 1. ],\n",
" [0.45176471, 0.76708958, 0.46120723, 1. ],\n",
" [0.72312188, 0.88961169, 0.69717801, 1. ],\n",
" [0.91326413, 0.96670511, 0.89619377, 1. ]])"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"plt.cm.Greens(np.linspace(0.9, 0.1, 5))"
]
},
{
"cell_type": "code",
"execution_count": 113,
"id": "375dd407-9d0a-4788-a38e-3a37efbb6d3b",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"499.925pt\" height=\"252.574062pt\" viewBox=\"0 0 499.925 252.574062\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2025-03-04T16:47:31.005270</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.9.4, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 252.574062 \n",
"L 499.925 252.574062 \n",
"L 499.925 0 \n",
"L 0 0 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 46.325 228.96 \n",
"L 492.725 228.96 \n",
"L 492.725 7.2 \n",
"L 46.325 7.2 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 68.645 228.96 \n",
"L 113.285 228.96 \n",
"L 113.285 17.76 \n",
"L 68.645 17.76 \n",
"z\n",
"\" clip-path=\"url(#p703aa4411f)\" style=\"fill: url(#h3de5bce4dd)\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 157.925 228.96 \n",
"L 202.565 228.96 \n",
"L 202.565 176.19758 \n",
"L 157.925 176.19758 \n",
"z\n",
"\" clip-path=\"url(#p703aa4411f)\" style=\"fill: url(#h9cf71d2063)\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 247.205 228.96 \n",
"L 291.845 228.96 \n",
"L 291.845 194.837295 \n",
"L 247.205 194.837295 \n",
"z\n",
"\" clip-path=\"url(#p703aa4411f)\" style=\"fill: url(#h43492deada)\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 336.485 228.96 \n",
"L 381.125 228.96 \n",
"L 381.125 227.25637 \n",
"L 336.485 227.25637 \n",
"z\n",
"\" clip-path=\"url(#p703aa4411f)\" style=\"fill: url(#h6d0bdebefd)\"/>\n",
" </g>\n",
" <g id=\"patch_7\">\n",
" <path d=\"M 425.765 228.96 \n",
"L 470.405 228.96 \n",
"L 470.405 227.957865 \n",
"L 425.765 227.957865 \n",
"z\n",
"\" clip-path=\"url(#p703aa4411f)\" style=\"fill: url(#h6b47a2d288)\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\">\n",
" <g id=\"xtick_1\">\n",
" <g id=\"line2d_1\">\n",
" <defs>\n",
" <path id=\"m2b3d741f5c\" d=\"M 0 0 \n",
"L 0 3.5 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#m2b3d741f5c\" x=\"90.965\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_1\">\n",
" <!-- 40~44岁 -->\n",
" <g transform=\"translate(73.465 244.085) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-34\" d=\"M 2975 1200 \n",
"L 2450 1200 \n",
"L 2450 100 \n",
"L 1875 100 \n",
"L 1875 1200 \n",
"L 200 1200 \n",
"L 200 1675 \n",
"L 1875 4425 \n",
"L 2450 4425 \n",
"L 2450 1675 \n",
"L 2975 1675 \n",
"L 2975 1200 \n",
"z\n",
"M 1875 1675 \n",
"L 1875 3525 \n",
"L 750 1675 \n",
"L 1875 1675 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-30\" d=\"M 2975 2250 \n",
"Q 2975 1350 2650 700 \n",
"Q 2325 50 1600 50 \n",
"Q 875 50 537 700 \n",
"Q 200 1350 200 2250 \n",
"Q 200 3150 537 3787 \n",
"Q 875 4425 1600 4425 \n",
"Q 2325 4425 2650 3787 \n",
"Q 2975 3150 2975 2250 \n",
"z\n",
"M 2375 2250 \n",
"Q 2375 3050 2187 3500 \n",
"Q 2000 3950 1600 3950 \n",
"Q 1200 3950 1000 3500 \n",
"Q 800 3050 800 2250 \n",
"Q 800 1450 1000 987 \n",
"Q 1200 525 1600 525 \n",
"Q 2000 525 2187 987 \n",
"Q 2375 1450 2375 2250 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-7e\" d=\"M 2925 5050 \n",
"Q 2775 4700 2587 4462 \n",
"Q 2400 4225 2150 4225 \n",
"Q 1975 4225 1650 4550 \n",
"Q 1325 4875 1100 4875 \n",
"Q 925 4875 825 4737 \n",
"Q 725 4600 625 4300 \n",
"L 375 4575 \n",
"Q 525 4925 687 5162 \n",
"Q 850 5400 1075 5400 \n",
"Q 1350 5400 1687 5087 \n",
"Q 2025 4775 2200 4775 \n",
"Q 2300 4775 2450 4925 \n",
"Q 2600 5075 2675 5375 \n",
"L 2925 5050 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-5c81\" d=\"M 2575 3250 \n",
"Q 2850 3100 3125 3000 \n",
"Q 2900 2800 2675 2450 \n",
"L 5400 2450 \n",
"Q 5175 1775 4925 1287 \n",
"Q 4675 800 4200 400 \n",
"Q 3725 0 2962 -225 \n",
"Q 2200 -450 950 -625 \n",
"Q 850 -350 625 -125 \n",
"Q 1775 -25 2350 100 \n",
"Q 2925 225 3250 375 \n",
"Q 2925 900 2550 1300 \n",
"Q 2750 1400 2975 1550 \n",
"Q 3350 1100 3700 600 \n",
"Q 4075 850 4337 1212 \n",
"Q 4600 1575 4750 2025 \n",
"L 2400 2025 \n",
"Q 2250 1800 2000 1537 \n",
"Q 1750 1275 1375 900 \n",
"Q 1150 1125 925 1225 \n",
"Q 1300 1475 1800 2037 \n",
"Q 2300 2600 2575 3250 \n",
"z\n",
"M 850 4725 \n",
"L 1400 4725 \n",
"Q 1350 4500 1350 3825 \n",
"L 2975 3825 \n",
"Q 2975 4775 2950 5125 \n",
"L 3475 5125 \n",
"Q 3450 4775 3450 3825 \n",
"L 5050 3825 \n",
"Q 5050 4400 5025 4700 \n",
"L 5550 4700 \n",
"Q 5525 4375 5525 4100 \n",
"L 5525 3400 \n",
"L 875 3400 \n",
"L 875 4200 \n",
"Q 875 4425 850 4725 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_2\">\n",
" <g id=\"line2d_2\">\n",
" <g>\n",
" <use xlink:href=\"#m2b3d741f5c\" x=\"180.245\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- 45~49岁 -->\n",
" <g transform=\"translate(162.745 244.085) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-35\" d=\"M 2825 1650 \n",
"Q 2825 900 2462 475 \n",
"Q 2100 50 1500 50 \n",
"Q 975 50 637 400 \n",
"Q 300 750 275 1350 \n",
"L 850 1350 \n",
"Q 850 975 1025 750 \n",
"Q 1200 525 1525 525 \n",
"Q 1850 525 2037 800 \n",
"Q 2225 1075 2225 1650 \n",
"Q 2225 2150 2062 2387 \n",
"Q 1900 2625 1625 2625 \n",
"Q 1400 2625 1237 2525 \n",
"Q 1075 2425 925 2175 \n",
"L 425 2175 \n",
"L 575 4375 \n",
"L 2725 4375 \n",
"L 2725 3900 \n",
"L 1050 3900 \n",
"L 950 2750 \n",
"Q 1100 2900 1275 2975 \n",
"Q 1450 3050 1750 3050 \n",
"Q 2225 3050 2525 2687 \n",
"Q 2825 2325 2825 1650 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-39\" d=\"M 2825 2300 \n",
"Q 2825 1275 2462 662 \n",
"Q 2100 50 1425 50 \n",
"Q 950 50 662 400 \n",
"Q 375 750 375 1175 \n",
"L 950 1175 \n",
"Q 950 925 1087 725 \n",
"Q 1225 525 1450 525 \n",
"Q 1825 525 2012 950 \n",
"Q 2200 1375 2250 2200 \n",
"Q 2125 1925 1900 1775 \n",
"Q 1675 1625 1400 1625 \n",
"Q 925 1625 625 1975 \n",
"Q 325 2325 325 2975 \n",
"Q 325 3625 625 4025 \n",
"Q 925 4425 1525 4425 \n",
"Q 2125 4425 2475 3925 \n",
"Q 2825 3425 2825 2300 \n",
"z\n",
"M 2200 2875 \n",
"Q 2200 3425 2012 3700 \n",
"Q 1825 3975 1500 3975 \n",
"Q 1275 3975 1100 3762 \n",
"Q 925 3550 925 2975 \n",
"Q 925 2550 1062 2312 \n",
"Q 1200 2075 1500 2075 \n",
"Q 1825 2075 2012 2300 \n",
"Q 2200 2525 2200 2875 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_3\">\n",
" <g id=\"line2d_3\">\n",
" <g>\n",
" <use xlink:href=\"#m2b3d741f5c\" x=\"269.525\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- 35~39岁 -->\n",
" <g transform=\"translate(252.025 244.085) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-33\" d=\"M 2825 1300 \n",
"Q 2825 725 2462 387 \n",
"Q 2100 50 1550 50 \n",
"Q 1000 50 637 387 \n",
"Q 275 725 275 1425 \n",
"L 850 1425 \n",
"Q 850 950 1037 737 \n",
"Q 1225 525 1550 525 \n",
"Q 1875 525 2050 725 \n",
"Q 2225 925 2225 1350 \n",
"Q 2225 1700 2037 1900 \n",
"Q 1850 2100 1375 2100 \n",
"L 1375 2525 \n",
"Q 1775 2525 1962 2725 \n",
"Q 2150 2925 2150 3325 \n",
"Q 2150 3625 2012 3800 \n",
"Q 1875 3975 1575 3975 \n",
"Q 1275 3975 1112 3762 \n",
"Q 950 3550 925 3150 \n",
"L 375 3150 \n",
"Q 425 3725 737 4075 \n",
"Q 1050 4425 1575 4425 \n",
"Q 2125 4425 2425 4112 \n",
"Q 2725 3800 2725 3350 \n",
"Q 2725 2925 2575 2687 \n",
"Q 2425 2450 2075 2325 \n",
"Q 2425 2250 2625 1975 \n",
"Q 2825 1700 2825 1300 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-33\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-33\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_4\">\n",
" <g id=\"line2d_4\">\n",
" <g>\n",
" <use xlink:href=\"#m2b3d741f5c\" x=\"358.805\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 50~54岁 -->\n",
" <g transform=\"translate(341.305 244.085) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-35\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"xtick_5\">\n",
" <g id=\"line2d_5\">\n",
" <g>\n",
" <use xlink:href=\"#m2b3d741f5c\" x=\"448.085\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- 55~59岁 -->\n",
" <g transform=\"translate(430.585 244.085) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-35\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"matplotlib.axis_2\">\n",
" <g id=\"ytick_1\">\n",
" <g id=\"line2d_6\">\n",
" <defs>\n",
" <path id=\"mb6b8f0854d\" d=\"M 0 0 \n",
"L -3.5 0 \n",
"\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </defs>\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"228.96\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- 0 -->\n",
" <g transform=\"translate(34.325 232.377969) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-30\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_2\">\n",
" <g id=\"line2d_7\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"203.906619\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 500 -->\n",
" <g transform=\"translate(24.325 207.324588) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-35\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_3\">\n",
" <g id=\"line2d_8\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"178.853238\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 1000 -->\n",
" <g transform=\"translate(19.325 182.271207) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-31\" d=\"M 1950 100 \n",
"L 1375 100 \n",
"L 1375 3425 \n",
"L 625 3425 \n",
"L 625 3725 \n",
"Q 1075 3725 1325 3900 \n",
"Q 1575 4075 1650 4425 \n",
"L 1950 4425 \n",
"L 1950 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-31\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_4\">\n",
" <g id=\"line2d_9\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"153.799858\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- 1500 -->\n",
" <g transform=\"translate(19.325 157.217826) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-31\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_5\">\n",
" <g id=\"line2d_10\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"128.746477\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- 2000 -->\n",
" <g transform=\"translate(19.325 132.164446) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-32\" d=\"M 2850 100 \n",
"L 300 100 \n",
"L 300 500 \n",
"Q 450 900 712 1237 \n",
"Q 975 1575 1475 2000 \n",
"Q 1850 2325 2012 2600 \n",
"Q 2175 2875 2175 3200 \n",
"Q 2175 3525 2037 3737 \n",
"Q 1900 3950 1600 3950 \n",
"Q 1350 3950 1162 3725 \n",
"Q 975 3500 975 2925 \n",
"L 400 2925 \n",
"Q 425 3650 737 4037 \n",
"Q 1050 4425 1625 4425 \n",
"Q 2175 4425 2475 4087 \n",
"Q 2775 3750 2775 3175 \n",
"Q 2775 2700 2500 2350 \n",
"Q 2225 2000 1825 1650 \n",
"Q 1375 1250 1200 1050 \n",
"Q 1025 850 875 575 \n",
"L 2850 575 \n",
"L 2850 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-32\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_6\">\n",
" <g id=\"line2d_11\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"103.693096\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_11\">\n",
" <!-- 2500 -->\n",
" <g transform=\"translate(19.325 107.111065) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-32\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_7\">\n",
" <g id=\"line2d_12\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"78.639715\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_12\">\n",
" <!-- 3000 -->\n",
" <g transform=\"translate(19.325 82.057684) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-33\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_8\">\n",
" <g id=\"line2d_13\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"53.586335\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_13\">\n",
" <!-- 3500 -->\n",
" <g transform=\"translate(19.325 57.004303) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-33\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"ytick_9\">\n",
" <g id=\"line2d_14\">\n",
" <g>\n",
" <use xlink:href=\"#mb6b8f0854d\" x=\"46.325\" y=\"28.532954\" style=\"stroke: #000000; stroke-width: 0.8\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_14\">\n",
" <!-- 4000 -->\n",
" <g transform=\"translate(19.325 31.950922) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_15\">\n",
" <!-- Count -->\n",
" <g transform=\"translate(14.035938 130.58) rotate(-90) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-43\" d=\"M 3000 1800 \n",
"Q 2975 850 2600 450 \n",
"Q 2225 50 1700 50 \n",
"Q 1100 50 675 537 \n",
"Q 250 1025 250 2100 \n",
"Q 250 3275 662 3850 \n",
"Q 1075 4425 1725 4425 \n",
"Q 2275 4425 2637 4012 \n",
"Q 3000 3600 2975 2850 \n",
"L 2425 2850 \n",
"Q 2425 3400 2250 3675 \n",
"Q 2075 3950 1725 3950 \n",
"Q 1325 3950 1087 3537 \n",
"Q 850 3125 850 2150 \n",
"Q 850 1250 1087 887 \n",
"Q 1325 525 1700 525 \n",
"Q 1975 525 2200 787 \n",
"Q 2425 1050 2425 1800 \n",
"L 3000 1800 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-6f\" d=\"M 2950 1500 \n",
"Q 2950 850 2550 450 \n",
"Q 2150 50 1600 50 \n",
"Q 1050 50 650 450 \n",
"Q 250 850 250 1500 \n",
"Q 250 2150 650 2550 \n",
"Q 1050 2950 1600 2950 \n",
"Q 2150 2950 2550 2550 \n",
"Q 2950 2150 2950 1500 \n",
"z\n",
"M 2400 1500 \n",
"Q 2400 2000 2150 2250 \n",
"Q 1900 2500 1600 2500 \n",
"Q 1300 2500 1050 2250 \n",
"Q 800 2000 800 1500 \n",
"Q 800 1000 1050 750 \n",
"Q 1300 500 1600 500 \n",
"Q 1900 500 2150 750 \n",
"Q 2400 1000 2400 1500 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-75\" d=\"M 2825 100 \n",
"L 2325 100 \n",
"L 2325 625 \n",
"Q 2125 350 1887 200 \n",
"Q 1650 50 1275 50 \n",
"Q 825 50 600 300 \n",
"Q 375 550 375 925 \n",
"L 375 2900 \n",
"L 875 2900 \n",
"L 875 1100 \n",
"Q 875 800 1025 625 \n",
"Q 1175 450 1425 450 \n",
"Q 1750 450 2037 787 \n",
"Q 2325 1125 2325 1625 \n",
"L 2325 2900 \n",
"L 2825 2900 \n",
"L 2825 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-6e\" d=\"M 2825 100 \n",
"L 2325 100 \n",
"L 2325 1900 \n",
"Q 2325 2200 2175 2375 \n",
"Q 2025 2550 1775 2550 \n",
"Q 1450 2550 1162 2212 \n",
"Q 875 1875 875 1375 \n",
"L 875 100 \n",
"L 375 100 \n",
"L 375 2900 \n",
"L 875 2900 \n",
"L 875 2375 \n",
"Q 1075 2650 1312 2800 \n",
"Q 1550 2950 1925 2950 \n",
"Q 2375 2950 2600 2700 \n",
"Q 2825 2450 2825 2075 \n",
"L 2825 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-74\" d=\"M 2775 175 \n",
"Q 2650 125 2487 87 \n",
"Q 2325 50 2050 50 \n",
"Q 1600 50 1325 300 \n",
"Q 1050 550 1050 1000 \n",
"L 1050 2500 \n",
"L 200 2500 \n",
"L 200 2900 \n",
"L 1050 2900 \n",
"L 1050 3875 \n",
"L 1550 3875 \n",
"L 1550 2900 \n",
"L 2575 2900 \n",
"L 2575 2500 \n",
"L 1550 2500 \n",
"L 1550 975 \n",
"Q 1550 775 1650 637 \n",
"Q 1750 500 2025 500 \n",
"Q 2300 500 2475 550 \n",
"Q 2650 600 2775 675 \n",
"L 2775 175 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-43\"/>\n",
" <use xlink:href=\"#SimHei-6f\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-75\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-6e\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-74\" x=\"200\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <g id=\"patch_8\">\n",
" <path d=\"M 46.325 228.96 \n",
"L 46.325 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_9\">\n",
" <path d=\"M 492.725 228.96 \n",
"L 492.725 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_10\">\n",
" <path d=\"M 46.325 228.96 \n",
"L 492.725 228.96 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"patch_11\">\n",
" <path d=\"M 46.325 7.2 \n",
"L 492.725 7.2 \n",
"\" style=\"fill: none; stroke: #000000; stroke-width: 0.8; stroke-linejoin: miter; stroke-linecap: square\"/>\n",
" </g>\n",
" <g id=\"text_16\">\n",
" <!-- 4215 -->\n",
" <g transform=\"translate(80.965 16.256797) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-32\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-31\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_17\">\n",
" <!-- 1053 -->\n",
" <g transform=\"translate(170.245 174.694377) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-31\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-33\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_18\">\n",
" <!-- 681 -->\n",
" <g transform=\"translate(262.025 193.334093) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-36\" d=\"M 2850 1550 \n",
"Q 2850 850 2550 450 \n",
"Q 2250 50 1650 50 \n",
"Q 1050 50 700 550 \n",
"Q 350 1050 350 2175 \n",
"Q 350 3200 712 3812 \n",
"Q 1075 4425 1750 4425 \n",
"Q 2225 4425 2512 4075 \n",
"Q 2800 3725 2800 3300 \n",
"L 2225 3300 \n",
"Q 2225 3550 2087 3750 \n",
"Q 1950 3950 1725 3950 \n",
"Q 1350 3950 1150 3562 \n",
"Q 950 3175 925 2375 \n",
"Q 1100 2700 1300 2825 \n",
"Q 1500 2950 1775 2950 \n",
"Q 2250 2950 2550 2575 \n",
"Q 2850 2200 2850 1550 \n",
"z\n",
"M 2250 1550 \n",
"Q 2250 2000 2100 2250 \n",
"Q 1950 2500 1675 2500 \n",
"Q 1350 2500 1162 2250 \n",
"Q 975 2000 975 1650 \n",
"Q 975 1100 1162 800 \n",
"Q 1350 500 1675 500 \n",
"Q 1900 500 2075 725 \n",
"Q 2250 950 2250 1550 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-38\" d=\"M 2875 1325 \n",
"Q 2875 700 2525 375 \n",
"Q 2175 50 1575 50 \n",
"Q 975 50 625 375 \n",
"Q 275 700 275 1325 \n",
"Q 275 1650 475 1912 \n",
"Q 675 2175 1025 2300 \n",
"Q 725 2425 562 2650 \n",
"Q 400 2875 400 3225 \n",
"Q 400 3775 750 4100 \n",
"Q 1100 4425 1575 4425 \n",
"Q 2050 4425 2400 4100 \n",
"Q 2750 3775 2750 3225 \n",
"Q 2750 2875 2587 2650 \n",
"Q 2425 2425 2125 2300 \n",
"Q 2475 2175 2675 1912 \n",
"Q 2875 1650 2875 1325 \n",
"z\n",
"M 2200 3225 \n",
"Q 2200 3625 2025 3800 \n",
"Q 1850 3975 1575 3975 \n",
"Q 1300 3975 1125 3800 \n",
"Q 950 3625 950 3225 \n",
"Q 950 2825 1137 2662 \n",
"Q 1325 2500 1575 2500 \n",
"Q 1825 2500 2012 2662 \n",
"Q 2200 2825 2200 3225 \n",
"z\n",
"M 2300 1325 \n",
"Q 2300 1675 2112 1875 \n",
"Q 1925 2075 1575 2075 \n",
"Q 1225 2075 1037 1875 \n",
"Q 850 1675 850 1325 \n",
"Q 850 925 1050 712 \n",
"Q 1250 500 1575 500 \n",
"Q 1900 500 2100 712 \n",
"Q 2300 925 2300 1325 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-36\"/>\n",
" <use xlink:href=\"#SimHei-38\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-31\" x=\"100\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_19\">\n",
" <!-- 34 -->\n",
" <g transform=\"translate(353.805 225.753167) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-33\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"50\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_20\">\n",
" <!-- 20 -->\n",
" <g transform=\"translate(443.085 226.454662) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-32\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" <defs>\n",
" <clipPath id=\"p703aa4411f\">\n",
" <rect x=\"46.325\" y=\"7.2\" width=\"446.4\" height=\"221.76\"/>\n",
" </clipPath>\n",
" </defs>\n",
" <defs>\n",
" <pattern id=\"h3de5bce4dd\" patternUnits=\"userSpaceOnUse\" x=\"0\" y=\"0\" width=\"72\" height=\"72\">\n",
" <rect x=\"0\" y=\"0\" width=\"73\" height=\"73\" fill=\"#006428\"/>\n",
" <path d=\"M -36 36 \n",
"L 36 -36 \n",
"M -30 42 \n",
"L 42 -30 \n",
"M -24 48 \n",
"L 48 -24 \n",
"M -18 54 \n",
"L 54 -18 \n",
"M -12 60 \n",
"L 60 -12 \n",
"M -6 66 \n",
"L 66 -6 \n",
"M 0 72 \n",
"L 72 0 \n",
"M 6 78 \n",
"L 78 6 \n",
"M 12 84 \n",
"L 84 12 \n",
"M 18 90 \n",
"L 90 18 \n",
"M 24 96 \n",
"L 96 24 \n",
"M 30 102 \n",
"L 102 30 \n",
"M 36 108 \n",
"L 108 36 \n",
"\" style=\"fill: #000000; stroke: #000000; stroke-width: 1.0; stroke-linecap: butt; stroke-linejoin: miter\"/>\n",
" </pattern>\n",
" <pattern id=\"h9cf71d2063\" patternUnits=\"userSpaceOnUse\" x=\"0\" y=\"0\" width=\"72\" height=\"72\">\n",
" <rect x=\"0\" y=\"0\" width=\"73\" height=\"73\" fill=\"#228a44\"/>\n",
" <path d=\"M -36 36 \n",
"L 36 -36 \n",
"M -30 42 \n",
"L 42 -30 \n",
"M -24 48 \n",
"L 48 -24 \n",
"M -18 54 \n",
"L 54 -18 \n",
"M -12 60 \n",
"L 60 -12 \n",
"M -6 66 \n",
"L 66 -6 \n",
"M 0 72 \n",
"L 72 0 \n",
"M 6 78 \n",
"L 78 6 \n",
"M 12 84 \n",
"L 84 12 \n",
"M 18 90 \n",
"L 90 18 \n",
"M 24 96 \n",
"L 96 24 \n",
"M 30 102 \n",
"L 102 30 \n",
"M 36 108 \n",
"L 108 36 \n",
"\" style=\"fill: #000000; stroke: #000000; stroke-width: 1.0; stroke-linecap: butt; stroke-linejoin: miter\"/>\n",
" </pattern>\n",
" <pattern id=\"h43492deada\" patternUnits=\"userSpaceOnUse\" x=\"0\" y=\"0\" width=\"72\" height=\"72\">\n",
" <rect x=\"0\" y=\"0\" width=\"73\" height=\"73\" fill=\"#4bb062\"/>\n",
" <path d=\"M -36 36 \n",
"L 36 -36 \n",
"M -30 42 \n",
"L 42 -30 \n",
"M -24 48 \n",
"L 48 -24 \n",
"M -18 54 \n",
"L 54 -18 \n",
"M -12 60 \n",
"L 60 -12 \n",
"M -6 66 \n",
"L 66 -6 \n",
"M 0 72 \n",
"L 72 0 \n",
"M 6 78 \n",
"L 78 6 \n",
"M 12 84 \n",
"L 84 12 \n",
"M 18 90 \n",
"L 90 18 \n",
"M 24 96 \n",
"L 96 24 \n",
"M 30 102 \n",
"L 102 30 \n",
"M 36 108 \n",
"L 108 36 \n",
"\" style=\"fill: #000000; stroke: #000000; stroke-width: 1.0; stroke-linecap: butt; stroke-linejoin: miter\"/>\n",
" </pattern>\n",
" <pattern id=\"h6d0bdebefd\" patternUnits=\"userSpaceOnUse\" x=\"0\" y=\"0\" width=\"72\" height=\"72\">\n",
" <rect x=\"0\" y=\"0\" width=\"73\" height=\"73\" fill=\"#86cc85\"/>\n",
" <path d=\"M -36 36 \n",
"L 36 -36 \n",
"M -30 42 \n",
"L 42 -30 \n",
"M -24 48 \n",
"L 48 -24 \n",
"M -18 54 \n",
"L 54 -18 \n",
"M -12 60 \n",
"L 60 -12 \n",
"M -6 66 \n",
"L 66 -6 \n",
"M 0 72 \n",
"L 72 0 \n",
"M 6 78 \n",
"L 78 6 \n",
"M 12 84 \n",
"L 84 12 \n",
"M 18 90 \n",
"L 90 18 \n",
"M 24 96 \n",
"L 96 24 \n",
"M 30 102 \n",
"L 102 30 \n",
"M 36 108 \n",
"L 108 36 \n",
"\" style=\"fill: #000000; stroke: #000000; stroke-width: 1.0; stroke-linecap: butt; stroke-linejoin: miter\"/>\n",
" </pattern>\n",
" <pattern id=\"h6b47a2d288\" patternUnits=\"userSpaceOnUse\" x=\"0\" y=\"0\" width=\"72\" height=\"72\">\n",
" <rect x=\"0\" y=\"0\" width=\"73\" height=\"73\" fill=\"#b8e3b2\"/>\n",
" <path d=\"M -36 36 \n",
"L 36 -36 \n",
"M -30 42 \n",
"L 42 -30 \n",
"M -24 48 \n",
"L 48 -24 \n",
"M -18 54 \n",
"L 54 -18 \n",
"M -12 60 \n",
"L 60 -12 \n",
"M -6 66 \n",
"L 66 -6 \n",
"M 0 72 \n",
"L 72 0 \n",
"M 6 78 \n",
"L 78 6 \n",
"M 12 84 \n",
"L 84 12 \n",
"M 18 90 \n",
"L 90 18 \n",
"M 24 96 \n",
"L 96 24 \n",
"M 30 102 \n",
"L 102 30 \n",
"M 36 108 \n",
"L 108 36 \n",
"\" style=\"fill: #000000; stroke: #000000; stroke-width: 1.0; stroke-linecap: butt; stroke-linejoin: miter\"/>\n",
" </pattern>\n",
" </defs>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 800x400 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Draw a bar chart\n",
"temp.plot(\n",
"    kind='bar',  # chart type\n",
"    figsize=(8, 4),  # figure size\n",
"    xlabel='',  # x-axis label\n",
"    ylabel='Count',  # y-axis label\n",
"    width=0.5,  # bar width\n",
"    hatch='//',  # hatch pattern for the bars\n",
"    color=plt.cm.Greens(np.linspace(0.9, 0.3, temp.size))  # color values, one per bar\n",
")\n",
"\n",
"for i in range(temp.size):\n",
"    # plt.text(x, y, label text)\n",
"    plt.text(i, temp.iloc[i] + 30, temp.iloc[i], ha='center')\n",
"\n",
"# Customize the x-axis ticks\n",
"plt.xticks(rotation=0)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "e020ba6c-d16d-482f-ad3b-a9e855257b91",
"metadata": {},
"outputs": [
{
"data": {
"image/svg+xml": [
"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\n",
"<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n",
" \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n",
"<svg xmlns:xlink=\"http://www.w3.org/1999/xlink\" width=\"299.538866pt\" height=\"280.512pt\" viewBox=\"0 0 299.538866 280.512\" xmlns=\"http://www.w3.org/2000/svg\" version=\"1.1\">\n",
" <metadata>\n",
" <rdf:RDF xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n",
" <cc:Work>\n",
" <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\n",
" <dc:date>2025-03-04T16:47:31.053333</dc:date>\n",
" <dc:format>image/svg+xml</dc:format>\n",
" <dc:creator>\n",
" <cc:Agent>\n",
" <dc:title>Matplotlib v3.9.4, https://matplotlib.org/</dc:title>\n",
" </cc:Agent>\n",
" </dc:creator>\n",
" </cc:Work>\n",
" </rdf:RDF>\n",
" </metadata>\n",
" <defs>\n",
" <style type=\"text/css\">*{stroke-linejoin: round; stroke-linecap: butt}</style>\n",
" </defs>\n",
" <g id=\"figure_1\">\n",
" <g id=\"patch_1\">\n",
" <path d=\"M 0 280.512 \n",
"L 299.538866 280.512 \n",
"L 299.538866 0 \n",
"L 0 0 \n",
"z\n",
"\" style=\"fill: #ffffff\"/>\n",
" </g>\n",
" <g id=\"axes_1\">\n",
" <g id=\"patch_2\">\n",
" <path d=\"M 246.7008 140.256 \n",
"C 246.7008 120.565973 241.237445 101.256765 230.921114 84.485645 \n",
"C 220.604784 67.714525 205.834526 54.130223 188.26051 45.250408 \n",
"C 170.686494 36.370594 150.988505 32.53875 131.36725 34.18298 \n",
"C 111.745995 35.82721 92.960449 42.883912 77.109379 54.564696 \n",
"C 61.258308 66.245479 48.954852 82.098515 41.573583 100.352666 \n",
"C 34.192314 118.606816 32.018749 138.555987 35.295723 157.971408 \n",
"C 38.572697 177.386829 47.173453 195.517488 60.137094 210.337783 \n",
"C 73.100735 225.158079 89.92581 236.094743 108.732583 241.925905 \n",
"L 118.189608 211.424934 \n",
"C 105.024867 207.34312 93.247314 199.687455 84.172766 189.313248 \n",
"C 75.098217 178.939041 69.077688 166.24758 66.783806 152.656786 \n",
"C 64.489924 139.065991 66.01142 125.101571 71.178308 112.323666 \n",
"C 76.345197 99.545761 84.957616 88.448635 96.053365 80.272087 \n",
"C 107.149115 72.095539 120.298997 67.155847 134.033875 66.004886 \n",
"C 147.768754 64.853925 161.557346 67.536216 173.859157 73.752086 \n",
"C 186.160968 79.967956 196.500148 89.476968 203.72158 101.216752 \n",
"C 210.943012 112.956535 214.76736 126.472981 214.76736 140.256 \n",
"z\n",
"\" style=\"fill: #1f77b4\"/>\n",
" </g>\n",
" <g id=\"patch_3\">\n",
" <path d=\"M 108.732583 241.925905 \n",
"C 127.525804 247.752866 147.571709 248.25653 166.633844 243.380705 \n",
"C 185.695979 238.50488 203.038062 228.437896 216.725295 214.302894 \n",
"L 193.784506 192.088825 \n",
"C 184.203443 201.983327 172.063985 209.030216 158.720491 212.443294 \n",
"C 145.376997 215.856371 131.344863 215.503806 118.189608 211.424934 \n",
"z\n",
"\" style=\"fill: #ff7f0e\"/>\n",
" </g>\n",
" <g id=\"patch_4\">\n",
" <path d=\"M 216.725295 214.302894 \n",
"C 225.545093 205.19456 232.667012 194.583258 237.754729 182.970088 \n",
"C 242.842446 171.356917 245.814596 158.927606 246.530823 146.269106 \n",
"L 214.648376 144.465174 \n",
"C 214.147017 153.326124 212.066512 162.026642 208.50511 170.155861 \n",
"C 204.943708 178.285081 199.958365 185.712992 193.784506 192.088825 \n",
"z\n",
"\" style=\"fill: #2ca02c\"/>\n",
" </g>\n",
" <g id=\"patch_5\">\n",
" <path d=\"M 246.530823 146.269106 \n",
"C 246.566488 145.63877 246.596544 145.008129 246.620988 144.377257 \n",
"C 246.645432 143.746386 246.664263 143.11531 246.677478 142.484103 \n",
"L 214.751035 141.815672 \n",
"C 214.741784 142.257517 214.728602 142.69927 214.711492 143.14088 \n",
"C 214.694381 143.58249 214.673342 144.023939 214.648376 144.465174 \n",
"z\n",
"\" style=\"fill: #d62728\"/>\n",
" </g>\n",
" <g id=\"patch_6\">\n",
" <path d=\"M 246.677478 142.484103 \n",
"C 246.685252 142.112807 246.691082 141.741473 246.694969 141.370115 \n",
"C 246.698856 140.998758 246.7008 140.627383 246.7008 140.256005 \n",
"L 214.76736 140.256004 \n",
"C 214.76736 140.515968 214.765999 140.775931 214.763279 141.035881 \n",
"C 214.760558 141.295831 214.756476 141.555765 214.751035 141.815672 \n",
"z\n",
"\" style=\"fill: #9467bd\"/>\n",
" </g>\n",
" <g id=\"matplotlib.axis_1\"/>\n",
" <g id=\"matplotlib.axis_2\"/>\n",
" <g id=\"text_1\">\n",
" <!-- 40~44岁 -->\n",
" <g transform=\"translate(35.794716 49.413534) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-34\" d=\"M 2975 1200 \n",
"L 2450 1200 \n",
"L 2450 100 \n",
"L 1875 100 \n",
"L 1875 1200 \n",
"L 200 1200 \n",
"L 200 1675 \n",
"L 1875 4425 \n",
"L 2450 4425 \n",
"L 2450 1675 \n",
"L 2975 1675 \n",
"L 2975 1200 \n",
"z\n",
"M 1875 1675 \n",
"L 1875 3525 \n",
"L 750 1675 \n",
"L 1875 1675 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-30\" d=\"M 2975 2250 \n",
"Q 2975 1350 2650 700 \n",
"Q 2325 50 1600 50 \n",
"Q 875 50 537 700 \n",
"Q 200 1350 200 2250 \n",
"Q 200 3150 537 3787 \n",
"Q 875 4425 1600 4425 \n",
"Q 2325 4425 2650 3787 \n",
"Q 2975 3150 2975 2250 \n",
"z\n",
"M 2375 2250 \n",
"Q 2375 3050 2187 3500 \n",
"Q 2000 3950 1600 3950 \n",
"Q 1200 3950 1000 3500 \n",
"Q 800 3050 800 2250 \n",
"Q 800 1450 1000 987 \n",
"Q 1200 525 1600 525 \n",
"Q 2000 525 2187 987 \n",
"Q 2375 1450 2375 2250 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-7e\" d=\"M 2925 5050 \n",
"Q 2775 4700 2587 4462 \n",
"Q 2400 4225 2150 4225 \n",
"Q 1975 4225 1650 4550 \n",
"Q 1325 4875 1100 4875 \n",
"Q 925 4875 825 4737 \n",
"Q 725 4600 625 4300 \n",
"L 375 4575 \n",
"Q 525 4925 687 5162 \n",
"Q 850 5400 1075 5400 \n",
"Q 1350 5400 1687 5087 \n",
"Q 2025 4775 2200 4775 \n",
"Q 2300 4775 2450 4925 \n",
"Q 2600 5075 2675 5375 \n",
"L 2925 5050 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-5c81\" d=\"M 2575 3250 \n",
"Q 2850 3100 3125 3000 \n",
"Q 2900 2800 2675 2450 \n",
"L 5400 2450 \n",
"Q 5175 1775 4925 1287 \n",
"Q 4675 800 4200 400 \n",
"Q 3725 0 2962 -225 \n",
"Q 2200 -450 950 -625 \n",
"Q 850 -350 625 -125 \n",
"Q 1775 -25 2350 100 \n",
"Q 2925 225 3250 375 \n",
"Q 2925 900 2550 1300 \n",
"Q 2750 1400 2975 1550 \n",
"Q 3350 1100 3700 600 \n",
"Q 4075 850 4337 1212 \n",
"Q 4600 1575 4750 2025 \n",
"L 2400 2025 \n",
"Q 2250 1800 2000 1537 \n",
"Q 1750 1275 1375 900 \n",
"Q 1150 1125 925 1225 \n",
"Q 1300 1475 1800 2037 \n",
"Q 2300 2600 2575 3250 \n",
"z\n",
"M 850 4725 \n",
"L 1400 4725 \n",
"Q 1350 4500 1350 3825 \n",
"L 2975 3825 \n",
"Q 2975 4775 2950 5125 \n",
"L 3475 5125 \n",
"Q 3450 4775 3450 3825 \n",
"L 5050 3825 \n",
"Q 5050 4400 5025 4700 \n",
"L 5550 4700 \n",
"Q 5525 4375 5525 4100 \n",
"L 5525 3400 \n",
"L 875 3400 \n",
"L 875 4200 \n",
"Q 875 4425 850 4725 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_2\">\n",
" <!-- 70.2% -->\n",
" <g transform=\"translate(74.081372 70.191829) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-37\" d=\"M 2775 3850 \n",
"L 1600 100 \n",
"L 1025 100 \n",
"L 2225 3900 \n",
"L 400 3900 \n",
"L 400 4375 \n",
"L 2775 4375 \n",
"L 2775 3850 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-2e\" d=\"M 1100 100 \n",
"L 525 100 \n",
"L 525 650 \n",
"L 1100 650 \n",
"L 1100 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-32\" d=\"M 2850 100 \n",
"L 300 100 \n",
"L 300 500 \n",
"Q 450 900 712 1237 \n",
"Q 975 1575 1475 2000 \n",
"Q 1850 2325 2012 2600 \n",
"Q 2175 2875 2175 3200 \n",
"Q 2175 3525 2037 3737 \n",
"Q 1900 3950 1600 3950 \n",
"Q 1350 3950 1162 3725 \n",
"Q 975 3500 975 2925 \n",
"L 400 2925 \n",
"Q 425 3650 737 4037 \n",
"Q 1050 4425 1625 4425 \n",
"Q 2175 4425 2475 4087 \n",
"Q 2775 3750 2775 3175 \n",
"Q 2775 2700 2500 2350 \n",
"Q 2225 2000 1825 1650 \n",
"Q 1375 1250 1200 1050 \n",
"Q 1025 850 875 575 \n",
"L 2850 575 \n",
"L 2850 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-25\" d=\"M 1425 3300 \n",
"Q 1425 2575 1212 2375 \n",
"Q 1000 2175 800 2175 \n",
"Q 600 2175 387 2375 \n",
"Q 175 2575 175 3300 \n",
"Q 175 4025 387 4225 \n",
"Q 600 4425 800 4425 \n",
"Q 1000 4425 1212 4225 \n",
"Q 1425 4025 1425 3300 \n",
"z\n",
"M 2650 4350 \n",
"L 725 50 \n",
"L 525 125 \n",
"L 2450 4425 \n",
"L 2650 4350 \n",
"z\n",
"M 3000 1175 \n",
"Q 3000 450 2787 250 \n",
"Q 2575 50 2375 50 \n",
"Q 2175 50 1962 250 \n",
"Q 1750 450 1750 1175 \n",
"Q 1750 1900 1962 2100 \n",
"Q 2175 2300 2375 2300 \n",
"Q 2575 2300 2787 2100 \n",
"Q 3000 1900 3000 1175 \n",
"z\n",
"M 1025 3300 \n",
"Q 1025 3750 975 3900 \n",
"Q 925 4050 800 4050 \n",
"Q 675 4050 625 3900 \n",
"Q 575 3750 575 3300 \n",
"Q 575 2850 625 2700 \n",
"Q 675 2550 800 2550 \n",
"Q 925 2550 975 2700 \n",
"Q 1025 2850 1025 3300 \n",
"z\n",
"M 2600 1175 \n",
"Q 2600 1625 2550 1775 \n",
"Q 2500 1925 2375 1925 \n",
"Q 2250 1925 2200 1775 \n",
"Q 2150 1625 2150 1175 \n",
"Q 2150 725 2200 575 \n",
"Q 2250 425 2375 425 \n",
"Q 2500 425 2550 575 \n",
"Q 2600 725 2600 1175 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-37\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-2e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-32\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-25\" x=\"200\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_3\">\n",
" <!-- 45~49岁 -->\n",
" <g transform=\"translate(169.271628 257.111144) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-35\" d=\"M 2825 1650 \n",
"Q 2825 900 2462 475 \n",
"Q 2100 50 1500 50 \n",
"Q 975 50 637 400 \n",
"Q 300 750 275 1350 \n",
"L 850 1350 \n",
"Q 850 975 1025 750 \n",
"Q 1200 525 1525 525 \n",
"Q 1850 525 2037 800 \n",
"Q 2225 1075 2225 1650 \n",
"Q 2225 2150 2062 2387 \n",
"Q 1900 2625 1625 2625 \n",
"Q 1400 2625 1237 2525 \n",
"Q 1075 2425 925 2175 \n",
"L 425 2175 \n",
"L 575 4375 \n",
"L 2725 4375 \n",
"L 2725 3900 \n",
"L 1050 3900 \n",
"L 950 2750 \n",
"Q 1100 2900 1275 2975 \n",
"Q 1450 3050 1750 3050 \n",
"Q 2225 3050 2525 2687 \n",
"Q 2825 2325 2825 1650 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" <path id=\"SimHei-39\" d=\"M 2825 2300 \n",
"Q 2825 1275 2462 662 \n",
"Q 2100 50 1425 50 \n",
"Q 950 50 662 400 \n",
"Q 375 750 375 1175 \n",
"L 950 1175 \n",
"Q 950 925 1087 725 \n",
"Q 1225 525 1450 525 \n",
"Q 1825 525 2012 950 \n",
"Q 2200 1375 2250 2200 \n",
"Q 2125 1925 1900 1775 \n",
"Q 1675 1625 1400 1625 \n",
"Q 925 1625 625 1975 \n",
"Q 325 2325 325 2975 \n",
"Q 325 3625 625 4025 \n",
"Q 925 4425 1525 4425 \n",
"Q 2125 4425 2475 3925 \n",
"Q 2825 3425 2825 2300 \n",
"z\n",
"M 2200 2875 \n",
"Q 2200 3425 2012 3700 \n",
"Q 1825 3975 1500 3975 \n",
"Q 1275 3975 1100 3762 \n",
"Q 925 3550 925 2975 \n",
"Q 925 2550 1062 2312 \n",
"Q 1200 2075 1500 2075 \n",
"Q 1825 2075 2012 2300 \n",
"Q 2200 2525 2200 2875 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-34\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_4\">\n",
" <!-- 17.5% -->\n",
" <g transform=\"translate(150.177167 230.685437) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-31\" d=\"M 1950 100 \n",
"L 1375 100 \n",
"L 1375 3425 \n",
"L 625 3425 \n",
"L 625 3725 \n",
"Q 1075 3725 1325 3900 \n",
"Q 1575 4075 1650 4425 \n",
"L 1950 4425 \n",
"L 1950 100 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-31\"/>\n",
" <use xlink:href=\"#SimHei-37\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-2e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-25\" x=\"200\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_5\">\n",
" <!-- 35~39岁 -->\n",
" <g transform=\"translate(247.504602 190.659465) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-33\" d=\"M 2825 1300 \n",
"Q 2825 725 2462 387 \n",
"Q 2100 50 1550 50 \n",
"Q 1000 50 637 387 \n",
"Q 275 725 275 1425 \n",
"L 850 1425 \n",
"Q 850 950 1037 737 \n",
"Q 1225 525 1550 525 \n",
"Q 1875 525 2050 725 \n",
"Q 2225 925 2225 1350 \n",
"Q 2225 1700 2037 1900 \n",
"Q 1850 2100 1375 2100 \n",
"L 1375 2525 \n",
"Q 1775 2525 1962 2725 \n",
"Q 2150 2925 2150 3325 \n",
"Q 2150 3625 2012 3800 \n",
"Q 1875 3975 1575 3975 \n",
"Q 1275 3975 1112 3762 \n",
"Q 950 3550 925 3150 \n",
"L 375 3150 \n",
"Q 425 3725 737 4075 \n",
"Q 1050 4425 1575 4425 \n",
"Q 2125 4425 2425 4112 \n",
"Q 2725 3800 2725 3350 \n",
"Q 2725 2925 2575 2687 \n",
"Q 2425 2450 2075 2325 \n",
"Q 2425 2250 2625 1975 \n",
"Q 2825 1700 2825 1300 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-33\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-33\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_6\">\n",
" <!-- 11.3% -->\n",
" <g transform=\"translate(210.62992 179.336412) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-31\"/>\n",
" <use xlink:href=\"#SimHei-31\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-2e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-33\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-25\" x=\"200\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_7\">\n",
" <!-- 50~54岁 -->\n",
" <g transform=\"translate(257.257487 148.207352) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-35\"/>\n",
" <use xlink:href=\"#SimHei-30\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-34\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_8\">\n",
" <!-- 0.6% -->\n",
" <g transform=\"translate(220.66624 146.532506) scale(0.1 -0.1)\">\n",
" <defs>\n",
" <path id=\"SimHei-36\" d=\"M 2850 1550 \n",
"Q 2850 850 2550 450 \n",
"Q 2250 50 1650 50 \n",
"Q 1050 50 700 550 \n",
"Q 350 1050 350 2175 \n",
"Q 350 3200 712 3812 \n",
"Q 1075 4425 1750 4425 \n",
"Q 2225 4425 2512 4075 \n",
"Q 2800 3725 2800 3300 \n",
"L 2225 3300 \n",
"Q 2225 3550 2087 3750 \n",
"Q 1950 3950 1725 3950 \n",
"Q 1350 3950 1150 3562 \n",
"Q 950 3175 925 2375 \n",
"Q 1100 2700 1300 2825 \n",
"Q 1500 2950 1775 2950 \n",
"Q 2250 2950 2550 2575 \n",
"Q 2850 2200 2850 1550 \n",
"z\n",
"M 2250 1550 \n",
"Q 2250 2000 2100 2250 \n",
"Q 1950 2500 1675 2500 \n",
"Q 1350 2500 1162 2250 \n",
"Q 975 2000 975 1650 \n",
"Q 975 1100 1162 800 \n",
"Q 1350 500 1675 500 \n",
"Q 1900 500 2075 725 \n",
"Q 2250 950 2250 1550 \n",
"z\n",
"\" transform=\"scale(0.015625)\"/>\n",
" </defs>\n",
" <use xlink:href=\"#SimHei-30\"/>\n",
" <use xlink:href=\"#SimHei-2e\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-36\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-25\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_9\">\n",
" <!-- 55~59岁 -->\n",
" <g transform=\"translate(257.338866 144.899496) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-35\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-7e\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-35\" x=\"150\"/>\n",
" <use xlink:href=\"#SimHei-39\" x=\"200\"/>\n",
" <use xlink:href=\"#SimHei-5c81\" x=\"250\"/>\n",
" </g>\n",
" </g>\n",
" <g id=\"text_10\">\n",
" <!-- 0.3% -->\n",
" <g transform=\"translate(220.729124 143.976436) scale(0.1 -0.1)\">\n",
" <use xlink:href=\"#SimHei-30\"/>\n",
" <use xlink:href=\"#SimHei-2e\" x=\"50\"/>\n",
" <use xlink:href=\"#SimHei-33\" x=\"100\"/>\n",
" <use xlink:href=\"#SimHei-25\" x=\"150\"/>\n",
" </g>\n",
" </g>\n",
" </g>\n",
" </g>\n",
"</svg>\n"
],
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
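"# Draw a donut-style pie chart\n",
"temp.plot(\n",
"    kind='pie',\n",
"    ylabel='',\n",
"    autopct='%.1f%%',            # compute and display each wedge's percentage\n",
"    wedgeprops={'width': 0.3},   # width of the ring (turns the pie into a donut)\n",
"    pctdistance=0.85,            # distance of the percentage text from the center, as a fraction of the radius\n",
"    labeldistance=1.1,           # distance of the labels from the center, as a fraction of the radius\n",
"    # shadow=True,               # drop shadow\n",
"    # startangle=0,              # starting angle\n",
"    counterclock=True,           # draw the wedges counterclockwise\n",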
")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "e846eec2-6c95-409c-8b15-2b14cab3f57c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"mean 111.849640\n",
"max 140.050000\n",
"min 109.920000\n",
"std 2.481941\n",
"skew 3.485351\n",
"kurt 17.390027\n",
"Name: 积分分值, dtype: float64"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# agg (aggregate): compute several summary statistics in one call\n",
"settle_df.积分分值.agg(['mean', 'max', 'min', 'std', 'skew', 'kurt'])"
]
},
{
"cell_type": "markdown",
"id": "b1669102-1c03-4751-813c-b241a05718e3",
"metadata": {},
"source": [
"Min-max (linear) normalization:\n",
"$$\n",
"x^{\\prime} = \\frac{x - x_{min}}{x_{max} - x_{min}}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "e8d9dca7-b976-43ab-96b8-abefca66cc53",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(140.05, 109.92)"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rescale 积分分值 to the 0-1 range; first fetch the column's max and min\n",
"max_score, min_score = settle_df.积分分值.agg(['max', 'min'])\n",
"max_score, min_score"
]
},
{
"cell_type": "code",
"execution_count": 117,
"id": "10acd550-8422-4934-b38f-03554f86d305",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" <th>年龄</th>\n",
" <th>年龄段</th>\n",
" <th>线性归一化积分</th>\n",
" </tr>\n",
" <tr>\n",
" <th>公示编号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>202300001</th>\n",
" <td>张浩</td>\n",
" <td>1977-02-01</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" <td>45</td>\n",
" <td>45~49岁</td>\n",
" <td>1.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300002</th>\n",
" <td>冯云</td>\n",
" <td>1982-02-01</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" <td>40</td>\n",
" <td>40~44岁</td>\n",
" <td>0.81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300003</th>\n",
" <td>王天东</td>\n",
" <td>1975-01-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" <td>48</td>\n",
" <td>45~49岁</td>\n",
" <td>0.79</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300004</th>\n",
" <td>陈军</td>\n",
" <td>1976-07-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" <td>46</td>\n",
" <td>45~49岁</td>\n",
" <td>0.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300005</th>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06-01</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202305999</th>\n",
" <td>曹恰</td>\n",
" <td>1983-09-01</td>\n",
" <td>首都师范大学科德学院</td>\n",
" <td>109.92</td>\n",
" <td>39</td>\n",
" <td>35~39岁</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306000</th>\n",
" <td>罗佳</td>\n",
" <td>1981-05-01</td>\n",
" <td>厦门方胜众合企业服务有限公司海淀分公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306001</th>\n",
" <td>席盛代</td>\n",
" <td>1983-06-01</td>\n",
" <td>中国华能集团清洁能源技术研究院有限公司</td>\n",
" <td>109.92</td>\n",
" <td>39</td>\n",
" <td>35~39岁</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306002</th>\n",
" <td>彭芸芸</td>\n",
" <td>1981-09-01</td>\n",
" <td>北京汉杰凯德文化传播有限公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306003</th>\n",
" <td>张越</td>\n",
" <td>1982-01-01</td>\n",
" <td>大爱城投资控股有限公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6003 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" 姓名 出生年月 单位名称 积分分值 年龄 年龄段 线性归一化积分\n",
"公示编号 \n",
"202300001 张浩 1977-02-01 北京首钢股份有限公司 140.05 45 45~49岁 1.00\n",
"202300002 冯云 1982-02-01 中国人民解放军空军二十三厂 134.29 40 40~44岁 0.81\n",
"202300003 王天东 1975-01-01 中建二局第三建筑工程有限公司 133.63 48 45~49岁 0.79\n",
"202300004 陈军 1976-07-01 中建二局第三建筑工程有限公司 133.29 46 45~49岁 0.78\n",
"202300005 樊海瑞 1981-06-01 中国民生银行股份有限公司 132.46 41 40~44岁 0.75\n",
"... ... ... ... ... .. ... ...\n",
"202305999 曹恰 1983-09-01 首都师范大学科德学院 109.92 39 35~39岁 0.00\n",
"202306000 罗佳 1981-05-01 厦门方胜众合企业服务有限公司海淀分公司 109.92 41 40~44岁 0.00\n",
"202306001 席盛代 1983-06-01 中国华能集团清洁能源技术研究院有限公司 109.92 39 35~39岁 0.00\n",
"202306002 彭芸芸 1981-09-01 北京汉杰凯德文化传播有限公司 109.92 41 40~44岁 0.00\n",
"202306003 张越 1982-01-01 大爱城投资控股有限公司 109.92 41 40~44岁 0.00\n",
"\n",
"[6003 rows x 7 columns]"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
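"# map: applies the given function to every element of the Series\n",
"# apply: on a Series, likewise calls the given function on every element\n",
"# The same result can be had without a lambda: (settle_df.积分分值 - min_score) / (max_score - min_score)\n",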
"settle_df['线性归一化积分'] = settle_df.积分分值.map(lambda x: (x - min_score) / (max_score - min_score)).round(2)\n",
"settle_df"
]
},
{
"cell_type": "markdown",
"id": "55e57b00-cb9e-4c9e-bc59-e99b738e2f5d",
"metadata": {},
"source": [
"Z-score standardization:\n",
"$$\n",
"x^{\\prime} = \\frac{x - \\mu}{\\sigma}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "b5fc6260-5337-4161-99f0-d7be43d59361",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>姓名</th>\n",
" <th>出生年月</th>\n",
" <th>单位名称</th>\n",
" <th>积分分值</th>\n",
" <th>年龄</th>\n",
" <th>年龄段</th>\n",
" <th>线性归一化积分</th>\n",
" <th>zscore评分</th>\n",
" </tr>\n",
" <tr>\n",
" <th>公示编号</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>202300001</th>\n",
" <td>张浩</td>\n",
" <td>1977-02-01</td>\n",
" <td>北京首钢股份有限公司</td>\n",
" <td>140.05</td>\n",
" <td>45</td>\n",
" <td>45~49岁</td>\n",
" <td>1.00</td>\n",
" <td>11.362219</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300002</th>\n",
" <td>冯云</td>\n",
" <td>1982-02-01</td>\n",
" <td>中国人民解放军空军二十三厂</td>\n",
" <td>134.29</td>\n",
" <td>40</td>\n",
" <td>40~44岁</td>\n",
" <td>0.81</td>\n",
" <td>9.041455</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300003</th>\n",
" <td>王天东</td>\n",
" <td>1975-01-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.63</td>\n",
" <td>48</td>\n",
" <td>45~49岁</td>\n",
" <td>0.79</td>\n",
" <td>8.775534</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300004</th>\n",
" <td>陈军</td>\n",
" <td>1976-07-01</td>\n",
" <td>中建二局第三建筑工程有限公司</td>\n",
" <td>133.29</td>\n",
" <td>46</td>\n",
" <td>45~49岁</td>\n",
" <td>0.78</td>\n",
" <td>8.638545</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202300005</th>\n",
" <td>樊海瑞</td>\n",
" <td>1981-06-01</td>\n",
" <td>中国民生银行股份有限公司</td>\n",
" <td>132.46</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.75</td>\n",
" <td>8.304129</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202305999</th>\n",
" <td>曹恰</td>\n",
" <td>1983-09-01</td>\n",
" <td>首都师范大学科德学院</td>\n",
" <td>109.92</td>\n",
" <td>39</td>\n",
" <td>35~39岁</td>\n",
" <td>0.00</td>\n",
" <td>-0.777472</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306000</th>\n",
" <td>罗佳</td>\n",
" <td>1981-05-01</td>\n",
" <td>厦门方胜众合企业服务有限公司海淀分公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" <td>-0.777472</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306001</th>\n",
" <td>席盛代</td>\n",
" <td>1983-06-01</td>\n",
" <td>中国华能集团清洁能源技术研究院有限公司</td>\n",
" <td>109.92</td>\n",
" <td>39</td>\n",
" <td>35~39岁</td>\n",
" <td>0.00</td>\n",
" <td>-0.777472</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306002</th>\n",
" <td>彭芸芸</td>\n",
" <td>1981-09-01</td>\n",
" <td>北京汉杰凯德文化传播有限公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" <td>-0.777472</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202306003</th>\n",
" <td>张越</td>\n",
" <td>1982-01-01</td>\n",
" <td>大爱城投资控股有限公司</td>\n",
" <td>109.92</td>\n",
" <td>41</td>\n",
" <td>40~44岁</td>\n",
" <td>0.00</td>\n",
" <td>-0.777472</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>6003 rows × 8 columns</p>\n",
"</div>"
],
"text/plain": [
" 姓名 出生年月 单位名称 积分分值 年龄 年龄段 线性归一化积分 \\\n",
"公示编号 \n",
"202300001 张浩 1977-02-01 北京首钢股份有限公司 140.05 45 45~49岁 1.00 \n",
"202300002 冯云 1982-02-01 中国人民解放军空军二十三厂 134.29 40 40~44岁 0.81 \n",
"202300003 王天东 1975-01-01 中建二局第三建筑工程有限公司 133.63 48 45~49岁 0.79 \n",
"202300004 陈军 1976-07-01 中建二局第三建筑工程有限公司 133.29 46 45~49岁 0.78 \n",
"202300005 樊海瑞 1981-06-01 中国民生银行股份有限公司 132.46 41 40~44岁 0.75 \n",
"... ... ... ... ... .. ... ... \n",
"202305999 曹恰 1983-09-01 首都师范大学科德学院 109.92 39 35~39岁 0.00 \n",
"202306000 罗佳 1981-05-01 厦门方胜众合企业服务有限公司海淀分公司 109.92 41 40~44岁 0.00 \n",
"202306001 席盛代 1983-06-01 中国华能集团清洁能源技术研究院有限公司 109.92 39 35~39岁 0.00 \n",
"202306002 彭芸芸 1981-09-01 北京汉杰凯德文化传播有限公司 109.92 41 40~44岁 0.00 \n",
"202306003 张越 1982-01-01 大爱城投资控股有限公司 109.92 41 40~44岁 0.00 \n",
"\n",
" zscore评分 \n",
"公示编号 \n",
"202300001 11.362219 \n",
"202300002 9.041455 \n",
"202300003 8.775534 \n",
"202300004 8.638545 \n",
"202300005 8.304129 \n",
"... ... \n",
"202305999 -0.777472 \n",
"202306000 -0.777472 \n",
"202306001 -0.777472 \n",
"202306002 -0.777472 \n",
"202306003 -0.777472 \n",
"\n",
"[6003 rows x 8 columns]"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# mu is the mean and sigma the sample standard deviation (pandas std defaults to ddof=1)\n",
"mu, sigma = settle_df.积分分值.agg(['mean', 'std'])\n",
"settle_df['zscore评分'] = settle_df.积分分值.apply(lambda x: (x - mu) / sigma)\n",
"settle_df"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}