{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c664c108-059f-402a-b216-5ba4caa2d98b",
   "metadata": {},
   "source": [
    "## Python Data Analysis: Day 1\n",
    "\n",
    "### Warm-up Exercise\n",
    "\n",
    "The lists below hold our company's sales (in millions of yuan) for five sales regions (Nanjing, Wuxi, Suzhou, Xuzhou, Nantong) from January to December 2022. The month and region labels in the code are kept in Chinese (月 means month). Use the data to complete the following tasks:\n",
    "\n",
    "```python\n",
    "sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n",
    "sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n",
    "sales_data = [\n",
    "    [32, 17, 12, 20, 28],\n",
    "    [41, 30, 17, 15, 35],\n",
    "    [35, 18, 13, 11, 24],\n",
    "    [12, 42, 44, 21, 34],\n",
    "    [29, 11, 42, 32, 50],\n",
    "    [10, 15, 11, 12, 26],\n",
    "    [16, 28, 48, 22, 28],\n",
    "    [31, 40, 45, 30, 39],\n",
    "    [25, 41, 47, 42, 47],\n",
    "    [47, 21, 13, 49, 48],\n",
    "    [41, 36, 17, 36, 22],\n",
    "    [22, 25, 15, 20, 37]\n",
    "]\n",
    "```\n",
    "\n",
    "1. Compute the company's total sales for each month.\n",
    "2. Compute the month-over-month change in the company's sales.\n",
    "3. Compute each sales region's total sales for the year.\n",
    "4. Sort the sales regions and their totals from highest to lowest.\n",
    "5. Find the month and region in which the year's highest sales figure occurred.\n",
    "6. Find the sales region whose performance is the least stable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9d87cfc-deb0-46eb-b98c-2799a4908bc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Month labels ' 1月' .. '12月' and the five region names (kept in Chinese)\n",
    "sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n",
    "sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n",
    "# One row per month, one column per region, in millions of yuan\n",
    "sales_data = [\n",
    "    [32, 17, 12, 20, 28],\n",
    "    [41, 30, 17, 15, 35],\n",
    "    [35, 18, 13, 11, 24],\n",
    "    [12, 42, 44, 21, 34],\n",
    "    [29, 11, 42, 32, 50],\n",
    "    [10, 15, 11, 12, 26],\n",
    "    [16, 28, 48, 22, 28],\n",
    "    [31, 40, 45, 30, 39],\n",
    "    [25, 41, 47, 42, 47],\n",
    "    [47, 21, 13, 49, 48],\n",
    "    [41, 36, 17, 36, 22],\n",
    "    [22, 25, 15, 20, 37]\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc581dfc-9108-46fa-ace2-60ace650434e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Magic command - %whos - list the variables in the namespace\n",
    "%whos"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a50e4c3e-6dc1-426f-977b-aef9a5c9a02f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Deliberately shadow the built-in print with a variable\n",
    "print = 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c0b54ca-1556-4a14-9a6a-b6bd6af5d822",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Magic command - %xdel - delete a variable\n",
    "%xdel print"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe8eb05f-f45b-491a-b98e-6f6c924997ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Compute the company's total sales for each month.\n",
    "monthly_sales = []\n",
    "for i, month in enumerate(sales_month):\n",
    "    monthly_sales.append(sum(sales_data[i]))\n",
    "    print(f'{month} sales: {monthly_sales[i]} million')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53e6bf88-e6a9-4ac9-a7fe-bd1d18ff88f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2. Compute the month-over-month change in the company's sales.\n",
    "for i in range(1, len(monthly_sales)):\n",
    "    temp = (monthly_sales[i] - monthly_sales[i - 1]) / monthly_sales[i - 1]\n",
    "    print(f'{sales_month[i]}: {temp:.2%}')"
   ]
  },
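  {
   "cell_type": "markdown",
   "id": "mom-change-definition",
   "metadata": {},
   "source": [
    "For reference, the month-over-month change computed above can be written as follows, where $S_{t}$ (total sales in month $t$) and $r_{t}$ are notation introduced here for illustration:\n",
    "$$\n",
    "r_{t} = \\frac{S_{t} - S_{t-1}}{S_{t-1}}\n",
    "$$\n",
    "\n",
    "As a worked check with the data above, January totals $32 + 17 + 12 + 20 + 28 = 109$ and February totals $41 + 30 + 17 + 15 + 35 = 138$, so\n",
    "$$\n",
    "r_{2} = \\frac{138 - 109}{109} = \\frac{29}{109} \\approx 26.61\\%\n",
    "$$\n",
    "\n",
    "January has no preceding month, so it has no month-over-month value."
   ]
  },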
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5a130d6-b781-4ee3-a96b-d1fe5e3b4b90",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3. Compute each sales region's total sales for the year.\n",
    "arealy_sales = {}\n",
    "for j, area in enumerate(sales_area):\n",
    "    temp = [sales_data[i][j] for i in range(len(sales_month))]\n",
    "    arealy_sales[area] = sum(temp)\n",
    "    print(f'{area}: {arealy_sales[area]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7bd0510-5e68-4e58-ac3b-6c531f7abccb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 4. Sort the sales regions and their totals from highest to lowest.\n",
    "sorted_keys = sorted(arealy_sales, key=lambda x: arealy_sales[x], reverse=True)\n",
    "for key in sorted_keys:\n",
    "    print(f'{key}: {arealy_sales[key]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4b2f3e8-c5c2-481e-b277-9623d30892ac",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 5. Find the month and region in which the year's highest sales figure occurred.\n",
    "max_value = sales_data[0][0]\n",
    "max_i, max_j = 0, 0\n",
    "for i in range(len(sales_month)):\n",
    "    for j in range(len(sales_area)):\n",
    "        temp = sales_data[i][j]\n",
    "        if temp > max_value:\n",
    "            max_value = temp\n",
    "            max_i, max_j = i, j\n",
    "print(sales_month[max_i], sales_area[max_j])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "647d0a87-b672-4e0c-81cc-a3bbb76dca11",
   "metadata": {},
   "source": [
    "Population variance:\n",
    "$$\n",
    "\\sigma^{2} = \\frac{1}{N} \\sum_{i=1}^{N}(x_{i} - \\mu)^{2}\n",
    "$$\n",
    "\n",
    "Sample variance:\n",
    "$$\n",
    "s^{2} = \\frac{1}{n - 1} \\sum_{i=1}^{n}(x_{i} - \\bar{x})^{2}\n",
    "$$"
   ]
  },
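  {
   "cell_type": "markdown",
   "id": "variance-relation-note",
   "metadata": {},
   "source": [
    "As a side note (a short derivation added for illustration, assuming both formulas are applied to the same $n = N$ values, so that $\\bar{x} = \\mu$), the two variances differ only by a constant factor:\n",
    "$$\n",
    "s^{2} = \\frac{1}{N - 1} \\sum_{i=1}^{N}(x_{i} - \\mu)^{2} = \\frac{N}{N - 1} \\cdot \\frac{1}{N} \\sum_{i=1}^{N}(x_{i} - \\mu)^{2} = \\frac{N}{N - 1} \\sigma^{2}\n",
    "$$\n",
    "\n",
    "With 12 monthly values per region, $s^{2} = \\frac{12}{11} \\sigma^{2}$, so ranking the regions by either variance gives the same order."
   ]
  },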
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b43fb247-32fc-4e10-a9ee-488fd1f56a9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 6. Find the sales region whose performance is the least stable.\n",
    "import statistics as stats\n",
    "\n",
    "# Population variance of each region's 12 monthly values\n",
    "arealy_vars = []\n",
    "for j, area in enumerate(sales_area):\n",
    "    temp = [sales_data[i][j] for i in range(len(sales_month))]\n",
    "    arealy_vars.append(stats.pvariance(temp))\n",
    "# The region with the largest variance is the least stable\n",
    "sales_area[arealy_vars.index(max(arealy_vars))]"
   ]
  },
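  {
   "cell_type": "markdown",
   "id": "3ea677d0-7a33-43e5-b10b-ddfcb82f7f6a",
   "metadata": {},
   "source": [
    "### The Three Core Libraries and Their Companions\n",
    "\n",
    "1. numpy - Numerical Python - Its core is the `ndarray` type, which represents N-dimensional arrays; it provides a range of operations, functions, and methods for processing data.\n",
    "2. pandas - Panel Data - Wraps the types, functions, and many methods related to data analysis (loading, reshaping, cleaning, preprocessing, pivoting, presentation), offering a one-stop solution for data analysis. Its three core data types are `Series`, `DataFrame`, and `Index`.\n",
    "3. matplotlib - Wraps the commonly used statistical charts and helps us present data.\n",
    "4. scipy - Scientific Python - Complements NumPy well, providing functions and methods for advanced numerical computation.\n",
    "5. scikit-learn - Wraps common machine learning algorithms (classification, clustering, regression, etc.) and also provides functions and methods for data preprocessing, feature engineering, and model validation.\n",
    "6. sympy - Symbolic Python - Wraps operations for symbolic computation."
   ]
  },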
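  {
   "cell_type": "markdown",
   "id": "core-types-sketch-note",
   "metadata": {},
   "source": [
    "A minimal sketch of the core types named in the list above. It uses made-up demo values rather than the sales data, and the `demo_` names are hypothetical, introduced only for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "core-types-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical demo values, not part of the sales data\n",
    "demo_array = np.array([[1, 2], [3, 4]])  # ndarray: an N-dimensional array\n",
    "demo_series = pd.Series([10, 20], index=['a', 'b'])  # Series: labelled 1-D data\n",
    "demo_frame = pd.DataFrame(demo_array, columns=['x', 'y'])  # DataFrame: labelled 2-D data\n",
    "\n",
    "print(demo_array.ndim, demo_array.shape)  # number of axes and size along each axis\n",
    "print(type(demo_series.index).__name__)  # row labels are stored in an Index object\n",
    "print(demo_frame.columns.tolist())  # column labels are held in an Index as well"
   ]
  },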
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0db758cc-d83c-47c4-9a0b-c7ef5abd6c18",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Magic command - %pip - invoke the pip package manager\n",
    "# %pip install numpy pandas matplotlib openpyxl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8eb6970b-3907-4b84-af60-67cbf67f2e74",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Put SimHei first in the sans-serif list so Chinese labels can be rendered\n",
    "plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n",
    "# Use the ASCII hyphen for minus signs so they display with this font\n",
    "plt.rcParams['axes.unicode_minus'] = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fb76dec-cd51-4e79-9bd2-3b210ae20522",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6369df9-7577-496c-bfc1-2fce096c0162",
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb5733cd-38f7-4afd-b45b-70c1439ab36b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert the nested list into a 2-D array\n",
    "data = np.array(sales_data)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da304104-8cf0-4425-b3b4-dcb148ac4b3a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sum along axis 1 (total sales per month)\n",
    "data.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1507ac63-f53b-4e36-a7fb-b9c636fd81ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sum along axis 0 (total sales per region)\n",
    "data.sum(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26be450d-44ba-4d83-9351-c52a13c2c338",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Population variance per region (ddof defaults to 0)\n",
    "data.var(axis=0).round(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81e5b2a0-c86e-4720-909f-ce8b1b6fdd58",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sample variance per region (ddof=1 divides by n - 1)\n",
    "data.var(axis=0, ddof=1).round(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba4e0f0a-e711-4041-8834-1e3be86ce8a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build a DataFrame (for two-dimensional data)\n",
    "df = pd.DataFrame(data, columns=sales_area, index=sales_month)\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d1a6a43-6dfc-41e3-98c8-be2681e0d547",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sum (along axis 0 by default)\n",
    "df.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a478ec0e-499f-4e31-b8c2-ba45e691b834",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort the region totals in descending order\n",
    "df.sum().sort_values(ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f221833-855c-45ad-91b2-e3f4da627704",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sum along axis 1 (specified explicitly)\n",
    "df.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80df8865-4ea0-4c72-a581-215cd953cfbe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the month-over-month change\n",
    "df.sum(axis=1).pct_change()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea4579c3-11cd-4179-9c96-8dbe9a033da2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add a total column ('合计') and a month-over-month column ('月环比')\n",
    "df['合计'] = df.sum(axis=1)\n",
    "df['月环比'] = df['合计'].pct_change()\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c660052-dded-4a0a-8b72-7747d3cae816",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Render the DataFrame with styling\n",
    "df.style.format(\n",
    "    formatter={'月环比': '{:.2%}'},\n",
    "    na_rep='------'\n",
    ").bar(\n",
    "    subset='合计'\n",
    ").background_gradient(\n",
    "    'RdYlBu', subset='月环比'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a092c12c-dab6-4272-b1cd-5218998fcd90",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the DataFrame to an Excel file\n",
    "df.to_excel('sales.xlsx', sheet_name='data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54c3f505-e866-4c4e-a3f8-f55a71a95c3f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Magic command - %config - change configuration\n",
    "# %config InlineBackend.figure_format = 'svg'\n",
    "get_ipython().run_line_magic('config', 'InlineBackend.figure_format = \"svg\"')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3951055d-d5d2-4e4e-bbe7-a1b40a6731e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Draw a bar chart of the monthly totals\n",
    "plt.figure(figsize=(8, 4), dpi=200)\n",
    "df.plot(ax=plt.gca(), kind='bar', y='合计', legend=False)\n",
    "plt.xticks(rotation=0)\n",
    "plt.savefig('aa.png')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a5236f7-072b-466c-9be3-afbab394f5cb",
   "metadata": {},
   "source": [
    "### Magic Commands"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5c6a18b-2863-4855-8ef7-2c0aa99b7d5c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show the current working directory - print working directory\n",
    "%pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80a9f9e0-1528-40cf-910c-f3c8e5e7e3b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List the files in a directory - list directory contents\n",
    "%ls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "620a54ed-9c29-4058-9d20-c4df72ba4c62",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run a system command\n",
    "%system date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "659215ed-113a-4d8f-9036-0fcf47c96021",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save previously executed code to a file\n",
    "%save temp.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fc9c4e4-1423-40f3-b4ee-db2ba2e5d125",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the contents of a file into the cell\n",
    "%load temp.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58a08283-561c-43d4-8db6-74cde401b8a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Time the execution of a statement\n",
    "%timeit (1, 2, 3, 4, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22a271ab-3f5c-4167-b89e-66a31e891cbd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show the input history\n",
    "%hist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4ffa792-f1a0-4be9-b2aa-642ee0b9a1ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List the available magic commands\n",
    "%lsmagic"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a15db907-c068-41d7-a24c-8f1c5c20d4ec",
   "metadata": {},
   "source": [
    "### Getting Help"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e037694-9357-46b9-864a-c5f93e1aa8c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11a97abd-d73d-493e-b727-9c4ded3e5060",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.normal?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66503921-cd69-4394-80ea-7fecf6ecdc33",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.r*?"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}