Python-100-Days/Day66-80/code/day01.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "c664c108-059f-402a-b216-5ba4caa2d98b",
"metadata": {},
"source": [
"## Python数据分析第1天\n",
"\n",
"### 热身练习\n",
"\n",
"如下列表保存着本公司从2022年1月到12月五个销售区域南京、无锡、苏州、徐州、南通的销售额以百万元为单位请利用这些数据完成以下操作\n",
"\n",
"```python\n",
"sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n",
"sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n",
"sales_data = [\n",
" [32, 17, 12, 20, 28],\n",
" [41, 30, 17, 15, 35],\n",
" [35, 18, 13, 11, 24],\n",
" [12, 42, 44, 21, 34],\n",
" [29, 11, 42, 32, 50],\n",
" [10, 15, 11, 12, 26],\n",
" [16, 28, 48, 22, 28],\n",
" [31, 40, 45, 30, 39],\n",
" [25, 41, 47, 42, 47],\n",
" [47, 21, 13, 49, 48],\n",
" [41, 36, 17, 36, 22],\n",
" [22, 25, 15, 20, 37]\n",
"]\n",
"```\n",
"\n",
"1. 统计本公司每个月的销售额。\n",
"2. 统计本公司销售额的月环比。\n",
"3. 统计每个销售区域全年的销售额。\n",
"4. 按销售额从高到低排序销售区域及其销售额。\n",
"5. 统计全年最高的销售额出现在哪个月哪个区域。\n",
"6. 找出哪个销售区域的业绩最不稳定。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f9d87cfc-deb0-46eb-b98c-2799a4908bc8",
"metadata": {},
"outputs": [],
"source": [
"sales_month = [f'{i:>2d}月' for i in range(1, 13)]\n",
"sales_area = ['南京', '无锡', '苏州', '徐州', '南通']\n",
"sales_data = [\n",
" [32, 17, 12, 20, 28],\n",
" [41, 30, 17, 15, 35],\n",
" [35, 18, 13, 11, 24],\n",
" [12, 42, 44, 21, 34],\n",
" [29, 11, 42, 32, 50],\n",
" [10, 15, 11, 12, 26],\n",
" [16, 28, 48, 22, 28],\n",
" [31, 40, 45, 30, 39],\n",
" [25, 41, 47, 42, 47],\n",
" [47, 21, 13, 49, 48],\n",
" [41, 36, 17, 36, 22],\n",
" [22, 25, 15, 20, 37]\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc581dfc-9108-46fa-ace2-60ace650434e",
"metadata": {},
"outputs": [],
"source": [
"# 魔法指令 - %whos - 查看变量\n",
"%whos"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a50e4c3e-6dc1-426f-977b-aef9a5c9a02f",
"metadata": {},
"outputs": [],
"source": [
"print = 100"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4c0b54ca-1556-4a14-9a6a-b6bd6af5d822",
"metadata": {},
"outputs": [],
"source": [
"# 魔法指令 - %xdel - 删除变量\n",
"%xdel print"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe8eb05f-f45b-491a-b98e-6f6c924997ff",
"metadata": {},
"outputs": [],
"source": [
"# 1. 统计本公司每个月的销售额。\n",
"monthly_sales = []\n",
"for i, month in enumerate(sales_month):\n",
" monthly_sales.append(sum(sales_data[i]))\n",
" print(f'{month}销售额: {monthly_sales[i]}百万')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53e6bf88-e6a9-4ac9-a7fe-bd1d18ff88f5",
"metadata": {},
"outputs": [],
"source": [
"# 2. 统计本公司销售额的月环比。\n",
"for i in range(1, len(monthly_sales)):\n",
" temp = (monthly_sales[i] - monthly_sales[i - 1]) / monthly_sales[i - 1]\n",
" print(f'{sales_month[i]}: {temp:.2%}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5a130d6-b781-4ee3-a96b-d1fe5e3b4b90",
"metadata": {},
"outputs": [],
"source": [
"# 3. 统计每个销售区域全年的销售额。\n",
"arealy_sales = {}\n",
"for j, area in enumerate(sales_area):\n",
" temp = [sales_data[i][j] for i in range(len(sales_month))]\n",
" arealy_sales[area] = sum(temp)\n",
" print(f'{area}: {arealy_sales[area]}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7bd0510-5e68-4e58-ac3b-6c531f7abccb",
"metadata": {},
"outputs": [],
"source": [
"# 4. 按销售额从高到低排序销售区域及其销售额。\n",
"sorted_keys = sorted(arealy_sales, key=lambda x: arealy_sales[x], reverse=True)\n",
"for key in sorted_keys:\n",
" print(f'{key}: {arealy_sales[key]}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4b2f3e8-c5c2-481e-b277-9623d30892ac",
"metadata": {},
"outputs": [],
"source": [
"# 5. 统计全年最高的销售额出现在哪个月哪个区域。\n",
"max_value = sales_data[0][0]\n",
"max_i, max_j = 0, 0\n",
"for i in range(len(sales_month)):\n",
" for j in range(len(sales_area)):\n",
" temp = sales_data[i][j]\n",
" if temp > max_value:\n",
" max_value = temp\n",
" max_i, max_j = i, j\n",
"print(sales_month[max_i], sales_area[max_j])"
]
},
{
"cell_type": "markdown",
"id": "647d0a87-b672-4e0c-81cc-a3bbb76dca11",
"metadata": {},
"source": [
"总体方差:\n",
"$$\n",
"\\sigma^{2} = \\frac{1}{N} \\sum_{i=1}^{N}(x_{i} - \\mu)^{2}\n",
"$$\n",
"\n",
"样本方差:\n",
"$$\n",
"s^{2} = \\frac{1}{n - 1} \\sum_{i=1}^{n}(x_{i} - \\bar{x})^{2}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b43fb247-32fc-4e10-a9ee-488fd1f56a9a",
"metadata": {},
"outputs": [],
"source": [
"# 6. 找出哪个销售区域的业绩最不稳定。\n",
"import statistics as stats\n",
"\n",
"arealy_vars = []\n",
"for j, area in enumerate(sales_area):\n",
" temp = [sales_data[i][j] for i in range(len(sales_month))]\n",
" arealy_vars.append(stats.pvariance(temp))\n",
"sales_area[arealy_vars.index(max(arealy_vars))]"
]
},
{
"cell_type": "markdown",
"id": "3ea677d0-7a33-43e5-b10b-ddfcb82f7f6a",
"metadata": {},
"source": [
"### 三大神器\n",
"\n",
"1. numpy - Numerical Python - 核心是`ndarray`类型可以用来表示N维数组提供了一系列处理数据的运算、函数和方法。\n",
"2. pandas - Panel Data Set - 封装了和数据分析(加载、重塑、清洗、预处理、透视、呈现)相关的类型、函数和诸多的方法,为数据分析提供了一站式解决方案。它的核心有三个数据类型,分别是:`Series`、`DataFrame`、`Index`。\n",
"3. matplotlib - 封装了各种常用的统计图表,帮助我们实现数据呈现。\n",
"4. scipy - Scientific Python - 针对NumPy进行了很好的补充提供了高级的数据运算的函数和方法。\n",
"5. scikit-learn - 封装了常用的机器学习(分类、聚类、回归等)算法,除此之外,还提供了数据预处理、特征工程、模型验证相关的函数和方法。\n",
"6. sympy - Symbolic Python - 封装了符号运算相关操作。"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0db758cc-d83c-47c4-9a0b-c7ef5abd6c18",
"metadata": {},
"outputs": [],
"source": [
"# 魔法指令 - %pip - 调用包管理工具pip\n",
"# %pip install numpy pandas matplotlib openpyxl"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8eb6970b-3907-4b84-af60-67cbf67f2e74",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rcParams['font.sans-serif'].insert(0, 'SimHei')\n",
"plt.rcParams['axes.unicode_minus'] = False"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fb76dec-cd51-4e79-9bd2-3b210ae20522",
"metadata": {},
"outputs": [],
"source": [
"np.__version__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6369df9-7577-496c-bfc1-2fce096c0162",
"metadata": {},
"outputs": [],
"source": [
"pd.__version__"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb5733cd-38f7-4afd-b45b-70c1439ab36b",
"metadata": {},
"outputs": [],
"source": [
"# 将嵌套列表处理成二维数组\n",
"data = np.array(sales_data)\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da304104-8cf0-4425-b3b4-dcb148ac4b3a",
"metadata": {},
"outputs": [],
"source": [
"# 沿着1轴求和每个月的销售额\n",
"data.sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1507ac63-f53b-4e36-a7fb-b9c636fd81ea",
"metadata": {},
"outputs": [],
"source": [
"# 沿着0轴求和每个区域的销售\n",
"data.sum(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26be450d-44ba-4d83-9351-c52a13c2c338",
"metadata": {},
"outputs": [],
"source": [
"# 总体方差\n",
"data.var(axis=0).round(1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "81e5b2a0-c86e-4720-909f-ce8b1b6fdd58",
"metadata": {},
"outputs": [],
"source": [
"# 样本方差\n",
"data.var(axis=0, ddof=1).round(1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba4e0f0a-e711-4041-8834-1e3be86ce8a4",
"metadata": {},
"outputs": [],
"source": [
"# 构造DataFrame对象处理二维数据\n",
"df = pd.DataFrame(data, columns=sales_area, index=sales_month)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d1a6a43-6dfc-41e3-98c8-be2681e0d547",
"metadata": {},
"outputs": [],
"source": [
"# 求和默认沿着0轴\n",
"df.sum()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a478ec0e-499f-4e31-b8c2-ba45e691b834",
"metadata": {},
"outputs": [],
"source": [
"# 排序\n",
"df.sum().sort_values(ascending=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6f221833-855c-45ad-91b2-e3f4da627704",
"metadata": {},
"outputs": [],
"source": [
"# 求和指定沿着1轴\n",
"df.sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80df8865-4ea0-4c72-a581-215cd953cfbe",
"metadata": {},
"outputs": [],
"source": [
"# 计算月环比\n",
"df.sum(axis=1).pct_change()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea4579c3-11cd-4179-9c96-8dbe9a033da2",
"metadata": {},
"outputs": [],
"source": [
"df['合计'] = df.sum(axis=1)\n",
"df['月环比'] = df['合计'].pct_change()\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3c660052-dded-4a0a-8b72-7747d3cae816",
"metadata": {},
"outputs": [],
"source": [
"# 渲染DataFrame\n",
"df.style.format(\n",
" formatter={'月环比': '{:.2%}'},\n",
" na_rep='------'\n",
").bar(\n",
" subset='合计'\n",
").background_gradient(\n",
" 'RdYlBu', subset='月环比'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a092c12c-dab6-4272-b1cd-5218998fcd90",
"metadata": {},
"outputs": [],
"source": [
"# 将DataFrame输出到Excel文件\n",
"df.to_excel('sales.xlsx', sheet_name='data')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "54c3f505-e866-4c4e-a3f8-f55a71a95c3f",
"metadata": {},
"outputs": [],
"source": [
"# 魔法指令 - %config - 修改配置\n",
"# %config InlineBackend.figure_format = 'svg'\n",
"get_ipython().run_line_magic('config', 'InlineBackend.figure_format = \"svg\"')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3951055d-d5d2-4e4e-bbe7-a1b40a6731e0",
"metadata": {},
"outputs": [],
"source": [
"# 绘制柱状图\n",
"plt.figure(figsize=(8, 4), dpi=200)\n",
"df.plot(ax=plt.gca(), kind='bar', y='合计', legend=False)\n",
"plt.xticks(rotation=0)\n",
"plt.savefig('aa.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "8a5236f7-072b-466c-9be3-afbab394f5cb",
"metadata": {},
"source": [
"### 魔法指令"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5c6a18b-2863-4855-8ef7-2c0aa99b7d5c",
"metadata": {},
"outputs": [],
"source": [
"# 查看当前工作路径 - print working directory\n",
"%pwd"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "80a9f9e0-1528-40cf-910c-f3c8e5e7e3b9",
"metadata": {},
"outputs": [],
"source": [
"# 查看指定路径文件列表 - list directory contents\n",
"%ls"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "620a54ed-9c29-4058-9d20-c4df72ba4c62",
"metadata": {},
"outputs": [],
"source": [
"# 执行系统命令\n",
"%system date"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "659215ed-113a-4d8f-9036-0fcf47c96021",
"metadata": {},
"outputs": [],
"source": [
"# 保存运行过的代码\n",
"%save temp.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8fc9c4e4-1423-40f3-b4ee-db2ba2e5d125",
"metadata": {},
"outputs": [],
"source": [
"# 加载指定文件内容\n",
"%load temp.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58a08283-561c-43d4-8db6-74cde401b8a9",
"metadata": {},
"outputs": [],
"source": [
"# 统计代码执行时间\n",
"%timeit (1, 2, 3, 4, 5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22a271ab-3f5c-4167-b89e-66a31e891cbd",
"metadata": {},
"outputs": [],
"source": [
"# 查看历史输入\n",
"%hist"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4ffa792-f1a0-4be9-b2aa-642ee0b9a1ae",
"metadata": {},
"outputs": [],
"source": [
"# 查看魔法指令\n",
"%lsmagic"
]
},
{
"cell_type": "markdown",
"id": "a15db907-c068-41d7-a24c-8f1c5c20d4ec",
"metadata": {},
"source": [
"### 获取帮助"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e037694-9357-46b9-864a-c5f93e1aa8c8",
"metadata": {},
"outputs": [],
"source": [
"np.random?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11a97abd-d73d-493e-b727-9c4ded3e5060",
"metadata": {},
"outputs": [],
"source": [
"np.random.normal?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66503921-cd69-4394-80ea-7fecf6ecdc33",
"metadata": {},
"outputs": [],
"source": [
"np.random.r*?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}