mirror of https://github.com/kubeflow/examples.git
2269 lines
95 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# JPX Tokyo Stock Exchange Kale Pipeline\n",
|
||
"\n",
"The [Kaggle competition](https://www.kaggle.com/competitions/jpx-tokyo-stock-exchange-prediction/overview) is described as follows:\n",
|
||
"\n",
|
||
">Japan Exchange Group, Inc. (JPX) is a holding company operating one of the largest stock exchanges in the world, Tokyo Stock Exchange (TSE), and derivatives exchanges Osaka Exchange (OSE) and Tokyo Commodity Exchange (TOCOM). JPX is hosting this competition and is supported by AI technology company AlpacaJapan Co.,Ltd.\n",
|
||
"\n",
|
||
"> In this competition, you will model real future returns of around 2,000 stocks. The competition will involve building portfolios from the stocks eligible for predictions. The stocks are ranked from highest to lowest expected returns and they are evaluated on the difference in returns between the top and bottom 200 stocks."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Install necessary packages\n",
|
||
"\n",
"We can install the necessary packages either by running `pip install --user <package_name>` for each one, or by listing everything in a `requirements.txt` file and running `pip install --user -r requirements.txt`. We have put the dependencies in a `requirements.txt` file, so we will use the latter method.\n",
|
||
"\n",
|
||
"> NOTE: Do not forget to use the `--user` argument. It is necessary if you want to use Kale to transform this notebook into a Kubeflow pipeline.",
|
||
"\n",
|
||
"After installing python packages, restart notebook kernel before proceeding.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"papermill": {
|
||
"duration": 1.321604,
|
||
"end_time": "2022-04-17T07:17:04.141763",
|
||
"exception": false,
|
||
"start_time": "2022-04-17T07:17:02.820159",
|
||
"status": "completed"
|
||
},
|
||
"tags": [
|
||
"skip"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# After installation, restart the kernel.\n",
|
||
"!pip install -r requirements.txt --user --quiet"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Imports\n",
|
||
"\n",
|
||
"In this section we import the packages we need for this example. Make it a habit to gather your imports in a single place. It will make your life easier if you are going to transform this notebook into a Kubeflow pipeline using Kale."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"tags": [
|
||
"imports"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import sys, os, subprocess\n",
|
||
"from tqdm import tqdm\n",
|
||
"import numpy as np\n",
|
||
"import pandas as pd\n",
|
||
"from scipy import stats\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"import zipfile\n",
|
||
"import joblib\n",
|
||
"\n",
|
||
"from lightgbm import LGBMRegressor\n",
|
||
"from sklearn.metrics import mean_squared_error\n",
|
||
"pd.set_option('display.max_columns', 500)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Project hyper-parameters\n",
|
||
"\n",
|
||
"In this cell, we define the different hyper-parameters. Defining them in one place makes it easier to experiment with their values and also facilitates the execution of HP Tuning experiments using Kale and Katib."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {
|
||
"tags": [
|
||
"pipeline-parameters"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Hyper-parameters\n",
|
||
"LR = 0.379687157316759\n",
|
||
"N_EST = 100"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
"Set the random seed for reproducibility."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {
|
||
"tags": [
|
||
"skip"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"np.random.seed(2022)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Download and load the dataset\n",
|
||
"\n",
"In this section, we download the data from Kaggle and get it into a form the model can use.\n",
|
||
"\n",
|
||
"First, let us load and analyze the data.\n",
|
||
"\n",
"The data are in CSV format, so we use the pandas `read_csv` function. There is one train data set and two test sets (one public and one private)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {
|
||
"tags": [
|
||
"block:load_data"
|
||
]
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"CompletedProcess(args=['kaggle', 'competitions', 'download', '-c', 'jpx-tokyo-stock-exchange-prediction'], returncode=0)"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
"# name of the Kaggle competition whose data we download\n",
|
||
"dataset = \"jpx-tokyo-stock-exchange-prediction\"\n",
|
||
"\n",
|
||
"# setup kaggle environment for data download\n",
|
||
"with open('/secret/kaggle-secret/password', 'r') as file:\n",
|
||
" kaggle_key = file.read().rstrip()\n",
|
||
"with open('/secret/kaggle-secret/username', 'r') as file:\n",
|
||
" kaggle_user = file.read().rstrip()\n",
|
||
"\n",
|
||
"os.environ['KAGGLE_USERNAME'], os.environ['KAGGLE_KEY'] = kaggle_user, kaggle_key\n",
|
||
"\n",
|
||
"# download kaggle's jpx-tokyo-stock-exchange-prediction data\n",
|
||
"subprocess.run([\"kaggle\",\"competitions\", \"download\", \"-c\", dataset])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# directory to extract the downloaded archive into\n",
"data_path = 'data'\n",
"\n",
"# extract jpx-tokyo-stock-exchange-prediction.zip to data_path\n",
|
||
"with zipfile.ZipFile(f\"{dataset}.zip\",\"r\") as zip_ref:\n",
|
||
" zip_ref.extractall(data_path)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# read train_files/stock_prices.csv\n",
|
||
"df_prices = pd.read_csv(f\"{data_path}/train_files/stock_prices.csv\", parse_dates=['Date'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Timestamp('2021-12-03 00:00:00')"
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_prices['Date'].max()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>RowId</th>\n",
|
||
" <th>Date</th>\n",
|
||
" <th>SecuritiesCode</th>\n",
|
||
" <th>Open</th>\n",
|
||
" <th>High</th>\n",
|
||
" <th>Low</th>\n",
|
||
" <th>Close</th>\n",
|
||
" <th>Volume</th>\n",
|
||
" <th>AdjustmentFactor</th>\n",
|
||
" <th>ExpectedDividend</th>\n",
|
||
" <th>SupervisionFlag</th>\n",
|
||
" <th>Target</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2332528</th>\n",
|
||
" <td>20211203_9993</td>\n",
|
||
" <td>2021-12-03</td>\n",
|
||
" <td>9993</td>\n",
|
||
" <td>1690.0</td>\n",
|
||
" <td>1690.0</td>\n",
|
||
" <td>1645.0</td>\n",
|
||
" <td>1645.0</td>\n",
|
||
" <td>7200</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.004302</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2332529</th>\n",
|
||
" <td>20211203_9994</td>\n",
|
||
" <td>2021-12-03</td>\n",
|
||
" <td>9994</td>\n",
|
||
" <td>2388.0</td>\n",
|
||
" <td>2396.0</td>\n",
|
||
" <td>2380.0</td>\n",
|
||
" <td>2389.0</td>\n",
|
||
" <td>6500</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.009098</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2332530</th>\n",
|
||
" <td>20211203_9997</td>\n",
|
||
" <td>2021-12-03</td>\n",
|
||
" <td>9997</td>\n",
|
||
" <td>690.0</td>\n",
|
||
" <td>711.0</td>\n",
|
||
" <td>686.0</td>\n",
|
||
" <td>696.0</td>\n",
|
||
" <td>381100</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.018414</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" RowId Date SecuritiesCode Open High Low \\\n",
|
||
"2332528 20211203_9993 2021-12-03 9993 1690.0 1690.0 1645.0 \n",
|
||
"2332529 20211203_9994 2021-12-03 9994 2388.0 2396.0 2380.0 \n",
|
||
"2332530 20211203_9997 2021-12-03 9997 690.0 711.0 686.0 \n",
|
||
"\n",
|
||
" Close Volume AdjustmentFactor ExpectedDividend SupervisionFlag \\\n",
|
||
"2332528 1645.0 7200 1.0 NaN False \n",
|
||
"2332529 2389.0 6500 1.0 NaN False \n",
|
||
"2332530 696.0 381100 1.0 NaN False \n",
|
||
"\n",
|
||
" Target \n",
|
||
"2332528 -0.004302 \n",
|
||
"2332529 0.009098 \n",
|
||
"2332530 0.018414 "
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_prices.tail(3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(2332531, 12)"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
"# let's check the data dimensions\n",
|
||
"df_prices.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"RangeIndex: 2332531 entries, 0 to 2332530\n",
|
||
"Data columns (total 12 columns):\n",
|
||
" # Column Dtype \n",
|
||
"--- ------ ----- \n",
|
||
" 0 RowId object \n",
|
||
" 1 Date datetime64[ns]\n",
|
||
" 2 SecuritiesCode int64 \n",
|
||
" 3 Open float64 \n",
|
||
" 4 High float64 \n",
|
||
" 5 Low float64 \n",
|
||
" 6 Close float64 \n",
|
||
" 7 Volume int64 \n",
|
||
" 8 AdjustmentFactor float64 \n",
|
||
" 9 ExpectedDividend float64 \n",
|
||
" 10 SupervisionFlag bool \n",
|
||
" 11 Target float64 \n",
|
||
"dtypes: bool(1), datetime64[ns](1), float64(7), int64(2), object(1)\n",
|
||
"memory usage: 198.0+ MB\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df_prices.info()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"RowId 0\n",
|
||
"Date 0\n",
|
||
"SecuritiesCode 0\n",
|
||
"Open 7608\n",
|
||
"High 7608\n",
|
||
"Low 7608\n",
|
||
"Close 7608\n",
|
||
"Volume 0\n",
|
||
"AdjustmentFactor 0\n",
|
||
"ExpectedDividend 2313666\n",
|
||
"SupervisionFlag 0\n",
|
||
"Target 238\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# check total nan values per column\n",
|
||
"df_prices.isna().sum()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Transform Data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {
|
||
"tags": [
|
||
"block:transform_data",
|
||
"prev:load_data"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# sort data by 'Date' and 'SecuritiesCode'\n",
|
||
"df_prices.sort_values(by=['Date','SecuritiesCode'], inplace=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Date</th>\n",
|
||
" <th>SecuritiesCode</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>2017-01-04</td>\n",
|
||
" <td>1865</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2017-01-05</td>\n",
|
||
" <td>1865</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>2017-01-06</td>\n",
|
||
" <td>1865</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>2017-01-10</td>\n",
|
||
" <td>1865</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>2017-01-11</td>\n",
|
||
" <td>1865</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1197</th>\n",
|
||
" <td>2021-11-29</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1198</th>\n",
|
||
" <td>2021-11-30</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1199</th>\n",
|
||
" <td>2021-12-01</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1200</th>\n",
|
||
" <td>2021-12-02</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1201</th>\n",
|
||
" <td>2021-12-03</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>1202 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Date SecuritiesCode\n",
|
||
"0 2017-01-04 1865\n",
|
||
"1 2017-01-05 1865\n",
|
||
"2 2017-01-06 1865\n",
|
||
"3 2017-01-10 1865\n",
|
||
"4 2017-01-11 1865\n",
|
||
"... ... ...\n",
|
||
"1197 2021-11-29 2000\n",
|
||
"1198 2021-11-30 2000\n",
|
||
"1199 2021-12-01 2000\n",
|
||
"1200 2021-12-02 2000\n",
|
||
"1201 2021-12-03 2000\n",
|
||
"\n",
|
||
"[1202 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# count total trading stocks per day \n",
|
||
"idcount = df_prices.groupby(\"Date\")[\"SecuritiesCode\"].count().reset_index()\n",
|
||
"idcount"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 720x360 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plt.figure(figsize=(10, 5))\n",
|
||
"plt.plot(idcount[\"Date\"],idcount[\"SecuritiesCode\"])\n",
"plt.axvline(x=pd.Timestamp('2021-01-01'), color='blue', label='2021-01-01')\n",
"plt.axvline(x=pd.Timestamp('2020-06-01'), color='red', label='2020-06-01')\n",
|
||
"plt.legend()\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Date</th>\n",
|
||
" <th>SecuritiesCode</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>970</th>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>971</th>\n",
|
||
" <td>2020-12-24</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>972</th>\n",
|
||
" <td>2020-12-25</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>973</th>\n",
|
||
" <td>2020-12-28</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>974</th>\n",
|
||
" <td>2020-12-29</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1197</th>\n",
|
||
" <td>2021-11-29</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1198</th>\n",
|
||
" <td>2021-11-30</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1199</th>\n",
|
||
" <td>2021-12-01</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1200</th>\n",
|
||
" <td>2021-12-02</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1201</th>\n",
|
||
" <td>2021-12-03</td>\n",
|
||
" <td>2000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>232 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Date SecuritiesCode\n",
|
||
"970 2020-12-23 2000\n",
|
||
"971 2020-12-24 2000\n",
|
||
"972 2020-12-25 2000\n",
|
||
"973 2020-12-28 2000\n",
|
||
"974 2020-12-29 2000\n",
|
||
"... ... ...\n",
|
||
"1197 2021-11-29 2000\n",
|
||
"1198 2021-11-30 2000\n",
|
||
"1199 2021-12-01 2000\n",
|
||
"1200 2021-12-02 2000\n",
|
||
"1201 2021-12-03 2000\n",
|
||
"\n",
|
||
"[232 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"idcount[idcount['SecuritiesCode'] >= 2000]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"464000"
|
||
]
|
||
},
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"idcount[idcount['SecuritiesCode'] >= 2000]['SecuritiesCode'].sum()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# keep only the days on which all 2000 stocks are present\n",
"# every date before '2020-12-23' has fewer than 2000 stocks\n",
"# this keeps the number of stocks per day consistent\n",
|
||
"df_prices = df_prices[(df_prices[\"Date\"]>=\"2020-12-23\")]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_prices = df_prices.reset_index(drop=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>RowId</th>\n",
|
||
" <th>Date</th>\n",
|
||
" <th>SecuritiesCode</th>\n",
|
||
" <th>Open</th>\n",
|
||
" <th>High</th>\n",
|
||
" <th>Low</th>\n",
|
||
" <th>Close</th>\n",
|
||
" <th>Volume</th>\n",
|
||
" <th>AdjustmentFactor</th>\n",
|
||
" <th>ExpectedDividend</th>\n",
|
||
" <th>SupervisionFlag</th>\n",
|
||
" <th>Target</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>20201223_1301</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1301</td>\n",
|
||
" <td>2913.0</td>\n",
|
||
" <td>2920.0</td>\n",
|
||
" <td>2906.0</td>\n",
|
||
" <td>2913.0</td>\n",
|
||
" <td>6300</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.000343</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>20201223_1332</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1332</td>\n",
|
||
" <td>419.0</td>\n",
|
||
" <td>421.0</td>\n",
|
||
" <td>416.0</td>\n",
|
||
" <td>419.0</td>\n",
|
||
" <td>1413600</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.007143</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>20201223_1333</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1333</td>\n",
|
||
" <td>2187.0</td>\n",
|
||
" <td>2195.0</td>\n",
|
||
" <td>2158.0</td>\n",
|
||
" <td>2165.0</td>\n",
|
||
" <td>119000</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.005051</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>20201223_1375</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1375</td>\n",
|
||
" <td>1711.0</td>\n",
|
||
" <td>1757.0</td>\n",
|
||
" <td>1701.0</td>\n",
|
||
" <td>1752.0</td>\n",
|
||
" <td>446300</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.003484</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>20201223_1376</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1376</td>\n",
|
||
" <td>1589.0</td>\n",
|
||
" <td>1589.0</td>\n",
|
||
" <td>1575.0</td>\n",
|
||
" <td>1586.0</td>\n",
|
||
" <td>1900</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.009494</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" RowId Date SecuritiesCode Open High Low Close \\\n",
|
||
"0 20201223_1301 2020-12-23 1301 2913.0 2920.0 2906.0 2913.0 \n",
|
||
"1 20201223_1332 2020-12-23 1332 419.0 421.0 416.0 419.0 \n",
|
||
"2 20201223_1333 2020-12-23 1333 2187.0 2195.0 2158.0 2165.0 \n",
|
||
"3 20201223_1375 2020-12-23 1375 1711.0 1757.0 1701.0 1752.0 \n",
|
||
"4 20201223_1376 2020-12-23 1376 1589.0 1589.0 1575.0 1586.0 \n",
|
||
"\n",
|
||
" Volume AdjustmentFactor ExpectedDividend SupervisionFlag Target \n",
|
||
"0 6300 1.0 NaN False -0.000343 \n",
|
||
"1 1413600 1.0 NaN False 0.007143 \n",
|
||
"2 119000 1.0 NaN False 0.005051 \n",
|
||
"3 446300 1.0 NaN False -0.003484 \n",
|
||
"4 1900 1.0 NaN False -0.009494 "
|
||
]
|
||
},
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_prices.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['RowId', 'Date', 'SecuritiesCode', 'Open', 'High', 'Low', 'Close',\n",
|
||
" 'Volume', 'AdjustmentFactor', 'ExpectedDividend', 'SupervisionFlag',\n",
|
||
" 'Target'],\n",
|
||
" dtype='object')"
|
||
]
|
||
},
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_prices.columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# compute z-scores of the price and volume columns of `df_prices`\n",
|
||
"z_scores = stats.zscore(df_prices[['Open', 'High', 'Low', 'Close','Volume']], nan_policy='omit')\n",
|
||
"abs_z_scores = np.abs(z_scores)\n",
|
||
"filtered_entries = (abs_z_scores < 3).all(axis=1)\n",
|
||
"df_zscore = df_prices[filtered_entries]\n",
|
||
"df_zscore = df_zscore.reset_index(drop=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_zscore = df_zscore.reset_index(drop=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"papermill": {
|
||
"duration": 0.01421,
|
||
"end_time": "2022-04-17T07:17:13.396620",
|
||
"exception": false,
|
||
"start_time": "2022-04-17T07:17:13.382410",
|
||
"status": "completed"
|
||
},
|
||
"tags": []
|
||
},
|
||
"source": [
"# Feature Engineering"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {
|
||
"tags": [
|
||
"block:feature_engineering",
|
||
"prev:transform_data"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
"def feat_eng(df, features):\n",
"\n",
"    for i in tqdm(range(1, 4)):\n",
"        # creating lag features (values from i rows earlier)\n",
"        # NOTE: shift() runs over the whole frame, so lags can cross SecuritiesCode boundaries\n",
"        tmp = df[features].shift(i)\n",
"        tmp.columns = [c + f'_next_shift_{i}' for c in tmp.columns]\n",
"        df = pd.concat([df, tmp], sort=False, axis=1)\n",
"\n",
"    for i in tqdm(range(1, 4)):\n",
"        # log of lagged volume times the mean of the lagged price columns\n",
"        df[f'weighted_vol_price_{i}'] = np.log(df[f'Volume_next_shift_{i}'] * df[[col for col in df if col.endswith(f'next_shift_{i}')][:-1]].apply(np.mean, axis=1))\n",
"\n",
"    # feature engineering\n",
"    df['weighted_vol_price'] = np.log(df['Volume'] * (np.mean(df[features[:-1]], axis=1)))\n",
"    df['BOP'] = (df['Open']-df['Close'])/(df['High']-df['Low'])\n",
"    df['HL'] = df['High'] - df['Low']\n",
"    df['OC'] = df['Close'] - df['Open']\n",
"    df['OHLCstd'] = df[['Open','Close','High','Low']].std(axis=1)\n",
"\n",
"    # log-transform every float column\n",
"    feats = df.select_dtypes(include=float).columns\n",
"    df[feats] = df[feats].apply(np.log)\n",
"\n",
"    # replace inf with nan\n",
"    df.replace([np.inf, -np.inf], np.nan, inplace=True)\n",
"\n",
"    # datetime features\n",
"    df['Date'] = pd.to_datetime(df['Date'])\n",
"    df['Day'] = df['Date'].dt.weekday.astype(np.int32)\n",
"    df[\"dayofyear\"] = df['Date'].dt.dayofyear\n",
"    df[\"is_weekend\"] = df['Day'].isin([5, 6])\n",
"    # Series.dt.weekofyear is deprecated; isocalendar().week is the replacement\n",
"    df[\"weekofyear\"] = df['Date'].dt.isocalendar().week.astype(np.int32)\n",
"    df[\"month\"] = df['Date'].dt.month\n",
"    df[\"season\"] = (df[\"month\"]%12 + 3)//3\n",
"\n",
"    # fill nan values\n",
"    df = df.fillna(0)\n",
"    return df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"100%|██████████| 3/3 [00:00<00:00, 12.99it/s]\n",
|
||
"100%|██████████| 3/3 [02:58<00:00, 59.46s/it]\n",
|
||
"/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:30: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"new_feats = feat_eng(df_zscore, ['High', 'Low', 'Open', 'Close', 'Volume'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(452481, 41)"
|
||
]
|
||
},
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"new_feats.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"new_feats['Target'] = df_zscore['Target']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>RowId</th>\n",
|
||
" <th>Date</th>\n",
|
||
" <th>SecuritiesCode</th>\n",
|
||
" <th>Open</th>\n",
|
||
" <th>High</th>\n",
|
||
" <th>Low</th>\n",
|
||
" <th>Close</th>\n",
|
||
" <th>Volume</th>\n",
|
||
" <th>AdjustmentFactor</th>\n",
|
||
" <th>ExpectedDividend</th>\n",
|
||
" <th>SupervisionFlag</th>\n",
|
||
" <th>Target</th>\n",
|
||
" <th>High_next_shift_1</th>\n",
|
||
" <th>Low_next_shift_1</th>\n",
|
||
" <th>Open_next_shift_1</th>\n",
|
||
" <th>Close_next_shift_1</th>\n",
|
||
" <th>Volume_next_shift_1</th>\n",
|
||
" <th>High_next_shift_2</th>\n",
|
||
" <th>Low_next_shift_2</th>\n",
|
||
" <th>Open_next_shift_2</th>\n",
|
||
" <th>Close_next_shift_2</th>\n",
|
||
" <th>Volume_next_shift_2</th>\n",
|
||
" <th>High_next_shift_3</th>\n",
|
||
" <th>Low_next_shift_3</th>\n",
|
||
" <th>Open_next_shift_3</th>\n",
|
||
" <th>Close_next_shift_3</th>\n",
|
||
" <th>Volume_next_shift_3</th>\n",
|
||
" <th>weighted_vol_price_1</th>\n",
|
||
" <th>weighted_vol_price_2</th>\n",
|
||
" <th>weighted_vol_price_3</th>\n",
|
||
" <th>weighted_vol_price</th>\n",
|
||
" <th>BOP</th>\n",
|
||
" <th>HL</th>\n",
|
||
" <th>OC</th>\n",
|
||
" <th>OHLCstd</th>\n",
|
||
" <th>Day</th>\n",
|
||
" <th>dayofyear</th>\n",
|
||
" <th>is_weekend</th>\n",
|
||
" <th>weekofyear</th>\n",
|
||
" <th>month</th>\n",
|
||
" <th>season</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>20201223_1301</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1301</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>7.979339</td>\n",
|
||
" <td>7.974533</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>6300</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.000343</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.816919</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.639057</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.743178</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>20201223_1332</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1332</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>6.042633</td>\n",
|
||
" <td>6.030685</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>1413600</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.007143</td>\n",
|
||
" <td>7.979339</td>\n",
|
||
" <td>7.974533</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>8.748305</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.816919</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>3.005629</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.609438</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.723459</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>20201223_1333</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1333</td>\n",
|
||
" <td>7.690286</td>\n",
|
||
" <td>7.693937</td>\n",
|
||
" <td>7.676937</td>\n",
|
||
" <td>7.680176</td>\n",
|
||
" <td>119000</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.005051</td>\n",
|
||
" <td>6.042633</td>\n",
|
||
" <td>6.030685</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>14.161650</td>\n",
|
||
" <td>7.979339</td>\n",
|
||
" <td>7.974533</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>8.748305</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>3.005629</td>\n",
|
||
" <td>2.816919</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.963841</td>\n",
|
||
" <td>-0.519875</td>\n",
|
||
" <td>3.610918</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.866536</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>20201223_1375</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1375</td>\n",
|
||
" <td>7.444833</td>\n",
|
||
" <td>7.471363</td>\n",
|
||
" <td>7.438972</td>\n",
|
||
" <td>7.468513</td>\n",
|
||
" <td>446300</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.003484</td>\n",
|
||
" <td>7.693937</td>\n",
|
||
" <td>7.676937</td>\n",
|
||
" <td>7.690286</td>\n",
|
||
" <td>7.680176</td>\n",
|
||
" <td>11.686879</td>\n",
|
||
" <td>6.042633</td>\n",
|
||
" <td>6.030685</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>14.161650</td>\n",
|
||
" <td>7.979339</td>\n",
|
||
" <td>7.974533</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>7.976939</td>\n",
|
||
" <td>8.748305</td>\n",
|
||
" <td>2.963841</td>\n",
|
||
" <td>3.005629</td>\n",
|
||
" <td>2.816919</td>\n",
|
||
" <td>3.018705</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>4.025352</td>\n",
|
||
" <td>3.713572</td>\n",
|
||
" <td>3.345369</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>20201223_1376</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1376</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.362011</td>\n",
|
||
" <td>7.368970</td>\n",
|
||
" <td>1900</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>-0.009494</td>\n",
|
||
" <td>7.471363</td>\n",
|
||
" <td>7.438972</td>\n",
|
||
" <td>7.444833</td>\n",
|
||
" <td>7.468513</td>\n",
|
||
" <td>13.008747</td>\n",
|
||
" <td>7.693937</td>\n",
|
||
" <td>7.676937</td>\n",
|
||
" <td>7.690286</td>\n",
|
||
" <td>7.680176</td>\n",
|
||
" <td>11.686879</td>\n",
|
||
" <td>6.042633</td>\n",
|
||
" <td>6.030685</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>6.037871</td>\n",
|
||
" <td>14.161650</td>\n",
|
||
" <td>3.018705</td>\n",
|
||
" <td>2.963841</td>\n",
|
||
" <td>3.005629</td>\n",
|
||
" <td>2.702555</td>\n",
|
||
" <td>-1.540445</td>\n",
|
||
" <td>2.639057</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.894928</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>20201223_1377</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1377</td>\n",
|
||
" <td>8.167636</td>\n",
|
||
" <td>8.173293</td>\n",
|
||
" <td>8.160518</td>\n",
|
||
" <td>8.167636</td>\n",
|
||
" <td>62700</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.011252</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.362011</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.368970</td>\n",
|
||
" <td>7.549609</td>\n",
|
||
" <td>7.471363</td>\n",
|
||
" <td>7.438972</td>\n",
|
||
" <td>7.444833</td>\n",
|
||
" <td>7.468513</td>\n",
|
||
" <td>13.008747</td>\n",
|
||
" <td>7.693937</td>\n",
|
||
" <td>7.676937</td>\n",
|
||
" <td>7.690286</td>\n",
|
||
" <td>7.680176</td>\n",
|
||
" <td>11.686879</td>\n",
|
||
" <td>2.702555</td>\n",
|
||
" <td>3.018705</td>\n",
|
||
" <td>2.963841</td>\n",
|
||
" <td>2.955608</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>3.806662</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.913860</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>20201223_1379</td>\n",
|
||
" <td>2020-12-23</td>\n",
|
||
" <td>1379</td>\n",
|
||
" <td>7.652546</td>\n",
|
||
" <td>7.656337</td>\n",
|
||
" <td>7.647786</td>\n",
|
||
" <td>7.653969</td>\n",
|
||
" <td>29900</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>0.002373</td>\n",
|
||
" <td>8.173293</td>\n",
|
||
" <td>8.160518</td>\n",
|
||
" <td>8.167636</td>\n",
|
||
" <td>8.167636</td>\n",
|
||
" <td>11.046117</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.362011</td>\n",
|
||
" <td>7.370860</td>\n",
|
||
" <td>7.368970</td>\n",
|
||
" <td>7.549609</td>\n",
|
||
" <td>7.471363</td>\n",
|
||
" <td>7.438972</td>\n",
|
||
" <td>7.444833</td>\n",
|
||
" <td>7.468513</td>\n",
|
||
" <td>13.008747</td>\n",
|
||
" <td>2.955608</td>\n",
|
||
" <td>2.702555</td>\n",
|
||
" <td>3.018705</td>\n",
|
||
" <td>2.888051</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>2.890372</td>\n",
|
||
" <td>1.098612</td>\n",
|
||
" <td>2.026617</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>358</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" RowId Date SecuritiesCode Open High Low \\\n",
|
||
"0 20201223_1301 2020-12-23 1301 7.976939 7.979339 7.974533 \n",
|
||
"1 20201223_1332 2020-12-23 1332 6.037871 6.042633 6.030685 \n",
|
||
"2 20201223_1333 2020-12-23 1333 7.690286 7.693937 7.676937 \n",
|
||
"3 20201223_1375 2020-12-23 1375 7.444833 7.471363 7.438972 \n",
|
||
"4 20201223_1376 2020-12-23 1376 7.370860 7.370860 7.362011 \n",
|
||
"5 20201223_1377 2020-12-23 1377 8.167636 8.173293 8.160518 \n",
|
||
"6 20201223_1379 2020-12-23 1379 7.652546 7.656337 7.647786 \n",
|
||
"\n",
|
||
" Close Volume AdjustmentFactor ExpectedDividend SupervisionFlag \\\n",
|
||
"0 7.976939 6300 0.0 0.0 False \n",
|
||
"1 6.037871 1413600 0.0 0.0 False \n",
|
||
"2 7.680176 119000 0.0 0.0 False \n",
|
||
"3 7.468513 446300 0.0 0.0 False \n",
|
||
"4 7.368970 1900 0.0 0.0 False \n",
|
||
"5 8.167636 62700 0.0 0.0 False \n",
|
||
"6 7.653969 29900 0.0 0.0 False \n",
|
||
"\n",
|
||
" Target High_next_shift_1 Low_next_shift_1 Open_next_shift_1 \\\n",
|
||
"0 -0.000343 0.000000 0.000000 0.000000 \n",
|
||
"1 0.007143 7.979339 7.974533 7.976939 \n",
|
||
"2 0.005051 6.042633 6.030685 6.037871 \n",
|
||
"3 -0.003484 7.693937 7.676937 7.690286 \n",
|
||
"4 -0.009494 7.471363 7.438972 7.444833 \n",
|
||
"5 0.011252 7.370860 7.362011 7.370860 \n",
|
||
"6 0.002373 8.173293 8.160518 8.167636 \n",
|
||
"\n",
|
||
" Close_next_shift_1 Volume_next_shift_1 High_next_shift_2 \\\n",
|
||
"0 0.000000 0.000000 0.000000 \n",
|
||
"1 7.976939 8.748305 0.000000 \n",
|
||
"2 6.037871 14.161650 7.979339 \n",
|
||
"3 7.680176 11.686879 6.042633 \n",
|
||
"4 7.468513 13.008747 7.693937 \n",
|
||
"5 7.368970 7.549609 7.471363 \n",
|
||
"6 8.167636 11.046117 7.370860 \n",
|
||
"\n",
|
||
" Low_next_shift_2 Open_next_shift_2 Close_next_shift_2 \\\n",
|
||
"0 0.000000 0.000000 0.000000 \n",
|
||
"1 0.000000 0.000000 0.000000 \n",
|
||
"2 7.974533 7.976939 7.976939 \n",
|
||
"3 6.030685 6.037871 6.037871 \n",
|
||
"4 7.676937 7.690286 7.680176 \n",
|
||
"5 7.438972 7.444833 7.468513 \n",
|
||
"6 7.362011 7.370860 7.368970 \n",
|
||
"\n",
|
||
" Volume_next_shift_2 High_next_shift_3 Low_next_shift_3 \\\n",
|
||
"0 0.000000 0.000000 0.000000 \n",
|
||
"1 0.000000 0.000000 0.000000 \n",
|
||
"2 8.748305 0.000000 0.000000 \n",
|
||
"3 14.161650 7.979339 7.974533 \n",
|
||
"4 11.686879 6.042633 6.030685 \n",
|
||
"5 13.008747 7.693937 7.676937 \n",
|
||
"6 7.549609 7.471363 7.438972 \n",
|
||
"\n",
|
||
" Open_next_shift_3 Close_next_shift_3 Volume_next_shift_3 \\\n",
|
||
"0 0.000000 0.000000 0.000000 \n",
|
||
"1 0.000000 0.000000 0.000000 \n",
|
||
"2 0.000000 0.000000 0.000000 \n",
|
||
"3 7.976939 7.976939 8.748305 \n",
|
||
"4 6.037871 6.037871 14.161650 \n",
|
||
"5 7.690286 7.680176 11.686879 \n",
|
||
"6 7.444833 7.468513 13.008747 \n",
|
||
"\n",
|
||
" weighted_vol_price_1 weighted_vol_price_2 weighted_vol_price_3 \\\n",
|
||
"0 0.000000 0.000000 0.000000 \n",
|
||
"1 2.816919 0.000000 0.000000 \n",
|
||
"2 3.005629 2.816919 0.000000 \n",
|
||
"3 2.963841 3.005629 2.816919 \n",
|
||
"4 3.018705 2.963841 3.005629 \n",
|
||
"5 2.702555 3.018705 2.963841 \n",
|
||
"6 2.955608 2.702555 3.018705 \n",
|
||
"\n",
|
||
" weighted_vol_price BOP HL OC OHLCstd Day dayofyear \\\n",
|
||
"0 2.816919 0.000000 2.639057 0.000000 1.743178 2 358 \n",
|
||
"1 3.005629 0.000000 1.609438 0.000000 0.723459 2 358 \n",
|
||
"2 2.963841 -0.519875 3.610918 0.000000 2.866536 2 358 \n",
|
||
"3 3.018705 0.000000 4.025352 3.713572 3.345369 2 358 \n",
|
||
"4 2.702555 -1.540445 2.639057 0.000000 1.894928 2 358 \n",
|
||
"5 2.955608 0.000000 3.806662 0.000000 2.913860 2 358 \n",
|
||
"6 2.888051 0.000000 2.890372 1.098612 2.026617 2 358 \n",
|
||
"\n",
|
||
" is_weekend weekofyear month season \n",
|
||
"0 False 52 12 1 \n",
|
||
"1 False 52 12 1 \n",
|
||
"2 False 52 12 1 \n",
|
||
"3 False 52 12 1 \n",
|
||
"4 False 52 12 1 \n",
|
||
"5 False 52 12 1 \n",
|
||
"6 False 52 12 1 "
|
||
]
|
||
},
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"new_feats.head(7)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['RowId', 'Date', 'SecuritiesCode', 'Open', 'High', 'Low', 'Close',\n",
|
||
" 'Volume', 'AdjustmentFactor', 'ExpectedDividend', 'SupervisionFlag',\n",
|
||
" 'Target', 'High_next_shift_1', 'Low_next_shift_1', 'Open_next_shift_1',\n",
|
||
" 'Close_next_shift_1', 'Volume_next_shift_1', 'High_next_shift_2',\n",
|
||
" 'Low_next_shift_2', 'Open_next_shift_2', 'Close_next_shift_2',\n",
|
||
" 'Volume_next_shift_2', 'High_next_shift_3', 'Low_next_shift_3',\n",
|
||
" 'Open_next_shift_3', 'Close_next_shift_3', 'Volume_next_shift_3',\n",
|
||
" 'weighted_vol_price_1', 'weighted_vol_price_2', 'weighted_vol_price_3',\n",
|
||
" 'weighted_vol_price', 'BOP', 'HL', 'OC', 'OHLCstd', 'Day', 'dayofyear',\n",
|
||
" 'is_weekend', 'weekofyear', 'month', 'season'],\n",
|
||
" dtype='object')"
|
||
]
|
||
},
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"new_feats.columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Modelling"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {
|
||
"tags": [
|
||
"block:modelling",
|
||
"prev:feature_engineering"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# columns used as model features\n",
|
||
"feats = ['Date','SecuritiesCode', 'Open', 'High', 'Low', 'Close', 'Volume',\n",
|
||
" 'weighted_vol_price_1', 'weighted_vol_price_2', 'weighted_vol_price_3', \n",
|
||
" 'weighted_vol_price', 'BOP', 'HL', 'OC', 'OHLCstd', 'Day', 'dayofyear',\n",
|
||
" 'is_weekend', 'weekofyear', 'month', 'season']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# convert Date to an integer in YYYYMMDD form\n",
|
||
"new_feats['Date'] = new_feats['Date'].dt.strftime(\"%Y%m%d\").astype(int)\n",
|
||
"\n",
|
||
"# hold out rows dated 2021-11-11 or later for validation; train on everything earlier\n",
|
||
"valid = new_feats[(new_feats['Date'] >= 20211111)].copy()\n",
|
||
"train = new_feats[(new_feats['Date'] < 20211111)].copy()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"((421376, 41), (31105, 41))"
|
||
]
|
||
},
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"train.shape, valid.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {
|
||
"papermill": {
|
||
"duration": 10.373257,
|
||
"end_time": "2022-04-17T07:17:23.930551",
|
||
"exception": false,
|
||
"start_time": "2022-04-17T07:17:13.557294",
|
||
"status": "completed"
|
||
},
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/jovyan/.local/lib/python3.6/site-packages/lightgbm/sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.\n",
|
||
" _log_warning(\"'verbose' argument is deprecated and will be removed in a future release of LightGBM. \"\n"
|
||
]
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[LightGBM] [Debug] Dataset::GetMultiBinFromAllFeatures: sparse rate 0.015047\n",
|
||
"[LightGBM] [Debug] init for col-wise cost 0.000138 seconds, init for row-wise cost 0.057121 seconds\n",
|
||
"[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.102671 seconds.\n",
|
||
"You can set `force_col_wise=true` to remove the overhead.\n",
|
||
"[LightGBM] [Info] Total Bins 3978\n",
|
||
"[LightGBM] [Info] Number of data points in the train set: 421376, number of used features: 20\n",
|
||
"[LightGBM] [Info] Start training from score 0.000716\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 17\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 19\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 20\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 17\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 15\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 7\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 9\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 13\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 16\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 14\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 8\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 11\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 12\n",
|
||
"[LightGBM] [Debug] Trained a tree with leaves = 31 and depth = 10\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"LGBMRegressor(learning_rate=0.379687157316759, random_state=2022, verbose=2)"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# model parameters\n",
|
||
"params = {\n",
|
||
" 'n_estimators': int(N_EST),\n",
|
||
" 'learning_rate': float(LR),\n",
|
||
" 'random_state': 2022,\n",
|
||
" 'verbose' : 2}\n",
|
||
"\n",
|
||
"# model initialization\n",
|
||
"model = LGBMRegressor(**params)\n",
|
||
"\n",
|
||
"\n",
|
||
"X = train[feats]\n",
|
||
"y = train[\"Target\"]\n",
|
||
"\n",
|
||
"X_test = valid[feats]\n",
|
||
"y_test = valid[\"Target\"]\n",
|
||
"\n",
|
||
"# fit on the training split; the validation split is passed as eval_set\n",
|
||
"model.fit(X, y, verbose=False, eval_set=(X_test, y_test))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"papermill": {
|
||
"duration": 0.01428,
|
||
"end_time": "2022-04-17T07:17:23.959655",
|
||
"exception": false,
|
||
"start_time": "2022-04-17T07:17:23.945375",
|
||
"status": "completed"
|
||
},
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Evaluation and Prediction"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {
|
||
"tags": [
|
||
"block:prediction",
|
||
"prev:modelling"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# model prediction\n",
|
||
"preds = model.predict(X_test)\n",
|
||
"\n",
|
||
"# model evaluation: RMSE on the validation split\n",
|
||
"rmse = np.round(mean_squared_error(y_test, preds)**0.5, 5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {
|
||
"tags": [
|
||
"pipeline-metrics"
|
||
]
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.02665\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(rmse)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"# Make submission"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {
|
||
"tags": [
|
||
"skip"
|
||
]
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"sys.path.insert(0, 'helper-files')\n",
|
||
"from local_api import local_api"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {
|
||
"tags": [
|
||
"skip"
|
||
]
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py:3263: DtypeWarning: Columns (7,8,9,10) have mixed types.Specify dtype option on import or set low_memory=False.\n",
|
||
" if (await self.run_code(code, result, async_=asy)):\n",
|
||
"100%|██████████| 3/3 [00:00<00:00, 178.55it/s]\n",
|
||
"100%|██████████| 3/3 [00:00<00:00, 3.28it/s]\n",
|
||
"/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:30: FutureWarning: Series.dt.weekofyear and Series.dt.week have been deprecated. Please use Series.dt.isocalendar().week instead.\n"
|
||
]
|
||
},
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.10455440901403816\n"
|
||
]
|
||
},
|
||
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
"    .dataframe tbody tr th:only-of-type {\n",
"        vertical-align: middle;\n",
"    }\n",
"\n",
"    .dataframe tbody tr th {\n",
"        vertical-align: top;\n",
"    }\n",
"\n",
"    .dataframe thead th {\n",
"        text-align: right;\n",
"    }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Date</th>\n",
"      <th>SecuritiesCode</th>\n",
"      <th>Rank</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>2022-02-28</td>\n",
"      <td>1301</td>\n",
"      <td>0</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>2022-02-28</td>\n",
"      <td>1332</td>\n",
"      <td>1</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>2022-02-28</td>\n",
"      <td>1333</td>\n",
"      <td>2</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>2022-02-28</td>\n",
"      <td>1375</td>\n",
"      <td>3</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>2022-02-28</td>\n",
"      <td>1376</td>\n",
"      <td>4</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"         Date  SecuritiesCode  Rank\n",
"0  2022-02-28            1301     0\n",
"1  2022-02-28            1332     1\n",
"2  2022-02-28            1333     2\n",
"3  2022-02-28            1375     3\n",
"4  2022-02-28            1376     4"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myapi = local_api('data/supplemental_files')\n",
"env = myapi.make_env()\n",
"\n",
"iter_test = env.iter_test()\n",
"for (prices, options, financials, trades, secondary_prices, sample_prediction) in iter_test:\n",
"    # Build the same features the model was trained on.\n",
"    prices = feat_eng(prices, ['High', 'Low', 'Open', 'Close', 'Volume'])\n",
"    prices['Date'] = prices['Date'].dt.strftime(\"%Y%m%d\").astype(int)\n",
"    prices[\"Target\"] = model.predict(prices[feats])\n",
"    # Score each stock by the raw predicted target.\n",
"    sample_prediction[\"Prediction\"] = prices[\"Target\"]\n",
"    # Rank 0 goes to the highest predicted return.\n",
"    sample_prediction.sort_values(by=\"Prediction\", ascending=False, inplace=True)\n",
"    sample_prediction['Rank'] = np.arange(len(sample_prediction))\n",
"    # Restore the SecuritiesCode order before submitting.\n",
"    sample_prediction.sort_values(by=\"SecuritiesCode\", ascending=True, inplace=True)\n",
"    submission = sample_prediction[[\"Date\",\"SecuritiesCode\",\"Rank\"]]\n",
"    env.predict(submission)\n",
"print(env.score())\n",
"submission.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kubeflow_notebook": {
"autosnapshot": true,
"experiment": {
"id": "new",
"name": "jpx-tokyo-stock-exchange"
},
"experiment_name": "jpx-tokyo-stock-exchange",
"katib_metadata": {
"algorithm": {
"algorithmName": "grid"
},
"maxFailedTrialCount": 3,
"maxTrialCount": 12,
"objective": {
"objectiveMetricName": "",
"type": "minimize"
},
"parallelTrialCount": 3,
"parameters": []
},
"katib_run": false,
"pipeline_description": "JPX Tokyo Stock Exchange Prediction",
"pipeline_name": "jpx-tokyo-stock-exchange-pipeline",
"snapshot_volumes": true,
"steps_defaults": [
"label:access-ml-pipeline:true",
"label:kaggle-secret:true",
"label:access-rok:true"
],
"volume_access_mode": "rwm",
"volumes": [
{
"annotations": [],
"mount_point": "/home/jovyan",
"name": "dem-workspace-snqdc",
"size": 5,
"size_type": "Gi",
"snapshot": false,
"type": "clone"
}
]
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"papermill": {
"default_parameters": {},
"duration": 32.012084,
"end_time": "2022-04-17T07:17:25.053666",
"environment_variables": {},
"exception": null,
"input_path": "__notebook__.ipynb",
"output_path": "__notebook__.ipynb",
"parameters": {},
"start_time": "2022-04-17T07:16:53.041582",
"version": "2.3.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}