# Kaggle Featured Prediction Competition: H&M Personalized Fashion Recommendations

In this [competition](https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations), the task is to recommend products to customers based on their previous purchases. A wide range of data is available: customer metadata, product metadata, and content ranging from simple fields such as garment type and customer age, to text from product descriptions, to garment images.

## Install necessary packages

We can install the necessary packages either by running `pip install --user <package_name>` for each one, or by listing everything in a `requirements.txt` file and running `pip install --user -r requirements.txt`. The dependencies are listed in a `requirements.txt` file, so we use the latter method.

Restart the kernel after installation.

In [None]:
# !pip install --user -r requirements.txt

## Imports

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import implicit

In [None]:
path = "data/"
train_data_filepath = path + "transactions_train.csv"
article_metadata_filepath = path + "articles.csv"
customer_metadata_filepath = path + "customers.csv"
test_data_filepath = path + "sample_submission.csv"

In [None]:
train_data = pd.read_csv(train_data_filepath,index_col='customer_id')

In [None]:
train_data.head()

In [None]:
# Keep only the customer/article pairs; date, sales channel, and price are not used by this model
train_data.drop(['t_dat', 'sales_channel_id', 'price'], axis=1, inplace=True)

In [None]:
train_data.head()

In [None]:
train_data=train_data.sort_values(by=['customer_id']).reset_index()
train_data.head()

In [None]:
print("Unique customers",train_data['customer_id'].nunique())
print("Unique articles",train_data['article_id'].nunique())

In [None]:
train_data.info()

In [None]:
X = train_data.groupby(['customer_id', 'article_id'])['article_id'].count().reset_index(name = "purchase_count")    
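
To make the aggregation above concrete, here is a minimal sketch on a toy frame. The IDs are made up for illustration and are not taken from the competition data. Each (customer, article) pair collapses to one row, and `purchase_count` records how many times that pair occurred.

```python
import pandas as pd

# Toy transactions (hypothetical IDs, not from the competition data)
toy = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2"],
    "article_id": [101, 101, 202, 101],
})

# Same aggregation as in the notebook: one row per (customer, article) pair
toy_counts = (
    toy.groupby(["customer_id", "article_id"])["article_id"]
    .count()
    .reset_index(name="purchase_count")
)
print(toy_counts)
```

Customer `c1` bought article `101` twice, so that pair ends up with a count of 2, while the other two pairs each have a count of 1.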

In [None]:
unique_customers = X['customer_id'].unique()

In [None]:
unique_articles = X['article_id'].unique()

In [None]:
customer_id_dict = {unique_customers[i]:i  for i in range(len(unique_customers))}

In [None]:
reverse_customer_id_dict = {i:unique_customers[i] for i in range(len(unique_customers))}                           

In [None]:
numeric_cus_id = []

In [None]:
# Map each raw customer ID to its integer index (vectorized, avoids a slow Python loop over rows)
numeric_cus_id = X['customer_id'].map(customer_id_dict).tolist()

In [None]:
print(X['customer_id'].nunique())

In [None]:
len(numeric_cus_id)

In [None]:
X['customer_id'] = numeric_cus_id

In [None]:
X.head()

In [None]:
article_id_dict = {unique_articles[i]:i  for i in range(len(unique_articles))}
reverse_article_id_dict = {i:unique_articles[i] for i in range(len(unique_articles))}        

In [None]:
numeric_art_id = []

In [None]:
# Map each raw article ID to its integer index (vectorized, avoids a slow Python loop over rows)
numeric_art_id = X['article_id'].map(article_id_dict).tolist()

In [None]:
X['article_id'] = numeric_art_id
X.head()
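
The dictionaries above assign every raw ID a contiguous integer index and keep a reverse lookup for later. As a minimal sketch of an alternative, assuming toy IDs that are not from the dataset, `pd.factorize` produces the same kind of forward and reverse mapping in one call. The names `forward` and `reverse` are illustrative only.

```python
import pandas as pd

toy_ids = pd.Series(["c9", "c3", "c9", "c7"])  # hypothetical raw IDs

# codes[i] is the integer index of toy_ids[i];
# uniques[k] is the raw ID that was assigned index k (in order of first appearance)
codes, uniques = pd.factorize(toy_ids)

forward = {raw: idx for idx, raw in enumerate(uniques)}  # raw ID -> index
reverse = dict(enumerate(uniques))                       # index -> raw ID

print(codes)
print(forward)
```

The round trip `reverse[forward[raw_id]]` returns the original ID, which is the property the recommendation step relies on when it translates model output back into article IDs.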

In [None]:
# Constructing sparse matrices for the alternating least squares algorithm
import scipy.sparse as sparse

# Matrix dimensions: one row per customer, one column per article
n_customers = len(unique_customers)
n_articles = len(unique_articles)

sparse_user_item_coo = sparse.coo_matrix((X.purchase_count, (X.customer_id, X.article_id)), shape=(n_customers, n_articles))
sparse_user_item_csr = sparse.csr_matrix((X['purchase_count'], (X['customer_id'], X['article_id'])), shape=(n_customers, n_articles))
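
The `(values, (rows, cols))` constructor used above can be easier to follow on a tiny example. The following sketch uses made-up indices and counts: each triple places one purchase count at a (customer index, article index) position, and every other cell is implicitly zero.

```python
import numpy as np
import scipy.sparse as sparse

rows = np.array([0, 0, 1])  # customer indices
cols = np.array([0, 2, 1])  # article indices
vals = np.array([2, 1, 3])  # purchase counts

# 2 customers x 3 articles
toy_matrix = sparse.csr_matrix((vals, (rows, cols)), shape=(2, 3))

# Dense view, only sensible for toy sizes
print(toy_matrix.toarray())

# Row 0 is customer 0's purchase vector; its stored column indices
# are the articles that customer bought
print(toy_matrix[0].indices)
```

CSR stores the matrix row by row, so slicing out a single customer's row, as the recommendation loop does, is cheap.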

In [None]:
# parameters for the model
als_params = dict(
    factors = 200,         # number of latent factors - try between 50 to 1000
    regularization = 0.01, # regularization factor - try between 0.001 to 0.2
    iterations = 5,        # iterations            - try between 2 to 100
)

# initialize a model
model = implicit.als.AlternatingLeastSquares(**als_params)

# train the model on a sparse matrix of user/item/confidence weights    
model.fit(sparse_user_item_csr)

In [None]:
test_data = pd.read_csv(test_data_filepath)

In [None]:
test_data.head()

In [None]:
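predictions = []
for raw_cust_id in test_data.customer_id:
    cust_id = customer_id_dict.get(raw_cust_id)
    if cust_id is None:
        # No transactions for this customer in the training data, so the model cannot score them; left empty here
        predictions.append("")
        continue
    # In recent versions of implicit, recommend() returns (item indices, scores); keep the top 10 indices
    recommendations = model.recommend(cust_id, sparse_user_item_csr[cust_id], 10)
    # Map indices back to article IDs; zero-pad to 10 characters (assumed ID width) in case IDs were parsed as integers
    result = [str(reverse_article_id_dict.get(idx)).zfill(10) for idx in recommendations[0]]
    # The submission expects one space-separated string of article IDs per customer
    predictions.append(" ".join(result))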
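
Customers in the submission file who never appear in the training transactions have no entry in `customer_id_dict`, so the model cannot produce recommendations for them. One common way to handle this, shown here only as a hedged sketch and not as part of the original approach, is to fall back to the most purchased articles overall. The helper name `top_k_popular` and the toy data are hypothetical.

```python
import pandas as pd

# Toy purchase counts (hypothetical article IDs)
toy_counts = pd.DataFrame({
    "article_id": ["0000000101", "0000000202", "0000000101", "0000000303"],
    "purchase_count": [2, 2, 3, 1],
})

def top_k_popular(counts, k):
    """Return the k article IDs with the largest total purchase_count."""
    totals = counts.groupby("article_id")["purchase_count"].sum()
    return totals.sort_values(ascending=False).head(k).index.tolist()

# Candidate list to use for customers the model has never seen
fallback = top_k_popular(toy_counts, k=2)
print(fallback)
```

In the notebook, the same idea would be applied to the aggregated frame `X` before its IDs are replaced by integer indices, or the result would be translated back through `reverse_article_id_dict`.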

In [None]:
test_data['prediction'] = predictions
test_data

In [None]:
test_data.to_csv('submission.csv', index=False)
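
The `prediction` column in `sample_submission.csv` holds a single space-separated string of article IDs per customer, and the IDs carry a leading zero that is lost if they are parsed as integers. The sketch below shows one way to format a list of IDs accordingly. The helper name `format_prediction` is hypothetical, and the width of 10 characters is an assumption about the ID format.

```python
def format_prediction(article_ids, width=10):
    """Join article IDs into one space-separated string, zero-padding each ID.

    width=10 is an assumption about the article ID length.
    """
    return " ".join(str(a).zfill(width) for a in article_ids)

# Works for IDs stored as integers or as already-padded strings
print(format_prediction([108775015, "0706016001"]))
```

Applying such a helper to each entry of `predictions` before writing the CSV keeps the output in the same shape as the sample file.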