LLMSeqRec: An LLM-Enhanced Contextual Sequential Recommender

LLMSeqRec is an innovative framework that integrates Large Language Model (LLM)-generated semantic embeddings with traditional sequential recommendation architectures. By leveraging deep semantic representations extracted from rich textual item metadata, LLMSeqRec improves recommendation accuracy, especially in cold-start and sparse data scenarios. This project implements both the proposed LLMSeqRec and a baseline SASRec model, using the MovieLens dataset to evaluate performance.

Table of Contents

  • Overview
  • Folder Structure
  • Requirements
  • Installation
  • Data Preparation
  • Usage
  • Plotting Results
  • Future Scope
  • Citation

Overview

LLMSeqRec enhances sequential recommendation by fusing two embedding modalities:

  • LLM-based Embeddings: Pretrained semantic embeddings generated from item descriptions.
  • ID-based Embeddings: Learnable collaborative embeddings.

The model employs a Transformer-based architecture (inspired by SASRec) with causal masking and multiple self-attention layers. This project includes training and evaluation scripts for both LLMSeqRec and a baseline SASRec model, alongside modules for logging and result visualization.
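
As an illustration of the fusion, the sketch below combines a frozen LLM embedding table with a learnable ID table by projecting the semantic vectors into the model dimension and summing. This is a minimal sketch under stated assumptions (padding id 0, fusion by summation, and the function names are illustrative); the actual implementation is in models/llmseqrec.py.

```python
import numpy as np
import tensorflow as tf

d_model = 64  # model dimension (illustrative)

# Precomputed semantic vectors, one row per item (see Data Preparation below).
llm_emb = np.load("data/processed/llm_embeddings.npy").astype("float32")
num_items, llm_dim = llm_emb.shape

# ID-based embeddings: learnable collaborative table (+1 row for padding id 0).
id_table = tf.keras.layers.Embedding(num_items + 1, d_model)

# LLM-based embeddings: frozen table, with a zero row prepended for padding.
llm_table = tf.constant(np.vstack([np.zeros((1, llm_dim), "float32"), llm_emb]))

# Trainable projection mapping semantic vectors into the model dimension.
project = tf.keras.layers.Dense(d_model)

def fused_embeddings(item_ids):                # item_ids: (batch, seq) int32
    id_vecs = id_table(item_ids)               # (batch, seq, d_model)
    llm_vecs = tf.gather(llm_table, item_ids)  # (batch, seq, llm_dim)
    return id_vecs + project(llm_vecs)         # fuse by summation
```

Summation keeps the downstream Transformer unchanged; concatenation followed by a linear projection is a common alternative.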

Folder Structure

LLMSeqRec/
├── data/
│   ├── raw/                   # Raw MovieLens dataset files
│   └── processed/             # Preprocessed CSV files (train_sequences.csv, val_sequences.csv) and LLM embeddings (llm_embeddings.npy)
├── eval/
│   ├── evaluate.py            # Evaluation script for LLMSeqRec (and baseline, if needed)
│   └── analysis.py            # Scripts for plotting training and evaluation results
├── logs/
│   ├── llmseqrec_train_log.csv
│   ├── sasrec_train_log.csv
│   ├── llmseqrec_metrics.csv
│   └── sasrec_metrics.csv
├── models/
│   ├── llmseqrec.py           # LLMSeqRec model implementation
│   └── sasrec.py              # Baseline SASRec model implementation
├── train/
│   ├── train_llmseqrec.py     # Training script for LLMSeqRec model
│   └── train_sasrec.py        # Training script for SASRec baseline
├── README.md
└── sources.bib                # BibTeX references for literature review

Requirements

  • Python 3.7+
  • TensorFlow 2.x
  • Pandas, NumPy
  • Matplotlib (for plotting)
  • Additional packages as listed in requirements.txt

Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/LLMSeqRec.git
    cd LLMSeqRec
  2. (Optional) Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install required packages:
    pip install -r requirements.txt

Data Preparation

  1. Download the MovieLens dataset from GroupLens (https://grouplens.org/datasets/movielens/).
  2. Place raw files in the data/raw/ folder.
  3. Run the data preprocessing scripts to generate the processed CSV files and compute the LLM embeddings (stored in data/processed/); a sketch of this step follows the list.
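
The repository's preprocessing scripts are authoritative; as a rough illustration, a chronological leave-one-out split over the MovieLens 1M ratings file might look like the sketch below (the column names and the output CSV layout are assumptions):

```python
import pandas as pd

# MovieLens 1M ships ratings.dat with "::"-separated fields.
ratings = pd.read_csv(
    "data/raw/ratings.dat", sep="::", engine="python",
    names=["user_id", "item_id", "rating", "timestamp"],
)

# Order each user's interactions chronologically and collect item sequences.
ratings = ratings.sort_values(["user_id", "timestamp"])
seqs = ratings.groupby("user_id")["item_id"].apply(list)

# Leave-one-out split: the last item of each sequence is held out for validation.
train = seqs.apply(lambda s: s[:-1])
val = seqs.apply(lambda s: s[-1])

pd.DataFrame({"user_id": seqs.index, "sequence": train.values}).to_csv(
    "data/processed/train_sequences.csv", index=False)
pd.DataFrame({"user_id": seqs.index, "target": val.values}).to_csv(
    "data/processed/val_sequences.csv", index=False)
```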

Usage

Training

  • To train the LLMSeqRec model:
    python -m LLMSeqRec.train.train_llmseqrec
  • To train the SASRec baseline:
    python -m LLMSeqRec.train.train_sasrec

Training logs and metrics are saved in the logs/ folder.

Evaluation

  • To evaluate the LLMSeqRec model on the validation set:
    python -m LLMSeqRec.eval.evaluate --val_csv "data/processed/val_sequences.csv" --emb_path "data/processed/llm_embeddings.npy" --top_k 10
  • To evaluate the SASRec baseline, modify the model import in the evaluation script accordingly. A sketch of the top-K metrics is shown below.
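
The --top_k flag controls the ranking cutoff. For reference, here is a minimal NumPy sketch of the usual top-K metrics, HR@K and NDCG@K with a single held-out target per user (the exact metrics reported by eval/evaluate.py may differ):

```python
import numpy as np

def hit_rate_and_ndcg(scores, targets, k=10):
    """scores: (num_users, num_items) predicted scores;
    targets: (num_users,) index of each user's held-out item."""
    top_k = np.argsort(-scores, axis=1)[:, :k]  # top-K item indices per user
    hits = top_k == targets[:, None]            # (num_users, k) boolean matches
    hit_any = hits.any(axis=1)
    hr = hit_any.mean()                         # HR@K
    ranks = np.argmax(hits, axis=1)             # 0-based rank of the hit
    dcg = np.where(hit_any, 1.0 / np.log2(ranks + 2), 0.0)
    return hr, dcg.mean()                       # NDCG@K (single relevant item)
```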

Plotting Results

Generate visual comparisons of training loss and evaluation metrics:

python -m LLMSeqRec.eval.analysis

This script reads CSV logs from the logs/ folder and creates plots in the logs/ directory.
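
For orientation, the comparison might resemble the sketch below, assuming the training logs contain epoch and loss columns (the actual column names in the CSVs may differ):

```python
import pandas as pd
import matplotlib.pyplot as plt

llm_log = pd.read_csv("logs/llmseqrec_train_log.csv")
sas_log = pd.read_csv("logs/sasrec_train_log.csv")

# Overlay the two training-loss curves for a direct comparison.
plt.plot(llm_log["epoch"], llm_log["loss"], label="LLMSeqRec")
plt.plot(sas_log["epoch"], sas_log["loss"], label="SASRec")
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.legend()
plt.savefig("logs/training_loss_comparison.png")
```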

Future Scope

Future work may include integrating knowledge graphs to further enhance the semantic representation of items. By incorporating structured domain knowledge, such as genre hierarchies and actor/director relationships, the model could better capture intricate item interrelations and provide more explainable recommendations. Additionally, multimodal data such as images and reviews could be integrated to create richer, multi-dimensional embeddings. Optimizing the model for real-time recommendation and scaling it to larger, more diverse datasets also represent promising avenues for further research.

Citation

Please refer to sources.bib for detailed citations of all referenced research papers.

