Topic Modeling arXiv Abstracts Using BERTopic

This project visually structures and organizes a set of arXiv papers based on their abstracts. The goal is to make it faster to find closely related topics without having to trace through reference lists.

📂 Project Structure

notebook.ipynb – Main analysis and modeling notebook
data/ – No data is hosted in this repository; the dataset is loaded from Kaggle through the Kaggle notebook environment.

📊 Dataset

Source: arXiv
Import Method: The dataset is attached through Kaggle notebooks, so no separate API is needed to fetch the data.
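
As a rough illustration of the import step, the sketch below streams abstracts from a JSON Lines metadata file once the dataset is attached to the notebook. The file path, the field names (abstract, categories), and the helpers iter_abstracts and load_abstracts are assumptions made for illustration and are not taken from the project notebook; adjust them to match the attached dataset.

```python
import json
from itertools import islice


def iter_abstracts(path, category):
    """Yield abstracts whose category list contains `category`."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Assumed schema: "categories" is a space-separated string
            # and "abstract" holds the abstract text.
            if category in record.get("categories", "").split():
                # Collapse line breaks and repeated spaces in the abstract.
                yield " ".join(record.get("abstract", "").split())


def load_abstracts(path, category, limit=None):
    """Collect up to `limit` matching abstracts (all of them if limit is None)."""
    return list(islice(iter_abstracts(path, category), limit))


# Hypothetical usage; the path below is an assumed location on Kaggle:
# abstracts = load_abstracts(
#     "/kaggle/input/arxiv/arxiv-metadata-oai-snapshot.json", "cs.AI", limit=5000
# )
```

Reading line by line keeps memory use low, which matters if the metadata file is large.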

⚙️ Requirements

Python version: 3.x
Libraries:
- numpy
- pandas
- matplotlib
- scikit-learn
- nltk
- plotly
- torch
- hdbscan
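- bertopic (provides the KeyBERTInspired and MaximalMarginalRelevance representation models)
- itertools (part of the Python standard library; no installation needed)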
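
KeyBERTInspired and MaximalMarginalRelevance are representation models that ship with BERTopic, and hdbscan supplies the clustering step. The sketch below shows one way these pieces can be combined. The helper name build_topic_model and the parameter values are assumptions for illustration, not the settings used in the project notebook.

```python
def build_topic_model(min_cluster_size=15, diversity=0.3):
    """Return an unfitted BERTopic model with two extra keyword representations."""
    # Imports are deferred so this function can be defined even before
    # the libraries are installed.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance
    from hdbscan import HDBSCAN

    # prediction_data=True lets HDBSCAN assign new documents to clusters later.
    clusterer = HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True)

    # Each entry adds an alternative set of keywords per topic.
    representations = {
        "KeyBERT": KeyBERTInspired(),
        "MMR": MaximalMarginalRelevance(diversity=diversity),
    }
    return BERTopic(hdbscan_model=clusterer, representation_model=representations)
```

With a list of abstract strings in hand, build_topic_model().fit_transform(abstracts) would fit the model and return the topic assignments.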

The notebook should already have everything in place, so it can be run as-is.
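
If the notebook is run outside Kaggle, a quick check like the one below can report which packages are not importable. This is a minimal sketch: the helper missing_modules is hypothetical, the list uses import names rather than install names, and bertopic is included on the assumption that the project depends on it, since it is built on BERTopic.

```python
from importlib.util import find_spec

# Import names, which can differ from install names
# (scikit-learn is imported as sklearn).
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "sklearn",
    "nltk", "plotly", "torch", "hdbscan", "bertopic",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```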

How to Run

1. Open the notebook on Kaggle.
2. Import the dataset from the sidebar.
3. (Optional) Enable the GPU under Settings > Accelerator > GPU T4 x2; the notebook runs much faster on GPU than on CPU.
4. Run all cells from top to bottom.
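
After changing the accelerator setting, a short check like the one below can confirm whether the GPU is visible to PyTorch. The helper pick_device is hypothetical and is not part of the project notebook.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```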

When switching to a different set of topics, note that some topics need more abstracts for training. Topics that overlap heavily can otherwise cause an error in topic_model.visualize_topics().
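
One way to keep the notebook running when this happens is to guard the visualization call, as in the sketch below. The helpers count_topics and safe_visualize_topics and the min_topics threshold are assumptions for illustration. The exact exception type is not stated here, so the sketch catches broadly.

```python
def count_topics(topic_model):
    """Count fitted topics, ignoring BERTopic's outlier topic (-1)."""
    return sum(1 for topic_id in topic_model.get_topics() if topic_id != -1)


def safe_visualize_topics(topic_model, min_topics=3):
    """Return the topic map figure, or None if it cannot be drawn."""
    n_topics = count_topics(topic_model)
    if n_topics < min_topics:
        # Assumed threshold: too few distinct topics to lay out a map.
        print(f"Only {n_topics} topic(s) found; try training on more abstracts.")
        return None
    try:
        return topic_model.visualize_topics()
    except Exception as error:  # error type unspecified, so catch broadly
        print(f"visualize_topics() failed: {error}")
        return None
```

If the function returns None, increasing the number of abstracts and refitting is the remedy suggested above.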

Authors

Name: Anirudha Bhaktharahalli Subramanya
Name: Anvaya Chandrika Gudibanda Sreesha
Name: Ekta Mulkalwar
Name: David Camacho-Fernandez
Name: Joseph Comeaux
Name: Runyi Yang

