Devashish Gaikwad

LLMs and Real Time Systems | Bosch, RWTH Aachen | C++, Python, Linux

VectorCluster

March 18, 2026

GitHub Repository

AbstractClustering is a final-year NLP/ML project focused on grouping research paper abstracts into meaningful thematic clusters.

What it does

Ingests large paper metadata/abstract datasets.
Cleans and preprocesses text for downstream vectorization.
Builds word and sentence representations using multiple embedding choices.
Runs clustering experiments across algorithms and tracks quality metrics.
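
To make the cleaning step above concrete, here is a minimal sketch of what preprocessing a single abstract could look like. It uses only the standard library and a tiny hand-written stopword set as a stand-in; the function name `preprocess_abstract` and the exact rules (lowercasing, dropping non-letters, removing stopwords) are assumptions for illustration, not the project's actual code, which may rely on NLTK or Gensim instead.

```python
import re

# Tiny illustrative stopword set (an assumption); a real pipeline would
# likely use a fuller list, for example the one shipped with NLTK.
STOPWORDS = {
    "a", "an", "and", "the", "of", "in", "on", "for", "to",
    "we", "is", "are", "this", "that", "with",
}


def preprocess_abstract(text: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, split on whitespace,
    then drop stopwords and single-character tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


# Hypothetical example input
tokens = preprocess_abstract(
    "We propose a novel method for clustering 10,000 abstracts!"
)
print(tokens)
```

The token list is the kind of output a word-embedding lookup would consume in the next stage.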

How it is built

Core modules separate preprocessing, word embedding loading, sentence embedding creation, and data conversion.
Builds sentence embeddings by aggregating word vectors with p-means (power-mean) pooling.
Supports several clustering workflows (K-Means, spectral, hierarchical, DBSCAN, and deep embedded clustering notebooks).
Includes templates/notebooks for parameter sweeps and optimal K analysis, with model metadata logging.
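
The p-means aggregation mentioned above can be sketched as follows. The idea is to compute several power means of a sentence's word vectors, dimension by dimension, and concatenate them into one sentence vector. The choice of powers (1, +inf, -inf), the function names, and the zero-vector fallback for sentences with no known words are assumptions made for this sketch; the project's own module may differ.

```python
import numpy as np


def power_mean(word_vectors: np.ndarray, p: float) -> np.ndarray:
    """Per-dimension power mean over the rows of word_vectors.

    p = 1 is the arithmetic mean, p = +inf the maximum, p = -inf the minimum.
    Other values are restricted to positive integers in this sketch so that
    negative vector components stay real-valued.
    """
    if p == float("inf"):
        return word_vectors.max(axis=0)
    if p == float("-inf"):
        return word_vectors.min(axis=0)
    if p != int(p) or p < 1:
        raise ValueError("this sketch supports positive integer p or +/-inf")
    p = int(p)
    mean_of_powers = np.mean(word_vectors ** p, axis=0)
    # Signed root keeps the result real when the mean of powers is negative.
    return np.sign(mean_of_powers) * np.abs(mean_of_powers) ** (1.0 / p)


def sentence_embedding(tokens, vectors, powers=(1, float("inf"), float("-inf"))):
    """Concatenate one power mean per entry in `powers`.

    `vectors` is a hypothetical dict mapping word -> vector; unknown
    tokens are skipped.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim * len(powers))
    stacked = np.asarray(known, dtype=float)
    return np.concatenate([power_mean(stacked, p) for p in powers])


# Toy 2-dimensional "embeddings" purely for illustration
toy_vectors = {"graph": [1.0, 2.0], "neural": [3.0, -4.0]}
emb = sentence_embedding(["graph", "neural", "unknown"], toy_vectors)
print(emb.shape)
```

With three powers, the sentence vector is three times the word-vector dimension, which is the main trade-off of this approach: richer summary statistics at the cost of a larger input to the clustering step.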

Tech stack

Python, NumPy, pandas, scikit-learn, TensorFlow/Keras, NLTK/Gensim, Jupyter notebooks, SQLite.
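
As an illustration of how parts of this stack might fit together, the sketch below sweeps K for K-Means with scikit-learn, scores each run with the silhouette coefficient, and logs the results to SQLite. The table layout, the function name `sweep_k`, the in-memory database, and the synthetic data are all assumptions for the example; the source does not specify how the project stores its model metadata or which metric it uses to pick K.

```python
import sqlite3

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def sweep_k(X, k_values, db_path=":memory:", seed=0):
    """Fit K-Means for each K, log metrics to SQLite, and return the K
    with the highest silhouette score plus the open connection."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs "
        "(algorithm TEXT, k INTEGER, silhouette REAL, inertia REAL)"
    )
    results = []
    for k in k_values:
        model = KMeans(n_clusters=int(k), n_init=10, random_state=seed)
        labels = model.fit_predict(X)
        score = float(silhouette_score(X, labels))
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?)",
            ("kmeans", int(k), score, float(model.inertia_)),
        )
        results.append((int(k), score))
    conn.commit()
    best_k = max(results, key=lambda r: r[1])[0]
    return best_k, conn


# Synthetic, well-separated data standing in for sentence embeddings
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)
best_k, conn = sweep_k(X, range(2, 8))
logged_runs = conn.execute("SELECT COUNT(*) FROM runs").fetchone()[0]
print(best_k, logged_runs)
```

Keeping every run in a table, rather than only the best one, makes it possible to compare algorithms and parameter settings later with a plain SQL query.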