Given the embedding of a search query, we can efficiently retrieve the top-k matching results from a database of 20M documents. The objective of this project is to design and implement an indexing system for a semantic search database.
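For reference, the exact (unindexed) version of this retrieval can be sketched as below. This is only a baseline to show what "top-k matching results" means, not the project's indexer. It assumes cosine similarity as the matching score, and the function name `top_k_cosine`, the 70-dimensional vectors, and the 1000-row toy database are all hypothetical.

```python
import numpy as np

def top_k_cosine(query, vectors, k):
    """Return indices of the k rows of `vectors` most similar to `query`.

    Assumption: similarity is cosine similarity; the source does not say
    which metric the project uses.
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    k = min(k, len(scores))
    # argpartition selects the k best in linear time; only those k are sorted.
    top = np.argpartition(-scores, k - 1)[:k]
    return top[np.argsort(-scores[top])]

# Toy data (hypothetical sizes): 1000 documents, 70 dimensions.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 70))
query = docs[42] + 0.01 * rng.standard_normal(70)  # a query close to document 42
result = top_k_cosine(query, docs, k=5)
```

A full scan like this touches every vector on every query, which is the cost an index is meant to avoid at the 20M scale.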
Check Final Notebook
https://github.com/ZiadSheriif/IntelliQuery/blob/main/Evaluate_ADB_Project.ipynb
Clone Repo
git clone https://github.com/ZiadSheriif/IntelliQuery.git
Install dependencies
pip install -r requirements.txt
Run Indexer
python ./src/evaluation.py
This is our final approach, with some enhancements:
- Changed MiniBatchKMeans to regular KMeans
- We calculate initial centroids with just the first chunk of data
- Introduced parallel processing for different regions
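One way to read the first two points is sketched below: fit scikit-learn's `KMeans` on the first chunk only, then assign every chunk to the nearest centroid to form regions. This is an assumption about how the pieces fit together, not the repository's actual code. The names `build_regions`, `chunks`, and `n_regions` are hypothetical, and the parallel per-region processing is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_regions(chunks, n_regions, seed=0):
    """Cluster using the first chunk only, then assign all chunks to regions.

    Returns the fitted KMeans model and a dict: region id -> list of row ids.
    """
    # Centroids are estimated from the first chunk alone (assumed reading
    # of "initial centroids with just the first chunk of data").
    kmeans = KMeans(n_clusters=n_regions, n_init=10, random_state=seed).fit(chunks[0])

    regions = {r: [] for r in range(n_regions)}
    offset = 0  # global row id of the first row in the current chunk
    for chunk in chunks:
        labels = kmeans.predict(chunk)  # nearest centroid for each row
        for r in range(n_regions):
            regions[r].extend((offset + np.flatnonzero(labels == r)).tolist())
        offset += len(chunk)
    return kmeans, regions

# Toy data (hypothetical sizes): 3 chunks of 500 vectors, 16 dimensions.
rng = np.random.default_rng(0)
chunks = [rng.standard_normal((500, 16)) for _ in range(3)]
kmeans, regions = build_regions(chunks, n_regions=4)
```

Because each region holds an independent list of row ids, any per-region work after this step could be handed to separate workers.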
 
It combines both locality-sensitive hashing (LSH) and product quantization (PQ).
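A minimal sketch of how the two techniques can work together is shown below. It assumes random-hyperplane LSH to pick a candidate bucket and PQ codes to rank the candidates by approximate squared Euclidean distance. The class `LshPqIndex` and all its parameters are hypothetical, and the project may combine the two differently.

```python
import numpy as np
from sklearn.cluster import KMeans

class LshPqIndex:
    """Toy index: LSH selects candidates, PQ ranks them (illustrative only)."""

    def __init__(self, dim, n_planes=4, n_subvectors=4, n_codes=16, seed=0):
        assert dim % n_subvectors == 0 and n_codes <= 256
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_planes, dim))  # LSH hyperplanes
        self.m, self.n_codes, self.seed = n_subvectors, n_codes, seed

    def _bucket(self, x):
        # One bit per hyperplane (which side the vector falls on), packed into an int.
        bits = (np.atleast_2d(x) @ self.planes.T > 0).astype(np.int64)
        return bits @ (1 << np.arange(bits.shape[1]))

    def fit(self, data):
        # PQ: split each vector into m subvectors and learn one codebook per subspace.
        self.codebooks, codes = [], []
        for sub in np.split(data, self.m, axis=1):
            km = KMeans(n_clusters=self.n_codes, n_init=1, random_state=self.seed).fit(sub)
            self.codebooks.append(km.cluster_centers_)
            codes.append(km.labels_)
        self.codes = np.stack(codes, axis=1).astype(np.uint8)  # shape (n, m)
        # LSH: group row ids by bucket key.
        self.buckets = {}
        for i, key in enumerate(self._bucket(data)):
            self.buckets.setdefault(int(key), []).append(i)
        return self

    def query(self, q, k):
        cand = np.array(self.buckets.get(int(self._bucket(q)[0]), []), dtype=int)
        if cand.size == 0:
            return cand
        # table[j, c]: squared distance from the j-th query subvector to code c.
        table = np.stack([((cb - qs) ** 2).sum(axis=1)
                          for cb, qs in zip(self.codebooks, np.split(q, self.m))])
        # Approximate distance: sum the table entries selected by each candidate's codes.
        dists = table[np.arange(self.m), self.codes[cand]].sum(axis=1)
        return cand[np.argsort(dists)[:k]]

# Toy data (hypothetical sizes): 2000 vectors, 16 dimensions.
rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 16))
index = LshPqIndex(dim=16).fit(data)
neighbors = index.query(data[7], k=5)
```

In this sketch, only the compact codes and the small codebooks are needed at query time, so the raw vectors do not have to stay in memory. The trade-off is that both stages are approximate: a true neighbor can land in a different bucket or be ranked lower by the quantized distance.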
Ziad Sherif | Zeyad Tarek | Abdalhameed Emad | Basma Elhoseny

This software is licensed under the MIT License. See License for more information. ©Ziad Sherif.




