jazzandrock/information-retrieval

Information retrieval project

A pile of C++ code with lots of templates, algorithms, low-level bit manipulation and raw pointer memory management.

It indexes your files and helps you find the most useful ones.

We build an inverted index: each word is mapped to a list of IDs of the documents that contain it. When the index reaches a critical size, we compress it by using fewer bytes for small numbers. Further, for sorted lists of IDs we store only the differences between consecutive IDs; these differences are small, so the compression becomes even more efficient. Compressed indexes can also be merged together.
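To make the compression idea concrete, here is a minimal sketch, not the project's actual code. It assumes variable-byte encoding (7 payload bits per byte, high bit as a continuation flag) as the "fewer bytes for small numbers" scheme, and 32-bit document IDs. The names `appendVarByte`, `compressPostings`, `decompressPostings` and `mergePostings` are hypothetical. The merge simply decodes both lists and re-encodes their union; a real implementation may well merge in a streaming fashion instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Append `value` as a variable-byte integer: 7 payload bits per byte,
// least significant group first, high bit set on every byte except the last.
void appendVarByte(std::vector<std::uint8_t>& out, std::uint32_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Compress a strictly increasing list of document IDs by storing the gap
// between consecutive IDs (the first ID is stored as a gap from 0).
std::vector<std::uint8_t> compressPostings(const std::vector<std::uint32_t>& sortedIds) {
    std::vector<std::uint8_t> out;
    std::uint32_t previous = 0;
    for (std::size_t i = 0; i < sortedIds.size(); ++i) {
        assert(i == 0 || sortedIds[i] > previous);  // input must be sorted, no duplicates
        appendVarByte(out, sortedIds[i] - previous);
        previous = sortedIds[i];
    }
    return out;
}

// Inverse of compressPostings: decode each gap and keep a running sum.
std::vector<std::uint32_t> decompressPostings(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> ids;
    std::uint32_t current = 0;  // last decoded ID
    std::uint32_t gap = 0;      // gap being assembled
    int shift = 0;
    for (std::uint8_t byte : bytes) {
        gap |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if (byte & 0x80) {
            shift += 7;  // more bytes of this gap follow
        } else {
            current += gap;
            ids.push_back(current);
            gap = 0;
            shift = 0;
        }
    }
    return ids;
}

// Merge two compressed posting lists for the same word (assumed approach:
// decode, take the sorted union, re-encode).
std::vector<std::uint8_t> mergePostings(const std::vector<std::uint8_t>& a,
                                        const std::vector<std::uint8_t>& b) {
    const std::vector<std::uint32_t> idsA = decompressPostings(a);
    const std::vector<std::uint32_t> idsB = decompressPostings(b);
    std::vector<std::uint32_t> merged;
    std::set_union(idsA.begin(), idsA.end(), idsB.begin(), idsB.end(),
                   std::back_inserter(merged));
    return compressPostings(merged);
}
```

For example, the IDs 3, 130, 131, 70000 become the gaps 3, 127, 1, 69869, which take 1, 1, 1 and 3 bytes respectively: 6 bytes in total, compared with 16 bytes as raw 32-bit integers.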

For searching, we represent documents as vectors in a space with one dimension per unique word. We treat the query as another document, compute the angle between the query and each of our documents, and return the documents with the smallest angle, that is, the most similar ones.
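The ranking step can be sketched as follows. This is an illustration under assumptions, not the project's implementation: it weights each word by its raw count in the text (the actual weighting is not stated here), splits words on whitespace, and compares vectors by the cosine of the angle between them, so a larger cosine means a smaller angle. The names `TermVector`, `toTermVector`, `cosineSimilarity` and `rankDocuments` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Sparse vector: only words that occur in the text are stored.
using TermVector = std::unordered_map<std::string, double>;

// Build a term-count vector from whitespace-separated words (assumed weighting).
TermVector toTermVector(const std::string& text) {
    TermVector counts;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        counts[word] += 1.0;
    }
    return counts;
}

// Euclidean length of a sparse vector.
double norm(const TermVector& v) {
    double sumOfSquares = 0.0;
    for (const auto& entry : v) {
        sumOfSquares += entry.second * entry.second;
    }
    return std::sqrt(sumOfSquares);
}

// Cosine of the angle between two vectors; 0 if either vector is empty.
double cosineSimilarity(const TermVector& a, const TermVector& b) {
    double dot = 0.0;
    for (const auto& entry : a) {
        const auto match = b.find(entry.first);
        if (match != b.end()) {
            dot += entry.second * match->second;
        }
    }
    const double lengths = norm(a) * norm(b);
    return lengths == 0.0 ? 0.0 : dot / lengths;
}

// Return the indices of the `topK` documents most similar to the query,
// best match first. Documents sharing no word with the query are skipped.
std::vector<std::size_t> rankDocuments(const std::vector<std::string>& documents,
                                       const std::string& query,
                                       std::size_t topK) {
    const TermVector queryVector = toTermVector(query);
    std::vector<std::pair<double, std::size_t>> scored;
    for (std::size_t i = 0; i < documents.size(); ++i) {
        const double score = cosineSimilarity(toTermVector(documents[i]), queryVector);
        if (score > 0.0) {
            scored.emplace_back(score, i);
        }
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<double, std::size_t>& left,
                        const std::pair<double, std::size_t>& right) {
                         return left.first > right.first;
                     });
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < scored.size() && i < topK; ++i) {
        result.push_back(scored[i].second);
    }
    return result;
}
```

Because the cosine divides by both vector lengths, a long document is not favoured just for containing more words; what matters is how much of its content lines up with the query.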

[Screenshot of search results]
