GitHub - jazzandrock/information-retrieval

Information retrieval project

A pile of C++ code with lots of templates, algorithms, low-level bit manipulation and raw pointer memory management.

It indexes your files and helps you find the most relevant ones.

We build an index: each word is mapped to a list of IDs of the documents that contain it. When the index reaches a critical size, we compress it by using fewer bytes for small numbers. Further, for sorted lists of IDs we store only the differences between consecutive numbers, which makes the compression even more efficient. Compressed indexes can also be merged together.
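The two compression ideas above (fewer bytes for small numbers, and gaps instead of absolute IDs) can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes a variable-byte encoding with 7 payload bits per byte, and the names `put_varint`, `compress_postings` and `decompress_postings` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` as a variable-byte integer (assumed format): 7 payload bits
// per byte, low bits first, high bit set on every byte except the last.
void put_varint(std::vector<std::uint8_t>& out, std::uint32_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Compress a strictly increasing list of document IDs by storing the gap
// between each ID and the previous one (the first gap is measured from 0).
std::vector<std::uint8_t> compress_postings(const std::vector<std::uint32_t>& ids) {
    std::vector<std::uint8_t> out;
    std::uint32_t previous = 0;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        assert(i == 0 || ids[i] > previous);  // input must be sorted, no duplicates
        put_varint(out, ids[i] - previous);
        previous = ids[i];
    }
    return out;
}

// Reverse of compress_postings: decode each gap and add it to a running sum.
std::vector<std::uint32_t> decompress_postings(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> ids;
    std::uint32_t previous = 0;
    std::uint32_t gap = 0;
    int shift = 0;
    for (std::uint8_t byte : bytes) {
        gap |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if (byte & 0x80) {
            shift += 7;            // more bytes of this gap follow
        } else {
            previous += gap;       // last byte of the gap: emit the ID
            ids.push_back(previous);
            gap = 0;
            shift = 0;
        }
    }
    return ids;
}
```

With the IDs 3, 7, 300 and 100000, the gaps are 3, 4, 293 and 99700, which take 1, 1, 2 and 3 bytes under this encoding, so the list shrinks from 16 bytes of raw 32-bit integers to 7 bytes.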

For searching, we represent documents as vectors in a space with one dimension per unique word. We treat the query as another document, compute the angle between the query and each of our documents, and return the most similar ones.
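Comparing angles is usually done through the cosine of the angle, which is the dot product of two vectors divided by the product of their lengths. The sketch below illustrates that idea under some assumptions: documents are dense vectors of term weights indexed by word ID, and the names `cosine_similarity` and `rank_documents` are hypothetical, not taken from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One weight per unique word; all vectors share the same vocabulary order.
using TermVector = std::vector<double>;

// Cosine of the angle between two vectors: 1.0 means same direction,
// 0.0 means no words in common. An all-zero vector scores 0.0.
double cosine_similarity(const TermVector& a, const TermVector& b) {
    assert(a.size() == b.size());  // both must live in the same space
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) {
        return 0.0;
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Return document indices ordered from most to least similar to the query.
// Ties keep their original order because the sort is stable.
std::vector<std::size_t> rank_documents(const std::vector<TermVector>& docs,
                                        const TermVector& query) {
    std::vector<std::pair<double, std::size_t>> scored;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        scored.emplace_back(cosine_similarity(docs[i], query), i);
    }
    std::stable_sort(scored.begin(), scored.end(),
                     [](const std::pair<double, std::size_t>& lhs,
                        const std::pair<double, std::size_t>& rhs) {
                         return lhs.first > rhs.first;
                     });
    std::vector<std::size_t> order;
    for (const auto& entry : scored) {
        order.push_back(entry.second);
    }
    return order;
}
```

A real index would likely keep these vectors sparse, since most documents contain only a small fraction of the vocabulary; dense vectors are used here only to keep the example short.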

Folders and files (20 commits):

Test
data_structures
indexer
misc
searching
sorting
.gitignore
main.cpp
readme.md
