Skip to content

Add Rabin-Karp String Matching Algorithm #13918

@vivekkumarrathour

Description

@vivekkumarrathour

Feature description

Feature Description

The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.

Proposed Implementation

I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:

  1. Single pattern matching using rolling hash technique
  2. Multiple pattern matching - search for multiple patterns in a single pass
  3. Configurable hash parameters (base and modulus)
  4. Collision handling - verify matches to avoid false positives
  5. Time complexity: O(n + m) average case, O(nm) worst case
  6. Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
  7. Type hints and clear documentation

Why This Enhancement?

  • Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
  • Practical Applications:
    • Plagiarism detection (comparing documents)
    • DNA sequence matching in bioinformatics
    • Substring search in large texts
    • Finding duplicate files
  • Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
  • Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously

Key Features to Highlight

  • Rolling hash technique: Demonstrates efficient hash recalculation
  • Spurious hit handling: Shows how to deal with hash collisions
  • Modular design: Separate functions for single and multiple pattern matching
  • Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches

Example Applications

# Single pattern search
text = "hello world hello"
pattern = "hello"
indices = rabin_karp_search(text, pattern)  # Returns [0, 12]

# Multiple pattern search
patterns = ["hello", "world"]
matches = rabin_karp_multiple(text, patterns)
# Returns {"hello": [0, 12], "world": [6]}

Benefits

  • Teaches rolling hash concept
  • Useful for real-world string matching problems
  • Shows trade-offs between different string matching algorithms
  • Demonstrates practical applications of hashing

I'm willing to implement this following the repository's guidelines if approved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementThis PR modified some existing files

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions