-
-
Notifications
You must be signed in to change notification settings - Fork 49.4k
Open
Labels
enhancementThis PR modified some existing filesThis PR modified some existing files
Description
Feature description
Feature Description
The repository has several string matching algorithms (KMP, Boyer-Moore, Naive Search) but is missing the Rabin-Karp algorithm, which is an important string matching algorithm that uses hashing for pattern searching. This is particularly useful for multiple pattern searches and plagiarism detection.
Proposed Implementation
I propose adding a Rabin-Karp string matching algorithm in the strings/ directory with the following features:
- Single pattern matching using rolling hash technique
- Multiple pattern matching - search for multiple patterns in a single pass
- Configurable hash parameters (base and modulus)
- Collision handling - verify matches to avoid false positives
- Time complexity: O(n + m) average case, O(nm) worst case
- Comprehensive doctests with edge cases (empty strings, no matches, multiple matches)
- Type hints and clear documentation
Why This Enhancement?
- Educational Value: Rabin-Karp introduces the concept of rolling hash, an important technique in computer science
- Practical Applications:
- Plagiarism detection (comparing documents)
- DNA sequence matching in bioinformatics
- Substring search in large texts
- Finding duplicate files
- Complements existing algorithms: Provides comparison with KMP and Boyer-Moore
- Multiple pattern searching: Unlike other algorithms, Rabin-Karp can efficiently search for multiple patterns simultaneously
Key Features to Highlight
- Rolling hash technique: Demonstrates efficient hash recalculation
- Spurious hit handling: Shows how to deal with hash collisions
- Modular design: Separate functions for single and multiple pattern matching
- Performance comparison: Can be compared with naive, KMP, and Boyer-Moore approaches
Example Applications
# Single pattern search
text = "hello world hello"
pattern = "hello"
indices = rabin_karp_search(text, pattern) # Returns [0, 12]
# Multiple pattern search
patterns = ["hello", "world"]
matches = rabin_karp_multiple(text, patterns)
# Returns {"hello": [0, 12], "world": [6]}Benefits
- Teaches rolling hash concept
- Useful for real-world string matching problems
- Shows trade-offs between different string matching algorithms
- Demonstrates practical applications of hashing
I'm willing to implement this following the repository's guidelines if approved.
Metadata
Metadata
Assignees
Labels
enhancementThis PR modified some existing filesThis PR modified some existing files