Add automatic highlight detection for any color #4
base: master
That's awesome, thank you! I really didn't expect a PR on this repo. I need some time to test it locally and will get back to you soon. Thanks again! 👏
Thank you for the quick response! I found your tool really helpful for a project I was working on, but my use case required detecting different highlight colors without manual configuration. I thought this addition might be useful for the project. Thanks again!
# Set path to Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Since this is OS-specific, could you enable it based on the current OS?
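One way to act on this review comment, sketched under assumptions: override tesseract_cmd only on Windows, where the path from the diff is the default installer location, and otherwise leave pytesseract's default (which invokes plain `tesseract` from the PATH). The helper names default_tesseract_cmd and configure_tesseract, and the preference for a `tesseract` already on the PATH, are hypothetical and not part of the PR.

```python
import platform
import shutil
from typing import Optional

# Installer location from the PR diff; assumed to be relevant only on Windows.
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def default_tesseract_cmd(system_name: Optional[str] = None) -> Optional[str]:
    """Return an explicit Tesseract path for this OS, or None to keep the default.

    Hypothetical helper: on Windows, prefer a `tesseract` found on PATH and
    fall back to the standard installer location; on other systems return
    None so pytesseract keeps calling plain `tesseract`.
    """
    system_name = system_name or platform.system()
    if system_name == "Windows":
        return shutil.which("tesseract") or WINDOWS_TESSERACT
    return None


def configure_tesseract() -> None:
    """Apply the OS-specific path to pytesseract, if one is needed."""
    import pytesseract  # imported lazily so the helper above has no hard dependency

    cmd = default_tesseract_cmd()
    if cmd is not None:
        pytesseract.pytesseract.tesseract_cmd = cmd
```

Calling configure_tesseract() once at startup keeps the hard-coded Windows path out of the way on Linux and macOS, where Tesseract is usually installed on the PATH by the package manager.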
Sorry for the late reply! Thank you so much for testing the PR and providing feedback. I see you found a simpler approach that works better: just using the saturation mask with a small denoising kernel. That's a really good insight. I will upload more test images with different highlight colors to see how this simplified version performs across various cases, and I'll share the results with you soon. Thanks for taking the time to review and improve the code!
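A minimal sketch of the simplified approach described above, assuming that "saturation mask" means thresholding the HSV saturation channel and "small denoising kernel" means a morphological opening with a square kernel. It uses plain NumPy rather than OpenCV (where cv2.cvtColor with cv2.COLOR_RGB2HSV and cv2.morphologyEx with cv2.MORPH_OPEN would be the usual calls). The names highlight_mask and saturation_channel and the defaults min_saturation=60 and kernel_size=3 are hypothetical, not taken from the PR or the maintainer's code.

```python
import numpy as np


def saturation_channel(rgb: np.ndarray) -> np.ndarray:
    """HSV saturation on a 0-255 scale, following OpenCV's 8-bit convention."""
    rgb = rgb.astype(np.float32)
    v = rgb.max(axis=2)          # value = max(R, G, B)
    c = v - rgb.min(axis=2)      # chroma = max - min
    s = np.zeros_like(v)
    np.divide(c, v, out=s, where=v > 0)  # black pixels (v == 0) keep s = 0
    return (s * 255).round().astype(np.uint8)


def _erode(mask: np.ndarray, k: int) -> np.ndarray:
    """Binary erosion with a k x k square; out-of-image pixels count as True."""
    pad, (h, w) = k // 2, mask.shape
    padded = np.pad(mask, pad, mode="constant", constant_values=True)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def _dilate(mask: np.ndarray, k: int) -> np.ndarray:
    """Binary dilation with a k x k square; out-of-image pixels count as False."""
    pad, (h, w) = k // 2, mask.shape
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def highlight_mask(rgb: np.ndarray, min_saturation: int = 60, kernel_size: int = 3) -> np.ndarray:
    """Boolean mask of strongly colored pixels, with small specks removed by opening."""
    mask = saturation_channel(rgb) >= min_saturation
    # Opening = erosion followed by dilation: drops blobs smaller than the kernel.
    return _dilate(_erode(mask, kernel_size), kernel_size)
```

Black text and white or gray paper all have near-zero saturation, so only colored ink passes the threshold regardless of its hue, which is why no per-color HSV range is needed. The opening then removes isolated colored pixels that are smaller than the kernel.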
Can you show me your code? I tested these images and the results were correct. Only the last image has some issues, but the problem there seems to be with the OCR in general, as the bounding box indicates that it doesn't recognize the word. I can push my code if you want to compare it with your results.
Hi @zirkelc,

This PR adds a new function, detect_highlights(), that can automatically detect highlighted text of any color without requiring manual color range specification. This makes the tool more flexible and user-friendly, as it no longer requires pre-configuring HSV color ranges for different highlighter colors.
Current Status
The implementation is functional but still needs refinement. It occasionally detects a small number of non-highlighted words or letters. I'm actively working on improving the accuracy and reducing false positives.
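One possible way to reduce false positives like these (stray letters or non-highlighted words) is to accept an OCR word only when most of its bounding box is covered by the highlight mask, rather than on any overlap. The sketch below assumes a boolean mask of highlighted pixels and word boxes in the dictionary format returned by pytesseract.image_to_data(image, output_type=Output.DICT), which holds parallel lists under "text", "left", "top", "width" and "height". The function name highlighted_words and the min_coverage=0.5 threshold are hypothetical and not part of the PR.

```python
import numpy as np


def highlighted_words(mask: np.ndarray, data: dict, min_coverage: float = 0.5) -> list:
    """Return OCR words whose bounding box is mostly covered by highlighted pixels.

    `mask` is a boolean (H, W) array; `data` follows the layout of
    pytesseract's image_to_data DICT output (parallel lists per key).
    """
    words = []
    boxes = zip(data["text"], data["left"], data["top"], data["width"], data["height"])
    for text, x, y, w, h in boxes:
        # Skip the empty entries Tesseract emits for blocks, lines and paragraphs.
        if not text.strip() or w <= 0 or h <= 0:
            continue
        box = mask[y:y + h, x:x + w]
        # Fraction of highlighted pixels inside the word's box.
        if box.size and box.mean() >= min_coverage:
            words.append(text)
    return words
```

Raising min_coverage trades recall for precision: a word that only touches a neighboring highlight with one or two letters no longer qualifies.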
Implementation Details
Testing
Tested on documents with various highlighter colors with promising results, though some edge cases need further tuning.
Next Steps
I'm working on:
Feedback and suggestions for improvement are welcome! I can update the implementation based on your recommendations.