fix: start_block misalignment caused by SIGSTOP #552
base: v2.4.8-windows-support
It doesn't make sense that a position mismatch happens after SIGSTOP/SIGCONT. Please open an issue showing a procedure to reproduce it and a detailed scenario of how this problem happens.
It does happen. Consider a write loop like this:
int count = 0;
while (sg_write(...) > 0) {
    count++;
}
Let's say we wrote 10 blocks. After the loop ends, count would be 10, but more than 10 blocks were actually written to the tape because of this.
Hello @piste-jp, thanks for your comment. We have opened an issue in RHEL regarding the sg driver behavior mentioned by @madjesc. We will open an issue in this repository showing this.
I cannot open the link provided.
Please do this soon, because
I feel this PR is a really bad idea.
So I strongly recommend that you open an issue describing the problem first and start again from the first step. I also strongly suggest closing this PR for now.
SystemTap also sends these signals, so users running SystemTap can encounter this behavior.
I don't see how a simple position check after the extent is written is going to be a performance problem. Can you elaborate?
I also ran a test with the lintape driver, and it does not happen there.
Yes, I will open an issue with a detailed explanation and code to reproduce it. But why close the PR? If we conclude in the issue that this is not needed, sure, we will close the issue.
Summary of changes
This pull request fixes the following issue:
When SIGSTOP is received while data is being written to tape, it can cause a mismatch between the position held in the LE cache and the one reported by the drive. In that case, the index is not written at the position described by its self-pointer, and the start_block of subsequent files does not reflect the actual position where each file was written, which produces LTFS error LTFS11089E when those files are read.
By making the following changes we prevent this scenario:
Description
Motivation and context for each change.
During the sync process, the ltfs code calls ltfs_write_index, so we avoid writing the index if, for any reason, the current position reported by LE differs from the real position reported by the drive.
Some customers may not sync frequently, so we need a quicker way to detect the issue. The current method is to catch SIGCONT, set a flag, and read that flag in the tape_write function, since the SIGSTOP signal itself cannot be caught.
It is also helpful to explicitly log when the ltfs process receives a SIGCONT signal.
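The two mechanisms described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: pos_sketch, install_sigcont_handler, consume_sigcont_flag, positions_match and check_position_after_resume are hypothetical names, and the drive position is passed in as a parameter rather than queried from the device. Using SA_RESTART is also an assumption of this sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical position type for this sketch, not the real LTFS structure. */
struct pos_sketch {
    uint32_t partition;
    uint64_t block;
};

/* Set by the signal handler, read and cleared in the write path. */
static volatile sig_atomic_t sigcont_seen = 0;

/* SIGSTOP cannot be caught, so the resume (SIGCONT) is recorded instead. */
static void on_sigcont(int signo)
{
    (void)signo;
    sigcont_seen = 1; /* only async-signal-safe work is done here */
}

/* Install the SIGCONT handler. Returns 0 on success, -1 on error. */
int install_sigcont_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigcont;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* assumption: restart interrupted syscalls */
    return sigaction(SIGCONT, &sa, NULL);
}

/* Return true if a SIGCONT was seen since the last call, and clear the flag. */
bool consume_sigcont_flag(void)
{
    if (!sigcont_seen)
        return false;
    sigcont_seen = 0;
    return true;
}

/* Compare the cached position with the one reported by the drive. */
bool positions_match(const struct pos_sketch *cached,
                     const struct pos_sketch *drive)
{
    return cached->partition == drive->partition &&
           cached->block == drive->block;
}

/*
 * Hypothetical write-path check: only after a resume has been observed,
 * compare positions before trusting the cache.
 * Returns 0 if it is safe to continue, -1 on a mismatch.
 */
int check_position_after_resume(const struct pos_sketch *cached,
                                const struct pos_sketch *drive)
{
    if (!consume_sigcont_flag())
        return 0; /* no resume observed, skip the comparison */
    return positions_match(cached, drive) ? 0 : -1;
}
```

In this sketch the comparison runs only after a resume has been observed, so the normal write path pays for nothing more than a flag read. A caller guarding an index write could use positions_match unconditionally and refuse to write the index when it returns false.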
Type of change
Checklist: