fix: start_block misalignment caused by SIGSTOP #552

amissael95 · 2025-10-28T03:18:33Z

Summary of changes

This pull request fixes the following issue:

When SIGSTOP is received while data is being written to tape, the position held in the LE cache can diverge from the position reported by the drive. When that happens, the index is not written at the position described by its self-pointer, and the start_block of subsequent files does not reflect the position where each file was actually written. Reading those files then fails with LTFS error LTFS11089E.

By making the following changes we prevent this scenario:

Check the drive position before writing the index
Catch SIGCONT, set a flag, and do the position check on writes
Display message LTFS17294I when SIGCONT is received
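
The "catch SIGCONT and set a flag" step could look roughly like the following. This is a minimal sketch, not the code in this PR: the names `sigcont_seen`, `install_sigcont_handler` and `consume_sigcont_flag` are hypothetical, and only standard POSIX calls (`sigaction`, `sigemptyset`) are assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag name. It is written from a signal handler, so it
 * must be a volatile sig_atomic_t. */
static volatile sig_atomic_t sigcont_seen = 0;

/* Async-signal-safe handler: it only stores to the flag. */
static void on_sigcont(int signo)
{
    (void)signo;
    sigcont_seen = 1;
}

/* Install the handler. Returns 0 on success and -1 on failure,
 * which is what sigaction() itself returns. */
int install_sigcont_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigcont;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted system calls */
    return sigaction(SIGCONT, &sa, NULL);
}

/* Intended to be called from the write path: returns true if a SIGCONT
 * arrived since the last call, and clears the flag. */
bool consume_sigcont_flag(void)
{
    if (sigcont_seen) {
        sigcont_seen = 0;
        return true;
    }
    return false;
}
```

A write routine would call `consume_sigcont_flag()` and run the position check only when it returns true. There is a small window between testing and clearing the flag in which a second SIGCONT is folded into the first, which is harmless here because one check covers both.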

Description

Motivation and context for each change.

The sync process calls ltfs_write_index, so the index write is skipped if, for any reason, the current position held by LE differs from the position reported by the drive.
Some customers may not sync frequently, so a quicker way to detect the issue is needed. Because SIGSTOP cannot be caught, the current method catches SIGCONT, sets a flag, and reads that flag in the tape_write function.
It is helpful to log explicitly when the ltfs process receives a SIGCONT signal.
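
The position check described above amounts to comparing two (partition, block) pairs and refusing to continue when they differ. A minimal sketch, assuming a simplified `struct pos` that is a hypothetical stand-in and not the LTFS position structure:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a tape position; not the LTFS struct. */
struct pos {
    uint32_t partition;
    uint64_t block;
};

/* Compare the position cached by the application with the position
 * reported by the drive. Returns 0 when they agree, -1 otherwise. */
int check_position(const struct pos *cached, const struct pos *drive)
{
    assert(cached != NULL && drive != NULL);

    if (cached->partition != drive->partition ||
        cached->block != drive->block) {
        fprintf(stderr,
                "position mismatch: cache=(%" PRIu32 ", %" PRIu64 ") "
                "drive=(%" PRIu32 ", %" PRIu64 ")\n",
                cached->partition, cached->block,
                drive->partition, drive->block);
        return -1;
    }
    return 0;
}
```

The caller would obtain `drive` from a real position query and skip the index write when `check_position` returns -1.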

Type of change

Bug fix

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have confirmed my fix is effective or that my feature works


piste-jp · 2025-10-28T10:17:26Z

It doesn't make sense that a position mismatch happens after SIGSTOP/SIGCONT.

Please open an issue and show me a procedure to recreate it and a detailed scenario of how this problem happens.

madjesc · 2025-10-28T15:58:20Z

> It doesn't make sense that a position mismatch happens after SIGSTOP/SIGCONT.
>
> Please open an issue and show me a procedure to recreate it and a detailed scenario of how this problem happens.

It does happen.
I have code that can replicate this outside of ltfs.

The problem is more sg related. After the ioctl request is issued to the sg driver, if SIGSTOP arrives while ltfs is waiting for the response, the sg driver repeats the request, and after SIGCONT the program is not aware of this.

In a write loop like this:

int count = 0;
while (sg_write(...) > 0) {
  count++;  /* counts successful requests, not blocks on tape */
}

Let's say that we wrote 10 blocks. After the loop ends, count is 10, but more than 10 blocks were written to the tape because of the repeated request.
After writing the extent, LTFS is not aware that the sg driver duplicated some blocks, so it's better to check the position after the extent is written.
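
To make the drift concrete, here is a small simulation. It does not talk to a real sg device: `fake_drive`, `fake_write`, `write_extent` and `position_drift` are hypothetical names, and the "replayed request" is modelled simply as one write that advances the drive by two blocks while still returning success once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated drive: the block position the drive itself would report. */
struct fake_drive {
    uint64_t position;
};

/* Simulated one-block write. When `replayed` is nonzero the request is
 * executed twice on the drive, but the caller still sees one success. */
static int fake_write(struct fake_drive *d, int replayed)
{
    d->position += replayed ? 2 : 1;
    return 1;
}

/* Issue `nblocks` write requests; request number `replay_at` is replayed
 * (pass a negative value for no replay). Returns the caller's own count. */
uint64_t write_extent(struct fake_drive *d, int nblocks, int replay_at)
{
    uint64_t count = 0;

    assert(d != NULL);
    for (int i = 0; i < nblocks; i++) {
        if (fake_write(d, i == replay_at) > 0)
            count++;
    }
    return count;
}

/* Blocks on tape that the caller did not account for. */
uint64_t position_drift(uint64_t start, uint64_t count,
                        const struct fake_drive *d)
{
    return d->position - (start + count);
}
```

Starting at block 100 and writing 10 blocks with one replayed request, the caller counts 10 while the simulated drive sits at block 111, so `position_drift` returns 1. A position check after the extent is what exposes that difference.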

amissael95 · 2025-10-28T17:04:48Z

Hello @piste-jp, thanks for your comment.

We have opened an issue in RHEL regarding the sg driver behavior mentioned by @madjesc.
The issue is reached in LTFS after a few attempts at sending SIGSTOP/SIGCONT while ltfs is writing data. It is more evident when you send SIGSTOP/SIGCONT while writing files whose length is not a multiple of 512 KiB, since they do not fill their last block. Reading those files then fails with LTFS11089E because the start_block in the index does not point to the real start_block of the file.

LTFS11089E Cannot read: expected 524288 bytes from the medium, but received %u bytes.
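
One way to picture why short last blocks make the failure visible is sketched below. It is an illustration under assumptions, not LTFS code: the tape is modelled as an array of block sizes, the 512 KiB block size is taken from the message text above, and `blocks_for`, `last_block_size` and `read_first_block` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 524288u /* 512 KiB, as in the LTFS11089E text */

/* Number of blocks an extent of `bytes` bytes occupies. */
uint64_t blocks_for(uint64_t bytes)
{
    return (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Size of the last block of an extent of `bytes` bytes. */
uint32_t last_block_size(uint64_t bytes)
{
    uint32_t rem = (uint32_t)(bytes % BLOCK_SIZE);

    return (bytes != 0 && rem == 0) ? BLOCK_SIZE : rem;
}

/* Simulated read of the first block of a multi-block file. tape[i] is
 * the size of block i. Returns 0 if the block at start_block is a full
 * block, -1 if it is short or out of range. */
int read_first_block(const uint32_t *tape, size_t nblocks,
                     uint64_t start_block)
{
    assert(tape != NULL);
    if (start_block >= nblocks)
        return -1;
    return tape[start_block] == BLOCK_SIZE ? 0 : -1;
}
```

Suppose file A is 524288 + 1000 bytes and one of its full blocks is duplicated, so the tape holds blocks of 524288, 524288 and 1000 bytes, and file B really starts at block 3. If the index records block 2 as B's start_block, the first read lands on A's 1000-byte tail instead of a full block, which matches the "expected 524288 bytes" wording of the message.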

We will open an issue in this repository showing this.

piste-jp · 2025-10-29T13:43:44Z

> We have opened an issue in RHEL regarding the sg driver behavior mentioned by @madjesc

I cannot open the link provided.

> I will open an issue in this repository showing this.

Please do this soon, because
- A PR is a code change request; the problem itself should be discussed in an issue
- A PR should be linked to the issue it solves

piste-jp · 2025-10-29T13:54:39Z

I feel this PR is a really bad idea.

- I believe SIGSTOP is used only by developers for debugging
- The code has a serious performance problem
- The code affects not only this problem but also a wider range of environments
  - Please reconsider whether this problem happens only on sg or on all other backends

So I strongly recommend that you open an issue to describe the problem first and start again from the first step.

I also strongly suggest closing this PR for now.

madjesc · 2025-10-29T14:28:46Z

> I feel this PR is a really bad idea.
>
> I believe SIGSTOP is used only by developers for debugging

SystemTap also sends these signals. Users who have SystemTap can encounter this behavior.

> The code has a serious performance problem

I don't see how a simple position check after the extent is written is going to be a performance problem. Can you elaborate?

> The code affects not only this problem but also a wider range of environments
>
> Please reconsider whether this problem happens only on sg or on all other backends

I also tested with the lintape driver, and it does not happen there.

> So I strongly recommend that you open an issue to describe the problem first and start again from the first step.
> I also strongly suggest closing this PR for now.

Yes, I will open an issue with a detailed explanation and code to replicate it. But why close the PR? If we conclude in the issue that this is not needed, sure, we will close the issue.

vandelvan and others added 8 commits May 8, 2025 11:00

Update: validate_error_messages.py to work using python3 (LinearTapeF…

50f611f

…ileSystem#516)

Adding logic for the feasibility check (position check)

33807bf

Protecting LE agains error caused by SIGSTOP and SIGCONT

c89ef57

Using message 17294I when receiving SIGCONT

0cc254e

Failing against mismatch, make tape to require validation

b24237b

Merge branch 'v2.4-stable' into v2.4.8-windows-support

ecefedd

Merge branch 'v2.4.8-windows-support' into fix/position_misalignment

9c20a47

Removing temporal messages

855ae40

amissael95 requested review from XV02, madjesc and vandelvan October 28, 2025 03:19

amissael95 changed the title ~~Prevent start_block misalignment caused by SIGSTOP~~ fix: start_block misalignment caused by SIGSTOP Oct 28, 2025

Removing unneeded messages

1b904ff

amissael95 requested a review from syaoraang October 28, 2025 15:36
