suprsync: Add error handling to each database transaction #973

BrianJKoopman · 2026-01-21T21:57:07Z

Description

This PR adds error handling for all database transactions within the run process. If an error occurs, the agent waits 5 seconds, then continues at the top of the process loop and retries all operations.
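As a rough illustration of the wait-and-retry pattern described above (not the actual agent code), here is a minimal sketch. It uses the standard-library `sqlite3` module as a stand-in for whatever database layer the agent uses. The names `run_loop`, `steps`, and `max_iterations` are hypothetical and exist only for this example. The `errors_sqlite` key mirrors the counter mentioned in this description.

```python
import sqlite3
import time

RETRY_WAIT_S = 5  # wait time described in this PR


def run_loop(db_path, steps, max_iterations, wait_s=RETRY_WAIT_S, sleep=time.sleep):
    """Run every step once per iteration; on a SQLite error, wait and restart.

    `steps` is a list of callables taking a connection (hypothetical stand-ins
    for the agent's database transactions).
    """
    counters = {"iterations": 0, "errors_sqlite": 0}
    while counters["iterations"] < max_iterations:
        counters["iterations"] += 1
        try:
            conn = sqlite3.connect(db_path, timeout=0)
            try:
                for step in steps:
                    step(conn)
            finally:
                conn.close()
        except sqlite3.OperationalError:
            # Count the failure, wait, then go back to the top of the loop
            # so that all operations are retried on the next pass.
            counters["errors_sqlite"] += 1
            sleep(wait_s)
            continue
    return counters
```

Because the whole iteration is retried, each step has to be safe to repeat, which is the concern about reattempted file transfers raised in the next paragraph.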

I believe this is safe, but it'd be good to get a second opinion. The agent may reattempt a file transfer (if files were transferred but couldn't be marked as such in the database), but that should be fine.

I added errors_sqlite to the list of errors in the counters stat, stored in session data, so we can see how often this is happening.

I also added some comments, mainly to describe whether a write operation was happening within the called functions. (I had thought implementing #886 would have helped us here, but it's mostly writes. Only in the case of hitting a lock during srfm.get_archive_stats() will this actually help. That said, this is a relatively slow step, especially as the sqlite file grows.)

Motivation and Context

We've seen various OperationalError messages in the suprsync agent, which can occur at any point in the process that interacts with the database. This is because the Pysmurf Monitor agent also writes to the database file. This should fix the regularly occurring crashes in the suprsync agents on site.

Resolves #483.
Resolves #874.

How Has This Been Tested?

This branch was run on the E2E testing system. Timestreams were generated with the SMuRF file emulator and then manually added to the suprsync database.

(.venv) ocs@ocs3:/mnt/nfs/data/ocs3/temp_data$ suprsync add-local-files --db ./suprsync.db timestreams/17690/ timestreams
Adding 2 files to the add to ./suprsync.db from /mnt/nfs/data/ocs3/temp_data/timestreams/17690
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.47it/s]

The timestream suprsync agent then picked up and copied the files:

2026-01-21T21:35:55+0000 Creating timecode dirs for recent files.....
2026-01-21T21:35:55+0000 Finished creating tcdirs
2026-01-21T21:37:06+0000 Copying files:
2026-01-21T21:37:06+0000 - /mnt/nfs/data/ocs3/temp_data/timestreams/17690/emulator2/1769031165_000.g3
2026-01-21T21:37:06+0000 - /mnt/nfs/data/ocs3/temp_data/timestreams/17690/emulator2/1769031151_000.g3
2026-01-21T21:37:08+0000 Checksumming on remote.
2026-01-21T21:37:08+0000 Copy session complete.

I don't really have a good method for testing the database lock and the corresponding handling, but I'm satisfied that normal behavior works. Ideas for testing are certainly welcome.
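One possible way to provoke a lock for testing, offered only as a sketch: hold an exclusive transaction open on one connection and attempt a write from a second connection with a zero busy timeout. This uses the standard-library `sqlite3` module directly against a throwaway file. The `files` table and the `provoke_lock` helper are hypothetical and do not reflect the suprsync schema or code.

```python
import os
import sqlite3
import tempfile


def provoke_lock(db_path):
    """Return the error message from writing while another connection holds the lock."""
    # Autocommit mode so the transaction below is managed explicitly.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("CREATE TABLE IF NOT EXISTS files (path TEXT)")
    holder.execute("BEGIN EXCLUSIVE")  # take the lock and keep holding it

    # timeout=0 makes the second connection fail at once instead of waiting.
    writer = sqlite3.connect(db_path, timeout=0)
    try:
        writer.execute("INSERT INTO files VALUES ('x.g3')")
        writer.commit()
        return None  # no error: the lock was not in effect
    except sqlite3.OperationalError as err:
        return str(err)
    finally:
        writer.close()
        holder.execute("ROLLBACK")
        holder.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(provoke_lock(os.path.join(tmp, "lock_test.db")))
```

A similar holder connection pointed at a test copy of the agent's database might be enough to exercise the new error path, assuming the agent's database layer surfaces the failure as an `OperationalError`.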

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.

BrianJKoopman added 2 commits January 21, 2026 15:45

Add comments describing database interactions

61022ba

Add error handling for locked DB to all transactions

288549e

BrianJKoopman requested a review from kmharrington January 21, 2026 21:57

BrianJKoopman added the bug and agent: suprsync labels Jan 21, 2026

BrianJKoopman mentioned this pull request Jan 21, 2026

Make Agents robust to connection dropouts #721

Open

50 tasks
