Move instruction profiling to a dedicated thread #526

Open
mohitejaikumar wants to merge 1 commit into solana-foundation:main from mohitejaikumar:feat/dedicated-profiling-thread
@mohitejaikumar mohitejaikumar commented Feb 8, 2026

Implements #320

  1. Added start_profiling_runloop that spawns a long-lived "Instruction Profiler" thread via hiro_system_kit::thread_named.
  2. ProfilingJob struct - Packages all data needed for profiling (cloned SVM, transaction, accounts, etc.) to send across the thread boundary.
  3. Added Option<Sender<ProfilingJob>> to SurfnetSvm.
  4. Refactored fetch_all_tx_accounts_then_process_tx_returning_profile_res to send profiling jobs to the dedicated thread, execute the transaction on the current thread, then collect profiling results — running both in parallel.
  5. setup_profiling helper - Extracted profiling channel setup into a reusable function called from both start_local_surfnet_runloop and tests.
  6. Added Clone to IndexedLoadedAddresses and TransactionLoadedAddresses to enable moving data into the profiling job.
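For readers following along, the shape described in items 1 to 4 can be sketched with only the standard library. This is an illustration under stated assumptions, not the PR's code: `std::thread::Builder` stands in for `hiro_system_kit::thread_named`, the `ProfilingJob` fields are hypothetical placeholders for the cloned SVM, transaction, and accounts, and `process_with_profiling` and `profile_prefixes` are made-up names with string placeholders for real results.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for the PR's `ProfilingJob`; the real struct
/// carries a cloned SVM, the transaction, accounts, and so on.
pub struct ProfilingJob {
    /// Placeholder for the transaction's instructions.
    pub instructions: Vec<String>,
    /// Per-job channel used to hand the profiles back to the caller.
    pub reply: Sender<Option<Vec<String>>>,
}

/// Spawns a long-lived named thread that handles jobs until every
/// `Sender<ProfilingJob>` has been dropped.
pub fn start_profiling_runloop() -> Sender<ProfilingJob> {
    let (job_tx, job_rx) = channel::<ProfilingJob>();
    thread::Builder::new()
        .name("Instruction Profiler".to_string())
        .spawn(move || {
            for job in job_rx {
                let profiles = profile_prefixes(&job.instructions);
                // The caller may have gone away; ignore a closed reply channel.
                let _ = job.reply.send(Some(profiles));
            }
        })
        .expect("failed to spawn profiler thread");
    job_tx
}

/// Placeholder "profiling": one entry per instruction prefix.
fn profile_prefixes(instructions: &[String]) -> Vec<String> {
    (1..=instructions.len())
        .map(|n| format!("profiled {} ix", n))
        .collect()
}

/// Sends the job, does the "real" execution on the current thread while the
/// profiler works, then blocks until the profiles arrive.
pub fn process_with_profiling(
    job_tx: &Sender<ProfilingJob>,
    instructions: Vec<String>,
) -> (String, Option<Vec<String>>) {
    let (reply_tx, reply_rx) = channel();
    let job = ProfilingJob {
        instructions: instructions.clone(),
        reply: reply_tx,
    };
    let sent = job_tx.send(job).is_ok();
    // Placeholder for executing the real transaction.
    let result = format!("executed {} ix", instructions.len());
    let profiles = if sent { reply_rx.recv().ok().flatten() } else { None };
    (result, profiles)
}
```

Calling `start_profiling_runloop()` once and passing the returned sender to `process_with_profiling` for each transaction mirrors the `Option<Sender<ProfilingJob>>` stored on `SurfnetSvm`. Note that the caller still blocks on `recv()` after its own work is done.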

@mohitejaikumar
Author

@lgalabru I’d really appreciate your review and any feedback you may have.

@MicaiahReid MicaiahReid self-requested a review March 3, 2026 17:42
@MicaiahReid
Collaborator

This is really awesome, @mohitejaikumar! I still haven't gotten to test it yet, so I'm not ready to sign off.

One concern I have - currently this code:

let ix_profiles = match ix_profile_rx {
    Some(rx) => tokio::task::block_in_place(|| rx.recv().ok().flatten()),
    None => None,
};

is happening directly after the transaction is processed for the real result.

So in the original implementation, an instruction profile for a transaction with 10 instructions executed 10 transactions:

TX 1: (IX 1)
TX 2: (IX 1 + IX 2)
TX 3: (IX 1 + IX 2 + IX 3)
...
TX 10: Final tx with all ixs
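As a rough illustration of the prefix scheme listed above (a sketch only; `instruction_prefixes` and `total_instruction_executions` are hypothetical helpers, not functions from this PR), each profiled transaction holds the first `n` instructions of the original:

```rust
/// Returns every non-empty prefix of `instructions`, i.e. the instruction
/// sets of "TX 1", "TX 2", ... up to the full transaction.
pub fn instruction_prefixes<T: Clone>(instructions: &[T]) -> Vec<Vec<T>> {
    (1..=instructions.len())
        .map(|n| instructions[..n].to_vec())
        .collect()
}

/// Total instruction executions across all prefix transactions:
/// 1 + 2 + ... + n.
pub fn total_instruction_executions(n: usize) -> usize {
    n * (n + 1) / 2
}
```

For a 10-instruction transaction this gives 10 transactions and 55 instruction executions in total, which is why waiting on the profiling work is noticeably slower than executing the final transaction alone.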

Your implementation slightly parallelizes by spawning a thread for 1-9, awaiting tx 10, then awaiting the thread:

TX 1               TX 10
TX 2               (tx 10 complete, awaiting thread completion)
TX 3
...
TX 9

So this only shortens the number of txs the user is waiting on by 1.


Can we slightly expand this approach? Rather than waiting on the instruction profiling thread to complete, could we return the original result without ix profiles and append them later? Then, if the user fetches the profile result for a signature/uuid, we'll return whatever we have at that point, which is likely to include the ix profiles.
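To make the suggestion concrete, here is a minimal sketch of "return now, append later" using only the standard library. Everything in it is an assumption for illustration: `ProfileStore`, `ProfileResult`, and `process_without_waiting` are hypothetical names, strings stand in for real results, and a per-call `thread::spawn` is used for brevity where the PR would send a job to the dedicated profiler thread.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical profile record; `ix_profiles` stays `None` until the
/// background work finishes.
#[derive(Clone, Debug, PartialEq)]
pub struct ProfileResult {
    pub transaction_result: String,
    pub ix_profiles: Option<Vec<String>>,
}

/// Shared map from signature/uuid to whatever profile data exists so far.
#[derive(Clone, Default)]
pub struct ProfileStore {
    inner: Arc<Mutex<HashMap<String, ProfileResult>>>,
}

impl ProfileStore {
    pub fn insert(&self, key: &str, result: ProfileResult) {
        self.inner.lock().unwrap().insert(key.to_string(), result);
    }

    /// Called by the profiler once the per-instruction profiles are ready.
    pub fn append_ix_profiles(&self, key: &str, profiles: Vec<String>) {
        if let Some(entry) = self.inner.lock().unwrap().get_mut(key) {
            entry.ix_profiles = Some(profiles);
        }
    }

    /// Returns whatever is stored right now, with or without ix profiles.
    pub fn get(&self, key: &str) -> Option<ProfileResult> {
        self.inner.lock().unwrap().get(key).cloned()
    }
}

/// Stores and returns the real result immediately; ix profiles are filled
/// in later from a background thread.
pub fn process_without_waiting(
    store: &ProfileStore,
    key: &str,
    ix_count: usize,
) -> (ProfileResult, JoinHandle<()>) {
    let result = ProfileResult {
        // Placeholder for executing the real transaction.
        transaction_result: format!("executed {} ix", ix_count),
        ix_profiles: None,
    };
    store.insert(key, result.clone());

    let (store_bg, key_bg) = (store.clone(), key.to_string());
    let handle = thread::spawn(move || {
        // Placeholder for the per-prefix profiling work.
        let profiles = (1..=ix_count)
            .map(|n| format!("profiled {} ix", n))
            .collect();
        store_bg.append_ix_profiles(&key_bg, profiles);
    });
    (result, handle)
}
```

The `JoinHandle` is returned only so a test can wait deterministically; a caller on the hot path would drop it and rely on a later `get` picking up the appended profiles.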

Thoughts @mohitejaikumar, @lgalabru?
