Add ~~find_best~~ profileset_cull feature for profileset early elimination #10597
Base branch: thewarwithin
Conversation
I've always wanted this, but never sat down to actually get it going. Thanks for your work on this PR so far!

I find the naming of variables in this commit a little dense and hard to parse; I've often seen this sort of technique referred to as "profileset culling" or the like. Instead of checking whether the target-error CIs are disjoint (which is more or less what is occurring in Raidbots), consider a two-sample t-test to determine whether the means are not equal; that may be a better strategy. I would also like the option to use a variety of culling methods, if only for the ability to experiment with alternate schemes. Much of the code could become a member function of
engine/report/json/report_json.cpp (outdated)
```cpp
if ( sim.find_best.enabled )
{
  obj[ "find_best_best" ] = ( sim.find_best.best_name == profileset->name() );
  if ( sim.find_best.best_name == profileset->name() )
  {
    obj[ "find_best_best_error" ] = sim.find_best.best_error;
  }
}
```
Is there any particular reason why this is repeated for each profileset? Seems like attaching it to the baseline sim would make more sense. Also, there should probably be some indication that the find_best mode is on in the HTML report.
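A hedged sketch of that suggestion, reusing the `obj[ ... ] =` JSON-output style from the snippet above; the `root` object, the nested-indexing usage, and the field names are illustrative assumptions, not the PR's actual output:

```cpp
// Emit the culling summary once, attached to the top-level JSON node, instead
// of repeating the best-name comparison for every profileset entry.
if ( sim.find_best.enabled )
{
  auto cull = root[ "profileset_cull" ];  // assumed nested-object support
  cull[ "enabled" ]    = true;
  cull[ "best_name" ]  = sim.find_best.best_name;
  cull[ "best_error" ] = sim.find_best.best_error;
}
```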
engine/sim/profileset.cpp (outdated)
```cpp
if ( profile_sim->find_best_eliminated )
{
  fmt::print( stderr, "\nProfileset '{}' early-stopped: {}\n", set.name(), profile_sim->find_best_reason );
}
```
This doesn't look like an early stop. You eliminate a profileset by telling simc that it ran enough iterations, not that the sim failed.
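A rough sketch of that idea; `shrink_work_to_current()` is a hypothetical helper standing in for whatever `work_queue_t` mechanism would actually cap the remaining iterations, and `should_cull` stands for the result of the statistical comparison:

```cpp
// Cull by letting the child sim finish "successfully" at its current
// iteration count, rather than reporting it as a failed run.
if ( should_cull )
{
  profile_sim->find_best_eliminated = true;  // record the decision for reporting
  shrink_work_to_current( *profile_sim );    // hypothetical: hand out no more work
  profile_sim->interrupt();                  // wake waiting threads so the sim drains
}
```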
engine/sim/sim.cpp (outdated)
```cpp
double abs_error = ( current_error / 100.0 ) * current_mean;

// ensure enough iterations
if ( work_queue->progress().current_iterations >= s.min_iterations )
```
Is it necessary to keep asking work_queue->progress() multiple times? Also, does it work correctly when strict_work_queue is on, in which case the children don't share the same work queue.
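On the first point, a minimal sketch of taking one snapshot and reusing it (assuming `progress()` returns simc's `sim_progress_t` by value):

```cpp
// Query the queue once per evaluation pass instead of re-locking it for
// every individual check.
auto progress = work_queue->progress();
if ( progress.current_iterations >= s.min_iterations )
{
  // ... evaluate elimination using progress.current_iterations ...
}
```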
engine/sim/sim.hpp (outdated)
```cpp
double best_mean = 0.0;                 // mean of current best
double best_error = 0.0;                // absolute half-width (same units as mean)
unsigned best_iterations = 0;           // iterations when last updated
bool best_precision_satisfied = false;  // winner precision threshold reached
```
This doesn't seem to do anything. You only ever write into it.
engine/sim/sim.hpp (outdated)
```cpp
bool best_precision_satisfied = false;  // winner precision threshold reached
// Configuration copied from options (stored here for child sims to read)
int min_iterations = 500;        // minimum iterations before evaluating elimination
double winner_precision = -1.0;  // percent relative error threshold (same unit as target_error/current_error)
```
This is only used with best_precision_satisfied (see above) and as such doesn't do anything.
engine/sim/sim.cpp (outdated)
```cpp
}

// If no best yet, promote self
if ( s.best_name.empty() )
```
What about the baseline sim? Seems like a waste to ignore it when it typically finishes (with full precision) before any of the profilesets even start.
Agreed. I've changed the approach to start from the baseline sim.
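A minimal sketch of that change, assuming the `profileset_cull_state_t` fields shown in the sim.hpp snippets above plus a mutex member (the commit message mentions mutex protection); how the baseline's mean and CI half-width are extracted is left abstract:

```cpp
#include <mutex>

// Promote the fully converged baseline result as the initial "best", so the
// very first profilesets already have a comparison target.
void seed_from_baseline( sim_t::profileset_cull_state_t& s,
                         double baseline_mean, double baseline_abs_error,
                         unsigned baseline_iterations )
{
  std::lock_guard<std::mutex> guard( s.mutex );  // state is shared across child sims
  s.best_name       = "baseline";
  s.best_mean       = baseline_mean;
  s.best_error      = baseline_abs_error;
  s.best_iterations = baseline_iterations;
}
```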
engine/sim/sim.cpp (outdated)
```cpp
double safety = s.elim_safety_margin_frac > 0 ? s.elim_safety_margin_frac * s.best_mean : 0.0;
double candidate_upper = current_mean + abs_error + safety;
double best_lower = s.best_mean - s.best_error;
```
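For context, the comparison these bounds feed amounts to the following predicate (a restatement of the snippet with the final check spelled out, not the PR's verbatim code):

```cpp
// Cull the candidate when even its optimistic upper bound (mean + CI
// half-width + safety margin) cannot reach the best's pessimistic lower
// bound, i.e. the two confidence intervals are disjoint.
bool ci_overlap_cull( double current_mean, double abs_error,
                      double best_mean, double best_error, double safety )
{
  double candidate_upper = current_mean + abs_error + safety;
  double best_lower      = best_mean - best_error;
  return candidate_upper < best_lower;
}
```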
As kate mentioned, this could really use a proper statistical test. Especially the safety margin seems pretty ad-hoc.
There are some valid reasons to compare using a CI, but it's very (probably too) conservative. It was the easiest to implement given what you already had, but it's not the best it could be. The biggest barrier on the t-test is that we don't have a good statistics library - I'll add in a quick implementation, but it would be easier if we brought in Boost.Math. If you'd rather do that, let me know after you see the next patch.
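For reference, a dependency-free one-tailed Welch's t-test can stay quite small. This is a hedged sketch rather than the PR's actual implementation; it approximates the t distribution's tail with the normal distribution, which is reasonable at the iteration counts simc runs (hundreds or more):

```cpp
#include <cmath>

struct welch_result
{
  double t;             // test statistic
  double df;            // Welch-Satterthwaite effective degrees of freedom
  double p_one_tailed;  // approximate one-tailed p-value under equal means
};

// One-tailed Welch's t-test for H1: mean_a > mean_b, with unequal variances.
welch_result welch_t_test( double mean_a, double stddev_a, double n_a,
                           double mean_b, double stddev_b, double n_b )
{
  double var_a = ( stddev_a * stddev_a ) / n_a;  // squared standard error of a
  double var_b = ( stddev_b * stddev_b ) / n_b;

  double t = ( mean_a - mean_b ) / std::sqrt( var_a + var_b );

  double df = ( var_a + var_b ) * ( var_a + var_b ) /
              ( var_a * var_a / ( n_a - 1 ) + var_b * var_b / ( n_b - 1 ) );

  // Normal approximation to the upper tail: P( T > t ) ~= 0.5 * erfc( t / sqrt(2) ),
  // accurate for df in the hundreds or more.
  double p = 0.5 * std::erfc( t / std::sqrt( 2.0 ) );
  return { t, df, p };
}
```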
engine/sim/sim.cpp (outdated)
```cpp
work_queue->unlock();
interrupt();
```
work_queue uses a reentrant lock partly so that you can interrupt the sim without having to let the lock go.
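Restating that observation against the snippet above, the unlock/interrupt dance can collapse to:

```cpp
// The work queue's lock is reentrant, so the sim can be interrupted while
// the lock is still held; there is no need to release it first.
work_queue->lock();
// ... decide, under the lock, that this profileset should be culled ...
interrupt();  // safe even though we still hold the lock
work_queue->unlock();
```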
Thank you!
engine/sim/sim.cpp (outdated)
```cpp
add_option( opt_bool( "find_best", find_best.enabled ) );
add_option( opt_string( "find_best_metric", find_best_metric_str ) );
add_option( opt_int( "find_best_min_iterations", find_best.min_iterations ) );
add_option( opt_float( "find_best_winner_precision", find_best.winner_precision ) );
add_option( opt_float( "find_best_elim_safety_margin", find_best.elim_safety_margin_frac ) );
add_option( opt_int( "find_best_verbose", find_best.verbose ) );
```
A minor point, but maybe a more descriptive name would be better for the options (and the whole mode). "Finding best" is kinda what the sim does normally, so this doesn't really provide any more info about what's going on.
I'm going to adopt profileset_cull as the prefix for everything going forward.
Thanks for the great feedback! I'll share some thoughts and another patch in the next day or so!
Force-pushed from 992953d to 11d49c0
Okay - I cleaned up the approach, added the t-test, and made it the default. It's a little messy to pick between the CI approach and the t-test approach, but I've tried to keep it clean enough that it could be refactored into something more 'strategy pattern'-like if we want to go down that path. Also added the HTML and JSON report outputs.

I like the t-test approach - it is more 'optimistic' than the CI approach. One 'theoretical' concern is that we are doing a lot of repeated comparisons, so we might want to either make alpha smaller by default, or take an approach that is more theoretically sound for this kind of sequential arm search. (Interestingly enough, that ends up looking more like the confidence-interval approach, but with a different method for calculating the bounds.)

Let me know if you have any other feedback here.
Oh, I added a quick and dirty t-test into the code. If we'd rather use Boost.Math I can change that, but as I mentioned above I didn't see that anywhere in the project so I didn't want to add a dependency.
I'll go over the code in more detail later, so I'm just commenting on some stuff that stood out when I took a quick look.
Regarding the statistical test: I played with it some more after writing the initial comment and while the CI overlap testing is pretty conservative (and perhaps not entirely statistically sound) it might not necessarily be a bad thing. After trying it with some dummy data, when the 95% CIs just barely didn't overlap, a proper statistical test (a one-tailed Welch's t-test) gave me p-values of around 0.002-0.003. So yeah, I agree with your point: if we're gonna be culling potentially thousands of profilesets, a lower alpha is probably warranted (I'm thinking 0.005?).
The general approach of simc is to keep the dependencies pretty light, so we'd rather avoid bringing in Boost.Math for a single statistical test.
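As a sanity check on those numbers, using the Welch sketch from the earlier comment: when two equal-width 95% CIs just barely fail to overlap, the means sit 2 * 1.96 standard errors apart, so t = 3.92 / sqrt(2), roughly 2.77, and the one-tailed normal-approximation p-value comes out around 0.0028, squarely in the quoted range. The values below are synthetic:

```cpp
#include <cmath>
#include <cstdio>

// Assumes the welch_t_test() sketch from the earlier comment is in scope.
int main()
{
  double n  = 10000.0;              // iterations per sample
  double se = 1.0;                  // per-sample standard error of the mean
  double sd = se * std::sqrt( n );  // sample stddev implying that SE

  // Sample A sits 2 * 1.96 SEs above sample B: their 95% CIs exactly touch.
  auto r = welch_t_test( 2.0 * 1.96 * se, sd, n, 0.0, sd, n );
  std::printf( "t = %.3f, p = %.4f\n", r.t, r.p_one_tailed );  // ~2.772, ~0.0028
  return 0;
}
```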
The 'traditional' method would be to take a single alpha (0.05) and divide it by the number of comparisons (profilesets, or arms, k); this is called a 'Bonferroni correction', and it gives a much smaller value than you're suggesting (0.05 / 1000 -> 0.00005). As you saw, this is a VERY conservative number and wouldn't prune many, if any, arms. Other classical approaches refine this, but have requirements that don't fit our 'sequential' model of testing each profileset (arm); Holm-Bonferroni, for example, requires all the p-values to exist and be sorted.

There is a whole sub-field of stats here (SAVI) we could get into for the exact type of test that is right if we want to be very 'tight'. Or we could start with the CI approach, which is a little hand-wavy, and add in the 'more correct' approach later. Or we could do the research and add in the best option now!
We don't need to pull in the library, but https://github.com/gostevehoward/confseq is an example of the 'theoretically correct' way to do this.
engine/sim/sim.cpp (outdated)
```cpp
if ( util::str_compare_ci( value, "ci" ) )
  sim->profileset_cull.method = sim_t::profileset_cull_state_t::CI_OVERLAP;
else if ( util::str_compare_ci( value, "t_test" ) )
  sim->profileset_cull.method = sim_t::profileset_cull_state_t::T_TEST;
else
{
  sim->error( "Invalid profileset_cull_method '{}', valid options: ci, t_test", value );
  return false;
```
There are some enum helper functions that can automatically derive a parser (i.e. string -> enum function) when given an enum -> string function. Not really a big deal, but might be convenient in case more methods are added down the line.
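A generic illustration of that pattern (not simc's actual helpers, whose signatures may differ): given an enum -> string function, the parser falls out of a loop over the enum's contiguous range.

```cpp
#include <string_view>

// Derive a string -> enum parser from an enum -> string function.
// A real helper would compare case-insensitively, as the snippet above does.
template <typename Enum, int First, int Last, const char* ( *ToString )( Enum )>
Enum parse_enum( std::string_view value, Enum fallback )
{
  for ( int i = First; i <= Last; ++i )
  {
    auto e = static_cast<Enum>( i );
    if ( value == ToString( e ) )
      return e;
  }
  return fallback;
}
```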
To use those, I'd have to move the enum out of being declared inline, and touch a bunch of extra files. I cleaned this up by moving the parsing out of line here. If you want the full change (move the enum declaration to sc_enum.hpp, etc) I can do that too, but it seemed like more than you intended.
PR comments addressed. Let me know if you have any other feedback!
Implements statistical confidence-based early elimination for profilesets to avoid simulating clearly inferior gear combinations. Uses Bayesian confidence intervals to compare candidates against the current best, with configurable safety margins.

Key features:
- Early elimination based on statistical confidence intervals
- Default verbosity 0 (quiet), always shows new best discoveries
- Configurable minimum iterations and winner precision thresholds
- Thread-safe implementation with mutex protection
- Integration with existing profileset workflow

Options:
- find_best=1 to enable
- find_best_verbose=N for verbosity control
- find_best_min_iterations=N (default 500)
- find_best_winner_precision=N (default inherits target_error)
- find_best_safety_margin=N (default 0.001)

(cherry picked from commit d4cf4eea35ecbea4a39cd755ebe35abe7bc4e028)
(cherry picked from commit a5113b7f59c0221046ffa0b38d278bc90cf32dc1)
(cherry picked from commit 010eead3109828e57c9d88b5f6ba1338b8ce3f9d)
(cherry picked from commit d1e3e536466fdb0d86de5d4256194d146a423346)
(cherry picked from commit 12472f597fdf91cb43d18d8dbec1c53fd9b3eaa8)
(cherry picked from commit 7de1c5a7b1e26a4ea6012274f4fef4e4409f3d0a)
(cherry picked from commit c61fcd4e3d93c97fedee0582af60290554a76f7a)
…N, and simplified iteration counts. Cleaned up method parsing
(cherry picked from commit 84d2402bb345925c1d6c46bbfda93d95090b3547)
(cherry picked from commit c3ac34bb71cb1d08cf631de46318859d451eab58)
Force-pushed from a3e0bfb to bd87830
Rebased
Finally had some time to go over it in more detail and I don't think I have any other comments. I'll try to play with it to get a feel for the culling with some real profiles and I'll poke another dev to take a look.
Thanks for the update!
```cpp
double mean_total = 0;
int mean_count = 0;
double current_standard_error = 0.0;
```
The value of this variable is never set outside of single_actor_batch sims, so the t-test always sees a standard error of 0, breaking the t_test culling logic.
Hmm, that's a good catch, but I'm not exactly sure how to test a fix. I haven't used multi-actor simulation with profilesets. I see how to get a valid SE in the other arm of !single_actor_batch, but I'm not really sure what the right behavior would be. Do you have a good test case I could validate against?

Alternatively, I can try some multi-actor tests and just figure it out.
It looks like there are already hacks in the case of multi-actor metrics. Perhaps we should just fail if culling is used with multi-actor sims? Happy to take a suggestion.
The only reason I noticed is that it came up in my testing. It doesn't even have to be multi-actor: just a base profile without single_actor_batch=1 set, plus the profilesets. My test case was as follows:

- a basic Unholy Death Knight profile
- 46k talent-combination profilesets

The best solution is probably throwing an error in the is_multiactor_metric check if it's trying to use culling, and setting the variable value in the else branch, though.
Thinking on the above comment a bit more, it might be better to just set the variable regardless of whether it's multi-actor or not, as throwing an error for multi-actor would just prevent it from working on Augmentation sims.
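A hedged sketch of that direction, assuming simc's extended sample data exposes the standard error of the mean as `mean_std_dev`; the aggregate-metric source in the multi-actor arm is a guess, not verified against the codebase:

```cpp
// Always populate the standard error, whether or not single_actor_batch is
// set, so the t-test never silently sees an SE of 0.
double current_standard_error = 0.0;
if ( sim.single_actor_batch )
{
  // Per-actor metric: SE of the current actor's DPS mean.
  const auto& dps = sim.player_no_pet_list[ sim.current_index ]->collected_data.dps;
  current_standard_error = dps.mean_std_dev;
}
else
{
  // Multi-actor: fall back to the aggregate raid metric rather than skipping.
  current_standard_error = sim.raid_dps.mean_std_dev;
}
```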
Thanks - I'll have some time on the weekend to work on this. I'll make a change and push a patch once I have it.
Raidbots was struggling today, so I tried running simc locally and noticed it was trying to hit the same target error for each run, even when it's clear that a profileset is not going to be the 'winner'. If the user cares about finding the 'best' configuration over a number of profilesets, this will speed it up greatly.
Implements statistical confidence-based early elimination for profilesets to avoid simulating clearly inferior gear combinations. Uses confidence intervals to compare candidates against the current best, with configurable safety margins.
Options:
- find_best=1 to enable
- find_best_verbose=N for verbosity control
- find_best_min_iterations=N (default 500)
- find_best_winner_precision=N (default inherits target_error)
- find_best_safety_margin=N (default 0.001)