Skip to content

Conversation

@jaydeepkumar1984
Copy link
Contributor

@jaydeepkumar1984 jaydeepkumar1984 commented Oct 3, 2024

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution

Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0

The Cassandra JIRA: https://issues.apache.org/jira/browse/CASSANDRA-19918

Code review comments

  • The comments mentioned in this PR are not exhaustive; a lot of review happened on 4.1 PR, so please refer to those comments as well here: autorepair v2 framework (against 4.1.6) #3367
  • There have been significant code reviews happend on individual sub-PRs, some of them are here:
  1. AutoRepair documentation improvements: AutoRepair documentation improvements jaydeepkumar1984/cassandra#22
  2. Implement preview_repaired auto-repair type: Implement preview_repaired auto-repair type jaydeepkumar1984/cassandra#4
  3. RangeRepairSplitter prototype: RangeRepairSplitter prototype jaydeepkumar1984/cassandra#3
  4. Make RepairTokenRangeSplitter the default implementation: Make RepairTokenRangeSplitter the default implementation jaydeepkumar1984/cassandra#18
  5. Update RepairTokenRangeSplitter to work with BTI-formatted SSTables Update RepairTokenRangeSplitter to work with BTI-formatted SSTables jaydeepkumar1984/cassandra#36
  6. Deprioritize repair on nodes whose replicas are actively being repaired Avoid returning MY_TURN when replicas are actively taking their turn jaydeepkumar1984/cassandra#51
  7. Update the documentation Auto Repair Documentation improvements, bytes_per_assignment to 50GiB jaydeepkumar1984/cassandra#47
  8. Avoid micro splitting of token ranges, specifically with vnodes Avoid micros splitting of token ranges for FixedSplitTokenRangeSplitter jaydeepkumar1984/cassandra#46
  9. Avoid selecting nodes when their replicas are already scheduling repairs https://github.com/jaydeepkumar1984/cassandra/pull/51/files

@jaydeepkumar1984 jaydeepkumar1984 force-pushed the trunk_cep_37 branch 2 times, most recently from ee6b375 to 2f42d8c Compare October 4, 2024 19:06
@jaydeepkumar1984 jaydeepkumar1984 changed the title [Draft] CEP-37 on Trunk CEP-37 on Trunk Oct 8, 2024
@michaelsembwever
Copy link
Member

michaelsembwever commented Oct 19, 2024

@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3

as contributors to https://github.com/Ericsson/ecchronos we would very much appreciate any last-minute-pre-CEP-vote review to this PR for Cassandra's new repair solution. (more technical/code review will continue post-CEP vote.)

CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution

Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0

@masokol
Copy link
Contributor

masokol commented Oct 21, 2024

Hi,

I think this looks promising! I have a few points:

  • Combine ranges instead of splitting - in ecChronos we saw huge improvements in some scenarios (when the data is low or empty tables) compared to repairing 1 vnode at a time. This improvement scaled with amount of vnodes, although it might've been related to overhead due to running repairs through JMX. Not sure but might be worth investigating.
  • Major versions, during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this then it would be nice for repairs to automatically pause during major version upgrades.
  • Observabliity, i saw there're metrics but it would also be nice to see repair status with nodetool.
  • Repair priority per table, not per node.

@jaydeepkumar1984
Copy link
Contributor Author

jaydeepkumar1984 commented Oct 21, 2024

Thanks, @masokol, for the review! Please find my response here:

  • Combine ranges instead of splitting - in ecChronos we saw huge improvements in some scenarios (when the data is low or empty tables) compared to repairing 1 vnode at a time. This improvement scaled with amount of vnodes, although it might've been related to overhead due to running repairs through JMX. Not sure but might be worth investigating.
  1. Generally, empty tables or tables with a small amount of data runs through pretty fast, in seconds, so it is not a major issue for smaller/empty tables.
  2. The current framework already has support to combine ranges through a setting (src/java/org/apache/cassandra/repair/autorepair/AutoRepair.java:266)
  • Major versions, during major version upgrades like 3 -> 4 we weren't supposed to run repairs. If Cassandra plans to keep this then it would be nice for repairs to automatically pause during major version upgrades.

This is a great suggestion, and the framework can be enhanced easily - I just filed a new sub-ticket CASSANDRA-20013 to track this

  • Observabliity, i saw there're metrics but it would also be nice to see repair status with nodetool.

There is already a new nodetool command that would print the current status. Please take a look at it here.

  • Repair priority per table, not per node.

Currently, it will repair the tables randomly, but it can be enhanced to add a priority as a CQL table property that an end user can configure, which can also be enhanced easily. Just added this enhancement to the ticket mentioned above.

Copy link
Contributor

@tolbertam tolbertam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still in process of reviewing but thought I'd leave comments as I go since I still have a bit left.

.readRepair(params.readRepair);
.readRepair(params.readRepair)
.automatedRepair(params.autoRepair)
;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move ";" on the line above

*/

package org.apache.cassandra.repair.autorepair;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line

}
});


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra line

autoRepairProxy.setAutoRepairEnabled(repairType, enabled);
}

public void setRepairThreads(String repairType, int repairThreads)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/setRepairThreads/setAutoRepairThreads/ here and elsewhere in this class

public class AutoRepairMetrics
{
public static final String TYPE_NAME = "autorepair";
public Gauge<Integer> repairsInProgress;
Copy link
Contributor Author

@jaydeepkumar1984 jaydeepkumar1984 Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the other Gauge in this class can be final

opts.table_max_repair_time = new DurationSpec.IntSecondsBound("6h");
opts.materialized_view_repair_enabled = false;
opts.token_range_splitter = new ParameterizedClass(DEFAULT_SPLITTER.getName(), Collections.emptyMap());
opts.initial_scheduler_delay = new DurationSpec.IntSecondsBound("5m"); // 5 minutes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is not needed here

COL_REPAIR_TYPE, COL_HOST_ID);

final static String RECORD_FINISH_REPAIR_HISTORY = String.format(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove empty line

}
}

private ImmutableMap<String, String> options;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a final.

return StringUtils.isNumeric(value);
}


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line

.add("extensions", params.extensions);
.add("extensions", params.extensions)
.add("auto_repair", params.autoRepair.asMap());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove the new line

.readRepair(getReadRepairStrategy(row));
.readRepair(getReadRepairStrategy(row))
.automatedRepair(AutoRepairParams.fromMap(row.getFrozenTextMap("auto_repair")))
;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it to the above line

private static final TableMetadata AutoRepairPriorityTable =
parse(AUTO_REPAIR_PRIORITY, "Auto repair priority for each group", AUTO_REPAIR_PRIORITY_CQL).build();


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove extra line

READ_REPAIR;
READ_REPAIR,
AUTO_REPAIR,
;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it above

public interface AutoRepairServiceMBean
{
/**
* Enable or disable auto-repair for a given repair type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is not needed as APIs are self explanatory

tolbertam and others added 12 commits April 18, 2025 09:06
Previously AutoRepairUtils.myTurnToRunRepair would return MY_TURN
if there is capacity to select a new node for repair and the current
node has the oldest lastRepairFinishTime.

This change adjusts this method to make use of a newly added method
getMostEligibleHostToRepair which will identify the node with the
oldest lastRepairFinishTime of which none of the node's replicas
are actively repairing.

This should prevent replicas from actively repairing concurrently,
which would increase load on replica sets and also create anticompaction
conflicts for incremental repair.

Behavior is configurable by allow_parallel_replica_repair and
allow_parallel_replica_repair_across_schedules configuration.

patch by Andy Tolbert, reviewed by Jaydeepkumar Chovatia and Francisco Guerrero for CASSANDRA-20180
@jaydeepkumar1984
Copy link
Contributor Author

Rebase on top of the latest trunk has been completed; all the unit tests are passing on CircleCI

Many dtests are failing on CircleCI due to resource limitations. For that, I have run them internally inside our infrastructure and most of them are passing as expected.

Copy link
Contributor

@frankgh frankgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I left a few minor comments. Also I have a few NITs , so feel free to ignore those


public void setIncrementalRepairDiskHeadroomRejectRatio(double value)
{
DatabaseDescriptor.setIncrementalRepairDiskHeadroomRejectRatio(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we validate for only positive numbers to be set here between the values of 0-1?

* // TODO: TCM - how do we evolve these tables?
*/
public static final long GENERATION = 6;
public static final long GENERATION = 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a description for gen 7 above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* <p>
* While this splitter has a lot of tuning parameters, the expectation is that the established default configuration
* shall be sensible for all {@link org.apache.cassandra.repair.autorepair.AutoRepairConfig.RepairType}'s. The following
* configuration parameters are offered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we intend to list the configuration parameters below in the javadocs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporated

Copy link
Contributor

@frankgh frankgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

jaydeepkumar1984 and others added 2 commits April 21, 2025 15:49
…0572

Validate disk headroom for IR and a couple of comment updates
@belliottsmith belliottsmith force-pushed the trunk branch 2 times, most recently from df3eb40 to 54e39a9 Compare July 23, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants