-
Notifications
You must be signed in to change notification settings - Fork 3.8k
CEP-37 on Trunk #3598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CEP-37 on Trunk #3598
Conversation
ee6b375 to
2f42d8c
Compare
75b0bad to
a145408
Compare
db7d73b to
d3589c2
Compare
|
@masokol , @emolsson , @itskarlsson , @tommystendahl , @etedpet , @jwaeab , @VictorCavichioli , @SajidRiaz138 , @ch1bbe , @ArcturusMengsk , @DanielwEriksson , @manmagic3 as contributors to https://github.com/Ericsson/ecchronos we would very much appreciate any last-minute-pre-CEP-vote review to this PR for Cassandra's new repair solution. (more technical/code review will continue post-CEP vote.) |
|
Hi, I think this looks promising! I have a few points:
|
|
Thanks, @masokol, for the review! Please find my response here:
This is a great suggestion, and the framework can be enhanced easily - I just filed a new sub-ticket CASSANDRA-20013 to track this
There is already a new nodetool command that would print the current status. Please take a look at it here.
Currently, it will repair the tables randomly, but it can be enhanced to add a priority as a CQL table property that an end user can configure, which can also be enhanced easily. Just added this enhancement to the ticket mentioned above. |
d3589c2 to
3610701
Compare
5730aa2 to
126684a
Compare
ec99162 to
f9f6971
Compare
95d0a5e to
e093b34
Compare
tolbertam
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still in process of reviewing but thought I'd leave comments as I go since I still have a bit left.
test/unit/org/apache/cassandra/service/AutoRepairServiceSetterTest.java
Outdated
Show resolved
Hide resolved
src/java/org/apache/cassandra/tcm/sequences/BootstrapAndJoin.java
Outdated
Show resolved
Hide resolved
test/unit/org/apache/cassandra/config/YamlConfigurationLoaderTest.java
Outdated
Show resolved
Hide resolved
| .readRepair(params.readRepair); | ||
| .readRepair(params.readRepair) | ||
| .automatedRepair(params.autoRepair) | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move ";" on the line above
| */ | ||
|
|
||
| package org.apache.cassandra.repair.autorepair; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra line
| } | ||
| }); | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra line
| autoRepairProxy.setAutoRepairEnabled(repairType, enabled); | ||
| } | ||
|
|
||
| public void setRepairThreads(String repairType, int repairThreads) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/setRepairThreads/setAutoRepairThreads/ here and elsewhere in this class
| public class AutoRepairMetrics | ||
| { | ||
| public static final String TYPE_NAME = "autorepair"; | ||
| public Gauge<Integer> repairsInProgress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the other Gauge in this class can be final
| opts.table_max_repair_time = new DurationSpec.IntSecondsBound("6h"); | ||
| opts.materialized_view_repair_enabled = false; | ||
| opts.token_range_splitter = new ParameterizedClass(DEFAULT_SPLITTER.getName(), Collections.emptyMap()); | ||
| opts.initial_scheduler_delay = new DurationSpec.IntSecondsBound("5m"); // 5 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is not needed here
| COL_REPAIR_TYPE, COL_HOST_ID); | ||
|
|
||
| final static String RECORD_FINISH_REPAIR_HISTORY = String.format( | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove empty line
| } | ||
| } | ||
|
|
||
| private ImmutableMap<String, String> options; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a final.
| return StringUtils.isNumeric(value); | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra line
| .add("extensions", params.extensions); | ||
| .add("extensions", params.extensions) | ||
| .add("auto_repair", params.autoRepair.asMap()); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove the new line
| .readRepair(getReadRepairStrategy(row)); | ||
| .readRepair(getReadRepairStrategy(row)) | ||
| .automatedRepair(AutoRepairParams.fromMap(row.getFrozenTextMap("auto_repair"))) | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it to the above line
| private static final TableMetadata AutoRepairPriorityTable = | ||
| parse(AUTO_REPAIR_PRIORITY, "Auto repair priority for each group", AUTO_REPAIR_PRIORITY_CQL).build(); | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove extra line
| READ_REPAIR; | ||
| READ_REPAIR, | ||
| AUTO_REPAIR, | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move it above
| public interface AutoRepairServiceMBean | ||
| { | ||
| /** | ||
| * Enable or disable auto-repair for a given repair type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not needed as APIs are self explanatory
…geSplitter.java Co-authored-by: Francisco Guerrero <[email protected]>
…geSplitter.java Co-authored-by: Francisco Guerrero <[email protected]>
Previously AutoRepairUtils.myTurnToRunRepair would return MY_TURN if there is capacity to select a new node for repair and the current node has the oldest lastRepairFinishTime. This change adjusts this method to make use of a newly added method getMostEligibleHostToRepair which will identify the node with the oldest lastRepairFinishTime of which none of the node's replicas are actively repairing. This should prevent replicas from actively repairing concurrently, which would increase load on replica sets and also create anticompaction conflicts for incremental repair. Behavior is configurable by allow_parallel_replica_repair and allow_parallel_replica_repair_across_schedules configuration. patch by Andy Tolbert, reviewed by Jaydeepkumar Chovatia and Francisco Guerrero for CASSANDRA-20180
9feb931 to
89b4ab6
Compare
|
Rebase on top of the latest trunk has been completed; all the unit tests are passing on CircleCI Many dtests are failing on CircleCI due to resource limitations. For that, I have run them internally inside our infrastructure and most of them are passing as expected. |
frankgh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I left a few minor comments. Also I have a few NITs , so feel free to ignore those
|
|
||
| public void setIncrementalRepairDiskHeadroomRejectRatio(double value) | ||
| { | ||
| DatabaseDescriptor.setIncrementalRepairDiskHeadroomRejectRatio(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we validate for only positive numbers to be set here between the values of 0-1?
| * // TODO: TCM - how do we evolve these tables? | ||
| */ | ||
| public static final long GENERATION = 6; | ||
| public static final long GENERATION = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a description for gen 7 above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| * <p> | ||
| * While this splitter has a lot of tuning parameters, the expectation is that the established default configuration | ||
| * shall be sensible for all {@link org.apache.cassandra.repair.autorepair.AutoRepairConfig.RepairType}'s. The following | ||
| * configuration parameters are offered. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we intend to list the configuration parameters below in the javadocs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorporated
src/java/org/apache/cassandra/repair/autorepair/RepairTokenRangeSplitter.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Francisco Guerrero <[email protected]>
…geSplitter.java Co-authored-by: Francisco Guerrero <[email protected]>
Co-authored-by: Francisco Guerrero <[email protected]>
Co-authored-by: Francisco Guerrero <[email protected]>
Co-authored-by: Francisco Guerrero <[email protected]>
frankgh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
…0572 Validate disk headroom for IR and a couple of comment updates
Co-authored-by: Francisco Guerrero <[email protected]>
df3eb40 to
54e39a9
Compare
CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+Apache+Cassandra+Unified+Repair+Solution
Design doc:
https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit?tab=t.0#heading=h.r112r46toau0
The Cassandra JIRA: https://issues.apache.org/jira/browse/CASSANDRA-19918
Code review comments