fix(udb): fix rocksdb conflict tracker #3225
Updated from ac3ffeb to ae7e816, then from 3723de7 to c504886.
Pull Request Review: Fix RocksDB Conflict Tracker

Summary
This PR refactors the RocksDB conflict tracking mechanism from an eager, per-operation conflict checking system to a deferred, commit-time validation system. This is a significant architectural improvement that aligns more closely with optimistic concurrency control patterns.

Code Quality and Architecture
✅ Strengths
Potential Issues

🔴 Critical: Incorrect Conflict Detection Logic
Location:
The conflict detection logic has a critical bug:

```rust
// Check txn versions overlap
if txn1_start_version > txn2.start_version && txn1_start_version < txn2.commit_version {
    for (cr1_start, cr1_end, cr1_type) in &txn1_conflict_ranges {
        for (cr2_start, cr2_end, cr2_type) in &txn2.conflict_ranges {
            // Check conflict ranges overlap
            if cr1_start < cr2_end && cr2_start < cr1_end && cr1_type != cr2_type {
                return true;
            }
        }
    }
}
```

Issues:
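Two scenarios that the original predicate misses can be checked directly. This is a minimal sketch using hypothetical, simplified types (`KeyRange`, `pr_conflicts`, and `fixed_conflicts` are illustrative names, not code from the PR). Ranges are half-open `[start, end)`, and a single key `k` is covered by `[k, k + 0x00)`.

```rust
// Hypothetical, simplified types for illustration; not the PR's actual code.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ConflictRangeType {
    Read,
    Write,
}

struct KeyRange<'a> {
    start: &'a [u8],
    end: &'a [u8], // exclusive
    ty: ConflictRangeType,
}

/// Same shape as the predicate in the PR.
fn pr_conflicts(txn1_start: u64, txn2_start: u64, txn2_commit: u64, r1: &KeyRange, r2: &KeyRange) -> bool {
    txn1_start > txn2_start
        && txn1_start < txn2_commit
        && r1.start < r2.end
        && r2.start < r1.end
        && r1.ty != r2.ty
}

/// Same shape as the recommended fix.
fn fixed_conflicts(txn1_start: u64, txn2_commit: u64, r1: &KeyRange, r2: &KeyRange) -> bool {
    let overlap = r1.start < r2.end && r2.start < r1.end;
    let has_write = r1.ty == ConflictRangeType::Write || r2.ty == ConflictRangeType::Write;
    txn1_start < txn2_commit && overlap && has_write
}

fn main() {
    let read_a = KeyRange { start: b"a", end: b"a\x00", ty: ConflictRangeType::Read };
    let write_a = KeyRange { start: b"a", end: b"a\x00", ty: ConflictRangeType::Write };

    // txn1 starts at 5; txn2 starts later (6) but commits (7) before txn1 does.
    // txn1 read "a", which txn2 wrote, so txn1 should abort.
    assert!(!pr_conflicts(5, 6, 7, &read_a, &write_a)); // missed: 5 > 6 is false
    assert!(fixed_conflicts(5, 7, &read_a, &write_a));

    // Write/write overlap: excluded by `cr1_type != cr2_type`,
    // but a conflict under the "at least one write" rule.
    assert!(!pr_conflicts(7, 5, 8, &write_a, &write_a));
    assert!(fixed_conflicts(7, 8, &write_a, &write_a));
}
```

The "at least one write" rule is the conservative choice. For comparison, FoundationDB only checks a transaction's read conflict ranges against the write conflict ranges of transactions that committed after its read version, so a write/write overlap alone does not abort there.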
Recommended fix:

```rust
pub async fn check_and_insert(
    &self,
    txn1_start_version: u64,
    txn1_conflict_ranges: Vec<(Vec<u8>, Vec<u8>, ConflictRangeType)>,
) -> bool {
    let mut txns = self.txns.lock().await;

    // Prune old entries
    txns.retain(|txn| txn.insert_instant.elapsed() < TXN_CONFLICT_TTL);

    for txn2 in &*txns {
        // Check if transactions overlap in time:
        // txn1 conflicts with txn2 if txn1 started before txn2 committed
        if txn1_start_version < txn2.commit_version {
            for (cr1_start, cr1_end, cr1_type) in &txn1_conflict_ranges {
                for (cr2_start, cr2_end, cr2_type) in &txn2.conflict_ranges {
                    // Check if ranges overlap
                    let ranges_overlap = cr1_start < cr2_end && cr2_start < cr1_end;
                    // Conflict if at least one is a write
                    let has_write = matches!(cr1_type, ConflictRangeType::Write)
                        || matches!(cr2_type, ConflictRangeType::Write);
                    if ranges_overlap && has_write {
                        return true;
                    }
                }
            }
        }
    }

    // Only assign commit version after successfully checking conflicts
    let txn1_commit_version = self.next_global_version();

    // If no conflicts were detected, save txn data
    txns.push(PreviousTransaction {
        insert_instant: Instant::now(),
        start_version: txn1_start_version,
        commit_version: txn1_commit_version,
        conflict_ranges: txn1_conflict_ranges,
    });

    false
}
```

🟡 Medium: Missing Cleanup on Transaction Failure
Location:
When a RocksDB commit fails, you call

🟡 Medium: Transaction Timeout Handling
Location:
The transaction timeout is set to 5 seconds:

```rust
const TXN_TIMEOUT: Duration = Duration::from_secs(5);
```

But when a timeout occurs, you return

🟡 Medium: SeqCst Ordering May Be Overkill
Location:
You're using

🟢 Minor: Unused Variable
Location:
The

🟢 Minor: Documentation
The comment at

```rust
// Transactions time out after 5 seconds, but we keep conflict data for 10 seconds
// to ensure proper conflict detection for any in-flight commits
const TXN_CONFLICT_TTL: Duration = Duration::from_secs(10);
```

Performance Considerations
✅ Improvements

Code Review: Fix RocksDB Conflict Tracker

Overview
This PR refactors the RocksDB conflict tracking mechanism from an eager, pre-operation conflict detection approach to a lazy, commit-time validation approach. This is a significant architectural improvement that aligns the RocksDB driver more closely with how optimistic concurrency control typically works.

Positive Changes
1. Improved Conflict Detection Architecture ✅
2. Simplified Transaction Logic ✅
3. Better Concurrency Primitives ✅
4. Transaction Timeout ✅
5. Test Coverage ✅
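
The deferred, commit-time model described in the overview can be sketched as follows. This is a minimal sketch under stated assumptions: `Txn`, `key_after`, and the `validate` callback are hypothetical names, not the udb API; a single-key range is represented as `[key, key + 0x00)`, following FoundationDB's convention; and a plain `BTreeMap` stands in for RocksDB. Reads and writes only append to a local list of conflict ranges, and the shared conflict state is consulted once, at commit.

```rust
use std::collections::BTreeMap;

// Hypothetical types for illustration; these are not the udb API.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ConflictRangeType {
    Read,
    Write,
}

/// (start key, exclusive end key, type)
type ConflictRange = (Vec<u8>, Vec<u8>, ConflictRangeType);

/// Exclusive end key that covers exactly `key` (key followed by a 0x00 byte).
fn key_after(key: &[u8]) -> Vec<u8> {
    let mut end = key.to_vec();
    end.push(0);
    end
}

/// Records conflict ranges and buffers writes locally; no shared state is
/// consulted until `commit`.
struct Txn {
    start_version: u64,
    conflict_ranges: Vec<ConflictRange>,
    writes: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Txn {
    fn new(start_version: u64) -> Self {
        Txn { start_version, conflict_ranges: Vec::new(), writes: BTreeMap::new() }
    }

    fn get(&mut self, snapshot: &BTreeMap<Vec<u8>, Vec<u8>>, key: &[u8]) -> Option<Vec<u8>> {
        self.conflict_ranges.push((key.to_vec(), key_after(key), ConflictRangeType::Read));
        // Read-your-writes: buffered writes shadow the snapshot.
        self.writes.get(key).or_else(|| snapshot.get(key)).cloned()
    }

    fn set(&mut self, key: &[u8], value: &[u8]) {
        self.conflict_ranges.push((key.to_vec(), key_after(key), ConflictRangeType::Write));
        self.writes.insert(key.to_vec(), value.to_vec());
    }

    /// `validate` stands in for the tracker's conflict check (true = conflict).
    /// Buffered writes reach the store only if validation passes.
    fn commit(
        self,
        store: &mut BTreeMap<Vec<u8>, Vec<u8>>,
        validate: impl FnOnce(u64, Vec<ConflictRange>) -> bool,
    ) -> Result<(), &'static str> {
        if validate(self.start_version, self.conflict_ranges) {
            return Err("conflict");
        }
        store.extend(self.writes);
        Ok(())
    }
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert(b"a".to_vec(), b"1".to_vec());

    let mut txn = Txn::new(10);
    assert_eq!(txn.get(&store, b"a"), Some(b"1".to_vec()));
    txn.set(b"b", b"2");

    let result = txn.commit(&mut store, |start, ranges| {
        assert_eq!(start, 10);
        assert_eq!(ranges.len(), 2); // one read range, one write range
        false // no conflict
    });
    assert!(result.is_ok());
    assert_eq!(store.get(&b"b"[..]), Some(&b"2".to_vec()));

    // When validation reports a conflict, nothing is written.
    let mut rejected = Txn::new(11);
    rejected.set(b"c", b"3");
    assert!(rejected.commit(&mut store, |_, _| true).is_err());
    assert!(!store.contains_key(&b"c"[..]));
}
```

In this sketch, an aborted transaction leaves no shared state behind, since nothing outside the `Txn` is modified before `commit`.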

Issues and Concerns

1. Critical Bug: Incorrect Conflict Detection Logic 🔴

```rust
if txn1_start_version > txn2.start_version && txn1_start_version < txn2.commit_version {
    // Check ranges...
    if cr1_start < cr2_end && cr2_start < cr1_end && cr1_type != cr2_type {
        return true;
    }
}
```

Issues:

Correct logic should be:

```rust
// Check if transactions overlap in time.
// Note: `txn1_commit_version` must be obtained before this loop. Since every tracked
// txn2 has already committed at an earlier version, the second comparison should
// always hold, so the first comparison is the one that matters.
if txn1_start_version < txn2.commit_version && txn2.start_version < txn1_commit_version {
    for (cr1_start, cr1_end, cr1_type) in &txn1_conflict_ranges {
        for (cr2_start, cr2_end, cr2_type) in &txn2.conflict_ranges {
            // Check if ranges overlap
            if cr1_start < cr2_end && cr2_start < cr1_end {
                // Conflict if at least one is a write
                if *cr1_type == ConflictRangeType::Write || *cr2_type == ConflictRangeType::Write {
                    return true;
                }
            }
        }
    }
}
```

2. Transaction Cleanup Issue

```rust
self.txn_conflict_tracker.remove(start_version).await;
```

However, the transaction was already inserted into the tracker at line 379. If a RocksDB conflict occurs (line 395), you correctly return

Suggestion: Only remove the transaction from the tracker on non-conflict errors, or reconsider whether removal is needed at all, since TTL-based cleanup already handles it.

3. Missing Conflict Ranges
Looking at

4. Postgres Transaction Reset Issue 🔴

```rust
self.tx_sender = OnceCell::new();
```

However, this leaves the old transaction task running! The old task will continue to hold database resources. You should either:

5. Potential Race Condition

```rust
let mut txns = self.txns.lock().await;
let txn1_commit_version = self.next_global_version(); // Line 53
```

The

Recommendation: This is likely fine since the lock is held throughout, but add a comment explaining the locking guarantees.

6. Time-Based Cleanup Concern
Suggestion: Consider making the TTL configurable or adding a max size limit with LRU eviction.

7. Test Coverage Gaps
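
A minimal sketch of conflict-detection tests, assuming a synchronous model of the tracker: `std::sync::Mutex` replaces the async lock, and `read_version` and `point` are hypothetical helpers not taken from the PR. The rule under test is the one from the recommended fix (overlapping half-open ranges, at least one write, txn2 committed after txn1 started), and the commit version is assigned while the lock is held, matching the pattern discussed under point 5.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical synchronous model of the tracker; names mirror the review,
// but this is not the PR's code.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ConflictRangeType {
    Read,
    Write,
}

type ConflictRange = (Vec<u8>, Vec<u8>, ConflictRangeType);

const TXN_CONFLICT_TTL: Duration = Duration::from_secs(10);

struct PreviousTransaction {
    insert_instant: Instant,
    commit_version: u64,
    conflict_ranges: Vec<ConflictRange>,
}

struct ConflictTracker {
    version: AtomicU64,
    txns: Mutex<Vec<PreviousTransaction>>,
}

impl ConflictTracker {
    fn new() -> Self {
        ConflictTracker { version: AtomicU64::new(0), txns: Mutex::new(Vec::new()) }
    }

    /// Version a new transaction reads at (the latest committed version).
    fn read_version(&self) -> u64 {
        self.version.load(Ordering::SeqCst)
    }

    fn next_global_version(&self) -> u64 {
        self.version.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Returns true on conflict; otherwise records the transaction and returns false.
    fn check_and_insert(&self, start_version: u64, ranges: Vec<ConflictRange>) -> bool {
        let mut txns = self.txns.lock().unwrap();
        txns.retain(|t| t.insert_instant.elapsed() < TXN_CONFLICT_TTL);
        let conflict = txns
            .iter()
            .filter(|t| start_version < t.commit_version) // committed after txn1 started
            .any(|t| {
                ranges.iter().any(|(s1, e1, ty1)| {
                    t.conflict_ranges.iter().any(|(s2, e2, ty2)| {
                        let overlap = s1 < e2 && s2 < e1;
                        overlap && (*ty1 == ConflictRangeType::Write || *ty2 == ConflictRangeType::Write)
                    })
                })
            });
        if conflict {
            return true;
        }
        // The commit version is assigned while the lock is still held.
        let commit_version = self.next_global_version();
        txns.push(PreviousTransaction { insert_instant: Instant::now(), commit_version, conflict_ranges: ranges });
        false
    }
}

/// Hypothetical helper: conflict range covering exactly one key.
fn point(key: &[u8], ty: ConflictRangeType) -> ConflictRange {
    let mut end = key.to_vec();
    end.push(0);
    (key.to_vec(), end, ty)
}

fn main() {
    let tracker = ConflictTracker::new();
    let rv = tracker.read_version(); // concurrent transactions all start here

    // A writes "a" and commits first.
    assert!(!tracker.check_and_insert(rv, vec![point(b"a", ConflictRangeType::Write)]));
    // B read "a" concurrently: read/write overlap, so B conflicts.
    assert!(tracker.check_and_insert(rv, vec![point(b"a", ConflictRangeType::Read)]));
    // C read only "b": disjoint from A, so C commits.
    assert!(!tracker.check_and_insert(rv, vec![point(b"b", ConflictRangeType::Read)]));
    // D starts after A and C committed, so their writes are already visible to it.
    assert!(!tracker.check_and_insert(tracker.read_version(), vec![point(b"a", ConflictRangeType::Read)]));
    // E reads "b" concurrently with C: read/read overlap never conflicts.
    assert!(!tracker.check_and_insert(rv, vec![point(b"b", ConflictRangeType::Read)]));
    // F writes [a\0, b), which starts exactly where A's [a, a\0) ends: no overlap.
    assert!(!tracker.check_and_insert(rv, vec![(b"a\x00".to_vec(), b"b".to_vec(), ConflictRangeType::Write)]));
    // G writes "a" concurrently with A: write/write is a conflict under this rule.
    assert!(tracker.check_and_insert(rv, vec![point(b"a", ConflictRangeType::Write)]));
}
```

Each assertion encodes one case: read/write overlap, disjoint keys, a transaction that starts after the writer committed, read/read overlap, adjacent half-open ranges, and write/write overlap.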

8. Minor: Unused Variable ℹ️

Performance Considerations
Positive:
Concerns:

Security Considerations
No security concerns identified. The changes are internal to the database driver and don't expose new attack surfaces.

Recommendations
High Priority:
Medium Priority:
Low Priority:

Conclusion
This is a solid refactoring that moves the RocksDB conflict detection to a more appropriate architecture. However, there are critical bugs in the conflict detection logic that must be fixed before merging. The test coverage should also be expanded to verify the correctness of the conflict detection itself.

The PR demonstrates a good understanding of optimistic concurrency control principles, but the conflict detection implementation needs careful review to ensure correctness.
No description provided.