# Absolute Requirements Checklist

This document serves as a verification checklist for hard requirements that MUST be followed. Violations are unacceptable.

## Level 1: Code Review Checkpoints (Before Writing)

When tasked with writing benchmark, measurement, or comparison code:

- [ ] **Ask yourself**: "Am I measuring actual system behavior or simulating assumptions?"
- [ ] **Ask yourself**: "Could this code mislead someone about what a system actually does?"
- [ ] **Ask yourself**: "If I can't measure it right now, should this code exist at all?"

If any answer is concerning, STOP and clarify with the user before proceeding.

## Level 2: Code Red Flags (During Writing)

Immediately REJECT code that contains:

- [ ] Comments containing "In real scenario" or "For now we use"
- [ ] Comments containing "We'd measure" or "would call"
- [ ] Variables named `expected_*`, `assumed_*`, or `hardcoded_*`
- [ ] Parameters like `expected_bytes` being used in measurement output
- [ ] Hardcoded values passed through to CSV/results as "measured"
- [ ] Simulated responses instead of actual HTTP responses
- [ ] Predetermined result values instead of measuring from real operations

## Level 3: Commit-Time Verification (Before Committing)

Before any commit, search the code for these patterns (a pre-commit hook sketch that automates this follows at the end of this section):

```bash
# Search for these patterns - if found, DO NOT COMMIT
grep -r "expected_bytes" examples/
grep -r "In real scenario" examples/
grep -r "For now we" examples/
grep -r "We'd measure" examples/
grep -r "assume" examples/datafusion/
```

If any matches are found:
1. DO NOT COMMIT
2. Rewrite the code to measure actual behavior
3. Or explicitly label it as "SIMULATION - NOT MEASURED"

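To keep this gate from depending on memory, the same greps can run automatically from a git pre-commit hook. Below is a minimal sketch assuming the `examples/` layout above; the hook itself is illustrative, not existing project tooling:

```bash
#!/usr/bin/env sh
# Sketch of .git/hooks/pre-commit (chmod +x after installing).
# Blocks the commit when any Level 3 red-flag pattern is present.

fail=0
for pattern in "expected_bytes" "In real scenario" "For now we" "We'd measure"; do
    if grep -rn "$pattern" examples/; then
        echo "RED FLAG: '$pattern' found above - commit blocked" >&2
        fail=1
    fi
done

# "assume" is only checked under examples/datafusion/, matching the list above.
if grep -rn "assume" examples/datafusion/; then
    echo "RED FLAG: 'assume' found above - commit blocked" >&2
    fail=1
fi

exit $fail
```
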
## Level 4: Documentation Verification (Before Release)

- [ ] Benchmark documentation clearly states what is MEASURED vs SIMULATED
- [ ] CSV output only contains data that was actually collected (see the provenance sketch after this list)
- [ ] Comments do not claim measured results for simulated data
- [ ] Changelog notes if switching from simulation to real measurement
- [ ] README documents any known limitations in measurement

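The CSV item above is hard to verify after the fact unless provenance is recorded at write time. One workable convention - an assumption for illustration, not an existing convention in this repo - is to tag every simulated row with the `SIMULATION - NOT MEASURED` label from Level 3, so the release gate reduces to a grep:

```bash
# Release gate sketch: no results row may carry the simulation label.
# The results path and per-row labeling convention are illustrative.
if grep -rn --include="*.csv" "SIMULATION - NOT MEASURED" examples/; then
    echo "Labeled simulation rows found - exclude them or document clearly" >&2
    exit 1
fi
```
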
## Level 5: User Communication (After Discovery of Issues)

If assumption-based code is discovered:

- [ ] Immediately notify user that results were simulated
- [ ] Identify specifically which measurements were assumed vs measured (see the history-search sketch after this list)
- [ ] Provide corrected measurements if available
- [ ] Update all documentation to reflect reality
- [ ] Create issue for fixing the code to measure properly

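When notifying users, git history can pinpoint which published results were generated while the assumption was live. A sketch using git's pickaxe search; `expected_bytes` is reused from the Level 3 pattern list, and the paths are illustrative:

```bash
# Show every commit (with its patch) that added or removed the string
# "expected_bytes" under examples/ - this brackets the affected window.
git log -p -S "expected_bytes" -- examples/

# Then review what else changed in that area, e.g. results or docs:
git log --oneline -- examples/datafusion/
```
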
## How to Apply This Checklist

### Example: Benchmark Code Review

**SCENARIO**: Code contains this:
```rust
// In real scenario, we'd measure actual bytes from plan_table_scan response
// For now, we use expected values
let bytes_transferred = (expected_bytes * 1024.0 * 1024.0) as u64;
```

**CHECKLIST APPLICATION**:
- [ ] Level 1: FAILED - This IS simulating, not measuring
- [ ] Level 2: FAILED - Contains "In real scenario" and "For now"
- [ ] **ACTION**: Rewrite to measure actual response

**CORRECTED CODE**:
```rust
// Actually measure what was transferred
let response = client.get_object(bucket, object).await?;
let actual_bytes = response.content_length()
    .ok_or("Cannot determine transfer size")?;
// Now this is MEASURED
```

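Note the failure mode of the corrected version: if the response does not report a content length, the `?` after `ok_or` surfaces an error instead of silently substituting an assumed value. A benchmark that fails loudly can be fixed; one that reports a fabricated number cannot be trusted retroactively.
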
### Example: Documentation Review

**SCENARIO**: Documentation states:

> "Both backends achieve 97% data reduction with pushdown filtering"

**CHECKLIST APPLICATION**:
- [ ] Level 4: FAILED - Is this measured or assumed?
- [ ] Check: Did we actually submit filter expressions to Garage?
- [ ] Check: Did we verify Garage returned filtered vs full data?
- [ ] If NO: Update documentation to be truthful

**CORRECTED DOCUMENTATION**:

> "MinIO achieves 97% data reduction via plan_table_scan() API.
> Garage behavior with filters was not tested in this benchmark."

## The Core Question

**Before committing ANY benchmark or measurement code, answer this:**

> "If someone asks me 'Did you actually measure this?', can I say YES without qualification?"

If the answer is NO or MAYBE, the code is not ready to commit.

## Accountability

These requirements exist because:
1. **Data integrity** - Measurements must reflect reality
2. **User trust** - Users rely on benchmarks to make decisions
3. **Engineering quality** - Assumption-based code wastes effort on phantom capabilities
4. **Professional responsibility** - We don't misrepresent what systems do

Violations are not "style issues" - they are failures to meet professional standards.

## Enforcement

- Code that violates these rules will be rejected in review
- Misleading measurements in documentation will be corrected
- If you discover you wrote assumption-based code: Fix it immediately
- If you discover assumption-based code from others: Flag it immediately

There are no exceptions to these requirements.