Optimize performance by caching Qt flags as integers to reduce overhead in PyQt6

PierreRaybaut · PierreRaybaut · commit ecee0f459cca · 2026-05-03T11:38:26.000+02:00
diff --git a/doc/issue93_optimization_summary.md b/doc/issue93_optimization_summary.md
@@ -148,8 +148,50 @@ On Qt5 this becomes a thin pass-through to `font.key()` — bit-identical output
    - PySide6: 26 passed, 1 skipped, 1 warning
 3. **Performance** — PyQt5 micro-bench rose from ~445 ms to ~450–550 ms (≈ +5 ms, well within the run-to-run noise). Qt6 numbers are unchanged.
 
+## Phase 5 — closing the residual Qt5↔Qt6 gap
+
+After phases 1–4 the Qt6 path was still measurably slower than Qt5 on the micro load test (~+20 % / +100 ms). The goal of phase 5 was to **understand and remove that residual gap**, not just to keep optimising blindly.
+
+**Method.** A second cProfile + `line_profiler` pass was run on the post-phase-4 tip, this time focused on the differences between the PyQt5 and PyQt6 traces (rather than on absolute hotspots). Three concrete root causes were identified, all specific to the PyQt6 binding:
+
+1. **Python `enum.Flag` arithmetic.** PyQt6 exposes Qt enums as `enum.Flag` subclasses; every `flags & Qt.SomeFlag` test goes through `enum.__and__ → enum.__call__ → enum.__new__` (~6 µs each). PyQt5 uses plain ints, so the same code costs ~50 ns there. cProfile attributed ≈ 62 ms / run on PyQt6 to `enum.py`, **0 ms on PyQt5**. The single worst caller was `QwtPainterCommand.__init__`, which performs **twelve** successive `flags & QPaintEngine.DirtyXxx` tests per painter command; at ~300 commands per load-test run, that is 3 600 enum operations from this one constructor.
+2. **`QFont.key()` is ~3× slower per call on PyQt6.** Per-call sip dispatch costs were measured at 3.3 µs (PyQt5) vs 9.3 µs (PyQt6) for cheap getters. `font.key()` was the single biggest residual hotspot inside `QwtText.textSize()`.
+3. **The `id(font)` fast path misfires on PyQt6.** PyQt6 returns a *fresh* Python wrapper around the same underlying `QFont` on most calls, so `id(font)` changes between calls and the id-keyed cache misses ~92 % of the time (vs ~60 % on PyQt5). The slower `font.key()` path then takes over, compounding cause #2.
+
+**Changes.**
+
+- **`qwt/painter_command.py`** — added a `_flag_int(flag)` helper (PyQt5/PyQt6 portable) and module-level `_DIRTY_PEN`, `_DIRTY_BRUSH`, … int constants. The State branch in `__init__` casts `state.state()` to int *once* and bitwise-tests against the cached int constants instead of going through `enum.__and__` 12 times per command.
+- **`qwt/graphic.py`** — same pattern in `qwtPaintCommand`'s State-replay branch (12 more flag tests per replayed command).
+- **`qwt/text.py`** — same pattern for `Qt.AlignXxx` flags (`_ALIGN_LEFT`, `_ALIGN_RIGHT`, …) at the hot bitwise-test sites in `taggedRichText()` and `QwtTextLabel.sizeHint()`/`heightForWidth()`/`textRect()`. The `setRenderFlags()` setter still stores the value as a `Qt.AlignmentFlag`, so downstream Qt APIs that strictly require an enum on PyQt6 (`QTextOption.setAlignment`, `QPainter.drawText`, `QFontMetrics.boundingRect`) keep working; only the bitwise test sites cast it to int locally.
+- **`qwt/text.py`** — **replaced the entire `id(font) → font.key()` cache** with a tuple-key cache. The new `font_key_cached(font)` returns an interned `(family, pixelSize-or-pointSizeF, weight, italic, stretch, styleStrategy)` tuple instead of `font.key()`. The two-level design keeps the original id-keyed fast path for repeated calls with the same QFont instance, and falls back to the tuple key (which never calls `QFont.key()`) for the PyQt6 case where wrappers churn. The same key is now also used by `fontmetrics()`/`fontmetrics_f()`; they previously called `font.toString()` on every lookup, which is also ~3× more expensive on PyQt6.
+- The Qt5 fast-path gate (`_USE_FONT_KEY_FAST_PATH`) introduced in phase 4 is no longer needed and was removed: since the new cache never calls `font.key()`, the font-engine first-touch ordering issue that motivated the gate cannot occur.
+
+**Verification.**
+
+- **Test suite** — `pytest -q` with `PYTHONQWT_UNATTENDED_TESTS=1` on both bindings: PyQt5 26 passed / 1 skipped, PyQt6 26 passed / 1 skipped. Same as phase 4.
+- **Performance** — PythonQwt micro `test_loadtest`, 10 runs each, run back-to-back on the same machine immediately after phase 5:
+
+| Config | PyQt5 ms (median / mean) | PyQt6 ms (median / mean) | Δ (PyQt6 − PyQt5) | PyQt6/PyQt5 |
+|---|--:|--:|--:|--:|
+| `master` (no optimisations) | 798 / 805 | 1 000 / 986 | +202 ms | **+25 %** |
+| `fix/93` tip (end of phase 4) | 511 / 517 | 611 / 622 | +100 ms | **+20 %** |
+| `fix/93` + phase 5 | 539 / 533 | 590 / 591 | **+51 ms** | **+9 %** |
+
+PyQt5 is essentially unchanged by phase 5 (the new int constants are inert on PyQt5 — Qt5 enums are already plain ints). PyQt6 dropped another ~20 ms median (mean −5 %): the Python `enum.Flag.__and__` budget is gone for the painter-command State branches (~3 600 enum ops/run eliminated), and the tuple-key font cache replaces the ~6 400 `QFont.key()` calls/run that previously cost ~45 ms.
+
+**Cumulative speed-ups on the micro load test, vs `master`:**
+
+| Binding | master → end of phase 4 | end of phase 4 → +phase 5 | **Total** |
+|---|--:|--:|--:|
+| PyQt5 | −36 % | +5 % (noise) | **−33 %** |
+| PyQt6 | −39 % | −3 % | **−41 %** |
+
+**The PyQt6 overhead relative to PyQt5 more than halved** (+20 % → +9 %). The remaining +9 % is structural sip-dispatch cost: PyQt6 marshalling overhead on cheap, frequently called methods such as `drawLine` and `boundingRect`, and on attribute reads. This cost cannot be removed from within PythonQwt. It can only be mitigated by calling Qt fewer times per render, which phases 1–5 already pursue aggressively.
+
 ## Final results
 
+> Numbers below summarise the state at the end of phase 4 (the version covered by the Option A gate). Phase 5 was applied on top and further narrows the residual Qt5↔Qt6 gap on the micro load test from +20 % to +9 % (see the phase-5 table above). The PlotPy load test was not re-run after phase 5. Because phase 5 targets the per-call enum/sip overhead that dominates the *micro* benchmark, the PlotPy improvement is expected to be smaller in relative terms but in the same direction.
+
 ### PythonQwt micro `test_loadtest` (5 runs each, ms)
 
 | Binding | master | fix/93 (Option A) | Speedup |
@@ -223,13 +265,15 @@ foreach ($b in "pyqt5","pyqt6","pyside6") {
 
 ## Files touched
 
-| File | Phase 1 (cProfile) | Phase 2 (line-profiler) | Phase 4 (Option A) |
-|---|:-:|:-:|:-:|
-| `qwt/scale_map.py` | ✓ | | |
-| `qwt/scale_div.py` | ✓ | | |
-| `qwt/scale_engine.py` | ✓ | | |
-| `qwt/scale_draw.py` | ✓ | ✓ (drop QObject, `__slots__`) | |
-| `qwt/text.py` | ✓ | ✓ (drop QObject, font cache) | ✓ (Qt5 gate) |
+| File | Phase 1 (cProfile) | Phase 2 (line-profiler) | Phase 4 (Option A) | Phase 5 (Qt5↔Qt6 gap) |
+|---|:-:|:-:|:-:|:-:|
+| `qwt/scale_map.py` | ✓ | | | |
+| `qwt/scale_div.py` | ✓ | | | |
+| `qwt/scale_engine.py` | ✓ | | | |
+| `qwt/scale_draw.py` | ✓ | ✓ (drop QObject, `__slots__`) | | |
+| `qwt/text.py` | ✓ | ✓ (drop QObject, font cache) | ✓ (Qt5 gate) | ✓ (alignment ints, tuple-key font cache, drop Qt5 gate) |
+| `qwt/painter_command.py` | | | | ✓ (int-flag State branch, `_flag_int` helper) |
+| `qwt/graphic.py` | | | | ✓ (int-flag State-replay branch) |
 
 Tooling added under `scripts/`:
 
diff --git a/qwt/graphic.py b/qwt/graphic.py
@@ -26,7 +26,23 @@
 )
 
 from qwt.null_paintdevice import QwtNullPaintDevice
-from qwt.painter_command import QwtPainterCommand
+from qwt.painter_command import QwtPainterCommand, _flag_int
+
+# See painter_command.py for the rationale: cache the QPaintEngine.DirtyXxx
+# flags as plain ints so the State-replay branch below does plain int bitwise
+# tests instead of going through Python's enum.Flag.__and__ on PyQt6.
+_DIRTY_PEN = _flag_int(QPaintEngine.DirtyPen)
+_DIRTY_BRUSH = _flag_int(QPaintEngine.DirtyBrush)
+_DIRTY_BRUSH_ORIGIN = _flag_int(QPaintEngine.DirtyBrushOrigin)
+_DIRTY_FONT = _flag_int(QPaintEngine.DirtyFont)
+_DIRTY_BACKGROUND = _flag_int(QPaintEngine.DirtyBackground)
+_DIRTY_TRANSFORM = _flag_int(QPaintEngine.DirtyTransform)
+_DIRTY_CLIP_ENABLED = _flag_int(QPaintEngine.DirtyClipEnabled)
+_DIRTY_CLIP_REGION = _flag_int(QPaintEngine.DirtyClipRegion)
+_DIRTY_CLIP_PATH = _flag_int(QPaintEngine.DirtyClipPath)
+_DIRTY_HINTS = _flag_int(QPaintEngine.DirtyHints)
+_DIRTY_COMPOSITION_MODE = _flag_int(QPaintEngine.DirtyCompositionMode)
+_DIRTY_OPACITY = _flag_int(QPaintEngine.DirtyOpacity)
 
 
 def qwtHasScalablePen(painter):
@@ -83,35 +99,36 @@ def qwtExecCommand(painter, cmd, renderHints, transform, initialTransform):
         painter.drawImage(data.rect, data.image, data.subRect, data.flags)
     elif cmd.type() == QwtPainterCommand.State:
         data = cmd.stateData()
-        if data.flags & QPaintEngine.DirtyPen:
+        flags = _flag_int(data.flags)
+        if flags & _DIRTY_PEN:
             painter.setPen(data.pen)
-        if data.flags & QPaintEngine.DirtyBrush:
+        if flags & _DIRTY_BRUSH:
             painter.setBrush(data.brush)
-        if data.flags & QPaintEngine.DirtyBrushOrigin:
+        if flags & _DIRTY_BRUSH_ORIGIN:
             painter.setBrushOrigin(data.brushOrigin)
-        if data.flags & QPaintEngine.DirtyFont:
+        if flags & _DIRTY_FONT:
             painter.setFont(data.font)
-        if data.flags & QPaintEngine.DirtyBackground:
+        if flags & _DIRTY_BACKGROUND:
             painter.setBackgroundMode(data.backgroundMode)
             painter.setBackground(data.backgroundBrush)
-        if data.flags & QPaintEngine.DirtyTransform:
+        if flags & _DIRTY_TRANSFORM:
             painter.setTransform(data.transform)
-        if data.flags & QPaintEngine.DirtyClipEnabled:
+        if flags & _DIRTY_CLIP_ENABLED:
             painter.setClipping(data.isClipEnabled)
-        if data.flags & QPaintEngine.DirtyClipRegion:
+        if flags & _DIRTY_CLIP_REGION:
             painter.setClipRegion(data.clipRegion, data.clipOperation)
-        if data.flags & QPaintEngine.DirtyClipPath:
+        if flags & _DIRTY_CLIP_PATH:
             painter.setClipPath(data.clipPath, data.clipOperation)
-        if data.flags & QPaintEngine.DirtyHints:
+        if flags & _DIRTY_HINTS:
             for hint in (
                 QPainter.Antialiasing,
                 QPainter.TextAntialiasing,
                 QPainter.SmoothPixmapTransform,
             ):
                 painter.setRenderHint(hint, bool(data.renderHints & hint))
-        if data.flags & QPaintEngine.DirtyCompositionMode:
+        if flags & _DIRTY_COMPOSITION_MODE:
             painter.setCompositionMode(data.compositionMode)
-        if data.flags & QPaintEngine.DirtyOpacity:
+        if flags & _DIRTY_OPACITY:
             painter.setOpacity(data.opacity)
 
 
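The `_flag_int` helper defined in the painter_command.py diff below tries `.value` first and falls back to `int()`. The sketch below re-states that logic as `flag_int` and exercises it against two hypothetical stand-ins; neither class is a real Qt type:

- an `enum.Flag` subclass, modelling PyQt6, where `int()` raises `TypeError`;
- an int-convertible wrapper with no `.value`, loosely modelling a PyQt5 flags value.

```python
import enum


def flag_int(flag):
    """Re-statement of the commit's _flag_int logic, for illustration only."""
    try:
        return flag.value       # enum.Flag / enum.Enum members (PyQt6-style)
    except AttributeError:
        return int(flag)        # ints and __int__-convertible wrappers (PyQt5-style)


class AlignmentFlag(enum.Flag):
    """Hypothetical PyQt6-style flag type; values chosen for the example."""

    LEFT = 0x1
    RIGHT = 0x2
    TOP = 0x20


class QFlagsLike:
    """Hypothetical PyQt5-style flags wrapper: int-convertible, no .value attribute."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


combined = AlignmentFlag.LEFT | AlignmentFlag.TOP

try:
    int(combined)               # plain enum.Flag defines no __int__
except TypeError:
    print("int() rejects a plain enum.Flag value")

print(flag_int(combined))        # 33
print(flag_int(QFlagsLike(33)))  # 33
print(flag_int(33))              # 33: a plain int has no .value attribute
```

Trying `.value` first and catching `AttributeError` keeps a single code path for both bindings. Call sites never need to know which binding is loaded.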
diff --git a/qwt/painter_command.py b/qwt/painter_command.py
@@ -18,6 +18,40 @@
 from qtpy.QtGui import QPaintEngine, QPainterPath
 
 
+def _flag_int(flag):
+    """Return the integer value of a Qt enum/flag (PyQt5 and PyQt6).
+
+    PyQt5 exposes Qt enums as plain ints (``int(flag)`` works). PyQt6 wraps
+    them as ``enum.Flag`` instances which are not ``int`` subclasses, so
+    ``int(flag)`` raises -- the value must be read from ``flag.value``.
+    """
+    try:
+        return flag.value
+    except AttributeError:
+        return int(flag)
+
+
+# Cache QPaintEngine.DirtyXxx flags as plain Python ints once at import time.
+# On PyQt6, Qt enums are full ``enum.Flag`` instances and every ``flags &
+# Member`` test goes through Python's ``enum.__and__`` machinery (~6 us each).
+# In ``QwtPainterCommand.__init__`` below, the State branch performs twelve
+# successive flag tests per painter command -- on PyQt6 alone this accounted
+# for ~20 ms of the residual perf gap on the load test. Casting once to int
+# and bitwise-testing against int constants brings each test back to ~50 ns.
+_DIRTY_PEN = _flag_int(QPaintEngine.DirtyPen)
+_DIRTY_BRUSH = _flag_int(QPaintEngine.DirtyBrush)
+_DIRTY_BRUSH_ORIGIN = _flag_int(QPaintEngine.DirtyBrushOrigin)
+_DIRTY_FONT = _flag_int(QPaintEngine.DirtyFont)
+_DIRTY_BACKGROUND = _flag_int(QPaintEngine.DirtyBackground)
+_DIRTY_TRANSFORM = _flag_int(QPaintEngine.DirtyTransform)
+_DIRTY_CLIP_ENABLED = _flag_int(QPaintEngine.DirtyClipEnabled)
+_DIRTY_CLIP_REGION = _flag_int(QPaintEngine.DirtyClipRegion)
+_DIRTY_CLIP_PATH = _flag_int(QPaintEngine.DirtyClipPath)
+_DIRTY_HINTS = _flag_int(QPaintEngine.DirtyHints)
+_DIRTY_COMPOSITION_MODE = _flag_int(QPaintEngine.DirtyCompositionMode)
+_DIRTY_OPACITY = _flag_int(QPaintEngine.DirtyOpacity)
+
+
 class PixmapData(object):
     def __init__(self):
         self.rect = None
@@ -125,32 +159,35 @@ def __init__(self, *args):
                 self.__type = self.State
                 self.__stateData = StateData()
                 self.__stateData.flags = state.state()
-                if self.__stateData.flags & QPaintEngine.DirtyPen:
+                # Cast to int once: subsequent bitwise tests are done against
+                # the cached _DIRTY_* int constants (see top of module).
+                flags = _flag_int(self.__stateData.flags)
+                if flags & _DIRTY_PEN:
                     self.__stateData.pen = state.pen()
-                if self.__stateData.flags & QPaintEngine.DirtyBrush:
+                if flags & _DIRTY_BRUSH:
                     self.__stateData.brush = state.brush()
-                if self.__stateData.flags & QPaintEngine.DirtyBrushOrigin:
+                if flags & _DIRTY_BRUSH_ORIGIN:
                     self.__stateData.brushOrigin = state.brushOrigin()
-                if self.__stateData.flags & QPaintEngine.DirtyFont:
+                if flags & _DIRTY_FONT:
                     self.__stateData.font = state.font()
-                if self.__stateData.flags & QPaintEngine.DirtyBackground:
+                if flags & _DIRTY_BACKGROUND:
                     self.__stateData.backgroundMode = state.backgroundMode()
                     self.__stateData.backgroundBrush = state.backgroundBrush()
-                if self.__stateData.flags & QPaintEngine.DirtyTransform:
+                if flags & _DIRTY_TRANSFORM:
                     self.__stateData.transform = state.transform()
-                if self.__stateData.flags & QPaintEngine.DirtyClipEnabled:
+                if flags & _DIRTY_CLIP_ENABLED:
                     self.__stateData.isClipEnabled = state.isClipEnabled()
-                if self.__stateData.flags & QPaintEngine.DirtyClipRegion:
+                if flags & _DIRTY_CLIP_REGION:
                     self.__stateData.clipRegion = state.clipRegion()
                     self.__stateData.clipOperation = state.clipOperation()
-                if self.__stateData.flags & QPaintEngine.DirtyClipPath:
+                if flags & _DIRTY_CLIP_PATH:
                     self.__stateData.clipPath = state.clipPath()
                     self.__stateData.clipOperation = state.clipOperation()
-                if self.__stateData.flags & QPaintEngine.DirtyHints:
+                if flags & _DIRTY_HINTS:
                     self.__stateData.renderHints = state.renderHints()
-                if self.__stateData.flags & QPaintEngine.DirtyCompositionMode:
+                if flags & _DIRTY_COMPOSITION_MODE:
                     self.__stateData.compositionMode = state.compositionMode()
-                if self.__stateData.flags & QPaintEngine.DirtyOpacity:
+                if flags & _DIRTY_OPACITY:
                     self.__stateData.opacity = state.opacity()
         elif len(args) == 3:
             rect, pixmap, subRect = args
diff --git a/qwt/text.py b/qwt/text.py