Commit ecee0f4

Optimize performance by caching Qt flags as integers to reduce overhead in PyQt6

1 parent ad24e61
4 files changed
Lines changed: 220 additions & 77 deletions

doc/issue93_optimization_summary.md

Lines changed: 51 additions & 7 deletions
@@ -148,8 +148,50 @@ On Qt5 this becomes a thin pass-through to `font.key()` — bit-identical output
 - PySide6: 26 passed, 1 skipped, 1 warning
 3. **Performance** — PyQt5 micro-bench rose from ~445 ms to ~450–550 ms (≈ +5 ms, well within the run-to-run noise). Qt6 numbers are unchanged.
 
+## Phase 5 — closing the residual Qt5↔Qt6 gap
+
+After phases 1–4 the Qt6 path was still measurably slower than Qt5 on the micro load test (~+20 % / +100 ms). The goal of phase 5 was to **understand and remove that residual gap**, not just to keep optimising blindly.
+
+**Method.** A second cProfile + `line_profiler` pass was run on the post-phase-4 tip, this time focused on the differences between the PyQt5 and PyQt6 traces (rather than on absolute hotspots). Three concrete root causes were identified, all specific to the Qt6 binding:
+
+1. **Python `enum.Flag` arithmetic.** PyQt6 exposes Qt enums as `enum.Flag` subclasses; every `flags & Qt.SomeFlag` test goes through `enum.__and__ → enum.__call__ → enum.__new__` (~6 µs each). PyQt5 uses plain ints, so the same code costs ~50 ns there. cProfile attributed ≈ 62 ms / run on PyQt6 to `enum.py`, **0 ms on PyQt5**. The single worst caller was `QwtPainterCommand.__init__`, which performs **twelve** successive `flags & QPaintEngine.DirtyXxx` tests per painter command — at ~300 commands per load-test run that is 3 600 enum operations alone.
+2. **`QFont.key()` is ~3× slower per call on PyQt6.** Per-call sip dispatch costs were measured at 3.3 µs (PyQt5) vs 9.3 µs (PyQt6) for cheap getters. `font.key()` was the single biggest residual hotspot inside `QwtText.textSize()`.
+3. **The `id(font)` fast path misfires on PyQt6.** PyQt6 returns a *fresh* Python wrapper around the same underlying `QFont` on most calls, so `id(font)` changes between calls and the id-keyed cache misses ~92 % of the time (vs ~60 % on PyQt5). The slower `font.key()` path then takes over, compounding cause #2.
+
+**Changes.**
+
+- **`qwt/painter_command.py`** — added a `_flag_int(flag)` helper (PyQt5/PyQt6 portable) and module-level `_DIRTY_PEN`, `_DIRTY_BRUSH`, … int constants. The State branch in `__init__` casts `state.state()` to int *once* and bitwise-tests against the cached int constants instead of going through `enum.__and__` 12 times per command.
+- **`qwt/graphic.py`** — same pattern in `qwtExecCommand`'s State-replay branch (12 more flag tests per replayed command).
+- **`qwt/text.py`** — same pattern for `Qt.AlignXxx` flags (`_ALIGN_LEFT`, `_ALIGN_RIGHT`, …) at the hot bitwise-test sites in `taggedRichText()` and `QwtTextLabel.sizeHint()`/`heightForWidth()`/`textRect()`. The `setRenderFlags()` setter still stores the value as `Qt.AlignmentFlag` so downstream Qt APIs that strictly require an enum on PyQt6 (`QTextOption.setAlignment`, `QPainter.drawText`, `QFontMetrics.boundingRect`) keep working — only the per-test bitwise sites cast to int locally.
+- **`qwt/text.py`** also **replaced the `id(font) → font.key()` cache** with a tuple-key cache. The new `font_key_cached(font)` returns an interned `(family, pixelSize-or-pointSizeF, weight, italic, stretch, styleStrategy)` tuple instead of `font.key()`. The two-level design keeps the id-keyed fast path for repeated calls with the same QFont instance and falls back to the tuple key (which never calls `QFont.key()`) in the PyQt6 case where wrappers churn. The same key is now also used by `fontmetrics()`/`fontmetrics_f()`, which previously called `font.toString()` on every lookup, a call that is also ~3× more expensive on PyQt6.
+- The Qt5 fast-path gate (`_USE_FONT_KEY_FAST_PATH`) introduced in phase 4 is no longer needed and was removed: since the new cache never calls `font.key()`, the font-engine first-touch ordering issue that motivated the gate cannot occur.
+
+**Verification.**
+
+- **Test suite**: `pytest -q` with `PYTHONQWT_UNATTENDED_TESTS=1` on both bindings gives PyQt5 26 passed / 1 skipped and PyQt6 26 passed / 1 skipped, the same as phase 4.
+- **Performance** — PythonQwt micro `test_loadtest`, 10 runs each, run back-to-back on the same machine immediately after phase 5:
+
+| Config | PyQt5 ms (median / mean) | PyQt6 ms (median / mean) | Δ median (PyQt6 − PyQt5) | PyQt6 overhead vs PyQt5 |
+|---|--:|--:|--:|--:|
+| `master` (no optimisations) | 798 / 805 | 1 000 / 986 | +202 ms | **+25 %** |
+| `fix/93` tip (end of phase 4) | 511 / 517 | 611 / 622 | +100 ms | **+20 %** |
+| `fix/93` + phase 5 | 539 / 533 | 590 / 591 | **+51 ms** | **+9 %** |
+
+PyQt5 is essentially unchanged by phase 5 (the new int constants are inert on PyQt5, where Qt5 enums are already plain ints). PyQt6 dropped another ~20 ms on the median (−5 % on the mean): the Python `enum.Flag.__and__` budget is gone for the painter-command State branches (~3 600 enum ops/run eliminated), and the tuple-key font cache replaces the ~6 400 `QFont.key()` calls/run that previously cost ~45 ms.
+
+**Cumulative change in median run time on the micro load test (negative = faster):**
+
+| Binding | master → end of phase 4 | end of phase 4 → end of phase 5 | **Total (master → phase 5)** |
+|---|--:|--:|--:|
+| PyQt5 | −36 % | +5 % (noise) | **−33 %** |
+| PyQt6 | −39 % | −3 % | **−41 %** |
+
+**The PyQt6↔PyQt5 gap more than halved** (+20 % → +9 %). The remaining +9 % is the structural sip-dispatch cost (PyQt6 marshalling for cheap calls such as `drawLine`, `boundingRect` and attribute reads) that is *not* removable from PythonQwt — it can only be mitigated by calling Qt fewer times per render, which phases 1–5 already pursue aggressively.
+
 ## Final results
 
+> Numbers below summarise the state at the end of phase 4 (the version covered by the Option A gate). Phase 5 was applied on top and further closes the residual Qt5↔Qt6 gap on the micro load test from +20 % to +9 % — see the dedicated phase-5 table above. The PlotPy load test was not re-run after phase 5; phase 5 targets the per-call enum/sip overhead that dominates the *micro* benchmark, so the PlotPy improvement is expected to be smaller in relative terms but in the same direction.
+
 ### PythonQwt micro `test_loadtest` (5 runs each, ms)
 
 | Binding | master | fix/93 (Option A) | Speedup |
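As a cross-check, the Δ and overhead columns of the phase-5 table, and the per-phase percentages of the cumulative table, can be recomputed from the published medians. This is a small standalone sketch, not part of the repository; `MEDIANS_MS` simply transcribes the median columns of the phase-5 table.

```python
# Median run times in ms, transcribed from the phase-5 table (10 runs each).
MEDIANS_MS = {
    "master": {"pyqt5": 798, "pyqt6": 1000},
    "end of phase 4": {"pyqt5": 511, "pyqt6": 611},
    "phase 5": {"pyqt5": 539, "pyqt6": 590},
}


def pct_change(old, new):
    """Relative change from old to new in percent (negative means faster)."""
    return (new - old) / old * 100.0


def qt6_gap(config):
    """Return (PyQt6 - PyQt5 in ms, PyQt6 overhead relative to PyQt5 in %)."""
    row = MEDIANS_MS[config]
    delta = row["pyqt6"] - row["pyqt5"]
    return delta, delta / row["pyqt5"] * 100.0


if __name__ == "__main__":
    for config in MEDIANS_MS:
        delta, overhead = qt6_gap(config)
        print(f"{config:>15}: PyQt6 - PyQt5 = {delta:+d} ms ({overhead:+.0f} %)")
    for binding in ("pyqt5", "pyqt6"):
        # dicts keep insertion order, so this unpacks master, phase 4, phase 5
        master, phase4, phase5 = (MEDIANS_MS[c][binding] for c in MEDIANS_MS)
        print(
            f"{binding}: master -> phase 4 {pct_change(master, phase4):+.0f} %, "
            f"phase 4 -> phase 5 {pct_change(phase4, phase5):+.0f} %, "
            f"total {pct_change(master, phase5):+.1f} %"
        )
```

The recomputed deltas (+202, +100, +51 ms), overheads (+25, +20, +9 %) and per-phase changes match the tables; the exact PyQt5 total from the medians is −32.5 %, which the cumulative table rounds to −33 %.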
@@ -223,13 +265,15 @@ foreach ($b in "pyqt5","pyqt6","pyside6") {
 
 ## Files touched
 
-| File | Phase 1 (cProfile) | Phase 2 (line-profiler) | Phase 4 (Option A) |
-|---|:-:|:-:|:-:|
-| `qwt/scale_map.py` | ✓ | | |
-| `qwt/scale_div.py` | ✓ | | |
-| `qwt/scale_engine.py` | ✓ | | |
-| `qwt/scale_draw.py` | ✓ | ✓ (drop QObject, `__slots__`) | |
-| `qwt/text.py` | ✓ | ✓ (drop QObject, font cache) | ✓ (Qt5 gate) |
+| File | Phase 1 (cProfile) | Phase 2 (line-profiler) | Phase 4 (Option A) | Phase 5 (Qt5↔Qt6 gap) |
+|---|:-:|:-:|:-:|:-:|
+| `qwt/scale_map.py` | ✓ | | | |
+| `qwt/scale_div.py` | ✓ | | | |
+| `qwt/scale_engine.py` | ✓ | | | |
+| `qwt/scale_draw.py` | ✓ | ✓ (drop QObject, `__slots__`) | | |
+| `qwt/text.py` | ✓ | ✓ (drop QObject, font cache) | ✓ (Qt5 gate) | ✓ (alignment ints, tuple-key font cache, drop Qt5 gate) |
+| `qwt/painter_command.py` | | | | ✓ (int-flag State branch, `_flag_int` helper) |
+| `qwt/graphic.py` | | | | ✓ (int-flag State-replay branch) |
 
 Tooling added under `scripts/`:
 
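The tuple-key font cache lives in `qwt/text.py`, whose diff is not reproduced on this page. The following is a Qt-free sketch of the two-level idea described in the phase-5 notes, under stated assumptions: `FakeFont` is a hypothetical stand-in that mimics the `QFont` getters named in the key tuple, and `font_key` is illustrative, not PythonQwt's `font_key_cached`. Two choices belong to this sketch only: the size is tagged with its unit so a 12 px font and a 12 pt font cannot collide (in Python `12 == 12.0`), and fonts are assumed not to be mutated after their first lookup, because the id-keyed level would otherwise return a stale key.

```python
class FakeFont:
    """Hypothetical stand-in for QFont, exposing only the getters used below.

    As with QFont, pixelSize() is -1 when the size was set in points and
    pointSizeF() is -1.0 when it was set in pixels.
    """

    def __init__(self, family, point_size=-1.0, pixel_size=-1, weight=400, italic=False):
        self._family = family
        self._point_size = float(point_size)
        self._pixel_size = pixel_size
        self._weight = weight
        self._italic = italic

    def family(self):
        return self._family

    def pointSizeF(self):
        return self._point_size

    def pixelSize(self):
        return self._pixel_size

    def weight(self):
        return self._weight

    def italic(self):
        return self._italic

    def stretch(self):
        return 100

    def styleStrategy(self):
        return 1


_MAX_ID_ENTRIES = 256  # assumption: small bound so churned wrappers are not pinned forever
_by_id = {}            # level 1: id(wrapper) -> (wrapper, key)
_interned = {}         # level 2: key tuple -> canonical (interned) tuple


def font_key(font):
    """Return a hashable, interned key describing `font` (illustrative sketch)."""
    entry = _by_id.get(id(font))
    if entry is not None:
        # The entry pins its wrapper, so no other live object can share this id.
        return entry[1]
    pixel = font.pixelSize()
    size = ("px", pixel) if pixel > 0 else ("pt", font.pointSizeF())
    key = (font.family(), size, font.weight(), font.italic(),
           font.stretch(), font.styleStrategy())
    key = _interned.setdefault(key, key)  # equal fonts share one tuple object
    if len(_by_id) >= _MAX_ID_ENTRIES:
        _by_id.clear()  # crude bound; a real cache might prefer an LRU policy
    _by_id[id(font)] = (font, key)
    return key


if __name__ == "__main__":
    a = FakeFont("Sans", point_size=10)
    b = FakeFont("Sans", point_size=10)  # fresh wrapper, equal font (PyQt6-style churn)
    assert font_key(a) is font_key(a)    # level 1: id fast path
    assert font_key(b) is font_key(a)    # level 2: same interned tuple
    assert font_key(FakeFont("Sans", pixel_size=10)) != font_key(a)
```

Because equal fonts map to the same interned tuple, a downstream cache such as a metrics cache can be keyed on it without ever calling `QFont.key()` or `QFont.toString()`.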

qwt/graphic.py

Lines changed: 30 additions & 13 deletions
@@ -26,7 +26,23 @@
 )
 
 from qwt.null_paintdevice import QwtNullPaintDevice
-from qwt.painter_command import QwtPainterCommand
+from qwt.painter_command import QwtPainterCommand, _flag_int
+
+# See painter_command.py for the rationale: cache the QPaintEngine.DirtyXxx
+# flags as plain ints so the State-replay branch below does plain int bitwise
+# tests instead of going through Python's enum.Flag.__and__ on PyQt6.
+_DIRTY_PEN = _flag_int(QPaintEngine.DirtyPen)
+_DIRTY_BRUSH = _flag_int(QPaintEngine.DirtyBrush)
+_DIRTY_BRUSH_ORIGIN = _flag_int(QPaintEngine.DirtyBrushOrigin)
+_DIRTY_FONT = _flag_int(QPaintEngine.DirtyFont)
+_DIRTY_BACKGROUND = _flag_int(QPaintEngine.DirtyBackground)
+_DIRTY_TRANSFORM = _flag_int(QPaintEngine.DirtyTransform)
+_DIRTY_CLIP_ENABLED = _flag_int(QPaintEngine.DirtyClipEnabled)
+_DIRTY_CLIP_REGION = _flag_int(QPaintEngine.DirtyClipRegion)
+_DIRTY_CLIP_PATH = _flag_int(QPaintEngine.DirtyClipPath)
+_DIRTY_HINTS = _flag_int(QPaintEngine.DirtyHints)
+_DIRTY_COMPOSITION_MODE = _flag_int(QPaintEngine.DirtyCompositionMode)
+_DIRTY_OPACITY = _flag_int(QPaintEngine.DirtyOpacity)
 
 
 def qwtHasScalablePen(painter):
@@ -83,35 +99,36 @@ def qwtExecCommand(painter, cmd, renderHints, transform, initialTransform):
         painter.drawImage(data.rect, data.image, data.subRect, data.flags)
     elif cmd.type() == QwtPainterCommand.State:
         data = cmd.stateData()
-        if data.flags & QPaintEngine.DirtyPen:
+        flags = _flag_int(data.flags)
+        if flags & _DIRTY_PEN:
             painter.setPen(data.pen)
-        if data.flags & QPaintEngine.DirtyBrush:
+        if flags & _DIRTY_BRUSH:
             painter.setBrush(data.brush)
-        if data.flags & QPaintEngine.DirtyBrushOrigin:
+        if flags & _DIRTY_BRUSH_ORIGIN:
             painter.setBrushOrigin(data.brushOrigin)
-        if data.flags & QPaintEngine.DirtyFont:
+        if flags & _DIRTY_FONT:
             painter.setFont(data.font)
-        if data.flags & QPaintEngine.DirtyBackground:
+        if flags & _DIRTY_BACKGROUND:
             painter.setBackgroundMode(data.backgroundMode)
             painter.setBackground(data.backgroundBrush)
-        if data.flags & QPaintEngine.DirtyTransform:
+        if flags & _DIRTY_TRANSFORM:
             painter.setTransform(data.transform)
-        if data.flags & QPaintEngine.DirtyClipEnabled:
+        if flags & _DIRTY_CLIP_ENABLED:
             painter.setClipping(data.isClipEnabled)
-        if data.flags & QPaintEngine.DirtyClipRegion:
+        if flags & _DIRTY_CLIP_REGION:
             painter.setClipRegion(data.clipRegion, data.clipOperation)
-        if data.flags & QPaintEngine.DirtyClipPath:
+        if flags & _DIRTY_CLIP_PATH:
             painter.setClipPath(data.clipPath, data.clipOperation)
-        if data.flags & QPaintEngine.DirtyHints:
+        if flags & _DIRTY_HINTS:
             for hint in (
                 QPainter.Antialiasing,
                 QPainter.TextAntialiasing,
                 QPainter.SmoothPixmapTransform,
             ):
                 painter.setRenderHint(hint, bool(data.renderHints & hint))
-        if data.flags & QPaintEngine.DirtyCompositionMode:
+        if flags & _DIRTY_COMPOSITION_MODE:
             painter.setCompositionMode(data.compositionMode)
-        if data.flags & QPaintEngine.DirtyOpacity:
+        if flags & _DIRTY_OPACITY:
             painter.setOpacity(data.opacity)
 
 
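The per-test overhead that the `_DIRTY_*` constants remove can be illustrated without Qt, because PyQt6 builds its flag types on the standard library's `enum.Flag`. This sketch times three flag tests per call, once through `enum.Flag.__and__` and once against cached ints. `Dirty` is a hypothetical stand-in for `QPaintEngine.DirtyFlag`, and absolute timings depend on the Python version and machine (the ~6 µs quoted in the phase-5 notes was measured under cProfile), so only the ratio between the two variants is of interest.

```python
import enum
import statistics
import timeit


class Dirty(enum.Flag):
    """Hypothetical stand-in for a PyQt6 flag type such as QPaintEngine.DirtyFlag."""

    PEN = 0x1
    BRUSH = 0x2
    FONT = 0x8


# Converted to plain ints once, like the module-level _DIRTY_* constants.
_PEN = Dirty.PEN.value
_BRUSH = Dirty.BRUSH.value
_FONT = Dirty.FONT.value

STATE = Dirty.PEN | Dirty.FONT  # an example "dirty" state for one command


def dirty_bits_enum(state=STATE):
    # Each test runs enum.Flag.__and__, which calls the Flag class for its result.
    return (bool(state & Dirty.PEN), bool(state & Dirty.BRUSH), bool(state & Dirty.FONT))


def dirty_bits_int(state=STATE):
    flags = state.value  # one conversion, then plain int arithmetic
    return (bool(flags & _PEN), bool(flags & _BRUSH), bool(flags & _FONT))


def median_us(func, number=20_000, repeat=7):
    """Median time per call, in microseconds, over `repeat` timeit batches."""
    batches = timeit.repeat(func, number=number, repeat=repeat)
    return statistics.median(batches) / number * 1e6


if __name__ == "__main__":
    # Both variants must take the same branches before the timing means anything.
    assert dirty_bits_enum() == dirty_bits_int() == (True, False, True)
    t_enum = median_us(dirty_bits_enum)
    t_int = median_us(dirty_bits_int)
    print(f"enum.Flag: {t_enum:.3f} us/call  int: {t_int:.3f} us/call  ratio: {t_enum / t_int:.1f}x")
```

Taking the median of several `timeit` batches mirrors the median/mean reporting used for the load test and damps outliers from background activity.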

qwt/painter_command.py

Lines changed: 49 additions & 12 deletions
@@ -18,6 +18,40 @@
 from qtpy.QtGui import QPaintEngine, QPainterPath
 
 
+def _flag_int(flag):
+    """Return the integer value of a Qt enum/flag (PyQt5 and PyQt6).
+
+    PyQt5 exposes Qt enums as plain ints (``int(flag)`` works). PyQt6 wraps
+    them as ``enum.Flag`` instances which are not ``int`` subclasses, so
+    ``int(flag)`` raises -- the value must be read from ``flag.value``.
+    """
+    try:
+        return flag.value
+    except AttributeError:
+        return int(flag)
+
+
+# Cache QPaintEngine.DirtyXxx flags as plain Python ints once at import time.
+# On PyQt6, Qt enums are full ``enum.Flag`` instances and every ``flags &
+# Member`` test goes through Python's ``enum.__and__`` machinery (~6 us each).
+# In ``QwtPainterCommand.__init__`` below, the State branch performs twelve
+# successive flag tests per painter command -- on PyQt6 alone this accounted
+# for ~20 ms of the residual perf gap on the load test. Casting once to int
+# and bitwise-testing against int constants brings each test back to ~50 ns.
+_DIRTY_PEN = _flag_int(QPaintEngine.DirtyPen)
+_DIRTY_BRUSH = _flag_int(QPaintEngine.DirtyBrush)
+_DIRTY_BRUSH_ORIGIN = _flag_int(QPaintEngine.DirtyBrushOrigin)
+_DIRTY_FONT = _flag_int(QPaintEngine.DirtyFont)
+_DIRTY_BACKGROUND = _flag_int(QPaintEngine.DirtyBackground)
+_DIRTY_TRANSFORM = _flag_int(QPaintEngine.DirtyTransform)
+_DIRTY_CLIP_ENABLED = _flag_int(QPaintEngine.DirtyClipEnabled)
+_DIRTY_CLIP_REGION = _flag_int(QPaintEngine.DirtyClipRegion)
+_DIRTY_CLIP_PATH = _flag_int(QPaintEngine.DirtyClipPath)
+_DIRTY_HINTS = _flag_int(QPaintEngine.DirtyHints)
+_DIRTY_COMPOSITION_MODE = _flag_int(QPaintEngine.DirtyCompositionMode)
+_DIRTY_OPACITY = _flag_int(QPaintEngine.DirtyOpacity)
+
+
 class PixmapData(object):
     def __init__(self):
         self.rect = None
@@ -125,32 +159,35 @@ def __init__(self, *args):
                 self.__type = self.State
                 self.__stateData = StateData()
                 self.__stateData.flags = state.state()
-                if self.__stateData.flags & QPaintEngine.DirtyPen:
+                # Cast to int once: subsequent bitwise tests are done against
+                # the cached _DIRTY_* int constants (see top of module).
+                flags = _flag_int(self.__stateData.flags)
+                if flags & _DIRTY_PEN:
                     self.__stateData.pen = state.pen()
-                if self.__stateData.flags & QPaintEngine.DirtyBrush:
+                if flags & _DIRTY_BRUSH:
                     self.__stateData.brush = state.brush()
-                if self.__stateData.flags & QPaintEngine.DirtyBrushOrigin:
+                if flags & _DIRTY_BRUSH_ORIGIN:
                     self.__stateData.brushOrigin = state.brushOrigin()
-                if self.__stateData.flags & QPaintEngine.DirtyFont:
+                if flags & _DIRTY_FONT:
                     self.__stateData.font = state.font()
-                if self.__stateData.flags & QPaintEngine.DirtyBackground:
+                if flags & _DIRTY_BACKGROUND:
                     self.__stateData.backgroundMode = state.backgroundMode()
                     self.__stateData.backgroundBrush = state.backgroundBrush()
-                if self.__stateData.flags & QPaintEngine.DirtyTransform:
+                if flags & _DIRTY_TRANSFORM:
                     self.__stateData.transform = state.transform()
-                if self.__stateData.flags & QPaintEngine.DirtyClipEnabled:
+                if flags & _DIRTY_CLIP_ENABLED:
                     self.__stateData.isClipEnabled = state.isClipEnabled()
-                if self.__stateData.flags & QPaintEngine.DirtyClipRegion:
+                if flags & _DIRTY_CLIP_REGION:
                     self.__stateData.clipRegion = state.clipRegion()
                     self.__stateData.clipOperation = state.clipOperation()
-                if self.__stateData.flags & QPaintEngine.DirtyClipPath:
+                if flags & _DIRTY_CLIP_PATH:
                     self.__stateData.clipPath = state.clipPath()
                     self.__stateData.clipOperation = state.clipOperation()
-                if self.__stateData.flags & QPaintEngine.DirtyHints:
+                if flags & _DIRTY_HINTS:
                     self.__stateData.renderHints = state.renderHints()
-                if self.__stateData.flags & QPaintEngine.DirtyCompositionMode:
+                if flags & _DIRTY_COMPOSITION_MODE:
                     self.__stateData.compositionMode = state.compositionMode()
-                if self.__stateData.flags & QPaintEngine.DirtyOpacity:
+                if flags & _DIRTY_OPACITY:
                     self.__stateData.opacity = state.opacity()
         elif len(args) == 3:
             rect, pixmap, subRect = args
