Conversation
codeflash-ai bot commented Dec 17, 2025

📄 24% (0.24x) speedup for `OpenVINOTrainer._unpack_singleton` in `keras/src/backend/openvino/trainer.py`

⏱️ Runtime: 62.5 microseconds → 50.6 microseconds (best of 180 runs)

📝 Explanation and details

The optimization replaces `isinstance(x, (list, tuple))` with separate `type(x) is list` and `type(x) is tuple` checks, achieving a **23% speedup**.

**Key optimizations:**

1. **Eliminates tuple creation overhead**: The original code creates a tuple `(list, tuple)` on every function call for the `isinstance` check. The optimized version avoids this allocation entirely.

2. **Uses faster type identity checks**: `type(x) is list` uses direct type identity comparison, which is faster than `isinstance()` when you only need exact type matches (not subclasses). This is appropriate here since the function specifically targets the built-in list and tuple types.

3. **Reduces function call overhead**: `isinstance()` involves more complex C-level logic to handle inheritance checking, while `type()` with an `is` comparison is a simpler, more direct operation. A minimal before/after sketch of the pattern is shown below.
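
For reference, here is a minimal sketch of the two patterns being compared. The exact bodies in `keras/src/backend/openvino/trainer.py` may differ, so treat the function names and shapes below as illustrative:

```python
# Original pattern: list and tuple are globals, so the (list, tuple) argument
# is rebuilt on every call, and isinstance() also accepts subclasses.
def _unpack_singleton_original(x):
    if isinstance(x, (list, tuple)) and len(x) == 1:
        return x[0]
    return x


# Optimized pattern: exact type-identity checks, with no tuple-of-types
# argument and no subclass handling.
def _unpack_singleton_optimized(x):
    if (type(x) is list or type(x) is tuple) and len(x) == 1:
        return x[0]
    return x
```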

**Performance characteristics from tests:**

- **Singleton unpacking cases** (the primary use case) show 15-32% improvements
- **Non-sequence types** (int, str, dict, etc.) benefit most with 25-62% speedups since they skip both checks entirely in the optimized version
- **Large collections** still benefit (4-32% faster) due to the cheaper type checking

This optimization is particularly valuable in deep learning contexts where tensor operations frequently involve unpacking singleton containers, and the function may be called thousands of times during model training or inference. The behavioral semantics remain identical - it still only unpacks single-element lists and tuples while preserving all other inputs unchanged.
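
To sanity-check the direction of these numbers outside the Codeflash harness, a minimal `timeit` comparison of the two check styles can be used (illustrative only; absolute timings depend on the CPU and Python version):

```python
import timeit

setup = "x = [42]"  # the singleton-list case, the function's primary input shape

# Time the isinstance-based check against the type-identity check.
t_isinstance = timeit.timeit(
    "isinstance(x, (list, tuple)) and len(x) == 1", setup=setup, number=1_000_000
)
t_type_is = timeit.timeit(
    "(type(x) is list or type(x) is tuple) and len(x) == 1", setup=setup, number=1_000_000
)
print(f"isinstance: {t_isinstance:.3f}s   type-is: {t_type_is:.3f}s")
```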

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 151 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
```python
import inspect

# imports
import pytest
from keras.src.backend.openvino.trainer import OpenVINOTrainer
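
# Note: the Trainer stub below appears in the generated test file but is not
# referenced by the tests themselves, which construct OpenVINOTrainer directly
# via the `trainer` fixture.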

class Trainer:
    def __init__(self):
        self._lock = False
        self._run_eagerly = False
        self._jit_compile = None
        self.compiled = False
        self.loss = None
        self.steps_per_execution = 1
        # Can be set by callbacks in on_train_begin
        self._initial_epoch = None
        self._compute_loss_has_training_arg = (
            "training" in inspect.signature(self.compute_loss).parameters
        )

        # Placeholders used in `compile`
        self._compile_loss = None
        self._compile_metrics = None
        self._loss_tracker = None

    def compute_loss(self):
        pass
from keras.src.backend.openvino.trainer import OpenVINOTrainer

# unit tests

@pytest.fixture
def trainer():
    # Fixture to provide a fresh instance for each test
    return OpenVINOTrainer()

# -------------------- Basic Test Cases --------------------

def test_singleton_list(trainer):
    # List with one element should return the element
    codeflash_output = trainer._unpack_singleton([42]) # 768ns -> 667ns (15.1% faster)

def test_singleton_tuple(trainer):
    # Tuple with one element should return the element
    codeflash_output = trainer._unpack_singleton(('hello',)) # 984ns -> 744ns (32.3% faster)

def test_non_singleton_list(trainer):
    # List with multiple elements should return the list as is
    data = [1, 2, 3]
    codeflash_output = trainer._unpack_singleton(data) # 719ns -> 695ns (3.45% faster)

def test_non_singleton_tuple(trainer):
    # Tuple with multiple elements should return the tuple as is
    data = (1, 2)
    codeflash_output = trainer._unpack_singleton(data) # 887ns -> 691ns (28.4% faster)

def test_non_sequence(trainer):
    # Non-sequence (int) should return itself
    codeflash_output = trainer._unpack_singleton(123) # 764ns -> 505ns (51.3% faster)

def test_non_sequence_str(trainer):
    # Strings are sequences, but not list/tuple, so should return as is
    s = "singleton"
    codeflash_output = trainer._unpack_singleton(s) # 635ns -> 516ns (23.1% faster)

def test_non_sequence_dict(trainer):
    # Dict should be returned as is
    d = {'a': 1}
    codeflash_output = trainer._unpack_singleton(d) # 613ns -> 490ns (25.1% faster)

def test_empty_list(trainer):
    # Empty list should be returned as is
    data = []
    codeflash_output = trainer._unpack_singleton(data) # 709ns -> 642ns (10.4% faster)

def test_empty_tuple(trainer):
    # Empty tuple should be returned as is
    data = ()
    codeflash_output = trainer._unpack_singleton(data) # 909ns -> 656ns (38.6% faster)

# -------------------- Edge Test Cases --------------------

def test_list_of_list_singleton(trainer):
    # List containing a single list should return the inner list
    inner = [1, 2]
    outer = [inner]
    codeflash_output = trainer._unpack_singleton(outer) # 765ns -> 665ns (15.0% faster)

def test_tuple_of_tuple_singleton(trainer):
    # Tuple containing a single tuple should return the inner tuple
    inner = (1, 2)
    outer = (inner,)
    codeflash_output = trainer._unpack_singleton(outer) # 929ns -> 711ns (30.7% faster)

def test_nested_singleton_list(trainer):
    # Nested singleton lists: only outermost is unpacked
    data = [[1]]
    codeflash_output = trainer._unpack_singleton(data); result = codeflash_output # 756ns -> 648ns (16.7% faster)

def test_nested_singleton_tuple(trainer):
    # Nested singleton tuples: only outermost is unpacked
    data = ((1,),)
    codeflash_output = trainer._unpack_singleton(data); result = codeflash_output # 944ns -> 738ns (27.9% faster)

def test_set_singleton(trainer):
    # Sets are not unpacked, even if length 1
    data = {99}
    codeflash_output = trainer._unpack_singleton(data) # 830ns -> 511ns (62.4% faster)

def test_frozenset_singleton(trainer):
    # Frozensets are not unpacked
    data = frozenset([99])
    codeflash_output = trainer._unpack_singleton(data) # 641ns -> 494ns (29.8% faster)

def test_bytes_singleton(trainer):
    # Bytes are not unpacked
    data = b'x'
    codeflash_output = trainer._unpack_singleton(data) # 594ns -> 498ns (19.3% faster)

def test_bytearray_singleton(trainer):
    # Bytearray is not unpacked
    data = bytearray(b'x')
    codeflash_output = trainer._unpack_singleton(data) # 611ns -> 499ns (22.4% faster)

def test_list_with_none(trainer):
    # List with one None element should return None
    codeflash_output = trainer._unpack_singleton([None]) # 800ns -> 684ns (17.0% faster)

def test_tuple_with_none(trainer):
    # Tuple with one None element should return None
    codeflash_output = trainer._unpack_singleton((None,)) # 998ns -> 755ns (32.2% faster)

def test_list_with_false(trainer):
    # List with one False element should return False
    codeflash_output = trainer._unpack_singleton([False]) # 765ns -> 608ns (25.8% faster)

def test_tuple_with_zero(trainer):
    # Tuple with one zero element should return 0
    codeflash_output = trainer._unpack_singleton((0,)) # 898ns -> 724ns (24.0% faster)

def test_list_with_empty_list(trainer):
    # List containing a single empty list should return the empty list
    inner = []
    outer = [inner]
    codeflash_output = trainer._unpack_singleton(outer) # 758ns -> 614ns (23.5% faster)

def test_tuple_with_empty_tuple(trainer):
    # Tuple containing a single empty tuple should return the empty tuple
    inner = ()
    outer = (inner,)
    codeflash_output = trainer._unpack_singleton(outer) # 920ns -> 733ns (25.5% faster)

def test_list_with_tuple(trainer):
    # List with one tuple should return the tuple
    inner = (1, 2)
    outer = [inner]
    codeflash_output = trainer._unpack_singleton(outer) # 752ns -> 638ns (17.9% faster)

def test_tuple_with_list(trainer):
    # Tuple with one list should return the list
    inner = [1, 2]
    outer = (inner,)
    codeflash_output = trainer._unpack_singleton(outer) # 940ns -> 710ns (32.4% faster)

def test_list_with_object(trainer):
    # List with one object should return the object
    obj = object()
    codeflash_output = trainer._unpack_singleton([obj]) # 736ns -> 635ns (15.9% faster)

def test_tuple_with_object(trainer):
    # Tuple with one object should return the object
    obj = object()
    codeflash_output = trainer._unpack_singleton((obj,)) # 893ns -> 713ns (25.2% faster)

# -------------------- Large Scale Test Cases --------------------

def test_large_list(trainer):
    # Large list (>1 element) should be returned as is
    data = list(range(1000))
    codeflash_output = trainer._unpack_singleton(data) # 834ns -> 800ns (4.25% faster)

def test_large_tuple(trainer):
    # Large tuple (>1 element) should be returned as is
    data = tuple(range(1000))
    codeflash_output = trainer._unpack_singleton(data) # 1.01μs -> 780ns (29.7% faster)

def test_large_singleton_list(trainer):
    # Large object inside singleton list should be unpacked
    data = list(range(1000))
    outer = [data]
    codeflash_output = trainer._unpack_singleton(outer) # 801ns -> 674ns (18.8% faster)

def test_large_singleton_tuple(trainer):
    # Large object inside singleton tuple should be unpacked
    data = tuple(range(1000))
    outer = (data,)
    codeflash_output = trainer._unpack_singleton(outer) # 944ns -> 744ns (26.9% faster)

def test_large_nested_singleton_list(trainer):
    # Only outermost singleton list is unpacked
    data = [list(range(1000))]
    codeflash_output = trainer._unpack_singleton(data); result = codeflash_output # 770ns -> 675ns (14.1% faster)

def test_large_nested_singleton_tuple(trainer):
    # Only outermost singleton tuple is unpacked
    data = (tuple(range(1000)),)
    codeflash_output = trainer._unpack_singleton(data); result = codeflash_output # 986ns -> 769ns (28.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
import inspect

# imports
import pytest
from keras.src.backend.openvino.trainer import OpenVINOTrainer

class Trainer:
    def __init__(self):
        self._lock = False
        self._run_eagerly = False
        self._jit_compile = None
        self.compiled = False
        self.loss = None
        self.steps_per_execution = 1
        # Can be set by callbacks in on_train_begin
        self._initial_epoch = None
        self._compute_loss_has_training_arg = (
            "training" in inspect.signature(self.compute_loss).parameters
            if hasattr(self, "compute_loss") else False
        )
        # Placeholders used in `compile`
        self._compile_loss = None
        self._compile_metrics = None
        self._loss_tracker = None
from keras.src.backend.openvino.trainer import OpenVINOTrainer

# unit tests

@pytest.fixture
def trainer():
    # Fixture to provide a fresh trainer instance for each test
    return OpenVINOTrainer()

# ----------------------
# 1. Basic Test Cases
# ----------------------

def test_unpack_singleton_list_of_one(trainer):
    # Should return the single element inside a one-element list
    codeflash_output = trainer._unpack_singleton([42]) # 782ns -> 686ns (14.0% faster)

def test_unpack_singleton_tuple_of_one(trainer):
    # Should return the single element inside a one-element tuple
    codeflash_output = trainer._unpack_singleton(('foo',)) # 958ns -> 730ns (31.2% faster)

def test_unpack_singleton_list_of_many(trainer):
    # Should return the original list if length > 1
    data = [1, 2, 3]
    codeflash_output = trainer._unpack_singleton(data) # 755ns -> 699ns (8.01% faster)

def test_unpack_singleton_tuple_of_many(trainer):
    # Should return the original tuple if length > 1
    data = (1, 2, 3)
    codeflash_output = trainer._unpack_singleton(data) # 847ns -> 712ns (19.0% faster)

def test_unpack_singleton_int(trainer):
    # Should return the original int (not a list/tuple)
    codeflash_output = trainer._unpack_singleton(5) # 754ns -> 514ns (46.7% faster)

def test_unpack_singleton_str(trainer):
    # Should return the original string (not a list/tuple)
    codeflash_output = trainer._unpack_singleton("hello") # 623ns -> 495ns (25.9% faster)

def test_unpack_singleton_dict(trainer):
    # Should return the original dict (not a list/tuple)
    d = {"a": 1}
    codeflash_output = trainer._unpack_singleton(d) # 614ns -> 522ns (17.6% faster)

def test_unpack_singleton_empty_list(trainer):
    # Should return the empty list as-is
    l = []
    codeflash_output = trainer._unpack_singleton(l) # 694ns -> 651ns (6.61% faster)

def test_unpack_singleton_empty_tuple(trainer):
    # Should return the empty tuple as-is
    t = ()
    codeflash_output = trainer._unpack_singleton(t) # 889ns -> 679ns (30.9% faster)

# ----------------------
# 2. Edge Test Cases
# ----------------------

def test_unpack_singleton_nested_singleton_list(trainer):
    # Should return the nested list, not its contents
    nested = [[1, 2, 3]]
    codeflash_output = trainer._unpack_singleton(nested) # 783ns -> 689ns (13.6% faster)

def test_unpack_singleton_nested_singleton_tuple(trainer):
    # Should return the nested tuple, not its contents
    nested = ((1, 2, 3),)
    codeflash_output = trainer._unpack_singleton(nested) # 941ns -> 759ns (24.0% faster)

def test_unpack_singleton_list_with_none(trainer):
    # Should correctly unpack a singleton list containing None
    codeflash_output = trainer._unpack_singleton([None]) # 775ns -> 677ns (14.5% faster)

def test_unpack_singleton_tuple_with_none(trainer):
    # Should correctly unpack a singleton tuple containing None
    codeflash_output = trainer._unpack_singleton((None,)) # 923ns -> 755ns (22.3% faster)

def test_unpack_singleton_list_with_empty_list(trainer):
    # Should return the empty list inside the singleton list
    codeflash_output = trainer._unpack_singleton([[]]) # 766ns -> 644ns (18.9% faster)

def test_unpack_singleton_tuple_with_empty_tuple(trainer):
    # Should return the empty tuple inside the singleton tuple
    codeflash_output = trainer._unpack_singleton(((),)) # 927ns -> 763ns (21.5% faster)

def test_unpack_singleton_list_with_tuple(trainer):
    # Should return the tuple inside the singleton list
    codeflash_output = trainer._unpack_singleton([(1, 2)]) # 776ns -> 649ns (19.6% faster)

def test_unpack_singleton_tuple_with_list(trainer):
    # Should return the list inside the singleton tuple
    codeflash_output = trainer._unpack_singleton(([1, 2],)) # 920ns -> 748ns (23.0% faster)

def test_unpack_singleton_list_with_dict(trainer):
    # Should return the dict inside the singleton list
    d = {"x": 1}
    codeflash_output = trainer._unpack_singleton([d]) # 768ns -> 676ns (13.6% faster)

def test_unpack_singleton_tuple_with_dict(trainer):
    # Should return the dict inside the singleton tuple
    d = {"y": 2}
    codeflash_output = trainer._unpack_singleton((d,)) # 918ns -> 759ns (20.9% faster)

def test_unpack_singleton_bytes(trainer):
    # Should return the bytes object as-is
    b = b"bytes"
    codeflash_output = trainer._unpack_singleton(b) # 666ns -> 520ns (28.1% faster)

def test_unpack_singleton_set(trainer):
    # Should return the set as-is (not a list/tuple)
    s = {1, 2, 3}
    codeflash_output = trainer._unpack_singleton(s) # 818ns -> 525ns (55.8% faster)

def test_unpack_singleton_bool(trainer):
    # Should return the bool as-is
    codeflash_output = trainer._unpack_singleton(True) # 714ns -> 503ns (41.9% faster)

def test_unpack_singleton_none(trainer):
    # Should return None as-is
    codeflash_output = trainer._unpack_singleton(None) # 626ns -> 496ns (26.2% faster)

def test_unpack_singleton_list_of_zero(trainer):
    # Should return the empty list as-is
    codeflash_output = trainer._unpack_singleton([]) # 714ns -> 665ns (7.37% faster)

def test_unpack_singleton_tuple_of_zero(trainer):
    # Should return the empty tuple as-is
    codeflash_output = trainer._unpack_singleton(()) # 921ns -> 654ns (40.8% faster)

def test_unpack_singleton_list_of_one_tuple(trainer):
    # Should return the tuple inside the singleton list
    codeflash_output = trainer._unpack_singleton([(1, 2, 3)]) # 737ns -> 647ns (13.9% faster)

def test_unpack_singleton_tuple_of_one_list(trainer):
    # Should return the list inside the singleton tuple
    codeflash_output = trainer._unpack_singleton(([1, 2, 3],)) # 933ns -> 745ns (25.2% faster)

# ----------------------
# 3. Large Scale Test Cases
# ----------------------

def test_unpack_singleton_large_list(trainer):
    # Should return the large list as-is if length > 1
    large_list = list(range(1000))
    codeflash_output = trainer._unpack_singleton(large_list) # 856ns -> 787ns (8.77% faster)

def test_unpack_singleton_large_tuple(trainer):
    # Should return the large tuple as-is if length > 1
    large_tuple = tuple(range(1000))
    codeflash_output = trainer._unpack_singleton(large_tuple) # 1.04μs -> 803ns (29.3% faster)

def test_unpack_singleton_singleton_large_list(trainer):
    # Should return the large list inside a singleton list
    large = list(range(1000))
    codeflash_output = trainer._unpack_singleton([large]) # 805ns -> 697ns (15.5% faster)

def test_unpack_singleton_singleton_large_tuple(trainer):
    # Should return the large tuple inside a singleton tuple
    large = tuple(range(1000))
    codeflash_output = trainer._unpack_singleton((large,)) # 967ns -> 739ns (30.9% faster)

def test_unpack_singleton_large_nested(trainer):
    # Should only unpack the outer singleton, not recursively
    nested = [[i for i in range(1000)]]
    codeflash_output = trainer._unpack_singleton(nested); result = codeflash_output # 794ns -> 685ns (15.9% faster)

def test_unpack_singleton_large_list_of_singletons(trainer):
    # Should not unpack if length > 1, even if all elements are singletons
    data = [[i] for i in range(1000)]
    codeflash_output = trainer._unpack_singleton(data) # 829ns -> 790ns (4.94% faster)

def test_unpack_singleton_large_tuple_of_singletons(trainer):
    # Should not unpack if length > 1, even if all elements are singletons
    data = tuple(([i],) for i in range(1000))
    codeflash_output = trainer._unpack_singleton(data) # 1.07μs -> 816ns (31.6% faster)

def test_unpack_singleton_large_string(trainer):
    # Should return the string as-is, not unpack
    s = "x" * 1000
    codeflash_output = trainer._unpack_singleton(s) # 666ns -> 502ns (32.7% faster)

def test_unpack_singleton_large_bytes(trainer):
    # Should return the bytes object as-is, not unpack
    b = b"x" * 1000
    codeflash_output = trainer._unpack_singleton(b) # 623ns -> 530ns (17.5% faster)

# ----------------------
# Additional mutation-detecting tests
# ----------------------

def test_unpack_singleton_does_not_unpack_non_singleton_list(trainer):
    # If the list has more than one element, it should not be unpacked
    data = [1, 2]
    codeflash_output = trainer._unpack_singleton(data) # 754ns -> 707ns (6.65% faster)

def test_unpack_singleton_does_not_unpack_non_singleton_tuple(trainer):
    # If the tuple has more than one element, it should not be unpacked
    data = (1, 2)
    codeflash_output = trainer._unpack_singleton(data) # 945ns -> 711ns (32.9% faster)

def test_unpack_singleton_does_not_unpack_list_of_zero(trainer):
    # If the list is empty, it should not be unpacked
    data = []
    codeflash_output = trainer._unpack_singleton(data) # 715ns -> 669ns (6.88% faster)

def test_unpack_singleton_does_not_unpack_tuple_of_zero(trainer):
    # If the tuple is empty, it should not be unpacked
    data = ()
    codeflash_output = trainer._unpack_singleton(data) # 906ns -> 675ns (34.2% faster)

def test_unpack_singleton_does_not_unpack_other_iterables(trainer):
    # Should not unpack sets, dicts, or other iterables
    data = set([1])
    codeflash_output = trainer._unpack_singleton(data) # 837ns -> 515ns (62.5% faster)
    data = {"a": 1}
    codeflash_output = trainer._unpack_singleton(data) # 283ns -> 248ns (14.1% faster)

def test_unpack_singleton_object(trainer):
    # Should return the object as-is
    class Dummy:
        pass
    obj = Dummy()
    codeflash_output = trainer._unpack_singleton(obj) # 760ns -> 497ns (52.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-OpenVINOTrainer._unpack_singleton-mjaljkl6` and push.

codeflash-ai bot requested a review from mashraf-222 on December 17, 2025 at 22:42
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (High Optimization Quality according to Codeflash) labels on Dec 17, 2025