Conversation

@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 105% (1.05x) speedup for _shared_object_saving_scope in keras/src/legacy/saving/serialization.py

⏱️ Runtime: 249 microseconds → 122 microseconds (best of 174 runs)

📝 Explanation and details

The optimization replaces getattr(SHARED_OBJECT_SAVING, "scope", None) with direct dictionary access using SHARED_OBJECT_SAVING.__dict__.get("scope", None). This achieves a 104% speedup by bypassing Python's attribute lookup mechanism.
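
For reference, a minimal before/after sketch of the change described above (assuming the helper simply returns the thread-local scope attribute, as the description states; the module-level object here is a standalone stand-in):

import threading

# Stand-in for the module-level thread-local in keras/src/legacy/saving/serialization.py.
SHARED_OBJECT_SAVING = threading.local()

# Original: generic attribute lookup with a default.
def _shared_object_saving_scope_original():
    return getattr(SHARED_OBJECT_SAVING, "scope", None)

# Optimized: read the per-thread __dict__ directly, skipping the
# descriptor/attribute-resolution machinery.
def _shared_object_saving_scope_optimized():
    return SHARED_OBJECT_SAVING.__dict__.get("scope", None)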

Key Performance Improvement:

  • getattr() triggers Python's descriptor protocol and attribute resolution machinery, which involves multiple method calls and checks
  • Direct __dict__.get() access skips this overhead and performs a simple dictionary lookup
  • The optimization is particularly effective when the attribute doesn't exist (returns None), as shown in the test results, where missing attributes see 53-93% speedups; a reproduction sketch follows this list
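
A hedged, standalone way to reproduce the per-call difference (absolute numbers depend on the machine and Python version; this is not the harness Codeflash used):

import threading
import timeit

SHARED_OBJECT_SAVING = threading.local()  # standalone stand-in for the module global

def via_getattr():
    return getattr(SHARED_OBJECT_SAVING, "scope", None)

def via_dict_get():
    return SHARED_OBJECT_SAVING.__dict__.get("scope", None)

# Missing-attribute case, which the report highlights as the biggest win.
for label, fn in [("getattr", via_getattr), ("__dict__.get", via_dict_get)]:
    print(label, timeit.timeit(fn, number=1_000_000))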

Why This Works:
threading.local() objects store their thread-specific data in a standard __dict__, making direct dictionary access safe and equivalent. Both approaches handle missing attributes identically by returning None.
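
A small standalone sketch (not Keras code) that demonstrates this equivalence and the per-thread isolation:

import threading

local_state = threading.local()  # stand-in for SHARED_OBJECT_SAVING

def lookups_agree():
    # Both lookups return the same value whether or not "scope" is set in this thread.
    assert getattr(local_state, "scope", None) == local_state.__dict__.get("scope", None)

lookups_agree()                # attribute missing -> both return None
local_state.scope = "outer"
lookups_agree()                # attribute present -> both return "outer"

# A new thread gets its own empty per-thread __dict__, so both lookups return None there too.
worker = threading.Thread(target=lookups_agree)
worker.start()
worker.join()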

Impact on Workloads:
Based on the function references, _shared_object_saving_scope() is called in hot paths during Keras model serialization:

  • Called in context manager __enter__ methods that are invoked during model saving (an illustrative sketch follows this list)
  • Called multiple times within serialize_keras_class_and_config() which processes each layer/component during serialization
  • The 104% speedup becomes significant when serializing complex models with many layers, as this function is invoked repeatedly
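
To illustrate the context-manager call pattern from the first bullet above, here is a hedged sketch. The class name and the bookkeeping value are hypothetical, not the actual Keras implementation; it only assumes that SHARED_OBJECT_SAVING and _shared_object_saving_scope are importable from keras/src/legacy/saving/serialization.py, as the test scaffolding below does.

from keras.src.legacy.saving.serialization import (
    SHARED_OBJECT_SAVING,
    _shared_object_saving_scope,
)

class HypotheticalSavingScope:
    """Illustrative context manager: reuse an active scope, else create one."""

    def __enter__(self):
        # Hot-path call: this is where the optimized lookup runs on every save.
        existing = _shared_object_saving_scope()
        if existing is not None:
            self._created = False
            return existing
        SHARED_OBJECT_SAVING.scope = {}  # hypothetical per-save bookkeeping
        self._created = True
        return SHARED_OBJECT_SAVING.scope

    def __exit__(self, exc_type, exc_value, traceback):
        if self._created:
            del SHARED_OBJECT_SAVING.scope

Nesting such scopes means _shared_object_saving_scope() is hit once per layer/component during serialization, which is where the per-call savings accumulate.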

Test Case Performance:
The optimization shows consistent 53-121% improvements across all test scenarios, with particularly strong gains for missing attributes and large-scale operations (111% speedup for 500 iterations), making it effective for both individual calls and batch serialization workloads.

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | 534 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
🌀 Generated Regression Tests and Runtime
# imports
import pytest
# Import the module's own thread-local so that setting attributes on it is
# visible to the function under test.
from keras.src.legacy.saving.serialization import (
    SHARED_OBJECT_SAVING,
    _shared_object_saving_scope,
)

# unit tests

def test_scope_not_set_returns_none():
    """Basic: If scope is not set, function should return None."""
    # Ensure scope attribute is not set
    if hasattr(SHARED_OBJECT_SAVING, "scope"):
        delattr(SHARED_OBJECT_SAVING, "scope")
    codeflash_output = _shared_object_saving_scope() # 1.50μs -> 979ns (53.3% faster)

def test_scope_set_to_value_returns_value():
    """Basic: If scope is set to a value, function should return that value."""
    SHARED_OBJECT_SAVING.scope = "my_scope"
    codeflash_output = _shared_object_saving_scope() # 1.22μs -> 655ns (86.9% faster)

def test_scope_set_to_none_returns_none():
    """Basic: If scope is explicitly set to None, function should return None."""
    SHARED_OBJECT_SAVING.scope = None
    codeflash_output = _shared_object_saving_scope() # 1.11μs -> 645ns (72.4% faster)

def test_scope_set_to_integer():
    """Basic: If scope is set to an integer, function should return the integer."""
    SHARED_OBJECT_SAVING.scope = 42
    codeflash_output = _shared_object_saving_scope() # 1.18μs -> 682ns (73.0% faster)

def test_scope_set_to_object():
    """Basic: If scope is set to an object, function should return the same object."""
    class Dummy:
        pass
    obj = Dummy()
    SHARED_OBJECT_SAVING.scope = obj
    codeflash_output = _shared_object_saving_scope() # 1.18μs -> 666ns (77.2% faster)

def test_scope_set_to_list():
    """Edge: If scope is set to a list, function should return the list."""
    scope_list = [1, 2, 3]
    SHARED_OBJECT_SAVING.scope = scope_list
    codeflash_output = _shared_object_saving_scope() # 1.14μs -> 618ns (84.0% faster)

def test_scope_set_to_empty_string():
    """Edge: If scope is set to an empty string, function should return empty string."""
    SHARED_OBJECT_SAVING.scope = ""
    codeflash_output = _shared_object_saving_scope() # 1.16μs -> 672ns (72.6% faster)

def test_scope_set_to_false():
    """Edge: If scope is set to False, function should return False."""
    SHARED_OBJECT_SAVING.scope = False
    codeflash_output = _shared_object_saving_scope() # 1.17μs -> 649ns (80.9% faster)

def test_scope_set_to_zero():
    """Edge: If scope is set to 0, function should return 0."""
    SHARED_OBJECT_SAVING.scope = 0
    codeflash_output = _shared_object_saving_scope() # 1.14μs -> 652ns (75.5% faster)

def test_scope_deleted_after_set():
    """Edge: If scope is deleted after being set, function should return None."""
    SHARED_OBJECT_SAVING.scope = "something"
    delattr(SHARED_OBJECT_SAVING, "scope")
    codeflash_output = _shared_object_saving_scope() # 1.18μs -> 609ns (92.9% faster)

def test_large_scale_many_assignments():
    """Large Scale: Assign scope many times in a loop and check correctness."""
    for i in range(500):
        SHARED_OBJECT_SAVING.scope = i
        codeflash_output = _shared_object_saving_scope() # 210μs -> 99.4μs (111% faster)

def test_scope_set_to_mutable_and_mutated():
    """Edge: If scope is a mutable object and is mutated, function reflects the mutation."""
    scope_dict = {'a': 1}
    SHARED_OBJECT_SAVING.scope = scope_dict
    codeflash_output = _shared_object_saving_scope() # 1.24μs -> 727ns (70.7% faster)
    scope_dict['b'] = 2
    # Should see the updated dict
    codeflash_output = _shared_object_saving_scope() # 499ns -> 226ns (121% faster)

def test_scope_set_to_tuple():
    """Edge: If scope is set to a tuple, function should return the tuple."""
    SHARED_OBJECT_SAVING.scope = (1, 2, 3)
    codeflash_output = _shared_object_saving_scope() # 975ns -> 575ns (69.6% faster)

def test_scope_set_to_custom_class_with_eq():
    """Edge: If scope is a custom class with __eq__, function returns the correct instance."""
    class Custom:
        def __init__(self, x):
            self.x = x
        def __eq__(self, other):
            return isinstance(other, Custom) and self.x == other.x
    obj = Custom(99)
    SHARED_OBJECT_SAVING.scope = obj
    codeflash_output = _shared_object_saving_scope() # 1.10μs -> 653ns (68.9% faster)

def test_scope_set_to_bytes():
    """Edge: If scope is set to bytes, function should return the bytes."""
    SHARED_OBJECT_SAVING.scope = b"bytes"
    codeflash_output = _shared_object_saving_scope() # 1.11μs -> 623ns (78.0% faster)

def test_scope_set_to_large_list():
    """Large Scale: Set scope to a large list and check retrieval."""
    big_list = list(range(1000))
    SHARED_OBJECT_SAVING.scope = big_list
    codeflash_output = _shared_object_saving_scope() # 1.12μs -> 663ns (69.7% faster)

def test_scope_set_to_large_string():
    """Large Scale: Set scope to a large string and check retrieval."""
    big_str = "x" * 1000
    SHARED_OBJECT_SAVING.scope = big_str
    codeflash_output = _shared_object_saving_scope() # 1.11μs -> 631ns (75.3% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
# imports
import pytest  # used for our unit tests

# function to test lives in keras/src/legacy/saving/serialization.py;
# import its thread-local directly so the tests exercise the real state.
from keras.src.legacy.saving.serialization import (
    SHARED_OBJECT_SAVING,
    _shared_object_saving_scope,
)

# unit tests

# ---------------------- Basic Test Cases ----------------------

def test_scope_none_by_default():
    # By default, no 'scope' attribute should exist, so should return None
    codeflash_output = _shared_object_saving_scope() # 1.20μs -> 760ns (57.8% faster)

def test_scope_set_to_value():
    # Set the scope to a value and ensure it is returned
    SHARED_OBJECT_SAVING.scope = "my_scope"
    codeflash_output = _shared_object_saving_scope() # 1.16μs -> 651ns (78.5% faster)

def test_scope_set_to_integer():
    # Set the scope to an integer and ensure it is returned
    SHARED_OBJECT_SAVING.scope = 12345
    codeflash_output = _shared_object_saving_scope() # 1.13μs -> 645ns (75.3% faster)

def test_scope_set_to_object():
    # Set the scope to a custom object and ensure it is returned
    class DummyScope:
        pass
    dummy = DummyScope()
    SHARED_OBJECT_SAVING.scope = dummy
    codeflash_output = _shared_object_saving_scope() # 1.20μs -> 618ns (94.7% faster)

def test_scope_set_to_falsey_value():
    # Set the scope to a falsey value (e.g., empty string)
    SHARED_OBJECT_SAVING.scope = ""
    codeflash_output = _shared_object_saving_scope() # 1.09μs -> 690ns (58.0% faster)

# ---------------------- Edge Test Cases ----------------------

def test_scope_deleted_returns_none():
    # Set, then delete the scope attribute, should return None
    SHARED_OBJECT_SAVING.scope = "to_be_deleted"
    del SHARED_OBJECT_SAVING.scope
    codeflash_output = _shared_object_saving_scope() # 1.14μs -> 633ns (79.5% faster)

def test_scope_set_to_none_explicitly():
    # Explicitly set scope to None, should return None
    SHARED_OBJECT_SAVING.scope = None
    codeflash_output = _shared_object_saving_scope() # 1.11μs -> 625ns (77.6% faster)

def test_scope_set_to_empty_list():
    # Set scope to an empty list
    SHARED_OBJECT_SAVING.scope = []
    codeflash_output = _shared_object_saving_scope() # 1.12μs -> 631ns (77.7% faster)

def test_scope_set_to_empty_dict():
    # Set scope to an empty dict
    SHARED_OBJECT_SAVING.scope = {}
    codeflash_output = _shared_object_saving_scope() # 1.12μs -> 634ns (77.3% faster)

def test_scope_set_to_large_string():
    # Set scope to a large string
    large_string = "x" * 1000
    SHARED_OBJECT_SAVING.scope = large_string
    codeflash_output = _shared_object_saving_scope() # 1.03μs -> 626ns (64.9% faster)

def test_scope_set_to_large_list():
    # Set scope to a large list
    large_list = list(range(1000))
    SHARED_OBJECT_SAVING.scope = large_list
    codeflash_output = _shared_object_saving_scope() # 1.13μs -> 660ns (70.6% faster)

def test_scope_set_to_large_dict():
    # Set scope to a large dict
    large_dict = {str(i): i for i in range(1000)}
    SHARED_OBJECT_SAVING.scope = large_dict
    codeflash_output = _shared_object_saving_scope() # 1.21μs -> 696ns (74.1% faster)

def test_scope_set_to_boolean_true():
    SHARED_OBJECT_SAVING.scope = True
    codeflash_output = _shared_object_saving_scope() # 1.21μs -> 652ns (85.4% faster)

def test_scope_set_to_boolean_false():
    SHARED_OBJECT_SAVING.scope = False
    codeflash_output = _shared_object_saving_scope() # 1.05μs -> 626ns (68.4% faster)

def test_scope_set_to_float():
    SHARED_OBJECT_SAVING.scope = 3.14159
    codeflash_output = _shared_object_saving_scope() # 1.08μs -> 613ns (76.3% faster)

def test_scope_set_to_tuple():
    SHARED_OBJECT_SAVING.scope = (1, 2, 3)
    codeflash_output = _shared_object_saving_scope() # 1.12μs -> 644ns (73.3% faster)

# ---------------------- Large Scale Test Cases ----------------------

def test_scope_with_large_object():
    # Set scope to a large custom object
    class LargeObject:
        def __init__(self):
            self.data = [i for i in range(1000)]
    obj = LargeObject()
    SHARED_OBJECT_SAVING.scope = obj
    codeflash_output = _shared_object_saving_scope() # 1.69μs -> 978ns (72.6% faster)

To edit these changes, run git checkout codeflash/optimize-_shared_object_saving_scope-mjadanrp and push.

Codeflash Static Badge

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 17, 2025 18:51
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025