@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 146% (1.46x) speedup for map_graph in keras/src/ops/function.py

⏱️ Runtime : 13.9 milliseconds → 5.64 milliseconds (best of 36 runs)

📝 Explanation and details

The optimized code achieves a 146% speedup primarily through replacing an O(n²) algorithm with an O(n) algorithm for operation name uniqueness checking, which becomes critical for large models.

Key Optimization:

  • Original approach: For each operation name, called all_names.count(name) which scans the entire list, resulting in O(n²) complexity
  • Optimized approach: Uses a dictionary (name_counts) to track occurrences in a single pass, then checks the counts separately, reducing the work to O(n)
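As a minimal sketch of the two patterns (function names here are illustrative, not the actual Keras code; the error message is simplified):

```python
from collections import Counter

def check_unique_names_quadratic(all_names):
    """Original pattern: list.count() rescans the whole list for every
    name, so n names cost O(n^2) comparisons in total."""
    for name in all_names:
        if all_names.count(name) != 1:
            raise ValueError(
                f"The name {name!r} is used {all_names.count(name)} times."
            )

def check_unique_names_linear(all_names):
    """Optimized pattern: count every name in a single O(n) pass,
    then check the counts separately."""
    name_counts = Counter(all_names)
    for name, count in name_counts.items():
        if count != 1:
            raise ValueError(f"The name {name!r} is used {count} times.")
```

Both functions accept a list of unique names and raise ValueError on any duplicate; only the asymptotic cost differs.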

The line profiler shows the dramatic impact: the original all_names.count(name) took 8.47ms (20.6% of total time), while the optimized name counting takes only 0.42ms (1.3% of total time), a 95% reduction in this section alone.

Why this matters for Keras:
Based on the function references, map_graph is called during model initialization in the Function constructor, which processes all operations in a neural network. Large models with hundreds or thousands of operations would experience quadratic slowdown in the original version, making model creation prohibitively slow.

Test case performance:

  • Small models (2-10 operations): Modest 1-6% improvements
  • Large disconnected inputs test (1000 operations): 219% speedup - demonstrating the optimization scales with model complexity
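The scaling gap can be reproduced with a standalone micro-benchmark (timings vary by machine; this demonstrates the asymptotic difference, not the exact PR numbers):

```python
import timeit
from collections import Counter

names = [f"op{i}" for i in range(1000)]

# O(n^2): list.count() scans all 1000 names for each of the 1000 names.
quadratic = timeit.timeit(
    lambda: all(names.count(n) == 1 for n in names), number=5
)

# O(n): one Counter pass, then a check over the counts.
linear = timeit.timeit(
    lambda: all(c == 1 for c in Counter(names).values()), number=5
)

print(f"list.count per name: {quadratic:.4f}s")
print(f"single Counter pass: {linear:.4f}s")
```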

The optimization preserves all existing behavior while dramatically improving scalability for real-world deep learning models where operation counts can be very large.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests: 🔘 None Found
🌀 Generated Regression Tests: 8 Passed
⏪ Replay Tests: 🔘 None Found
🔎 Concolic Coverage Tests: 🔘 None Found
📊 Tests Coverage: 96.2%
🌀 Generated Regression Tests and Runtime
# imports
import pytest

from keras.src.ops.function import map_graph

# Minimal mocks for Node, Operation, and Tensor to simulate Keras-like graph
class DummyTensor:
    def __init__(self, name, keras_history=None):
        self.name = name
        self._keras_history = keras_history or (None, None, None)
    def __repr__(self):
        return f"DummyTensor({self.name})"
    def __hash__(self):
        return hash(id(self))
    def __eq__(self, other):
        return self is other

class DummyNode:
    def __init__(self, operation, input_tensors, outputs, is_input=False, parent_nodes=None):
        self.operation = operation
        self.input_tensors = input_tensors
        self.outputs = outputs
        self.is_input = is_input
        self.parent_nodes = parent_nodes or []
    def __repr__(self):
        return f"DummyNode({self.operation.name}, is_input={self.is_input})"
    def __hash__(self):
        return hash(id(self))
    def __eq__(self, other):
        return self is other

class DummyOperation:
    def __init__(self, name):
        self.name = name
        self._inbound_nodes = []
    def __repr__(self):
        return f"DummyOperation({self.name})"
    def __hash__(self):
        return hash(id(self))
    def __eq__(self, other):
        return self is other

# ------------------- UNIT TESTS -------------------

# Basic Test Cases

def test_single_input_single_output():
    # One input, one op, one output
    op = DummyOperation("op1")
    inp = DummyTensor("input")
    node = DummyNode(op, [inp], outputs=[], is_input=True)
    op._inbound_nodes.append(node)
    inp._keras_history = (op, 0, None)
    out = DummyTensor("output")
    node.outputs = [out]
    out._keras_history = (op, 0, None)
    codeflash_output = map_graph([inp], [out]); result = codeflash_output # 26.9μs -> 26.5μs (1.77% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result
    # Node should be in nodes_by_depth
    found = False
    for nodes in nodes_by_depth.values():
        if node in nodes:
            found = True
    assert found

def test_two_layer_chain():
    # input -> op1 -> mid -> op2 -> output
    op1 = DummyOperation("op1")
    op2 = DummyOperation("op2")
    inp = DummyTensor("input")
    node1 = DummyNode(op1, [inp], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp._keras_history = (op1, 0, None)
    mid = DummyTensor("mid")
    node1.outputs = [mid]
    mid._keras_history = (op1, 0, None)
    node2 = DummyNode(op2, [mid], outputs=[], is_input=False, parent_nodes=[node1])
    op2._inbound_nodes.append(node2)
    out = DummyTensor("output")
    node2.outputs = [out]
    out._keras_history = (op2, 0, None)
    codeflash_output = map_graph([inp], [out]); result = codeflash_output # 34.3μs -> 33.9μs (1.31% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

def test_branching_graph():
    # input -> op1 -> mid1 -> op2 -> out1
    #                \-> op3 -> out2
    op1 = DummyOperation("op1")
    op2 = DummyOperation("op2")
    op3 = DummyOperation("op3")
    inp = DummyTensor("input")
    node1 = DummyNode(op1, [inp], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp._keras_history = (op1, 0, None)
    mid1 = DummyTensor("mid1")
    node1.outputs = [mid1]
    mid1._keras_history = (op1, 0, None)
    node2 = DummyNode(op2, [mid1], outputs=[], is_input=False, parent_nodes=[node1])
    op2._inbound_nodes.append(node2)
    out1 = DummyTensor("out1")
    node2.outputs = [out1]
    out1._keras_history = (op2, 0, None)
    node3 = DummyNode(op3, [mid1], outputs=[], is_input=False, parent_nodes=[node1])
    op3._inbound_nodes.append(node3)
    out2 = DummyTensor("out2")
    node3.outputs = [out2]
    out2._keras_history = (op3, 0, None)
    codeflash_output = map_graph([inp], [out1, out2]); result = codeflash_output # 42.9μs -> 41.6μs (3.14% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

def test_multiple_inputs():
    # inp1 -> op1 -> out1
    # inp2 -> op2 -> out2
    op1 = DummyOperation("op1")
    op2 = DummyOperation("op2")
    inp1 = DummyTensor("input1")
    inp2 = DummyTensor("input2")
    node1 = DummyNode(op1, [inp1], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp1._keras_history = (op1, 0, None)
    out1 = DummyTensor("out1")
    node1.outputs = [out1]
    out1._keras_history = (op1, 0, None)
    node2 = DummyNode(op2, [inp2], outputs=[], is_input=True)
    op2._inbound_nodes.append(node2)
    inp2._keras_history = (op2, 0, None)
    out2 = DummyTensor("out2")
    node2.outputs = [out2]
    out2._keras_history = (op2, 0, None)
    codeflash_output = map_graph([inp1, inp2], [out1, out2]); result = codeflash_output # 30.4μs -> 29.9μs (1.70% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

def test_disconnected_input():
    # inp1 -> op1 -> out1, inp2 not connected
    op1 = DummyOperation("op1")
    inp1 = DummyTensor("input1")
    inp2 = DummyTensor("input2")
    node1 = DummyNode(op1, [inp1], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp1._keras_history = (op1, 0, None)
    out1 = DummyTensor("out1")
    node1.outputs = [out1]
    out1._keras_history = (op1, 0, None)
    # inp2 is not connected to any output
    op2 = DummyOperation("op2")
    node2 = DummyNode(op2, [inp2], outputs=[], is_input=True)
    op2._inbound_nodes.append(node2)
    inp2._keras_history = (op2, 0, None)
    codeflash_output = map_graph([inp1, inp2], [out1]); result = codeflash_output # 27.8μs -> 26.2μs (6.21% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

# Edge Test Cases

def test_duplicate_operation_names():
    # Two operations with the same name
    op1 = DummyOperation("op")
    op2 = DummyOperation("op")
    inp1 = DummyTensor("input1")
    inp2 = DummyTensor("input2")
    node1 = DummyNode(op1, [inp1], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp1._keras_history = (op1, 0, None)
    out1 = DummyTensor("out1")
    node1.outputs = [out1]
    out1._keras_history = (op1, 0, None)
    node2 = DummyNode(op2, [inp2], outputs=[], is_input=True)
    op2._inbound_nodes.append(node2)
    inp2._keras_history = (op2, 0, None)
    out2 = DummyTensor("out2")
    node2.outputs = [out2]
    out2._keras_history = (op2, 0, None)
    with pytest.raises(ValueError):
        map_graph([inp1, inp2], [out1, out2]) # 41.7μs -> 42.4μs (1.59% slower)

def test_input_is_output():
    # The input tensor is also the output tensor
    op1 = DummyOperation("op1")
    inp = DummyTensor("input")
    node1 = DummyNode(op1, [inp], outputs=[inp], is_input=True)
    op1._inbound_nodes.append(node1)
    inp._keras_history = (op1, 0, None)
    # Output is the same as input
    codeflash_output = map_graph([inp], [inp]); result = codeflash_output # 33.4μs -> 33.4μs (0.099% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

def test_tensor_used_multiple_times():
    # One tensor is used as input to multiple nodes
    op1 = DummyOperation("op1")
    op2 = DummyOperation("op2")
    inp = DummyTensor("input")
    node1 = DummyNode(op1, [inp], outputs=[], is_input=True)
    op1._inbound_nodes.append(node1)
    inp._keras_history = (op1, 0, None)
    mid = DummyTensor("mid")
    node1.outputs = [mid]
    mid._keras_history = (op1, 0, None)
    node2 = DummyNode(op2, [mid], outputs=[], is_input=False, parent_nodes=[node1])
    op2._inbound_nodes.append(node2)
    out1 = DummyTensor("out1")
    node2.outputs = [out1]
    out1._keras_history = (op2, 0, None)
    node3 = DummyNode(op2, [mid], outputs=[], is_input=False, parent_nodes=[node1])
    op2._inbound_nodes.append(node3)
    out2 = DummyTensor("out2")
    node3.outputs = [out2]
    out2._keras_history = (op2, 1, None)
    codeflash_output = map_graph([inp], [out1, out2]); result = codeflash_output # 45.4μs -> 43.7μs (3.93% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result

# Large Scale Test Cases

def test_large_linear_chain():
    # Linear chain of 500 nodes
    n = 500
    ops = [DummyOperation(f"op{i}") for i in range(n)]
    tensors = [DummyTensor(f"t{i}") for i in range(n + 1)]
    for i in range(n):
        node = DummyNode(ops[i], [tensors[i]], outputs=[], is_input=(i==0),
                         parent_nodes=[ops[i-1]._inbound_nodes[0]] if i > 0 else [])
        ops[i]._inbound_nodes.append(node)
        tensors[i]._keras_history = (ops[i], 0, None)
        node.outputs = [tensors[i+1]]
        tensors[i+1]._keras_history = (ops[i], 0, None)
    codeflash_output = map_graph([tensors[0]], [tensors[-1]]); result = codeflash_output
    network_nodes, nodes_by_depth, operations, operations_by_depth = result
    # Check that the first node is at max depth
    max_depth = max(nodes_by_depth.keys())
    assert ops[0]._inbound_nodes[0] in nodes_by_depth[max_depth]

def test_large_branching_tree():
    # Binary tree of depth 8 (255 nodes, 256 leaves)
    depth = 8
    ops = []
    tensors = []
    nodes = []
    def build_tree(level, parent_node=None):
        if level == depth:
            t = DummyTensor(f"leaf_{len(tensors)}")
            tensors.append(t)
            return t
        op = DummyOperation(f"op_{level}_{len(ops)}")
        ops.append(op)
        left = build_tree(level+1)
        right = build_tree(level+1)
        node = DummyNode(op, [left, right], outputs=[], is_input=(level==0), parent_nodes=[parent_node] if parent_node else [])
        op._inbound_nodes.append(node)
        out = DummyTensor(f"out_{level}_{len(tensors)}")
        node.outputs = [out]
        out._keras_history = (op, 0, None)
        nodes.append(node)
        return out
    root_out = build_tree(0)
    # All leaves are inputs
    for i, t in enumerate(tensors):
        op = DummyOperation(f"leaf_op_{i}")
        node = DummyNode(op, [], outputs=[t], is_input=True)
        op._inbound_nodes.append(node)
        t._keras_history = (op, 0, None)
    codeflash_output = map_graph(tensors, [root_out]); result = codeflash_output
    network_nodes, nodes_by_depth, operations, operations_by_depth = result
    # Check that all leaf ops are present
    for i, t in enumerate(tensors):
        op = t._keras_history[0]
        assert op in operations

def test_large_disconnected_inputs():
    # 1000 inputs, only the first is connected to output
    n = 1000
    ops = [DummyOperation(f"op{i}") for i in range(n)]
    tensors = [DummyTensor(f"t{i}") for i in range(n)]
    for i in range(n):
        node = DummyNode(ops[i], [tensors[i]], outputs=[], is_input=True)
        ops[i]._inbound_nodes.append(node)
        tensors[i]._keras_history = (ops[i], 0, None)
        node.outputs = [tensors[i]]
    # Only t0 is connected to output
    codeflash_output = map_graph(tensors, [tensors[0]]); result = codeflash_output # 11.9ms -> 3.72ms (219% faster)
    network_nodes, nodes_by_depth, operations, operations_by_depth = result
    # All ops should be in operations, even if not connected
    for op in ops:
        assert op in operations
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-map_graph-mjaqgp2d` and push.


@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 18, 2025 00:59
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 18, 2025