@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 23,855% (238.55x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime : 101 milliseconds → 420 microseconds (best of 250 runs)

📝 Explanation and details

The optimization transforms a quadratic O(n*m) algorithm into a linear O(n+m) one by eliminating repeated edge traversals.

Key Change: Instead of checking all(e["source"] != n["id"] for e in edges) for every node (which scans all edges for each node), the optimized version pre-computes a set of all source IDs: sources = {e["source"] for e in edges}. Then it uses fast O(1) set membership testing: n["id"] not in sources.
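
A minimal sketch of the two versions, reconstructed from the expressions quoted above. The `next(..., None)` wrapper and the names `find_last_node_original` and `find_last_node_optimized` are assumptions for illustration; the actual body of `find_last_node` in src/algorithms/graph.py is not shown in this description.

```python
from __future__ import annotations


def find_last_node_original(nodes, edges):
    # Assumed shape of the original: for every node, scan the edge list
    # until an edge with this node as its source is found.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Assumed shape of the optimized version: collect every source ID once,
    # then test each node with a set membership lookup.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_original(nodes, edges))   # {'id': 3}
print(find_last_node_optimized(nodes, edges))  # {'id': 3}
```

Both versions return the first node, in list order, that never appears as an edge source, or `None` when every node has an outgoing edge.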

Why It's Faster:

  • Original: For each of the n nodes, iterates through all m edges → O(n*m) complexity
  • Optimized: One pass through edges to build the set O(m), then one pass through nodes with O(1) lookups → O(n+m) complexity
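
To make the two bounds concrete, the sketch below counts elementary operations instead of measuring time, using the 1000-node linear chain from the generated tests. The helpers `count_original_comparisons` and `count_optimized_operations` are hypothetical instrumented stand-ins, not code from the PR, and they assume the function stops at the first node without an outgoing edge.

```python
def count_original_comparisons(nodes, edges):
    """Count source/id comparisons made by the per-node edge scan."""
    comparisons = 0
    for n in nodes:
        has_outgoing = False
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                has_outgoing = True  # all(...) would short-circuit here
                break
        if not has_outgoing:
            break  # the first node without an outgoing edge is returned
    return comparisons


def count_optimized_operations(nodes, edges):
    """Count set insertions plus set lookups in the set-based version."""
    sources = set()
    operations = 0
    for e in edges:
        sources.add(e["source"])
        operations += 1
    for n in nodes:
        operations += 1
        if n["id"] not in sources:
            break
    return operations


n = 1000
nodes = [{"id": i} for i in range(n)]
edges = [{"source": i, "target": i + 1} for i in range(n - 1)]

print(count_original_comparisons(nodes, edges))  # 500499
print(count_optimized_operations(nodes, edges))  # 1999
```

On this input the scan for node i stops after i + 1 comparisons, so the total is 1 + 2 + ... + 999 plus a full 999-edge scan for the final node, while the set-based version does 999 insertions and 1000 lookups.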

Performance Impact: The 238x speedup (from 101ms to 420μs) reflects the gap between O(n*m) and O(n+m) work on the test inputs. The ratio of the two costs is about n*m/(n+m) in the worst case, so the speedup grows roughly linearly with graph size, not exponentially; larger graphs should still see greater speedups.
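
One way to observe how the gap changes with input size is a small `timeit` harness such as the sketch below. The two function bodies are assumed reconstructions from the expressions quoted in this description, and `chain` and `time_ratio` are hypothetical helpers; the numbers it prints depend on the machine and are not the figures reported above.

```python
import timeit


def find_last_node_original(nodes, edges):
    # Assumed reconstruction of the original per-node edge scan.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Assumed reconstruction of the set-based version.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def chain(n):
    """Build a linear chain 0 -> 1 -> ... -> n-1."""
    nodes = [{"id": i} for i in range(n)]
    edges = [{"source": i, "target": i + 1} for i in range(n - 1)]
    return nodes, edges


def time_ratio(n, repeat=3, number=5):
    """Return (best original time) / (best optimized time) for a chain of n nodes."""
    nodes, edges = chain(n)
    slow = min(timeit.repeat(
        lambda: find_last_node_original(nodes, edges), number=number, repeat=repeat))
    fast = min(timeit.repeat(
        lambda: find_last_node_optimized(nodes, edges), number=number, repeat=repeat))
    return slow / fast


for size in (250, 500, 1000):
    print(size, round(time_ratio(size)))
```

If the analysis holds, the printed ratio should roughly double each time the chain length doubles.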

Test Case Analysis: The optimization excels across all scenarios:

  • Small graphs (2-3 nodes): Minimal overhead from set creation
  • Large linear chains (1000 nodes): Massive improvement due to eliminated redundant edge scanning
  • Dense graphs with many edges: Set lookup remains O(1) on average regardless of edge count
  • Edge cases (empty graphs, cycles): Maintains correctness while improving performance
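
The correctness claim can be spot-checked with a randomized comparison like the sketch below. Both function bodies are assumed reconstructions from the expressions quoted in this description, and `random_graph` and `check_equivalence` are hypothetical helpers that only generate small graphs with integer IDs.

```python
import random


def find_last_node_original(nodes, edges):
    # Assumed reconstruction of the original per-node edge scan.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes, edges):
    # Assumed reconstruction of the set-based version.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def random_graph(rng, max_nodes=8, max_edges=12):
    """Build a small random graph; edges may point at IDs that have no node."""
    nodes = [{"id": i} for i in range(rng.randint(0, max_nodes))]
    edges = [
        {"source": rng.randint(0, max_nodes), "target": rng.randint(0, max_nodes)}
        for _ in range(rng.randint(0, max_edges))
    ]
    return nodes, edges


def check_equivalence(trials=1000, seed=0):
    """Assert that both versions agree on `trials` random graphs."""
    rng = random.Random(seed)
    for _ in range(trials):
        nodes, edges = random_graph(rng)
        expected = find_last_node_original(nodes, edges)
        assert find_last_node_optimized(nodes, edges) == expected
    return trials


print(check_equivalence())
```

Under this reconstruction, two inputs outside the generated tests would behave differently: an unhashable `source` value (for example a list) raises `TypeError` when the set is built, and a node without an `"id"` key raises `KeyError` even when `edges` is empty, whereas the `all(...)` form never evaluates `n["id"]` for an empty edge list.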

The optimization is particularly valuable for graph analysis workflows where this function might be called repeatedly on large datasets.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 43 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# ---------------------------
# Basic Test Cases
# ---------------------------


def test_single_node_no_edges():
    # Only one node, no edges: should return that node
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2: node 2 is last
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear_chain():
    # Three nodes, edges forming a chain: 1->2->3, so node 3 is last
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes():
    # Two nodes with no outgoing edges: both are "last", function returns the first one
    nodes = [{"id": 1}, {"id": 2}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_cycle_graph():
    # Cycle: no node is a "last" node (all have outgoing edges)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ---------------------------
# Edge Test Cases
# ---------------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_non_integer_ids():
    # Node IDs are strings
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_missing_nodes():
    # Edges refer to nodes not in the nodes list
    nodes = [{"id": 1}]
    edges = [{"source": 2, "target": 1}]
    # Node 1 has no outgoing edge, so it should be returned
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_extra_fields():
    # Nodes have extra fields, function should ignore them
    nodes = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_nodes_no_edges():
    # Several nodes, no edges: first node is returned
    nodes = [{"id": 10}, {"id": 20}, {"id": 30}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_missing_source_field():
    # Edge missing "source" key: should raise KeyError
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"target": 2}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)


def test_edges_with_extra_fields():
    # Edge has extra fields, should not affect result
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_duplicate_ids():
    # Duplicate node IDs: function should return the first one with no outgoing edge
    nodes = [{"id": 1}, {"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    # Both {"id": 1} nodes have outgoing edges, only {"id": 2} is last
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_none_id():
    # Node with id=None, should be handled
    nodes = [{"id": None}, {"id": 2}]
    edges = [{"source": 2, "target": None}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_none_source():
    # Edge with source=None, should not affect nodes with id != None
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": None, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ---------------------------
# Large Scale Test Cases
# ---------------------------


def test_large_linear_chain():
    # 1000 nodes in a linear chain: last node should be returned
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_star_graph():
    # One central node with outgoing edges to 999 leaf nodes, leaves have no outgoing edges
    nodes = [{"id": 0}] + [{"id": i} for i in range(1, 1000)]
    edges = [{"source": 0, "target": i} for i in range(1, 1000)]
    # The first leaf node should be returned
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_all_nodes_with_outgoing_edges():
    # All nodes have outgoing edges: should return None
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": (i + 1) % 1000} for i in range(1000)]  # cycle
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_multiple_last_nodes():
    # 1000 nodes, only the last 11 (ids 989-999) have no outgoing edges, should return the first among those
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(989)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_with_missing_edges():
    # 1000 nodes, only the first 499 (ids 0-498) have outgoing edges
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(499)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ---------------------------
# Deterministic and Robustness Checks
# ---------------------------


def test_nodes_order_determinism():
    # Function should return the first node with no outgoing edge, order matters
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_mixed_types():
    # Node IDs are mixed types, function should still work
    nodes = [{"id": 1}, {"id": "2"}, {"id": 3.0}]
    edges = [{"source": 1, "target": "2"}, {"source": "2", "target": 3.0}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_unexpected_keys():
    # Edges have extra keys, should not affect result
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "foo": "bar"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_no_id_key():
    # Node missing "id" key: should raise KeyError
    nodes = [{"name": "A"}, {"id": 2}]
    edges = [{"source": 2, "target": "A"}]
    with pytest.raises(KeyError):
        find_last_node(nodes, edges)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# ------------------- BASIC TEST CASES -------------------


def test_single_node_no_edges():
    # One node, no edges. Node should be returned.
    nodes = [{"id": 1}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2. Node 2 should be returned.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear_chain():
    # Three nodes in a chain: 1->2->3. Node 3 should be returned.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes():
    # Two disconnected nodes, no edges. First node in list should be returned.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_cycle_graph():
    # Cycle: 1->2->3->1. No node is last; should return None.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ------------------- EDGE TEST CASES -------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges. Should return None.
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_non_integer_ids():
    # Nodes with string IDs, edge from "A" to "B". "B" is last.
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_duplicate_ids():
    # Duplicate IDs: both id=1 nodes are sources, so the id=2 node should be returned.
    nodes = [{"id": 1}, {"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_not_referenced_in_edges():
    # Node with id not present in any edge; should be returned.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_extra_keys():
    # Edges contain extra irrelevant keys; should not affect result.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes_with_non_sequential_ids():
    # Multiple nodes, none are sources. First one should be returned.
    nodes = [{"id": 10}, {"id": 20}, {"id": 30}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_self_loop_edge():
    # Node with self-loop; should not be last node.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_with_none_source():
    # Edge with source=None; should not match any node.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": None, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_various_types():
    # Node IDs are mixed types (int, str, tuple).
    nodes = [{"id": 1}, {"id": "A"}, {"id": (2, 3)}]
    edges = [{"source": 1, "target": "A"}, {"source": "A", "target": (2, 3)}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ------------------- LARGE SCALE TEST CASES -------------------


def test_large_linear_chain():
    # 1000 nodes in a chain: 0->1->2->...->999. Node 999 should be returned.
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_disconnected_nodes():
    # 1000 nodes, no edges. First node should be returned.
    nodes = [{"id": i} for i in range(1000)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_cycle_graph():
    # 1000 nodes in a cycle: 0->1->2->...->999->0. Should return None.
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": (i + 1) % 1000} for i in range(1000)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_multiple_last_nodes():
    # 1000 nodes, only first 500 are sources. The rest are last nodes.
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 500} for i in range(500)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_with_duplicate_ids():
    # 500 nodes with id=1, 500 nodes with id=2, edges from 1 to 2.
    nodes = [{"id": 1} for _ in range(500)] + [{"id": 2} for _ in range(500)]
    edges = [{"source": 1, "target": 2} for _ in range(500)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
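
The generated tests above capture `codeflash_output` without asserting on it. As an illustration, the expectations stated in their comments could be written as explicit assertions like the sketch below. The inline `find_last_node` is an assumed stand-in for `src.algorithms.graph.find_last_node`, included only to keep the example self-contained.

```python
def find_last_node(nodes, edges):
    # Assumed stand-in for src.algorithms.graph.find_last_node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def test_three_nodes_linear_chain():
    # Chain 1->2->3: node 3 has no outgoing edge.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    assert find_last_node(nodes, edges) == {"id": 3}


def test_cycle_graph_returns_none():
    # Every node in the cycle is a source, so there is no last node.
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    assert find_last_node(nodes, edges) is None


def test_multiple_last_nodes_returns_first():
    # No edges: the first node in list order is returned.
    nodes = [{"id": "A"}, {"id": "B"}]
    assert find_last_node(nodes, []) == {"id": "A"}


def test_self_loop_edge():
    # Node 1 is the source of its own loop, so node 2 is returned.
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    assert find_last_node(nodes, edges) == {"id": 2}


def test_large_linear_chain():
    # Chain 0->1->...->999: node 999 is last.
    nodes = [{"id": i} for i in range(1000)]
    edges = [{"source": i, "target": i + 1} for i in range(999)]
    assert find_last_node(nodes, edges) == {"id": 999}
```

These functions follow pytest's `test_` naming convention, so pytest would collect them without any further setup.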

To edit these changes, run `git checkout codeflash/optimize-find_last_node-mjaqp5q2` and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 December 18, 2025 01:06
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 18, 2025
@KRRT7 KRRT7 closed this Dec 18, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-find_last_node-mjaqp5q2 branch December 18, 2025 01:35