@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 14,134% (141.34x) speedup for find_last_node in src/algorithms/graph.py

⏱️ Runtime: 66.0 milliseconds → 464 microseconds (best of 250 runs)

📝 Explanation and details

The optimization transforms an O(n × m) algorithm into O(n + m), where n is the number of nodes and m the number of edges, by replacing nested iteration with set-based membership testing.

Key optimization: The original code uses all(e["source"] != n["id"] for e in edges) for each node, creating a nested loop that checks every edge for every node. The optimized version pre-computes sources = {e["source"] for e in edges} once, then uses fast set membership (n["id"] not in sources) for each node check.
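
As a rough illustration of the change described above, the two versions can be sketched side by side. Only the `all(...)` expression, the `sources` set, and the `not in` check are quoted in this PR; the function names and the `next(..., None)` wrapper are assumptions, chosen to match the tests that expect the first matching node or `None`.

```python
from typing import Optional


def find_last_node_original(nodes: list, edges: list) -> Optional[dict]:
    # Assumed shape of the original: for every node, scan the edge list
    # until an edge leaving that node is found.
    return next(
        (n for n in nodes if all(e["source"] != n["id"] for e in edges)),
        None,
    )


def find_last_node_optimized(nodes: list, edges: list) -> Optional[dict]:
    # Collect every source ID once, then do one hash lookup per node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
print(find_last_node_original(nodes, edges))   # {'id': 3}
print(find_last_node_optimized(nodes, edges))  # {'id': 3}
```

Both sketches walk `nodes` in list order, so when several nodes have no outgoing edge they return the same first match.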

Performance impact:

  • Time complexity: Reduced from O(nodes × edges) to O(nodes + edges)
  • Concrete speedup: 141x faster (66ms → 0.464ms)
  • Memory trade-off: Uses O(unique_sources) additional memory for the set
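
To make the memory trade-off concrete, here is a small sketch of how large the extra set gets for two of the topologies used in the generated tests. The helper name `source_set` is hypothetical; it mirrors the set comprehension quoted in the explanation.

```python
def source_set(edges):
    # The only extra structure the set-based approach allocates.
    return {e["source"] for e in edges}


N = 1000
# Star: every edge leaves node 0, so the set holds a single ID.
star_edges = [{"source": 0, "target": i} for i in range(1, N)]
# Chain: every edge has a distinct source, so the set grows with the edge list.
chain_edges = [{"source": i, "target": i + 1} for i in range(N - 1)]

print(len(star_edges), len(source_set(star_edges)))    # 999 1
print(len(chain_edges), len(source_set(chain_edges)))  # 999 999
```

The set is bounded by the number of distinct sources, not by the number of edges.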

Why this works: Python sets use hash tables for O(1) average-case membership testing, while the original all() with a generator takes O(edges) time per node in the worst case (it stops early only when it finds an edge whose source matches the node). For graphs with many edges, this difference compounds significantly.
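
One consequence of relying on hashing is worth a sketch: building a set requires every source ID to be hashable, whereas the comparison-based scan only needs `!=`. The generated tests use integers, strings, and `None`, all of which are hashable, so this does not affect them. The helper names and the list-valued IDs below are hypothetical, used only to show the difference.

```python
def has_no_outgoing_scan(node_id, edges):
    # Comparison-based check: only needs != to work on the IDs.
    return all(e["source"] != node_id for e in edges)


def has_no_outgoing_set(node_id, edges):
    # Hash-based check: building the set hashes every source ID.
    sources = {e["source"] for e in edges}
    return node_id not in sources


# Hypothetical edge list whose IDs are lists, which are unhashable.
edges = [{"source": [1, 2], "target": [3, 4]}]

print(has_no_outgoing_scan([3, 4], edges))  # True

try:
    has_no_outgoing_set([3, 4], edges)
except TypeError as exc:
    print("set-based check failed:", exc)
```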

Test case analysis: The optimization excels particularly on large-scale test cases like test_large_linear_chain and test_large_star_topology with 1000 nodes/edges, where the quadratic behavior of the original becomes prohibitive. Basic cases with few nodes/edges see modest improvements, but the algorithmic advantage scales with input size.
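
The quadratic behavior on the linear chain can be made visible by counting comparisons. The sketch below is an instrumented stand-in for the per-node scan, not code from the PR; `count_scan_comparisons` is a hypothetical helper that mimics the short-circuiting of `all()`.

```python
def count_scan_comparisons(nodes, edges):
    # Count the source-vs-id checks the per-node scan performs
    # before it reaches a node with no outgoing edge.
    comparisons = 0
    for n in nodes:
        has_outgoing = False
        for e in edges:
            comparisons += 1
            if e["source"] == n["id"]:
                has_outgoing = True
                break  # all() stops at the first matching source
        if not has_outgoing:
            return n, comparisons
    return None, comparisons


N = 1000
nodes = [{"id": i} for i in range(N)]
edges = [{"source": i, "target": i + 1} for i in range(N - 1)]

last, comparisons = count_scan_comparisons(nodes, edges)
print(last, comparisons)        # {'id': 999} 500499
print(len(edges) + len(nodes))  # 1999
```

For the same input the set-based version does 999 set insertions and 1000 lookups, 1999 hash operations in total, against roughly half a million comparisons for the scan.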

This is especially valuable for graph algorithms where edge lists can be substantial, making the function suitable for production graph processing workflows.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 40 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# --------------------------
# Basic Test Cases
# --------------------------


def test_single_node_no_edges():
    # Only one node, no edges: should return the node itself
    nodes = [{"id": 1, "data": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from node 1 to node 2: node 2 is last
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear_chain():
    # Three nodes in a chain: 1->2->3, node 3 is last
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes_returns_first():
    # Two nodes with no outgoing edges (2 and 3): returns the first found (2)
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# --------------------------
# Edge Test Cases
# --------------------------


def test_no_nodes():
    # Empty nodes list: should return None
    nodes = []
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_no_edges_with_multiple_nodes():
    # Multiple nodes, no edges: returns the first node
    nodes = [{"id": 10}, {"id": 20}, {"id": 30}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_all_nodes_have_outgoing_edges():
    # All nodes have outgoing edges: should return None
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [
        {"source": 1, "target": 2},
        {"source": 2, "target": 3},
        {"source": 3, "target": 1},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_self_loop():
    # Node with a self-loop is not a last node
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_multiple_incoming_edges():
    # Node 3 has multiple incoming edges, but no outgoing: should be last
    nodes = [{"id": 1}, {"id": 2}, {"id": 3}]
    edges = [{"source": 1, "target": 3}, {"source": 2, "target": 3}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_non_integer_ids():
    # Node IDs as strings
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edges_with_extra_fields():
    # Edges may have extra fields, should be ignored
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2, "weight": 5}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_extra_fields():
    # Nodes may have extra fields, should be returned as-is
    nodes = [{"id": 1, "label": "start"}, {"id": 2, "label": "end"}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_id_not_in_edges():
    # Edges refer to node IDs not present in nodes: should not affect result
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 3, "target": 1}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_none_as_node_id():
    # Node id is None
    nodes = [{"id": None}, {"id": 2}]
    edges = [{"source": 2, "target": None}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_duplicate_node_ids():
    # Duplicate node IDs: should return the first one that matches the criteria
    nodes = [{"id": 1}, {"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# --------------------------
# Large Scale Test Cases
# --------------------------


def test_large_linear_chain():
    # 1000 nodes in a chain: last node is the last in the list
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_star_topology():
    # One central node with outgoing edges to all others: all leaves are last nodes, returns the first leaf
    N = 1000
    nodes = [{"id": 0}] + [{"id": i} for i in range(1, N)]
    edges = [{"source": 0, "target": i} for i in range(1, N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_no_edges():
    # 1000 nodes, no edges: returns the first node
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_all_nodes_have_outgoing_edges():
    # 1000 nodes, each has an outgoing edge (circular): should return None
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": (i + 1) % N} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_multiple_last_nodes():
    # 1000 nodes, first 500 have outgoing edges, last 500 don't: returns first of the last 500
    N = 1000
    nodes = [{"id": i} for i in range(N)]
    edges = [{"source": i, "target": i + 1} for i in range(500)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

# imports
import pytest  # used for our unit tests
from src.algorithms.graph import find_last_node

# unit tests

# ---------------------------
# 1. Basic Test Cases
# ---------------------------


def test_single_node_no_edges():
    # One node, no edges; should return the node itself
    nodes = [{"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_two_nodes_one_edge():
    # Two nodes, one edge from A to B; B should be last node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_three_nodes_linear_chain():
    # A -> B -> C; C should be last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_nodes_multiple_edges():
    # A -> B, B -> C, D (isolated); C is returned because it is the first node with no outgoing edges
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ---------------------------
# 2. Edge Test Cases
# ---------------------------


def test_empty_nodes_and_edges():
    # No nodes, no edges; should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_no_edges():
    # Multiple nodes, no edges; should return the first node (by function logic)
    nodes = [{"id": "X"}, {"id": "Y"}, {"id": "Z"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_cycle_graph():
    # Cycle: A -> B -> C -> A; no "last" node, should return None
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "A"},
    ]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_multiple_last_nodes():
    # Disconnected: A -> B, C (no edges), D (no edges); should return B (first node with no outgoing edges)
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_self_loop():
    # Node with edge to itself; should return None
    nodes = [{"id": "A"}]
    edges = [{"source": "A", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_multiple_outgoing_edges():
    # A -> B, A -> C; B and C are candidates, should return B (first found)
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "A", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_multiple_incoming_edges():
    # B <- A, B <- C; B should be last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "C", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_with_nonexistent_source():
    # Edge from non-existent node; should ignore and find last node normally
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "X", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_edge_with_nonexistent_target():
    # Edge to non-existent node; A still counts as a source, so B is the last node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "X"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_node_with_id_none():
    # Node with id None; should be handled correctly
    nodes = [{"id": None}, {"id": "A"}]
    edges = [{"source": "A", "target": None}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_nodes_with_duplicate_ids():
    # Two nodes with same id; function should return the first one with no outgoing edges
    nodes = [{"id": "A"}, {"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# ---------------------------
# 3. Large Scale Test Cases
# ---------------------------


def test_large_linear_chain():
    # 1000 nodes in a chain; last node should be last in list
    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = [{"source": str(i), "target": str(i + 1)} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_star_graph():
    # One central node with edges to 999 others; all others should be last nodes, first one found returned
    N = 1000
    nodes = [{"id": "center"}] + [{"id": str(i)} for i in range(N - 1)]
    edges = [{"source": "center", "target": str(i)} for i in range(N - 1)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_disconnected_nodes():
    # 1000 nodes, no edges; should return the first node
    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_with_multiple_last_nodes():
    # 499 edges connecting first 500 nodes in a chain, 500 isolated nodes; should return node "499" (chain end, first node with no outgoing edges)
    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = [{"source": str(i), "target": str(i + 1)} for i in range(499)]
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


def test_large_graph_performance():
    # 1000 nodes, random edges; should not raise or hang
    import random

    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = [
        {
            "source": str(random.randint(0, N - 1)),
            "target": str(random.randint(0, N - 1)),
        }
        for _ in range(999)
    ]
    # Should return a node with no outgoing edges, or None if all nodes have outgoing edges
    codeflash_output = find_last_node(nodes, edges)
    result = codeflash_output


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
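
The comparison described in that comment can be approximated locally with a randomized equivalence check. This is a minimal sketch under assumptions: both function bodies are reconstructed from the expressions quoted in the explanation, and `random_graph` is a hypothetical generator, not part of Codeflash.

```python
import random


def find_last_node_original(nodes, edges):
    # Assumed original: per-node scan over all edges.
    return next((n for n in nodes if all(e["source"] != n["id"] for e in edges)), None)


def find_last_node_optimized(nodes, edges):
    # Assumed optimized version: one set build, one lookup per node.
    sources = {e["source"] for e in edges}
    return next((n for n in nodes if n["id"] not in sources), None)


def random_graph(rng, max_nodes=20, max_edges=40):
    # Small random graph with integer IDs; edges may reference missing nodes.
    n = rng.randint(0, max_nodes)
    nodes = [{"id": i} for i in range(n)]
    edges = [
        {"source": rng.randint(0, max_nodes), "target": rng.randint(0, max_nodes)}
        for _ in range(rng.randint(0, max_edges))
    ]
    return nodes, edges


rng = random.Random(0)  # fixed seed so a failure is reproducible
for _ in range(1000):
    nodes, edges = random_graph(rng)
    expected = find_last_node_original(nodes, edges)
    actual = find_last_node_optimized(nodes, edges)
    assert actual == expected, (nodes, edges)
```

A check like this only covers hashable integer IDs; it complements the fixed regression tests above rather than replacing them.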

To edit these changes git checkout codeflash/optimize-find_last_node-mjar20i0 and push.

@codeflash-ai codeflash-ai bot requested a review from KRRT7 December 18, 2025 01:16
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels Dec 18, 2025
@KRRT7 KRRT7 closed this Dec 18, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-find_last_node-mjar20i0 branch December 18, 2025 01:35