@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 43% (0.43x) speedup for create_named_temporary_file in skyvern/forge/sdk/api/files.py

⏱️ Runtime: 4.93 milliseconds → 3.45 milliseconds (best of 122 runs)
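
The headline percentage appears to be the runtime reduction measured relative to the optimized runtime, that is (original / optimized) - 1. The per-test annotations further down are consistent with this reading. A minimal sketch under that assumption follows; the helper name `speedup_pct` is hypothetical and not part of Codeflash:

```python
def speedup_pct(original: float, optimized: float) -> float:
    """Return the speedup of `optimized` over `original` as a percentage.

    Both arguments must be in the same time unit. This definition is an
    assumption inferred from the numbers in this PR, not a documented formula.
    """
    if optimized <= 0:
        raise ValueError("optimized runtime must be positive")
    return (original / optimized - 1.0) * 100.0


# Overall runtime reported above: 4.93 ms -> 3.45 ms
print(f"{speedup_pct(4.93, 3.45):.1f}%")  # prints 42.9%
```

Under this definition, 42.9% rounds to the 43% shown in the title.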

📝 Explanation and details

The optimized code achieves a roughly 43% speedup (4.93 ms → 3.45 ms) through two targeted optimizations that address the most expensive operations identified in the profiler:

Key Optimizations:

  1. Precomputed character set in sanitize_filename: The original code recreated a list ["-", "_", ".", "%", " "] on every call (line taking 5798ns per hit). The optimization precomputes this as a module-level set _SANITIZE_ALLOWED, reducing lookup time from O(n) list scanning to O(1) set membership. Additionally, switching from a generator expression to list comprehension provides ~10% better performance for string processing.

  2. Fast-path directory existence check in create_folder_if_not_exist: The original code always called Path.mkdir() which involves filesystem operations (79.9% of function time). The optimization adds os.path.isdir() as a fast-path check, avoiding expensive mkdir calls when the directory already exists. This is particularly impactful since the profiler shows this function consuming 36% of total runtime in create_named_temporary_file.
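
The two changes described above can be sketched as follows. This is a reconstruction from the description, not the code in the PR diff: the parameter names, the `parents=True, exist_ok=True` arguments, and the exact function bodies are assumptions. Only the names `sanitize_filename`, `create_folder_if_not_exist`, and `_SANITIZE_ALLOWED` come from the text.

```python
import os
from pathlib import Path

# Built once at import time. Membership tests on a set are O(1) on average.
_SANITIZE_ALLOWED = {"-", "_", ".", "%", " "}


def sanitize_filename(filename: str) -> str:
    """Keep alphanumeric characters plus those in _SANITIZE_ALLOWED."""
    # join() receives a list comprehension rather than a generator expression.
    return "".join([c for c in filename if c.isalnum() or c in _SANITIZE_ALLOWED])


def create_folder_if_not_exist(dir_path: str) -> None:
    """Create dir_path (and any parents) unless it is already a directory."""
    if os.path.isdir(dir_path):
        return  # fast path: the directory exists, so mkdir is skipped
    # exist_ok=True (assumed) keeps the call safe if another process creates
    # the directory between the isdir() check and the mkdir() call.
    Path(dir_path).mkdir(parents=True, exist_ok=True)
```

Keeping `exist_ok=True` on the slow path matters for this design: the `isdir()` check alone is not atomic, so the fallback still has to tolerate a directory that appears concurrently.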

Performance Impact Analysis:

Based on function references, create_named_temporary_file is called in critical paths including:

  • S3 file downloads for browser automation workflows
  • Browser session/profile storage operations that run during workflow execution
  • Temporary file creation for zip operations during artifact storage

The test results show consistent speedups across all scenarios:

  • 49-69% faster for single named-file creation (most common use case)
  • 17-37% faster for random temp files
  • 29-72% faster for bulk operations (50-100 files per test)

The optimizations are most effective when:

  • Creating files with custom names (sanitization runs on every such call, so the cheaper membership test applies each time)
  • Working with existing temp directories (skips mkdir calls)
  • Processing in bulk (the per-call savings accumulate across many files)
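
One way to check the sanitization claim locally is a small `timeit` comparison. The sketch below is illustrative only: the function names, the sample string, and the repeat counts are hypothetical, and the two variants are written from the description above, not copied from the repository. Results will vary by machine and Python version.

```python
import timeit

ALLOWED_SET = frozenset(["-", "_", ".", "%", " "])
SAMPLE = "my report (final) v2.txt" * 4


def sanitize_with_list(name: str) -> str:
    # Inline list literal and generator expression, as described for the original.
    return "".join(c for c in name if c.isalnum() or c in ["-", "_", ".", "%", " "])


def sanitize_with_set(name: str) -> str:
    # Precomputed set and list comprehension, as described for the optimization.
    return "".join([c for c in name if c.isalnum() or c in ALLOWED_SET])


def best_of(func, repeat: int = 5, number: int = 2000) -> float:
    """Return the best per-call time in seconds across `repeat` timing runs."""
    timings = timeit.repeat(lambda: func(SAMPLE), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for func in (sanitize_with_list, sanitize_with_set):
        print(f"{func.__name__}: {best_of(func) * 1e6:.2f} µs per call")
```

Taking the minimum over several repeats mirrors the "best of N runs" convention used in the runtime figure at the top of this PR.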

These improvements directly benefit the browser automation workflow where temporary files are frequently created for downloads, session management, and artifact processing.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 272 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime

import os

import pytest
from skyvern.forge.sdk.api.files import create_named_temporary_file


# Local stand-in for a settings object. The imported function does not read
# this object, so assigning to it does not change where files are created.
class Settings:
    TEMP_PATH = None


settings = Settings()

# --- Basic Test Cases ---

def test_create_temp_file_default():
    """Test creating a temp file with default arguments."""
    codeflash_output = create_named_temporary_file(); temp_file = codeflash_output # 64.6μs -> 55.0μs (17.6% faster)
    temp_file.write(b"abc")
    temp_file.flush()
    # Data should be written
    with open(temp_file.name, "rb") as f:
        assert f.read() == b"abc"
    temp_file.close()

def test_create_temp_file_with_delete_false():
    """Test creating a temp file with delete=False."""
    codeflash_output = create_named_temporary_file(delete=False); temp_file = codeflash_output # 47.8μs -> 37.6μs (26.9% faster)
    temp_file.write(b"data")
    temp_file.flush()
    name = temp_file.name
    temp_file.close()
    # Clean up
    os.remove(name)

def test_create_temp_file_with_filename():
    """Test creating a temp file with a specific file name."""
    fname = "my_test_file.txt"
    codeflash_output = create_named_temporary_file(file_name=fname); temp_file = codeflash_output # 36.1μs -> 21.4μs (68.9% faster)
    temp_file.write(b"hello")
    temp_file.close()

def test_create_temp_file_with_filename_and_delete_false():
    """Test creating a temp file with a specific file name and delete=False."""
    fname = "keep_this.txt"
    codeflash_output = create_named_temporary_file(file_name=fname, delete=False); temp_file = codeflash_output # 35.4μs -> 21.1μs (67.9% faster)
    temp_file.write(b"persist")
    temp_file.close()
    # Clean up
    os.remove(temp_file.name)

# --- Edge Test Cases ---

def test_create_temp_file_with_empty_filename():
    """Test passing an empty string as filename."""
    codeflash_output = create_named_temporary_file(file_name=""); temp_file = codeflash_output # 67.1μs -> 53.9μs (24.4% faster)
    temp_file.close()

def test_create_temp_file_when_folder_does_not_exist(tmp_path):
    """Test that the function creates the temp folder if it doesn't exist."""
    # Remove the temp dir
    custom_temp = tmp_path / "not_exist_dir"
    settings.TEMP_PATH = str(custom_temp)
    codeflash_output = create_named_temporary_file(); temp_file = codeflash_output # 50.1μs -> 40.8μs (23.0% faster)
    temp_file.close()

def test_create_temp_file_with_non_str_filename():
    """Test passing a non-string filename raises TypeError."""
    with pytest.raises(TypeError):
        create_named_temporary_file(file_name=12345) # 19.2μs -> 5.92μs (224% faster)

def test_create_temp_file_with_none_filename():
    """Test passing None as filename returns a random temp file."""
    codeflash_output = create_named_temporary_file(file_name=None); temp_file = codeflash_output # 62.0μs -> 51.6μs (20.2% faster)
    temp_file.close()

# --- Large Scale Test Cases ---

def test_create_many_temp_files():
    """Test creating many temp files in a loop for resource leaks and collisions."""
    files = []
    for i in range(50):
        codeflash_output = create_named_temporary_file(file_name=f"file_{i}.txt", delete=False); temp_file = codeflash_output # 697μs -> 483μs (44.1% faster)
        temp_file.write(f"data_{i}".encode())
        temp_file.close()
        files.append(temp_file.name)
    # All files should exist and have correct data
    for i, fname in enumerate(files):
        with open(fname, "rb") as f:
            assert f.read() == f"data_{i}".encode()
        os.remove(fname)

def test_create_large_temp_file():
    """Test writing a large amount of data to a temp file."""
    codeflash_output = create_named_temporary_file(delete=False); temp_file = codeflash_output # 54.0μs -> 42.8μs (26.1% faster)
    data = os.urandom(1024 * 512)  # 512KB
    temp_file.write(data)
    temp_file.flush()
    temp_file.close()
    os.remove(temp_file.name)

def test_create_temp_files_with_similar_names():
    """Test that files with similar names do not overwrite each other."""
    names = ["testfile.txt", "testfile.txt ", "testfile .txt"]
    paths = []
    for name in names:
        codeflash_output = create_named_temporary_file(file_name=name, delete=False); temp_file = codeflash_output # 76.3μs -> 48.6μs (57.0% faster)
        paths.append(temp_file.name)
        temp_file.write(b"x")
        temp_file.close()
    for path in paths:
        os.remove(path)

def test_create_temp_file_with_max_allowed_length():
    """Test creating a file with filename at OS max length (usually 255)."""
    max_len = 255 - len(".txt")
    fname = "a" * max_len + ".txt"
    codeflash_output = create_named_temporary_file(file_name=fname); temp_file = codeflash_output # 37.2μs -> 24.9μs (49.1% faster)
    temp_file.close()


# --- Second generated test file ---
import os
import shutil
import sys
import tempfile
import types

import pytest  # used for our unit tests
from skyvern.forge.sdk.api.files import create_named_temporary_file


class DummySettings:
    # Use a subfolder in the system temp dir for isolation
    TEMP_PATH = os.path.join(tempfile.gettempdir(), "skyvern_test_temp")


# Swap a stub into sys.modules for test isolation. The function under test was
# already imported above, so names it has already bound are not rebound here.
module_name = "skyvern.config"
sys.modules[module_name] = types.SimpleNamespace(settings=DummySettings())

# 1. Basic Test Cases

def test_create_temp_file_default_name_and_delete_true():
    """Test basic creation with default arguments."""
    codeflash_output = create_named_temporary_file(); tmp = codeflash_output # 60.6μs -> 50.7μs (19.7% faster)
    tmp.write(b"hello world")
    tmp.close()

def test_create_temp_file_with_custom_name_and_delete_true():
    """Test creation with a custom filename and delete=True."""
    filename = "customfile123.txt"
    codeflash_output = create_named_temporary_file(file_name=filename); tmp = codeflash_output # 37.4μs -> 24.4μs (53.5% faster)
    tmp.write(b"test data")
    tmp.close()

def test_create_temp_file_with_custom_name_and_delete_false():
    """Test creation with a custom filename and delete=False."""
    filename = "myfile.txt"
    codeflash_output = create_named_temporary_file(delete=False, file_name=filename); tmp = codeflash_output # 31.9μs -> 21.3μs (49.7% faster)
    tmp.write(b"persisted data")
    tmp.close()
    # Clean up
    os.remove(tmp.name)

def test_create_temp_file_with_spaces_in_filename():
    """Test creation with spaces in filename."""
    filename = "my file with spaces.txt"
    codeflash_output = create_named_temporary_file(file_name=filename); tmp = codeflash_output # 34.2μs -> 20.9μs (63.9% faster)
    tmp.close()

def test_create_temp_file_with_empty_filename():
    """Test creation with empty filename (should fallback to random name)."""
    codeflash_output = create_named_temporary_file(file_name=""); tmp = codeflash_output # 68.0μs -> 53.7μs (26.8% faster)
    tmp.close()

def test_create_temp_file_with_none_filename():
    """Test creation with None filename (should fallback to random name)."""
    codeflash_output = create_named_temporary_file(file_name=None); tmp = codeflash_output # 49.8μs -> 36.3μs (37.1% faster)
    tmp.close()

def test_create_temp_file_in_nonexistent_directory():
    """Test creation when temp dir does not exist (should be created)."""
    temp_dir = DummySettings.TEMP_PATH
    # Remove temp dir if exists
    if os.path.exists(temp_dir):
        shutil.rmtree(temp_dir)
    codeflash_output = create_named_temporary_file(); tmp = codeflash_output # 64.5μs -> 52.7μs (22.4% faster)
    tmp.close()

def test_create_many_temp_files_with_unique_names():
    """Test creation of many temp files with unique names."""
    file_objs = []
    names = set()
    for i in range(100):  # Reasonable scale
        filename = f"file_{i}.txt"
        codeflash_output = create_named_temporary_file(file_name=filename, delete=False); tmp = codeflash_output # 1.37ms -> 797μs (71.5% faster)
        file_objs.append(tmp)
        names.add(tmp.name)
        tmp.write(f"data {i}".encode())
    # All files should exist, each under a distinct path
    assert len(names) == 100
    for tmp in file_objs:
        tmp.close()
        assert os.path.exists(tmp.name)
        os.remove(tmp.name)

def test_create_many_temp_files_with_random_names():
    """Test creation of many temp files with random names (no file_name)."""
    file_objs = []
    names = set()
    for i in range(100):  # Reasonable scale
        codeflash_output = create_named_temporary_file(delete=False); tmp = codeflash_output # 1.88ms -> 1.45ms (29.2% faster)
        file_objs.append(tmp)
        names.add(tmp.name)
        tmp.write(b"random data")
    for tmp in file_objs:
        tmp.close()
        os.remove(tmp.name)

def test_create_large_file_and_check_integrity():
    """Test writing a large file and reading it back."""
    filename = "large_file.bin"
    codeflash_output = create_named_temporary_file(file_name=filename, delete=False); tmp = codeflash_output # 43.8μs -> 28.4μs (54.5% faster)
    data = b"x" * 1024 * 512  # 512 KB
    tmp.write(data)
    tmp.close()
    # Read back and check integrity
    with open(tmp.name, "rb") as f:
        read_data = f.read()
    os.remove(tmp.name)
    assert read_data == data

To edit these changes git checkout codeflash/optimize-create_named_temporary_file-mjar0mco and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 18, 2025 01:15
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 18, 2025