@codeflash-ai codeflash-ai bot commented Dec 18, 2025

📄 20% (0.20x) speedup for prepend_scheme_and_validate_url in skyvern/utils/url_validators.py

⏱️ Runtime: 14.0 milliseconds → 11.7 milliseconds (best of 32 runs)

📝 Explanation and details

The optimization replaces urlparse() with urlsplit(), delivering a roughly 20% speedup (14.0 ms to 11.7 ms) by skipping parsing work the function does not need.

Key Change:

  • Switched from urlparse() to urlsplit(): Both functions parse a URL and expose the same scheme attribute. urlsplit() returns only the five basic components (scheme, netloc, path, query, fragment). urlparse() performs the same split and then additionally separates any ;parameters from the path into a sixth params field. In both cases username, password, hostname, and port are derived from netloc only when those attributes are accessed.
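
The difference between the two results can be checked with a short standard-library snippet. This is only an illustration: the URL is made up, and the snippet is not part of the PR.

```python
from urllib.parse import urlparse, urlsplit

url = "https://example.com/path;type=a?q=1#frag"  # illustrative URL, not from the PR

parsed = urlparse(url)  # ParseResult: six fields, including `params`
split = urlsplit(url)   # SplitResult: five fields, no `params`

# Both results expose the same scheme and netloc.
assert parsed.scheme == split.scheme == "https"
assert parsed.netloc == split.netloc == "example.com"

# urlparse() additionally moves ";type=a" out of the path into `params`;
# urlsplit() leaves it in the path.
print(parsed.path, parsed.params)
print(split.path)
```

Since the function under optimization only reads `.scheme`, the extra `params` handling in urlparse() is work it never uses.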

Why This Works:

  • The function only needs to access parsed_url.scheme to check whether a scheme exists and to validate that it is HTTP or HTTPS
  • urlparse() additionally separates path parameters and builds a six-field result, which is unnecessary work here
  • urlsplit() provides the same .scheme attribute with less computational overhead
  • Line profiler shows the parsing line dropped from 63.6% to 57.3% of total execution time
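
The scheme check described above can be sketched as follows. This is a hypothetical simplification inferred from the tests in this comment, not the actual Skyvern source: the function name `prepend_scheme_sketch` and the local `InvalidUrl` class are stand-ins, and the further validation that the real function presumably performs (the tests import pydantic's `HttpUrl`) is left out.

```python
from urllib.parse import urlsplit


class InvalidUrl(Exception):
    """Stand-in for the project's exception (assumption, not the real class)."""

    def __init__(self, url: str) -> None:
        super().__init__(f"Invalid URL: {url}")
        self.url = url


def prepend_scheme_sketch(url: str) -> str:
    """Hypothetical simplification of prepend_scheme_and_validate_url."""
    if not url:
        return url  # empty input is passed through unchanged

    scheme = urlsplit(url).scheme  # only the scheme is inspected
    if scheme and scheme not in ("http", "https"):
        raise InvalidUrl(url)  # e.g. ftp:// is rejected
    if not scheme:
        url = f"https://{url}"  # default to https when no scheme is given

    # Further validation of the resulting URL is omitted in this sketch.
    return url
```

For example, `prepend_scheme_sketch("example.com")` returns `"https://example.com"`, while `prepend_scheme_sketch("ftp://example.com")` raises `InvalidUrl`.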

Performance Impact:
Based on the function references, this optimization is valuable because:

  • Page Navigation: Called in skyvern_page.py for every page navigation, making it a hot path for browser automation
  • API Endpoints: Used in login workflows where multiple URLs (main URL, TOTP URL, webhook URL) are validated per request
  • Batch Processing: Test results show 19-23% improvements for large batches of URLs, indicating the optimization scales well
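
To get a rough local feel for the parsing cost in isolation, a micro-benchmark along these lines could be used. It is a sketch under assumptions: the URL list and the helper `time_parser` are made up for illustration, it measures only the parsing call rather than the whole function, and the numbers will vary with the interpreter version and with urllib's internal caching of recent parse results, so they will not match the figures reported above.

```python
import timeit
from urllib.parse import urlparse, urlsplit

# Illustrative batch, similar in shape to the generated batch tests.
URLS = [f"https://site{i}.com/path?q={i}#frag{i}" for i in range(500)]


def time_parser(parser, repeat: int = 5, number: int = 20) -> float:
    """Return the best-of-`repeat` time in seconds for `number` passes over URLS."""

    def run() -> None:
        for u in URLS:
            parser(u).scheme  # read only the scheme, as the function does

    return min(timeit.repeat(run, repeat=repeat, number=number))


t_parse = time_parser(urlparse)
t_split = time_parser(urlsplit)
print(f"urlparse: {t_parse * 1e3:.2f} ms, urlsplit: {t_split * 1e3:.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes, which mirrors the "best of 32 runs" reporting used in this comment.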

Best For:
The optimization performs well across nearly all test cases; the only exception is the empty-string case, which returns before parsing and varies within noise (from 2.76% slower to 20.7% faster). The largest gains (roughly 20-30%) appear for some URLs without schemes that require the https:// prepending operation, a common case in web automation.

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 🔘 None Found
🌀 Generated Regression Tests | ✅ 5292 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 🔘 None Found
📊 Tests Coverage | 100.0%
🌀 Generated Regression Tests and Runtime
from urllib.parse import urlparse

# imports
import pytest
from pydantic import HttpUrl, ValidationError
from skyvern.utils.url_validators import prepend_scheme_and_validate_url

# Custom exception as per the function's implementation
class InvalidUrl(Exception):
    def __init__(self, url):
        super().__init__(f"Invalid URL: {url}")
        self.url = url

# unit tests

# --------------------------
# 1. Basic Test Cases
# --------------------------

def test_valid_http_url():
    # Should accept a valid http URL as-is
    url = "http://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 19.4μs -> 15.8μs (22.8% faster)

def test_valid_https_url():
    # Should accept a valid https URL as-is
    url = "https://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 13.3μs -> 11.8μs (12.0% faster)

def test_url_without_scheme():
    # Should prepend https to a URL with no scheme
    url = "example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.1μs -> 8.25μs (22.5% faster)

def test_url_with_path_and_no_scheme():
    # Should prepend https to a URL with path and no scheme
    url = "example.com/test"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.2μs -> 8.26μs (23.4% faster)

def test_url_with_query_and_no_scheme():
    # Should prepend https to a URL with query and no scheme
    url = "example.com/test?q=1"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 11.1μs -> 9.30μs (19.2% faster)

def test_url_with_fragment_and_no_scheme():
    # Should prepend https to a URL with fragment and no scheme
    url = "example.com/test#frag"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.7μs -> 9.04μs (18.8% faster)

def test_url_with_www_and_no_scheme():
    # Should prepend https to a www URL with no scheme
    url = "www.example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.2μs -> 8.71μs (17.5% faster)

def test_url_with_subdomain_and_no_scheme():
    # Should prepend https to a subdomain URL with no scheme
    url = "sub.example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 17.7μs -> 15.9μs (11.5% faster)

# --------------------------
# 2. Edge Test Cases
# --------------------------

def test_empty_string():
    # Should return empty string for empty input
    url = ""
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 317ns -> 326ns (2.76% slower)

def test_url_with_ip_and_no_scheme():
    # Should prepend https to an IP address
    url = "192.168.1.1"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 20.5μs -> 18.1μs (13.2% faster)

def test_url_with_ipv6_and_no_scheme():
    # Should prepend https to IPv6 address
    url = "[2001:db8::1]"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 12.6μs -> 11.5μs (9.94% faster)

def test_url_with_ipv6_and_port():
    # Should prepend https to IPv6 address with port
    url = "[2001:db8::1]:8080"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 11.5μs -> 10.4μs (10.6% faster)

def test_url_with_unicode_domain():
    # Should accept valid unicode domain
    url = "例子.测试"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 15.0μs -> 13.7μs (10.0% faster)

def test_url_with_punycode_domain():
    # Should accept valid punycode domain
    url = "xn--fsqu00a.xn--0zwm56d"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 12.5μs -> 11.4μs (10.3% faster)

def test_url_with_long_path():
    # Should accept long path
    url = "example.com/" + "a" * 200
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 11.4μs -> 10.4μs (9.38% faster)

def test_url_with_long_query():
    # Should accept long query string
    url = "example.com?" + "q=" + "x" * 200
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 12.2μs -> 11.0μs (10.9% faster)

def test_url_with_long_fragment():
    # Should accept long fragment
    url = "example.com#" + "frag" * 50
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 12.3μs -> 10.8μs (14.1% faster)

def test_url_with_dot_at_end():
    # Should accept domain with trailing dot
    url = "example.com."
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.2μs -> 8.96μs (13.5% faster)

def test_url_with_multiple_subdomains():
    # Should accept multiple subdomains
    url = "a.b.c.d.example.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 10.2μs -> 9.14μs (11.4% faster)

def test_url_with_dash_in_domain():
    # Should accept domain with dash
    url = "my-site.com"
    codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 9.92μs -> 8.87μs (11.9% faster)

def test_large_batch_of_valid_urls():
    # Should process a large batch of valid URLs efficiently
    domains = [f"site{i}.com" for i in range(500)]
    urls = [f"https://{domain}" for domain in domains]
    for url in urls:
        codeflash_output = prepend_scheme_and_validate_url(url) # 1.69ms -> 1.41ms (19.2% faster)

def test_large_batch_of_urls_without_scheme():
    # Should prepend https to a large batch of URLs without scheme
    domains = [f"site{i}.com" for i in range(500)]
    for domain in domains:
        codeflash_output = prepend_scheme_and_validate_url(domain); result = codeflash_output # 1.33ms -> 1.08ms (23.2% faster)

def test_large_batch_of_empty_strings():
    # Should return empty string for a large batch of empty strings
    empty_urls = [""] * 500
    for url in empty_urls:
        codeflash_output = prepend_scheme_and_validate_url(url) # 44.0μs -> 43.4μs (1.44% faster)

def test_large_batch_with_mixed_validity():
    # Should process a mixed batch and raise/return appropriately
    urls = []
    expected = []
    for i in range(250):
        urls.append(f"site{i}.com")  # valid, no scheme
        expected.append(f"https://site{i}.com")
    for i in range(250):
        urls.append(f"ftp://site{i}.com")  # invalid
        expected.append(InvalidUrl)
    for url, exp in zip(urls, expected):
        if exp is InvalidUrl:
            with pytest.raises(InvalidUrl):
                prepend_scheme_and_validate_url(url)
        else:
            codeflash_output = prepend_scheme_and_validate_url(url)

def test_large_batch_of_long_urls():
    # Should process long URLs efficiently
    base = "example.com/"
    long_paths = [base + "a" * i for i in range(1, 501)]
    for url in long_paths:
        codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 2.06ms -> 1.82ms (13.0% faster)

def test_large_batch_of_unicode_domains():
    # Should process unicode domains efficiently
    domains = [f"例子{i}.测试" for i in range(500)]
    for domain in domains:
        codeflash_output = prepend_scheme_and_validate_url(domain); result = codeflash_output # 1.58ms -> 1.33ms (18.8% faster)

def test_large_batch_of_urls_with_query_and_fragment():
    # Should process URLs with query and fragment efficiently
    urls = [f"site{i}.com/path?q={i}#frag{i}" for i in range(500)]
    for url in urls:
        codeflash_output = prepend_scheme_and_validate_url(url); result = codeflash_output # 1.46ms -> 1.21ms (20.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from urllib.parse import urlparse

# imports
import pytest
from pydantic import HttpUrl, ValidationError
from skyvern.utils.url_validators import prepend_scheme_and_validate_url

# Custom exception as per the function definition
class InvalidUrl(Exception):
    def __init__(self, url):
        self.url = url
        super().__init__(f"Invalid URL: {url}")

# unit tests

# ------------------ BASIC TEST CASES ------------------

def test_empty_string_returns_empty():
    # Should return empty string when input is empty
    codeflash_output = prepend_scheme_and_validate_url("") # 321ns -> 266ns (20.7% faster)

def test_valid_http_url():
    # Should accept a valid http URL and return unchanged
    url = "http://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 18.0μs -> 15.0μs (19.5% faster)

def test_valid_https_url():
    # Should accept a valid https URL and return unchanged
    url = "https://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 12.7μs -> 10.9μs (17.0% faster)

def test_url_without_scheme_prepends_https():
    # Should prepend https if scheme is missing
    url = "example.com"
    expected = "https://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 10.6μs -> 8.02μs (31.6% faster)

def test_url_with_path_and_no_scheme():
    # Should prepend https and preserve path
    url = "example.com/path/to/resource"
    expected = "https://example.com/path/to/resource"
    codeflash_output = prepend_scheme_and_validate_url(url) # 10.5μs -> 9.28μs (13.0% faster)

def test_url_with_query_and_no_scheme():
    # Should prepend https and preserve query string
    url = "example.com/search?q=pytest"
    expected = "https://example.com/search?q=pytest"
    codeflash_output = prepend_scheme_and_validate_url(url) # 11.3μs -> 9.73μs (16.1% faster)

def test_url_with_fragment_and_no_scheme():
    # Should prepend https and preserve fragment
    url = "example.com/page#section"
    expected = "https://example.com/page#section"
    codeflash_output = prepend_scheme_and_validate_url(url) # 11.0μs -> 9.46μs (15.8% faster)

def test_url_with_www_and_no_scheme():
    # Should prepend https and preserve www
    url = "www.example.com"
    expected = "https://www.example.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 10.8μs -> 9.00μs (19.8% faster)

def test_url_with_unicode_path():
    # Should accept unicode in path (not domain)
    url = "example.com/über"
    expected = "https://example.com/über"
    codeflash_output = prepend_scheme_and_validate_url(url) # 18.7μs -> 16.5μs (13.5% faster)

def test_url_with_long_subdomain():
    # Should accept long but valid subdomain
    url = "subdomain.subdomain.subdomain.example.com"
    expected = "https://subdomain.subdomain.subdomain.example.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 11.7μs -> 10.3μs (13.6% faster)

def test_url_with_ipv4():
    # Should accept IPv4 address
    url = "192.168.1.1"
    expected = "https://192.168.1.1"
    codeflash_output = prepend_scheme_and_validate_url(url) # 13.2μs -> 11.6μs (14.1% faster)

def test_url_with_ipv6():
    # Should accept IPv6 address in brackets
    url = "[2001:db8::1]"
    expected = "https://[2001:db8::1]"
    codeflash_output = prepend_scheme_and_validate_url(url) # 11.8μs -> 10.6μs (11.6% faster)

def test_url_with_ipv6_and_port():
    # Should accept IPv6 with port
    url = "[2001:db8::1]:8080"
    expected = "https://[2001:db8::1]:8080"
    codeflash_output = prepend_scheme_and_validate_url(url) # 11.5μs -> 9.93μs (16.0% faster)

def test_url_with_trailing_dot():
    # Should accept valid domain with trailing dot
    url = "example.com."
    expected = "https://example.com."
    codeflash_output = prepend_scheme_and_validate_url(url) # 10.7μs -> 9.28μs (14.9% faster)

def test_url_with_dash_in_domain():
    # Should accept domain with dash
    url = "my-site.com"
    expected = "https://my-site.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 10.8μs -> 9.14μs (17.8% faster)

def test_url_with_multiple_slashes():
    # Should accept multiple slashes after domain
    url = "example.com///path"
    expected = "https://example.com///path"
    codeflash_output = prepend_scheme_and_validate_url(url) # 17.9μs -> 15.9μs (12.6% faster)

def test_url_with_leading_and_trailing_whitespace():
    # Whitespace is stripped by the caller (url.strip()); the function then prepends https
    url = "  example.com  "
    expected = "https://example.com"
    codeflash_output = prepend_scheme_and_validate_url(url.strip()) # 8.68μs -> 5.97μs (45.3% faster)

def test_url_with_punycode_domain():
    # Should accept punycode domain
    url = "xn--exmple-cua.com"
    expected = "https://xn--exmple-cua.com"
    codeflash_output = prepend_scheme_and_validate_url(url) # 13.9μs -> 12.2μs (13.8% faster)

# ------------------ LARGE SCALE TEST CASES ------------------

def test_many_valid_urls():
    # Test a list of 1000 valid URLs without scheme
    base = "test{}.example.com"
    urls = [base.format(i) for i in range(1000)]
    for u in urls:
        expected = f"https://{u}"
        codeflash_output = prepend_scheme_and_validate_url(u) # 2.68ms -> 2.19ms (22.2% faster)

def test_mixed_valid_and_invalid_urls():
    # Interleaved valid and invalid URLs
    valid = [f"good{i}.com" for i in range(500)]
    invalid = [f"bad{i}!.com" for i in range(500)]
    for i in range(500):
        codeflash_output = prepend_scheme_and_validate_url(valid[i])
        with pytest.raises(InvalidUrl):
            prepend_scheme_and_validate_url(invalid[i])

def test_long_url_path():
    # Test a very long but valid path
    path = "/".join(["a"] * 900)
    url = f"example.com/{path}"
    expected = f"https://example.com/{path}"
    codeflash_output = prepend_scheme_and_validate_url(url) # 27.0μs -> 24.8μs (8.81% faster)

def test_large_input_with_whitespace_and_valid():
    # Test 1000 URLs with whitespace that are valid after stripping
    urls = [f"  test{i}.com  " for i in range(1000)]
    for u in urls:
        expected = f"https://{u.strip()}"
        codeflash_output = prepend_scheme_and_validate_url(u.strip()) # 2.70ms -> 2.18ms (23.4% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, git checkout codeflash/optimize-prepend_scheme_and_validate_url-mjasqfa7 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 18, 2025 02:03
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Dec 18, 2025