@codeflash-ai codeflash-ai bot commented Dec 17, 2025

📄 68% (0.68x) speedup for _is_valid_proxy_url in skyvern/webeye/browser_factory.py

⏱️ Runtime: 6.98 milliseconds → 4.16 milliseconds (best of 195 runs)

📝 Explanation and details

The optimization achieves a 68% speedup (6.98 ms → 4.16 ms) through two key changes:

  1. Regex Compilation Hoisting: Moving PROXY_PATTERN compilation outside the function eliminates repeated compilation overhead. The original code recompiled the regex on every call (11% of runtime), while the optimized version compiles it once at import time.

  2. Early Rejection with Regex First: The optimized version checks the regex pattern before calling urlparse(). Since urlparse() consumed 82.3% of the original runtime, this reordering provides massive gains for invalid URLs. The regex can quickly reject malformed URLs without the expensive parsing step.
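The two changes above can be sketched as follows. This is a minimal illustration, not the code in `skyvern/webeye/browser_factory.py`: the actual `PROXY_PATTERN` is not shown in this comment, so the regex, the function name `is_valid_proxy_url_sketch`, and the post-parse checks are all assumptions.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern (the real PROXY_PATTERN is not shown in this PR comment).
# Compiled once at import time instead of on every call.
PROXY_PATTERN = re.compile(
    r"(?:http|https|socks5)://"        # allowed schemes, lowercase only
    r"(?:[^:@/\s]+(?::[^@/\s]*)?@)?"   # optional user or user:password
    r"[^:@/?#\s]+"                     # host
    r"(?::\d+)?"                       # optional numeric port
)


def is_valid_proxy_url_sketch(url):
    """Illustrative validator: cheap regex check first, urlparse() second."""
    # Reject non-strings and anything the regex does not fully match
    # before paying for urlparse().
    if not isinstance(url, str) or PROXY_PATTERN.fullmatch(url) is None:
        return False
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    # Assumed post-parse checks: a known scheme and a non-empty netloc.
    return parsed.scheme in {"http", "https", "socks5"} and bool(parsed.netloc)
```

With this ordering, an input such as `"ftp://proxy.example.com:21"` never reaches `urlparse()`, which is where the report attributes most of the original runtime.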

Performance Impact by Test Case:

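  • Invalid URLs that the regex rejects see dramatic improvements (roughly 225-1100% faster in the tests below, and up to 2846% for a 900-character input): URLs with wrong schemes, missing components, or malformed syntax never reach urlparse()
  • Valid URLs see modest improvements (about 1-10% faster): These still require urlparse() but benefit from eliminated regex recompilation
  • Inputs that pass the regex and then go on to urlparse() gain little, and a handful measure 0-18% slower in individual runs (for example, an out-of-range port such as 99999); some of this may be measurement noise
  • Large-scale invalid URL tests show the biggest aggregate gains (1000%+ faster), demonstrating the optimization's effectiveness when processing many malformed URLs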
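A rough way to reproduce this kind of per-input comparison locally is a small `timeit` harness. The sketch below is an assumption-laden stand-in: `parse_first` and `regex_first` are hypothetical reconstructions of the "before" and "after" orderings described in this comment, and `_PATTERN_SRC` is an invented pattern, not the project's.

```python
import re
import timeit
from urllib.parse import urlparse

# Hypothetical pattern source; the project's real regex is not shown here.
_PATTERN_SRC = r"(?:http|https|socks5)://[^:@/?#\s]+(?::\d+)?"
_COMPILED = re.compile(_PATTERN_SRC)


def parse_first(url):
    # Assumed "before" ordering: urlparse() first, regex compiled per call.
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        return False
    return re.compile(_PATTERN_SRC).fullmatch(url) is not None


def regex_first(url):
    # Assumed "after" ordering: precompiled regex first, urlparse() second.
    if _COMPILED.fullmatch(url) is None:
        return False
    parsed = urlparse(url)
    return bool(parsed.scheme and parsed.netloc)


def compare(url, number=20000):
    """Return (seconds_before, seconds_after) for `number` calls each."""
    before = timeit.timeit(lambda: parse_first(url), number=number)
    after = timeit.timeit(lambda: regex_first(url), number=number)
    return before, after


if __name__ == "__main__":
    for candidate in ("http://proxy.example.com:8080", "ftp://proxy.example.com:21"):
        before, after = compare(candidate)
        print(f"{candidate!r}: {before:.4f}s -> {after:.4f}s")
```

Absolute numbers will differ from the report, which was measured on the real function; the point of the harness is only to see whether rejected inputs and accepted inputs diverge in the same direction.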

Real-World Impact: Based on the function reference, _is_valid_proxy_url is called in setup_proxy() to validate proxy URLs from configuration. When proxy pools contain invalid entries (common in real deployments), this optimization will significantly reduce startup time and configuration validation overhead. The function processes each proxy in a list, so the performance gains compound with larger proxy pools or frequent proxy validation.
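To make the per-proxy cost concrete, here is a hypothetical sketch of validating a pool one URL at a time. The body of `setup_proxy()` is not shown in this comment, so `filter_valid_proxies` and its `is_valid` parameter are illustrative names only; the validator is passed in so the example stays self-contained.

```python
from typing import Callable, Iterable, List


def filter_valid_proxies(urls: Iterable[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Keep only the URLs the validator accepts, preserving order.

    The validator runs once per entry, so any per-call saving on
    rejected entries is multiplied by the number of bad entries.
    """
    return [url for url in urls if is_valid(url)]
```

In the project, the callable would presumably be `_is_valid_proxy_url`; any function from `str` to `bool` works for experimentation.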

The optimization maintains identical behavior while dramatically improving performance for the common case of rejecting invalid proxy URLs.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2474 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import re
from urllib.parse import urlparse

# imports
import pytest  # used for our unit tests
from skyvern.webeye.browser_factory import _is_valid_proxy_url

# unit tests

# 1. Basic Test Cases

def test_valid_http_proxy():
    # Basic http proxy without auth, with port
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080") # 11.1μs -> 10.2μs (9.50% faster)

def test_valid_https_proxy():
    # Basic https proxy without auth, with port
    codeflash_output = _is_valid_proxy_url("https://proxy.example.com:443") # 10.2μs -> 9.97μs (2.77% faster)

def test_valid_socks5_proxy():
    # Basic socks5 proxy without auth, with port
    codeflash_output = _is_valid_proxy_url("socks5://proxy.example.com:1080") # 10.2μs -> 9.69μs (5.22% faster)

def test_valid_proxy_with_auth():
    # Proxy with username and password
    codeflash_output = _is_valid_proxy_url("http://user:[email protected]:3128") # 9.05μs -> 8.25μs (9.65% faster)

def test_valid_proxy_with_username_only():
    # Proxy with only username (no password)
    codeflash_output = _is_valid_proxy_url("http://[email protected]:3128") # 8.97μs -> 8.50μs (5.53% faster)

def test_valid_proxy_without_port():
    # Proxy without port (should still be valid)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com") # 10.2μs -> 9.37μs (8.95% faster)

def test_valid_proxy_with_ip():
    # Proxy with IP address and port
    codeflash_output = _is_valid_proxy_url("http://192.168.1.1:8000") # 10.1μs -> 9.43μs (7.27% faster)

def test_valid_proxy_with_auth_and_ip():
    # Proxy with IP, username, and password
    codeflash_output = _is_valid_proxy_url("https://user:[email protected]:443") # 9.56μs -> 8.78μs (8.89% faster)

# 2. Edge Test Cases

def test_invalid_scheme():
    # Scheme not in allowed list
    codeflash_output = _is_valid_proxy_url("ftp://proxy.example.com:21") # 8.18μs -> 861ns (850% faster)

def test_missing_scheme():
    # No scheme present
    codeflash_output = _is_valid_proxy_url("proxy.example.com:8080") # 7.88μs -> 810ns (873% faster)

def test_missing_netloc():
    # No netloc (host) present
    codeflash_output = _is_valid_proxy_url("http:///") # 7.81μs -> 1.65μs (374% faster)

def test_empty_string():
    # Empty string should not be valid
    codeflash_output = _is_valid_proxy_url("") # 6.08μs -> 767ns (693% faster)

def test_whitespace_string():
    # String with only whitespace
    codeflash_output = _is_valid_proxy_url("   ") # 8.72μs -> 1.01μs (767% faster)

def test_invalid_host_with_space():
    # Host contains a space (invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy .example.com:8080") # 13.4μs -> 3.53μs (280% faster)

def test_invalid_port():
    # Port is not a number
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:port") # 11.9μs -> 3.20μs (273% faster)

def test_negative_port():
    # Port is negative (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:-8080") # 11.5μs -> 3.23μs (258% faster)

def test_port_too_large():
    # Port is too large (valid by regex, but not by URL standards)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:99999") # 11.0μs -> 13.5μs (18.1% slower)

def test_ipv6_host():
    # IPv6 should be valid if formatted correctly
    codeflash_output = _is_valid_proxy_url("http://[2001:db8::1]:8080") # 24.7μs -> 2.61μs (848% faster)

def test_username_with_special_chars():
    # Username or password with special chars (except @ and :)
    codeflash_output = _is_valid_proxy_url("http://us!er:pa%[email protected]:8080") # 10.2μs -> 10.2μs (0.615% slower)

def test_auth_with_empty_password():
    # Username with empty password
    codeflash_output = _is_valid_proxy_url("http://user:@proxy.example.com:8080") # 9.81μs -> 9.48μs (3.50% faster)

def test_double_at_symbol():
    # More than one @ in userinfo (should not match)
    codeflash_output = _is_valid_proxy_url("http://user@name:[email protected]:8080") # 10.6μs -> 2.77μs (282% faster)

def test_trailing_slash():
    # Trailing slash after port (should not match regex)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080/") # 11.5μs -> 3.53μs (225% faster)

def test_path_in_url():
    # Path present in URL (should not match regex)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080/path") # 11.7μs -> 3.32μs (252% faster)

def test_query_in_url():
    # Query present in URL (should not match regex)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080?foo=bar") # 12.0μs -> 3.24μs (270% faster)

def test_fragment_in_url():
    # Fragment present in URL (should not match regex)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080#frag") # 12.0μs -> 3.16μs (280% faster)

def test_unicode_in_url():
    # Unicode in host (should be valid if properly encoded)
    codeflash_output = _is_valid_proxy_url("http://xn--bcher-kva.example:8080") # 10.7μs -> 11.8μs (8.77% slower)

def test_invalid_url_format():
    # Completely malformed URL
    codeflash_output = _is_valid_proxy_url("not_a_url") # 6.49μs -> 899ns (622% faster)

def test_url_with_tab_character():
    # URL with embedded tab (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy\t.example.com:8080") # 11.0μs -> 2.83μs (288% faster)

def test_url_with_newline_character():
    # URL with embedded newline (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:8080\n") # 10.7μs -> 11.3μs (5.46% slower)

# 3. Large Scale Test Cases

def test_many_valid_proxies():
    # Test a large number of valid proxy URLs
    proxies = [
        f"http://user{i}:pass{i}@proxy{i}.example.com:{1000 + i}"
        for i in range(500)
    ]
    for url in proxies:
        codeflash_output = _is_valid_proxy_url(url) # 1.44ms -> 1.34ms (7.97% faster)

def test_many_invalid_proxies():
    # Test a large number of invalid proxy URLs (missing scheme)
    proxies = [
        f"user{i}:pass{i}@proxy{i}.example.com:{1000 + i}"
        for i in range(500)
    ]
    for url in proxies:
        codeflash_output = _is_valid_proxy_url(url) # 1.05ms -> 90.4μs (1064% faster)

def test_scheme_case_sensitivity():
    # Schemes must be lowercase per regex; uppercase should fail
    codeflash_output = _is_valid_proxy_url("HTTP://proxy.example.com:8080") # 13.0μs -> 1.26μs (938% faster)

def test_extra_colon_in_scheme():
    # Extra colon in scheme should be invalid
    codeflash_output = _is_valid_proxy_url("http:://proxy.example.com:8080") # 7.94μs -> 1.03μs (673% faster)

def test_username_with_colon():
    # Username with colon is not allowed, only one colon separates user:pass
    codeflash_output = _is_valid_proxy_url("http://us:er:[email protected]:8080") # 11.2μs -> 13.2μs (15.2% slower)

def test_double_slash_in_netloc():
    # Double slash in netloc should not be valid
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com//:8080") # 12.0μs -> 3.44μs (249% faster)

def test_port_with_leading_zeros():
    # Port with leading zeros is technically valid
    codeflash_output = _is_valid_proxy_url("http://proxy.example.com:0080") # 11.0μs -> 11.0μs (0.046% slower)

def test_ipv6_with_auth():
    # IPv6 with auth
    codeflash_output = _is_valid_proxy_url("socks5://user:pass@[2001:db8::1]:1080") # 25.1μs -> 2.80μs (798% faster)

def test_ipv6_without_port():
    # IPv6 without port
    codeflash_output = _is_valid_proxy_url("socks5://[2001:db8::1]") # 20.2μs -> 2.40μs (741% faster)

def test_ipv6_with_invalid_characters():
    # IPv6 address with invalid characters
    codeflash_output = _is_valid_proxy_url("http://[2001:db8::zz]:8080") # 18.7μs -> 2.48μs (656% faster)

def test_url_with_percent_encoded_chars():
    # Percent-encoded chars in userinfo/host
    codeflash_output = _is_valid_proxy_url("http://user%3A:pass%[email protected]:8080") # 10.4μs -> 10.7μs (2.80% slower)

def test_url_with_long_username_and_password():
    # Very long username and password
    username = "u" * 100
    password = "p" * 100
    url = f"http://{username}:{password}@proxy.example.com:8080"
    codeflash_output = _is_valid_proxy_url(url) # 11.1μs -> 10.6μs (5.05% faster)

def test_url_with_max_length():
    # Very long but valid URL (close to 1000 chars)
    host = "a" * 900
    url = f"http://{host}:8080"
    codeflash_output = _is_valid_proxy_url(url) # 47.9μs -> 47.4μs (1.14% faster)

def test_url_with_max_length_invalid():
    # Very long but invalid URL (missing scheme)
    host = "a" * 900
    url = f"{host}:8080"
    codeflash_output = _is_valid_proxy_url(url) # 25.1μs -> 851ns (2846% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations

import re
from urllib.parse import urlparse

# imports
import pytest  # used for our unit tests
from skyvern.webeye.browser_factory import _is_valid_proxy_url

# unit tests

# --- BASIC TEST CASES ---

def test_valid_http_proxy():
    # Basic valid HTTP proxy
    codeflash_output = _is_valid_proxy_url("http://127.0.0.1:8080") # 10.6μs -> 10.2μs (4.60% faster)

def test_valid_https_proxy():
    # Basic valid HTTPS proxy
    codeflash_output = _is_valid_proxy_url("https://proxy.example.com:443") # 10.9μs -> 10.5μs (3.72% faster)

def test_valid_socks5_proxy():
    # Basic valid SOCKS5 proxy
    codeflash_output = _is_valid_proxy_url("socks5://192.168.1.1:1080") # 10.1μs -> 9.75μs (3.20% faster)

def test_valid_proxy_with_username_password():
    # Valid proxy with username and password
    codeflash_output = _is_valid_proxy_url("http://user:[email protected]:3128") # 9.34μs -> 9.07μs (2.94% faster)

def test_valid_proxy_with_username_only():
    # Valid proxy with username only
    codeflash_output = _is_valid_proxy_url("http://[email protected]:3128") # 9.64μs -> 8.86μs (8.75% faster)

def test_valid_proxy_without_port():
    # Valid proxy without port (should still be valid)
    codeflash_output = _is_valid_proxy_url("http://proxyhost.com") # 9.93μs -> 9.19μs (8.07% faster)

def test_valid_proxy_with_hyphenated_host():
    # Valid proxy with hyphens in hostname
    codeflash_output = _is_valid_proxy_url("http://proxy-host.com:8000") # 10.2μs -> 9.82μs (3.34% faster)

# --- EDGE TEST CASES ---

def test_invalid_scheme():
    # Invalid scheme (ftp is not allowed)
    codeflash_output = _is_valid_proxy_url("ftp://proxy.com:21") # 8.19μs -> 809ns (912% faster)

def test_missing_scheme():
    # Missing scheme
    codeflash_output = _is_valid_proxy_url("proxy.com:8080") # 7.31μs -> 788ns (828% faster)

def test_missing_host():
    # Missing host (just scheme)
    codeflash_output = _is_valid_proxy_url("http://:8080") # 8.94μs -> 1.36μs (558% faster)

def test_missing_netloc():
    # Only scheme, no netloc
    codeflash_output = _is_valid_proxy_url("http://") # 7.71μs -> 650ns (1087% faster)

def test_empty_string():
    # Empty string
    codeflash_output = _is_valid_proxy_url("") # 5.96μs -> 704ns (747% faster)

def test_none_string():
    # None as input should not raise, but return False
    codeflash_output = _is_valid_proxy_url(None) # 7.30μs -> 1.43μs (412% faster)

def test_whitespace_string():
    # String with only whitespace
    codeflash_output = _is_valid_proxy_url("   ") # 6.25μs -> 628ns (895% faster)

def test_url_with_spaces():
    # URL with spaces should be invalid
    codeflash_output = _is_valid_proxy_url("http://proxy host.com:8080") # 11.6μs -> 2.99μs (287% faster)

def test_url_with_invalid_port():
    # Port is not a number
    codeflash_output = _is_valid_proxy_url("http://proxy.com:port") # 10.7μs -> 2.78μs (286% faster)

def test_url_with_negative_port():
    # Negative port number (should still match regex, but is not a valid port)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:-8080") # 10.4μs -> 2.67μs (289% faster)

def test_url_with_large_port():
    # Port number larger than 65535 (regex allows it, so should be True)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:99999") # 10.2μs -> 11.8μs (13.3% slower)

def test_url_with_ipv6_host():
    # IPv6 host (should be invalid due to regex not supporting brackets)
    codeflash_output = _is_valid_proxy_url("http://[2001:db8::1]:8080") # 23.6μs -> 2.59μs (808% faster)

def test_url_with_special_characters_in_host():
    # Host with special characters
    codeflash_output = _is_valid_proxy_url("http://pro!xy.com:8080") # 9.85μs -> 10.5μs (5.78% slower)

def test_url_with_trailing_slash():
    # Trailing slash after port (regex does not allow it)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:8080/") # 10.7μs -> 2.88μs (272% faster)

def test_url_with_path():
    # Proxy URL with path (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:8080/somepath") # 11.0μs -> 2.80μs (293% faster)

def test_url_with_query():
    # Proxy URL with query (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:8080?foo=bar") # 11.1μs -> 2.74μs (304% faster)

def test_url_with_fragment():
    # Proxy URL with fragment (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:8080#frag") # 11.2μs -> 2.65μs (323% faster)

def test_url_with_empty_username():
    # Username is empty (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://:[email protected]:8080") # 8.67μs -> 1.16μs (649% faster)

def test_url_with_empty_password():
    # Password is empty (should be valid)
    codeflash_output = _is_valid_proxy_url("http://user:@proxy.com:8080") # 9.50μs -> 10.7μs (11.1% slower)

def test_url_with_unicode_characters():
    # Unicode in hostname (should be valid if no spaces/specials)
    codeflash_output = _is_valid_proxy_url("http://xn--bcher-kva.com:8080") # 10.3μs -> 10.3μs (0.223% slower)

def test_url_with_leading_trailing_spaces():
    # Leading/trailing spaces should invalidate
    codeflash_output = _is_valid_proxy_url("  http://proxy.com:8080  ") # 8.42μs -> 781ns (979% faster)

def test_url_with_double_at_in_userinfo():
    # Two '@' in userinfo (should be invalid)
    codeflash_output = _is_valid_proxy_url("http://user:pass@@proxy.com:8080") # 9.57μs -> 9.61μs (0.416% slower)

def test_url_with_multiple_colons_in_userinfo():
    # Multiple colons in userinfo (should be valid as per regex)
    codeflash_output = _is_valid_proxy_url("http://user:pa:[email protected]:8080") # 9.14μs -> 9.14μs (0.000% faster)

def test_url_with_no_host_but_userinfo():
    # Userinfo but no host
    codeflash_output = _is_valid_proxy_url("http://user:pass@:8080") # 10.2μs -> 2.59μs (292% faster)

def test_url_with_dot_at_end_of_host():
    # Host ends with dot (should be valid as per regex)
    codeflash_output = _is_valid_proxy_url("http://proxy.com.:8080") # 10.00μs -> 10.1μs (1.09% slower)

# --- LARGE SCALE TEST CASES ---

def test_many_valid_proxies():
    # Test a large number of valid proxies for performance and correctness
    for i in range(1, 501):
        url = f"http://user{i}:pass{i}@host{i}.com:{1000+i}"
        codeflash_output = _is_valid_proxy_url(url) # 1.41ms -> 1.30ms (8.42% faster)

def test_many_invalid_proxies():
    # Test a large number of invalid proxies (missing scheme, invalid chars, etc.)
    for i in range(1, 501):
        url = f"user{i}:pass{i}@host{i}.com:{1000+i}"  # missing scheme
        codeflash_output = _is_valid_proxy_url(url) # 1.06ms -> 90.5μs (1071% faster)

def test_mixed_large_scale():
    # Mix of valid and invalid URLs
    valid_urls = [f"https://host{i}.com:{10000+i}" for i in range(100)]
    invalid_urls = [f"ftp://host{i}.com:{10000+i}" for i in range(100)]
    for url in valid_urls:
        codeflash_output = _is_valid_proxy_url(url) # 338μs -> 320μs (5.79% faster)
    for url in invalid_urls:
        codeflash_output = _is_valid_proxy_url(url) # 233μs -> 18.5μs (1163% faster)

def test_large_scale_edge_ports():
    # Test edge port numbers
    codeflash_output = _is_valid_proxy_url("http://proxy.com:0") # 8.97μs -> 8.41μs (6.65% faster)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:65535") # 4.80μs -> 4.57μs (5.06% faster)
    codeflash_output = _is_valid_proxy_url("http://proxy.com:65536") # 3.62μs -> 3.51μs (3.08% faster)

def test_large_scale_varied_userinfo():
    # Test many proxies with varied userinfo
    for i in range(1, 100):
        url = f"socks5://user{i}@host{i}.com:{2000+i}"
        codeflash_output = _is_valid_proxy_url(url) # 294μs -> 272μs (7.86% faster)
        url2 = f"socks5://user{i}:@host{i}.com:{2000+i}"
        codeflash_output = _is_valid_proxy_url(url2)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_is_valid_proxy_url-mjai3lda`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 17, 2025 21:05
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 17, 2025