Skip to content

os.environ.clear() has a quadratic time complexity #139482

@gukoff

Description

@gukoff

Bug report

Bug description:

Problem

Summary

The os.environ.clear() method currently has O(N²) time complexity. This is because it inherits the generic MutableMapping.clear() implementation, which repeatedly calls popitem(). Each popitem() call invokes __iter__(), and for _Environ, __iter__() creates a full snapshot of all environment keys every time. As a result, clearing large environments is extremely slow (e.g., 23 seconds to clear 100,000 variables).

Details

os.environ is an instance of _Environ, which subclasses MutableMapping. _Environ implements the required abstract methods (__getitem__, __setitem__, __delitem__, __iter__, __len__), but inherits generic implementations for methods like clear, pop, popitem, update, and setdefault.

The inherited clear() method is implemented as follows:

class MutableMapping(Mapping):
    ...
    def popitem(self):
        try:
            key = next(iter(self))
        except StopIteration:
            raise KeyError from None
        value = self[key]
        del self[key]
        return key, value

    def clear(self):
        'D.clear() -> None.  Remove all items from D.'
        try:
            while True:
                self.popitem()
        except KeyError:
            pass
    ...

For every item deleted, clear() calls iter(self) inside popitem().

This becomes a performance issue for _Environ because _Environ.__iter__() creates a snapshot of all keys:

class _Environ(MutableMapping):
    ...
    def __iter__(self):
        # list() from dict object is an atomic operation
        keys = list(self._data)
        for key in keys:
            yield self.decodekey(key)

This leads to quadratic time complexity for os.environ.clear().

Performance Impact

This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.

Environments with thousands of variables are rare, but do exist.

In [3]: for i in range(100000):
   ...:     os.environ[str(i)] = str(i)
   ...:

In [4]: %time os.environ.clear()

CPU times: user 23.3 s, sys: 94.3 ms, total: 23.4 s
Wall time: 23.4 s

Suggested solution

Implement a custom _Environ.clear() that avoids repeated snapshots (for every key):

def clear(self):
    while self._data:
        for key in list(self._data):
            try:
                del self[key]
            except KeyError:
                pass

This implementation makes use of the atomicity of list(dict()), which is important in multithreaded environments and is the reason we call list(self._data) inside __iter__(self).

Further improvement on Linux/FreeBSD could be achieved by using clearenv() which is part of the standard C library there.

Other considerations

More generally, we could consider challenging the existing design decisions:

  1. Should __iter__ make a snapshot of all environment keys?
  1. Should MutableMapping.clear() rely on popitem() in a loop, assuming __iter__ is cheap?

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

macOS

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    performancePerformance or resource usagestdlibStandard Library Python modules in the Lib/ directorytype-bugAn unexpected behavior, bug, or error

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions