-
-
Notifications
You must be signed in to change notification settings - Fork 33k
Description
Bug report
Bug description:
Problem
Summary
The os.environ.clear()
method currently has O(N²) time complexity. This is because it inherits the generic MutableMapping.clear()
implementation, which repeatedly calls popitem()
. Each popitem()
call invokes __iter__()
, and for _Environ
, __iter__()
creates a full snapshot of all environment keys every time. As a result, clearing large environments is extremely slow (e.g., 23 seconds to clear 100,000 variables).
Details
os.environ
is an instance of _Environ
, which subclasses MutableMapping
. _Environ
implements the required abstract methods (__getitem__
, __setitem__
, __delitem__
, __iter__
, __len__
), but inherits generic implementations for methods like clear
, pop
, popitem
, update
, and setdefault
.
The inherited clear()
method is implemented as follows:
class MutableMapping(Mapping):
...
def popitem(self):
try:
key = next(iter(self))
except StopIteration:
raise KeyError from None
value = self[key]
del self[key]
return key, value
def clear(self):
'D.clear() -> None. Remove all items from D.'
try:
while True:
self.popitem()
except KeyError:
pass
...
For every item deleted, clear()
calls iter(self)
inside popitem()
.
This becomes a performance issue for _Environ
because _Environ.__iter__()
creates a snapshot of all keys:
class _Environ(MutableMapping):
...
def __iter__(self):
# list() from dict object is an atomic operation
keys = list(self._data)
for key in keys:
yield self.decodekey(key)
This leads to quadratic time complexity for os.environ.clear()
.
Performance Impact
This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.
Environments with thousands of variables are rare, but do exist.
In [3]: for i in range(100000):
...: os.environ[str(i)] = str(i)
...:
In [4]: %time os.environ.clear()
CPU times: user 23.3 s, sys: 94.3 ms, total: 23.4 s
Wall time: 23.4 s
Suggested solution
Implement a custom _Environ.clear()
that avoids repeated snapshots (for every key):
def clear(self):
while self._data:
for key in list(self._data):
try:
del self[key]
except KeyError:
pass
This implementation makes use of the atomicity of list(dict())
, which is important in multithreaded environments and is the reason we call list(self._data)
inside __iter__(self)
.
Further improvement on Linux/FreeBSD could be achieved by using clearenv()
which is part of the standard C library there.
Other considerations
More generally, we could consider challenging the existing design decisions:
- Should
__iter__
make a snapshot of all environment keys?
- PR introducing this behavior (2017): bpo-30441: Fix bug when modifying os.environ while iterating over it #2409
- Bug report that motivated the change: https://bugs.python.org/issue30441
- Should
MutableMapping.clear()
rely onpopitem()
in a loop, assuming__iter__
is cheap?
CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
macOS