I'm trying to set up a clean separation between RL policy and RL environment execution using this project. My basic flow is:
- Open a new env on the remote
- Run `env.step` until the episode ends
- Call `env.close`
I ran into some issues:
- No numpy support in zero.
  I solved this by writing a new encoder that handles numpy arrays:
    import asyncio
    import io
    from typing import Any, Type

    import numpy as np
    from zero import AsyncZeroClient
    from zero.encoder.generic import GenericEncoder
    from zero.encoder.msgspc import T


    class GenericEncoderWithNumpySupport(GenericEncoder):
        def encode(self, data) -> bytes:
            # Serialize arrays to .npy bytes, then let the base encoder wrap them.
            if isinstance(data, np.ndarray):
                buffer = io.BytesIO()
                np.save(buffer, data)
                return super().encode(buffer.getvalue())
            return super().encode(data)

        def decode(self, data: bytes) -> Any:
            decoded_data = super().decode(data)
            # Only bytes can hold a .npy payload; slicing other types (int, dict) would raise.
            if isinstance(decoded_data, bytes) and decoded_data[1:6] == b"NUMPY":  # MAGIC string for numpy array
                buffer = io.BytesIO(decoded_data)
                return np.load(buffer)
            return decoded_data

        def decode_type(self, data: bytes, typ: Type[T]) -> T:
            # typ may be a typing construct rather than a class, so check before issubclass.
            if isinstance(typ, type) and issubclass(typ, np.ndarray):
                # self.decode already turns the .npy bytes back into an array,
                # so loading its result a second time would fail.
                return self.decode(data)
            return super().decode_type(data, typ)

        def is_allowed_type(self, typ: Type) -> bool:
            return super().is_allowed_type(typ) or typ is np.ndarray
- NoneType return disallowed.
  It's an annoying restriction, so I just worked around it by returning a dummy value.
- No support for multiple arguments.
  This was the deal breaker for me: without multiple arguments it was pretty hard to proceed, so I stopped here.
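A possible stopgap for the last two points, sketched under assumptions rather than tested against zero: bundle what would be several arguments into one dict, and return a dummy `True` where a function would naturally return nothing. The names `StepRequest`, `pack`, `unpack`, `step` and `close_env` are hypothetical, the env logic is a placeholder, and the sketch assumes the RPC layer can carry a single plain dict.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class StepRequest:
    """Hypothetical container for values that would otherwise be separate arguments."""
    env_id: int
    action: int


def pack(request: StepRequest) -> Dict[str, Any]:
    # Client side: flatten the request into one plain dict.
    return asdict(request)


def unpack(payload: Dict[str, Any]) -> StepRequest:
    # Server side: rebuild the typed request from the single dict argument.
    return StepRequest(**payload)


def step(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical single-argument handler; the env logic is a placeholder."""
    request = unpack(payload)
    return {
        "env_id": request.env_id,
        "observation": [0.0],
        "reward": float(request.action),
        "done": False,
    }


def close_env(payload: Dict[str, Any]) -> bool:
    """Hypothetical handler that has nothing to return, so it returns a dummy True."""
    _env_id = payload["env_id"]  # placeholder: a real handler would close this env
    return True


payload = pack(StepRequest(env_id=1, action=2))
result = step(payload)
closed = close_env({"env_id": 1})
```

This keeps every call to one argument and one non-None return value, at the cost of losing per-argument type checking at the RPC boundary.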
Would be nice if these issues could get some attention. The numpy part is easily fixable, but the lack of multi-argument support is really limiting.
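To illustrate why the numpy part looks easy to fix: the encoder above only needs arrays to round-trip through bytes, which `np.save` and `np.load` already do. The standalone check below does not touch zero at all; the helper names `array_to_bytes` and `bytes_to_array` are made up for this sketch.

```python
import io

import numpy as np

NPY_MAGIC = b"\x93NUMPY"  # first six bytes of the .npy format


def array_to_bytes(arr: np.ndarray) -> bytes:
    # Hypothetical helper: serialize an array to .npy bytes in memory.
    buffer = io.BytesIO()
    np.save(buffer, arr, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(payload: bytes) -> np.ndarray:
    # Hypothetical helper: reject anything that does not start with the .npy magic.
    if not payload.startswith(NPY_MAGIC):
        raise ValueError("payload is not in .npy format")
    return np.load(io.BytesIO(payload), allow_pickle=False)


observation = np.arange(6, dtype=np.float32).reshape(2, 3)
payload = array_to_bytes(observation)
restored = bytes_to_array(payload)
```

The magic prefix is what lets a decoder tell array payloads apart from ordinary bytes, and the dtype and shape travel inside the payload, so nothing extra needs to be sent.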