Plangym API

class plangym.core.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]

Inherit from this class to adapt environments to different problems.

Base class that establishes all needed methods and blueprints to work with Gym environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • return_image (bool) –

__del__()[source]

Teardown the Environment when it is no longer needed.

__init__(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]

Initialize an Environment.

Parameters
  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later (delayed setups are necessary when the environment object needs to be serialized or when duplicated instances are required).

  • return_image (bool) – If True add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.
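Example (a minimal sketch, assuming the classic-control CartPole-v0 environment is available through plangym.make; the concrete names are illustrative):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0", frameskip=1, autoreset=True)
>>> state, obs = env.reset()
>>> obs, reward, terminal, info = env.step(env.sample_action())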

property action_shape: Tuple[int]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int]

apply_action(action, *, _ray_trace_ctx=None)[source]

Evolve the environment for one time step applying the provided action.

apply_reset(*, _ray_trace_ctx=None, **kwargs)[source]

Perform the resetting operation on the environment.

begin_step(action=None, dt=None, state=None, return_state=None, *, _ray_trace_ctx=None)[source]

Perform setup of step variables before starting step_with_dt.

Parameters

return_state (Optional[bool]) –

clone(*, _ray_trace_ctx=None, **kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

close(*, _ray_trace_ctx=None)[source]

Tear down the current environment.

Return type

None

get_image(*, _ray_trace_ctx=None)[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as grayscale images. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

Union[None, numpy.ndarray]

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

Any

get_step_tuple(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Prepare the tuple that step returns.

This is a post-processing step to have fine-grained control over what data the current step returns.

By default it handles:
  • Returning the state in the tuple (necessary information to save or load the game).

  • Adding the “rgb” key to the info dictionary containing an RGB representation of the environment.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the environment data after calling step.

property name: str

Return the name of the environment.

Return type

str

property obs_shape: Tuple[int]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int]

process_apply_action(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Perform any post-processing to the data returned by apply_action.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the processed data.

process_info(info, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the info dictionary returned by step.

Return type

Dict[str, Any]

process_obs(obs, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the observation returned by step.

process_reward(reward, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the reward returned by step.

Return type

float

process_step(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Prepare the returned info dictionary.

This is a post processing step to have fine-grained control over what data the info dictionary contains.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the environment data after calling step.

process_terminal(terminal, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the terminal flag returned by step.

Return type

bool

reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) – If True, it will return the state of the environment.

Returns

(state, obs) if return_state is True, else obs.

Return type

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
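Example (a hedged sketch of the two return modes, assuming a CartPole-v0 environment is available through plangym.make):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0")
>>> state, obs = env.reset(return_state=True)
>>> obs = env.reset(return_state=False)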

property return_image: bool

Return the return_image flag.

If True add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.

Return type

bool

run_autoreset(step_data, *, _ray_trace_ctx=None)[source]

Reset the environment automatically if needed.

sample_action(*, _ray_trace_ctx=None)[source]

Return a valid action that can be used to step the Environment.

Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.

set_state(state)[source]

Set the internal state of the simulation. Overwrite current state by the given argument.

Parameters

state (Any) – Target state to be set in the environment.

Returns

None

Return type

None

setup()[source]

Run environment initialization.

Include in this function all the code that makes the environment impossible to serialize; this allows dispatching the environment to different workers and initializing it once it has been copied to the target process.

Return type

None

step(action, state=None, dt=1, return_state=None)[source]

Step the environment applying the supplied action.

Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

In addition, the method allows the user to prepare the returned object, adding additional information and custom pre-processings via self.process_step and self.get_step_tuple methods.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • state (Optional[numpy.ndarray]) – Set the environment to the given state before stepping it.

  • dt (int) – Consecutive number of times that the action will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info)

Return type

tuple
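Example (a hedged sketch, assuming a CartPole-v0 environment; because state is passed and return_state is None, the new state is included in the returned tuple, and dt=3 with frameskip=2 advances the simulation 6 steps):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0", frameskip=2)
>>> state, obs = env.reset()
>>> new_state, obs, reward, terminal, info = env.step(env.sample_action(), state=state, dt=3)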

step_batch(actions, states=None, dt=1, return_state=True)[source]

Allow stepping a vector of states and actions.

Vectorized version of the step method. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (Union[numpy.ndarray, Iterable[Union[numpy.ndarray, int]]]) – Iterable containing the different actions to be applied.

  • states (Optional[Union[numpy.ndarray, Iterable]]) – Iterable containing the different states to be set.

  • dt (Union[int, numpy.ndarray]) – int or array containing the number of consecutive steps that will be applied to each state. If it is an array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for every instance).

  • return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if states was passed as a parameter.

Returns

If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.

Return type

Tuple[Union[list, numpy.ndarray], …]
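Example (a hedged sketch of step_batch on a non-vectorized environment, assuming CartPole-v0; vectorized subclasses expose the same interface):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0")
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> new_states, observs, rewards, ends, infos = env.step_batch(states=states, actions=actions)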

step_with_dt(action, dt=1, *, _ray_trace_ctx=None)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

The method performs any post-processing to the data after applying the action to the environment via self.process_apply_action.

This method neither computes nor returns any state.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

Tuple containing (observs, reward, terminal, info).

property unwrapped: plangym.core.PlanEnv

Completely unwrap this Environment.

Returns

The base non-wrapped plangym.Environment instance

Return type

plangym.Environment

class plangym.core.PlangymEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Base class for implementing OpenAI gym environments in plangym.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –

__init__(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Initialize a PlangymEnv.

The user can read all private methods as instance properties.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each dt. Common argument to all environments.

  • autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit – If True, remove the time limit from the environment.

  • render_mode (Optional[str]) –
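Example (a hedged sketch of passing wrappers, assuming gym’s TimeLimit wrapper and that the keyword arguments in the (gym.Wrapper, kwargs) tuple are forwarded to the wrapper constructor as described above):

>>> import plangym
>>> from gym.wrappers import TimeLimit
>>> env = plangym.make(name="CartPole-v0", wrappers=[(TimeLimit, {"max_episode_steps": 50})])
>>> state, obs = env.reset()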

__repr__()[source]

Pretty print the environment.

__str__()[source]

Pretty print the environment.

property action_shape: Tuple[int, ...]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int, ...]

property action_space: gym.spaces.space.Space

Return the action_space of the environment.

Return type

gym.spaces.Space

apply_action(action)[source]

Evolve the environment for one time step applying the provided action.

Accumulate rewards and calculate terminal flag after stepping the environment.

apply_reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) – If True it will return the state of the environment.

Returns

(state, obs) if return_state is True, else obs.

Return type

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

apply_wrappers(wrappers)[source]

Wrap the underlying OpenAI gym environment.

Parameters

wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlangymEnv

close()[source]

Close the underlying gym.Env.

get_coords_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “coords”.

get_grayscale_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “grayscale”.

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

numpy.ndarray

get_rgb_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “rgb”.

property gym_env

Return the instance of the environment that is being wrapped by plangym.

init_gym_env()[source]

Initialize the gym.Env instance that the current class is wrapping.

Return type

gym.core.Env

init_spaces()[source]

Initialize the action_space and observation_space of the environment.

property metadata

Return the metadata of the environment.

property obs_shape: Tuple[int, ...]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int, ...]

property obs_type: str

Return the type of observation returned by the environment.

Return type

str

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

process_obs(obs, **kwargs)[source]

Perform optional computation for computing the observation returned by step.

This is a post processing step to have fine-grained control over the returned observation.

property remove_time_limit: bool

Return True if the Environment can only be stepped for a limited number of times.

Return type

bool

render(mode='human')[source]

Render the environment using OpenGL. This wraps the OpenAI render method.

property render_mode: Union[None, str]

Return how the game will be rendered. Values: None | human | rgb_array.

Return type

Union[None, str]

property reward_range

Return the reward_range of the environment.

sample_action()[source]

Return a valid action that can be used to step the environment chosen at random.

Return type

Union[int, numpy.ndarray]

seed(seed=None)[source]

Seed the underlying gym.Env.

setup()[source]

Initialize the target gym.Env instance.

The method calls self.init_gym_env to initialize the gym.Env instance. It removes time limits if needed and applies wrappers introduced by the user.

wrap(wrapper, *args, **kwargs)[source]

Apply a single OpenAI gym wrapper to the environment.

Parameters

wrapper (Callable) –
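Example (a hedged sketch; assumes the environment is already set up and that wrap forwards the extra keyword arguments to the wrapper constructor):

>>> import plangym
>>> from gym.wrappers import TimeLimit
>>> env = plangym.make(name="CartPole-v0")
>>> env.wrap(TimeLimit, max_episode_steps=50)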

Videogames

Atari 2600

class plangym.videogames.atari.AtariEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]

Create an environment to play OpenAI gym Atari games, using the Arcade Learning Environment (ALE) as the emulator.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode (int) – Integer or string indicating the game mode, when available.

  • difficulty (int) – Difficulty level of the game, when available.

  • repeat_action_probability (float) – Repeat the last action with this probability.

  • full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.

  • possible_to_win (bool) – Whether it is possible to finish the game by reaching a terminal state that is not caused by going out of bounds or by losing a life.

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • array_state (bool) – Whether to return the state of the environment as a numpy array.

  • clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.

Example:

>>> import plangym
>>> env = plangym.make(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.action_space.sample() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]

Initialize an AtariEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode (int) – Integer or string indicating the game mode, when available.

  • difficulty (int) – Difficulty level of the game, when available.

  • repeat_action_probability (float) – Repeat the last action with this probability.

  • full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.

  • possible_to_win (bool) – Whether it is possible to finish the game by reaching a terminal state that is not caused by going out of bounds or by losing a life.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • array_state (bool) – Whether to return the state of the environment as a numpy array.

  • clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> type(env.gym_env)
<class 'gym.envs.atari.environment.AtariEnv'>
>>> state, obs = env.reset()
>>> type(state)
<class 'numpy.ndarray'>
property ale

Return the ale interface of the underlying gym.Env.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> type(env.ale)
<class 'ale_py._ale_py.ALEInterface'>
clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.videogames.env.VideogameEnv

property difficulty: int

Return the selected difficulty for the current environment.

Return type

int

property full_action_space: bool

If True, the action space corresponds to all possible actions in the Atari emulator.

Return type

bool

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Image is a three-dimensional array interpreted as an RGB image with channels (Height, Width, RGB). Ignores wrappers as it loads the screen directly from the emulator.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> img = env.get_image()
>>> img.shape
(210, 160, 3)
Return type

numpy.ndarray

get_lifes_from_info(info)[source]

Return the number of lives remaining in the current game.

Parameters

info (Dict[str, Any]) –

Return type

int

get_ram()[source]

Return a numpy array containing the content of the emulator’s RAM.

The RAM is returned as a one-dimensional array representing the memory of the emulator.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="grayscale")
>>> ram = env.get_ram()
>>> ram.shape, ram.dtype
((128,), dtype('uint8'))
Return type

numpy.ndarray

get_state()[source]

Recover the internal state of the simulation.

If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.

Example:

>>> env = AtariEnv(name="Qbert-v0")
>>> env.get_state() 
array([<ale_py._ale_py.ALEState object at 0x...>, None],
      dtype=object)

>>> env = AtariEnv(name="Qbert-v0", array_state=False)
>>> env.get_state() 
<ale_py._ale_py.ALEState object at 0x...>
Return type

numpy.ndarray

init_gym_env()[source]

Initialize the gym.Env instance that the Environment is wrapping.

Return type

gym.core.Env

property mode: int

Return the selected game mode for the current environment.

Return type

int

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

property repeat_action_probability: float

Probability of repeating the previous action instead of the one supplied (sticky actions).

Return type

float

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Return type

None

Example:

>>> env = AtariEnv(name="Qbert-v0")
>>> state, obs = env.reset()
>>> new_state, obs, reward, end, info = env.step(env.sample_action(), state=state)
>>> assert not (state == new_state).all()
>>> env.set_state(state)
>>> (state == env.get_state()).all()
True
step_with_dt(action, dt=1)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).

Example:

>>> env = AtariEnv(name="Pong-v0")
>>> obs = env.reset(return_state=False)
>>> obs, reward, end, info = env.step_with_dt(env.sample_action(), dt=7)
>>> assert not end
class plangym.videogames.montezuma.MontezumaEnv(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]

Plangym implementation of the MontezumaEnv environment optimized for planning.

Parameters
  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • mode (int) –

  • difficulty (int) –

  • repeat_action_probability (float) –

  • full_action_space (bool) –

  • render_mode (Optional[str]) –

  • possible_to_win (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • array_state (bool) –

  • clone_seeds (bool) –

__init__(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]

Initialize a MontezumaEnv.

Parameters
  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • mode (int) –

  • difficulty (int) –

  • repeat_action_probability (float) –

  • full_action_space (bool) –

  • render_mode (Optional[str]) –

  • possible_to_win (bool) –

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –

  • array_state (bool) –

  • clone_seeds (bool) –
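Example (a minimal sketch; assumes the Atari dependencies used by AtariEnv are installed and uses the default PlanMontezuma-v0 name):

>>> from plangym.videogames.montezuma import MontezumaEnv
>>> env = MontezumaEnv(obs_type="rgb")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())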

get_state()[source]

Recover the internal state of the simulation.

If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the gym.Env instance that the current class is wrapping.

Return type

plangym.videogames.montezuma.CustomMontezuma

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Gym retro

class plangym.videogames.retro.RetroEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Environment for playing gym-retro games.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__getattr__(item)[source]

Forward getattr to self.gym_env.

__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Initialize a RetroEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_aray”}.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
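Example (a hedged sketch; assumes gym-retro is installed together with the freely distributable Airstriker-Genesis ROM that ships with it):

>>> from plangym.videogames.retro import RetroEnv
>>> env = RetroEnv(name="Airstriker-Genesis", frameskip=5)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())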

clone(**kwargs)[source]

Return a copy of the environment with its initialization delayed.

Return type

plangym.videogames.retro.RetroEnv

close()[source]

Close the underlying gym.Env.

get_ram()[source]

Return the ram of the emulator as a numpy array.

Return type

numpy.ndarray

get_state()[source]

Get the state of the retro environment.

Return type

numpy.ndarray

static get_win_condition(info)[source]

Get win condition for games that have the end of the screen available.

Parameters

info (Dict[str, Any]) –

Return type

bool

init_gym_env()[source]

Initialize the retro environment.

Return type

gym.core.Env

set_state(state)[source]

Set the state of the retro environment.

Parameters

state (numpy.ndarray) –

Super Mario (NES)

class plangym.videogames.nes.MarioEnv(name, movement_type='simple', original_reward=False, **kwargs)[source]

Interface for using gym-super-mario-bros in plangym.

Parameters
  • name (str) –

  • movement_type (str) –

  • original_reward (bool) –

__init__(name, movement_type='simple', original_reward=False, **kwargs)[source]

Initialize a MarioEnv.

Parameters
  • name (str) – Name of the environment.

  • movement_type (str) – One of {complex|simple|right}

  • original_reward (bool) – If False return a custom reward based on mario position and level.

  • **kwargs – passed to super().__init__.
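Example (a hedged sketch; assumes gym-super-mario-bros is installed and uses its standard SuperMarioBros-v0 name):

>>> from plangym.videogames.nes import MarioEnv
>>> env = MarioEnv(name="SuperMarioBros-v0", movement_type="right")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())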

get_coords_obs(obs, info=None, **kwargs)[source]

Return the information contained in info as an observation if obs_type == “info”.

Parameters
  • obs (numpy.ndarray) –

  • info (Optional[Dict[str, Any]]) –

Return type

numpy.ndarray

get_state(state=None)[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Parameters

state (Optional[numpy.ndarray]) –

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the NESEnv instance that the current class is wrapping.

Return type

gym.core.Env

process_info(info, **kwargs)[source]

Add additional data to the info dictionary.

Return type

Dict[str, Any]

process_reward(reward, info, **kwargs)[source]

Return a custom reward based on the x, y coordinates and level mario is in.

Return type

float

process_terminal(terminal, info, **kwargs)[source]

Return True if terminal or mario is dying.

Return type

bool

class plangym.videogames.nes.NesEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Environment for working with the NES-py emulator.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__del__()[source]

Tear down the environment.

close()[source]

Close the underlying gym.Env.

Return type

None

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)

Return type

numpy.ndarray

get_ram()[source]

Return a copy of the emulator's RAM as a numpy array.

Return type

numpy.ndarray

get_state(state=None)[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Parameters

state (Optional[numpy.ndarray]) –

Return type

numpy.ndarray

property nes_env: NESEnv

Access the underlying NESEnv.

Return type

NESEnv

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

Video games API

class plangym.videogames.env.VideogameEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Common interface for working with video games that run using an emulator.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Initialize a VideogameEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode – Integer or string indicating the game mode, when available.

  • difficulty – Difficulty level of the game, when available.

  • repeat_action_probability – Repeat the last action with this probability.

  • full_action_space – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_aray”}.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

apply_action(action)[source]

Evolve the environment for one time step applying the provided action.

begin_step(action=None, dt=None, state=None, return_state=None)[source]

Perform setup of step variables before starting step_with_dt.

Parameters

return_state (Optional[bool]) –

Return type

None

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.videogames.env.VideogameEnv

static get_lifes_from_info(info)[source]

Return the number of lives remaining in the current game.

Parameters

info (Dict[str, Any]) –

Return type

int

get_ram()[source]

Return the ram of the emulator as a numpy array.

Return type

numpy.ndarray

init_spaces()[source]

Initialize the action_space and the observation_space of the environment.

Return type

None

property n_actions: int

Return the number of actions available.

Return type

int

process_obs(obs, **kwargs)[source]

Return the RAM vector if obs_type == “ram”, or an image otherwise.

Control Tasks

DM Control

class plangym.control.dm_control.DMControlEnv(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]

Wrap the dm_control library, allowing its use in planning problems.

The dm_control library is a DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.

For more information about the environment, please refer to https://github.com/deepmind/dm_control

This class allows using dm_control environments in planning problems. It supports parallel and vectorized execution of the environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • visualize_reward (bool) –

  • obs_type (Optional[str]) –

__init__(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]

Initialize a DMControlEnv.

Parameters
  • name (str) – Name of the task. Provide the task to be solved as domain_name-task_name. For example ‘cartpole-balance’.

  • frameskip (int) – Set a deterministic frameskip to apply the same action N times.

  • episodic_life (bool) – Send a terminal signal after losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.

  • visualize_reward (bool) – Define the color of the agent, which depends on the reward on its last timestep.

  • domain_name – Same as in dm_control.suite.load.

  • task_name – Same as in dm_control.suite.load.

  • render_mode – None|human|rgb_array

  • obs_type (Optional[str]) –
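Example (a minimal sketch using the default cartpole-balance task; assumes dm_control is installed):

>>> from plangym.control.dm_control import DMControlEnv
>>> env = DMControlEnv(name="cartpole-balance", frameskip=2)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())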

action_spec()[source]

Alias for the environment’s action_spec.

apply_action(action)[source]

Transform the returned time_step object to a compatible gym tuple.

close()[source]

Tear down the environment and close rendering.

property domain_name: str

Return the name of the agent in the current simulation.

Return type

str

get_coords_obs(obs, **kwargs)[source]

Get the environment observation from a time_step object.

Parameters
  • obs – Time step object returned after stepping the environment.

  • **kwargs – Ignored

Returns

Numpy array containing the environment observation.

Return type

numpy.ndarray

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

numpy.ndarray

get_state()[source]

Return a tuple containing the three arrays that characterize the state of the system.

The tuple contains the position of the robot, its velocity, and the control variables currently being applied.

Returns

Tuple of numpy arrays containing all the information needed to describe the current state of the simulation.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the environment instance (dm_control) that the current class is wrapping.

property physics

Alias for gym_env.physics.

render(mode='human')[source]

Store all the RGB images rendered to be shown when the show_game function is called.

Parameters

mode – rgb_array returns an RGB image stored in a numpy array; human stores the rendered image in a viewer to be shown when show_game is called.

Returns

numpy.ndarray when mode == rgb_array. True when mode == human
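Example (a hedged sketch; rendering with mode="human" stores frames for show_game, which opens a viewer and is therefore commented out for non-interactive runs):

>>> from plangym.control.dm_control import DMControlEnv
>>> env = DMControlEnv(name="cartpole-balance")
>>> state, obs = env.reset()
>>> img = env.render(mode="rgb_array")
>>> # _ = env.render(mode="human"); env.show_game()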

set_state(state)[source]

Set the state of the simulator to the target State.

Parameters

state (numpy.ndarray) – numpy.ndarray containing the information about the state to be set.

Returns

None

Return type

None

setup()[source]

Initialize the target gym.Env instance.

show_game(sleep=0.05)[source]

Render the collected RGB images.

When the ‘human’ option is passed to the render method, a collection of RGB images is stored inside the self.viewer attribute. This method displays the collected images.

Parameters

sleep (float) –

property task_name: str

Return the name of the task in the current simulation.

Return type

str

Classic control

class plangym.control.classic_control.ClassicControl(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Environment for OpenAI gym classic control environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –
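Example (a minimal sketch, assuming the standard CartPole-v0 environment id):

>>> from plangym.control.classic_control import ClassicControl
>>> env = ClassicControl(name="CartPole-v0")
>>> state, obs = env.reset()
>>> env.set_state(state)
>>> obs, reward, end, info = env.step(env.sample_action())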

get_state()[source]

Recover the internal state of the environment.

Return type

numpy.ndarray

set_state(state)[source]

Set the internal state of the environment.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Box2D

class plangym.control.box_2d.Box2DEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Common interface for working with Box2D environments released by gym.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

numpy.array

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

class plangym.control.lunar_lander.LunarLander(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]

Fast LunarLander that follows the plangym API.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • deterministic (bool) –

  • continuous (bool) –

  • render_mode (Optional[str]) –

__init__(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]

Initialize a LunarLander.

Parameters
  • name (Optional[str]) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –

  • delay_setup (bool) –

  • deterministic (bool) –

  • continuous (bool) –

  • render_mode (Optional[str]) –
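Example (a minimal sketch; assumes gym's Box2D dependencies are installed):

>>> from plangym.control.lunar_lander import LunarLander
>>> env = LunarLander(deterministic=True, continuous=False)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())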

property continuous: bool

Return true if the LunarLander agent takes continuous actions as input.

Return type

bool

property deterministic: bool

Return true if the LunarLander simulation is deterministic.

Return type

bool

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the target gym.Env instance.

Return type

plangym.control.lunar_lander.FastGymLunarLander

process_terminal(terminal, obs=None, **kwargs)[source]

Return the terminal condition considering the lunar lander state.

Return type

bool

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

Vectorization

Multiprocessing

class plangym.vectorization.parallel.ParallelEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]

Allow any environment to be stepped in parallel when step_batch is called.

It creates a local instance of the target environment to call all other methods.

Example:

>>> from plangym.videogames import AtariEnv
>>> from plangym.vectorization.parallel import ParallelEnv
>>> env = ParallelEnv(env_class=AtariEnv,
...                           name="MsPacman-v0",
...                           clone_seeds=True,
...                           autoreset=True,
...                           blocking=False)
>>>
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.sample_action() for _ in range(10)]
>>>
>>> data =  env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

  • blocking (bool) –

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]

Initialize a ParallelEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • env_callable – Callable that returns an instance of the environment that will be parallelized.

  • n_workers (int) – Number of workers that will be used to step the env.

  • blocking (bool) – Step the environments synchronously.

  • *args – Additional args for the environment.

  • **kwargs – Additional kwargs for the environment.

property blocking: bool

If True the steps are performed sequentially.

Return type

bool

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

close()[source]

Close the environment and the spawned processes.

Return type

None

make_transitions(actions, states=None, dt=1, return_state=None)[source]

Vectorized version of the step method.

It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (numpy.ndarray) – Iterable containing the different actions to be applied.

  • states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.

  • dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos)

setup()[source]

Run environment initialization and create the subprocesses for stepping in parallel.

sync_states(state)[source]

Synchronize all the copies of the wrapped environment.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

Ray

class plangym.vectorization.ray.RayEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Use ray for taking steps in parallel when calling step_batch.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Initialize a ParallelEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • env_callable – Callable that returns an instance of the environment that will be parallelized.

  • n_workers (int) – Number of workers that will be used to step the env.

  • *args – Additional args for the environment.

  • **kwargs – Additional kwargs for the environment.
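Example (a hedged sketch mirroring the ParallelEnv example; assumes ray is installed and initialized, e.g. via ray.init()):

>>> import ray
>>> _ = ray.init(ignore_reinit_error=True)
>>> from plangym.videogames import AtariEnv
>>> from plangym.vectorization.ray import RayEnv
>>> env = RayEnv(env_class=AtariEnv, name="MsPacman-v0", n_workers=2)
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> new_states, observs, rewards, ends, infos = env.step_batch(states=states, actions=actions)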

make_transitions(actions, states=None, dt=1, return_state=None)[source]

Implement the logic for stepping the environment in parallel.

Parameters
  • dt (Union[numpy.ndarray, int]) –

  • return_state (Optional[bool]) –

reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) –

Return type

Union[numpy.ndarray, tuple]

setup()[source]

Run environment initialization and create the subprocesses for stepping in parallel.

sync_states(state)[source]

Synchronize all the copies of the wrapped environment.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

Return type

None

property workers: List[RemoteEnv]

Remote actors exposing copies of the environment.

Return type

List[RemoteEnv]

Vectorization API

class plangym.vectorization.env.VectorizedEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Base class that defines the API for working with vectorized environments.

A vectorized environment allows to step several copies of the environment in parallel when calling step_batch.

It creates a local copy of the environment that is the target of all the other methods of PlanEnv. In practice, a VectorizedEnv acts as a wrapper of an environment initialized with the provided parameters when calling __init__.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

__getattr__(item)[source]

Forward attributes to the wrapped environment.

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Initialize a VectorizedEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • n_workers (int) – Number of workers that will be used to step the env.

  • **kwargs – Additional keyword arguments passed to env_class.__init__.

property action_shape: Tuple[int]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int]

property action_space: gym.spaces.space.Space

Return the action_space of the environment.

Return type

gym.spaces.Space

classmethod batch_step_data(actions, states, dt, batch_size)[source]

Make batches of step data to distribute across workers.

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

create_env_callable(**kwargs)[source]

Return a callable that initializes the environment that is being vectorized.

Return type

Callable[[…], plangym.core.PlanEnv]

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)

Return type

numpy.ndarray

get_state()[source]

Recover the internal state of the simulation.

A state completely describes the Environment at a given moment.

Returns

State of the simulation.

property gym_env

Return the instance of the environment that is being wrapped by plangym.

make_transitions(actions, states, dt, return_state=None)[source]

Implement the logic for stepping the environment in parallel.

Parameters

return_state (Optional[bool]) –

property n_workers: int

Return the number of parallel processes that run step_batch in parallel.

Return type

int

property obs_shape: Tuple[int]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int]

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

property plan_env: plangym.core.PlanEnv

Environment that is wrapped by the current instance.

Return type

plangym.core.PlanEnv

render(mode='human')[source]

Render the environment using OpenGL. This wraps the OpenAI render method.

reset(return_state=True)[source]

Reset the environment and returns the first observation, or the first (state, obs) tuple.

Parameters

return_state (bool) – If True, also return the initial state of the env.

Returns

Observation of the environment if return_state is False. Otherwise, return (state, obs) after reset.

sample_action()[source]

Return a valid action that can be used to step the Environment.

Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state – Target state to be set in the environment.

setup()[source]

Initialize the target environment with the parameters provided at __init__.

Return type

None

static split_similar_chunks(vector, n_chunks)[source]

Split an indexable object into chunks of similar size.

Parameters
  • vector (Union[list, numpy.ndarray]) – Target indexable object to be split.

  • n_chunks (int) – Number of similar chunks.

Returns

Generator that returns the chunks created after splitting the target object.

Return type

Generator[Union[list, numpy.ndarray], None, None]
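Example (a hedged sketch of the helper; the exact chunk sizes depend on the splitting strategy, but no elements are lost):

>>> from plangym.vectorization.env import VectorizedEnv
>>> chunks = list(VectorizedEnv.split_similar_chunks(list(range(10)), n_chunks=3))
>>> sum(len(chunk) for chunk in chunks)
10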

step(action, state=None, dt=1, return_state=None)[source]

Step the environment applying a given action from an arbitrary state.

If state is not provided, the signature matches the step method from OpenAI gym.

Parameters
  • action (numpy.ndarray) – Array containing the action to be applied.

  • state (Optional[numpy.ndarray]) – State to be set before stepping the environment.

  • dt (int) – Consecutive number of times to apply the given action.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).

step_batch(actions, states=None, dt=1, return_state=None)[source]

Vectorized version of the step method.

It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (numpy.ndarray) – Iterable containing the different actions to be applied.

  • states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.

  • dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).

step_with_dt(action, dt=1)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

If state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info).

Return type

tuple

sync_states(state)[source]

Synchronize the workers’ states with the state of self.gym_env.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

static unpack_transitions(results, return_states)[source]

Aggregate the results of stepping across different workers.

Parameters
  • results (list) –

  • return_states (bool) –