Plangym API

class plangym.core.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]

Inherit from this class to adapt environments to different problems.

Base class that establishes all needed methods and blueprints to work with Gym environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • return_image (bool) –

__del__()[source]

Teardown the Environment when it is no longer needed.

__init__(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]

Initialize an Environment.

Parameters
  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later (delayed setups are necessary when the environment object needs to be serialized or when duplicated instances are required).

  • return_image (bool) – If True add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.
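Example (a minimal sketch, assuming the classic-control CartPole-v0 environment is available through plangym.make; the concrete names are illustrative):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0", frameskip=1, autoreset=True)
>>> state, obs = env.reset()
>>> obs, reward, terminal, info = env.step(env.sample_action())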

property action_shape: Tuple[int]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int]

apply_action(action, *, _ray_trace_ctx=None)[source]

Evolve the environment for one time step applying the provided action.

apply_reset(*, _ray_trace_ctx=None, **kwargs)[source]

Perform the resetting operation on the environment.

begin_step(action=None, dt=None, state=None, return_state=None, *, _ray_trace_ctx=None)[source]

Perform setup of step variables before starting step_with_dt.

Parameters

return_state (Optional[bool]) –

clone(*, _ray_trace_ctx=None, **kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

close(*, _ray_trace_ctx=None)[source]

Tear down the current environment.

Return type

None

get_image(*, _ray_trace_ctx=None)[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as grayscale images. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

Union[None, numpy.ndarray]

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

Any

get_step_tuple(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Prepare the tuple that step returns.

This is a post-processing step to have fine-grained control over what data the current step returns.

By default it handles:
  • Returning the state in the tuple (necessary information to save or load the game).

  • Adding the “rgb” key to the info dictionary containing an RGB representation of the environment.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the environment data after calling step.

property name: str

Return the name of the environment.

Return type

str

property obs_shape: Tuple[int]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int]

process_apply_action(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Perform any post-processing to the data returned by apply_action.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the processed data.

process_info(info, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the info dictionary returned by step.

Return type

Dict[str, Any]

process_obs(obs, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the observation returned by step.

process_reward(reward, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the reward returned by step.

Return type

float

process_step(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]

Prepare the returned info dictionary.

This is a post processing step to have fine-grained control over what data the info dictionary contains.

Parameters
  • obs – Observation of the environment.

  • reward – Reward signal.

  • terminal – Boolean indicating if the environment is finished.

  • info – Dictionary containing additional information about the environment.

Returns

Tuple containing the environment data after calling step.

process_terminal(terminal, *, _ray_trace_ctx=None, **kwargs)[source]

Perform optional computation for computing the terminal flag returned by step.

Return type

bool

reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) – If True, it will return the state of the environment.

Returns

(state, obs) if return_state is True, else obs.

Return type

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
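Example (a hedged sketch of the two return modes, assuming a CartPole-v0 environment is available through plangym.make):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0")
>>> state, obs = env.reset(return_state=True)
>>> obs = env.reset(return_state=False)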

property return_image: bool

Return the return_image flag.

If True add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.

Return type

bool

run_autoreset(step_data, *, _ray_trace_ctx=None)[source]

Reset the environment automatically if needed.

sample_action(*, _ray_trace_ctx=None)[source]

Return a valid action that can be used to step the Environment.

Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.

set_state(state)[source]

Set the internal state of the simulation. Overwrite current state by the given argument.

Parameters

state (Any) – Target state to be set in the environment.

Returns

None

Return type

None

setup()[source]

Run environment initialization.

Include in this function all the code that makes the environment impossible to serialize; this allows dispatching the environment to different workers and initializing it once it has been copied to the target process.

Return type

None

step(action, state=None, dt=1, return_state=None)[source]

Step the environment applying the supplied action.

Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

In addition, the method allows the user to prepare the returned object, adding additional information and custom pre-processings via self.process_step and self.get_step_tuple methods.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • state (Optional[numpy.ndarray]) – Set the environment to the given state before stepping it.

  • dt (int) – Consecutive number of times that the action will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info)

Return type

tuple
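Example (a hedged sketch, assuming a CartPole-v0 environment; because state is passed and return_state is None, the new state is included in the returned tuple, and dt=3 with frameskip=2 advances the simulation 6 steps):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0", frameskip=2)
>>> state, obs = env.reset()
>>> new_state, obs, reward, terminal, info = env.step(env.sample_action(), state=state, dt=3)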

step_batch(actions, states=None, dt=1, return_state=True)[source]

Allow stepping a vector of states and actions.

Vectorized version of the step method. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (Union[numpy.ndarray, Iterable[Union[numpy.ndarray, int]]]) – Iterable containing the different actions to be applied.

  • states (Optional[Union[numpy.ndarray, Iterable]]) – Iterable containing the different states to be set.

  • dt (Union[int, numpy.ndarray]) – int or array containing the number of consecutive steps that will be applied to each state. If it is an array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for every instance).

  • return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if states was passed as a parameter.

Returns

If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.

Return type

Tuple[Union[list, numpy.ndarray], …]
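Example (a hedged sketch of step_batch on a non-vectorized environment, assuming CartPole-v0; vectorized subclasses expose the same interface):

>>> import plangym
>>> env = plangym.make(name="CartPole-v0")
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> new_states, observs, rewards, ends, infos = env.step_batch(states=states, actions=actions)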

step_with_dt(action, dt=1, *, _ray_trace_ctx=None)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

The method performs any post-processing to the data after applying the action to the environment via self.process_apply_action.

This method neither computes nor returns any state.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

Tuple containing (observs, reward, terminal, info).

property unwrapped: plangym.core.PlanEnv

Completely unwrap this Environment.

Returns

The base non-wrapped plangym.Environment instance

Return type

plangym.Environment

class plangym.core.PlangymEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Base class for implementing OpenAI gym environments in plangym.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –

__init__(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Initialize a PlangymEnv.

The user can read all private methods as instance properties.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each dt. Common argument to all environments.

  • autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit – If True, remove the time limit from the environment.

  • render_mode (Optional[str]) –
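Example (a hedged sketch of passing wrappers, assuming gym’s TimeLimit wrapper and that the keyword arguments in the (gym.Wrapper, kwargs) tuple are forwarded to the wrapper constructor as described above):

>>> import plangym
>>> from gym.wrappers import TimeLimit
>>> env = plangym.make(name="CartPole-v0", wrappers=[(TimeLimit, {"max_episode_steps": 50})])
>>> state, obs = env.reset()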

__repr__()[source]

Pretty print the environment.

__str__()[source]

Pretty print the environment.

property action_shape: Tuple[int, ...]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int, ...]

property action_space: gym.spaces.space.Space

Return the action_space of the environment.

Return type

gym.spaces.Space

apply_action(action)[source]

Evolve the environment for one time step applying the provided action.

Accumulate rewards and calculate terminal flag after stepping the environment.

apply_reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) – If True it will return the state of the environment.

Returns

(state, obs) if return_state is True, else obs.

Return type

Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

apply_wrappers(wrappers)[source]

Wrap the underlying OpenAI gym environment.

Parameters

wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlangymEnv

close()[source]

Close the underlying gym.Env.

get_coords_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “coords”.

get_grayscale_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “grayscale”.

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

numpy.ndarray

get_rgb_obs(obs, **kwargs)[source]

Calculate the observation returned by step when obs_type == “rgb”.

property gym_env

Return the instance of the environment that is being wrapped by plangym.

init_gym_env()[source]

Initialize the gym.Env instance that the current class is wrapping.

Return type

gym.core.Env

init_spaces()[source]

Initialize the action_space and observation_space of the environment.

property metadata

Return the metadata of the environment.

property obs_shape: Tuple[int, ...]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int, ...]

property obs_type: str

Return the type of observation returned by the environment.

Return type

str

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

process_obs(obs, **kwargs)[source]

Perform optional computation for computing the observation returned by step.

This is a post processing step to have fine-grained control over the returned observation.

property remove_time_limit: bool

Return True if the Environment can only be stepped for a limited number of times.

Return type

bool

render(mode='human')[source]

Render the environment using OpenGL. This wraps the OpenAI render method.

property render_mode: Union[None, str]

Return how the game will be rendered. Values: None | human | rgb_array.

Return type

Union[None, str]

property reward_range

Return the reward_range of the environment.

sample_action()[source]

Return a valid action that can be used to step the environment chosen at random.

Return type

Union[int, numpy.ndarray]

seed(seed=None)[source]

Seed the underlying gym.Env.

setup()[source]

Initialize the target gym.Env instance.

The method calls self.init_gym_env to initialize the gym.Env instance. It removes time limits if needed and applies wrappers introduced by the user.

wrap(wrapper, *args, **kwargs)[source]

Apply a single OpenAI gym wrapper to the environment.

Parameters

wrapper (Callable) –
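Example (a hedged sketch; assumes the environment is already set up and that wrap forwards the extra keyword arguments to the wrapper constructor):

>>> import plangym
>>> from gym.wrappers import TimeLimit
>>> env = plangym.make(name="CartPole-v0")
>>> env.wrap(TimeLimit, max_episode_steps=50)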

Videogames

Atari 2600

class plangym.videogames.atari.AtariEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]

Create an environment to play OpenAI gym Atari games, using the Arcade Learning Environment (ALE) as the emulator.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode (int) – Integer or string indicating the game mode, when available.

  • difficulty (int) – Difficulty level of the game, when available.

  • repeat_action_probability (float) – Repeat the last action with this probability.

  • full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.

  • possible_to_win (bool) – Whether it is possible to finish the game by reaching a terminal state that is not caused by going out of bounds or by losing a life.

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • array_state (bool) – Whether to return the state of the environment as a numpy array.

  • clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.

Example:

>>> import plangym
>>> env = plangym.make(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.action_space.sample() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]

Initialize an AtariEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode (int) – Integer or string indicating the game mode, when available.

  • difficulty (int) – Difficulty level of the game, when available.

  • repeat_action_probability (float) – Repeat the last action with this probability.

  • full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.

  • possible_to_win (bool) – Whether it is possible to finish the game by reaching a terminal state that is not caused by going out of bounds or by losing a life.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • array_state (bool) – Whether to return the state of the environment as a numpy array.

  • clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> type(env.gym_env)
<class 'gym.envs.atari.environment.AtariEnv'>
>>> state, obs = env.reset()
>>> type(state)
<class 'numpy.ndarray'>
property ale

Return the ale interface of the underlying gym.Env.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> type(env.ale)
<class 'ale_py._ale_py.ALEInterface'>
clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.videogames.env.VideogameEnv

property difficulty: int

Return the selected difficulty for the current environment.

Return type

int

property full_action_space: bool

If True, the action space corresponds to all possible actions in the Atari emulator.

Return type

bool

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Image is a three-dimensional array interpreted as an RGB image with channels (Height, Width, RGB). Ignores wrappers as it loads the screen directly from the emulator.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> img = env.get_image()
>>> img.shape
(210, 160, 3)
Return type

numpy.ndarray

get_lifes_from_info(info)[source]

Return the number of lives remaining in the current game.

Parameters

info (Dict[str, Any]) –

Return type

int

get_ram()[source]

Return a numpy array containing the content of the emulator’s RAM.

The RAM is returned as a one-dimensional array representing the memory of the emulator.

Example:

>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="grayscale")
>>> ram = env.get_ram()
>>> ram.shape, ram.dtype
((128,), dtype('uint8'))
Return type

numpy.ndarray

get_state()[source]

Recover the internal state of the simulation.

If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.

Example:

>>> env = AtariEnv(name="Qbert-v0")
>>> env.get_state() 
array([<ale_py._ale_py.ALEState object at 0x...>, None],
      dtype=object)

>>> env = AtariEnv(name="Qbert-v0", array_state=False)
>>> env.get_state() 
<ale_py._ale_py.ALEState object at 0x...>
Return type

numpy.ndarray

init_gym_env()[source]

Initialize the gym.Env instance that the Environment is wrapping.

Return type

gym.core.Env

property mode: int

Return the selected game mode for the current environment.

Return type

int

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

property repeat_action_probability: float

Probability of repeating the previous action instead of the one supplied (sticky actions).

Return type

float

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Return type

None

Example:

>>> env = AtariEnv(name="Qbert-v0")
>>> state, obs = env.reset()
>>> new_state, obs, reward, end, info = env.step(env.sample_action(), state=state)
>>> assert not (state == new_state).all()
>>> env.set_state(state)
>>> (state == env.get_state()).all()
True
step_with_dt(action, dt=1)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).

Example:

>>> env = AtariEnv(name="Pong-v0")
>>> obs = env.reset(return_state=False)
>>> obs, reward, end, info = env.step_with_dt(env.sample_action(), dt=7)
>>> assert not end
class plangym.videogames.montezuma.MontezumaEnv(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]

Plangym implementation of the MontezumaEnv environment optimized for planning.

Parameters
  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • mode (int) –

  • difficulty (int) –

  • repeat_action_probability (float) –

  • full_action_space (bool) –

  • render_mode (Optional[str]) –

  • possible_to_win (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • array_state (bool) –

  • clone_seeds (bool) –

__init__(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]

Initialize a MontezumaEnv.

Parameters
  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • mode (int) –

  • difficulty (int) –

  • repeat_action_probability (float) –

  • full_action_space (bool) –

  • render_mode (Optional[str]) –

  • possible_to_win (bool) –

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –

  • array_state (bool) –

  • clone_seeds (bool) –
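Example (a minimal sketch; assumes the Atari dependencies used by AtariEnv are installed and uses the default PlanMontezuma-v0 name):

>>> from plangym.videogames.montezuma import MontezumaEnv
>>> env = MontezumaEnv(obs_type="rgb")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())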

get_state()[source]

Recover the internal state of the simulation.

If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the gym.Env instance that the current class is wrapping.

Return type

plangym.videogames.montezuma.CustomMontezuma

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Gym retro

class plangym.videogames.retro.RetroEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Environment for playing gym-retro games.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__getattr__(item)[source]

Forward getattr to self.gym_env.

__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Initialize a RetroEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_aray”}.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
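Example (a hedged sketch; assumes gym-retro is installed together with the freely distributable Airstriker-Genesis ROM that ships with it):

>>> from plangym.videogames.retro import RetroEnv
>>> env = RetroEnv(name="Airstriker-Genesis", frameskip=5)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())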

clone(**kwargs)[source]

Return a copy of the environment with its initialization delayed.

Return type

plangym.videogames.retro.RetroEnv

close()[source]

Close the underlying gym.Env.

get_ram()[source]

Return the ram of the emulator as a numpy array.

Return type

numpy.ndarray

get_state()[source]

Get the state of the retro environment.

Return type

numpy.ndarray

static get_win_condition(info)[source]

Get win condition for games that have the end of the screen available.

Parameters

info (Dict[str, Any]) –

Return type

bool

init_gym_env()[source]

Initialize the retro environment.

Return type

gym.core.Env

set_state(state)[source]

Set the state of the retro environment.

Parameters

state (numpy.ndarray) –

Super Mario (NES)

class plangym.videogames.nes.MarioEnv(name, movement_type='simple', original_reward=False, **kwargs)[source]

Interface for using gym-super-mario-bros in plangym.

Parameters
  • name (str) –

  • movement_type (str) –

  • original_reward (bool) –

__init__(name, movement_type='simple', original_reward=False, **kwargs)[source]

Initialize a MarioEnv.

Parameters
  • name (str) – Name of the environment.

  • movement_type (str) – One of {complex|simple|right}

  • original_reward (bool) – If False return a custom reward based on mario position and level.

  • **kwargs – passed to super().__init__.
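Example (a hedged sketch; assumes gym-super-mario-bros is installed and uses its standard SuperMarioBros-v0 name):

>>> from plangym.videogames.nes import MarioEnv
>>> env = MarioEnv(name="SuperMarioBros-v0", movement_type="right")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())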

get_coords_obs(obs, info=None, **kwargs)[source]

Return the information contained in info as an observation if obs_type == “info”.

Parameters
  • obs (numpy.ndarray) –

  • info (Optional[Dict[str, Any]]) –

Return type

numpy.ndarray

get_state(state=None)[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Parameters

state (Optional[numpy.ndarray]) –

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the NESEnv instance that the current class is wrapping.

Return type

gym.core.Env

process_info(info, **kwargs)[source]

Add additional data to the info dictionary.

Return type

Dict[str, Any]

process_reward(reward, info, **kwargs)[source]

Return a custom reward based on the x, y coordinates and level mario is in.

Return type

float

process_terminal(terminal, info, **kwargs)[source]

Return True if terminal or mario is dying.

Return type

bool

class plangym.videogames.nes.NesEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Environment for working with the NES-py emulator.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__del__()[source]

Tear down the environment.

close()[source]

Close the underlying gym.Env.

Return type

None

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)

Return type

numpy.ndarray

get_ram()[source]

Return a copy of the emulator's RAM as a numpy array.

Return type

numpy.ndarray

get_state(state=None)[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Parameters

state (Optional[numpy.ndarray]) –

Return type

numpy.ndarray

property nes_env: NESEnv

Access the underlying NESEnv.

Return type

NESEnv

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

Video games API

class plangym.videogames.env.VideogameEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Common interface for working with video games that run using an emulator.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • remove_time_limit (bool) –

  • obs_type (str) –

  • render_mode (Optional[str]) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

__init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]

Initialize a VideogameEnv.

Parameters
  • name (str) – Name of the environment. Follows standard gym syntax conventions.

  • frameskip (int) – Number of times an action will be applied for each step in dt.

  • episodic_life (bool) – Return end = True when losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • remove_time_limit (bool) – If True, remove the time limit from the environment.

  • obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.

  • mode – Integer or string indicating the game mode, when available.

  • difficulty – Difficulty level of the game, when available.

  • repeat_action_probability – Repeat the last action with this probability.

  • full_action_space – Whether to use the full range of possible actions or only those available in the game.

  • render_mode (Optional[str]) – One of {None, “human”, “rgb_aray”}.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

apply_action(action)[source]

Evolve the environment for one time step applying the provided action.

begin_step(action=None, dt=None, state=None, return_state=None)[source]

Perform setup of step variables before starting step_with_dt.

Parameters

return_state (Optional[bool]) –

Return type

None

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.videogames.env.VideogameEnv

static get_lifes_from_info(info)[source]

Return the number of lives remaining in the current game.

Parameters

info (Dict[str, Any]) –

Return type

int

get_ram()[source]

Return the ram of the emulator as a numpy array.

Return type

numpy.ndarray

init_spaces()[source]

Initialize the action_space and the observation_space of the environment.

Return type

None

property n_actions: int

Return the number of actions available.

Return type

int

process_obs(obs, **kwargs)[source]

Return the RAM vector if obs_type == “ram”, or an image otherwise.

Control Tasks

DM Control

class plangym.control.dm_control.DMControlEnv(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]

Wrap the dm_control library, allowing its use in planning problems.

The dm_control library is a DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.

For more information about the environment, please refer to https://github.com/deepmind/dm_control

This class allows using dm_control environments in planning problems. It supports parallel and vectorized execution of the environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • visualize_reward (bool) –

  • obs_type (Optional[str]) –

__init__(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]

Initialize a DMControlEnv.

Parameters
  • name (str) – Name of the task. Provide the task to be solved as domain_name-task_name. For example ‘cartpole-balance’.

  • frameskip (int) – Set a deterministic frameskip to apply the same action N times.

  • episodic_life (bool) – Send a terminal signal after losing a life.

  • autoreset (bool) – Restart environment when reaching a terminal state.

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).

  • delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.

  • visualize_reward (bool) – Define the color of the agent, which depends on the reward on its last timestep.

  • domain_name – Same as in dm_control.suite.load.

  • task_name – Same as in dm_control.suite.load.

  • render_mode – None|human|rgb_array

  • obs_type (Optional[str]) –
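Example (a minimal sketch using the default cartpole-balance task; assumes dm_control is installed):

>>> from plangym.control.dm_control import DMControlEnv
>>> env = DMControlEnv(name="cartpole-balance", frameskip=2)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())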

action_spec()[source]

Alias for the environment’s action_spec.

apply_action(action)[source]

Transform the returned time_step object to a compatible gym tuple.

close()[source]

Tear down the environment and close rendering.

property domain_name: str

Return the name of the agent in the current simulation.

Return type

str

get_coords_obs(obs, **kwargs)[source]

Get the environment observation from a time_step object.

Parameters
  • obs – Time step object returned after stepping the environment.

  • **kwargs – Ignored

Returns

Numpy array containing the environment observation.

Return type

numpy.ndarray

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).

Return type

numpy.ndarray

get_state()[source]

Return a tuple containing the three arrays that characterize the state of the system.

The tuple contains the position of the robot, its velocity, and the control variables currently being applied.

Returns

Tuple of numpy arrays containing all the information needed to describe the current state of the simulation.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the environment instance (dm_control) that the current class is wrapping.

property physics

Alias for gym_env.physics.

render(mode='human')[source]

Store all the RGB images rendered to be shown when the show_game function is called.

Parameters

mode – rgb_array returns an RGB image stored in a numpy array; human stores the rendered image in a viewer to be shown when show_game is called.

Returns

numpy.ndarray when mode == rgb_array. True when mode == human
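Example (a hedged sketch; rendering with mode="human" stores frames for show_game, which opens a viewer and is therefore commented out for non-interactive runs):

>>> from plangym.control.dm_control import DMControlEnv
>>> env = DMControlEnv(name="cartpole-balance")
>>> state, obs = env.reset()
>>> img = env.render(mode="rgb_array")
>>> # _ = env.render(mode="human"); env.show_game()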

set_state(state)[source]

Set the state of the simulator to the target State.

Parameters

state (numpy.ndarray) – numpy.ndarray containing the information about the state to be set.

Returns

None

Return type

None

setup()[source]

Initialize the target gym.Env instance.

show_game(sleep=0.05)[source]

Render the collected RGB images.

When the ‘human’ option is passed to the render method, a collection of RGB images is stored inside the self.viewer attribute. This method displays the collected images.

Parameters

sleep (float) –

property task_name: str

Return the name of the task in the current simulation.

Return type

str

Classic control

class plangym.control.classic_control.ClassicControl(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Environment for OpenAI gym classic control environments.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –
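Example (a minimal sketch, assuming the standard CartPole-v0 environment id):

>>> from plangym.control.classic_control import ClassicControl
>>> env = ClassicControl(name="CartPole-v0")
>>> state, obs = env.reset()
>>> env.set_state(state)
>>> obs, reward, end, info = env.step(env.sample_action())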

get_state()[source]

Recover the internal state of the environment.

Return type

numpy.ndarray

set_state(state)[source]

Set the internal state of the environment.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Box2D

class plangym.control.box_2d.Box2DEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]

Common interface for working with Box2D environments released by gym.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • render_mode (Optional[str]) –

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

numpy.array

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

class plangym.control.lunar_lander.LunarLander(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]

Fast LunarLander that follows the plangym API.

Parameters
  • name (str) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –

  • delay_setup (bool) –

  • deterministic (bool) –

  • continuous (bool) –

  • render_mode (Optional[str]) –

__init__(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]

Initialize a LunarLander.

Parameters
  • name (Optional[str]) –

  • frameskip (int) –

  • episodic_life (bool) –

  • autoreset (bool) –

  • wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –

  • delay_setup (bool) –

  • deterministic (bool) –

  • continuous (bool) –

  • render_mode (Optional[str]) –
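Example (a minimal sketch; assumes gym's Box2D dependencies are installed):

>>> from plangym.control.lunar_lander import LunarLander
>>> env = LunarLander(deterministic=True, continuous=False)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())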

property continuous: bool

Return true if the LunarLander agent takes continuous actions as input.

Return type

bool

property deterministic: bool

Return true if the LunarLander simulation is deterministic.

Return type

bool

get_state()[source]

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type

numpy.ndarray

init_gym_env()[source]

Initialize the target gym.Env instance.

Return type

plangym.control.lunar_lander.FastGymLunarLander

process_terminal(terminal, obs=None, **kwargs)[source]

Return the terminal condition considering the lunar lander state.

Return type

bool

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state (numpy.ndarray) – Target state to be set in the environment.

Returns

None

Return type

None

Vectorization

Multiprocessing

class plangym.vectorization.parallel.ParallelEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]

Allow any environment to be stepped in parallel when step_batch is called.

It creates a local instance of the target environment to call all other methods.

Example:

>>> from plangym.videogames import AtariEnv
>>> from plangym.vectorization.parallel import ParallelEnv
>>> env = ParallelEnv(env_class=AtariEnv,
...                           name="MsPacman-v0",
...                           clone_seeds=True,
...                           autoreset=True,
...                           blocking=False)
>>>
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.sample_action() for _ in range(10)]
>>>
>>> data =  env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

  • blocking (bool) –

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]

Initialize a ParallelEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • env_callable – Callable that returns an instance of the environment that will be parallelized.

  • n_workers (int) – Number of workers that will be used to step the env.

  • blocking (bool) – Step the environments synchronously.

  • *args – Additional args for the environment.

  • **kwargs – Additional kwargs for the environment.

property blocking: bool

If True the steps are performed sequentially.

Return type

bool

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

close()[source]

Close the environment and the spawned processes.

Return type

None

make_transitions(actions, states=None, dt=1, return_state=None)[source]

Vectorized version of the step method.

It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (numpy.ndarray) – Iterable containing the different actions to be applied.

  • states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.

  • dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos)

setup()[source]

Run environment initialization and create the subprocesses for stepping in parallel.

sync_states(state)[source]

Synchronize all the copies of the wrapped environment.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

Ray

class plangym.vectorization.ray.RayEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Use ray for taking steps in parallel when calling step_batch.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Initialize a ParallelEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • env_callable – Callable that returns an instance of the environment that will be parallelized.

  • n_workers (int) – Number of workers that will be used to step the env.

  • *args – Additional args for the environment.

  • **kwargs – Additional kwargs for the environment.
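Example (a hedged sketch mirroring the ParallelEnv example; assumes ray is installed and initialized, e.g. via ray.init()):

>>> import ray
>>> _ = ray.init(ignore_reinit_error=True)
>>> from plangym.videogames import AtariEnv
>>> from plangym.vectorization.ray import RayEnv
>>> env = RayEnv(env_class=AtariEnv, name="MsPacman-v0", n_workers=2)
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> new_states, observs, rewards, ends, infos = env.step_batch(states=states, actions=actions)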

make_transitions(actions, states=None, dt=1, return_state=None)[source]

Implement the logic for stepping the environment in parallel.

Parameters
  • dt (Union[numpy.ndarray, int]) –

  • return_state (Optional[bool]) –

reset(return_state=True)[source]

Restart the environment.

Parameters

return_state (bool) –

Return type

Union[numpy.ndarray, tuple]

setup()[source]

Run environment initialization and create the subprocesses for stepping in parallel.

sync_states(state)[source]

Synchronize all the copies of the wrapped environment.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

Return type

None

property workers: List[RemoteEnv]

Remote actors exposing copies of the environment.

Return type

List[RemoteEnv]

Vectorization API

class plangym.vectorization.env.VectorizedEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Base class that defines the API for working with vectorized environments.

A vectorized environment allows to step several copies of the environment in parallel when calling step_batch.

It creates a local copy of the environment that is the target of all the other methods of PlanEnv. In practice, a VectorizedEnv acts as a wrapper of an environment initialized with the provided parameters when calling __init__.

Parameters
  • name (str) –

  • frameskip (int) –

  • autoreset (bool) –

  • delay_setup (bool) –

  • n_workers (int) –

__getattr__(item)[source]

Forward attributes to the wrapped environment.

__init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]

Initialize a VectorizedEnv.

Parameters
  • env_class – Class of the environment to be wrapped.

  • name (str) – Name of the environment.

  • frameskip (int) – Number of times step will be called with the same action.

  • autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.

  • delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.

  • n_workers (int) – Number of workers that will be used to step the env.

  • **kwargs – Additional keyword arguments passed to env_class.__init__.

property action_shape: Tuple[int]

Tuple containing the shape of the actions applied to the Environment.

Return type

Tuple[int]

property action_space: gym.spaces.space.Space

Return the action_space of the environment.

Return type

gym.spaces.Space

classmethod batch_step_data(actions, states, dt, batch_size)[source]

Make batches of step data to distribute across workers.

clone(**kwargs)[source]

Return a copy of the environment.

Return type

plangym.core.PlanEnv

create_env_callable(**kwargs)[source]

Return a callable that initializes the environment that is being vectorized.

Return type

Callable[[…], plangym.core.PlanEnv]

get_image()[source]

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)

Return type

numpy.ndarray

get_state()[source]

Recover the internal state of the simulation.

A state completely describes the Environment at a given moment.

Returns

State of the simulation.

property gym_env

Return the instance of the environment that is being wrapped by plangym.

make_transitions(actions, states, dt, return_state=None)[source]

Implement the logic for stepping the environment in parallel.

Parameters

return_state (Optional[bool]) –

property n_workers: int

Return the number of parallel processes that run step_batch in parallel.

Return type

int

property obs_shape: Tuple[int]

Tuple containing the shape of the observations returned by the Environment.

Return type

Tuple[int]

property observation_space: gym.spaces.space.Space

Return the observation_space of the environment.

Return type

gym.spaces.Space

property plan_env: plangym.core.PlanEnv

Environment that is wrapped by the current instance.

Return type

plangym.core.PlanEnv

render(mode='human')[source]

Render the environment using OpenGL. This wraps the OpenAI render method.

reset(return_state=True)[source]

Reset the environment and returns the first observation, or the first (state, obs) tuple.

Parameters

return_state (bool) – If True, also return the initial state of the env.

Returns

Observation of the environment if return_state is False. Otherwise, return (state, obs) after reset.

sample_action()[source]

Return a valid action that can be used to step the Environment.

Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.

set_state(state)[source]

Set the internal state of the simulation.

Parameters

state – Target state to be set in the environment.

setup()[source]

Initialize the target environment with the parameters provided at __init__.

Return type

None

static split_similar_chunks(vector, n_chunks)[source]

Split an indexable object into chunks of similar size.

Parameters
  • vector (Union[list, numpy.ndarray]) – Target indexable object to be split.

  • n_chunks (int) – Number of similar chunks.

Returns

Generator that returns the chunks created after splitting the target object.

Return type

Generator[Union[list, numpy.ndarray], None, None]
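Example (a hedged sketch of the helper; the exact chunk sizes depend on the splitting strategy, but no elements are lost):

>>> from plangym.vectorization.env import VectorizedEnv
>>> chunks = list(VectorizedEnv.split_similar_chunks(list(range(10)), n_chunks=3))
>>> sum(len(chunk) for chunk in chunks)
10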

step(action, state=None, dt=1, return_state=None)[source]

Step the environment applying a given action from an arbitrary state.

If state is not provided, the signature matches the step method from OpenAI gym.

Parameters
  • action (numpy.ndarray) – Array containing the action to be applied.

  • state (Optional[numpy.ndarray]) – State to be set before stepping the environment.

  • dt (int) – Consecutive number of times to apply the given action.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).

step_batch(actions, states=None, dt=1, return_state=None)[source]

Vectorized version of the step method.

It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes lists of states, actions and dts as input.

Parameters
  • actions (numpy.ndarray) – Iterable containing the different actions to be applied.

  • states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.

  • dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.

  • return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns

if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).

step_with_dt(action, dt=1)[source]

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

Parameters
  • action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.

  • dt (int) – Consecutive number of times that the action will be applied.

Returns

If state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info).

Return type

tuple

sync_states(state)[source]

Synchronize the workers’ states with the state of self.gym_env.

Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.

Parameters

state (None) –

static unpack_transitions(results, return_states)[source]

Aggregate the results of stepping across different workers.

Parameters
  • results (list) –

  • return_states (bool) –