Plangym API
- class plangym.core.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]
Inherit from this class to adapt environments to different problems.
Base class that establishes all needed methods and blueprints to work with Gym environments.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
delay_setup (bool) –
return_image (bool) –
- __init__(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]
Initialize a PlanEnv.
- Parameters
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later (delayed setups are necessary when the environment object needs to be serialized, or when duplicated instances are required).
return_image (bool) – If True, add an “rgb” key to the info dictionary returned by step that contains an RGB representation of the environment state.
- property action_shape: Tuple[int]
Tuple containing the shape of the actions applied to the Environment.
- Return type
Tuple[int]
- apply_action(action, *, _ray_trace_ctx=None)[source]
Evolve the environment for one time step applying the provided action.
- apply_reset(*, _ray_trace_ctx=None, **kwargs)[source]
Perform the resetting operation on the environment.
- begin_step(action=None, dt=None, state=None, return_state=None, *, _ray_trace_ctx=None)[source]
Perform setup of step variables before starting step_with_dt.
- Parameters
return_state (Optional[bool]) –
- get_image(*, _ray_trace_ctx=None)[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type
Union[None, numpy.ndarray]
- get_state()[source]
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Return type
Any
- get_step_tuple(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]
Prepare the tuple that step returns.
This is a post-processing step that gives fine-grained control over the data the current step returns.
- By default it determines:
Whether to return the state in the tuple (the information necessary to save or load the game).
Whether to add the “rgb” key to the info dictionary, containing an RGB representation of the environment.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the environment data after calling step.
- property name: str
Return the name of the environment.
- Return type
str
- property obs_shape: Tuple[int]
Tuple containing the shape of the observations returned by the Environment.
- Return type
Tuple[int]
- process_apply_action(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]
Perform any post-processing to the data returned by apply_action.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the processed data.
- process_info(info, *, _ray_trace_ctx=None, **kwargs)[source]
Perform optional computation for computing the info dictionary returned by step.
- Return type
Dict[str, Any]
- process_obs(obs, *, _ray_trace_ctx=None, **kwargs)[source]
Perform optional computation for computing the observation returned by step.
- process_reward(reward, *, _ray_trace_ctx=None, **kwargs)[source]
Perform optional computation for computing the reward returned by step.
- Return type
float
- process_step(obs, reward, terminal, info, *, _ray_trace_ctx=None)[source]
Prepare the returned info dictionary.
This is a post-processing step to have fine-grained control over what data the info dictionary contains.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the environment data after calling step.
- process_terminal(terminal, *, _ray_trace_ctx=None, **kwargs)[source]
Perform optional computation for computing the terminal flag returned by step.
- Return type
bool
- reset(return_state=True)[source]
Restart the environment.
- Parameters
return_state (bool) – If True, it will return the state of the environment.
- Returns
(state, obs) if return_state is True, else obs.
- Return type
Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
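A minimal usage sketch (environment name illustrative; assumes plangym.make is available, as in the examples further down):
Example:
>>> env = plangym.make(name="Pong-v0")
>>> state, obs = env.reset()             # (state, obs) tuple by default
>>> obs = env.reset(return_state=False)  # observation only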
- property return_image: bool
Return the return_image flag.
If True, an “rgb” key containing an RGB representation of the environment state is added to the info dictionary returned by step.
- Return type
bool
- run_autoreset(step_data, *, _ray_trace_ctx=None)[source]
Reset the environment automatically if needed.
- sample_action(*, _ray_trace_ctx=None)[source]
Return a valid action that can be used to step the Environment.
Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.
- set_state(state)[source]
Set the internal state of the simulation, overwriting the current state with the given argument.
- Parameters
state (Any) – Target state to be set in the environment.
- Returns
None
- Return type
None
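Together with get_state, this allows saving and restoring the simulation. A sketch (assuming an environment that supports state cloning):
Example:
>>> env = plangym.make(name="Pong-v0")
>>> state, obs = env.reset()
>>> saved = env.get_state()            # snapshot the simulation
>>> _ = env.step(env.sample_action())  # advance the environment
>>> env.set_state(saved)               # rewind to the snapshot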
- setup()[source]
Run environment initialization.
Include in this function all the code that makes the environment impossible to serialize; this makes it possible to dispatch the environment to different workers and initialize it once it has been copied to the target process.
- Return type
None
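A sketch of the delayed-setup workflow this enables (assuming plangym.make forwards delay_setup to the environment):
Example:
>>> env = plangym.make(name="Pong-v0", delay_setup=True)  # no emulator initialized yet
>>> # ... serialize the environment or send it to a worker process ...
>>> env.setup()                                           # initialize in the target process
>>> state, obs = env.reset()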
- step(action, state=None, dt=1, return_state=None)[source]
Step the environment applying the supplied action.
Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).
Take dt simulation steps and make the environment evolve in multiples of self.frameskip, for a total of dt * self.frameskip steps.
In addition, the method allows the user to prepare the returned object, adding additional information and custom processing via the self.process_step and self.get_step_tuple methods.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
state (Optional[numpy.ndarray]) – Set the environment to the given state before stepping it.
dt (int) – Consecutive number of times that the action will be applied.
return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
(observs, reward, terminal, info) if state is None, else (new_state, observs, reward, terminal, info).
- Return type
tuple
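A minimal sketch (environment name illustrative; the 5-tuple is returned because state is passed):
Example:
>>> env = plangym.make(name="Pong-v0")
>>> state, obs = env.reset()
>>> new_state, obs, reward, terminal, info = env.step(env.sample_action(), state=state, dt=4)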
- step_batch(actions, states=None, dt=1, return_state=True)[source]
Allow stepping a vector of states and actions.
Vectorized version of the step method. The signature and behaviour are the same as step, but it takes a list of states, actions and dts as input.
- Parameters
actions (Union[numpy.ndarray, Iterable[Union[numpy.ndarray, int]]]) – Iterable containing the different actions to be applied.
states (Optional[Union[numpy.ndarray, Iterable]]) – Iterable containing the different states to be set.
dt (Union[int, numpy.ndarray]) – int or array containing the consecutive number of times each action will be applied to each state. If an array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for every instance).
return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.
- Return type
Tuple[Union[list, numpy.ndarray], …]
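A sketch of a per-environment dt vector (values distributed among the batched steps as described above):
Example:
>>> import numpy
>>> env = plangym.make(name="Pong-v0")
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> dt = numpy.array([1, 2, 3, 4])
>>> new_states, observs, rewards, ends, infos = env.step_batch(actions, states=states, dt=dt)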
- step_with_dt(action, dt=1, *, _ray_trace_ctx=None)[source]
Take dt simulation steps and make the environment evolve in multiples of self.frameskip, for a total of dt * self.frameskip steps.
The method performs any post-processing of the data after applying the action to the environment via self.process_apply_action.
This method neither computes nor returns any state.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns
Tuple containing
(observs, reward, terminal, info)
.
- property unwrapped: plangym.core.PlanEnv
Completely unwrap this Environment.
- Returns
The base non-wrapped plangym.core.PlanEnv instance.
- Return type
plangym.core.PlanEnv
- class plangym.core.PlangymEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]
Base class for implementing OpenAI gym environments in plangym.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
delay_setup (bool) –
render_mode (Optional[str]) –
- __init__(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]
Initialize a PlangymEnv.
The user can read all private methods as instance properties.
- Parameters
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each dt. Common argument to all environments.
autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit – If True, remove the time limit from the environment.
render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.
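A sketch of the wrappers argument (assuming each wrapper is applied as wrapper(env, **kwargs); ClassicControl is a PlangymEnv subclass documented below):
Example:
>>> from gym.wrappers import TimeLimit
>>> from plangym.control.classic_control import ClassicControl
>>> env = ClassicControl(
...     name="CartPole-v0",
...     wrappers=[(TimeLimit, {"max_episode_steps": 100})],
... )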
- property action_shape: Tuple[int, ...]
Tuple containing the shape of the actions applied to the Environment.
- Return type
Tuple[int, ...]
- property action_space: gym.spaces.space.Space
Return the action_space of the environment.
- Return type
gym.spaces.Space
- apply_action(action)[source]
Evolve the environment for one time step applying the provided action.
Accumulate rewards and calculate terminal flag after stepping the environment.
- apply_reset(return_state=True)[source]
Restart the environment.
- Parameters
return_state (bool) – If True, it will return the state of the environment.
- Returns
(state, obs) if return_state is True, else obs.
- Return type
Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
- apply_wrappers(wrappers)[source]
Wrap the underlying OpenAI gym environment.
- Parameters
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) – Wrappers that will be applied to the underlying OpenAI env.
- get_coords_obs(obs, **kwargs)[source]
Calculate the observation returned by step when obs_type == “coords”.
- get_grayscale_obs(obs, **kwargs)[source]
Calculate the observation returned by step when obs_type == “grayscale”.
- get_image()[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type
numpy.ndarray
- get_rgb_obs(obs, **kwargs)[source]
Calculate the observation returned by step when obs_type == “rgb”.
- property gym_env
Return the instance of the environment that is being wrapped by plangym.
- init_gym_env()[source]
Initialize the gym.Env instance that the current class is wrapping.
- Return type
gym.core.Env
- property metadata
Return the metadata of the environment.
- property obs_shape: Tuple[int, ...]
Tuple containing the shape of the observations returned by the Environment.
- Return type
Tuple[int, ...]
- property obs_type: str
Return the type of observation returned by the environment.
- Return type
str
- property observation_space: gym.spaces.space.Space
Return the observation_space of the environment.
- Return type
gym.spaces.Space
- process_obs(obs, **kwargs)[source]
Perform optional computation for computing the observation returned by step.
This is a post-processing step to have fine-grained control over the returned observation.
- property remove_time_limit: bool
Return True if the Environment can only be stepped for a limited number of times.
- Return type
bool
- render(mode='human')[source]
Render the environment using OpenGL. This wraps the OpenAI render method.
- property render_mode: Union[None, str]
Return how the game will be rendered. Values: None | human | rgb_array.
- Return type
Union[None, str]
- property reward_range
Return the reward_range of the environment.
- sample_action()[source]
Return a valid action that can be used to step the environment chosen at random.
- Return type
Union[int, numpy.ndarray]
Videogames
Atari 2600
- class plangym.videogames.atari.AtariEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]
Create an environment to play OpenAI gym Atari games, using ALE as the emulator.
- Parameters
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart the environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode (int) – Integer or string indicating the game mode, when available.
difficulty (int) – Difficulty level of the game, when available.
repeat_action_probability (float) – Repeat the last action with this probability.
full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.
render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.
possible_to_win (bool) – It is possible to finish the Atari game without reaching a terminal state that is not out of bounds or does not involve losing a life.
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
array_state (bool) – Whether to return the state of the environment as a numpy array.
clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.
Example:
>>> env = plangym.make(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.action_space.sample() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]
Initialize an AtariEnv.
- Parameters
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart the environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode (int) – Integer or string indicating the game mode, when available.
difficulty (int) – Difficulty level of the game, when available.
repeat_action_probability (float) – Repeat the last action with this probability.
full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.
render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.
possible_to_win (bool) – It is possible to finish the Atari game without reaching a terminal state that is not out of bounds or does not involve losing a life.
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
array_state (bool) – Whether to return the state of the environment as a numpy array.
clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> type(env.gym_env)
<class 'gym.envs.atari.environment.AtariEnv'>
>>> state, obs = env.reset()
>>> type(state)
<class 'numpy.ndarray'>
- property ale
Return the ale interface of the underlying gym.Env.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> type(env.ale)
<class 'ale_py._ale_py.ALEInterface'>
- property difficulty: int
Return the selected difficulty for the current environment.
- Return type
int
- property full_action_space: bool
If True, the action space corresponds to all possible actions in the Atari emulator.
- Return type
bool
- get_image()[source]
Return a numpy array containing the rendered view of the environment.
The image is a three-dimensional array interpreted as an RGB image with channels (Height, Width, RGB). Wrappers are ignored, as the screen is loaded directly from the emulator.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> img = env.get_image()
>>> img.shape
(210, 160, 3)
- Return type
numpy.ndarray
- get_lifes_from_info(info)[source]
Return the number of lives remaining in the current game.
- Parameters
info (Dict[str, Any]) –
- Return type
int
- get_ram()[source]
Return a numpy array containing the content of the emulator’s RAM.
The RAM is a vector array interpreted as the memory of the emulator.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="grayscale")
>>> ram = env.get_ram()
>>> ram.shape, ram.dtype
((128,), dtype('uint8'))
- Return type
numpy.ndarray
- get_state()[source]
Recover the internal state of the simulation.
If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.
Example:
>>> env = AtariEnv(name="Qbert-v0")
>>> env.get_state()
array([<ale_py._ale_py.ALEState object at 0x...>, None], dtype=object)
>>> env = AtariEnv(name="Qbert-v0", array_state=False)
>>> env.get_state()
<ale_py._ale_py.ALEState object at 0x...>
- Return type
numpy.ndarray
- init_gym_env()[source]
Initialize the gym.Env instance that the Environment is wrapping.
- Return type
gym.core.Env
- property mode: int
Return the selected game mode for the current environment.
- Return type
int
- property observation_space: gym.spaces.space.Space
Return the observation_space of the environment.
- Return type
gym.spaces.Space
- property repeat_action_probability: float
Probability of repeating the same action after input.
- Return type
float
- set_state(state)[source]
Set the internal state of the simulation.
- Parameters
state (numpy.ndarray) – Target state to be set in the environment.
- Return type
None
Example:
>>> env = AtariEnv(name="Qbert-v0")
>>> state, obs = env.reset()
>>> new_state, obs, reward, end, info = env.step(env.sample_action(), state=state)
>>> assert not (state == new_state).all()
>>> env.set_state(state)
>>> (state == env.get_state()).all()
True
- step_with_dt(action, dt=1)[source]
Take dt simulation steps and make the environment evolve in multiples of self.frameskip, for a total of dt * self.frameskip steps.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns
(observs, reward, terminal, info) if state is None, else (new_state, observs, reward, terminal, info).
Example:
>>> env = AtariEnv(name="Pong-v0")
>>> obs = env.reset(return_state=False)
>>> obs, reward, end, info = env.step_with_dt(env.sample_action(), dt=7)
>>> assert not end
- class plangym.videogames.montezuma.MontezumaEnv(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]
Plangym implementation of the MontezumaEnv environment optimized for planning.
- Parameters
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
delay_setup (bool) –
remove_time_limit (bool) –
obs_type (str) –
mode (int) –
difficulty (int) –
repeat_action_probability (float) –
full_action_space (bool) –
render_mode (Optional[str]) –
possible_to_win (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
array_state (bool) –
clone_seeds (bool) –
- __init__(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]
Initialize a MontezumaEnv.
- Parameters
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
delay_setup (bool) –
remove_time_limit (bool) –
obs_type (str) –
mode (int) –
difficulty (int) –
repeat_action_probability (float) –
full_action_space (bool) –
render_mode (Optional[str]) –
possible_to_win (bool) –
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –
array_state (bool) –
clone_seeds (bool) –
- get_state()[source]
Recover the internal state of the simulation.
If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.
- Return type
numpy.ndarray
Gym retro
- class plangym.videogames.retro.RetroEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]
Environment for playing gym-retro games.
- Parameters
name (str) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
delay_setup (bool) –
remove_time_limit (bool) –
obs_type (str) –
render_mode (Optional[str]) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]
Initialize a RetroEnv.
- Parameters
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart the environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
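A minimal sketch (assumes gym-retro is installed; Airstriker-Genesis is the game bundled with gym-retro):
Example:
>>> env = RetroEnv(name="Airstriker-Genesis")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())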
- clone(**kwargs)[source]
Return a copy of the environment with its initialization delayed.
- Return type
plangym.videogames.retro.RetroEnv
Super Mario (NES)
- class plangym.videogames.nes.MarioEnv(name, movement_type='simple', original_reward=False, **kwargs)[source]
Interface for using gym-super-mario-bros in plangym.
- Parameters
name (str) –
movement_type (str) –
original_reward (bool) –
- __init__(name, movement_type='simple', original_reward=False, **kwargs)[source]
Initialize a MarioEnv.
- Parameters
name (str) – Name of the environment.
movement_type (str) – One of {“complex”, “simple”, “right”}.
original_reward (bool) – If False, return a custom reward based on Mario's position and level.
**kwargs – Passed to super().__init__.
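A minimal sketch (the environment name follows gym-super-mario-bros conventions and is illustrative):
Example:
>>> env = MarioEnv(name="SuperMarioBros-v0", movement_type="right")
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action(), dt=2)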
- get_coords_obs(obs, info=None, **kwargs)[source]
Return the information contained in info as an observation if obs_type == “info”.
- Parameters
obs (numpy.ndarray) –
info (Optional[Dict[str, Any]]) –
- Return type
numpy.ndarray
- get_state(state=None)[source]
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Parameters
state (Optional[numpy.ndarray]) –
- Return type
numpy.ndarray
- init_gym_env()[source]
Initialize the NESEnv instance that the current class is wrapping.
- Return type
gym.core.Env
- process_info(info, **kwargs)[source]
Add additional data to the info dictionary.
- Return type
Dict[str, Any]
- class plangym.videogames.nes.NesEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]
Environment for working with the NES-py emulator.
- Parameters
name (str) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
delay_setup (bool) –
remove_time_limit (bool) –
obs_type (str) –
render_mode (Optional[str]) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
- get_image()[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type
numpy.ndarray
- get_state(state=None)[source]
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Parameters
state (Optional[numpy.ndarray]) –
- Return type
numpy.ndarray
- property nes_env: NESEnv
Access the underlying NESEnv.
- Return type
NESEnv
Video games API
- class plangym.videogames.env.VideogameEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]
Common interface for working with video games that run using an emulator.
- Parameters
name (str) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
delay_setup (bool) –
remove_time_limit (bool) –
obs_type (str) –
render_mode (Optional[str]) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]
Initialize a VideogameEnv.
- Parameters
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart the environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode – Integer or string indicating the game mode, when available.
difficulty – Difficulty level of the game, when available.
repeat_action_probability – Repeat the last action with this probability.
full_action_space – Whether to use the full range of possible actions or only those available in the game.
render_mode (Optional[str]) – One of {None, “human”, “rgb_array”}.
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
- apply_action(action)[source]
Evolve the environment for one time step applying the provided action.
- begin_step(action=None, dt=None, state=None, return_state=None)[source]
Perform setup of step variables before starting step_with_dt.
- Parameters
return_state (Optional[bool]) –
- Return type
None
- static get_lifes_from_info(info)[source]
Return the number of lives remaining in the current game.
- Parameters
info (Dict[str, Any]) –
- Return type
int
- init_spaces()[source]
Initialize the action_space and the observation_space of the environment.
- Return type
None
- property n_actions: int
Return the number of actions available.
- Return type
int
Control Tasks
DM Control
- class plangym.control.dm_control.DMControlEnv(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]
Wrap the dm_control library, allowing its use in planning problems.
The dm_control library is DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.
For more information about the environment, please refer to https://github.com/deepmind/dm_control
This class adapts dm_control to planning problems, and it allows parallel and vectorized execution of the environments.
- Parameters
name (str) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
delay_setup (bool) –
visualize_reward (bool) –
obs_type (Optional[str]) –
- __init__(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode=None, obs_type=None, remove_time_limit=None)[source]
Initialize a DMControlEnv.
- Parameters
name (str) – Name of the task. Provide the task to be solved as domain_name-task_name. For example ‘cartpole-balance’.
frameskip (int) – Set a deterministic frameskip to apply the same action N times.
episodic_life (bool) – Send a terminal signal after losing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
visualize_reward (bool) – Define the color of the agent, which depends on the reward on its last timestep.
domain_name – Same as in dm_control.suite.load.
task_name – Same as in dm_control.suite.load.
render_mode – None|human|rgb_array
obs_type (Optional[str]) –
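A minimal sketch (the task name follows the domain_name-task_name convention described above; assumes dm_control is installed):
Example:
>>> env = DMControlEnv(name="cartpole-balance", frameskip=5)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action())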
- property domain_name: str
Return the name of the agent in the current simulation.
- Return type
str
- get_coords_obs(obs, **kwargs)[source]
Get the environment observation from a time_step object.
- Parameters
obs – Time step object returned after stepping the environment.
**kwargs – Ignored
- Returns
Numpy array containing the environment observation.
- Return type
numpy.ndarray
- get_image()[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type
numpy.ndarray
- get_state()[source]
Return a tuple containing the three arrays that characterize the state of the system.
Each tuple contains the position of the robot, its velocity, and the control variables currently being applied.
- Returns
Tuple of numpy arrays containing all the information needed to describe the current state of the simulation.
- Return type
numpy.ndarray
- init_gym_env()[source]
Initialize the environment instance (dm_control) that the current class is wrapping.
- property physics
Alias for gym_env.physics.
- render(mode='human')[source]
Store all the RGB images rendered to be shown when the show_game function is called.
- Parameters
mode – ‘rgb_array’ returns an RGB image stored in a numpy array; ‘human’ stores the rendered image in a viewer to be shown when show_game is called.
- Returns
numpy.ndarray when mode == rgb_array. True when mode == human
- set_state(state)[source]
Set the state of the simulator to the target State.
- Parameters
state (numpy.ndarray) – numpy.ndarray containing the information about the state to be set.
- Returns
None
- Return type
None
- show_game(sleep=0.05)[source]
Render the collected RGB images.
When the ‘human’ option is selected as the argument for the render method, a collection of RGB images is stored inside the self.viewer attribute. This method visualizes those collected images.
- Parameters
sleep (float) –
- property task_name: str
Return the name of the task in the current simulation.
- Return type
str
Classic control
- class plangym.control.classic_control.ClassicControl(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]
Environment for OpenAI gym classic control environments.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
delay_setup (bool) –
render_mode (Optional[str]) –
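A minimal sketch (CartPole-v0 as an illustrative classic-control task):
Example:
>>> env = ClassicControl(name="CartPole-v0")
>>> state, obs = env.reset()
>>> new_state, obs, reward, end, info = env.step(env.sample_action(), state=state)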
Box2D
- class plangym.control.box_2d.Box2DEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode=None, episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]
Common interface for working with Box2D environments released by gym.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
delay_setup (bool) –
render_mode (Optional[str]) –
- class plangym.control.lunar_lander.LunarLander(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]
Fast LunarLander that follows the plangym API.
- Parameters
name (str) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
wrappers (Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]) –
delay_setup (bool) –
deterministic (bool) –
continuous (bool) –
render_mode (Optional[str]) –
- __init__(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode=None, remove_time_limit=None, **kwargs)[source]
Initialize a LunarLander.
- Parameters
name (Optional[str]) –
frameskip (int) –
episodic_life (bool) –
autoreset (bool) –
wrappers (Optional[Iterable[Union[Callable[[], gym.core.Wrapper], Tuple[Callable[[...], gym.core.Wrapper], Dict[str, Any]]]]]) –
delay_setup (bool) –
deterministic (bool) –
continuous (bool) –
render_mode (Optional[str]) –
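A minimal sketch (parameters as documented above):
Example:
>>> env = LunarLander(deterministic=True, continuous=False)
>>> state, obs = env.reset()
>>> obs, reward, end, info = env.step(env.sample_action(), dt=3)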
- property continuous: bool
Return true if the LunarLander agent takes continuous actions as input.
- Return type
bool
- property deterministic: bool
Return true if the LunarLander simulation is deterministic.
- Return type
bool
- get_state()[source]
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Return type
numpy.ndarray
Vectorization
Multiprocessing
- class plangym.vectorization.parallel.ParallelEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]
Allow any environment to be stepped in parallel when step_batch is called.
It creates a local instance of the target environment to call all other methods.
Example:
>>> from plangym.videogames import AtariEnv
>>> env = ParallelEnv(env_class=AtariEnv,
...                   name="MsPacman-v0",
...                   clone_seeds=True,
...                   autoreset=True,
...                   blocking=False)
>>>
>>> state, obs = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.sample_action() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, infos = data
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
delay_setup (bool) –
n_workers (int) –
blocking (bool) –
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]
Initialize a ParallelEnv.
- Parameters
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored; always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
env_callable – Callable that returns an instance of the environment that will be parallelized.
n_workers (int) – Number of workers that will be used to step the env.
blocking (bool) – Step the environments synchronously.
*args – Additional args for the environment.
**kwargs – Additional kwargs for the environment.
- property blocking: bool
If True, the steps are performed sequentially.
- Return type
bool
- make_transitions(actions, states=None, dt=1, return_state=None)[source]
Vectorized version of the step method.
It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes a list of states, actions and dts as input.
- Parameters
actions (numpy.ndarray) – Iterable containing the different actions to be applied.
states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.
dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.
return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
(observs, rewards, ends, infos) if states is None, else (new_states, observs, rewards, ends, infos).
Ray
- class plangym.vectorization.ray.RayEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]
Use ray for taking steps in parallel when calling step_batch.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
delay_setup (bool) –
n_workers (int) –
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]
Initialize a RayEnv.
- Parameters
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored; always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
env_callable – Callable that returns an instance of the environment that will be parallelized.
n_workers (int) – Number of workers that will be used to step the env.
*args – Additional args for the environment.
**kwargs – Additional kwargs for the environment.
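A sketch mirroring the ParallelEnv example above (assumes ray is installed):
Example:
>>> from plangym.videogames import AtariEnv
>>> env = RayEnv(env_class=AtariEnv, name="MsPacman-v0", n_workers=2)
>>> state, obs = env.reset()
>>> states = [state.copy() for _ in range(4)]
>>> actions = [env.sample_action() for _ in range(4)]
>>> new_states, observs, rewards, ends, infos = env.step_batch(states=states, actions=actions)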
- make_transitions(actions, states=None, dt=1, return_state=None)[source]
Implement the logic for stepping the environment in parallel.
- Parameters
dt (Union[numpy.ndarray, int]) –
return_state (Optional[bool]) –
- reset(return_state=True)[source]
Restart the environment.
- Parameters
return_state (bool) –
- Return type
Union[numpy.ndarray, tuple]
- setup()[source]
Run environment initialization and create the subprocesses for stepping in parallel.
Vectorization API
- class plangym.vectorization.env.VectorizedEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]
Base class that defines the API for working with vectorized environments.
A vectorized environment allows stepping several copies of the environment in parallel when calling step_batch.
It creates a local copy of the environment that is the target of all the other methods of PlanEnv. In practice, a VectorizedEnv acts as a wrapper around an environment initialized with the parameters provided when calling __init__.
- Parameters
name (str) –
frameskip (int) –
autoreset (bool) –
delay_setup (bool) –
n_workers (int) –
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]
Initialize a VectorizedEnv.
- Parameters
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored; always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
n_workers (int) – Number of workers that will be used to step the env.
**kwargs – Additional keyword arguments passed to env_class.__init__.
- property action_shape: Tuple[int]
Tuple containing the shape of the actions applied to the Environment.
- Return type
Tuple[int]
- property action_space: gym.spaces.space.Space
Return the action_space of the environment.
- Return type
gym.spaces.Space
- classmethod batch_step_data(actions, states, dt, batch_size)[source]
Make batches of step data to distribute across workers.
- create_env_callable(**kwargs)[source]
Return a callable that initializes the environment that is being vectorized.
- Return type
Callable[[…], plangym.core.PlanEnv]
- get_image()[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type
numpy.ndarray
- get_state()[source]
Recover the internal state of the simulation.
A state completely describes the Environment at a given moment.
- Returns
State of the simulation.
- property gym_env
Return the instance of the environment that is being wrapped by plangym.
- make_transitions(actions, states, dt, return_state=None)[source]
Implement the logic for stepping the environment in parallel.
- Parameters
return_state (Optional[bool]) –
- property n_workers: int
Return the number of parallel processes that run step_batch.
- Return type
int
- property obs_shape: Tuple[int]
Tuple containing the shape of the observations returned by the Environment.
- Return type
Tuple[int]
- property observation_space: gym.spaces.space.Space
Return the observation_space of the environment.
- Return type
gym.spaces.Space
- property plan_env: plangym.core.PlanEnv
Environment that is wrapped by the current instance.
- Return type
plangym.core.PlanEnv
- render(mode='human')[source]
Render the environment using OpenGL. This wraps the OpenAI render method.
- reset(return_state=True)[source]
Reset the environment and return the first observation, or the first (state, obs) tuple.
- Parameters
return_state (bool) – If True, also return the initial state of the env.
- Returns
Observation of the environment if return_state is False. Otherwise, return (state, obs) after reset.
- sample_action()[source]
Return a valid action that can be used to step the Environment.
Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.
- set_state(state)[source]
Set the internal state of the simulation.
- Parameters
state – Target state to be set in the environment.
- setup()[source]
Initialize the target environment with the parameters provided at __init__.
- Return type
None
- static split_similar_chunks(vector, n_chunks)[source]
Split an indexable object into similar chunks.
- Parameters
vector (Union[list, numpy.ndarray]) – Target indexable object to be split.
n_chunks (int) – Number of similar chunks.
- Returns
Generator that returns the chunks created after splitting the target object.
- Return type
Generator[Union[list, numpy.ndarray], None, None]
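A sketch (the chunks partition the input; sizes may differ by at most one):
Example:
>>> vector = list(range(7))
>>> chunks = list(VectorizedEnv.split_similar_chunks(vector, n_chunks=3))
>>> sum(len(chunk) for chunk in chunks)
7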
- step(action, state=None, dt=1, return_state=None)[source]
Step the environment applying a given action from an arbitrary state.
If state is not provided, the signature matches the step method from OpenAI gym.
- Parameters
action (numpy.ndarray) – Array containing the action to be applied.
state (Optional[numpy.ndarray]) – State to be set before stepping the environment.
dt (int) – Consecutive number of times to apply the given action.
return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).
- step_batch(actions, states=None, dt=1, return_state=None)[source]
Vectorized version of the step method.
It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but it takes a list of states, actions and dts as input.
- Parameters
actions (numpy.ndarray) – Iterable containing the different actions to be applied.
states (Optional[numpy.ndarray]) – Iterable containing the different states to be set.
dt (Union[numpy.ndarray, int]) – int or array containing the frameskips that will be applied.
return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
if states is None returns (observs, rewards, ends, infos) else (new_states, observs, rewards, ends, infos).
- step_with_dt(action, dt=1)[source]
Take dt simulation steps and make the environment evolve in multiples of self.frameskip, for a total of dt * self.frameskip steps.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns
If state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info).
- Return type
tuple