Plangym API#
- class plangym.core.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]#
Inherit from this class to adapt environments to different problems.
Base class that establishes all needed methods and blueprints to work with Gym environments.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
delay_setup (bool)
return_image (bool)
- __init__(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]#
Initialize an Environment.
- Parameters:
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later (delayed setups are necessary when the environment object must be serialized or duplicated).
return_image (bool) – If True, add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.
- property action_shape: tuple[int]#
Tuple containing the shape of the actions applied to the Environment.
- apply_action(action, *, _ray_trace_ctx=None)[source]#
Evolve the environment for one time step applying the provided action.
- apply_reset(*, _ray_trace_ctx=None, **kwargs)[source]#
Perform the resetting operation on the environment.
- begin_step(action=None, dt=None, state=None, return_state=None, *, _ray_trace_ctx=None)[source]#
Perform setup of step variables before starting step_with_dt.
- Parameters:
return_state (bool | None)
- get_image(*, _ray_trace_ctx=None)[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)
- Return type:
None | ndarray
- get_state()[source]#
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Return type:
Any
- get_step_tuple(obs, reward, terminal, truncated, info, *, _ray_trace_ctx=None)[source]#
Prepare the tuple that step returns.
This is a post-processing step that gives fine-grained control over what data the current step returns.
- By default it determines:
Whether to return the state in the tuple (the information necessary to save or load the game).
Whether to add the “rgb” key to the info dictionary, containing an RGB representation of the environment.
- Parameters:
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.
- Returns:
Tuple containing the environment data after calling step.
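The two decisions described for get_step_tuple can be sketched with a standalone function. This is a minimal illustration, not plangym's implementation: the name toy_get_step_tuple and its extra keyword arguments (state, return_state, image, return_image) are hypothetical.

```python
def toy_get_step_tuple(obs, reward, terminal, truncated, info,
                       state=None, return_state=False,
                       image=None, return_image=False):
    """Hypothetical helper mirroring the two decisions described above."""
    if return_image:
        # Copy the dict so the caller's info is not mutated.
        info = {**info, "rgb": image}
    step_data = (obs, reward, terminal, truncated, info)
    # Prepend the state only when the caller asked for it.
    return (state, *step_data) if return_state else step_data


plain = toy_get_step_tuple("obs", 1.0, False, False, {"lives": 3})
full = toy_get_step_tuple("obs", 1.0, False, False, {"lives": 3},
                          state="s0", return_state=True,
                          image="frame", return_image=True)
```

Here plain has five elements, while full has six, with the state first and the image stored under the “rgb” key of its info dictionary.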
- property img_shape: tuple[int, ...] | None#
Return the shape of the image returned by the environment.
If the environment does not return an image, it will return None. This also applies to environments that throw an error when trying to get the image (like when running in headless machines without a virtual display).
- property name: str#
Return the name of the environment.
- property obs_shape: tuple[int]#
Tuple containing the shape of the observations returned by the Environment.
- process_apply_action(obs, reward, terminal, truncated, info, *, _ray_trace_ctx=None)[source]#
Perform any post-processing to the data returned by apply_action.
- Parameters:
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.
- Returns:
Tuple containing the processed data.
- process_info(info, *, _ray_trace_ctx=None, **kwargs)[source]#
Perform optional computation for computing the info dictionary returned by step.
- Return type:
dict[str, Any]
- process_obs(obs, *, _ray_trace_ctx=None, **kwargs)[source]#
Perform optional computation for computing the observation returned by step.
- process_reward(reward, *, _ray_trace_ctx=None, **kwargs)[source]#
Perform optional computation for computing the reward returned by step.
- Return type:
float
- process_step(obs, reward, terminal, truncated, info, *, _ray_trace_ctx=None)[source]#
Prepare the returned info dictionary.
This is a post processing step to have fine-grained control over what data the info dictionary contains.
- Parameters:
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.
- Returns:
Tuple containing the environment data after calling step.
- process_terminal(terminal, *, _ray_trace_ctx=None, **kwargs)[source]#
Perform optional computation for computing the terminal flag returned by step.
- Return type:
bool
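The process_* hooks above lend themselves to subclassing. The sketch below assumes that process_apply_action simply runs each hook once over the raw transition; that wiring is an assumption made for illustration, and ToyHooks and ClippedRewardHooks are hypothetical classes, not part of plangym.

```python
class ToyHooks:
    """Hypothetical base class: every hook is the identity by default."""

    def process_obs(self, obs, **kwargs):
        return obs

    def process_reward(self, reward, **kwargs):
        return reward

    def process_terminal(self, terminal, **kwargs):
        return terminal

    def process_info(self, info, **kwargs):
        return info

    def process_apply_action(self, obs, reward, terminal, truncated, info):
        # Assumed wiring: run each hook once over the raw transition.
        return (
            self.process_obs(obs),
            self.process_reward(reward),
            self.process_terminal(terminal),
            truncated,
            self.process_info(info),
        )


class ClippedRewardHooks(ToyHooks):
    """Override only the hooks that need custom behaviour."""

    def process_reward(self, reward, **kwargs):
        # Clip the reward to the interval [-1, 1].
        return max(-1.0, min(1.0, float(reward)))

    def process_info(self, info, **kwargs):
        # Return a new dict instead of mutating the input.
        return {**info, "clipped": True}


raw = ("obs", 25, False, False, {"lives": 3})
processed = ClippedRewardHooks().process_apply_action(*raw)
```

Overriding a single hook keeps the rest of the stepping logic untouched, which is the point of splitting post-processing into separate methods.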
- reset(return_state=True)[source]#
Restart the environment.
- Parameters:
return_state (bool) – If True, it will return the state of the environment.
- Returns:
(state, obs) if return_state is True else return obs.
- Return type:
ndarray | tuple[ndarray, ndarray]
- property return_image: bool#
Return return_image flag.
If True, add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.
- run_autoreset(step_data, *, _ray_trace_ctx=None)[source]#
Reset the environment automatically if needed.
- sample_action(*, _ray_trace_ctx=None)[source]#
Return a valid action that can be used to step the Environment.
Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.
- set_state(state)[source]#
Set the internal state of the simulation. Overwrite current state by the given argument.
- Parameters:
state (Any) – Target state to be set in the environment.
- Returns:
None
- Return type:
None
- setup()[source]#
Run environment initialization.
Placing in this function all the code that makes the environment impossible to serialize allows the environment to be dispatched to different workers and initialized once it has been copied to the target process.
- Return type:
None
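The delayed-setup pattern can be illustrated with a toy class. This is a sketch under stated assumptions: ToyDelayedEnv and can_duplicate are hypothetical names, and a threading.Lock stands in for an emulator handle that cannot be copied or serialized.

```python
import copy
import threading


class ToyDelayedEnv:
    """Hypothetical stand-in for an environment with a non-copyable handle."""

    def __init__(self, name, delay_setup=False):
        self.name = name
        self._handle = None  # created lazily by setup()
        if not delay_setup:
            self.setup()

    def setup(self):
        # A lock cannot be deep-copied or pickled, much like an emulator handle.
        self._handle = threading.Lock()

    @property
    def is_ready(self):
        return self._handle is not None


def can_duplicate(env):
    """Return True if the environment survives copy.deepcopy."""
    try:
        copy.deepcopy(env)
    except TypeError:
        return False
    return True


lazy = ToyDelayedEnv("toy-v0", delay_setup=True)
eager = ToyDelayedEnv("toy-v0")

clone = copy.deepcopy(lazy)  # works: the handle does not exist yet
clone.setup()                # initialize only after the copy was made
```

Only the instance created with delay_setup=True can be duplicated; the eager one already holds the handle, so the copy fails.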
- step(action, state=None, dt=1, return_state=None)[source]#
Step the environment applying the supplied action.
Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
In addition, the method allows the user to prepare the returned object, adding additional information and custom pre-processings via self.process_step and self.get_step_tuple methods.
- Parameters:
action (ndarray | int | float) – Chosen action applied to the environment.
state (ndarray | None) – Set the environment to the given state before stepping it.
dt (int) – Consecutive number of times that the action will be applied.
return_state (bool | None) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns:
If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).
- Return type:
tuple
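The relation between dt, frameskip and the optional state argument can be sketched with a toy environment. ToyCounterEnv is hypothetical, its return tuples are simplified, and summing the reward over the inner steps is an assumption made for the illustration.

```python
class ToyCounterEnv:
    """Hypothetical environment whose state is a single integer position."""

    def __init__(self, frameskip=1):
        self.frameskip = frameskip
        self.position = 0

    def get_state(self):
        return self.position

    def set_state(self, state):
        self.position = state

    def apply_action(self, action):
        # One elementary simulation step.
        self.position += action
        return self.position, float(action)

    def step(self, action, state=None, dt=1):
        if state is not None:
            # Dismiss the current state and start from the supplied one.
            self.set_state(state)
        total_reward = 0.0
        obs = self.position
        # The action is applied dt * frameskip times in total.
        for _ in range(dt * self.frameskip):
            obs, reward = self.apply_action(action)
            total_reward += reward
        if state is None:
            return obs, total_reward
        return self.get_state(), obs, total_reward


env = ToyCounterEnv(frameskip=2)
obs, reward = env.step(action=1, dt=3)  # 3 * 2 = 6 elementary steps
new_state, obs2, reward2 = env.step(action=1, state=10, dt=1)
```

The second call ignores the position reached by the first one, because the supplied state overwrites it before the action is applied.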
- step_batch(actions, states=None, dt=1, return_state=True)[source]#
Allow stepping a vector of states and actions.
Vectorized version of the step method. The signature and behaviour are the same as step, but it takes a list of states, actions and dts as input.
- Parameters:
actions (ndarray | Iterable[ndarray | int]) – Iterable containing the different actions to be applied.
states (ndarray | Iterable | None) – Iterable containing the different states to be set.
dt (int | ndarray) – int or array containing the number of consecutive times the action will be applied to each state. If array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for any instance).
return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns:
If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.
- Return type:
tuple[list | ndarray, …]
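Conceptually, a batched step is a loop over single steps followed by a transpose of the results. The sketch below shows that idea with plain Python; toy_step and toy_step_batch are hypothetical helpers with a simplified return tuple, not plangym functions.

```python
def toy_step(action, state, dt=1):
    """Hypothetical single step over an integer state."""
    new_state = state + action * dt
    obs = new_state
    reward = float(action * dt)
    end = new_state >= 10
    info = {"dt": dt}
    return new_state, obs, reward, end, info


def toy_step_batch(actions, states, dt=1):
    """Apply toy_step to every (action, state, dt) triple."""
    # A scalar dt is shared by all entries; a sequence gives one dt per entry.
    dts = list(dt) if isinstance(dt, (list, tuple)) else [dt] * len(actions)
    results = [toy_step(a, s, d) for a, s, d in zip(actions, states, dts)]
    # Transpose: list of per-entry tuples -> tuple of per-field lists.
    return tuple(list(field) for field in zip(*results))


new_states, observs, rewards, ends, infos = toy_step_batch(
    actions=[1, 2, 3], states=[0, 0, 9], dt=[1, 2, 1]
)
```

Each returned element is a list with one entry per input state, in the same order as the inputs.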
- step_with_dt(action, dt=1, *, _ray_trace_ctx=None)[source]#
Step the environment applying the supplied action dt times.
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
The method performs any post-processing to the data after applying the action to the environment via self.process_apply_action.
This method neither computes nor returns any state.
- Parameters:
action (ndarray | int | float) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns:
Tuple containing (observs, reward, terminal, info).
- class plangym.core.PlangymEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode='rgb_array', episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]#
Base class for implementing OpenAI gym environments in plangym.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
remove_time_limit (bool)
render_mode (str | None)
- __init__(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode='rgb_array', episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]#
Initialize a PlangymEnv.
The user can read all private methods as instance properties.
- Parameters:
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each dt. Common argument to all environments.
autoreset (bool) – Automatically reset the environment when the OpenAI environment returns end = True.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
render_mode (str | None) – One of {None, “human”, “rgb_array”}. How the game will be rendered.
episodic_life – Return end = True when losing a life.
obs_type – One of {“rgb”, “grayscale”, “coords”}. Type of observation returned.
return_image – If True, add an “rgb” key in the info dictionary returned by step that contains an RGB representation of the environment state.
kwargs – Additional arguments to be passed to the gym.make function.
- property action_shape: tuple[int, ...]#
Tuple containing the shape of the actions applied to the Environment.
- property action_space: Space#
Return the action_space of the environment.
- apply_action(action)[source]#
Evolve the environment for one time step applying the provided action.
Accumulate rewards and calculate terminal flag after stepping the environment.
- apply_reset()[source]#
Restart the environment.
- Returns:
(obs, info). If return_image is True, the info dictionary contains an 'rgb' key with the corresponding image.
- Return type:
tuple[ndarray, dict[str, Any]]
- apply_wrappers(wrappers)[source]#
Wrap the underlying OpenAI gym environment.
- Parameters:
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]])
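The wrappers argument accepts either a wrapper class or a (wrapper, kwargs) tuple for each element. A minimal sketch of how such a list could be applied follows; toy_apply_wrappers, ToyEnv, ScaleReward and NegateReward are hypothetical and do not depend on gym.

```python
class ToyEnv:
    """Hypothetical environment whose step returns only a reward."""

    def step(self, action):
        return float(action)


class ScaleReward:
    """Hypothetical wrapper that takes a keyword argument."""

    def __init__(self, env, factor=1.0):
        self.env = env
        self.factor = factor

    def step(self, action):
        return self.env.step(action) * self.factor


class NegateReward:
    """Hypothetical wrapper that takes no extra arguments."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return -self.env.step(action)


def toy_apply_wrappers(env, wrappers):
    """Wrap env with each element, in order."""
    for item in wrappers:
        if isinstance(item, tuple):
            # (wrapper, kwargs): forward the keyword arguments.
            wrapper, kwargs = item
            env = wrapper(env, **kwargs)
        else:
            # Bare wrapper: call it with the environment only.
            env = item(env)
    return env


wrapped = toy_apply_wrappers(
    ToyEnv(), [(ScaleReward, {"factor": 10.0}), NegateReward]
)
```

The wrappers are applied in list order, so the last element ends up outermost.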
- get_coords_obs(obs, **kwargs)[source]#
Calculate the observation returned by step when obs_type == “coords”.
- get_grayscale_obs(obs, **kwargs)[source]#
Calculate the observation returned by step when obs_type == “grayscale”.
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type:
ndarray
- get_rgb_obs(obs, **kwargs)[source]#
Calculate the observation returned by step when obs_type == “rgb”.
- property gym_env#
Return the instance of the environment that is being wrapped by plangym.
- init_gym_env()[source]#
Initialize the gym.Env instance that the current class is wrapping.
- Return type:
Env
- property metadata#
Return the metadata of the environment.
- property obs_shape: tuple[int, ...] | None#
Tuple containing the shape of the observations returned by the Environment.
- property obs_type: str#
Return the type of observation returned by the environment.
- property observation_space: Space#
Return the observation_space of the environment.
- process_obs(obs, **kwargs)[source]#
Perform optional computation for computing the observation returned by step.
This is a post processing step to have fine-grained control over the returned observation.
- property remove_time_limit: bool#
Return True if the Environment can only be stepped for a limited number of times.
- property render_mode: None | str#
Return how the game will be rendered. Values: None | human | rgb_array.
- property reward_range#
Return the reward_range of the environment.
- sample_action()[source]#
Return a randomly chosen valid action that can be used to step the environment.
- Return type:
int | ndarray
Videogames#
Atari 2600#
- class plangym.videogames.atari.AtariEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode='rgb_array', possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]#
Create an environment to play OpenAI gym Atari Games that uses AtariALE as the emulator.
- Parameters:
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode (int) – Integer or string indicating the game mode, when available.
difficulty (int) – Difficulty level of the game, when available.
repeat_action_probability (float) – Repeat the last action with this probability.
full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.
render_mode (str | None) – One of {None, “human”, “rgb_array”}.
possible_to_win (bool) – It is possible to finish the Atari game without getting a terminal state that is not out of bounds or does not involve losing a life.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
array_state (bool) – Whether to return the state of the environment as a numpy array.
clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.
Example:
>>> env = plangym.make(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> state, obs, info = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.action_space.sample() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, truncateds, infos = data
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode='rgb_array', possible_to_win=False, wrappers=None, array_state=True, clone_seeds=False, **kwargs)[source]#
Initialize an AtariEnvironment.
- Parameters:
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode (int) – Integer or string indicating the game mode, when available.
difficulty (int) – Difficulty level of the game, when available.
repeat_action_probability (float) – Repeat the last action with this probability.
full_action_space (bool) – Whether to use the full range of possible actions or only those available in the game.
render_mode (str | None) – One of {None, “human”, “rgb_array”}.
possible_to_win (bool) – It is possible to finish the Atari game without getting a terminal state that is not out of bounds or does not involve losing a life.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
array_state (bool) – Whether to return the state of the environment as a numpy array.
clone_seeds (bool) – Clone the random seed of the ALE emulator when reading/setting the state. False makes the environment stochastic.
kwargs – Additional arguments to be passed to the gym.make function.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", difficulty=2, mode=1)
>>> type(env.gym_env.unwrapped)
<class 'shimmy.atari_env.AtariEnv'>
>>> state, obs, info = env.reset()
>>> type(state)
<class 'numpy.ndarray'>
- property ale#
Return the ale interface of the underlying gym.Env.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> type(env.ale)
<class 'ale_py._ale_py.ALEInterface'>
- property difficulty: int#
Return the selected difficulty for the current environment.
- property full_action_space: bool#
If True, the action space corresponds to all possible actions in the Atari emulator.
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Image is a three-dimensional array interpreted as an RGB image with channels (Height, Width, RGB). Ignores wrappers as it loads the screen directly from the emulator.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="ram")
>>> img = env.get_image()
>>> img.shape
(210, 160, 3)
- Return type:
ndarray
- get_lifes_from_info(info)[source]#
Return the number of lives remaining in the current game.
- Parameters:
info (dict[str, Any])
- Return type:
int
- get_ram()[source]#
Return a numpy array containing the content of the emulator’s RAM.
The RAM is a vector array interpreted as the memory of the emulator.
Example:
>>> env = AtariEnv(name="ALE/MsPacman-v5", obs_type="grayscale")
>>> ram = env.get_ram()
>>> ram.shape, ram.dtype
((128,), dtype('uint8'))
- Return type:
ndarray
- get_state()[source]#
Recover the internal state of the simulation.
If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.
Example:
>>> env = AtariEnv(name="Qbert-v0")
>>> env.get_state()
array([<ale_py._ale_py.ALEState object at 0x...>, None], dtype=object)
>>> env = AtariEnv(name="Qbert-v0", array_state=False)
>>> env.get_state()
<ale_py._ale_py.ALEState object at 0x...>
- Return type:
ndarray
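Why cloning the seed matters can be shown with a toy stochastic environment. This sketch does not use the ALE emulator: ToyStochasticEnv and replay are hypothetical, and Python's random.Random plays the role of the emulator's random number generator.

```python
import random


class ToyStochasticEnv:
    """Hypothetical environment whose transitions include random noise."""

    def __init__(self, clone_seeds=False, seed=0):
        self.clone_seeds = clone_seeds
        self.rng = random.Random(seed)
        self.position = 0

    def get_state(self):
        # Only include the generator state when clone_seeds is enabled.
        rng_state = self.rng.getstate() if self.clone_seeds else None
        return (self.position, rng_state)

    def set_state(self, state):
        self.position, rng_state = state
        if rng_state is not None:
            self.rng.setstate(rng_state)

    def step(self, action):
        self.position += action + self.rng.choice([-1, 0, 1])
        return self.position


def replay(env, state, actions):
    """Restore a state and return the positions visited afterwards."""
    env.set_state(state)
    return [env.step(action) for action in actions]


env = ToyStochasticEnv(clone_seeds=True)
start = env.get_state()
first = replay(env, start, [1] * 20)
second = replay(env, start, [1] * 20)  # identical to first
```

With clone_seeds=True both replays visit the same positions, because restoring the state also rewinds the generator. With clone_seeds=False the stored state carries no generator information, so repeated replays are free to diverge.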
- init_gym_env()[source]#
Initialize the gym.Env instance that the Environment is wrapping.
- Return type:
Env
- property mode: int#
Return the selected game mode for the current environment.
- property observation_space: Space#
Return the observation_space of the environment.
- property repeat_action_probability: float#
Probability of repeating the same action after input.
- set_state(state)[source]#
Set the internal state of the simulation.
- Parameters:
state (ndarray) – Target state to be set in the environment.
- Return type:
None
Example:
>>> env = AtariEnv(name="Qbert-v0")
>>> state, obs, info = env.reset()
>>> new_state, obs, reward, end, tru, info = env.step(env.sample_action(), state=state)
>>> assert not (state == new_state).all()
>>> env.set_state(state)
>>> (state == env.get_state()).all()
np.True_
- step_with_dt(action, dt=1)[source]#
Step the environment dt times.
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
- Parameters:
action (ndarray | int | float) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns:
If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).
Example:
>>> env = AtariEnv(name="Pong-v0")
>>> obs = env.reset(return_state=False)
>>> obs, reward, end, truncated, info = env.step_with_dt(env.sample_action(), dt=7)
>>> assert not end
- class plangym.videogames.montezuma.MontezumaEnv(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]#
Plangym implementation of the MontezumaEnv environment optimized for planning.
- Parameters:
frameskip (int)
episodic_life (bool)
autoreset (bool)
delay_setup (bool)
remove_time_limit (bool)
obs_type (str)
mode (int)
difficulty (int)
repeat_action_probability (float)
full_action_space (bool)
render_mode (str | None)
possible_to_win (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
array_state (bool)
clone_seeds (bool)
- __init__(name='PlanMontezuma-v0', frameskip=1, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', mode=0, difficulty=0, repeat_action_probability=0.0, full_action_space=False, render_mode=None, possible_to_win=True, wrappers=None, array_state=True, clone_seeds=True, **kwargs)[source]#
Initialize a MontezumaEnv.
- Parameters:
frameskip (int)
episodic_life (bool)
autoreset (bool)
delay_setup (bool)
remove_time_limit (bool)
obs_type (str)
mode (int)
difficulty (int)
repeat_action_probability (float)
full_action_space (bool)
render_mode (str | None)
possible_to_win (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
array_state (bool)
clone_seeds (bool)
- get_state()[source]#
Recover the internal state of the simulation.
If clone_seeds is False, the environment will be stochastic. Cloning the full state ensures the environment is deterministic.
- Return type:
ndarray
Gym retro#
- class plangym.videogames.retro.RetroEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]#
Environment for playing gym-retro games.
- Parameters:
name (str)
frameskip (int)
episodic_life (bool)
autoreset (bool)
delay_setup (bool)
remove_time_limit (bool)
obs_type (str)
render_mode (str | None)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]#
Initialize a RetroEnv.
- Parameters:
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
render_mode (str | None) – One of {None, “human”, “rgb_array”}.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
kwargs – Additional arguments to be passed to the gym.make function.
- clone(**kwargs)[source]#
Return a copy of the environment with its initialization delayed.
- Return type:
RetroEnv
- static get_win_condition(info)[source]#
Get win condition for games that have the end of the screen available.
- Parameters:
info (dict[str, Any])
- Return type:
bool
Super Mario (NES)#
- class plangym.videogames.nes.MarioEnv(name, movement_type='simple', original_reward=False, **kwargs)[source]#
Interface for using gym-super-mario-bros in plangym.
- Parameters:
name (str)
movement_type (str)
original_reward (bool)
- __init__(name, movement_type='simple', original_reward=False, **kwargs)[source]#
Initialize a MarioEnv.
- Parameters:
name (str) – Name of the environment.
movement_type (str) – One of {complex|simple|right}
original_reward (bool) – If False return a custom reward based on mario position and level.
**kwargs – passed to super().__init__.
- get_coords_obs(obs, info=None, **kwargs)[source]#
Return the information contained in info as an observation if obs_type == “info”.
- Parameters:
obs (ndarray)
info (dict[str, Any] | None)
- Return type:
ndarray
- get_state(state=None)[source]#
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Parameters:
state (ndarray | None)
- Return type:
ndarray
- init_gym_env()[source]#
Initialize the NESEnv instance that the current class is wrapping.
- Return type:
Env
- process_info(info, **kwargs)[source]#
Add additional data to the info dictionary.
- Return type:
dict[str, Any]
- class plangym.videogames.nes.NesEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]#
Environment for working with the NES-py emulator.
- Parameters:
name (str)
frameskip (int)
episodic_life (bool)
autoreset (bool)
delay_setup (bool)
remove_time_limit (bool)
obs_type (str)
render_mode (str | None)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)
- Return type:
ndarray
- get_state(state=None)[source]#
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Parameters:
state (ndarray | None)
- Return type:
ndarray
- property nes_env: NESEnv#
Access the underlying NESEnv.
Video games API#
- class plangym.videogames.env.VideogameEnv(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]#
Common interface for working with video games that run using an emulator.
- Parameters:
name (str)
frameskip (int)
episodic_life (bool)
autoreset (bool)
delay_setup (bool)
remove_time_limit (bool)
obs_type (str)
render_mode (str | None)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
- __init__(name, frameskip=5, episodic_life=False, autoreset=True, delay_setup=False, remove_time_limit=True, obs_type='rgb', render_mode=None, wrappers=None, **kwargs)[source]#
Initialize a VideogameEnv.
- Parameters:
name (str) – Name of the environment. Follows standard gym syntax conventions.
frameskip (int) – Number of times an action will be applied for each step in dt.
episodic_life (bool) – Return end = True when losing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
remove_time_limit (bool) – If True, remove the time limit from the environment.
obs_type (str) – One of {“rgb”, “ram”, “grayscale”}.
mode – Integer or string indicating the game mode, when available.
difficulty – Difficulty level of the game, when available.
repeat_action_probability – Repeat the last action with this probability.
full_action_space – Whether to use the full range of possible actions or only those available in the game.
render_mode (str | None) – One of {None, “human”, “rgb_array”}.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
kwargs – Additional arguments to be passed to the gym.make function.
- apply_action(action)[source]#
Evolve the environment for one time step applying the provided action.
- begin_step(action=None, dt=None, state=None, return_state=None)[source]#
Perform setup of step variables before starting step_with_dt.
- Parameters:
return_state (bool | None)
- Return type:
None
- static get_lifes_from_info(info)[source]#
Return the number of lives remaining in the current game.
- Parameters:
info (dict[str, Any])
- Return type:
int
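The episodic_life option combines the terminal flag with the life counter read from the info dictionary. The sketch below shows one way that combination could work; toy_episodic_terminal is a hypothetical helper, and the “lives” key is an assumption about the layout of info.

```python
def toy_episodic_terminal(terminal, info, previous_lives, episodic_life=True):
    """Return (terminal flag, current lives) for one transition.

    Assumes the life counter is stored under info["lives"]; when the key
    is missing, the previous count is reused so no life loss is reported.
    """
    lives = info.get("lives", previous_lives)
    lost_life = lives < previous_lives
    # With episodic_life enabled, losing a life also ends the episode.
    return bool(terminal or (episodic_life and lost_life)), lives


end, lives = toy_episodic_terminal(False, {"lives": 2}, previous_lives=3)
```

Passing the returned life count back in as previous_lives on the next transition keeps the comparison up to date.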
- init_spaces()[source]#
Initialize the action_space and the observation_space of the environment.
- Return type:
None
- property n_actions: int#
Return the number of actions available.
Control Tasks#
DM Control#
- class plangym.control.dm_control.DMControlEnv(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode='rgb_array', obs_type=None, remove_time_limit=None, return_image=False)[source]#
Wrap the dm_control library, allowing its implementation in planning problems.
The dm_control library is DeepMind’s software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo physics.
For more information about the environment, please refer to deepmind/dm_control
This class allows the implementation of dm_control in planning problems. It allows parallel and vectorized execution of the environments.
- Parameters:
name (str)
frameskip (int)
episodic_life (bool)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
visualize_reward (bool)
obs_type (str | None)
return_image (bool)
- __init__(name='cartpole-balance', frameskip=1, episodic_life=False, autoreset=True, wrappers=None, delay_setup=False, visualize_reward=True, domain_name=None, task_name=None, render_mode='rgb_array', obs_type=None, remove_time_limit=None, return_image=False)[source]#
Initialize a
DMControlEnv.- Parameters:
name (str) – Name of the task. Provide the task to be solved as domain_name-task_name. For example ‘cartpole-balance’.
frameskip (int) – Set a deterministic frameskip to apply the same action N times.
episodic_life (bool) – Send terminal signal after loosing a life.
autoreset (bool) – Restart environment when reaching a terminal state.
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None) – Wrappers that will be applied to the underlying OpenAI env. Every element of the iterable can be either a gym.Wrapper or a tuple containing (gym.Wrapper, kwargs).
delay_setup (bool) – If True, do not initialize the gym.Environment and wait for setup to be called later.
visualize_reward (bool) – Define the color of the agent, which depends on the reward on its last timestep.
domain_name – Same as in dm_control.suite.load.
task_name – Same as in dm_control.suite.load.
render_mode – None|human|rgb_array.
remove_time_limit – Ignored.
obs_type (str | None) – One of {“coords”, “rgb”, “grayscale”}.
return_image (bool) – If True, add an “rgb” key to the observation dict.
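The wrappers parameter accepts either a bare wrapper or a (wrapper, kwargs) tuple for each entry. The sketch below illustrates one way such a specification could be unpacked and applied in order. ToyEnv, TagWrapper and apply_wrappers are hypothetical names used only for illustration; they are not part of plangym or gym, and the library's own handling may differ.

```python
class ToyEnv:
    """Stand-in for an environment; not a real gym environment."""

    def __init__(self):
        self.tags = []


class TagWrapper:
    """Hypothetical wrapper that appends a label to the wrapped object's tags."""

    def __init__(self, env, label="default"):
        self.env = env
        self.tags = [*env.tags, label]


def apply_wrappers(env, wrappers):
    """Apply each entry in order; an entry is a callable or a (callable, kwargs) tuple."""
    for item in wrappers or ():
        if isinstance(item, tuple):
            wrapper, kwargs = item
            env = wrapper(env, **kwargs)
        else:
            env = item(env)
    return env


# The first entry uses the wrapper's defaults, the second passes keyword arguments.
wrapped = apply_wrappers(ToyEnv(), [TagWrapper, (TagWrapper, {"label": "custom"})])
print(wrapped.tags)  # ['default', 'custom']
```

Passing None leaves the environment unwrapped, which matches the wrappers=None default shown in the signature.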
- property domain_name: str#
Return the name of the agent in the current simulation.
- get_coords_obs(obs, **kwargs)[source]#
Get the environment observation from a time_step object.
- Parameters:
obs – Time step object returned after stepping the environment.
**kwargs – Ignored
- Returns:
Numpy array containing the environment observation.
- Return type:
ndarray
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type:
ndarray
- get_state()[source]#
Return the state of the environment.
Return a tuple containing the three arrays that characterize the state of the system: the position of the robot, its velocity, and the control variables currently being applied.
- Returns:
Tuple of numpy arrays containing all the information needed to describe the current state of the simulation.
- Return type:
ndarray
- init_gym_env()[source]#
Initialize the environment instance (dm_control) that the current class is wrapping.
- property physics#
Alias for gym_env.physics.
- render(mode=None)[source]#
Render the environment.
Store all the RGB images rendered to be shown when the show_game function is called.
- Parameters:
mode – rgb_array returns an RGB image stored in a numpy array; human stores the rendered image in a viewer to be shown when show_game is called.
- Returns:
numpy.ndarray when mode == rgb_array. True when mode == human
- set_state(state)[source]#
Set the state of the simulator to the target State.
- Parameters:
state (ndarray) – numpy.ndarray containing the information about the state to be set.
- Returns:
None
- Return type:
None
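The get_state and set_state pair is what lets a planner rewind the simulation and try a different action from the same point. The following is a minimal sketch of that round trip on a toy simulator, assuming a state made of position, velocity and control values as described above. ToySim is a hypothetical stand-in and does not use dm_control or plangym.

```python
import copy


class ToySim:
    """Toy point-mass simulator; a stand-in, not dm_control physics."""

    def __init__(self):
        self.position = [0.0]
        self.velocity = [1.0]
        self.control = [0.0]

    def step(self, action):
        self.control = [action]
        self.velocity = [v + action for v in self.velocity]
        self.position = [p + v for p, v in zip(self.position, self.velocity)]

    def get_state(self):
        # Return an independent snapshot of everything the dynamics depend on.
        return copy.deepcopy((self.position, self.velocity, self.control))

    def set_state(self, state):
        self.position, self.velocity, self.control = copy.deepcopy(state)


sim = ToySim()
sim.step(0.5)
saved = sim.get_state()        # snapshot to return to later

sim.step(1.0)
first_result = sim.get_state()

sim.set_state(saved)           # rewind to the snapshot
sim.step(1.0)                  # replaying the same action gives the same result
assert sim.get_state() == first_result
```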
- show_game(sleep=0.05)[source]#
Render the collected RGB images.
When the ‘human’ option is passed to the render method, the rendered RGB images are collected inside the self.viewer attribute. This method uses that viewer to display the collected images.
- Parameters:
sleep (float)
- property task_name: str#
Return the name of the task in the current simulation.
Classic control#
- class plangym.control.classic_control.ClassicControl(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode='rgb_array', episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]#
Environment for OpenAI gym classic control environments.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
remove_time_limit (bool)
render_mode (str | None)
Box2D#
- class plangym.control.box_2d.Box2DEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode='rgb_array', episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]#
Common interface for working with Box2D environments released by gym.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
remove_time_limit (bool)
render_mode (str | None)
- class plangym.control.lunar_lander.LunarLander(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode='rgb_array', remove_time_limit=None, **kwargs)[source]#
Fast LunarLander that follows the plangym API.
- Parameters:
name (str | None)
frameskip (int)
episodic_life (bool)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
deterministic (bool)
continuous (bool)
render_mode (str | None)
- __init__(name=None, frameskip=1, episodic_life=True, autoreset=True, wrappers=None, delay_setup=False, deterministic=False, continuous=False, render_mode='rgb_array', remove_time_limit=None, **kwargs)[source]#
Initialize a LunarLander.
- Parameters:
name (str | None)
frameskip (int)
episodic_life (bool)
autoreset (bool)
wrappers (Iterable[Callable[[], Wrapper] | tuple[Callable[[...], Wrapper] | dict[str, Any]]] | None)
delay_setup (bool)
deterministic (bool)
continuous (bool)
render_mode (str | None)
- property continuous: bool#
Return true if the LunarLander agent takes continuous actions as input.
- property deterministic: bool#
Return true if the LunarLander simulation is deterministic.
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB).
- Return type:
ndarray
- get_state()[source]#
Recover the internal state of the simulation.
A state must completely describe the Environment at a given moment.
- Return type:
ndarray
Vectorization#
Multiprocessing#
- class plangym.vectorization.parallel.ParallelEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]#
Allow any environment to be stepped in parallel when step_batch is called.
It creates a local instance of the target environment to call all other methods.
Example:
>>> from plangym.videogames import AtariEnv
>>> from plangym.vectorization.parallel import ParallelEnv
>>> env = ParallelEnv(env_class=AtariEnv,
...                   name="MsPacman-v0",
...                   clone_seeds=True,
...                   autoreset=True,
...                   blocking=False)
>>>
>>> state, obs, info = env.reset()
>>>
>>> states = [state.copy() for _ in range(10)]
>>> actions = [env.sample_action() for _ in range(10)]
>>>
>>> data = env.step_batch(states=states, actions=actions)
>>> new_states, observs, rewards, ends, truncateds, infos = data
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
delay_setup (bool)
n_workers (int)
blocking (bool)
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, blocking=False, **kwargs)[source]#
Initialize a ParallelEnv.
- Parameters:
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.
env_callable – Callable that returns an instance of the environment that will be parallelized.
n_workers (int) – Number of workers that will be used to step the env.
blocking (bool) – Step the environments synchronously.
*args – Additional args for the environment.
**kwargs – Additional kwargs for the environment.
- property blocking: bool#
If True the steps are performed sequentially.
- make_transitions(actions, states=None, dt=1, return_state=None)[source]#
Vectorized version of the step method.
It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but taking a list of states, actions and dts as input.
- Parameters:
actions (ndarray) – Iterable containing the different actions to be applied.
states (ndarray | None) – Iterable containing the different states to be set.
dt (ndarray | int) – int or array containing the frameskips that will be applied.
return_state (bool | None) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns:
If states is None, returns (observs, rewards, ends, truncateds, infos); otherwise returns (new_states, observs, rewards, ends, truncateds, infos).
Ray#
- class plangym.vectorization.ray.RayEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]#
Use ray for taking steps in parallel when calling step_batch.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
delay_setup (bool)
n_workers (int)
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]#
Initialize a RayEnv.
- Parameters:
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.
env_callable – Callable that returns an instance of the environment that will be parallelized.
n_workers (int) – Number of workers that will be used to step the env.
*args – Additional args for the environment.
**kwargs – Additional kwargs for the environment.
- make_transitions(actions, states=None, dt=1, return_state=None)[source]#
Implement the logic for stepping the environment in parallel.
- Parameters:
dt (ndarray | int)
return_state (bool | None)
- reset(return_state=True)[source]#
Restart the environment.
- Parameters:
return_state (bool)
- Return type:
ndarray | tuple
- setup()[source]#
Run environment initialization and create the subprocesses for stepping in parallel.
- sync_states(state)[source]#
Synchronize all the copies of the wrapped environment.
Set all the states of the different workers of the internal BatchEnv to the same state as the internal Environment used to apply the non-vectorized steps.
- Parameters:
state (None)
- Return type:
None
- property workers: list[ActorClass(RemoteEnv)]#
Remote actors exposing copies of the environment.
Vectorization API#
- class plangym.vectorization.env.VectorizedEnv(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]#
Base class that defines the API for working with vectorized environments.
A vectorized environment allows stepping several copies of the environment in parallel when calling step_batch.
It creates a local copy of the environment that is the target of all the other methods of PlanEnv. In practice, a VectorizedEnv acts as a wrapper of an environment initialized with the provided parameters when calling __init__.
- Parameters:
name (str)
frameskip (int)
autoreset (bool)
delay_setup (bool)
n_workers (int)
- __init__(env_class, name, frameskip=1, autoreset=True, delay_setup=False, n_workers=8, **kwargs)[source]#
Initialize a VectorizedEnv.
- Parameters:
env_class – Class of the environment to be wrapped.
name (str) – Name of the environment.
frameskip (int) – Number of times step will be called with the same action.
autoreset (bool) – Ignored. Always set to True. Automatically reset the environment when the OpenAI environment returns end = True.
delay_setup (bool) – If True do not initialize the gym.Environment and wait for setup to be called later.
n_workers (int) – Number of workers that will be used to step the env.
**kwargs – Additional keyword arguments passed to env_class.__init__.
- property action_shape: tuple[int]#
Tuple containing the shape of the actions applied to the Environment.
- property action_space: Space#
Return the action_space of the environment.
- classmethod batch_step_data(actions, states, dt, batch_size)[source]#
Make batches of step data to distribute across workers.
- create_env_callable(**kwargs)[source]#
Return a callable that initializes the environment that is being vectorized.
- Return type:
Callable[[…], PlanEnv]
- get_image()[source]#
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as a greyscale image. Three-dimensional arrays are interpreted as RGB images with channels (Height, Width, RGB)
- Return type:
ndarray
- get_state()[source]#
Recover the internal state of the simulation.
A state completely describes the Environment at a given moment.
- Returns:
State of the simulation.
- property gym_env#
Return the instance of the environment that is being wrapped by plangym.
- make_transitions(actions, states, dt, return_state=None)[source]#
Implement the logic for stepping the environment in parallel.
- Parameters:
return_state (bool | None)
- property n_workers: int#
Return the number of worker processes that run step_batch in parallel.
- property obs_shape: tuple[int]#
Tuple containing the shape of the observations returned by the Environment.
- property observation_space: Space#
Return the observation_space of the environment.
- render(mode='human')[source]#
Render the environment using OpenGL. This wraps the OpenAI render method.
- reset(return_state=True)[source]#
Reset the environment.
Reset the environment and return the first observation, or the first (state, obs, info) tuple.
- Parameters:
return_state (bool) – If True, also return the initial state of the env.
- Returns:
Observation of the environment if return_state is False. Otherwise, return (state, obs, info) after reset.
- sample_action()[source]#
Return a valid action that can be used to step the Environment.
Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.
- set_state(state)[source]#
Set the internal state of the simulation.
- Parameters:
state – Target state to be set in the environment.
- setup()[source]#
Initialize the target environment with the parameters provided at __init__.
- Return type:
None
- static split_similar_chunks(vector, n_chunks)[source]#
Split an indexable object into similar chunks.
- Parameters:
vector (list | ndarray) – Target indexable object to be split.
n_chunks (int) – Number of similar chunks.
- Returns:
Generator that returns the chunks created after splitting the target object.
- Return type:
Generator[list | ndarray, None, None]
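To make "similar chunks" concrete, here is a minimal sketch of a generator that splits an indexable object into contiguous slices whose lengths differ by at most one. It is an illustration under that assumption, not plangym's implementation: the helper name split_into_chunks is hypothetical, and the library may distribute the remainder differently.

```python
from typing import Iterator, Sequence


def split_into_chunks(vector: Sequence, n_chunks: int) -> Iterator[Sequence]:
    """Yield up to n_chunks contiguous slices whose lengths differ by at most one."""
    base, extra = divmod(len(vector), n_chunks)
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than chunks: stop instead of yielding empty slices
        yield vector[start:start + size]
        start += size


chunks = list(split_into_chunks(list(range(10)), 3))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Keeping the chunks contiguous preserves the original ordering, so results gathered from workers can be concatenated back in the same order as the inputs.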
- step(action, state=None, dt=1, return_state=None)[source]#
Step the environment applying a given action from an arbitrary state.
If state is not provided, the signature matches the step method from OpenAI gym.
- Parameters:
action (ndarray) – Array containing the action to be applied.
state (ndarray | None) – State to be set before stepping the environment.
dt (int) – Consecutive number of times to apply the given action.
return_state (bool | None) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns:
If state is None, returns (observs, rewards, ends, infos); otherwise returns (new_states, observs, rewards, ends, infos).
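The return_state default described above can be read as a small decision rule: None means "return the state only if one was passed in". The sketch below shows that rule on a toy integer counter. should_return_state and toy_step are hypothetical helpers, the tuple layout follows the Returns text of this method, and the real step may carry additional fields.

```python
from typing import Optional


def should_return_state(state, return_state: Optional[bool]) -> bool:
    """Resolve the documented default: None means return a state only if one was given."""
    if return_state is None:
        return state is not None
    return return_state


def toy_step(action, state=None, return_state=None):
    """Hypothetical step on an integer counter; it only illustrates the tuple shape."""
    current = 0 if state is None else state
    new_state = current + action
    data = (new_state, float(action), False, {})  # obs, reward, end, info
    if should_return_state(state, return_state):
        return (new_state, *data)
    return data


print(len(toy_step(1)))                                 # 4
print(len(toy_step(1, state=5)))                        # 5
print(len(toy_step(1, state=5, return_state=False)))    # 4
```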
- step_batch(actions, states=None, dt=1, return_state=None)[source]#
Vectorized version of the step method.
It allows stepping a vector of states and actions. The signature and behaviour are the same as step, but taking a list of states, actions and dts as input.
- Parameters:
actions (ndarray) – Iterable containing the different actions to be applied.
states (ndarray | None) – Iterable containing the different states to be set.
dt (ndarray | int) – int or array containing the frameskips that will be applied.
return_state (bool | None) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns:
If states is None, returns (observs, rewards, ends, infos); otherwise returns (new_states, observs, rewards, ends, infos).
- step_with_dt(action, dt=1)[source]#
Step the environment dt times with the same action.
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
- Parameters:
action (ndarray | int | float) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns:
If state is None returns (observs, reward, terminal, info) else returns (new_state, observs, reward, terminal, info).
- Return type:
tuple
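The relationship between dt and frameskip can be checked with a toy environment that counts raw simulation steps. This is a sketch only: CounterEnv is a hypothetical class, and summing rewards and stopping early on a terminal step are assumptions about behaviour that the text above does not specify.

```python
class CounterEnv:
    """Toy environment that counts raw simulation steps; not a plangym class."""

    def __init__(self, frameskip=1):
        self.frameskip = frameskip
        self.raw_steps = 0

    def apply_action(self, action):
        # One raw simulation step: the observation is the running step count.
        self.raw_steps += 1
        return self.raw_steps, 1.0, False, {}  # obs, reward, terminal, info

    def step_with_dt(self, action, dt=1):
        total_reward = 0.0
        obs, terminal, info = None, False, {}
        # The same action is repeated dt * frameskip times.
        for _ in range(dt * self.frameskip):
            obs, reward, terminal, info = self.apply_action(action)
            total_reward += reward  # assumption: rewards are accumulated
            if terminal:
                break               # assumption: stop once the episode ends
        return obs, total_reward, terminal, info


env = CounterEnv(frameskip=4)
obs, reward, terminal, info = env.step_with_dt(action=0, dt=3)
print(env.raw_steps, reward)  # 12 12.0
```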