src.plangym.core#

Plangym API implementation.

Attributes#

wrap_callable

Classes#

`PlanEnv`	Inherit from this class to adapt environments to different problems.
`PlangymEnv`	Base class for implementing OpenAI `gym` environments in `plangym`.

Module Contents#

src.plangym.core.wrap_callable#

class src.plangym.core.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]#

Bases: abc.ABC

Inherit from this class to adapt environments to different problems.

Base class that establishes all needed methods and blueprints to work with Gym environments.

Parameters:

name (str)
frameskip (int)
autoreset (bool)
delay_setup (bool)
return_image (bool)

STATE_IS_ARRAY = True#

OBS_IS_ARRAY = True#

SINGLETON = False#

_name#

frameskip#

autoreset#

delay_setup#

_return_image#

_n_step = 0#

_obs_step = None#

_reward_step = 0#

_terminal_step = False#

_truncated_step = False#

_info_step#

_action_step = None#

_dt_step = None#

_state_step = None#

_return_state_step = None#

__del__()[source]#: Tear down the Environment when it is no longer needed.

property name: str#

Return the name of the environment.

Return type:: str

property obs_shape: tuple[int]#

Abstractmethod:
Return type:: tuple[int]

Tuple containing the shape of the observations returned by the Environment.

property action_shape: tuple[int]#

Abstractmethod:
Return type:: tuple[int]

Tuple containing the shape of the actions applied to the Environment.

property unwrapped: PlanEnv#

Completely unwrap this Environment.

Returns: plangym.Environment: The base non-wrapped plangym.Environment instance

Return type:: PlanEnv

property return_image: bool#

Return return_image flag.

If True, add an “rgb” key to the info dictionary returned by step, containing an RGB representation of the environment state.

Return type:: bool

property img_shape: tuple[int, Ellipsis] | None#

Return the shape of the image returned by the environment.

If the environment does not return an image, it will return None. This also applies to environments that raise an error when trying to get the image (such as when running on headless machines without a virtual display).

Return type:: tuple[int, Ellipsis] | None

abstract get_image()[source]#

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with shape (Height, Width, RGB).

Return type:: None | numpy.ndarray

step(action, state=None, dt=1, return_state=None)[source]#

Step the environment applying the supplied action.

Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

In addition, the method lets the user customize the returned object, adding extra information and custom processing via the self.process_step and self.get_step_tuple methods.

Parameters:

action (numpy.ndarray | int | float) – Chosen action applied to the environment.
state (numpy.ndarray) – Set the environment to the given state before stepping it.
dt (int) – Consecutive number of times that the action will be applied.
return_state (bool | None) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns:

If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).

Return type:

tuple
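The rule for when the state appears in the returned tuple can be sketched as follows. This is a minimal illustration that follows the Returns description above literally; `should_return_state` and `build_step_result` are hypothetical helpers, not part of plangym, and the real method may also carry a truncated flag, which the description above does not list.

```python
from typing import Any, Optional


def should_return_state(state: Optional[Any], return_state: Optional[bool]) -> bool:
    """Resolve the documented rule: None defers to whether a state was passed."""
    if return_state is None:
        return state is not None
    return return_state


def build_step_result(new_state, obs, reward, terminal, info, *, state=None, return_state=None):
    """Assemble the tuple in the order listed under Returns (hypothetical helper)."""
    if should_return_state(state, return_state):
        return new_state, obs, reward, terminal, info
    return obs, reward, terminal, info


# No state passed and return_state left as None: the state is omitted.
without_state = build_step_result("s1", "obs", 1.0, False, {})

# A state was passed, so the new state leads the tuple.
with_state = build_step_result("s1", "obs", 1.0, False, {}, state="s0")

# An explicit return_state=False overrides the default even when a state is passed.
forced_off = build_step_result("s1", "obs", 1.0, False, {}, state="s0", return_state=False)
```

An explicit boolean always takes precedence, so the inference from the state argument only applies when return_state is None.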

reset(return_state=True)[source]#

Restart the environment.

Parameters:: return_state (bool) – If True, it will return the state of the environment.
Returns:: (state, obs) if return_state is True, else obs.
Return type:: numpy.ndarray | tuple[numpy.ndarray, numpy.ndarray]

step_batch(actions, states=None, dt=1, return_state=True)[source]#

Allow stepping a vector of states and actions.

Vectorized version of the step method. The signature and behaviour are the same as step, but it takes a list of states, actions and dts as input.

Parameters:

actions (numpy.ndarray | Iterable[numpy.ndarray | int]) – Iterable containing the different actions to be applied.
states (numpy.ndarray | Iterable) – Iterable containing the different states to be set.
dt (int | numpy.ndarray) – int or array containing the number of consecutive times the action will be applied to each state. If an array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for any instance).
return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.

Returns:

If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.

Return type:

tuple[list | numpy.ndarray, Ellipsis]
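One way to picture the vectorized behaviour is a loop over step followed by a transpose of the results. The sketch below assumes exactly that; `step_batch_sketch` and `toy_step` are hypothetical stand-ins written for illustration, and the actual implementation may differ (for example, by returning arrays instead of lists).

```python
def toy_step(action, state=None, dt=1, return_state=None):
    """Hypothetical deterministic step: the state is an integer counter."""
    start = 0 if state is None else state
    new_state = start + action * dt
    obs, reward, terminal, info = new_state, float(dt), new_state >= 10, {}
    if return_state or (return_state is None and state is not None):
        return new_state, obs, reward, terminal, info
    return obs, reward, terminal, info


def step_batch_sketch(step, actions, states=None, dt=1, return_state=True):
    """Call ``step`` once per (action, state, dt) triple and regroup the outputs."""
    actions = list(actions)
    n = len(actions)
    states = [None] * n if states is None else list(states)
    # A scalar dt is shared by every entry; a sequence provides one dt per entry.
    dts = [dt] * n if isinstance(dt, int) else list(dt)
    results = [step(a, s, d, return_state) for a, s, d in zip(actions, states, dts)]
    # Transpose the per-step tuples into one list per returned field.
    return tuple(list(column) for column in zip(*results))


new_states, observs, rewards, ends, infos = step_batch_sketch(
    toy_step, actions=[1, 2, 3], states=[0, 5, 9], dt=[1, 2, 1]
)
```

Passing a sequence for dt gives each entry its own number of repetitions, which matches the description of dt above.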

clone(**kwargs)[source]#

Return a copy of the environment.

Return type:: PlanEnv

sample_action()[source]#

Return a valid action that can be used to step the Environment.

Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.

step_with_dt(action, dt=1)[source]#

Step the environment applying the supplied action dt times.

Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.

The method performs any post-processing of the data after applying the action to the environment via self.process_apply_action.

This method neither computes nor returns any state.

Parameters:

action (numpy.ndarray | int | float) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.

Returns:

Tuple containing (observs, reward, terminal, info).
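The dt * self.frameskip repetition can be sketched with a plain loop. This is an illustration under stated assumptions, not the library's code: `step_with_dt_sketch` and `CounterEnv` are hypothetical, rewards are assumed to be summed across repetitions, and the loop is assumed to stop early once a terminal flag is reported.

```python
from typing import Any, Callable, Dict, Tuple

StepData = Tuple[Any, float, bool, Dict[str, Any]]


def step_with_dt_sketch(
    apply_action: Callable[[Any], StepData],
    action: Any,
    dt: int = 1,
    frameskip: int = 1,
) -> StepData:
    """Apply ``action`` up to dt * frameskip times, summing the rewards."""
    total_reward = 0.0
    obs, terminal, info = None, False, {}
    for _ in range(dt * frameskip):
        obs, reward, terminal, info = apply_action(action)
        total_reward += reward
        if terminal:  # assumption: stop repeating once the episode ends
            break
    return obs, total_reward, terminal, info


class CounterEnv:
    """Hypothetical toy environment: the observation is a running sum of actions."""

    def __init__(self, limit: int = 10) -> None:
        self.total = 0
        self.limit = limit
        self.n_calls = 0

    def apply_action(self, action: int) -> StepData:
        self.n_calls += 1
        self.total += action
        terminal = self.total >= self.limit
        return self.total, 1.0, terminal, {"n_calls": self.n_calls}


env = CounterEnv(limit=100)
obs, reward, terminal, info = step_with_dt_sketch(
    env.apply_action, action=2, dt=3, frameskip=2
)
```

With dt=3 and frameskip=2 the toy environment is advanced six times, so a single call covers six low-level steps.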

run_autoreset(step_data)[source]#: Reset the environment automatically if needed.

get_step_tuple(obs, reward, terminal, truncated, info)[source]#

Prepare the tuple that step returns.

This is a post-processing step that gives fine-grained control over what data the current step returns.

By default it determines whether to:

Return the state in the tuple (the information necessary to save or load the game).
Add the “rgb” key to the info dictionary, containing an RGB representation of the environment.

Parameters:

obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.

Returns:

Tuple containing the environment data after calling step.

setup()[source]#

Run environment initialization.

Place in this method all the code that makes the environment impossible to serialize. This allows the environment to be dispatched to different workers and initialized once it has been copied to the target process.

Return type:: None
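The idea behind deferring setup can be sketched with a toy class. `LazyEnv` is a hypothetical stand-in, not the plangym class; a `threading.Lock` plays the role of a simulator handle that cannot be serialized, and `copy.deepcopy` is used as a stand-in for the serialization a worker pool would perform, since it fails on the same kind of object.

```python
import copy
import threading


class LazyEnv:
    """Hypothetical stand-in illustrating the delay_setup idea."""

    def __init__(self, name: str, delay_setup: bool = False) -> None:
        self.name = name
        self._resource = None  # placeholder for a handle that cannot be copied
        if not delay_setup:
            self.setup()

    def setup(self) -> None:
        # A lock cannot be pickled or deep-copied, like many simulator handles.
        self._resource = threading.Lock()

    @property
    def is_ready(self) -> bool:
        return self._resource is not None


# With delay_setup=True the object holds only plain data and can be copied.
template = LazyEnv("toy-env", delay_setup=True)
worker_env = copy.deepcopy(template)

# In the target process: build the heavy resources only after the copy exists.
worker_env.setup()
```

Once setup has run, the same copy operation raises a TypeError, which is why the unserializable part is kept out of the constructor.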

begin_step(action=None, dt=None, state=None, return_state=None)[source]#

Perform setup of step variables before starting step_with_dt.

Parameters:: return_state (bool | None)

process_apply_action(obs, reward, terminal, truncated, info)[source]#

Perform any post-processing to the data returned by apply_action.

Parameters:

obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.

Returns:

Tuple containing the processed data.

process_step(obs, reward, terminal, truncated, info)[source]#

Prepare the returned info dictionary.

This is a post-processing step that gives fine-grained control over what data the info dictionary contains.

Parameters:

obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
truncated – Boolean indicating if the environment was truncated.

Returns:

Tuple containing the environment data after calling step.

close()[source]#

Tear down the current environment.

Return type:: None

process_obs(obs, **kwargs)[source]#: Perform optional computation for computing the observation returned by step.

process_reward(reward, **kwargs)[source]#

Perform optional computation for computing the reward returned by step.

Return type:: float

process_terminal(terminal, **kwargs)[source]#

Perform optional computation for computing the terminal flag returned by step.

Return type:: bool

process_info(info, **kwargs)[source]#

Perform optional computation for computing the info dictionary returned by step.

Return type:: dict[str, Any]

abstract apply_action(action)[source]#: Evolve the environment for one time step applying the provided action.

abstract apply_reset(**kwargs)[source]#: Perform the resetting operation on the environment.

abstract get_state()[source]#

Recover the internal state of the simulation.

A state must completely describe the Environment at a given moment.

Return type:: Any

abstract set_state(state)[source]#

Set the internal state of the simulation. Overwrite current state by the given argument.

Parameters:: state (Any) – Target state to be set in the environment.
Returns:: None
Return type:: None
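The requirement that a state completely describes the environment can be illustrated with a toy example. `RandomWalkEnv` below is hypothetical and only mirrors the get_state, set_state and apply_action method names; it shows that a snapshot must include hidden sources of variation, such as a random number generator, for a restored environment to replay identically.

```python
import random


class RandomWalkEnv:
    """Hypothetical toy environment; it only mirrors the method names above."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._position = 0

    def get_state(self):
        # The RNG state is included so the snapshot fully describes the environment.
        return (self._position, self._rng.getstate())

    def set_state(self, state) -> None:
        position, rng_state = state
        self._position = position
        self._rng.setstate(rng_state)

    def apply_action(self, action: int) -> int:
        noise = self._rng.choice([-1, 0, 1])
        self._position += action + noise
        return self._position


env = RandomWalkEnv(seed=42)
env.apply_action(1)
snapshot = env.get_state()

first = [env.apply_action(1) for _ in range(5)]

# Restoring the snapshot and repeating the same actions yields the same rollout.
env.set_state(snapshot)
replay = [env.apply_action(1) for _ in range(5)]
```

If the snapshot held only the position, the two rollouts could diverge because the noise sequence would continue from a different point.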

class src.plangym.core.PlangymEnv(name, frameskip=1, autoreset=True, wrappers=None, delay_setup=False, remove_time_limit=True, render_mode='rgb_array', episodic_life=False, obs_type=None, return_image=False, **kwargs)[source]#

Bases: PlanEnv

Base class for implementing OpenAI gym environments in plangym.

Parameters:

name (str)
frameskip (int)
autoreset (bool)
wrappers (Iterable[wrap_callable] | None)
delay_setup (bool)
remove_time_limit (bool)
render_mode (str | None)

AVAILABLE_RENDER_MODES#

AVAILABLE_OBS_TYPES#

DEFAULT_OBS_TYPE = 'coords'#

property render_mode: None | str#

Return how the game will be rendered. Values: None | human | rgb_array.

Return type:: None | str

_render_mode#

_gym_env = None#

_gym_env_kwargs#

_remove_time_limit#

_wrappers#

_obs_space = None#

_action_space = None#

_obs_type#

__str__()[source]#: Pretty print the environment.

__repr__()[source]#: Pretty print the environment.

property gym_env#
Return the instance of the environment that is being wrapped by plangym.

property obs_shape: tuple[int, Ellipsis] | None#

Tuple containing the shape of the observations returned by the Environment.

Return type:: tuple[int, Ellipsis] | None

property obs_type: str#

Return the type of observation returned by the environment.

Return type:: str

property observation_space: gymnasium.spaces.Space#

Return the observation_space of the environment.

Return type:: gymnasium.spaces.Space

property action_shape: tuple[int, Ellipsis]#

Tuple containing the shape of the actions applied to the Environment.

Return type:: tuple[int, Ellipsis]

property action_space: gymnasium.spaces.Space#

Return the action_space of the environment.

Return type:: gymnasium.spaces.Space

property reward_range#
Return the reward_range of the environment.

property metadata#
Return the metadata of the environment.

property remove_time_limit: bool#

Return True if the Environment can only be stepped for a limited number of times.

Return type:: bool

setup()[source]#

Initialize the target gym.Env instance.

The method calls self.init_gym_env to initialize the gym.Env instance. It removes time limits if needed and applies wrappers introduced by the user.

init_spaces()[source]#: Initialize the action_space and observation_space of the environment.

_init_action_space()[source]#

_init_obs_space_rgb()[source]#

_init_obs_space_grayscale()[source]#

_init_obs_space_coords()[source]#

get_image()[source]#

Return a numpy array containing the rendered view of the environment.

Square matrices are interpreted as a grayscale image. Three-dimensional arrays are interpreted as RGB images with shape (Height, Width, RGB).

Return type:: numpy.ndarray

apply_reset()[source]#

Restart the environment.

Returns: (obs, info). If return_image is True, the info dictionary contains an 'rgb' key with the corresponding image.

Return type:: tuple[numpy.ndarray, dict[str, Any]]

apply_action(action)[source]#

Evolve the environment for one time step applying the provided action.

Accumulate rewards and calculate terminal flag after stepping the environment.

sample_action()[source]#

Return a randomly chosen valid action that can be used to step the environment.

Return type:: int | numpy.ndarray

clone(**kwargs)[source]#

Return a copy of the environment.

Return type:: PlangymEnv

close()[source]#: Close the underlying gym.Env.

init_gym_env()[source]#

Initialize the gym.Env instance that the current class is wrapping.

Return type:: gymnasium.Env

seed(seed=None)[source]#: Seed the underlying gym.Env.

apply_wrappers(wrappers)[source]#

Wrap the underlying OpenAI gym environment.

Parameters:: wrappers (Iterable[wrap_callable])

wrap(wrapper, *args, **kwargs)[source]#

Apply a single OpenAI gym wrapper to the environment.

Parameters:: wrapper (Callable)
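The relationship between apply_wrappers and wrap can be sketched as a loop that applies one wrapper at a time. The sketch assumes each entry is either a callable taking the environment, or a (callable, kwargs) pair; this page does not spell out the exact form of wrap_callable, so treat that as an assumption. `ToyEnv`, `ScaleReward`, `NegateObs` and `apply_wrappers_sketch` are hypothetical.

```python
class ToyEnv:
    """Hypothetical minimal environment returning (obs, reward)."""

    def step(self, action):
        return action, 1.0


class ScaleReward:
    """Hypothetical wrapper that multiplies the reward by a factor."""

    def __init__(self, env, factor=1.0):
        self.env = env
        self.factor = factor

    def step(self, action):
        obs, reward = self.env.step(action)
        return obs, reward * self.factor


class NegateObs:
    """Hypothetical wrapper that flips the sign of the observation."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        obs, reward = self.env.step(action)
        return -obs, reward


def apply_wrappers_sketch(env, wrappers):
    """Wrap ``env`` with each entry in order; later entries end up outermost."""
    for item in wrappers:
        if isinstance(item, tuple):
            wrapper, kwargs = item  # assumption: (callable, kwargs) pair
            env = wrapper(env, **kwargs)
        else:
            env = item(env)
    return env


wrapped = apply_wrappers_sketch(ToyEnv(), [(ScaleReward, {"factor": 2.0}), NegateObs])
result = wrapped.step(3)
```

Order matters here: the last wrapper in the iterable sees the output of all the earlier ones.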

render()[source]#: Render the environment using OpenGL. This wraps the OpenAI render method.

process_obs(obs, **kwargs)[source]#

Perform optional computation for computing the observation returned by step.

This is a post-processing step that gives fine-grained control over the returned observation.

get_coords_obs(obs, **kwargs)[source]#: Calculate the observation returned by step when obs_type == “coords”.

get_rgb_obs(obs, **kwargs)[source]#: Calculate the observation returned by step when obs_type == “rgb”.

get_grayscale_obs(obs, **kwargs)[source]#: Calculate the observation returned by step when obs_type == “grayscale”.
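A plausible way for process_obs to select among these three methods is a lookup by obs_type. The sketch below assumes that dispatch pattern, assumes the available types are exactly the three named above, and computes grayscale as a per-pixel channel mean; `ObsDispatcher` is a hypothetical stand-in and the library's actual conversion may differ.

```python
class ObsDispatcher:
    """Hypothetical stand-in that routes observations by obs_type."""

    # Assumption: inferred from the three get_*_obs methods documented above.
    AVAILABLE_OBS_TYPES = {"coords", "rgb", "grayscale"}
    DEFAULT_OBS_TYPE = "coords"

    def __init__(self, obs_type=None):
        obs_type = self.DEFAULT_OBS_TYPE if obs_type is None else obs_type
        if obs_type not in self.AVAILABLE_OBS_TYPES:
            raise ValueError(f"Unknown obs_type: {obs_type!r}")
        self.obs_type = obs_type

    def process_obs(self, obs, **kwargs):
        # Look up get_coords_obs, get_rgb_obs or get_grayscale_obs by name.
        handler = getattr(self, f"get_{self.obs_type}_obs")
        return handler(obs, **kwargs)

    def get_coords_obs(self, obs, **kwargs):
        return obs

    def get_rgb_obs(self, obs, **kwargs):
        return kwargs["image"]

    def get_grayscale_obs(self, obs, **kwargs):
        # Assumption: grayscale is the integer mean of the colour channels.
        image = kwargs["image"]
        return [[sum(pixel) // len(pixel) for pixel in row] for row in image]


image = [[(0, 0, 0), (30, 60, 90)]]  # one row, two RGB pixels
coords = ObsDispatcher().process_obs([1.0, 2.0], image=image)
gray = ObsDispatcher("grayscale").process_obs([1.0, 2.0], image=image)
```

Keeping one method per observation type lets a subclass override a single conversion without touching the dispatch logic.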