plangym
Various environments for plangym.
Package Contents
Classes
- PlanEnv: Inherit from this class to adapt environments to different problems.
Functions
- make: Create the appropriate PlangymEnv from the environment name and other parameters.
Attributes
- __version__
- class plangym.PlanEnv(name, frameskip=1, autoreset=True, delay_setup=False, return_image=False)[source]
Bases: abc.ABC
Inherit from this class to adapt environments to different problems.
Base class that establishes all needed methods and blueprints to work with Gym environments.
- Parameters
name (str) – Name of the environment.
frameskip (int) – Number of times an action will be applied for each step.
autoreset (bool) – Automatically reset the environment when a terminal state is reached.
delay_setup (bool) – If True, do not initialize the environment until setup is called.
return_image (bool) – If True, add an "rgb" key containing an RGB representation of the environment state to the info dictionary returned by step.
- STATE_IS_ARRAY = True
- OBS_IS_ARRAY = True
- SINGLETON = False
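PlanEnv itself is abstract, so instances are usually created through a concrete subclass or the plangym.make helper documented below. A minimal sketch of the constructor parameters in use; the environment name is an assumption and must be available in your installation:

```python
import plangym

# "CartPole-v0" is an assumption; any supported environment name works.
env = plangym.make(name="CartPole-v0", frameskip=2, autoreset=True)

print(env.name)          # "CartPole-v0"
print(env.obs_shape)     # shape of the observations, e.g. (4,)
print(env.action_shape)  # shape of the actions (may be () for discrete actions)
```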
- property name(self)
Return the name of the environment.
- Return type
str
- property obs_shape(self)
Tuple containing the shape of the observations returned by the Environment.
- Return type
Tuple[int]
- property action_shape(self)
Tuple containing the shape of the actions applied to the Environment.
- Return type
Tuple[int]
- property unwrapped(self)
Completely unwrap this Environment.
- Returns
The base non-wrapped plangym.Environment instance.
- Return type
plangym.Environment
- property return_image(self)
Return the return_image flag.
If True, add an "rgb" key to the info dictionary returned by step that contains an RGB representation of the environment state.
- Return type
bool
- abstract get_image(self)[source]
Return a numpy array containing the rendered view of the environment.
Square matrices are interpreted as grayscale images. Three-dimensional arrays are interpreted as RGB images with dimensions (Height, Width, RGB).
- Return type
Union[None, numpy.ndarray]
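A short sketch of both image-related features, assuming the same hypothetical environment name as above:

```python
import plangym

env = plangym.make(name="CartPole-v0", return_image=True)
state, obs = env.reset()

# With return_image=True, step adds an "rgb" key to the info dict.
obs, reward, terminal, info = env.step(env.sample_action())
rgb = info["rgb"]

# get_image returns the rendered view directly.
img = env.get_image()
if img is not None and img.ndim == 3:
    height, width, channels = img.shape  # (Height, Width, RGB)
```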
- step(self, action, state=None, dt=1, return_state=None)[source]
Step the environment applying the supplied action.
Optionally set the state to the supplied state before stepping it (the method prepares the environment in the given state, dismissing the current state, and applies the action afterwards).
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
In addition, the method allows the user to prepare the returned object, adding additional information and custom pre-processing via the self.process_step and self.get_step_tuple methods.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
state (numpy.ndarray) – Set the environment to the given state before stepping it.
dt (int) – Consecutive number of times that the action will be applied.
return_state (Optional[bool]) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
If state is None, returns (observs, reward, terminal, info); otherwise returns (new_state, observs, reward, terminal, info).
- Return type
tuple
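A sketch of the two calling conventions described above (the environment name is again an assumption):

```python
import plangym

env = plangym.make(name="CartPole-v0")
state, obs = env.reset(return_state=True)

# No state passed: a plain 4-tuple is returned.
obs, reward, terminal, info = env.step(env.sample_action())

# State passed: the environment is set to `state` first, and the
# returned tuple also contains the resulting state.
new_state, obs, reward, terminal, info = env.step(
    action=env.sample_action(), state=state, dt=3
)
```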
- reset(self, return_state=True)[source]
Restart the environment.
- Parameters
return_state (bool) – If True, it will return the state of the environment.
- Returns
(state, obs) if return_state is True, else obs.
- Return type
Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]
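For example, continuing the sketch above:

```python
# Default: return both the state and the first observation.
state, obs = env.reset(return_state=True)

# Observation only.
obs = env.reset(return_state=False)
```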
- step_batch(self, actions, states=None, dt=1, return_state=True)[source]
Allow stepping a vector of states and actions.
Vectorized version of the step method. The signature and behaviour are the same as step's, but it takes lists of states, actions, and dts as input.
- Parameters
actions (Union[numpy.ndarray, Iterable[Union[numpy.ndarray, int]]]) – Iterable containing the different actions to be applied.
states (Union[numpy.ndarray, Iterable]) – Iterable containing the different states to be set.
dt (Union[int, numpy.ndarray]) – int or array containing the consecutive number of times the action will be applied to each state. If array, the different values are distributed among the multiple environments (contrary to self.frameskip, which is a common value for every instance).
return_state (bool) – Whether to return the state in the returned tuple. If None, step will return the state if state was passed as a parameter.
- Returns
If return_state is True, the method returns (new_states, observs, rewards, ends, infos). If return_state is False, the method returns (observs, rewards, ends, infos). If return_state is None, the returned object depends on the states parameter.
- Return type
Tuple[Union[list, numpy.ndarray], Ellipsis]
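A hedged sketch of batched stepping, reusing the hypothetical environment name from the earlier examples:

```python
import numpy
import plangym

env = plangym.make(name="CartPole-v0")
state, obs = env.reset(return_state=True)

states = [state.copy() for _ in range(4)]
actions = [env.sample_action() for _ in range(4)]

# One dt value per state; a scalar dt would be shared by all of them.
new_states, observs, rewards, ends, infos = env.step_batch(
    actions=actions, states=states, dt=numpy.ones(4, dtype=int)
)
```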
- sample_action(self)[source]
Return a valid action that can be used to step the Environment.
Implementing this method is optional, and it’s only intended to make the testing process of the Environment easier.
- step_with_dt(self, action, dt=1)[source]
Take dt simulation steps and make the environment evolve in multiples of self.frameskip for a total of dt * self.frameskip steps.
The method performs any post-processing of the data after applying the action to the environment via self.process_apply_action.
This method neither computes nor returns any state.
- Parameters
action (Union[numpy.ndarray, int, float]) – Chosen action applied to the environment.
dt (int) – Consecutive number of times that the action will be applied.
- Returns
Tuple containing (observs, reward, terminal, info).
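For instance, with frameskip=2 the following call advances the simulation 2 * 3 = 6 steps and, unlike step, never returns a state:

```python
obs, reward, terminal, info = env.step_with_dt(env.sample_action(), dt=3)
```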
- get_step_tuple(self, obs, reward, terminal, info)[source]
Prepare the tuple that step returns.
This is a post-processing step that gives fine-grained control over the data the current step returns.
- By default it determines:
Whether to return the state in the tuple (the information necessary to save or load the game).
Whether to add the "rgb" key to the info dictionary, containing an RGB representation of the environment.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the environment data after calling step.
- setup(self)[source]
Run environment initialization.
Put in this function all the code that makes the environment impossible to serialize; this allows the environment to be dispatched to different workers and initialized once it has been copied to the target process.
- Return type
None
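A sketch of the delayed-initialization pattern this enables; the environment name is an assumption, and the worker dispatch is left abstract:

```python
import plangym

# With delay_setup=True nothing non-serializable is created yet, so the
# instance can be pickled and shipped to another process.
env = plangym.make(name="CartPole-v0", delay_setup=True)

# ... after the environment has been copied to the target worker ...
env.setup()   # run the actual initialization once, inside the worker
state, obs = env.reset()
```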
- begin_step(self, action=None, dt=None, state=None, return_state=None)[source]
Perform setup of step variables before starting step_with_dt.
- Parameters
return_state (bool) –
- process_apply_action(self, obs, reward, terminal, info)[source]
Perform any post-processing to the data returned by apply_action.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the processed data.
- process_step(self, obs, reward, terminal, info)[source]
Prepare the returned info dictionary.
This is a post-processing step that gives fine-grained control over what data the info dictionary contains.
- Parameters
obs – Observation of the environment.
reward – Reward signal.
terminal – Boolean indicating if the environment is finished.
info – Dictionary containing additional information about the environment.
- Returns
Tuple containing the environment data after calling step.
- process_obs(self, obs, **kwargs)[source]
Perform optional computations to build the observation returned by step.
- process_reward(self, reward, **kwargs)[source]
Perform optional computations to build the reward returned by step.
- Return type
float
- process_terminal(self, terminal, **kwargs)[source]
Perform optional computations to build the terminal flag returned by step.
- Return type
bool
- process_info(self, info, **kwargs)[source]
Perform optional computations to build the info dictionary returned by step.
- Return type
Dict[str, Any]
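A sketch of customizing the step data by overriding these hooks in a subclass. Deriving from the class returned by plangym.make keeps the example self-contained; ScaledEnv and its scaling factor are illustrative assumptions:

```python
from typing import Any, Dict

import numpy
import plangym

# Grab a concrete PlanEnv subclass to derive from; subclassing a
# specific environment class directly works the same way.
BaseEnv = type(plangym.make(name="CartPole-v0"))

class ScaledEnv(BaseEnv):
    def process_obs(self, obs, **kwargs):
        # Cast observations to float32 before step returns them.
        return numpy.asarray(obs, dtype=numpy.float32)

    def process_reward(self, reward, **kwargs) -> float:
        # Scale the reward signal (factor chosen for illustration).
        return float(reward) / 100.0

    def process_info(self, info, **kwargs) -> Dict[str, Any]:
        info["scaled"] = True
        return info

env = ScaledEnv(name="CartPole-v0")
```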
- abstract apply_action(self, action)[source]
Evolve the environment for one time step applying the provided action.
- plangym.make(name=None, n_workers=None, ray=False, domain_name=None, state=None, **kwargs)[source]
Create the appropriate PlangymEnv from the environment name and other parameters.
- Parameters
name (str) – Name of the environment.
n_workers (int) – Number of workers that will step the environment in parallel.
ray (bool) – If True, use ray to step the environment in parallel.
domain_name (str) – Domain name, used by dm_control environments.
state (str) – State file, used by retro environments.
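A usage sketch; the environment name is an assumption:

```python
import plangym

# Single-process environment.
env = plangym.make(name="CartPole-v0")

# Same environment backed by several worker processes; step_batch
# distributes batched states and actions among them.
vec_env = plangym.make(name="CartPole-v0", n_workers=4)

# Pass ray=True to use ray-based workers instead of local processes.
```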
- plangym.__version__ = 0.0.32