This is an introductory tutorial to the main features of plangym.

Working with states

reset and step return the environment state

The main difference from the gym API is that plangym treats the environment state as being as important as observations, rewards and terminal flags. This is why plangym includes the state in the tuples that the environment returns after calling step and reset:

  • The reset method returns a tuple (state, observation) unless you pass return_state=False as an argument.

  • When step is called with the environment state as an argument, it returns a tuple (state, obs, reward, end, info).

import plangym

env = plangym.make("CartPole-v0")
action = env.action_space.sample()

state, obs = env.reset()
state, obs, reward, end, info = env.step(action, state)
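Because step accepts a state, you can evaluate several actions from the same starting point without resetting the environment, which is the kind of branching that planning needs. The sketch below illustrates that idea with a hypothetical pure-Python ToyLineEnv (not a plangym class) whose step(action, state) mimics the call shown above and returns a (state, obs, reward, end, info) tuple; best_action is likewise an illustrative helper, not part of plangym.

```python
from typing import Any, Dict, List, Tuple


# Hypothetical toy environment for illustration only (not part of plangym).
# Its state is an integer position on a line; actions move it by -1 or +1.
class ToyLineEnv:
    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.position = 0

    def step(self, action: int, state: int) -> Tuple[int, int, float, bool, Dict[str, Any]]:
        # Start from the given state, so every call is independent of previous calls.
        self.position = state + action
        reward = -float(abs(self.goal - self.position))
        end = self.position == self.goal
        # Returns (state, obs, reward, end, info); here the observation equals the state.
        return self.position, self.position, reward, end, {}


def best_action(env: ToyLineEnv, state: int, actions: List[int]) -> int:
    """Hypothetical one-step lookahead: try every action from the same state."""
    rewards = {}
    for action in actions:
        _, _, reward, _, _ = env.step(action, state)
        rewards[action] = reward
    return max(rewards, key=rewards.get)


env = ToyLineEnv(goal=3)
print(best_action(env, state=0, actions=[-1, 1]))  # prints 1 (moving toward the goal)
```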

However, if you don’t provide the environment state when calling step, the returned tuple will match the standard gym interface:

env = plangym.make("CartPole-v0")
action = env.action_space.sample()

obs = env.reset(return_state=False)
obs, reward, end, info = env.step(action)
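The two calling conventions above can be pictured as a single step method that checks whether a state was passed. The following is a minimal sketch under that assumption, using a hypothetical CounterEnv; it is not plangym's source code, only an illustration of how the return shape can depend on the state argument.

```python
from typing import Optional, Tuple, Union


# Hypothetical counter environment, for illustration only (not plangym's code).
# step returns a gym-style 4-tuple when no state is passed, and a 5-tuple that
# starts with the new state when a state is passed.
class CounterEnv:
    def __init__(self) -> None:
        self.count = 0

    def reset(self, return_state: bool = True) -> Union[int, Tuple[int, int]]:
        self.count = 0
        obs = self.count
        return (self.count, obs) if return_state else obs

    def step(self, action: int, state: Optional[int] = None):
        if state is not None:
            self.count = state  # restore the provided state before stepping
        self.count += action
        obs, reward, end, info = self.count, float(action), self.count >= 10, {}
        if state is None:
            return obs, reward, end, info  # standard gym shape
        return self.count, obs, reward, end, info  # state-aware shape


env = CounterEnv()
state, obs = env.reset()
print(len(env.step(1, state)))  # 5: (state, obs, reward, end, info)
obs = env.reset(return_state=False)
print(len(env.step(1)))  # 4: (obs, reward, end, info)
```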

Accessing and modifying the environment state

You can get a copy of the environment’s state by calling env.get_state():

state = env.get_state()
state
array([ 0.03145539,  0.17749025,  0.01348916, -0.25611924])

You can also set the environment state using env.set_state(state):

env.set_state(state)
assert (state == env.get_state()).all()
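Since get_state returns a copy, you can save a snapshot, keep stepping, and roll back later. A minimal sketch of that save/step/restore pattern, assuming a hypothetical SnapshotEnv whose state is a plain list (plangym's CartPole state is a NumPy array, as shown above); the deep copies in get_state and set_state are what keep the saved snapshot independent of later steps.

```python
import copy
from typing import List


# Hypothetical environment whose state is a list of floats (illustration only).
class SnapshotEnv:
    def __init__(self) -> None:
        self._state: List[float] = [0.0, 0.0]

    def get_state(self) -> List[float]:
        # Return a copy so callers cannot mutate the live environment state.
        return copy.deepcopy(self._state)

    def set_state(self, state: List[float]) -> None:
        # Store a copy so later changes to `state` do not leak into the env.
        self._state = copy.deepcopy(state)

    def step(self, action: float) -> None:
        self._state[0] += action
        self._state[1] += self._state[0]


env = SnapshotEnv()
snapshot = env.get_state()  # save
env.step(1.0)               # advance the environment twice
env.step(1.0)
env.set_state(snapshot)     # roll back to the saved state
assert env.get_state() == snapshot
```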

Step vectorization

All plangym environments offer a step_batch method that steps a whole batch of states and actions in a single call.

Calling step_batch with a list of states and a list of actions returns a tuple of lists, one per field (states, observations, rewards, terminal flags and infos), each holding one entry per (state, action) pair provided.

states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(new_states), type(observs)
(list, list)
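Conceptually, step_batch behaves like calling step once per (state, action) pair and regrouping the results into one list per field. The sketch below is a sequential stand-in written for illustration, assuming a hypothetical toy_step function with the same (state, obs, reward, end, info) return shape; it is not plangym's implementation.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical signature: step(action, state) -> (state, obs, reward, end, info).
StepFn = Callable[[Any, Any], Tuple[Any, Any, float, bool, dict]]


def step_batch(step: StepFn, states: Sequence[Any], actions: Sequence[Any]):
    """Sequential stand-in for a batched step: one list per returned field."""
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    new_states: List[Any] = []
    observs: List[Any] = []
    rewards: List[float] = []
    ends: List[bool] = []
    infos: List[dict] = []
    for state, action in zip(states, actions):
        new_state, obs, reward, end, info = step(action, state)
        new_states.append(new_state)
        observs.append(obs)
        rewards.append(reward)
        ends.append(end)
        infos.append(info)
    return new_states, observs, rewards, ends, infos


def toy_step(action: int, state: int):
    # Hypothetical toy dynamics: the state is an integer that the action increments.
    new_state = state + action
    return new_state, new_state, float(new_state), new_state >= 3, {}


new_states, observs, rewards, ends, infos = step_batch(toy_step, [0] * 10, [i % 2 for i in range(10)])
print(type(new_states), type(observs))  # <class 'list'> <class 'list'>
```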

Parallel step vectorization using multiprocessing

Passing the argument n_workers to plangym.make will return an environment that steps a batch of actions and states in parallel using multiprocessing.

env = plangym.make("CartPole-v0", n_workers=2)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.parallel.ParallelEnv, list, list)
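One way to picture what n_workers does: split the batch into one chunk per worker, step the chunks concurrently, and concatenate the results in order. This sketch uses concurrent.futures.ThreadPoolExecutor so it runs anywhere; passing a concurrent.futures.ProcessPoolExecutor instead gives true multiprocessing, provided the step function is defined at module level (so it can be pickled) and the script runs under an if __name__ == "__main__": guard on platforms that spawn processes. toy_step, step_chunk and parallel_step_batch are hypothetical names, and the chunking strategy is an assumption, not a description of how plangym.parallel.ParallelEnv distributes work.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import List, Sequence, Tuple

Step = Tuple[int, int, float, bool, dict]


def toy_step(action: int, state: int) -> Step:
    # Hypothetical step function for illustration only (not plangym's API).
    new_state = state + action
    return new_state, new_state, float(new_state), new_state >= 3, {}


def step_chunk(states: Sequence[int], actions: Sequence[int]) -> List[Step]:
    # One worker steps its own contiguous slice of the batch sequentially.
    return [toy_step(a, s) for s, a in zip(states, actions)]


def parallel_step_batch(states, actions, n_workers: int, executor: Executor):
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    if not states:
        return [], [], [], [], []
    # Ceiling division: at most n_workers chunks of near-equal size.
    size = -(-len(states) // n_workers)
    futures = [
        executor.submit(step_chunk, states[i:i + size], actions[i:i + size])
        for i in range(0, len(states), size)
    ]
    # Read futures in submission order so outputs line up with the inputs.
    results = [result for future in futures for result in future.result()]
    new_states, observs, rewards, ends, infos = (list(f) for f in zip(*results))
    return new_states, observs, rewards, ends, infos


states = [0] * 10
actions = [i % 2 for i in range(10)]
with ThreadPoolExecutor(max_workers=2) as pool:
    new_states, observs, rewards, ends, infos = parallel_step_batch(states, actions, 2, pool)
print(type(new_states), type(observs))  # <class 'list'> <class 'list'>
```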

Step vectorization using ray

You can also use Ray actors to step the environment in parallel when calling step_batch by passing ray=True to plangym.make:

import ray
ray.init()

env = plangym.make("CartPole-v0", n_workers=2, ray=True)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.ray.RayEnv, list, list)
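Ray actors are stateful workers: each one can own its own copy of the environment and, by default, handles its method calls one at a time. The sketch below imitates that actor pattern with the standard library (threads and queues) rather than Ray, so it runs without a Ray installation; EnvActor, ToyEnv and the round-robin dispatch are hypothetical and make no claim about how plangym.ray.RayEnv is built.

```python
import queue
import threading
from typing import Tuple

Step = Tuple[int, int, float, bool, dict]


class ToyEnv:
    """Hypothetical environment with mutable internal state (illustration only)."""

    def __init__(self) -> None:
        self.value = 0

    def step(self, action: int, state: int) -> Step:
        self.value = state + action
        return self.value, self.value, float(self.value), self.value >= 3, {}


class EnvActor:
    """Hypothetical actor-style worker: owns a private env, handles one request at a time."""

    def __init__(self) -> None:
        self.env = ToyEnv()  # never shared with other actors
        self.inbox: "queue.Queue" = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self) -> None:
        # Process messages sequentially, so self.env is never stepped concurrently.
        while True:
            message = self.inbox.get()
            if message is None:  # shutdown signal
                break
            action, state, reply = message
            reply.put(self.env.step(action, state))

    def step(self, action: int, state: int) -> "queue.Queue":
        # Non-blocking: returns a one-slot queue that will receive the result.
        reply: "queue.Queue" = queue.Queue(maxsize=1)
        self.inbox.put((action, state, reply))
        return reply

    def stop(self) -> None:
        self.inbox.put(None)
        self.thread.join()


actors = [EnvActor() for _ in range(2)]
states = [0] * 10
actions = [i % 2 for i in range(10)]
# Dispatch requests round-robin across actors, then wait for replies in order.
replies = [actors[i % len(actors)].step(a, s) for i, (s, a) in enumerate(zip(states, actions))]
results = [reply.get() for reply in replies]
for actor in actors:
    actor.stop()
new_states = [result[0] for result in results]
print(new_states)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```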