This is an introductory tutorial to the main features of plangym.

Working with states

reset and step return the environment state

The main difference from the gym API is that the environment state is considered as important as observations, rewards, and terminal flags. This is why plangym incorporates it into the tuples that the environment returns after calling step and reset:

  • The reset method will return a tuple of (state, observation) unless you pass return_state=False as an argument.

  • When step is called with the environment state passed as an argument, it will return a tuple containing (state, obs, reward, end, info).

import plangym

env = plangym.make("CartPole-v0")
action = env.action_space.sample()

state, obs = env.reset()  # reset returns (state, observation) by default
state, obs, reward, end, info = env.step(action, state)  # the state is included in the step tuple
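
You can keep threading the returned state into the next call to step to run a short rollout. The snippet below is a minimal sketch that reuses the env created above:

# Thread the returned state through each call to step.
state, obs = env.reset()
for _ in range(5):
    action = env.action_space.sample()
    state, obs, reward, end, info = env.step(action, state)
    if end:
        state, obs = env.reset()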

However, if you don’t provide the environment state when calling step, the returned tuple will match the standard gym interface:

env = plangym.make("CartPole-v0")
action = env.action_space.sample()

obs = env.reset(return_state=False)  # only the observation is returned
obs, reward, end, info = env.step(action)  # standard gym-style tuple
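
Running a whole episode with this interface looks like standard gym code. The loop below is a minimal sketch that reuses the env defined above:

# Play one episode with random actions, accumulating the reward.
obs = env.reset(return_state=False)
end = False
total_reward = 0.0
while not end:
    action = env.action_space.sample()
    obs, reward, end, info = env.step(action)
    total_reward += reward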

Accessing and modifying the environment state

You can get a copy of the environment’s state by calling env.get_state():

state = env.get_state()
state
array([ 0.02895645,  0.16301074,  0.0365989 , -0.30677202])

And you can set the environment state using env.set_state(state):

env.set_state(state)
assert (state == env.get_state()).all()
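
Since passing a state to step makes the environment step from that state, you can branch several alternatives from the same snapshot. A minimal sketch, assuming the env from above and the two discrete CartPole actions (0 and 1):

# Start a fresh episode and take a snapshot of its initial state.
snapshot, obs = env.reset()

# Step both CartPole actions from the same snapshot.
_, obs_left, reward_left, end_left, _ = env.step(0, snapshot)
_, obs_right, reward_right, end_right, _ = env.step(1, snapshot)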

Step vectorization

All plangym environments offer a step_batch method that steps batches of states and actions in a vectorized way.

Calling step_batch with a list of states and actions will return a tuple of lists containing the step data for each of the states and actions provided.

states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(new_states), type(observs)
(list, list)
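
The returned lists line up index by index, one entry per (state, action) pair, so you can inspect each transition individually. A small sketch that picks the batch entry with the highest reward, reusing the variables from above:

# Index of the transition that obtained the largest reward.
best = max(range(len(rewards)), key=lambda i: rewards[i])
best_state, best_obs = new_states[best], observs[best]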

Parallel step vectorization using multiprocessing

Passing the argument n_workers to plangym.make will return an environment that steps a batch of actions and states in parallel using multiprocessing.

env = plangym.make("CartPole-v0", n_workers=2)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.vectorization.parallel.ParallelEnv, list, list)

Step vectorization using ray

It is possible to use Ray actors to step the environment in parallel when calling step_batch by passing ray=True to plangym.make:

import ray
ray.init()

env = plangym.make("CartPole-v0", n_workers=2, ray=True)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]

data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.vectorization.ray.RayEnv, list, list)
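
If you started Ray only for this example, you can tear down the local instance once you are done:

ray.shutdown()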