This tutorial introduces the main features of plangym.
Working with states
reset and step return the environment state

The main difference from the gym API is that the environment state is considered as important as observations, rewards, and terminal flags. This is why plangym incorporates it into the tuples that the environment returns after calling step and reset:
- The reset method will return a tuple of (state, observation) unless you pass return_state=False as an argument.
- When step is called with the environment state as an argument, it will return a tuple containing (state, obs, reward, end, info).
import plangym
env = plangym.make("CartPole-v0")
action = env.action_space.sample()
state, obs = env.reset()
state, obs, reward, end, info = env.step(action, state)
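Because step accepts a state, you can expand several alternatives from the exact same starting point, which is the basic building block of planning algorithms. A minimal sketch of this idea, assuming a discrete action space such as CartPole's (the loop below is illustrative, not part of the plangym API):

state, obs = env.reset()
# Try every action from the same state and record the immediate rewards.
results = []
for action in range(env.action_space.n):
    new_state, new_obs, reward, end, info = env.step(action, state)
    results.append((action, reward))
print(results)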
However, if you don’t provide the environment state when calling step, the returned tuple will match the standard gym interface:
env = plangym.make("CartPole-v0")
action = env.action_space.sample()
obs = env.reset(return_state=False)
obs, reward, end, info = env.step(action)
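With this interface you can write an ordinary gym-style rollout loop. A minimal sketch using random actions:

obs = env.reset(return_state=False)
total_reward, end = 0.0, False
while not end:
    # Sample a random action and step without passing a state.
    obs, reward, end, info = env.step(env.action_space.sample())
    total_reward += reward
print(total_reward)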
Accessing and modifying the environment state
You can get a copy of the environment’s state by calling env.get_state():
state = env.get_state()
state
array([ 0.02895645, 0.16301074, 0.0365989 , -0.30677202])
And you can set the environment state using env.set_state(state):
env.set_state(state)
assert (state == env.get_state()).all()
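Together, get_state and set_state let you rewind the environment and replay a step. A quick sketch, assuming a deterministic environment such as CartPole:

saved_state = env.get_state()
action = env.action_space.sample()
# Step once, rewind to the saved state, and repeat the same action.
obs_first, reward_first, end_first, info_first = env.step(action)
env.set_state(saved_state)
obs_second, reward_second, end_second, info_second = env.step(action)
# CartPole is deterministic, so both transitions should match.
assert (obs_first == obs_second).all()
assert reward_first == reward_second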
Step vectorization
All plangym environments offer a step_batch method that allows vectorized steps over batches of states and actions. Calling step_batch with a list of states and actions will return a tuple of lists containing the step data for each state-action pair provided.
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]
data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(new_states), type(observs)
(list, list)
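Because the outputs are plain lists, you can stack them into numpy arrays for vectorized post-processing, for example to select the sampled action with the highest immediate reward. A small illustrative sketch:

import numpy as np

observs = np.stack(observs)
rewards = np.array(rewards)
# Index of the sampled action that produced the highest reward.
best = int(rewards.argmax())
print(observs.shape, rewards.shape, actions[best])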
Parallel step vectorization using multiprocessing
Passing the argument n_workers to plangym.make will return an environment that steps a batch of actions and states in parallel using multiprocessing.
env = plangym.make("CartPole-v0", n_workers=2)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]
data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.vectorization.parallel.ParallelEnv, list, list)
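The parallel environment keeps the same step_batch interface, so you can compare it against a single-process environment with a simple timing loop. A rough sketch; for cheap environments like CartPole and small batches, inter-process communication overhead may outweigh the speedup:

import time

sequential_env = plangym.make("CartPole-v0")
for e in (sequential_env, env):  # single-process first, then the 2-worker env
    start = time.perf_counter()
    e.step_batch(states=states, actions=actions)
    print(type(e).__name__, time.perf_counter() - start)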
Step vectorization using ray
It is possible to use ray actors to step the environment in parallel when calling step_batch by passing ray=True to plangym.make:
import ray
ray.init()
env = plangym.make("CartPole-v0", n_workers=2, ray=True)
states = [state.copy() for _ in range(10)]
actions = [env.action_space.sample() for _ in range(10)]
data = env.step_batch(states=states, actions=actions)
new_states, observs, rewards, ends, infos = data
type(env), type(new_states), type(observs)
(plangym.vectorization.ray.RayEnv, list, list)
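When you are done with the ray-backed environment, you can shut down the local Ray instance to release its resources:

ray.shutdown()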