Step 4 – Train the G1 Standing Policy
Once the extension is installed and the environments are registered, you can start teaching the G1 humanoid to stand still using the RSL-RL training script.
Conceptually, training will:
- Create many parallel copies of the G1 environment.
- Let the PPO policy try actions and receive standing‑related rewards.
- Update the policy parameters over thousands of iterations until it learns to balance.
All commands in this step are run from the extension project root (<G1_STAND_ROOT>).
Run
Start training (headless)
From <G1_STAND_ROOT>, run:
python scripts/rsl_rl/train.py --task G1-Stand-Flat-v0 --headless --num_envs 4096
If your GPU has less memory, use fewer envs:
python scripts/rsl_rl/train.py --task G1-Stand-Flat-v0 --headless --num_envs 1024
What the flags do:
- --task G1-Stand-Flat-v0 selects your registered environment.
- --headless runs without the GUI.
- --num_envs sets the number of parallel environments (lower it if you run out of GPU memory).
During training, logs and checkpoints are written under:
logs/rsl_rl/g1_stand_flat/...
The g1_stand_flat folder in this path comes from the experiment name set in the PPO runner config, G1StandFlatPPORunnerCfg:
self.experiment_name = "g1_stand_flat"
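As a rough illustration of how the experiment name ends up in the log path, the sketch below composes a run directory from a root of logs/rsl_rl/<experiment_name> plus one sub-folder per run. The helper name build_log_dir and the timestamped folder format are assumptions made for this example, not something this guide confirms; check your train.py for the exact layout.

```python
import os
from datetime import datetime


def build_log_dir(experiment_name: str, now: datetime) -> str:
    """Compose a run directory path (hypothetical helper, not part of Isaac Lab)."""
    # Root folder shared by every run of this experiment.
    log_root = os.path.join("logs", "rsl_rl", experiment_name)
    # Assumption: each training run gets its own timestamped sub-folder.
    run_name = now.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(log_root, run_name)


# A fixed timestamp keeps the example reproducible.
example_dir = build_log_dir("g1_stand_flat", datetime(2025, 1, 31, 14, 5, 9))
print(example_dir)
```

Changing experiment_name in the runner config would change only the middle folder, so runs of different experiments stay separated on disk.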
Optional: other commands
Override max iterations from CLI
python scripts/rsl_rl/train.py --task G1-Stand-Flat-v0 --headless --num_envs 2048 --max_iterations 2000
Record training videos
python scripts/rsl_rl/train.py --task G1-Stand-Flat-v0 --num_envs 512 --video --video_interval 2000 --video_length 200
Monitoring progress
Training typically prints:
- Iteration number.
- Reward statistics (mean/median).
- Loss values.
You can also monitor training via TensorBoard (see Visualization → Inspect training with TensorBoard).
What the terminal should look like
When everything is wired up correctly and train.py is running, your terminal will look roughly like this:

Figure: Example RSL-RL training output in the terminal (iteration, reward stats, and reward term breakdown) for the G1 standing task.
Expected training outcome
If everything is wired correctly, you should see over time:
- Average episode reward increasing (or at least stabilizing after some oscillation).
- Fewer terminations from falling.
- In the play environment, the G1 robot should:
  - Start with wobbly, unstable behavior.
  - Gradually learn to recover and stand more upright.
  - Eventually stand mostly still, with only small corrective motions.
How fast this happens depends on:
- Number of environments (--num_envs and scene.num_envs).
- GPU performance.
- Reward weights and PPO hyperparameters.
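To get a feel for why the environment count matters, a back-of-the-envelope calculation helps: each PPO iteration collects roughly num_envs × num_steps_per_env transitions before the policy is updated. The value of 24 steps per environment used below is an assumption for illustration only; the real number comes from your G1StandFlatPPORunnerCfg.

```python
def transitions_per_iteration(num_envs: int, num_steps_per_env: int) -> int:
    """Approximate number of environment transitions gathered per PPO iteration."""
    return num_envs * num_steps_per_env


# Assumed rollout length per environment; read the real value from your runner config.
NUM_STEPS_PER_ENV = 24

for num_envs in (512, 1024, 4096):
    total = transitions_per_iteration(num_envs, NUM_STEPS_PER_ENV)
    print(f"--num_envs {num_envs}: about {total} transitions per iteration")
```

Under this assumption, dropping from 4096 to 512 environments gives each iteration one eighth of the data, so expect to need more iterations to reach the same standing behavior.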
Common mistakes
- Training crashes with AttributeError on rewards
  This usually means a reward term name is not present in your installed Isaac Lab version.
  The environment config uses hasattr(self.rewards, "...") before changing weights to avoid this; if you add new reward references, keep using hasattr.
- Out of memory (OOM) on GPU
  Reduce --num_envs:
  python scripts/rsl_rl/train.py --task G1-Stand-Flat-v0 --headless --num_envs 512
- Training seems stuck at low reward
  Check:
  - Reward weights in G1StandFlatEnvCfg.
  - PPO hyperparameters in G1StandFlatPPORunnerCfg.
  - Whether the robot is actually standing in the play environment (see the next step).
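The hasattr guard mentioned under the AttributeError item can be sketched as follows. This is a minimal, self-contained illustration: the SimpleNamespace object stands in for self.rewards, and the helper set_reward_weight and both term names are hypothetical, not taken from Isaac Lab or from the G1 config.

```python
from types import SimpleNamespace


def set_reward_weight(rewards, term_name: str, weight: float) -> bool:
    """Set a reward term's weight only if the term exists; return True if applied."""
    if not hasattr(rewards, term_name):
        # Term is missing (for example, in a different Isaac Lab version): skip it.
        return False
    getattr(rewards, term_name).weight = weight
    return True


# Stand-in for self.rewards; the term name is made up for this example.
rewards = SimpleNamespace(upright_bonus=SimpleNamespace(weight=1.0))

applied = set_reward_weight(rewards, "upright_bonus", 2.0)
skipped = set_reward_weight(rewards, "term_missing_in_this_version", 0.5)
print(applied, skipped, rewards.upright_bonus.weight)
```

Returning a boolean instead of raising makes it easy to log which terms were skipped, which helps when a reward you expected to tune silently does not exist.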
Hardware tip
If you run on a laptop GPU or a card with limited VRAM, it is perfectly fine to:
- Lower --num_envs (e.g., 256–1024).
- Run fewer iterations first to validate the setup, then scale up.
Checklist after running
- Confirm logs/rsl_rl/g1_stand_flat/ has at least one run directory.
- Check console for reward/loss logs.
- Optionally open TensorBoard (see Visualization) to inspect scalars.
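The first checklist item can also be verified with a few lines of Python instead of browsing the folder by hand. This is a small sketch, assuming it is run from <G1_STAND_ROOT>; the helper name list_run_dirs is hypothetical, and treating the last name in sorted order as the latest run only holds if run folders are named by timestamp.

```python
from pathlib import Path


def list_run_dirs(log_root: Path) -> list[Path]:
    """Return the run directories under log_root, sorted by name (empty if none)."""
    if not log_root.is_dir():
        return []
    return sorted(p for p in log_root.iterdir() if p.is_dir())


runs = list_run_dirs(Path("logs/rsl_rl/g1_stand_flat"))
if runs:
    # Sorted by name; with timestamped folders the last entry is the newest run.
    print(f"Found {len(runs)} run(s); last by name: {runs[-1].name}")
else:
    print("No run directories found yet.")
```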