A complete 3D physics simulation environment for differential-drive robot navigation and reinforcement learning using PyBullet. Building on the foundations of 2D simulation (IR-SIM), this project takes the next step into realistic 3D environments with full sensor simulation, path planning algorithms, and PPO-based RL training that reaches a 100% success rate in evaluation.
2D simulations are great for learning, but real robots live in a 3D world with mass, momentum, friction, and gravity. The challenge was to build a simulation environment that captures that physical realism while still supporting path planning and RL training.
The goal: train an RL agent to navigate a cluttered warehouse environment with 31 obstacles, achieving consistent goal-reaching behavior.
Key features:

- Full sensor suite (lidar, camera, IMU, odometry)
- Path planning (A*, RRT)
- Reinforcement learning (PPO)
- Multiple environments
| Category | Technologies |
|---|---|
| Simulation | PyBullet, URDF |
| RL Framework | Stable Baselines3, Gymnasium |
| Path Planning | A*, RRT |
| Control | PID Controller |
| Visualization | PyBullet GUI, Matplotlib |
| Platform | Python 3.10, Conda, Apple Silicon (MPS) |
┌─────────────────────────────────────────────────────────────┐
│ PyBullet Physics Engine │
├─────────────────────────────────────────────────────────────┤
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ URDF Robot │ │ Environment │ │ Sensor System │ │
│ │ Model │ │ Builder │ │ (Lidar/Camera/ │ │
│ │ │ │ │ │ IMU/Odom) │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ Navigation Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ A* Planner │ │ RRT Planner │ │ PID Controller │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│ RL Training Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Gym Wrapper │ │ PPO Agent │ │ Reward Shaping │ │
│ └──────────────┘ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────┘
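The RL layer exposes this stack to Stable Baselines3 through a Gymnasium wrapper. The sketch below shows the general shape of such a wrapper under Gymnasium's reset/step API; the class name `WarehouseNavEnv`, the r2d2 placeholder robot, and the six-element observation layout are illustrative assumptions, not the repository's actual code.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import pybullet as p
import pybullet_data


class WarehouseNavEnv(gym.Env):
    """Sketch of a PyBullet-backed navigation env; names and shapes are illustrative."""

    def __init__(self, render_mode=None):
        super().__init__()
        self.client = p.connect(p.GUI if render_mode == "human" else p.DIRECT)
        # Action: normalized left/right wheel commands; observation: pose + goal-relative terms
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32)
        self.goal = np.array([3.0, 0.0])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        p.resetSimulation(physicsClientId=self.client)
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81, physicsClientId=self.client)
        p.loadURDF("plane.urdf", physicsClientId=self.client)
        # A real setup would load the differential-drive URDF and the obstacle layout here
        self.robot = p.loadURDF("r2d2.urdf", [0, 0, 0.5], physicsClientId=self.client)
        return self._get_obs(), {}

    def step(self, action):
        # A real env would map the action to wheel joint velocity commands before stepping
        p.stepSimulation(physicsClientId=self.client)
        obs = self._get_obs()
        dist = obs[3]
        terminated = bool(dist < 0.3)
        reward = 50.0 if terminated else -0.01   # full reward shaping omitted for brevity
        return obs, reward, terminated, False, {}

    def _get_obs(self):
        pos, orn = p.getBasePositionAndOrientation(self.robot, physicsClientId=self.client)
        yaw = p.getEulerFromQuaternion(orn)[2]
        dx, dy = self.goal - np.array(pos[:2])
        dist = np.hypot(dx, dy)
        heading_err = np.arctan2(dy, dx) - yaw
        return np.array(
            [pos[0], pos[1], yaw, dist, np.cos(heading_err), np.sin(heading_err)],
            dtype=np.float32,
        )

    def close(self):
        p.disconnect(self.client)
```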
Initial RL training was a disaster. The robot would spin in circles, oscillate back and forth, or stop just short of the goal. Success rate: 0-2 out of 5 episodes.
Simplified the reward function to focus on what matters (a code sketch follows the table):
| Component | Value | Purpose |
|---|---|---|
| Progress | +1.0/meter | Primary signal — move toward goal |
| Goal reached | +50.0 | Clear success signal |
| Collision | -10.0 | Moderate, not crushing |
| Time | -0.01/step | Encourage efficiency |
| Heading alignment | +0.1 | Guide early learning |
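In code, the shaped reward amounts to only a few lines. This is a sketch built from the table above; the function signature and the cosine form of the heading bonus are my assumptions, not the repository's exact implementation.

```python
import numpy as np

def compute_reward(prev_dist, dist, heading_error, collided, goal_reached):
    """Shaped reward matching the table above (illustrative sketch)."""
    reward = 1.0 * (prev_dist - dist)        # progress: +1.0 per meter moved toward the goal
    reward += 0.1 * np.cos(heading_error)    # small heading-alignment bonus to guide early learning
    reward -= 0.01                           # per-step time penalty to encourage efficiency
    if collided:
        reward -= 10.0                       # moderate collision penalty, not a crushing one
    if goal_reached:
        reward += 50.0                       # clear success signal
    return reward
```

Retraining with the simplified reward produced the evaluation results below.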
| Metric | Value |
|---|---|
| Success Rate | 5/5 (100%) |
| Average Reward | 57.43 ± 3.83 |
| Average Steps | 164 ± 58 |
| Distance Traveled | 3.06m ± 1.09m |
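Statistics like these can be produced with a short evaluation loop over the trained policy. The sketch below assumes the model path and uses the `WarehouseNavEnv` wrapper sketched earlier; it is not the repository's `scripts/evaluate.py`.

```python
import numpy as np
from stable_baselines3 import PPO

def evaluate(env, model_path="models/ppo_nav.zip", episodes=5, max_steps=500):
    """Run a few deterministic episodes and print summary statistics."""
    model = PPO.load(model_path)                 # model path is an assumption
    rewards, steps, successes = [], [], 0
    for _ in range(episodes):
        obs, _ = env.reset()
        ep_reward, ep_steps, terminated, truncated = 0.0, 0, False, False
        while not (terminated or truncated) and ep_steps < max_steps:
            action, _ = model.predict(obs, deterministic=True)
            obs, r, terminated, truncated, _ = env.step(action)
            ep_reward += r
            ep_steps += 1
        successes += int(terminated)             # terminated == goal reached in this sketch
        rewards.append(ep_reward)
        steps.append(ep_steps)
    print(f"Success rate: {successes}/{episodes}")
    print(f"Average reward: {np.mean(rewards):.2f} ± {np.std(rewards):.2f}")
    print(f"Average steps: {np.mean(steps):.0f} ± {np.std(steps):.0f}")

# Usage: evaluate(WarehouseNavEnv(), episodes=5)  # with the wrapper sketched above
```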
Both the navigation controller and reward function improved dramatically when I removed complexity. My initial navigation controller had stuck detection, oscillation prevention, and multiple state machines — and it failed. A simple PID controller succeeded.
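For scale, a PID-based waypoint follower boils down to something like this sketch; the gains and the differential-drive wheel mapping are illustrative, not the repository's tuned values.

```python
import numpy as np

class PID:
    """Minimal PID controller; gains are illustrative."""
    def __init__(self, kp, ki=0.0, kd=0.0, dt=1.0 / 240.0):  # dt = PyBullet's default timestep
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Heading PID steers toward the next waypoint; forward speed scales with distance.
heading_pid = PID(kp=2.0, kd=0.1)

def wheel_velocities(dist_to_waypoint, heading_error, max_speed=5.0):
    turn = heading_pid.update(heading_error)
    # Stop driving forward (turn in place) when badly misaligned with the waypoint
    forward = min(max_speed, 2.0 * dist_to_waypoint) * max(0.0, np.cos(heading_error))
    return forward - turn, forward + turn   # left wheel, right wheel
```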
The missing distance-to-goal observation was a showstopper: the agent literally couldn't perceive when it was close to success. Always verify that your observations contain the information the task needs.
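Concretely, the goal-relative terms have to be computed and packed into the observation vector; a minimal construction, with an assumed layout, looks like this:

```python
import numpy as np

def build_observation(robot_xy, robot_yaw, goal_xy, lidar_ranges):
    """Assumed layout: goal distance and bearing in the robot frame, then lidar ranges."""
    dx, dy = np.asarray(goal_xy) - np.asarray(robot_xy)
    dist_to_goal = np.hypot(dx, dy)                         # the term that was missing
    heading_error = np.arctan2(dy, dx) - robot_yaw
    heading_error = np.arctan2(np.sin(heading_error), np.cos(heading_error))  # wrap to [-pi, pi]
    return np.concatenate(([dist_to_goal, heading_error], lidar_ranges)).astype(np.float32)
```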
Harsh penalties (-50 for collision) prevented exploration. Moderate penalties (-10) guide behavior without crushing the learning signal.
I discovered visualization was consuming 424ms per frame — not physics or planning. Reducing update frequency brought the simulation from 0.8 FPS to 4.2 FPS.
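One way to get that effect is to redraw debug overlays and the follow camera only every N physics steps, reusing debug items instead of allocating new ones. The interval and the specific overlays below are assumptions about the kind of change involved, not the repository's exact code.

```python
import pybullet as p
import pybullet_data

VIS_EVERY_N_STEPS = 24   # assumption: ~10 overlay redraws per second at a 240 Hz physics step

p.connect(p.GUI)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")
robot = p.loadURDF("r2d2.urdf", [0, 0, 0.5])

marker_id = -1
for step in range(2400):
    p.stepSimulation()
    if step % VIS_EVERY_N_STEPS == 0:
        pos, _ = p.getBasePositionAndOrientation(robot)
        # Reuse the same debug item instead of creating a new one every frame
        marker_id = p.addUserDebugLine(pos, [pos[0], pos[1], pos[2] + 0.5],
                                       lineColorRGB=[0, 1, 0],
                                       replaceItemUniqueId=marker_id)
        p.resetDebugVisualizerCamera(cameraDistance=3.0, cameraYaw=45,
                                     cameraPitch=-35, cameraTargetPosition=pos)
p.disconnect()
```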
A* plans faster (~0.1ms) but RRT produced better overall navigation (fewer replanning cycles). Always measure end-to-end, not just component performance.
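A small timing helper makes it easy to compare both levels; the planner and episode calls in the comments are placeholders for whatever APIs you actually have, not functions from this repository.

```python
import time

def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Component-level: how long does one plan take?
# path, plan_s = timed(planner.plan, start_pose, goal_pose)          # placeholder planner API

# End-to-end: how long does a full navigation episode take, replans included?
# _, episode_s = timed(run_navigation_episode, env, planner, controller)  # placeholder
```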
```bash
# Clone repository
git clone https://github.com/padawanabhi/pybullet_sim.git
cd pybullet_sim

# Setup environment (conda recommended for macOS)
conda create -n pybullet_sim python=3.10
conda activate pybullet_sim
conda install -c conda-forge pybullet
pip install -r requirements.txt

# Test basic simulation
python scripts/01_hello_pybullet.py

# Run navigation demo
python scripts/06_navigate_environment.py

# Train RL agent
python scripts/train_ppo.py --timesteps 1000000

# Evaluate trained model
python scripts/evaluate.py --episodes 5
```
```python
# PPO hyperparameters used
{
    "algorithm": "PPO",
    "timesteps": 1_000_000,
    "parallel_envs": 4,
    "learning_rate": 3e-4,
    "n_steps": 2048,
    "batch_size": 64,
    "device": "mps",  # Apple Silicon
}
```
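Wired into Stable Baselines3, those settings correspond to roughly the following training script. The vectorized-env helper usage and save path are assumptions, and `WarehouseNavEnv` is the illustrative wrapper sketched earlier, not necessarily the repo's class; `scripts/train_ppo.py` is the actual entry point.

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# WarehouseNavEnv: the Gymnasium wrapper sketched earlier in this post
vec_env = make_vec_env(WarehouseNavEnv, n_envs=4)   # parallel_envs: 4

model = PPO(
    "MlpPolicy",
    vec_env,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    device="mps",           # Apple Silicon GPU backend in PyTorch
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("models/ppo_nav")   # assumed save path
```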