A complete robot simulation environment for differential-drive robot navigation, localization, and reinforcement learning using IR-SIM (a 2D simulator). This project bridges classical robotics control algorithms with modern deep reinforcement learning, providing a hands-on learning platform for understanding both approaches. The system includes go-to-goal controllers, potential field navigation, odometry simulation, and a full RL training pipeline with PPO, SAC, and TD3, achieving a 100% navigation success rate after implementing a research-backed reward function design.
Robotics is often taught in isolation—either you learn classical control theory with differential equations and state-space models, or you dive into reinforcement learning with abstract environments like CartPole. But real-world robotics lives at the intersection of both worlds.
I wanted to build something that would:
The platform is organized into three main layers:
IR-SIM (2D Intelligent Robot Simulator) provides the physics backbone:
Before diving into RL, I implemented two classical navigation approaches:
Go-to-Goal Controller:
Potential Field Controller:
These controllers serve as baselines and help validate that the simulation environment works correctly.
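For reference, here is a minimal sketch of a proportional go-to-goal law for a differential-drive robot. The gains, velocity limits, and function name are illustrative assumptions, not the values used in scripts/02_go_to_goal.py.

```python
import numpy as np

def go_to_goal_step(x, y, theta, gx, gy, k_v=1.0, k_w=2.0, v_max=1.0, w_max=1.0):
    """One step of a proportional go-to-goal controller (illustrative gains/limits)."""
    dx, dy = gx - x, gy - y
    dist = np.hypot(dx, dy)                       # distance to goal
    heading_error = np.arctan2(dy, dx) - theta    # angle between heading and goal
    # wrap the heading error to [-pi, pi]
    heading_error = np.arctan2(np.sin(heading_error), np.cos(heading_error))
    v = np.clip(k_v * dist, 0.0, v_max)           # forward-only linear velocity
    w = np.clip(k_w * heading_error, -w_max, w_max)
    return v, w

# example: v, w = go_to_goal_step(x=0.0, y=0.0, theta=0.0, gx=3.0, gy=2.0)
```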
The RL system wraps IR-SIM with a Gymnasium-compatible environment:
Observation Space (7 values):
- dx_norm, dy_norm: Normalized direction to goal
- dtheta: Heading error to goal
- v, w: Current linear and angular velocity
- dist_to_obstacle: Minimum distance to nearest obstacle
- distance_to_goal: Euclidean distance to goal

Action Space:

- Linear velocity: [0.0, 1.0] (forward-only constraint)
- Angular velocity: [-1.0, 1.0] (turn left/right)

Training Algorithms:

- PPO (Proximal Policy Optimization)
- SAC (Soft Actor-Critic)
- TD3 (Twin Delayed DDPG)
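To make the observation and action spaces above concrete, here is a minimal Gymnasium sketch. Only the 7-value layout and the [0, 1] / [-1, 1] velocity ranges come from the description; the bounds on the angle and distance entries are assumptions, and the actual wrapper in scripts/05_rl_environment.py may differ.

```python
import numpy as np
from gymnasium import spaces

# Observation: dx_norm, dy_norm, dtheta, v, w, dist_to_obstacle, distance_to_goal
# (bounds on the angle and distance entries are illustrative assumptions)
observation_space = spaces.Box(
    low=np.array([-1.0, -1.0, -np.pi, -1.0, -1.0, 0.0, 0.0], dtype=np.float32),
    high=np.array([1.0, 1.0, np.pi, 1.0, 1.0, np.inf, np.inf], dtype=np.float32),
    dtype=np.float32,
)

# Action: forward-only linear velocity, symmetric angular velocity
action_space = spaces.Box(
    low=np.array([0.0, -1.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)
```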
Training the RL agent wasn't straightforward. Here's the honest story:
My first version used a complex reward function with many competing signals:
The agent learned... to spin in circles. After 500,000 timesteps of training, it achieved only a 20% success rate, and even those successes looked like luck rather than skill.
Deep diagnosis revealed several critical issues:
Issue 1: Action Space Mismatch
The action space allowed [-1.5, 1.5] velocities, but the robot config only supported [-1, 1]. The agent was learning invalid actions that got clipped—wasted training on impossible behaviors.
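A toy illustration of that mismatch, using the bounds from the description (the clipping itself is as described above; the variable names are made up):

```python
import numpy as np

policy_action = np.array([1.4, -1.3])                 # sampled in [-1.5, 1.5]
executed_action = np.clip(policy_action, -1.0, 1.0)   # robot config supports [-1, 1]
print(policy_action, "->", executed_action)           # -> [1. -1.] after clipping
# Every policy output beyond +/-1.0 collapses to the same executed command,
# so exploration and gradient signal in that region are wasted.
```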
Issue 2: Collision Detection Bug
Using hasattr() instead of getattr() for IR-SIM shape objects caused silent failures. The obstacle detection was returning default values instead of actual distances. The agent was literally blind to obstacles.
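The pattern below is illustrative only (class names and the sentinel value are made up, not the repo's code). It shows why an attribute-existence check can let a default silently stand in for real obstacle distances, and why fetching the value with getattr() and handling the missing case explicitly is safer.

```python
class CircleObstacle:
    def __init__(self, radius):
        self.radius = radius

class PolygonObstacle:
    pass  # no 'radius' attribute, like an unexpected IR-SIM shape type

DEFAULT_DIST = 100.0  # assumed "nothing nearby" sentinel

def clearance(shape, center_dist):
    # Buggy pattern: `if not hasattr(shape, "radius"): return DEFAULT_DIST`
    # silently feeds the sentinel to the agent for every unhandled shape.
    radius = getattr(shape, "radius", None)
    if radius is None:
        # Surface the problem instead of hiding it behind the default.
        print(f"warning: unhandled shape {type(shape).__name__}")
        return DEFAULT_DIST
    return center_dist - radius

print(clearance(CircleObstacle(0.3), 2.0))   # 1.7
print(clearance(PolygonObstacle(), 2.0))     # warning, then 100.0
```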
Issue 3: Reward Signal Confusion
Too many competing reward signals confused the learning process. The agent couldn't figure out what behavior was being rewarded.
After an extensive literature review on RL navigation, I implemented:
Constrained Action Space: Forward-only movement [0, 1] for linear velocity—matching the go-to-goal controller behavior
Progress-Based Reward: a simple 10.0 × max(0, progress) instead of inverse distance, a clear signal that rewards closing the gap (see the sketch after this list)
Excessive Rotation Penalty: -5.0 × (w - 0.8) when angular velocity exceeds 0.8—stops the spinning behavior
Simplified Signal: Removed all competing bonuses and penalties, focused on the primary objective
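To make the new shaping concrete, here is a minimal sketch of the simplified reward. The 10.0 and -5.0 coefficients and the 0.8 rad/s threshold come from the description above; the function signature, the use of |w|, and the omission of terminal goal/collision handling are assumptions, so the actual code in the repo will differ in detail.

```python
def shaped_reward(prev_dist, curr_dist, w):
    """Sketch of the simplified progress-based reward (not the repo's exact code)."""
    progress = prev_dist - curr_dist        # > 0 when the robot closes the gap
    reward = 10.0 * max(0.0, progress)      # progress-based reward
    if abs(w) > 0.8:                        # excessive-rotation penalty; the
        reward += -5.0 * (abs(w) - 0.8)     # write-up states w, |w| assumed here
    return reward
```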
```
robot_sim/
├── configs/                      # World configuration files
│   ├── simple_world.yaml         # Basic test world
│   ├── rl_world.yaml             # RL training world
│   └── rl_world_complex.yaml     # Complex obstacle world
├── scripts/                      # Python scripts
│   ├── 01_test_simulation.py     # Basic simulation test
│   ├── 02_go_to_goal.py          # Go-to-goal demo
│   ├── 03_potential_field.py     # Potential field demo
│   ├── 04_odometry.py            # Odometry visualization
│   ├── 05_rl_environment.py      # Gymnasium wrapper
│   ├── 06_train_rl.py            # RL training script
│   ├── 07_evaluate_rl.py         # Model evaluation
│   └── ...                       # Diagnostic tools
├── models/                       # Trained models (PPO/SAC/TD3)
└── logs/                         # TensorBoard logs & visualizations
```
Before training RL, implement classical solutions. They validate your simulation, provide baselines, and their behavior informs RL design (e.g., forward-only constraint).
When RL fails, resist the urge to tune hyperparameters randomly. Instead:
Complex reward functions with many terms confuse the agent. Progress-based rewards with clear, consistent signals outperform elaborate multi-objective formulations.
Constraining the action space to match desired behavior prevents the agent from learning counterproductive strategies. If your robot shouldn't back up, don't allow negative linear velocity.
The most valuable output isn't the final model—it's the journey. The diagnostic scripts and change logs helped me understand what went wrong and why the fixes worked.
```bash
# Clone and setup
git clone https://github.com/padawanabhi/robot_sim.git
cd robot_sim
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Test simulation
python scripts/01_test_simulation.py

# Train RL agent
python scripts/06_train_rl.py --algorithm ppo --timesteps 500000

# Evaluate trained model
python scripts/07_evaluate_rl.py --algorithm ppo --episodes 5
```
```bash
# Monitor training (open http://localhost:6006)
tensorboard --logdir logs/tensorboard
```