Urban traffic congestion is a critical issue impacting travel time, fuel consumption, and air quality. Traditional traffic management systems rely on static rules and limited sensor feedback, which fail to adapt to dynamic and unpredictable conditions. This paper proposes an Adaptive Reinforcement Learning (ARL) approach for optimizing traffic signals within smart cities. The ARL model leverages continuous environmental feedback to adjust signal timing based on real-time vehicular flow. Simulations using a synthetic traffic network demonstrate that the proposed model reduces average waiting time by 28%, improves throughput by 21%, and decreases CO₂ emissions by 16% compared to traditional fixed-time control. These results indicate that ARL is a promising direction for sustainable urban mobility.
Traffic congestion remains a major challenge in modern cities. Static and semi-adaptive systems, though efficient under predictable patterns, cannot cope with stochastic variations in vehicle density. Reinforcement Learning (RL) provides a self-learning framework where an agent interacts with its environment, receives feedback, and learns an optimal policy.
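For concreteness, the sketch below illustrates this generic agent-environment loop in Python with a tabular value update; the environment interface (reset/step), the ε-greedy policy, and the parameter values are illustrative assumptions rather than components of the proposed system.

```python
import random

def run_episode(env, q_table, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
    """One episode of the generic RL loop: observe, act, receive feedback, update values."""
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy selection: explore occasionally, otherwise exploit current estimates.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q_table.get((state, a), 0.0))
        next_state, reward, done = env.step(action)  # feedback from the environment
        # Temporal-difference update of the action-value estimate.
        best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
        old = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
```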
This research introduces an Adaptive Reinforcement Learning (ARL) framework capable of dynamically tuning parameters according to real-time changes, ensuring stable and efficient control even under uncertain traffic conditions.
Recent studies have applied RL to traffic management with varying degrees of success. Van der Pol and Oliehoek (2016) demonstrated that Deep Q-Networks (DQN) outperform traditional Q-learning in non-linear traffic environments. Wei et al. (2018) introduced CoLight, a multi-agent RL approach for signal coordination. However, these methods often struggle with scalability and adaptability. Adaptive frameworks, as discussed by Genders and Razavi (2019), attempt to balance learning speed and stability.
This paper builds upon these foundations by incorporating adaptive reward functions and policy update rates that self-adjust according to congestion intensity.
Problem Formulation
Each traffic intersection is modeled as an RL agent. The state (S) includes queue lengths, waiting times, and neighboring intersection statuses. The action (A) represents the green-light duration for each lane direction. The reward (R) penalizes vehicle delays and rewards higher throughput.
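A minimal sketch of this per-intersection formulation is given below; the field layout, lane count, and reward weights are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np

NUM_LANES = 4  # assumption: one approach per compass direction

def build_state(queue_lengths, waiting_times, neighbor_phases):
    """State S: per-lane queue lengths and waiting times plus neighboring intersection phases."""
    return np.concatenate([queue_lengths, waiting_times, neighbor_phases])

# Action A: candidate green-light durations (seconds) for the active lane direction.
GREEN_DURATIONS = [10, 20, 30, 40]

def reward(total_delay, vehicles_served, delay_weight=1.0, throughput_weight=0.5):
    """Reward R: penalizes accumulated vehicle delay and rewards higher throughput."""
    return throughput_weight * vehicles_served - delay_weight * total_delay
```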
Adaptive Reinforcement Learning Model
The ARL model modifies traditional Q-learning using an adaptive learning rate (α) and reward scaling.
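A minimal sketch of such an update, assuming the standard temporal-difference form of Q-learning with an adaptive learning rate α_t and a reward-scaling factor β_t, is:

$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha_t \big[\, \beta_t r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\big]
$$

where α_t and β_t are adjusted online according to congestion intensity. The symbol β_t and the exact adaptation schedule are introduced here for illustration; only the use of an adaptive learning rate and reward scaling is specified above.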
Simulation Setup
Results and Analysis
| Model | Avg. Waiting Time (s) | Throughput (veh/hr) | CO₂ Emission (g/km) |
|---|---|---|---|
| Fixed-Time | 72.4 | 820 | 140.6 |
| Q-Learning | 56.8 | 960 | 126.3 |
| ARL (Proposed) | 52.1 | 1160 | 118.0 |
Table 1: Performance comparison of traffic control methods.
Analysis:
As shown in Table 1, the proposed ARL method achieves a 28% reduction in waiting time compared to fixed-time control and an 8% improvement over conventional Q-Learning. Figure 4 (below) shows the cumulative reward convergence, demonstrating faster stabilization with ARL due to dynamic adaptation.
Figure 1: Average Waiting Time by Model
Figure 2: Throughput Comparison
Figure 3: CO₂ Emission by Model
Figure 4: Cumulative Reward Convergence Curve
The results confirm that adaptability in learning rate and reward scaling enhances convergence speed and performance stability. Unlike static RL, ARL maintains efficiency during unexpected traffic surges. The scalability to larger networks is promising, though further optimization is needed to reduce computational cost in multi-agent scenarios.
This study demonstrates that Adaptive Reinforcement Learning significantly improves traffic flow and reduces congestion. Future work will focus on:
- Deployment on edge-AI platforms for real-time inference.