
Deep reinforcement learning-based approach for control of Two Input–Two Output process control system

Open Access | Jul 2025

Figures & Tables

Figure 1:

Overall structure of the MIMO control system. MIMO, multiple input–multiple output.

Figure 2:

TITO system with controller. TITO, two input–two output.

Figure 3:

Simple flow chart of TITO control system using DDPG. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 4:

Critic network design for DDPG for TITO system. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 5:

Actor network design for DDPG for TITO system. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 6:

Simulink model for TITO system. TITO, two input–two output.

Figure 7:

Reward function representation using DDPG for TITO system. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 8:

Training performance of the DDPG agent for TITO for set point tracking. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 9:

Reward function progression of the DDPG agent for TITO system for set point tracking. DDPG, deep deterministic policy gradient; TITO, two input–two output.

Figure 10:

Simulation of transfer function on Loop 1.

Figure 11:

MV values for Loop 1. MV, manipulated variable.

Figure 12:

Simulation of transfer function on Loop 2.

Figure 13:

MV values for Loop 2. MV, manipulated variable.

Figure 14:

Comparison of proposed method on Loop 1 with traditional methods. DDPG, deep deterministic policy gradient.

Figure 15:

Comparison of proposed method on Loop 2 with traditional methods. DDPG, deep deterministic policy gradient.

Figure 16:

Response to disturbance on Loop 1. DDPG, deep deterministic policy gradient.

Figure 17:

Response to disturbance on Loop 2. DDPG, deep deterministic policy gradient.

Parameters for configuration of DDPG agent

Parameter | Description | Value
Discount factor (γ) | Future reward discounting | 0.99
Target smooth factor (τ) | Target network update rate | 0.001
Actor learning rate | Learning rate for actor updates | 0.0001
Critic learning rate | Learning rate for critic updates | 0.001
Mini-batch size | Sample size for experience replay | 64
Experience buffer length | Total memory for experience replay | 1,000,000
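
These settings correspond to the standard DDPG update rules. As an illustration only, the Python sketch below (hypothetical names, not the MATLAB/Simulink implementation used in the study) shows where each hyperparameter enters: γ discounts the bootstrapped critic target, τ sets the Polyak-averaging rate of the target networks, and mini-batches are drawn from the experience replay buffer.

```python
import random
from collections import deque

# Hypothetical constants, matching the table above.
GAMMA = 0.99            # discount factor for future rewards
TAU = 0.001             # target-network smoothing factor
ACTOR_LR = 1e-4         # would be passed to the actor's optimizer
CRITIC_LR = 1e-3        # would be passed to the critic's optimizer
BATCH_SIZE = 64         # mini-batch size for experience replay
BUFFER_LEN = 1_000_000  # experience buffer length

replay_buffer = deque(maxlen=BUFFER_LEN)  # experience replay memory

def td_target(reward, q_next, done):
    """Bootstrapped critic target: r + gamma * Q'(s', a') for non-terminal steps."""
    return reward + GAMMA * q_next * (1.0 - done)

def soft_update(target_params, online_params):
    """Polyak averaging of target networks: theta_target <- tau*theta + (1 - tau)*theta_target."""
    return [TAU * p + (1.0 - TAU) * tp for p, tp in zip(online_params, target_params)]

def sample_minibatch():
    """Uniformly sample a mini-batch of stored transitions from the replay buffer."""
    return random.sample(replay_buffer, k=min(BATCH_SIZE, len(replay_buffer)))
```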

Analogy of the traditional system with DRL principles

DRL component | Traditional control equivalent | Description
Agent | Controller | Decides the actions to control the system.
Environment | Plant/process | The system being controlled.
State | System measurements | Information about the system's current status.
Action | Control input | Adjustments made to influence the process.
Reward | Error feedback | Guides the agent to improve performance.
Policy | Control law | Strategy linking states to optimal actions.
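
To make the mapping concrete, the sketch below casts one sampling instant of the closed loop in DRL terms. It is a minimal Python illustration with hypothetical names (plant, policy, control_step), not code from the paper: the plant plays the environment, measured outputs and tracking errors form the state, the two manipulated variables are the action, and a negative squared tracking error is used as an example reward.

```python
import numpy as np

def reward(errors):
    """Error feedback as reward: penalise squared tracking error on both loops (illustrative choice)."""
    return -float(np.sum(np.square(errors)))

def policy(state, weights):
    """Stand-in for the actor/control law: maps the measured state to the two manipulated variables."""
    return np.tanh(weights @ state)  # bounded control inputs, like a saturated controller

def control_step(plant, state, setpoints, weights):
    """One agent-environment interaction, i.e. one sampling instant of the closed loop."""
    action = policy(state, weights)      # controller output (MV1, MV2)
    outputs = plant(action)              # plant responds to the control inputs
    errors = setpoints - outputs         # deviation from the set points
    next_state = np.concatenate([outputs, errors])
    return next_state, reward(errors), action

# Example with a trivial static "plant" whose outputs scale the control inputs.
state = np.zeros(4)
weights = np.zeros((2, 4))
print(control_step(lambda u: 0.5 * u, state, np.array([1.0, 0.5]), weights))
```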

Performance indices of Loop 2

Method | ISE | IAE | ITSE | ITAE | Overshoot (%) | Settling time | Steady-state error
DDPG | 137.7 | 79.13 | 3.217e+04 | 1.707e+04 | 0 | 42 | 0
NDT [PI] | 122.1 | 82.69 | 4.434e+04 | 2.515e+04 | 60 | 110 | 0
Mvall [PI] | 510.3 | 275.3 | 2.228e+05 | 1.305e+04 | 0 | 380 | 0
Wang et al. [PID] | 81.82 | 61.27 | 2.947e+04 | 1.856e+04 | 30 | 85 | 0

Performance indices of Loop 1

Method | ISE | IAE | ITSE | ITAE | Overshoot (%) | Settling time | Steady-state error
DDPG | 18.31 | 29.92 | 722.93 | 3253 | 5 | 48 | 0
NDT [PI] | 26.82 | 39.96 | 631 | 1.032e+04 | 25 | 100 | 0
Mvall [PI] | 34.61 | 47.25 | 488.3 | 1880 | 0 | 150 | 0
Wang et al. [PID] | 16.26 | 24.82 | 3206 | 6517 | 20 | 53 | 0
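
The indices reported in the two tables above follow the standard integral error criteria. A minimal Python sketch of their discrete-time approximation from a sampled error signal (hypothetical variable names, illustrative sample data) is given below.

```python
import numpy as np

def performance_indices(error, dt):
    """Discrete-time approximations of ISE, IAE, ITSE and ITAE for a sampled error signal."""
    t = np.arange(len(error)) * dt
    ise  = np.sum(error**2) * dt            # integral of squared error
    iae  = np.sum(np.abs(error)) * dt       # integral of absolute error
    itse = np.sum(t * error**2) * dt        # integral of time-weighted squared error
    itae = np.sum(t * np.abs(error)) * dt   # integral of time-weighted absolute error
    return ise, iae, itse, itae

# Example: indices for a decaying error signal sampled every 0.1 s.
err = np.exp(-0.05 * np.arange(1000))
print(performance_indices(err, dt=0.1))
```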
Language: English
Submitted on: Mar 1, 2025
Published on: Jul 1, 2025
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: once per year

© 2025 Anil Kadu, Aniket Khandekar, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.