D3PG: Dirichlet DDPG for Task Partitioning and Offloading with
Constrained Hybrid Action Space in Mobile Edge Computing
Abstract
Mobile Edge Computing (MEC) has been regarded as a promising paradigm to
reduce service latency for data processing in the Internet of Things (IoT) by
provisioning computing resources at the network edge. In this work, we
jointly optimize the task partitioning and computational power
allocation for computation offloading in a dynamic environment with
multiple IoT devices and multiple edge servers. We formulate the problem
as a Markov decision process with a constrained hybrid action space, which
existing deep reinforcement learning (DRL) algorithms cannot handle well.
We therefore develop a novel DRL algorithm called Dirichlet Deep
Deterministic Policy Gradient (D3PG), which is built on Deep Deterministic
Policy Gradient (DDPG), to solve the problem. The developed model learns to
solve a multi-objective optimization problem: maximizing the number of tasks
processed before expiration while minimizing energy cost and service latency. More
importantly, D3PG can effectively handle the constrained
distribution-continuous hybrid action space, in which the distribution
variables govern task partitioning and offloading and the continuous
variables govern computational frequency control. Moreover, D3PG can
address many similar problems in MEC and in general reinforcement
learning. Extensive simulation results show that the proposed D3PG
outperforms state-of-the-art methods.
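
To make the notion of a constrained distribution-continuous hybrid action concrete, the following minimal sketch samples one such action. It is an illustration under stated assumptions, not the D3PG algorithm itself: the function name sample_hybrid_action, the concentration vector, and the frequency bounds f_min and f_max are all hypothetical, and the uniform frequency draw merely stands in for whatever a learned policy would output. The only property it is meant to show is that the partitioning part lies on the probability simplex (non-negative fractions summing to 1) while the frequency part is a bounded real value.

```python
import random


def sample_hybrid_action(concentration, f_min, f_max, rng=None):
    """Sample one hybrid action: task-partition fractions plus a CPU frequency.

    All names and parameters here are hypothetical, chosen for illustration.
    """
    rng = rng or random.Random()

    # Distribution part: a Dirichlet sample can be drawn as independent
    # Gamma(alpha_i, 1) variates divided by their sum, so the fractions
    # are non-negative and sum to 1 by construction.
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in concentration]
    total = sum(gammas)
    fractions = [g / total for g in gammas]

    # Continuous part: a frequency inside [f_min, f_max]. A uniform draw is
    # used only as a placeholder for a policy output (assumption).
    frequency = rng.uniform(f_min, f_max)
    return fractions, frequency


if __name__ == "__main__":
    # Example: split a task across local execution and two edge servers.
    fractions, frequency = sample_hybrid_action([2.0, 1.0, 1.0], 0.5e9, 2.0e9)
    print("partition fractions:", fractions)
    print("CPU frequency (Hz):", frequency)
```

Because the fractions are normalized Gamma draws, the sum-to-one constraint holds without any clipping or projection step, which is the practical appeal of a Dirichlet parameterization for partitioning variables.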