2022-end editorial: achievements, thanks, perspectives

Carlo Bianca; Lombardo Domenico

doi:10.3934/biophy.2023007

AIMS Biophysics

2023, Volume 10, Issue 1: 90-94. doi: 10.3934/biophy.2023007

Editorial

Carlo Bianca ^{1}, Lombardo Domenico ^{2}

1. Efrei Research Lab, Paris-Panthéon-Assas University, France
2. Consiglio Nazionale delle Ricerche, Istituto per i Processi Chimico-Fisici, 98158 Messina, Italy

Received: 24 February 2023 Revised: 27 February 2023 Accepted: 27 February 2023 Published: 28 February 2023

Citation: Carlo Bianca, Lombardo Domenico. 2023: 2022-end editorial: achievements, thanks, perspectives, AIMS Biophysics, 10(1): 90-94. doi: 10.3934/biophy.2023007

1. Introduction

The oceans cover about 70% of the earth's surface, and the seafloor holds considerable resources that may benefit humanity. Ocean exploration is therefore recognized as an essential field in ocean science ^[1]. It relies on two primary classes of device: remotely operated underwater vehicles (ROVs) and autonomous underwater vehicles (AUVs). Almost all conventional AUVs use water pumps, air-jet engines, or single propellers for propulsion ^[2], which generate loud noise that disturbs organisms living on the seabed. In addition, the structure of conventional AUVs is recognized as limiting their maneuverability and stability ^[3]. The propeller can also become jammed by sediment and seaweed when an AUV operates near the seafloor ^[4,5,6]. A bionic underwater robot equipped with a biomimetic fin mechanism overcomes these drawbacks and is well suited to ocean exploration ^[7]. Many studies of bio-fish robots have drawn on the diversity of fish species ^{[6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]}. These studies pointed out that many significant factors affect the hydrodynamics of bio-fish robots. One such factor is the swimming pattern, which enables a bio-fish robot to perform complex maneuvers such as turning, swaying, twisting, and curving. Several studies used a sinusoidal kinematic equation to generate the undulating oscillatory motion of bio-fish robots ^{[31,32,33,34,35,36]}. This locomotion control strategy can provide various swimming patterns by predefining the amplitude envelope, oscillatory frequency, and phase lag, which are the kinematic parameters of the sinusoidal generator. However, it does not provide smooth transitions between swimming patterns, nor does it allow the kinematic parameters to be tuned online to adapt to environmental changes ^[8,31].
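
As an illustration of the sinusoidal generator described above, the following is a minimal Python sketch, not the formulation used in the cited studies. The function name `fin_ray_angles`, the sign of the phase term, the constant 30-degree envelope, the 1 Hz frequency, and the phase lag of π/8 are all assumptions chosen for the example; only the three kinds of kinematic parameter (amplitude envelope, oscillatory frequency, phase lag) and the count of sixteen fin-rays come from the text.

```python
import math

N_RAYS = 16  # sixteen fin-rays, as in the fin described in this paper


def fin_ray_angles(t, amplitudes, frequency, phase_lag):
    """Return the angle of every fin-ray at time t.

    amplitudes: amplitude envelope, one value per fin-ray (assumed in degrees)
    frequency:  oscillatory frequency in Hz
    phase_lag:  constant phase lag between adjacent fin-rays in radians
    The minus sign makes the wave travel from ray 0 towards the last ray;
    this direction convention is an assumption of the sketch.
    """
    return [
        a * math.sin(2 * math.pi * frequency * t - i * phase_lag)
        for i, a in enumerate(amplitudes)
    ]


# Hypothetical parameter values, fixed in advance as the text describes.
envelope = [30.0] * N_RAYS
angles = fin_ray_angles(t=0.25, amplitudes=envelope, frequency=1.0,
                        phase_lag=math.pi / 8)
print([round(a, 1) for a in angles])
```

Because all three parameters are fixed before the call, changing the swimming pattern means swapping in a new parameter set, which is the limitation on online tuning and smooth transitions noted above.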

To achieve efficient locomotion, earlier studies have proposed central pattern generator (CPG) based locomotion controllers for a wide range of applications ^{[11,27,39,40,41,42,43,44,45]}. For bio-fish robots, an early study synthesized a locomotion controller that integrates a Proportional-Integral-Derivative (PID) controller with a CPG for a fish robot prototype moving in 3D ^[24]. In 2008, Wang et al. ^[19] employed a modified Matsuoka oscillator to build a CPG-based locomotion controller for a prototype undulating fin propulsion system with ten fin-rays. Simulation and experimental results showed that the variable model of the weight matrix is consistent with the thrust generated by the prototype. In 2011, a CPG-based controller for the propulsion system was integrated with rotary position sensors to make the locomotion of the undulating fin more flexible ^[28]. That study also introduced two control levels: a high-level controller for commanding operation and a low-level controller for driving the actuators. In 2012, Zhou et al. ^[39] developed a manta ray robot with two wide flexible pectoral fins, which used a CPG model to achieve rhythmic biomimetic movement. Simulation and experimental results showed that the yaw angle is stabilized, but the response time is slow. In 2014, Chunlin Zhou et al. ^[29] used a genetic algorithm to optimize the CPG-based controller of a fish robot with respect to thrust generation, achieving better conversion efficiency. To validate the CPG-based control approach for undulating fin propulsion, in 2015, Michael Sfakiotakis et al. ^[32] evaluated the CPG under the conversion of single amplitude parameters and under simultaneous transformation. The authors adopted a CPG model to produce the undulating motion pattern and to identify the critical factor affecting propulsion.
A cuttlefish-inspired fish robot prototype also used a CPG model for its swimming motion ^[22]. That study presented the effect of the various kinematic parameters of the undulating fin and the validity of a fluid drag model used to estimate the generated thrust. Another study ^[8] applied a CPG to undulating fins with six degrees of freedom, reproducing fish-like swimming by changing the parameters of the CPG model. Several parameters of the CPG model, such as the amplitude envelope, oscillatory frequency, and swimming pattern, can be adjusted to generate the undulating motion that produces the propulsion force. For example, Yong Cao et al. ^[46] fixed the undulation frequency and the undulation amplitude as constants and varied the phase angle of the CPG neuron outputs to achieve various swimming patterns. These earlier studies show that CPGs have been successfully applied to the locomotion control of biomimetic robots. However, most of them rely on trial-and-error data fitting to set a control parameter of the CPG model called the convergence rate. Increasing the convergence rate shortens the time needed to reach the limit cycle; however, it can also increase the oscillatory error, defined as the difference between the intrinsic amplitude of the CPG and the maximum amplitude envelope of the CPG's output. Because the convergence rate has not yet been optimized systematically, choosing it remains an open challenge.
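
The effect of the convergence rate on the time needed to reach the limit cycle can be illustrated with a single oscillator. The sketch below assumes the standard textbook form of the Hopf oscillator, integrated with a forward Euler step; it is not the modified CPG of this paper, and the function name `settle_time`, the 95% threshold, the initial condition, and the step size are hypothetical choices. It shows only the speed side of the trade-off and does not attempt to reproduce the oscillatory error.

```python
import math


def settle_time(k, amplitude=1.0, omega=2 * math.pi, dt=1e-3, t_max=10.0):
    """Time for the oscillator radius to first reach 95% of `amplitude`.

    Assumed dynamics (standard Hopf oscillator, convergence rate k):
        dx/dt = k * (A^2 - r^2) * x - omega * y
        dy/dt = k * (A^2 - r^2) * y + omega * x,   r^2 = x^2 + y^2
    """
    x, y = 0.1 * amplitude, 0.0  # start well inside the limit cycle
    steps = int(t_max / dt)
    for n in range(steps):
        r2 = x * x + y * y
        if math.sqrt(r2) >= 0.95 * amplitude:
            return n * dt
        dx = k * (amplitude ** 2 - r2) * x - omega * y
        dy = k * (amplitude ** 2 - r2) * y + omega * x
        x, y = x + dt * dx, y + dt * dy  # forward Euler step
    return math.inf  # threshold not reached within t_max


for k in (1, 10, 50):
    print(k, settle_time(k))
```

In this toy setting a larger `k` reaches the limit cycle sooner, which is why the convergence rate is tempting to raise and why its side effect on the oscillatory error has to be weighed against it.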

In terms of parameter optimization, several studies used the particle swarm optimization (PSO) algorithm to search for CPG parameters in order to minimize the difference between the desired oscillatory waveform and the generated output of the CPG ^[47], to reduce the number of control parameters ^[48], and to refine the feature parameters of the CPG ^[49]. Like a genetic algorithm (GA), PSO searches for optimal solutions by iterating over a population, but PSO has proved faster to compute and easier to implement than GA ^[50]. However, PSO is susceptible to becoming trapped in local minima ^[51]. Reinforcement learning (RL) is an alternative optimization strategy that has recently been applied in areas such as robotic control, transportation, and energy supervision ^{[52,53,54,55,56,57,58]}. An RL agent selects a sequence of actions so as to maximize the numerical reward it obtains from interacting with its environment. RL methods can be categorized as model-based, which attempt to model the environment as a Markov Decision Process (MDP) ^[59], and model-free, which do not require an explicit model of the environment. One model-free RL method is Q-learning, which is recognized as offering a good trade-off between computation time and effectiveness ^{[55,60,61,62]}. According to these studies, Q-learning is feasible to implement in real time on programmable devices. For biomimetic robots, Y. Nakamura et al. ^[63,64] used a reinforcement learning model for a CPG-based motion controller, namely CPG-actor-critic, to learn the selection of motion patterns for biped robots. An actor observes the state of the biped robot and outputs a parameter of the motion controller. The motion controller with the selected parameter then produces the control signal.

The aforementioned studies on CPG-based bio-fish robots have not optimized the convergence rate. Inspired by studies that apply RL to CPGs, this paper proposes a reinforcement learning-based optimization of a CPG-network locomotion controller for an elongated undulating fin. The fin comprises sixteen oblique fin-rays interconnected by a flexible membrane. It is driven by the proposed CPG-based locomotion controller, which couples sixteen neural oscillators, one for each fin-ray, to generate the locomotion pattern. The advantages of this control method over the sinusoidal kinematic equation are discussed. Unlike previous studies, this paper uses Q-learning with discrete states and actions to optimize the convergence rate of the CPG controller. The actor observes the undulating signal of the CPG-based locomotion controller and outputs a value of the convergence rate. The locomotion controller with the chosen convergence rate then produces the control signal. Owing to its simplicity, the proposed controller is expected to be implementable on a microcontroller. Simulations and experiments are carried out to evaluate the performance and effectiveness of the proposed control method.

2. Elongated undulating fin

The elongated undulating fin comprises sixteen oblique adjacent fin-rays interconnected by a flexible membrane. Each fin-ray is driven by an RC servo motor, which sways the fin-ray about a rotary joint fixed to a supporting frame, as illustrated in Figure 1. The elongated undulating fin is 775 mm long, 90 mm wide, and 290 mm high.

Figure 1. Overview of elongated undulating fin.

Algorithm: Q-learning based optimization of the convergence rate
1. Initialize $\alpha, \gamma, \varepsilon$
2. Initialize ${Q}_{t-1}\left({s}_{t}, {a}_{t}\right)=\left[0\right]$, ${s}_{t}=\mathrm{rand}\left(S\right)$, and episode $n$
3. Repeat for each step of the episode:
4. Choose ${a}_{t}=\underset{{a}_{t}}{\mathrm{argmax}}\,Q({s}_{t}, {a}_{t})$ if a uniform random number is $<1-\varepsilon$
5. Otherwise, choose a random action ${a}_{t}=\mathrm{rand}\left(Q({s}_{t}, {a}_{t})\right)$
6. Take the action ${a}_{t}$ (passing the convergence rate $k$ to the modified CPG network)
7. Observe ${s}_{t}^{'}$, ${r}_{{s}_{t}\to {s}_{t}^{'}}^{{a}_{t}}$ (perceiving the oscillatory error and the transient-state time, and calculating the reward value by Eqs 7 and 8)
8. Update the Q-value by Eq 9.
9. Assign the next state to the current state $({s}_{t}\leftarrow {s}_{t}^{'})$
10. Until the current state is the terminal state $({s}_{t}\equiv {s}_{T})$
11. Take the optimal action ${a}_{t}^{*}=\underset{{a}_{t}}{\mathrm{argmax}}\,Q({s}_{t}, {a}_{t})$
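
The steps of the algorithm can be sketched in Python as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the candidate rates in `K_VALUES`, the function `toy_cpg_response` standing in for the modified CPG network, the equally weighted reward standing in for Eqs 7 and 8, the standard tabular Q-learning update standing in for Eq 9, the choice of state (the index of the currently applied rate), the fixed number of steps in place of a terminal state, and all hyperparameter values are hypothetical.

```python
import random

# Hypothetical discrete action set: candidate convergence rates k.
K_VALUES = [1, 5, 10, 20, 50, 100]


def toy_cpg_response(k):
    """Stand-in for the CPG: returns (oscillatory error, transient time).

    Invented for illustration: the error grows with k and the transient
    time shrinks with k, mirroring the trade-off described in the text.
    """
    return 0.002 * k, 1.0 / k


def reward(k):
    # Placeholder for Eqs 7 and 8: penalize both quantities equally.
    error, transient = toy_cpg_response(k)
    return -(error + transient)


def train(episodes=200, steps=100, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    n = len(K_VALUES)
    # Assumed state: index of the convergence rate currently applied.
    q = [[0.0] * n for _ in range(n)]            # step 2: Q initialized to 0
    for _ in range(episodes):
        s = rng.randrange(n)                     # step 2: random start state
        for _ in range(steps):                   # step 3 (fixed step cap)
            if rng.random() < 1 - epsilon:       # step 4: exploit
                a = max(range(n), key=lambda i: q[s][i])
            else:                                # step 5: explore
                a = rng.randrange(n)
            r = reward(K_VALUES[a])              # steps 6 and 7
            s_next = a
            # Step 8: standard tabular Q-learning update (assumed form of Eq 9).
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next                           # step 9
    return q


def best_k(q, s=0):
    # Step 11: greedy action in state s.
    return K_VALUES[max(range(len(K_VALUES)), key=lambda i: q[s][i])]


q_table = train()
print(best_k(q_table))
```

With the invented response above, the greedy choice settles on k = 20, the rate that minimizes the sum of the toy error and the toy transient time. A real controller would replace `toy_cpg_response` with measurements of the oscillatory error and transient-state time taken from the CPG output.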

3. Reinforcement learning-based optimization for CPG locomotion controller

3.1. Hopf oscillator

3.2. Modified CPG with multi coupled Hopf oscillators

3.3. Reinforcement learning-based optimization

4. Results and discussion

4.1. Characteristic of convergence rate

4.2. Transition gait

4.3. Experimental study

5. Conclusions

Conflict of interest

Acknowledgments
