Editorial

2022-end editorial: achievements, thanks, perspectives

  • Received: 24 February 2023 Revised: 27 February 2023 Accepted: 27 February 2023 Published: 28 February 2023
  • Citation: Carlo Bianca, Lombardo Domenico. 2023: 2022-end editorial: achievements, thanks, perspectives, AIMS Biophysics, 10(1): 90-94. doi: 10.3934/biophy.2023007



    The oceans cover about 70% of the earth's surface, and the seafloor holds considerable resources that may benefit humanity. Ocean exploration is therefore recognized as an essential field of ocean science [1]. It relies on two primary classes of device: remotely operated underwater vehicles (ROVs) and autonomous underwater vehicles (AUVs). Almost all conventional AUVs use water pumps, air-jet engines, or single propellers for propulsion [2], which generate loud noise that disturbs organisms living on the seabed. In addition, the topological structure of conventional AUVs limits their maneuverability and stability [3], and the propeller can become stuck in sediment and seaweed when an AUV operates on the seafloor [4,5,6]. A bionic underwater robot equipped with a biomimetic fin mechanism overcomes these drawbacks and is well suited to ocean exploration [7]. Many studies of bio-fish robots have addressed the diversity of fish species [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. These studies point out that many factors affect the hydrodynamics of bio-fish robots. One such factor is the swimming pattern, which enables a bio-fish robot to perform complex maneuvers such as turning, swaying, twisting, and curving. Several studies have used a sinusoidal kinematic equation to generate the undulating oscillatory motion of bio-fish robots [31,32,33,34,35,36]. This locomotion control strategy can provide various swimming patterns by predefining the amplitude envelope, oscillatory frequency, and phase lag, which are the kinematic parameters of the sinusoidal generator. However, it does not allow flexible transitions between swimming patterns, nor does it allow the kinematic parameters to be tuned online to adapt to environmental changes [8,31].

    To achieve efficient locomotion, earlier studies have proposed central pattern generator (CPG) based locomotion controllers for a wide range of applications [11,27,39,40,41,42,43,44,45]. For bio-fish robots, the authors previously synthesized a locomotion controller that integrates a Proportional-Integral-Derivative (PID) controller with a CPG for a fish robot prototype operating in 3D [24]. In 2008, Wang et al. [19] employed a modified Matsuoka oscillator to build a CPG-based locomotion controller for a prototype undulating-fin propulsion system with ten fin-rays. Simulation and experimental results showed that the variable model of the weight matrix is consistent with the thrust generated by the prototype. In 2011, a CPG-based controller for this propulsion system was integrated with rotary position sensors to make the locomotion of the undulating fin more flexible [28]. That study also introduced two control levels: a high-level controller for commanding operation and a low-level controller for driving the actuators. In 2012, Zhou et al. [39] developed a manta ray robot with two wide, flexible pectoral fins, which used a CPG model to achieve rhythmic biomimetic movement. Simulation and experimental results showed that the yaw angle is stabilized, but the response is slow. In 2014, Chunlin Zhou et al. [29] adopted a genetic algorithm to optimize the CPG-based controller of a fish robot with respect to thrust generation and thereby achieve better conversion efficiency. To validate the CPG-based control approach for undulating-fin propulsion, in 2015 Michael Sfakiotakis et al. [32] performed the CPG denominations using the conversion of single amplitude parameters and simultaneous transformation. The authors adopted a CPG model to produce the undulating motion pattern and to identify the critical factor affecting propulsion.
    A fish robot prototype inspired by the cuttlefish also used a CPG model for its swimming motion [22]. That study presented the effect of the kinematic parameters of the undulating fin and the validity of a fluid drag model used to estimate the generated thrust. Another study [8] used a CPG for undulating biological fins with six degrees of freedom, replicating fish-like swimming by changing the parameters of the CPG model. Several parameters of the CPG model, such as the amplitude envelope, oscillatory frequency, and swimming pattern, can be adjusted to generate the undulating motion that produces the propulsive force. Yong Cao et al. [46], for example, fixed the undulation frequency and the undulation amplitude as constants and governed the phase angle of the CPG neuron outputs to achieve various swimming patterns. These earlier studies show that CPGs have been applied successfully to the locomotion control of biomimetic robots. However, most of them rely on trial-and-error data fitting to adjust a control parameter of the CPG model called the convergence rate. Increasing the convergence rate shortens the time needed to reach the limit cycle, but it can raise the oscillatory error, defined as the difference between the intrinsic amplitude of the CPG and the maximum amplitude envelope of the CPG's output. Because the convergence rate of the CPG has not been optimized, this issue remains an open challenge.

    In terms of parameter optimization, several studies have used the particle swarm optimization (PSO) algorithm to search for CPG parameters that minimize the difference between the desired oscillatory waveform and the generated CPG output [47], to reduce the number of control parameters [48], and to refine the feature parameters of the CPG [49]. Like a genetic algorithm (GA), PSO searches for optimal solutions through iterations of a population, but it has proved faster to compute and easier to implement than GA [50]. However, PSO is susceptible to becoming trapped in local minima [51]. Reinforcement learning (RL) is an alternative optimization strategy that has recently been applied in areas such as robotic control, transportation, and energy supervision [52,53,54,55,56,57,58]. RL generates a sequence of actions to maximize the numerical reward obtained while interacting with an environment. RL methods can be categorized as model-based, which attempt to model the environment as a Markov Decision Process (MDP) [59], and model-free, which do not require an explicit model of the environment. One model-free method is Q-learning, which is recognized as well suited to optimization problems that must trade off computation time against effectiveness [55,60,61,62]. According to these studies, Q-learning is feasible to implement in real time on programmable devices. For biomimetic robots, Y. Nakamura et al. [63,64] used a reinforcement learning model for a CPG-based motion controller, called CPG-actor-critic, to learn the selection of motion patterns for biped robots. An actor observes the state of the biped robot and outputs a parameter of the motion controller; the motion controller then produces the control signal with the selected parameter.

    The studies of CPG-based bio-fish robots discussed above have not optimized the convergence rate. Inspired by work applying RL to CPGs, this paper proposes a reinforcement learning-based optimization of a CPG-network locomotion controller for an elongated undulating fin. The fin comprises sixteen oblique fin-rays interconnected by a flexible membrane. It is controlled by the proposed CPG-based locomotion controller, in which sixteen coupled neural oscillators generate the locomotion of the sixteen fin-rays. The advantages of this control method over the sinusoidal kinematic equation are discussed. Unlike previous studies, this paper uses Q-learning with discrete states and actions to optimize the convergence rate of the CPG controller. The actor observes the undulating signal of the CPG-based locomotion controller and outputs a value of the convergence rate; the locomotion controller then produces the control signal with the chosen convergence rate. Owing to its simplicity, the proposed controller is expected to be implementable on a microcontroller. Simulations and experiments are carried out to evaluate the performance and effectiveness of the proposed control method.

    The elongated undulating fin comprises sixteen adjacent oblique fin-rays interconnected by a flexible membrane. Each fin-ray is driven by an RC servo motor that lets the fin-ray sway around a rotary joint fixed to a supporting frame, as illustrated in Figure 1. The fin has a length of 775 mm, a width of 90 mm, and a height of 290 mm.

    Figure 1.  Overview of elongated undulating fin.

    Accordingly, each fin-ray behaves as a shaker bar with a limited angle, and the phase difference between two adjacent fin-rays is regarded as the phase lag angle. The magnitude of the propulsive force can be adjusted by changing one of the kinematic parameters: amplitude envelope, oscillatory frequency, or swimming pattern. To switch between forward and reverse motion, the elongated undulating fin changes the sign of the phase lag angle. Additionally, to avoid a counter-torque on the fin, the number of oscillation wavelengths should be even. Traditionally, the sinusoidal oscillatory equation used to generate the undulating motion of bio-fish robots is given by [23]:

    $$\theta_i(t) = \theta_{i\max}\sin(2\pi f t + \phi_i) \tag{1}$$

    where $\theta_i$ is the sway angle of the $i$-th fin-ray; $\theta_{i\max}$ is the maximum sway angle of each fin-ray; $f$ is the oscillatory frequency; $\phi_i$ is the phase lag angle of each fin-ray.

    Gait control based on the sinusoidal oscillatory equation can successfully generate the locomotion of bio-fish robots. However, high-performance aquatic locomotion requires the robot to adapt its swimming to the environment. The sinusoidal equation can hardly provide this, because an abrupt change of the amplitude envelope, oscillatory frequency, or swimming pattern may make the undulating motion discontinuous and unstable. To illustrate this, we simulated sinusoidal swimming locomotion, as shown in Figure 2.

    Figure 2.  Output of sinusoidal equation in abrupt change of amplitude and frequency.

    We make an abrupt change in the amplitude envelope (Figure 2a) and in the oscillatory frequency (Figure 2b) at an arbitrary time $t$. The output of the sinusoidal generator is clearly discontinuous at that time.
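    The discontinuity can be reproduced with a few lines of Python. This is a minimal sketch of Eq 1 for a single fin-ray; the switching time, the parameter values, and the function name `sine_sway` are illustrative assumptions, not the settings used for Figure 2.

```python
import math

def sine_sway(t, amplitude, freq_hz, phase=0.0):
    """Eq 1: sway angle of one fin-ray at time t."""
    return amplitude * math.sin(2.0 * math.pi * freq_hz * t + phase)

# Assumed scenario: a parameter is switched instantaneously at t_switch.
t_switch = 0.3  # seconds (illustrative value)

# Frequency step 1 Hz -> 2 Hz at constant amplitude.
freq_jump = abs(sine_sway(t_switch, 1.0, 2.0) - sine_sway(t_switch, 1.0, 1.0))

# Amplitude step 1.0 -> 0.5 at constant frequency.
amp_jump = abs(sine_sway(t_switch, 0.5, 1.0) - sine_sway(t_switch, 1.0, 1.0))

print(f"jump in output after frequency step: {freq_jump:.3f}")
print(f"jump in output after amplitude step: {amp_jump:.3f}")
```

    A nonzero jump means the commanded sway angle changes instantaneously at the switch, which is the discontinuity described above.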

    The CPG is a network of oscillators that can produce rhythmic patterns for biomimetic robots. Several kinds of oscillators, such as Van der Pol, Wilson-Cowan, Kuramoto, Matsuoka, Amplitude-Controlled Phase, Rowat-Selverston, and Hopf, have been applied successfully to generate the walking, swimming, and flapping gaits of biomimetic robots. However, the Van der Pol oscillator appears better suited to producing an electrocardiogram signal, and most of the other oscillators are well suited to generating the rhythmic movement of arm or legged robots with two moving phases. This research therefore employs a Hopf oscillator, which can realize a nonharmonic sine waveform, to construct a modified CPG that generates the rhythmic locomotion of the elongated undulating fin. A typical structure of the Hopf oscillator is shown in Figure 3.

    Figure 3.  Typical structure of Hopf oscillator.

    The dynamics of the Hopf oscillator are expressed by the following differential equations:

    $$\begin{aligned} \dot{u}(t) &= k\left(A^2 - u^2(t) - v^2(t)\right)u(t) - 2\pi f\, v(t) \\ \dot{v}(t) &= k\left(A^2 - u^2(t) - v^2(t)\right)v(t) + 2\pi f\, u(t) \end{aligned} \tag{2}$$

    where $u, v$ are the time-variant state variables of the oscillator; $A$ is the intrinsic amplitude; $f$ is the intrinsic frequency; $k$ is the convergence rate to the limit cycle ($k > 0$).
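    Eq 2 can be integrated numerically to observe the limit cycle. The sketch below uses a forward-Euler step; the step size, the initial states, the value of $k$, and the function names are assumptions made for illustration and are not taken from the paper's simulations.

```python
import math

def hopf_step(u, v, A, f, k, dt):
    """One forward-Euler step of the Hopf oscillator in Eq 2."""
    g = k * (A * A - u * u - v * v)   # radial term, zero on the limit cycle
    omega = 2.0 * math.pi * f
    du = g * u - omega * v
    dv = g * v + omega * u
    return u + dt * du, v + dt * dv

def simulate(u0, v0, A=1.0, f=1.0, k=10.0, dt=1e-3, t_end=5.0):
    """Integrate from (u0, v0) and return the final state."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        u, v = hopf_step(u, v, A, f, k, dt)
    return u, v

# Assumed initial states inside and outside the limit cycle.
final_radii = []
for u0, v0 in [(0.1, 0.0), (2.0, 0.0), (0.0, -1.5)]:
    u, v = simulate(u0, v0)
    final_radii.append(math.hypot(u, v))
    print(f"start=({u0}, {v0}) -> final radius {final_radii[-1]:.4f}")
```

    In this sketch each trajectory ends with a radius close to the intrinsic amplitude $A$, whichever side of the limit cycle it starts from. The origin is excluded as a starting point because it is an equilibrium of Eq 2.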

    For comparison with the traditional sinusoidal generator, a single Hopf oscillator is simulated in the same manner, as illustrated in Figure 4.

    Figure 4.  Output of Hopf oscillator in abrupt change of intrinsic amplitude and frequency.

    Figure 4 shows that the output of the Hopf oscillator makes a smooth transition when the intrinsic amplitude and the oscillatory frequency are changed abruptly at an arbitrary time $t$. The Hopf oscillator of Eq 2 also converges quickly to the limit cycle: even when started from different arbitrary initial points, its output converges to a stable limit cycle with the intrinsic amplitude $A$. The convergence rate can be tuned by adjusting $k$ in Eq 2; the output converges to the limit cycle more rapidly as $k$ increases, regardless of abrupt changes of the intrinsic amplitude and intrinsic frequency. Simulation results for the Hopf oscillator with eight different initial points for each scenario are illustrated in Figure 5a. Figure 5b shows that the output converges to the limit cycle rapidly, in approximately 2 seconds.
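    The effect of $k$ on convergence can be checked with a short experiment. The sketch below measures how long a single oscillator, started near the origin, takes to come within a tolerance of the intrinsic amplitude. This settling measure, the tolerance, and the tested values of $k$ are assumptions chosen for illustration; the measure is not the transient-state time that the paper defines for the sixteen-oscillator network.

```python
import math

def settle_time(k, A=1.0, f=1.0, r0=0.1, tol=0.02, dt=1e-3, t_max=10.0):
    """Time until the radius sqrt(u^2 + v^2) first comes within tol of A."""
    u, v = r0, 0.0
    omega = 2.0 * math.pi * f
    for n in range(int(round(t_max / dt))):
        if abs(math.hypot(u, v) - A) < tol:
            return n * dt
        g = k * (A * A - u * u - v * v)
        # Forward-Euler update of Eq 2 (right-hand side uses the old u, v).
        u, v = u + dt * (g * u - omega * v), v + dt * (g * v + omega * u)
    return float("inf")

times = {k: settle_time(k) for k in (10.0, 50.0, 96.0)}
for k, t in times.items():
    print(f"k = {k:5.1f}: radius settles after about {t:.3f} s")
```

    Larger values of $k$ give shorter settling times in this sketch, in line with the behaviour described above.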

    Figure 5.  Convergence to limit cycle of Hopf oscillator.

    In both invertebrate and vertebrate organisms, several topological couplings between the joints, playing the roles of stimulation and inhibition, allow the muscles to work together. The actual CPGs of animal brains are complicated networks with abundant neurons. To replicate a CPG for controlling biomimetic robots, it is necessary to simplify the coupling connections and categorize them into four main topological structures: chain coupling, radial coupling, ring coupling, and fully connected coupling [31]. Each topology has properties that match the biological characteristics of particular species. For instance, chain coupling is mainly used to generate the locomotion of swimmers, whereas fully connected coupling is usually used for the rhythm generation of legged robots, because all legs must be coupled to move smoothly under environmental changes.

    The biological structure of the elongated undulating fin features a series of fin-rays. An abnormal movement of any fin-ray caused by environmental influences affects only its adjacent fin-rays. To generate the undulating motion of the fin, this research constructs a chain coupling of sixteen oscillators with bi-directional perturbation, as depicted in Figure 6. Each oscillator drives one fin-ray, and the influence of each fin-ray on its adjacent fin-rays is conveyed through the bi-directional perturbation. Each oscillator of the modified CPG network has its own independent pair of intrinsic amplitude and intrinsic frequency.

    Figure 6.  Structure of modified CPG network with chain coupling of sixteen oscillators in bi-directional perturbation.

    In the modified CPG network, the two terminal oscillators are each affected by only one adjacent oscillator. Without loss of generality, the nonlinear function describing the modified CPG network shown in Figure 6 is given as follows:

    $$\dot{X}_i = F(X_i) + P_i = \begin{bmatrix} k\left(A_i^2 - u_i^2 - v_i^2\right)u_i - 2\pi f\, v_i \\ k\left(A_i^2 - u_i^2 - v_i^2\right)v_i + 2\pi f\, u_i \end{bmatrix} + \begin{bmatrix} p_{u,i} \\ p_{v,i} \end{bmatrix} \tag{3}$$

    where $X_i = [u_i \; v_i]^T$ is the state vector of the $i$-th oscillator; $F(X_i)$ represents a nonlinear function; $P_i = [p_{u,i} \; p_{v,i}]^T$ is a perturbation vector.

    To clarify Eq 3 for the terminal oscillators, it is necessary to consider the coupling connection of three adjacent oscillators as shown in Figure 7:

    Figure 7.  Coupling connection of three adjacent oscillators.

    For the first oscillator ($i = 1$), the only perturbation comes from the second oscillator ($i + 1$); thus, the perturbation of the first oscillator is given by:

    $$P_1 = \begin{bmatrix} 0 \\ \beta\left(v_2\cos\varphi_d - u_2\sin\varphi_d\right) \end{bmatrix} \tag{4}$$

    where $\beta$ is the coupling strength; $\varphi_d$ is the phase lag angle between two adjacent oscillators.

    In the same manner, the sixteenth oscillator is affected only by the perturbation from the fifteenth oscillator:

    $$P_{16} = \begin{bmatrix} 0 \\ \beta\left(u_{15}\sin\varphi_d + v_{15}\cos\varphi_d\right) \end{bmatrix} \tag{5}$$

    For the $i$-th oscillator between the two terminal oscillators, the perturbation vector is given by the following:

    $$P_i = \begin{bmatrix} 0 \\ \beta\left(u_{i-1}\sin\varphi_d + v_{i-1}\cos\varphi_d - u_{i+1}\sin\varphi_d + v_{i+1}\cos\varphi_d\right) \end{bmatrix} \tag{6}$$

    With different intrinsic amplitudes $A_i$, the modified CPG network can provide different swimming patterns for the elongated undulating fin and can therefore produce different propulsive forces.
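    The chain-coupled network of Eqs 3 to 6 can be sketched in Python as follows. The forward-Euler integration, the step size, the function names, and the initial state (a travelling wave with lag $\varphi_d$ between neighbours) are assumptions made for illustration; the parameter values $f = 1$, $k = 10$, $\beta = 0.8$ and $\varphi_d = \pi/3$ are the ones quoted elsewhere in the text.

```python
import math

def perturbations(states, beta, phi_d):
    """Eqs 4-6: perturbation p_v for every oscillator of the chain."""
    s, c = math.sin(phi_d), math.cos(phi_d)
    n = len(states)
    p = []
    for i in range(n):
        total = 0.0
        if i > 0:                       # contribution of oscillator i-1
            u, v = states[i - 1]
            total += u * s + v * c
        if i < n - 1:                   # contribution of oscillator i+1
            u, v = states[i + 1]
            total += -u * s + v * c
        p.append(beta * total)
    return p

def network_step(states, amplitudes, f, k, beta, phi_d, dt):
    """One forward-Euler step of Eq 3 for the whole chain."""
    omega = 2.0 * math.pi * f
    p_v = perturbations(states, beta, phi_d)
    new_states = []
    for (u, v), a, pv in zip(states, amplitudes, p_v):
        g = k * (a * a - u * u - v * v)
        new_states.append((u + dt * (g * u - omega * v),
                           v + dt * (g * v + omega * u + pv)))
    return new_states

n, phi_d = 16, math.pi / 3
amplitudes = [1.0] * n
# Assumed start: unit-amplitude travelling wave with lag phi_d per fin-ray.
states = [(math.cos(i * phi_d), math.sin(i * phi_d)) for i in range(n)]
for _ in range(5000):                   # 5 s with dt = 1 ms
    states = network_step(states, amplitudes, 1.0, 10.0, 0.8, phi_d, 1e-3)

radii = [math.hypot(u, v) for u, v in states]
print(f"radius range after 5 s: {min(radii):.3f} to {max(radii):.3f}")
```

    The boundary handling in `perturbations` drops the missing neighbour for the first and last oscillators, which reproduces Eqs 4 and 5 as special cases of Eq 6.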

    It should be noted that the convergence rate $k$ in Eq 3 is usually chosen by trial and error to reach the limit cycle as quickly as possible. A large value of $k$ reduces the transient-state time, defined as the period from the start until the output of the 16th CPG begins its first cycle, but it may increase the oscillatory error of the modified CPG network output. It is therefore necessary to optimize this parameter. Q-learning is a value-based reinforcement learning algorithm that seeks a higher reward in each episode. This paper employs Q-learning with discrete actions because the CPG needs some time to generate the oscillatory output for each chosen action before the next step can be taken. Furthermore, the algorithm has a low computational cost, which enables onboard implementation. The state variables $s_t \in S$ (where $S$ is the compact set of state variables) are the oscillatory error $s_t^1$ and the transient-state time $s_t^2$, with $s_t^1 \in S_1$, $s_t^2 \in S_2$, and $S_1, S_2 \subset S$. The shift of the convergence rate is chosen as the action variable $a_t \in \mathcal{A}$. The interaction between the agent and the environment is shown in Figure 8.

    Figure 8.  Interaction of agent and environment.

    The reward function is designed to trade off the transient-state time against the oscillatory error, and is given by the following:

    $$r^{a_t}_{s_t s'_t} = L_u\, r_1(s_t^1) + L_l\, r_2(s_t^2) \tag{7}$$

    In Eq 7, $s'_t$ is the next state variable, and $L_u, L_l$ are reward constants chosen such that $L_u > L_l$, to emphasize that minimizing the oscillatory error is more important than minimizing the transient-state time. Here $L_u$ and $L_l$ are set to 100 and 10, respectively. The reward subfunctions $r_i(s_t^i)$ with $i = 1, 2$ are given by the following:

    $$r_i(s_t^i) = \begin{cases} R_{\max} & |s_t^i| < \min(S_i) \\ R_{\min} & |s_t^i| = \min(S_i) \\ 0 & |s_t^i| > \min(S_i) \end{cases} \tag{8}$$

    where $R_{\max}, R_{\min}$ are the maximum reward and the minimum reward, set to 1 and 0.1, respectively.
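    The reward of Eqs 7 and 8 can be written as a short Python function. In this sketch, $\min(S_i)$ is interpreted as the smallest value of the corresponding state variable recorded so far, which is an assumption; the function and argument names are hypothetical.

```python
def sub_reward(value, best_so_far, r_max=1.0, r_min=0.1):
    """Eq 8: compare |s| with the smallest value in the set S_i."""
    magnitude = abs(value)
    if magnitude < best_so_far:
        return r_max
    if magnitude == best_so_far:
        return r_min
    return 0.0

def reward(osc_error, transient_time, min_error, min_time,
           L_u=100.0, L_l=10.0):
    """Eq 7: weighted sum, with the oscillatory error weighted more heavily."""
    return (L_u * sub_reward(osc_error, min_error)
            + L_l * sub_reward(transient_time, min_time))

# Error improves on the best value, transient time only ties it.
print(reward(0.01, 1.40, min_error=0.02, min_time=1.40))
# Neither state variable improves.
print(reward(0.05, 1.50, min_error=0.02, min_time=1.40))
```

    With $L_u = 100$ and $L_l = 10$, an improvement in the oscillatory error always outweighs an improvement in the transient-state time, which is the stated intent of the weighting.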

    Likewise, the terminal state $s_T$, which is the condition for completing an episode, satisfies $s_T := \{ s_t \in S \mid \delta = L_u |s_t^1| + L_l s_t^2 \le \min_{\delta}(\Delta_e) \}$, where $\Delta_e$ is the compact set of $\delta$ values of each episode.

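    The Q-value (action-value) function is updated by the simple Temporal Difference (TD) method:

    $$Q_t(s_t, a_t) = Q_{t-1}(s_t, a_t) + \alpha\left( r^{a_t}_{s_t s'_t} + \gamma \max_{a'_t} Q_{t-1}(s'_t, a'_t) - Q_{t-1}(s_t, a_t) \right) \tag{9}$$

    where $\alpha$ is the learning rate ($0 \le \alpha < 1$); $\gamma$ is the discount factor ($0 \le \gamma < 1$); $a'_t$ is the next action variable; $Q_{t-1}(\cdot)$ denotes the current Q-value; $Q_t(\cdot)$ denotes the new Q-value.

    The next policy $\pi(s_t, a_t)$ is implemented by the $\varepsilon$-greedy strategy, which is given by:

    $$\pi(s_t, a_t) = \begin{cases} \arg\max_{a_t} Q_{t-1}(s_t, a_t) & q < 1 - \varepsilon \\ \operatorname{rand}\left(Q_{t-1}(s_t, a_t)\right) & \text{otherwise} \end{cases} \tag{10}$$

    where $q$ is a uniformly distributed random number.

    The optimal convergence rate can be determined from the optimal action-value:

    $$a_t^{*} = \arg\max_{a_t} Q^{*}(s_t, a_t) \tag{11}$$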
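    Eqs 9 to 11 map directly onto a tabular implementation. The sketch below stores the Q-values in a list of lists indexed by discrete state and action; the table size, the helper names, and the reading of $\operatorname{rand}(\cdot)$ as a uniformly random action are assumptions for illustration.

```python
import random

def choose_action(q_row, epsilon, rng):
    """Eq 10: greedy action when q < 1 - epsilon, otherwise a random one."""
    q = rng.random()                     # uniform random number in [0, 1)
    if q < 1.0 - epsilon:
        return max(range(len(q_row)), key=lambda a: q_row[a])
    return rng.randrange(len(q_row))

def td_update(Q, s, a, reward, s_next, alpha, gamma):
    """Eq 9: one temporal-difference update of the Q-table, in place."""
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

def best_action(q_row):
    """Eq 11: action with the largest Q-value in the given state."""
    return max(range(len(q_row)), key=lambda a: q_row[a])

# Toy table with 2 states and 3 actions (hypothetical sizes and values).
Q = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.5]]
rng = random.Random(0)
greedy = choose_action(Q[1], epsilon=0.0, rng=rng)  # epsilon = 0: always greedy
new_q = td_update(Q, s=0, a=2, reward=10.0, s_next=1, alpha=0.95, gamma=0.75)
print(greedy, round(new_q, 3), best_action(Q[0]))
```

    Note that, with Eq 10 as written, a larger $\varepsilon$ means more exploration: the greedy branch is taken with probability $1 - \varepsilon$.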

    The pseudo-code of the Q-learning optimization for the convergence rate is given in Table 1. The impact of the transient-state time and the oscillatory error on the convergence rate is depicted in Figure 9a, and the distribution of the Q-value over the state variable and the action variable is illustrated in Figure 9b.

    Table 1.  Pseudo-code of the Q-learning optimization.
    Algorithm: Q-learning based optimization of the convergence rate
    1. Initialize $\alpha, \gamma, \varepsilon$
    2. Initialize $Q_{t-1}(s_t, a_t) = [0]$, $s_t = \operatorname{rand}(S)$, and the episode number $n$
    3. Repeat for each step of the episode:
      4. Choose $a_t = \arg\max_{a_t} Q(s_t, a_t)$ if a uniform random number is $< 1 - \varepsilon$
      5. Otherwise choose $a_t = \operatorname{rand}(Q(s_t, a_t))$
      6. Take the action $a_t$ (pass the convergence rate $k$ to the modified CPG network)
      7. Observe $s'_t$ and $r^{a_t}_{s_t s'_t}$ (perceive the oscillatory error and the transient-state time, and calculate the reward by Eqs 7 and 8)
      8. Update the Q-value by Eq 9
      9. Assign the next state as the current state ($s_t \leftarrow s'_t$)
      10. Until the current state is the terminal state ($s_t \in s_T$)
    11. Take the optimal action $a_t^{*} = \arg\max_{a_t} Q^{*}(s_t, a_t)$

    Figure 9.  a) Impact of transient-state time and oscillatory error on the convergence rate. b) Distribution of Q-value on state variable and action variable.

    The Q-learning based optimization of the convergence rate was run with a discount factor $\gamma = 0.75$, a learning rate $\alpha = 0.95$, $\varepsilon = 0.7$ for the $\varepsilon$-greedy strategy, and $n = 2000$ episodes. The optimal Q-value reached approximately $Q^{*}(s_t, a_t) = 658279$ for the optimal action $a_t^{*} = 96$, which is used for the simulation and experimental studies in the next section.

    In this research, the modified CPG network is simulated in MATLAB to evaluate the flexible gait transition of the elongated undulating fin with respect to the swimming pattern, intrinsic amplitude, oscillatory frequency, and number of waveforms. The swimming patterns used in this research are illustrated in Figure 10. The simulation results also demonstrate the effect of the convergence rate on the transient-state time and the oscillatory error of the modified CPG network.

    Figure 10.  Swimming patterns of elongated undulating fin propulsion.

    The modified CPG parameters for this study are $A_i = 1$ (for $i = 1, \dots, 16$), $f = 1$, $\varphi_d = \pi/3$, and $\beta = 0.8$, which let the fin-rays perform the cuttlefish-like swimming pattern. For comparison, Figure 12 depicts the output of a single oscillator with $k$ chosen arbitrarily around the optimal value of 96. The transient-state time is about 1.45 seconds for $k = 86$, approximately 1.41 seconds for $k = 96$, and 1.36 seconds for $k = 106$. A larger $k$ thus shortens the transient-state time, because the modified CPG output converges to the limit cycle faster. Nevertheless, increasing the convergence rate $k$ enlarges the oscillatory error of the modified CPG output, as illustrated in Figure 11, which might affect the performance of the actuators that drive the fin-rays. Therefore, the oscillatory error is regarded as a more significant factor than the transient-state time.

    Figure 11.  The relative convergence rate concerning transient-state time and oscillatory error.
    Figure 12.  The output of a single oscillator with k=86,k=96,k=106.

    This simulation study aims to clarify several aspects: smooth acceleration and deceleration with no jerk when the oscillatory frequency $f$ changes, flexible transition of the swimming pattern when the intrinsic amplitude $A_i$ changes, the transition between forward and backward swimming when the phase lag angle $\varphi_d$ changes, and the transition of the waveform number. As Figure 13a shows, the modified CPG network initially generates a nonharmonic swimming pattern with the linear waveform, mimicking the cuttlefish-like gait, for 2.5 seconds. Afterward, the oscillatory frequency gradually increases from 1 Hz to 2 Hz, and the oscillatory output becomes faster so that the elongated undulating fin accelerates. From 5 to 7.5 seconds, the fin performs the quadratic swimming pattern. After 7.5 seconds, the swimming pattern is forced to change into the elliptic waveform. In Figure 13b, the fin performs the elliptic waveform, mimicking the stingray-like swimming pattern, for the first 5 seconds with a phase lag angle of $\varphi_d = \pi/3$ for each fin-ray. At 5 seconds, the phase lag angle is abruptly changed to $\varphi_d = -\pi/3$ so that the fin swims backward. The modified CPG network achieves a smoother gait transition than the kinematic sinusoidal generator. From 5 to 20 seconds, the fin swims backward. Afterward, the phase lag angle is changed back to $\varphi_d = \pi/3$ so that the fin swims forward again. This scenario also reveals that a lower convergence rate gives a shorter transient-state time when the phase lag angle is changed to switch the swimming direction (see Figure 14).

    Figure 13.  (a) Output of sixteen oscillators with changes of swimming pattern, oscillatory frequency, and waveform number – (b) Output of sixteen oscillators with changes of phase lag angle enabling for reverse swimming direction.
    Figure 14.  Relation of transient-state time with respect to convergence rate.

    Figure 14 shows the CPG outputs for convergence rates of $k = 96$ and $k = 10$. For clarity, only the undulating signals of the first and third CPGs are plotted. From 0 to 5 seconds, the CPGs generate the undulating waveform with a phase lag angle of $\varphi_d = \pi/3$, which can be recognized from the fact that the output phase of the 1st CPG leads that of the 3rd CPG. At 5 seconds, the CPGs are commanded to change the phase lag angle to $\varphi_d = -\pi/3$. The lower part of Figure 14 shows that the CPG outputs take 4 seconds to change the swimming direction with a convergence rate of 10. The reversed swimming direction can be recognized from the fact that the output phase of the 1st CPG lags that of the 3rd CPG. With a convergence rate of 96, however, the CPG outputs take 8 seconds to change the swimming direction, as shown in the upper part of Figure 14. This implies that the convergence rate should be switched to a sufficiently small value before the CPGs are commanded to reverse the swimming direction.

    The experimental setup depicted in Figure 15 is used to validate the applicability of the modified CPG network. A customized board based on the STM32F103RET6 microcontroller implements the modified CPG network and drives the sixteen fin-rays through 50 Hz PWM signals. To measure the fin-ray angles, all RC servos are modified to sense their rotary angle, which provides the perturbation of the modified CPG network. A computer is used to set the swimming parameters and to run the Q-learning algorithm. The elongated undulating fin is validated in a pre-test stage without water immersion. The kinematic parameters of the modified CPG network are $f = 1$ Hz, $k = 10$, $\varphi_d = \pi/3$, and $\beta = 0.8$, with a sampling time of 0.01 seconds. To match the required amplitude envelope, the output of the oscillators is calculated by the following:

    $$\theta_i = G_i u_i \tag{12}$$

    where $u_i$ is the output of each neural oscillator; $G_i$ is the maximum sway angle of each fin-ray, determined by $G_i = \arcsin(Y_i / L)$ with $Y_i$ defined as the amplitude envelope of each fin-ray in the lateral direction; and $L$ is the length of the fin-ray, in this case $L = 150$ mm.

    Figure 15.  Experimental configuration.
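    As a worked example of Eq 12, the snippet below converts an amplitude envelope into a sway angle. It reads the gain as $G_i = \arcsin(Y_i / L)$ with $Y_i$ and $L$ both in millimetres, which is an assumption about the intended formula; the function names are hypothetical.

```python
import math

FIN_RAY_LENGTH_MM = 150.0  # L in the text

def sway_gain(envelope_mm, length_mm=FIN_RAY_LENGTH_MM):
    """Maximum sway angle G_i = arcsin(Y_i / L), in radians."""
    return math.asin(envelope_mm / length_mm)

def sway_angle(u_i, envelope_mm):
    """Eq 12: theta_i = G_i * u_i for an oscillator output u_i."""
    return sway_gain(envelope_mm) * u_i

# Peak of a unit-amplitude oscillator for the largest envelope, 40 mm.
peak_deg = math.degrees(sway_angle(1.0, 40.0))
print(f"peak sway angle for Y_i = 40 mm: {peak_deg:.2f} deg")
```

    For the 40 mm envelope the peak sway angle comes out at roughly 15.5 degrees, and a fin-ray with zero envelope does not move.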

    The elongated undulating fin performs the elliptic waveform depicted in Figure 16a with the amplitude envelope $Y_i$ of the fin-rays set to {0, 5.7, 11.43, 17.14, 22.85, 28.57, 34.28, 40, 40, 34.28, 28.57, 22.85, 17.14, 11.43, 5.7, 0} mm. Figure 16b shows the quadratic waveform, with the amplitude envelope $Y_i$ chosen as {0, 2.57, 5.33, 8, 10.67, 13.33, 16, 18.67, 21.33, 24, 26.67, 29.33, 32, 34.57, 37.33, 40} mm. The linear waveform, with a constant amplitude envelope of 40 mm, is illustrated in Figure 16c. The experimental data are plotted as a dashed-dot line and the simulation results as a solid-dot line. As Figure 16 shows, the sway angles of the sixteen fin-rays form gradually during the period from 0.5 to 3 seconds. Throughout this formation stage, the amplitude envelope of the fin in the experiment is smaller than that in the simulation, which might be due to the limited response of the actuators.

    Figure 16.  Experimental results of various swimming patterns.

    This paper has presented a modified CPG network that generates the rhythm for an elongated undulating fin with sixteen fin-rays to mimic a fish's swimming patterns. The modified CPG network is built by chain-coupling sixteen oscillators with bidirectional perturbation, because each fin-ray is affected only by its two adjacent oscillators. Both simulation and experimental results indicate that the modified CPG network is a promising rhythm generator for a fish robot: it allows the kinematic parameters to be changed abruptly without jerk in the oscillation. This paper has also investigated an intrinsic parameter of the CPG, the convergence rate, which has not been studied before and is usually tuned by trial and error. The simulation results show that a large convergence rate reduces the transient-state time but may worsen the oscillatory error. Tuning the convergence rate is therefore a trade-off between the transient-state time and the oscillatory error, and the Q-learning algorithm is well suited to finding the optimal value. To obtain smooth oscillation and avoid damage to the RC servo motor, the reward function of the Q-learning weights the oscillatory error more heavily than the transient-state time. The optimal convergence rate found by the Q-learning gives a short transient-state time and an acceptable oscillatory error in both the simulation and the experimental results when kinematic parameters such as the amplitude envelope, oscillatory frequency, and waveform number change abruptly. In particular, we have found that the transient-state time is longer with a large convergence rate when the phase lag angle is switched to the opposite value for reverse swimming. However, changing the convergence rate once the CPG has reached its limit cycle does not affect the CPG output. This suggests a piecewise switching function that changes the convergence rate according to the swimming operation: the convergence rate should be reduced from the optimal value to a smaller appropriate value before the phase lag angle is changed to switch between forward and backward swimming, and then restored to the optimal value afterward to keep the transient-state time short.
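
    The switching rule suggested above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the time-window parameters (`lead_time`, `settle_time`), and all numeric values are hypothetical and chosen only for illustration. The sketch assumes the swimming direction is reversed at a known time, lowers the convergence rate shortly before that instant, and restores the optimal value once the transient is assumed to have settled.

```python
def convergence_rate(t, switch_time, optimal_rate, reduced_rate,
                     lead_time, settle_time):
    """Piecewise-constant convergence rate around a direction reversal.

    All parameters are hypothetical illustration values, not taken from
    the paper. The reduced rate is used on the half-open window
    [switch_time - lead_time, switch_time + settle_time); the optimal
    rate is used everywhere else.
    """
    if lead_time < 0 or settle_time < 0:
        raise ValueError("lead_time and settle_time must be non-negative")
    window_start = switch_time - lead_time
    window_end = switch_time + settle_time
    if window_start <= t < window_end:
        return reduced_rate
    return optimal_rate


def phase_lag(t, switch_time, forward_lag):
    """Phase lag angle: its sign flips at switch_time to reverse swimming."""
    return forward_lag if t < switch_time else -forward_lag


if __name__ == "__main__":
    # Reversal at t = 10 s; the rate is lowered 1 s before and held for 3 s after.
    for t in (0.0, 9.5, 10.0, 12.9, 13.0):
        rate = convergence_rate(t, 10.0, 20.0, 5.0, 1.0, 3.0)
        lag = phase_lag(t, 10.0, 0.4)
        print(f"t={t:5.1f} s  rate={rate:5.1f}  phase_lag={lag:+.1f} rad")
```

    Because a change of the convergence rate on the limit cycle does not affect the CPG output, lowering the rate ahead of the reversal should not disturb the steady oscillation; how long the reduced rate must be held would have to be determined experimentally.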

    From a scientific perspective, this paper provides experimental results only at the pre-test stage, with no water immersion. This is due to the impact of the COVID-19 epidemic, which halted all of our laboratory activities at the research facility. The widespread impact and severity of the pandemic show no sign of ending. This paper therefore lacks a series of experimental results with the elongated undulating fin submerged in a water tank. As a direction for further research, the kinematic parameters need to be traded off to optimize the energy consumption and the generated thrust force. Model-based reinforcement learning, which tries to model the operating environment of the fish robot, is also of interest for future work.

    The authors declare that there is no conflict of interest.

    This research is supported by DCSELAB and funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number TX2021-20b-01. We acknowledge the support of time and facilities from Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for this study.



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)