
Plant phenotypes refer to the morphological characteristics controlled by genes in individual plants or plant populations under specific environments, and they provide a comprehensive reflection of plant traits [1]. The major components of plant phenotype are organs such as roots, leaves, stems, and fruits, with leaves being the most commonly measured phenotypic trait. Traditional measurement methods rely on manual observation and are subject to subjective errors, which makes it difficult to meet the demand for rapid phenotypic analysis of large-scale plant populations [2]. In the past decade, automated phenotyping using computer vision and machine learning techniques has received widespread attention [3]. The core issue lies in how to segment plant organs effectively and accurately.
In recent years, advanced deep learning methods based on 2D images have been widely used for plant part segmentation [4], such as the semantic segmentation residual U-Net model for plant images [5] and leaf segmentation methods used in disease identification [6,7,8,9]. However, the specific structure of plants and factors such as complex environmental conditions during growth limit the accuracy of plant phenotypic parameters obtained from 2D images. 2D images lack depth information, leading to issues such as occlusion, data loss [10] and variability caused by lighting conditions [11].
3D point cloud models can provide accurate spatial information about objects [12], including their position, shape, size and orientation. In contrast, 2D images only provide the projection of objects onto a plane and cannot accurately capture their three-dimensional features. Machine vision-based depth cameras and binocular vision-based 3D imaging techniques have been used for 3D reconstruction of small plants such as soybean [13] and corn [14,15]. Compared to machine vision-based 3D imaging, LiDAR is more accurate and is commonly used for large-scale scenes such as forests [16] and farmland [17,18]. However, LiDAR data processing requires significant workload, time and cost. Compared to these two ways of obtaining 3D point cloud models of plants, synthetic 3D virtual plant models have the advantage of low cost and simple acquisition. Synthetic plant models have long been used in plant research to simulate the interaction between plants and the environment [19]. Lindenmayer systems (L-systems) [20] are string-rewriting systems that can be used to generate fractals and natural patterns. Platforms for constructing plant models are typically developed on top of L-systems, such as the L + C modeling language combined with the C language [21] and the L-Py framework combined with Python [22]. Some studies utilize synthetic plants as training data [23]. Reference [24] establishes models capable of automatically annotating maize and canola, saving the time required for manual labeling. Similarly, in [25], L-systems are employed to model artichoke seedlings and automate annotation, thereby reducing cost and time. The study [26] combines real and synthetic images of Arabidopsis for network training. Reference [27] uses 3D plant models generated by Blender and scenes with random plant parameters to create a synthetic dataset. Reference [28] shows that augmenting the dataset with high-quality 3D synthetic plants improves performance in leaf counting tasks. All of the above use synthetic 3D plant models to generate 2D images rather than directly incorporating the synthetic 3D models into training. In contrast, references [29,30] use synthetic roses and Arabidopsis, respectively, as substitutes for real plants directly involved in deep learning training.
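The core of an L-system is parallel string rewriting. The following minimal Python sketch (not the L-Py API; the rules shown are the classic textbook fractal-plant example, not our Arabidopsis model) illustrates the idea:

```python
# Minimal L-system string rewriting: every symbol is replaced by its
# production rule in parallel at each iteration.
def rewrite(axiom, rules, iterations):
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)  # symbols without a rule stay
    return s

# Classic fractal-plant rules: 'F' = draw forward, '+'/'-' = turn,
# '[' / ']' = push / pop the turtle state.
rules = {"X": "F+[[X]-X]-F[-FX]+X", "F": "FF"}
print(rewrite("X", rules, 3))
```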
Point clouds are composed of a large number of discrete points and are disordered and irregular, making segmentation with deep learning methods challenging [31]. For handling irregular inputs like point clouds, an intuitive approach is to transform the irregularity into regularity. One approach is multi-view projection [32], which projects 3D point cloud information onto 2D planes. Shi et al. [33] employed multi-view projection for semantic and instance segmentation of tomato seedlings. However, in multi-view projection [34], the geometric information of the point cloud is collapsed during the projection stage. Another approach to regularizing point clouds is 3D voxelization [35], followed by deep learning on the voxelized plant point clouds [36]. Compared to these two indirect approaches, deep networks that operate directly on points consume less computation and memory. PointNet and PointNet++ [37,38] are the earliest point-based 3D deep neural networks. Kang et al. [39] and Masuda [40] applied these two networks to plant point cloud segmentation. Subsequently, point-based 3D neural networks specialized in plant point cloud segmentation were proposed [41]. Ghahremani et al. [42] proposed a lightweight deep network for plant point cloud segmentation and later introduced Pattern-Net [43], built on the original network specifically for wheat point cloud segmentation. Recently, attention-based Transformers were first applied to point-based deep learning [44,45,46], significantly improving the performance of point cloud segmentation. Several point-based deep learning models combined with Transformers have since been proposed [47,48,49], demonstrating excellent segmentation performance. Turgut et al. [29] and Li et al. [50] proposed attention-based point cloud segmentation networks, RoseSegNet and PSegNet, specifically designed for plant point clouds.
In conclusion, using 3D deep learning for plant organ segmentation is a promising method in phenotypic research, but there are several challenges. Due to the lack of plant datasets, it is challenging to validate the effectiveness of the designed networks on complex plant structures. In most 3D deep learning models, methods such as downsampling and grouping points may incur significant time costs [51].
To address the aforementioned challenges, we created virtual Arabidopsis models that simulate the characteristics of real Arabidopsis complex structures, which can be used to verify the feasibility of our network for segmentation on plants with complex structures. We proposed a novel downsampling module to improve the Stratified Transformer network [52] for semantic segmentation on our created virtual Arabidopsis dataset. Our network consumed less time compared to the original network while ensuring segmentation performance.
Our major contributions were as follows:
● To obtain more Arabidopsis data, we used L-Py to generate virtual Arabidopsis models. Then, we proposed the surface-weighted sampling method and downsampling method to obtain annotated Arabidopsis point clouds from the virtual models.
● We proposed the Plant Stratified Transformer. This network utilizes the Fast Downsample Layer in place of the Stratified Transformer downsampling layer. Testing on the Arabidopsis dataset demonstrated that the improved network maintained segmentation performance comparable to the original network while reducing both training and inference times.
● Virtual Arabidopsis was involved in network training to validate the semantic segmentation performance of different networks, tested against real Arabidopsis. This demonstrated that when there was only a small amount of real plant data, virtual plants could be used as training data for the network.
Virtual plants are computer-generated synthetic models that simulate the development and growth processes of real plants [28]. The L-system is a commonly used formalism that can describe a wide range of plant features and types [53]. In the L-system framework, a plant is defined by a sequence of symbols referred to as an L-string, which represents the various organs of the plant. We utilized L-Py, a Python-based L-system, as the development platform [22]. On this platform, we employed the plant-generation algorithm proposed by Chaudhury et al. [54] to create Arabidopsis models. As shown in Figure 1(a), the generation of the Arabidopsis model primarily involves the following L-strings:
● Meristem(t) for apical or lateral meristems.
● Leaf(t,label_L) for leaves of different sizes.
● Internode(t,label_I) for stem.
● Flower(t,label_F) for flowers composed of a pedicel, petals, and carpels, with the carpel subsequently developing into a fruit.
Each symbol was associated with a set of parameters that described the variables attached to the respective organ. The definitions of organs in Figure 1(a) essentially represent recursive functions, and thus, the generation and evolution of each organ involve iterative recursion with respect to the corresponding symbol over time increments of dt (1 hour). During the generation of organs, each class of organs X that needs to be identified is assigned a unique 'label_X'.
To ensure the realism of the virtual plants, all organs originated from the Meristem, just as in real plant development. The morphologies of the organs during growth and development are shown in Figure 1(c). The generation process of Arabidopsis can therefore be described simply as the development of different organs from the Meristem under different growth states, as shown in Figure 1(b). The development of plant organs in the figure depends on the value of 'state', which is provided by addLatProd(t). To explain the process of organ development in more detail, we present the pseudocode (Algorithm 1) for this process; the pseudocode contains descriptive language for ease of understanding.
Algorithm 1: The process of Meristem developing into different organs.
Input: Growth time t
Output: Internode(t), leaf, flower and Meristem(t)

    if latprod == False:
        // a plastochrone time is not reached, so a lateral production (latprod) is not produced
        Meristem(t): continue iteration
    else:
        // a plastochrone is reached and a lateral production must be added
        addLatProd(t):
            Physiological state [s, d]:
                s = 0: vegetative state
                s = 1: between the vegetative and flowering states
                s = 2: flowering state
                s = 3: flower state
                d = plastochrones spent in this state (duration)
            if a lateral primordium is produced:
                update the state counter d for the current meristem
                // check physiological age
                if the duration limit is reached:
                    move to the next state s
            produce:
                Internode(t)
                a Meristem in state [s, d]:
                    if d < d_s (duration threshold in state s):
                        produce a lateral meristem in state [s + 1, 0]
                        produce an apical meristem in state [s, d + 1]
                    if d = d_s:
                        produce a lateral meristem in state [s + 2, 0]
                        produce an apical meristem in state [s + 1, 0]
                in addition: produce a leaf in states 0 and 1
                finally: the meristem transforms into a flower in state 3
        Meristem(t): keep apical growth
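The branching on the state [s, d] above can be condensed into a small state machine. The following Python sketch is our own simplified rendering of Algorithm 1, not the actual L-Py production code; the duration thresholds and the clamping of states to 3 are illustrative assumptions:

```python
# Simplified meristem state machine from Algorithm 1: a meristem in
# physiological state [s, d] produces an internode plus daughter meristems.
D_S = {0: 8, 1: 3, 2: 12}  # assumed duration thresholds d_s for states 0-2

def add_lat_prod(s, d):
    if s == 3:                                # flower state: meristem becomes a flower
        return ["Flower(t)"]
    productions = ["Internode(t)"]
    if d < D_S[s]:                            # duration threshold not yet reached
        productions += [f"LateralMeristem[{min(s + 1, 3)}, 0]",
                        f"ApicalMeristem[{s}, {d + 1}]"]
    else:                                     # d == d_s: skip ahead one state
        productions += [f"LateralMeristem[{min(s + 2, 3)}, 0]",
                        f"ApicalMeristem[{min(s + 1, 3)}, 0]"]
    if s in (0, 1):                           # vegetative states also bear a leaf
        productions.append("Leaf(t)")
    return productions
```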
Figure 2(d), (e) [55] compare the generated Arabidopsis structure with real Arabidopsis. To ensure randomness in plant growth while preserving a structure imitating real plants, we made extensive use of randomization. Typically, we assigned two values to a variable, one as the mean and the other as the standard deviation, and at runtime we drew values from a Gaussian distribution with these parameters. For instance, the growth duration (in hours) was determined by specifying a mean number of days (Mean_day) and a standard deviation (StDev_day); a duration in days was drawn from the corresponding Gaussian distribution and then multiplied by 24. The leaf morphology was determined by a randomly computed ratio between the leaf blade and the petiole. These methods ensure that variables fluctuate within certain constraints, maintaining both developmental diversity and the stability of the plant.
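As a concrete illustration, the duration sampling just described can be sketched in a few lines of Python (the mean and standard deviation below are illustrative values, not the ones used in our models):

```python
# Draw a growth duration in days from a Gaussian and convert it to hours.
import random

def growth_duration_hours(mean_day, stdev_day):
    return random.gauss(mean_day, stdev_day) * 24

duration = growth_duration_hours(mean_day=42.0, stdev_day=3.0)  # roughly 6 weeks
```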
The Arabidopsis generated using the above algorithm is shown in Figure 3(a). The generated model consists of a large number of triangle mesh faces of different sizes, as shown in Figure 3(b). The simplest way to convert the mesh model into a point cloud is to directly use the mesh vertices as sampling points, but this results in significant shape errors in the point cloud. Alternatively, one can repeatedly sample one point from a randomly chosen triangle until the specified number of points is obtained. However, because small triangles greatly outnumber large ones, the resulting point cloud has highly uneven point density. Moreover, this method is inefficient, since the time complexity of randomly selecting a triangle each time is O(N), where N is the number of triangles. To address these issues, we introduced the surface-weighted sampling method, in which triangles are selected according to their weight and points are randomly sampled on their surfaces.
First, we utilized the AliasMethod [56], a non-uniform random sampling algorithm that trades space for time, using the triangle area as the weight to sample one triangle at a time. The sampling process is as follows: for a mesh with N triangles, AliasMethod compresses the entire probability distribution into a 1×N rectangle, where each event i (corresponding to a triangle of a given size) is transformed into the area it occupies within the rectangle. The area occupied by event i in the rectangle is

$$S_i = \frac{p_i \times N}{\sum_{k=1}^{N} p_k}. \tag{1}$$
The area $S_i$ occupied by event i in the rectangle may satisfy $S_i>1$ or $S_i<1$. We moved the excess area of events with $S_i>1$ to the corresponding events with $S_i<1$, forming small rectangles of unit area, and ensured that each small rectangle stores at most two events. Lists Prob and Alias were then constructed, where Prob[i] is the area proportion of event i in the i-th small rectangle, i.e., the probability of event i, and Alias[i] is the index of the other event sharing that small rectangle. This preprocessing step has time complexity O(N) and space complexity O(N). To select a triangle, we generate a random integer $i\in[0,N)$ and a random number $r \sim \mathrm{Unif}(0,1)$. If $r<\mathrm{Prob}[i]$, event i is accepted; otherwise, event i is rejected and Alias[i] is returned. Each draw therefore takes only O(1) time. In summary, for a mesh of N triangles, AliasMethod pre-generates the lists in O(N) time and space and then selects triangles in constant time.
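A hedged Python sketch of this construction (Vose's variant of the alias method; not the exact implementation from [56]) is given below:

```python
# Alias method: O(N) preprocessing, O(1) per weighted draw.
import random

def build_alias(weights):
    n, total = len(weights), sum(weights)
    prob = [w * n / total for w in weights]            # S_i from Eq (1)
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                                   # l tops up the deficit of s
        prob[l] -= 1.0 - prob[s]                       # remove the donated area
        (small if prob[l] < 1.0 else large).append(l)
    return prob, alias

def sample_index(prob, alias):
    i = random.randrange(len(prob))                    # pick a small rectangle
    return i if random.random() < prob[i] else alias[i]
```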
Then, we performed point sampling inside the selected triangle. For each selected △ABC, we generated two random numbers u and v on (0,1) and obtained the surface sampling point P(x,y) using Eq (2).
$$\begin{cases} x = (1-\sqrt{v})\,x_a + \sqrt{v}\,(1-u)\,x_b + \sqrt{v}\,u\,x_c \\ y = (1-\sqrt{v})\,y_a + \sqrt{v}\,(1-u)\,y_b + \sqrt{v}\,u\,y_c \end{cases} \qquad u, v \in (0,1). \tag{2}$$
$x_a, x_b, x_c$ and $y_a, y_b, y_c$ represent the x and y coordinates of vertices $A, B, C$, respectively. $\sqrt{v}$ represents the percentage from vertex A to the opposite edge BC, while u represents the percentage along the edge BC (see Figure 4).
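A short sketch of this sampling step, extended to 3D vertices (Eq (2) lists only the x and y coordinates, but the same weights apply to z):

```python
# Uniform point sampling inside a triangle via Eq (2); u, v ~ Unif(0, 1).
import math
import random

def sample_point_in_triangle(a, b, c):
    u, v = random.random(), random.random()
    sv = math.sqrt(v)
    return tuple((1 - sv) * a[k] + sv * (1 - u) * b[k] + sv * u * c[k]
                 for k in range(3))

p = sample_point_in_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0))
```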
In summary, we decomposed the original model's surface into numerous triangles of varying sizes. Next, we employed the surface-weighted sampling method to repeatedly select triangles and sample random points within each selected triangle, resulting in the final point cloud. The resulting point cloud is depicted in Figure 5(c), where points of different colors correspond to different labels.
Since neural networks are fed a small, fixed number of points during training, the virtual Arabidopsis point cloud must be downsampled to a fixed size after it is obtained. Point cloud downsampling must rapidly and substantially reduce the number of points while preserving the geometric shape and structural information of the plant as much as possible. To rapidly reduce a large, dense point cloud to a specified size, Voxel Centroid Sampling and Random Sampling are commonly used. Voxel Centroid Sampling first partitions the point cloud into voxels, and each voxel is represented by a single centroid point [57]. It therefore preserves the geometric information of the point cloud and reduces redundant data, but the number of points after sampling is uncertain. Random Sampling draws points with equal probability, ensuring a more uniform distribution of points, and the number of sampled points can be specified [58]. However, it may ignore geometric information during sampling, which can distort the result.
To preserve the geometric information of the point cloud while controlling the number of points after sampling, we combined these two sampling methods in the downsampling process, as shown in Figure 5. We define the point cloud as $P=\{p_i \mid i=1,\dots,n\}$, where $p_i=(x_i,y_i,z_i)$.
For each point, we used a k-d tree to find its nearest neighbor and computed the Euclidean distance between $p_i$ and its nearest neighbor $p_i'$. We then averaged the distances over all points to obtain the mean point spacing of the entire point cloud:
$$space_{mean} = \frac{1}{n}\sum_{i=1}^{n}\sqrt{(x_i-x_i')^2+(y_i-y_i')^2+(z_i-z_i')^2}. \tag{3}$$
Next, we computed the voxel size as $size = space_{mean} \cdot \sqrt[d]{r}$, where d is the dimension of the point cloud and r is a scaling factor, usually between 0.5 and 0.8. The point cloud was divided into voxels of this size, and the centroid of each voxel was calculated as $p_{cent} = \left(\frac{1}{n_v}\sum_{i=1}^{n_v} x_i,\; \frac{1}{n_v}\sum_{i=1}^{n_v} y_i,\; \frac{1}{n_v}\sum_{i=1}^{n_v} z_i\right)$, where $n_v$ is the number of points in the voxel. The centroids of all voxels form the new point cloud in Figure 5(b). Finally, the new point cloud was randomly sampled to the specified number of points in Figure 5(c). To meet the input-size limitations of point-based segmentation networks, the number of points in the final downsampled point cloud was set to 8192.
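A hedged NumPy/SciPy sketch of this two-stage downsampling (our reading of the voxel-size formula; the scaling factor r = 0.65 is an illustrative choice):

```python
# Voxel-centroid sampling with data-driven voxel size (Eq (3)),
# followed by random sampling down to a fixed point count.
import numpy as np
from scipy.spatial import cKDTree

def downsample(points, n_out=8192, r=0.65):
    d_nn, _ = cKDTree(points).query(points, k=2)       # k=2: self + nearest neighbor
    space_mean = d_nn[:, 1].mean()                     # mean point spacing, Eq (3)
    size = space_mean * r ** (1.0 / points.shape[1])   # voxel size = space_mean * r^(1/d)
    keys = np.floor(points / size).astype(np.int64)    # voxel index of each point
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    centroids = np.stack([points[inv == v].mean(axis=0)
                          for v in range(inv.max() + 1)])
    idx = np.random.choice(len(centroids), n_out,
                           replace=len(centroids) < n_out)
    return centroids[idx]
```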
We improved the state-of-the-art method Stratified Transformer in the field of point cloud semantic segmentation and proposed the Plant Stratified Transformer for semantic segmentation of plant point clouds. We designed the Fast Downsample Layer, utilizing point sampling from PSNet [59] and feature normalization from Pre-LN [60] to accelerate the downsampling process.
Stratified Transformer [52] is a novel point-based 3D point cloud segmentation network that effectively captures long-range contextual information and exhibits strong generalization capability and high performance. To accelerate convergence speed and enhance performance, Stratified Transformer utilizes First-layer Point Embedding to aggregate local information. The greatest advantage of this network lies in its novel key sampling strategy, the Stratified Key-sampling Strategy. For each query point, it densely samples nearby points and sparsely samples distant points in a stratified manner as its keys, enabling the model to expand the effective receptive field and acquire long-range context at a lower computational cost.
The Stratified Transformer mainly consists of the First-layer Point Embedding, Transformer Block, Downsample Layers, and Upsample Layers. The Downsample Layer first performs FPS (Farthest Point Sampling) to obtain the sampled points and then uses kNN (k-Nearest Neighbors) to query the original points for grouping indices. However, the computational cost of FPS increases significantly with the number of points [61], and its sampling results depend on the initial point selection and on the order of points in the point cloud. In addition, FPS considers only the Euclidean distance between points, ignoring the local features and global structure of the point cloud. kNN requires computing the Euclidean distance between each sampled point and all other points, giving it high time and space complexity [62].
We proposed the Fast Downsample Layer as a replacement for the original Downsample Layer. In it, PSNet replaces FPS and kNN to perform the sampling and grouping tasks, reducing computation and improving speed. In addition, Pre-LN normalizes the features, and Max Pooling aggregates the projected features through the grouping indices. The structure of the downsampling layer is illustrated in Figure 6: the input consists of points and features, and the output consists of sampled points and features.
PSNet performs sampling and grouping simultaneously, as illustrated in Figure 6(a). In Figure 6(b), P is the input point cloud with n points, represented as $P=\{p_1,p_2,\dots,p_n\}$. Each point $p_i$ has spatial features $c_i=(x_i,y_i,z_i,\theta_i,\varphi_i)$, where $c_i\in\mathbb{R}^d$. $\theta_i$ and $\varphi_i$ are the polar and azimuthal angles, respectively, of the point in spherical coordinates. They are calculated as follows:
$$\begin{cases} \theta = \arctan\left(\dfrac{\sqrt{x^2+y^2}}{z}\right) \\[2mm] \varphi = \arctan\left(\dfrac{y}{x}\right) \end{cases} \tag{4}$$
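A brief sketch of this feature construction (using np.arctan2 rather than the plain arctan of Eq (4) to stay well defined at z = 0 or x = 0):

```python
# Build the 5-D spatial features (x, y, z, theta, phi) used by PSNet.
import numpy as np

def spatial_features(points):                    # points: (n, 3) array
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(np.sqrt(x**2 + y**2), z)  # polar angle
    phi = np.arctan2(y, x)                       # azimuthal angle
    return np.column_stack([points, theta, phi]) # (n, 5)
```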
The point cloud P is divided into multiple sub-point clouds, represented as $A=\{a_1,a_2,\dots,a_m\}$, where m is the number of local regions and also the number of sampled points.
First, the point cloud P is processed by a multilayer perceptron (MLP) with multiple layers of 1 × 1 convolutions, realizing the Spatial Features Transform Function (SFTF). For the spatial feature $c_i$ of point $p_i$, the SFTF can be expressed as $v_i=\mathrm{transform}(c_i)$, where $\mathrm{transform}(\cdot)$ maps the d-dimensional feature of each point to an m-dimensional feature, i.e., $\mathbb{R}^d\to\mathbb{R}^m$. The output $v_i\in\mathbb{R}^m$ is a vector representing the correlation between point $p_i$ and each of the m local regions. Extending the SFTF to the entire point cloud P, with input the features of the n points $C=\{c_1,c_2,\dots,c_n\}$, the corresponding V is obtained:

$$V = \mathrm{transform}(C), \tag{5}$$

where the input is $C\in\mathbb{R}^{n\times d}$ and the output is $V\in\mathbb{R}^{n\times m}$ (a two-dimensional matrix with n rows).
Next, we apply the sigmoid function $q_i=\sigma(v_i)$ to obtain the probability vector $q_i$ from the correlation vector $v_i$. The probability vector $q_i$ gives the probability that point $p_i$ belongs to each of the m local regions $a_j$ ($j=1,2,\dots,m$), with each entry in $(0,1)$. Extended to the entire point cloud, the sigmoid function can be expressed as:

$$Q = \sigma(V), \tag{6}$$

where $Q\in\mathbb{R}^{n\times m}$ is the probability matrix representing the membership probabilities between each point in the point cloud P and each local region in the set A. Each column of Q, denoted $e_j\in\mathbb{R}^n$, contains the probabilities of every point belonging to the corresponding local region $a_j$.
The column $e_j$ of Q is sorted in descending order, and the indices of the top s probability values (the first s elements after descending sorting) are selected; s is the size of the point cloud for local region $a_j$. This process can be expressed as:

$$indices_j = \mathrm{argtop}_s(\mathrm{desc}(e_j)). \tag{7}$$

Here, $\mathrm{desc}(\cdot)$ sorts the elements of $e_j$ in descending order, and $\mathrm{argtop}_s$ returns the indices of the top s elements after sorting; $indices_j\in\mathbb{R}^s$.
Finally, the grouping and sampling of the point cloud P are completed, as shown at the far right of Figure 6(b). The $indices_j$ provide the grouping indices for each sampled point, forming a grouping index matrix of size $(n_{i+1}, k)$. Moreover, for each local region, the point $p_l$ with the highest probability, $l=\mathrm{argtop}_1(\mathrm{desc}(e_j))$, is selected as the downsampled point; this is the point in local region $a_j$ that best matches the features of both the point and the region. The set of downsampled points can therefore be expressed as:

$$downsample = \{\,p_l \mid l=\mathrm{argtop}_1(\mathrm{desc}(e_j)),\; j=1,2,\dots,m\,\}. \tag{8}$$

The final output is the set of sampled points $P_{i+1}$ of size $(n_{i+1}, d_2)$, with $P_{i+1}\subset P_i$.
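The following PyTorch sketch condenses Eqs (5)-(8) into a single module. The layer widths and the defaults for m and s are illustrative assumptions, not the configuration used in our experiments:

```python
# PSNet-style simultaneous sampling and grouping: an MLP (SFTF) scores each
# point against m local regions, a sigmoid turns scores into probabilities,
# and top-s / top-1 selections yield the groups and the sampled points.
import torch
import torch.nn as nn

class PSNetSampler(nn.Module):
    def __init__(self, d_in=5, m=512, s=32):
        super().__init__()
        self.sftf = nn.Sequential(          # SFTF as 1x1-conv MLP, R^d -> R^m
            nn.Conv1d(d_in, 64, 1), nn.ReLU(),
            nn.Conv1d(64, m, 1))
        self.s = s

    def forward(self, c):                   # c: (batch, d_in, n) spatial features
        q = torch.sigmoid(self.sftf(c))     # Q: (batch, m, n), Eqs (5)-(6)
        group_idx = q.topk(self.s, dim=-1).indices  # Eq (7): top-s points per region
        sample_idx = q.argmax(dim=-1)               # Eq (8): top-1 point per region
        return sample_idx, group_idx
```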
Pre-LN consists of Layer Normalization [63] and a Linear layer, which helps reduce internal covariate shift and improves the stability of the model during training. As shown in the upper part of Figure 6(a), we feed the features $F_i$ of size $(n_i, d_1)$ into Layer Normalization to normalize and stabilize the feature distribution of each sample point. Layer Normalization calculates the mean $\mu$ and standard deviation $\sigma$:
$$\mu = \frac{1}{n_i}\sum_{j=1}^{n_i} f_j; \qquad \sigma = \sqrt{\frac{1}{n_i}\sum_{j=1}^{n_i}\left(f_j-\mu\right)^2}. \tag{9}$$
Here, $f_j$ denotes the feature of each point, $f_j\in F_i$. The subsequent Linear layer keeps the output dimensions unchanged at $(n_i, d_1)$. To aggregate the projected features from the Pre-LN output while preserving feature invariance, we use a Max Pooling layer with the grouping indices $(n_{i+1}, k)$ generated by the PSNet grouping operation. This aggregates the projected features and produces the output features $F_{i+1}$ of size $(n_{i+1}, d_1)$, where $F_{i+1}\subset F_i$.
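A compact sketch of the Pre-LN block plus grouped max pooling (the feature width d is an illustrative assumption):

```python
# Pre-LN (LayerNorm + Linear) followed by grouped max pooling.
import torch
import torch.nn as nn

class PreLNPool(nn.Module):
    def __init__(self, d=48):
        super().__init__()
        self.norm = nn.LayerNorm(d)      # per-point normalization, Eq (9)
        self.linear = nn.Linear(d, d)    # output dimension unchanged

    def forward(self, feats, group_idx):
        # feats: (n_i, d); group_idx: (n_{i+1}, k) from the PSNet grouping step
        projected = self.linear(self.norm(feats))
        grouped = projected[group_idx]           # gather: (n_{i+1}, k, d)
        return grouped.max(dim=1).values         # max-pool each group: (n_{i+1}, d)
```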
The network architecture utilizes the Stratified Transformer as the backbone and incorporates the Fast Downsample Layer to create the Plant Stratified Transformer for semantic segmentation of small-scale complex plant structures in point clouds. The overall network architecture is illustrated in Figure 7.
The network architecture resembles U-Net [64] and is divided into symmetric contracting and expanding paths. The left side of Figure 7 represents the contracting path, which is used to capture contextual information and perform hierarchical feature extraction, but it may lose some spatial information. At the beginning of the contracting path, the first-layer point embedding module aggregates the features of local neighbors for each point. Each subsequent module consists of a downsampling layer and several Transformer modules that capture local and long-range dependencies in the point cloud. The right side of Figure 7 represents the expanding path, which is used to upsample the features extracted from the contracting path and achieve precise localization of the segmented parts in the point cloud. The expanding path comprises multiple Upsample Layers, which densify features layer by layer. However, upsampling alone cannot recover spatial information. Hence, skip connections are employed to output shallow features from each stage of the contracting path to the corresponding upsampling layer in the expanding path. The network preserves more high-resolution details from shallow feature maps by fusing shallow and deep features in the Upsample Layers, thereby enhancing the accuracy of semantic segmentation.
The first-layer point embedding module in the network aggregates local features from adjacent points in the point cloud to enhance the model's generalization and expressive capabilities. The Transformer blocks employ Stratified Self-attention and Stratified Key-sampling Strategy [52] to increase the effective receptive field and enable the effective aggregation of long-range contextual information by query features.
To verify the semantic segmentation performance of the Plant Stratified Transformer (ours) on plant point clouds, we used four metrics: Precision (Prec), Recall (Rec), F1 and Intersection over Union (IoU), to compare the success rate of organ segmentation. For all four metrics (expressed as percentages), a higher value indicates better performance. Specifically, Prec is the ratio of correctly classified points in a semantic class to all points the network predicts as that class. Rec is the ratio of correctly classified points in the class to the total number of points of that class according to the true labels. F1 is a comprehensive metric computed as the harmonic mean of Prec and Rec. For each semantic class, IoU reflects the overlap between the predicted region and the corresponding true region. The four metrics are defined as follows:
$$\mathrm{Prec}_c = \frac{TP_c}{TP_c + FP_c}, \tag{10}$$

$$\mathrm{Rec}_c = \frac{TP_c}{TP_c + FN_c}, \tag{11}$$
$$F1_c = \frac{2 \cdot \mathrm{Prec}_c \times \mathrm{Rec}_c}{\mathrm{Prec}_c + \mathrm{Rec}_c}, \tag{12}$$

$$\mathrm{IoU}_c = \frac{TP_c}{TP_c + FN_c + FP_c}, \tag{13}$$
where $TP_c$, $FP_c$ and $FN_c$ are the true positive, false positive and false negative point counts of the current semantic class, and c is the semantic label, $c\in\{\text{leaf},\text{stem},\text{flower}\}$.
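For reference, these per-class metrics can be computed directly from predicted and ground-truth label arrays, as in the following sketch:

```python
# Per-class Prec, Rec, F1 and IoU from Eqs (10)-(13).
import numpy as np

def class_metrics(pred, gt, c):
    tp = np.sum((pred == c) & (gt == c))
    fp = np.sum((pred == c) & (gt != c))
    fn = np.sum((pred != c) & (gt == c))
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    iou = tp / (tp + fp + fn)
    return prec, rec, f1, iou
```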
In this study, we used a generated virtual Arabidopsis dataset and a real Arabidopsis dataset with semantic labels. The latter was a dataset we created from 10 real 3D Arabidopsis plants selected from the dataset of Ziamtsov et al. [65].
Virtual Arabidopsis dataset: We simulated Arabidopsis models with a developmental period of approximately 6 weeks using the L-Py framework. The generated models were converted into fully annotated point clouds using the surface-weighted sampling method. To simplify the labeling categories, we unified the siliques and flowers of Arabidopsis as 'flower', so the final models carried semantic labels for 'leaf', 'stem' and 'flower'. The downsampling method described earlier was applied to reduce the number of points in the generated point clouds. Through these steps, a total of 930 Arabidopsis point cloud models with diverse morphologies were generated. Figure 8(b) displays Arabidopsis point cloud models of different morphologies, each containing 8192 points. The information for an individual point comprised its coordinates and semantic label. The labels 'leaf', 'stem' and 'flower' were rendered as red, green and blue points, respectively, and denoted '0', '1' and '2' in the point information, as shown in Figure 8(a). Since we could directly generate point cloud models with significantly different morphological structures, the risk of overfitting was reduced and traditional data augmentation operations were unnecessary. We divided the dataset into a training set and a validation set in a ratio of 4:1.
Real Arabidopsis dataset: The dataset from Ziamtsov et al. [65] comprises high-resolution measurement data of plant structures generated using 3D scanning techniques. We selected 47 Arabidopsis 3D scanning models from this dataset with distinct structural features, as shown in Figure 9(a). Compared to our generated virtual Arabidopsis, these real Arabidopsis models exhibited more complex morphological structures. The proportion of siliques and flowers was lower, while the rosette part had a higher proportion.
We used CloudCompare to convert these models into point clouds and performed semantic annotation of leaves, stems and flowers. For data augmentation, random rotation, cropping and stitching operations were applied to the point clouds to increase the diversity among the data (a sketch of the rotation step follows below). Subsequently, we downsampled each point cloud to 8192 points using the method in Section 2.2, as shown in Figure 9(b). To keep the training and test datasets separate, 27 of the 47 real point clouds were randomly selected and augmented five times, yielding 135 point clouds for the training set; the remaining 20 real point clouds were used as the test set.
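The rotation part of this augmentation can be sketched as follows (rotation about the vertical axis; our own illustrative rendering, with the cropping and stitching steps omitted):

```python
# Random rotation of a point cloud about the z (vertical) axis.
import numpy as np

def random_z_rotation(points):
    angle = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T
```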
Due to the scarcity of real Arabidopsis, overfitting was likely to occur during network training, preventing the expected performance. We therefore merged the training sets of virtual and real Arabidopsis into a mixed training set comprising 744 virtual and 135 real point clouds. The validation set was the virtual Arabidopsis validation set of 186 point clouds, and the test set was the real Arabidopsis test set of 20 point clouds.
Properly setting the learning rate is crucial during training. Too high a learning rate can lead to gradient explosions, causing large oscillations in the loss and hindering convergence; too low a learning rate slows down learning and increases training time. For better convergence, we started from an initial learning rate of 0.001 and decayed it by a factor of 0.5 every 20 epochs (a minimal sketch of this schedule follows Table 1). We trained the network with a batch size of 8 for a total of 250 epochs in the training environment specified in Table 1. For testing, we trained and evaluated PointNet++ [38], PAConv [66] and the Stratified Transformer on the Arabidopsis dataset with identical parameters and environment, and then compared their performance with our improved network.
| Name | Parameter |
| --- | --- |
| CPU | Intel(R) Core(TM) i7-7700 CPU |
| GPU | GeForce GTX 1070 |
| Memory | 16 GB |
| Operating system | Ubuntu 20.04 |
| Deep learning framework | PyTorch 1.8.0 |
| Programming language | Python 3.9 |
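As referenced above, the learning-rate schedule and training loop can be sketched as follows (the model, data and optimizer choice here are toy placeholders, not our actual network or dataset):

```python
# Training schedule: initial LR 0.001 halved every 20 epochs,
# batch size 8, 250 epochs (PyTorch).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(3, 3)  # placeholder for the segmentation network
data = TensorDataset(torch.randn(40, 3), torch.randint(0, 3, (40,)))
train_loader = DataLoader(data, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer choice assumed
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(250):
    for points, labels in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(points), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decay the learning rate every 20 epochs
```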
In this section, we trained the four networks individually on the mixed training set and validated them on the virtual Arabidopsis validation set. Table 2 presents the segmentation results of the improved network, Plant Stratified Transformer (ours), alongside the three comparison methods; it lists the metric values at the epoch with the highest IoU after network convergence. The values for 'stem' are markedly lower than those for 'leaf' and 'flower' because the stem category has fewer points in each Arabidopsis point cloud, which invites confusion during training. Furthermore, our network achieves performance similar to the original network in mean precision, mean recall, mean F1-score and mean IoU, with values of 88.12, 91.44, 89.64 and 85.56%; the differences between the two networks in these metrics are only 0.28, 0.22, 0.08 and 0.09%, indicating that the inclusion of the Fast Downsample Layer does not affect the network's semantic segmentation performance. Both networks outperform PointNet++ and PAConv, as the Transformer module effectively captures long-range contextual information, giving the model a larger receptive field and better generalization. In contrast, PointNet++ and PAConv rely on local feature aggregation and do not directly establish long-range dependencies, making it difficult to capture long-range context. Consequently, given the low point count of the stem category, PointNet++ and PAConv perform considerably worse on the stem-related metrics than the other two networks, whereas their performance differences in the other categories are less pronounced. PAConv, an improvement over the original backbone network PointNet, performs slightly better than PointNet++ but does not fully utilize global information. The table shows that PointNet++ performs worst, especially in the stem category, indicating that this network is not suitable for handling sparse plant point clouds with complex structures.
| Metric | Class | PointNet++ | PAConv | Stratified Transformer | Plant Stratified Transformer (ours) |
| --- | --- | --- | --- | --- | --- |
| Prec (%) | Leaf | 95.78 | 95.39 | 98.57 | 98.49 |
| | Stem | 69.48 | 73.18 | 77.59 | 78.18 |
| | Flower | 91.26 | 94.23 | 96.32 | 96.79 |
| | Mean | 85.51 | 87.60 | 90.83 | 91.15 |
| Rec (%) | Leaf | 94.67 | 95.48 | 97.83 | 97.93 |
| | Stem | 72.35 | 75.83 | 80.41 | 80.37 |
| | Flower | 92.94 | 95.21 | 98.15 | 98.61 |
| | Mean | 86.65 | 88.84 | 92.13 | 92.30 |
| F1 (%) | Leaf | 95.22 | 95.43 | 98.20 | 98.21 |
| | Stem | 70.89 | 74.48 | 78.97 | 79.26 |
| | Flower | 92.09 | 94.72 | 97.23 | 97.69 |
| | Mean | 83.05 | 84.96 | 88.59 | 91.72 |
| IoU (%) | Leaf | 90.88 | 91.27 | 96.46 | 96.48 |
| | Stem | 54.90 | 59.34 | 65.25 | 65.65 |
| | Flower | 85.34 | 89.97 | 94.60 | 95.49 |
| | Mean | 77.04 | 80.19 | 85.44 | 85.87 |
The visualization results of semantic segmentation for the four networks are shown in Figure 10. The leftmost images represent the ground truth, and the images on the right display the test results of the different networks. We selected four Arabidopsis point clouds with significant morphological differences to visualize. From Figure 10, it is visually evident that the number of stem points is significantly smaller than the number of leaf and flower points. Segmentation is clearly worst at the boundaries between different organ structures, where confusion arises, especially at the boundaries between the stem and the other two structures. The figure also shows that leaf segmentation is superior to stem and flower segmentation, because the leaf has fewer intersections with the stem and flower, a clearer structure, and more points.
We tested the networks trained on the mixed dataset on the real Arabidopsis test set; the segmentation results are shown in Table 3 alongside the other three networks. Clearly, the segmentation performance of all networks decreases significantly. PointNet++, PAConv and the Stratified Transformer achieved mean IoU of 57.93, 61.02 and 69.86%, respectively, decreases of 19.11, 19.17 and 15.58% compared to the results on virtual Arabidopsis. The segmentation performance of PointNet++ and PAConv was notably poor, failing to meet the requirements for semantic segmentation, possibly due to limited generalization. The improved network achieved mean Prec, Rec, F1-score and IoU of 80.42, 82.18, 81.29 and 70.04%, respectively, decreases of 10.73, 10.12, 10.43 and 15.83% compared to the results on virtual Arabidopsis. Both the Stratified Transformer and our network maintained a mean IoU around 70%, meeting the requirements for semantic segmentation and showing good generalization. Furthermore, Table 3 shows that the primary reason for the drop in mean IoU is a significant decline in the metrics of the 'flower' category, owing to the scarcity of the 'flower' label in the tested real Arabidopsis compared to the virtual Arabidopsis.
| Metric | Class | PointNet++ | PAConv | Stratified Transformer | Plant Stratified Transformer (ours) |
| --- | --- | --- | --- | --- | --- |
| Prec (%) | Leaf | 92.54 | 94.42 | 97.51 | 97.93 |
| | Stem | 72.58 | 75.34 | 80.21 | 79.85 |
| | Flower | 65.35 | 69.10 | 75.12 | 74.81 |
| | Mean | 76.82 | 79.62 | 84.28 | 84.20 |
| Rec (%) | Leaf | 89.25 | 93.25 | 95.82 | 96.14 |
| | Stem | 66.89 | 66.48 | 80.22 | 79.68 |
| | Flower | 60.57 | 67.31 | 72.93 | 73.28 |
| | Mean | 72.24 | 75.68 | 82.99 | 83.03 |
| F1 (%) | Leaf | 90.87 | 93.83 | 96.66 | 97.03 |
| | Stem | 69.62 | 70.63 | 80.21 | 79.76 |
| | Flower | 62.87 | 68.19 | 74.01 | 74.04 |
| | Mean | 74.45 | 77.55 | 83.63 | 83.61 |
| IoU (%) | Leaf | 83.26 | 88.38 | 93.53 | 94.23 |
| | Stem | 53.40 | 54.60 | 66.97 | 66.34 |
| | Flower | 45.85 | 51.74 | 58.74 | 58.78 |
| | Mean | 60.83 | 64.91 | 73.08 | 73.11 |
Figure 11 presents a qualitative comparison of the four networks on the real Arabidopsis test set. It can be visually observed that segmentation is poorest at the boundaries between different organs. Compared to virtual Arabidopsis, the testing performance on real Arabidopsis is relatively poor due to morphological differences between the real and virtual plants used in the experiments: the virtual Arabidopsis has far more flowers and siliques than the real Arabidopsis, while the real Arabidopsis has many more points on the rosette leaves. As a result, for the same point cloud size, the real Arabidopsis contains a smaller proportion of non-rosette points, so the network captures fewer non-rosette point features than for virtual Arabidopsis, further degrading segmentation performance.
To assess the efficiency of our improved network on our dataset, we recorded the training time and inference time (in milliseconds) for each network during the experiments. Training time refers to the time taken per iteration during the training phase, from extracting a batch of data from the training set to feeding it into the neural network and obtaining the model's output. Inference time is the time taken for forward propagation, i.e., the duration for the model to process new input data and produce output predictions during testing. From Table 4, it can be observed that, with the FPS and kNN sampling and grouping method, both the training and inference times of the Stratified Transformer are longer than those of PointNet++ and PAConv, because the Transformer block must process a large number of point-wise self-attention weights.
| Methods | Training time (ms) | Inference time (ms) |
| --- | --- | --- |
| PointNet++ | 792.7 | 761.5 |
| PAConv | 678.8 | 652.3 |
| Stratified Transformer | 1035.2 | 869.7 |
| Plant Stratified Transformer (ours) | 714.3 | 597.9 |
The training and inference times of our improved network were 714.3 and 597.9 ms, respectively, reductions of 320.9 and 271.8 ms compared to the original network. PAConv had the shortest training time, 678.8 ms, which was 35.5 ms less than ours. However, our network's inference time was 54.4 ms less than PAConv's, the best among the compared networks. All of this demonstrates that replacing the original downsampling layer with the Fast Downsample Layer shortened the time required for point cloud sampling and grouping.
In this section, we designed several independent ablation experiments to verify the effectiveness of the Fast Downsample Layer proposed in the Plant Stratified Transformer, assessing both the sampling and grouping in PSNet and the functionality of the Pre-LN module. For comparison, we used the traditional sampling and grouping methods, FPS and kNN, as benchmarks, naming the sampling and grouping parts of PSNet S1 and G1, FPS S2, and kNN G2. Since G1 must work in conjunction with S1, it cannot exist independently of S1. The ablation results for semantic segmentation are presented in Table 5, and the corresponding mean training and inference times are shown in Table 6.
| Metric | Ver | Sampling | Grouping | Pre-LN | Leaf | Stem | Flower | Mean |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Prec (%) | V0 | S1 | G1 | √ | 98.49 | 78.18 | 96.79 | 91.15 |
| | V1 | S1 | G1 | | 98.03 | 77.80 | 96.23 | 90.69 |
| | V2 | S1 | G2 | √ | 98.35 | 77.93 | 96.64 | 90.97 |
| | V3 | S1 | G2 | | 98.12 | 77.67 | 96.19 | 90.66 |
| | V4 | S2 | G2 | √ | 98.57 | 77.59 | 96.32 | 90.83 |
| | V5 | S2 | G2 | | 97.93 | 77.14 | 96.12 | 90.40 |
| Rec (%) | V0 | S1 | G1 | √ | 97.93 | 80.37 | 98.61 | 92.30 |
| | V1 | S1 | G1 | | 97.34 | 79.87 | 98.13 | 91.78 |
| | V2 | S1 | G2 | √ | 97.91 | 80.29 | 98.28 | 92.16 |
| | V3 | S1 | G2 | | 97.30 | 80.03 | 98.06 | 91.80 |
| | V4 | S2 | G2 | √ | 97.83 | 80.41 | 98.15 | 92.13 |
| | V5 | S2 | G2 | | 97.42 | 79.81 | 98.01 | 91.75 |
| F1 (%) | V0 | S1 | G1 | √ | 98.21 | 79.26 | 97.69 | 91.72 |
| | V1 | S1 | G1 | | 97.68 | 78.82 | 97.17 | 91.23 |
| | V2 | S1 | G2 | √ | 98.13 | 79.09 | 97.45 | 91.56 |
| | V3 | S1 | G2 | | 97.71 | 78.83 | 97.12 | 91.22 |
| | V4 | S2 | G2 | √ | 98.20 | 78.97 | 97.23 | 91.47 |
| | V5 | S2 | G2 | | 97.67 | 78.45 | 97.06 | 91.06 |
| IoU (%) | V0 | S1 | G1 | √ | 96.48 | 65.65 | 95.49 | 85.87 |
| | V1 | S1 | G1 | | 95.47 | 65.05 | 94.50 | 85.01 |
| | V2 | S1 | G2 | √ | 96.33 | 65.42 | 95.03 | 85.59 |
| | V3 | S1 | G2 | | 95.52 | 65.06 | 94.39 | 84.99 |
| | V4 | S2 | G2 | √ | 96.46 | 65.25 | 94.60 | 85.44 |
| | V5 | S2 | G2 | | 95.45 | 64.54 | 94.28 | 84.76 |
| Ver | Sampling | Grouping | Pre-LN | Training time (ms) | Inference time (ms) |
| --- | --- | --- | --- | --- | --- |
| V0 | S1 | G1 | √ | 721.4 | 613.9 |
| V1 | S1 | G1 | | 735.9 | 624.5 |
| V2 | S1 | G2 | √ | 754.8 | 643.5 |
| V3 | S1 | G2 | | 773.5 | 659.4 |
| V4 | S2 | G2 | √ | 1041.5 | 873.2 |
| V5 | S2 | G2 | | 1059.8 | 894.3 |
In Tables 5 and 6, the "Ver" column names the ablated network versions. We compared five ablated versions, "V1" to "V5", against the complete network ("V0"); each version was created by removing or replacing modules of the original Plant Stratified Transformer. Comparing the sampling and grouping methods, the PSNet-based sampling and grouping (S1, G1) slightly improves the accuracy of the network model over FPS and kNN (S2, G2), possibly because the Fast Downsample Layer provides more suitable local groupings than the original model. Removing the Pre-LN module also degrades performance, presumably because normalized features improve the stability of the model during training and thereby its effectiveness.
In Table 6, we compared the impact of the different modules on training and inference time. Pre-LN somewhat accelerates the network because normalization speeds up convergence. Most importantly, our sampling and grouping method is significantly faster than the combination of FPS and kNN, since the time complexity of FPS + kNN is much greater than that of PSNet. The time complexity of FPS is $O(n^2)$ [61]; that of kNN includes $O(nm)$ for distance calculation and $O(n\log_2 n)$ for heap sorting, so FPS + kNN costs $O(n^2 + nm + n\log_2 n)$ in total. In contrast, the time complexity of PSNet, covering the SFTF and heap sorting, is $O(nm + n\log_2 n)$. (Here n is the number of sampled points and m the number of groups.)
To study the impact of the ratio of virtual to real data on network training, and to validate whether including virtual plants in the mixed dataset enhances accuracy, we divided the training data into parts V, R1, R2 and R3, as shown in Table 7. R1, R2 and R3 were obtained by augmenting 9, 18 and 27 real plants, respectively, five times each. These parts were combined into seven training sets: virtual data only (V), real data only (R1, R2, R3), and mixed data (V + R1, V + R2, V + R3). The test set remained the 20 real point clouds. Each of the four networks was trained seven times, and the test results obtained with the different training-set combinations were compared.
| | V | R1 | R2 | R3 |
| --- | --- | --- | --- | --- |
| Data | Virtual | Real | Real | Real |
| Training size (plants : point clouds) | 744 | 9 : 45 | 18 : 90 | 27 : 135 |
Figure 12 presents the test results of the models trained by the various networks on the different training-set combinations. PointNet++ shows the smallest increase in the metrics as the combinations change, whereas the Stratified Transformer and our model exhibit the largest increases, indicating that the latter two can learn more features shared by virtual and real data. Training exclusively with virtual data (V) yields unsatisfactory results; PointNet++ in particular fails to reach even 45% mIoU. Training exclusively with real data (R1, R2, R3) improves as the amount of real data grows, but the gain from R1 to R2 is clearly larger than that from R2 to R3, suggesting diminishing returns as real data increases. When V is combined with R1, R2 or R3, the training performance is evidently better than training with the real data alone. This indicates that, in this study, virtual plant data can indeed enhance overall training effectiveness when combined with a certain amount of real data; a large amount of virtual data combined with a small amount of real data achieves acceptable results.
Our virtual plants have certain shortcomings when compared to other virtual plants. For instance, Morel et al. [67] used a TLS simulator to generate point clouds from virtual tree mesh models. This method simulates LiDAR scanning of trees at certain heights and distances. The references [68,69] considered the impact of simulating the distance between real scanning devices and plant objects on point cloud density. However, our method does not simulate the position of scanning devices on point clouds. In future work, we can introduce variable point density and artificial noise into point clouds by simulating acquisition systems such as ToF cameras and LiDAR in a virtual environment.
Point clouds obtained from scanning real plants are typically very large, whereas the maximum point cloud size our network accepts as input is 8192 points. For plants with dense foliage or large leaves, this can make the leaf point cloud very sparse and thus degrade the final segmentation. The structure of real Arabidopsis is also more complex: in real environments it is difficult to capture the stems completely, leading to large differences in the proportions of flowers and stems across plants. Moreover, Arabidopsis rosette leaves are typically abundant, so the proportion of non-rosette features in a fixed-size point cloud becomes too low, causing the loss of phenotypic features. In future work, for rosette plants such as Arabidopsis, we could separate the rosette and non-rosette parts before segmenting each individually.
In this study, virtual Arabidopsis models generated by an L-system were used as experimental data. This allowed virtual plants to participate in network training alongside real plants, addressing the scarcity of Arabidopsis point cloud data, and since the generated virtual plants came with semantic labels, manual annotation was unnecessary. We proposed the Plant Stratified Transformer for semantic segmentation of plants, based on the Stratified Transformer backbone with the Fast Downsample Layer incorporated to accelerate the network. In our dataset, the training set mixed virtual and real Arabidopsis point clouds, while the test set consisted of real data. The improved network, alongside PointNet++, PAConv and the original Stratified Transformer, was trained and tested on this dataset. In segmentation performance, the improved network was comparable to the original network and significantly outperformed the other two; in time consumption, it had much shorter training and inference times than the original network. This indicates that the improved network enhances training and inference efficiency while maintaining the segmentation performance of the original. The results demonstrate the significant potential of virtual plants and deep learning methods for rapidly extracting plant phenotypes: synthetic virtual data can effectively facilitate the training of deep learning models, reducing cost and time. This not only benefits plant phenotype research but also injects new vitality into the field's development.
In future work, we plan to acquire 3D point cloud models by scanning Arabidopsis to obtain additional real data for network training. We will improve or design new deep learning architectures to handle plant point cloud models more efficiently, meeting the real-time requirements of practical applications.
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by National Key Research and Development Program of China (Project no. 2023YFD200140301), Project of China Construction Center Construction Engineering Co., LTD (Project no. ZX & AZ02202200016001), Shandong Province Postgraduate Quality Case Library Project (Project no. SDYAL2022143), National Key R & D Program of China (Project no. 2022YFE0125800), Basic Scientific Research Project of Liaoning Provincial Department of Education (Project no. LJKQZ20222458) and the National Modern Agricultural Industry Technology System Post Scientist Project (CARS-13-National Peanut Industry Technology System-Sowing and Field Management Mechanization Post).
The authors declare that there are no conflicts of interest.
| Name | Parameter |
|---|---|
| CPU | Intel(R) Core(TM) i7-7700 CPU |
| GPU | GeForce GTX 1070 |
| Memory | 16 GB |
| Operating system | Ubuntu 20.04 |
| Deep learning framework | PyTorch 1.8.0 |
| Programming language | Python 3.9 |
| Metric | Class | PointNet++ | PAConv | Stratified Transformer | Plant Stratified Transformer (ours) |
|---|---|---|---|---|---|
| Prec (%) | Leaf | 95.78 | 95.39 | 98.57 | 98.49 |
| | Stem | 69.48 | 73.18 | 77.59 | 78.18 |
| | Flower | 91.26 | 94.23 | 96.32 | 96.79 |
| | Mean | 85.51 | 87.60 | 90.83 | 91.15 |
| Rec (%) | Leaf | 94.67 | 95.48 | 97.83 | 97.93 |
| | Stem | 72.35 | 75.83 | 80.41 | 80.37 |
| | Flower | 92.94 | 95.21 | 98.15 | 98.61 |
| | Mean | 86.65 | 88.84 | 92.13 | 92.30 |
| F1 (%) | Leaf | 95.22 | 95.43 | 98.20 | 98.21 |
| | Stem | 70.89 | 74.48 | 78.97 | 79.26 |
| | Flower | 92.09 | 94.72 | 97.23 | 97.69 |
| | Mean | 86.07 | 88.21 | 91.47 | 91.72 |
| IoU (%) | Leaf | 90.88 | 91.27 | 96.46 | 96.48 |
| | Stem | 54.90 | 59.34 | 65.25 | 65.65 |
| | Flower | 85.34 | 89.97 | 94.60 | 95.49 |
| | Mean | 77.04 | 80.19 | 85.44 | 85.87 |
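For reference, the per-class precision, recall, F1 and IoU reported in these tables can be computed from point-wise predictions as in the following minimal sketch; the function name and class indices are illustrative, not the authors' evaluation code:

```python
import numpy as np

def per_class_metrics(pred: np.ndarray, gt: np.ndarray, n_classes: int = 3):
    """Point-wise precision, recall, F1 and IoU for each class,
    computed from integer label arrays of equal length."""
    metrics = {}
    for c in range(n_classes):
        tp = np.sum((pred == c) & (gt == c))  # true positives
        fp = np.sum((pred == c) & (gt != c))  # false positives
        fn = np.sum((pred != c) & (gt == c))  # false negatives
        prec = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        rec = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) > 0 else 0.0
        iou = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 0.0
        metrics[c] = dict(prec=prec, rec=rec, f1=f1, iou=iou)
    return metrics

# e.g., classes 0 = leaf, 1 = stem, 2 = flower (illustrative indices)
pred = np.random.randint(0, 3, size=8912)
gt = np.random.randint(0, 3, size=8912)
print(per_class_metrics(pred, gt))
```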
| Metric | Class | PointNet++ | PAConv | Stratified Transformer | Plant Stratified Transformer (ours) |
|---|---|---|---|---|---|
| Prec (%) | Leaf | 92.54 | 94.42 | 97.51 | 97.93 |
| | Stem | 72.58 | 75.34 | 80.21 | 79.85 |
| | Flower | 65.35 | 69.10 | 75.12 | 74.81 |
| | Mean | 76.82 | 79.62 | 84.28 | 84.20 |
| Rec (%) | Leaf | 89.25 | 93.25 | 95.82 | 96.14 |
| | Stem | 66.89 | 66.48 | 80.22 | 79.68 |
| | Flower | 60.57 | 67.31 | 72.93 | 73.28 |
| | Mean | 72.24 | 75.68 | 82.99 | 83.03 |
| F1 (%) | Leaf | 90.87 | 93.83 | 96.66 | 97.03 |
| | Stem | 69.62 | 70.63 | 80.21 | 79.76 |
| | Flower | 62.87 | 68.19 | 74.01 | 74.04 |
| | Mean | 74.45 | 77.55 | 83.63 | 83.61 |
| IoU (%) | Leaf | 83.26 | 88.38 | 93.53 | 94.23 |
| | Stem | 53.40 | 54.60 | 66.97 | 66.34 |
| | Flower | 45.85 | 51.74 | 58.74 | 58.78 |
| | Mean | 60.83 | 64.91 | 73.08 | 73.11 |
| Method | Training time | Inference time |
|---|---|---|
| PointNet++ | 792.7 | 761.5 |
| PAConv | 678.8 | 652.3 |
| Stratified Transformer | 1035.2 | 869.7 |
| Plant Stratified Transformer (ours) | 714.3 | 597.9 |
| Metric | Ver | Sampling | Grouping | Pre-LN | Leaf | Stem | Flower | Mean |
|---|---|---|---|---|---|---|---|---|
| Prec (%) | V0 | S1 | G1 | √ | 98.49 | 78.18 | 96.79 | 91.15 |
| | V1 | S1 | G1 | | 98.03 | 77.80 | 96.23 | 90.69 |
| | V2 | S1 | G2 | √ | 98.35 | 77.93 | 96.64 | 90.97 |
| | V3 | S1 | G2 | | 98.12 | 77.67 | 96.19 | 90.66 |
| | V4 | S2 | G2 | √ | 98.57 | 77.59 | 96.32 | 90.83 |
| | V5 | S2 | G2 | | 97.93 | 77.14 | 96.12 | 90.40 |
| Rec (%) | V0 | S1 | G1 | √ | 97.93 | 80.37 | 98.61 | 92.30 |
| | V1 | S1 | G1 | | 97.34 | 79.87 | 98.13 | 91.78 |
| | V2 | S1 | G2 | √ | 97.91 | 80.29 | 98.28 | 92.16 |
| | V3 | S1 | G2 | | 97.30 | 80.03 | 98.06 | 91.80 |
| | V4 | S2 | G2 | √ | 97.83 | 80.41 | 98.15 | 92.13 |
| | V5 | S2 | G2 | | 97.42 | 79.81 | 98.01 | 91.75 |
| F1 (%) | V0 | S1 | G1 | √ | 98.21 | 79.26 | 97.69 | 91.72 |
| | V1 | S1 | G1 | | 97.68 | 78.82 | 97.17 | 91.23 |
| | V2 | S1 | G2 | √ | 98.13 | 79.09 | 97.45 | 91.56 |
| | V3 | S1 | G2 | | 97.71 | 78.83 | 97.12 | 91.22 |
| | V4 | S2 | G2 | √ | 98.20 | 78.97 | 97.23 | 91.47 |
| | V5 | S2 | G2 | | 97.67 | 78.45 | 97.06 | 91.06 |
| IoU (%) | V0 | S1 | G1 | √ | 96.48 | 65.65 | 95.49 | 85.87 |
| | V1 | S1 | G1 | | 95.47 | 65.05 | 94.50 | 85.01 |
| | V2 | S1 | G2 | √ | 96.33 | 65.42 | 95.03 | 85.59 |
| | V3 | S1 | G2 | | 95.52 | 65.06 | 94.39 | 84.99 |
| | V4 | S2 | G2 | √ | 96.46 | 65.25 | 94.60 | 85.44 |
| | V5 | S2 | G2 | | 95.45 | 64.54 | 94.28 | 84.76 |
| Ver | Sampling | Grouping | Pre-LN | Training time | Inference time |
|---|---|---|---|---|---|
| V0 | S1 | G1 | √ | 721.4 | 613.9 |
| V1 | S1 | G1 | | 735.9 | 624.5 |
| V2 | S1 | G2 | √ | 754.8 | 643.5 |
| V3 | S1 | G2 | | 773.5 | 659.4 |
| V4 | S2 | G2 | √ | 1041.5 | 873.2 |
| V5 | S2 | G2 | | 1059.8 | 894.3 |
| | V | R1 | R2 | R3 |
|---|---|---|---|---|
| Data | Virtual | Real | Real | Real |
| Training | 744 | 9:45 | 18:90 | 27:135 |