
[1] Ruiping Yuan, Jiangtao Dou, Juntao Li, Wei Wang, Yingfan Jiang. Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning. Mathematical Biosciences and Engineering, 2023, 20(2): 1903-1918. doi: 10.3934/mbe.2023087
[2] Shixuan Yao, Xiaochen Liu, Yinghui Zhang, Ze Cui. An approach to solving optimal control problems of nonlinear systems by introducing detail-reward mechanism in deep reinforcement learning. Mathematical Biosciences and Engineering, 2022, 19(9): 9258-9290. doi: 10.3934/mbe.2022430
[3] Cong Zhao, Na Deng. An actor-critic framework based on deep reinforcement learning for addressing flexible job shop scheduling problems. Mathematical Biosciences and Engineering, 2024, 21(1): 1445-1471. doi: 10.3934/mbe.2024062
[4] Jin Zhang, Nan Ma, Zhixuan Wu, Cheng Wang, Yongqiang Yao. Intelligent control of self-driving vehicles based on adaptive sampling supervised actor-critic and human driving experience. Mathematical Biosciences and Engineering, 2024, 21(5): 6077-6096. doi: 10.3934/mbe.2024267
[5] Siqi Chen, Ran Su. An autonomous agent for negotiation with multiple communication channels using parametrized deep Q-network. Mathematical Biosciences and Engineering, 2022, 19(8): 7933-7951. doi: 10.3934/mbe.2022371
[6] Seyedeh N. Khatami, Chaitra Gopalappa. A reinforcement learning model to inform optimal decision paths for HIV elimination. Mathematical Biosciences and Engineering, 2021, 18(6): 7666-7684. doi: 10.3934/mbe.2021380
[7] Koji Oshima, Daisuke Yamamoto, Atsuhiro Yumoto, Song-Ju Kim, Yusuke Ito, Mikio Hasegawa. Online machine learning algorithms to optimize performances of complex wireless communication systems. Mathematical Biosciences and Engineering, 2022, 19(2): 2056-2094. doi: 10.3934/mbe.2022097
[8] Juan Du, Jie Hou, Heyang Wang, Zhi Chen. Application of an improved whale optimization algorithm in time-optimal trajectory planning for manipulators. Mathematical Biosciences and Engineering, 2023, 20(9): 16304-16329. doi: 10.3934/mbe.2023728
[9] Xiaoxuan Pei, Kewen Li, Yongming Li. A survey of adaptive optimal control theory. Mathematical Biosciences and Engineering, 2022, 19(12): 12058-12072. doi: 10.3934/mbe.2022561
[10] Zhen Yang, Junli Li, Liwei Yang, Qian Wang, Ping Li, Guofeng Xia. Path planning and collision avoidance methods for distributed multi-robot systems in complex dynamic environments. Mathematical Biosciences and Engineering, 2023, 20(1): 145-178. doi: 10.3934/mbe.2023008
As China's standard of living rises, the proportion of people holding a driving license continues to increase year by year [1]. At the same time, parking difficulties have become a common problem plaguing the public [2]. To alleviate this problem, the rational planning and management of car parks is particularly important [3]. Practical parking planning must account for traffic flow, vehicle types, surrounding road conditions, parking demand and how to maximize efficiency [4]. By comparison, traditional manual scheduling methods are limited in their ability to analyze the problem and produce decisions [5]: they cannot satisfy the need for fast and efficient computation and are somewhat blind. Transferring cars between lots is a different and more difficult scenario than ordinary scheduling because it has an infinite state space. In reinforcement learning, the car park is viewed as an intelligent system that learns by trial and error and guides its behavior through the rewards obtained from interacting with the environment. Car transfer planning is a multi-objective planning problem with an infinite state space, and reinforcement learning can optimize the performance of the system by computing the optimal combination of strategies through its powerful search strategy [6].
Therefore, designing a reinforcement learning based car transfer planning system for parking lots can effectively solve this problem. Reinforcement learning is a type of machine learning that is widely used in the field of artificial intelligence [7]. It allows an intelligent body (agent) to learn by trial and error in an environment so as to obtain a strategy that maximizes the expected reward, and it is well suited to sequential decision-making problems [8]. Such a system can enhance the efficiency and accuracy of managers' decision-making, thereby reducing the incidence of risk and error in decisions, and it can save time and cost in practical applications [9]. Because environmental factors cannot be neglected, the intelligent body must continuously adjust its behavior and strategy [10]. This adjustment is essential for continuously optimizing and improving performance and effectiveness [11], and only in this way can the goal of maximizing the subsequent expected return be achieved. In addition, the development of machine learning provides solutions for the optimal control of everyday tasks. In the literature [12], scholars introduced a variational data assimilation model to deal with sparse, unstructured and time-varying sensor data. In another study [13], a new algorithmic model was built by combining data assimilation with a machine learning model, and the authors used this model to implement real-time fire prediction. In a study by Zhong et al. [14], a digital twin fire model was developed for an interactive fire and emissions algorithm for natural environments; reduced-order modeling and deep learning predictive models were utilized to enhance the accuracy and effectiveness of simulations of fire behavior and emissions.
Among the many problems related to car transfer planning, one of the goals pursued is maximization of the expected return. This is achieved by calculating the optimal transfer scheme in scenarios in which vehicles are moved from one parking space to another; the challenge lies in determining the most efficient way to allocate and relocate vehicles so that the overall expected return is maximized. In parking lots, transfer planning can improve the efficiency of vehicle scheduling and management and increase economic benefits, so it is a research topic of interest in both academia and industry. Wang et al. stated that vehicle rental yard systems are the most widely used owing to their flexibility, but they may encounter imbalance, particularly in the forms of saturation and exhaustion, and this imbalance may lead to loss of revenue [15]. Huang et al. considered the vehicle management problem with uncertain demand and proposed a solution for vehicle allocation; they noted that the number of vehicles an operator needs to move in or out of each site is related to the vehicle inventory, which in turn depends on the current inventory, vehicle pickup demand and vehicle return demand [16]. Oliveira et al. suggested that how to divide the existing fleet between parking lots is a key aspect: the number of vehicles at each site is constantly changing due to rentals and returns, so the cost of moving vehicles can be reduced by employing appropriate planning methods [17]. Similarly, for the vehicle allocation scheduling problem, Wang et al. analyzed the vehicle scheduling management problem for buses and proposed a theoretical model for automated calculation of the optimal driver and bus scheduling scheme, together with a dynamic programming algorithm for its storage [18]. Wang et al. proposed that the demand for electric car sharing can be predicted by a hidden Markov model with the goal of maximizing corporate profits, and finally developed a regional-level electric car sharing relocation optimization model [19]. Hao et al. used a novel distributionally robust optimization method that exploits covariate information and demand moment information to construct scenario-dependent fuzzy sets, in order to solve the problem of pre-allocating idle taxi vehicles [20].
Regarding research on reinforcement learning, in work by Wang et al. [21], the tracking of an unknown unmanned surface vehicle in a complex system was optimized by using a reinforcement learning control algorithm, which enhanced the tracking performance and accuracy of the unmanned surface vehicle system. Alternatively [22], a self-learning model-free solution was designed to optimally control an unmanned surface vehicle. Wang et al. [23] developed a model that combines actor-critic reinforcement learning mechanisms with finite-time control techniques, thereby optimizing the tracking of an unmanned surface vehicle. Liu et al. proposed a method for achieving human-level control by using deep reinforcement learning; their work achieved best-in-class performance on multiple Atari games by using directly trained SNNs [24]. Peng et al. proposed an imitation learning system based on reinforcement learning that enables legged robots to learn agile motor skills by imitating real-world animals [25]. Zhang et al. proposed a deep reinforcement learning-based approach that allows unmanned aerial vehicles to perform navigation tasks in multi-obstacle environments with randomness and dynamics [26]. Oh et al. proposed a novel experience replay method that employs new component-driven learnable features in model-based reinforcement learning to compute experience scores [27]. Li et al. proposed a novel advanced autonomous driving integration method based on end-to-end multi-agent deep reinforcement learning that is capable of autonomously learning complex and realistic traffic dynamics [28]. In summary, reinforcement learning is widely used in several fields. At the same time, researchers have continued to push the development of reinforcement learning algorithms by proposing new methods and techniques, such as deep reinforcement learning, prioritized experience replay and multi-agent reinforcement learning. These studies provide new perspectives and approaches to the application and theory of reinforcement learning and promote its further application and development in practice.
In this work, reinforcement learning was applied to a car transfer planning system for parking lots, identifying the states, actions, policy and rewards of the problem on the basis of an established Markov decision process model. In this context, the sequential decision-making problem is suitable for a model-based dynamic programming approach; the problem is therefore divided into several subproblems and solved by using either strategy iteration or value iteration. How to choose a suitable iterative method is also addressed in this paper. A value matrix and an action matrix were created, and an iterative approach is used to keep the two matrices updated until the values in the matrices converge. For the Poisson distribution probability involved in this system, a module was designed to find the distribution parameters from the transfer and return data of two parking lots over a certain time period; the optimal Poisson distribution parameters are then obtained via a computational solution and used to calculate the state transfer probabilities of the Markov decision process.
In order to solve the problem of maximizing the expected return based on transfer planning, a car transfer planning system was designed based on reinforcement learning. The problem is solved by constructing a Markov decision process and using a dynamic programming-based reinforcement learning algorithm. The system consists of two modules, i.e., a Poisson distribution parameterization module and a dynamic programming solution module. The former processes the imported data and finds suitable parameters as a basis for the latter's solution. In the parking lot setting of this paper, the states of the two locations form a finite set, and each state corresponds to an action and a value. The dynamic programming solution module is based on the Markov decision process and the dynamic programming idea, and it adopts the Poisson distribution as the basis of the state transfer probability. The optimal strategy and optimal state value of each state are then obtained via the dynamic programming iterative strategy. Finally, the computed data are stored in the database so as to enable fast queries of the strategy and value corresponding to each state.
The specific business scenario diagram is shown in Figure 1.
The Markov decision process is a commonly used mathematical model in the field of artificial intelligence. It can be used to solve sequential decision-making problems, and ultimately the optimal strategy can be obtained via algorithms such as strategy iteration and value iteration [29]. At this stage it already has a wide range of applications in artificial intelligence, operations research, cybernetics, economics and other fields [30].
The first concept introduced is the Markov process. It applies to a temporal process in which the state at moment t+1 depends only on the state St at moment t, independent of any previous state [31]. A sequence of states is obtained via sampling operations through the state transfer probability matrix given by the Markov process. The Markov reward process is introduced on the basis of the Markov process; its components are represented by the tuple <S,P,R,γ>. S is a finite set of states; P is the state transfer probability matrix; R is the reward function; and γ is the decay factor with range (0, 1] [32]. The cumulative reward for completing a series of states is expressed as a return, whose mathematical expression is
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \quad (2.1)$$
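For instance (an illustrative calculation, not taken from the paper), with γ = 0.9 and rewards R_{t+1} = 1, R_{t+2} = 2, R_{t+3} = 3 followed by zeros, the return is

$$G_t = 1 + 0.9 \times 2 + 0.9^2 \times 3 = 5.23.$$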
In order to describe the importance of the current state, the Markov reward process introduces value, which represents the expected return. The mathematical expression is
$$V(s) = E[G_t \mid S_t = s] \quad (2.2)$$
The value V(s) of a state s is represented by the expected return (harvest) of that state. States are sampled through the Markov probability transfer matrix, generating a collection of corresponding state sequences. The return of each state sequence in the set is calculated by means of the discount function, and the value of the state is then obtained by a weighted summation (averaging) of all these returns.
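As a minimal illustration of this sampling-and-averaging idea (a Monte Carlo estimate, not the dynamic programming method used later in this paper), the following Python sketch assumes a toy transfer matrix and reward vector chosen purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process (illustrative values only).
P = np.array([[0.7, 0.3],      # state transfer probability matrix
              [0.4, 0.6]])
R = np.array([1.0, 2.0])       # reward received in each state
gamma = 0.9                    # decay factor

def sample_return(start, length=200):
    """Sample one state sequence and compute its discounted return G_t."""
    s, g, discount = start, 0.0, 1.0
    for _ in range(length):
        g += discount * R[s]
        discount *= gamma
        s = rng.choice(2, p=P[s])
    return g

# Estimate V(s) by averaging the returns of many sampled sequences.
V_est = [np.mean([sample_return(s) for _ in range(2000)]) for s in range(2)]
print(V_est)
```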
The mapping between values and states can be established by using the value function. Since it is unrealistic to calculate the returns of all state sequences of a state in a practical situation, Eq (2.2) is modified to obtain the following formula:
$$V(s) = E[G_t \mid S_t = s] = E[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s] = E[R_{t+1} + \gamma (R_{t+2} + \gamma R_{t+3} + \cdots) \mid S_t = s] \quad (2.3)$$
According to the above equation, the modification can be continued to obtain the Bellman equation for Markov reward processes. Its mathematical expression is
$$V(s) = R_s + \gamma \sum_{s'} P_{ss'} V(s') \quad (2.4)$$
Here s′ denotes any state at the next moment following state s. In this case, the value V(s) is determined by the reward R_s of the current state, the state transfer probability of the current state, the value V(S_{t+1}) at moment t+1 and γ.
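For a finite Markov reward process, Eq (2.4) can also be solved directly as the linear system V = (I − γP)⁻¹R. A short numpy sketch using the same toy values assumed above:

```python
import numpy as np

# Same toy Markov reward process as above (illustrative values only).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
R = np.array([1.0, 2.0])
gamma = 0.9

# Eq (2.4) in matrix form: V = R + gamma * P @ V  =>  (I - gamma * P) V = R.
V = np.linalg.solve(np.eye(2) - gamma * P, R)
print(V)   # closely matches the Monte Carlo estimate above
```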
The problem addressed in this paper involves the behavior (actions) of the intelligent system itself, so the Markov decision process is introduced here. This is a mathematical model used to describe stochastic decision-making processes, and it is widely used to find optimal strategies [33]. Its composition can be represented by the tuple <S,A,P,R,γ>, where A is the finite set of actions of the intelligent body and P is the state transfer probability. The Markov decision process introduces the notion of policy and denotes by π the probability distribution law of the actions performed by an intelligent body in a given state. It can be expressed as
$$\pi(a \mid s) = P[A_t = a \mid S_t = s] \quad (2.5)$$
This equation describes the probability of performing action a in state s, where A is the set of actions. When an intelligent body introduces actions, the value function differs from that of the Markov reward process, since the selection of an action changes the current environmental state. Intelligent bodies in different states produce different actions in response, and these actions occur according to the probability distribution law π. In this regard, the Markov decision process introduces the action value function Qπ(s,a) based on the policy π, which takes the action into account in addition to the state. At this point the state value function Vπ(s) and the action value function Qπ(s,a) are expressed in terms of the Bellman equation as follows:
$$V_\pi(s) = E[G_t \mid S_t = s] = E[R_{t+1} + \gamma V_\pi(S_{t+1}) \mid S_t = s] \quad (2.6)$$
$$Q_\pi(s,a) = E[G_t \mid S_t = s, A_t = a] = E[R_{t+1} + \gamma Q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a] \quad (2.7)$$
The state value function and the action value function are interrelated and can be expressed in terms of each other. The value of a state can be expressed by using the value of all actions in that state multiplied by the corresponding probability distribution. Similarly, the value of an action can be expressed by multiplying the values of the successor states to that state by the corresponding probability distribution. Their respective relationships are shown in Figure 2(a) and 2(b).
The mathematical equation for the mutual representation between the state value function and the action value function is as follows:
$$V_\pi(s) = \sum_{a \in A} \pi(a \mid s) \, Q_\pi(s,a) \quad (2.8)$$
$$Q_\pi(s,a) = R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \, V_\pi(s') \quad (2.9)$$
Combining the above mathematical equations with each other gives the following mathematical equation:
$$V_\pi(s) = \sum_{a \in A} \pi(a \mid s) \left( R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a V_\pi(s') \right) \quad (2.10)$$
$$Q_\pi(s,a) = R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \sum_{a' \in A} \pi(a' \mid s') \, Q_\pi(s',a') \quad (2.11)$$
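As a brief illustration, Eqs (2.8)–(2.10) can be turned into a one-step "expectation backup" that evaluates a fixed policy on a tabular MDP. The Python sketch below is a generic helper under an assumed data layout (a transition table P[s][a] holding (probability, next state, reward) triples); it is not the specific parking-lot model of the later sections:

```python
def bellman_expectation_backup(V, policy, P, gamma):
    """One sweep of Eq (2.10): V(s) = sum_a pi(a|s) * sum_{s'} P^a_{ss'} (R^a_s + gamma V(s')).

    V      : dict state -> current value estimate
    policy : dict state -> dict action -> probability pi(a|s)
    P      : dict state -> dict action -> list of (prob, next_state, reward)
    """
    new_V = {}
    for s in V:
        v = 0.0
        for a, pi_a in policy[s].items():
            q = sum(prob * (r + gamma * V[s2]) for prob, s2, r in P[s][a])  # Eq (2.9)
            v += pi_a * q                                                    # Eq (2.8)
        new_V[s] = v
    return new_V
```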
According to the Markov decision process element composition tuple <S,A,P,R,γ>, the process is constructed as follows.
First, a finite set of states (S) is determined: it contains all of the possible states of the system. In the problem of this paper, the set of states consists of all possible numbers of vehicles currently held at the two parking lots.
Determine a finite set of intelligent body actions (A): the elements of this set are the actions that the intelligent body can perform in each possible state. In the problem of this paper, the set of actions consists of the possible numbers of cars that can be moved between the two parking lots.
Determine the state transfer probability (P): based on a certain law, determine the probability that, in each possible state, the intelligent body will cause the system to transition to the next state after performing a certain action. The law can be obtained statistically from several experiments or as based on theoretical analysis. In the problem of this paper, the Poisson distribution is used to determine the probability of taking each action in each state.
Determine the state-based and action-based reward function (R): establish a function such that the value of the reward obtained by the intelligent body after performing an action in each state is mapped to the action and state. In the problem of this paper, the correct relationship is established by relating the realities of the problem, such as the cost required to move a car.
Determine the appropriate attenuation factor (γ): the role of the attenuation factor γ in the reward function is to measure the importance of future rewards. The closer γ is to 1, the greater the importance attached to future rewards. For example, in the game of Go the ultimate goal is to win, not to train the intelligent body to keep capturing the opponent's pieces; in that case future rewards matter greatly, and γ is set close to 1. In the problem of this paper, the benefits of the parking lots should be considered in the long run, so γ should be set relatively high.
Determine the optimal policy: an optimal policy π is finally obtained by iterating until the values converge; taking the action π(s) in any state then yields the maximum expected reward value. The optimal strategy can be solved for by performing either value iteration or strategy iteration [34].
The optimal policy obtained at this point, denoted π∗, is not an ordinary policy π of the Markov decision process: π∗ ≥ π for any policy π. It is no longer a probability distribution law but a deterministic array of 0s and 1s, and can be expressed as follows:
$$\pi^*(a \mid s) = \begin{cases} 1 & \text{if } a = \arg\max_{a \in A} Q^*(s,a) \\ 0 & \text{otherwise} \end{cases} \quad (2.12)$$
Here V∗ denotes the optimal state value function corresponding to π∗ and Q∗ denotes the optimal action value function; their mathematical expressions are as follows:
$$V^*(s) = \max_a \left[ R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a V^*(s') \right] \quad (2.13)$$
$$Q^*(s,a) = R_s^a + \gamma \sum_{s' \in S} P_{ss'}^a \max_{a'} Q^*(s',a') \quad (2.14)$$
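The correspondence expressed by Eq (2.12) can be sketched as a greedy policy extraction from an action value table; the dict layout Q[s][a] and the numbers below are assumptions made purely for illustration:

```python
def greedy_policy_from_Q(Q):
    """Eq (2.12): for each state, put probability 1 on the action that maximizes Q*(s, a)."""
    policy = {}
    for s, action_values in Q.items():
        best_a = max(action_values, key=action_values.get)
        policy[s] = {a: (1.0 if a == best_a else 0.0) for a in action_values}
    return policy

# Example: a two-state, two-action table of (assumed) optimal action values.
Q_star = {"s0": {"stay": 1.2, "move": 2.5},
          "s1": {"stay": 0.7, "move": 0.4}}
print(greedy_policy_from_Q(Q_star))
```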
The construction process of the Markov decision process model includes determining the basic elements such as the state set, action set, state transfer probability, reward function and decay factor. Then the commonly used strategy iteration or value iteration method is adopted to find the optimal strategy. When building the Markov decision process model, attention needs to be paid to the actual situation of the problem to ensure the reliability and validity of the model.
Dynamic programming is often used to obtain optimization results; its main idea is to divide a large problem into small problems, solve them and reuse the results obtained for the small problems [35]. Dynamic programming has a very wide range of applications, covering fields such as computer science, operations research, economics and biology. The idea can be described as decomposing the problem, defining the states and solving the problem [36]. These steps are described below:
Decomposition of the problem: The original problem is split into subproblems and the links between them are defined. Often, the subproblems retain features consistent with the original problem. For example, in a path planning problem, given a start point and an end point, it is necessary to find the shortest path that connects them. This problem can be decomposed into several subproblems, each being the shortest path from the starting point to a node on the current path; this node can be the end point or an intermediate node. Each subproblem needs to take into account the information obtained previously, that is, the shortest paths already known.
Defining states: For each subproblem, a corresponding state needs to be defined to reflect the characteristics and known information of the subproblem. In the path planning problem described above, it is possible to define each subproblem as the shortest path from the starting point to the current node. The connection between states can be described by using some transfer equations.
Strategy iteration is a dynamic programming method belonging to the class of iterative algorithms. It optimizes the strategy by performing two steps over and over again, i.e., strategy evaluation and strategy improvement [37]. In the strategy evaluation phase, the performance of the currently used strategy is evaluated by calculating its value function, which represents the expected long-term return obtainable from each state under the current strategy. This can be done by solving the Bellman equation, a recursive equation that expresses the value function of a state as a weighted average of the value functions of the states adjacent to it.
In the strategy improvement phase, strategies are optimized and improved based on the value function of the currently selected strategy. Specifically, for each state an optimal action is chosen that maximizes the value function of that state. This can be achieved with a greedy algorithm, i.e., one that selects the action maximizing the value function in each state [38]. The strategy iteration algorithm alternates strategy evaluation and strategy improvement until the strategy no longer changes; when the strategies converge, the optimal strategy with the optimal value function is obtained. The advantage of the policy iteration algorithm is that it ensures convergence to the optimal policy; however, performing policy evaluation and policy improvement in each iteration is computationally intensive.
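A compact sketch of this evaluation/improvement loop for a tabular MDP is given below. It reuses the same assumed P[s][a] transition-table layout as the earlier expectation-backup example and is a generic sketch rather than the parking-lot module itself:

```python
def policy_iteration(states, actions, P, gamma, theta=1e-6):
    """Alternate policy evaluation and greedy improvement until the policy is stable.

    P[s][a] is a list of (probability, next_state, reward) triples (assumed layout).
    """
    V = {s: 0.0 for s in states}
    policy = {s: actions[0] for s in states}          # arbitrary initial policy
    while True:
        # Policy evaluation: sweep Eq (2.10) for the current deterministic policy.
        while True:
            delta = 0.0
            for s in states:
                v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to the evaluated value function.
        stable = True
        for s in states:
            best_a = max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                                    for p, s2, r in P[s][a]))
            if best_a != policy[s]:
                policy[s] = best_a
                stable = False
        if stable:
            return policy, V
```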
Value iteration is also a dynamic programming method belonging to the class of iterative algorithms. Unlike strategy iteration, it optimizes the strategy by repeating a single step: the iterative value update [39]. In the value update phase, the value function of the current state is updated on the basis of the value functions of its successor states; specifically, for each state the action that maximizes the value of that state is chosen, which is obtained by solving the Bellman optimality equation. The value iteration algorithm performs this update step repeatedly and stops when the value function converges, at which point the optimal value function is obtained. The optimal policy can then be obtained by choosing, in each state, the action that maximizes the value function.
The advantage of the value iteration algorithm is that each iteration is less computationally expensive, because only one value update is required per iteration [40]. However, it may converge slowly toward the optimal policy, since the greedy policies implied by intermediate value functions are not yet optimal. In addition, the computational cost of the value iteration algorithm becomes high when the state space is large.
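For comparison with the previous sketch, a value iteration loop over the same assumed P[s][a] layout might look as follows (again a generic illustration, not the module implementation):

```python
def value_iteration(states, actions, P, gamma, theta=1e-6):
    """Repeat the Bellman optimality update, Eq (2.13), until the value function converges,
    then read off the greedy (optimal) policy. P[s][a] uses the same assumed layout as above."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in actions)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    policy = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                                for p, s2, r in P[s][a]))
              for s in states}
    return policy, V
```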
This car transfer planning system contains two modules: a Poisson distribution parameter calculation module and a dynamic programming solution module. The Poisson distribution parameter calculation module mainly classifies and counts the data in the uploaded files and finds the best-fitting Poisson distribution parameters as the basis for the subsequent probability calculation; the uploaded data should be the numbers of car transfers and car returns at each of the two sites over a longer period of time. The dynamic programming solution module focuses on finding the optimal strategy by passing the set data parameters into the model as the states, actions, rewards, penalties, etc. of the system and training it until convergence. The results of training are presented in the form of a 3D scatter plot and heat maps.
The specific technical methodology process is as follows:
Algorithm 1 Reinforcement learning-based car transferring planning method

1: Initialize parking lot environment:
   Define state space S
   Define available action space A
   Initialize Q-value function Q(S, A) to 0
2: Define car relocation reward function R(s', a, s) based on the specific problem
3: Define reinforcement learning algorithm parameters:
   Learning rate α
   Discount factor γ
   Exploration and exploitation trade-off parameter ϵ
4: Training iteration:
   For each episode:
      Initialize starting state s
      Repeat for each step in the episode:
         Use ϵ-greedy policy to choose action a
         Execute action a, observe new state s' and reward r
         Update Q-value function: Q(s,a) ← Q(s,a) + α(r + γ max_{a'} Q(s',a') − Q(s,a))
5: Model evaluation: test and evaluate the trained Q-value function; different evaluation metrics can be used
6: Model improvement: improve and optimize the model based on the evaluation results
7: Output optimal car relocation policy: obtain the optimal car relocation policy from the trained Q-value function
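A runnable Python rendering of the update in step 4 of Algorithm 1 is sketched below; the environment interface (env.reset(), env.step(a)) and the state and action encodings are assumptions made for illustration, not the actual implementation of the system:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning as in Algorithm 1; env.reset() -> state, env.step(a) -> (state', reward, done)."""
    Q = defaultdict(lambda: {a: 0.0 for a in actions})   # Q(s, a) initialized to 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(Q[s], key=Q[s].get)
            s2, r, done = env.step(a)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
            s = s2
    return Q
```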
In the system designed in this paper, the Poisson distribution probability plays a crucial role as the basis of the state transfer probability of the Markov decision process. It is therefore necessary to find Poisson distribution parameters that conform to the data law, and the Poisson distribution parameter calculation module was designed to implement this function. When setting the Poisson distribution parameters on the interactive page, one can upload an Excel table; the data saved in the table are then read into an array, which is used for classification statistics. As shown in Table 1, the Excel sheet has four data columns, in order: the number of cars requested to be transferred and the number requested to be returned for site 1, and the number requested to be transferred and the number requested to be returned for site 2. The data for one day occupy one row.
Days | Transferred cars (vehicles) I | Returned cars (vehicles) I | Transferred cars (vehicles) II | Returned cars (vehicles) II
1 | 8 | 10 | 7 | 3 |
2 | 8 | 9 | 11 | 0 |
3 | 3 | 4 | 0 | 11 |
... | ... | ... | ... | ... |
The Poisson distribution is calculated as shown in Eq (3.1). Let the number of data points equal to a certain value x in a certain column, obtained after the counting operation, be n, and let the total number of days be N. Then the empirical probability of x for this type of request at this site can be expressed as P(x) = n/N. The parameter λ is obtained by matching this empirical probability P(x) to the Poisson probability P(X = k) with k = x.
$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda} \quad (3.1)$$
Combining the actual situation and the Poisson distribution formula, the final output of the model should be one integer parameter for each column in the table. Since each data point in a column generates a candidate Poisson parameter, each column yields a list of candidate parameters λ after calculation. To find a unique result for each column, an optimization in the spirit of a greedy search is used in this work: for each column, the Poisson probability under every candidate λ is calculated for all the data within that column, so that N probability lists corresponding to the different λ values are generated for each column, as shown in Figure 3. The absolute difference between the obtained probability and the empirical probability is then computed; for each λ, the largest absolute difference in its list is taken, and the λ whose largest absolute difference is smallest among all candidates is the optimal result.
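The selection rule just described (minimize, over the candidate λ values, the maximum absolute deviation between the Poisson probability and the empirical frequency) can be sketched as follows; the way the candidate set is formed and the sample column are assumptions for illustration only:

```python
import math
from collections import Counter

def fit_lambda_minimax(column):
    """Pick the candidate lambda whose worst-case |Poisson pmf - empirical frequency| is smallest."""
    counts = Counter(column)
    N = len(column)
    empirical = {x: n / N for x, n in counts.items()}          # P(x) = n / N
    candidates = sorted(set(column))                            # candidate lambdas taken from the data (assumed)

    def poisson_pmf(k, lam):
        return lam ** k * math.exp(-lam) / math.factorial(k)    # Eq (3.1)

    def worst_error(lam):
        return max(abs(poisson_pmf(x, lam) - p) for x, p in empirical.items())

    return min(candidates, key=worst_error)

# Example with a small made-up column of daily transfer requests.
print(fit_lambda_minimax([8, 8, 3, 4, 5, 3, 2, 4, 3, 6]))
```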
In the system detailed in this paper, the Poisson distribution parameters as well as various conditional data are passed as parameters to the dynamic programming solution module. The required parameters are the maximum vehicle capacities of site 1 and site 2, and the Poisson distribution parameters for transferring and returning a vehicle at site 1 and site 2. Additionally, the cost of moving a vehicle, the upper limit on the number of requests and the maximum number of vehicles to be moved are also crucial parameters. The maximum number of vehicles to be moved is the upper limit of the absolute value of the elements of a set of action integers, named set A; positive numbers in A represent vehicles moved from site 1 to site 2, and negative numbers represent vehicles moved from site 2 to site 1. In this module, a value matrix V(s) must be defined to store the values that have been iterated to convergence, of size [(maximum number of vehicles at site 1 + 1) × (maximum number of vehicles at site 2 + 1)], and an action matrix π∗ of the same size must be defined to store the optimal policy. The action is the number of vehicles that the agent chooses to move, subject to the requirement of not exceeding the number of vehicles held at the site. The reward is calculated as the difference between the earnings from renting out vehicles and the cost of moving vehicles after executing this action. The environment is the customer group and related factors, such as requests to rent and return vehicles.
Given the importance of future rewards, the benefits should be considered in the long run, so the decay factor γ is defined as 0.9; this also ensures convergence and the uniqueness of Vπ. In this problem, actions and states are subject to constraints: the number of vehicles moved must not exceed the number of vehicles held at the source site; the number of vehicles moved into the target site must not push it beyond its maximum capacity; and the total number of vehicles at both sites is kept constant before and after the move. If the number of transfer requests exceeds the number of vehicles held, at most all vehicles at the location are transferred; if the sum of the incoming requests and the number of vehicles held exceeds the upper capacity limit, the excess vehicles are moved to the other location. Based on such constraints, a summation function is designed to find the expected value of V(s) obtained by taking a certain action in a certain state. The flow is shown in Figure 4.
The core strategy evaluation formula for this value function is as follows:
$$V(s) = \sum_{s', r} p(s', r \mid s, \pi(s)) \left[ r + \gamma V(s') \right] \quad (3.2)$$
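To make Eq (3.2) concrete, the following sketch evaluates the expected return of one (state, action) pair for the two-lot problem, with Poisson rental demand providing the stochastic part. The capacity, revenue and cost values, the truncation of the Poisson tail, and the simplified treatment of returned cars are all assumptions made for illustration rather than the exact module described above:

```python
import math

CAP = 20            # maximum vehicles per site (assumed, matching the experiment below)
RENT_REVENUE = 10   # yuan earned per rented car
MOVE_COST = 2       # yuan per car moved
GAMMA = 0.9
REQ_MAX = 11        # truncate Poisson request distributions here (assumed)

def poisson(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

def expected_return(state, action, V, lam_rent=(3, 4)):
    """Expected value of taking `action` in `state` under Eq (3.2).

    Returned cars are ignored for brevity, and the action is assumed feasible for the state.
    """
    n1, n2 = state
    n1, n2 = min(n1 - action, CAP), min(n2 + action, CAP)   # move cars overnight
    total = -MOVE_COST * abs(action)
    for r1 in range(REQ_MAX):
        for r2 in range(REQ_MAX):
            p = poisson(r1, lam_rent[0]) * poisson(r2, lam_rent[1])
            rented1, rented2 = min(r1, n1), min(r2, n2)
            reward = RENT_REVENUE * (rented1 + rented2)
            s_next = (n1 - rented1, n2 - rented2)
            total += p * (reward + GAMMA * V[s_next])
    return total

# Usage: V is a dict over all (cars at site 1, cars at site 2) states.
V = {(i, j): 0.0 for i in range(CAP + 1) for j in range(CAP + 1)}
print(expected_return((10, 10), action=3, V=V))
```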
Based on the above value function, the module provides two methods, i.e., strategy iteration and value iteration. The first step is to randomly assign the initial values, named policy 0: in this work, the initial action matrix is taken as the policy and the initial value matrix as the state values. The second step is to call the value function to calculate the state values according to the current policy. The third step is to compare the new state value with the old state value according to the principle of expectation maximization, take the larger value and update the corresponding policy. The second and third steps are repeated until the state values and strategies converge to the optimal state values and optimal strategy.
Value iteration has one step fewer than strategy iteration: the optimal state values are found first, and the optimal strategy is then determined from these state values. Therefore, in the first step, the initial action matrix is set as the strategy and the initial value matrix as the state values. In the second step, the value function is invoked in turn for each action to obtain the corresponding state values, the maximum of these values over all actions is taken, and this value replaces the number in the corresponding position of the value matrix. The second step is repeated until the state values converge to the optimal state values.
The background of this experiment was set as follows: the maximum number of vehicles at each of the two sites is 20, the cost of moving a car is 2 yuan and the revenue from transferring (renting out) a car is 10 yuan. The maximum number of requests for car transfer and return is 11, and the maximum number of cars that can be moved is 5. The Poisson parameters for car transfer and car return at the first site are both 3; at the second site the transfer Poisson parameter is 4 and the return Poisson parameter is 2. The numbers of cars transferred and returned at the two sites obey Poisson distributions with these means.
As shown in Figure 5, policy 0 represents the initial policy (without any improvement). At this time, all strategies are 0, that is, no movement is made. Positive numbers represent the number of vehicles moving from site 1 to site 2, and negative numbers represent the number of vehicles moving from site 2 to site 1.
As shown in the heat map of Figure 6, policy 1 represents the result after the first strategy improvement. It can be seen that, when the number of vehicles held at site 1 is greater than 7, the policy is optimized to move five vehicles to site 2, and as the number of vehicles held at site 2 increases, the proportion of states in which five vehicles are moved decreases. Policy 1, in general, strongly favors moving vehicles from site 1 to site 2.
As shown in Figure 7, policy 2 represents the result after the second strategy improvement. Compared to policy 1, the proportion of states in which site 1 moves five vehicles to site 2 decreases significantly, and when the number of vehicles held at the second site is greater than 7, the policy tends to move vehicles from the second site to the first site.
As shown in Figure 8, policy 3 represents the results of the third strategy improvement. Compared to policy 2, the percentage of moving vehicles has increased at both sites and the strategies are converging.
As shown in Figure 9, policy 4 represents the results after the fourth policy improvement. It has now converged to the optimal policy.
The state values of the strategy iterations are shown in the heat map of Figure 10.
As shown in the 3D scatter plot of Figure 11, the scatter of the number of vehicles held in the two locations versus the optimal state value forms an approximately curved shape. As the number of vehicles held in the two locations increases, the expected return is higher.
As shown in Figure 12, policy 0 represents the initial policy (without any improvement) and policy 1 demonstrates that the optimal policy has been found.
As shown in the 3D scatter plot of Figure 13, this result is consistent with the strategy iteration method.
Both policy iteration and value iteration converge to the optimal policy and optimal value state. The running times of strategy iteration and value iteration are shown in Table 2. The strategy iteration and value iteration algorithms each went through 10 strategy optimizations. From Table 2, it can be seen that the running time gradually converges to a stable value by the eighth iteration. Therefore, considering the operating efficiency of the system, the strategy iteration method is the better choice. Compared to traditional planning strategies, this method consumes more time but offers better management efficiency, and it can thus bring greater economic benefits. Accordingly, the reinforcement learning-based planning method proposed in this paper has higher accuracy.
Number of iterations | Strategy iteration runtime (s) | Value iteration runtime (s)
1 | 78.69 | 184.31 |
2 | 77.22 | 182.11 |
3 | 75.92 | 181.79 |
4 | 74.39 | 181.63 |
5 | 69.82 | 180.55 |
6 | 65.43 | 180.43 |
7 | 63.21 | 179.37 |
8 | 60.63 | 177.91 |
9 | 60.55 | 177.83 |
10 | 60.54 | 177.82 |
In this study, reinforcement learning was applied to a car transfer planning system for parking lots, and reinforcement learning is a suitable choice for solving the transfer problem. The vehicle movement problem is an operations research problem; according to the properties of reinforcement learning, it can play a great role in saving cost, improving efficiency and providing a vehicle moving strategy that maximizes benefits. It is also suitable for many application scenarios, especially those in which the optimal strategy must be determined and the expected return maximized while achieving excellent performance. The system takes as many future rewards as possible into account and has more development potential than the traditional method of manual transfer planning.
In the experiments discussed in this paper, both the strategy iteration method and the value iteration method converge to the optimal strategy and optimal value state. The former uses the value function of the previous strategy as the starting point for strategy evaluation, which effectively speeds up the convergence of strategy evaluation, although it still spends a considerable amount of time on this step. In contrast, although value iteration reduces the time spent on strategy evaluation, it converges more slowly. Since strategy iteration ensures the optimization of the strategies, it is the better choice for this system.
In this paper, a reinforcement learning method based on dynamic programming is detailed; it serves to find the optimal moving strategy and the optimal value return for parking lots. The system completed in this study can effectively solve the vehicle moving problem. However, owing to objective factors such as the high time complexity of the algorithm, the system has a long waiting time when computing large-capacity matrices, so the model needs to be further improved. Moreover, this system solves the operations research problem of how to move vehicles between two locations at the end of a day's business in a parking lot, and the scope of the current application is relatively narrow. In future work, more situations will be considered, such as how to find the optimal strategy and the optimal state value of the current state at any time during a day's business.
The authors declare that they have not used artificial intelligence tools in the creation of this article.
The authors declare that there is no conflict of interest.