Research article

Comparing theory based and higher-order reduced models for fusion simulation data

  • Received: 09 November 2018 Accepted: 23 November 2018 Published: 06 December 2018
  • We consider using regression to fit a theory-based log-linear ansatz, as well as higher order approximations, for the thermal energy confinement of a Tokamak as a function of device features. We use general linear models based on total order polynomials, as well as deep neural networks. The results indicate that the theory-based model fits the data almost as well as the more sophisticated machines, within the support of the data set. The conclusion we arrive at is that only negligible improvements can be made to the theoretical model, for input data of this type.

    Citation: David E. Bernholdt, Mark R. Cianciosa, David L. Green, Kody J.H. Law, Alexander Litvinenko, Jin M. Park. Comparing theory based and higher-order reduced models for fusion simulation data[J]. Big Data and Information Analytics, 2018, 3(2): 41-53. doi: 10.3934/bdia.2018006



    This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).


    1. Setup

    Assume we have a set of labelled data points, or training data pairs $\mathcal{D} = \{x^{(i)}, y^{(i)}\}_{i=1}^{N}$, where $x^{(i)} \in \mathbb{R}^K$ and $y^{(i)} \in \mathbb{R}_+$.

    We postulate, amongst a family of ansatz $f(\cdot;\theta)$ parameterized by $\theta \in \mathbb{R}^P$, that if we can find a $\theta$ such that for all $i = 1, \dots, N$,

    $y^{(i)} \approx f(x^{(i)};\theta),$

    then for all $x$ within the convex hull of the set $\{x^{(i)}\}_{i=1}^{N}$, and ideally in a high-probability region of the training data distribution, we will have

    $y \approx f(x;\theta),$

    where $y$ is the true output corresponding to the input $x$. In particular, the hope is that if we derive $f$ from physical theory, then this relation will in fact hold even for $x$ far from the set $\{x^{(i)}\}_{i=1}^{N}$.

    Here we consider parametric approximations, given by general linear models (GLM), in particular polynomial models, as well as nonlinear models (NM) given by neural networks (NN).

    We will consider the problem of finding the $\theta$ which minimizes the data misfit

    $\Phi(\theta) = \sum_{i=1}^{N} \ell\big(y^{(i)}, f(x^{(i)};\theta)\big),$

    for some distance function $\ell$. Here we use the standard choice $\ell(y, y') := |y - y'|^2 := \|y - y'\|_2^2$. Up to an additive constant, it makes sense to interpret this as the negative log-likelihood of the data given the parameter, viewed as a function of the parameter.


    2. Physical setup and theory-based model

    In recent work [9], a theory-based scaling of the thermal energy confinement time has been derived from a comprehensive turbulent transport model, the trapped gyro-Landau fluid (TGLF) model [13] in the core, coupled to the edge pedestal (EPED) model [12], with particular attention to burning plasma conditions with dominant fusion alpha particle heating for future reactor design. The simulation dataset consists of a massive number of predictive Integrated Plasma Simulator (IPS) FASTRAN [8] simulations, self-consistent with respect to core transport, the edge pedestal, fusion alpha particle heating, and MHD equilibrium, built upon the modern integrated modeling framework IPS.

    The $K = 9$ input features are $x_1 = R$, the major radius; $x_2 = a$, the minor radius; $x_3 = \kappa = V/(2\pi^2 a^2 R)$, where $V$ is the plasma volume; $x_4 = \delta$, the triangularity; $x_5 = I_p$, the plasma current; $x_6 = b_0$, the toroidal magnetic field; $x_7 = \bar{n}_e$, the line-averaged density; $x_8 = Z_{\mathrm{eff}}$, the effective charge; and $x_9 = P_{\mathrm{loss}}$, the loss power. The output $y = \tau$ is the thermal energy confinement time.

    For input features $x^{(i)} \in \mathbb{R}^K$, $i = 1, \dots, N$, the theoretical relationship with the output thermal energy confinement time $y^{(i)} \in \mathbb{R}_+$ is given by the following ansatz

    $f_{\mathrm{th}}(x;\theta) = \theta_0 \prod_{k=1}^{K} x_k^{\theta_k}, \qquad (2.1)$

    or, on a logarithmic scale (redefining $\log\theta_0 \to \theta_0$),

    $\log f_{\mathrm{th}}(x;\theta) = \theta_0 + \sum_{k=1}^{K} \theta_k \log(x_k). \qquad (2.2)$
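    As a concrete illustration (not the authors' code, which used Matlab and sglib), the following NumPy sketch fits the log-linear ansatz (2.2) by least squares on log-transformed data. The synthetic data and variable names are assumptions made only for the example.

```python
# Minimal sketch: fitting the log-linear ansatz (2.2) by ordinary least squares
# on log-transformed data. The data here is synthetic; in the paper the rows of
# X would be the K = 9 device features and y the confinement time from the
# IPS FASTRAN simulation database.
import numpy as np

rng = np.random.default_rng(0)
N, K = 2100, 9
X = rng.uniform(0.5, 2.0, size=(N, K))           # stand-in positive features
theta_true = rng.normal(size=K + 1)
y = np.exp(theta_true[0] + np.log(X) @ theta_true[1:]
           + 0.05 * rng.normal(size=N))          # multiplicative lognormal noise

# Design matrix [1, log x_1, ..., log x_K]; solve for theta = (theta_0, ..., theta_K).
A = np.hstack([np.ones((N, 1)), np.log(X)])
theta_hat, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)

# Prediction on the original scale: f_th(x; theta) = exp(theta_0) * prod_k x_k^theta_k.
y_pred = np.exp(A @ theta_hat)
```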

    3. Parametric regression

    Parametric models afford the luxury of discarding the data set after training, and reduce subsequent evaluations of the model to a cost that depends only on the number of parameters [1,7,10]. However, these models are much more rigid than non-parametric ones, since they are restricted to a particular parametric family.


    3.1. General linear models

    The simplest type of parametrization is in terms of coefficients of an expansion in some basis $\{\psi_i\}_{i=0}^{P-1}$. Here we choose appropriately ordered monomials of total degree at most $p$, so that the number of parameters is $P = (K+p)!/(K!\,p!)$. The polynomials $\{\psi_i\}$ are products of monomials in the individual variables, $\psi_i(x) = \prod_{k=1}^{K} \psi_{\alpha_k(i)}(x_k)$, where $\psi_0(x) = 1$, each index $i \in \mathbb{Z}_+$ is associated to a unique multi-index $\alpha(i) = (\alpha_1(i), \dots, \alpha_K(i)) \in \mathbb{Z}_+^K$, and the single-variable monomials are defined as $\psi_{\alpha_k(i)}(z) = z^{\alpha_k(i)}$ for $z \in \mathbb{R}$. In particular, for $i = 1, \dots, K$, $\alpha_i(i) = 1$ and $\alpha_j(i) = 0$ for $j \neq i$, i.e. $\psi_i(x) = x_i$. The polynomials are constructed with sglib (https://github.com/ezander/sglib) and the computations are done with Matlab.
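    The construction of the total-degree basis can be sketched as follows. This is an illustrative Python version (the paper uses sglib in Matlab); the naive enumeration, helper names, and small test sizes are assumptions for the example.

```python
# Illustrative sketch: enumerate the multi-indices alpha with |alpha| <= p for
# K variables and evaluate the monomial basis psi_i(x) = prod_k x_k^{alpha_k(i)}
# on a batch of inputs. The brute-force enumeration is only suitable for small p.
from itertools import product
from math import comb
import numpy as np

def total_degree_multi_indices(K, p):
    """All alpha in Z_+^K with alpha_1 + ... + alpha_K <= p, graded by degree."""
    alphas = [a for a in product(range(p + 1), repeat=K) if sum(a) <= p]
    return sorted(alphas, key=lambda a: (sum(a), a))

def design_matrix(X, alphas):
    """Psi(X)_{ij} = psi_j(x^{(i)}) for the monomial basis defined by alphas."""
    return np.stack([np.prod(X ** np.array(a), axis=1) for a in alphas], axis=1)

K, p = 9, 2
alphas = total_degree_multi_indices(K, p)
assert len(alphas) == comb(K + p, p)                 # P = (K+p)! / (K! p!)
Psi = design_matrix(np.random.rand(5, K), alphas)    # shape (5, P)
```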

    Assume that $f(\cdot;\theta) = \sum_{i=0}^{P-1} \theta_i \psi_i(\cdot)$. Under the assumption that the data is corrupted by additive Gaussian white noise, the negative log-likelihood (i.e. data misfit) takes the form

    $\Phi(\theta) = \sum_{i=1}^{N} \big|f(x^{(i)};\theta) - y^{(i)}\big|^2. \qquad (3.1)$

    Let $X = [x^{(1)}, \dots, x^{(N)}]^T$ and $Y = [y^{(1)}, \dots, y^{(N)}]^T$, and let $\Psi(X)_{ij} = \psi_j(x^{(i)})$. Then

    $\Phi(\theta) = |\Psi(X)\theta - Y|^2,$

    and the maximum likelihood solution $\theta^* \in \mathbb{R}^P$ is given by

    $\theta^* = \big(\Psi(X)^T \Psi(X)\big)^{-1} \Psi(X)^T Y.$

    Predictions of the outputs $Y^*$ for some new inputs $X^*$ are later given by

    $Y^* = \Psi(X^*)\theta^*.$
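    A minimal sketch of this fit-and-predict pipeline follows; the synthetic two-variable quadratic basis stands in for the paper's nine-feature polynomial basis, and all names and data are illustrative assumptions.

```python
# Sketch of the maximum likelihood GLM fit: build Psi(X), solve the normal
# equations for theta*, and predict Y* = Psi(X*) theta*.
import numpy as np

def features(X):
    """Total-degree-2 basis in 2 variables: [1, x1, x2, x1^2, x1*x2, x2^2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

rng = np.random.default_rng(2)
X_train = rng.uniform(-1, 1, size=(200, 2))
Y_train = 1.0 + 2.0 * X_train[:, 0] - 0.5 * X_train[:, 1] ** 2 \
          + 0.05 * rng.normal(size=200)

Psi = features(X_train)
# theta* = (Psi^T Psi)^{-1} Psi^T Y, computed via a least-squares solve for stability.
theta_star, *_ = np.linalg.lstsq(Psi, Y_train, rcond=None)

X_new = rng.uniform(-1, 1, size=(5, 2))
Y_new = features(X_new) @ theta_star             # predictions Y* = Psi(X*) theta*
```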

    We will also consider log-polynomial regression, where the monomials are composed with $\log$, i.e. the basis functions become $\psi_i \circ \log$, where $\log$ acts component-wise. We then define $\log f(\cdot;\theta) = \sum_{i=0}^{P-1} \theta_i \psi_i(\log(\cdot))$ and the misfit is

    $\Phi(\theta) = \sum_{i=1}^{N} \big|\log f(x^{(i)};\theta) - \log y^{(i)}\big|^2. \qquad (3.2)$

    This is the negative log-likelihood under a multiplicative lognormal noise assumption, as opposed to the additive Gaussian noise assumption of (3.1). To understand the motivation for this, observe that (2.2) is a log-linear model corresponding to the logarithm of (2.1). Therefore, this setup includes the theoretical model prediction, when p=1, and provides a way to systematically improve upon the theoretical model, by increasing p.

    As a final note, as $p$ increases, there is a chance of including more irrelevant parameters, in addition to ones which may be relevant. Therefore, we must regularize somehow. For the GLM, it is natural to adopt a standard $L^2$ Tikhonov regularization term $\lambda|\theta|^2$, resulting in the following modified objective function, for $\lambda > 0$,

    $\bar{\Phi}(\theta) = \Phi(\theta) + \lambda|\theta|^2, \qquad (3.3)$

    which again can be solved exactly:

    $\theta^* = \big(\Psi(X)^T \Psi(X) + \lambda I\big)^{-1} \Psi(X)^T Y,$

    where $I$ is the $P \times P$ identity matrix. One can readily observe that this bounds the norm of the regularized pseudo-inverse operator mapping the observations to the optimal parameter, since the spectrum of $\Psi(X)^T \Psi(X) + \lambda I$ is bounded below by $\lambda$.
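    A hedged sketch of this regularized closed form follows; the stand-in design matrix and the sizes are assumptions for the example rather than the paper's log-polynomial basis.

```python
# Sketch of the regularized solution (3.3): ridge / Tikhonov regression with the
# closed form theta* = (Psi^T Psi + lambda I)^{-1} Psi^T Y.
import numpy as np

def ridge_fit(Psi, Y, lam):
    """Solve (Psi^T Psi + lam I) theta = Psi^T Y; the lam I term keeps the
    system well conditioned (its spectrum is bounded below by lam)."""
    P = Psi.shape[1]
    return np.linalg.solve(Psi.T @ Psi + lam * np.eye(P), Psi.T @ Y)

rng = np.random.default_rng(1)
Psi = rng.normal(size=(2100, 55))                # N = 2100 samples, P = 55 basis functions
Y = Psi @ rng.normal(size=55) + 0.1 * rng.normal(size=2100)
theta_star = ridge_fit(Psi, Y, lam=50.0)         # lambda values used in the paper: 1, 50, 100
Y_pred = Psi @ theta_star
```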


    3.2. Nonlinear models

    In some very particular cases, it may make sense to invoke nonlinear models, at the significant additional expense of solving a nonlinear least squares problem (in the easiest instance) as opposed to the linear least squares solution presented above. The most notable example is deep neural networks. It has been shown that these models perform very well for tasks such as handwritten digit classification and voice and object recognition [3], and there has recently even been mathematical analysis that substantiates this [5]. However, [6] shows that one can sometimes find very small perturbations of input values which lead to misclassification. The methods can also be used for regression problems, and will be considered below.


    3.2.1. Deep neural networks

    Let $\theta_i \in \mathbb{R}^{P_i}$ (with $P = \sum_{i=1}^{L} P_i$, where $L$ is the number of layers) be parameters and $g_i$ be some possibly nonlinear function corresponding to layer $i$, and let

    $f(\cdot;\theta) = g_L\big(\cdots g_2\big(g_1(\cdot;\theta_1);\theta_2\big)\cdots;\theta_L\big).$

    As an example, $\theta_i = (A_i, b_i)$ is typically split into parameters $A_i \in \mathbb{R}^{n_i \times n_{i-1}}$ and $b_i \in \mathbb{R}^{n_i}$, defining an affine transformation $A_i x + b_i$, and $g_i(x;\theta_i) = \sigma_i(A_i x + b_i)$, where $\sigma_i$ is a simple nonlinear "activation function" which is applied element-wise. In this case $n_i$ is the number of auxiliary variables, or "neurons", on level $i$, $n_0 = K$ is the input dimension, $n_L = 1$ is the output dimension, and there are $L$ levels. For regression one typically takes $\sigma_L = \mathrm{Id}$, the identity map, as will be done here.

    For the neural networks used here, $\sigma_i = \sigma : \mathbb{R} \to \mathbb{R}_+$, for $i < L$, is the rectified linear unit (ReLU) defined by

    $\sigma(z) = \max\{0, z\}. \qquad (3.4)$

    This activation function contains no trainable parameters, and we reiterate that it is applied element-wise to yield a function $\mathbb{R}^{n_i} \to \mathbb{R}_+^{n_i}$.
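    For concreteness, the following NumPy sketch evaluates the forward pass of the three-layer architecture summarized later in Table 1. The paper's networks are built and trained in Mathematica on top of MXNet; the initialization and function names here are assumptions made for the example.

```python
# Illustrative sketch: forward pass of the 3-layer architecture of Table 1,
# f(x) = A_3 relu(A_2 relu(A_1 x + b_1) + b_2) + b_3, with sizes 9 -> 20 -> 20 -> 1.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)                    # elementwise sigma(z) = max{0, z}

def init_params(sizes=(9, 20, 20, 1), seed=0):
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Hidden layers use ReLU; the last layer is linear (sigma_L = Id) for regression."""
    for A, b in params[:-1]:
        x = relu(A @ x + b)
    A, b = params[-1]
    return A @ x + b

params = init_params()
n_params = sum(A.size + b.size for A, b in params)   # 200 + 420 + 21 = 641, as in Table 1
y_hat = forward(params, np.random.rand(9))
```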

    The misfit again takes the form

    $\Phi(\theta) = \sum_{i=1}^{N} \big|f(x^{(i)};\theta) - y^{(i)}\big|^2, \qquad (3.5)$

    except now we have a nonlinear and typically non-convex optimization problem.

    There are many methods that can be employed to try to find the minimum of this objective function, and they will generally be successful in finding some minimum (though perhaps not a global one). Due to the mathematically simple functions that make up the neural network, derivatives with respect to the unknown parameters can be determined analytically. A popular method to solve non-convex optimization problems is stochastic gradient descent, or the Robbins-Monro stochastic approximation algorithm [4,11], which is defined as follows. Let $\widehat{\nabla\Phi}$ be an unbiased estimate of $\nabla\Phi$, i.e. a random variable such that $\mathbb{E}\,\widehat{\nabla\Phi}(\theta) = \nabla\Phi(\theta)$. Then let

    $\theta_{i+1} = \theta_i - \gamma_i \widehat{\nabla\Phi}(\theta_i). \qquad (3.6)$

    If $\{\gamma_i\}$ are chosen such that $\sum_i \gamma_i = \infty$ and $\sum_i \gamma_i^2 < \infty$, then the algorithm is guaranteed to converge to a minimizer [4,11], which is global if the function is convex [2]. Noting that we can compute

    $\nabla\Phi(\theta) = \sum_{i=1}^{N} \nabla_\theta \ell\big(f(x^{(i)};\theta), y^{(i)}\big)$

    easily in this case, one could use for example $\widehat{\nabla\Phi}(\theta) = \nabla\Phi(\theta) + \xi$, where $\xi \in \mathbb{R}^P$ is some random variable, for example a Gaussian, with $\mathbb{E}\xi = 0$. Something more clever and natural can be done in the particular case of (3.5). Recall that $N$ can be very large. Observe that (3.5) is, up to a multiple of $N$, an equally weighted average of the summands $\ell\big(f(x^{(i)};\theta), y^{(i)}\big)$. Let $\{n_i\}_{i=1}^{N_{\mathrm{sub}}}$ be a random (or deterministic) subset of $N_{\mathrm{sub}} < N$ indices, where $N_{\mathrm{sub}}$ could be 1. Then the following provides an unbiased estimator which is also much cheaper than the original gradient (since it uses a subset of the data):

    $\widehat{\nabla\Phi}(\theta) := \frac{N}{N_{\mathrm{sub}}} \sum_{i=1}^{N_{\mathrm{sub}}} \nabla_\theta \ell\big(f(x^{(n_i)};\theta), y^{(n_i)}\big).$

    Naturally, the variance of this estimator will affect the convergence time of the algorithm, and we refer the interested reader to the monograph [4] for discussion of the finer points of this family of algorithms. Neural network models were implemented and trained using the Mathematica neural network functions built on top of the MXNet framework.
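    The following sketch illustrates the update (3.6) with the minibatch gradient estimator, applied to the quadratic GLM misfit so that the gradient is available in closed form. It is illustrative only, with synthetic data and an assumed step-size schedule, and does not reproduce the authors' Mathematica/MXNet training.

```python
# Sketch of update (3.6) with the minibatch estimator, for
# Phi(theta) = sum_i |psi(x_i)^T theta - y_i|^2, whose gradient is 2 Psi^T (Psi theta - Y).
import numpy as np

rng = np.random.default_rng(0)
N, P, N_sub = 2100, 55, 32
Psi = rng.normal(size=(N, P))
Y = Psi @ rng.normal(size=P) + 0.1 * rng.normal(size=N)

theta = np.zeros(P)
for it in range(1, 5001):
    gamma = 0.05 / (N * (1.0 + 0.01 * it))       # sum of gamma diverges, sum of gamma^2 converges
    idx = rng.choice(N, size=N_sub, replace=False)
    # Unbiased estimate of grad Phi: (N / N_sub) * sum over the minibatch of
    # 2 psi(x_i) (psi(x_i)^T theta - y_i).
    grad_hat = (N / N_sub) * 2.0 * Psi[idx].T @ (Psi[idx] @ theta - Y[idx])
    theta -= gamma * grad_hat
```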

    Increasing the number of free parameters, by increasing the depth or width of the network, allows more flexibility and can provide a better approximation to complex input-output relationships. However, as with the GLM, increasing the number of degrees of freedom in the optimization problem also increases the susceptibility of the algorithm to instability from fitting noise in the data too accurately. A Tikhonov regularization could be employed again, but here we appeal to another strategy which is applicable only to DNN. In particular, the training data is randomly divided into $N$ training samples and $N_{\mathrm{val}}$ validation samples. Algorithm (3.6) is run on the training data, while the loss on the validation data is computed along the way,

    $L(\theta) = \sum_{i=N+1}^{N+N_{\mathrm{val}}} \big|f(x^{(i)};\theta) - y^{(i)}\big|^2,$

    and the network with the smallest loss on the validation data is retained at the end. This serves as a regularization, i.e. it avoids fitting noise, since the training and validation sets are corrupted by different realizations of noise.
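    A minimal sketch of this retain-the-best-validation-loss strategy follows; `sgd_step` and `loss` are hypothetical placeholders for one update of (3.6) and the squared-error loss, not functions from the paper.

```python
# Minimal sketch of regularization by validation: track the validation loss
# L(theta) during training and retain the parameters with the smallest value seen.
import copy

def train_with_validation(theta, train_data, val_data, sgd_step, loss, n_iters=10000):
    best_theta, best_val = copy.deepcopy(theta), float("inf")
    for it in range(n_iters):
        theta = sgd_step(theta, train_data, it)      # one stochastic gradient update (3.6)
        val = loss(theta, val_data)                  # L(theta) on the held-out set
        if val < best_val:                           # keep the best network seen so far
            best_val, best_theta = val, copy.deepcopy(theta)
    return best_theta, best_val
```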


    4. Numerical results

    We consider a data set of 2822 data points with $K = 9$ input features and a single output. A subset of $N = 2100$ data points will be used for training, with the remaining $N_{\mathrm{out}} = 722$ data points set aside for testing. General linear and log-linear models of total order up to 6 are considered, as well as deep neural networks.


    4.1. GLM

    An empirically chosen regularization of $\lambda = 1$ for $p = 2$, $\lambda = 50$ for $p = 3$, and $\lambda = 100$ for $p > 3$ is used in (3.3). Figure 1 presents the results for the theoretical log-linear model, as well as the best fit based on an expansion in log polynomials of total order 6, for the out-of-sample set of testing data. It is clear already that the enriched model set brings negligible improvement in the reconstruction fidelity. The left subpanels of Figure 4 show some slices over the various covariates, where the other covariates are fixed at a value from the center of the data set. This gives some idea of how the reconstruction behaves, although one should bear in mind that the data set forms a manifold, not a hypercube, in $\mathbb{R}^9$, and so these slices are bound to include unobserved subsets of the parameter space. Furthermore, there may be sparse coverage even where there is data. This can also be observed in Figure 4, which illustrates the single-component marginal histograms over the data.

    Figure 1. Comparison of true values with the GLM prediction, for the log-linear (left) and log sixth-order (right) models. The right subplots show the error distribution histogram (difference between the truth and the prediction). Out-of-sample, test data.
    Figure 2. Relative $L^2$ error and coefficient of determination $R^2$ for out-of-sample test data as a function of model complexity, for log-polynomial models (left) and direct polynomial models (right).
    Figure 3. Histograms of the 3 DNN instances and the log-linear model. The right subplots show the error distribution histogram (difference between the truth and the prediction).
    Figure 4. Scaling and input data distribution for the log-linear, sixth-order, and NN models. The left subplots show a slice through the parameter space of the various models, giving an indication of their behaviour relative to one another. The right subplots show the marginal histograms for the given parameter. One can observe a large deviation between the curves where the data is quite sparse, particularly for the higher-order polynomial models.

    Figure 2 shows the relative $L^2$ fidelity and the coefficient of determination $R^2$, defined below, for log-polynomial and polynomial models of increasing order.

    The relative error $L^2_{\mathrm{rel}}(f)$ of the surrogate $f$ is given by

    $L^2_{\mathrm{rel}}(f) := \left( \frac{\sum_{i=N+1}^{N+N_{\mathrm{out}}} \big|f(x^{(i)};\theta) - y^{(i)}\big|^2}{\sum_{i=N+1}^{N+N_{\mathrm{out}}} \big|y^{(i)}\big|^2} \right)^{1/2}.$

    The coefficient of determination $R^2(f)$ is given by

    $R^2(f) := 1 - \frac{\sum_{i=N+1}^{N+N_{\mathrm{out}}} \big|f(x^{(i)};\theta) - y^{(i)}\big|^2}{\sum_{i=N+1}^{N+N_{\mathrm{out}}} \big|y^{(i)} - \bar{y}\big|^2},$

    where

    $\bar{y} = \frac{1}{N_{\mathrm{out}}} \sum_{i=N+1}^{N+N_{\mathrm{out}}} y^{(i)}.$
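    These two measures are straightforward to compute; a small sketch (with synthetic stand-in values and assumed function names) follows.

```python
# Sketch of the two fidelity measures on the held-out test set, following the
# norm-ratio form of the relative error defined above.
import numpy as np

def l2_rel(y_pred, y_true):
    """Relative L2 error: ||y_pred - y_true||_2 / ||y_true||_2."""
    return np.linalg.norm(y_pred - y_true) / np.linalg.norm(y_true)

def r_squared(y_pred, y_true):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Example with synthetic values standing in for the test predictions.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(l2_rel(y_pred, y_true), r_squared(y_pred, y_true))
```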

    Both models begin to saturate at the same level of fidelity, indicating that this is a fundamental limitation of the information in the data. Furthermore, one observes from Figure 2 that the saturation level of the fidelity occurs with only a few percentage points of improvement over the theoretically derived log-linear model.


    4.2. Neural networks

    The neural network models were set up and trained using the built-in neural network functions of Mathematica, which are built around the MXNet machine learning framework. The $N_{\mathrm{val}} = N_{\mathrm{out}}$ samples are used in this case for validation, as described at the end of Subsection 3.2.1.

    By connecting various combinations of layers, the neural network model gains the flexibility to model the behavior of the training data. As the size of the network increases, so does the number of unknown parameters in the model. If there is too much flexibility, then the network model begins to conform to the noise in the data. In addition, due to the nature of gradient descent methods, the minimum reached is a local minimum dependent on the starting position. As the number of unknown parameters increases, in particular if it exceeds the amount of training data, one expects the inversion to become unstable, with an increased risk of converging to an undesirable local minimum.

    To reduce the degeneracy of the objective function, the total number of tunable weights and biases in the network will be kept below $N$, although this could be relaxed if a Tikhonov or other type of regularization were employed. Table 1 shows the makeup of the neural network model used here. Initial weights and biases are chosen at random. The networks are trained multiple times with random initial parameters to explore various local modes. Three examples are shown in Figures 3 and 4. Figure 3 shows four pairs of images, with the left of each pair being the output in comparison to the true value, and the right being a histogram of the difference between the two. The left two pairs and the top right pair all correspond to different instances of the same network optimization, but with different random selections of training data and different initial guesses. The bottom right pair is the best-fit log-linear model (2.2).

    Table 1. Description of the 3-layer DNN.
    Layer Type | Inputs | Width | Weights | Biases | Total Parameters
    Linear     | 9      | 20    | 180     | 20     | 200
    ReLU       | 20     | 20    | 0       | 0      | 0
    Linear     | 20     | 20    | 400     | 20     | 420
    ReLU       | 20     | 20    | 0       | 0      | 0
    Linear     | 20     | 1     | 20      | 1      | 21
    Total      |        |       |         |        | 641

    The $L^2_{\mathrm{rel}}$ and $R^2$ fidelity measures are shown in Table 2. What one observes here is that one finds almost no improvement at all over the theoretical log-linear model with this $P = 641$ parameter neural network, while in fact we do manage to improve the accuracy by a few percent with the higher-order GLM using $P = 5005$ parameters (see Figure 2). Figure 4 shows 9 pairs of panels, one for each parameter. This indicates that the models perform better where the data is more dense. Also, the higher-order polynomial models can behave wildly outside the support of the data; see for example the bottom-left three panels.

    Table 2. Total loss of all available data.
    Network   | Validation loss | $1 - L^2_{\mathrm{rel}}$ | $R^2$
    3 Layer 1 | 0.0148          | 0.915                    | 0.963
    3 Layer 2 | 0.0146          | 0.917                    | 0.964
    3 Layer 3 | 0.0140          | 0.918                    | 0.965
    Linlog    | 0.0161          | 0.914                    | 0.960

    Finally, Figure 5 shows the pairwise marginal histograms, which gives a slightly better indication of the shape of the data distribution.

    Figure 5. Pairwise marginal histograms of the input data X. This gives some idea of the pairwise correlations between the different features over the input data.

    5. Summary

    In this work we considered fitting a theory-based log-linear ansatz, as well as higher-order approximations, for the thermal energy confinement of a Tokamak. It is a supervised machine learning regression problem to identify the map $\mathbb{R}^K \to \mathbb{R}_+$ with $K = 9$ inputs. We parametrized the map using general linear models based on total order polynomials up to degree 6 ($P = 5005$ parameters), as well as deep neural networks with up to $P = 641$ parameters. The results indicate that the theory-based model fits the data almost as well as the more sophisticated machines, within the support of the data set. The conclusion we arrive at is that only negligible improvements can be made to the theoretical model, for input data of this type. Further work includes (i) Bayesian formulation of the inversion for uncertainty quantification (UQ), or perhaps derivative-based UQ (sensitivity), (ii) inclusion of further hyper-parameters in the GLM for improved regularization within an empirical Bayesian framework, (iii) using other types of regularization for the deep neural network, in order to allow for more parameters, and (iv) exploring alternative machine learning methods, such as Gaussian processes, which naturally include UQ.


    Acknowledgments

    This work has been supported by the U.S. Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under awards DE-FG02-04ER54761 and DE-FC02-04ER54698.


    Conflict of interest

    All authors declare no additional conflicts of interest in this paper.


    [1] Bishop CM (2006) Pattern Recognition and Machine Learning, Springer.
    [2] Boyd S, Vandenberghe L (2004) Convex Optimization, Cambridge University Press.
    [3] Goodfellow I, Bengio Y, Courville A (2016) Deep Learning, MIT Press.
    [4] Kushner H, Yin GG (2003) Stochastic approximation and recursive algorithms and applications, Springer Science & Business Media, 35.
    [5] Mallat S (2016) Understanding deep convolutional networks. Phil Trans R Soc A 374: 20150203. doi: 10.1098/rsta.2015.0203
    [6] Moosavi-Dezfooli SM, Fawzi A, Frossard P (2016) Deepfool: A simple and accurate method to fool deep neural networks, In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2574–2582.
    [7] Murphy KP (2012) Machine Learning: A Probabilistic Perspective, MIT Press.
    [8] Park JM, Ferron JR, Holcomb CT, et al. (2018) Integrated modeling of high $\beta_N$ steady-state scenario on DIII-D. Phys Plasmas 25: 012506. doi: 10.1063/1.5013021
    [9] Park JM, Staebler G, Snyder PB, et al. (2018) Theory-based scaling of energy confinement time for future reactor design. Available from: http://ocs.ciemat.es/EPS2018ABS/pdf/P5.1096.pdf.
    [10] Rasmussen CE (2003) Gaussian processes in machine learning, MIT Press.
    [11] Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat: 400–407.
    [12] Snyder PB, Burrell KH, Wilson HR, et al. (2007) Stability and dynamics of the edge pedestal in the low collisionality regime: Physics mechanisms for steady-state ELM-free operation. Nucl Fusion 47: 961. doi: 10.1088/0029-5515/47/8/030
    [13] Staebler GM, Kinsey JE, Waltz RE (2007) A theory-based transport model with comprehensive physics. Phys Plasmas 14: 055909. doi: 10.1063/1.2436852
  • © 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)