
When planning the development of future energy resources, electrical infrastructure, transportation networks, agriculture, and many other societally significant systems, policy makers require accurate and high-resolution data reflecting different climate scenarios. There is widely documented evidence that perceptual loss can generate perceptually realistic results when mapping low-resolution inputs to high-resolution outputs, but its application is currently limited to images. In this paper, we study perceptual loss when increasing the resolution of raw precipitation data by ×4 and ×8 under CNN and GAN training modes. We examine the difference in the perceptual loss calculated from different layers of feature maps and demonstrate that low- and mid-level feature maps can yield results comparable to pixel-wise loss. In particular, from both qualitative and quantitative points of view, Conv2_1 and Conv3_1 offer the best compromise between obtaining detailed information and containing the overall error in our case. We propose a new approach that benefits from perceptual loss while considering the characteristics of climate data. We show that, in comparison to calculating perceptual loss directly on the entire sample, our proposed approach can recover detailed information in extreme-event regions while reducing error.
Citation: Yang Wang, Hassan A. Karimi. Perceptual loss function for generating high-resolution climate data[J]. Applied Computing and Intelligence, 2022, 2(2): 152-172. doi: 10.3934/aci.2022009
A wide range of research areas can benefit from climatology data on a variety of spatial and temporal scales [1,2]. For example, in terms of temporal scales, short-term precipitation data can be used for flood forecasting, and long-term precipitation patterns can be utilized for urban planning and policy development. For spatial scales, it is possible to understand the characteristics of temperature distribution on a global scale from large-scale temperature data, whereas small-scale solar data can provide information on the pattern of solar energy resource distribution at the regional level [3]. Analyzing small-scale climate data is particularly challenging since the data needs to be at a high resolution. Global climate models (GCMs) are among the most commonly used sources of climate data in current studies [4], but they provide climate data at a coarse spatial scale. Additionally, because scale-specific physics, computational resources, and time frames are customized into numerical models for each application, simulating all desired scales is generally intractable.
Various interpolation methods are available for generating high-resolution (HR) climate data from corresponding low-resolution (LR) data, but interpolation tends to smooth out the resulting high-resolution data, which removes texture and sharp edge information on small scales [5]. For this reason, interest has recently increased in applying deep learning models, originally developed for computer vision, to generate high-resolution climate data. Two examples of such models are the super-resolution convolutional neural network (SRCNN) [6] and the super-resolution generative adversarial network (SRGAN) [7]. While many studies have demonstrated superior performance by SRCNN and SRGAN in solving climate-related problems [8,9,10], recent developments in deep neural networks suggest new thinking and a huge potential in climate research. However, despite their impressive results, the loss function in such networks is pixel-wise, where the mean squared error (MSE) is used to measure the difference between a network output and the ground truth. One disadvantage of pixel-wise loss functions is that they lead to blurry and over-smoothed HR data by forcing the model to generate a result that is the average of all plausible solutions [11]. When an algorithm tries to minimize per-pixel MSE, it overlooks subtle textures crucial to human perception.
In comparing two images, human vision does not compare pixel by pixel, but rather extracts and compares features. Inspired by this idea, perceptual loss functions have recently shown significant improvements in computer vision for dealing with HR image data [7,12,13]. Pixel-wise loss is calculated by directly comparing the high-resolution results generated by the model with the true values. In contrast, perceptual loss calculates the loss between the simulated and true values in the feature space, which is extracted by pre-trained feature extractors such as very deep convolutional networks, for example, VGGs [14]. A pre-trained feature extractor is the basis of perceptual loss. In the early layers of the network, each neuron is connected to a small receptive field, resulting in the extraction of predominantly low-level spatial information, which captures the detailed information of the input. As the network goes deeper, the receptive field gradually increases, and the information extracted in the deeper layers changes from low-level spatial information to global semantic information, while fine-grained spatial details are discarded [15,16]. By comparing model results with high-level information from the ground truth, such as content and global structure, perceptual loss keeps the model results perceptually close to the ground truth. As a result, the problem of over-smoothing caused by per-pixel loss functions is resolved.
The idea of perceptual loss has been applied not only to image problems, but also to audio pattern representation [17]. In terms of generating high-resolution climate data, however, the application of a perceptual loss function is limited. Climate data are typically presented as raster data, with each cell representing the average value of a particular climate variable within that cell. Since the feature extractors used in perceptual loss are usually trained with natural images, it is unknown whether they are suitable for extracting features from raster-type climate data [8]. The deep layers of the neural network in pre-trained feature extractors contain the semantic information of the input [18], which may not be reliable for climate data. However, recent studies have demonstrated that pre-trained models derive their extraction power more from the network structure than from the trained weights. A multilayer CNN may be used to capture the dependencies between statistics of variables at multiple levels without the need for any learning [19]. This insight weakens the dependence on pre-training and on the specific network structures (typically VGG) for which perceptual loss has been used, thereby allowing a broader range of applications to be analyzed.
Perceptual loss, especially VGG-based perceptual loss, has been widely demonstrated in computer vision research to produce more realistic results than per-pixel loss, especially in solving image smoothing issues. However, the application of perceptual loss is currently limited to the image domain, and to our knowledge there are no studies that verify the performance of perceptual loss for raster-type climate data. To fill this gap, in this study, we investigate whether perceptual loss computed with a feature extractor pre-trained on natural images can be used to generate high-resolution climate data. For this, we first verify the impact of using different layers of feature maps to calculate perceptual loss in generating high-resolution climate data and investigate its performance in CNN and GAN both qualitatively and quantitatively. According to the results of the study, low- and medium-level feature maps provide results comparable to per-pixel loss, with richer detailed information at the cost of a somewhat larger root mean square error (RMSE). Based on the understanding of the results from different levels of feature maps, we propose a new method that benefits from the traditional perceptual loss. Using the proposed method, we demonstrate that it recovers good detailed information while sacrificing less overall accuracy than calculating perceptual loss directly on the whole sample.
The structure of the paper is as follows: Section 2 introduces related work on generating HR climate data and perceptual loss. Section 3 presents the methodology, and Section 4 presents the experimental setup. Section 5 describes the results. The paper concludes with a discussion in Section 6 and concluding remarks in Section 7.
In this section, we review relevant approaches used to generate HR climate data and perceptual loss in image-related studies.
Climate data are downscaled by means of dynamical and statistical approaches to mitigate the lack of spatial resolution. Dynamical downscaling, based on regional climate models (RCMs), is driven by low-resolution, large-scale data as boundary conditions and results in high-resolution data with small-scale climate projections. Dynamical downscaling is not transferable across regions and is computationally demanding. On the contrary, statistical downscaling obtains HR climate data by establishing a statistical relationship between HR and LR pairs. In past studies, linear regression models [20], support vector machines [21], and random forests [22] have been applied to establish statistical relationships between HR and LR data. Considering that climate data and images are both in raster form, statistical downscaling is similar to the image super-resolution problem. It is becoming increasingly common to use models designed for image super-resolution in order to generate high-resolution climate data. For example, the super-resolution CNN (SRCNN) was the first deep learning model applied to generate high-resolution climate data [23]. [10] also proposed a deep CNN, which successfully improved the resolution of precipitation data by a factor of 5. By adding residual dense blocks (RDBs) to the super-resolution network, [24] was able to improve the quality of the generated HR data. In these works, the models were optimized by using pixel-wise loss functions. Adversarial training has also been applied in some studies. For example, [25] investigated whether Wasserstein generative adversarial networks (WGANs) could generate realistic weather conditions when trained on climate data of a general circulation model (GCM). A super-resolution generative adversarial network (SRGAN) was shown to increase the resolution of climate data by a factor of 50 [8].
To generate realistic HR images, perceptual loss has been proposed as a solution to the problem of over-smoothing and lack of detail in HR images, based on the idea of perceptual similarity [7,12,26]. Perceptual loss, as opposed to pixel-wise loss, compares the output with the ground truth in the feature domain, which is extracted by pre-trained networks; VGG16 and VGG19 are commonly used pre-trained models [12,14]. Typically, perceptual loss is specified as the loss of one or several specific feature layers in a VGG network, and the chosen layers depend on the application [27]. Visualizations of deep CNNs indicate that convolutional features at different levels provide different perspectives on objects and their surroundings [28]. For example, ReLU5_4 is chosen as a high-level feature layer in [7] and [29], while ReLU2_2 is selected as a low-level feature layer in [30]. Instead of relying on one or several identified layers, some studies calculate the perceptual loss based on all feature layers [27]. A number of other studies divide the target image into several parts and calculate the loss of each part by using different feature layers [13,31]. Perceptual loss has also been applied to studies related to remote sensing images [32,33].
We require neural networks for this work in order to transform low-resolution climate data into high-resolution climate data. Given that previous studies have demonstrated that adversarial loss can lead to improved results when applied to high-resolution climate data [8], we compare pixel-wise loss and perceptual loss by using the SRGAN and validate the proposed climatological binary perceptual loss (using a binary 0/1 mask). SRGAN follows the same structure as [7], where the networks are deep, fully convolutional neural networks with 16 residual blocks and skip connections. All convolutional kernels are 3 × 3 and are followed by rectified linear unit activation functions. In contrast to the three RGB channels of image data, our climate input has only one channel, so the model input is modified to one channel. The network consists of two parts: 1) a generator G and 2) a discriminator D. Generator G converts the LR input into HR output. Discriminator D attempts to distinguish between real and generated HR output. Adversarial learning between the two networks forces G to produce increasingly realistic HR output, while D learns to distinguish real from generated HR data more accurately. In the discriminator network, the loss function is defined as follows:
$$ L_{dis} = -\log\big(D(Y^{HR})\big) - \log\big(1 - D(G(X^{LR}))\big) \tag{1} $$
where $X^{LR}$ is the LR input data and $Y^{HR}$ is the HR target. The generator's loss function consists of two main components:
$$ L_{gen} = \alpha \cdot L_{adversarial}(Y^{HR}, X^{LR}) + L_{content}(Y^{HR}, X^{LR}) \tag{2} $$
where $\alpha$ is a weight. The adversarial loss is calculated as follows:
$$ L_{adversarial}(Y^{HR}, X^{LR}) = -\log\big(D(G(X^{LR}))\big) \tag{3} $$
This adversarial loss captures how effective the generator network is at fooling the discriminator. By comparing the generated data with the actual HR data, the content loss $L_{content}$ effectively conditions the output of the generator network; we calculate it in the following three forms: $L_{MSE}$, $L_{VGG}$, and $L_{CBPL}$.
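As a concrete illustration of Eqs. (1)-(3), the following PyTorch-style sketch assembles the discriminator and generator objectives; the helper names, the small epsilon for numerical stability, and the exact call signatures are our own illustrative assumptions rather than the original implementation.

```python
import torch

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # Eq. (1): -log D(Y_HR) - log(1 - D(G(X_LR))), averaged over the batch
    return -(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()

def generator_loss(d_fake, content_loss_value, alpha=1e-3, eps=1e-8):
    # Eq. (3): adversarial term; Eq. (2): weighted sum with a content loss
    adversarial = -torch.log(d_fake + eps).mean()
    return alpha * adversarial + content_loss_value
```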
The mean squared error (MSE) is used in this study to calculate the pixel-wise loss. This is currently the most widely used optimization target in generating high-resolution climate data. The pixel-wise MSE loss is calculated as follows:
$$ L_{MSE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \big( Y^{HR}(x,y) - G(X^{LR})(x,y) \big)^{2} \tag{4} $$
where $W$ and $H$ are the width and height of $Y^{HR}$. Pixel-wise loss calculates the distance between $Y^{HR}$ and $G(X^{LR})$ directly in the data domain. With the MSE loss, the network attempts to find pixel-wise averages of plausible solutions, resulting in poor perceptual quality and a lack of high-frequency details.
Instead of comparing $Y^{HR}$ and $G(X^{LR})$ directly in the data domain, perceptual loss compares the $\ell_2$ (Euclidean) distance between the two in the feature domain. In this study, the feature domain is constructed by using the VGG19 network. VGG19 consists of 19 weight layers, including 16 convolutional layers and 3 fully connected layers. As mentioned earlier, early layers return low-level detailed spatial information, such as edges and speckles. Deeper layers of the network return mid-level information, such as texture, and the final layers return high-level features, which are indicative of global semantic understanding. The perceptual loss is calculated as follows:
$$ L_{VGG} = \frac{1}{W_{i,j} \times H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \big( \phi_{i,j}(Y^{HR})(x,y) - \phi_{i,j}(G(X^{LR}))(x,y) \big)^{2} \tag{5} $$
where $\phi_{i,j}$ denotes the feature map obtained by the $j$-th convolution (before activation) preceding the $i$-th max-pooling layer within the VGG19 network, and $W_{i,j}$ and $H_{i,j}$ are the dimensions of the corresponding feature maps. In order to investigate the effect of different levels of information, i.e., different depths of feature maps, on the result, the perceptual loss is calculated by using the feature maps obtained by conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1, respectively.
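A minimal sketch of how such a feature-space loss could be computed with torchvision's pre-trained VGG19 is shown below; the layer indices for conv1_1 through conv5_1, the single-channel-to-RGB duplication, and the class name are illustrative assumptions consistent with the description above (a recent torchvision API is assumed).

```python
import torch
import torch.nn as nn
from torchvision import models

# Indices of conv1_1 ... conv5_1 inside torchvision's vgg19().features
CONV_X_1 = {1: 0, 2: 5, 3: 10, 4: 19, 5: 28}

class VGGPerceptualLoss(nn.Module):
    """Eq. (5): mean squared distance between conv#_1 feature maps (pre-activation)."""

    def __init__(self, block=2):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        # keep layers up to and including the chosen conv#_1 layer
        self.extractor = nn.Sequential(*list(vgg.children())[:CONV_X_1[block] + 1]).eval()
        for p in self.extractor.parameters():
            p.requires_grad = False

    def forward(self, sr, hr):
        # climate fields are [N, 1, H, W]; duplicate to the 3 channels VGG expects
        sr3, hr3 = sr.repeat(1, 3, 1, 1), hr.repeat(1, 3, 1, 1)
        return nn.functional.mse_loss(self.extractor(sr3), self.extractor(hr3))
```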
Perceptual loss is computed by selecting feature maps at different depths. In image-related studies, most works choose the feature maps of a single layer to calculate the perceptual loss of the whole image. Applying this technique to an entire image without considering semantic information limits its utility. Calculating the perceptual loss of the whole image with a low-level feature map may cause the network to apply unnecessary penalties to medium-level information, such as texture, resulting in a loss of informative features. In contrast, by using high-level feature maps to calculate perceptual loss, the model may lose some fine-grained spatial details, such as sharp edges. Unlike images, samples of climate data often contain limited information. In the case of the precipitation data used in this paper, most regions in a sample have no precipitation or relatively small precipitation values, which means that most grid values remain stable and show no gradient. Only a few areas in a sample receive a significant amount of precipitation, and these areas have a gradient. The network may add spurious information to those stable regions if the perceptual loss is calculated for the entire sample. In addition, these stable regions may affect the feature maps of the regions with precipitation. On this basis, our climatological binary perceptual loss is calculated as follows:
$$ L_{CBPL} = L_{MSE}(Y^{HR}, X^{LR}) + L_{VGG}(Y^{HR} \circ M_{1},\, X^{LR} \circ M_{1}) \tag{6} $$
where $\circ$ denotes element-wise multiplication and $M_1$ is the binary mask defined in Step 2 below.
Step 1. The first step is to calculate the MSE for the entire sample in order to ensure that the generated HR results and the ground truth remain similar.
$$ L_{MSE}(Y^{HR}, X^{LR}) = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \big( Y^{HR}(x,y) - G(X^{LR})(x,y) \big)^{2} \tag{7} $$
Step 2. A given sample of climate data usually has stable values across large areas without gradient variation. For example, in the precipitation data used in this paper, most regions in a sample have zero or very low precipitation values. Calculating the perceptual loss for these regions may insert details into the generated HR data that do not match the actual situation, which increases the error. Meanwhile, using perceptual loss for regions with higher precipitation helps to include more details in the generated HR data. Different regions therefore generally call for different loss functions. In this step, a mask $M_1$ is generated in which the target region has a raster value of 1 and all other regions have a value of 0. The target region can be defined in one of the following three scenarios: 1) a lower threshold limit, where the region in the original sample with values larger than this limit is the target region; 2) an upper threshold limit, where the region in the original sample with values smaller than this limit is the target region; 3) a threshold interval, where the region in the original sample with values within this interval is the target region. In our experiment, most areas in a sample have relatively small precipitation, and only a limited number of areas have large precipitation. Therefore, $M_1$ follows scenario 1, with the threshold set to the 75th percentile of precipitation values in each sample.
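The three threshold scenarios could be realized with a small helper such as the one below; the function name and defaults are illustrative, and the commented line shows the 75th-percentile lower limit used in our experiment.

```python
import torch

def build_mask(sample, lower=None, upper=None):
    """Binary mask M1: 1 inside the target region, 0 elsewhere.
    lower only -> scenario 1 (values above the lower limit)
    upper only -> scenario 2 (values below the upper limit)
    both       -> scenario 3 (values inside the interval)."""
    mask = torch.ones_like(sample)
    if lower is not None:
        mask = mask * (sample > lower).float()
    if upper is not None:
        mask = mask * (sample < upper).float()
    return mask

# scenario 1 with the per-sample 75th percentile as the lower limit:
# mask = build_mask(sample, lower=torch.quantile(sample.flatten(), 0.75))
```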
Step 3. In the third step, we first multiply the ground truth and the generated HR data by the mask separately using element-wise multiplication. After multiplication, only the target region retains its values, while all other regions are zero. The ground truth and the generated HR data after multiplication are then used as input to calculate the perceptual loss. Since the same mask is applied to both the ground truth and the generated HR data, the feature distance only reflects the difference between the target regions. Because VGG has three input channels, the single-channel ground truth and generated HR data are duplicated across three channels before being passed to the perceptual loss.
$$ L_{VGG}(Y^{HR} \circ M_{1},\, X^{LR} \circ M_{1}) = \frac{1}{W_{i,j} \times H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \big( \phi_{i,j}(Y^{HR} \circ M_{1})(x,y) - \phi_{i,j}(G(X^{LR}) \circ M_{1})(x,y) \big)^{2} \tag{8} $$
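Putting the three steps together, one possible sketch of the full loss in Eqs. (6)-(8) is shown below; it reuses the hypothetical VGGPerceptualLoss helper sketched earlier (passed in as vgg_loss) and assumes the mask is derived from the ground-truth field.

```python
import torch
import torch.nn.functional as F

def climatological_binary_perceptual_loss(sr, hr, vgg_loss):
    """sr, hr: [N, 1, H, W] generated and ground-truth precipitation fields."""
    # Step 1: pixel-wise MSE over the whole sample (Eq. 7)
    mse = F.mse_loss(sr, hr)
    # Step 2: per-sample 75th-percentile mask (scenario 1)
    thr = torch.quantile(hr.flatten(1), 0.75, dim=1).view(-1, 1, 1, 1)
    mask = (hr > thr).float()
    # Step 3: perceptual loss restricted to the masked target regions (Eq. 8)
    return mse + vgg_loss(sr * mask, hr * mask)
```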
In this work, we first study the effect of perceptual loss on the generation of high-resolution climate data by using Global Precipitation Measurement (GPM) data. GPM is an international network of satellites, developed in collaboration between the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), that provides the next generation of global observations of precipitation and snow [34,35]. GPM provides precipitation data at a maximum resolution of 0.1° in several product versions; in this study we used GPM_3IMERGDF, a daily precipitation product that combines microwave-IR estimates with gauge calibration. We chose three patches (106.96°W-126.95°W, 25.05°N-45.05°N; 86.85°W-106.85°W, 25.05°N-45.05°N; 66.75°W-86.75°W, 25.05°N-45.05°N).
Figure 1 shows the location of each patch. Each patch has daily precipitation data from 06/01/2000 to 08/31/2021, with a total of 7756 samples, which provides us with 23268 samples of dimension 200 × 200 in total. Since ground truth is required when we evaluate different models, we consider the raw resolution of GPM precipitation data as our HR data and ground truth. We study the results of perceptual loss for resolution improvements of ×4 and ×8, where we downsample the raw GPM data by factors of 4 and 8, respectively, using average pooling over 4 × 4 and 8 × 8 raster windows. The coarsened 0.4° × 0.4° resolution data (50 × 50) and 0.8° × 0.8° resolution data (25 × 25) are used as the LR input for all the experiments; 10% of the training examples are held out for testing in each case.
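The coarsening step can be reproduced with simple average pooling; below is a sketch under the assumption that each 200 × 200 HR sample is stored as a 2-D NumPy array.

```python
import numpy as np

def coarsen(hr, factor):
    """Average-pool a 2-D HR field by an integer factor (e.g. 4 or 8)."""
    h, w = hr.shape
    return hr.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# e.g. a 200 x 200 GPM patch -> 50 x 50 (x4) or 25 x 25 (x8) LR input
```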
A daily precipitation dataset produced by the GFDL-ESM4 model from the Coupled Model Intercomparison Project Phase 6 (CMIP6) was obtained to verify the results of the proposed method on GCM data [36]. We used daily precipitation from January 2022 to January 2023 with a spatial resolution of 1° latitude × 1.25° longitude under the SSP1-2.6 scenario. To minimize distortion near the poles, we only use data within latitudes ±60°. For validation, we used the RMSE and Pearson's correlation to evaluate the overall quantitative performance of the model.
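For reference, the two validation metrics can be computed as in the short sketch below (the array names are illustrative).

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error over all grid cells and samples."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def pearson_corr(pred, truth):
    """Pearson correlation between flattened prediction and ground truth."""
    return float(np.corrcoef(pred.ravel(), truth.ravel())[0, 1])
```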
For training, we first trained the generator network alone for 1000 loops. On the one hand, this step enabled the generator to roughly learn the LR-to-HR mapping before adversarial learning, serving as a warm-up that helps improve the results of GANs. On the other hand, since the loss function in this stage did not include the adversarial loss, we were able to compare pixel-wise loss and perceptual loss in a pure CNN training setting. After the warm-up of the generator network was completed, we continued with the training of the GANs. The generator and discriminator were trained sequentially for 500 loops with α = 0.001. The loss functions used in each network are summarized in Table 1, and a schematic of the two-stage training schedule is sketched after the table.
Table 1. Loss functions used in each network.

| Network | Loss |
| --- | --- |
| PretrainedCNN_MSE | $L_{MSE}$ |
| GANs_MSE | $L_{MSE}$ + $L_{adversarial}$ |
| PretrainedCNN_# | $L_{VGG}$ using the conv#_1 feature map |
| GANs_# | $L_{VGG}$ using the conv#_1 feature map + $L_{adversarial}$ |
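As referenced above, the two-stage schedule could be organized along the following lines; the optimizer choice, learning rate, and batch handling are our own assumptions, and the sketch reuses the hypothetical discriminator_loss / generator_loss helpers and a content_loss callable (e.g. pixel-wise MSE or the VGG-based loss) sketched earlier.

```python
import torch

def train(generator, discriminator, loader, content_loss, alpha=1e-3,
          warmup_loops=1000, gan_loops=500, lr=1e-4, device="cuda"):
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)

    # Stage 1: warm up the generator with the content loss only (no adversarial term)
    for _ in range(warmup_loops):
        lr_batch, hr_batch = next(iter(loader))
        lr_batch, hr_batch = lr_batch.to(device), hr_batch.to(device)
        g_opt.zero_grad()
        content_loss(generator(lr_batch), hr_batch).backward()
        g_opt.step()

    # Stage 2: adversarial training, updating the discriminator and generator sequentially
    for _ in range(gan_loops):
        lr_batch, hr_batch = next(iter(loader))
        lr_batch, hr_batch = lr_batch.to(device), hr_batch.to(device)
        fake = generator(lr_batch)

        d_opt.zero_grad()
        discriminator_loss(discriminator(hr_batch), discriminator(fake.detach())).backward()
        d_opt.step()

        g_opt.zero_grad()
        generator_loss(discriminator(fake), content_loss(fake, hr_batch), alpha).backward()
        g_opt.step()
```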
Given that the VGG19 network is trained on natural images, its ability to extract features from other raster types, such as climate data, is unknown. For comparison, the feature maps obtained by VGG19 from a climate sample are shown in Figure 2 along with the feature maps obtained from one natural image at the corresponding layers. Although it is difficult to analyze these feature maps quantitatively, we can see that, as with natural images, conv1_1 pays more attention to detailed climate information, such as boundaries and edges. As the layers deepen, the information presented by the feature maps becomes increasingly abstract, implying that the deeper layers attend to global semantic information. Additionally, we find that the average percentage of neurons that are activated in each layer is similar for climate data and natural images. For example, the activation percentages for conv1_1 are 64% and 59%, respectively; for conv5_1 they are 23% and 26%. The VGG network is therefore considered a reliable feature extractor for climate data.
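The activation percentage reported here can be read as the fraction of positive values in a layer's output; a trivial sketch of one possible way to compute it is given below.

```python
import torch

def activation_percentage(feature_map):
    """Percentage of neurons with positive (activated) values in a feature map."""
    return float((feature_map > 0).float().mean()) * 100.0
```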
We used various layers of feature maps to calculate the perceptual loss and examined the effect of different levels of information on the resulting HR climate data. Quantitative results are summarized in Table 2 and Table 3. As can be seen, the RMSE of PretrainedCNNRMSE is the smallest for both the ×4 and ×8 results, at 0.67 and 1.19, respectively. As the applied feature maps deepen, the RMSE of perceptual loss increases, and both PretrainedCNN and GANs exhibit the same trend. In particular, when the perceptual loss is calculated using conv5_1, the error increases significantly. As can be seen from the quantitative results, except for conv5_1, the results produced by all layers are not significantly different from those produced by using MSE as the loss. We compared the results qualitatively in order to further illustrate the performance of perceptual loss.
Table 2. Quantitative results for the ×4 models.

| Loss function | Pretrained CNN RMSE (mm/day) | Pretrained CNN correlation | GAN RMSE (mm/day) | GAN correlation |
| --- | --- | --- | --- | --- |
| MSE | 0.67 | 0.98 | 0.73 | 0.98 |
| conv1_1 | 0.81 | 0.98 | 0.83 | 0.98 |
| conv2_1 | 0.93 | 0.98 | 0.96 | 0.98 |
| conv3_1 | 0.97 | 0.98 | 0.98 | 0.98 |
| conv4_1 | 1.12 | 0.98 | 1.02 | 0.98 |
| conv5_1 | 2.36 | 0.97 | 1.91 | 0.97 |
Table 3. Quantitative results for the ×8 models.

| Loss function | Pretrained CNN RMSE (mm/day) | Pretrained CNN correlation | GAN RMSE (mm/day) | GAN correlation |
| --- | --- | --- | --- | --- |
| MSE | 1.19 | 0.97 | 1.30 | 0.97 |
| conv1_1 | 1.42 | 0.97 | 1.44 | 0.96 |
| conv2_1 | 1.51 | 0.96 | 1.59 | 0.96 |
| conv3_1 | 1.46 | 0.96 | 1.50 | 0.96 |
| conv4_1 | 1.81 | 0.96 | 1.71 | 0.95 |
| conv5_1 | 4.94 | 0.94 | 3.85 | 0.94 |
Figure 3 shows the results of PretrainedCNNRMSE and GANsRMSE, with the first row showing the ×4 results and the second row the ×8 results. In the training of PretrainedCNNRMSE, the loss is the per-pixel loss alone, while the loss for GANsRMSE includes both the pixel-wise loss and the adversarial loss. Although PretrainedCNNRMSE's results are slightly better than GANsRMSE's in terms of metrics, the figure shows that the result of PretrainedCNNRMSE is over-smoothed, resulting in a lack of detailed information. As an example, in the PretrainedCNNRMSE result corresponding to Ground Truth (20051216), detailed information is missing in the lower right corner of the region. Nevertheless, when the adversarial loss is included, the results of the GANs are clearly superior to those of PretrainedCNNRMSE with respect to detailed information. In contrast, Figure 4 shows PretrainedCNN and GANs trained with perceptual loss. Even without adversarial loss, perceptual loss contributes some detailed information to the results of PretrainedCNN. In the training of GANs, the addition of adversarial loss forces the detailed information generated by perceptual loss to be closer to the real results.
The per-pixel loss used as the loss function is consistent with our final evaluation metric in Table 2 and Table 3. Accordingly, a model trained to minimize per-pixel loss will, on that metric, always outperform one trained to minimize perceptual loss or adversarial loss, which encourages the generated HR data to be perceptually similar to the target HR data but does not require an exact match. This explains why PretrainedCNNRMSE is best in terms of the metric. PretrainedCNN in particular optimizes only the content-based loss, resulting in safer HR fields, i.e., over-smoothed predictions of the ground truth. From the difference between the results of PretrainedCNN under per-pixel loss and perceptual loss, we find that, although the parameters of VGG19 are trained on natural images, the medium- and low-level feature maps from VGG19 remain effective for climate data. Even though perceptual loss achieves worse evaluation metrics, it makes more aggressive predictions, inserting significantly more small-scale features that better reflect the nature of real precipitation than the results under per-pixel loss.
Figure 5 compares the results of GANs under different loss conditions. We also show the results of bicubic interpolation, which are evidently over-smoothed when compared to the ground truth. By adding the adversarial loss, GANsRMSE weakens the smoothing compared to PretrainedCNNRMSE and adds more detailed information. It should be noted, however, that GANsRMSE provides a smoother result than the GANs using perceptual loss; for example, see the area with more precipitation on the right side of the region in Figure 5. For the GANs using perceptual loss, we observe that as the feature maps become deeper, i.e., from Conv1_1 to Conv5_1, the results are less smooth and the HR data are produced with more detailed information. The negative effect of this very aggressive generation is an increase in RMSE, i.e., a larger deviation from the ground truth. It is important to observe from Table 2 and Table 3 that Conv5_1 produced significantly worse metrics. In reviewing the results of Conv5_1, we identified two issues that contributed to this outcome. First, for some samples, the results of Conv5_1 deviate significantly from the ground truth. Second, some HR results are similar to the ground truth but contain significant checkerboard artifacts. In view of the large error and instability of the results generated by Conv5_1, we do not include Conv5_1 in subsequent analyses.
As previously stated, for climate data we are more concerned with the performance in those regions of the generated HR results that contain extreme events. To compare the results of different loss functions in these regions, different quantitative thresholds are used to test the ability of each method to capture extreme events. At each grid cell in our study area, all precipitation values above the percentile threshold were selected first, and then the RMSE was calculated based on the selected data. This was performed for a range of percentiles between 75 and 99.9 and averaged over all locations and all samples. As can be seen from Figure 6, in both the ×4 and ×8 models, GANs1, GANs2, and GANs3 obtain the smaller RMSEs in the extreme-event regions. Specifically, for the ×4 model, GANs4 has the worst RMSE, the curves obtained for GANs2, GANs3, and GANsRMSE are close to one another, and the best result is obtained by GANs1. In the ×8 model, compared with GANs4 and GANsRMSE, the results of GANs1, GANs2, and GANs3 are very close. We can conclude that the results of GANs1, GANs2, and GANs3 are comparable to GANsRMSE for generating the extreme-event regions. In addition, we examine the spatial distribution of errors in the extreme-event regions for GANs1, GANs2, GANs3, and GANsRMSE. For GANs1 and GANsRMSE, the error distribution is wider, but the error values are relatively small. For GANs2 and GANs3, the errors are concentrated in fewer grid cells, but the error values are relatively large.
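The threshold-conditional RMSE described above can be computed per grid cell as in the sketch below; stacking all test samples along the first axis and using the ground truth to define the per-cell threshold are our own assumptions about the data layout.

```python
import numpy as np

def extreme_event_rmse(pred, truth, percentile):
    """pred, truth: arrays of shape [num_samples, H, W].
    At each grid cell, keep only the samples whose ground-truth value exceeds the
    cell's percentile threshold, compute the RMSE there, then average over cells."""
    thr = np.percentile(truth, percentile, axis=0)      # per-cell threshold, [H, W]
    sq_err = np.where(truth > thr, (pred - truth) ** 2, np.nan)
    return float(np.nanmean(np.sqrt(np.nanmean(sq_err, axis=0))))

# averaged over a range of thresholds, e.g. the 75th to 99.9th percentile:
# curve = [extreme_event_rmse(pred, truth, q) for q in np.arange(75, 100, 0.5)]
```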
Figure 7 shows the precipitation distribution of six samples, where the x-axis is the precipitation value in each raster cell and the y-axis is the proportion of cells with that value. The first row shows the ×4 results and the second row the ×8 results. As can be seen from the figure, the distribution of GANs4 differs markedly from that of the ground truth. The distributions of GANs1, GANs2, GANs3, and GANsRMSE gradually approach the distribution of the ground truth as the precipitation amount increases. In contrast, for smaller precipitation values, the distributions of GANsRMSE and GANs1 are closer to the ground truth's distribution than those of GANs2 and GANs3. For example, in sample 6, the densities of GANs2 and GANs3 are greater than that of the ground truth in the 0-3 mm interval. The same observation holds for samples 1, 2, and 5.
To better evaluate the results corresponding to different feature maps qualitatively, we set up a user experiment with a total of 40 participants. Each participant was provided with 20 samples of precipitation data. Each data sample consisted of the HR ground truth and the generated results from GANs1, GANs2, GANs3, GANs4, and GANs5, with the order of the different results randomized in each sample. The user experiment results are presented in Figure 6. We collected a total of 790 votes. Among all the votes, GANs2 received 32.4%, slightly higher than the 31.6% received by GANs3. The results of GANs2 and GANs3 are significantly higher than those of the other models. As shown in Figure 8, the best result for each sample is determined by the largest number of votes; 7 best results correspond to GANs3 and 8 best results correspond to GANs2.
In summary, although low- and medium-level feature maps have slightly worse RMSE metrics than per-pixel loss, they provide more detailed information in the HR results, and with such detailed information, the HR results perform well in regions of extreme events. In contrast to per-pixel loss, however, the distributions produced with perceptual loss deviate from the ground truth in regions with less precipitation. Based on the qualitative and quantitative comparison of the results corresponding to different layers of feature maps, we use GANs2 and GANs3 to test the proposed perceptual loss, since we consider GANs2 and GANs3 to offer the best balance of qualitative detail and quantitative evaluation metrics.
In this section, we examine the performance of the results obtained by training the GAN with the climatological binary perceptual loss. Table 4 shows the results for ×4 and ×8. Although some metrics are still worse than those of GANsRMSE, they are improved compared with the direct calculation used in GANs2 and GANs3. In the ×4 case, GANs_bpl2 and GANs_bpl3 improve the RMSE by 0.11 and 0.10, respectively, and in the ×8 case, GANs_bpl2 and GANs_bpl3 improve it by 0.17 and 0.11, respectively. Some distributions of HR results based on the climatological binary perceptual loss are shown in Figure 10. In the interval with lower precipitation, the results obtained with the proposed loss are closer to the ground truth than those obtained by directly using Conv2_1 and Conv3_1, retaining the advantage of the per-pixel loss. In addition to improving the distribution of the samples, the proposed climatological binary perceptual loss also improves the regions of extreme events. In Figure 9, the results obtained by directly using Conv2_1 and Conv3_1 are compared with those obtained by using the proposed loss. It can be seen that the proposed loss provides detailed information in the areas of extreme events that is closer to the ground truth.
Table 4. Results of the proposed climatological binary perceptual loss for ×4 and ×8.

| Loss function | ×4 RMSE (mm/day) | ×4 correlation | ×8 RMSE (mm/day) | ×8 correlation |
| --- | --- | --- | --- | --- |
| MSE | 0.73 | 0.98 | 1.30 | 0.97 |
| Conv2_1 | 0.96 | 0.98 | 1.59 | 0.96 |
| Conv3_1 | 0.98 | 0.98 | 1.50 | 0.96 |
| Proposed loss with conv2_1 | 0.85 | 0.98 | 1.42 | 0.96 |
| Proposed loss with conv3_1 | 0.88 | 0.98 | 1.39 | 0.97 |
In this section, we discuss the application of the proposed climatological binary perceptual loss to the CMIP6 data. The original raster size is 120 × 288 with a resolution of 1° × 1.25°. By increasing the resolution by factors of 4 and 8, we increase the raster size to 480 × 1152 and 960 × 2304, respectively. Figure 10 shows the distribution of precipitation before and after the resolution improvement. To further illustrate the results of the model, we calculated the average precipitation of the raw CMIP6 data and the generated high-resolution data in the range of 25°N to 49°N, 70°W to 130°W, which roughly covers the contiguous United States. The resulting time series are presented in Figure 11. We combine the daily data into the corresponding monthly data, giving 132 time points in the time series from January 2023 to January 2033. As can be seen from the figure, the ×4 results are very close to the precipitation process obtained from the original resolution. The ×8 results show more error than the ×4 results in some months, such as July 2024, but they still broadly follow the precipitation process of the original resolution. From the samples, we can also see that the proposed climatological binary perceptual loss adds detailed information in the extreme-event regions.
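The regional time series in Figure 11 can be reproduced by averaging over the contiguous-US window and aggregating days into months; the sketch below assumes the data are held in an xarray Dataset with 'time', 'lat', and 'lon' coordinates (0-360 longitudes) and an illustrative variable name 'pr'.

```python
import xarray as xr

def conus_monthly_mean(ds):
    """Average precipitation over ~25-49N, 70-130W and aggregate to monthly means."""
    region = ds["pr"].sel(lat=slice(25, 49), lon=slice(230, 290))  # 130W-70W in 0-360 longitude
    return region.mean(dim=("lat", "lon")).resample(time="MS").mean()
```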
Obtaining high-resolution climate data from low-resolution climate data is an ill-posed problem: for a given LR input, there may be multiple possible high-resolution outputs. On the one hand, we want the generated HR data to be quantitatively consistent with the input; on the other hand, we want the generated HR data to contain detailed information. There is often a conflict between the two during training. In general, quantitative similarity forces the model to provide smoother results, which, on average, are close to the input; the disadvantage is that detailed information is lost. Including rich detailed information requires the model to insert plausible high-resolution details, and this aggressive approach can lead to an increase in overall error. Our study examines both qualitative and quantitative aspects of perceptual loss in the generation of high-resolution climate data. In terms of scale factor, we consider increasing the original resolution by factors of 4 and 8. In terms of training mode, we examine the results of perceptual loss in both CNN and GAN training. Despite the fact that the parameters of the feature extractor used for perceptual loss are usually trained on natural images, perceptual loss still achieves promising results on climate data compared with the traditional pixel-wise loss. Perceptual loss can provide more realistic HR results than pixel-wise loss during CNN training, though there is a trade-off in the overall RMSE. As the feature map used for perceptual loss becomes deeper, the results become less smooth. The addition of adversarial loss to the GAN training process mitigates the problem that pixel-wise loss results provide little detailed information and are overly smooth. A comparison of the results of pixel-wise loss and perceptual loss in the GAN indicates that the perceptual loss results are perceptually closer to the ground truth than those obtained by pixel-wise loss. In contrast, when deep feature maps, such as Conv4_1 and Conv5_1, are used, the generated results are unstable. The results of Conv5_1, in particular, show large errors as well as checkerboard artifacts. The VGG network is trained to solve classification problems, so the high-level semantic information expressed in the deep feature maps is primarily used to distinguish between different labels. In the case of climate data, there may be zero or no significant change across most of the region, so the deep feature maps contain insufficient semantic information, and comparing them may not accurately measure the difference between the generated results and the ground truth.
Comparing the results obtained from feature maps at different layers, we find that the feature maps of the low and middle layers, in our case Conv1_1, Conv2_1, and Conv3_1, provide good results for regions with extreme events. They have slightly smaller RMSEs in these regions than the HR results obtained by using pixel-wise loss, and they provide richer detailed information as well. Based on our qualitative and quantitative analyses, Conv2_1 and Conv3_1 offer the best balance between obtaining detailed information and containing the overall error. To address the problem that perceptual loss produces large errors in regions with small raster values and small gradient changes in the sample, we propose the climatological binary perceptual loss. We first calculate the pixel-wise loss over the whole sample, and then apply the perceptual loss only to the targeted regions, that is, those containing extreme events. In comparison with calculating perceptual loss directly for the entire region, our results show that the climatological binary perceptual loss can improve the detailed information of extreme-event regions while reducing error.
In this work, we adapt the perceptual loss technique developed for image super-resolution to obtain high-resolution climatological data. We present the difference between perceptual loss and pixel-wise loss by increasing the resolution of raw rainfall data by ×4 and ×8 under two different training modes, CNN and GAN. We examine the difference in the perceptual loss calculated by using different layers of feature maps. Based on our results, low- and mid-level feature maps can yield results comparable to pixel-wise loss. Based on our qualitative and quantitative analyses, Conv2_1 and Conv3_1 offer the best balance between obtaining detailed information and containing the overall error. In comparison with calculating perceptual loss directly for the entire region, our results show that the proposed climatological binary perceptual loss can improve the detailed information of extreme-event regions. We also examine the performance of the climatological binary perceptual loss on unseen GCM data.
All authors declare that there is no conflict of interest in this paper.
[1] F. K. Muthoni, V. O. Odongo, J. Ochieng, E. M. Mugalavai, S. K. Mourice, I. Hoesche-zeledon, et al., Long-term spatial-temporal trends and variability of rainfall over Eastern and Southern Africa, Theor. Appl. Climatol., 137 (2019), 1869–1882. https://doi.org/10.1007/s00704-018-2712-1
[2] C. C. Wei, Simulation of operational typhoon rainfall nowcasting using radar reflectivity combined with meteorological data, J. Geophys. Res.-Atmos., 119 (2014), 6578–6595. https://doi.org/10.1002/2014JD021488
[3] C. P. Castillo, F. B. e Silva, C. Lavalle, An assessment of the regional potential for solar power generation in EU-28, Energy Policy, 88 (2016), 86–99. https://doi.org/10.1016/j.enpol.2015.10.004
[4] A. Voldoire, E. Sanchez-Gomez, D. Salas y Mélia, B. Decharme, C. Cassou, S. Sénési, et al., The CNRM-CM5.1 global climate model: Description and basic evaluation, Clim. Dyn., 40 (2013), 2091–2121. https://doi.org/10.1007/s00382-011-1259-y
[5] N. Hofstra, M. Haylock, M. New, P. Jones, C. Frei, Comparison of six methods for the interpolation of daily, European climate data, J. Geophys. Res.-Atmos., 113 (2008). https://doi.org/10.1029/2008JD010100
[6] C. Dong, C. C. Loy, K. He, X. Tang, Image Super-Resolution Using Deep Convolutional Networks, IEEE T. Pattern Anal. Mach. Intell., 38 (2015), 295–307. https://doi.org/10.1109/TPAMI.2015.2439281
[7] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, et al., Photo-realistic single image super-resolution using a generative adversarial network, Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), 4681–4690. https://doi.org/10.1109/CVPR.2017.19
[8] K. Stengel, A. Glaws, D. Hettinger, R. N. King, Adversarial super-resolution of climatological wind and solar data, Proc. Natl. Acad. Sci. U. S. A., 117 (2020), 16805–16815. https://doi.org/10.1073/pnas.1918964117
[9] T. Vandal, E. Kodra, S. Ganguly, A. Michaelis, R. Nemani, A. R. Ganguly, Generating high resolution climate change projections through single image super-resolution: An abridged version, IJCAI Int. Jt. Conf. Artif. Intell., (2018), 5389–5393. https://doi.org/10.24963/ijcai.2018/759
[10] E. R. Rodrigues, I. Oliveira, R. Cunha, M. Netto, DeepDownscale: A deep learning strategy for high-resolution weather forecast, Proc. IEEE 14th Int. Conf. eScience, (2018), 415–422. https://doi.org/10.1109/eScience.2018.00130
[11] Y. Jo, S. Yang, S. J. Kim, Investigating loss functions for extreme super-resolution, IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. Work., (2020), 424–425.
[12] J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, Lect. Notes Comput. Sci., 9906 (2016), 694–711. https://doi.org/10.1007/978-3-319-46475-6_43
[13] M. S. Sajjadi, B. Scholkopf, M. Hirsch, EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis, Proc. IEEE Int. Conf. Comput. Vis., (2017), 4491–4500. https://doi.org/10.1109/ICCV.2017.481
[14] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 3rd Int. Conf. Learn. Represent. (ICLR 2015), (2015), 1–14.
[15] A. Mahendran, A. Vedaldi, Visualizing Deep Convolutional Neural Networks Using Natural Pre-images, Int. J. Comput. Vis., 120 (2016), 233–255. https://doi.org/10.1007/s11263-016-0911-8
[16] W. Yu, K. Yang, Y. Bai, H. Yao, Y. Rui, Visualizing and Comparing Convolutional Neural Networks, 2014.
[17] Y. Ma, K. A. Lee, V. Hautamaki, H. Li, PL-EESR: Perceptual Loss Based End-to-End Robust Speaker Representation Extraction, 2021 IEEE Autom. Speech Recognit. Underst. Work. (ASRU), (2021), 106–113.
[18] E. Shelhamer, J. Long, T. Darrell, Fully Convolutional Networks for Semantic Segmentation, 2015 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), (2015), 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965
[19] Y. Liu, H. Chen, Y. Chen, W. Yin, C. Shen, Generic Perceptual Loss for Modeling Structured Output Dependencies, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., (2021), 5420–5428. https://doi.org/10.1109/CVPR46437.2021.00538
[20] R. Huth, Statistical downscaling of daily temperature in central Europe, J. Clim., 15 (2002), 1731–1742. https://doi.org/10.1175/1520-0442(2002)015<1731:SDODTI>2.0.CO;2
[21] S. T. Chen, P. S. Yu, Y. H. Tang, Statistical downscaling of daily precipitation using support vector machines and multivariate analysis, J. Hydrol., 385 (2010), 13–22. https://doi.org/10.1016/j.jhydrol.2010.01.021
[22] C. Hutengs, M. Vohland, Downscaling land surface temperatures at regional scales with random forest regression, Remote Sens. Environ., 178 (2016), 127–141. https://doi.org/10.1016/j.rse.2016.03.006
[23] T. Vandal, E. Kodra, S. Ganguly, A. Michaelis, R. Nemani, A. R. Ganguly, DeepSD: Generating high resolution climate change projections through single image super-resolution, Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., (2017), 1663–1672. https://doi.org/10.1145/3097983.3098004
[24] J. Cheng, Q. Kuang, C. Shen, J. Liu, X. Tan, W. Liu, ResLap: Generating High-Resolution Climate Prediction through Image Super-Resolution, IEEE Access, 8 (2020), 39623–39634. https://doi.org/10.1109/ACCESS.2020.2974785
[25] C. Besombes, O. Pannekoucke, C. Lapeyre, B. Sanderson, O. Thual, Producing realistic climate data with generative adversarial networks, Nonlinear Proc. Geoph., 28 (2021), 347–370. https://doi.org/10.5194/npg-28-347-2021
[26] J. Bruna, P. Sprechmann, Y. LeCun, Super-resolution with deep convolutional sufficient statistics, 4th Int. Conf. Learn. Represent. (ICLR 2016), (2016), 1–17.
[27] X. Xu, M. Xie, P. Miao, W. Qu, W. Xiao, H. Zhang, et al., Perceptual-Aware Sketch Simplification Based on Integrated VGG Layers, IEEE T. Vis. Comput. Gr., 27 (2019), 178–189. https://doi.org/10.1109/TVCG.2019.2930512
[28] A. Mahendran, A. Vedaldi, Understanding deep image representations by inverting them, Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., (2015), 5188–5196. https://doi.org/10.1109/CVPR.2015.7299155
[29] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, et al., Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss, IEEE T. Med. Imaging, 37 (2018), 1348–1357. https://doi.org/10.1109/TMI.2018.2827462
[30] P. Sangkloy, J. Lu, C. Fang, F. Yu, J. Hays, Scribbler: Controlling deep image synthesis with sketch and color, Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition (CVPR 2017), (2017), 6836–6845. https://doi.org/10.1109/CVPR.2017.723
[31] M. S. Rad, B. Bozorgtabar, U. V. Marti, M. Basler, H. K. Ekenel, J. P. Thiran, SROBB: Targeted perceptual loss for single image super-resolution, Proc. IEEE Int. Conf. Comput. Vis., (2019), 2710–2719.
[32] Y. Zhang, W. Li, W. Gong, Z. Wang, J. Sun, An improved boundary-aware perceptual loss for building extraction from VHR images, Remote Sens., 12 (2020), 1195. https://doi.org/10.3390/rs12071195
[33] J. Chi, J. Bae, Y. J. Kwon, Two-stream convolutional long- and short-term memory model using perceptual loss for sequence-to-sequence arctic sea ice prediction, Remote Sens., 13 (2021), 3413. https://doi.org/10.3390/rs13173413
[34] G. Skofronick-Jackson, W. A. Petersen, W. Berg, C. Kidd, E. F. Stocker, D. B. Kirschbaum, et al., The global precipitation measurement (GPM) mission for science and society, B. Am. Meteorol. Soc., 98 (2017), 1679–1695.
[35] V. Levizzani, P. Bauer, F. J. Turk, Measuring Precipitation from Space: EURAINSAT and the Future, vol. 28, 2007. https://doi.org/10.1007/978-1-4020-5835-6
[36] J. P. Dunne, L. W. Horowitz, A. J. Adcroft, P. Ginoux, I. M. Held, J. G. John, et al., The GFDL Earth System Model Version 4.1 (GFDL-ESM 4.1): Overall Coupled Model Description and Simulation Characteristics, J. Adv. Model. Earth Syst., 12 (2020), e2019MS002015. https://doi.org/10.1029/2019MS002015