
Artificial Intelligence Generated Content (AIGC) is the term for digital media produced by machine learning methods, such as ChatGPT and Stable Diffusion [1], which are currently popular [2]. AIGC has various applications in domains such as entertainment, education, marketing and research [3]. Image synthesis is a subcategory of AIGC that involves generating realistic or stylized images from textual inputs, sketches or other images [4]. Image synthesis can also perform various tasks such as inpainting, semantic scene synthesis, super-resolution and unconditional image generation [1,5,6,7].
Currently, deep learning methods can be categorized into two types: Generative and discriminative [8]. The goal of a discriminative model is to directly predict or classify based on input data, without involving the process of data generation. Discriminative methods have various applications, such as image classification [9,10], image segmentation [11] and sequence prediction [12]. On the other hand, generative models learn the distribution characteristics of the data in order to generate new samples that are similar to the original data. Generative models also have many typical applications, such as image synthesis [13] and text synthesis [14].
Image synthesis can be classified into two types based on controllability: Unconditional and conditional [15]. Conditional image synthesis can be further divided into three levels of control: High, medium and low. High-level control refers to the image content, such as its category; medium-level control refers to the image background and related aspects; and low-level control refers to manipulating the image based on the underlying principles of traditional computer vision [16,17,18].
Conventional 3D image synthesis techniques face challenges in handling intricate details and patterns that vary across different objects [19]. Deep learning methods can better model the variations in shape, texture and illumination of 3D objects [20]. The field of deep learning-based image synthesis has made remarkable progress in recent years, aided by the availability of more open source data sets [21,22,23]. Various image synthesis methods have emerged, such as generative adversarial network (GAN) [7], diffusion model (DM) [6] and neural radiance field (NeRF) [24]. These methods differ in their levels of controllability: GAN and DM are suitable for high-level or medium-level controllable image synthesis, while NeRF is suitable for low-level controllable image synthesis.
Low-level controllable image synthesis can be categorized into geometric and illumination control. Geometric control involves manipulating the pose and structure of the scene, where the pose can refer to either the camera or the object, while the structure can refer to either the global shape (using depth maps, point clouds or other 3D representations) or the local attributes (such as size, shape, color, etc.) of the object. Illumination control involves manipulating the light source and the material properties of the object.
Several surveys have attempted to cover the state-of-the-art techniques and applications in image synthesis. However, most of these surveys have become obsolete due to the rapid development of the field [15] or have focused on the high-level and medium-level aspects of image synthesis, while ignoring the low-level aspects [25]. Furthermore, most of these surveys have adopted a methodological perspective, which is useful for researchers who want to understand the underlying principles and algorithms of image synthesis, but not for practitioners who want to apply image synthesis techniques to solve specific problems in various domains [25,26]. In this paper, we provide a task-oriented review of low-level controllable image synthesis, excluding human subjects [27,28,29,30].
This review offers a comprehensive overview of state-of-the-art deep learning methods for low-level controllable image synthesis; an overview of the surveyed topics is shown in Figure 1. In Section 2, we introduce the common data sets and evaluation indicators for this task, grouping the data sets by their content. In Sections 3 to 5, we survey the control methods based on pose (see Figure 2), structure (see Figure 3) and illumination (see Figure 4), dividing each section into global and local controls. In Section 6, we discuss current applications of low-level controllable image synthesis based on deep learning. Finally, Section 7 concludes this paper. In the following section, we review the common data sets and evaluation indicators in detail.
One of the key challenges in low-level controllable image synthesis is to evaluate the quality and diversity of the generated images. Different data sets and metrics have been proposed to measure various aspects of low-level controllable image synthesis, such as realism, consistency, fidelity and controllability. In this section, we will introduce some commonly used data sets and metrics for low-level controllable image synthesis and discuss their advantages and limitations.
3D image synthesis is the task of generating realistic images of 3D objects from different viewpoints. This task requires a large amount of training data that can capture the shape, texture, lighting and pose variations of 3D objects. Several data sets have been proposed for this purpose, each with its own advantages and limitations. Table 1 shows all the data sets covered in this section, as well as the relationships between the data sets and each section in the survey. The details of these data sets are as follows:
| Type | Data sets | Section used |
| --- | --- | --- |
| Viewpoint | ABO [31], Clevr3D [37], ScanNet [38], RealEstate10K [39] | Section 3 |
| Point cloud | ShapeNet [40], KITTI [41], nuScenes [42], Matterport3D [43] | Section 4 |
| Depth map | Middlebury Stereo [44,45,46,47,48], NYU Depth [49], KITTI [41] | Section 4 |
| Illumination | Multi-PIE [35], Relightables [50] | Section 5 |
● ABO is a synthetic data set that contains 3D shapes generated by assembling basic objects (ABOs) such as cubes, spheres, cylinders and cones. It has 10 categories and 1000 shapes per category. ABO is useful for tasks such as shape abstraction, decomposition and generation. However, ABO is also limited by its synthetic nature, its small number of categories and instances and its lack of realistic lighting and occlusion [31].
● Clevr3D is a synthetic data set that contains 3D scenes composed of simple geometric shapes with various attributes such as color, size and material. It also provides natural language descriptions and questions for each scene. Clevr3D is useful for tasks such as scene understanding, reasoning and captioning. However, Clevr3D is also limited by its synthetic nature, its simple scene composition and its lack of realistic textures and backgrounds [37].
● ScanNet is an RGB-D video data set that contains 2.5 million views in more than 1500 scans of indoor scenes. It provides annotations such as camera poses, surface reconstructions and instance-level semantic segmentations. ScanNet is useful for tasks such as semantic segmentation, object detection and pose estimation. ScanNet is also limited by its incomplete coverage (due to scanning difficulties), its inconsistent labeling (due to human errors) and its lack of fine-grained details (such as object parts) [38].
● RealEstate10K is a data set for view synthesis that contains camera poses corresponding to 10 million frames derived from about 80,000 video clips gathered from YouTube videos. The data set also provides links to download the original videos. RealEstate10K is a large-scale and diverse data set that covers various types of scenes, such as houses, apartments, offices and landscapes. RealEstate10K is useful for tasks such as stereo magnification, light field rendering and novel view synthesis. However, RealEstate10K also has some challenges, such as the low quality of the videos, the inconsistency of the camera poses and the lack of depth information [39].
Point cloud data sets are collections of points that represent the shape and appearance of a 3D object or scene. They are often obtained from sensors such as lidars, radars or cameras. Some of the data sets are:
● ShapeNet is a large-scale repository of 3D CAD models that covers 55 common object categories and 4 million models. It provides rich annotations such as category labels, part labels, alignments and correspondences. ShapeNet is useful for tasks such as shape classification, segmentation, retrieval and completion. Some of the limitations of ShapeNet are that it does not contain realistic textures or materials, it does not capture the variability and diversity of natural scenes and it does not provide ground truth poses or camera parameters for rendering [40].
● KITTI is a data set for autonomous driving that contains 3D point clouds captured by a Velodyne HDL-64E LIDAR sensor, along with RGB images, GPS/IMU data, object annotations and semantic labels. KITTI is one of the most popular and challenging data sets for 3D object detection and semantic segmentation, as it covers various scenarios, weather conditions and occlusions. However, KITTI also has some limitations, such as the limited number of frames per sequence (around 200), the fixed sensor configuration and the lack of dynamic objects [41].
● nuScenes is another data set for autonomous driving that contains 3D point clouds captured by a 32-beam LIDAR sensor, along with RGB images, radar data, GPS/IMU data, object annotations and semantic labels. nuScenes is more comprehensive and diverse than KITTI, as it covers 1000 scenes collected in Boston and Singapore, two cities with different traffic rules and driving behaviors. nuScenes also provides more temporal information, with 20 seconds of continuous data per scene. However, nuScenes also has some challenges, such as the lower resolution of the point clouds, the higher complexity of the scenes and the need for sensor fusion [42].
● Matterport3D is a data set for indoor scene understanding that contains 3D point clouds reconstructed from RGB-D images captured by a Matterport camera. The data set also provides surface reconstructions, camera poses and 2D and 3D semantic segmentations. Matterport3D is a large-scale and high-quality data set that covers 10,800 panoramic views from 194,400 RGB-D images across 90 building-scale scenes. Matterport3D is useful for tasks such as keypoint matching, view overlap prediction and scene completion. However, Matterport3D also has some limitations, such as the lack of dynamic objects, the dependence on RGB-D sensors and the difficulty of obtaining ground truth annotations [43].
Depth map data sets are collections of images and their corresponding depth values, which can be used for various computer vision tasks such as depth estimation, 3D reconstruction, scene understanding, etc. The commonly used depth map data sets are as follows:
● Middlebury Stereo is a data set of stereo images with ground truth disparity maps obtained using structured light or a robot arm. It contains several versions of data sets collected from 2001 to 2021, with different scenes, resolutions and levels of difficulty. The data set is widely used for evaluating stereo matching algorithms and provides online benchmarks and leaderboards. The strengths of this data set are its high accuracy, diversity and availability. The limitations are its relatively small size, indoor scenes only and lack of semantic labels [44,45,46,47,48].
● NYU Depth Data set V2 is a data set of RGB-D images captured by Microsoft Kinect in various indoor scenes. It contains 1449 densely labeled pairs of aligned RGB and depth images, as well as 407,024 unlabeled frames. The data set also provides surface normals, 3D point clouds and semantic labels for each pixel. The data set is widely used for evaluating monocular depth estimation algorithms and provides online tools for data processing and visualization. The strengths of this data set are its large size, rich annotations and realistic scenes. The limitations are its low resolution, noisy depth values and indoor scenes only [49].
● KITTI also includes depth maps, but they are sparse and noisy because they are derived from LiDAR. Ground-truth depth is unavailable for some scenes, and the data are restricted to urban driving settings [41].
Illumination data sets are collections of information about the intensity, distribution and characteristics of artificial or natural light sources. Some examples of common illumination data sets are:
● Multi-PIE is a large-scale data set that contains over 750,000 images of 337 subjects, captured in 15 view angles and 19 illumination conditions. Each subject also performed different facial expressions, such as neutral, smile, surprise and squint. The data set is useful for studying face recognition, face alignment, face synthesis and face editing under varying conditions. However, Multi-PIE only contains images of Caucasian subjects, which limits its diversity and generalization [35].
● Relightables is a collection of high-quality 3D scans of human subjects under varying lighting conditions. This data set allows for realistic rendering of human performances with any lighting and viewpoint, which can be integrated into any CG scene. Nevertheless, this data set has some drawbacks, such as the low diversity of subjects, poses and expressions and the high computational expense of processing the data [50].
In conclusion, data sets are essential for low-level controllable image synthesis based on deep learning, as they provide the necessary information for training and evaluating deep generative models. These data sets provide rich annotations and variations for different types of control, such as viewpoint, lighting, pose, point clouds and depth. However, each data set has its own strengths and weaknesses, and there is room for improvement and innovation in this field.
To evaluate the quality and diversity of the synthesized images, several performance indicators are commonly used. Some of them are:
- Peak signal-to-noise ratio (PSNR) [51]: This measures the similarity between the synthesized image and a reference image in terms of pixel values. It is defined as the ratio of the maximum possible power of a signal to the power of noise that affects the fidelity of its representation. A higher PSNR indicates a better image quality.
- Structural similarity index (SSIM) [52]: This measures the similarity between the synthesized image and a reference image in terms of luminance, contrast and structure. It is based on the assumption that the human visual system is highly adapted to extract structural information from images. A higher SSIM indicates a better image quality.
- Learned perceptual image patch similarity (LPIPS) [53]: This measures the similarity between the synthesized image and a reference image in terms of deep features. It is defined as the distance between the activations of two image patches for a pre-trained network. A lower LPIPS indicates a better image quality.
- Inception score (IS) [54]: This measures the quality and diversity of the synthesized images using a pre-trained classifier, such as Inception-v3. It is based on the idea that good images should have high class diversity (i.e., they can be classified into different categories) and low class ambiguity (i.e., they can be classified with high confidence). A higher IS indicates a better image synthesis.
- Fréchet inception distance (FID) [55]: This measures the distance between the feature distributions of the synthesized images and the real images using a pre-trained classifier, such as Inception-v3. It is based on the idea that good images should have similar feature statistics to real images. A lower FID indicates a better image synthesis.
- Kernel inception distance (KID) [56]: This measures the squared maximum mean discrepancy between the feature distributions of the synthesized images and the real images using a pre-trained classifier, such as Inception-v3. It is based on the idea that good images should have similar feature statistics to real images. A lower KID indicates a better image synthesis.
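To make two of these indicators concrete, the sketch below computes PSNR directly from pixel arrays and FID from pre-extracted Inception-v3 feature statistics. It is a minimal NumPy/SciPy illustration under our own assumptions (the function names and toy inputs are placeholders), not the reference implementations used in the cited papers.

```python
import numpy as np
from scipy import linalg

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio between a synthesized image and a reference image."""
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def fid(feat_real, feat_fake):
    """Frechet inception distance between two (N, D) arrays of Inception-v3 features."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # sqrtm can introduce tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Toy usage with random arrays standing in for images and Inception activations
a = np.random.rand(64, 64, 3) * 255
b = a + np.random.randn(64, 64, 3) * 5
print(psnr(b, a), fid(np.random.randn(200, 64), np.random.randn(200, 64)))
```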
GAN [7] can generate realistic and diverse data from a latent space. GAN consists of two neural networks: A generator and a discriminator. The generator tries to produce data that can fool the discriminator, while the discriminator tries to distinguish between real and fake data. Its network structure is shown in Figure 5. The loss function of GAN measures how well the generator and the discriminator perform their tasks. It is usually composed of two terms: One for the generator ($L_G$) and one for the discriminator ($L_D$). $L_G$ is based on how often the discriminator classifies the generated data as real, while $L_D$ is based on how often it correctly classifies the real and fake data. The goal of GAN training is to minimize $L_G$ and maximize $L_D$, as shown in Eq (3.1).
$$
\begin{aligned}
L_D &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \\
L_G &= -\,\mathbb{E}_{z \sim p_z(z)}\big[\log D(G(z))\big] \\
L_{GAN} &= L_D + L_G
\end{aligned}
\tag{3.1}
$$

where $x$ is a sample drawn from the real data distribution $p_{\mathrm{data}}$ and $z$ is a sample drawn from a chosen prior distribution $p_z(z)$.
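For illustration, the following PyTorch sketch performs one adversarial training step corresponding to Eq (3.1), with the discriminator loss negated so that both networks can be minimized by gradient descent. The tiny fully connected generator and discriminator, the latent size and the learning rates are our own placeholder choices, not settings from any cited work.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 784  # toy sizes
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def gan_step(x_real):
    # Discriminator: maximize E[log D(x)] + E[log(1 - D(G(z)))], i.e., minimize its negative
    z = torch.randn(x_real.size(0), latent_dim)
    x_fake = G(z).detach()
    loss_d = -(torch.log(D(x_real) + 1e-8).mean()
               + torch.log(1 - D(x_fake) + 1e-8).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: minimize L_G = -E[log D(G(z))] (the non-saturating form in Eq 3.1)
    z = torch.randn(x_real.size(0), latent_dim)
    loss_g = -torch.log(D(G(z)) + 1e-8).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy usage with random "real" data
print(gan_step(torch.rand(32, data_dim) * 2 - 1))
```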
a) Crossview image synthesis. Viewpoint manipulation refers to the ability to manipulate the perspective or orientation of the objects or scenes in the synthetic images. Early view-synthesis methods could usually generate only a single specific view, such as a bird's eye view or a frontal view of a face. Huang et al. introduced TP-GAN, a method that integrates global structure and local details to generate realistic frontal views of faces [58]. Similarly, Zhao et al. proposed VariGAN, which combines variational inference and GANs for the progressive refinement of synthesized target images [59]. To address the challenge of generating scenes from different viewpoints and resolutions, Regmi and Borji developed two methods: Crossview Fork (X-Fork) and Crossview Sequential (X-Seq) [60]. These methods employ semantic segmentation graphs to aid conditional GANs (cGANs) in producing sharper images. Furthermore, Regmi and Borji utilized geometry-guided cGANs for image synthesis, converting ground images to aerial views [61]. Mokhayeri et al. proposed a cross-domain face synthesis approach using a Controllable GAN (C-GAN). This method generates realistic face images under various poses by refining simulated images from a 3D face model through an adversarial game [62]. Zhu et al. developed BridgeGAN, a technique for synthesizing bird's eye view images from single frontal view images. They employed a homography view as an intermediate representation to accomplish this task [63]. Ding et al. addressed the problem of cross-view image synthesis by utilizing GANs based on deformable convolution and attention mechanisms [64]. Lastly, Ren et al. proposed MLP-Mixer GANs for cross-view image conversion. This method comprises two stages to alleviate severe deformation when generating entirely different views [65].
b) Free viewpoint image synthesis. By adding conditional inputs, such as a camera pose or camera manifold to the GAN network, they can output images from any viewpoint. Zhu et al. introduced CycleGAN, a method capable of recovering the front face from a single profile postural facial image, even when the source domain does not match the target domain [66]. This approach is based on a conditional variational autoencoder and GAN (cVAE-GAN) framework, which does not require paired data, making it a versatile method for view translation [67]. Shen et al. proposed Pairwise-GAN, employing two parallel U-Nets as generators and PatchGAN as a discriminator to synthesize frontal face images [68]. Similarly, Chan et al. presented pi-GAN, a method utilizing periodic implicit GANs for high-quality 3D-aware image synthesis [69]. Cai et al. further extended this approach with Pix2NeRF, an unsupervised method leveraging pi-GAN to train on single images without relying on 3D or multi-view supervision [70]. Leimkuhler et al. introduced FreeStyleGAN, which integrates a pre-trained StyleGAN into standard 3D rendering pipelines, enabling stereo rendering or consistent insertion of faces in synthetic 3D environments [71]. Medin et al. proposed MOST GAN, explicitly incorporating physical facial attributes as prior knowledge to achieve realistic portrait image manipulation [72]. On the other hand, Or-El et al. developed StyleSDF, a novel method generating images based on StyleGAN2 by utilizing Signed Distance Fields (SDFs) to accurately model 3D surfaces, enabling volumetric rendering with consistent results [73]. Additionally, Zheng et al. presented SDF-StyleGAN, a deep learning method for generating 3D shapes based on StyleGAN2, employing two new shape discriminators operating on global and local levels to compare real and synthetic SDF values and gradients, significantly enhancing shape geometry and visual quality [74]. Moreover, Deng et al. proposed GRAM, a novel approach regulating point sampling and radiance field learning on 2D manifolds, embodied as a set of learned implicit surfaces in the 3D volume, leading to improved synthesis results [75]. Xiang et al. built upon this work with GRAM-HD, capable of generating high-resolution images with strict 3D consistency, up to a resolution of 1024 x 1024 [76]. In another line of research, Chan et al. developed an efficient framework for generating realistic 3D shapes from 2D images using GANs, comprising a geometry-aware module predicting the 3D shape and its projection parameters from the input image, and a refinement module enhancing shape quality and details [77]. Similarly, Zhao et al. proposed a method for generating high-quality 3D images from 2D inputs using GAN, achieving consistency across different viewpoints and offering rendering with novel lighting effects [78]. Lastly, Alhaija et al. introduced XDGAN, a method for synthesizing realistic and diverse 3D shapes from 2D images, converting 3D shapes into compact 1-channel geometry images and utilizing StyleGAN3 and image-to-image translation networks to generate 3D objects in a 2D space [79]. These advancements in image synthesis techniques have significantly enriched the field of 3D image generation from 2D inputs.
NeRF [24] is a novel representation for complex 3D scenes that can be rendered photorealistically from any viewpoint. NeRF models a scene as a continuous function that maps 5D coordinates (3D location and 2D viewing direction, expressed as (x, y, z, θ, φ)) to a 4D output (RGB color and opacity). Its schematic diagram is shown in Figure 6. This function is learned from a set of posed images of the scene using a deep neural network. Before NeRF passes the (x, y, z, θ, φ) input to the network, it maps the input to a higher-dimensional space using high-frequency functions to better fit data containing high-frequency variations. The high-frequency encoding function is:
$$
\gamma(p) = \big(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\big)
\tag{3.2}
$$
where p is the input (x, y, z, θ, φ).
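As a concrete illustration, here is a minimal PyTorch sketch of the positional encoding in Eq (3.2), applied independently to each input coordinate; the function name and the toy batch are our own assumptions, not code from the cited work.

```python
import math
import torch

def positional_encoding(p, num_freqs=10):
    """Eq (3.2): encode each coordinate as [sin(2^0*pi*p), cos(2^0*pi*p), ...,
    sin(2^(L-1)*pi*p), cos(2^(L-1)*pi*p)].
    p has shape (..., D); the result has shape (..., 2 * num_freqs * D)."""
    freqs = (2.0 ** torch.arange(num_freqs)) * math.pi  # 2^0*pi ... 2^(L-1)*pi
    angles = p.unsqueeze(-1) * freqs                     # (..., D, L)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)

# Example: encode a toy batch of 5D NeRF inputs (x, y, z, theta, phi)
coords = torch.rand(4, 5)
print(positional_encoding(coords, num_freqs=10).shape)   # torch.Size([4, 100])
```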
Zhang et al. introduced NeRF++ as a framework that enhances NeRF through adaptive sampling, hierarchical volume rendering and multiscale feature encoding techniques [80]. This approach enables high-quality rendering for both static and dynamic scenes while improving efficiency and robustness. Rebain et al. proposed a method to enhance the efficiency and quality of neural rendering by employing spatial decomposition [81]. Park et al. developed a novel technique for capturing and rendering high-quality 3D selfies using a single RGB camera. Their method utilizes a deformable NeRF model capable of representing both the geometry and appearance of dynamic scenes [82]. Li et al. introduced MINE, a method for novel view synthesis and depth estimation from a single image. This approach generalizes Multiplane Images (MPI) with continuous depth using NeRF [83]. Park et al. proposed HyperNeRF, a method for representing and rendering complex 3D scenes with varying topology using neural radiance fields (NeRFs). Unlike previous NeRF-based approaches that rely on a fixed 3D coordinate system, HyperNeRF employs a higher-dimensional continuous embedding space to capture arbitrary scene changes [84]. Chen et al. presented Aug-NeRF, a novel method for training NeRFs with physically-grounded augmentations at different levels: Scene, camera and pixel [85]. Kaneko proposed AR-NeRF, a method for learning 3D representations of natural images without supervision. The approach utilizes a NeRF model to render images with various viewpoints and aperture sizes, capturing both depth and defocus effects [86]. Li et al. introduced SymmNeRF, a framework that utilizes NeRFs to synthesize novel views of objects from a single image. This method leverages symmetry priors to recover fine appearance details, particularly in self-occluded areas [87]. Zhou et al. proposed NeRFLiX, a novel framework for improving the quality of novel view synthesis using NeRF. This approach addresses rendering artifacts such as noise and blur by employing an inter-viewpoint aggregation framework that fuses high-quality training images to generate more realistic synthetic views [88].
In addition, a number of researchers have proposed enhancements to the original NeRF model, addressing its limitations in scenarios such as no camera pose, sparse data, noisy data, large-scale image synthesis and image synthesis speed; see Table 2.
| Feature | Method | Publication | Image resolution | Data set |
| --- | --- | --- | --- | --- |
| No camera pose | NeRF– [89] | arXiv 2022 | 756 x 1008 / 1080 x 1920 / 520 x 780 | [90] / [39] / [89] |
| No camera pose | GNeRF [91] | ICCV 2021 | 400 x 400 / 500 x 400 | [24] / [92] |
| No camera pose | SCNeRF [93] | ICCV 2021 | 756 x 1008 / 648 x 484 | [90] / [94] |
| No camera pose | NoPe-NeRF [95] | CVPR 2023 | 960 x 540 / 648 x 484 | [94] / [38] |
| No camera pose | SPARF [96] | CVPR 2023 | - | [92] / [90] / [97] |
| Sparse data | NeRS [98] | NIPS 2021 | 600 x 450 | [98] |
| Sparse data | MixNeRF [99] | CVPR 2023 | - | [92] / [90] / [24] |
| Sparse data | SceneRF [100] | ICCV 2023 | 1220 x 370 | [101] |
| Sparse data | GM-NeRF [102] | CVPR 2023 | 224 x 224 | [103] / [104] / [105] / [106] |
| Sparse data | SPARF [96] | CVPR 2023 | 960 x 540 / 648 x 484 | [94] / [38] |
| Noisy data | RawNeRF [107] | CVPR 2022 | - | [107] |
| Noisy data | Deblur-NeRF [108] | CVPR 2022 | 512 x 512 | [108] |
| Noisy data | HDR-NeRF [109] | CVPR 2022 | 400 x 400 / 804 x 534 | [109] |
| Noisy data | NAN [110] | CVPR 2022 | - | [110] |
| Large-scale image synthesis | Mip-NeRF 360 [111] | CVPR 2022 | 960 x 540 | [94] |
| Large-scale image synthesis | BungeeNeRF [112] | ECCV 2022 | - | [113] |
| Large-scale image synthesis | Block-NeRF [114] | CVPR 2022 | - | [114] |
| Large-scale image synthesis | GridNeRF [115] | CVPR 2023 | 2048 x 2048 / 4096 x 4096 | [116] / [115] |
| Large-scale image synthesis | EgoNeRF [117] | CVPR 2023 | 600 x 600 | [117] |
| Image synthesis speed | PlenOctrees [118] | ICCV 2021 | 800 x 800 / 1920 x 1080 | [24] / [94] |
| Image synthesis speed | DirectVoxGO [119] | CVPR 2022 | 800 x 800 / 800 x 800 / 768 x 576 / 1920 x 1080 / 512 x 512 | [24] / [120] / [121] / [94] / [122] |
| Image synthesis speed | R2L [123] | ECCV 2022 | 800 x 800 | [24] / [124] |
| Image synthesis speed | SqueezeNeRF [125] | CVPR 2022 | - | [24] / [90] |
| Image synthesis speed | MobileNeRF [126] | CVPR 2023 | 800 x 800 / 756 x 1008 / 1256 x 828 | [24] / [90] / [111] |
| Image synthesis speed | L2G-NeRF [127] | CVPR 2023 | 756 x 1008 | [90] |
One of the most widely used models in deep learning is the diffusion model, which is a generative model that can produce realistic and diverse images from random noise. The diffusion model is based on the idea of reversing the process of adding Gaussian noise to an image until it becomes completely corrupted. The diffusion process starts from a data sample and gradually adds noise until it reaches a predefined noise level. If we use $x_t$ to represent the image information at time $t$, then the forward process $q(x_t \mid x_{t-1})$ can be expressed as Eq (3.3). The generative model then learns to reverse this process by denoising the samples at each step, i.e., $p_\theta(x_{t-1} \mid x_t)$ in Figure 7, where $\theta$ represents the parameters of the neural network.
$$
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big)
\tag{3.3}
$$

where $\beta_t$ is a constant that varies with the time step $t$ (the noise schedule).
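As an illustration, the sketch below applies the forward noising process of Eq (3.3) step by step, and also uses the standard closed-form shortcut to jump directly to step t. The linear beta schedule and the toy image tensor are our own assumptions, not values taken from the cited works.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # a common linear noise schedule (assumption)

def q_step(x_prev, t):
    """One forward step of Eq (3.3): q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    noise = torch.randn_like(x_prev)
    return torch.sqrt(1.0 - betas[t]) * x_prev + torch.sqrt(betas[t]) * noise

def q_sample(x0, t):
    """Closed-form jump to step t: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps = torch.randn_like(x0)
    return torch.sqrt(alpha_bar) * x0 + torch.sqrt(1.0 - alpha_bar) * eps

x0 = torch.rand(1, 3, 64, 64)      # toy image batch
x1 = q_step(x0, t=0)               # one noising step
x_noisy = q_sample(x0, t=500)      # jump halfway through the schedule
```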
Sbrolli et al. introduced IC3D, a novel approach addressing various challenges in shape generation. This method is capable of reconstructing a 3D shape from a single view, synthesizing a 3D shape from multiple views and completing a 3D shape from partial inputs [128]. Another significant contribution in this area is the work by Gu et al., who developed Control3Diff, a generative model with 3D-awareness and controllability. By combining diffusion models and 3D GANs, Control3Diff can synthesize diverse and realistic images without relying on 3D ground truth data and can be trained solely on single-view image data sets [129]. Additionally, Anciukevicius et al. proposed RenderDiffusion, an innovative diffusion model for 3D generation and inference. Remarkably, this model can be trained using only monocular 2D supervision and incorporates an intermediate three-dimensional representation of the scene during each denoising step, effectively integrating a robust inductive structure into the diffusion process [130]. Xiang et al. presented a novel method for generating 3D-aware images using 2D diffusion models. Their approach involves a sequential process of generating multi-view 2D images from different perspectives, ultimately achieving the synthesis of 3D-aware images [131]. Furthermore, Liu et al. proposed a framework for changing the camera viewpoint of an object using only a single RGB image. Leveraging the geometric priors learned by large-scale diffusion models about natural images, their framework employs a synthetic data set to learn the controls for adjusting the relative camera viewpoint [132]. Lastly, Chan et al. developed a method for generating diverse and realistic novel views of a scene based on a single input image. Their approach utilizes a diffusion-based model that incorporates 3D geometry priors through a latent feature volume. This feature volume captures the distribution of potential scene representations and enables the rendering of view-consistent images [133].
Transformers are a type of neural network architecture that has been widely used in natural language processing. They are based on the idea of self-attention, which allows the network to learn the relationships between different parts of the input and output sequences. Transformers were introduced into the field of computer vision by ViT [134]. The core of the architecture is the attention module in Figure 8, whose formula is as follows:
$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
\tag{3.4}
$$
where $d_k$ is the dimensionality of the key vectors, $Q$ is the query vector used to compute the similarity between the current position (or token) and other positions (or tokens) in the sequence, $K$ is the key vector associated with each position (or token) in the sequence and $V$ is the value vector that contains the information or content associated with each position (or token) in the sequence.
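A minimal PyTorch implementation of Eq (3.4) for batched inputs is given below; the toy tensor shapes are arbitrary, and the function omits the multi-head projections, masking and dropout used in full Transformer blocks.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Eq (3.4): softmax(Q K^T / sqrt(d_k)) V for inputs of shape (batch, tokens, dim)."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (batch, n_q, n_k) similarity matrix
    weights = F.softmax(scores, dim=-1)             # attention weights over key positions
    return weights @ V

# Toy example: 2 sequences of 16 tokens with 32-dimensional embeddings
Q = torch.rand(2, 16, 32); K = torch.rand(2, 16, 32); V = torch.rand(2, 16, 32)
out = scaled_dot_product_attention(Q, K, V)         # shape (2, 16, 32)
```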
Leveraging the Transformer architecture for vision applications, several studies have explored its potential for synthesizing 3D views. Nguyen-Ha and colleagues presented a pioneering approach to synthesizing new views of a scene using a given set of input views. Their method employs a transformer-based architecture that effectively captures the long-range dependencies among the input views. Using a sequential process, the method generates high-quality novel views. This research contribution is documented in [136]. Similarly, Yang and colleagues proposed an innovative method for generating viewpoint-invariant 3D shapes from a single image. Their approach is based on disentangling learning and parametric NURBS surface generation. The method employs an encoder-decoder network augmented with a disentangled transformer module. This configuration enables the independent learning of shape semantics and camera viewpoints. The output of this comprehensive network includes the geometric parameters of the NURBS surface representing the 3D shape, as well as the camera-viewpoint parameters involving rotation, translation and scaling. Further details of this method can be found in [137]. Additionally, Kulhánek and colleagues proposed ViewFormer, an impressive neural rendering method that does not rely on NeRF and instead capitalizes on the power of transformers. ViewFormer is designed to learn a latent representation of a scene using only a few images, and this learned representation enables the synthesis of novel views. Notably, ViewFormer can handle complex scenes with varying illumination and geometry without requiring any 3D information or ray marching. The specific approach and findings of ViewFormer are detailed in [138].
a) GAN-based NeRF. NeRF is a novel method for rendering images from arbitrary viewpoints, but it suffers from high computational cost due to its pixel-wise optimization. GANs can synthesize realistic images in a single forward pass, but they may not preserve the view consistency across different viewpoints. Hence, there is a growing interest in exploring the integration of NeRF and GAN for efficient and consistent image synthesis. Meng et al. presented the GNeRF framework, which combines GANs and NeRF reconstruction to generate scenes with unknown or random camera poses [91]. Similarly, Zhou et al. introduced CIPS-3D, a generative model that utilizes style transfer, shallow NeRF networks and deep INR networks to represent 3D scenes and provide precise control over camera poses [139]. Another approach by Xu et al. is GRAF, a generative model for radiance fields that enables high-resolution image synthesis while being aware of the 3D shape. GRAF disentangles camera and scene properties from unposed 2D images, allowing for the synthesis of novel views and modifications to shape and appearance [140]. Lan et al. proposed a self-supervised geometry-aware encoder for style-based 3D GAN inversion. Their encoder recovers the latent code of a given 3D shape and enables manipulation of its style and geometry attributes [141]. Li et al. developed a two-step approach for 3D-aware multi-class image-to-image translation using NeRFs. They trained a multi-class 3D-aware GAN with a conditional architecture and innovative training strategy. Based on this GAN, they constructed a 3D-aware image-to-image translation system [142]. Shahbazi et al. focused on knowledge distillation, proposing a method to transfer the knowledge of a GAN trained on NeRF representation to a convolutional neural network (CNN). This enables efficient 3D-aware image synthesis [143]. Kania et al. introduced a generative model for 3D objects based on NeRFs, which are rendered into 2D novel views using a hypernetwork. The model is trained adversarially with a 2D discriminator [144]. Lastly, Bhattarai et al. proposed TriPlaneNet, an encoder specifically designed for EG3D inversion. The task of EG3D inversion involves reconstructing 3D shapes from 2D edge images [145].
b) Diffusion model-based NeRF. Likewise, the diffusion model alone fails to produce images that are consistent across different viewpoints. Therefore, many researchers integrate it with NeRF to synthesize high-quality and view-consistent images. Muller et al. proposed DiffRF, which directly generates volumetric radiance fields from a set of posed images using a 3D denoising model and a rendering loss [146]. Similarly, Xu et al. proposed NeuralLift-360, a framework that generates a 3D object with 360° views from a single 2D photo using a depth-aware NeRF and a denoising diffusion model [147]. Chen et al. proposed a 3D-aware image synthesis framework using NeRF and diffusion models, which jointly optimizes a NeRF auto-decoder and a latent diffusion model to enable simultaneous 3D reconstruction and prior learning from multi-view images of diverse objects [148]. Lastly, Gu et al. proposed NeRFDiff, a method for generating realistic and 3D-consistent novel views from a single input image. This method distills the knowledge of the conditional diffusion model (CDM) into the NeRF by synthesizing and refining a set of virtual views at test time, using a NeRF-guided distillation algorithm [149]. These approaches demonstrate the potential of using NeRF and diffusion models for 3D scene synthesis, and further research in this area is expected to yield even more exciting results.
c) Transformer-based NeRF. Building on the previous work of integrating GANs and NeRFs, some researchers have explored the possibility of using Transformer models and NeRFs to generate 3D images that are consistent across different viewpoints. Wang et al. proposed a method that can handle complex scenes with dynamic objects and occlusions, and can generalize to unseen scenes without fine-tuning. The key idea is to use a transformer to learn a global latent representation of the scene, which is then used to condition a NeRF model that renders novel views [150]. Similarly, Lin et al. proposed a method for novel view synthesis from a single unposed image using NeRF and a vision transformer (ViT). The method leverages both global and local image features to form a 3D representation of the scene, which is then used to render novel views by a multi-layer perceptron (MLP) network [151]. Finally, Liu et al. proposed a method for visual localization using a conditional NeRF model. The method can estimate the 6-DoF pose of a query image given a sparse set of reference images and their poses [152]. These methods demonstrate the potential of NeRFs and transformers in addressing challenging problems in computer vision.
Liao et al. proposed a novel framework consisting of two components for learning generative models that can achieve this goal. The first component is a 3D generator that learns to reconstruct the 3D shape and appearance of an object from a single image, while the second component is a 2D generator that learns to render the 3D object into a 2D image. This framework can generate high-quality images with controllable factors such as pose, shape and appearance [153]. Nguyen-Phuoc et al. proposed BlockGAN, a novel image generative model that can create realistic images of scenes composed of multiple objects. BlockGAN learns to generate 3D features for each object and combine them into a 3D scene representation. The model then renders the 3D scene into a 2D image, taking into account the occlusion and interaction between objects, such as shadows and lighting. BlockGAN can manipulate the pose and identity of each object independently while preserving image quality [154]. Pan et al. proposed a novel framework that can reconstruct 3D shapes from 2D image GANs without any supervision or prior knowledge. The method can generate realistic and diverse 3D shapes for various object categories, and the reconstructed shapes are consistent with the 2D images generated by the GANs. The recovered 3D shapes allow high-quality image editing such as relighting and object rotation [155]. Tewari et al. proposed a novel 3D generative model that can learn to separate the geometry and appearance factors of objects from a data set of monocular images. The model uses a non-rigid deformable scene formulation, where each object instance is represented by a deformed canonical 3D volume. The model can also compute dense correspondences between images and embed real images into its latent space, enabling editing of real images [156].
Niemeyer and Geiger introduced GIRAFFE, a deep generative model that can synthesize realistic and controllable images of 3D scenes. The model represents scenes as compositional neural feature fields that encode the shape and appearance of individual objects as well as the background. The model can disentangle these factors from unstructured and unposed image collections without any additional supervision. With GIRAFFE, individual objects in the scene can be manipulated by translating, rotating, or changing their appearance, as well as changing the camera pose [33]. Yang et al. proposed a neural scene rendering system called OC-NeRF that learns an object-compositional NeRF for editable scene rendering. OC-NeRF consists of a scene branch and an object branch, which encode the scene and object geometry and appearance, respectively. The object branch is conditioned on learnable object activation codes that enable object-level editing such as moving, adding or rotating objects [32]. Kobayashi et al. proposed a method to enable semantic editing of 3D scenes represented by NeRFs. The authors introduced distilled feature fields (DFFs), which are 3D feature descriptors learned by transferring the knowledge of pre-trained 2D image feature extractors such as CLIP-LSeg or DINO. DFFs allow users to query and select specific regions or objects in the 3D space using text, image patches or point-and-click inputs. The selected regions can then be edited in various ways, such as rotation, translation, scaling, warping, colorization or deletion [157]. Zhang et al. introduced NeRFlets, a new approach to represent 3D scenes from 2D images using local radiance fields. Unlike prior approaches that rely on global implicit functions, NeRFlets partition the scene into a collection of local coordinate frames that encode the structure and appearance of the scene. This enables efficient rendering and editing of complex scenes with high fidelity and detail. NeRFlets can manipulate the object's orientation, position and size, among other operations [158]. Finally, Zheng et al. proposed EditableNeRF, a method that allows users to edit dynamic scenes modeled by NeRF with key points. The method can handle topological changes and generate novel views from a single camera input. The key points are detected and optimized automatically by the network, and users can drag them to modify the scene. These approaches provide various means for 3D scene synthesis and editing, including manipulating objects, changing camera pose, selecting and editing specific regions or objects and handling topological changes [159].
A depth map is a representation of the distance between a scene and a reference point, such as a camera. It can be used to create realistic effects such as depth of field, occlusion and parallax [160]. Liang et al. proposed a novel method called SPIDR for representing and manipulating 3D objects using neural point fields (NPFs) and signed distance functions (SDFs) [161]. The method combines explicit point cloud and implicit neural representations to enable high-quality mesh and surface reconstruction for object deformation and lighting estimation. With the trained SPIDR model, various geometric edits can be applied to the point cloud representation, which can be used for image editing. Zhang et al. introduced a new method for rendering point clouds with frequency modulation, which enables easy editing of shape and appearance [162]. The method converts point clouds into a set of frequency-modulated signals that can be rendered efficiently using Fourier analysis. The signals can also be manipulated in the frequency domain to achieve various editing effects, such as deformation, smoothing, sharpening and color adjustment. Chen et al. also proposed NeuralEditor, a novel method for editing NeRFs for shape editing tasks [163]. The method uses point clouds as the underlying structure to construct NeRFs and renders them with a new scheme based on K-D tree-guided voxels. NeuralEditor can perform shape deformation and scene morphing by mapping points between point clouds.
Zhu et al. introduced the Visual Object Networks (VON) framework, which enables the disentangled learning of 3D object representations from 2D images. This framework comprises three modules, namely a shape generator, an appearance generator and a rendering network. By manipulating the generators, VON can perform a range of tasks, including shape manipulation, appearance transfer and novel view synthesis [164]. Mirzaei et al. proposed a reference-guided controllable inpainting method for NeRFs, which allows for the synthesis of novel views of a scene with missing regions. The method employs a reference image to guide the inpainting process and a user interface that enables the user to adjust the degree of blending between the reference and the original NeRF [165]. Yin et al. introduced OR-NeRF, a novel pipeline that can remove objects from 3D scenes using point or text prompts on a single view. This pipeline leverages a points projection strategy, a 2D segmentation model, 2D inpainting methods and depth supervision and perceptual loss to achieve better editing quality and efficiency than previous works [166]. Kim et al. proposed a visual comfort aware-reinforcement learning (VCARL) method for depth adjustment of stereoscopic 3D images. This method aims to improve the visual quality and comfort of 3D images by learning a depth adjustment policy from human feedback [167]. These advancements offer various means of manipulating objects, adjusting depth and generating novel views, ultimately enhancing the quality and realism of 3D scene synthesis and editing.
In recent years, there have been significant advancements in the field of 3D scene inpainting and editing using GANs. Jheng et al. proposed a dual-stream GAN for free-form 3D scene inpainting. The network comprises two streams, namely a depth stream and a color stream, which are jointly trained to inpaint the missing regions of a 3D scene. The depth stream predicts the depth map of the scene, while the color stream synthesizes the color image. This approach enables the removal of objects using existing 3D editing tools [168]. Another recent development in GAN training is the introduction of LinkGAN, a regularizer proposed by Zhu et al. that links some latent axes to image regions or semantic categories. By resampling partial latent codes, this approach enables local control of GAN generation [34]. Wang et al. proposed a novel method for synthesizing realistic images of indoor scenes with explicit camera pose control and object-level editing capabilities. This method builds on BlobGAN, a 2D GAN that disentangles individual objects in the scene using 2D blobs as latent codes. To extend this approach to 3D scenes, the authors introduced 3D blobs, which capture the 3D nature of objects and allow for flexible manipulation of their location and appearance [169]. These recent advancements in GAN-based 3D scene inpainting and editing have the potential to significantly improve the quality and realism of synthesized scenes.
Liu et al. [120] introduced Neural Sparse Voxel Fields (NSVF), which combines neural implicit functions with sparse voxel octrees to enable high-quality novel view synthesis from a sparse set of input images, without requiring explicit geometry reconstruction or meshing. Gu et al. [170] introduced StyleNeRF, a method that enables camera pose manipulation for synthesizing high-resolution images with strong multi-view coherence and photo realism. Wang et al. [171] introduced CLIP-NeRF, a method for manipulating 3D objects represented by NeRF using text or image inputs. Kania et al. [172] proposed a novel method for manipulating neural 3D representations of scenes beyond novel view rendering by allowing the user to specify which part of the scene they want to control with mask annotations in the training images. Lazova et al. [173] proposed a novel method for performing flexible, 3D-aware image content manipulation while enabling high-quality novel view synthesis by combining scene-specific feature volumes with a general neural rendering network. Yuan et al. [174] proposed a method for user-controlled shape deformation of scenes represented by implicit neural rendering, especially NeRF. Sun et al. [175] proposed NeRFEditor, a learning framework for 3D scene editing that uses a pre-trained StyleGAN model and a NeRF model to generate stylized images from a 360-degree video input. Wang et al. [176] proposed a novel method for image synthesis of topology-varying objects using generative deformable radiance fields (GDRFs). Tertikas et al. [177] proposed PartNeRF, a novel part-aware generative model for editable 3D shape synthesis that does not require any explicit 3D supervision. Bao et al. [178] proposed SINE, a novel approach for editing a NeRF with a single image or text prompts. Cohen-Bar et al. [179] proposed a novel framework for synthesizing and manipulating 3D scenes from text prompts and object proxies. Finally, Mirzaei et al. [180] proposed a novel method for reconstructing 3D scenes from multi-view images by leveraging NeRF to model the geometry and appearance of the scene, and introducing a segmentation network and a perceptual inpainting network to handle occlusions and missing regions. These methods represent significant progress towards the goal of enabling high-quality, user-driven 3D scene synthesis and editing.
Avrahami et al. [181] introduced a method for local image editing based on natural language descriptions and region-of-interest masks. The method uses a pre-trained language-image model (CLIP) and a denoising diffusion probabilistic model (DDPM) to produce realistic outcomes that conform to the text input. It can perform various editing tasks, such as object addition, removal, replacement or modification, background replacement and image extrapolation. Nichol et al. [182] proposed GLIDE, a diffusion-based model for text-conditional image synthesis and editing. This method uses a guidance technique to trade off diversity for fidelity and produces photorealistic images that match the text prompts. Couairon et al. [183] proposed DiffEdit, a method that uses text-conditioned diffusion models to edit images based on text queries. It can automatically generate a mask that highlights the regions of the image that need to be changed according to the text query. It also uses latent inference to preserve the content in those regions. DiffEdit can produce realistic and diverse semantic image edits for various text prompts and image sources. Sella et al. [184] proposed Vox-E, a novel framework that uses latent diffusion models to edit 3D objects based on text prompts. It takes 2D images of a 3D object as input and learns a voxel grid representation of it. It then optimizes a score distillation loss to align the voxel grid with the text prompt while regularizing it in 3D space to preserve the global structure of the original object. Vox-E can create diverse and realistic edits. Haque et al. [185] proposed a novel method for editing 3D scenes with natural language instructions. The method leverages a NeRF representation of the scene and a transformer-based model that can parse the instructions and modify the NeRF accordingly. The method can perform various editing tasks, such as changing the color, shape, position and orientation of objects, as well as adding and removing objects, with high fidelity and realism. Lin et al. [186] proposed CompoNeRF, a novel method for text-guided multi-object compositional NeRF with editable 3D scene layout. CompoNeRF can synthesize photorealistic images of complex scenes from natural language descriptions and user-specified camera poses. It can also edit the 3D layout of the scene by manipulating the objects' positions, orientations and scales. These methods have shown promising results in advancing the field of image and 3D object editing using natural language descriptions and they have the potential to be applied in various applications.
Controllable image generation refers to constraining and adjusting the generation process so that the generated images meet specific requirements. By conditioning on external inputs or by manipulating the latent code, one can modify a particular region or attribute of an image while leaving the other regions or attributes unchanged. To address the low-level image generation problem, we analyze image generation under different conditions, illumination being one of them, and summarize the algorithms proposed for different lighting conditions.
Inverse rendering. Currently, neural rendering is also applied to scene reconstruction. One approach captures photometric appearance variations in in-the-wild data, decomposing the scene into image-dependent and shared components [187].
Another important type of rendering is inverse rendering. Inverse rendering of objects under completely unknown capture conditions is a fundamental challenge in computer vision and graphics, and it becomes especially acute when the input images are captured in complex and changing environments. Without relying on NeRF, Boss et al. proposed a joint optimization framework to estimate the shape, BRDF, per-image camera pose and illumination [188].
Choi et al. proposed IBL-NeRF, which is also based on inverse rendering. It extends the original NeRF formulation to capture the spatial variation of lighting within the scene volume in addition to surface properties. Specifically, scenes with diverse materials are decomposed into the intrinsic components needed for image-based rendering, namely albedo, roughness, surface normal, irradiance and pre-filtered radiance. All components are inferred as neural images from the MLP, which allows the method to model large-scale general scenes [189].
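Each of these per-ray intrinsic components is obtained by the standard NeRF-style volume-rendering (alpha compositing) step. The following sketch shows that compositing for precomputed per-sample densities and per-sample component values; it is a generic illustration under our own simplifying assumptions, not the IBL-NeRF implementation.

```python
import torch

def composite_along_ray(sigmas, values, deltas):
    """Standard NeRF alpha compositing along one ray.
    sigmas: (N,) per-sample densities; values: (N, C) per-sample quantities
    (e.g., radiance, albedo or normals); deltas: (N,) sample spacings."""
    alphas = 1.0 - torch.exp(-sigmas * deltas)          # per-sample opacity
    ones = torch.ones(1, dtype=alphas.dtype)
    # transmittance up to (but excluding) each sample
    trans = torch.cumprod(torch.cat([ones, 1.0 - alphas + 1e-10])[:-1], dim=0)
    weights = alphas * trans                             # (N,)
    return (weights[:, None] * values).sum(dim=0)        # (C,)

# toy usage: 64 samples along one ray, 3-channel radiance
sigmas = torch.rand(64)
rgb = torch.rand(64, 3)
deltas = torch.full((64,), 0.02)
pixel = composite_along_ray(sigmas, rgb, deltas)
```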
However, NeRF-based methods encode shape, reflectance and illumination implicitly, which makes it difficult for users to manipulate these properties explicitly in the rendered images. A hybrid SDF-based 3D neural representation has therefore been proposed that renders scene deformations and lighting more accurately and adds a new SDF regularization term. Its drawback is that it sacrifices some rendering quality. In inverse rendering, high rendering quality is often at odds with accurate lighting decomposition, because shadows and lighting can easily be misinterpreted as textures; good results therefore require a concerted effort between surface reconstruction and inverse rendering [161]. Dynamic NeRF, in turn, is a powerful algorithm that renders photo-realistic novel-view images from a monocular RGB video of a dynamic scene, but it does not model how the reflected color changes during warping, which is one of its drawbacks. To address this, Yan et al. reformulated the neural radiance field function to be conditioned on the position and orientation of the surface in the observation space, so that specularly reflective surfaces in different poses can keep different reflected colors when mapped to the common canonical space. This method reconstructs and renders dynamic specular scenes more accurately [190].
The inverse rendering objective function of this method is as follows:
$$\mathcal{L} = \mathcal{L}_{\text{render}} + \mathcal{L}_{\text{pref}} + \mathcal{L}_{\text{prior}} + \lambda_{I,\text{reg}}\, \mathcal{L}_{I,\text{reg}} \tag{5.1}$$
$\mathcal{L}_{\text{render}}$ and $\mathcal{L}_{\text{pref}}$ are rendering losses to match the rendered images with the input images.
Next, we will explain each of these parameters.
$$\mathcal{L}_{\text{render}} = \left\| L_o(r) - \hat{L}_o(r) \right\|_2^2, \tag{5.2}$$
This loss is computed for each camera ray, where $r$ denotes a single pixel, $L_o$ is our estimated radiance and $\hat{L}_o$ is the ground-truth radiance.
$$\mathcal{L}_{\text{pref}} = \sum_j \left\| L^j_{\text{pref}}(r) - L^j_G(r) \right\|_2^2. \tag{5.3}$$
This is the rendering loss on the pre-filtered radiance: $L^j_{\text{pref}}(r)$ is the inferred pre-filtered radiance at the $j$-th level and $L^j_G(r)$ is the radiance convolved with the $j$-th level Gaussian kernel, where $L^0_G = L$.
$$\mathcal{L}_{\text{prior}} = \left\| a(r) - \hat{a}(r) \right\|_2^2. \tag{5.4}$$
This term encourages the inferred albedo $a$ to match the pseudo albedo $\hat{a}$.
$$\mathcal{L}_{I,\text{reg}} = \left\| I(r) - \mathbb{E}[\hat{I}] \right\|_2^2, \tag{5.5}$$
This is the irradiance regularization loss, where $\mathbb{E}[\hat{I}]$ is the mean of the irradiance (shading) values over the training-set images.
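Putting Eqs (5.1)–(5.5) together, a minimal per-batch sketch of the combined objective might look as follows. The dictionaries `pred` and `gt` and their keys are placeholder names of ours rather than variables from the cited work, and mean-squared losses stand in for the per-ray squared norms.

```python
import torch
import torch.nn.functional as F

def inverse_rendering_loss(pred, gt, lambda_reg=0.01):
    """pred: dict with 'radiance' (B,3), 'prefiltered' list of (B,3) per blur
    level, 'albedo' (B,3), 'irradiance' (B,1).
    gt: dict with 'radiance' (B,3), 'prefiltered' list of (B,3),
    'pseudo_albedo' (B,3), 'mean_irradiance' scalar."""
    l_render = F.mse_loss(pred['radiance'], gt['radiance'])            # Eq (5.2)
    l_pref = sum(F.mse_loss(p, g)                                      # Eq (5.3)
                 for p, g in zip(pred['prefiltered'], gt['prefiltered']))
    l_prior = F.mse_loss(pred['albedo'], gt['pseudo_albedo'])          # Eq (5.4)
    target_irr = torch.full_like(pred['irradiance'],
                                 float(gt['mean_irradiance']))
    l_reg = F.mse_loss(pred['irradiance'], target_irr)                 # Eq (5.5)
    return l_render + l_pref + l_prior + lambda_reg * l_reg            # Eq (5.1)
```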
In practice, ideal lighting is often absent and the objects of interest may be in unfavorable conditions such as deflection, motion, darkness or strong interference. This can lead to acquired images that are under-illuminated, lit by a single source or affected by complex illumination, all of which degrade the quality of the final image generation. Next, we review the various ways of dealing with these situations.
Illumination normalization with GAN-IN-GAN generalizes well to images with modest illumination variation. The method combines deep convolutional neural networks and GANs to normalize the illumination of color or grayscale face images, then trains feature extractors and classifiers, and can handle the illumination of both frontal and non-frontal face images. It can also be extended to domains beyond face image generation. However, it cannot preserve fine texture details and has some limitations. Moreover, the model is trained under well-controlled illumination variation; it copes with poorly controlled illumination to a certain extent, but its behavior with other features and geometric structures in realistic, complex environments remains limited. Whether the model would perform better if trained under complex lighting changes is worth further investigation [191].
When the data set is insufficient, an unsupervised approach can be used instead. For example, for low-light scenes, the unsupervised Aleth-NeRF method learns directly from dark images. It is essentially a multi-view synthesis method that takes a low-light scene as input and renders a normally illuminated scene. However, a separate model must be trained for each scene, and the method does not handle non-uniform lighting conditions well [192].
Furthermore, images taken in low-light scenes are degraded by factors such as blur and noise. For this type of problem, a hybrid architecture based on Retinex theory and GANs can be used. For vision tasks in the dark or under low light, the image is first decomposed into an illumination image and a reflectance image, and an enhancement module then generates a high-quality, sharp image while minimizing the blur and noise introduced in the process (a toy Retinex decomposition is sketched after this paragraph). The method introduces a structural similarity loss to avoid blurring side effects. However, suitable pairs of real-life low-light and normal-light images are not easy to acquire, so input data are scarce, and a sufficiently large data set is required to get the most out of the algorithm. The trained model also struggles to run in real time, which limits its use in practice. In general, the algorithm only addresses image blur and noise, while other problems remain, so the network structure needs further optimization [193]. This class of problems can also be tackled by exploring multiple diffusion spaces to estimate the light component, which is then used as bright pixels to enhance the dim image based on the maximum diffusion value; this generates high-fidelity images without significant distortion and minimizes noise amplification [194]. Later, DiFaReli used a conditional diffusion implicit model (DDIM) to decode the disentangled light encoding. Ponglertnapakorn et al. proposed a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. This method allows single-view face relighting in the wild. However, it has limitations in removing shadows cast by external objects and is susceptible to image ambiguity [195].
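As a rough, self-contained illustration of the Retinex decomposition that underlies such hybrid architectures (not the cited GAN model itself), the sketch below splits a low-light image into an illumination estimate and a reflectance map and brightens the illumination with a gamma curve; the blur scale `sigma` and the gamma value are arbitrary choices of ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_enhance(image, sigma=30, gamma=0.5, eps=1e-6):
    """image: HxWx3 float array in [0, 1] (a low-light photo).
    Uses the Retinex assumption I = R * L to brighten the image."""
    illumination = gaussian_filter(image, sigma=(sigma, sigma, 0))  # smooth L
    reflectance = image / (illumination + eps)                      # R = I / L
    enhanced_l = np.power(illumination, gamma)                      # brighten L
    return np.clip(reflectance * enhanced_l, 0.0, 1.0)

# toy usage: simulate an under-exposed image and enhance it
dark = np.random.rand(128, 128, 3) * 0.2
bright = retinex_enhance(dark)
```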
In summary, the full objective of this method is as follows:
$$\mathcal{L}(G,D) = \mathcal{L}_{\text{adversarial}}(G,D) + \lambda_1 \mathcal{L}_{\text{content}}(G) + \lambda_2 \mathcal{L}_{\ell_1}(G) \tag{5.6}$$
where $\lambda_1$ and $\lambda_2$ are weighting parameters.
$\mathcal{L}_{\text{adversarial}}$, $\mathcal{L}_{\text{content}}$ and $\mathcal{L}_{\ell_1}$ are defined as follows:
$$\begin{aligned} \mathcal{L}_{\text{adversarial}}(G,D) &= \mathbb{E}_x[\log D(x)] + \mathbb{E}_{G(x)}[\log(1 - D(G(x)))] \\ \mathcal{L}_{\text{content}}(G) &= \left\| F(y) - F(G(x)) \right\|_1 \\ \mathcal{L}_{\ell_1}(G) &= \left\| y - G(x) \right\|_1 \end{aligned} \tag{5.7}$$
where $x$ denotes the input image, $y$ the target image and $F$ the feature extractor.
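Assembled into training code, Eqs (5.6) and (5.7) amount to the generator loss sketched below. The `generator`, `discriminator` and `feature_extractor` modules are placeholders we assume exist; the sketch only illustrates how the three terms are combined, not the cited implementation.

```python
import torch

def generator_loss(generator, discriminator, feature_extractor,
                   x, y, lambda1=1.0, lambda2=10.0):
    """x: input image batch, y: target image batch (Eqs 5.6-5.7)."""
    g_x = generator(x)
    # adversarial term seen by the generator (discriminator output in (0,1))
    l_adv = torch.mean(torch.log(1.0 - discriminator(g_x) + 1e-8))
    # L1 distance in feature space (content loss)
    l_content = torch.mean(torch.abs(feature_extractor(y) - feature_extractor(g_x)))
    # pixel-wise L1 distance
    l_l1 = torch.mean(torch.abs(y - g_x))
    return l_adv + lambda1 * l_content + lambda2 * l_l1
```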
When the light source moves, a method that generates realistic scenes from captured object images can be used. Building on NeRFs, it models the volume density of the scene and the emitted directional radiance, and implicitly represents the light transport of each object using illumination- and view-dependent neural networks. This approach copes with moving light without retraining the model [196].
To handle non-uniform lighting in the environment, the light correction network framework UDoc-GAN can be used. Its key idea is to turn the uncertain normal-to-abnormal image translation into a deterministic translation guided by different levels of ambient light. In contrast, Aleth-NeRF cannot handle non-uniform illumination or shadowed images, while UDoc-GAN is also more computationally efficient at inference time and closer to practical requirements [197].
Ling et al. observed that, under controlled camera illumination, shadow rays exist between the scene and the multi-view image planes, which led to a new shadow ray supervision scheme that optimizes both the samples and the positions along the rays. By supervising shadow rays to achieve controllable illumination, they construct a neural SDF network for single-view scene reconstruction under multiple illumination conditions. However, the method applies only to point and parallel light sources and places clear requirements on the position of the light source; it is also demonstrated only in simple settings where the scene is not lit by complex illumination [198].
Likewise, for images acquired in uncontrolled, complex environments, the NeRF-OSR algorithm enables the generation of new views under new illumination, providing a solution for image generation in such settings. Resolving some of its blurred results by improving the algorithm is an interesting direction for future research, for example by correcting inaccuracies in geometry estimation or incorporating more prior knowledge of outdoor scenes [199]. Later, Higuera et al. addressed the complex problem of lighting variation by reducing perceptual differences in vision and using a probabilistic diffusion model to capture the lighting. The method is trained on simulated data, which alleviates the need for large-scale real data; however, it suffers from long computation times, especially in the denoising process [200]. Reflections in complex environments, for example from glass and mirrors, pose a particular challenge. Guo et al. introduced NeRFReN to model scenes with reflections, mainly by splitting the scene into transmitted and reflected components and modeling the two with separate neural radiance fields. This approach has far-reaching implications for further research on scene understanding and neural editing, but it does not consider curved reflective surfaces or multiple non-coplanar reflective surfaces [201].
Generally speaking, reflected light can be divided into three components: ambient reflection, diffuse reflection and specular reflection. Different materials that reflect the light produce different lighting cues in the exposure. One omnidirectional illumination method trains deep neural networks on videos captured with automatic exposure and white balance, matching real images with predicted illumination through image relighting and then regressing the illumination from the background [202].
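The three reflection components mentioned above are the same ones that appear in the classical Phong shading model; a minimal implementation of that textbook model (purely illustrative and unrelated to the cited network) is:

```python
import numpy as np

def phong_shade(normal, light_dir, view_dir,
                ka=0.1, kd=0.7, ks=0.2, shininess=32.0):
    """Combine ambient, diffuse and specular reflection at one surface point.
    All direction vectors are unit 3-vectors pointing away from the surface."""
    ambient = ka
    diffuse = kd * max(float(np.dot(normal, light_dir)), 0.0)
    reflect = 2.0 * np.dot(normal, light_dir) * normal - light_dir  # mirror direction
    specular = ks * max(float(np.dot(reflect, view_dir)), 0.0) ** shininess
    return ambient + diffuse + specular

# toy usage: light at 45 degrees, viewer straight above the surface
n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.7071, 0.7071])
v = np.array([0.0, 0.0, 1.0])
intensity = phong_shade(n, l, v)
```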
The omnidirectional illumination method [202] focuses on minimizing a reconstruction loss on the illumination and adds an adversarial loss. The two losses are as follows:
$$\mathcal{L}_{rec} = \sum_{b=0}^{2} \lambda_b \left\| \hat{M} \odot \left( \Lambda(\hat{I}_b)^{\frac{1}{\gamma}} - \Lambda(I_b) \right) \right\|_1. \tag{5.8}$$
In this formulation, the clipped linear rendering $\Lambda(\hat{I}_b)$ is $\gamma$-encoded so that it can be compared with $I_b$; $\hat{M}$ is a binary mask and $\lambda_b$ an optional weight.
$$\mathcal{L}_{adv} = \log D(\Lambda(I_c)) + \log\!\left(1 - D\!\left(\Lambda\!\left(\sum_{\theta,\phi} R(\theta,\phi)\, e^{G(x;\theta,\phi)}\right)^{\frac{1}{\gamma}}\right)\right) \tag{5.9}$$
In this formulation, $D$ is an auxiliary discriminator network, $G$ is the generator and $x$ is the input image.
Therefore, combining the two yields the following overall objective:
$$G^{*} = \arg\min_G \max_D \; (1-\lambda_{rec})\, \mathbb{E}[\mathcal{L}_{adv}] + \lambda_{rec}\, \mathbb{E}[\mathcal{L}_{rec}] \tag{5.10}$$
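Eq (5.10) simply interpolates between the two terms. A per-batch sketch in code, with pre-rendered images `pred_ldr`/`target_ldr`, a binary `mask` and per-exposure `weights` (all placeholder names of ours, not the cited implementation), could read:

```python
import torch

def reconstruction_loss(pred_ldr, target_ldr, mask, weights, gamma=2.2):
    """Eq (5.8): masked L1 between gamma-encoded renderings at three exposures.
    pred_ldr / target_ldr: lists of (B,3,H,W) tensors, one per exposure level."""
    loss = 0.0
    for b, (pred, target) in enumerate(zip(pred_ldr, target_ldr)):
        encoded = pred.clamp(min=0) ** (1.0 / gamma)       # gamma-encode rendering
        loss = loss + weights[b] * torch.mean(torch.abs(mask * (encoded - target)))
    return loss

def total_objective(l_rec, l_adv, lambda_rec=0.5):
    """Eq (5.10): convex combination of reconstruction and adversarial terms."""
    return (1.0 - lambda_rec) * l_adv + lambda_rec * l_rec
```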
Of course, in real life there are situations where different instances have similar reflectance. For illumination variation, a cluster-optimization method based on a neural reflectance field uses iterative refinement of the reflectance to resolve, via hierarchical clustering, the ambiguity between different instances with similar reflectance. However, complex scenes that do not conform to the unsupervised intrinsic prior remain a challenge, and solutions to such problems still need to be proposed [203].
Different media also respond to light with different radiance. One approach performs reflectance decomposition on top of a network-based pre-integrated light query, capturing changing illumination and thereby enabling more accurate novel-view synthesis and relighting, and finally achieving fast and practical differentiable rendering. The algorithm can also estimate the shape and BRDF of the objects in the image, which is an advantage over other algorithms. However, it has some limitations in handling inter-reflections; an effective treatment of the interactions between all such effects could be a future research direction [204].
The proportion of different neural network models used in the works surveyed in Sections 3–5 of this review is shown in Figure 9. These models fall into six types: NeRF, GAN, hybrid NeRF, DM, Transformer and others. The chart shows that NeRF is the most prevalent model, accounting for 45% of the surveyed works, followed by GAN at 25%. Hybrid NeRF ranks third with 13%, closely followed by DM at 12%. Transformer is the fifth most popular model, appearing in 2.5% of the works, and the remaining 2.5% is attributed to other models.
This distribution highlights the dominance of NeRF, indicating its widespread usage and recognition in the field. GAN also holds a significant share, reflecting its popularity for various applications. Hybrid NeRF, DM and Transformer, although not as prevalent as NeRF and GAN, still show notable representation, and the remaining 2.5% is distributed among other models, indicating a diverse landscape of neural network approaches in the reviewed literature.
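For readers who wish to reproduce the breakdown in Figure 9, the percentages quoted above can be plotted with a few lines of matplotlib; the numbers are simply those stated in the text.

```python
import matplotlib.pyplot as plt

labels = ['NeRF', 'GAN', 'Hybrid NeRF', 'DM', 'Transformer', 'Others']
shares = [45, 25, 13, 12, 2.5, 2.5]  # percentages quoted in the text

plt.figure(figsize=(5, 5))
plt.pie(shares, labels=labels, autopct='%1.1f%%', startangle=90)
plt.title('Neural network models used in Sections 3-5')
plt.show()
```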
Low-level controllable image synthesis has many potential applications in various domains, such as entertainment, industry and security.
a) Video games. 3D image synthesis can create immersive and interactive virtual worlds for gamers to explore and enjoy. It can also enhance the realism and variety of characters, objects and environments in the game [205,206].
b) Movies and TV shows. 3D image synthesis can produce stunning visual effects and animations for movies and TV shows. It can also enable the creation of digital actors, creatures and scenarios that would be impossible or impractical to film in real life [207,208].
c) Virtual reality and augmented reality. 3D image synthesis can generate realistic and immersive virtual experiences for users who wear VR or AR devices. It can also augment the real world with digital information and graphics that enhance the user's perception and interaction [209].
d) Art and design. 3D image synthesis can enable artists and designers to express their creativity and vision in new ways. It can also facilitate the creation and presentation of 3D artworks, models and prototypes [210].
a) Product design and prototyping. Using 3D image synthesis, designers can visualize and test different aspects of their products, such as shape, color, texture, functionality and performance, before manufacturing them. This can save time and money, as well as improve the quality and innovation of the products [211].
b) Training and simulation. Using 3D image synthesis, trainers can create realistic and immersive scenarios for workers to practice their skills and learn new procedures. For example, 3D image synthesis can be used to simulate hazardous environments, such as oil rigs, mines or nuclear plants, where workers can train safely and effectively.
c) Inspection and quality control. Using 3D image synthesis, inspectors can detect and analyze defects and errors in products or processes, such as cracks, leaks or misalignments. For example, 3D image synthesis can be used to inspect complex structures, such as bridges, pipelines or aircraft, where human inspection may be difficult or dangerous [212,213].
a) Biometric authentication. 3D image synthesis can be used to generate realistic face images from 3D face scans or facial landmarks, which can be used for identity verification or access control. For example, Face ID on iPhone uses 3D image synthesis to project infrared dots on the user's face and match them with the stored 3D face model [214,215].
b) Forensic analysis. 3D image synthesis can be used to reconstruct crime scenes or evidence from partial or noisy data, such as surveillance videos, witness sketches or DNA samples. For example, Snapshot DNA Phenotyping uses 3D image synthesis to predict the facial appearance of a person from their DNA [216].
c) Counter-terrorism. 3D image synthesis can be used to detect and prevent potential threats by generating realistic scenarios or simulations based on intelligence data or risk assessment. For example, the US Department of Defense uses 3D image synthesis to create virtual environments for training and testing purposes.
d) Cybersecurity. 3D image synthesis can be used to protect sensitive data or systems from unauthorized access or manipulation by generating fake or distorted images that can fool attackers or malware. For example, Adversarial Robustness Toolbox uses 3D image synthesis to generate adversarial examples that can evade or mislead deep learning models [217].
In this paper, we have given a comprehensive survey of the emerging progress on low-level controllable image synthesis. We discussed a variety of low-level controllable image synthesis aspects according to their low-level vision cues. The survey reviewed important progress made on 3D data sets, geometrically controllable image synthesis, photometrically controllable image synthesis and related applications. Moreover, the global and local synthesis approaches are summarized separately in each controllable mode to further distinguish diverse synthesis tasks. Our goal is to provide a useful guide for researchers and developers who are interested in synthesizing and editing images from low-level 3D prompts. We categorize the literature mainly according to controllable 3D cues, since these directly determine the synthesis tasks and abilities. However, there are still other non-rigid 3D cues, such as body kinematic joints and elastic shape deformation, that are not covered by this survey.
3D controlled image synthesis is a challenging task that aims to generate realistic and diverse images of 3D objects with user-specified attributes, such as pose, shape, appearance and viewpoint. In our view, some of the main difficulties are the following. Data scarcity and diversity: 3D controlled image synthesis requires large-scale, high-quality data sets of 3D objects with varied attributes and annotations, but such data sets are scarce and expensive to obtain, especially for complex scenes and fine-grained categories; moreover, the data distribution may not cover all possible attribute combinations, leading to mode collapse or unrealistic synthesis. Model complexity and efficiency: the task involves modeling both the 3D structure and the 2D appearance of objects, which requires sophisticated and computationally intensive models. Controllability and interpretability: the goal is to give users intuitive and flexible control over the synthesis process, yet existing methods often use latent codes or predefined attributes as control inputs, which may not reflect the user's intention or expectation, and the relationship between the control inputs and the synthesis outputs may be neither clear nor consistent, making the results difficult to interpret and manipulate.
In response to the above challenges, we recommend that readers use large-scale models judiciously, since such models are trained on vast amounts of data and can overcome some of the problems arising from data limitations. We also suggest further research on latent decomposition and inverse rendering techniques. In the future, we expect that more explainable controllable cues can be extracted from current diffusion and NeRF models through advanced latent decomposition or inverse rendering. Together with semantic-level controllable image synthesis, low-level controllable image synthesis and editing will be able to generate even more impressive and reliable images for everyday use.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research was supported by NSFC (No. 61871074).
The authors declare that there are no conflicts of interest.