Data integration and feature selection techniques for omic analysis in cancer research


Guest Editor

Dr. Carlos Fernandez-Lozano,
Department of Computer Science and Information Technologies, Universidade da Coruña, Spain

Manuscript Topics

The amount of data in the public domain is enormous and grows every year. The success of modern science therefore depends heavily on the quality and quantity of these data and on the information they contain. In many cases, problems are analyzed in isolation, and the potential that lies in the interactions among datasets is wasted. One of the questions of greatest current concern to the scientific community is how best to integrate separate biological datasets so that we can better understand the underlying biology that connects them and, through it, the functioning of complex organisms. Disparate datasets generated from the same organism are necessarily related, so they need to be interpreted in a multidimensional analysis that combines them simultaneously, rather than being analyzed separately or blended without regard to their structure.

Data integration techniques can, ideally, combine different datasets to aid in the diagnosis of a disease and to elucidate phenotype-disease relationships at a level of detail not previously reached. In any case, data integration narrows the search by bringing related information from different dimensions into the same analysis. Cancer is one of the fields best suited to a data integration approach: it is extensively characterized at the molecular level, multidimensionally complex, and heterogeneous, and it is profiled on several different omic platforms.

Depending on the stage at which integration takes place, approaches can be categorized as early, intermediate, or late. Early approaches are the easiest to understand: they basically consist of concatenating all variables and letting the machine learning algorithm select those that seem most informative. They usually involve some method of feature selection or dimensionality reduction, in which variables are projected onto vectors generated with information from the whole set. Intermediate approaches fuse representations of each of the dimensions in the analysis to generate inferences in a joint model. This is not a simple union of data as in the previous case; the structure of the data is preserved. Late approaches build a model for each dimension of the data independently and then build a model that integrates all of them, settling by vote on the result obtained by the majority.

Feature selection techniques can be categorized mainly as filter, wrapper, and embedded. Filter techniques select the variables most correlated with the pathology being studied, regardless of the model to be used. Wrapper techniques evaluate subsets of variables, which makes it possible to detect interrelated variables. Finally, embedded techniques try to combine the advantages of the previous two.
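As an illustration of the categories above, the following minimal Python sketch contrasts early integration (concatenation), a filter-style selection step (ranking by absolute Pearson correlation with the label), and late integration (one model per data block, combined by majority vote). Everything in it is an assumption made for illustration: the function names, the toy values, the three hypothetical blocks, and the nearest-centroid classifier used as the per-block model. It is not a method prescribed by this call.

```python
from collections import Counter
from statistics import mean

# Toy, made-up data: 4 samples, 3 hypothetical omic blocks, binary labels.
expression = [[1.0, 5.0], [1.2, 4.0], [3.0, 5.1], [3.1, 4.2]]
methylation = [[0.1], [0.2], [0.8], [0.9]]
copy_number = [[2.0], [2.0], [2.1], [1.9]]
labels = [0, 0, 1, 1]


def early_integration(blocks):
    """Early integration: concatenate each sample's features across blocks."""
    n_samples = len(blocks[0])
    return [[v for block in blocks for v in block[i]] for i in range(n_samples)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def filter_select(X, y, k):
    """Filter selection: keep the k columns with the largest |r| to the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def fit_centroids(X, y):
    """Per-block model (an assumed stand-in): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(sorted(centroids), key=dist)


def late_integration(blocks, y, new_sample_blocks):
    """Late integration: one model per block, combined by majority vote.

    Ties are resolved in favor of the class that was voted for first.
    """
    votes = [predict(fit_centroids(X, y), sample)
             for X, sample in zip(blocks, new_sample_blocks)]
    return Counter(votes).most_common(1)[0][0]


blocks = [expression, methylation, copy_number]
combined = early_integration(blocks)            # 4 samples x 4 features
selected = filter_select(combined, labels, k=2)
vote = late_integration(blocks, labels, [[2.9, 4.5], [0.85], [2.0]])
print(selected, vote)
```

In the early route, selection acts on the concatenated matrix, so an uninformative block simply loses its columns. In the late route, every block casts a vote, so a weak block can still sway a close decision. An intermediate approach would instead build a joint representation that preserves each block's structure, which is beyond the scope of this sketch.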

The aim of this special issue is to collect papers focusing on data integration (early, intermediate, or late) and feature selection (filter, wrapper, or embedded) in molecular biology that can yield insight into key biological relationships in cancer biology. The special issue especially encourages the submission of manuscripts in which model analyses and simulations are interpreted in clinically or experimentally relevant terms using real data (TCGA, GEO, etc.).

Paper Submission
All manuscripts will be peer-reviewed before their acceptance for publication.
The deadline for manuscript submission is April 15th, 2020.

Instructions for authors
Please submit your manuscript through the online submission system.

Open Access Journals