Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm

Chatzilygeroudis K.I., Vrahatis A.G., Tasoulis S.K., Vrahatis M.N.

dc.creator	Chatzilygeroudis K.I., Vrahatis A.G., Tasoulis S.K., Vrahatis M.N.	en
dc.date.accessioned	2023-01-31T07:44:07Z
dc.date.available	2023-01-31T07:44:07Z
dc.date.issued	2021
dc.identifier	10.1007/978-3-030-92121-7_6
dc.identifier.isbn	9783030921200
dc.identifier.issn	03029743
dc.identifier.uri	http://hdl.handle.net/11615/72655
dc.description.abstract	Big data methods prevail in the biomedical domain, leading to effective and scalable data-driven approaches. Biomedical data are known for their ultra-high dimensionality, especially those coming from molecular biology experiments. This property also characterizes the emerging technique of single-cell RNA-sequencing (scRNA-seq), in which sequence information is obtained from individual cells. A reliable way to uncover the complexity of such data is to use Machine Learning approaches, including dimensionality reduction and feature selection methods. Although the former has made remarkable progress on scRNA-seq data, only the latter can offer deeper interpretability at the gene level, since it highlights the dominant gene features in the given data. To tackle this challenge, we propose a feature selection framework that utilizes genetic optimization principles and identifies low-dimensional combinations of gene lists in order to enhance the classification performance of any off-the-shelf classifier (e.g., LDA or SVM). Our intuition is that by identifying an optimal gene subset, we can enhance the prediction power of scRNA-seq data even if these genes are unrelated to each other. We showcase our proposed framework’s effectiveness in two real scRNA-seq experiments with gene dimensions up to 36708. Our framework can identify very low-dimensional subsets of genes (fewer than 200) while boosting the classifiers’ performance. Finally, we provide a biological interpretation of the selected genes, thus providing evidence of our method’s utility towards explainable artificial intelligence. © 2021, Springer Nature Switzerland AG.	en
dc.language.iso	en	en
dc.source	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85121904400&doi=10.1007%2f978-3-030-92121-7_6&partnerID=40&md5=ab81d5fc50f57ebc38c78ac0e203d7b9
dc.subject	Bioinformatics	en
dc.subject	Clustering algorithms	en
dc.subject	Cytology	en
dc.subject	Feature extraction	en
dc.subject	Genes	en
dc.subject	Genetic algorithms	en
dc.subject	Molecular biology	en
dc.subject	Support vector machines	en
dc.subject	Biomedical data	en
dc.subject	Biomedical domain	en
dc.subject	Data-driven approach	en
dc.subject	Features selection	en
dc.subject	High dimensional data	en
dc.subject	Low dimensional	en
dc.subject	Optimisations	en
dc.subject	RNA-Seq datum	en
dc.subject	Single cells	en
dc.subject	Single-cell RNA-seq	en
dc.subject	RNA	en
dc.subject	Springer Science and Business Media Deutschland GmbH	en
dc.title	Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm	en
dc.type	conferenceItem	en
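
The abstract describes a wrapper approach: a genetic algorithm searches for a small gene subset that maximizes the accuracy of an off-the-shelf classifier. The record does not give the paper's encoding, operators, or fitness function, so the following is only a minimal sketch under stated assumptions. It is not the authors' implementation. The nearest-centroid classifier stands in for LDA or SVM, the data are synthetic, and every name here (make_toy_data, nearest_centroid_accuracy, ga_select) and every parameter value is hypothetical.

```python
import random


def make_toy_data(n_cells, n_genes, informative, rng):
    """Synthetic stand-in for an expression matrix: two classes, Gaussian noise,
    with a mean shift on the 'informative' gene indices for class 1."""
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, genes):
    """Validation accuracy of a nearest-centroid classifier restricted to `genes`.
    (Assumption: a simple substitute for the LDA/SVM classifiers named in the abstract.)"""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, t in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_val)


def ga_select(X_train, y_train, X_val, y_val, subset_size,
              pop_size=30, generations=40, mutation_rate=0.2, seed=0):
    """Search for a fixed-size gene subset with a basic elitist genetic algorithm.
    Requires subset_size < number of genes."""
    rng = random.Random(seed)
    n_genes = len(X_train[0])

    def fitness(individual):
        return nearest_centroid_accuracy(
            X_train, y_train, X_val, y_val, sorted(individual))

    # Each individual is a set of distinct gene indices.
    population = [frozenset(rng.sample(range(n_genes), subset_size))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: keep the better half of the population unchanged (elitism).
        elite = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw the child's genes from the union of both parents.
            child = set(rng.sample(sorted(a | b), subset_size))
            # Mutation: swap one selected gene for one not currently selected.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([g for g in range(n_genes) if g not in child]))
            children.append(frozenset(child))
        population = elite + children
        best = max(population + [best], key=fitness)
    return sorted(best), fitness(best)


if __name__ == "__main__":
    data_rng = random.Random(1)
    X, y = make_toy_data(80, 50, informative=[3, 17, 42], rng=data_rng)
    genes, acc = ga_select(X[:40], y[:40], X[40:], y[40:], subset_size=5)
    print("selected genes:", genes, "validation accuracy:", acc)
```

Because fitness is measured on a held-out split by whatever classifier is plugged in, the search itself is classifier-agnostic. Swapping the fitness function for cross-validated LDA or SVM accuracy would bring the sketch closer to the setting the abstract describes.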

This item appears in the following collections

Publications in journals, conferences, book chapters, etc. [19735]