The Impact of Variable Selection and Transformation on the Interpretability and Accuracy of Fuzzy Models

Caro Fuchs, Simone Spolaor, Uzay Kaymak, Marco S. Nobile

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

Data transformation is an important step in Machine Learning pipelines that can substantially improve their performance. For instance, min-max normalization is often used to make all variables lie in the same range, while log-transformation is used to map data that is scattered across several orders of magnitude to a logarithmic space. Such transformations can be beneficial when the machine learning approach measures distances in a metric space, as cluster-based approaches do. These two transformation approaches can be combined to reveal hidden patterns in the data in the case of log-normally distributed data points, which commonly occur in biological and medical data. In this work we introduce a novel evolutionary approach designed to automatically determine the optimal log-transformation and selection of variables. Our approach is built around an interpretable AI system (created by pyFUME), so that all transformations are followed by inverse transformations that map the values back into the original universe of discourse and preserve the interpretability of the results. We test our approach on two synthetic datasets, designed to reproduce a condition in which some variables are normally distributed, some variables are log-normally distributed, and some variables are just noise in the dataset. Our results show that our approach yields better-performing models than conventional methods, and that the resulting model is also characterised by better interpretability, making such an approach particularly useful for studying biomedical datasets.
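The combination of log-transformation, min-max normalization, and an inverse mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use pyFUME; the function names `log_minmax_forward` and `log_minmax_inverse`, the choice of a base-10 logarithm, and the sample values are all assumptions made for illustration. The evolutionary search that decides which variables to select and transform is not shown.

```python
import math


def log_minmax_forward(values):
    """Log-transform strictly positive values, then min-max scale them to [0, 1].

    Hypothetical helper: base-10 logarithm is an assumption, not taken from the paper.
    Returns the scaled values and the (min, max) of the log-values needed to invert.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log-transformation requires strictly positive values")
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    scaled = [(x - lo) / (hi - lo) for x in logs]
    return scaled, (lo, hi)


def log_minmax_inverse(scaled, params):
    """Map scaled values back to the original units (the original universe of discourse)."""
    lo, hi = params
    return [10 ** (s * (hi - lo) + lo) for s in scaled]


# Illustrative data spanning several orders of magnitude (made-up values).
raw = [0.01, 0.1, 1.0, 10.0, 100.0]
scaled, params = log_minmax_forward(raw)
restored = log_minmax_inverse(scaled, params)
```

After the forward step the five values are evenly spaced in [0, 1], so a distance-based method no longer sees the largest value as dominating; the inverse step recovers the original units so that model parameters can still be read in terms of the raw variable.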

Original language: English
Title of host publication: 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 8
ISBN (Electronic): 978-1-6654-8462-6
DOIs
Publication status: Published - 26 Aug 2022
Event: 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2022 - Ottawa, Canada
Duration: 15 Aug 2022 to 17 Aug 2022

Conference

Conference: 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2022
Country/Territory: Canada
City: Ottawa
Period: 15/08/22 to 17/08/22

Bibliographical note

Funding Information:
The work has been performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme; in particular, the authors gratefully acknowledge the support of the Department of Environmental Sciences, Informatics and Statistics (DAIS) of the Ca’ Foscari University of Venice, and the computer resources and technical support provided by CINECA. Also, this work was partially supported by DAIS - Ca’ Foscari University of Venice within the IRIDE program.

Keywords

  • data normalization
  • data transformation
  • fuzzy logic
  • fuzzy model
  • genetic algorithm
  • interpretable AI
  • log-transformation
  • machine learning
