Towards efficient AutoML: a pipeline synthesis approach leveraging pre-trained transformers for multimodal data

Ambarish Moharil (Corresponding author), Joaquin Vanschoren, Prabhant Singh, Damian Tamburri

Research output: Contribution to journal > Article > Academic > peer-review

Abstract

This paper introduces an Automated Machine Learning (AutoML) framework specifically designed to efficiently synthesize end-to-end multimodal machine learning pipelines. Traditional reliance on the computationally demanding Neural Architecture Search is minimized through the strategic integration of pre-trained transformer models. This innovative approach enables the effective unification of diverse data modalities into high-dimensional embeddings, streamlining the pipeline development process. We leverage an advanced Bayesian Optimization strategy, informed by meta-learning, to facilitate the warm-starting of the pipeline synthesis, thereby enhancing computational efficiency. Our methodology demonstrates its potential to create advanced and custom multimodal pipelines within limited computational resources. Extensive testing across 23 varied multimodal datasets indicates the promise and utility of our framework in diverse scenarios. The results contribute to the ongoing efforts in the AutoML field, suggesting new possibilities for efficiently handling complex multimodal data. This research represents a step towards developing more efficient and versatile tools in multimodal machine learning pipeline development, acknowledging the collaborative and ever-evolving nature of this field.
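The warm-starting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the meta-knowledge base `PRIOR_TASKS`, the two-number meta-feature vectors, the `lr`/`depth` configuration space, and the functions `warm_start_configs`, `search`, and `toy_objective` are all hypothetical names and values chosen for illustration. For brevity, plain random search stands in for the Bayesian optimization loop; only the warm-start step (seeding the search with configurations that worked on similar earlier datasets) is the point of the example.

```python
import math
import random

# Hypothetical meta-knowledge base: meta-features of previously seen datasets
# and the configuration that worked best on each (illustrative values only).
PRIOR_TASKS = [
    {"meta": (0.9, 0.1), "config": {"lr": 0.01, "depth": 3}},
    {"meta": (0.2, 0.8), "config": {"lr": 0.10, "depth": 6}},
    {"meta": (0.5, 0.5), "config": {"lr": 0.05, "depth": 4}},
]


def warm_start_configs(new_meta, prior_tasks, k=2):
    """Return the stored configs of the k prior tasks closest to new_meta."""
    ranked = sorted(prior_tasks, key=lambda t: math.dist(t["meta"], new_meta))
    return [dict(t["config"]) for t in ranked[:k]]


def search(objective, initial_configs, n_iter=20, seed=0):
    """Evaluate the warm-start configs first, then n_iter random candidates.

    Random sampling is a stand-in for a Bayesian optimization proposal step.
    """
    rng = random.Random(seed)
    candidates = list(initial_configs)
    for _ in range(n_iter):
        candidates.append(
            {"lr": rng.uniform(0.001, 0.2), "depth": rng.randint(1, 8)}
        )
    best_config, best_score = None, -math.inf
    for config in candidates:
        score = objective(config)
        if score > best_score:  # strict: earlier candidates win ties
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Toy score (higher is better), maximal at lr=0.01 and depth=3."""
    return -abs(config["lr"] - 0.01) - abs(config["depth"] - 3)


if __name__ == "__main__":
    seeds = warm_start_configs((0.85, 0.15), PRIOR_TASKS, k=2)
    print(search(toy_objective, seeds))
```

In this toy setting the nearest prior task already holds the best configuration, so the search finds it on the first evaluation. In a real system the seeds would only give the optimizer a better starting region, not a guaranteed optimum.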

Original language: English
Pages (from-to): 7011-7053
Number of pages: 43
Journal: Machine Learning
Volume: 113
Issue number: 9
Publication status: Published - Sept 2024

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Keywords

  • Automated machine learning (AutoML)
  • Bayesian optimization (BO)
  • Multimodal data
  • Pre-trained transformer models
