Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs Difficult Downstream Tasks in LLMs

Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang (Corresponding author)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

We present the Junk DNA Hypothesis by adopting a novel task-centric angle on the pre-trained weights of large language models (LLMs). It has long been believed that weights in LLMs contain significant redundancy, leading to the notion that a considerable fraction of the parameters can be removed by pruning without compromising performance. Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks, manifested as a monotonic relationship between downstream task difficulty and the performance drop incurred as more pre-trained weights are pruned by magnitude. Moreover, we reveal that removing these seemingly inconsequential weights can result in irreparable loss of knowledge and performance degradation on difficult tasks, even when downstream continual training is allowed. Interestingly, our evaluations show that another popular compression technique, quantization, fails to exhibit a similar “monotonic” effect and does not disentangle this task-difficulty information as convincingly. To study this formally, we introduce several quantifiable metrics to gauge downstream task difficulty: (a) within the same task category, and (b) across different task categories. Our extensive experiments substantiate the Junk DNA Hypothesis across a diverse range of model sizes, tasks, datasets, and even pruning methods. Code is available at https://github.com/VITA-Group/Junk_DNA_Hypothesis.git.
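As a rough illustration of the magnitude-based pruning the abstract refers to, the sketch below zeroes out the smallest-magnitude weights in each linear layer of a pre-trained model. The function names, the layer-wise (rather than global) thresholding, and the evaluation loop are assumptions made for illustration only, not the authors' released implementation (see the linked repository for that).

```python
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude fraction of weights, layer by layer.

    Illustrative one-shot magnitude pruning; the paper's experiments may use
    global thresholds or other criteria (e.g., Wanda, SparseGPT).
    """
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            weight = module.weight.data
            k = int(sparsity * weight.numel())
            if k == 0:
                continue
            # Threshold at the k-th smallest absolute value in this layer.
            threshold = weight.abs().flatten().kthvalue(k).values
            mask = weight.abs() > threshold
            weight.mul_(mask)

# Hypothetical usage: sweep sparsity levels and evaluate on tasks of increasing
# difficulty to probe the monotonic degradation described in the abstract.
# `load_pretrained` and `evaluate` are placeholders for a model loader and a
# downstream-task harness; reload the model before each sparsity level so the
# pruning is not applied cumulatively.
# for s in (0.1, 0.3, 0.5, 0.7):
#     model = load_pretrained()
#     magnitude_prune(model, s)
#     evaluate(model, task)
```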
Original language: English
Title of host publication: Proceedings of the 41st International Conference on Machine Learning
Editors: Ruslan Salakhutdinov, Ziko Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, Felix Berkenkamp
Publisher: PMLR
Pages: 57053-57068
Number of pages: 16
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 – 27 Jul 2024

Publication series

Name: Proceedings of Machine Learning Research (PMLR)
Volume: 235
ISSN (Electronic): 2640-3498

Conference

Conference: 41st International Conference on Machine Learning, ICML 2024
Abbreviated title: ICML 2024
Country/Territory: Austria
City: Vienna
Period: 21/07/24 – 27/07/24

Funding

Z. Wang is supported in part by NSF Award DMS-02133861 and the NSF AI Institute for Foundations of Machine Learning (IFML).

Keywords

  • Pruning
  • LLMs
  • Quantization

