Perceptual and algorithmic evaluation of inter-song similarity in Western popular music

A. Novello

Research output: Thesis - Phd Thesis 2 (Research NOT TU/e / Graduation TU/e)

Abstract

While listening to a piece of music, listeners automatically build a mental image of the song by abstracting its most prominent musical elements. The mental representation of these elements is used to compare characteristics of different songs. Experiments have shown that musical timbre, tempo and genre play an important role in the perception of both inter- and intra-song similarity. However, it is not clear which musical cues dominate the listeners’ perception of similarity, and no theoretical or experimental framework has addressed the problem of establishing a hierarchical description of cue relevance. A main limitation of previous studies is their narrow experimental methodology and the small number of songs and genres typically used in perceptual experiments. Recent literature suggests that a large perceptual database could improve the performance ceiling reached by existing signal-based music-similarity algorithms. The aims of the present thesis are to gain a better understanding of the listener’s perception of similarity between songs of Western popular music and to collect perceptual data on an extended music database, both for testing theoretical models and for implementing algorithmic applications. To investigate the perception of music similarity and collect a perceptual database, two perceptual experiments were conducted: a lab-based exploratory experiment to test and optimize the experimental method, and a larger-scale web-based experiment to extend the experimental paradigm to a larger set of stimuli and control variables. Both experiments used triadic comparisons of song excerpts selected from several genres of Western popular music: the participants listened iteratively to three song excerpts and chose the most similar and the least similar pair. The experimental method was conceived to maximize the stimulus set size while keeping the duration of the experiment reasonable for the participants.
Data analyses include an examination of participant concordance to evaluate the existence of a stable and common perception of music similarity across and within participants, a comparison of the relative influence of control variables, and the investigation of factors underlying the organization of the participants’ perceptual space. The first part of this thesis focuses on the description of the experimental design used to collect the perceptual data. Several cross-checks of participants’ concordance in various conditions and side experiments support the overall robustness of the experimental design and the simplicity of the task for the participant. The statistically significant concordance found within and across participants suggests the existence of a stable and common basis for the perception of music similarity. No difference in consistency was found between musicians and non-musicians, or between participants classified as familiar and unfamiliar with the stimulus material. Within our experimental and selected-song context, we found statistically significant evidence for a hierarchical salience of the control variables used in the stimulus selection on participants’ rankings: genre > tempo > timbre. The second part of the thesis includes a deeper analysis of the participants’ perceptual space using features calculated from the rankings of the second, large-scale experiment. A quadratic discriminant analysis quantitatively confirmed the qualitative hierarchy of relevance in control variables found in the first experiment. We defined and labeled three axes, "slow-fast", "vocal-non vocal" and "synthetic-acoustic", that show significant separation of the excerpt classes. On the tempo axis, we found a high correlation between the logarithm of the excerpt beats per minute and the projected positions of the excerpts.
Finally, we found that the hierarchical order of relevance of control variables differs if evaluated globally, on the whole set of stimuli, or contextually, on a specific stimulus subset. In the last part of the thesis, we used commonly available feature-extraction algorithms to map the physical properties of each song signal to the participants’ perceptual space, in order to build an algorithm able to predict participant behavior. In this process, we evaluated the performance of the specific feature-extraction algorithms and the relevance of musicologically grouped feature subsets: pitch, loudness, rhythm and timbre. A trained linear model can correctly predict 52.3 ± 0.5% of the rankings on the most similar pair within song triads. This is a good result considering that the theoretical limit of algorithmic performance is 78 ± 8%, estimated from participant concordance in the perceptual experiment. In predicting the perceptual similarity data, our model outperforms the current state-of-the-art algorithm from the MIREX 2006 competition. Timbre features were found to be the most important subset for the prediction of inter-song perceptual similarity.
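The triadic task and the accuracy figures quoted in the abstract can be made concrete with a small sketch. It is not the thesis implementation: the response format (a triad of excerpt IDs plus the pairs picked as most and least similar), the excerpt IDs, the tempo values and the `log_tempo_distance` function are all hypothetical and chosen only for illustration. With three candidate pairs per triad, random guessing would pick the most similar pair one time in three, which is the baseline against which the 52.3% accuracy and the 78% ceiling can be read.

```python
from itertools import combinations
from math import log

def pairs_of(triad):
    """Return the three unordered pairs of a triad as frozensets."""
    return [frozenset(p) for p in combinations(triad, 2)]

def rank_pairs(triad, most_similar, least_similar):
    """Turn one response into a full ranking: [most, middle, least] similar pair."""
    most, least = frozenset(most_similar), frozenset(least_similar)
    middle = [p for p in pairs_of(triad) if p not in (most, least)]
    if most == least or len(middle) != 1:
        raise ValueError("response must name two distinct pairs of the triad")
    return [most, middle[0], least]

def predict_most_similar(triad, distance):
    """Model prediction: the pair with the smallest distance."""
    return min(pairs_of(triad), key=lambda p: distance(*sorted(p)))

def accuracy(responses, distance):
    """Fraction of responses whose most similar pair the model reproduces."""
    hits = sum(predict_most_similar(triad, distance) == frozenset(most)
               for triad, most, _least in responses)
    return hits / len(responses)

# Hypothetical tempi in beats per minute (made-up values, not thesis data).
BPM = {"a": 60, "b": 120, "c": 126, "d": 90}

def log_tempo_distance(x, y):
    """Toy stand-in for a signal-based model: distance in log tempo."""
    return abs(log(BPM[x]) - log(BPM[y]))

# Hypothetical responses: (triad, most similar pair, least similar pair).
RESPONSES = [
    (("a", "b", "c"), ("b", "c"), ("a", "c")),
    (("a", "b", "d"), ("a", "d"), ("b", "d")),
]

print(accuracy(RESPONSES, log_tempo_distance))  # 0.5 for this toy data
```

Because the participant names both the most and the least similar pair, each response fixes the order of all three pairs, which is what `rank_pairs` recovers. Any model that yields a pairwise distance can be substituted for `log_tempo_distance` without changing the scoring.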
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Department of Industrial Engineering & Innovation Sciences
Supervisors/Advisors
  • Kohlrausch, Armin G., Promotor
  • McKinney, M.F., Copromotor, External person
  • Cambouropoulos, E., Copromotor, External person
Award date: 9 Jun 2009
Place of Publication: Eindhoven
Publisher: Technische Universiteit Eindhoven
Print ISBNs: 978-90-386-1831-9
DOIs: https://doi.org/10.6100/IR642834
Publication status: Published - 2009

Fingerprint

Evaluation
Song
Popular music
Experiment
Music
Stimulus
Concordance
Timbre
Data Base
Ranking
Listener Perception
Experimental Method
Musicians
Feature Extraction
Experimental Design
Physical Properties
World Wide Web
Acoustics
Triad
Rhythm

Cite this

Novello, A. / Perceptual and algorithmic evaluation of inter-song similarity in Western popular music. Eindhoven : Technische Universiteit Eindhoven, 2009. 121 p.
@phdthesis{7d76e59dd3594572b890a047e34caf06,
title = "Perceptual and algorithmic evaluation of inter-song similarity in Western popular music",
author = "A. Novello",
year = "2009",
doi = "10.6100/IR642834",
language = "English",
isbn = "978-90-386-1831-9",
publisher = "Technische Universiteit Eindhoven",
school = "Department of Industrial Engineering {\&} Innovation Sciences",
}

Novello, A 2009, 'Perceptual and algorithmic evaluation of inter-song similarity in Western popular music', Doctor of Philosophy, Department of Industrial Engineering & Innovation Sciences, Eindhoven. https://doi.org/10.6100/IR642834

TY - THES

T1 - Perceptual and algorithmic evaluation of inter-song similarity in Western popular music

AU - Novello, A.

PY - 2009

Y1 - 2009

U2 - 10.6100/IR642834

DO - 10.6100/IR642834

M3 - Phd Thesis 2 (Research NOT TU/e / Graduation TU/e)

SN - 978-90-386-1831-9

PB - Technische Universiteit Eindhoven

CY - Eindhoven

ER -