We describe and test the methodological setup for a web-based listening experiment that assesses the perception of inter-song similarity while optimizing the trade-off between stimulus coverage and experimental time. The experiment used a relatively large set of stimuli of Western popular music: 78 song excerpts drawn from 13 genres, presented to 78 participants. Triadic comparisons of song excerpts gave the participants a low-complexity task, and a partially balanced incomplete block design (PBIBD) reduced the number of stimulus comparisons, making it possible to extend the stimulus set. The three control variables used in excerpt selection (genre, tempo, and timbre) showed statistically significant salience, with a hierarchical degree of impact on participants' pair rankings (genre > tempo > timbre). We investigated the participants' perceptual space using a combination of numerical and analytical methods that reduce and represent the dimensionality of the data: scaling combined with discriminant functions to reveal the main factors underlying the organization of the space. In the perceptual space obtained through multidimensional scaling, we applied quadratic discriminant analysis to search for axes that maximized the separation of the excerpt classes. We identified three axes, labelled a posteriori as ‘slow–fast’, ‘vocal–non-vocal’, and ‘synthetic–acoustic’. The excerpt tempo in beats per minute correlated highly with the excerpt projections on the slow–fast axis. A final analysis showed that the relevance of the factors responsible for the grouping of excerpt subsets is context dependent.
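The scaling step of the analysis pipeline can be illustrated with a minimal sketch. The code below is not the authors' implementation; it uses classical (Torgerson) multidimensional scaling in NumPy on a hypothetical toy dissimilarity matrix with three stand-in "genre" classes, showing how pairwise dissimilarities are embedded into a low-dimensional space in which class separation can then be examined (e.g. with discriminant analysis).

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix D
    into k dimensions via eigendecomposition of the double-centered
    squared-distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double centering
    w, V = np.linalg.eigh(B)                   # eigenvalues ascending
    idx = np.argsort(w)[::-1][:k]              # keep the top-k axes
    scale = np.sqrt(np.clip(w[idx], 0, None))  # guard tiny negatives
    return V[:, idx] * scale                   # n x k coordinates

# Hypothetical toy data: 18 "excerpts" in 3 classes around distinct centroids.
rng = np.random.default_rng(0)
centroids = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1, 2], 6)
X = centroids[labels] + rng.normal(scale=0.3, size=(18, 2))

# Pairwise dissimilarities (here plain Euclidean distances).
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

coords = classical_mds(D, k=2)  # 2-D perceptual-space coordinates
```

In the embedded space, excerpts from the same class should lie closer together than excerpts from different classes, which is the property a subsequent discriminant analysis exploits when searching for maximally separating axes.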