
Figure 1
Indexing of temporal predictability using tapping data. An example of the indexing approach is shown for a 30-s section from Ligeti’s 2nd Ricercata. a. The physical stimulus, depicted as the MIDI notes played in each second. b. Tapping pattern of twenty musically trained participants who were asked to tap along with the beat. These data were used to calculate c. inter-subject tapping coherence: temporal predictability was operationally defined as the extent to which taps were synchronized across the different experts, under the assumption that the better the next beat is predicted, the more participants will tap to it (within a narrow window of 100 ms). The index was computed per second as the maximal number of taps synchronized across participants within that second.
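For illustration, a minimal Python sketch of how the per-second coherence index in panel c could be computed is given below. It assumes taps are stored as onset times in seconds and uses the 100-ms window described in the caption; the function name and the toy data are illustrative, not the authors’ implementation.

```python
import numpy as np

def inter_subject_tapping_coherence(tap_times, duration_s, window_s=0.1):
    """Per-second index of temporal predictability from tapping data.

    tap_times : list of 1-D arrays, one per participant, with tap onsets in seconds.
    duration_s: length of the excerpt in seconds.
    Returns, for each second, the maximal number of participants whose taps
    co-occur within a `window_s`-wide window (a sketch of the index in Figure 1c;
    the exact windowing used by the authors may differ).
    """
    istc = np.zeros(int(duration_s), dtype=int)
    for sec in range(int(duration_s)):
        # taps of every participant falling in the current 1-s bin,
        # tagged with the participant index
        events = [(t, p) for p, taps in enumerate(tap_times)
                  for t in taps if sec <= t < sec + 1]
        events.sort()
        best = 0
        for i, (t0, _) in enumerate(events):
            # participants with at least one tap inside [t0, t0 + window_s)
            in_window = {p for t, p in events[i:] if t < t0 + window_s}
            best = max(best, len(in_window))
        istc[sec] = best
    return istc

# Hypothetical usage: 20 participants tapping along a 30-s excerpt
rng = np.random.default_rng(0)
taps = [np.sort(rng.uniform(0, 30, size=40)) for _ in range(20)]
print(inter_subject_tapping_coherence(taps, duration_s=30)[:10])
```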

Figure 2
Continuous reports of music-induced emotions and the corresponding indices of temporal predictability. (a) Mean intensity of the continuously reported music-induced experience on the dimensions of valence and arousal. Lines represent mean arousal and valence values, and the thickness of the shading represents ±1 standard error of the mean (SEM). (b) Time series of the tapping-based index of temporal predictability (inter-subject tapping coherence) per piece. Dashed lines indicate the point of transition between the two parts of Ligeti’s and Mussorgsky’s pieces.
Table 1
Association between temporal predictability and behavioral responses to music: summary of correlation analyses and paired t-tests.
A. Valence

| Excerpt | Mean r (SEM) | p-value (bootstrap) | High vs. low predictability |
|---|---|---|---|
| Glass (n = 36) | .12 (.02) | < .001*** | t(35) = 4.42; p < .001*** |
| Ligeti, Ric. 1 (n = 37) | .23 (.07) | < .001*** | t(36) = 3.86; p < .001*** |
| Ligeti, Ric. 2 (n = 37) | .04 (.02) | .05* | t(36) = 2.27; p = .0294* |
| Mussorgsky, Part 1 (n = 34) | .08 (.01) | < .001*** | t(33) = 2.97; p = .0055* |
| Mussorgsky, Part 2 (n = 34) | –.24 (.04) | < .001*** | t(33) = –5.53; p < .001*** |

B. Arousal

| Excerpt | Mean r (SEM) | p-value (bootstrap) | High vs. low predictability |
|---|---|---|---|
| Glass (n = 36) | –.02 (.02) | .54 | t(35) = –1.73; p = .09 |
| Ligeti, Ric. 1 (n = 37) | –.06 (.05) | .32 | t(36) = –1.28; p = .21 |
| Ligeti, Ric. 2 (n = 37) | –.05 (.02) | .006** | t(36) = –2.41; p = .02* |
| Mussorgsky, Part 1 (n = 34) | –.02 (.02) | .24 | t(33) = –.46; p = .65 |
| Mussorgsky, Part 2 (n = 34) | .35 (.03) | < .001*** | t(33) = 7.95; p < .001*** |
Note: Means and SEMs of the correlation coefficients between inter-subject tapping coherence and the ongoing fluctuations in reported (A) valence or (B) arousal, per musical excerpt. Statistical significance was estimated using a phase-randomization bootstrapping approach. t-values from paired-sample t-tests comparing the average ratings during moments of high vs. low temporal predictability are also provided. Effects that survive correction for multiple comparisons (FDR-corrected, p < .05) are highlighted in gray.
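As an illustration of the analyses summarized in Table 1, the sketch below pairs a phase-randomization surrogate test for the rating/coherence correlation with a paired t-test contrasting seconds of high vs. low tapping coherence. It is a schematic under simplifying assumptions (per-participant correlations, a median split on the coherence index, simulated data); function names and parameters are illustrative rather than the authors’ exact procedure.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_rel

def phase_randomize(x, rng):
    """Surrogate time series with the same power spectrum but random phases."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spec.shape)
    phases[0] = 0.0                      # keep the DC component untouched
    return np.fft.irfft(spec * np.exp(1j * phases), n=len(x))

def bootstrap_p(rating, istc, n_iter=1000, seed=0):
    """Two-sided p-value for the rating/ISTC correlation against
    phase-randomized surrogates of the rating time series."""
    rng = np.random.default_rng(seed)
    observed = pearsonr(rating, istc)[0]
    null = np.array([pearsonr(phase_randomize(rating, rng), istc)[0]
                     for _ in range(n_iter)])
    return np.mean(np.abs(null) >= np.abs(observed))

# Hypothetical data: one valence trace per participant plus the ISTC trace
rng = np.random.default_rng(1)
istc = rng.normal(size=300)                       # 300-s excerpt
ratings = rng.normal(size=(36, 300))              # 36 participants
p_values = [bootstrap_p(r, istc) for r in ratings]

# High vs. low predictability: paired t-test on mean ratings within
# seconds above vs. below the median ISTC (one value per participant)
high, low = istc > np.median(istc), istc <= np.median(istc)
t, p = ttest_rel(ratings[:, high].mean(axis=1), ratings[:, low].mean(axis=1))
```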
Table 2
Temporal predictability in a wider context – Musical dimensions and the reported experience.
A. Valence

| Musical dimension | Glass (n = 36): Mean B (SE) | p (bootstrap) | Ligeti, Ric. 1 (n = 37): Mean B (SE) | p (bootstrap) | Ligeti, Ric. 2 (n = 37): Mean B (SE) | p (bootstrap) | Mussorgsky, Part 1 (n = 34): Mean B (SE) | p (bootstrap) |
|---|---|---|---|---|---|---|---|---|
| Loudness/timbre | –0.026 (0.012) | 0.038 | –0.038 (0.018) | 0.02 | –0.032 (0.009) | < 0.001 | –0.055 (0.015) | < 0.001 |
| Pitch | –0.008 (0.006) | 0.28 | –0.015 (0.008) | 0.05 | –0.002 (0.012) | 0.8 | 0.022 (0.01) | 0.04 |
| Tempo | –0.017 (0.012) | 0.17 | –0.046 (0.018) | 0.006 | –0.021 (0.012) | 0.09 | –0.051 (0.015) | < 0.001 |
| Attack slope | –0.003 (0.006) | 0.67 | 0.031 (0.011) | 0.02 | 0.02 (0.01) | 0.05 | 0.001 (0.006) | 0.88 |
| Spectral spread | 0.072 (0.012) | < 0.001 | –0.025 (0.01) | 0.006 | –0.008 (0.003) | 0.06 | 0.001 (0.005) | 0.87 |
| Spectral irregularity | 0.022 (0.004) | < 0.001 | 0.027 (0.007) | 0.02 | –0.009 (0.004) | 0.054 | –0.009 (0.005) | 0.05 |
| Key | 0.012 (0.004) | 0.02 | 0.046 (0.019) | 0.006 | –0.003 (0.003) | 0.38 | –0.012 (0.002) | < 0.002 |
| Musical surprises | 0.011 (0.003) | 0.002 | 0.003 (0.004) | 0.38 | 0.007 (0.003) | 0.07 | 0.002 (0.005) | 0.66 |
| Inter-subject tapping coherence | 0.037 (0.007) | < 0.0001 | 0.034 (0.012) | 0.003 | 0.021 (0.009) | 0.01 | 0.019 (0.005) | < 0.001 |

B. Arousal

| Musical dimension | Glass (n = 36): Mean B (SE) | p (bootstrap) | Ligeti, Ric. 1 (n = 37): Mean B (SE) | p (bootstrap) | Ligeti, Ric. 2 (n = 37): Mean B (SE) | p (bootstrap) | Mussorgsky, Part 1 (n = 34): Mean B (SE) | p (bootstrap) |
|---|---|---|---|---|---|---|---|---|
| Loudness/timbre | 0.119 (0.016) | < 0.001 | 0.111 (0.015) | < 0.001 | 0.076 (0.01) | < 0.001 | 0.122 (0.015) | < 0.001 |
| Pitch | 0.04 (0.009) | < 0.001 | 0.03 (0.01) | < 0.001 | 0.046 (0.012) | < 0.001 | 0.017 (0.01) | 0.094 |
| Tempo | 0.082 (0.015) | < 0.001 | 0.105 (0.011) | < 0.001 | 0.113 (0.017) | < 0.001 | 0.098 (0.014) | < 0.001 |
| Attack slope | 0.006 (0.006) | 0.48 | 0.002 (0.007) | 0.82 | –0.014 (0.011) | 0.26 | –0.007 (0.003) | 0.17 |
| Spectral spread | –0.022 (0.01) | 0.023 | –0.019 (0.007) | 0.02 | 0.013 (0.003) | 0.013 | –0.027 (0.006) | < 0.001 |
| Spectral irregularity | –0.005 (0.004) | 0.25 | 0 (0.008) | 0.99 | –0.009 (0.006) | 0.15 | –0.011 (0.005) | 0.02 |
| Key | 0.011 (0.004) | 0.07 | 0.034 (0.013) | 0.01 | 0.016 (0.004) | 0.0001 | 0.011 (0.003) | 0.002 |
| Musical surprises | 0.005 (0.003) | 0.25 | 0.008 (0.002) | 0.026 | 0.003 (0.003) | 0.51 | 0.003 (0.004) | 0.49 |
| Inter-subject tapping coherence | 0.001 (0.008) | 0.97 | –0.009 (0.009) | 0.41 | 0.016 (0.01) | 0.11 | 0 (0.005) | 0.96 |
Note A: Mean regression coefficients (±1 SEM) of the nine musical dimensions predicting continuous valence ratings, per musical excerpt, along with their level of statistical significance. Effects reaching statistical significance of p < .05 after False Discovery Rate (FDR) correction for multiple comparisons are highlighted in light gray. Musical factors showing consistent effects across sections are highlighted in dark gray.
Note B: Same as Note A, but for continuous arousal ratings.
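A schematic of the regression analysis behind Table 2 is sketched below: each participant’s continuous rating trace is regressed on the nine musical-dimension time series, and the resulting beta weights are tested at the group level with FDR correction across dimensions. The group-level test here is an ordinary one-sample t-test rather than the bootstrap used in the table, statsmodels is assumed to be available, and all data and names are illustrative.

```python
import numpy as np
from scipy.stats import ttest_1samp
from statsmodels.stats.multitest import multipletests

def fit_betas(features, ratings):
    """Per-participant multiple regression of a continuous rating trace
    on the musical-dimension time series (ordinary least squares).

    features : (n_seconds, n_dimensions) matrix, one column per dimension
               (loudness/timbre, pitch, tempo, ..., tapping coherence).
    ratings  : (n_participants, n_seconds) continuous valence or arousal.
    Returns a (n_participants, n_dimensions) matrix of beta weights.
    """
    X = np.column_stack([np.ones(len(features)), features])   # add intercept
    betas = [np.linalg.lstsq(X, y, rcond=None)[0][1:] for y in ratings]
    return np.asarray(betas)

# Hypothetical data: 300-s excerpt, 9 dimensions, 36 participants
rng = np.random.default_rng(2)
features = rng.normal(size=(300, 9))
ratings = rng.normal(size=(36, 300))

betas = fit_betas(features, ratings)
mean_b = betas.mean(axis=0)
sem_b = betas.std(axis=0, ddof=1) / np.sqrt(len(betas))

# Simple group-level test per dimension (the table used a bootstrap instead),
# followed by FDR correction across the nine dimensions
p_raw = ttest_1samp(betas, popmean=0, axis=0).pvalue
significant, p_fdr, _, _ = multipletests(p_raw, alpha=0.05, method='fdr_bh')
```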

Figure 3
DEAM database (Aljanaki et al., 2017) – support for the association between temporal predictability and music-induced emotions. Overall ratings (at the level of the entire song): linear and quadratic regressions of a. valence and b. arousal ratings on overall pulse clarity. Markers represent the mean rating of each of the 1,780 songs taken from the DEAM database as a function of its overall pulse clarity. Lines represent the regression fits across songs.
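A minimal sketch of the song-level fits in Figure 3 is given below, assuming a per-song table with hypothetical columns pulse_clarity, valence, and arousal (the real DEAM annotations provide comparable fields); numpy.polyfit stands in for whatever fitting routine was actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical per-song table: overall pulse clarity plus mean valence/arousal
songs = pd.DataFrame({
    "pulse_clarity": np.random.default_rng(3).uniform(0, 1, 1780),
    "valence": np.random.default_rng(4).normal(5, 1, 1780),
    "arousal": np.random.default_rng(5).normal(5, 1, 1780),
})

def fit_linear_and_quadratic(x, y):
    """Fit y = b0 + b1*x and y = b0 + b1*x + b2*x^2 across songs; return the
    coefficients (highest power first, as numpy.polyfit reports them)."""
    return np.polyfit(x, y, deg=1), np.polyfit(x, y, deg=2)

for dim in ("valence", "arousal"):
    lin, quad = fit_linear_and_quadratic(songs["pulse_clarity"], songs[dim])
    print(dim, "linear:", lin, "quadratic:", quad)
```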
