
Online Audio-Visual Source Association for Chamber Music Performances

Open Access | Aug 2019

Figures & Tables

Figure 1

Outline of the proposed universal source association system for chamber ensemble performances. Three types of motion are modeled and correlated with the audio and score events in three modules.

Figure 2

Body motion extraction. Upper body skeletons (second row) are extracted with OpenPose (Cao et al., 2017) in each video frame (first row) followed by temporal smoothing.
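The temporal smoothing mentioned in the caption can be sketched as a moving-average filter applied to each joint's trajectory; the window length and filter type here are illustrative assumptions, since the caption does not specify the paper's actual filter.

```python
import numpy as np

def smooth_keypoints(traj, window=7):
    """Moving-average smoothing of one joint's trajectory.

    traj: array of shape (T, 2) -- (x, y) of a keypoint over T frames.
    window: odd window length in frames (hypothetical choice).
    """
    kernel = np.ones(window) / window
    pad = window // 2
    # Pad with edge values so the output keeps length T.
    padded = np.pad(traj, ((pad, pad), (0, 0)), mode="edge")
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid")
         for d in range(traj.shape[1])],
        axis=1,
    )

# Usage: smooth a noisy synthetic wrist trajectory.
t = np.linspace(0, 1, 100)
noisy = (np.stack([t, np.sin(2 * np.pi * t)], axis=1)
         + 0.05 * np.random.default_rng(0).normal(size=(100, 2)))
smoothed = smooth_keypoints(noisy, window=7)
```

In practice such smoothing suppresses per-frame jitter in the OpenPose detections while preserving the slower bowing and swaying motion.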

Figure 3

Example correspondence between body motion and note onsets. Top: temporally aligned score part with onsets marked by red circles. Middle: extracted motion salience (primarily bowing motion) from the visual performance of a violin player. Bottom: derived onset likelihood curve from the motion salience.
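One way to derive an onset likelihood curve from a motion salience signal, in the spirit of the caption, is to treat local minima of the salience (e.g., bow-direction changes) as onset candidates and spread each with a narrow Gaussian. This is a sketch under those assumptions, not the paper's exact formulation.

```python
import numpy as np

def onset_likelihood(salience, fps=30.0, sigma=0.05):
    """Map a 1-D motion-salience signal to an onset likelihood curve.

    Assumption: local minima of the salience mark candidate onsets
    (e.g., bowing direction changes); each is spread by a Gaussian
    of width `sigma` seconds. fps and sigma are illustrative values.
    """
    s = np.asarray(salience, dtype=float)
    # Candidate onsets: strict local minima of the salience curve.
    idx = np.where((s[1:-1] < s[:-2]) & (s[1:-1] < s[2:]))[0] + 1
    t = np.arange(len(s)) / fps
    like = np.zeros_like(s)
    for i in idx:
        like += np.exp(-0.5 * ((t - t[i]) / sigma) ** 2)
    return like / like.max() if like.max() > 0 else like

# Usage: salience that dips at every bow change (zeros of the sine).
sal = np.abs(np.sin(np.linspace(0, 4 * np.pi, 240)))
curve = onset_likelihood(sal)
```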

Figure 4

Optical flow visualization of finger motion in five consecutive frames corresponding to note changes. The color encoding scheme is adopted from Baker et al. (2011).
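The color encoding referenced in the caption maps flow direction to hue and flow magnitude to saturation. The sketch below is a simplified HSV version of the Middlebury-style coding, not a reproduction of the exact color wheel of Baker et al. (2011).

```python
import numpy as np

def flow_to_color(u, v):
    """Render a dense flow field (u, v) as an RGB image.

    Simplified Middlebury-style coding: hue encodes flow direction,
    saturation encodes normalized magnitude, value is fixed at 1.
    """
    mag = np.sqrt(u ** 2 + v ** 2)
    ang = (np.arctan2(v, u) + np.pi) / (2 * np.pi)   # hue in [0, 1]
    sat = mag / (mag.max() + 1e-9)                   # magnitude in [0, 1]
    # Vectorized HSV -> RGB conversion with V = 1.
    h6 = ang * 6.0
    i = np.floor(h6).astype(int) % 6
    f = h6 - np.floor(h6)
    val = np.ones_like(sat)
    p, q, tt = val * (1 - sat), val * (1 - f * sat), val * (1 - (1 - f) * sat)
    conds = [i == k for k in range(6)]
    r = np.select(conds, [val, q, p, p, tt, val])
    g = np.select(conds, [tt, val, val, q, p, p])
    b = np.select(conds, [p, p, tt, val, val, q])
    return np.stack([r, g, b], axis=-1)

# Usage: a tiny 2x2 flow field; zero flow renders as white.
u = np.array([[1.0, 0.0], [-1.0, 0.5]])
v = np.array([[0.0, 0.0], [0.0, 0.5]])
img = flow_to_color(u, v)
```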

Figure 5

Example correspondence between finger motion and note onsets of a flute player. Top: temporally aligned score part with onsets marked by red circles. Bottom: extracted motion flux from finger movements.
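A plausible reading of "motion flux" is the per-frame mean optical-flow magnitude, optionally restricted to a hand/finger region. The aggregation below is an assumption for illustration; the paper's exact definition may differ.

```python
import numpy as np

def motion_flux(flows, mask=None):
    """Per-frame motion flux from dense optical flow.

    flows: array of shape (T, H, W, 2) -- per-frame (u, v) flow fields.
    mask: optional (H, W) boolean region of interest (e.g., hand box).
    Returns a length-T curve of mean flow magnitude (illustrative).
    """
    mag = np.linalg.norm(flows, axis=-1)       # (T, H, W)
    if mask is not None:
        mag = mag[:, mask]                     # (T, n_masked_pixels)
    else:
        mag = mag.reshape(mag.shape[0], -1)
    return mag.mean(axis=1)

# Usage: frame 0 is static, frame 1 has uniform (3, 4) flow -> magnitude 5.
flows = np.zeros((2, 4, 4, 2))
flows[1, ..., 0] = 3.0
flows[1, ..., 1] = 4.0
flux = motion_flux(flows)
```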

Figure 6

Optical flow visualization of hand motion corresponding to vibrato articulation. The color encoding scheme is adopted from Baker et al. (2011).

Figure 7

The same segment of normalized pitch contour f(t) (green) overlaid with the motion displacement curve d(t) (black) from the associated track (left) and another random track (right).
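The association idea the figure illustrates, matching a pitch fluctuation f(t) to the most similar motion displacement d(t), can be sketched as picking the track with the highest absolute Pearson correlation. This is a simplification; the paper's actual association score is defined by its Eq. (5).

```python
import numpy as np

def associate_track(pitch_contour, motion_curves):
    """Pick the motion displacement curve most correlated with f(t).

    pitch_contour: 1-D normalized pitch fluctuation f(t).
    motion_curves: list of 1-D displacement curves d(t), one per track.
    Returns (best_track_index, list_of_|Pearson r| scores).
    """
    f = (pitch_contour - pitch_contour.mean()) / (pitch_contour.std() + 1e-9)
    scores = []
    for d in motion_curves:
        dn = (d - d.mean()) / (d.std() + 1e-9)
        # Absolute value: vibrato motion may be in anti-phase with pitch.
        scores.append(float(np.abs(np.mean(f * dn))))
    return int(np.argmax(scores)), scores

# Usage: a 6 Hz vibrato; one track shares the modulation, one is unrelated.
t = np.linspace(0, 2, 200)
f = np.sin(2 * np.pi * 6 * t)                        # pitch fluctuation
d_match = 0.8 * np.sin(2 * np.pi * 6 * t + 0.1)      # correlated hand motion
d_other = np.random.default_rng(1).normal(size=200)  # random other track
best, _ = associate_track(f, [d_other, d_match])
```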

Table 1

The number of pieces for different instrument arrangements from the original and expanded URMP dataset.

                     String   Wind   Mixed   Total
Original   Duet           2      6       3      11
           Trio           2      6       4      12
           Quartet        5      6       3      14
           Quintet        2      4       1       7
Expanded   Duet          57     91      23     171
           Trio          41     65      20     126
           Quartet       15     25       7      47
           Quintet        2      4       1       7
Figure 8

Onset overlap rate for each piece from the original URMP dataset.
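One natural definition of an onset overlap rate between two parts is the fraction of onsets in one part that fall within a small tolerance of an onset in the other. The sketch below uses that reading with a hypothetical 50 ms tolerance; the paper's exact definition may differ.

```python
import numpy as np

def onset_overlap_rate(onsets_a, onsets_b, tol=0.05):
    """Fraction of onsets in part A within `tol` seconds of an onset in B.

    onsets_a, onsets_b: onset times in seconds.
    tol: matching tolerance in seconds (hypothetical value).
    """
    a = np.asarray(onsets_a, dtype=float)
    b = np.asarray(onsets_b, dtype=float)
    if len(a) == 0 or len(b) == 0:
        return 0.0
    # Distance from each onset in A to its nearest onset in B.
    dists = np.min(np.abs(a[:, None] - b[None, :]), axis=1)
    return float(np.mean(dists <= tol))

# Usage: only the first of three onsets has a near match in the other part.
rate = onset_overlap_rate([0.0, 1.0, 2.0], [0.02, 1.5])  # -> 1/3
```

A high overlap rate makes onset-based association harder, since simultaneous onsets are ambiguous across parts.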

Figure 9

Onset detection evaluation results from: (a) body motion, and (b) finger motion, for different instruments.

Table 2

The number of evaluation samples of different lengths and instrumentations used for source association.

String               Excerpt duration (sec)
                 5     10     15     20     25     30
   Duet       1323    642    420    303    236    200
   Trio       1044    506    333    240    189    158
   Quartet     355    172    114     82     65     54
   Quintet      64     31     21     15     12     10

Wind                 Excerpt duration (sec)
                 5     10     15     20     25     30
   Duet       1809    887    557    435    323    266
   Trio       1275    626    391    309    229    187
   Quartet     474    232    145    115     86     68
   Quintet      66     32     20     16     12      9

Mixed                Excerpt duration (sec)
                 5     10     15     20     25     30
   Duet        441    203    141     96     82     60
   Trio        380    174    121     82     70     51
   Quartet     199     92     64     44     37     28
   Quintet      22     10      7      5      4      3
Figure 10

(a)–(c): Source association accuracy using only the onset correspondence between score parts and body motion (the first component M¯b in Eq. (5)). (d)–(f): Source association accuracy using only the onset correspondence between score parts and finger motion (the second component M¯f in Eq. (5)).

Figure 11

Source association accuracy of string ensembles (a) using only the vibrato correspondence between pitch fluctuation and hand motion (M¯v in Eq. (5)), and (b) combining the vibrato correspondence with the onset correspondence from body motion (M¯b and M¯v in Eq. (5)).

Figure 12

Source association accuracy of ensembles with different instrumentations using all three modules: onset correspondence from body motion, onset correspondence from finger motion, and vibrato correspondence from hand motion (Eq. (5)).
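The full system's final step, combining the three per-module score-part-to-track similarity matrices and assigning each part to exactly one track, can be sketched with the Hungarian algorithm. The equal-weight sum below is an assumption for illustration; the actual combination is defined by the paper's Eq. (5).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(Mb, Mf, Mv, weights=(1.0, 1.0, 1.0)):
    """Combine module similarity matrices and solve the 1-to-1 assignment.

    Mb, Mf, Mv: (n_parts, n_tracks) similarity matrices from the body
    motion, finger motion, and vibrato modules. The weighted sum is a
    hypothetical stand-in for the paper's Eq. (5) combination.
    """
    M = weights[0] * Mb + weights[1] * Mf + weights[2] * Mv
    # Negate to turn maximization of similarity into a min-cost assignment.
    rows, cols = linear_sum_assignment(-M)
    return dict(zip(rows.tolist(), cols.tolist()))

# Usage: three parts, three tracks; only the body-motion module fires.
Mb = np.array([[0.9, 0.1, 0.0],
               [0.2, 0.8, 0.1],
               [0.0, 0.2, 0.7]])
Z = np.zeros((3, 3))
mapping = associate(Mb, Z, Z)  # each part keeps its best track
```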

DOI: https://doi.org/10.5334/tismir.25 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 18, 2018
Accepted on: May 20, 2019
Published on: Aug 5, 2019
Published by: Ubiquity Press

© 2019 Bochen Li, Karthik Dinesh, Chenliang Xu, Gaurav Sharma, Zhiyao Duan, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.