
Figure 1
Quantitative face and voice features versus clinical progress. (A) Facial psychomotor activity (gaze and head pose in radians per second, scaled from 0 to 1), with points sized by BPRS depression score. The non-anxious depressed patients (Patients 3, 5, 6, and 7) tended to move more as their depressive mood scores increased, but moved less overall than non-depressed patients. Patient 8 was anxious and depressed, hence her head movement decreased as she recovered. Each patient is assigned a color, and each point is numbered by session (for example, Patient 4's third session is represented by a red dot with the number 3). The background density plot (blue hues) provides context from a larger (142 sessions), independently collected dataset, illustrating how features derived from unstructured conversation fall within the scope of a structured exam. (B) Vowel space density plots visually reveal the trajectory of acoustic changes in a depressed patient who received ketamine infusion. Reduced vowel space is clearly visible during an early session (representing a more monotonic voice, left) compared with a later session (representing a more varied voice, right) for the same patient. The BPRS depressive mood score was 6 for the first of these sessions and 0 for the second. Restricted vowel space is a well-documented acoustic feature that has been shown to correlate with depressive mood (Scherer et al., 2016).

Figure 2
Conversational effort and speech content can be measured from unstructured clinical conversation. (A) Conversational effort illustrates words per session for clinician and patient. Patient 8 (grey circles) displays an anxious depression phenotype, producing markedly more words than the clinician. Plotting participant feature data longitudinally exposes subtle changes in objective measures that, we speculate, the human brain would find difficult to identify from memory. The cross-sectional plot facilitates patient comparison. (B) Speech content analysis quantifies diminution of perseveration. Here, we used semantic analysis to calculate the cosine distance between the single perseverating patient's speech vectors and the GloVe vector for the concept "consulting." As the patient's perseveration decreased, this topic became less frequent. No other patients displayed this behavior.
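
The cosine-distance measure in panel B can be illustrated with a minimal sketch. The caption does not say how a session's speech vectors were built, so averaging the word vectors is an assumption made here for illustration. The three-dimensional `toy_embeddings` values and the helper names are hypothetical stand-ins for real GloVe vectors.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def session_distance(words, concept, embeddings):
    """Distance between a session's mean word vector and a concept vector.

    Assumption: a session is summarized by the mean of its word vectors.
    Words missing from the embedding table are skipped.
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        raise ValueError("no words in this session have an embedding")
    return cosine_distance(mean_vector(vectors), embeddings[concept])


# Hypothetical toy embeddings; real GloVe vectors have 50 to 300 dimensions.
toy_embeddings = {
    "consulting": [0.9, 0.1, 0.0],
    "business": [0.8, 0.2, 0.1],
    "garden": [0.0, 0.2, 0.9],
    "walk": [0.1, 0.1, 0.8],
}

# An early session dominated by the perseverated topic, and a later one that is not.
early = session_distance(
    ["consulting", "business", "consulting"], "consulting", toy_embeddings
)
late = session_distance(
    ["garden", "walk", "business"], "consulting", toy_embeddings
)
print(f"early session distance: {early:.3f}")
print(f"late session distance:  {late:.3f}")
```

In this toy setup the distance to "consulting" grows as the topic recedes from the session's vocabulary, which is the direction of change the caption describes.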
Table 1
Proof-of-concept predictive analyses indicate that unstructured interviews have sufficient signal to merit future model development. Results are reported for acoustic, facial, and linguistic feature types, both within-sample for our internal dataset collected at the CNRU and out-of-sample for an external dataset collected independently by Northwell Health. Given the high likelihood of overfitting, within-sample analyses are considered exploratory. In the out-of-sample analyses of the external dataset, facial features were able to predict the BPRS subscore for Blunted Affect; however, the models did not perform as well when predicting Depressed Mood. Given the small sample size relative to the number of features (acoustic = 333, facial = 1030, linguistic = 24), prediction performance is reported as a Spearman rank coefficient and p-value. The regression algorithm that produced the best result is noted as LR (linear), RI (ridge), LA (lasso), or SV (support vector).
| BPRS SUBSCORE | FEATURES | INPATIENT CNRU STUDY (N = 8, 48 SESSIONS): WITHIN-SAMPLE ANALYSES | INDEPENDENT TESTING, NORTHWELL HEALTH DATASET (N = 81, 142 SESSIONS): OUT-OF-SAMPLE ANALYSES |
|---|---|---|---|
| BPRS Blunted Affect | Acoustic | 0.26, p = 7E-2, SV | 0.35, p = 4E-5, LA |
| BPRS Blunted Affect | Facial | 0.72, p < 1E-5, SV | 0.30, p = 5E-4, RI |
| BPRS Blunted Affect | Linguistic | 0.72, p < 1E-5, SV | – |
| BPRS Depressive Mood | Acoustic | 0.36, p = 1E-2, RI | 0.16, p = 7E-2, LR |
| BPRS Depressive Mood | Facial | 0.63, p < 1E-5, LA | 0.12, p = 2E-1, LR |
| BPRS Depressive Mood | Linguistic | 0.73, p < 1E-5, RI | – |
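
The Spearman rank coefficient reported in Table 1 can be sketched as the Pearson correlation of rank-transformed values. This is a minimal illustration only, not the authors' analysis pipeline: the function names are hypothetical and the `predicted` and `observed` scores are made up. It omits the p-value; a statistics library such as SciPy (`scipy.stats.spearmanr`) returns both the coefficient and a p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman_rho(predicted, observed):
    """Spearman rank coefficient: Pearson correlation of the average ranks."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(observed))


# Made-up model outputs and clinician-rated subscores for five sessions.
predicted = [1.2, 3.4, 2.1, 5.0, 4.4]
observed = [1, 3, 3, 6, 4]
rho = spearman_rho(predicted, observed)
print(f"Spearman rho = {rho:.2f}")
```

Because the coefficient depends only on rank order, it is less sensitive to outliers and to the ordinal scale of BPRS ratings than an error metric on raw scores, which suits a small sample.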
