Jason A. Thomas1‡, Hannah A. Burkhardt1‡, Safina Chaudhry1, Anthony D. Ngo1, Saransh Sharma1, Larry Zhang1, Rhoda Au2, Reza Hosseini Ghomi3
1University of Washington, Seattle, WA; 2Boston University, Boston, MA; 3UW Medicine, Seattle, WA; ‡These Authors Contributed Equally
Abstract
BACKGROUND: There is a need for fast, accessible, low-cost, and accurate diagnostic methods for early detection of cognitive decline. Dementia diagnoses are usually made years after symptom onset, missing a window of opportunity for early intervention.
OBJECTIVE: To evaluate the use of recorded voice features as proxies for cognitive function by using neuropsychological test measures and existing dementia diagnoses.
METHODS: This study analyzed 170 audio recordings, transcripts, and paired neuropsychological test results from 135 participants selected from the Framingham Heart Study (FHS), which includes 97 recordings of cognitively normal participants and 73 recordings of cognitively impaired participants. Acoustic and linguistic features of the voice samples were correlated with cognitive performance measures to verify their association.
RESULTS: Language and voice features, when combined with demographic variables, performed with an AUC of 0.942 (95% CI 0.929-0.983) in predicting cognitive status. Features with good predictive power included the acoustic features mean spectral slope in the 500-1500Hz band, variation in the F2 bandwidth, and variation in the Mel-Frequency Cepstral Coefficient (MFCC) 1; the demographic features employment, education, and age; and the text features of number of words, number of compound words, number of unique nouns, and number of proper names.
CONCLUSION: Several linguistic and acoustic biomarkers show correlations and predictive power with regard to neuropsychological testing results and cognitive impairment diagnoses, including dementia. This initial study paves the way for a follow-up comprehensive study incorporating the entire FHS cohort.
Acoustic parameters
The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) has the following categories of variables (Eyben, 2016):
Frequency related parameters:
- Pitch, logarithmic F0 on a semitone frequency scale, starting at 27.5 Hz (semitone 0).
- Jitter, deviations in individual consecutive F0 period lengths.
- Formant 1, 2, and 3 frequency, centre frequency of first, second, and third formant.
- Formant 1, bandwidth of first formant.
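As a rough illustration of two of the frequency-related definitions above, the following sketch converts F0 to the semitone scale anchored at 27.5 Hz and computes a simple jitter value from a list of period lengths. The function names are hypothetical, and normalizing jitter by the mean period is an assumption made here for illustration; an acoustic toolkit may compute it differently.

```python
import math


def f0_to_semitones(f0_hz, ref_hz=27.5):
    """Convert F0 in Hz to semitones above ref_hz (semitone 0 = 27.5 Hz)."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive (voiced frame)")
    return 12.0 * math.log2(f0_hz / ref_hz)


def local_jitter(periods_s):
    """Mean absolute difference between consecutive F0 period lengths,
    divided by the mean period length (normalization is an assumption)."""
    if len(periods_s) < 2:
        raise ValueError("need at least two F0 periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return mean_diff / mean_period
```

For example, `f0_to_semitones(55.0)` is one octave above the reference and returns 12.0, while a perfectly regular sequence of periods gives a jitter of 0.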
Energy/Amplitude related parameters:
- Shimmer, difference of the peak amplitudes of consecutive F0 periods.
- Loudness, estimate of perceived signal intensity from an auditory spectrum.
- Harmonics-to-Noise Ratio (HNR), relation of energy in harmonic components to energy in noise-like components.
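The energy-related definitions above can be sketched in a few lines, assuming the per-period peak amplitudes and the harmonic and noise energies have already been measured. The function names, the normalization of shimmer by the mean amplitude, and the decibel scaling of HNR are illustrative assumptions, not a description of how any particular toolkit estimates these values from a waveform.

```python
import math


def local_shimmer(peak_amplitudes):
    """Mean absolute difference between peak amplitudes of consecutive
    F0 periods, divided by the mean peak amplitude (assumed normalization)."""
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two F0 periods")
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(peak_amplitudes) / len(peak_amplitudes)
    return mean_diff / mean_amp


def hnr_db(harmonic_energy, noise_energy):
    """Ratio of harmonic energy to noise energy, expressed in decibels."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(harmonic_energy / noise_energy)
```

With these definitions, a signal whose harmonic energy is 100 times its noise energy has an HNR of 20 dB.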
Spectral (balance) parameters:
- Alpha Ratio, ratio of the summed energy from 50–1000Hz and 1–5kHz.
- Hammarberg Index, ratio of the strongest energy peak in the 0–2kHz region to the strongest peak in the 2–5kHz region.
- Spectral Slope 0–500Hz and 500–1500Hz, linear regression slope of the logarithmic power spectrum within the two given bands.
- Formant 1, 2, and 3 relative energy, as well as the ratio of the energy of the spectral harmonic peak at the first, second, third formant’s centre frequency to the energy of the spectral peak at F0.
- Harmonic difference H1–H2, ratio of energy of the first F0 harmonic (H1) to the energy of the second F0 harmonic (H2).
- Harmonic difference H1–A3, ratio of energy of the first F0 harmonic (H1) to the energy of the highest harmonic in the third formant range (A3).
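To make three of the spectral balance definitions concrete, here is a minimal sketch that takes a power spectrum as two parallel lists (bin frequencies in Hz and linear power per bin). The function names and the half-open band edges are assumptions for illustration; the ratios are returned on a linear scale, whereas a toolkit may report them logarithmically.

```python
import math


def in_band(freqs_hz, power, lo_hz, hi_hz):
    """Select (frequency, power) pairs with lo_hz <= f < hi_hz (assumed edges)."""
    return [(f, p) for f, p in zip(freqs_hz, power) if lo_hz <= f < hi_hz]


def alpha_ratio(freqs_hz, power):
    """Summed energy in 50-1000 Hz divided by summed energy in 1-5 kHz."""
    low = sum(p for _, p in in_band(freqs_hz, power, 50, 1000))
    high = sum(p for _, p in in_band(freqs_hz, power, 1000, 5000))
    return low / high


def hammarberg_index(freqs_hz, power):
    """Strongest peak in 0-2 kHz divided by strongest peak in 2-5 kHz."""
    low_peak = max(p for _, p in in_band(freqs_hz, power, 0, 2000))
    high_peak = max(p for _, p in in_band(freqs_hz, power, 2000, 5000))
    return low_peak / high_peak


def spectral_slope(freqs_hz, power, lo_hz, hi_hz):
    """Least-squares slope (dB per Hz) of the log power spectrum in a band."""
    pts = [(f, 10.0 * math.log10(p))
           for f, p in in_band(freqs_hz, power, lo_hz, hi_hz) if p > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two spectral bins in the band")
    mean_f = sum(f for f, _ in pts) / n
    mean_db = sum(d for _, d in pts) / n
    num = sum((f - mean_f) * (d - mean_db) for f, d in pts)
    den = sum((f - mean_f) ** 2 for f, _ in pts)
    return num / den
```

Calling `spectral_slope(freqs, power, 500, 1500)` corresponds to the 500–1500Hz band named in the list; a spectrum that falls off with frequency yields a negative slope.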
The extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) includes additional features, among them the following 6 temporal features:
- the rate of loudness peaks, i.e., the number of loudness peaks per second,
- the mean length and the standard deviation of continuously voiced regions (F0 > 0),
- the mean length and the standard deviation of unvoiced regions (F0 = 0; approximating pauses),
- the number of continuous voiced regions per second (pseudo syllable rate).
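The six temporal features listed above can be sketched from frame-level contours, assuming one F0 value and one loudness value per fixed-length frame, with F0 = 0 marking unvoiced frames. The function names, the 10 ms default frame length, the local-maximum definition of a loudness peak, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def run_lengths(f0_frames, frame_s, voiced):
    """Durations (seconds) of maximal runs of voiced (F0 > 0) or
    unvoiced (F0 = 0) frames."""
    lengths, run = [], 0
    for f0 in f0_frames:
        if (f0 > 0) == voiced:
            run += 1
        elif run:
            lengths.append(run * frame_s)
            run = 0
    if run:
        lengths.append(run * frame_s)
    return lengths


def mean_sd(values):
    """Mean and population standard deviation; (0.0, 0.0) if empty."""
    if not values:
        return 0.0, 0.0
    return statistics.fmean(values), statistics.pstdev(values)


def temporal_features(f0_frames, loudness_frames, frame_s=0.01):
    """Six temporal features from per-frame F0 and loudness contours."""
    total_s = len(f0_frames) * frame_s
    voiced = run_lengths(f0_frames, frame_s, True)
    unvoiced = run_lengths(f0_frames, frame_s, False)
    # A loudness peak is taken to be a frame louder than both neighbours.
    peaks = sum(
        1 for i in range(1, len(loudness_frames) - 1)
        if loudness_frames[i - 1] < loudness_frames[i] > loudness_frames[i + 1]
    )
    v_mean, v_sd = mean_sd(voiced)
    u_mean, u_sd = mean_sd(unvoiced)
    return {
        "loudness_peaks_per_s": peaks / total_s,
        "voiced_mean_s": v_mean,
        "voiced_sd_s": v_sd,
        "unvoiced_mean_s": u_mean,
        "unvoiced_sd_s": u_sd,
        "voiced_regions_per_s": len(voiced) / total_s,
    }
```

In this sketch, longer or more variable unvoiced regions show up directly in `unvoiced_mean_s` and `unvoiced_sd_s`, which is how the unvoiced-region features approximate pauses.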
Neuropsychological parameters
The FHS administers the following standard neuropsychological tests:
Taken from the Wechsler Memory Scale (WMS):
- Logical Memory: Immediate: a short scenario is read. The participant is asked to retell the story from memory with as much detail as possible. Delayed: The participant is asked to retell the story once more after a delay (e.g. 30 minutes). Recognition: The participant is asked yes/no questions about the story. (Maccow, 2011)
- Paired Associate Learning: Immediate: The examiner reads 10 or 14 word pairs to the examinee. Then, the examiner reads the first word of each pair, and asks the examinee to provide the corresponding word. Delayed: After a delay (e.g. 30 minutes), the participant is again presented with the first word of each pair learned in the immediate condition and asked to provide the corresponding word. (Maccow, 2011)
- Visual Reproduction: Immediate: A series of five designs is shown, one at a time, for 10 seconds each. After each design is presented, the examinee is asked to draw the design from memory. Delayed: After a delay (e.g. 30 minutes), the participant is asked to draw the designs once more. Recognition: the examinee is asked to choose which of six designs on a page matches the original design shown during the immediate condition.
Taken from the Wechsler Adult Intelligence Scale (WAIS):
- Similarities: Describe how two words or concepts are similar. [Wikipedia]
- Information (WAIS-R): From the Wechsler Adult Intelligence Scale, Revised edition (WAIS-R). General knowledge questions. [Wikipedia]
- Block design: Put together red-and-white blocks in a pattern according to a displayed model. This is timed, and some of the more difficult puzzles award bonuses for speed. [Wikipedia]
- Digit span (forward and backward): Listen to sequences of numbers presented orally and repeat them as heard and in reverse order. [Wikipedia]
Others:
- Boston Naming Test (BNT): The BNT stimuli are line drawings of objects with increasing naming difficulty, ranging from simple, high-frequency vocabulary (tree) to rare words (abacus). Administration requires a spontaneous response within a 20-sec period; if such a response is not made, two kinds of prompting cues (one phonemic, one semantic) may be given. (Spreen, 1998)
- Finger tapping test (left and right): The finger-tapping test (FTT) is a neuropsychological test that examines motor functioning, specifically, motor speed and lateralized coordination. During administration, the subject's palm should be immobile and flat on the board, with fingers extended, and the index finger placed on the counting device. One hand at a time, subjects tap their index finger on the lever as quickly as possible within a 10-s time interval, in order to increase the number on the counting device with each tap. (Springer Link, 2013)
- Time to complete Trails: The Trail Making Test (TMT) has parts A and B. The TMT Part A consists of 25 circles on a piece of paper with the numbers 1-25 written randomly in the circles. The test taker’s task is to start with number one and draw a line from that circle to the circle with the number two in it to the circle with the three in it, etc. The person continues to connect the circles in numerical order until they reach number 25. The TMT Part B consists of 24 circles on a piece of paper, but rather than all of the circles containing numbers, half of the circles have the numbers 1-12 in them and the other half (12) contain the letters A-L. The person taking the test has the more difficult task of drawing a line from one circle to the next in ascending order; however, he must alternate the circles with numbers in them (1-12) with circles with letters in them (A-L). In other words, he is to connect the circles in order like this: 1-A-2-B-3-C-4-D-5-E and so on. (Heerema, 2019)
- Hooper Visual Organization Test: This test has 30 line drawings, each showing a common object such as a ball or a pencil. Each drawing has been cut into several pieces. The pieces are scattered on the page like parts of a puzzle. The client's task is to tell you what the object is. The VOT has no time limit. (Mevius, 2019)
- Verbal Fluency: Animals only: Participants have to produce as many animal names as possible in 1 minute. Exclude animals: Participants have to produce as many words as possible that start with the given letter (e.g. F or A) in 1 minute, avoiding proper names.
- Wide Range Achievement Test (reading): Assesses reading of words with irregular sound-to-spelling correspondence, for example "unanimous".
Interactive heatmap
This visualization is a heatmap of the associations between the various acoustic and neuropsychological variables.
The figure implements filtering and details-on-demand behavior. Click on a box, or shift-click on multiple boxes, to filter. Double-click on a box to clear the filter. Click on a cell to view the scatter plot of the two variables. This allows on-demand investigation of the relationships between the variables. The box selected by default shows a positive correlation between the acoustic variable 'jitter' (deviations in individual consecutive F0 period lengths) and time to complete a neuropsychological task (a longer time indicating worse cognitive performance).
Interactive heatmap supporting hover display of values, filtering, and details on demand: selecting a cell displays (top) a scatterplot for further examination of the relationship between the two selected variables.
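The values behind such a heatmap can be sketched as a grid of pairwise associations between acoustic and neuropsychological variables. The sketch below assumes Pearson correlation as the association measure, which the text above does not specify; the function names, the variable names, and the numbers in the usage example are made up purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_grid(acoustic, neuropsych):
    """Map each (acoustic, neuropsychological) variable pair to its r value.

    Both arguments are dicts of variable name -> per-recording values,
    with values aligned by recording.
    """
    return {
        (a_name, n_name): pearson_r(a_vals, n_vals)
        for a_name, a_vals in acoustic.items()
        for n_name, n_vals in neuropsych.items()
    }


# Made-up example values, not study data.
acoustic = {"jitter": [0.010, 0.015, 0.020, 0.030]}
neuropsych = {"trails_time_s": [40.0, 55.0, 62.0, 90.0]}
grid = correlation_grid(acoustic, neuropsych)
```

Each entry of `grid` corresponds to one heatmap cell, and the two value lists behind an entry are what a scatterplot for that cell would display.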
Acknowledgements & Disclosures & References
This study was supported in part by the National Library of Medicine (NLM) Training Grant T15LM007442.
Dr. Hosseini Ghomi’s work was supported by the VA Advanced Fellowship Program in Parkinson’s Disease.
Disclosures: Dr. Hosseini Ghomi is a stockholder of NeuroLex Laboratories.
Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., Andre, E., Busso, C., … Truong, K. P. (2016). The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing. IEEE Transactions on Affective Computing, 7(2), 190–202. https://doi.org/10.1109/TAFFC.2015.2457417
Spreen & Risser. (1998). Acquired Aphasia (Third Edition)
Maccow, G. (2011). WMS-IV: Administration, Scoring, Basic Interpretation. Pearson.
Heerema, E. (2019). Administration, Scoring, and Interpretation of the Trail Making Test. Available at https://www.verywellhealth.com/dementia-screening-tool-the-trail-making-test-98624
Springer Link. (2013). Finger-Tapping Test. Encyclopedia of Autism Spectrum Disorders - 2013 Edition. Available at https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-1698-3_343.
Mevius, J. (2019). Hooper Visual Organization Test (VOT). Elderly Driving Assessments. Available at http://elderlydrivingassessments.com/hooper.php