Effects of temporal envelope cutoff frequency, number of channels, and carrier type on brainstem neural representation of pitch in vocoded speech


Purpose. The objective of this study was to determine whether and how the subcortical neural representation of pitch cues in listeners with normal hearing is affected by systematic manipulation of vocoder parameters.

Method. This study assessed the effects of temporal envelope cutoff frequency (50 and 500 Hz), number of channels (1–32), and carrier type (sine-wave and noise-band) on the brainstem neural representation of fundamental frequency (fo) in frequency-following responses (FFRs) to vocoded vowels in 15 young adult listeners with normal hearing.

Results. FFR fo strength (quantified as the absolute fo magnitude divided by the noise floor [NF] magnitude) was significantly greater with 500-Hz than with 50-Hz temporal envelopes for all channel numbers and both carriers, except for the 1-channel noise-band vocoder. With 500-Hz temporal envelopes, FFR fo strength significantly improved when the channel number increased from 1 to 2, but it either declined (sine-wave vocoders) or saturated (noise-band vocoders) as the channel number increased from 4 to 32. With 50-Hz temporal envelopes, FFR fo strength was similarly small for both carriers at all channel numbers, except for a significant improvement with the 16-channel sine-wave vocoder. With 500-Hz temporal envelopes, FFR fo strength was significantly greater for sine-wave than for noise-band vocoders with 1–8 channels; no significant differences were seen with 16 and 32 channels. With 50-Hz temporal envelopes, the carrier effect was observed only with 16 channels. In contrast, there was no significant carrier effect on the absolute fo magnitude. Compared with sine-wave vocoders, noise-band vocoders had a higher NF and thus lower relative FFR fo strength.

Conclusions. When analyzing FFRs to vocoded speech, it is important to normalize the fo magnitude relative to the NF.
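To illustrate why normalizing by the NF matters, the sketch below computes an fo strength ratio on two simulated responses that share the same fo component but differ in background noise. It is a minimal sketch under stated assumptions, not the study's analysis: the abstract does not say how the NF was estimated, so here it is assumed (hypothetically) to be the mean spectral magnitude at a few frequencies flanking fo. The helper names (`dft_magnitude`, `fo_strength`, `simulated_ffr`), the flanking offsets, and the signal parameters are all illustrative, and the "responses" are synthetic sinusoids plus Gaussian noise, not real FFR data.

```python
import math
import random

def dft_magnitude(x, fs, freq):
    """Single-frequency DFT magnitude, scaled by 2/N so a unit-amplitude sinusoid
    completing an integer number of cycles in x returns 1.0."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n

def fo_strength(x, fs, fo, nf_offsets_hz=(-20, -15, -10, 10, 15, 20)):
    """Return (fo magnitude, noise floor, fo magnitude / noise floor).

    ASSUMPTION (hypothetical): the noise floor is the mean magnitude at
    frequencies flanking fo; the abstract does not define the NF computation."""
    fo_mag = dft_magnitude(x, fs, fo)
    nf = sum(dft_magnitude(x, fs, fo + d) for d in nf_offsets_hz) / len(nf_offsets_hz)
    return fo_mag, nf, fo_mag / nf

def simulated_ffr(noise_sd, fo=100.0, amp=0.1, fs=8000, dur=0.2, seed=1):
    """Hypothetical FFR: a sinusoid at fo plus Gaussian noise (not real data)."""
    rng = random.Random(seed)
    n = int(fs * dur)
    return [amp * math.sin(2 * math.pi * fo * i / fs) + rng.gauss(0.0, noise_sd)
            for i in range(n)]

fs, fo = 8000, 100.0
# Same fo amplitude in both cases; only the background noise level differs.
low = fo_strength(simulated_ffr(noise_sd=0.05), fs, fo)
high = fo_strength(simulated_ffr(noise_sd=0.5), fs, fo)
print("low-noise  fo=%.3f NF=%.4f ratio=%.1f" % low)
print("high-noise fo=%.3f NF=%.4f ratio=%.1f" % high)
```

With a 0.2-s window the 5-Hz bin spacing puts fo and every flanking frequency on exact DFT bins, so the pure fo sinusoid does not leak into the NF estimate. The two absolute fo magnitudes come out close, while the noisier response has a higher NF and therefore a lower ratio, which parallels the pattern the abstract reports for noise-band versus sine-wave carriers.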
The physiological findings reported here may result from the availability of fo-related temporal periodicity and spectral sidelobes in vocoded signals and should be considered when selecting vocoder parameters and interpreting results in future physiological studies. In general, the dependence of brainstem neural phase-locking strength to fo on vocoder parameters may confound the comparison of pitch-related behavioral results across different vocoder designs.
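The role of fo-related temporal periodicity can be sketched for a single vocoder analysis band: rectify the band signal, low-pass filter it at the envelope cutoff, and measure how much energy at fo survives in the envelope. This is an illustration under assumptions, not the study's vocoder: the band (harmonics 8–12 of a 100-Hz fo), half-wave rectification, the 4th-order Butterworth envelope filter (two cascaded biquads using the RBJ Audio EQ Cookbook low-pass coefficients), and all function names are hypothetical choices made for this example.

```python
import math

def biquad_lowpass(x, fs, fc, q):
    """Second-order low-pass section (RBJ cookbook coefficients), direct form I."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    cw = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cw) / 2 / a0
    b1 = (1 - cw) / a0
    b2 = b0
    a1 = -2 * cw / a0
    a2 = (1 - alpha) / a0
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for v in x:
        out = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y

def butter4_lowpass(x, fs, fc):
    """4th-order Butterworth low-pass built from two cascaded biquads."""
    for q in (1 / (2 * math.cos(math.pi / 8)), 1 / (2 * math.cos(3 * math.pi / 8))):
        x = biquad_lowpass(x, fs, fc, q)
    return x

def dft_magnitude(x, fs, freq):
    """Single-frequency DFT magnitude scaled by 2/N."""
    re = sum(v * math.cos(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * i / fs) for i, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def envelope_fo_magnitude(cutoff_hz, fo=100.0, fs=16000, dur=0.5, skip=0.1):
    """fo magnitude in the envelope of one hypothetical band (harmonics 8-12)."""
    n = int(fs * dur)
    band = [sum(math.cos(2 * math.pi * k * fo * i / fs) for k in range(8, 13))
            for i in range(n)]
    rectified = [max(v, 0.0) for v in band]          # half-wave rectification
    env = butter4_lowpass(rectified, fs, cutoff_hz)  # envelope low-pass filter
    steady = env[int(fs * skip):]                    # drop the start-up transient
    return dft_magnitude(steady, fs, fo)

m500 = envelope_fo_magnitude(500.0)
m50 = envelope_fo_magnitude(50.0)
print(f"fo in envelope: 500-Hz cutoff {m500:.3f}, 50-Hz cutoff {m50:.4f}")
```

Because the harmonics in the band beat at fo, the rectified signal is periodic at 100 Hz. A 500-Hz cutoff passes that periodicity almost unchanged, whereas a 50-Hz cutoff attenuates it by roughly 24 dB with this filter, leaving little fo-rate temporal information for the carrier to convey.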

Journal of Speech, Language, and Hearing Research, 65(8), 3146-3164