Effects of Metrical Context on the P1 Component
Article information
Abstract
Background and Objectives
The temporal structure of sound, characterized by regular patterns, plays a crucial role in optimizing the processing of auditory information. The meter, representing a well-organized sequence of evenly spaced beats in music, exhibits a hierarchical arrangement, with stronger beats occupying higher metrical positions. Moreover, the meter has been shown to influence behavioral and neural processing, particularly the N1, P2, and mismatch negativity components. However, the role of the P1 component in the context of metrical hierarchy remains unexplored. This study aimed to investigate the effects of metrical hierarchy on the P1 component and compare the responses between musicians and non-musicians.
Subjects and Methods
Thirty participants (15 musicians and 15 non-musicians) were enrolled in the study. Auditory stimuli consisted of a synthesized speech syllable presented together with a repeating series of four tones, establishing a quadruple meter. Electrophysiological recordings were performed to measure the P1 component.
Results
The results revealed that metrical position had a significant effect on P1 amplitude, with the strongest beat showing the lowest amplitude. This contrasts with previous findings, in which enhanced P1 responses were typically observed at on-the-beat positions. The reduced P1 response on the strong beat can be interpreted within the framework of predictive coding and temporal prediction, where a higher predictability of pitch changes at the strong beat leads to a reduction in the P1 response. Furthermore, higher P1 amplitudes were observed in musicians compared to non-musicians, suggesting that musicians have enhanced sensory processing.
Conclusions
This study demonstrates the effects of metrical hierarchy on the P1 component, thereby enriching our understanding of auditory processing. The results suggest that predictive coding and temporal prediction play important roles in shaping sensory processing. Further, they suggest that musical training may enhance P1 responses.
Introduction
The temporal structure of sound, characterized by regular patterns, plays an important role in optimizing the processing of auditory information by listeners [1,2]. In the realm of music, this structure is exemplified by the concept of meter, a well-organized sequence of evenly spaced beats [1-4]. Meter exhibits a hierarchical arrangement, with stronger beats occupying higher metrical positions [1-4]. For instance, in a quadruple meter such as 4/4 time, listeners perceive a cycle of four beats (strong, weak, medium, weak), with the first and third beats holding the highest and second-highest metrical positions, respectively, while the second and fourth beats are the weakest. Humans naturally form groupings based on meter and use this hierarchical structure to anticipate upcoming sounds [1,2].
Dynamic attending theory proposes that meter guides real-time attention in music, prompting listeners to allocate greater attention to beats at higher metrical levels [5,6]. This heightened attention to higher metrical positions leads to increased sensitivity to events occurring at those positions [5,6]. Numerous behavioral studies support this notion, demonstrating enhanced accuracy in auditory perception tasks, such as pitch judgment and detection of subtle temporal differences, when stimuli are presented at higher metrical positions [7-9]. Furthermore, even visual tasks, such as letter identification and word recognition, show improved performance with stimuli presented at higher metrical levels [10,11]. Neurophysiological investigations have further confirmed these findings by revealing differential processing in the brain for sounds occurring on metrically strong beats [12-16]. Notably, evoked potentials such as the N1, P2, and mismatch negativity (MMN) exhibit larger amplitudes or earlier responses when a stimulus occurs at higher metrical positions [12-16]. For example, irrespective of musical expertise, studies have found that N1 and P2 peaks are stronger when identical sounds are presented at metrically strong positions compared to weak positions [16]. Studies on MMN have shown that MMN responses to deviants are earlier and larger at metrically strong positions compared to weak positions [17]. These findings highlight the significant impact of the metrical structure of music on both behavioral performance and neural processing within the auditory pathway of the human brain.
While previous research has extensively explored the effects of meter on cortical responses, with a particular focus on the MMN, N1, and P2 components, there exists a noticeable gap in the literature concerning the role of the P1 component. The P1 component, as previously established, originates in both the primary and secondary auditory cortices [18-20], and it exhibits sensitivity to attention [20]. Consequently, it is plausible that heightened attention during metrically strong beats enhances the P1 response to auditory stimuli, thereby fine-tuning the receptive fields in the auditory cortex. Supporting this notion, Tierney and Kraus [21] demonstrated an augmented P1 response, along with frequency-following responses (FFR), at positions coinciding with the beat compared to those off the beat. Similarly, Bouwer and Honing [22] reported enhanced P1 responses for deviant stimuli occurring precisely on the beat. These findings collectively suggest that the presence of heightened attention at beat positions amplifies P1 responses to auditory stimuli. However, up to the present time, no studies have delved into the effects of metrical hierarchies on the P1 component, particularly within the context of quadruple meter.
In our previous study, we explored the effect of meter on the subcortical processing of sounds by measuring the human auditory FFR to speech presented at distinct metrical positions [23]. To establish a metrical structure, we superimposed the speech sound [da] on a recurring sequence of four tones. In this sequence, the initial tone was pitched higher than the subsequent three tones, thereby assigning it the role of a strong beat with the most prominent metrical position. The results showed that the subcortical response to the metrically strong beat was enhanced. This enhancement may result from top-down modulation facilitated by the efferent corticofugal network connecting the cortex and lower auditory structures. However, it remains to be investigated whether this enhanced processing of strong beats also occurs in the auditory cortex.
In our current study, we extend our investigation to analyze the temporal window associated with the P1 component using the same dataset. This extension serves the purpose of bridging the gap between subcortical and cortical processing. If the P1 component is also demonstrated to be influenced by meter, it would provide compelling evidence regarding the impact of metrical context on a coherent and integrated auditory processing system operating across distinct levels of the auditory hierarchy. This exploration will yield valuable insights into the intricacies of auditory perception and its modulation by metrical structures.
Additionally, we compare the P1 responses between musicians and non-musicians, considering our previous findings that musicians enhance the subcortical responses to sounds at the strong beat. For the P1 component, we expect musicians to exhibit higher amplitude, aligning with prior research indicating enhanced P1 responses in musicians [24-26]. However, we expect that the effect of meter on the P1 component will be consistent across both musicians and non-musicians, as previous studies have demonstrated similar metrical modulation of cortical responses regardless of musical expertise [16].
Through a comprehensive examination of the P1 component and its relationship with metrical hierarchy, this study aims to advance our understanding of the neural processes underlying auditory perception and the influence of musical experience.
Subjects and Methods
Participants
The study included a total of 30 adults aged between 19 and 27 years, with a mean age of 22.73 years. All participants completed a questionnaire assessing their musical background, including the age at which they commenced musical training, the duration of training, and the type of performance experience. Fifteen participants were musicians, all female, with a mean age of 21.27 years; 12 were pianists, 2 were violinists, and 1 was a violist. Participants categorized as musicians had a minimum of 10 years of musical training, which began at or before the age of 7, and were university undergraduate and graduate students majoring in music. The remaining 15 participants were non-musicians, consisting of 12 females and 3 males, with a mean age of 24.2 years; these non-musicians had less than 3 years of musical training. Comprehensive demographic data, including gender, age, musical instrument, and years of music training, can be found in Table 1. All participants reported no hearing or neurological impairments, and their pure-tone air conduction thresholds were below 20 dB HL for octave frequencies ranging from 125 Hz to 8,000 Hz. The study obtained approval from the Samsung Medical Center Institutional Review Board (SMC 2017-01-115-016) and adhered to the ethical guidelines outlined in the World Medical Association’s Code of Ethics (Declaration of Helsinki). Prior to the experiment, written informed consent was obtained from each participant.
Stimulus
The same syllable as in our previous study [23] was used in the current investigation. Specifically, we used a synthesized speech syllable, [da], consisting of a stop consonant and a vowel with a duration of 170 ms. The [da] syllable consisted of a 50 ms formant transition followed by a 120 ms steady-state vowel. It maintained a constant fundamental frequency (F0) of 100 Hz throughout the stimulus. Notably, the first, second, and third formants underwent temporal changes within the first 50 ms (F1: 400 to 720 Hz, F2: 1,700 to 1,240 Hz, F3: 2,580 to 2,500 Hz). An illustrative example of the stimulus is shown in Fig. 1. The interstimulus interval between successive stimuli was set to 500 ms. To create a quadruple meter (4/4 time), the syllable [da] was presented together with a repeating series of four tones: 3,520 Hz (A7), 1,760 Hz (A6), 1,760 Hz (A6), and 1,760 Hz (A6). Importantly, the frequencies of these four tones did not overlap with the frequency components of the syllable, as can be seen in the lower graph of Fig. 1A and B. The duration of each tone was 100 ms.
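The timing of the quadruple-meter tone track described above can be sketched as follows. This is a minimal illustration, not the actual stimulus-generation code: it assumes a 44.1 kHz audio sampling rate (not stated in the text) and that the 500 ms interstimulus interval is measured offset-to-onset, giving a 670 ms onset-to-onset interval; the function names are our own.

```python
import numpy as np

FS = 44_100                      # assumed audio sampling rate (Hz)
SYLLABLE_DUR = 0.170             # [da] duration (s)
ISI = 0.500                      # interstimulus interval (s), offset-to-onset
SOA = SYLLABLE_DUR + ISI         # onset-to-onset interval: 670 ms
TONE_DUR = 0.100                 # duration of each metrical tone (s)
TONE_FREQS = [3520.0, 1760.0, 1760.0, 1760.0]  # A7 strong beat, then three A6 weak beats

def tone(freq, dur, fs=FS):
    """Pure tone of the given frequency and duration."""
    t = np.arange(int(dur * fs)) / fs
    return np.sin(2 * np.pi * freq * t)

def metrical_cycle(fs=FS):
    """One four-beat quadruple-meter cycle of the tone track.

    Each 100 ms tone is placed at the onset of its 670 ms beat slot;
    the [da] syllable would be mixed in at the same onsets.
    """
    cycle = np.zeros(int(4 * SOA * fs))
    for beat, freq in enumerate(TONE_FREQS):
        start = int(beat * SOA * fs)
        seg = tone(freq, TONE_DUR, fs)
        cycle[start:start + len(seg)] += seg
    return cycle

track = metrical_cycle()
```

The strong beat is marked solely by the octave-higher pitch of the first tone; onset timing and duration are identical across the four metrical positions.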
Electrophysiological recordings
The stimulus was presented to the participants binaurally through insert earphones (ER-3A) at an intensity of approximately 65 dBA sound pressure level, using the Neuroscan Stim2 system (Compumedics Neuroscan, Charlotte, NC, USA). During the testing phase, participants watched a movie of their choice with the sound muted and subtitles displayed. The data collection procedures followed the methods described in Lee, et al. [27]. Brain responses were recorded using the Scan 4.5 Acquire system (Compumedics Neuroscan, Charlotte, NC, USA). Four Ag-AgCl scalp electrodes were placed: the active electrode at Cz, linked earlobe electrodes as references, and a forehead electrode as the ground. The contact impedance for all electrodes was maintained below 5 kΩ. This montage enabled brainstem and cortical responses to be recorded simultaneously. Approximately 2,000 sweeps were collected for each stimulus polarity, and the data were sampled at a rate of 20 kHz.
Data analysis
Data analysis was conducted using Scan 4.5 (Compumedics Neuroscan, Charlotte, NC, USA). Offline processing included filtering, artifact rejection, and averaging. To focus on the cortical contribution, responses were bandpass filtered from 0.1 Hz to 20 Hz (12 dB/octave roll-off). Filtered responses were then segmented into epochs ranging from -50 ms to 250 ms relative to stimulus onset. Baseline correction was applied by referencing the response to the pre-stimulus period. Artifacts were removed by excluding trials with activity exceeding ±70 μV, leaving approximately 2,000 sweeps for averaging.
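The offline steps above can be sketched as a minimal pipeline. This is our own illustration, not the Scan 4.5 implementation: `preprocess` is a hypothetical helper, it assumes continuous single-channel data in µV with known stimulus-onset sample indices, and a second-order Butterworth filter is used as a rough stand-in for the 12 dB/octave roll-off.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 20_000  # recording sampling rate (Hz), as described above

def preprocess(raw, onsets, fs=FS):
    """Minimal epoching pipeline mirroring the steps described in the text.

    raw    : 1-D continuous recording (µV)
    onsets : stimulus-onset sample indices
    Returns the artifact-free epochs (n_trials x n_samples).
    """
    # 0.1-20 Hz bandpass; order 2 roughly matches a 12 dB/octave roll-off
    sos = butter(2, [0.1, 20.0], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, raw)

    # segment into -50 ms to +250 ms epochs around each onset
    pre, post = int(0.050 * fs), int(0.250 * fs)
    epochs = np.stack([filtered[o - pre:o + post] for o in onsets])

    # baseline-correct to the 50 ms pre-stimulus period
    epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)

    # reject trials with activity exceeding +/-70 microvolts
    keep = np.abs(epochs).max(axis=1) <= 70.0
    return epochs[keep]
```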
The latency and amplitude of the P1 component were measured following previous literature [21,22]. The latency of the cortical onset wave, the P1 component, was determined manually by identifying the largest positive peak within the 40 ms to 140 ms timeframe. The amplitude of P1 was quantified as the mean amplitude within a 50 ms time window centered on the peak latency for each participant.
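These measurement criteria can be expressed algorithmically. The sketch below applies the same two rules to an averaged epoch; `measure_p1` is a hypothetical helper (the paper's peaks were picked manually), and it assumes the epoch starts 50 ms before stimulus onset.

```python
import numpy as np

def measure_p1(avg, fs=20_000, t0=-0.050):
    """P1 latency and amplitude from an averaged response.

    avg : averaged epoch (µV), starting at t0 seconds relative to onset
    Returns (latency_ms, mean_amplitude), following the criteria above:
    latency   = largest positive peak between 40 and 140 ms,
    amplitude = mean within a 50 ms window centered on that peak.
    """
    times = t0 + np.arange(len(avg)) / fs           # seconds re: stimulus onset
    search = (times >= 0.040) & (times <= 0.140)
    peak_idx = np.flatnonzero(search)[np.argmax(avg[search])]

    half = int(0.025 * fs)                          # +/-25 ms around the peak
    lo, hi = max(peak_idx - half, 0), min(peak_idx + half, len(avg))
    return times[peak_idx] * 1000.0, avg[lo:hi].mean()
```

Averaging over a window rather than taking the single peak value makes the amplitude estimate less sensitive to residual noise at the peak sample.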
Statistics
Statistical analyses were performed using IBM SPSS Statistics for Macintosh (version 26.0; IBM Corp., Armonk, NY, USA). To ensure a normal distribution, the P1 amplitude data were log-transformed. A mixed-design repeated-measures analysis of variance (ANOVA) was used for the P1 amplitude, with metrical position (MP1, MP2, MP3, MP4) as a within-subject factor and group (musicians vs. non-musicians) as a between-subject factor. To address violations of sphericity, the Greenhouse-Geisser correction was applied. As the P1 latency data were not normally distributed, a Friedman test was conducted.
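The nonparametric latency analysis can be sketched with SciPy; the mixed-design ANOVA with Greenhouse-Geisser correction was run in SPSS and is not reproduced here. `latency_stats` is a hypothetical helper, and the Bonferroni correction assumes three planned comparisons of MP1 against the other positions.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def latency_stats(lat):
    """Friedman test across the four metrical positions, followed by
    Bonferroni-corrected Wilcoxon comparisons of MP1 against MP2-MP4.

    lat : array (n_participants x 4) of P1 latencies per metrical position
    Returns (chi2, p, dict of corrected pairwise p-values).
    """
    chi2, p = friedmanchisquare(lat[:, 0], lat[:, 1], lat[:, 2], lat[:, 3])
    pairwise = {}
    for k in (1, 2, 3):
        _, p_unc = wilcoxon(lat[:, 0], lat[:, k])
        pairwise[f"MP1 vs MP{k + 1}"] = min(p_unc * 3, 1.0)  # Bonferroni (3 tests)
    return chi2, p, pairwise
```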
Results
P1 amplitude
A 4 (metrical position: MP1, MP2, MP3, MP4)×2 (group: musicians vs. non-musicians) repeated-measures ANOVA revealed a significant main effect of metrical position [F(1.855, 51.936)=32.457, p<0.001, η2=0.537] (Figs. 2 and 3). The main effect of group was also significant [F(1, 28)=4.920, p=0.035, η2=0.149]. There was no interaction between metrical position and group [F(1.855, 51.936)=2.036, p=0.144, η2=0.068]. MP1 showed the lowest amplitude of the four positions (Bonferroni-corrected pairwise comparisons: p<0.001 vs. MP2, MP3, and MP4).
P1 latency
The Friedman test revealed a significant effect of metrical position on the data [χ2(3)=54.393, p<0.001] (Fig. 3B). Specifically, MP1 exhibited the earliest latency compared to the other metrical positions (Wilcoxon signed ranks test; p<0.001 for MP2, p<0.001 for MP3, p<0.001 for MP4). The effect of the group was not significant.
Discussion
The present study aimed to investigate the effects of metrical hierarchy on the P1 component of the auditory evoked potential. Our results revealed a significant main effect of metrical position, indicating that the P1 amplitude was reduced on the metrically strong beat compared to the weaker beats. While previous research examined the effects of meter on cortical responses, particularly N1, P2, and MMN components [12-16], there has been a dearth of studies exploring the role of the P1 component. Therefore, our study is the first to demonstrate the effect of metrical hierarchy on the P1 component, contributing to the existing body of literature on auditory processing and attention.
Previous research on P1 has consistently shown enhanced responses when the sound is presented at on-the-beat positions compared to off-beat positions [21,22]. In our study, all sounds were presented on the beat, but their metrical positions varied. Surprisingly, we observed a reduction in P1 amplitude on the metrically strong beat, which contrasts with previous findings. One possible explanation for this unexpected finding is the difference in experimental settings between our study and previous ones. In Bouwer and Honing [22], the P1 response was measured to deviant sounds, which comprised only 4% of the stimuli, and comparisons were made between on-the-beat and off-the-beat deviants. The presence of random deviants on the beat could automatically attract attention and enhance the P1 response. However, in our study, P1 responses were measured during the presentation of a repeating quadruple metrical pattern composed of four tones, without random deviants. Given our experimental setting, in which participants were continuously exposed to the repeating sounds for over an hour while watching a movie with subtitles, attention is less likely to have been actively involved in the processing of the stimuli. In addition, the stimuli of Tierney and Kraus [21] did not include deviant sounds but presented a sound embedded in ecologically valid music, which continuously changed over time. Thus, various acoustic properties of the beat positions could attract listeners’ attention.
The reduced P1 response on the strong beat of the present study can be interpreted within the framework of predictive coding and temporal prediction, rather than temporal attention. According to the perspective proposed by Vuust and Witek [28], metrical rhythm perception involves generating predictions about upcoming events, and the extent to which these predictions are fulfilled leads to prediction errors that update the perceived metrical structure. In our study, the reduced P1 amplitude on the strong beat can be attributed to the higher predictability of pitch changes at the strong beat compared to the weak beats. Strong beats in music are often characterized by pitch or intensity changes, making them more salient. The high predictability of pitch changes on the strong beat could contribute to the observed reduction in P1 response. Previous studies have also demonstrated that prediction attenuates early sensory responses to sounds [29-31]. For instance, Schwartze, et al. [31] found that the auditory P1 response to acoustic events was attenuated when sounds were presented regularly and more predictably compared to irregular presentations.
Another potential explanation for the reduced P1 amplitude on the strong beat is the frequency difference between the strong and weak beats. Previous studies have shown that the P1 exhibits lower amplitudes and shorter latencies in response to higher-frequency sounds [18]. Thus, the frequency difference between the strong and weak beats could contribute to the reduced P1 amplitude observed in our study. To gain a better understanding of the effects of metrical hierarchy on the sensory processing of sound, future studies could manipulate the frequency of the strong beat in various ways.
Our study’s findings offer insights into the relationship between the P1 component of auditory evoked potentials and musical expertise, building upon previous research consistently indicating enhanced P1 responses in musicians exposed to diverse auditory stimuli [24,32,33]. In our investigation, musicians demonstrated higher P1 amplitudes than their non-musician counterparts, indicating elevated sensitivity to auditory stimuli in cortical auditory processing. These observations align with previous studies. For instance, Musacchia, et al. [32] reported that musicians exhibited enhanced P1 peaks, with larger P1 amplitudes in auditory conditions, both when hearing the sound [da] alone and when presented with a video token of a speaker saying [da] simultaneously. Schneider, et al. [33] reported significantly larger P1 responses in professional musicians, correlating with the intensity of their musical practice. Even in children, musical training was associated with larger P1 amplitudes, especially in response to complex musical sounds such as violin and piano tones [24]. These cumulative findings affirm that musical expertise is associated with enhanced P1 responses across a wide range of auditory features, including pitch, timing, and timbre, highlighting the influence of musical experience on early auditory processing. While previous studies have mainly focused on individual stimuli, our study examined the influence of metrical context on the P1 component. The results show that musicians’ P1 amplitudes were higher for all tones, independent of metrical hierarchy. These findings align with the results reported by Fitzroy, et al. [16] and suggest that the metrical modulation of cortical responses is consistent irrespective of an individual’s musical expertise.
However, it is important to note that our study does not directly measure musical expertise or provide evidence of causality between musical training and P1 responses. To establish a clearer understanding of the effects of musical training on P1 responses, future studies should incorporate a comparison of P1 responses before and after musical training.
In our study, we introduced a synthesized speech syllable within a metrical framework composed of musical tones, revealing that this musical meter, commonly associated with rhythmic beats, has the capacity to influence the perception and processing of speech sounds. Building on our previous research, where we examined FFR to the same stimuli, we provided initial evidence of metrical modulation extending to speech processing [23]. These findings offer empirical support for the presence of metrical modulation, spanning from subcortical to cortical levels within the auditory hierarchy. Crucially, these results hold significant implications, particularly in the realm of speech rehabilitation. For individuals dealing with hearing loss or speech disorders, the incorporation of songs characterized by clear metrical structures during rehabilitation may effectively facilitate the neural processing of speech sounds embedded within musical contexts. This underscores the potential for speech rehabilitation strategies to benefit from the empirical evidence of metrical modulation observed in both speech and musical tones, thereby enhancing the efficacy of therapeutic interventions.
In summary, our study examined the influence of metrical context on early cortical sound processing by comparing P1 responses to metrically strong and weak beats. While previous research suggested that meter directs attention to the strong beat and enhances cortical processing, our findings indicate a reduction in cortical response to the strong beat. Notably, prior electroencephalography (EEG) studies that reported heightened cortical responses, such as MMN, typically employed oddball stimuli on strong beats. In contrast, our study incorporated a real-world music scenario, where a distinct pitch was repeated on the strong beat. This approach mirrored the typical metrical structure of music, often characterized by a repeating bass or drum pattern. By adopting this approach, we demonstrated that the increased predictability associated with strong beats actually diminishes early cortical responses. Furthermore, we explored how metrical context affects P1 responses in musicians and non-musicians. Previous studies highlighting enhanced P1 amplitudes in musicians predominantly focused on responses to single tones, overlooking the broader context organized by tones. Our study’s outcomes revealed that P1 is influenced by metrical hierarchy, yet no discernible differences emerged in the effect of metrical context on P1 between musicians and non-musicians. Musicians consistently exhibited elevated P1 amplitudes across all tones, irrespective of metrical hierarchy. These results collectively emphasize the robust and consistent nature of metrical modulation in cortical responses, unaffected by an individual’s level of musical expertise.
Supplementary Materials
The online-only Data Supplement is available with this article at https://doi.org/10.7874/jao.2023.00262.
Notes
Conflicts of Interest
The authors have no financial conflicts of interest.
Author Contributions
Conceptualization: Kyung Myun Lee, Sung Hwa Hong, Il Joon Moon. Data curation: all authors. Formal analysis: Kyung Myun Lee, Soojin Kang. Funding acquisition: Kyung Myun Lee. Investigation: Soojin Kang. Resources: Sung Hwa Hong, Il Joon Moon. Supervision: Kyung Myun Lee, Il Joon Moon. Validation: Kyung Myun Lee. Visualization: Kyung Myun Lee, Soojin Kang. Writing—original draft: Kyung Myun Lee. Writing—review & editing: all authors. Approval of final manuscript: all authors.
Funding Statement
This research was supported by the grant NRF-2023R1A2C1004755 and by KAIST.
Acknowledgements
None