Assessing the Homogeneity of Audibility of Pediatric Word and Sentence Corpus in Tamil
Abstract
Background and Objectives
The aim of this study was to assess the homogeneity of the audibility of word and sentence corpora for use in the development of speech audiometry test tools in Tamil.
Materials and Methods
A Tamil corpus (675 words and 195 sentences) was compiled from books, magazines, and novels for children aged 5–10 years. A female speaker was chosen to record the corpus based on expert ratings. All recordings were root-mean-square (RMS) normalized using Adobe Audition 3.0. The recorded material was randomly allocated to 27 word lists (25 words/list) and 20 sentence lists for psychometric assessment. Speech perception testing was done at five intensities (0, 10, 20, 30, and 40 dB SL; referenced to the pure-tone average) in 20 adults with normal hearing.
Results
Analysis of variance indicated significant differences (p<0.01) in speech perception scores as a function of intensity for both words and sentences. A logistic regression model fitted to the performance–intensity data estimated the thresholds (i.e., the intensity level at which 50% scores were obtained) at approximately 10.12 dB SL for words and 11.77 dB SL for sentences.
Conclusions
All words and sentences in the pediatric Tamil corpus were homogeneous in audibility. Hence, all of them can be used in developing an assessment tool and in subsequent clinical assessments.
Introduction
Speech audiometry is a routine clinical assessment used to evaluate an individual’s ability to receive, identify, and comprehend speech in various listening conditions (in quiet and in adverse listening conditions). It also helps assess the efficacy and performance of hearing devices (hearing aids, cochlear implants) and assistive listening devices [1,2]. Speech audiometry consists of threshold-level and suprathreshold-level testing. In threshold-level testing, the lowest level at which a person can identify 50% of the signal is determined; this level is called the speech recognition threshold (SRT). In suprathreshold testing, by contrast, speech understanding is assessed by computing word or sentence recognition scores [3].
To assess speech perception in routine clinical practice, digitally recorded, reliable, and validated speech audiometric test tools (word or sentence lists) are required to maintain consistency across varied listening conditions. Recorded materials are more commonly used than live-voice presentation because they control variability in outcomes due to the talker’s accent, loudness level, and speech pattern [4]. Recording also improves the signal-to-noise ratio (SNR), reduces harmonic distortion, allows the test to be stored for extended periods of use, and provides greater consistency in presentation, all of which are critical for reliable testing [5].
Many speech perception materials have been developed in English for both children and adults. However, to accommodate linguistic diversity, such materials need to be developed in other languages to assess non-English speakers. Validated materials have therefore been developed in Arabic, Brazilian Portuguese, Italian, Polish, Russian, Spanish, Mandarin, Cantonese, Korean, Thai, and other languages, addressing the unique phonetic and tonal characteristics of these languages [6–11], so that non-English-speaking individuals receive accurate and culturally appropriate audiological assessments.
India is a large, multi-ethnic country in which 18 training languages, 16 canvassing languages, and 6,661 mother tongues are spoken, which makes it difficult to have a standardized test covering all languages [12]. Speech audiometry word lists for adults have therefore been developed in individual Indian languages, including bisyllabic word lists in Indian English [13], monosyllabic word lists in Hindi [14], bisyllabic word lists in Kannada [15], monosyllabic phonetically balanced (PB) word lists and a spondee list in Tamil [16], and a bisyllabic phonemically balanced word list in Tamil [17]. Other Indian languages, such as Telugu, Malayalam, Manipuri, Oriya, Bengali, Konkani, and Marathi, also have speech audiometry materials, including multisyllabic and bisyllabic word lists [18–21]. Sentence materials have also been developed in Indian languages such as Hindi, Kannada, Malayalam, Telugu, and Tulu [22–25].
For the pediatric population, the Picture Speech Reception Threshold Test in Kannada [26] and the Lexical Neighbourhood Test in Telugu, Kannada, and Malayalam [27–29] have been developed. The Lexical Neighbourhood Test uses lists with equal numbers of lexically easy and lexically hard words. Among Indian languages, Tamil is a classical and ancient language that has been in use since 300 BC and was spoken by 66 million people at the start of the 21st century. Countries such as Sri Lanka, Malaysia, Singapore, Mauritius, the Fiji Islands, and Canada also have large Tamil-speaking populations [12]. In Tamil, the existing materials for pediatric speech audiometry include the Picture Speech Identification Test [30] for children aged 2–6 years, which consists of two phonetically balanced lists of 25 bisyllabic Tamil words each. More recently, the Early Speech Perception Test in Tamil was developed for children aged 3–6 years to assess speech perception in children with cochlear implants and hearing aids. The test includes syllable categorization (4 bisyllabic, 4 trisyllabic, and 4 polysyllabic words) and word identification (10 bisyllabic and 10 trisyllabic words) subtests [31].
While these tools provide a foundation for assessing speech perception in Tamil-speaking children, the small number of lists in the existing materials makes them prone to practice effects, which hinders accurate repeated assessment in research and clinical practice, particularly when evaluating speech perception in quiet and noisy conditions and in children using hearing devices such as hearing aids and cochlear implants. Hence, standardized, age-appropriate, linguistically diverse, and comprehensive speech audiometry tests are needed to ensure accurate assessment and intervention for Tamil-speaking children with hearing impairment. To address this need, the present study aimed to develop comprehensive pediatric speech perception word and sentence lists in Tamil for children aged 5–10 years.
During the development of speech audiometry material (words/sentences) in any language, several factors must be considered to have a valid and reliable tool for testing. First, the words or sentences selected for testing should be familiar to the target population. Familiarity ensures that the test measures auditory perception rather than vocabulary or cognitive ability [32]. Second, the materials should be phonetically diverse to avoid providing additional auditory cues that could confound the outcome [33]. Third, the materials should possess homogeneity in audibility, meaning that all words or sentences should have similar psychometric thresholds and slopes [34]. Finally, the materials should be recorded by native speakers of the target language to ensure linguistic accuracy and cultural relevance [35].
Homogeneity in audibility is a critical factor to consider during the development of speech test materials (words/sentences). It indicates that the test items are equally difficult to perceive as a function of intensity, ensuring that their difficulty level is consistent [36,37]. This assessment is typically carried out for all digitally recorded speech test materials, and the procedure has evolved over the past few decades. Initial methods presented the speech material at different intensities and scored the number of words or sentences identified correctly; a simple statistical analysis was then applied to compare the performance of the word/sentence lists within each intensity [32]. With the advent of digital recording technology and advanced statistical techniques, however, the procedure now uses psychometric function testing to assess psychometric properties (i.e., the threshold and the slope). These properties are assessed by obtaining an individual’s speech performance as a function of intensity (the performance–intensity [P–I] function) [37,38]. The individual’s speech perception scores across intensities are plotted, and a logistic regression model is used to estimate the speech recognition threshold, i.e., the level at which a 50% score is obtained [39]. The slope of the curve is then estimated, which shows how steeply speech scores increase with intensity across the items [38,40]. Items with high variability are considered less homogeneous, while those with low variability are more homogeneous [41]. Homogeneous content reduces variability, which enhances the reliability of the test tool [34]. However, if the content is not homogeneous, speech audiometry outcomes during clinical assessment may be skewed, leading to inaccurate diagnoses or inappropriate treatment recommendations [42].
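One common way to write the logistic P–I function described above shows why the slope is usually quoted at the 50% point. This sketch assumes a two-parameter form with threshold θ (dB SL) and slope parameter k (per dB); the study does not state the exact parameterization used in its MATLAB code.

```latex
\begin{aligned}
P(L) &= \frac{1}{1 + e^{-k(L-\theta)}}, \qquad P(\theta) = 0.5,\\
\frac{dP}{dL} &= k\,P(L)\,\bigl[1 - P(L)\bigr],\\
\left.\frac{dP}{dL}\right|_{L=\theta} &= \frac{k}{4}
\;\;\Rightarrow\;\; \text{slope at threshold (\%/dB)} = 100\cdot\frac{k}{4}.
\end{aligned}
```

Under this assumed form, a slope of 6.5%/dB at threshold corresponds to k = 0.26 per dB.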
Homogeneity in audibility should be assessed during the initial development of speech audiometry materials and periodically thereafter to ensure that the materials remain consistent over time. This is particularly important when new recordings are made or the materials are adapted to different dialects or languages. Regular assessment is also needed to account for changes in the linguistic environment or advances in recording technology that may affect the psychometric properties of the materials. Hence, the current study aimed to assess the homogeneity in audibility of the pediatric word and sentence corpus in Tamil by testing the null hypothesis that there is no significant difference in speech perception scores among the items in the corpus (words and sentences) within each intensity assessed.
Materials and Methods
Collection of word and sentence corpus in Tamil
A pool of 700 words and 225 sentences was initially collected from 10 school textbooks, 20 magazines, 15 storybooks, and 15 novels for children aged 5–10 years. The word corpus was collected according to the following criteria: 1) words should not be monosyllabic, as monosyllables are the least redundant and hence the most difficult to identify [43]; 2) words should not have any emotional, religious, political, or cultural overlay; 3) words should not be ambiguous or have multiple meanings; and 4) words should not be proper nouns. After collection, the primary researcher inspected the word corpus to confirm that these criteria had been followed and to check for repeated words. The words were then content validated for inclusion in the corpus by 12 laypersons (native Tamil speakers), 8 experts (4 speech-language pathologists and 4 audiologists), and 3 schoolteachers, who rated each word for familiarity, relevance, and clarity. Words that were repeated in the corpus or did not follow the aforementioned criteria, and that had an item-level content validity index (I-CVI) of <0.78, a content validity ratio (CVR) of <0.75, and a modified kappa statistic of <0.74, were eliminated. A total of 25 words were eliminated, leaving 675 words in the word corpus. Likewise, the sentences were collected according to the following rules: 1) each sentence should contain 3 to 6 words; 2) the sentence should have a proper syntactic structure; 3) the sentence should be natural, with no emotional, religious, political, or cultural overlay; and 4) the sentence should not be ambiguous [44]. After collection, sentences with identical meanings were treated as repetitions, and these and sentences with improper syntactic structure were eliminated.
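The content validation cutoffs above can be illustrated with a small calculation. The sketch below is hypothetical: it assumes a 1–4 relevance scale (3 or 4 counts as agreement), Lawshe's CVR computed from "essential" judgments, and the chance-corrected modified kappa commonly used alongside the I-CVI. The source does not state whether an item fails when any index or all indices fall below cutoff, so `is_flagged` assumes "any".

```python
from math import comb


def content_validity(relevance_ratings, essential_flags):
    """Hypothetical helper: item-level content validity indices for one item.

    relevance_ratings: one rating per rater on an assumed 1-4 scale,
        where 3 or 4 counts as agreement that the item is relevant.
    essential_flags: one boolean per rater, True if the rater judged the
        item 'essential' (used for Lawshe's CVR).
    Returns (I-CVI, CVR, modified kappa).
    """
    n = len(relevance_ratings)
    agree = sum(1 for r in relevance_ratings if r >= 3)
    i_cvi = agree / n
    # Lawshe's content validity ratio: (n_e - N/2) / (N/2)
    n_essential = sum(essential_flags)
    cvr = (n_essential - n / 2) / (n / 2)
    # Probability that this many raters agree by chance, then the
    # modified kappa that adjusts the I-CVI for that chance agreement.
    p_chance = comb(n, agree) * 0.5 ** n
    kappa = (i_cvi - p_chance) / (1 - p_chance)
    return i_cvi, cvr, kappa


def is_flagged(i_cvi, cvr, kappa, cut_icvi=0.78, cut_cvr=0.75, cut_kappa=0.74):
    """Assumption: flag an item if ANY index falls below its cutoff."""
    return i_cvi < cut_icvi or cvr < cut_cvr or kappa < cut_kappa


# Example: 8 raters, 7 of whom rate the item relevant and essential.
indices = content_validity([4, 4, 4, 3, 4, 4, 3, 2], [True] * 7 + [False])
print(indices, is_flagged(*indices))
```

For this invented item the I-CVI is 0.875, the CVR is 0.75, and the modified kappa is about 0.87, so it would be retained under the stated cutoffs.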
As with the word corpus, the items in the sentence corpus were content validated by 8 experts (4 speech-language pathologists and 4 audiologists) for naturalness, predictability, and clarity. Twenty sentences did not follow the criteria and had an I-CVI of <0.78, a CVR of <0.75, and a modified kappa statistic of <0.74; these were eliminated, leaving 195 sentences in the sentence corpus. The word and sentence corpora were then randomized separately using an online randomizer (https://www.random.org/) to allocate the 675 words and 195 sentences to lists. The randomized words were split into 27 lists of 25 words each. Likewise, the sentences were divided into 20 lists of 10 sentences each, except for the last list, which contained 5 sentences.
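The list allocation above amounts to shuffling each corpus and cutting it into fixed-size chunks. The sketch below substitutes Python's seeded pseudorandom generator for the random.org service used in the study, and the item IDs are placeholders rather than the actual Tamil items.

```python
import random


def allocate_to_lists(items, list_size, seed=None):
    """Shuffle items, then split them into consecutive lists of list_size.

    Any remainder ends up in a shorter final list.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i:i + list_size] for i in range(0, len(shuffled), list_size)]


words = [f"word_{i:03d}" for i in range(675)]      # placeholder item IDs
sentences = [f"sent_{i:03d}" for i in range(195)]  # placeholder item IDs

word_lists = allocate_to_lists(words, 25, seed=1)
sentence_lists = allocate_to_lists(sentences, 10, seed=1)
print(len(word_lists), len(sentence_lists), len(sentence_lists[-1]))  # 27 20 5
```

Because 675 is an exact multiple of 25, every word list is full. The 195 sentences give 19 lists of 10 plus a final list of 5, matching the allocation described above.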
Word and sentence corpus recording
Initially, advertisement flyers and posters were circulated to invite adult native Tamil-speaking women to volunteer for the speaker selection process, in which experts rated candidate speakers for recording the word and sentence corpus. Eleven native Tamil-speaking women volunteered. Each volunteer was given the same sample word list to record using standard recording procedures. All recordings were made in a sound-treated room following standardized protocols and guidelines for sound measurements, such as those of the International Electrotechnical Commission (IEC 60268-16) or the American National Standards Institute (ANSI/ASA S12.60/Part 1-2010 [R2020]). Recording was done with a Windows 10 desktop computer running the CSL 4500b (Computerized Speech Lab) hardware and software, developed by KayPentax (now part of PENTAX Medical), and a unidirectional condenser microphone. The microphone was placed 1 m from the speaker’s mouth, and the sampling rate was set to 44,100 Hz. All recorded samples were saved in .wav (waveform audio file) format, and the speakers’ identities were blinded. Five expert speech-language pathologists rated the samples of the 11 speakers, presented in random order with breaks between samples, for naturalness, loudness, pronunciation, and pleasantness, using a 4-point Likert scale (0=least, 3=most) for each domain [45]. The speaker with the highest scores in all domains was chosen to record the entire word and sentence corpus. The corpus was then recorded over 5 days in 2-hour sessions, with 10-minute breaks between sessions, using the recording procedure described above.
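The speaker-selection rule ("highest scores in all the domains") can be made concrete in several ways. The sketch below is one hypothetical reading: ratings are averaged across the five raters within each domain, speakers are ranked by the sum of their domain means, and the function also reports whether the top-ranked speaker leads every individual domain. Speaker IDs and ratings are invented placeholders.

```python
from statistics import mean

DOMAINS = ("naturalness", "loudness", "pronunciation", "pleasantness")


def select_speaker(ratings):
    """ratings[speaker][domain] -> list of rater scores on the 0-3 scale.

    Returns (best_speaker, leads_every_domain), where best_speaker has the
    highest sum of per-domain mean ratings.
    """
    domain_means = {
        spk: {d: mean(scores[d]) for d in DOMAINS} for spk, scores in ratings.items()
    }
    totals = {spk: sum(m.values()) for spk, m in domain_means.items()}
    best = max(totals, key=totals.get)
    leads_all = all(
        domain_means[best][d] >= max(m[d] for m in domain_means.values())
        for d in DOMAINS
    )
    return best, leads_all


# Invented ratings from five raters for two candidate speakers.
ratings = {
    "S01": {d: [3, 3, 2, 3, 3] for d in DOMAINS},
    "S02": {d: [2, 2, 2, 3, 2] for d in DOMAINS},
}
print(select_speaker(ratings))  # ('S01', True)
```

Returning the `leads_all` flag separately makes it visible when the overall winner does not top every domain, a case the original rule does not address.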
The recorded word and sentence corpus was root mean square (RMS) normalized to −6.02 dB using Adobe Audition 3.0. After normalization, the words and sentences were arranged in list order in Adobe Audition 3.0, with an interstimulus interval of 2 seconds between words and 4 seconds between sentences. A 1,000 Hz pure tone with an RMS value of −6.02 dB was inserted before the speech stimuli in each word and sentence list as a calibration signal, to set the volume unit (VU) meter deflection when the stimuli are routed through a calibrated audiometer.
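The normalization and list-assembly steps above can be sketched with NumPy. This is not Adobe Audition's algorithm. The sketch assumes −6.02 dB means an RMS level relative to digital full scale (1.0), a 1-second calibration tone, and a silent gap equal to the interstimulus interval after the tone. The token arrays in the test are synthetic noise, not speech.

```python
import numpy as np

FS = 44_100        # sampling rate of the recordings (Hz)
TARGET_DB = -6.02  # target RMS level; assumed to be dB re digital full scale


def rms_db(x):
    """RMS level of a float signal (full scale = 1.0), in dB."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))


def rms_normalize(x, target_db=TARGET_DB):
    """Apply a single gain so the signal's RMS level equals target_db."""
    gain = 10 ** ((target_db - rms_db(x)) / 20)
    return np.asarray(x, dtype=float) * gain


def calibration_tone(freq_hz=1000.0, dur_s=1.0, fs=FS, target_db=TARGET_DB):
    """Pure tone scaled to the same RMS level as the speech tokens."""
    t = np.arange(int(dur_s * fs)) / fs
    return rms_normalize(np.sin(2 * np.pi * freq_hz * t), target_db)


def assemble_list(tokens, isi_s, fs=FS):
    """Calibration tone, then each RMS-normalized token, separated by
    isi_s seconds of silence (2 s for words, 4 s for sentences)."""
    gap = np.zeros(int(isi_s * fs))
    parts = [calibration_tone(fs=fs)]
    for tok in tokens:
        parts += [gap, rms_normalize(tok)]
    return np.concatenate(parts)
```

Scaling to a common RMS level equalizes average level, not peaks. Speech tokens normalized to an RMS of −6.02 dB re full scale can have peaks above 1.0, so a real pipeline would check headroom before writing fixed-point audio files.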
Participants and procedure
Twenty adults with normal hearing (10 for the word corpus and 10 for the sentence corpus; mean age 22.5±3.72 years) were assessed with the recorded word and sentence corpus. The sample size was determined using G*Power version 3.1 (Heinrich-Heine-Universität Düsseldorf), assuming an α error probability of 0.05, a power (1−β) of 0.95, and an effect size (f) of 0.3, which yielded a minimum required sample size of 10 participants per condition (actual power=0.9514). Accordingly, a total of 20 participants were recruited. Participation was voluntary, and participants received no monetary compensation. Individuals with normal hearing, defined as a four-frequency pure-tone average (PTA4) of <25 dB HL with an air-bone gap of <10 dB and a correlated SRT, and with normal middle ear function as indicated by normal tympanometric findings, were included in the study. To assess homogeneity of audibility, the 675 words and 195 sentences were presented to the participants at five intensities (0, 10, 20, 30, and 40 dB SL, referenced to the pure-tone average). To verify the inclusion criteria, each participant’s hearing was first assessed with an Inventis Piano two-channel audiometer (Inventis) at 0.5, 1, 2, and 4 kHz by air conduction and bone conduction, using TDH39 circumaural headphones and a RadioEar B81 bone vibrator. The mean PTA4 was 13.87±4.87 dB HL for the right ear and 14.87±4.34 dB HL for the left ear. Tympanograms were obtained with a GSI TympStar Pro immittance audiometer. All participants had type A tympanograms, with normal static compliance, peak pressure, and ear canal volume in both ears.
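The audiometric inclusion rule can be expressed as a short check. In this hypothetical sketch, PTA4 is the mean air-conduction threshold at 0.5, 1, 2, and 4 kHz. The air-bone gap is assumed to be checked at each of those frequencies, and the SRT-PTA agreement criterion is omitted. The thresholds in the test are invented.

```python
from statistics import mean

PTA_FREQS_HZ = (500, 1000, 2000, 4000)


def pta4(thresholds_db_hl):
    """Four-frequency pure-tone average from a {freq_Hz: dB HL} mapping."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQS_HZ)


def meets_hearing_criteria(ac, bc, tymp_type):
    """Hypothetical inclusion check for one ear.

    ac, bc: air- and bone-conduction thresholds {freq_Hz: dB HL}.
    Requires PTA4 < 25 dB HL, an air-bone gap < 10 dB at every PTA
    frequency (assumption), and a type A tympanogram.
    """
    gaps_ok = all(ac[f] - bc[f] < 10 for f in PTA_FREQS_HZ)
    return pta4(ac) < 25 and gaps_ok and tymp_type == "A"
```

For example, air-conduction thresholds of 15, 10, 15, and 15 dB HL give a PTA4 of 13.75 dB HL.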
After the inclusion criteria were verified, the participants were enrolled in the homogeneity-of-audibility assessment for the word and sentence corpus. The recorded stimuli were played through Adobe Audition 3.0 routed to the Inventis Piano two-channel audiometer and presented via Sennheiser HDA 300 supra-aural headphones with MX cushions for comfort during testing. Testing at one intensity took an average of 25 minutes per participant; each participant was therefore tested in five 30-minute sessions, one per intensity, on 5 separate days. To mitigate improvements in scores from practice effects or familiarity with the content due to repeated testing, a minimum interval of 3 days was maintained between successive sessions. To further control for practice effects, the order of the test lists was randomized for each participant across sessions, and the order of intensity levels was also randomized across participants to minimize the influence of confounding variables on performance. Homogeneity of audibility was estimated from speech perception scores as a function of intensity. Participants were instructed to listen to each presented word or sentence and repeat it as presented. In the word corpus, a score of 1 was given for each correctly identified word and 0 for each incorrect or absent response, and the overall percentage of correct responses was calculated for each word list presented. In the sentence lists, every word in a sentence was treated as a keyword; a score of 1 was given for each correctly repeated keyword and 0 for each keyword that was not repeated or received no response. Individual speech scores for each sentence list were computed from the total number of keywords repeated.
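The two scoring rules above can be written as small functions. This is a simplified sketch. It assumes responses are already transcribed, represents a missing response as `None`, and counts a keyword as correct when the exact token appears anywhere in the repetition of that sentence, ignoring word order. The tokens in the test are abstract placeholders.

```python
def score_word_list(responses, targets):
    """Percent correct: 1 per correctly repeated word, 0 for a wrong or
    missing response (None). Missing trailing responses score 0."""
    correct = sum(1 for r, t in zip(responses, targets) if r is not None and r == t)
    return 100 * correct / len(targets)


def score_sentence_list(responses, targets):
    """Keyword scoring: every word of each target sentence is a keyword and
    scores 1 if it appears in the listener's repetition of that sentence."""
    total = hits = 0
    for i, target in enumerate(targets):
        resp = responses[i] if i < len(responses) else None
        said = set(resp.split()) if resp else set()
        keywords = target.split()
        total += len(keywords)
        hits += sum(kw in said for kw in keywords)
    return 100 * hits / total
```

Scoring keywords rather than whole sentences gives partial credit, so a sentence list yields a finer-grained percentage than a simple sentence-correct count.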
Statistical analysis
All data were entered into a Microsoft Excel 2019 spreadsheet, where data extraction and conversion were also carried out. The data were analyzed using Jeffreys’s Amazing Statistics Program (JASP version 0.19.3). Descriptive statistics were computed, and the Shapiro–Wilk test indicated that the data were normally distributed (p>0.05) across the word and sentence lists. Inferential analysis was therefore performed using a repeated-measures analysis of variance (ANOVA), with the speech scores across the word and sentence lists as the repeated-measures factor and intensity (dB SL) as a between-subject factor, to check whether the data were homogeneous within the lists and between intensities. A logistic regression model was fitted to the data using custom-written MATLAB code (MATLAB 2014b, MathWorks) to estimate the intensity yielding a 50% speech score (the threshold) and the slope of the curve.
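As a stand-in for the study's MATLAB code, whose details are not reported, the sketch below fits a binomial logistic regression of correct responses on presentation level by Newton-Raphson (equivalent to iteratively reweighted least squares). It returns the 50% threshold and the slope at threshold in %/dB. The function name and interface are hypothetical.

```python
import numpy as np


def fit_logistic_pi(levels_db, n_correct, n_trials, max_iter=100, tol=1e-10):
    """Binomial logistic regression of correct responses on level (dB SL).

    Model: logit(p) = b0 + b1 * level, fitted by Newton-Raphson.
    Returns the 50% threshold theta = -b0 / b1 (dB SL) and the slope at
    threshold, 100 * b1 / 4 (%/dB).
    """
    x = np.asarray(levels_db, dtype=float)
    y = np.asarray(n_correct, dtype=float)
    n = np.asarray(n_trials, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - n * p)                           # score vector
        hess = X.T @ (X * (n * p * (1.0 - p))[:, None])    # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    b0, b1 = beta
    return -b0 / b1, 100.0 * b1 / 4.0
```

With per-list data, `n_trials` would be the number of words (or keywords) presented at each level and `n_correct` the number repeated correctly. Counts generated exactly from a curve with a threshold of 10 dB SL and k = 0.26 per dB should be recovered as a threshold of about 10 dB SL and a slope of about 6.5%/dB.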
Compliance with ethical guidelines
All testing procedures were noninvasive and adhered to the conditions set by the institutional ethics committee. In accordance with the Declaration of Helsinki for noninvasive procedures involving human data, the Institutional Ethics Committee approved the study (IEC No: 8729/IEC/2023 dated 20/09/2023). The test procedures were clearly explained to the participants before testing, and the participants gave written informed consent for publication of the data and materials contained in this study.
Results
Speech perception scores for the word and sentence corpus were assessed for homogeneity in audibility across intensities (0, 10, 20, 30, and 40 dB SL). The Shapiro–Wilk test indicated that the data were normally distributed (p>0.05) for both the word and sentence corpus. The list order was retained when analyzing the word and sentence data for homogeneity, and inferential statistics were computed both listwise and for the overall data.
Homogeneity in audibility assessment of word corpus
A repeated-measures ANOVA was carried out with intensity (dB SL) as the between-subject factor and the number of correctly identified words in each word list as the within-subject (repeated-measures) factor. No significant difference in speech perception scores was observed between the word lists (F(26)=0.92, p=0.57, η²=2.534×10⁻⁴), indicating that speech perception outcomes were similar across the word lists. Likewise, the list × intensity interaction was not significant (F(104)=0.73, p=0.98, η²=8.024×10⁻⁴), indicating no difference between the word lists within any intensity. Levene’s test of homogeneity of variance for the word recognition scores across lists and intensity levels (F(4, 108)=0.004, p>0.999) indicated strong homogeneity of variance among the word lists. Further, the consistency of performance of the 10 participants across the 5 intensity conditions for the 27 word lists was assessed using Cronbach’s α, which yielded an intraclass correlation coefficient of 0.986, indicating excellent consistency among the word recognition scores. Together, these analyses show that scores were similar across the words in the corpus at each intensity level, indicating that the audibility of the recorded signal did not bias word recognition (i.e., neither underestimated nor overestimated it) and that all words in the word corpus were homogeneous in audibility. However, word recognition scores differed significantly between intensities, with a large effect size (F(4)=1,400.87, p<0.001, η²=0.98). Post hoc comparisons with Bonferroni correction between the intensities showed the lowest scores at 0 dB SL, with ceiling performance reached around 40 dB SL (Table 1). The change in performance with intensity is shown in Fig. 1.
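Cronbach's α, used above as the consistency measure, is α = k/(k−1) × (1 − Σ item variances / variance of the total score). The sketch below assumes a layout in which each row is one participant at one intensity (10 × 5 = 50 rows for the words) and each column is one list (27 columns). That layout is our reading of the analysis, and JASP's implementation may differ in details such as missing-data handling.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D array: rows = observations, columns = items.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)),
    using sample variances (ddof=1).
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)
```

Lists that rank the observations identically push α toward 1, which is why a value such as 0.986 is read as excellent consistency across lists.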
Fig. 1. Boxplots with jittered scatter of word (A) and sentence (B) recognition scores across the respective lists as a function of intensity.
The standard deviations were very low, and the jitter plot shows that the performance of all word lists at each intensity fell within the upper and lower limits, indicating similar performance (Fig. 1A).
Homogeneity in audibility assessment of sentence corpus
Sentence recognition scores were analyzed for homogeneity in performance across intensities using a repeated-measures ANOVA, as for the word corpus. There was no significant difference between the sentence lists (F(19)=1.31, p=0.16, η²=0.001), and no significant difference in performance between the sentence lists within each intensity (F(76)=0.42, p>0.999, η²=0.002). Levene’s test (F(19, 80)=0.012, p>0.999) also indicated strong homogeneity of variance in the sentence corpus across intensities. Cronbach’s α likewise revealed high consistency of performance among the sentence lists across intensities, with an intraclass correlation coefficient of 0.955, indicating excellent consistency of speech perception scores across intensities. These findings suggest that the sentence corpus was homogeneous in audibility (i.e., the influence of recording factors on performance was negligible), as depicted in Fig. 1B.
However, sentence recognition scores differed significantly between intensities (F(4)=208.96, p<0.01, η²=0.907). Post hoc comparisons with Bonferroni correction were therefore made between the intensities (Table 1). Sentence recognition scores improved markedly with increasing intensity, with comparatively low scores at 0 dB SL and ceiling performance above 30 dB SL. The change in performance with intensity is depicted in Fig. 1B.
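With five presentation levels there are 10 pairwise comparisons. The Bonferroni correction multiplies each raw p-value by the number of comparisons and caps the result at 1. The minimal sketch below shows only the adjustment. The raw p-values would come from whichever pairwise test is used, which is not implemented here, and JASP's post hoc output may be computed differently.

```python
from itertools import combinations

LEVELS_DB_SL = (0, 10, 20, 30, 40)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# All pairwise comparisons between the five presentation levels.
pairs = list(combinations(LEVELS_DB_SL, 2))
print(len(pairs))  # 10
```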
Psychometric property assessment of word and sentence corpus
Because word and sentence recognition scores (i.e., performance) varied as a function of intensity, a performance–intensity (P–I) function was estimated. The word and sentence corpus data were fitted with a logistic regression model using custom-written MATLAB code to assess their psychometric properties. The model estimated the threshold (i.e., the level at which 50% performance is obtained) and the slope of the curve (i.e., how steeply word and sentence recognition scores increase with intensity, expressed in %/dB). These estimates are reported in the Supplementary Material (in the online-only Data Supplement).
Logistic regression analysis was carried out for all words and sentences in the corpus. A listwise logistic regression analysis was also carried out to estimate the psychometric properties of each word list and each sentence list, as shown in Tables 2 and 3, respectively. Fig. 2 shows the overall and listwise results for the words in the corpus, and Fig. 3 shows the overall and listwise results for the sentences.
Table 2. Word recognition scores across the word lists as a function of intensity and their psychometric properties
Table 3. Sentence recognition scores across the sentence lists as a function of intensity and their psychometric properties
Fig. 2. Overall word recognition scores (A) and listwise word recognition scores (B) across the word lists as a function of intensity and their psychometric properties.
Fig. 3. Overall sentence recognition scores (A) and listwise sentence recognition scores (B) across the sentence lists as a function of intensity and their psychometric properties.
The logistic regression analysis placed the speech recognition thresholds of the corpus (i.e., the intensity level at which 50% of the words and sentences were identified) at 10.12 dB SL for words and 11.77 dB SL for sentences. Listwise thresholds ranged from 10 to 10.32 dB SL for the word lists and from 10 to 13.96 dB SL for the sentence lists, indicating similar threshold levels. Likewise, the slope of the curve (i.e., the rate of change of speech perception scores with intensity) was 6.58%/dB for the overall word corpus and 6.45%/dB for the overall sentence corpus; that is, around threshold, word recognition scores increased by about 6.58% and sentence recognition scores by about 6.45% for each 1 dB increase in intensity. Listwise slopes showed the same trend, ranging from 6.55%/dB to 6.70%/dB for word lists and from 6.315%/dB to 6.75%/dB for sentence lists. Performance changed predominantly up to 30 dB SL; beyond this level, scores did not change significantly (p>0.05), indicating a ceiling effect for both the words and the sentences in the corpus.
Discussion
The speech perception outcomes of the word and sentence corpus were analyzed using repeated-measures ANOVA, comparing performance between lists and the effect of intensity on list performance; no significant difference was found between the test items within any of the intensities assessed. This indicates that the word and sentence corpus items are highly homogeneous in audibility and yield comparable speech scores at a given intensity. Levene’s test further confirmed that the variance of list performance was similar across the test items, indicating comparable performance across the word and sentence corpus. Similar findings have been reported in other Indian languages [15,17,18,21,23] and in non-Indian languages [6–11,37]. In those studies, however, homogeneity of audibility was assessed after the speech audiometry test tool had been fully developed, so material had to be deleted or re-recorded whenever the outcome was not homogeneous across all test items. In the present study, homogeneity was therefore assessed as an initial validation step, to avoid having to remove content after the test tool is developed.
The intraclass correlation coefficients also revealed high consistency of speech scores across intensity levels, indicating that the construct of the material is consistent and highly reliable. This high internal consistency is a strong indicator of the homogeneity of the test materials, as it suggests that the items within each list are equally effective in assessing speech recognition [46]. Items with low variability are considered more homogeneous [41], and homogeneous content enhances the reliability of the test tool [34], whereas nonhomogeneous content may skew speech audiometry outcomes during clinical assessment, leading to inaccurate diagnoses or inappropriate treatment recommendations [42].
The logistic regression analysis further supports the homogeneity of the word and sentence corpus. The mean thresholds (the intensity required to achieve a 50% recognition score) were 10.12 dB SL for words and 11.77 dB SL for sentences, and listwise thresholds ranged from 10 to 10.32 dB SL (re: PTA) for word lists and from 10 to 13.96 dB SL for sentence lists. This narrow range of thresholds indicates that the items within each list were equally audible, as they required similar intensity levels to reach the same level of recognition. This finding is consistent with previous research [34,39] and indicates low variability in outcomes and, hence, more reliable measurement with homogeneous test materials.
The slope of the psychometric function was also estimated using logistic regression analysis to measure the rate of change in speech perception scores as a function of intensity for the word and sentence corpus. The slope at the 50% point, measured across all test items, was 6.58%/dB for the word corpus and 6.45%/dB for the sentence corpus. For individual lists, slopes ranged from 6.55 to 6.70%/dB for words and from 6.31 to 6.75%/dB for sentences. Thus, recognition scores increased with intensity at a similar rate across the word and sentence lists. Similar slopes, ranging from 5.9 to 11.3%/dB, have been reported in several other languages [8,9,32,37,40,47,48]. This consistency in slopes further supports the homogeneity of the test materials, as it indicates that recognition scores improved at a similar rate across the lists [34,37]. This homogeneity may reflect a well-controlled recording protocol, the selection of test items, and the possible elimination of confounding variables.
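The link between a logistic fit and a slope quoted in %/dB can be made explicit. The derivation below assumes that the psychometric function takes the standard logistic form, with threshold $x_{50}$ and rate parameter $k$ (in dB$^{-1}$). The exact parameterization used in the study is not stated, so this form is an assumption. Differentiating shows that the slope at the 50% point equals $25k$ %/dB:

```latex
\begin{aligned}
P(x) &= \frac{100}{1 + e^{-k(x - x_{50})}} \quad (\%) \\
\frac{dP}{dx} &= k\,P(x)\left(1 - \frac{P(x)}{100}\right) \\
\left.\frac{dP}{dx}\right|_{x = x_{50}} &= k \cdot 50 \cdot \left(1 - \frac{50}{100}\right) = 25k \quad \%/\mathrm{dB}
\end{aligned}
```

Under this assumption, the overall slopes of 6.58%/dB (words) and 6.45%/dB (sentences) correspond to $k \approx 0.263$ and $k \approx 0.258$ dB$^{-1}$, respectively.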
The findings of this study provide strong evidence supporting the homogeneity of the word and sentence corpora used in the speech audiometry tests. The consistent thresholds and slopes, together with the high internal consistency of the test materials, indicate that they are reliable and valid measures of speech recognition. These findings have important implications for clinical practice and research, as they suggest that the words and sentences in the corpus can be used confidently to develop speech perception material for assessing speech recognition in individuals with normal hearing and with hearing impairment. However, this study has limitations, notably the use of an adult rather than a pediatric population. This choice reflects a practical difficulty, because the current research is preliminary work in the development of a speech perception test tool. At the corpus level, the number of words and sentences was very large. Assessing their psychometric properties in children is therefore practically challenging, because evaluating the corpus at multiple intensities would require prolonged attention and repeated follow-up sessions. Moreover, the words and sentences in the corpus were compiled from the textbooks of children in grades 1 to 5 (i.e., aged 5–10 years). Hence, the use of adult listeners may not pose a validity concern for this preliminary study, which assesses only the homogeneity of the recorded words and sentences. Future research will assess the final word and sentence lists in children aged 5 to 10 years. In addition, future studies could explore the effects of other factors on the homogeneity of speech audiometry materials, such as familiarity with the words or sentences and internal redundancy, and could include multicentric assessment.
Overall, the development of homogeneous speech materials is a critical step in ensuring the accuracy and reliability of speech audiometry tests, and the findings of this study contribute to the growing body of literature on this topic.
Supplementary Materials
The online-only Data Supplement is available with this article at https://doi.org/10.7874/jao.2025.00262.
Notes
Conflicts of Interest
The authors have no financial conflicts of interest.
Author Contributions
Conceptualization: Udhayakumar Ravirose, Devi Neelamegarajan. Data curation: Udhayakumar Ravirose. Formal analysis: Udhayakumar Ravirose, Devi Neelamegarajan. Investigation: Udhayakumar Ravirose, Devi Neelamegarajan. Methodology: Udhayakumar Ravirose, Devi Neelamegarajan. Project administration: Udhayakumar Ravirose, Devi Neelamegarajan. Supervision: Devi Neelamegarajan. Validation: Udhayakumar Ravirose, Devi Neelamegarajan. Visualization: Udhayakumar Ravirose, Devi Neelamegarajan. Writing—original draft: Udhayakumar Ravirose, Devi Neelamegarajan. Writing—review & editing: Udhayakumar Ravirose, Devi Neelamegarajan. Approval of final manuscript: Udhayakumar Ravirose, Devi Neelamegarajan.
Funding Statement
None
Acknowledgments
The authors would like to acknowledge the participants for their support and cooperation. The authors gratefully acknowledge the financial support of SRM Medical College Hospital and Research Centre, Faculty of Medicine and Health Sciences, SRMIST, Kattankulathur, for defraying the costs of publishing this article. The current study is part of a doctoral thesis.
