Mobile Audiometry for Use in Ototoxicity Monitoring Programs: A Scoping Review
Abstract
Ototoxicity monitoring programs (OMPs) for cisplatin-induced hearing loss have not been widely adopted in clinical practice for various reasons. Mobile audiometry (MA) offers cost and convenience advantages over conventional pure-tone audiometry (CA) and is currently used in hearing screening. However, there is no consensus on whether MA can replace CA for measuring hearing thresholds in OMPs. This scoping review examines the challenges of OMPs and evaluates the diagnostic accuracy of MA for hearing thresholds. A comprehensive search was conducted in four databases from their inception to December 2024. Data on study characteristics, reported OMP challenges, MA specifications, test settings, and performance measures were extracted. Nine studies on OMP challenges were reviewed. Identified barriers were inconsistent referrals, resource constraints, low awareness of ototoxicity monitoring, and patient-related factors. Twenty-three studies, covering three portable audiometers and 14 app-based hearing tests, were evaluated for the diagnostic accuracy of MA for hearing thresholds. Only two studies involved testing at extended high frequencies. Studies used measures including MA-CA threshold differences, sensitivity/specificity, and test-retest reliability. App-based MA represents an accessible and scalable solution to the resource constraints faced by OMPs. However, its diagnostic accuracy remains uncertain given the substantial methodological variability across studies. OMPs adopting MA should use clinically validated modalities.
Introduction
Ototoxicity-induced hearing loss (HL) is a common adverse event of cisplatin chemotherapy [1]. It is typically sensorineural [2] and affects higher frequencies above 8 kHz first [3]. Though initially asymptomatic, cumulative cisplatin exposure may eventually cause HL at speech frequencies [4], compromising communication and quality of life [5]. Since cisplatin is effective against many tumours, early HL detection and management are more practical than avoiding its use [6]. Early HL detection lets clinicians and patients weigh the risk of permanent HL against continued cisplatin use [7]. Ototoxicity monitoring programs (OMPs) are hospital-based programs that prospectively monitor hearing thresholds for patients on ototoxic medications [8] using conventional pure-tone audiometry (CA) with extended high frequencies (EHF). Beyond early HL detection, OMPs also longitudinally monitor patients to fit them with hearing devices at the earliest opportunity, or as soon as the need arises [9]. Unfortunately, OMPs are not well adopted in clinical practice. Reported barriers include patient inconvenience, logistical challenges, and costly infrastructure [9]. It is not known whether other OMPs worldwide face similar barriers.
Mobile audiometry (MA) is an alternative to CA. MA leverages portable audiometers and mobile applications to assess hearing levels [10] and can be used in non-sound-treated test settings [11]. Mobile applications (henceforth referred to as “app-based MA”) have shown good sensitivity and specificity for hearing screening (binary outcomes of pass or fail) [12] while providing cost savings and convenience over CA. App-based MA is widely used for workplace screening and hearing screening in rural areas [13]. However, evidence supporting the use of MA for diagnostic threshold testing remains inconclusive, with meta-analyses both endorsing [14] and opposing its use [15]. Hence, the authors conducted this scoping review for two reasons: first, to identify the challenges faced by OMPs for patients on cisplatin; and second, to evaluate the diagnostic accuracy of MA for hearing thresholds.
Materials and Methods
Protocol
This scoping review was conducted based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for Scoping Reviews (PRISMA-ScR) guidelines [16]. We included full-length articles published in peer-reviewed journals including primary and observational studies. Case reports, systematic reviews, meta-analyses, opinion pieces, theses, dissertations, and study protocols were excluded.
Search strategy
The authors searched three electronic databases (PubMed, Scopus, and Embase) and one search engine (Google Scholar) for articles published from database inception through December 2024 using the search terms listed in Table 1. Our search strategy included Medical Subject Headings (MeSH) terms and free-text words related to two concepts: 1) challenges facing OMPs for patients on cisplatin and 2) accuracy of MA for diagnostic hearing thresholds.
Study selection
Each investigator (P. W. C. Y. and L. Z. H.) manually screened the titles and abstracts to select eligible articles. The full texts of these articles were then examined for inclusion. All disputes regarding article selection were resolved through discussion. The study selection process is documented in a PRISMA diagram (Fig. 1).
Data extraction
Both authors (P. W. C. Y. and L. Z. H.) independently extracted pertinent data from included articles into a standardized data charting form. The data extracted included authors, publication year, title of study, study characteristics, reported OMP challenges, MA specifications (equipment type, calibration procedures, frequency ranges), test setting, and performance measures (sensitivity, specificity, and accuracy compared to CA).
Results
Characteristics of studies on OMP challenges
Database searches yielded a total of 379 articles, of which 34 full-text articles were retrieved. Nine studies were included in the final review [9,17-24]. Publication years spanned 2018 to 2024. There were four mixed-methods studies [9,18,22,24], three qualitative studies [20,21,23], and two retrospective studies [17,19]. Study populations included healthcare workers (physicians, audiologists, and nurses) and cancer patients on cisplatin chemotherapy. Four studies were from the United States (US) [9,18,19,22], three from South Africa [21,23,24], and two from Europe [17,20]. Table 2 shows the summary of the included studies.
Inconsistent referrals
Inconsistent referrals were reported as the key barrier in seven studies [9,17,20-24]. These studies noted the lack of standardized referral systems to OMPs, with patients either self-referring upon experiencing HL or being referred postchemotherapy at oncologists’ discretion.
Resource constraints
Insufficient sound booths and manpower to support OMPs were reported by four studies [9,11,21,24]. Such resource constraints reduced appointment slots for OMPs and necessitated testing in non-sound-treated settings (such as clinic waiting areas). They also made it difficult to coordinate OMPs on the same day as patients’ chemotherapy appointments [24], leading to increased travel, greater fatigue, and reduced compliance with OMPs. Some centers lacking referral pathways allocate the responsibility of screening and enrolling patients for OMPs to audiologists, adding to their regular workload [9].
Low awareness of ototoxicity monitoring
Despite the availability of ototoxicity monitoring guidelines, six studies [9,11,17,20,21,23] reported low awareness leading to inconsistent implementation. Two centers were unsure of monitoring protocols and testing procedures [17,20]. In some cancer centers, staff awareness of the reason for and purpose of ototoxicity monitoring was low [24].
Patient-related factors
Fatigue, scheduling constraints, and financial matters influenced patient participation in OMPs [9,18-20,23,24]. In chemotherapy centers without co-located audiology departments, patients reported increased fatigue from travel and difficulty keeping up with OMP appointments [9,24]. Two studies reported that lack of insurance coverage and having a non-head-and-neck cancer were associated with lower rates of monitoring [17,19].
Characteristics of studies on MA
Initial database searches yielded only two articles on MA use in OMPs. The search was expanded to include MA in general, yielding a total of 501 articles, of which 61 full-text articles were retrieved. Twenty-three studies relevant to MA were included in the final analysis to evaluate its diagnostic accuracy for hearing thresholds [25-47]. Eight studies were from the US [26,31,39,40,43,44,47], three from Canada [27,28,35], two each from India [37,45] and South Africa [38,41], and one from each of these countries: Australia [33], Denmark [48], Indonesia [46], South Korea [30], Singapore [36], the United Kingdom (UK) [42], and Uganda [25]. Publication years spanned 2013 to 2024. Table 3 shows the summary of the included studies.
Reference test
CA performed manually using clinical audiometers and calibrated transducers in sound booths is the gold standard for diagnosing HL type and severity [42,47]. CA is the reference test for comparison with MA in all but one study. The sole exception was the KUDUwave Type 2 Clinical Audiometer (MoyoDotNet, Johannesburg; IEC 60645-1/2 compliant) [41], which meets international standards for diagnostic audiometry.
Types of MA
Two device types were used for MA. The first comprised portable audiometers (n=3 studies), specifically the OtoID [47], KUDUwave 5000 [33], and KUDUwave Prime [46].
App-based hearing tests (n=20 studies) formed the second device type. iOS-compatible applications included SHOEBOX [26-28,37,39,40], EarTrumpet [27,31,44], Mimi [36,49], Easy Hearing Test [42], Hearing Test and Ear Age Test [42], Audiogram Mobile [44], and Care 4 Ear [30]. Android-compatible applications included H3 Hearing Test [45], Wulira [48], R-App [48], and hearTest [32,38,41], while Eartone Hearing Test [42], Hearing Test [42], and Hearing Test with Audiogram [44] were compatible with both Android and iOS. The Etymotic Home Hearing Test ran on Windows OS via a touchscreen tablet computer [43].
Test settings for MA
Test settings varied across studies. The majority conducted MA outside sound booths, with seven studies not specifying any ambient noise monitoring measures [28,36,37,41,42,45,49]. Of the eight studies that measured or monitored ambient noise levels [26,30,33,35,44,46-48], two implemented noise control by pausing the test when ambient noise exceeded 40 dB [30], and when broadband noise exceeded 70 dB SPL [47]. Four studies performed MA exclusively inside sound booths [25,27,32,38]. One of these studies introduced white noise in the sound booth and placed circumaural muffs over subjects’ ears for “passive noise cancellation” while testing through audiometric insert earphones [27]. Two studies conducted MA in both settings without noise monitoring [31,42], while one study did not report the test environment [40].
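The pause-on-noise behaviour described above can be sketched in a few lines. The function names, the polling approach, and the 40 dB default cut-off are illustrative assumptions, not the implementation of any reviewed app:

```python
def run_threshold_step(present_tone, read_noise_level, limit_db=40.0):
    """Present a tone only when ambient noise is below the limit;
    pause (busy-wait here, for simplicity) while the limit is exceeded.
    Simplified sketch of the pause-on-noise behaviour described above."""
    while read_noise_level() > limit_db:
        pass  # paused: wait for the environment to quieten
    return present_tone()

# Usage: simulated noise readings drop below the 40 dB cut-off on the third poll.
levels = iter([55.0, 48.0, 35.0])
result = run_threshold_step(lambda: "tone presented", lambda: next(levels))
```

A real implementation would sleep between polls and resume the audiogram sequence where it left off, but the gating logic is the same.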
Transducers for MA
A variety of transducers were utilized for air conduction testing. Some studies used more than one transducer. Audiometric headphones were used in nine studies [26,28,35,37-39,45,47,48]. Audiometric insert earphones were used in three studies [27,28,43]. Proprietary commercial headphones and earphones (non-audiometric headphones and earphones typically bundled with mobile phones and available for sale to consumers) were used in three studies [30,31,49]. Non-proprietary commercial headphones and earphones (non-audiometric headphones and earphones available for sale to consumers) were used in four studies [36,40,42,44]. One study reported using “standard headphone” without specifying the model and type [25].
Transducer placement for MA
Transducer placement was performed by researchers in four studies [33,41-43], by subjects in six studies [28,31,45,47-49], and not specified in 13 studies [25-27,30,32,35-40,44,46]. All MA hearing tests were automated.
Intensity range for MA
The intensity range of MA hearing tests was not indicated in 13 studies. Even among studies using the same MA, the reported intensity ranges varied. For instance, SHOEBOX with TDH-39 supra-aural audiometric headphones had lower and upper limits set to 15 and 90 dB HL in one study [26], while another study using the same MA with ER3A audiometric insert earphones and TDH-50 supra-aural headphones reported limits of 10 and 90 dB HL [28]. Another example is the hearTest app. One study using Sennheiser HD202 II supra-aural non-audiometric headphones as the transducer tested at ≥10 dB HL [32], while another study using Sennheiser HD 280 Pro supra-aural non-audiometric headphones tested at 10–90 dB HL for 2 and 4 kHz and 10–80 dB HL for 8 kHz [41]. When the hearTest app was used for EHF testing with Sennheiser HDA 300 and HDA 200 circumaural audiometric headphones, the intensity ranges were specified as follows: from 10 dB HL up to 75 dB HL at 8 kHz, up to 70 dB HL at 10 kHz, up to 75 dB HL at 12.5 kHz, and up to 65 dB HL at 16 kHz [38]. In some studies using other MA hearing tests, only the upper intensity limits were provided. For example, R-App, when used with RadioEar DD450 supra-aural audiometric headphones, had a limit of <80 dB HL [48], while the Etymotic Home Hearing Test with ER-3 insert earphones had a limit of ≤85 dB HL [43]. Mimi, when used with Apple EarPods proprietary commercial earphones, had a maximum measurement of 90 dB HL [49], but when paired with Baseus Encok D02 Pro non-proprietary commercial headphones, the limit was reduced to 70 dB HL at 4 kHz (other frequencies not specified) [36]. Finally, OtoID, used with Sennheiser HDA 200 circumaural audiometric headphones, was reported to have an intensity range of -10 to 105 dB SPL (not dB HL) [47].
Frequency range for MA
The frequencies tested in MA varied widely. Several studies assessed the standard pure-tone audiometry (PTA) frequencies (250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz) [28,30,33,36,39,40,44], with some including inter-octave frequencies [31,42,46,48]. Others tested at fewer than the standard frequencies [25,26,32,37,41,43,49]. Two studies excluded 250 Hz but added inter-octave 6 kHz [27,35,45]. One study tested at both the standard frequencies and EHFs (10, 12.5, 16, and 20 kHz) [47], while another assessed only EHFs (8, 10, 12.5, and 16 kHz) without standard frequencies [38].
Calibration for MA
Eight studies did not report any calibration done [25,30,31,33,35,36,40,45]. One study reported utilizing a calibrated sound card in their MA [43]. Among the studies that did report calibration [26-28,32,37-39,41,42,44,46-49], methodological approaches varied considerably.
Results from statistical analysis
In the 23 studies reviewed, hearing thresholds obtained by MA were compared to those of the reference test (CA) for statistical analysis.
Applied statistical method
The statistical analysis methods for outcome measures also differed substantially across the studies. Three studies reported MA-CA threshold differences using descriptive ranges of 0–5 dB or 0–10 dB [27,28,31]. For MA conducted using the EarTrumpet app-based MA, the proportion of thresholds falling within 10 dB of CA results ranged from 88% at 6 kHz to 98% at 750 Hz when testing was performed in a sound booth. When conducted in a quiet room, the agreement ranged from 87% at 250 Hz and 6 kHz to 96% at 750 Hz. Confidence intervals were not reported [31]. Another study that also used EarTrumpet reported more precise estimates, with 91.1% of thresholds within 0–10 dB (95% confidence interval [CI]: 89.1%–98.2%) in a sound booth and 95.8% (95% CI: 93.5%–98.0%) when white noise was introduced in the same booth [27]. In the same study but using SHOEBOX MA, the proportion of thresholds within 0–10 dB was 86.5% (95% CI: 82.6%–88.5%) in a sound booth and 91.3% (95% CI: 88.5%–92.8%) when white noise was added to the testing environment [27]. Also using SHOEBOX, another study reported frequency-specific 0–10 dB threshold agreement: 85.9% (95% CI: 76.0%–92.2%) at 250 Hz, 91.8% (95% CI: 83.2%–96.2%) at 500 Hz, 97.1% (95% CI: 89.9%–99.2%) at 1 kHz, 96.9% (95% CI: 89.5%–99.2%) at 2 kHz, 100% (95% CI: 94.0%–100%) at 4 kHz, and 85.5% (95% CI: 74.7%–92.2%) at 8 kHz [28].
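As a rough sketch of how such agreement proportions and their confidence intervals can be computed, the snippet below uses made-up thresholds and a Wilson score interval; the reviewed studies may have used different CI methods:

```python
import math

def within_agreement(ma, ca, tol_db=10):
    """Proportion of MA thresholds within tol_db of CA, with a 95% Wilson CI."""
    n = len(ma)
    hits = sum(abs(m - c) <= tol_db for m, c in zip(ma, ca))
    p = hits / n
    z = 1.96  # 95% two-sided normal quantile
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (centre - half, centre + half)

# Illustrative thresholds in dB HL (not data from the reviewed studies).
ma = [10, 15, 25, 40, 55, 20]
ca = [ 5, 15, 30, 55, 50, 20]
p, ci = within_agreement(ma, ca)
```

One pair differs by 15 dB, so five of six thresholds agree within 10 dB.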
Two studies reported threshold agreement rates but omitted confidence intervals. The first study, using EarTrumpet MA, found that 95% of thresholds obtained in a quiet room fell within 10 dB of CA results, compared to 88% agreement in a clinic waiting area [44]. The second study, using SHOEBOX, reported slightly lower agreement rates—76.4% at 250 Hz, 82.1% at 500 Hz, 85.7% at 1 kHz, 87.9% at 2 kHz, and 81.4% at 4 kHz [37]. The latter also analyzed threshold differences using paired t-tests.
Parametric
Paired t-tests were performed in six studies [26,37-40,45]. Other parametric methods identified across the reviewed studies were Student’s t-test [35], Pearson’s correlations [30,42,43,45], ANOVA [33], and linear regression [41,43,48]. Of the six studies applying paired t-tests, four utilized the SHOEBOX app-based MA. The first study [40], which did not specify the transducer or test environment, found statistically significant differences only at 8 kHz (mean difference: 3.34±11.55 dB, p<0.0004) among standard PTA frequencies. The second study [26], conducted with TDH-39 supra-aural audiometric headphones in a clinic consultation room, reported significant differences at 1 kHz (3.18±4.5 dB, p<0.001) and 2 kHz (2.8±5.5 dB, p=0.002). The third study [39], using RadioEar DD450 supra-aural audiometric headphones in a clinic consultation room, demonstrated significant differences at 250 Hz (3.60±8.19 dB, p=0.006) and 500 Hz (2.74±8.71 dB, p=0.048). The fourth study [37], employing TDH-39 audiometric headphones in a hospital clinic room, found significant differences at 250 Hz, 500 Hz, and 1 kHz (all p<0.05).
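For illustration, the paired t statistic on MA-minus-CA threshold differences at a single frequency can be computed as follows (made-up thresholds, not data from any reviewed study):

```python
import math
from statistics import mean, stdev

def paired_t(ma, ca):
    """Paired t statistic and degrees of freedom for MA-minus-CA
    threshold differences (sketch of the test the studies applied)."""
    diffs = [m - c for m, c in zip(ma, ca)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Illustrative thresholds in dB HL for eight ears at one frequency.
ma = [15, 20, 25, 30, 45, 50, 10, 35]
ca = [10, 20, 20, 30, 40, 45, 10, 30]
t, df = paired_t(ma, ca)
```

The p-value then comes from the t distribution with `df` degrees of freedom (e.g. via `scipy.stats.t.sf`).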
In contrast, a study using the hearTest app-based MA with Sennheiser HDA 300 circumaural audiometric headphones in a sound booth [38] revealed no significant threshold differences between MA and CA (p>0.05) across tested EHF frequencies. Another study examining the H3 Hearing Test app-based MA with non-proprietary commercial earphones in home settings [45] reported significant differences (p<0.05) at all frequencies except 500 Hz in the left ear. This study also applied Pearson’s and Spearman’s correlations, finding significant correlations (p<0.05) for most frequencies in both ears, except at 500 Hz, 2 kHz, 3 kHz, and 4 kHz in the left ear.
The only study where Student’s t-test was applied used the Mimi app-based hearing test with Sennheiser HDA 200 and noise-cancelling HDA 300 audiometric headphones in a quiet room, demonstrating significant differences at 500 Hz, 1 kHz, and 2 kHz (all p<0.001) for both transducer types [35].
Using Pearson’s correlation, studies demonstrated varying degrees of agreement between MA and CA. One study evaluating four app-based MAs [42] using non-proprietary commercial earphones in non-sound-treated test settings reported the following means across tested frequencies: Easy Hearing Test (r=0.77), Hearing Test & Ear Age Test (r=0.47), Eartone Hearing Test (r=0.69), and Hearing Test (r=0.83).
A separate study utilizing the Care 4 Ear app-based MA [30] with proprietary commercial Apple EarPods in a quiet office setting found all frequencies to be significantly correlated with CA (all p<0.001), with the following coefficients: r=0.660 (250 Hz), r=0.748 (500 Hz), r=0.809 (1 kHz), r=0.791 (2 kHz), r=0.699 (4 kHz), and r=0.709 (8 kHz). The strongest correlations emerged in a study that used the Etymotic Home Hearing Test [43] with ER-3 audiometric insert earphones in a carpeted classroom environment. This study reported strong correlations for all frequencies in both ears: r=0.909 (500 Hz right ear), r=0.917 (500 Hz left ear), r=0.924 (1 kHz right ear), r=0.945 (1 kHz left ear), r=0.960 (2 kHz right ear), r=0.961 (2 kHz left ear), r=0.969 (4 kHz right ear), r=0.970 (4 kHz left ear), r=0.953 (8 kHz right ear), and r=0.968 (8 kHz left ear). These correlations were reported to be statistically significant at each frequency in both ears (p<0.001).
Linear regression analyses in the reviewed studies found no significant associations between MA hearing threshold variability and: age, baseline ototoxicity-sensitive range, or cisplatin dosage [41]; degree of HL, age, or gender [43]; user-level effects in user-operated audiometry reliability [48].
Non-parametric
Three types of non-parametric tests were applied in some studies: Spearman’s rank correlation coefficient [36,45,49], Mann-Whitney U test [46], and Wilcoxon signed-rank test [32,41]. Graphical methods included Bland-Altman plots to evaluate CA-MA agreement [26,36,40,48] and violin plots to visualize data distributions [42].
One study applied Spearman’s rank correlation to the Mimi app-based MA [49] with proprietary commercial Apple EarPods in a non-sound-treated room, reporting coefficients of 0.51 (p<0.0001) for normal hearing and mild HL, and 0.68 (p<0.0001) for moderate and worse HL. For normal hearing alone, the Spearman correlation coefficient was 0.22 (p=0.0001).
Another study using Mimi with Baseus Encok D02 Pro nonproprietary commercial headphones in a quiet clinic room demonstrated either strong or very strong correlations to CA at 250 Hz, 500 Hz, 1 kHz, 2 kHz, and 4 kHz [36]. These results were supported by Bland-Altman plots, which indicated good agreement between MA and CA at each frequency, with most data points falling within the limits of agreement and no proportional bias observed.
Two additional studies [26,40] reporting Bland-Altman plots for individual frequencies in MA and CA also found no proportional bias. Similarly, another study analyzing Bland-Altman plots based on mean thresholds—rather than individual frequencies—reported no proportional bias between MA and CA [48].
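A minimal sketch of the Bland-Altman computation (mean bias and 95% limits of agreement) on illustrative paired thresholds; the reviewed studies plotted these values rather than reporting them numerically:

```python
from statistics import mean, stdev

def bland_altman(ma, ca):
    """Bias and 95% limits of agreement for paired MA/CA thresholds.
    Points outside the limits, or a trend in the differences across the
    measurement range (proportional bias), indicate poor agreement."""
    diffs = [m - c for m, c in zip(ma, ca)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Illustrative thresholds in dB HL (not data from the reviewed studies).
ma = [10, 20, 25, 35, 50]
ca = [ 5, 20, 30, 30, 55]
bias, (lo, hi) = bland_altman(ma, ca)
```

Here the differences average to zero, so the limits of agreement are symmetric about zero.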
Sensitivity and specificity
Several studies reported the sensitivity and specificity of MA using different methodologies. One way was by frequency. The SHOEBOX app-based MA demonstrated sensitivity and specificity of 87.0% and 95.0% at 500 Hz, 100% and 92% at 1 kHz and 2 kHz, and 95.0% and 90.0% at 4 kHz, respectively [26]. The Wulira app-based MA reported results by frequency and ear; taking the lower of the two ears’ values at each frequency, it achieved 76.7% sensitivity and 75.6% specificity at 500 Hz, 87.0% sensitivity and 95.9% specificity at 1 kHz, 95.0% sensitivity and 98.0% specificity at 2 kHz, and 88.1% sensitivity and 91.0% specificity at 4 kHz [25].
Other studies classified results by HL severity. The KUDUwave Prime portable audiometer showed 80% sensitivity and 89% specificity for normal hearing, 89% and 37% for mild HL, 89% and 70% for moderate HL, 97% and 85% for moderately-severe HL, and 93% and 96% for severe HL, respectively [46]. The Mimi app-based MA reported 35.5% sensitivity and 97.1% specificity for normal hearing (n=76), 57.9% and 59.3% for mild HL (n=83), 19.4% and 84.6% for moderate HL (n=45), 18.2% and 94.7% for moderately-severe HL (n=11), and 80.0% and 96.0% for severe HL (n=11), with overall values of 97.1% sensitivity and 35.5% specificity (n=104) [49].
Several studies reported overall performance metrics. The SHOEBOX app-based MA showed 89% sensitivity (95% CI 80%–94%) and 70% specificity (95% CI 56%–82%) [37], while the OtoID portable audiometer demonstrated 80.6% sensitivity and 85.3% specificity [47]. For detection of HL specifically, SHOEBOX showed 94.3% sensitivity (95% CI 91.9%–96.8%) and 92.3% specificity (95% CI 90.1%–94.4%) in one study [40], and 100% sensitivity and 62.5% specificity in another study [39]. Detection of at least mild HL in one ear using SHOEBOX showed 100% sensitivity (95% CI 88%–100%) and 91% specificity (95% CI 62%–98%) [28]. Detection of moderate HL using Mimi showed 100% sensitivity and 80.2% specificity [43]. Lastly, in a study that evaluated three app-based MAs across different environments [44], EarTrumpet showed 96.3% sensitivity and 83.1% specificity in a quiet room versus 100% and 72% in a clinic waiting area; Audiogram Mobile demonstrated 85.3% sensitivity and 95.1% specificity in a quiet room versus 87.6% and 92.3% in a waiting area; and Hearing Test with Audiogram showed 87.8% sensitivity and 69.4% specificity in a quiet room versus 89% and 68.2% in a waiting area.
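In all of these comparisons, sensitivity and specificity treat CA as the reference standard. A minimal sketch of the computation, with made-up classifications:

```python
def sens_spec(ma_positive, ca_positive):
    """Sensitivity and specificity of MA against CA as the reference,
    given parallel lists of boolean 'hearing loss detected' calls."""
    pairs = list(zip(ma_positive, ca_positive))
    tp = sum(m and c for m, c in pairs)            # MA+ / CA+
    fn = sum((not m) and c for m, c in pairs)      # MA- / CA+
    tn = sum((not m) and (not c) for m, c in pairs)  # MA- / CA-
    fp = sum(m and (not c) for m, c in pairs)      # MA+ / CA-
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative classifications (True = hearing loss), not study data.
ma = [True, True, False, False, True, False]
ca = [True, True, True,  False, False, False]
sens, spec = sens_spec(ma, ca)
```

With these toy data, one CA-positive ear is missed and one CA-negative ear is flagged, giving 2/3 for both measures.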
Test-retest reliability
Two studies evaluated test-retest reliability using intraclass correlation coefficient (ICC) to compare CA and MA hearing thresholds. In these studies, each subject completed MA twice, with both MA results compared against their CA thresholds. One study using SHOEBOX reported an overall ICC of 0.98 [28], while another testing four app-based MAs found an overall ICC of 0.90 [42].
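As an illustration of the ICC computation, the sketch below implements ICC(3,1) (two-way mixed, single measures, consistency) from the standard ANOVA decomposition; the reviewed studies may have used a different ICC form, and the data are made up:

```python
def icc_3_1(data):
    """ICC(3,1): two-way mixed model, single measures, consistency.
    data: one list per subject, each with k repeated measurements."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ssb = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    sst = sum((x - grand) ** 2 for row in data for x in row)
    mse = (sst - ssb - ssc) / ((n - 1) * (k - 1))        # residual MS
    msr = ssb / (n - 1)                                  # subject MS
    return (msr - mse) / (msr + (k - 1) * mse)

# Illustrative test-retest thresholds (dB HL) for five subjects, two sessions.
sessions = [[10, 10], [20, 22], [30, 28], [40, 42], [50, 48]]
icc = icc_3_1(sessions)
```

With session-to-session differences of at most 2 dB, the ICC is close to 1, consistent with the high values the two studies reported.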
Discussion
Challenges of OMPs
The key barriers to ototoxicity monitoring identified in the literature include inconsistent referrals, resource constraints, low awareness of ototoxicity monitoring, and patient-related factors.
Inconsistent referrals can stem from the lack of a referral pathway [9] and variability in clinician referral practices [50]. It was observed that cancer patients who followed up with ENTs had higher referral rates to OMPs, attributed to higher awareness of ototoxicity and direct audiology access [19]. This suggests that on-site provision of OMPs may eliminate the administrative inefficiencies of referral pathways. MA is mobile by design; it can enable point-of-care OMP delivery at the bedside.
Resource constraints arose from two sources: infrastructure and manpower. Essential infrastructure—sound booths, audiometers, and transducers—requires an estimated USD 56,700 setup cost [24]. MA could achieve comparable functionality at lower cost [47,51]. While manpower cannot be replaced, MA with automated testing function may improve efficiency by enabling concurrent multi-patient testing and eliminating manual test administration.
Low awareness of ototoxicity monitoring stems from three factors. First, an underlying lack of national guidelines, such as in the UK and Italy, meant there was no agreed-upon standard of care [17,20]. Second, where guidelines do exist, the scope of tests varied from hearing monitoring only (baseline/pre-dose/posttreatment) to a battery of tests including distortion product otoacoustic emissions (DPOAE) and EHF [52]. Third, staff were unfamiliar with the reasons behind ototoxicity monitoring [24]. Centers initiating OMPs should prioritize basic hearing monitoring as a starting point. Implementing this via MA further lowers barriers to entry.
Patient-related factors directly affect patient compliance with OMPs. The added demands of attending OMPs on patients already experiencing cancer- and treatment-related fatigue [53] increase dropout rates, resulting in patients being lost to follow-up. Bringing the OMP to the patient through MA can mitigate this risk.
Mobile audiometry
This scoping review identified 14 distinct app-based hearing tests and three portable audiometers utilized for MA. It is important to recognize that app-based MAs are software applications dependent on an operating system (OS) for functionality. The 14 app-based MAs ran on various OS platforms, including iOS, Android OS, and Windows OS. Each OS requires specific hardware infrastructure to operate, with Android and Windows OS showing broader hardware diversity [34].
This hardware diversity inherently leads to differences in key components that can directly impact MA performance. Of particular relevance is the digital-to-analog converter (DAC), which converts digital signals to analog outputs. While apps can manipulate the digital signal transmitted to the DAC, they cannot bypass the DAC’s hardware constraints [54]. This is why the researchers in one study connected an external DAC to modify stimulus level limits [48]. Similarly, transducers represent another hardware component with fixed physical characteristics that apps cannot override.
Standardizing hardware components is a direct solution to mitigating hardware diversity. This approach could be observed in studies that utilized the SHOEBOX app-based MA, which maintained strict hardware consistency by operating exclusively on iOS devices (iPads [26,37,39,40] and iPad Air [27]) and using only audiometric transducers. These included supra-aural headphones (TDH-39 [26,37], TDH-50 [28], RadioEar DD450 [39]) and insert earphones (3A E-A-Rtone [27], ER3A [28]).
Following the hardware standardization principle, integrated and purpose-built units are the most complete solution, as they eliminate all hardware ambiguities. Such a solution already exists in the form of clinical audiometers, of which portable audiometers are one type. This review identified three portable audiometers used in MA: two KUDUwave models and the OtoID.
When hardware standardization proves impractical, calibration is an alternative for managing device-specific limitations. Two studies highlighted the lack of standardized calibration protocols for MA [25,27], which likely explains the complete omission of calibration in eight studies [25,30,31,33,35,36,40,45], the application of existing technical standards not designed for MA in others [39,41,48], and the varied calibration approaches undertaken. In a study using the hearTest app-based MA, researchers adapted the calibration protocol from its affiliated screening tool, hearScreen [32]. In two studies, researchers applied published reference equivalent threshold sound pressure level (RETSPL) values to their transducers [32,38]: Sennheiser HD202 II supra-aural non-audiometric headphones were calibrated using RETSPLs from a hallmark study [29], while Sennheiser HD 280 Pro supra-aural non-audiometric headphones were calibrated using values presented at a conference [55].
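RETSPL-based calibration amounts to offsetting each dB HL target by the transducer- and frequency-specific RETSPL to obtain the required dB SPL output. A minimal sketch, using deliberately fictitious RETSPL values (not the published values for any transducer):

```python
# Fictitious RETSPLs (dB SPL re: 0 dB HL) by frequency in Hz — for
# demonstration only; real values are transducer-specific and published
# in technical standards such as the ISO 389 series.
RETSPL_DB = {250: 25.5, 500: 11.5, 1000: 7.0, 2000: 9.0, 4000: 9.5, 8000: 13.0}

def target_output_spl(freq_hz: int, level_db_hl: float) -> float:
    """dB SPL the transducer must produce in the coupler to present
    level_db_hl at freq_hz for a transducer with these RETSPLs."""
    return level_db_hl + RETSPL_DB[freq_hz]
```

Calibration then verifies, with a sound level meter and coupler, that the device actually produces these dB SPL outputs at each frequency.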
Interestingly, researchers in one study bypassed standard calibration and hardware limitations by connecting a pre-calibrated USB sound card to the tablet running the Etymotic Home Hearing Test [43]. The practical viability of this approach, however, remains unexamined.
Hardware limitations directly affect downstream parameters, including intensity range and frequency range. Both are essential diagnostic elements of hearing threshold testing, yet neither was consistently addressed in the studies reviewed. Frequency range is particularly important for ototoxicity monitoring. The studies that tested at standard frequencies and EHF used the portable audiometers OtoID [47] and KUDUwave Prime [46]. The study that tested only at EHF of 8, 10, 12.5, and 16 kHz used the hearTest app-based MA [38]. What these three MAs had in common was their transducers: only audiometric transducers were used. A study using the hearTest MA reported 86.6% 0–5 dB threshold correspondence for audiometric headphones (Sennheiser HDA 300 circumaural), compared to 78.7% for non-proprietary commercial headphones (Sennheiser HD202 II supra-aural) [38]. This highlights an important trade-off: while some MAs can operate with various transducers, restricting use to audiometric transducers yields more accurate hearing threshold measurements.
The two studies that performed EHF testing through MA focused on populations at risk for high-frequency HL: adults potentially exposed to ototoxic medications [38] and veterans undergoing cisplatin chemotherapy [47]. MA was done in a sound booth in the first study [38] and in a “quiet patient care area around the chemotherapy treatment unit” in the latter [47]. Notably, only one of these studies [38] reported no statistically significant differences in EHF thresholds between CA and MA, though neither study provided sensitivity/specificity metrics or evaluated test-retest reliability. This indicates potential for OMPs to be carried out in a non-sound-treated test setting, such as the chemotherapy bay, using certain MAs.
Within the scope of this review, only two MAs demonstrated capacity to mitigate the effects of dynamic ambient noise: testing halts if ambient noise levels exceed 40 dB [30] or if broadband noise exceeds 70 dB SPL [47]. The differing noise assessment approaches and cut-offs indicate a lack of consensus on appropriate noise limits for MA conducted in non-sound-treated test settings.
The review found substantial variability in how studies reported statistical results, making comparisons difficult. A key factor was the difference in intensity ranges between MA and CA. When CA can test below 10 dB HL but MA can test down to only 10 dB HL, a floor effect is inadvertently created. The researchers acknowledged this and reported results with and without the floor effect: the 0–5 dB threshold correspondence between CA and hearTest app-based MA with audiometric headphones was 77% including the floor effect and 70.2% excluding it [38]. Measures of meaningful variability are thus confounded by the floor effect, a limitation also noted in three other studies [32,41,48]. While one study pointed out that statistical significance is not equivalent to clinical significance [37], there can be clinical implications if diagnostic accuracy is affected [38], whether by hardware constraints, intensity range, frequency range, or ambient noise, to name a few.
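To make the floor effect concrete, the sketch below computes 0–5 dB threshold correspondence twice, once including and once excluding pairs recorded at the mobile device's 10 dB HL floor. The paired thresholds are hypothetical values invented for illustration, not data from any reviewed study; they simply show how pairs clamped at the floor can inflate apparent agreement.

```python
# Hedged illustration of the floor effect: CA can record thresholds below
# 10 dB HL, but an MA whose minimum presentable level is 10 dB HL clamps
# all better-than-floor hearing to 10 dB HL, inflating agreement.

MA_FLOOR_DB = 10  # assumed minimum presentable level of the mobile device


def correspondence_pct(pairs, exclude_floor: bool = False) -> float:
    """Percent of (CA, MA) threshold pairs agreeing within 0-5 dB."""
    if exclude_floor:
        pairs = [(ca, ma) for ca, ma in pairs if ma > MA_FLOOR_DB]
    if not pairs:
        return 0.0
    within = sum(1 for ca, ma in pairs if abs(ca - ma) <= 5)
    return 100.0 * within / len(pairs)


# Hypothetical paired thresholds in dB HL: (CA, MA). The first three MA
# values sit at the 10 dB HL floor even though two CA thresholds are lower.
pairs = [(10, 10), (5, 10), (0, 10), (15, 15), (20, 30)]
```

Here correspondence is 60% including the floor pairs but drops to 50% once they are excluded, mirroring the direction of the 77% vs. 70.2% result reported in [38].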
Based on current evidence, there are more app-based options than portable audiometer options for MA, and more studies have been conducted on app-based MAs. App-based MA represents an accessible and scalable solution because it can run on a wide variety of operating systems and hardware. However, this same latitude is also an inherent flaw that affects diagnostic accuracy. One study aptly concluded, “If the test should be used for clinical purposes, it will require that the equipment should be approved for that purpose” [48]. Among all the MA options, the clinically validated MAs with ambient noise management at the time of review were KUDUwave, hearTest (by hearX Group), and SHOEBOX. Few studies used these MAs, fewer included EHF testing, and fewer still examined populations vulnerable to ototoxicity. This limited evidence precludes definitive conclusions about MA suitability for OMPs.
Limitations
While the benefits of and barriers to OMP adoption are well documented, no studies have reported detailed implementation strategies for successful programs, leaving no reference to guide implementation and practice. This gap, however, also indicates that the OMP is an emerging clinical framework with potential for growth.
The reviewed MA studies showed substantial methodological variability and limited reporting of key parameters, including the number of ears tested (which does not directly equate to the number of subjects), bone conduction testing, and air conduction masking, preventing meaningful cross-study comparisons. Only two studies evaluated MA use in OMPs.
Conclusion
There are various barriers to OMP adoption. MA presents an accessible and scalable solution to reported barriers such as resource constraints, inconsistent referrals, and patient-related factors. However, its diagnostic accuracy remains uncertain due to substantial methodological variability across studies. For now, OMPs intending to use MA should consider clinically validated modalities.
Notes
Conflicts of Interest
The authors have no financial conflicts of interest.
Author Contributions
Conceptualization: Pierre W. C. Yim, Zee Hui Lim. Data curation: Pierre W. C. Yim, Zee Hui Lim. Formal analysis: Pierre W. C. Yim, Zee Hui Lim. Investigation: Pierre W. C. Yim, Zee Hui Lim. Methodology: Pierre W. C. Yim. Project administration: Pierre W. C. Yim. Supervision: Pierre W. C. Yim, Zee Hui Lim. Validation: Pierre W. C. Yim, Zee Hui Lim. Writing—original draft: Pierre W. C. Yim, Zee Hui Lim. Writing— review & editing: Pierre W. C. Yim, Zee Hui Lim. Approval of final manuscript: Pierre W. C. Yim, Zee Hui Lim.
Funding Statement
None
Acknowledgments
None
