Systematic review of validation studies for the use of wearable smartwatches in the screening of atrial fibrillation

Introduction: Atrial fibrillation (AFib) is a common dysrhythmia and a risk factor for stroke and heart failure. Early detection and treatment are key to avoiding complications (especially in sustained AFib). Here, we systematically review the potential of wearable smartwatches (WSWs) to screen for AFib. Method: A literature search was conducted, and we shortlisted only validation studies that compared the screening ability of WSWs with EKG and reported sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and the Kappa statistic. Result: Twelve studies were included, with a combined sample size of 1,075,088. Most validation measures of WSWs were above 90% and comparable with those of KardiaBand by AliveCor (KB), an FDA-approved device for detecting AFib. Conclusion: WSWs have the potential to screen for AFib reliably and continuously and to detect it in a timely manner. The inconclusive results produced by WSWs are a significant problem. Once the inconclusive results are rectified, WSWs may be used for widespread screening of AFib in people at high risk of developing it.


Introduction
Atrial fibrillation (AFib) is the most prevalent atrial arrhythmia, affecting 37.6 million people globally, a number expected to rise as the population ages [1]. In the USA, the burden of AFib is projected to reach 12.1 million cases by 2030 [2]. This arrhythmia is associated with increased morbidity and mortality, driven primarily by thromboembolic events and cardiomyopathy. Prompt detection is crucial to managing these risks.
Widespread adoption of wearable smartwatches (WSWs) has introduced an accessible method for detecting AFib. Specifically, WSWs with electrocardiographic capability may be used, together with clinician expertise, to make a new diagnosis of AFib. The recently updated 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of AFib includes a class I recommendation allowing the initial diagnosis of AFib to be made based on a health care professional's examination of the EKG tracing from a WSW [3]. Of note, the most readily available WSWs rely on photoplethysmography (PPG), which is useful only for detecting irregular pulses and cannot be used to make a diagnosis of AFib. This is why we discuss the screening capability rather than the diagnostic potential of WSWs.
The screening capability of WSWs for detecting AFib is not entirely clear. Reported drawbacks of using WSWs to detect AFib include a high rate of false-positive results (i.e., low specificity) [4] and unreliable readings on hairy or tattooed wrist skin [5]. Although seven reviews on this topic exist, none is systematic or meta-analytical; most are narrative and limited in scope [4,6-11]. Our study, encompassing twelve varied studies, addresses this gap and responds to the evolving field of wearable technology, making our systematic review both timely and essential.
Our study aims to systematically review the effectiveness of these new screening methods compared with traditional ones in detecting AFib. We focus on evaluating the validity of smartwatches for AFib screening in comparison with EKG.

Methods
The systematic review included all studies published until December 30th, 2023. We searched two databases, PubMed and OvidSP (outlined in Fig. 1), using the terms "Smartwatch", "Apple Smartwatch", "Samsung Smartwatch", "Huawei Smartwatch", "Wearable Smartwatch", "Atrial Fibrillation", "AFib", "Samsung", "Apple", "Huawei", "Paroxysmal Atrial Fibrillation", "Persistent Atrial Fibrillation", "Long-Term Persistent Atrial Fibrillation", and "Permanent atrial fibrillation". We also manually searched the references of key studies. In the first phase, titles were screened; in the second phase, the full texts of the selected articles were screened to identify eligible studies. We followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement while conducting this systematic review [12].

Inclusion criteria for studies
We included research papers from English-language peer-reviewed journals that validated the detection of AFib by wearable smartwatches against a clinician-interpreted EKG strip as the reference and reported validation measures.

Data extraction
The extracted data included the characteristics of each study, the types of smartwatches used and the technology they employed, and the sources of the reference EKG. Validation measures, namely sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and the kappa statistic, were also extracted. Wherever possible, the same validation measures were extracted for KardiaBand by AliveCor (KB), an FDA-approved device for the detection of AFib [13,14].
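For readers less familiar with these measures, the minimal Python sketch below shows how each one is derived from a 2x2 table of smartwatch calls against the clinician-read reference EKG. The `Confusion` class, the `validation_measures` function, and the counts in the usage line are hypothetical illustrations, not data or code from any included study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 counts: smartwatch call vs. clinician-read reference EKG."""
    tp: int  # watch: AFib,    EKG: AFib
    fp: int  # watch: AFib,    EKG: no AFib
    fn: int  # watch: no AFib, EKG: AFib
    tn: int  # watch: no AFib, EKG: no AFib


def validation_measures(c: Confusion) -> dict:
    """Return the standard screening-test measures as proportions (0-1)."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true AFib correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # non-AFib correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # flagged readings that are truly AFib
        "npv": c.tn / (c.tn + c.fn),          # cleared readings that are truly non-AFib
        "accuracy": (c.tp + c.tn) / total,    # all readings classified correctly
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from any included study.
    for name, value in validation_measures(Confusion(tp=90, fp=5, fn=10, tn=95)).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on the reference EKG result, whereas PPV and NPV are conditioned on the watch result. For this reason only the latter two shift when the same device is used in populations with different AFib prevalence.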

Results
Literature search and data extraction were performed by author MSZ. Author SK double-checked the extracted data, and disagreements were resolved mutually. Twelve studies were included, with a combined sample size of 1,075,088. Table 1 shows the characteristics of the studies, the smartwatches, and the sources of the reference EKG. Table 2 shows the validation measures of the smartwatches, along with comments on the particularities of each included study.
In most of the included studies, detection with WSWs was compared with a 12-lead or single-lead EKG interpreted by one or more clinicians, who were blinded in some studies. In three studies, validation measures for KB were also reported [15-17].
In 10 of the 12 studies, inconclusive smartwatch readings were excluded when calculating sensitivity, specificity, PPV, NPV, accuracy, and the Kappa statistic. The remaining two studies, Mannhart et al. and Ford et al., included inconclusive smartwatch readings in these calculations. They counted inconclusive results as incorrect, i.e., as false negatives when the EKG showed AFib and as false positives when the EKG showed no AFib. This approach, in which inconclusive WSW readings are included in the validation measures, is called "intention to diagnose/screen analysis" [15,16].
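A short Python sketch makes the difference between the two analyses concrete. Per-protocol scoring drops inconclusive readings. Intention-to-diagnose/screen scoring, as described above for Mannhart et al. and Ford et al., counts an inconclusive reading as a false negative when the EKG shows AFib and as a false positive when it does not. The function names and all counts are hypothetical.

```python
def measures(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, accuracy


def per_protocol(tp, fp, fn, tn, inconclusive_afib, inconclusive_no_afib):
    """Inconclusive smartwatch readings are simply excluded."""
    return measures(tp, fp, fn, tn)


def intention_to_diagnose(tp, fp, fn, tn, inconclusive_afib, inconclusive_no_afib):
    """Inconclusive readings are scored as errors against the reference EKG:
    EKG shows AFib -> false negative; EKG shows no AFib -> false positive."""
    return measures(tp, fp + inconclusive_no_afib, fn + inconclusive_afib, tn)


if __name__ == "__main__":
    # Hypothetical counts: 100 conclusive and 25 inconclusive readings.
    counts = dict(tp=45, fp=2, fn=5, tn=48, inconclusive_afib=15, inconclusive_no_afib=10)
    for label, analysis in (("per-protocol", per_protocol),
                            ("intention-to-diagnose", intention_to_diagnose)):
        se, sp, acc = analysis(**counts)
        print(f"{label}: sensitivity={se:.2f} specificity={sp:.2f} accuracy={acc:.2f}")
```

With these made-up counts (25 of 125 readings inconclusive, i.e., 20%), sensitivity falls from 0.90 to about 0.69 and accuracy falls from 0.93 to about 0.74. This is the same direction of change the two studies reported.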
Four studies evaluated the effectiveness of Apple WSWs in detecting AFib, with a combined sample size of 1679 participants from the USA, Australia, and Switzerland.
Three studies assessed AFib detection using Samsung WSWs, with a combined sample size of 919 participants from Germany, Switzerland, and the USA.
Three studies assessed AFib detection using Huawei WSWs [18-20], with a combined sample size of 1413 participants from China, the United Kingdom, and Denmark. Four studies assessed AFib detection using other WSWs [15,21-23], such as Fitbit, Verily Study Watch, and an unspecified WSW used by Nonogushi et al., with a combined sample size of 56,297 participants from the USA, Japan, and China.
The validation measures, i.e., sensitivity, specificity, PPV, NPV, accuracy, and Kappa statistic of Apple, Samsung, and Huawei WSWs were comparable, while other WSWs showed comparatively lower values.
Table 3 summarizes the validation measures of WSWs and KB across all included studies, broken down by whether inconclusive WSW readings were included. When inconclusive readings were excluded, the validation measures of smartwatches were above 90% with Kappa above 0.85, while those of KB were above 95% with Kappa above 0.85. When inconclusive readings were included (i.e., intention to diagnose/screen analysis), the validation measures decreased substantially.

Discussion
Our systematic review shows that WSWs have screening accuracy similar to that of the FDA-approved KB [8,9]. WSWs might offer a better alternative for AFib detection than the standard of care.
False positives are a concern with such devices. False alerts can cause significant anxiety, lead to unwarranted emergency department visits, and ultimately waste healthcare resources. False-positive results can be reduced by raising specificity, provided the accompanying rise in false negatives remains tolerable. Ford et al. achieved 100% specificity but at the cost of sensitivity (50%) [16]. Zhang et al. achieved both high sensitivity and high specificity (100% and 99%, respectively), minimizing both false positives and false negatives [19]. The overall median sensitivity and specificity above 95% and median PPV and NPV above 90% observed in this review may represent a good balance between false-negative and false-positive alerts. PPV also increases as the prevalence of a disease increases. Hence, if people at higher risk of AFib use these WSWs, PPV would rise and the proportion of alerts that are false positives would decline. Risk factors for AFib include advanced age, high blood pressure, obesity, European ancestry, diabetes, heart failure, ischemic heart disease, hypothyroidism, chronic kidney disease, moderate-to-heavy alcohol use, smoking, and enlargement of the left heart chambers [28].
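The dependence of PPV on prevalence follows from Bayes' theorem: PPV = Se*p / (Se*p + (1 - Sp)*(1 - p)). Here Se is sensitivity, Sp is specificity, and p is the prevalence of AFib among the people wearing the device. The Python sketch below holds Se and Sp fixed at an assumed 95% and varies only p. The prevalence values are illustrative and are not estimates from the included studies.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence               # share of users who are AFib and flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # share of users who are non-AFib but flagged
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Assumed device performance; only the prevalence in the screened group changes.
    se, sp = 0.95, 0.95
    for prevalence in (0.02, 0.10, 0.30):
        print(f"prevalence {prevalence:.0%}: PPV {ppv(se, sp, prevalence):.2f}")
```

Under these assumptions, PPV rises from roughly 0.28 at 2% prevalence (the lower end of the adult prevalence range cited in this review) to roughly 0.89 at 30%. This is the quantitative rationale for targeting higher-risk users.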
Ford et al. and Mannhart et al. highlighted a significant problem with using WSWs to detect AFib: inconclusive results. They noted that WSWs produced inconclusive readings up to 30% of the time, and KB up to 25% of the time [15,16]. When these inconclusive results were compared with the EKG, some turned out to be AFib and some did not. Hence, both false negatives and false positives were hidden behind the inconclusive label.
Although the frequency of inconclusive results was comparable between WSWs and KB, it remains substantial, and it varied between WSWs from 17% to 30%. Ford et al. and Mannhart et al. incorporated these results into the calculation of the validation measures (i.e., intention to diagnose/screen analysis) and showed that doing so substantially lowered them [15,16]. In real-world continuous monitoring for AFib, users may receive inconclusive results up to 30% of the time, which may cause significant anxiety and waste resources. The performance of these devices therefore still has considerable room for improvement.
Cohen's Kappa statistic indicates how reliably a WSW detects AFib: it measures how well the WSW agrees with the comparator EKG interpreted by a clinician, beyond the agreement expected by chance. An overall median Kappa above 0.8 for both WSWs and KB indicates almost perfect agreement with the comparator EKG. However, when inconclusive results are taken into account, it drops to 0.4 for both WSWs and KB, indicating only moderate agreement. Some included studies that did not report Kappa reported accuracy instead. All reported accuracies were above 90%, dropping to 70% when inconclusive results were included.
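Kappa is defined as (p_o - p_e) / (1 - p_e). Here p_o is the observed proportion of agreement, and p_e is the agreement expected by chance from each rater's marginal totals. The Python sketch below computes it from a 2x2 table and attaches the commonly used Landis and Koch descriptive bands. The counts are hypothetical. The second call re-scores 25 inconclusive readings as errors, mimicking an intention-to-diagnose analysis. On this scale the boundary between "fair" and "moderate" lies at 0.40, so values near 0.4 sit at the border of the two bands.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for two raters (smartwatch vs. reference EKG), binary outcome."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over the "AFib" and "no AFib" categories.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


def landis_koch(kappa: float) -> str:
    """Descriptive band for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical counts with inconclusive readings excluded.
    k1 = cohens_kappa(tp=45, fp=2, fn=5, tn=48)
    # Same data with 15 AFib and 10 non-AFib inconclusive readings scored as errors.
    k2 = cohens_kappa(tp=45, fp=2 + 10, fn=5 + 15, tn=48)
    print(f"{k1:.2f} ({landis_koch(k1)}); {k2:.2f} ({landis_koch(k2)})")
```

Accuracy only counts agreement, whereas kappa discounts the agreement that two raters would reach by chance alone. In the hypothetical example, this is why kappa falls further than accuracy (0.86 to about 0.49, versus 0.93 to about 0.74) once inconclusive readings are counted.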
PPG technology is not new; it has been used for various medical purposes, including measuring oxygen saturation, blood pressure, and cardiac output, evaluating autonomic function, and detecting peripheral vascular disease. PPG-equipped WSWs may offer better outcomes than non-PPG WSWs because they can be used passively for the detection of AFib. However, the technology has certain limitations. PPG sensors are affected by skin pigmentation, the color of the LED, the contact force between the skin and the sensor, ambient temperature, motion artifacts, and ambient light interference. Limited battery life can also cause arrhythmias to be missed, particularly in paroxysmal AFib [21].
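To make concrete what "detecting irregular pulses" from PPG can mean, the Python sketch below flags irregularity from the inter-beat intervals that a PPG pulse waveform yields. It uses the root mean square of successive differences (RMSSD), normalized by the mean interval, which is one simple irregularity index from the heart-rate-variability literature chosen here for illustration. Commercial smartwatch algorithms are proprietary and are not reproduced here. The 0.10 threshold and the example intervals are hypothetical.

```python
import math
from statistics import mean

# Hypothetical cut-off for illustration only; not a validated or vendor threshold.
IRREGULARITY_THRESHOLD = 0.10


def normalized_rmssd(intervals_ms: list[float]) -> float:
    """RMSSD of successive inter-beat intervals divided by the mean interval."""
    if len(intervals_ms) < 3:
        raise ValueError("need at least three inter-beat intervals")
    diffs = [later - earlier for earlier, later in zip(intervals_ms, intervals_ms[1:])]
    rmssd = math.sqrt(mean([d * d for d in diffs]))
    return rmssd / mean(intervals_ms)


def irregular_pulse(intervals_ms: list[float],
                    threshold: float = IRREGULARITY_THRESHOLD) -> bool:
    """Flag a window whose beat-to-beat variability exceeds the threshold."""
    return normalized_rmssd(intervals_ms) > threshold


if __name__ == "__main__":
    regular = [800, 810, 795, 805, 800, 798, 802]     # steady rhythm, ms
    irregular = [620, 910, 700, 1050, 580, 890, 760]  # erratic rhythm, ms
    print(irregular_pulse(regular), irregular_pulse(irregular))
```

A flag of this kind only signals an irregular pulse. Motion artifacts and ectopic beats can also raise the index, which is consistent with the text's point that PPG alone cannot establish a diagnosis of AFib.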
As mentioned in the introduction, the updated 2023 ACC/AHA/ACCP/HRS Guideline for the Diagnosis and Management of AFib includes a strong (class I) recommendation allowing the initial diagnosis of AFib to be made based on a health care professional's examination of the EKG tracing from a WSW. Some of the WSWs assessed in the included studies can produce such EKG tracings. In practice, once a PPG-equipped WSW notifies the user of possible AFib, the user can touch the dial or crown of the watch with the opposite hand and immediately record an EKG tracing. A clinician can then review this tracing, in person or remotely, to diagnose AFib. In this way, asymptomatic or paroxysmal AFib can be caught in a timely manner and its medical consequences avoided.
The economic burden of AFib is a significant concern for healthcare systems. AFib is associated with increased healthcare costs, hospitalizations, and resource utilization. The estimated annual healthcare cost associated with AFib exceeds $800 million in total in Canada and ranges from €20,403 to €26,544 ($22,340-$29,064) per patient in Denmark [29,30]. The annual rise in AFib incidence is closely linked to accumulating risk factors, notably advancing age, obesity, hypertension, and type 2 diabetes. The prevalence of AFib in adults is between 2% and 4%, with a pronounced surge among individuals over 65 years of age [31].
AFib is a leading cause of stroke [22]. This challenge is compounded by AFib's ability to remain asymptomatic or sporadic, revealing itself only over time. This hidden nature exacts a substantial economic toll, with AFib accounting for 1%-2% of healthcare expenditures [32]. One study highlighted the escalating burden: AFib hospitalizations rose from 288,225 in 2007 to 333,570 in 2014, which was attributed to an increase in total annual emergency department visits during the study period [33]. The adjusted annual charges for admitted AFib patients rose by 37%, from $7.39 billion in 2007 to $10.1 billion in 2014 [34]. AFib increases the risk of ischemic stroke and of heart failure (both with reduced and with preserved ejection fraction) fivefold [35,36]. In terms of mortality, the age-adjusted mortality rate associated with AFib rose from 18 per 100,000 in 2011 to 22.3 per 100,000 in 2018 [37].
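The 37% figure follows from the relative change in adjusted annual charges. Applying the same calculation to the hospitalization counts quoted above gives a smaller relative rise, so over the same period charges grew faster than admissions:

```latex
\frac{10.1 - 7.39}{7.39} = \frac{2.71}{7.39} \approx 0.367 \approx 37\%,
\qquad
\frac{333{,}570 - 288{,}225}{288{,}225} = \frac{45{,}345}{288{,}225} \approx 0.157 \approx 16\%
```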
Using smartwatches to detect AFib has the potential to reduce medical costs, both directly and indirectly. Currently, most symptomatic AFib cases present to the emergency department. Detecting AFib with smartwatches could reduce direct medical costs by diverting symptomatic AFib from the emergency department to outpatient care. It could also reduce indirect costs by lowering the rates of AFib-related consequences through timely and effective treatment. These consequences include embolic stroke, falls, and heart failure.
We also acknowledge that the updated 2023 ACC/AHA/ACCP/HRS perspective on the management of AFib states that it is unusual to use WSWs to detect AFib in mass screening of the asymptomatic population. The guideline maintains that data demonstrating improved outcomes, including stroke, are lacking even when AFib is detected in an asymptomatic population. Accordingly, it does not provide any specific recommendation for routine AFib screening, which is in concordance with the US Preventive Services Task Force [38].
It is prudent to note that we are not advocating for everyone to wear a WSW to detect AFib. That would impose a significant economic burden and could lead to high false-positive and false-negative rates. WSWs for the detection of AFib can be considered for people at higher risk of AFib. They can also be considered for patients with AFib who are on rhythm control therapy for the prevention of stroke and are not candidates for anticoagulation. In that case, a cardiologist or primary care physician can adjust the dose of the antiarrhythmic agent according to the number of AFib alerts from the WSW. We recommend that AFib alerts produced by PPG technology be confirmed by an EKG tracing recorded with the same WSW and read by a clinician.
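Confirming PPG alerts with the watch's own EKG tracing can be modeled as serial testing, in which AFib is called only when both steps are positive. Suppose the two steps err independently given the true rhythm. This is a simplifying assumption, since both steps use the same device on the same wrist. Under it, combined sensitivity is Se1 * Se2 and combined specificity is 1 - (1 - Sp1) * (1 - Sp2). The Python sketch below uses hypothetical performance values, not pooled estimates from this review.

```python
def serial_screen(se1: float, sp1: float, se2: float, sp2: float) -> tuple[float, float]:
    """Sensitivity and specificity when a positive first test must be confirmed by a second.

    Assumes the two tests are conditionally independent given true AFib status.
    """
    sensitivity = se1 * se2                  # true AFib must be detected by both steps
    specificity = 1 - (1 - sp1) * (1 - sp2)  # a false alarm must slip past both steps
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical values: PPG alert (step 1), clinician-read watch EKG (step 2).
    se, sp = serial_screen(se1=0.95, sp1=0.90, se2=0.98, sp2=0.98)
    print(f"combined sensitivity {se:.3f}, specificity {sp:.3f}")
```

This mirrors the trade-off discussed for false positives. In this example, confirmation raises specificity from 0.90 to about 0.998 while lowering sensitivity from 0.95 to about 0.93.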
Currently, AFib is detected primarily through intermittent means such as office visits, emergency department visits, or incidental findings. Because these are periodic, they carry a heightened likelihood of missing AFib, potentially leading to missed diagnoses and progression to fatal events [39]. Moreover, the STROKESTOP study demonstrated that several intermittent short EKG recordings over an extended period led to a fourfold increase in sensitivity for detecting AFib compared with single-time measurements [40]. When AFib is suspected but the diagnosis cannot be confirmed, more advanced measures such as 24-h Holter monitoring or implantable loop recorders are used, which cause more discomfort and psychological stress for patients than WSWs. Furthermore, WSWs provide a novel avenue for identifying irregular pulses, allowing users to detect such arrhythmias promptly and removing the constraints of periodic monitoring. These WSWs present a convenient, non-invasive, and easily accessible alternative to traditional EKG monitoring methods.
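A deliberately simple model illustrates why repeated short recordings catch more paroxysmal AFib than a single recording does. Suppose each recording independently has probability p of coinciding with an AFib episode. Then the chance that at least one of n recordings captures it is 1 - (1 - p)^n. Real episodes cluster in time, so the independence assumption is a simplification, and the per-recording probability below is hypothetical. This model is only an illustration and is not how the STROKESTOP estimate was derived.

```python
def detection_probability(p_per_recording: float, n_recordings: int) -> float:
    """Chance that at least one of n independent recordings captures an AFib episode."""
    if not 0.0 <= p_per_recording <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Complement of "every recording misses the episode".
    return 1 - (1 - p_per_recording) ** n_recordings


if __name__ == "__main__":
    p = 0.10  # hypothetical per-recording capture probability
    for n in (1, 4, 10, 20):
        print(f"{n:>2} recordings: {detection_probability(p, n):.2f}")
```

Under these assumptions, four recordings raise the capture probability from 0.10 to about 0.34. Continuous wear pushes the effective number of recordings much higher, limited in practice by battery life and signal quality.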
WSWs may make AFib treatment more effective, because combining machine learning with WSWs has the potential to avoid the errors and fatigue inherent in human effort. This can enable early detection and management of AFib, reducing the risk of serious consequences (especially in sustained AFib). Furthermore, traditional devices are constrained by intermittent monitoring and must be returned to the office for subsequent analysis. WSW data, in contrast, can be stored in the cloud, enabling remote access. This promises to streamline access to essential health information and to facilitate telemedicine and remote monitoring, fostering seamless connectivity between patients and healthcare providers. Integrating real-time cardiac data with AI-driven algorithms supports remote monitoring, enabling timely interventions and personalized treatment.
Our study has some strengths.All of the included studies are validation studies comparing the detection capabilities of smartwatches with the gold standard screener, i.e., EKG.Most of the studies have reported multiple validation measures that are necessary for the evaluation of a screening device such as sensitivity, specificity, PPV, NPV, and Kappa statistic.A consistency was noted across most studies when comparing above mentioned measures which reflects the precision across the studied WSWs in detecting AFib.The combined sample size was large, in thousands.Our study has some limitations.There was some heterogeneity between studies, such as in some studies participants were coming from cardiology procedures, while others had no history of AFib or cardiac conditions and were comparatively healthy coming from the community.Additionally, some studies had small sample sizes.

Conclusion
This systematic review presents evidence that WSWs have potential for detecting AFib, with performance comparable to that of KB, an FDA-approved device for AFib screening. There is still room for improvement in reducing inconclusive readings. This review underscores the emerging role of digital health applications in modern healthcare, especially in cardiovascular monitoring. A cost-utility analysis is needed to weigh the quality-adjusted life-years gained against the monetary expense. Another systematic review is also needed to assess the diagnostic, rather than screening, potential of WSWs capable of producing EKG tracings.
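A cost-utility analysis of the kind called for here typically reports an incremental cost-effectiveness ratio (ICER). The ICER is the extra cost of a WSW-based strategy divided by the extra quality-adjusted life-years (QALYs) it yields, compared with usual care. The Python sketch below shows only this arithmetic. All costs and QALY values are hypothetical placeholders that a real analysis would need to estimate from data.

```python
def icer(cost_new: float, qaly_new: float, cost_usual: float, qaly_usual: float) -> float:
    """Incremental cost per QALY gained for a new strategy versus usual care."""
    delta_qaly = qaly_new - qaly_usual
    if delta_qaly <= 0:
        # This simple sketch does not handle equal-effect or dominated strategies.
        raise ValueError("new strategy yields no QALY gain; simple ICER not meaningful")
    return (cost_new - cost_usual) / delta_qaly


if __name__ == "__main__":
    # Hypothetical per-person values for a WSW screening strategy vs. usual care.
    ratio = icer(cost_new=1200.0, qaly_new=10.05, cost_usual=800.0, qaly_usual=10.00)
    print(f"ICER: {ratio:,.0f} per QALY gained")
```

The resulting ratio is then compared with a willingness-to-pay threshold per QALY. That threshold is a policy choice that differs between health systems.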

Table 1
Characteristics of studies
Abbreviations: PAC, premature atrial contraction; PVC, premature ventricular contraction; ECG, electrocardiogram; USA, United States of America; PPG, photoplethysmography; NIH, National Institutes of Health; Yr, year(s); SD, standard deviation; IQR, interquartile range; WSW, wearable smartwatch.
*37 patients from the University of Massachusetts, 9 patients from Connecticut.

Table 2
Results of studies