%0 Journal Article
%T Validation of electronic medical data: Identifying diabetes prevalence in general practice
%A Abhijeet Ghosh
%A Adam Hodgkins
%A Allan J Pollack
%A Andrew Bonney
%A Graeme C Miller
%A Joan Henderson
%A Khin Than Win
%A Stephen Barnett
%J Health Information Management Journal
%@ 1833-3575
%D 2019
%R 10.1177/1833358318798123
%X Electronic medical records are increasingly used for research with limited external validation of their data. This study investigates the validity of electronic medical data (EMD) for estimating diabetes prevalence in general practitioner (GP) patients by comparing EMD with national Bettering the Evaluation and Care of Health (BEACH) data. A ‘decision tree’ was created using inclusion/exclusion of pre-agreed variables to determine the probability of diabetes in absence of diagnostic label, including diagnoses (coded/free-text diabetes, polycystic ovarian syndrome, impaired glucose tolerance, impaired fasting glucose), diabetic annual cycle of care (DACC), glycated haemoglobin (HbA1c) > 6.5%, and prescription (metformin, other diabetes medications). Via SQL query, cases were identified in EMD of five Illawarra and Southern Practice Network practices (30,007 active patients; from 2 years to January 2015). Patient-based Supplementary Analysis of Nominated Data (SAND) sub-studies from BEACH investigating diabetes prevalence (1172 GPs; 35,162 patients; November 2012 to February 2015) were comparison data. SAND results were adjusted for number of GP encounters per year, per patient, and then age–sex standardised to match age–sex distribution of EMD patients. Cluster-adjusted 95% confidence intervals (CIs) were calculated for both datasets. EMD diabetes prevalence (T1 and/or T2) was 6.5% (95% CI: 4.1–8.9). Following age–sex standardisation, SAND prevalence, not significantly different, was 6.7% (95% CI: 6.3–7.1).
Extracting only coded diagnosis missed 13.0% of probable cases, subsequently identified through the presence of metformin/other diabetes medications (*without other indicator variables) (6.1%), free-text diabetes label (3.8%), HbA1c result* (1.6%), DACC* (1.3%), and diabetes medications* (0.2%). While complex, proxy variables can improve usefulness of EMD for research. Without their consideration, EMD results should be interpreted with caution. Enforceable, transparent data linkages in EMRs would resolve many problems with identification of diagnoses. Ongoing data quality improvement remains essential
%K electronic medical records
%K data quality
%K data accuracy
%K general practice
%K primary health care
%K health information management
%U https://journals.sagepub.com/doi/full/10.1177/1833358318798123