Introduction

Background

Record linkage (de-duplication) is a critical function for many CDC Division, Institute, and Office surveillance programs that monitor human disease and national public health. The data informing these programs are high-dimensional and multi-modal, vary in source (medical, educational, community, etc.), and may therefore yield heterogeneous and complex records that differ from one individual to the next. These records must be linked with one another to preserve epidemiological accuracy and to ensure that each individual case is recorded only once. Although “no-cost” computational algorithms for record linkage are currently available, they often require substantial programming or technical knowledge to use, which limits their applicability in fields that are not programming-intensive, such as public health. Proprietary platforms that perform record linkage also exist, but they often carry cost barriers that restrict accessibility, functionality, and interdepartmental data interoperability. They also create a dependence on expensive subscriptions and third-party developers, which raises concerns about long-term sustainability and data security. We therefore propose creating open-source, no-cost, and intuitive record-linkage software that bridges the gap between existing robust algorithms and a user-friendly platform, expanding accessibility and promoting sustainable data management.

Challenge

This challenge is exemplified by the Autism and Developmental Disabilities Monitoring (ADDM) Network, a CDC-supported public health initiative that provides autism spectrum disorder surveillance across the U.S. The data for this surveillance arrive in various formats from educational and clinical sources, so ADDM records must be linked to ensure that each individual is represented only once and that all relevant records are accessible within the system. The ADDM Network links records using “The Link King” (TLK), software built on the paid SAS platform. Although TLK is a powerful and easy-to-use computational tool, it is no longer actively supported. To continue critical operations, the ADDM Network is forced to use an archival copy of TLK (www.the-link-king.party). Because the software is unsupported and record linkage is integral to the ADDM Network (and to other CDC initiatives), any future operating system update (Mac, Linux, Windows, etc.), budgetary constraint, or SAS update that renders TLK incompatible poses a serious threat to the future operations of this and other CDC initiatives.

Solution

We propose ShinyLink, a universally accessible, open-source, no-cost, and user-friendly record-linkage platform. ShinyLink would bridge the gap between existing robust open-source record-linkage algorithms and the intuitive interface that is urgently needed. It would eliminate cost and programming barriers and set a precedent in public health informatics for increased data interoperability.