Data privacy tradeoffs for AI in Life Sciences

Covid-19 has changed the data privacy paradigm. Accessing and mining large data sets under Business Associate Agreements (BAAs) used to be controversial because of the consumer privacy backlash that followed, but it is beginning to become more acceptable. While we are still taking baby steps, a few developments are moving things in a positive direction.

At the JP Morgan Conference in January, a ChalkImpact Client Partner asked Dr. Feinberg, Head/VP of Google Health, about the Ascension Health AI research contract and how Google plans to balance consumer consent with innovation confidentiality while still building precise, effective algorithms. He mentioned the use of BAAs for a limited number of data handlers (developers and researchers), and de-identification of data.

Challenges in Training Health AI Algorithms

Most data science and AI product experts know how challenging it is to aggregate data and isolate observations without protected health information (PHI). One of the main reasons is that healthcare data is highly personalized. You will find very few women in a dataset who are 55, live in zip code 92864, have diabetes and CHD, and had a colorectomy in 2019. Yet while researchers must exclude PHI from training data to stay HIPAA compliant, it is always preferable to have the context of an observation (patient) and to understand which factors, such as demographic information, may have contributed to a given outcome, such as a disease state.

HIPAA's de-identification standard lists 18 identifiers that must be removed or protected. It is quite a list, and it can be particularly problematic for model development. Excluding zip code data alone can undermine the use of the data to build an effective model. Hospitals and medical practices are covered entities, so working with their patient data falls under PHI privacy and confidentiality rules. One workaround that tech companies have used is to combine these datasets with their own demographic data, with the intention of matching de-identified hospital or practice records, though that is fraught with complications given consumer consent issues and backlash.
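To make the re-identification problem concrete, here is a minimal sketch in Python of how one might measure how unique a combination of attributes is, and how coarsening those attributes changes the picture. The records are synthetic, and the names `smallest_group`, `exact`, and `generalized` are hypothetical helpers invented for this illustration. It shows a k-anonymity-style count only; it is not a HIPAA compliance check.

```python
from collections import Counter

# Synthetic, made-up records: (age, zip_code, diagnosis). No real patient data.
records = [
    (55, "92864", "diabetes"),
    (57, "92861", "diabetes"),
    (52, "92867", "diabetes"),
    (34, "10027", "asthma"),
    (38, "10025", "asthma"),
    (31, "10031", "asthma"),
]


def smallest_group(rows, quasi_identifier):
    """Size of the smallest group of rows sharing the same quasi-identifier.

    A result of 1 means at least one person is unique on those attributes.
    """
    counts = Counter(quasi_identifier(row) for row in rows)
    return min(counts.values())


def exact(row):
    """Use the exact age and the full five-digit zip code."""
    age, zip_code, _diagnosis = row
    return (age, zip_code)


def generalized(row):
    """Coarsen age to a decade and zip code to its first three digits."""
    age, zip_code, _diagnosis = row
    return (age // 10 * 10, zip_code[:3])


k_exact = smallest_group(records, exact)
k_generalized = smallest_group(records, generalized)
print(k_exact, k_generalized)
```

With exact values every synthetic record is unique, while the coarsened version places each record in a group of three. The tradeoff described above is visible here: the coarser the attributes, the safer the record, and the less geographic and demographic detail is left for the model.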

Breakthrough in Digital Health AI for Social Good  

In the era of Covid-19, the concept of using data held by big tech for social good, also known as AI for social good, is gaining traction. When we have access to data that can collectively benefit society, without regard to profit, the sky's the limit. Apple and Google demonstrated that social good when they announced their plans for contact tracing mobile app development. In this first-ever partnership, Apple and Google will allow app developers access to data on users who test positive for Covid-19 and will use Bluetooth to alert mobile users when a person who has tested positive is nearby. According to the implementation plan, users can choose whether to share their identity information after contact with a positive case. While the plan is light on details, epidemiologists and governments such as France's see the benefits but argue that the privacy restrictions, such as the inability to identify users at risk, are too limiting to be effective. The ACLU, on the other hand, believes the plan doesn't go far enough in protecting user privacy. Among other concerns, ACLU Cybersecurity Counsel Granick believes the 24-hour mobile data surveillance time frame for a Bluetooth implementation threatens the promised anonymity of Covid-19-positive users by collecting far more personal information that can later be traced back to the user.

A Silver Lining for Health AI Projects Post Covid-19

Keeping those user privacy concerns in mind, there can be a middle ground as we as a society move toward consumer data empowerment and more AI post Covid-19: easing data access restrictions for AI innovations when that access has a discrete time frame and truly results in a broader social good.

In return, product developers should give consumers transparency into product development, so that they can claim their data, whether on mobile or elsewhere in the health system, with clear options for tiered levels of data opt-in access and the possibility of data destruction. That will advance the possibilities of AI far into the future.
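As a rough illustration of what tiered opt-in, a discrete time frame, and data destruction could look like in a product, here is a minimal Python sketch. Everything in it is an assumption made for the example: the tier names, the `Consent` record, and the `may_use` check are hypothetical and do not describe any existing system or regulatory requirement.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class ConsentTier(IntEnum):
    """Hypothetical opt-in tiers, ordered from least to most access."""
    NONE = 0
    AGGREGATE_ONLY = 1
    DEIDENTIFIED = 2
    IDENTIFIED = 3


@dataclass
class Consent:
    """One consumer's consent record (illustrative fields only)."""
    user_id: str
    tier: ConsentTier
    expires_on: date                 # discrete time frame for data access
    destroy_requested: bool = False  # consumer has asked for data destruction


def may_use(consent: Consent, required_tier: ConsentTier, today: date) -> bool:
    """Allow use only if the data is not marked for destruction, the consent
    has not expired, and the granted tier covers the required tier."""
    if consent.destroy_requested:
        return False
    if today > consent.expires_on:
        return False
    return consent.tier >= required_tier


# Example: a user who opted in to de-identified use through the end of 2020.
consent = Consent("user-001", ConsentTier.DEIDENTIFIED, date(2020, 12, 31))
allowed_aggregate = may_use(consent, ConsentTier.AGGREGATE_ONLY, date(2020, 6, 1))
allowed_identified = may_use(consent, ConsentTier.IDENTIFIED, date(2020, 6, 1))
allowed_after_expiry = may_use(consent, ConsentTier.DEIDENTIFIED, date(2021, 1, 1))
print(allowed_aggregate, allowed_identified, allowed_after_expiry)
```

Ordering the tiers lets a single comparison answer whether a project may touch a record, and the expiry date and destruction flag keep the time limit and the consumer's right to withdraw in the same place as the consent itself.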