On Friday, February 20, graduate business students at MIT Sloan held the 2015 MIT BioInnovations Conference. The focus of the conference was “redefining value in healthcare.” Dr. John Halamka, CIO of Beth Israel Deaconess Medical Center and CIO and Dean of Technology of Harvard Medical School, was one of the keynote speakers. One of the topics he spoke about was information security, a topic he has to be an expert in because he oversees three thousand doctors, fourteen thousand employees and two million patients.
“Despite efforts to encrypt and build multimillion dollar fortresses around all of our systems, all it takes is one person who can put everything at risk. Organized crime is interested in buying hospitals’ medical records in bulk and selling them to people who could go into the hospital to pretend that they are the person whose medical record was stolen to get expensive services,” Halamka says. This should come as no surprise, as the largest data breach in the healthcare industry, Anthem’s breach affecting eighty million patients, occurred just earlier this month. The price of a data breach is steep: a fine of a million and a half dollars for every privacy compromise, plus a loss of reputation.
Not only do hospitals have to worry about medical identity theft, they also have to be prepared for “hacktivism” attacks. At the BioInnovations event, Halamka told the story of a recent hacktivist attack on the Harvard hospital network: “On the eve of the Boston Marathon this year, Anonymous decided to attack the Harvard networks because it had negative feelings towards one of the hospitals of Harvard and issued a threat to bring that hospital down. The only problem was Anonymous didn’t know the IP space of Harvard very well and took down Harvard University, Beth Israel Deaconess Medical Center, Dana-Farber Cancer Institute, Joslin Diabetes Center, Brigham and Women’s Hospital and Boston Children’s Hospital. No records were compromised, but the whole network was flooded with 20 gigs per second of data. The entire Harvard network had to be outsourced to a third party vendor to put 100,000 servers in front of the flood and stop the traffic, so that the good traffic could flow.”
While hacktivism and medical identity theft are frightening, there are documented ways to drastically reduce the risk of a breach. Dr. Khaled El Emam, founder and CEO of Privacy Analytics Inc., was also at the 2015 MIT BioInnovations Conference, and I got the chance to interview him about privacy and security issues in healthcare. Privacy Analytics is located in Ontario, Canada, specializes in data anonymization, and has large hospital clients such as Mount Sinai.
HealthIT & mHealth: What HL7 fields do you think are most important to de-identify?
Dr. Khaled El Emam: For HL7 and beyond, names, addresses, medical record numbers and social security numbers are the obvious values. Phone numbers and email addresses can also be used to identify patients. After these direct identifiers, the things with the biggest impact on risk are geographical locations such as zip codes. Dates, such as date of birth, date of death, date of admission, date of discharge and date of visit, can also be used to identify patients. If not properly de-identified, demographic and socioeconomic data such as race, ethnicity, income and number of children can also be used to identify patients. In order of importance, the types of fields to de-identify in HL7 messages are direct identifiers, locations, dates, demographics and socioeconomic data. When de-identifying data, the key is to look at the specific use of the data in order to produce a data set with the greatest value for research and analytics and the lowest risk of re-identification.
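As a rough editorial illustration of the ordering described above (direct identifiers first, then locations, then dates), here is a minimal Python sketch that de-identifies a single HL7 v2 PID segment. It is not Privacy Analytics' tooling. The function name `deidentify_pid`, the set of fields blanked, and the generalization rules (keep only the year of birth, the state and the first three zip digits) are assumptions chosen for the example. A real pipeline would also handle field repetitions, escape sequences, other segments and a measured re-identification risk.

```python
# Hypothetical policy for this sketch: PID fields treated as direct identifiers.
# PID-3 = patient identifier (MRN), PID-5 = name, PID-13/14 = phone numbers,
# PID-19 = social security number.
DIRECT_IDENTIFIER_FIELDS = {3, 5, 13, 14, 19}


def deidentify_pid(segment: str) -> str:
    """Blank direct identifiers and generalize date of birth and address
    in one pipe-delimited HL7 v2 PID segment. Other segments pass through."""
    fields = segment.split("|")
    if fields[0] != "PID":
        return segment

    # Direct identifiers: remove entirely.
    for n in DIRECT_IDENTIFIER_FIELDS:
        if n < len(fields):
            fields[n] = ""

    # Dates: PID-7 (date of birth, YYYYMMDD...) is reduced to the year.
    if len(fields) > 7 and fields[7]:
        fields[7] = fields[7][:4]

    # Locations: PID-11 is street^other^city^state^zip^country.
    # Keep only the state and the first three zip digits
    # (simplification: only the first repetition is considered).
    if len(fields) > 11 and fields[11]:
        components = fields[11].split("~")[0].split("^")
        components += [""] * (5 - len(components))
        state, zip_code = components[3], components[4]
        fields[11] = "^^^" + state + "^" + zip_code[:3]

    return "|".join(fields)


if __name__ == "__main__":
    pid = ("PID|1||123456^^^HOSP^MR||DOE^JOHN||19800315|M|||"
           "12 MAIN ST^^BOSTON^MA^02115||6175551234||||||999-99-9999")
    print(deidentify_pid(pid))
```

The sample message is made up. Demographic fields such as race (PID-10) are left untouched here; whether they need generalizing depends, as the answer above notes, on the specific use of the data.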
HealthIT & mHealth: What other security issues are hospitals seeing right now?
El Emam: Business associates (BAs) are a really big issue. The providers may manage their risk appropriately, but the BAs are using the same data and may not manage it as well. Hospitals also have a large transient population of employees, especially in academic medical institutions, which increases risk as well.
HealthIT & mHealth: How much is a stolen medical record worth on the black market?
El Emam: It depends on whether the person is looking to use the financial or the medical information from the medical record. If they are looking to use the financial information, the value would be similar to that of a credit card. In terms of value per medical record, there is not a lot of verifiable data on this currently.
HealthIT & mHealth: When do you think the parsing of unstructured data will become prevalent and efficacious in healthcare?
El Emam: We are seeing more and more of our clients looking to de-identify unstructured data. There are now good tools to do interesting things with the de-identified unstructured data such as algorithms to determine diagnosis. The billing codes may not reflect the correct diagnosis, but the provider notes usually will. Parsing of unstructured healthcare data is going to pick up because we are now able to deal with the privacy issue at scale and the analytics issue at scale. At Privacy Analytics, we also have a text de-identification tool for data providers and analytics service providers wanting to perform analysis on unstructured data.
HealthIT & mHealth: How do the methods of patient de-identification compare today to when you started Privacy Analytics eight years ago?
El Emam: First, I think it is important to note that the term “de-identification” is often used interchangeably with “data masking.” However, there is a big difference between the two methods. Real de-identification uses a risk-based approach that goes beyond simple data masking techniques. Strong and scalable de-identification protocols provide a defensible way to address privacy concerns and legal obligations while simultaneously preserving the utility of the data for analysis.
Our methodology has stayed the same over the past eight years. We have always used the expert determination methodology. This methodology was cited by the IOM in its recent report, “Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk.”
However, the nature of the data has changed. The data used to be in registries and was very structured. Now we de-identify data from full electronic health records with 400 to 500 tables, including clinical data, billing information, treatment plans and lab results, for over 40 million patients. Basically, the data complexity has increased, but the way you approach the problem is the same.
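To make the contrast between masking and a risk-based approach more concrete, here is a small editorial sketch in Python. It is not the expert determination methodology described in the interview. It only shows one common way to quantify risk: group records by their quasi-identifiers (fields such as zip prefix, birth year and sex) and take one divided by the size of the smallest group as a worst-case re-identification risk. The function name, field names and sample records are all hypothetical.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Return (k, risk): k is the size of the smallest group of records that
    share the same quasi-identifier values, and risk = 1 / k."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    k = min(groups.values())
    return k, 1.0 / k


if __name__ == "__main__":
    # Made-up records that have already had direct identifiers removed.
    records = [
        {"zip3": "021", "birth_year": 1980, "sex": "M"},
        {"zip3": "021", "birth_year": 1980, "sex": "M"},
        {"zip3": "021", "birth_year": 1975, "sex": "F"},
        {"zip3": "100", "birth_year": 1975, "sex": "F"},
    ]
    # With zip3 included, two records are unique on their quasi-identifiers.
    print(max_reidentification_risk(records, ["zip3", "birth_year", "sex"]))
    # Dropping zip3 puts every record in a group of two.
    print(max_reidentification_risk(records, ["birth_year", "sex"]))
```

In this toy data set, masking names alone would leave two patients unique on zip prefix, birth year and sex. Measuring the group sizes shows how much generalization is needed for a given use, which is the kind of trade-off between utility and risk that the answer above describes.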
HealthIT & mHealth: What is the root cause of the recent healthcare data breaches?
El Emam: Adoption rate. Current privacy and security techniques are very strong, but issues occur when people do not apply the principles correctly. The adoption rate of disclosure control is not keeping up with the urge to get data out of health information systems. The adoption problem can be dealt with by providing education, developing standards, and having regulators set expectations.
HealthIT & mHealth: What type of analytics does Privacy Analytics run on de-identified patient data?
El Emam: We do not run analytics on the data; our focus is de-identification. We provide the tools that give health data organizations the ability to quickly and easily apply a proven de-identification methodology to produce high quality, custom data sets for specific secondary purposes such as research and analysis, clinical trials transparency, quality and safety measurement, public health, payment, provider certification or accreditation, marketing and other business applications.
HealthIT & mHealth: How do patient security issues affect healthcare informatics startups?
El Emam: Security affects the M&A activity of startups and their future investments. If a privacy issue is suddenly raised, it can cause panic among investors. Deals can be jeopardized if patient privacy and security issues are not handled up front.
Privacy and security issues can expose any organization, regardless of size and maturity, to significant legal, financial and reputational risks. There is a rapidly growing need to identify and minimize these risks in order to responsibly use healthcare data to drive innovative research and analysis, derive key insights and gain new knowledge to help solve some of healthcare’s most challenging problems.