Guest Column | October 6, 2014

De-identification Defined And Explained

This perspective paper helps organizations understand why de-identification of protected health information (PHI) is important. Covered entities and their business associates face the challenge of realizing value from healthcare data while protecting patient privacy. De-identification can satisfy both needs without requiring a Data Use Agreement (DUA).

By Dan Stocker, MBA, MS, QSA, Professional Services, Coalfire

Two Scenarios
A healthcare provider is approached by a university interested in data about the cancer treatments the provider offers. The university is willing to pay for access to data that will allow its researchers to study the effectiveness of those treatments. How the provider shares the data matters a great deal.

If the provider is insufficiently attentive to the HIPAA Privacy Rule, it may simply remove patient names from the data and consider that sufficient privacy protection. If the remaining identifying information can be correlated with other available sources (e.g., census data or voter registration lists), patients' identities and their health information could be reconstructed. This would constitute a breach of privacy and could subject the provider to civil penalties.

Re-identification risk is not hypothetical: studies have shown that between 63% and 87% of the U.S. population is uniquely identifiable by the combination of birth date, gender and zip code. If, instead, the provider followed one of the two de-identification methods specified in the Privacy Rule, the risk of re-identification would be managed. By ensuring that there is no reasonable basis to believe that individuals can be identified from the data, the provider would meet its regulatory responsibilities under HIPAA.
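
The uniqueness problem can be checked directly on a data set before it is shared. The Python sketch below is a minimal illustration, not a complete re-identification analysis: it assumes each record is a dictionary, uses hypothetical field names (birth_date, gender, zip) and made-up records, and reports the fraction of records whose combination of those fields appears only once.

```python
from collections import Counter


def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical records with patient names already removed.
records = [
    {"birth_date": "1961-03-14", "gender": "F", "zip": "02139", "diagnosis": "C50"},
    {"birth_date": "1961-03-14", "gender": "F", "zip": "02139", "diagnosis": "C34"},
    {"birth_date": "1975-11-02", "gender": "M", "zip": "60614", "diagnosis": "C61"},
    {"birth_date": "1948-07-30", "gender": "M", "zip": "94110", "diagnosis": "C18"},
]

print(fraction_unique(records, ["birth_date", "gender", "zip"]))  # 0.5
```

Half of these toy records are unique on the three fields, so removing names alone would leave them exposed to linkage with any outside source that pairs the same three fields with a name.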

HIPAA Privacy Rule
The HIPAA Privacy Rule outlines the requirements for PHI that apply to both covered entities and business associates.

Protected Health Information (PHI)
The Privacy Rule protects most “individually identifiable health information”, in any form, that is held by a covered entity or business associate. The following kinds of information can constitute PHI:

individual demographic data
information about the provision of healthcare to an individual
payment information related to individual healthcare
common individual identifiers, such as: name, address, birth date and Social Security Number (SSN)

Examples of PHI include medical records that associate an individual with their healthcare, such as laboratory records and bills for treatment. This is the key to the definition: healthcare information or individual identifiers alone do not constitute PHI. There must be an association between the two types of information. An important consequence is that aggregate data, such as averages reported by demographic group, that cannot be used to identify any individual is not considered PHI and may be shared.
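
The difference between record-level PHI and shareable aggregates can be sketched in a few lines of Python. This is an illustration under stated assumptions: the field names are hypothetical, and the minimum group size of 3 is an arbitrary value chosen for the example, since an average over a group of one simply reveals that person's value.

```python
from collections import defaultdict

# Hypothetical minimum group size; the article does not specify one.
MIN_GROUP_SIZE = 3


def average_by_group(records, group_field, value_field, min_size=MIN_GROUP_SIZE):
    """Average value_field per group, omitting groups smaller than min_size."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r[group_field]] += r[value_field]
        counts[r[group_field]] += 1
    return {
        group: totals[group] / counts[group]
        for group in totals
        if counts[group] >= min_size
    }


records = [
    {"age_band": "40-49", "length_of_stay": 3},
    {"age_band": "40-49", "length_of_stay": 5},
    {"age_band": "40-49", "length_of_stay": 4},
    {"age_band": "80-89", "length_of_stay": 9},  # a group of one: suppressed
]

print(average_by_group(records, "age_band", "length_of_stay"))  # {'40-49': 4.0}
```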

If a covered entity wishes to share PHI, it must have a Data Use Agreement with the recipient, or it must de-identify all PHI in the data.

Why Share At All?
The central tool applied to shared health data is called data mining. A combination of machine learning, statistical analysis and advanced database techniques is applied to large data sets in an attempt to identify patterns and features of the data that are not obvious. These patterns and features are new information derived from the data.

Health-related data have value in a number of ways:

Researchers can mine large data sets to find significant and useful patterns that suggest new treatments or improvements to existing procedures. Larger data sets generally yield more reliable results.
Insurance and other risk management organizations can use data about outcomes to improve resource allocation and generate a better return on investment (ROI).

Health data represent significant value for health providers. Finding a way to share the data that enables realization of the economic value while respecting patient privacy is a high priority for the industry. The Privacy Rule outlines two methods for de-identifying PHI.

Expert Determination
A covered entity may choose to have an expert determine that the data to be shared is not individually identifiable. The expert must have appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable; a background in statistical, mathematical or scientific methods is a good baseline. The expert applies those principles and methods and documents the methods and results of the analysis that justify the determination.

The expert may have many factors to weigh. In particular, the expert must consider the risk of re-identification from combining the data set with other available data sets (including prior versions of the same data). For example, suppose some records contain a birth date while others contain only an age. If the date on which each age was recorded is known, the age narrows the birth date to a one-year window, which can connect otherwise disparate records.
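
The birth date and age example can be made concrete. The sketch below assumes two hypothetical data sets: one holds exact birth dates, the other holds only an age and the date it was recorded. An age recorded on a known date limits the birth date to a one-year window, so combined with another shared field (here a hypothetical three-digit zip prefix) it can link records across the two sets.

```python
from datetime import date, timedelta


def years_before(d, years):
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birth_date_window(age, recorded_on):
    """Earliest and latest birth dates for someone `age` years old on `recorded_on`."""
    latest = years_before(recorded_on, age)  # turned `age` that very day
    earliest = years_before(recorded_on, age + 1) + timedelta(days=1)  # turns age + 1 the next day
    return earliest, latest


def consistent(birth_date, age, recorded_on):
    earliest, latest = birth_date_window(age, recorded_on)
    return earliest <= birth_date <= latest


# Hypothetical data set A: exact birth dates.
set_a = [
    {"id": "A1", "birth_date": date(1961, 3, 14), "zip3": "021"},
    {"id": "A2", "birth_date": date(1962, 9, 1), "zip3": "021"},
]
# Hypothetical data set B: only an age and the visit date it was recorded on.
set_b = [{"id": "B7", "age": 52, "visit": date(2014, 1, 10), "zip3": "021"}]

matches = [
    (a["id"], b["id"])
    for a in set_a
    for b in set_b
    if a["zip3"] == b["zip3"] and consistent(a["birth_date"], b["age"], b["visit"])
]
print(matches)  # [('A1', 'B7')]
```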

The expert must determine an acceptable level of re-identification risk. Given the pace of advances in data mining and associated technology, any determination should be treated as a risk assessment with an inherent expiration date. The primary advantage of hiring an expert is that a highly technical task is handed to a specialist.
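
One simple way an expert might quantify risk is an equivalence-class model: records that share the same quasi-identifier values form a class, and each record's risk is taken as one divided by the size of its class. This model and the 0.2 threshold in the sketch are illustrative assumptions, not something the article or the Privacy Rule prescribes; the rule calls for a very small risk without fixing a number.

```python
from collections import Counter

# Hypothetical acceptance threshold chosen only for illustration.
ACCEPTABLE_MAX_RISK = 0.2


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum, average) per-record risk under an equivalence-class model.

    Records sharing the same quasi-identifier values form a class; a record's
    risk is modeled as 1 / (size of its class).
    """
    if not records:
        return 0.0, 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    per_record = [1 / sizes[k] for k in keys]
    return max(per_record), sum(per_record) / len(per_record)


records = [
    {"birth_year": 1961, "gender": "F", "zip3": "021"},
    {"birth_year": 1961, "gender": "F", "zip3": "021"},
    {"birth_year": 1961, "gender": "F", "zip3": "021"},
    {"birth_year": 1975, "gender": "M", "zip3": "606"},
    {"birth_year": 1975, "gender": "M", "zip3": "606"},
]

max_risk, avg_risk = reidentification_risk(records, ["birth_year", "gender", "zip3"])
print(max_risk)                         # 0.5, from the two-record class
print(max_risk <= ACCEPTABLE_MAX_RISK)  # False: generalize further or suppress
```

Because the result depends on which fields are treated as quasi-identifiers and on what outside data exists today, rerunning such a measurement later may give a different answer, which is one concrete sense in which a determination expires.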

Safe Harbor
A covered entity may choose to de-identify data itself before sharing it. The HIPAA Privacy Rule specifies how to do this so that the covered entity or business associate can claim the benefits of Safe Harbor. Safe Harbor is very desirable because data de-identified under it is no longer PHI, so sharing it does not expose the entity to HIPAA fines or penalties for a breach.

Types of Data to Remove
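The Safe Harbor method lists 18 types of identifiers that must be removed, whether they belong to the individual or to the individual's relatives, employers or household members. They fall roughly into four categories: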

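Personal data: names, full-face photographs, biometric identifiers, vehicle identifiers, contact information, and ID or account numbers.
Dates: all elements of dates except the year (such as birth, admission, discharge and death dates) must be removed if they relate to an individual. Ages over 89, and any date elements (including the year) that would reveal such an age, must be grouped into a single category of 90 or older.
Geographic data: subdivisions smaller than a U.S. state may not be included. The first three digits of a zip code may be kept only if all zip codes sharing those three digits together contain more than 20,000 people; otherwise they must be replaced with 000.
Identifying numbers: device identifiers and serial numbers, URLs, IP addresses, and record, certificate or license numbers.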
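
A few of the Safe Harbor transformations can be sketched in Python. This covers only direct identifiers, dates, ages over 89 and zip codes; it does not implement all 18 identifier types and is not by itself sufficient for Safe Harbor. The field names, the DIRECT_IDENTIFIERS set and the restricted zip prefixes passed in are hypothetical; the real set of restricted prefixes must be derived from current Census Bureau data.

```python
from datetime import date

# Hypothetical names of direct-identifier fields to drop outright.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "mrn", "photo_url"}


def age_on(birth, on):
    """Age in whole years on the date `on`."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def safe_harbor(record, restricted_zip3, as_of):
    """Apply a subset of Safe Harbor rules to one record (a dict).

    restricted_zip3: three-digit prefixes whose combined population is
    20,000 or fewer (hypothetical input; derive it from Census data).
    as_of: date used to compute ages.
    """
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "zip":
            prefix = value[:3]
            out["zip3"] = "000" if prefix in restricted_zip3 else prefix
        elif isinstance(value, date):
            out[field + "_year"] = value.year  # keep only the year
        else:
            out[field] = value
    birth = record.get("birth_date")
    if birth is not None and age_on(birth, as_of) > 89:
        # Ages over 89, and a birth year that would reveal them,
        # collapse into a single "90+" category.
        out.pop("birth_date_year", None)
        out["age_group"] = "90+"
    return out


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "birth_date": date(1921, 5, 2),
    "admit_date": date(2014, 6, 3),
    "zip": "02139",
    "diagnosis": "C50",
}
print(safe_harbor(patient, restricted_zip3={"123"}, as_of=date(2014, 10, 6)))
# {'admit_date_year': 2014, 'zip3': '021', 'diagnosis': 'C50', 'age_group': '90+'}
```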

No Actual Knowledge
After removing the identifiers that increase the risk of re-identification, the covered entity or business associate must also lack actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. This standard concerns not a theoretical possibility but what the entity actually knows about the remaining data.

One example the Department of Health and Human Services (HHS) uses to illustrate this is a patient whose occupation is so rare that it identifies them uniquely. In practice, this standard requires covered entities and business associates to examine the data thoroughly after removing the listed identifiers.
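
Part of that examination can be automated by flagging values that occur too rarely to hide an individual. The sketch assumes records are dictionaries with a hypothetical occupation field and uses an arbitrary minimum count; a scan like this only surfaces candidates for human review and does not by itself establish the absence of actual knowledge.

```python
from collections import Counter


def rare_values(records, field, min_count):
    """Values of `field` that appear in fewer than `min_count` records."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return sorted(value for value, n in counts.items() if n < min_count)


# Hypothetical records remaining after the listed identifiers were removed.
records = [
    {"occupation": "nurse"}, {"occupation": "nurse"}, {"occupation": "nurse"},
    {"occupation": "teacher"}, {"occupation": "teacher"},
    {"occupation": "astronaut"},
]

# The minimum count of 2 is an arbitrary value for illustration.
print(rare_values(records, "occupation", min_count=2))  # ['astronaut']
```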

Two Important Points

Not Zero Risk: “No Reasonable Basis”
As explained above, re-identification techniques are advancing in parallel with de-identification. The HIPAA Privacy Rule does not require zero risk of re-identification, as that would be impractical. The standard is “no reasonable basis” to believe the information can be used to identify an individual. If there is a known technique for identifying an individual from the data, then there is a reasonable basis to believe it can be done. Interpreting “reasonable” will have both technical and legal dimensions.

Structured vs. Unstructured Data
It would be easy to overlook that PHI can appear in unstructured data, such as physician notes, but that would be a mistake. The provisions of the HIPAA Privacy Rule apply to all data, structured or unstructured. This can pose technical hurdles for covered entities that lack granular control over unstructured data, and the handling of such data should also be addressed in Business Associate Agreements.
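
A first pass over free text is often pattern matching. The sketch uses a handful of regular expressions to tag obvious identifiers in a note; the patterns and the sample note are illustrative assumptions, they will miss identifiers written in other formats, and they cannot catch contextual clues, so output like this still needs review.

```python
import re

# Illustrative patterns only; real notes contain identifiers in many
# formats these expressions will miss. Applied in this order.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def redact(text):
    """Replace pattern matches with [TYPE] tags; return (text, findings)."""
    findings = []
    for label, pattern in PATTERNS.items():
        findings.extend((label, m.group()) for m in pattern.finditer(text))
        text = pattern.sub(f"[{label.upper()}]", text)
    return text, findings


note = ("Pt seen 3/14/2014. SSN 123-45-6789, call (617) 555-0100 "
        "or jdoe@example.com. Occupation: astronaut.")
redacted, findings = redact(note)
print(redacted)
# Pt seen [DATE]. SSN [SSN], call [PHONE] or [EMAIL]. Occupation: astronaut.
```

In the example the Social Security number, phone number, email address and date are tagged, while the occupation passes through untouched.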

Conclusion
Within the context of a Business Associate Agreement, de-identification is not required (for example, when sending medical records to a billing vendor). However, in the absence of a Data Use Agreement, covered entities and business associates that share PHI with outside parties without de-identifying it accept the risk of a privacy breach and civil penalties. De-identification isn’t just a good idea; it’s the rule.

Fortunately, there are two paths to satisfying the HIPAA Privacy Rule. Covered entities with the requisite talent in-house may choose to de-identify shared data themselves, remaining mindful of the “no actual knowledge” standard. Alternatively, expertise is available for hire. Either way, it’s a good investment.

About the author
Dan Stocker, MBA, MS, QSA, is Senior Security Consultant at Coalfire Systems, Inc.
