Competitive Intelligence

Tactical, Operational & Strategic Analysis of Markets, Competitors & Industries

RE: How Can We Insure the Accuracy of Data Mining – While Anonymizing the Data? - by Lance Winslow

Hi Lance,

To begin, I think there is a fundamental problem with the question itself, which simply reflects the highly polarized debate, and the associated conflict within the government itself, over online privacy.
If you want to understand what I mean, I suggest you look into the House Subcommittee on Communications, Technology, and the Internet's session notes on online privacy, witness testimonies, etc. You will quickly get to the real heart of the data mining/privacy matter and understand why this question is really a red herring. Trust me, it won't take you long.

That said, a few succinct points:

1) Data can be either useful or perfectly anonymous, but never both. (First law of CS re-identification science)

2) Anonymization has proven, time and time again, to be an immense failure at protecting individual identity. It doesn't take a particularly sophisticated person, either, to mesh one set of data with another and uncover individual identity. In certain instances, neither data set in and of itself contained PII, but put the data sets together and voilà, you get re-identification. (Reference: the supposedly anonymized AOL, Census and Netflix DBs and how they were undone, as described by Paul Ohm.)
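
To make point 2 concrete, here is a minimal Python sketch of that kind of meshing (usually called a linkage attack). Everything in it is an assumption for illustration only: the records, the names, the field names and the `link` helper are invented, and this is not the method used in any of the incidents referenced above. It just shows how two data sets, one with no names and one with nothing sensitive, can be joined on shared fields such as ZIP code, birth date and sex.

```python
# Toy linkage attack. All records below are made up for illustration.

# "Anonymized" release: names removed, quasi-identifiers kept.
anonymized_records = [
    {"zip": "53703", "birth_date": "1961-07-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "53703", "birth_date": "1975-02-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "53711", "birth_date": "1961-07-14", "sex": "F", "diagnosis": "migraine"},
]

# Separate public list (hypothetical): names present, nothing sensitive.
public_records = [
    {"name": "Alice Example", "zip": "53703", "birth_date": "1961-07-14", "sex": "F"},
    {"name": "Bob Example", "zip": "53703", "birth_date": "1975-02-02", "sex": "M"},
    {"name": "Carol Example", "zip": "53711", "birth_date": "1980-11-30", "sex": "F"},
]

# Fields that appear in both data sets and can be used as a join key.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Join the two data sets on the shared quasi-identifiers.

    Only unambiguous matches (exactly one public record for a key)
    are reported as re-identifications.
    """
    # Index the public records by their quasi-identifier values.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], record["diagnosis"]))
    return matches


reidentified = link(anonymized_records, public_records)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy data, two of the three "anonymized" rows line up with exactly one named person, even though neither list on its own ties a name to a diagnosis. The third row has no counterpart in the public list, so it stays unlinked.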

See here for one of the best academic white papers on Data mining, Randomization and Anonymization: http://epic.org/privacy/reidentification/ohm_article.pdf

Regards,

Monica


© 2025   Created by Arik Johnson.   Powered by
