Where’s the Big Wisdom in Big Data?

Jim Adler
Jan 28, 2015


2014 saw the most data security and privacy incidents on record.

Clearly, data is eating the world (with apologies to Marc Andreessen). The amount of data being collected is unprecedented, huge, and growing exponentially. Companies depend on effective use of data to run their businesses. A highly diverse set of technologies is being used to manage these burgeoning data loads. The legal landscape governing data is uncertain, both in the United States and around the world.

The sobering reality is that data technology is clearly moving faster than our ability to wisely manage it. As Pablo Picasso put it, “Computers are useless. They can only give you answers [not wisdom].” Or to put it less poetically, the data curve is running away from the wisdom curve.

Can wisdom’s tortoise ever hope to catch data’s hare? It’s doubtful that the advancement of technology can be slowed, given a 2.5-million-year habit that has put humans at the top of earth’s food chain. Short of a dystopian apocalypse, we’re stuck with the Herculean task of bending the wisdom curve. Let’s call it big wisdom.

By big wisdom, I don’t mean the big insights we might get from the data. I mean the big wisdom we need to govern how data is collected and used so that it aligns with our traditional values of morality, ethics, and law. The good news is that the road to big wisdom is actually well understood.

What’s a Paradigm Shift, Really?

In 1962, Thomas Kuhn published The Structure of Scientific Revolutions, in which he coined the now-trite term paradigm shift to describe how scientific theories advance. Contrary to the lilting term, paradigms don’t shift toward truth in a smooth, orderly way. They tend to lurch violently from one discredited school of thought to another through a series of steps:

  1. A paradigm is established within a community;
  2. Anomalies arise that show fractures in the existing theory;
  3. These fractures create a crisis, generating controversy and debate;
  4. From the crisis comes the destruction of the old paradigm;
  5. Go to Step #1.

A classic example of this structure comes from the debate over the gravitational physics of Isaac Newton versus Albert Einstein. The majesty of Newton’s 17th-century theory is that it governed how apples fell from trees, how planets revolved around the sun, and even how stars moved within galaxies. There was just one little crack in Newton’s theory: gravity was at the center of it, but Newton never actually explained how gravity worked.

A little over two hundred years later, Einstein realized that Newton’s little crack was actually a giant gorge. Einstein showed that nothing could move faster than the speed of light, in direct conflict with Newton’s predictions. The result was a full-blown paradigmatic crisis. The battle lines were drawn between Einstein’s relativists and the Newtonians. For several decades, the controversy churned until experimental evidence made Einstein’s theory the clear winner. Although we still use Newton’s physics every day and must face it in high school physics, it was relegated to a special case of Einstein’s more general theory.

Our current policy paradigm is similarly buckling under the weight of exponential advances in data technology. To attain big wisdom, more violent lurching is in our future.

Fractured FIPPs

Ethics and the law have increasingly chased technology innovation to find the wise balance between innovation and safety. In the late 19th Century, paparazzi reporters were accused of violating the common law when taking public photos with their new “snapshot cameras.” In the early 20th Century, the U.S. Food and Drug Administration was founded to protect the public from snake oil tonics and unsanitary foods. Further, food regulation through the 1960s saw uniform labels bring transparency to food ingredients.

As for data privacy and security, the Fair Information Practice Principles (FIPPs) were first defined in a 1973 Health, Education, and Welfare report. They outlined eight governing principles for how the private and public sector should manage personal data:

  1. Awareness — People whose data is collected should be informed about who is collecting it, what it is, how it is collected, and how it is used.
  2. Accuracy and access — Inform people of steps taken to ensure accuracy and give them a chance to correct errors.
  3. Security — Keep data safe from theft, loss, destruction, modification, or inappropriate disclosure.
  4. Audit — Check data, processes, and training of personnel to ensure accuracy and security.
  5. Consent — Allow people to opt in or out of data collection and approve of how it is used.
  6. Specify Purpose — Inform people of the purpose for which the data will be used.
  7. Data Minimization — Collect only data needed to accomplish a specified use.
  8. Use Limitation — Use data only for the specified purpose and keep it only as long as necessary for that purpose.

After more than 40 years, the venerable FIPPs are fracturing into two distinct groups under pressure from big data. The first group (access, accuracy, security, and audit) is quite compatible with big data. The second group (consent, data minimization, purpose specification, and use limitation) is not. How do I consent to the use of my data if the collector doesn’t know how it might be used? If really big data is collected from everywhere, how can it possibly be collected for a specific, minimal purpose? These forces are tearing the FIPPs paradigm apart.

Geeks, Suits, and Wonks

Thus far, the soldiers in this fight are the technologists who create data innovations; the businesses that market products and services around these innovations; and the humanitarians (the scholars, activists, regulators, and journalists) who aim to make sure these products and services align with larger societal norms. I generally (and affectionately) refer to these groups as the geeks, suits, and wonks, respectively.

The suits contend that regulating data use, not collection, would empower citizens and enable many benefits. The wonks argue that the FIPPs are vital to protecting society’s most vulnerable, and that big data is the problem. As noted by privacy law scholar Chris Hoofnagle, “a revolution is afoot in privacy regulation.” This revolution isn’t just some narrow, insider policy battle. It will determine technology’s place in our homes, bodies, social spaces, and workplaces, and in those of artificial lifeforms. Yeah, heady stuff.

As Kuhn instructed, paradigms have two distinct characteristics. First, they are sufficiently unprecedented to attract a lot of people to them, some with religious fervor. Second, they’re open-ended enough to give each group stuff to work on for years.

It’s fair to say that both suits and wonks are dedicated to their open-ended careers with religious fervor. For suits, there’s no end to innovation. For wonks, there’s no end to understanding society’s evolution when faced with constant innovation. The suits will be pushing new, ever-inventive data products into the market. The wonks will resist with new rules and laws. At this rate, the crisis will roil for decades to come, keeping the wisdom curve on its plodding, linear crawl.

The trouble is that the suits and wonks don’t speak the same language — suits use numbers, wonks use words. They move at different speeds — suits operate across quarters; wonks operate over years. They don’t use the same tools — suits run reports; wonks conduct audits.

The Journey To “Big Wisdom”

Things are beginning to change. Privacy by Design efforts are pushing privacy professionals into product design teams. Data scientists are bridging the culture gap between wonks and suits. Cross-functional conferences gather suits, wonks, and technologists (i.e., geeks). Data journalism, open data, open government data, and data activism are all efforts to integrate wonk and geek cultures.

These developments are encouraging and necessary, but hardly sufficient, because the corporate wonk (that is, the governance, risk, and compliance, or GRC, professional) hasn’t yet adopted the data technology required to keep pace. The GRC professional is critical to bending the wisdom curve but still largely operates within a pre-technology paradigm. They are accountable within the enterprise, sensitive to the regulatory turmoil outside the enterprise, but ill-equipped to reconcile the two.

The GRC professional is typically an attorney who tackles the outside regulatory and legal challenges mounted by the courts, regulators, and legislators. Within the enterprise, however, the GRC professional has only two main tools: the contract and the audit. The sad truth is that contracts are too early and audits are too late. For contracts, it’s impossible to anticipate all issues that might arise across the life of a contract. So, contracts become either vapidly vague or tortuously long. Audits, by definition, are reactive, often conducted manually, and are obsolete soon after they’re issued.

GRC pros need the same operational data infrastructure used by the rest of the business to keep up with mushrooming data flows. They need unfettered access to every data silo in the organization so they can monitor and manage the data that they’re ultimately responsible for in the event of a security breach, regulator inquiry, or lawsuit. Only then will they have the operational transparency to keep pace with the business and be a proactive voice. Furthermore, this level of transparency enables appropriate review by boards of directors, outside counsel, external privacy organizations, or even end users.

Practical “Big Wisdom”

As an example of what this might look like in practice, consider the challenge of enforcing document deletion requirements. Maybe the deletion is part of a data retention policy, or a requirement to purge confidential documents at the termination of a corporate partnership. Typically, the GRC professional would issue a document deletion order, trust that the organization complies, and possibly conduct a spot audit to verify that the deletion order was honored.

Using data analytics techniques, the deletion order could be verified automatically across the organization using a SQL query, something like this:

SELECT AllDocuments.document_name
FROM scan_filesystem() AS AllDocuments, DeletedDocuments
WHERE similar(DeletedDocuments.document, AllDocuments.document) > 0.75

This query scans the entire filesystem and looks for documents that are similar to those that are supposed to be deleted. If any of the documents on the filesystem are more than 75% similar, they are returned as suspect documents for further review. Note that although this is only a three-line query, it’s potentially a huge calculation across millions of corporate documents, since every document on the filesystem must be compared to every document in the deletion list. Luckily, there are algorithms, like MinHash, that reduce the computational complexity, and data compute engines that make these calculations possible.
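
For the geeks in the audience, here is a minimal Python sketch of how a MinHash-style similarity check might work. It is an illustration under stated assumptions, not the implementation behind the query above: it assumes "similar" means Jaccard similarity over three-word shingles, and the function names (shingles, minhash_signature, find_suspects), the 128-hash signature length, and the in-memory dictionaries standing in for the filesystem scan are all hypothetical choices made for this example.

```python
import hashlib
import re

NUM_HASHES = 128   # signature length (assumed); more hashes give a tighter estimate
SHINGLE_SIZE = 3   # words per shingle (assumed)


def shingles(text, k=SHINGLE_SIZE):
    """Return the set of overlapping k-word shingles in a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _seeded_hash(shingle, seed):
    """Stable 64-bit hash of a shingle; each seed acts as a different hash function."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """Compress a document into num_hashes minimum hash values (None if empty)."""
    shingle_set = shingles(text)
    if not shingle_set:
        return None
    return [
        min(_seeded_hash(s, seed) for s in shingle_set)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, which estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def find_suspects(all_documents, deleted_documents, threshold=0.75):
    """Return names of documents that resemble anything on the deletion list.

    Both arguments map a document name to its text. In practice they would
    come from a filesystem scan and the deletion order, respectively.
    """
    deleted_sigs = [minhash_signature(t) for t in deleted_documents.values()]
    deleted_sigs = [s for s in deleted_sigs if s is not None]
    suspects = []
    for name, text in all_documents.items():
        sig = minhash_signature(text)
        if sig is None:
            continue
        if any(estimated_similarity(sig, d) > threshold for d in deleted_sigs):
            suspects.append(name)
    return suspects
```

Calling find_suspects with the scanned documents and the deletion list returns the names to send for human review. Each document is reduced to a fixed-size signature once, so comparisons no longer depend on document length. This sketch still compares every signature pair; at corporate scale, MinHash is commonly paired with locality-sensitive hashing so that only likely matches are compared at all.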

Other analyses are possible, like gaining transparency into internal data flows, proving that products comply with privacy policy, or ensuring compliance with data contracts.

Final Thoughts

Marshall McLuhan said, “We shape our tools and thereafter our tools shape us.” Eli Pariser used this quote as a cudgel to caution care with the tools we forge, because they will irrevocably change us. But that’s only half the story.

We humans are elastic. Yes, our tools shape us. But then we learn what doesn’t work (thanks to arguments from wonks like Pariser), reshape our tools, and the cycle repeats. That’s what propels progress. As Kuhn taught, we find the truth by moving away from what’s wrong. Or put more cynically by H.L. Mencken: “What the world turns to, when it is cured of one error, is usually simply another error, and maybe one worse than the first one.”

OK, so the journey to big wisdom won’t be direct. But there’s a chance to get there if geeks, suits, and wonks work within a common paradigm. And within the wonk camp, what’s most critical is that corporate GRC professionals fully adopt data technology. That’s their best chance to keep up with the rapid pace of technology innovation and exponential data growth within their companies.

Of course, the debates will be fierce. On this Data Privacy Day, let’s dedicate ourselves to a fierce, noble fight. Because with everyone engaged and well equipped, that wisdom curve might just cross the data curve some day.


entrepreneur · investor · executive · data geek · privacy thinker · former rocket engineer · on twitter @jim_adler