Managing the Murky

Sunlight is said to be the best of disinfectants.

Louis Brandeis

Since the Summer of Snowden put “big data” on my father’s radar, the White House has dispatched John Podesta on a big data listening tour; Julia Angwin, former Wall Street Journal reporter, just released a book on digital dragnets; and new data broker legislation has been introduced in the Senate.

Given my prior pontification on privacy frameworks, it’s time to sharpen the pencil on (1) transparency, (2) private places, and (3) appropriate data use.

Transparency Philosophy

In Julia Angwin’s new book, Dragnet Nation, it was nice to see this:

Jim Adler, the chief privacy officer at Intelius, one of the few data broker executives who attended privacy conferences and took calls from privacy advocates.

Thanks, Julia. Sure, I’ll take a bow for engaging with tough, fair critics. Frankly, we desperately need champions for privacy rights to buffet the forces that thirst for ever more information. These values of discretion and disclosure are in tension and we need heated debate to find the right balance.

However, we can’t have this debate in the dark. Powerful corporations and governments need to strike a transparent tone and, where reasonable, provide the opportunity for recourse and discourse.

Information asymmetry is power. In any negotiation, leverage is about “having something” on your adversary. Frank Underwood has taught this lesson well. If your adversaries are your customers or your citizens, the bar is pretty high to justify knowing their secrets.

Get Off My Lawn (wherever that is)

The lines are blurred on where public space ends and private space begins. The FBI got this one very wrong in US v Jones. Without a clear definition of whether a place is private or public, it will — like beauty — be in the eye of the beholder.

In his 1967 treatise, Privacy and Freedom, Alan Westin said: “Each individual is continually engaged in a personal adjustment process in which he balances the desire for privacy with the desire for disclosure and communication of himself to others.” Too often, we don’t know when we’re in public or private, so don’t have the transparency to strike the right balance.

Nuala O’Connor, the newly minted CEO of the Center for Democracy and Technology, has a good take on this dilemma. There are really three places — private, public, and the curtilage between them. Curtilage legally defines the land immediately surrounding your house. Our new world of digital devices defines this curtilage. I think Nuala may be on to something here.

Use, Don’t Abuse

The Fair Credit Reporting Act (FCRA) of 1970 defines appropriate use of any data about you — public or private — when it’s used for credit, employment, insurance, and housing. Expect this permissible use doctrine to extend to unfair data uses like discriminatory pricing, high-tech profiling, and leaks of private information.

The Places-Players-Perils privacy framework is only as good as knowing how private the data is, who the players are, and what is being done with the data. Personal devices, the Internet of Things, and big data are on the rise. The stock of transparency will rise right along with them.




VIDEO: My Invited Talk at the Usenix Security Symposium




The “Not Ready For Prime Time” Classifier

At the Strata 2013 conference last February, I presented a felon classifier (here’s the video, slides, and conference interview) that estimated whether a person has felonies based on other data in their criminal record, like misdemeanors and publicly available information about them.

The training data, features, and resulting classifier (but not the underlying machine learning software) were released on the inome Github repo at Strata. To see how you might be classified, check out this widget (→) which is just a Javascript implementation of the classifier data (yaml, pdf) released at the conference.
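As a rough illustration of the idea (this is not the released classifier — the real feature set and parameters are in the yaml file on the inome GitHub repo; the feature names and weights below are invented), a logistic score over non-felony record features might look like:

```python
import math

# Hypothetical feature weights, made up for illustration only.
WEIGHTS = {
    "num_misdemeanors": 0.8,
    "num_traffic_offenses": 0.1,
    "years_since_first_offense": 0.05,
}
BIAS = -2.0

def felony_score(record):
    """Estimated probability that a record includes a felony,
    from a logistic model over non-felony features."""
    z = BIAS + sum(WEIGHTS[k] * record.get(k, 0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

light = {"num_misdemeanors": 0, "num_traffic_offenses": 1}
heavy = {"num_misdemeanors": 4, "num_traffic_offenses": 2,
         "years_since_first_offense": 10}
print(felony_score(light) < felony_score(heavy))  # heavier record scores higher
```

The widget linked above does essentially this in Javascript, using the classifier data released at the conference.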

I was also interviewed by Strata Conference Chair Edd Dumbill for the inaugural issue of Big Data Journal on whether big data inferences, like criminal profiling, should be outlawed — effectively defining a new category of machine-perpetrated thoughtcrime.

Update: Bloomberg’s Jordan Robertson wrote an article and did an on-camera interview on this work. Dave Merrill also created the very cool infographic linked below. 

One clarifying tweak I’d make to my quote in Jordan’s piece is to add the word “alone”: 

“Because geeks like me can do stuff like this, we can make stuff work - it’s not our job ALONE to figure out if it’s right or not. We often don’t know.”

My point is that geeks alone can’t make the tough policy calls without the help of the more wonkish humanitarians among us — historians, philosophers, activists, ethicists, anthropologists, economists, and journalists. Technologists, too often, just don’t have the training or experience.

Update 2: But that’s not all! Jordan also wrote a companion piece on how London police used low-tech predictive policing in the 1990s to reduce rapes. Essentially, they noticed that clothesline theft was a gateway crime to rape. Woah.

Update 3: The Takeaway with John Hockenberry did a radio story too.





Pull insights from a sample. Push actions to the census.

— Unknown




The Geeks, Suits, and Wonks Convene at Strata

Too often we geeks, suits, and wonks live in our own worlds of making it work, making it sell, or making it right, respectively. We don’t seem to have the capacity to routinely cross-pollinate. This may be why knowledge grows exponentially and wisdom grows only linearly. Conferences like Strata give us a venue to escape from our mile-deep, inch-wide comfort zones.


Take the talk by Quid’s Amy Heineike on maps, not lists. She references Eli Pariser’s warning on how listed search results spoon-feed us the “best” result at the top. Heineike argues that maps lend themselves to mindful exploration even for non-spatial data. I certainly agree. What’s more, she also strikes a hopeful chord for data science. In The Filter Bubble (which I highly recommend), Pariser quotes Marshall McLuhan, who warns that “we shape our tools and thereafter our tools shape us.” Heineike provides a great example of how we then reshape our tools — something I discussed at Strata+Hadoop World last Fall.

Twitter’s Nathan Marz sounded similar “people meet data” themes during his keynote on human fault-tolerance. His battle cry is for data systems that protect themselves from human error, similarly to how we protect systems from hardware faults. His ground truth axiom that the worst problems result from lost or corrupt data (especially silent corruption) is spot-on. All else is recoverable. He recommends immutable systems where bugs can’t delete or corrupt data. Immutability resonates with my electrical engineering training where outputs are a result of transfer functions on inputs. This is the essence of functional programming that I’ve taken a liking to lately.
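Marz’s idea can be sketched with a toy append-only event log (the schema and field names here are invented for illustration, not from his talk): state is computed as a pure function of an immutable history, so a buggy write can append a bad event but can’t silently destroy the record you’d need to recover.

```python
# Append-only event log: nothing is ever updated or deleted in place.
events = [
    {"user": "alice", "field": "email", "value": "a@old.com", "ts": 1},
    {"user": "alice", "field": "email", "value": "a@new.com", "ts": 2},
]

def current_state(log):
    """Derive mutable-looking state as a pure function of the immutable log."""
    state = {}
    for e in sorted(log, key=lambda e: e["ts"]):
        state.setdefault(e["user"], {})[e["field"]] = e["value"]
    return state

print(current_state(events)["alice"]["email"])  # a@new.com
```

If a bug appends a wrong event, you fix it by appending a correction and recomputing — the history, the ground truth, survives.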

Hosted by Strata Conference chair Alistair Croll, the Great Debate: Design Matters More Than Math pitted LinkedIn’s Monica Rogati and SkyTree’s Alexander Gray on the side of math against O’Reilly’s Julie Steele and ClearStory Data’s Douglas van der Molen to battle for the value of design. It was a valiant attempt, but math won in a landslide. I was on the losing design side for the simple reason that, yes, math defines what’s possible, but design defines what’s preferable. Math may be the source of knowledge, but design is the source of wisdom — something our species has in short supply.

Finally, my talk (video here) on the sensitive topic of criminal profiling attempted to push the technology and the debate. We designed a felon classifier based on a defendant’s publicly available non-felony criminal record and personal data. The resulting classifier is available on GitHub here.

One of the motivations for the talk was to prove that big data inferences are not a new category of thoughtcrime. However, actions based on those inferences could very well be criminal. For predictive policing, the courts will be the final human arbiter on the admissibility of such computerized informants. Here’s my interview with O’Reilly’s Mac Slocum that touches on some of these issues:

What I found most interesting about this exercise is that the technology can only take us so far. The classifier’s operating point determines how many innocents will be classified as felons (false positives) and how many felons will go undetected (false negatives). Only we fallible humans can choose the right trade-off between tyranny and anarchy. Such is the line that the responsible innovator must walk between high-tech mercenary, traditional capitalist, and social entrepreneur.
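A toy threshold sweep (the scores and labels below are invented for illustration) makes that operating-point trade-off concrete: a strict threshold lets felons go undetected, a lenient one flags innocents.

```python
# (score, true_label) pairs; label 1 = has felony, 0 = does not.
scored = [(0.9, 1), (0.8, 1), (0.6, 0), (0.4, 1), (0.2, 0), (0.1, 0)]

def errors_at(threshold, data):
    """Count (false positives, false negatives) at a given operating point."""
    fp = sum(1 for s, y in data if s >= threshold and y == 0)
    fn = sum(1 for s, y in data if s < threshold and y == 1)
    return fp, fn

print(errors_at(0.7, scored))  # (0, 1): strict — misses a felon
print(errors_at(0.3, scored))  # (1, 0): lenient — flags an innocent
```

No amount of math picks the threshold for you; that choice is the human policy call.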




Never mistake motion for progress.

with apologies to Ernest Hemingway




Pondering “How to Create a Mind” by Ray Kurzweil

The earth probably sees plastic as just another one of its children. Could be the only reason the earth allowed us to be spawned from it in the first place. It wanted plastic for itself. Didn’t know how to make it. Needed us. Could be the answer to our age-old egocentric philosophical question:

     Us: “Why are we here?”
     Earth: “Plastic … asshole.”

George Carlin

I just finished Ray Kurzweil’s How To Create a Mind: The Secret of Human Thought Revealed. The book is technical enough for the nerdy, but plainspoken enough for everyone else. It got me thinking or, as Kurzweil would have it, pattern matching.

Kurzweil expands on his decades-long thesis that the Law of Accelerating Returns (LOAR, as he’s coined it) drives the exponential increase in price/performance of computing. By 2029, this growth in hardware/software will create an intelligence that rivals our brain’s wetware. The LOAR is based on five key concepts that underlie all computing:

  1. Arbitrarily accurate communication, based on Claude Shannon’s noisy channel coding theorem;
  2. Universal computation, based on the Turing Machine;
  3. Von Neumann’s architecture of the modern computer;
  4. Artificial, brain-like intelligence that passes the Turing Test; and
  5. Moore’s Law which says that the number of transistors on integrated circuits doubles approximately every two years.





Managers cannot single-handedly create value … but they can single-handedly destroy it.

— Jim Adler




Big Data is a Hotbed of Thoughtcrime — My Strata+Hadoop Talk

For those who missed my talk at Strata+Hadoop 2012 in October, Big Data is a Hotbed of Thoughtcrime. So What?, the kind folks at O’Reilly have made it available on their YouTube Channel. Here’s the video link:

I also did an interview for the preview issue of Big Data Journal on the thoughtcrime topic. And for those that want to follow along with the video, here’s the presentation:




In great teams, there is a democracy of ideas, but a dictatorship of decisions.

Khoi Tu, author of Superteams

Esquire Theme by Matthew Buchanan
Social icons by Tim van Damme     Hacks by Jim Adler