From Equifax to Ashley Madison, the inevitability of big data leaks drives the democratization of disciplinary power.

Charles Fain Lehman is a staff writer for the Washington Free Beacon. He writes about policy, covering crime, law, drugs, immigration, and social issues. Reach him on Twitter (@CharlesFLehman) or by email at lehman@freebeacon.com.

In late December 2017, with little fanfare, the cyber risk team of California-based cybersecurity firm UpGuard announced that data on 123 million U.S. households had been left unsecured, available to anyone with a free Amazon Web Services (AWS) account. There were about 126 million households in the U.S. as of 2016, meaning that the exposure covered roughly 98 percent of them: more or less every American household. [1]

The data was sitting on servers owned by Alteryx, a data‐​analytics firm, but came originally from both credit‐​rating giant Experian and the U.S. Census Bureau. The Census Bureau data, taken from the 2010 census, was mostly publicly available. The Experian data, however, was largely proprietary. “Taken together, the exposed data reveals billions of personally identifying details and data points about virtually every American household,” Dan O’Sullivan, a cyber resilience analyst with UpGuard, explained. [2] The Alteryx leak allowed essentially public access to information “from home addresses and contact information, to mortgage ownership and financial histories, to very specific analysis of purchasing behavior…a remarkably invasive glimpse into the lives of American consumers.”

Perhaps the reason that this attracted little attention is that such leaks have become common, even banal. Some people made a fuss when Experian competitor Equifax announced that half of all Americans’ social security numbers had been stolen from it by hackers. But who, outside of the cybersecurity community, remembers the 2015 Census Bureau or OPM hacks, or the 2013 hack in which all 3 billion Yahoo accounts were compromised?

The seeming mundanity of such leaks should, on reflection, cause us a great deal of alarm. Information about Americans has become essentially public; after the Equifax leak, it is questionable whether social security numbers have any significance as unique identifiers at all. These leaks are the natural result of decades, if not centuries, of efforts to aggregate identifying information about individuals, a technological process which has reached a new zenith in the 21st century. We must acknowledge that such a process, including its breakdown in the form of leaks, will have an impact on the functioning of a liberal society.

Liberalism, as a political philosophy, is concerned primarily with the conditions required to preserve liberty, the freedom to do as we please. It is often framed in opposition to threats to this freedom — the enemies of the open society, to borrow the Popperism. In the classical liberal tradition, especially as interpreted by modern proponents, that enemy is force and, in a civilized society, the monopolist on force, the state. In more recent liberal traditions, those of FDR and the 20th century liberals, poverty and economic misery become the enemies of a free society.

But the sort of information that is aggregated by governments and credit bureaus alike, stolen, and disseminated represents its own unique threat to liberty. 21st century modes of knowing and forms of knowledge carry their own form of power, one that must be seriously interrogated as another enemy of the open society.

Furthermore, unlike traditional forms of power, which have required enormous institutional might to operationalize, this new knowledge is increasingly available to anyone with free time and an internet connection. We must take seriously the proposition that credit rating leaks permit the exercise of power by everyone against everyone and, as such, constitute an unprecedented threat to liberal societies.

In Xinjiang, China’s largest, northwestern‐​most province, there are eyes everywhere. Prompted by the separatist violence of some members of the province’s ethnic majority, the Muslim Uyghurs, Beijing has imposed what may be the most pervasive surveillance apparatus ever implemented.

A recent Wall Street Journal investigation captures the breadth of the surveillance: “Security checkpoints with identification scanners guard the train station and roads in and out of town. Facial scanners track comings and goings at hotels, shopping malls and banks. Police use hand‐​held devices to search smartphones for encrypted chat apps, politically charged videos and other suspect content. To fill up with gas, drivers must first swipe their ID cards and stare into a camera.”

The PRC’s new surveillance apparatus is simply the latest iteration of the modern surveillance state. Such apparatuses are familiar not merely to the denizens of autocratic nations, but also to those of ostensibly democratic ones. In the United States, the NSA has hoovered up enormous amounts of ostensibly private data. More than 17,000 cameras constantly watch the residents of New York City. In Chicago, police use demographic data to target probable homicide offenders, visiting them with proactive warnings before a crime is even committed.

Surveillance as a policing technique — the beat cop, surveilling the streets for which he is responsible, for example — is hardly new. But technology — cameras, sensors, ID cards, facial recognition, the whole apparatus for knowing who is where when — permits a kind of totalizing surveillance previously reserved for dystopian fiction.

The surveillance state is not merely a knowledge apparatus, but a power apparatus. If it were simply a benign tool for collecting knowledge, then we would hardly be so concerned about it. But how does the surveillance state translate knowledge into power?

One straightforward answer is that surveillance allows for the finer targeting of brute force. This is the message undergirding Chicago’s police pre‐​checks: knowing who is likely to break the law means that the police — the embodied threat of violence — can be dispatched to discourage them before they do so. Power can more precisely perform the same operations it has always desired to perform.

The more complicated answer is that surveillance expands the domain of things to which power can be applied. The tracking of biometric information, as in Xinjiang, permits the specific identification of persons along axes not previously available. Statistical discrimination becomes available across far more information dimensions, and big enough data sets allow the identification and regulation of patterns before would‐​be criminals even are aware of their participation in the pattern.

The word we might associate with the form of power that surveillance permits is, after French social theorist Michel Foucault, “discipline.” Foucault documents how the brutal punishments of the pre-Enlightenment era were by and large displaced by discipline, exchanging flashy, violent displays of power for the micro-correction of behavior. [3]

Discipline, Foucault writes, treats the human as “something that can be made; out of a formless clay, an inapt body, the machine required can be constructed; posture is gradually corrected; a calculated constraint runs slowly through each part of the body, mastering it, making it pliable, ready at all times, turning silently into the automatism of habit.” [4]

This micro‐​application of power requires a concurrent knowledge apparatus which permits knowledge at the small scale, but which is able to acquire that level of precise knowledge across a whole population. The process of disciplining was, to Foucault, the central process of every social institution through the 18th and 19th centuries: the creation of the prison, the clinic, the asylum, the very notion of sexuality, all were focused around the specifying of micro‐​level knowledge about a population for the fine exercise of control across a large swath of people. [5]

The application of this concept to the surveillance state explicates an intuition many of us share: that the surveillance state’s subtle but totalizing knowledge of our society permits the exercise of a historically unique power. The notion of discipline allows us to grasp the regulatory effects of surveillance knowledge as a thing in itself, a power often exercised by the state, but not reserved to it.

We usually think about surveillance and discipline as things the state does. It is less natural to think about other institutions using surveillance and discipline. Yet the discovery and use of ways of knowing that permit discipline are not functionally constrained to states. Surveillance and discipline have emerged in one private industry where knowledge and power neatly intersect: the modern American system of credit rating.

The earliest credit rating system in America is generally considered to be the work of 19th-century New York merchant Lewis Tappan, who used the dossiers he and his brother kept on neighbors’ unchristian behavior to judge their creditworthiness. [6] Tappan and his brother Arthur were business partners and devout Calvinists; the two gave away Bibles, reported illegal gaming houses, and even tried to rescue prostitutes from the city’s brothels. Perhaps most significantly, they coordinated with other Christians to gather information on unchristian behavior, sharing reports of immoral conduct by their neighbors.

When the brothers Tappan ran into a spot of business trouble due to their pro‐​abolition views, Lewis dug their business out by extending credit to their customers. Distrustful of usury, he used the dossiers to make credit‐​worthiness judgements about his customers. Soon the brothers had started a business in credit checks, and thus was born Tappan’s Mercantile Agency — by 1844, it had over 280 commercial clients. [7]

Tappan’s system was for mercantile, rather than consumer, credit. Prior to WWII, creditors, merchants, and landlords all relied on more informal, less systematized mechanisms of determining consumers’ credit-worthiness, making credit comparatively hard to come by. [8] Credit agencies, where they existed, were mostly local and community-based. [9] Individual creditors kept their own credit assessments of individuals with whom they did business. While they sometimes exchanged lists, these creditors made no attempt to aggregate or centralize their judgements. [10]

In keeping with the Tappans’ tracking of moral infractions, early consumer credit reports were “derogatory,” i.e., they recorded only negative information. Credit reporters kept track of delinquencies, so to have good credit was to have a blank credit report. [11]

In 1960, there were an estimated 1,500 local, independent credit reporting agencies collecting such information. [12] Over the course of the following decade they consolidated, transforming the mostly local, often not‐​for‐​profit credit bureaus into nationwide conglomerates through mergers and buyouts.

The consolidation of the credit rating industry, and concurrent regulation under the Fair Credit Reporting Act (FCRA), prompted a transformation of the credit reporting industry. Growing more powerful as they grew bigger, creditors began to capture more robust, verifiable data about customers. “This,” Mark Furletti writes, “included both positive information, such as a consumer’s ability to consistently pay her bills on time, as well as negative information, such as defaults and delinquencies.” [13] Computerization‐​born efficiency drove even more small credit agencies to pack up, selling off their records to larger conglomerates.

The rise of large-scale, technologically enhanced credit rating agencies altered the social role of these institutions, which came to view themselves as not merely providing a service, but actively serving the public good. In a 1968 Congressional hearing on the Retail Credit Corporation (today known as Equifax), RCC’s president told the panel that the goal of the new credit agencies was to impose “discipline” on the American consumer, to ensure not only that he wasn’t delinquent, but that he was made newly responsible by credit rating systems:

“It is a fact that the interchange of business information and the availability of record information imposes a discipline on the American citizen. He becomes more responsible for his performance whether as a driver of his car, as an employee in his job performance, or as a payor to his creditors. But this discipline is a necessary one if we are to enjoy the fruits of our economy and the present freedom of our private enterprise.” [14]

It should be apparent to anyone who has ever been denied a loan, or a lease, or paid a higher rate on their insurance, that the knowledge apparatus of credit rating has a disciplinary effect. In a mass society of strangers, in which individualized knowledge and concomitant trust are not available, credit ratings outsource the problem of telling businesses whom to trust. Credit agencies as knowledge aggregators and organizers, then, permit the regulation of our behavior at both mass and individual scale; they discipline us.

Among all the businesses which participate in surveillance and discipline, why opt to focus on credit raters? It is not merely because of their obviously disciplinary nature, but because of their importance as particularly unstable disciplinary institutions.

Hacks and leaks only become conceptually possible, a problem to be dealt with, under the paradigm of mass data collection which is an essential feature of modern disciplinary knowledge regimes. We can only steal huge databases if they exist as databases in the first place.

Defenders of the status quo of mass aggregation might be quick to note that a) aggregation has led to enormous good in the modern world and b) the reason that public and private institutions are the ones handling big data is precisely to safeguard against such leaks and hacks. To the former, we might easily agree; credit rating has almost certainly been an enormous boon to total global productivity and wealth, helping the rich and poor alike.

As to the latter response, that sufficiently advanced security systems and sufficiently competent institutions will guard against leaks, we can only retort that human systems have a truly awful track record of both security and competence. The Equifax, Experian, OPM, and Yahoo hacks teach us not only that hacks are a possibility, but that they will, reliably, eventually happen. In fact, such leaks and hacks are a predictable consequence of the mere collection of such mass data.

UpGuard, in its analysis of the Alteryx leak, noted that “this exposure is a prime example of the way in which third‐​party vendor risk can result in sensitive data leaking from multiple entities.” It further identified the guilty third party: “While Experian rates 728 and the US Census Bureau 872 on the CSTAR cyber risk score, out of a maximum of 950, Alteryx, which owned the bucket, had a lower score of 692 — showing perhaps how a weaker link can be fatal throughout the chain.” [15] The Alteryx leak, in other words, is evidence of the old adage that a chain is only as strong as its weakest link.

Any system that is large enough will, thanks to its sheer complexity and the number of moving parts involved, develop weaknesses. This is, at the very least, a function of the laziness or incompetence of the humans who, of necessity, have to design and operate these systems. Someone, somewhere in the chain of decision-making, will leave an AWS bucket unsafely configured, or perhaps make a username and password “admin”. Alteryx, or some Alteryx employee, was the weakness for Experian’s data, but there will always be such a weak point.

There is further a natural imbalance inherent in all security systems protecting potentially lucrative information, namely the asymmetry of success standards between attackers and the defender. In order to be successful, the owner of the system needs to stop hacks every single time; hackers need to get in only once in order to “win.” This imbalance tilts the whole probability towards failure on the part of the defender, at least given adequate time.

On top of all this, such systems suffer from an imbalance of incentives. The incentives to keep consumer data safe are fairly apparent, but they come in the form of avoided costs (the reputational cost a firm bears if it loses its data) or of paid incentives (someone tries to keep the data secure because they are paid to do so). These incentives do accrue over time, but as a slow drip of a payout. The incentives to steal consumer data stand in sharp contrast: a single theft represents an enormous payoff. Such data is a big prize for everyone from credit card thieves to foreign governments.

What that should mean is that there exist lots and lots of actors who are interested in stealing data. Data security systems will always have weak links, and require only one weak link to break, to catastrophic ends. The proliferation of interested actors means that the mean time to failure is, and always will be, comparatively low. This paradigm functions as a corollary of Linus’s law: with enough eyes, all hacks are inevitable.
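The arithmetic behind this claim can be made concrete with a rough sketch. It assumes, purely for illustration, that every attack attempt is independent and succeeds with the same small probability; real attacks are neither independent nor uniform, and the function names and figures below are hypothetical, not drawn from any of the breaches discussed here.

```python
def breach_probability(p_success: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent attacks succeeds.

    The defender must win every time, so the chance of surviving all
    attempts is (1 - p_success) ** attempts; the breach chance is the rest.
    """
    return 1.0 - (1.0 - p_success) ** attempts


def mean_days_to_breach(p_success: float, attackers: int,
                        attempts_per_attacker_per_day: float) -> float:
    """Expected days until the first successful attack.

    With a fixed per-attempt success probability, the expected number of
    attempts before the first success is 1 / p_success (a geometric
    distribution). More attackers means more attempts per day, so the
    expected time to failure shrinks in proportion.
    """
    attempts_per_day = attackers * attempts_per_attacker_per_day
    return (1.0 / p_success) / attempts_per_day


if __name__ == "__main__":
    # Illustrative numbers only: a one-in-a-thousand chance per attempt.
    p = 0.001
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} attempts: breach probability {breach_probability(p, n):.3f}")
    for k in (10, 100, 1_000):
        print(f"{k:>6} attackers: mean days to breach {mean_days_to_breach(p, k, 1):.1f}")
```

Under these assumptions, even a defense that stops 999 of every 1,000 attempts is more likely than not to have failed after a thousand attempts, and multiplying the number of attackers by ten divides the expected time to failure by ten.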

If this is the case (and the recent proliferation of giant hacks, often through the abuse of human weakness, suggests that it is), then access to vast stocks of data will no longer be mediated by institutions; it will become publicly available, accessible to anyone with an internet connection and the means to navigate a big data set.

Access to these data is not a mundane matter of knowledge: it is a matter of power. Disciplinary power, insofar as it is made possible by large stocks of information, can be exercised by anyone with the means to operationalize that information. Power is not localized to the institutions who control the data — it inheres in the data itself, as power inheres in the sword, not its wielder.

The leaking/​hacking of large amounts of data makes the power it creates publicly available: anyone can exercise discipline over anyone else.

In July of 2015, a group calling themselves The Impact Team stole and then released information on clients of Ashley Madison, a website where users could find someone with whom to have an affair. Multiple websites popped up making it easier for individuals to search the released data and find out if their spouse’s name was included.

The consequences were spelled out in an LA Times story by one victim of the breach. Rick Thomas, a registered Ashley Madison user prior to the breach, found himself blackmailed to the tune of $1,000, based on behavior he thought was private. Thomas owned up to his attempted infidelity, and the story ends in suitably saccharine fashion. The important takeaway, however, is the way in which men like Thomas found themselves suddenly and unexpectedly at the mercy of those with new knowledge‐​derived power, be they blackmailers, or spurned wives.

Obviously, cheaters like Thomas are not the most sympathetic victims of such breaches, nor was the data stolen from them as comprehensive as that made available by the Alteryx leak. But their stories should illustrate the effect of knowledge leaking out of the sphere of institutions and into the hands of individuals, for good or ill.

Perhaps a more sympathetic target will better illustrate the dangers of mass data centralization. Consider victims of harassment by collectors of so‐​called “phantom debt,” largely fabricated debt portfolios assembled from leaked information sold over the darknet. As Bloomberg documented in a recent article on the practice, millions of Americans regularly receive collection calls from the nascent industry which preys on the weak, using their personal information to demand that they pay debts they never owed:

It begins when someone scoops up troves of personal information that are available cheaply online—old loan applications, long‐​expired obligations, data from hacked accounts—and reformats it to look like a list of debts. Then they make deals with unscrupulous collectors who will demand repayment of the fictitious bills. Their targets are often poor and likely to already be getting confusing calls about other loans. The harassment usually doesn’t work, but some marks are convinced that because the collectors know so much, the debt must be real.

These scams are clearly profitable enough to be worth investing substantial time and energy into; Bloomberg noted that an India‐​based phone center made eight million calls over eight months before the Federal Trade Commission managed to shut it down.

One could readily object that the problem with phantom debt is the lying, not the original information. But the ability of companies to pick targets and extract money from them using dishonest debt portfolios depends upon the extensive collection of information which they are able to acquire on any given person.

These examples show the nascent but expanding practice of data‐​enabled discipline exercised by individuals on other individuals. The targets are the socially sanctioned or disempowered — the cheater, the debtor — but they are only targets because they are the subject of specification, analysis, and categorization. Certainly, they consensually enter themselves into these systems — Thomas signed up for Ashley Madison on his own — but consent doesn’t lessen the disciplinary power created by these systems.

It is a truism so old that no one can quite remember who said it first: knowledge is power. This is a statement about simple equivalency — where there is knowledge, there is power, and vice‐​versa. But there is also a more substantive equivalency, i.e. that the character of knowledge is related to the character of power. What you know determines how you can exercise power, whether you are a state, a business, or a lone wolf.

For most of its history, power has been most prominently exercised in brute fashion — one needs to know little about one’s victim in order to beat them to death with a club. More subtle forms of power have existed throughout history, but they have been reserved to matters of palace intrigue, never to be exercised on the population‐​wide scale.

Yet we live in an age of unprecedented technology for knowing, and therefore of unprecedented forms of power. This is already intuitive to everyone where the state is concerned — even the most diehard surveillance state apologist still feels his hair stand on end at the thought that the man from the government watches him while he sleeps. This concern would not exist if it did not imply some power. Harder for some to accept is the idea that businesses exercise the same power. Yet it is hard to call the pervasive practice of credit rating, with its concurrent impact on almost every financial decision we make, anything but disciplinary.

The problem, the thing which allows the exercise of modern disciplinary power, is not a particular institutional form. Discipline is not unique to the state; it is not unique to the corporation. The problem is not who does the knowing; the problem is what they know. If knowledge is power, and anyone can have knowledge, then anyone can have power.

These forms of knowledge have already leaked into the hands of individual ne’er-do-wells, and will continue to do so ineluctably. Better security is simply a stopgap, given our modified Linus’s law; with enough eyes, disciplinary power will be for everyone.

The implications of inevitably dispersed knowledge are dire for the liberal society. If liberty is non‐​domination, [16] then the capricious and arbitrary application of power poses a threat to liberty even when exercised by individuals against other individuals, even when it takes a distinctly modern, post‐​Republican form. We already fear the chilling effect of social media, “doxxing,” and the internet mob on the discourse fundamental to a functional liberal polity. Imagine such situations, but amplified by the sort of power discipline permits.

It is furthermore not obvious that there is much that we can do about this. If the problem is not the people who exercise power, but the form of knowledge itself, then where there is modern demographic information, there is also necessarily disciplinary power, and therefore a threat to liberty.

At the very least we can know that we have opened the box, can admit that the quantifying, categorizing, and compulsive gathering of data may not be an unqualified good. Quite the contrary, it may pose a real threat to the very foundations of our society.

[1] “Number of households in the U.S. from 1960 to 2016 (in millions),” Statista, January 2017, accessed December 25, 2017, https://www.statista.com/statistics/183635/number-of-households-in-the-…

[2] Dan O’Sullivan, “Home Economics: How Life in 123 Million American Households Was Exposed Online,” UpGuard, December 20, 2017, accessed December 25, 2017, https://www.upguard.com/breaches/cloud-leak-alteryx

[3] Michel Foucault, Discipline and Punish (Vintage, 1995).

[4] Foucault, Discipline, 135.

[5] Respectively, Discipline and Punish; The Birth of the Clinic: An Archaeology of Medical Perception (Vintage, 1994); Madness and Civilization: A History of Insanity in the Age of Reason (Vintage, 1988); and The History of Sexuality, Vol. 1 (Vintage, 1990).

[6] Daniel B. Klein, “Credit-Information Reporting: Why Free Speech is Vital to Social Accountability and Consumer Opportunity,” The Independent Review, Winter 2001, ISSN 1086–1653, p. 330.

[7] Klein, “Credit‐​Information Reporting,” p. 330.

[8] Ibid.

[9] Mark Furletti, “An Overview and History of Credit Reporting,” Federal Reserve Bank of Philadelphia, June 2002, p. 3.

[10] Klein, p. 330.

[11] Andrea Ryan et al., “A Brief Postwar History of US Consumer Finance,” Harvard Business School, 2010, p. 20.

[12] Ibid.

[13] Furletti, pp. 4–5.

[14] “What’s the Limit for Bureaus?,” Kansas City Times, January 20, 1969, https://www.newspapers.com/newspage/85020773/, cited by Matthew Stoller, Twitter, September 8, 2017, https://twitter.com/matthewstoller/status/906258377629732864.

[15] O’Sullivan, “Home Economics.”

[16] Philip Pettit, “Liberty as Non-Domination,” in Pettit, Republicanism: A Theory of Freedom and Government (Oxford, 1999), http://www.oxfordscholarship.com/view/10.1093/0198296428.001.0001/acprof-9780198296423-chapter-3.