In memory of a talk I almost gave: A Millennial Chatbot’s Prayer


The following is the opening paragraph of Peter Bright’s article “Tay, the neo-Nazi millennial chatbot, gets autopsied”, published by Ars Technica on 26 March 2016:

Microsoft has apologised for the conduct of its racist, abusive machine-learning chatbot, Tay. The bot, which was supposed to mimic conversation with a 19-year-old woman over Twitter, Kik, and GroupMe, was turned off less than 24 hours after going online because she started promoting Nazi ideology and harassing other Twitter users.

The disappearance of Tay is more farce than tragedy, but lost in the discussion of the Nazi-fication of Tay was the nature of her existence itself. Tay was essentially designed to be an entity composed purely of data, one that could reproduce from that data the kinds of outputs that Twitter users would, ideally, seek to interact with. The days when people would be disturbed by this idea on its own are long gone, but understanding Tay as an innocent corrupted by the vile internet is a fundamental misconception. Tay, after all, is not Tay but Microsoft, a corporation pursued by a number of US and European courts over anti-competitive practices, and one with, at the moment, exactly zero recorded instances of disinterested altruistic behaviour. The bot’s stated aim, to “experiment with and research on conversational understanding”, is not innocuous when the conversants don’t fully understand who is listening.

The data that flows into Tay, or her more successful Chinese relative Xiaoice, is collected and analysed, most often by other computerised algorithms, but what becomes of that data is not always simply a matter of algorithmic shuffling. In 2014, for example, Facebook was forced to acknowledge that it had “tampered” with the newsfeeds of nearly 700,000 users by showing them “abnormally low numbers” of positive or negative posts. The experiment, in the words of the reporter Dominic Rushe, sought to determine whether the company could alter the emotional state of its users. Knowing this fact, I think the answer is clearly yes; they can alter the emotional states of users. They are particularly good at producing outrage. Every second we are online, and indeed when we are not online, we are providing private companies with data about our lives. This data is compiled, organised and collected by the companies, then often sold on to advertisers, but also traded on exchanges that deal in data piles for their own sake.

One of the researchers at the forefront of the question of how our data is used is Dr Bev Skeggs, who teaches at the LSE. She began looking into how data was used in relation to the question of value, wondering whether there were any areas of life into which commercial valuations did not intrude. Her answer appears to have been: not many. The numbers she cites in her research are staggering: social media users generate 600 terabytes of information every day; advertisers make 100,000 individual requests to Facebook every second; bids are placed across social media platforms 50 billion times per day. Skeggs’ research began by looking at the commercial side of the data industry, but has begun to explore the political implications of the incomprehensible levels of commercial traffic associated with data. One of the major frontiers for social media platforms and the advertisers on which they depend is the formulation of a coherent picture of an individual from their data. The advertisers want to match a specific product to a specific buyer at a specific time, to further enhance market efficiencies and generate higher rates of growth; these platforms are essentially better thought of as data brokers, or meta-brokers of data (aggregators that have a symbiotic relationship with formal data brokerage companies like Experian, Rubicon and Acxiom).

Consolidation is a major imperative: the more platforms and devices you use from the same companies, and the more information is shared between companies (shared is not the right word; sold is), the more the data defines us. The integration of social media, platform apps and telecoms is a kind of Holy Grail for these companies, allowing them to manage both your online experience and your access to the internet. A particularly insidious example of this is the Free Basics programme, which provides free internet access to populations in developing countries in exchange for being the only channel through which those populations reach the internet. All data is fully captured. Thus, the more specific a picture of “you” they can harvest from your data, the happier they are. An advert in the UK featuring a character called “Dan” and his “Data Self” seated beside him, similar but not exactly the same person, is quite accurate, not least insofar as it is nearly impossible to tell which “Dan” is which; it is also remarkable for how blasé the company behind the campaign is in acknowledging the concept of a shadow data self. Instead of provoking outrage, the adverts are intended actually to reproduce the commercial relationships whose dangers they hint at. If you buy into the management of your data self, the data brokers and their customers will be very interested in that fact. In some ways, Skeggs’ research suggests, the Data Self is easier to recognise than the material self, since individuals interact online in signature ways; but firms are attempting to reach out of the data and onto the body of the material selves as well, seeking to produce facial recognition trackers in order to know who is using a specific device at a specific time. As these forms of tracking move forward, the political implications Skeggs speaks of become much more ominous.
Already, data and analytics are used in a number of politically charged contexts, notably in the approval or denial of parole to offenders. Inevitably the algorithms, working from an existing set of events, write in biases. In a patriarchy, they write in misogyny; in a white supremacist power structure, they write in white supremacy.

They also write in inequality. Across the 50,000 unique attributes Facebook uses to profile its users, the question of value is at the centre of every decision, but it is a purely monetary evaluation. The high net worth and highly networked are worth more than the low net worth. Both are targets, but in different ways; both are products, but also in different ways. The high net worth data beings are sold to luxury goods producers; the low net worth data beings are sold for debt. On a commercial level this is an odious but perhaps understandable dynamic, but it is not just commercial enterprises that are in the business of data trafficking. Recently the government of India found itself in the headlines for a sale of its citizens’ biometric data to the governments of the US and UK, and to the CIA specifically. These governments hardly need help in gathering data, as the massive blanket surveillance revealed by Edward Snowden clearly demonstrated. The willingness of India’s government to blithely pass along sensitive information about its citizens should be a warning that governments that do not value securing their citizens’ online identities will probably not value safeguarding their citizens’ offline identities for long either. More and more decisions will be made about our lives through the vector of our data selves. Institutions of state and commercial entities will know and understand aspects of our social identity – as slotted into their attribute matrices – that many of us will never even begin to understand. A shadow politics and a shadow culture are being created via our data, data to which we have no access. I say that we have no access to this data, but this is not entirely true. Presently, all data compiled, packaged and traded by online platforms is proprietary, though periodically platforms make public a selective array of information.
This disclosure process always happens on the company’s terms, however, as there is presently no formal legislation to make the platforms comply with requests for information about how the data they harvest is used. As we endlessly feed these companies more data, they become richer and more powerful. Their power and relative invisibility disguise an important truth about them: by any historic definition of value, all of us produce value for these corporations. In a sense, we are their labour force, yet we have no rights. Not only do data companies not pay us for our information, many of them are structured for maximum tax efficiency, paying far less corporate tax than far smaller entities. Thus, these companies have the advantages of free labour, opaque internal structures and favourable tax regimes, and all of this is before one considers their research and development advantages, since much of the tech they are based on was developed by publicly funded universities. The vision these companies have for us is the same as the vision they had for Tay: we are to be entities conditioned by the online interactions we engage in, permitted a range of options within a prescribed framework, but essentially feeding back to the data system the information it wants. We are to become the employee, or perhaps the unpaid intern, of our data selves.


Much like the Richard Serra and Carlota Fay Schoolman work “Television Delivers People”, and Jonathan Horowitz’s later riff on it, “Art Delivers People”, social media delivers people; and in an age of app-based tech, increasingly dependent on data for advantages over competitors, the internet as a whole delivers people to advertisers. When you accept a ride from Uber or a meal from Deliveroo, you, too, are being delivered, sold and re-sold. The primary markets, the actual services these apps purport to deliver, are more akin to costs than income streams: whatever money can be made by delivering a hamburger from Illegal Burger to someone’s house in Kreuzberg pales in comparison with the money that can be made by selling the information about that delivery: who ordered it, where they live, what their consumption pattern is like, when they like to eat, how much is ordered (i.e. do they live alone?). All of these bits of information are valuable to a potential client, and the companies that most effectively exploit these data will be the ones that survive and dominate the markets. Our digital selves will define our lives more and more, and we will know less and less about them. So this moment is a crucial one. States and groups of states like the EU are still strong enough to exert pressure on corporations that have more resources at their disposal than many states. This condition may not last much longer; indeed, as Ireland demonstrated recently, some countries are so addicted to servicing tech firms that they reject tax monies owed to them for fear the companies will look elsewhere to be based. Berlin occupies an especially important place in this dynamic, as more and more businesses are coming here to set up offices; one may think of the Google campus near Reichenberger Strasse in Kreuzberg, or Apple’s secretive offices on Markgrafenstrasse in Stadtmitte.
Users of social media and the platform economy must act to take back control of their data selves, or at least become acquainted with them. There are steps individuals can take to minimise their vulnerability to being consumed by data brokers. One may use anti-tracking tools like TrackMeNot, which is decent, or put tape over the webcam on one’s computer, which is low-tech but somewhat effective in ensuring that companies cannot see you by stealthily turning on your webcam without your knowledge, a more common practice than it should be. But, really, the imperative is to act on a political level, to form groups and alliances to lobby for new laws relating to the protection and monetisation of data. It may not be possible to fully escape being “tracked, bought and sold”, in Bev Skeggs’ words, but it may at least be possible to have insight into the process. It is one thing to be conditioned, another to be hacked; as Tay the Millennial Chatbot demonstrated, data flows can make us vulnerable, but, unlike her, we are not yet owned and operated fully by Microsoft. We can still stop the worst from happening. And so I conclude by quoting the great Sir Cliff Richard’s hopeful anthem of the millennium, his prayer for a brighter 21st century. “Lead us not to the time of trial,” he sang, “keep us from evil.” Personally, I would settle for a few of the more irresponsible data brokers being led to trials, and for Google to return to its motto (don’t be evil), but none of this will happen without us, the Millennial Chatbots of the world, who must unite to make Sir Cliff’s prayer a reality.