Before ChatGPT, there was Rekognition: How Amazon’s algorithms control which books you see

In 2009, approximately 57,000 books by and about LGBTQIA+ folks disappeared from Amazon’s search results, bestseller lists and sales ranks, a vanishing act that allegedly occurred after an Amazon employee mistakenly marked books classified as queer as adult material. The error was quickly reversed, but it highlighted the relationship between the classification and governance of cultural objects: determining how books are named also determines what we do with them, who they are supposedly made for, and who can access them.

Library and information scholar Hope A Olson refers to this as the power to name—a power that often reinforces an unquestioning acceptance of hegemonic presumptions and authority. Now, almost fifteen years later, Amazon’s algorithms are still unfairly targeting books by historically marginalised authors, including queer folks and people of colour, and controlling how readers can discover them.

To call Amazon inequitable feels redundant at this point in time. Logics of inequality are baked into every facet of Amazon’s operations, from its headquarters to its warehouses and platform. According to the latest company data, 28 per cent of its US workforce and 5.5 per cent of its senior leadership identify as Black, compared to 30.2 per cent of its US workforce and 66.4 per cent of its senior leadership who identify as White. Warehouse employees are staggeringly underpaid, overworked and constantly under surveillance. The company has repeatedly attempted to quash unionisation and, until George Floyd’s murder and the subsequent Black Lives Matter protests in June 2020, sold racist surveillance software to US police departments. If there ever truly was an earnest hope that digital technologies might liberate books from the restrictive gatekeepers of traditional publishing, it can no longer be seriously entertained.

With its relentless interest in profit and apathy towards the cultural value of the items sold on the platform as well as the creators behind them, Amazon perpetuates engrained forms of inequality and introduces new ones in book culture through its classification and content moderation algorithms. These systems underpin the organisation, circulation, and discoverability of books, determining where books appear in Amazon’s marketplace—if they are allowed to be published at all.

Amazon’s algorithms, like all algorithms employed by commercial platforms, are a secret sauce, governed by trade-secret protection and made into opaque black boxes. My research into these algorithms, and platformed publishing more generally, follows on from Taina Bucher’s framework for knowing algorithms, which conceptualises them in terms of a relational ontology: as made into and made meaningful through events with users. Two algorithmic events described by BIPOC authors in interviews with me reveal how discriminatory data logics are interpolated in Amazon’s algorithms.

In the first instance, a Black self-published romance author relayed the struggle she experienced in getting her contemporary romance novel to be listed as contemporary romance.

Books published via Kindle Direct Publishing (KDP) are subject to two classification systems that operate relationally. First, authors classify their books by choosing from a list of ‘browse categories’ on KDP. Second, upon publication, Amazon translates these and other metadata to determine the placement of titles in its online storefront. This placement includes the categories in which books are organised, the bestseller lists on which they appear, and the related titles that appear on a book’s product page as also-boughts (customers who bought this, also bought …).

Initially, this author chose categories like ‘multicultural romance’ for her books, though she decided to switch to ‘contemporary’ because it’s a subgenre that anecdotally has a larger readership. Except, Amazon kept changing the classification of her contemporary romance books from ‘contemporary romance’ to ‘African American literature and fiction’:

I did notice they kept putting one of my books into the multicultural category. I’d take them out … and put it in contemporary or whatever… even if it’s lower on the totem pole, it still gets exposed to a wider audience, and they kept putting it back [into African American literature]. And I stopped tagging. Even books I never tagged; they would get put in. And I would go back in and take it out.

This stopped, she told me, when she changed her profile photo on her Amazon author page from a portrait of herself to her logo. In addition to the title, description, browse categories, and keywords, it seems, then, that Amazon also uses profile data of authors, including author photos, biographies and linked titles, to determine what books are and where they appear in its marketplace.

In Algorithms of Oppression, Safiya Umoja Noble refers to this mechanised bias as ‘technological redlining’, building on the first use of the term redlining to describe the systemic racism of banks that refused mortgage loans to customers who lived in neighbourhoods associated with racial minorities near the end of the last century. This is not new to digital book culture, though: Amazon continues a history of cultural redlining that Richard Jean So shows has pervaded the book industry, from production to reception, throughout the twentieth century.

This automated systematising of books based on identity is something that can only happen to authors from historically marginalised communities. While marginalised identities are explicitly named categories on Amazon, in BISAC codes, and the Dewey Decimal system, there are no categories denoting Whiteness in these classification systems. White stories and authors are automatically subsumed into ‘general’ categories and those based on tropes. Fiction, in Amazon’s classification system as in traditional knowledge organisation systems, is White fiction. ‘Invisibility, with regard to Whiteness,’ as Ruha Benjamin states in relation to technology, ‘offers immunity.’

This immunity appears to extend to Amazon’s content-moderation algorithms, through which marginalised identities are conflated with or more often flagged as adult material. Another Black American author I spoke to had one of her books denied publication after the cover—which featured a black and white photo-style image of a Black woman with bare shoulders, but with her torso and legs covered by a white sheet—was flagged by Rekognition.

Rekognition is a deep-learning neural network that analyses image and video content and identifies objects, people, text, scenes, and activities. It is trained to detect unsafe content, including adult material. It does this by identifying nudity based on large uninterrupted areas of images in which the pixels have been determined as being the colour of human skin. Far from reliable, automatic detection tools can be confused by unusual lighting in images, images that include colours that may be confused with skin tones, clothing that interrupts areas of skin-coloured pixels, images that include naked skin but are not generally considered objectionable, such as baby photos, and, in one instance on Facebook, the Venus of Willendorf.
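
To make this logic concrete, the following Python sketch illustrates the general approach described above. It is not Rekognition’s code, which is proprietary: the colour rule in `looks_like_skin`, the 50 per cent threshold and every function name are assumptions invented for illustration. The sketch labels each pixel using a crude colour threshold, measures the largest unbroken region of matching pixels, and flags the image when that region covers enough of the frame.

```python
from collections import deque


def looks_like_skin(rgb):
    """Crude, hypothetical colour rule; an assumption, not Rekognition's method."""
    r, g, b = rgb
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b
            and max(rgb) - min(rgb) > 15)


def largest_region_share(image, is_match=looks_like_skin):
    """Share of the image covered by the largest 4-connected matching region."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height == 0 or width == 0:
        return 0.0
    seen = [[False] * width for _ in range(height)]
    largest = 0
    for y in range(height):
        for x in range(width):
            if seen[y][x] or not is_match(image[y][x]):
                continue
            # Breadth-first flood fill from this unvisited matching pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            size = 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and is_match(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            largest = max(largest, size)
    return largest / (height * width)


def flag_as_adult(image, threshold=0.5):
    """Flag when one unbroken matching region covers at least `threshold`."""
    return largest_region_share(image) >= threshold


# Tiny hand-made "images": each pixel is an (R, G, B) tuple.
TONE = (200, 140, 120)   # matches the colour rule
WHITE = (255, 255, 255)  # a white sheet; does not match
SAND = (210, 180, 140)   # a sand-coloured wall; also matches the rule

uninterrupted = [[TONE] * 4 for _ in range(3)]
interrupted = [[TONE] * 4, [WHITE] * 4, [TONE] * 4]
sand_wall = [[SAND] * 2 for _ in range(2)]

print(flag_as_adult(uninterrupted))  # True
print(flag_as_adult(interrupted))    # False: the white row splits the region
print(flag_as_adult(sand_wall))      # True, although no person is pictured
```

Even this toy version reproduces the failures listed above: a sand-coloured wall is flagged, while a single strip of white fabric is enough to change the result. Any fixed colour rule of this kind also hard-codes an assumption about which colours count as skin.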

Research from the MIT Media Lab has also found Rekognition, along with other detection software from Facebook and IBM, to be substantially less accurate in identifying darker-skinned individuals, particularly darker-skinned women.

Besides not showing any more nudity than Florence Pugh’s bedsheet-inspired gown at the Oscars last week, the level of undress on this cover is also minimal in comparison to some of the romance genre’s most famous cover layouts. These include the famous clinch covers, shirtless male torso covers, and the chest-bared, long-haired Fabio covers, which are circulated widely on the platform.

Just as ChatGPT can replicate human language, but not possess knowledge, Rekognition can detect patterns of pixels, but not understand their context or meaning. It certainly can’t account for, nor does it care about, genre conventions, reader expectations, and creative enterprise. And decisions around appropriateness, which have been relegated to automations via Rekognition, are as racialised as the politics of acceptability that have pervaded White hegemonic and colonial societies for centuries.

The author appealed the flagging through Amazon’s author services team and the book was eventually published on the site. However, its adult materials flag remains attached, casting a dark shadow that makes it much harder for it to find an audience. As with the 57,000+ books that vanished from Amazon’s search results in 2009, this book has been effectively suppressed in Amazon’s marketplace.

This suppression further compounds the insidiousness of Amazon’s algorithms, which hide biases under the guise of neutrality, restrict authors’ access to information that determines how their books circulate online, and obfuscate responsibility through rhetoric that represents the company as a mere intermediary between users. And because authors are given no evidence that moderation interventions persist after they are supposed to have ended, they cannot fight against this inequitable treatment.

The gates in the publishing industry have not simply shifted: publishing industry data show that the old gates are still very much in force. Rather, new gates have been introduced that have automated older systems of racial discrimination in the book industry, whether through sinister intent or sheer indifference. Returning to Ruha Benjamin, after technology,

the glaring gap between egalitarian principles and inequitable practices is filled with subtler forms of discrimination that give the illusion of progress and neutrality, even as coded inequity makes it easier and faster to produce racist outcomes.

To fight against these systems, first we need to uncover them, to make them known and legible. It would be lovely if Amazon were to provide greater transparency around its algorithms, but it’s highly unlikely for a company that sees algorithms as trade secrets and that has baked inequitable treatment into its business model. However, we can achieve transparency in a meaningful way by building collective knowledge through sharing stories about algorithmic events, particularly from marginalised standpoints, and creating strategies for subverting Amazon’s imposing power.

Claire Parnell

Claire Parnell is a lecturer in digital publishing at the University of Melbourne. Her teaching and research focus on platformed publishing and cultural inclusion in the book and media industries.


Overland is a not-for-profit magazine with a proud history of supporting writers, and publishing ideas and voices often excluded from other places.

