Transforming Access to the TRC Archive: The Bitter Aloe Project and AI Technology

By Stephen Davis and William Mattingly
Charged with creating ‘as large a picture as possible’ of human rights violations that occurred during the last three and a half decades of apartheid, the South African Truth and Reconciliation Commission (TRC) amassed an extraordinarily large archive. Much of this archive consisted of hearing transcripts, which were stored on a legacy website hosted on Department of Justice servers. These transcripts gave the public access to this facet of the TRC’s work, but navigation was limited to basic keyword searches. In effect, users could only read single transcripts in a linear manner, meaning that the ‘large picture’ could not be viewed in a single frame, and patterns that existed across testimonies remained hidden.
The Bitter Aloe Project is an attempt to apply advanced machine learning methods to the massive corpus of text related to human rights in South Africa, namely the testimony transcripts and incident descriptions collected by the TRC. The project started in 2019, when Stephen Davis and William Mattingly began working with a dataset of human rights violations derived from Volume 7 of the TRC Final Report. They applied a method called named entity recognition (NER), which automated the identification and classification of information contained in the 21,500 descriptions of human rights violations included in Volume 7. This method enabled them to map these incidents for the first time and to create filters that let users display selected categories of incident data, such as organisations, types of violence, province and date. The result was a ‘big picture’ that users could zoom in and out of and filter for particular kinds of incidents, which finally made patterns visible.
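To give a rough sense of what entity extraction and filtering involve, here is a minimal Python sketch. It is not the project's pipeline: a trained NER model is replaced by a simple word-list lookup, and the word lists, function names and the two sample sentences are all hypothetical placeholders invented for illustration, not entries from Volume 7.

```python
import re

# Hypothetical word lists standing in for what a trained NER model would learn.
ORGANISATIONS = ["ANC", "IFP", "UDF"]
PROVINCES = ["Natal", "Transvaal", "Orange Free State"]


def extract_entities(description):
    """Tag organisations, provinces and years found in one description."""
    entities = {"organisation": [], "province": [], "year": []}
    for org in ORGANISATIONS:
        if re.search(rf"\b{re.escape(org)}\b", description):
            entities["organisation"].append(org)
    for province in PROVINCES:
        if re.search(rf"\b{re.escape(province)}\b", description):
            entities["province"].append(province)
    # Four-digit years from 1960 to 1999.
    entities["year"] = re.findall(r"\b19[6-9]\d\b", description)
    return entities


def filter_records(records, field, value):
    """Keep only the records whose extracted entities include the value."""
    return [r for r in records if value in r["entities"][field]]


# Invented placeholder sentences, not real incident descriptions.
SAMPLE_TEXTS = [
    "A UDF supporter was detained in Natal in 1986.",
    "An IFP member was injured in the Transvaal in 1992.",
]

records = [{"text": t, "entities": extract_entities(t)} for t in SAMPLE_TEXTS]
natal_records = filter_records(records, "province", "Natal")

for record in natal_records:
    print(record["text"], record["entities"])
```

Once each description carries structured fields like these, the same records can be placed on a map or narrowed by organisation, province or date.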
The next stage of their work focused on hearing testimony transcripts. The SABC, in partnership with the South African History Archive, had already improved access to the transcripts by cleaning them up and adding new search functionality, presented on a joint website that debuted in 2012.
The Bitter Aloe Project picked up where this collaboration left off by transforming the transcripts into structured data suitable for new machine learning methods such as sentence embedding. Sentence embedding is a method in which entire sentences are rendered as mathematical expressions (in practice, lists of numbers) that are then compared with one another and plotted in a virtual space. The greater the mathematical distance between these ‘embedded’ sentences, the further their meanings diverge, and vice versa. This method allows for a new form of searching that operates on a semantic level. Instead of looking for the presence of individual words, sentence embeddings allow a user to search for abstract features of meaning such as ideas, sentiments, emotions and experiences. For example, this search method allows users to read across testimonies and follow a particular line of interest, such as the loss expressed by parents over their missing children, the sensory experience of township violence, or instances of hesitation expressed by reluctant perpetrators giving testimony about their complicity.
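The idea of comparing sentences by distance can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: real embeddings come from a trained language model and have hundreds of dimensions, whereas the three-number vectors below are hand-written, and the sentences are invented placeholders rather than quotations from testimony. Closeness is measured here with cosine similarity, one common choice; a higher score means a smaller distance in meaning.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, corpus, top_k=2):
    """Return the top_k (score, sentence) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), sentence)
              for sentence, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: invented sentences with hand-written vectors.
CORPUS = [
    ("I never saw my son again.", [0.9, 0.1, 0.0]),
    ("We are still searching for our daughter.", [0.8, 0.2, 0.1]),
    ("The hearing was adjourned until Monday.", [0.0, 0.1, 0.9]),
]

# Stands in for the embedding of a query such as "parents grieving a missing child".
QUERY = [0.85, 0.15, 0.05]

results = semantic_search(QUERY, CORPUS)
for score, sentence in results:
    print(f"{score:.3f}  {sentence}")
```

Note that neither of the two closest sentences shares a keyword with the query phrase. They are retrieved because their vectors point in a similar direction, which is what lets this kind of search follow an experience or emotion rather than a word.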
The long-term goal of the Bitter Aloe Project is to improve the accessibility of the TRC’s archive by creating new ways to read its many stories. In the examples above, this new legibility comes in the form of maps, or of a search method that allows users to read across testimonies for shared experiences, ideas, or emotions. The researchers hope that victims and their families will be able to better contextualise their experiences, and that a new generation can learn about the origins of the present by viewing the past through ‘as big a picture as possible’.
You can find out more about the Bitter Aloe Project on their website: https://bitteraloeproject.createuky.net/