
Transforming Access to the TRC Archive: The Bitter Aloe Project and AI Technology

By Stephen Davis and William Mattingly

Charged with creating ‘as large a picture as possible’ of human rights violations that occurred during the last three and a half decades of apartheid, the South African Truth and Reconciliation Commission (TRC) amassed an extraordinarily large archive. Much of this archive consisted of hearing transcripts, which were stored on a legacy website hosted on Department of Justice servers. These transcripts gave the public access to this facet of the TRC’s work, but navigation was limited to basic keyword searches. In effect, users could only read one transcript at a time, from start to finish, so the ‘large picture’ could not be viewed in a single frame, and patterns that existed across testimonies remained hidden.

[Image: Bitter Aloe Project screen grab]

The Bitter Aloe Project is an attempt to apply advanced machine learning methods to the massive corpus of text related to human rights in South Africa, namely testimony transcripts and incident descriptions collected by the TRC. The project began in 2019, when Stephen Davis and William Mattingly started working with a dataset of human rights violations derived from Volume 7 of the TRC Final Report. They applied a method called named entity recognition (NER), which automated the identification and classification of information contained in the 21,500 descriptions of human rights violations included in Volume 7. This method enabled them to map these incidents for the first time and to create filters that let users display selected categories of incident data, such as organisation, type of violence, province and date. The result was a ‘big picture’ that users could zoom in and out of and filter for particular kinds of incidents, which finally made patterns visible.
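
The filtering step can be pictured with a small Python sketch. It is an illustration only, not the project's actual code: the `Incident` record, its field names, the `filter_incidents` helper and every placeholder value are hypothetical, and it assumes that entity recognition has already turned each description into structured fields.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical structured record, as it might look after NER has run.
    description_id: int
    organisation: str
    violation_type: str
    province: str
    year: int


def filter_incidents(incidents, **criteria):
    """Keep only the incidents whose fields equal every given criterion."""
    return [
        inc for inc in incidents
        if all(getattr(inc, field) == value for field, value in criteria.items())
    ]


# Placeholder records: none of these values come from the TRC dataset.
records = [
    Incident(1, "Organisation A", "type-1", "Province X", 1986),
    Incident(2, "Organisation B", "type-2", "Province X", 1986),
    Incident(3, "Organisation A", "type-1", "Province Y", 1990),
]

selected = filter_incidents(records, organisation="Organisation A", province="Province X")
print([inc.description_id for inc in selected])
```

Each keyword argument narrows the selection further, which is roughly how combining several filters on a map would behave.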

The next stage of their work focused on hearing testimony transcripts. The SABC, in partnership with the South African History Archive, had improved access to these transcripts by cleaning them up and adding new search functionality on a joint website that debuted in 2012.

The Bitter Aloe Project picked up where this collaboration left off by transforming the transcripts into structured data suitable for new machine learning methods such as sentence embedding. Sentence embedding is a method in which entire sentences are rendered as numerical vectors that are then compared with one another and plotted in a virtual space. The greater the mathematical distance between two ‘embedded’ sentences, the further their meanings diverge; the smaller the distance, the closer their meanings. This allows for a new form of searching that operates on a semantic level. Instead of looking for the presence of individual words, a user can search for abstract features of meaning such as ideas, sentiments, emotions and experiences. For example, users can now read across testimonies and follow a particular line of interest, say the loss expressed by parents over their missing children, the sensory experience of township violence, or instances of hesitation expressed by reluctant perpetrators giving testimony about their complicity.
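
The comparison step behind this kind of search can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the project's implementation: the three-dimensional vectors are hand-made stand-ins for what an embedding model would produce, the labels and function names are hypothetical, and cosine similarity is one common way to measure closeness that the project may or may not use.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, sentence_vecs):
    """Return (label, score) pairs, most similar to the query first."""
    scored = [
        (label, cosine_similarity(query_vec, vec))
        for label, vec in sentence_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hand-made stand-ins for sentence embeddings (real ones have hundreds of dimensions).
toy_vectors = {
    "sentence_a": [0.9, 0.1, 0.0],
    "sentence_b": [0.8, 0.3, 0.1],
    "sentence_c": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # stand-in for the embedded search query

ranking = rank_by_similarity(query, toy_vectors)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Sentences whose vectors point in nearly the same direction as the query rank first, even if they share no words with it, which is what separates this kind of search from keyword matching.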

The long-term goal of the Bitter Aloe Project is to improve the accessibility of the TRC’s archive by creating new ways to read its many stories. In the examples above, this new legibility comes in the form of maps, or of a search method that allows users to read across testimonies for shared experiences, ideas, or emotions. In this regard, the researchers hope that victims and their families will be able to better contextualise their experiences, and that a new generation can learn about the origins of the present by viewing their past through ‘as large a picture as possible’.

You can find out more about the Bitter Aloe Project on their website: https://bitteraloeproject.createuky.net/


Stephen Davis is an Associate Professor of History at the University of Kentucky. He is co-Principal Investigator of the Bitter Aloe Project and the author of The ANC's War Against Apartheid: Umkhonto we Sizwe and the Liberation of South Africa (Indiana University Press, 2018). His research interests include biography, military history, and human rights discourse in southern Africa.
 
William Mattingly is a Postdoctoral Fellow for the Analysis of Historical Documents in the Smithsonian Institution’s Data Science Lab and co-PI and lead developer of the Bitter Aloe Project. He has broad experience applying advanced machine learning methods to a variety of large human rights corpora. He has also developed a number of open-source tools and Python libraries, such as LeetTopic and Streamlit Pandas, and is the author of Introduction to Python for Humanists (Routledge - Taylor & Francis, 2023).
