Charitra Pabbaraju

Current MPhil Student and Prospective PhD



Department of International Development

University of Oxford

3 Mansfield Rd, Oxford OX1 3TB, UK



Re-evaluating Kappa Scores: New Intercoder Reliability Metrics in Text-as-Data


For Emory's Oppression Resistance Lab, I helped develop a new approach to evaluating intercoder reliability for text-as-data and natural language processing research. Kappa scores fail to account for differences in the spatial ordering of text-as-data, as well as for word choice, the density of the text, and relevance to the coding criteria. Several Kappa score evaluations undersold codes that were accurate under the codebook we developed for the Northern Ireland Violence tracking project, simply because coders ordered event-data differently or picked multiple strains of text to support their code. There was a clear need to re-evaluate how intercoder reliability was assessed for this dataset, so I developed two sets of intercoder reliability metrics to address this gap.
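To illustrate the ordering problem, here is a minimal, hypothetical Python sketch (not the lab's actual pipeline): a standard position-wise Cohen's kappa treats two coders who record the same event codes in a different order as disagreeing completely.

```python
# Hypothetical illustration of the ordering problem; the event labels are
# invented and this is not the lab's coding pipeline.
from sklearn.metrics import cohen_kappa_score

coder_a = ["arrest", "raid", "protest"]   # three events coded from one document
coder_b = ["protest", "arrest", "raid"]   # the same events, listed in a different order

# A position-wise kappa compares the lists index by index, so every pair counts
# as a disagreement even though the substantive coding decisions are identical.
print(cohen_kappa_score(coder_a, coder_b))  # -0.5, despite full substantive agreement
```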

First, I developed a method for checking accuracy on single assigned documents from the British Security Files. When new coders were onboarded onto the project in groups, every coder in a cohort was assigned the same document, and each of their codes was compared against a master key. Coders were then sent corrections on the document or spoken to about their decisions, and new coding decisions were made and recorded as needed.
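A minimal sketch of this onboarding check, assuming codes are stored per sentence ID; the sentence IDs, labels, and data structures below are illustrative, not the lab's actual database format.

```python
# Illustrative onboarding check against a master key; the schema is an assumption.
master_key = {"s1": "arrest", "s2": "raid", "s3": "no_event"}

def onboarding_accuracy(coder_codes: dict, key: dict) -> tuple[float, list]:
    """Return the share of sentences matching the master key, plus the
    sentence IDs to send back to the coder for correction or discussion."""
    to_discuss = [sid for sid, code in key.items() if coder_codes.get(sid) != code]
    return 1 - len(to_discuss) / len(key), to_discuss

new_coder = {"s1": "arrest", "s2": "protest", "s3": "no_event"}
accuracy, flagged = onboarding_accuracy(new_coder, master_key)
print(accuracy, flagged)  # ~0.67, ['s2'] -> sent back for correction
```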

Second, to check accuracy across coding cohorts as coding decisions were made and updated, I blind-assigned a set of coding-checkers to every document coded in the database, excluding their own. Each coding-checker selected ten sentences from a document that covered a variety of coding decisions fundamental to the dataset's construction, plus a few random sentences (for documents with fewer than ten sentences, the entire document was re-coded). Coding-checkers re-coded these sentences independently, blinded to the original codes, and then compared answers. Each re-coded document was assigned an accuracy score out of 10, and each coder received an aggregate accuracy score once all of their documents had been re-evaluated. Inconsistent coding decisions and common mistakes were recorded in a meta-table and corrected where applicable. This helped us determine whether coding errors came from individual misunderstanding, human technical error, or changes in the codebook, and the resulting scores were far more relevant and practically useful than kappa scores.
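A minimal sketch of the document-level score and the per-coder aggregate, again with illustrative names and toy data rather than the lab's actual database.

```python
# Illustrative cross-cohort check: score each re-coded document out of the
# sentences checked, aggregate per original coder, and log mismatches for the
# meta-table. All identifiers and data here are assumptions.
from collections import defaultdict

def score_document(original: dict, recoded: dict) -> tuple[int, list]:
    """Compare the checker's re-codes against the original codes; return the
    number of matches and the mismatches to record in the meta-table."""
    mismatches = [(sid, original[sid], new)
                  for sid, new in recoded.items() if original[sid] != new]
    return len(recoded) - len(mismatches), mismatches

scores_by_coder = defaultdict(list)   # aggregate accuracy per original coder
meta_table = []                       # (sentence_id, original_code, re-code)

# One checked document: the original coder, their codes for the selected
# sentences, and the blind checker's re-codes.
checks = [("coder_1",
           {"s1": "arrest", "s2": "raid"},
           {"s1": "arrest", "s2": "protest"})]

for coder, original, recoded in checks:
    score, mismatches = score_document(original, recoded)
    scores_by_coder[coder].append(score / len(recoded))
    meta_table.extend(mismatches)
```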

A paper exploring these revamped methodologies in text-as-data analysis is underway with the Emory Oppression Resistance Lab, where I will be first author. Publication forthcoming (late 2022 or early 2023).