Data in Practice: Data of Language – Corpus Linguistics
Posted by Marie Jonas
Application of Data to Statutory Interpretation
This past week, the hosts of my favorite legal podcast, Strict Scrutiny, introduced me to an application of data in the legal realm: corpus linguistics. The concept arose during oral argument in Pulsifer v. United States, a case concerning statutory interpretation of the First Step Act. The relevant amicus brief is available here. Corpus linguistics involves an empirical analysis of language use based on large databases of naturally occurring spoken and written texts. The premise is that, in interpreting the meaning of words used in statutes, it can be helpful to see how words are used in other contexts.
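For readers curious what "seeing how words are used in other contexts" looks like mechanically, here is a minimal Python sketch of a keyword-in-context listing, one common way of browsing a corpus. It is only an illustration: the function name keyword_in_context and the two sample sentences are my own inventions, not drawn from any real corpus, and this is not the method used in the amicus brief.

```python
import re

def keyword_in_context(texts, keyword, width=4):
    """Return (left, match, right) word windows for each occurrence of keyword."""
    rows = []
    for text in texts:
        # Split into simple word tokens; punctuation is dropped.
        words = re.findall(r"[A-Za-z']+", text)
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                rows.append((left, word, right))
    return rows

# Invented sample sentences, for illustration only (hypothetical data).
sample_corpus = [
    "Visitors may not bring food and drink into the library.",
    "The applicant does not have a license and a permit.",
]

for left, word, right in keyword_in_context(sample_corpus, "and"):
    print(f"{left:>30} [{word}] {right}")
```

A listing like this only surfaces the surrounding words. Someone still has to read each line and decide what "and" means there, which is where the judgment calls come in.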
Argument in the case turned on the meaning of “and”: does “and” mean “and,” or does “and” mean “or”? To put it another way, “[t]he question is whether this negated conjunctive structure expresses a joint meaning (does not have all three of A, B, and C) or a distributive meaning (does not have any one of A, B, or C).” The implications were substantial. The answer to this question would determine whether those convicted of crimes are eligible for discretionary sentencing rather than mandatory minimums.
Why Incorporate Data?
Statutory interpretation may seem like an unlikely field for empirical data analysis, but a common problem in law involves decision-makers generalizing their personal experiences to broader contexts. In other words, judges, like all individuals, tend to place undue emphasis on anecdotal evidence that they have personally encountered. The same is true of language. An individual’s sense of the “ordinary meaning” of words is shaped by their own experiences and biases. Corpus linguistics attempts to apply a neutral methodology to inform the “ordinary meaning” of language in the law. At its best, data can be a tool to identify or overcome personal bias.
Potential Problems
During the podcast, hosts Leah Litman, Kate Shaw, and Melissa Murray raised several concerns about the use of corpus linguistics in this context. One issue was data quality. As Leah aptly pointed out:
[Y]ou can substitute ‘and’ into a billion different combinations. And when you ask someone, well, what do you think it means here? You have no idea what they are imagining themselves, the audience to be [or] the speaker to be.
That is, how can someone be certain that the underlying data fed into the database correctly codes the meaning of “and” in a specific context, and is that context relevant to the present scenario? It is a classic case of “garbage in, garbage out.”
Melissa also highlighted a related limitation, focusing on the sources from which the analysis draws its data. Is it meaningful to compare colloquial uses of “and” from everyday sources to formal uses of the word in statutory contexts? Are the survey responses employed in the analysis representative? Scrutinizing the origin and completeness of data is a pivotal step in determining its usefulness for the intended analysis. It is always vital to ensure that the relied-upon data is both pertinent and unbiased.
[Image: Corpus Linguistics Data Gathering]
Data Analysis: Yes or No?
This raises the fundamental question of whether the data should be used at all. Does it contribute value that is not captured through other analytical tools? Here, the methods of statutory construction come into play. Does it bring anything meaningful to the table that 38% of survey respondents believed the “distributive” meaning of “and” was appropriate when interpreting cards featuring images of animals? See the example above. (Justice Amy Coney Barrett thought so.)
The value of data is often overestimated due to perceptions of its scientific objectivity. Without an awareness of potential biases within datasets and the limits of analysis, there is a natural inclination to place undue importance on data outputs.
Takeaways
I am no corpus linguistics scholar, but the rise of data-based statutory analysis in our highest court caught my attention. It serves as a reminder of the importance of data literacy and data skepticism in all legal domains. As the hosts illustrate, if you believe that certain data analysis is flawed, it is crucial to be able to articulate why.
Data permeates every aspect of legal practice. Data in Practice is a bimonthly feature to provide practical tools for attorneys to better organize, manipulate, and understand data. Whether it’s working with basic case information, preparing document productions, or conducting exposure analyses, a more robust knowledge of Excel is guaranteed to streamline your work. A few simple tools can help attorneys more efficiently and effectively represent their clients, and better navigate a professional landscape inundated with big data.
Marie Jonas is a Partner in Folger Levin’s litigation practice group. Marie has over a decade of hands-on experience working with Excel in all aspects of her practice, from investigations to trial. If you have an idea for a topic involving practical data tips for lawyers, she can be reached at mjonas@folgerlevin.com.