When developing artificial intelligence technologies such as voice biometrics, developers must weigh a range of ethical considerations to ensure that disadvantaged populations are not harmed.
Published August 29, 2024
By Nitin Verma, PhD
Fellow for AI and Society
Juana Catalina Becerra Sandoval, a doctoral student in the Department of the History of Science at Harvard University and a scholar in the Responsible and Inclusive Technologies Initiative at IBM Research, presented a talk as part of the New York Academy of Sciences’ (the Academy) Artificial Intelligence (AI) and Society seminar series. The talk, titled “What’s in a Voice? Biometric Fetishization and Speaker Recognition Technologies,” explored the ethical implications of developing and using AI-based tools such as voice biometrics. After the talk, Becerra Sandoval joined Dr. Nitin Verma, a member of the Academy’s AI & Society Fellowship class of 2023, to discuss the promises and challenges facing society as AI advances.
*Some quotes have been edited for length and clarity*
Tell me about the key findings from your previous research on voice biometrics that you covered in your talk.
I think one of the key takeaways from the story of how speaker recognition was automated is to really understand the different motivations and incentives for investing in a particular technology and a particular technological future. In the case of voice biometrics, a lot of the interest comes from sectors like finance or security and surveillance. It’s important to keep an eye on those interests and watch how they influence whether, and how, voice biometrics is developed further.
It is also important to note that although we have a narrative of technological progress, some of the underlying ideas and assumptions are very old. These include ideas about what the human body is and about how people can or cannot change their bodies and the way they speak. In the case of voice biometrics, these ideas date back to 19th-century eugenics and continue to inform research today, even though we have new technology. We must not simply view this technology as new, but ask ourselves which ideas endure or prevail over time, and in what context those ideas emerged.
What role do you think AI plays or would play in your historical consideration of voiceprint technology?
I think that’s the story of AI in some ways; it’s not a separate story. AI doesn’t emerge in the abstract. It always emerges in the context of a particular application, and many of the algorithmic techniques we have today were developed in the context of voice biometrics. What AI really brings is a shift in the logic of the ontology of the voice, where information can surface from the data or emerge from statistical methods without needing a theory of what the voice is and how it relates to the body, identity, or disease. This is the shift and transformation that artificial intelligence ushers in.
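To make that shift concrete, here is a minimal, hypothetical sketch in Python of a purely statistical speaker-similarity score. Nothing in it comes from the talk: the librosa library, the placeholder file names, and the choice of a 20-coefficient MFCC summary are all illustrative assumptions. The point is only that the code encodes no theory of what a voice is; “identity” is read off of how two sets of audio statistics happen to compare.

```python
# Illustrative sketch only (not from the talk): a naive speaker-similarity
# score computed from statistical summaries of audio features. No theory of
# the voice is encoded anywhere; the comparison is purely numerical.
# Assumes the librosa library and two placeholder WAV files.

import numpy as np
import librosa

def mfcc_profile(path: str) -> np.ndarray:
    """Summarize a recording as the mean of its MFCC feature vectors."""
    audio, sr = librosa.load(path, sr=16000)                 # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)   # shape (20, frames)
    return mfcc.mean(axis=1)                                 # one 20-dim "voiceprint"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two summary vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "enroll.wav" and "test.wav" are hypothetical recordings; a higher score is
# treated as "same speaker," regardless of why the statistics align.
score = cosine_similarity(mfcc_profile("enroll.wav"), mfcc_profile("test.wav"))
print(f"speaker similarity: {score:.3f}")
```

A production speaker-recognition system would use learned embeddings rather than raw MFCC means, but the logic is the same: the match emerges from the data, not from a theory of the voice.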
What do you think is the biggest concern about the use of AI in surveillance technologies such as voice biometrics?
Well, I think there are several concerns. The history of voice biometrics already has embedded in it an interest in the excessive surveillance and policing of Black and Latino communities. There is always an inherent risk that the technology will be used to over-police certain communities, and that voice biometrics will be folded into a larger infrastructure in which people are already being monitored and controlled via video, computer vision, or other means.
In the security sector, my biggest concern is that the relationship between voice and identity is seen as fixed and unchanging. This can create problems for people who want to change their voice, or for people whose voice changes in ways that are outside of their control, such as through injury or illness. There are numerous reasons why people might be excluded from these systems, so we want to make sure we create infrastructures that are equitable.
Turning to the other side of this question, what do you think would be some of the beneficial or ethical uses of this technology in the future?
Instead of starting with the question “What do companies or institutions need to make their work easier or more profitable?” we should focus on the questions “What kinds of tools and techniques do people want for themselves and their lives?” and “How can we use the current state of technology to achieve this?” I think it’s much more about the approach and the incentives.
There is nothing inherent in the technology that causes irreparable harm or makes it inherently unethical. It is more about: What is the particular ontology of the voice? What concept of language is being brought into the system? And for whose purposes is it being used? I am hopeful and optimistic about anything that is driven by people and their desire for a better life and a better future.
Your work brings together different strands of research: criminology, the history of technology, the study of inequality, and the history of biometric technologies themselves. What challenges and benefits have you experienced in taking this multidisciplinary approach to the topic?
I’m a historian by training and originally wanted to be a professor, but when I started working at IBM Research on the Responsible and Inclusive Technologies team, I became much closer to people who want to improve technology in very concrete ways, or, more specifically, to improve the infrastructures and cultures in which it is created.
This really pushed me to take a multidisciplinary approach and look at things not only from a historical perspective, but also to engage with the technical dimensions as well as the current political and economic structures. I think about my own immigrant background. I’m from Colombia, and I’ve always had a desire to engage with the humanities and social sciences that look critically at these aspects of society, but that may not be the same for everyone. I think the biggest challenge is appealing effectively to these different audiences.
In your talk, you described listening as a political process. Can you explain that in more detail?
I rely heavily on scholars in the fields of sound and voice studies. *The Sonic Color Line*, *Race as Sound*, and *Black Linguistics* are three of the main theoretical foundations I work with. The idea is that if we focus on listening, rather than on the voice itself as something that stands alone, we can see and contextualize how different voices are understood, described, interpreted, and classified.
The political in listening is what makes people respond to certain voices or interpret them in certain ways. Accents are a good example. The perception of who has an accent, and of what an accent sounds like, is highly contextual. The political in listening emphasizes this contextuality and how we associate things like eloquence with certain ways of speaking or the sound of certain voices rather than others.
Is there anything else you would like to add?
What strikes me about the history of voice biometrics and voiceprints is how little the public knows about what is happening. Many decisions about these technologies are made in contexts that are not publicly shared, so levels of awareness vary across the different public discourses about the ethics of AI and voice. This is very different from facial recognition, computer vision, or even toxic speech.