The Challenges Posed By The Emergence Of Generative AI

Ian Clark
Mar 13, 2024

Last week, I spoke at the CPD25 event on artificial intelligence, which took a “practical look” at AI and HE libraries. My talk was less practical and more theoretical, looking at AI from a critical perspective (my co-presenter took more of a look at the various tools that are available). I rarely talk at events these days unless I feel I really have something to contribute, so I was a bit rusty on the day, but it was good to meet with colleagues, and the discussions after the event were certainly thought-provoking and challenging.

One thing I will say before I get into the detail of the talk is that presenting on AI is…hard. Not just in terms of the critiques themselves, but also in ensuring your talk is timely and relevant. Things are changing and evolving on a daily basis, and it’s not easy ensuring your presentation reflects these developments. With that in mind, I decided to use a Padlet not only to collate all the resources referred to (and that have influenced my thinking) but also to ensure I didn’t need to be continually updating my slides and could instead point to additional materials (you can access the Padlet here).

In terms of the talk itself, I used roughly the same framework I have used with students and academics when I talk to them about AI, focusing on three key elements:

a) Hallucinations

b) Bias

c) Privacy

It’s very important to acknowledge that these aren’t the only issues to reflect on when it comes to AI and how it functions. Issues around copyright and, crucially, the climate crisis are also important for us to consider. Whilst I think these should be part of the conversation more generally when we talk about AI, I would argue they are both pretty hard sells when cautioning an audience of students about its use. Trying to convince students that the problem with AI is intellectual property theft when we know Sci-Hub is a thing is an exercise in futility. Equally, seeking to persuade students that this digital online resource (genAI) is bad for the environment but this other digital online resource (a specialised database) is unproblematic is…well…problematic.

Again, none of this is to say that these are unimportant issues, nor that they should be excluded from the conversation. My key focus when I talk about AI, however, is shaped by the reality that students are using these tools, we know they are using them, and we know they will use them whatever we say (see Google/Google Scholar)…so from my point of view, I tend towards creating spaces where we can engage with it. Given that reality, conversations around hallucinations, bias and privacy seem vital ones to be having.

Although I open the talk with hallucinations, I tend to think this is actually the least important of the three. Why? Because I would argue that over time this will become less of a problem, whilst the other two concerns will never go away. Generative AI will never be free from issues around bias and concerns around privacy, because they are baked into what it is and how it operates. Hallucinations also act as a gentle way into the issues. They are, after all, the most widely recognised problem with generative AI [although according to the recent HEPI report (Freeman, 2024), 7% of students believe it never produces hallucinations] and so they seem a natural starting point before reflecting on the bigger discussions around bias and privacy.

A recent publication by Haman and Školník (2023) neatly underlines the problem. When they asked ChatGPT to generate references to journal articles, they found that only 8 of the 50 references it produced had DOIs that actually existed, only 17 of the 50 could be found in a database, and 66% of the papers were non-existent. Clearly, at present, it’s not up to the required standard…but it’s important to note that we’re talking about the present. It’s inevitable that things will improve; maybe it will take time, maybe five years, maybe 12 months, who knows? For me, though, hitching our wagon to the argument that “it’s not very good” is not a sustainable position. Should we make that our focus and then find the 17/50 becomes 50/50, or even 45/50, what then? Do we risk talking ourselves into irrelevance? It’s for these reasons that I think issues around bias should be the more substantial focus when talking about AI.
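(As an aside, this kind of spot-check is easy to demonstrate to students. The sketch below is a minimal illustration rather than Haman and Školník’s actual method: it assumes Python with the requests library, and it only asks whether each DOI resolves at doi.org, which says nothing about whether the resolved record actually matches the citation the chatbot produced. The second DOI in the list is a made-up example of the kind of string a hallucinated reference might carry.)

```python
# A minimal, illustrative DOI spot-check (not Haman & Školník's actual method).
# Assumes Python 3 and the 'requests' library (pip install requests).
# It only tests whether doi.org can resolve a DOI string; a resolving DOI can
# still be attached to the wrong paper, so this is a first filter, not a verdict.
import requests

def doi_resolves(doi: str) -> bool:
    """Return True if doi.org issues a redirect for this DOI (i.e. it is registered)."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    # doi.org redirects registered DOIs to the publisher and returns 404 for unknown ones.
    return resp.status_code in (301, 302, 303)

# Hypothetical list of DOIs extracted from an AI-generated bibliography.
candidate_dois = [
    "10.1080/08989621.2023.2185514",   # a real DOI (Haman & Školník, 2023)
    "10.9999/not.a.real.paper.2024",   # a fabricated-looking DOI for illustration
]

for doi in candidate_dois:
    try:
        verdict = "resolves" if doi_resolves(doi) else "does not resolve"
    except requests.RequestException as exc:
        verdict = f"could not be checked ({exc})"
    print(f"{doi}: {verdict}")
```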

One of the key figures for me when it comes to bias in generative AI is Abeba Birhane. I find myself citing her work repeatedly whenever I talk about generative AI and the role of bias. It was Birhane, alongside Vinay Prabhu (2021), whose revelations about racist slurs and misogynistic content ultimately led to the taking down of the Tiny Images dataset. Exposing the exclusionary nature of the datasets used to train AI is critical when guiding students because, ultimately, those biases flow “downstream in model outputs” (Kidd & Birhane, 2023), outputs which will increasingly be part and parcel of existing technologies as AI is incorporated into them. And these technologies will include our specialised databases as much as smartphones or other tech. As we have already started to see, AI is being increasingly packaged up with our existing resources, putting pressure on us to provide the latest technological advances, as well as placing further pressure on our budgets.

The recent investigation into Stable Diffusion outlines how baked into AI societal biases and prejudices are (Nicoletti & Bass, 2023), with images reinforcing stereotypes and reflecting back to us a society where whiteness exclusively holds power and influence. Although there are, of course, substantial racial disparities in Western societies, it is nonetheless clear that tools like Stable Diffusion reinforce whiteness, taking existing disparities to the extreme: a professional is a white person, a fast-food worker is a person of colour…centring and privileging whiteness and the dominant culture. That dominant culture is effectively reflected in the model outputs generative AI produces, because all the data used to train it carries society’s biases. And even when Big Tech tries to correct for this, as with Google Gemini, it reproduces errors and falsehoods that ultimately result in images of historic, violent oppressors being recreated as the victims of those oppressions (Gilliard, 2024). As Gilliard (2024) notes, “algorithms inevitably perpetuate one kind of bias or another” no matter how much you try to correct for them.

It’s important to note, however, that these realities aren’t new to generative AI. I’m always keen to reinforce the point that we must not fall into the trap of suggesting new tech = BAD, old tech = GOOD. Search engines are certainly not immune to these critiques of bias, as Safiya Umoja Noble has outlined in Algorithms of Oppression. The delivery of search results depends on algorithms, and so it is subject to the same societal biases. As Noble explains, “ranking is itself information that also reflects the political, social, and cultural values of the society that search engines operate within…” (Noble, 2018). Understanding the impact of algorithms is critical to understanding how search results are presented and how generated outputs are created. Without that understanding, we cannot sufficiently engage critically with the outputs of generative AI.

Beyond technological development, there are also efforts to engage with generative AI and to address how it operates and its harmful effects. For example, participatory approaches to AI have been floated as a method for “aligning AI towards prosperity” (Birhane et al., 2022; Queerinai et al., 2023). And there are various community-based collectives seeking to raise awareness, build solidarity and find solutions to the harms caused by generative AI (Decolonial AI Manyfesto, 2024; Queer in AI, 2024).

The problems with bias also overlap with issues around privacy, placing marginalised communities at risk of harm due to the lack of security around, and crucially the lack of ownership of, their personal data. When 90% of romantic chatbots share or sell personal data and 54% won’t let users delete their data, there are clearly substantial risks of harm (Caltrider et al., 2024). As researchers have noted, not only are “purportedly inclusive AI projects…prone to ‘designing out’ certain queer lives”, they have resulted in numerous documented harms and “have outed queer people, compromising their privacy and safety…requiring community education and adaptation to meet the needs of queer people” (Queerinai et al., 2023). These harms will only increase as artificial intelligence is incorporated into existing surveillance technologies (Burgess, 2024), and it is incumbent upon us to educate users about these harms and guide them in how to mitigate their effects (Clark, 2016).

Wherever we stand on generative AI, the reality is that students are going to be using it, and there is an expectation that universities will provide support and guidance on how to use it. As the HEPI report demonstrated, only 22% of students are satisfied with the support they receive, and 62% are neutral or say they do not know (Freeman, 2024). We also have to face up to a growing divide between students: those with the means to do so purchase access to advanced generative AI tools, whereas those without rely solely on what they can access freely. Whilst it is easy to shrug our shoulders and say they shouldn’t be using AI anyway, it’s not so easy to brush it aside should we see a widening gap develop along the lines of social class. Equally, this divide risks manifesting between institutions, as wealthy institutions buy up AI products and provision them for their students, whilst others have to wrestle with whether to maintain existing journal subscriptions or purchase and incorporate the latest flashy tech (and no doubt there will be pressure from inside and outside to do so).

None of this is to suggest that embracing artificial intelligence is a necessity, nor that we must advocate for its incorporation into research practices. It is important, however, to recognise the challenges, be aware of the ramifications of its development, and seek to establish where we fit in and how best we can guide people around AI with ethics at the forefront. The best I’ve come up with so far is to expose the dangers around AI in terms of its training data and outputs, and in terms of privacy, using these as the foundation for all my engagements with students on generative AI.

It is still early days in the development of generative AI, and we will no doubt face new challenges and new concerns as the technology seeps ever deeper into our lives, becoming ever more deeply incorporated into the tools many of us use on a daily basis. The only thing we can say for certain is that many people who engage with generative AI will not be aware of the underlying issues with the technology. For me, it’s critical to help shine a light on these issues, ensuring that should people choose to engage with generative AI, they do so cautiously and, most importantly, critically.

References

Birhane, A., Isaac, W., Prabhakaran, V., Diaz, M., Elish, M. C., Gabriel, I., & Mohamed, S. (2022). Power to the People? Opportunities and Challenges for Participatory AI. Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, 1–8. https://doi.org/10.1145/3551624.3555290

Birhane, A., Prabhu, V. U., & Kahembwe, E. (2021). Multimodal datasets: Misogyny, pornography, and malignant stereotypes (arXiv:2110.01963). arXiv. https://doi.org/10.48550/arXiv.2110.01963

Burgess, M. (2024). London Underground Is Testing Real-Time AI Surveillance Tools to Spot Crime. Wired. https://www.wired.com/story/london-underground-ai-surveillance-documents/

Caltrider, J., Rykov, M., & MacDonald, Z. (2024). Happy Valentine’s Day! Romantic AI Chatbots Don’t Have Your Privacy at Heart. https://foundation.mozilla.org/en/privacynotincluded/articles/happy-valentines-day-romantic-ai-chatbots-dont-have-your-privacy-at-heart/

Clark, I. (2016). The digital divide in the post-Snowden era. Journal of Radical Librarianship, 2. https://journal.radicallibrarianship.org/index.php/journal/article/view/12

Decolonial AI Manyfesto. (2024). https://manyfesto.ai/

Freeman, J. (2024). Provide or punish? Students’ views on generative AI in higher education. Higher Education Policy Institute. https://www.hepi.ac.uk/2024/02/01/provide-or-punish-students-views-on-generative-ai-in-higher-education/

Gilliard, C. (2024, February 26). The Deeper Problem With Google’s Racially Diverse Nazis. The Atlantic. https://www.theatlantic.com/technology/archive/2024/02/google-gemini-diverse-nazis/677575/

Haman, M., & Školník, M. (2023). Using ChatGPT to conduct a literature review. Accountability in Research, 0(0), 1–3. https://doi.org/10.1080/08989621.2023.2185514

Kidd, C., & Birhane, A. (2023). How AI can distort human beliefs. Science, 380(6651), 1222–1223. https://doi.org/10.1126/science.adi0248

Nicoletti, L., & Bass, D. (2023, August 22). Humans Are Biased. Generative AI Is Even Worse. Bloomberg.Com. https://www.bloomberg.com/graphics/2023-generative-ai-bias/

Noble, S. Umoja. (2018). Algorithms of oppression: How search engines reinforce racism. NYU Press.

Queer in AI. (2024). Queer in AI. https://www.queerinai.com

Queerinai, O. O., Ovalle, A., Subramonian, A., Singh, A., Voelcker, C., Sutherland, D. J., Locatelli, D., Breznik, E., Klubicka, F., Yuan, H., J, H., Zhang, H., Shriram, J., Lehman, K., Soldaini, L., Sap, M., Deisenroth, M. P., Pacheco, M. L., Ryskina, M., … Stark, L. (2023). Queer In AI: A Case Study in Community-Led Participatory AI. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1882–1895. https://doi.org/10.1145/3593013.3594134
