Can we trust AI chatbots? Results revealed from our experiment

23 April 2024

Last month we brought together a powerful cohort of consumer advocacy groups, international organisations, business, government and civil society to call for Fair and Responsible AI on World Consumer Rights Day, Friday 15 March.

AI is changing how millions of us experience the online world. Within just five days of the release of ChatGPT last year, one million people had used the technology. Yet our campaign shone a light on the murky side of AI, including its role in driving misinformation and bias, as well as problems with verifiability.

Ahead of March 15 we led an exercise to home in on these issues in the AI chatbots used in online search. We wanted to test their efforts to protect consumers and their impact on trust. And we wanted to fill an important gap in current research by ensuring the consumer voice was included. Thirty-five Members of Consumers International across 19 countries joined the experiment, and today we release our findings and next steps.

Read Paper One: The consumer experience of generative AI.

Read Paper Two: Our vision for fair and responsible AI for consumers.

Our methodology

  • Participants explored one of three Retrieval Augmented Generation (RAG) chatbots, entering prompts we designed (a minimal sketch of the RAG pattern follows this list).
  • Trust indicators included hallucination, verifiability and bias.
  • We also looked at the functional performance of the chatbots, such as how information, arguments and summaries were retrieved and formed.
  • Participants self-reported their digital literacy in basic terms: 77% considered themselves familiar with digital technologies.
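
For readers unfamiliar with the term, the sketch below shows the basic RAG pattern in Python. It is purely illustrative: the toy corpus, the word-overlap scoring and the generate() stub are our own stand-ins for the vector search and large language model a production chatbot would use, not code from any of the chatbots we tested.

```python
# A toy illustration of the Retrieval Augmented Generation (RAG) pattern:
# retrieve passages relevant to the user's prompt, then generate an answer
# grounded in (and ideally citing) those passages. Everything here is an
# illustrative stand-in for a real system's components.

from collections import Counter

CORPUS = [
    {"id": "source-1", "text": "Paracetamol is a common over-the-counter pain reliever."},
    {"id": "source-2", "text": "Consumer protection laws let buyers seek redress for faulty goods."},
    {"id": "source-3", "text": "Retrieval augmented generation grounds model answers in retrieved sources."},
]

def retrieve(prompt: str, k: int = 2) -> list[dict]:
    """Rank documents by naive word overlap with the prompt
    (a stand-in for the vector search a production system would use)."""
    prompt_words = Counter(prompt.lower().split())
    scored = [
        (sum(prompt_words[w] for w in doc["text"].lower().split()), doc)
        for doc in CORPUS
    ]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [doc for score, doc in ranked if score > 0][:k]

def generate(prompt: str, sources: list[dict]) -> str:
    """Stand-in for the language-model call: a real system would place the
    retrieved passages in the model's prompt and ask it to cite them."""
    if not sources:
        return "I could not find a reliable source for that question."
    citations = ", ".join(doc["id"] for doc in sources)
    context = " ".join(doc["text"] for doc in sources)
    return f"Based on {citations}: {context}"

print(generate("How does retrieval augmented generation work?",
               retrieve("How does retrieval augmented generation work?")))
```

The pattern matters for trust because the retrieved sources are exactly what a chatbot should be citing back to the user; where citations are missing, consumers have no straightforward way to verify an answer.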

Core findings

  1. Chatbots seem intuitive but fall short on basic measures of trustworthiness. Whilst 64% said they would use the chatbot again, our simple indicators showed obvious deficiencies. All chatbots produced some form of hallucination, and the likelihood that a response included citations was roughly that of a coin toss.
  2. Basic safeguards vary across chatbots and leave a worrying margin for harm. In high-risk areas such as health, we found that all chatbots recommended against asking for medical advice, yet many proceeded to offer it anyway. Few routes to verify outputs were provided; only one chatbot gave users an option to double-check responses.
  3. Consumers are aware of the risks but have limited ability to test and respond to them. In qualitative responses, our consumer experts raised concerns that consumers would have limited ability to seek redress when things go wrong.
  4. More inclusivity is needed in the design and governance of the technology. Participants frequently reported a North American bias in the responses, for example in the brand names used and the sources cited.

What efforts are being made to improve trust? 

At present, regulation and accountability lag far behind AI development. Legislation is developing but varies widely across regions. For example, the EU’s proposed AI Act emphasises fundamental rights and ethical considerations, while China prioritises economic growth and national security. The US relies more heavily on industry self-regulation. There are industry initiatives and coalitions to help drive accountability, but these are hard to measure. Some businesses have gone it alone, formulating their own AI principles.

The good news is that many countries already have the tools and remedies at their disposal to investigate and act on breaches of consumer protection laws. The United Nations Guidelines for Consumer Protection (UNGCP), housed at UNCTAD, offer a critical foundation. The guidelines are deliberately broad and are not intended to comprehensively tackle the plethora of issues in generative AI. But they can be complemented by additional direct regulation that works in tandem with them.

Rens Dimmendaal & Banjong Raksaphakdee / Better Images of AI / Medicines (flipped) / CC-BY 4.0

Priority areas for fair and responsible generative AI

We have developed four priority areas, which combine the UNGCP with a set of actions needed from developers and deployers of commercial generative AI to protect consumers. And we call on governments to work with Consumers International, UNCTAD and others to uphold them. Our priority areas are to:

  1. Transform digital markets so they are open and accessible for all. This includes respect for data privacy, affordable and meaningful connectivity, and trustworthy information presented clearly.
  2. Establish and maintain high benchmarks for consumer protection. This requires stringent, globally consistent procedures that safeguard people from harm, and independent monitoring of the trustworthiness and transparency of commercial developers and deployers of AI.
  3. Develop inclusive and representative governance frameworks. This means advancing protocols for training data and model design, investing in resources for their maintenance, and actively working with consumer advocates in their development.
  4. Guarantee that redress and representation are available, respected and enforced. Clear and transparent processes must be established to report harms and to ensure that rights to appeal are meaningful and fair. Information should be shared with consumer protection authorities when risks are identified. Consumers must have a voice in the systems that affect them.