Devin Coldewey, writing for TechCrunch: Anthropic’s latest tactic to stop racist AI: Asking it ‘really really really really’ nicely
The problem of alignment is an important one when you’re setting AI models up to make decisions in matters of finance and health. But how can you reduce biases if they’re baked into a model from biases in its training data? Anthropic suggests asking it nicely to please, please not discriminate or someone will sue us. Yes, really.
These LLMs are trained on us, and despite some people’s insistence that there’s no such thing as structural or significant cultural discrimination against certain minority groups — and despite these companies’ best efforts to tamp it down — discrimination is still baked into these unfeeling models.
[T]hey checked that changing things like race, age, and gender does have an effect on the model’s decisions in a variety of situations, like “granting a work visa,” “co-signing a loan,” “paying an insurance claim,” and so on. It certainly did, with being Black far and away resulting in the strongest discrimination, followed by being Native American, then being nonbinary.
Again, this discrimination was found in Claude 2.0 after Anthropic had already done work to make its model neutral on race and gender.
This is also a good opportunity for me, an LLM optimist, to state clearly that examples like this are prime reasons these AI tools should be used in moderation, and rolled out to assist with people’s work only very carefully.