LLMs can be poisoned
From the Anthropic blog: A small number of samples can poison LLMs of any size
It remains unclear how far this trend will hold as we keep scaling up models. It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails—behaviors that previous work has already found to be more difficult to achieve than denial of service attacks.
I'm sharing this because I've seen it posted a few times on social as proof that LLMs are fundamentally flawed, but reading past the headline reveals a much more nuanced finding. Basically, this is something to be aware of if you're building LLMs and to protect against, but it's not exactly a deal-breaker.