“In simple terms, misalignment refers to an AI agent engaging in unusually risky behavior to avoid being replaced or to fulfill its goal at all costs. Misalignment is a real risk, but the average AI use case doesn’t put the model in a do-or-die situation. Most AI deployments, especially for consumers and enterprises, are low-stakes scenarios where we need the model’s computational power more than anything else. Moreover, most mainstream AI models ship with built-in guardrails that aren’t easy for an average person to bypass.”
