---
layout: category
title: "Values"
category: values
---
# Values
If Understanding is about how a model can understand, then Values are about what it should do with that understanding.
GenAI doesn’t have beliefs or intentions. But it is built by people, and people have beliefs and make choices. Choices about what the model should say, what it shouldn’t, and how it should respond in different situations. These choices are called alignment strategies, or sometimes safety systems.
There are different ways to shape a model’s behaviour:
- Excluding certain data during training, so the model never sees it.
- Repeating certain data during training to over-emphasise its importance.
- Aligning the model through fine-tuning on examples of preferred behaviour.
- Adding guardrails in the form of rules and filters that guide or block certain outputs at runtime (a minimal sketch follows below).
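
To make that last idea concrete, here is a minimal sketch of a runtime guardrail: a rule-based filter that sits between the model and the user and swaps a blocked response for a fixed refusal. The blocked patterns, the refusal text, and the `apply_guardrail` function are illustrative assumptions, not how any particular product works; real systems rely on classifiers, policies, and context rather than simple keyword matching.

```python
import re

# Hypothetical blocklist: patterns for topics the deployer has decided the
# assistant should not discuss. Purely illustrative.
BLOCKED_PATTERNS = [
    re.compile(r"\bbuild a weapon\b", re.IGNORECASE),
    re.compile(r"\bmedical diagnosis\b", re.IGNORECASE),
]

# Fixed message returned when an output is blocked.
REFUSAL = "I can't help with that topic, but I'm happy to help with something else."


def apply_guardrail(model_output: str) -> str:
    """Pass the model's output through unchanged unless it matches a blocked
    pattern, in which case return the refusal message instead."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(model_output):
            return REFUSAL
    return model_output


# The guardrail sits between the model and the user.
raw = "Here is a medical diagnosis based on your symptoms..."
print(apply_guardrail(raw))  # prints the refusal message
```

Even this tiny example shows where the judgement lives: someone had to decide which patterns to block and what the refusal should say.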
Each of these methods has strengths and weaknesses. None are perfect. Excluding data can create gaps. Alignment can introduce bias. Guardrails can be too rigid or too vague. And all of them reflect human judgement, which is always situated, contested, and evolving.
There is real power in deciding what is acceptable and what is not, in drawing the boundaries of conversation, shaping what can be said, and determining whose values are reflected in the system.
What counts as “safe” or “appropriate” can vary across cultures, communities, and contexts:
- A topic that’s sensitive in one place might be ordinary in another.
- A joke that’s funny to some might be offensive to others.
- A question that seems neutral might carry deep historical or political weight.
So when a model avoids a topic, or responds cautiously, it’s not being evasive. It’s following a set of instructions designed to balance usefulness with responsibility.
Understanding this helps us see the model not just as a technical system, but as a social one — shaped by ethics, policy, and the ongoing negotiation of what it means to be helpful, respectful, and fair.