enabledat

By Edris Yaghob · 4 min read

Stop building data quality rules dimension-first

Most data quality rules are built from the dimensions out: completeness, validity, timeliness. They generate volume nobody can defend. Build from the consequence in, and the rule library shrinks and starts to protect something.

When the risk genuinely is data quality, a rule is the right tool. This is the one place in governance where writing a rule is the answer. So you would expect it to be the easy part. Instead it is where most teams produce their largest pile of work with the least to show for it.

The problem is not that they build rules. It is where they start.

Most teams build rules dimension-first

They start from the data quality dimensions as categories and work down. Completeness, so write rules that check for nulls. Validity, so write rules that check format. Timeliness, so check the load date. The dimensions become a checklist and rules get generated to cover each box.

It feels rigorous. You can show a stakeholder a grid with every dimension covered and every element ticked. What it skips is the part that matters: nobody stopped to understand the element. So when a rule reports "100 percent complete," nobody can say who benefits. When it drops to "50 percent valid," nobody can say who loses, or whether the failing half is a real problem or a legitimate exception the rule was never taught about.

The rule reports a number against nothing, and reports it with false confidence.

Start from the consequence, not the dimension

Before you write a rule, ask what specifically goes wrong in the business when this element is wrong, missing, or stale. Whose decision breaks? Whose number lies? Name the consequence in operational terms, and name the person who would actually feel it. Only then ask what kind of breakage the element is exposed to, which is where the dimensions finally earn their place.

Dimension-first: start from a category and write a rule to cover it. Generates volume. The rules cannot defend themselves.

Consequence-first: start from the business consequence and its owner, then design the rule around it. Generates few rules, each one defensible.

A rule built this way knows what it protects. When it passes, you can name who benefits. When it fails, you can name who loses. When it fires on a value that is technically off but legitimately so, you have the context to recognise it, because the conditions the element behaves under were part of the design from the start.
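One way to picture a rule that "knows what it protects" is as a small record that carries its consequence, its owner, and its scope alongside the check itself. The sketch below is illustrative only: the `ConsequenceRule` class, its field names, and the order example (a delivery address that matters only for physical orders) are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Record = Dict[str, Any]


@dataclass(frozen=True)
class ConsequenceRule:
    """A rule that carries what it protects, not just what it checks."""
    element: str
    consequence: str                       # what goes wrong in the business
    owner: str                             # who actually feels it
    applies_to: Callable[[Record], bool]   # conditions under which the check is meaningful
    is_ok: Callable[[Record], bool]        # the check itself

    def evaluate(self, record: Record) -> str:
        # Records outside the rule's scope are not failures.
        if not self.applies_to(record):
            return "out_of_scope"
        return "pass" if self.is_ok(record) else "fail"

    def explain(self, outcome: str) -> str:
        if outcome == "fail":
            return f"{self.owner} loses: {self.consequence}"
        if outcome == "pass":
            return f"{self.owner} is protected from: {self.consequence}"
        return "Legitimately outside the rule's scope; nothing to triage"


# Hypothetical example: an empty address is fine for a digital order.
address_rule = ConsequenceRule(
    element="delivery_address",
    consequence="the parcel cannot be dispatched",
    owner="Head of fulfilment",
    applies_to=lambda r: r["kind"] == "physical",
    is_ok=lambda r: bool(r["delivery_address"]),
)

orders = [
    {"id": 1, "kind": "physical", "delivery_address": "1 High St"},
    {"id": 2, "kind": "physical", "delivery_address": None},
    {"id": 3, "kind": "digital", "delivery_address": None},
]

for order in orders:
    outcome = address_rule.evaluate(order)
    print(order["id"], outcome, "->", address_rule.explain(outcome))
```

The design choice worth noticing is the separate `applies_to` condition: a value that is technically empty but legitimately so never reaches the check, so it never shows up as a failure.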

The same element, two different rules

Take employee tax classification, the field that tells payroll how to withhold tax.

A dimension-first approach writes three rules: flag empty values, flag values outside the code set, flag mismatches with the country code. Run them on a real table and you get noise. Some empty values are wrong because onboarding dropped the field. Some are correct because the person is a contractor whose classification lives elsewhere, or the record is mid-onboarding. The rule does not know the difference, so half the firings are false and nobody can tell which without going back to source every time.
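To make the noise concrete, here is a minimal sketch of the three dimension-first rules run over a handful of made-up records. The field names, the code sets, and the sample data are all assumptions for illustration; real tax codes and payroll schemas will look different.

```python
# Hypothetical code sets; not real tax authority codes.
VALID_CODES = {"A", "B", "C"}
CODES_BY_COUNTRY = {"GB": {"A", "B"}, "DE": {"C"}}

# Made-up sample records covering the cases described in the text.
records = [
    {"id": 1, "worker_type": "employee", "onboarding": "complete", "country": "GB", "tax_class": None},      # onboarding dropped the field
    {"id": 2, "worker_type": "contractor", "onboarding": "complete", "country": "GB", "tax_class": None},    # classification lives elsewhere
    {"id": 3, "worker_type": "employee", "onboarding": "in_progress", "country": "DE", "tax_class": None},   # mid-onboarding
    {"id": 4, "worker_type": "employee", "onboarding": "complete", "country": "DE", "tax_class": "C"},       # fine
    {"id": 5, "worker_type": "employee", "onboarding": "complete", "country": "GB", "tax_class": "Z"},       # bad code
]


def is_empty(rec):
    """Completeness: flag empty values."""
    return rec["tax_class"] is None


def is_outside_code_set(rec):
    """Validity: flag values outside the code set."""
    return rec["tax_class"] is not None and rec["tax_class"] not in VALID_CODES


def mismatches_country(rec):
    """Consistency: flag values not allowed for the record's country."""
    allowed = CODES_BY_COUNTRY.get(rec["country"], set())
    return rec["tax_class"] is not None and rec["tax_class"] not in allowed


rules = {
    "completeness": is_empty,
    "validity": is_outside_code_set,
    "country_match": mismatches_country,
}

# Collect the ids each rule fires on.
firings = {name: [r["id"] for r in records if rule(r)] for name, rule in rules.items()}

for name, ids in firings.items():
    print(name, ids)
```

In this sample the completeness rule fires on records 1, 2, and 3, although only record 1 is a genuine defect, and record 5 is reported twice by two different rules. Nothing in the output tells the reader which firings matter.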

A consequence-first rule starts elsewhere. If classification is missing for an employee whose onboarding is complete, payroll withholds at the wrong rate, the year-end statement does not match the tax authority, and the company faces a finding and back-payments. The owner is the head of payroll. That consequence shapes the rule: it fires only where onboarding is complete, excludes contractors, and checks the value against the tax authority's code set. Same element, same data. One defensible rule instead of three and a triage queue.
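The rule described above could be sketched as follows. This is a minimal illustration under stated assumptions: the field names, the `TAX_AUTHORITY_CODES` set, and the sample records are hypothetical, and the owner and consequence strings simply restate the paragraph.

```python
from typing import Optional

# Hypothetical stand-in for the tax authority's code set.
TAX_AUTHORITY_CODES = {"A", "B", "C"}

OWNER = "Head of payroll"
CONSEQUENCE = "wrong withholding rate, year-end mismatch, finding and back-payments"

# Made-up sample records.
records = [
    {"id": 1, "worker_type": "employee", "onboarding": "complete", "tax_class": None},
    {"id": 2, "worker_type": "contractor", "onboarding": "complete", "tax_class": None},
    {"id": 3, "worker_type": "employee", "onboarding": "in_progress", "tax_class": None},
    {"id": 4, "worker_type": "employee", "onboarding": "complete", "tax_class": "C"},
    {"id": 5, "worker_type": "employee", "onboarding": "complete", "tax_class": "Z"},
]


def in_scope(rec: dict) -> bool:
    """The consequence only exists for employees whose onboarding is complete."""
    return rec["worker_type"] == "employee" and rec["onboarding"] == "complete"


def check_classification(rec: dict) -> Optional[str]:
    """Return a failure reason, or None if the record passes or is out of scope."""
    if not in_scope(rec):
        return None
    if rec["tax_class"] is None:
        return "classification missing"
    if rec["tax_class"] not in TAX_AUTHORITY_CODES:
        return "classification not in tax authority code set"
    return None


# Build the list of findings: (record id, reason) for each real failure.
findings = []
for rec in records:
    reason = check_classification(rec)
    if reason is not None:
        findings.append((rec["id"], reason))

for rec_id, reason in findings:
    print(f"record {rec_id}: {reason} | owner: {OWNER} | at risk: {CONSEQUENCE}")
```

Only records 1 and 5 are reported here. The contractor and the mid-onboarding record never enter the check, so there is nothing to triage, and every firing arrives with the owner and the consequence attached.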

The work dimension-first skips is translation

Consequence-first asks for two translations. First, turn a loosely stated business need into clear business rules. "We should not pay contractors as full-time employees" becomes "any payroll record classified as full-time must match a full-time contract in the employment system." Then turn that rule into the specific data quality controls that protect it: flag payroll records with no matching contract, flag contract-type mismatches, flag records where the contract has ended but pay is still active.
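The second translation, from business rule to controls, can be sketched in a few lines. Everything named here is an assumption for illustration: the shape of the payroll and contract records, the `check_payroll` function, and the finding labels. The mismatch control below compares classification to contract type in general, which covers the full-time case from the example.

```python
from datetime import date

# Hypothetical extract from the employment system, keyed by employee id.
contracts = {
    "E1": {"contract_type": "full_time", "end_date": None},
    "E2": {"contract_type": "contractor", "end_date": None},
    "E3": {"contract_type": "full_time", "end_date": date(2024, 1, 31)},
}

# Hypothetical payroll records.
payroll = [
    {"employee_id": "E1", "classification": "full_time", "pay_active": True},
    {"employee_id": "E2", "classification": "full_time", "pay_active": True},  # contract says contractor
    {"employee_id": "E3", "classification": "full_time", "pay_active": True},  # contract has ended
    {"employee_id": "E4", "classification": "full_time", "pay_active": True},  # no contract on file
]


def check_payroll(payroll, contracts, as_of):
    """Apply the three controls and return (employee_id, finding) pairs."""
    findings = []
    for rec in payroll:
        emp = rec["employee_id"]
        contract = contracts.get(emp)

        # Control 1: payroll record with no matching contract.
        if contract is None:
            findings.append((emp, "no_matching_contract"))
            continue

        # Control 2: payroll classification disagrees with contract type.
        if rec["classification"] != contract["contract_type"]:
            findings.append((emp, "contract_type_mismatch"))

        # Control 3: contract has ended but pay is still active.
        end = contract["end_date"]
        if rec["pay_active"] and end is not None and end < as_of:
            findings.append((emp, "pay_active_after_contract_end"))
    return findings


findings = check_payroll(payroll, contracts, as_of=date(2024, 3, 1))
for emp, finding in findings:
    print(emp, finding)
```

Each control traces back to the same business rule, so a finding on any of them can be explained in terms of the original need rather than in terms of a dimension.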

Dimension-first lets you skip both translations. You go straight from "this is critical" to "write a completeness rule." That is why it is faster, and why it produces rules nobody can defend.

The hard part

The translation is hard, and it stays hard across 30 or 50 elements. The mind running unaided starts collapsing the steps around element twelve and drifting back to the dimension-first habit. And the organisation around you is built for the old way: stakeholders ask for dimensional metrics, vendors sell tools shaped around them, and Friday deadlines make doing the translation properly feel like a delay.

enabledat holds the method as a structure. It walks you from the business need, to the business rules that express it, to the controls that protect them, and pulls the discipline back when it starts to slip.

Request early access to the beta.
