Metadata
- Author: Mikkel Dengsøe from Inside Data by Mikkel Dengsøe
- Full Title:: Using AI to Build a Robust Testing Framework
- Category:: 🗞️Articles
- Document Tags:: Vibe coding
- Read date:: 2025-07-23
Highlights
To test our data models, we’ll provide some guidelines to the LLM, mostly based on this guide for testing best practices. Here’s a summary of the testing principles we give to Cursor and Claude to keep track of. (View Highlight)
General Principles
- Use warn vs. error severity levels to reflect actual impact. Errors should block deployments; warnings highlight issues to monitor.
- Data assets tagged importance: P1 (or upstream of these) require more extensive testing.
- Avoid testing business logic assumptions that aren’t visible from the SQL; focus on what can be objectively verified.
- Fewer, high-signal tests are better than too many noisy or brittle ones.
- Always leave a short comment on why each test is in place (e.g., “ensures IDs are unique to avoid joins blowing up”).

Layer-specific Guidance
- Sources: Test thoroughly using standard dbt source tests (e.g., unique, not_null, accepted_values). Mark source tables with a table_stats: true flag in YAML to activate SYNQ anomaly monitoring.
- Staging: Avoid redundant tests. Only test columns that are transformed, derived, or critical for downstream joins/filters.
- Marts: This is where business rules start to appear. Add custom tests only where you’ve validated the logic directly via SQL (e.g., verifying status values with a SELECT DISTINCT). Use this to prevent assumptions from being baked into dashboards.

Additional Tips
- Before writing a test, query the data directly to understand realistic constraints (e.g., should this value ever be zero? How many distinct values exist?).
- When in doubt, prioritise coverage on high-impact data products and the metrics they feed. (View Highlight)
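To make the layer-specific guidance concrete, here is a minimal dbt schema.yml sketch that applies these principles. The source, table, model, and column names (payments, raw_transactions, stg_transactions, fct_revenue, amount_usd) and the accepted status values are hypothetical, and placing the table_stats flag under meta is an assumption; adapt it to wherever SYNQ expects it in your project.

```yaml
version: 2

sources:
  - name: payments                     # hypothetical source name
    tables:
      - name: raw_transactions
        meta:
          table_stats: true            # assumed placement of the flag that activates SYNQ anomaly monitoring
        columns:
          - name: transaction_id
            tests:
              - unique                 # ensures IDs are unique to avoid joins blowing up
              - not_null
          - name: status
            tests:
              - accepted_values:
                  values: ['pending', 'settled', 'refunded']   # illustrative; confirm with SELECT DISTINCT status first
                  config:
                    severity: warn     # a new status is something to monitor, not a reason to block deployment

models:
  - name: stg_transactions             # staging: only the transformed/derived column gets a test
    columns:
      - name: amount_usd               # derived column, critical for downstream metrics
        tests:
          - not_null

  - name: fct_revenue                  # mart feeding a P1 data product, so tested more extensively
    meta:
      importance: P1
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                severity: error        # duplicate orders would inflate revenue, so this blocks deployment
          - not_null
```

The warn/error split mirrors the severity principle above: an unexpected value in reference data only raises a flag to investigate, while a broken primary key on the P1 mart stops the deployment outright.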