
Metric Checksums

Metric checksums are a simple but powerful validation tool. Each metric total is cross-checked by summing all data rows and comparing the result against a total pulled directly from the API. They help ensure:

  • Data integrity: The ingestion is accurate and complete.
  • Pipeline reliability: Transformation logic isn’t breaking totals.
  • Stakeholder trust: Numbers reported are correct.

By comparing the sum of data rows against API-reported totals, we can quickly detect and diagnose discrepancies before they become larger reporting problems.
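As a rough illustration of the comparison described above, the ratio can be sketched as the summed rows expressed as a percentage of the API-reported total. The function name `checksum_ratio` and the handling of a zero API total are assumptions made for this example, not the pipeline's actual implementation.

```python
from typing import Iterable, Optional


def checksum_ratio(row_values: Iterable[float], api_total: float) -> Optional[float]:
    """Return the summed rows as a percentage of the API-reported total.

    Hypothetical helper: returns None when the API total is zero, because
    the ratio is undefined in that case (an assumption for this sketch).
    """
    if api_total == 0:
        return None
    return sum(row_values) * 100.0 / api_total


# Three ingested rows that add up exactly to the API total.
rows = [120.0, 80.0, 50.0]
print(checksum_ratio(rows, 250.0))  # 100.0
```

A ratio above 100% means the ingested rows add up to more than the API reports, and a ratio below 100% means they add up to less.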

Checksum viewer

Below is a view that shows the checksums for a variety of metrics for all of Thrive's active clients. Data is automatically updated twice daily at 7:30 AM and 11:30 AM PST from our BigQuery data warehouse.

How to Use

This visual dashboard helps you quickly assess pipeline health across all clients and data sources. Here's how to interpret and use it effectively:

Understanding the Visual

Each metric displays a 30-day timeline with color-coded bars representing daily checksum ratios:

  • Green bars (95-105%): Data is accurate and within acceptable variance
  • Yellow bars (80-94% or 105-115%): Minor discrepancies detected - monitor closely
  • Red bars (<80% or >115%): Significant data issues requiring immediate attention
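The color bands above can be sketched as a small classifier. The function name `bar_color` is hypothetical, and the boundary handling is an assumption: values of exactly 95% and 105% are treated as green, and anything between 94% and 95% as yellow, since the ranges listed above do not spell out those edges.

```python
def bar_color(ratio_pct: float) -> str:
    """Map a daily checksum ratio (in percent) to a bar color.

    Boundary handling is an assumption of this sketch: 95 and 105 count
    as green, 80 and 115 count as yellow.
    """
    if 95.0 <= ratio_pct <= 105.0:
        return "green"
    if 80.0 <= ratio_pct <= 115.0:
        return "yellow"
    return "red"


for ratio in (100.0, 90.0, 110.0, 70.0, 130.0):
    print(ratio, bar_color(ratio))
```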

Reading the Health Status

The status indicator at the top of each metric shows the overall health:

  • Healthy 🟢: Last 2 days averaged between 95-105%
  • Degraded Accuracy 🟡: Last 2 days averaged between 80-94%
  • Slightly Exaggerated 🟡: Last 2 days averaged between 105-115%
  • Major Issues 🔴: Last 2 days averaged below 80%
  • Overblown 🔴: Last 2 days averaged above 115%

The Average Ratio displayed reflects the full 30-day period, while the status is based on the most recent 2 days to prioritize current issues.
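To make the difference between the status and the Average Ratio concrete, here is a minimal sketch. The function names are hypothetical, the input is assumed to be a list of daily ratios ordered oldest to newest, and the exact boundary handling (for example, treating exactly 105% as Healthy) is an assumption.

```python
from statistics import mean
from typing import Sequence


def health_status(daily_ratios: Sequence[float]) -> str:
    """Derive the status label from the mean of the most recent 2 days."""
    if not daily_ratios:
        raise ValueError("at least one daily ratio is required")
    recent = mean(daily_ratios[-2:])
    if recent < 80.0:
        return "Major Issues"
    if recent < 95.0:
        return "Degraded Accuracy"
    if recent <= 105.0:
        return "Healthy"
    if recent <= 115.0:
        return "Slightly Exaggerated"
    return "Overblown"


def average_ratio(daily_ratios: Sequence[float]) -> float:
    """Mean over the displayed window (up to the last 30 days)."""
    return mean(daily_ratios[-30:])


# 28 perfect days followed by two bad ones.
ratios = [100.0] * 28 + [70.0, 80.0]
print(health_status(ratios))            # Major Issues
print(round(average_ratio(ratios), 1))  # 98.3
```

In this example the 30-day average still looks close to 100%, while the status already flags the problem from the last two days.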

Filtering Options

  • All: View all metrics across all clients (default view)
  • Unhealthy Only: Filter to show only metrics with non-healthy status - useful for quick issue identification
  • Client Dropdown: Select only the clients you want to see

How to Investigate Issues

  1. Identify Problem Areas:

    • Use "Unhealthy Only" filter to quickly find metrics requiring attention
    • Look for red or yellow bars, especially consecutive ones indicating persistent issues
  2. Expand Client Sections:

    • Click the ▶ arrow next to a client name to view their detailed metrics
    • Metrics are grouped by data source (Google Ads, Facebook Ads, etc.)
  3. Analyze Patterns:

    • Sudden drops: May indicate API connection issues or data pipeline failures
    • Sudden spikes: Could suggest duplicate data ingestion or calculation errors
    • Gradual trends: Might reflect changes in data collection methodology or business activity
    • Intermittent issues: Often point to unstable API connections or rate limiting
  4. Check Timeline:

    • Hover over individual bars to see the exact date and ratio value
    • Bars display from left (oldest) to right (most recent) - making it easy to track trends over time
  5. Next Steps:

    • For red/yellow status: Check the corresponding data source logs in the pipeline
    • For persistent issues: Review the ingestion job history and API connection status
    • For sudden changes: Compare with known client activity (campaign changes, account updates)

Best Practices

  • Daily Monitoring: Review the dashboard daily to catch issues early
  • Focus on Red First: Prioritize metrics with "Major Issues" or "Overblown" status
  • Track Trends: Watch for metrics that are consistently yellow - they may need threshold adjustments
  • Document Patterns: If certain data sources frequently show issues, investigate root causes
  • Communicate Proactively: Alert clients when their data shows anomalies before they notice

Understanding Ratio Values

  • 100%: Perfect match between summed rows and API totals
  • 95-105%: Normal variance due to rounding, timezone differences, or minor API delays
  • 80-94% or 105-115%: Moderate discrepancy - investigate if persistent
  • <80% or >115%: Significant mismatch - requires immediate investigation

Possible Scenarios

Although they are not expected, the scenarios below can occur; each is listed with its likely cause. If you see any of them, please reach out to the pipeline team.

Scenario 1: All bars green, then a red bar today

  • Possible API or Data Pipeline issue

Scenario 2: Gradual decline over several days

  • Accumulating data loss or calculation drift

Scenario 3: Single red bar followed by green

  • Temporary API hiccup or one-time ingestion retry
  • Action: Monitor for recurrence

Scenario 4: Consistently high ratios (>105%)

  • Duplicate data ingestion or incorrect aggregation
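The scenarios above could be matched against a metric's daily ratios with simple heuristics. The sketch below is illustrative only: the function name, the returned labels, and the rule that "gradual decline" means a strictly decreasing ratio across the last four days are all assumptions, not how the dashboard works.

```python
from typing import Sequence


def _is_green(ratio: float) -> bool:
    return 95.0 <= ratio <= 105.0


def _is_red(ratio: float) -> bool:
    return ratio < 80.0 or ratio > 115.0


def match_scenario(ratios: Sequence[float]) -> str:
    """Match daily ratios (oldest to newest) to one of the scenarios."""
    if len(ratios) < 4:
        return "not enough data"
    history, latest = ratios[:-1], ratios[-1]
    # Scenario 1: everything green until a red bar today.
    if _is_red(latest) and all(_is_green(r) for r in history):
        return "scenario_1"
    # Scenario 3: exactly one red bar in the past, all other bars green.
    reds = [i for i, r in enumerate(ratios) if _is_red(r)]
    if len(reds) == 1 and reds[0] < len(ratios) - 1:
        if all(_is_green(r) for i, r in enumerate(ratios) if i != reds[0]):
            return "scenario_3"
    # Scenario 4: every ratio sits above 105%.
    if all(r > 105.0 for r in ratios):
        return "scenario_4"
    # Scenario 2 (assumed rule): strictly decreasing over the last 4 days.
    tail = ratios[-4:]
    if all(a > b for a, b in zip(tail, tail[1:])):
        return "scenario_2"
    return "no match"


print(match_scenario([100.0, 100.0, 100.0, 60.0]))  # scenario_1
print(match_scenario([100.0, 98.0, 96.0, 94.0]))    # scenario_2
```

A match is only a starting point for investigation; the likely causes listed above still need to be confirmed against the pipeline logs.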