Put Your Retargeting to the Test

How incrementality testing proves impact and sharpens growth

Ariel Neidermeier
June 1, 2026

Cost per install (CPI) rose ~30% globally last year.

So growth teams are doing the rational thing: they're putting more budget towards reactivating the users they already have.

The logic checks out: a lapsed user already knows your app, already cleared the install friction, and already showed enough intent to download in the first place. 

But as retargeting spend has risen, so too has the skepticism: would these users have come back anyway?

It's a fair question. Standard attribution can tell you what converted after the ad fired. It cannot tell you what would have converted without it. The retargeted user who re-installs the day after seeing your ad gets credited to the campaign, regardless of whether the ad was the reason or whether they were already on their way back.

This is the gap that incrementality testing closes.

Investing in retargeting is the most efficient way to maximize your growth budget. But if you want to prove the impact of your retargeting efforts, incrementality testing is the way to do it. It's also the sharpest tool available for understanding which inventory, segments, and buying behaviors are driving incremental lift, so you can double down on the strategies that work.

How Incrementality Testing Works

Incrementality testing borrows its core mechanic from clinical research, but the application is built for ads.

You take an audience of users who all meet the same criteria. Lapsed payers from the last 30 days, dormant users who haven't opened the app in 60+ days, whatever segment you want to put under the microscope. You randomly split that audience into two groups:

  • The test group is eligible to see your ads
  • The control group is held out completely

The held-out/control group exists for one reason: to show you what would have happened if you'd done nothing. Same users, same time window, same behavior patterns, same everything. The only variable that changes between the two groups is whether they got served your ads.

At the end of the test, you measure both groups on the same outcome. Re-engagement rate. Revenue per user. Whatever the campaign was supposed to drive.

If the test group beats the control group, the difference between them is your incremental lift. That's the impact your ads caused.

  • Everything above the control group's baseline is the part you can credit to the campaign
  • Everything at or below it is conversion behavior that was going to happen regardless

This is the read that standard attribution can’t give you, because it has no way of constructing the counterfactual: the version of reality where the ad didn't run. The control group is the counterfactual, built deliberately and held apart from the campaign so the comparison stays clean.
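The arithmetic behind that comparison is simple enough to sketch. The snippet below is a minimal illustration rather than any vendor's actual calculation: the `ArmResult` structure, the `incremental_lift` helper, and the sample numbers are all hypothetical, chosen only to show how the control baseline is subtracted from the test group.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    """Outcome totals for one arm of the test (hypothetical structure)."""
    users: int        # users assigned to the arm
    conversions: int  # users who re-engaged during the test window
    revenue: float    # revenue from the arm during the test window


def incremental_lift(test: ArmResult, control: ArmResult) -> dict:
    """Compare the test arm with the control baseline on per-user rates."""
    test_rate = test.conversions / test.users
    control_rate = control.conversions / control.users
    test_rpu = test.revenue / test.users
    control_rpu = control.revenue / control.users
    return {
        # Percentage-point gap between the two re-engagement rates
        "absolute_rate_lift": test_rate - control_rate,
        # The same gap expressed relative to the control baseline
        "relative_rate_lift": (test_rate - control_rate) / control_rate,
        "relative_revenue_lift": (test_rpu - control_rpu) / control_rpu,
        # Conversions in the test arm above what the baseline predicts
        "incremental_conversions": (test_rate - control_rate) * test.users,
    }


# Made-up numbers for an 80/20 split, for illustration only
test_arm = ArmResult(users=80_000, conversions=4_800, revenue=36_000.0)
control_arm = ArmResult(users=20_000, conversions=1_000, revenue=7_500.0)
result = incremental_lift(test_arm, control_arm)
print(result)
```

Because the arms are different sizes, everything is compared per user. Comparing raw conversion counts between an 80% arm and a 20% arm would overstate the lift.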

What "clean" actually means

A trustworthy incrementality test is only as good as the discipline behind it. Three principles separate a real lift study from a test that's just telling you what you wanted to hear:

  • Random, deterministic group assignment. Every user in the audience pool has to be assigned to test or control by a method that cannot be steered, gamed, or accidentally biased. If higher-value users disproportionately end up in one group, the lift number will be inflated or understated. Random assignment is the only way to make both groups statistically comparable at the start, with any remaining differences down to chance alone.
  • No leakage. Once a user is assigned to the control group, they cannot see the ads. Not by accident, not from a different campaign, not from a duplicate impression that snuck through the supply chain. Any exposure that touches the control group contaminates the read.
  • The same measurement logic on both sides. Conversions are counted the same way for both groups, in the same time window, using the same definitions. The MMP postback rules don't change between arms. The look-back windows don't shift. The only difference being measured is the ad exposure itself.

When all three hold, the resulting number is causal: this is the lift your ads drove.
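Two of these principles are easy to audit after the fact. The sketch below assumes a hypothetical impression log (a list of dicts with a `device_id` key) and a pre-test metric such as prior 30-day revenue per user. The function names and data shapes are illustrative assumptions, not a description of any specific platform's tooling.

```python
from statistics import mean, variance


def find_leaked_devices(control_ids, impression_log):
    """Return control-group device IDs that appear in the impression log."""
    exposed = {row["device_id"] for row in impression_log}
    return set(control_ids) & exposed


def standardized_mean_difference(test_values, control_values):
    """Balance check on a pre-test metric; values near 0 suggest comparable arms."""
    pooled_sd = ((variance(test_values) + variance(control_values)) / 2) ** 0.5
    return (mean(test_values) - mean(control_values)) / pooled_sd


# Hypothetical data: one control device was served an ad by mistake
control_ids = {"device-c1", "device-c2"}
impression_log = [
    {"device_id": "device-t1", "campaign": "retargeting"},
    {"device_id": "device-c2", "campaign": "retargeting"},  # leaked exposure
]
leaked = find_leaked_devices(control_ids, impression_log)

# Hypothetical pre-test revenue per user for each arm
smd = standardized_mean_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(leaked, smd)
```

A non-empty `leaked` set means the control group was contaminated. A standardized mean difference far from zero on a pre-test metric suggests the two arms did not start out comparable.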

How RZR Runs Incrementality Testing

RZR’s Incrementality Framework is one standardized, holdout-based methodology, applied the same way to every client and wired into how we optimize and report. Four principles hold it together.

Tamper-proof group assignment via hash binning

Every device in the audience pool is assigned to either the test or control group using deterministic MD5 hash binning on its IDFV (Apple's Identifier for Vendor).

Here's how it works:

  • Each IDFV is hashed with the MD5 algorithm, and the hash is mapped to a bucket value between 0 and 99, distributed close to uniformly across the audience
  • Buckets 0 through 79 go to the test group (80% of the audience)
  • Buckets 80 through 99 form the control group (20% of the audience)
  • The same IDFV always lands in the same bucket within a given phase

No randomness at bid time. No possibility of a user drifting between groups. No chance of a single device appearing in both arms of the study. The assignment is fixed before the test begins, and RZR shares the full IDFV list with bucket assignments so the client can audit every device independently.

This is what removes the most common form of test contamination: users who would have been in the control group accidentally getting served impressions because the assignment logic was probabilistic instead of deterministic.
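Deterministic hash binning of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not RZR's production code: the ID normalization (trim and uppercase) and the modulo-100 mapping from digest to bucket are guesses at reasonable details the article does not spell out, and the sample IDs are synthetic.

```python
import hashlib
import uuid


def bucket_for(idfv: str) -> int:
    """Map an IDFV to a bucket from 0 to 99 using its MD5 digest."""
    normalized = idfv.strip().upper()  # assumption: IDs are normalized before hashing
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100       # assumption: bucket = 128-bit digest mod 100


def assign_group(idfv: str) -> str:
    """Buckets 0-79 go to the test group, buckets 80-99 to the control group."""
    return "test" if bucket_for(idfv) < 80 else "control"


# Synthetic device IDs, generated only to check the split
sample_ids = [str(uuid.UUID(int=i)) for i in range(10_000)]
test_share = sum(assign_group(i) == "test" for i in sample_ids) / len(sample_ids)
print(f"share assigned to test: {test_share:.3f}")
```

Because the bucket depends only on the ID, anyone holding the same IDFV list can recompute every assignment and confirm that no device sits in both arms.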

Mid-test reshuffle to eliminate carry-over bias

In any random split, one group can end up slightly heavier on power users by pure chance. Run a single phase and that imbalance bakes into the result.

RZR's test runs in two phases:

  • Phase 1: 14 days live
  • Cool-off: 3-day pause for Phase 1 analysis and audience reshuffle
  • Phase 2: 14 days live with reshuffled bin ranges

Any random imbalance from Phase 1 gets averaged out by Phase 2. This is the cleanest available defense against chance imbalance between the groups, and it is the reason a 31-day RZR test produces a more defensible number than a single-pass holdout of the same duration.
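The two-phase structure can be illustrated with a short sketch. The 14/3/14-day schedule comes from the description above. Everything else is an assumption made for illustration: the article does not say which bin ranges are used in Phase 2 or how the phases are combined, so the shifted control range (buckets 0 to 19) and the pooling of counts across phases are hypothetical choices.

```python
from datetime import date, timedelta

PHASE_DAYS = 14
COOL_OFF_DAYS = 3

# Hypothetical control ranges; Phase 2 holds out a different slice of buckets
CONTROL_BUCKETS = {1: range(80, 100), 2: range(0, 20)}


def group_in_phase(bucket: int, phase: int) -> str:
    """Return the arm a bucket belongs to in the given phase."""
    return "control" if bucket in CONTROL_BUCKETS[phase] else "test"


def schedule(start: date) -> dict:
    """Lay out both phases and the cool-off gap from a start date."""
    p1_end = start + timedelta(days=PHASE_DAYS - 1)
    p2_start = p1_end + timedelta(days=COOL_OFF_DAYS + 1)
    p2_end = p2_start + timedelta(days=PHASE_DAYS - 1)
    return {
        "phase1": (start, p1_end),
        "phase2": (p2_start, p2_end),
        "total_days": (p2_end - start).days + 1,
    }


def pooled_relative_lift(phases) -> float:
    """Pool counts across phases, then compare test and control rates.

    Each phase is (test_conversions, test_users, control_conversions, control_users).
    """
    test_rate = sum(p[0] for p in phases) / sum(p[1] for p in phases)
    control_rate = sum(p[2] for p in phases) / sum(p[3] for p in phases)
    return test_rate / control_rate - 1


plan = schedule(date(2026, 6, 1))
lift = pooled_relative_lift([
    (4_800, 80_000, 1_000, 20_000),  # Phase 1, made-up counts
    (4_400, 80_000, 1_050, 20_000),  # Phase 2, made-up counts
])
print(plan["total_days"], round(lift, 4))
```

With these ranges, a device in bucket 85 is held out in Phase 1 and eligible for ads in Phase 2, so a cluster of unusually valuable users cannot sit in the same arm for the whole test.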

Direct supply only and frequency discipline

Incrementality results are only trustworthy if the underlying impressions are real. Resold inventory, duplicate impressions, and non-transparent supply paths inflate baseline reach without driving genuine engagement, which distorts the comparison in both directions.

What keeps the test clean:

  • Direct supply only. Retargeting bids run on trusted, transparent inventory. Resold and non-transparent paths are excluded at the supply path level.
  • SPO filtering. RZR's supply path optimization engine filters suspicious traffic before any bid is placed.
  • Strict frequency caps. Over-exposure would skew re-engagement rates. Caps prevent that.
  • Even pacing. Stable campaign-level delivery across the flight. No front-loaded delivery that would distort the comparison between phases.
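As a rough illustration of the last two items, the sketch below shows a per-device daily frequency cap and a flat daily budget. The cap value, the class and function names, and the budget figure are hypothetical. The article does not state the caps or pacing logic actually used, so treat this as one simple way such controls could work.

```python
from collections import defaultdict


class FrequencyCap:
    """Allow at most `max_per_day` impressions per device per day (hypothetical)."""

    def __init__(self, max_per_day: int):
        self.max_per_day = max_per_day
        self._counts = defaultdict(int)  # (device_id, day) -> impressions served

    def allow(self, device_id: str, day: str) -> bool:
        """Return True and record the impression if the device is under its cap."""
        key = (device_id, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


def even_daily_budget(total_budget: float, flight_days: int) -> list:
    """Split a phase budget into equal daily amounts so delivery is not front-loaded."""
    return [total_budget / flight_days] * flight_days


cap = FrequencyCap(max_per_day=2)  # cap value chosen only for illustration
decisions = [cap.allow("device-t1", "2026-06-01") for _ in range(3)]
daily = even_daily_budget(14_000.0, 14)
print(decisions, daily[0])
```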

Bigger test pool, sharper intelligence

RZR's match rate is one of the highest in the industry. The reason is structural: we own the infrastructure. 4 data centers, our own NVIDIA GPU cluster, 6M queries per second, 220B auctions per day. We see more of the open internet than almost anyone, which means more of your test group is actually exposed to ads during the test window. Bigger test pool, sharper lift signal, cleaner number at the end.

What you get at the end

Within 5 business days of test close, RZR delivers a full results package built for independent verification:

  • Lift results deck. Re-engagement lift, ROAS delta, and user quality benchmarked against a clean holdout
  • IDFV-level report. Full device-level report including bids, impressions, and clicks, reconcilable against your MMP
  • Q&A session. A walkthrough with our team to validate every number independently before any decisions are made

How Incrementality Testing Sharpens Growth for Mobile Gaming Studios

Below are two examples of RZR’s Incrementality Framework in action, and what the results told the two mobile game publishers about their retargeting spend.

Paxie Games: +44% incremental revenue, hidden in plain sight

Paxie Games, a casual gaming studio with an IAA and IAP monetization mix, wanted statistical proof that their Android retargeting spend in the U.S. was driving genuine lift. RZR built custom audiences split by monetization behavior (IAA watchers vs. IAP buyers), ran each cohort through its own optimization model, and measured results against a 20% control group with Agresti-Coull confidence intervals.

The lift was significant:

  • +44% incremental revenue overall vs. control
  • +118% revenue uplift from payers
  • +25% revenue uplift from non-payers
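For readers unfamiliar with it, the Agresti-Coull interval is a confidence interval for a binomial proportion, such as the share of users in an arm who convert. The sketch below shows the standard formula with made-up counts. How the interval was applied in the Paxie study is not described in the article, so this is a general illustration rather than a reconstruction of that analysis.

```python
import math


def agresti_coull_interval(successes: int, trials: int, z: float = 1.96) -> tuple:
    """Agresti-Coull interval for a proportion (z = 1.96 gives roughly 95%)."""
    n_adj = trials + z ** 2                     # adjusted sample size
    p_adj = (successes + z ** 2 / 2) / n_adj    # adjusted proportion
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Made-up conversion counts for an 80/20 split
test_low, test_high = agresti_coull_interval(4_800, 80_000)
control_low, control_high = agresti_coull_interval(1_000, 20_000)

# Non-overlapping intervals are a simple, conservative sign of a real difference
intervals_separate = test_low > control_high
print((test_low, test_high), (control_low, control_high), intervals_separate)
```

The interval narrows as the arm grows, which is one reason a control group needs enough users for the comparison to be meaningful.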

The U.S. result became the benchmark. Paxie used it to scale retargeting into the UK, France, Germany, Canada, Australia, Japan, and Austria, increased retargeting spend by 50%, and named RZR their exclusive retargeting partner.

FOMO Games: +51% incremental revenue, 1.8x D7 LTV lift

FOMO Games had scale with Traffic Escape, with strong DAUs and a large churned user base. The growth team wanted to know whether retargeting could bring lapsed users back, and whether the return was incremental or just re-attribution. RZR ran the test across iOS and Android, split by payer and non-payer cohorts, with an 80/20 test/control split over three weeks.

The results:

  • +51% incremental uplift in retargeting vs. control
  • +63.7% payer uplift, +69% non-payer uplift
  • 1.8x D7 LTV lift for re-engaged users
  • 2x retargeting spend increase after the test closed

The test confirmed that IAA-heavy titles can unlock meaningful value from lapsed users, and it gave FOMO the confidence to scale retargeting as a long-term growth lever rather than a tactical experiment.

TL;DR

  • Retargeting investment is growing. Reactivating a lapsed user costs less than acquiring a new one, and rising CPIs make that gap wider every year.
  • Attribution can't prove what your retargeting is actually driving. It can only show what happened after the ad fired. The conversions that would have come back organically still get credited to the campaign.
  • Incrementality testing closes that gap. A clean test, with a deterministic control group held out from the campaign, gives you a causal read on what your ads caused versus what was going to happen anyway.
  • The discipline behind the test is what makes the number trustworthy. Random group assignment, no leakage between arms, identical measurement logic on both sides. 
  • RZR's methodology is built for clean reads. Deterministic MD5 hash binning prevents user drift between groups. A mid-test reshuffle averages out random imbalance. Direct supply, SPO filtering, and frequency discipline keep impressions real. An industry-leading match rate ensures enough of your test group is actually exposed to ads for the comparison to mean something.
  • The results compound. Paxie Games found +44% incremental revenue and scaled retargeting across seven new markets. FOMO Games found +51% incremental lift and doubled their retargeting spend.

Ready to put your retargeting to the test?

Run RZR’s Incrementality Framework on a segment or campaign, get a defensible lift number in 31 days, and scale what's working.

© 2026 RZR GLOBAL INC