Deterministic and Probabilistic Matching: How Do They Work?



Regardless of whether you are an advertiser, marketer, or publisher, there’s one thing that’s true: identifying and tracking visitors across devices is difficult, to say the least.

As it stands, there is no foolproof method for identifying online users as they move from one device to another, mainly because the traditional ways of identifying and tracking users (i.e. with cookies) weren’t designed for a multi-device world.

Cookies have been used to track users on laptop and desktop computers for many years; however, they cannot be transferred from one device to another. This means that if a visitor accesses your website on their desktop on one occasion and on their smartphone on another, they will be recorded as two different visitors rather than one.

Fortunately, there are two ways you can identify and track the same user across different devices with reasonable accuracy: deterministic matching and probabilistic matching.

What is Deterministic Matching?

Deterministic matching aims to identify the same user across different devices by linking together the user profiles that belong to that person.

User profiles consist of different pieces of data about a particular user, and each user has a separate profile on each device. For example, your user profile on your desktop will be different from the one on your smartphone.

To identify users across multiple devices, deterministic matching searches through data sets and uses a common identifier to link together all user profiles that belong to the same physical person.

Common identifiers are collected by all types of companies and can include:

  • First and last name (if uncommon)
  • Address
  • Email address
  • Date of birth
  • Phone numbers

Data Platform Development

We can build a range of data platforms such as customer data platforms (CDPs), data management platforms (DMPs), data clean rooms, data lakes and reporting dashboards.

How Does Deterministic Matching Work?

In online marketing and advertising, the most common way to deterministically match users is by email address. Because a person’s email address is unique to them, it can be used to identify and match that person across a wide range of data sets.
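The email-based matching described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's actual method: the profile fields (`device_id`, `email`), the function names, and the choice to hash addresses with SHA-256 are all assumptions made for the example. Lowercasing the whole address is also a simplification that treats differently cased spellings as the same mailbox.

```python
import hashlib
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings compare equal."""
    return email.strip().lower()


def email_key(email: str) -> str:
    """Hash the normalized address so the match table need not hold raw emails."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def match_profiles(profiles):
    """Group device profiles that share the same hashed email.

    Profiles without an email cannot be matched deterministically and are skipped.
    """
    groups = defaultdict(list)
    for profile in profiles:
        email = profile.get("email")
        if email:
            groups[email_key(email)].append(profile["device_id"])
    return dict(groups)


# Hypothetical per-device profiles (illustrative data only).
profiles = [
    {"device_id": "desktop-1", "email": "Jane.Doe@example.com"},
    {"device_id": "phone-7", "email": " jane.doe@example.com "},
    {"device_id": "tablet-3", "email": None},
    {"device_id": "phone-9", "email": "sam@example.com"},
]

matched = match_profiles(profiles)
for key, devices in matched.items():
    print(key[:12], devices)
```

Here the desktop and the first phone end up in the same group because they share an email address, while the tablet is left unmatched because it has no identifier at all, which is the scale limitation of deterministic matching in miniature.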

Applications like Facebook, Google Apps, and Twitter are able to deterministically match users quite easily, as they require users to sign in with an email address to access their services across different devices.

How deterministic matching works.

The advantage of deterministic matching is that it provides quite a high degree of accuracy (around 80-90%). However, since not all applications and websites require users to log in or provide specific information, it lacks scale.

In order to tackle the issue of scale, more and more publishers are starting to implement certain tactics to gain deterministic data (typically an email address) from their visitors/users. The two main ways they can do this are:

By way of encouragement. Publishers can encourage visitors to provide an email address by giving them more access and/or functionality for doing so.

By way of force. Publishers can restrict access and/or functionality to visitors unless they provide an email address.

While these tactics may help large publishers like news sites, they can be challenging for small- to medium-sized publishers, such as blogs, to implement, as not everyone will want to sign up to different sites just to read a few blog posts.

What is Probabilistic Matching?

Probabilistic matching uses algorithms and a range of non-unique signals (such as IP address, location, and browsing behavior) to identify the same user across different devices and applications.

Let’s imagine you own a smartphone, laptop, and tablet and use them all at home. All three devices would have the same IP address and location, and you would probably look at the same types of websites on each of them, so it’s quite probable that it’s you using all three devices.

How probabilistic matching works.

The key to achieving accurate probabilistic matching lies in linking together user profiles that contain the same highly specific pieces of information. For example, if a married couple living together each had a smartphone, a tablet, and a desktop, then every device would share the same IP address, connect to the same Wi-Fi network, and be at the same location. The way to probabilistically match each device to the right person would be to look at other pieces of personal data, such as age, gender, and interests, that are consistent across that person’s devices.
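The household example can be made concrete with a small scoring sketch. Everything here is an assumption for illustration: the signal names, the hand-picked weights, and the use of Jaccard similarity to compare interests. Real systems are typically far more elaborate and, as noted later, often proprietary.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items two sets have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical weights: household-level signals count for less than
# person-level signals, because everyone at home shares them.
WEIGHTS = {"ip": 0.2, "wifi": 0.2, "interests": 0.6}


def match_score(d1: dict, d2: dict) -> float:
    """Score between 0 and 1 for how likely two devices belong to one person."""
    score = 0.0
    if d1["ip"] == d2["ip"]:
        score += WEIGHTS["ip"]
    if d1["wifi"] == d2["wifi"]:
        score += WEIGHTS["wifi"]
    score += WEIGHTS["interests"] * jaccard(d1["interests"], d2["interests"])
    return score


# Two partners at the same address; the tablet belongs to partner A.
home = {"ip": "203.0.113.5", "wifi": "home-net"}
phone_a = {**home, "interests": {"cycling", "tech", "finance"}}
phone_b = {**home, "interests": {"gardening", "cooking", "travel"}}
tablet = {**home, "interests": {"cycling", "tech", "travel"}}

score_a = match_score(phone_a, tablet)
score_b = match_score(phone_b, tablet)
print(f"phone A vs tablet: {score_a:.2f}")
print(f"phone B vs tablet: {score_b:.2f}")
```

Both phones earn the same credit for IP address and Wi-Fi network, so those signals cannot separate the two partners. The difference in scores (roughly 0.70 versus 0.52 with these made-up numbers) comes entirely from the interest overlap.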

Probabilistic matching isn’t as accurate as deterministic matching, but it does use deterministic data sets to train the algorithms and improve accuracy. This works by taking a relatively small sample of data that contains both deterministic and probabilistic information (around a couple hundred thousand records or so) and teaching the algorithms to make the necessary connections. The newly trained algorithms are then applied to data sets that do not contain the deterministic pieces of information, which can run into the millions of records.
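A toy version of this training step might look like the following. The article does not say which algorithms vendors use, so logistic regression trained by plain gradient descent is an assumption chosen for simplicity, as are the three features (`same_ip`, `same_location`, `interest_overlap`) and the handful of labeled pairs standing in for hundreds of thousands of records.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=10000, lr=0.5):
    """Fit logistic-regression weights with full-batch gradient descent."""
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    m = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in examples:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict(model, x) -> float:
    """Estimated probability that a device pair belongs to the same person."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Device pairs described as [same_ip, same_location, interest_overlap].
# The label is 1 when both devices logged in with the same email, i.e. the
# deterministic data supplies the ground truth (values are made up).
labeled_pairs = [
    ([1, 1, 0.9], 1), ([1, 1, 0.8], 1), ([1, 1, 0.7], 1),
    ([1, 1, 0.2], 0), ([1, 1, 0.1], 0),
    ([0, 0, 0.6], 0), ([0, 0, 0.1], 0), ([0, 1, 0.3], 0),
]
model = train(labeled_pairs)

# Apply the trained model to pairs that have no login data at all.
unlabeled_pairs = [[1, 1, 0.85], [1, 1, 0.15], [0, 0, 0.1]]
for pair in unlabeled_pairs:
    print(pair, round(predict(model, pair), 3))
```

The point of the sketch is the workflow rather than the model: labels come from the deterministic matches, and the fitted model is then used where no such labels exist.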

One of the main advantages of using probabilistic matching over deterministic matching is scale, as you don’t need to gather email addresses or other pieces of personal data from users to be able to identify them across different devices.

However, probabilistic matching has a couple of disadvantages. The main ones are a lack of transparency around the matching methods and uncertainty about their accuracy, as the algorithms used to power probabilistic matching are often proprietary (i.e. secret sauce). This is especially relevant if you are relying solely on probabilistic matching to identify, track, and target users across different devices and applications.

There’s also the ever-growing issue of user privacy. Regulators such as the FTC in the US and the European Union’s Article 29 Data Protection Working Party (Art. 29 WP) are starting to class certain pieces of data (e.g. IP addresses and device IDs) as personally identifiable information (PII). As a result, companies utilizing probabilistic matching may soon need to either stop collecting these pieces of information or get consent from the consumer.

What are Deterministic and Probabilistic Matching Used For?

Cross-Device Attribution

Attribution in advertising and marketing has always been a bit of an uphill battle; as soon as progress is made in one area, another challenge emerges. One of the main issues facing advertisers and marketers today is cross-device attribution.

What makes cross-device attribution so challenging is that there is no easy way to accurately attribute conversions when consumers interact with a brand on different devices. However, deterministic and probabilistic matching can help considerably.

By identifying the same user across different devices, you can get a clearer picture of your customer’s journey and properly attribute conversions, which leads to better marketing decisions and optimized budgets.

Cross-device attribution assigns credit to different touchpoints (aka interactions) a consumer has with your brand across different channels and across different devices.
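As a rough sketch of how matched devices feed into attribution, the example below merges touchpoints from several devices into one journey per person and splits the conversion value evenly across them. The even split (often called linear attribution) is just one possible rule and is assumed here for simplicity; the `device_to_user` mapping stands in for the output of a matching step, and all field names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical result of deterministic or probabilistic matching.
device_to_user = {"laptop-1": "user-A", "phone-1": "user-A", "tablet-9": "user-B"}

touchpoints = [
    {"device_id": "phone-1", "channel": "social_ad", "ts": 1},
    {"device_id": "laptop-1", "channel": "search_ad", "ts": 2},
    {"device_id": "tablet-9", "channel": "display_ad", "ts": 3},
    {"device_id": "laptop-1", "channel": "email", "ts": 4},
]
conversions = [{"device_id": "laptop-1", "value": 90.0, "ts": 5}]


def linear_attribution(touchpoints, conversions, device_to_user):
    """Split each conversion's value evenly over the converting user's
    earlier touchpoints, regardless of which device they happened on."""
    by_user = defaultdict(list)
    for tp in touchpoints:
        user = device_to_user.get(tp["device_id"])
        if user is not None:
            by_user[user].append(tp)

    credit = defaultdict(float)
    for conv in conversions:
        user = device_to_user.get(conv["device_id"])
        path = [tp for tp in by_user.get(user, []) if tp["ts"] < conv["ts"]]
        if not path:
            continue  # No known touchpoints for this user before converting.
        share = conv["value"] / len(path)
        for tp in path:
            credit[tp["channel"]] += share
    return dict(credit)


credit = linear_attribution(touchpoints, conversions, device_to_user)
print(credit)
```

Without the device mapping, the social ad seen on the phone would receive no credit for a purchase made on the laptop. With it, the phone and laptop touchpoints form a single journey, and the other user's display ad is correctly left out.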

Cross-Device Tracking

Cross-device tracking (aka cross-device targeting) is very similar to cross-device attribution, but instead of attributing conversions to different devices, cross-device tracking aims to track users across different devices and then target them with ads.

Cross-device tracking allows you to learn more about your visitors and how they behave on their laptop, smartphone, tablet, etc., and ultimately, serve them ads across different devices.

For example, let’s imagine you run an ecommerce store that sells shoes. If a consumer searches for some shoes on their laptop, you’ll be able to display a retargeted ad to them on their smartphone, which can help increase conversions.
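In code, the lookup behind this kind of retargeting could be as simple as the sketch below. The device identifiers, the `device_to_user` mapping, and the function name are hypothetical and only illustrate the idea of finding a user's other devices once a match exists.

```python
# Hypothetical mapping produced by a matching step.
device_to_user = {
    "laptop-1": "user-A",
    "phone-1": "user-A",
    "tablet-9": "user-B",
}


def retargeting_devices(event_device: str, device_to_user: dict) -> list:
    """Return the other devices linked to whoever used `event_device`."""
    user = device_to_user.get(event_device)
    if user is None:
        return []  # Unmatched device: nothing to retarget across.
    return sorted(
        device
        for device, owner in device_to_user.items()
        if owner == user and device != event_device
    )


# A shopper browsed shoes on their laptop; find where else to show the ad.
targets = retargeting_devices("laptop-1", device_to_user)
print(targets)
```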

Closing Thoughts

Although neither deterministic nor probabilistic matching can provide 100% certainty, one thing is guaranteed: as more and more consumers adopt a multi-device habit, both tech vendors and marketers/advertisers will be focusing their energy on improving and utilizing deterministic and probabilistic matching in the very near future.

