One of the key elements of a data clean room is data security, but not all data clean rooms are created equal. Some have a bigger focus on data security and privacy than others.
In this video, Michael Sweeney, Head of Marketing at Clearcode, spoke to Juan Baron, Director Business Development & Strategy (media & adv) at Decentriq about the various use cases of a data clean room and how data scientists use Decentriq to analyze data in a highly secure and privacy-compliant way.
Q&A: Data Clean Rooms
Below is the transcript from the video interview above.
Michael Sweeney: Hello, everyone. My name is Michael Sweeney, and I’m the head of marketing here at Clearcode. In today’s video, I’m joined by Juan Baron, who is the Director of Business Development and Strategy of Media and Advertising at Decentriq, and we’ll be talking about data clean rooms. So Juan, thank you very much for joining me today.
Juan Baron: Thank you, Michael. It’s a pleasure to be here.
Michael Sweeney: Tell us a little bit about yourself and Decentriq and what you do.
Juan Baron: Sure. So as my name suggests, I was born and raised in Colombia, spent many years in AdTech in the US, and migrated over to Switzerland about eight years ago. I’ve been on both the AdTech side and the publisher side. Since the rise of GDPR, the move towards privacy-first advertising has taken hold, and data clean rooms aim to fill that gap.
Decentriq is a Swiss-based data clean room provider, and our secret sauce is a hardware-based technology called confidential computing. But basically, we provide data clean rooms for a variety of industries.
Michael Sweeney: What are some of the other industries that you operate in, and the types of clients that use your data clean room for non-advertising and media use cases?
Juan Baron: At the core, Decentriq is a data science collaboration platform. That is really the nuts and bolts of the technology.
So, a lot of the use cases that we’ve seen are actually pure data scientists using our technology to collaborate with other fellow data scientists at another company.
We have a company in Asia using our data clean rooms in the trade finance sector to collaborate with logistics data for tracking and monitoring cargo shipping data. We also work with numerous pharmaceutical companies on market share data inside the data clean room.
Obviously, we work in the media and advertising sector, enabling banks and insurance companies to activate their data within premium publisher inventory.
We also have publishers collaborating with insurance companies on what are called attribute prediction models. They’re running machine learning models in order to better predict data without ever leaking individual profile information. The only thing that comes out of the data clean room is the model itself. It’s probably the most privacy-preserving type of collaboration that we’ve seen.
We even work with the Swiss Army for protecting core infrastructure against cyber attacks.
Michael Sweeney: Great, thanks for the overview. I’m quite intrigued by the use of data clean rooms by companies and industries outside of advertising and media.
For most people, when they think of data clean rooms and their applications for programmatic advertising and media, one of the key parts of the whole process is being able to match two different data sets together. You’ve got an advertiser and a publisher, and the advertiser might want to do some kind of ad targeting across the publisher’s website or properties. And in order to do that, they would use a data clean room to match the two different data sets together.
For the most part that would be some kind of linking ID that joins it all together: universal ID, mobile ID, phone number, email address, or something like that.
Michael Sweeney: How does that look when we’re talking about non-programmatic advertising and marketing use cases? Do both parties have some piece of data that can be connected together? And what can they do once they’ve done that?
Juan Baron: Not necessarily.
In our data clean rooms, we have strict user permissions on who can access and upload data. For example, it could be a data scientist who needs to compute on the data without linking the data sets. The data clean room allows data scientists to pull in different data sources and run specific models to achieve a certain result.
Michael Sweeney: That’s really interesting. I guess data clean rooms can be used across different industries to do things with data in a highly secure and privacy-friendly environment. That’s the whole value of data clean rooms for those industries.
Juan Baron: It really depends on the capabilities of the data clean room provider.
At Decentriq, we use Confidential Compute, which is hardware designed by Intel and AMD. We allow pretty much any programming language inside the data clean room. So we support not only SQL but also R or Python. These are exactly the tools data scientists use today, but through our platform they can use them in a compliant way with sensitive data.
We have per-user access permissions and interactive data workflows built into the platform. This allows data scientists to iterate quickly and collaboratively, working back and forth in a fluid manner. This is very different from what we see in traditional competitors in the space, who tend to focus on finding a specific number of users for retargeting in a safe and compliant way.
For us, that’s not the most exciting use case. The more exciting use cases are the ones that involve data science, especially in advertising.
Michael Sweeney: With the topic of data scientists using a data clean room, a lot of that programming-type stuff would be done inside the Decentriq data clean room, right? Because one of the whole points of the data clean room is that other data, apart from your own that you put in, is not extracted from it. So a lot of analysis happens within your platform. Is that right?
Juan Baron: That’s correct. And then you can allow the other collaborating party to get access to specific results.
We have what are called k-anonymity filters, a kind of privacy filter with which we can hardwire the end results to be aggregated results. That is built into the platform.
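Editor's note: the k-anonymity filter mentioned above can be illustrated with a minimal sketch. It assumes the simplest form of the idea, where only aggregated counts leave the clean room and any group with fewer than `k` records is suppressed. The function name `k_anonymous_counts`, the `segment` field, and the threshold of 5 are hypothetical and chosen for illustration; they do not describe Decentriq's implementation.

```python
from collections import Counter


def k_anonymous_counts(records, group_key, k=5):
    """Return per-group counts, suppressing any group with fewer than k records."""
    counts = Counter(record[group_key] for record in records)
    # Small groups are dropped entirely so that no result describes
    # fewer than k individuals.
    return {group: n for group, n in counts.items() if n >= k}


# Hypothetical input: one row per matched user with a single attribute.
records = (
    [{"segment": "sports"}] * 7
    + [{"segment": "travel"}] * 5
    + [{"segment": "finance"}] * 2
)

result = k_anonymous_counts(records, "segment", k=5)
print(result)  # {'sports': 7, 'travel': 5}
```

The "finance" group is withheld because it contains only two records, which is the behavior an aggregation-only output filter is meant to guarantee.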
The whole idea as well is that you can have full transparency in what kind of code is being written. So, therefore, there’s no data leakage. On top of that, because of confidential computing, there’s this thing called remote attestation, and what that provides is cryptographic proof of what is actually being done with the data.
So, the sexier term for this in our platform is called the audit log. This provides cryptographic proof that whenever somebody clicks to view the results or run the computation, we have a very transparent log between all the parties of what is actually being done, what users do, and what’s happening inside of the platform.
From a DPO perspective, this is an incredible feature because it provides a lot of assurance. It makes them sleep better at night, let’s put it this way.
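Editor's note: the interview does not describe how the audit log is built. The sketch below shows one common, generic way to make a log tamper-evident, a hash chain in which every entry commits to the previous one. It is an assumption for illustration only and is not a description of Decentriq's audit log or of hardware remote attestation. All names (`append_entry`, `verify_log`, the example users and actions) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_digest(prev_hash, event):
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, chaining it to the last entry in the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": entry_digest(prev_hash, event)})


def verify_log(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = entry_digest(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "analyst@brand.example", "action": "run_computation"})
append_entry(log, {"user": "analyst@publisher.example", "action": "view_results"})
print(verify_log(log))  # True

log[0]["event"]["action"] = "export_raw_data"  # simulate tampering
print(verify_log(log))  # False
```

Because each party can recompute the chain independently, a silent change to an earlier entry is detectable by everyone who holds a copy of the log.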
Michael Sweeney: What are some of the clients doing with your Data Clean Room in the advertising and digital marketing space?
Juan Baron: In the digital and advertising space, the most common use cases are media planning. And then, obviously, activation. Activation takes place in very different flavors. And eventually measurement. When it comes to media planning, it all starts with an overlap.
That overlap is simple to define: I have my customer dataset, and I’m going to intersect it with the publisher’s network to see what the overlap looks like.
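Editor's note: the overlap step can be sketched as a set intersection on a shared identifier. The example below assumes both parties match on email addresses that are normalized and hashed with SHA-256 before comparison, which is a common convention but is not confirmed by the interview as Decentriq's method. The function names and sample addresses are hypothetical, and hashing alone should not be read as a privacy guarantee.

```python
import hashlib


def normalize_email(email):
    """Lower-case and trim an email so trivial variants match."""
    return email.strip().lower()


def hash_email(email):
    """Hash the normalized email with SHA-256."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def overlap_report(advertiser_emails, publisher_emails):
    """Intersect two hashed email sets and report aggregate overlap only."""
    adv = {hash_email(e) for e in advertiser_emails}
    pub = {hash_email(e) for e in publisher_emails}
    matched = adv & pub
    return {
        "advertiser_records": len(adv),
        "publisher_records": len(pub),
        "matched": len(matched),
        "match_rate": len(matched) / len(adv) if adv else 0.0,
    }


advertiser = ["Ana@example.com", "ben@example.com ", "cara@example.com", "dev@example.com"]
publisher = ["ana@example.com", "cara@example.com", "zoe@example.com"]

report = overlap_report(advertiser, publisher)
print(report["matched"], report["match_rate"])  # 2 0.5
```

Only counts and a match rate are returned here; the matched identifiers themselves never appear in the output.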
You can bring in your own data identity graph if you want to expand the match rate. And then, through activation, there are a few different flavors.
We have precise activation, which requires explicit marketing consent from the brand. That’s the traditional retargeting that everyone knows and loves.
The other one is what we already built into the platform, which is top affinity segments. Based on the intersection of the data, we identify the top affinity segments of that particular publisher. Then we create audiences or deal IDs around those particular segments.
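Editor's note: the interview does not say how top affinity segments are scored. A minimal sketch, assuming a simple lift-style index (the share of matched users in a segment divided by the share of all publisher users in that segment), might look like the following. The function name, segment names, and user IDs are hypothetical, and each user is assumed to list a segment at most once.

```python
from collections import Counter


def top_affinity_segments(publisher_segments, matched_ids, top_n=2):
    """Rank publisher segments by how over-represented matched users are in them."""
    matched = [uid for uid in matched_ids if uid in publisher_segments]
    if not matched:
        return []
    total = len(publisher_segments)
    all_counts = Counter(s for segs in publisher_segments.values() for s in segs)
    matched_counts = Counter(s for uid in matched for s in publisher_segments[uid])
    # index = (matched users in segment / matched users)
    #       / (all users in segment / all users)
    scores = {
        seg: (matched_counts[seg] * total) / (len(matched) * n)
        for seg, n in all_counts.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


publisher_segments = {
    "u1": ["sports", "extreme_sports"],
    "u2": ["extreme_sports", "travel"],
    "u3": ["sports"],
    "u4": ["cooking"],
    "u5": ["sports", "travel"],
    "u6": ["cooking", "travel"],
}
matched_ids = ["u1", "u2"]  # users found in both data sets

top = top_affinity_segments(publisher_segments, matched_ids)
print(top)  # [('extreme_sports', 3.0), ('sports', 1.0)]
```

An index above 1.0 means the advertiser's customers appear in that segment more often than the publisher's audience as a whole, which is the kind of signal an audience or deal ID could be built around.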
The more sophisticated one is allowing the publisher to bring their own look-alike model inside of the data clean room. That allows them to create one big segment built on a look-alike of that data intersect. And then the only thing that actually exits the data clean room is the model itself.
And what we’ve been able to prove, because we use confidential computing and all the privacy and security guarantees that we provide, is that we, Decentriq, don’t have any physical or other way to access the data, because the encryption keys are with the data owner, not with Decentriq.
We also guarantee that not even the cloud provider has access to the data.
So what we do is that we actually back this up with legal opinions. So we have legal memorandums from prominent European legal firms backing our claims that: Yes, you don’t need marketing consent at all from the brand side in order to enable the top affinity and the look-alike model through Decentriq.
On top of that, and this is pretty groundbreaking because of the way the technology is built, we have major European publishers acknowledging that they don’t even require a joint controller agreement with the brand. So that is a game changer because of the way we build Decentriq.
And then — measurement. At the end of the day, it’s all about showing results. And for the longest time, for many years, publishers have been limited in terms of how much data they’re able to provide and show results to customers — and measurement is key.
So for the first time, they’re able to provide ad exposure data. Plus, add a lot of audience data into the measurement and provide very predictive analytics on behalf of the brand.
So, finally, premium publishers, in a way, have taken control back and are showing a lot more value than traditional programmatic advertising.
Michael Sweeney: What are some of the channels that your clients are using in your data clean room? Can you offer the data clean room in different channels like web browser advertising, in-app, CTV, or are you focusing on one channel at a time? How can your data clean room be used across different channels by different advertisers?
Juan Baron: The data clean room itself is agnostic to the channel. It really depends on the publishing partner.
In Switzerland, we have publishers that not only sell standard programmatic display advertising or native, or even in-feed video, but we also have partners that sell CTV.
So, it really is agnostic. It depends on the capabilities and reach that the particular publisher has, what kind of inventory they control, and then make their DMP data and/or CDP data available in the clean room so that the advertiser can choose which particular channels and audiences will be the most relevant based on the data intersect.
Michael Sweeney: Perfect. And in terms of the thing that ties it all together, right? Because I guess, in most cases, or maybe every case when it comes to programmatic advertising and digital marketing, if you have an advertiser and they want to match their data sets with a publisher’s data sets, there needs to be an underlying linking ID, let’s say. So is that generally the case, like 100% of the time, there needs to be some kind of underlying ID linking those two different sets together?
Juan Baron: Yes. The most prominent linking is typically the email address. But we could do a combination of multiple things in our clean room because you can write any type of code.
You can ingest an identity graph, you can ingest multiple identity graphs, if you really need to, you can do a combination of what we call fuzzy matching. So, if you really happen to have the email, phone number, first name, last name, and even home address in your data set, you can use a combination of everything in order to further increase the match rate.
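Editor's note: the combination of identifiers described above can be sketched as a tiered matcher: exact match on email, then on phone digits, then an approximate comparison of name and address. This is an illustrative assumption, not Decentriq's matching logic. The record fields, the 0.85 threshold, and the function names are hypothetical; string similarity uses `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def norm(value):
    """Lower-case and collapse whitespace; missing values become ''."""
    return " ".join(str(value or "").lower().split())


def digits(value):
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in str(value or "") if ch.isdigit())


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def is_match(rec_a, rec_b, threshold=0.85):
    """Exact match on email or phone, else fuzzy match on name plus address."""
    email_a, email_b = norm(rec_a.get("email")), norm(rec_b.get("email"))
    if email_a and email_a == email_b:
        return True
    phone_a, phone_b = digits(rec_a.get("phone")), digits(rec_b.get("phone"))
    if phone_a and phone_a == phone_b:
        return True
    name_a = norm(f"{rec_a.get('first', '')} {rec_a.get('last', '')}")
    name_b = norm(f"{rec_b.get('first', '')} {rec_b.get('last', '')}")
    addr_a, addr_b = norm(rec_a.get("address")), norm(rec_b.get("address"))
    if not (name_a and name_b and addr_a and addr_b):
        return False  # not enough information for a fuzzy comparison
    return (similarity(name_a, name_b) >= threshold
            and similarity(addr_a, addr_b) >= threshold)


crm = {"email": "ana@example.com", "first": "Ana", "last": "Keller",
       "address": "Bahnhofstrasse 10, Zurich"}
pub = {"first": "Anna", "last": "Keller", "address": "Bahnhofstr. 10, Zurich"}
other = {"first": "Ben", "last": "Meier", "address": "Seestrasse 5, Geneva"}

print(is_match(crm, pub))    # True: no shared email, but name and address are close
print(is_match(crm, other))  # False
```

Lowering the threshold raises the match rate at the cost of more false matches, so the value is something the collaborating parties would need to agree on.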
Michael Sweeney: I wanted to get back to the thing you mentioned about consent when it comes to collecting data and then using it in the data clean room. So if I’m a publisher and I have some kind of login, let’s say, where I collect email addresses, and then I would encrypt those email addresses and then upload them to Decentriq, for example, to the data clean room. Is it the case that when I collect that email address, I don’t have to ask for consent to collect it because I’ll be using it in a DCR where that personal data won’t be shared with anybody else?
Juan Baron: Well, the data clean room itself does not solve the problem. The law around consent is not about the technology; it’s about the processing of the data.
Based on our legal team, but also the law itself, what we are allowed to do is leverage data for legitimate business interest.
Let me give you an example.
Let’s say I am a bank, a big bank, let’s say I’m Barclays, and I want to advertise on The Guardian. So as Barclays, I have access to my own CRM. Perhaps I have surveys, perhaps I have Adobe Analytics, and I’m pulling all this data, and I have a very basic understanding of who my actual customers are. Maybe I have identified that they’re all male, 25-45, they’re interested in sports. Very plain and simple.
Then I’ll go to the Guardian and the typical workflow on programmatic advertising is like this: Hey Guardian, I want you to create an audience with male, 25-45 that are interested in sports, because that is what I know about my own customers.
With data clean rooms, in particular with Decentriq, take the top affinity activation case: once you intersect those data sets, you come up with a very different perspective of your own customers. Not only are they male, but the age group is actually different, it’s actually 28 to 35. And they’re not interested in sports in general, it’s more about adrenaline sports.
Now, if you look at what has happened, yes, you leveraged data on both sides, from each other’s customers, but at the end of the day, all you’re extracting is business analytics.
Therefore, based on the legal memorandums and the legal opinions that we have backing up this particular use case, no marketing consent or additional consent is required in order to extract that information, because these are just business insights.
But you’re using those insights to create a segment. At no point in time are you transferring individual profile information from one entity to the other. Neither are you giving that particular publisher access to your customer database.
Michael Sweeney: Yeah, got it. So essentially, the publisher or even the advertiser wouldn’t need to state Decentriq or any other data clean room as a vendor that they work with as well.
Like, if we imagine the typical CMP scenario, we go into a publisher’s website, and they’ve got a list of all the partners they work with. Essentially, that publisher wouldn’t need to list Decentriq as one of those partners because you’re not processing personal data.
Juan Baron: Yeah, that is correct. And that is a big change for us because, as I said, the way Decentriq is built, we have no way of even knowing what’s going on inside of the data clean room.
In some cases, we have no way of knowing what kind of code they’re running, what kind of data they’re actually uploading, or who has access to what. It’s kind of locked up in a hardware enclave protected by confidential computing.
And with confidential computing, those hardware chips, those microprocessors, are built in a way that they can only run the code that was agreed upon between the collaborating parties in the data clean room.
So at the end of the day, it’s the code, the rules, not any type of commercial agreement, that is being agreed upon by the collaborating parties.
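Editor's note: in confidential computing, the guarantee that only agreed code runs is enforced by the hardware and checked through remote attestation. The sketch below is only a software analogy of that idea, assuming each party approves a SHA-256 digest of the computation's source and the computation is refused unless every party approved exactly that digest. The class `AgreedComputation` and the party names are hypothetical and do not reflect Decentriq's API.

```python
import hashlib


def code_digest(source):
    """Fingerprint of the computation's source code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class AgreedComputation:
    """Tracks which code digest each collaborating party has approved."""

    def __init__(self, parties):
        self.parties = set(parties)
        self.approvals = {}  # party -> digest that party approved

    def approve(self, party, source):
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        self.approvals[party] = code_digest(source)

    def may_run(self, source):
        """True only if every party approved exactly this source."""
        digest = code_digest(source)
        return all(self.approvals.get(p) == digest for p in self.parties)


agreed_sql = "SELECT segment, COUNT(*) FROM overlap GROUP BY segment"
changed_sql = "SELECT * FROM overlap"

room = AgreedComputation(["brand", "publisher"])
room.approve("brand", agreed_sql)
room.approve("publisher", agreed_sql)

print(room.may_run(agreed_sql))   # True
print(room.may_run(changed_sql))  # False: nobody approved this code
```

The point of the analogy is that the agreement is on the code itself: any change to the query produces a different digest and is rejected until all parties approve it again.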
Michael Sweeney: I think it is a big game-changer, especially when we talk about using first-party data. I see that is a massive advantage for data clean rooms.
If we think about some other ways a brand or a publisher would need to activate their first-party data, I mean, if they’re talking about Universal IDs, I’m not a legal expert on the GDPR and the whole thing about the lawful basis for data processing using a universal ID, but I would assume, and I’m pretty sure this is correct, that they would be essentially processing personal data, right?
So the publisher or the brand would need to state that that’s part of the data processing and that these companies will also be processing data, but that’s not the case with data clean rooms.
Juan Baron: It depends on the data clean room.
As I said, it depends on how the data clean room is built, what they require in terms of how they actually work with the data, and what the data architecture looks like.
In the Decentriq world, the data is encrypted on the device, so if you’re using our user interface, you can use it without even being connected to our infrastructure.
Once you hit encrypt and upload, all you’re sending to Decentriq is a blob of encrypted data, which is then processed through confidential computing using specially designed chips. These chips can actually compute on data that is encrypted.
So at no point in our infrastructure can anyone access the data, not even the cloud provider.
Every single data clean room provider will always hit the DPO office, the data protection officer. And they will always ask questions.
That’s what we tend to see as well, and that’s why we’re trusted by pharmaceutical companies, insurance companies, banks, and other highly sensitive data owners.
This is a testament to the trust they put in us because of the way our technology is built and the guarantees we provide.
Not all data clean rooms are capable of doing this because of the way their infrastructure is built.
Michael Sweeney: Let’s jump to this topic about different data clean rooms that are on the market.
There has been a lot of news in the past few months about independent data clean rooms operating on the market. There is also a lot of activity around data clean room or DCR-tech provided by the Walled Gardens — Google, Meta and Amazon. Of course, AWS also announced their data clean room solution.
So let’s start with the comparison between Decentriq and other independent, non-walled garden data clean rooms.
Obviously, you’ve touched on this before about privacy and data. And I guess one of the key things that you mentioned a second ago about data clean rooms is that they are very much viewed as a highly privacy-friendly solution. Privacy and security need to be at the heart of it.
Michael Sweeney: How does Decentriq differ from the other independent data clean rooms on the market? And how do you provide the security and privacy aspects that most people would think need to come automatically with a data clean room?
Juan Baron: A key differentiating part of Decentriq is that now we’re going into the privacy-enhancing technology space.
Not only do we have a PETs environment, which is a completely new and different world, in particular for the advertising market, but we use a combination of trusted execution environments and confidential computing. As far as we know, based on the research that we’ve done and the partners that we talk to, we are the only data clean room in the space that uses this combination.
So that combination of both hardware and software is by far the most secure.
Now that being said, what we know of other data clean room providers in the space is that some of them are more like a CMO dashboard. They’re pulling in information from existing data sources and creating collaboration between those data sources, but there’s not really sophisticated data science going on behind it. It’s more about answering the chief marketing officer’s query of the day.
Others are built around data storage, which also has legal implications and limits the computational flexibility of what they can run in the data clean room itself.
With data clean rooms from Decentriq, think about it like Snapchat. They are ephemeral data clean rooms. You agree on the computation, and you can run any code: SQL, Python, R. You can even create synthetic data sets inside of the data clean room for further protection of your data.
Once you agree on the code, you upload the data into the data clean room, you compute, and then you pretty much are done with it.
We’re not in the storage business, we’re in the computational flexibility business, and the way we try to describe Decentriq is as a trusted computational layer. You can send data into Decentriq for processing or computation, and afterward you can send the results to internal analytics dashboards if you have them.
Obviously, the Walled Gardens have their own business interests.
It’s to spend more money inside of the Walled Gardens, and that is the approach that they’re gonna take.
They want to show that their inventory produces more results with very little flexibility, and they’re not going to provide any data from their own users because they’re the Walled Gardens for a reason.
AWS is the same thing, right? Its marketing is trying to help you spend more inside of Amazon advertising. And it comes with these limitations as well because they’re not independent or data input-agnostic.
So, they kind of require you to be on Amazon, or perhaps you have to be in Snowflake. Then what do you do? You’re kind of limited in terms of what you can do with that.
We do see a future where large brands will want to have a very independent, highly flexible, extremely secure data science data clean room. And large brands will demand from the Walled Gardens to provide data sets in that particular… because, at the end of the day, the Walled Gardens also need to protect the data of its own users. So they need the most secure data clean rooms as well.
So we do see a world where a very large brand will start demanding data from the Walled Gardens into their own independent data clean room in order to provide better measurement.
Michael Sweeney: Do you think that will happen in the future? Because from what I’ve read, you can’t really extract the data from Google’s Ads Data Hub, you can’t use that with other companies.
Juan Baron: That’s a very Google-centered view.
Michael Sweeney: Yes.
Juan Baron: They say: give me your data, I will ingest it, I will pretend that everything’s secure. Well, I will not pretend, I’m sure it is secure in a way, but it’s all about showing you how your data looks like in the Google Universe.
But if you’re a large, sophisticated advertiser, your entire advertising does not just live on Google.
It lives on Meta, it lives on Snapchat, it lives on programmatic advertising, and maybe you have direct deals with the New York Times and the Washington Post, and the Guardian.
If you’re a very big brand, you want to have full control and full visibility of where your marketing spend is being run and how to properly allocate budgets as the world changes. With Ads Data Hub from Google, it’s a very Google-centered view.
Michael Sweeney: Do you think there will be a point in time when Google will make their data clean room tech open to other parties?
Michael Sweeney: Or do you think it would be a situation where let’s say, a large brand that runs campaigns with Google and other independent ad tech companies and Meta, Amazon even, they’ll essentially just have a bunch of different data clean rooms that they will use?
Or do you think there’ll come a point in time where they’ll all be working together in some way? Or would it kind of be just like we’ve seen with pretty much every other area of programmatic advertising, where there’s the Walled Gardens and then there’s the independent AdTech companies?
Do you think that trend will follow into the data clean room space? Or do you think it will be a little bit different?
Juan Baron: I sure hope so. I think, at the end of the day, it’s all about trust.
Even if Google opens up their own data clean room, do you think Meta will ever upload their user data or ad exposure data into the Google data clean room? I don’t think so.
That’s exactly why we believe that independent data clean rooms are here to stay because it enables trust and control.
And those are the key factors that are definitely going to drive the adoption of data clean rooms. We’re talking about sensitive data, we’re not talking about third-party cookies, right?
A brand, let’s say a bank or even an e-commerce platform, has highly sensitive data.
This is personally identifiable information, lifetime value components, what kind of purchase history the particular customer has, or, if it’s a bank, mortgage information, credit card information, or transactional history. Very sensitive data.
And at the end of the day, the data provider needs to be fully in control.
If they are a very large advertiser, not only do they want to put the transactional data and the CRM data in the data clean room, but they also expect the Walled Gardens and the programmatic platforms of the world, or the publishers of the world, to be able to contribute.
And they also want to be in control of their own data that they’re putting inside the data clean room.
So that’s why we believe that independent data clean rooms are here to stay. And we believe that Decentriq obviously has a very strong future because of its computational flexibility, but also by far the privacy and security guarantees that we provide.
Michael Sweeney: Definitely. I think that there will absolutely be a need for independent data clean rooms, simply because of the different applications of a data clean room.
As you mentioned in your examples, some companies use Decentriq to deal with highly sensitive information that they won’t give to Google or Meta, especially because the main use case of Google’s data clean room is advertising, right?
Juan Baron: Absolutely.
Michael Sweeney: But that’s not always going to be the case.
Juan Baron: If you’re Kroger or Walmart, will you ever use AWS?
Michael Sweeney: Exactly. So there’s always a market for independent AdTech companies and data clean room companies, which is fantastic to see.
Juan Baron: Exactly.
Michael Sweeney: I wanted to ask a couple of questions about the IAB’s recent standards and guidelines that were released in February of this year, in 2023. How does interoperability work with Decentriq, starting from a basic level?
Michael Sweeney: Let’s say you’ve got one advertiser and a publisher. If an advertiser wants to use a data clean room for campaign media planning, can they really only use that with one publisher partner? Or if they wanted to work with multiple publishers, would it just be a matter of bringing those publishers on, right? Is that essentially how it would work?
Juan Baron: Yeah. That’s exactly how it works today. We have cases where we have networks of publishers collaborating with one brand, for example.
Michael Sweeney: Perfect. In terms of the interoperability between the different data clean rooms, this was kind of part of the IAB standards and guidelines. What are your thoughts on that?
Michael Sweeney: What are your thoughts on the future of the data clean room space for independent data clean room vendors in terms of interoperability between them?
Michael Sweeney: If you imagine we’ve got advertiser #1 using Decentriq, and then you’ve got a publisher that’s using a different data clean room — how do you see that situation and that future?
Do you see that it is a common thing, or do you kind of see how it is now, where there will be one data clean room, and it’ll be used by advertisers and publishers, and there won’t be any kind of interoperability between the different data clean room vendors?
Juan Baron: The topic of interoperability is the key one. Obviously, Decentriq is among the co-authors of that particular paper. And what we set out to do with that paper is just to create a baseline in terms of the future of interoperability.
Interestingly enough, it’s all about not only agreeing on the terms of privacy and security, and we take a very hard stance in terms of establishing a standard of what is really a private and secure data clean room, but also the subject of interoperability is data normalization between clean rooms, right?
In the Decentriq world, we are agnostic to the input and the output of the data. But if we’re going to ingest data from a competing data clean room because our brand or our publisher is using our data clean room, we need to make sure that the data can actually be run, computed, and intersected with whatever is going on inside of the Decentriq data clean room.
One of our largest partners here in Europe, their perspective of interoperability is not just about the data, it’s actually about the legal framework.
It’s all about making sure that we can scale the usage of data clean rooms without being bogged down in compliance.
So for them, the question is: can we somehow digitally or programmatically scale the legal frameworks around it?
My hope is that we can actually take up this topic and bring it into the subject of interoperability, because it’s a very interesting approach.
Because, as you know, programmatic advertising is all about scale without worrying. It’s about ease of use – click a button and then scale. We hopefully can actually do that through data clean rooms as well in the future.
Michael Sweeney: Yeah, definitely. And as you mentioned a moment ago, I think there are a lot of challenges with the interoperability part, especially with data clean rooms, right?
Because, by nature, especially as you mentioned with Decentriq, you guys are highly secure and private as well.
So there are a lot of challenges not only with the technical side, as you said with data normalization, but it’s also the fact that you’re not like a normal data warehouse, right? Where you can kind of do all that. There’s a lot of separation between the different data sets. You don’t even get the raw data. It’s challenging from that part of it as well.
It’ll be interesting to see how the interoperability part works out, even whether that’s something that brands and publishers even want as well.
I think that there are a lot of challenges, a lot of hurdles you have to overcome.
I think for a lot of brands and publishers, they might just be happy with how the data clean room process works now, where they use one data clean room like Decentriq, and just work with a bunch of their publishing partners rather than work with a bunch of different data clean rooms as well, because that could add a lot more complexity to their processes as well.
Juan Baron: Yeah, exactly.
Michael Sweeney: Yeah, perfect. Juan, that’s pretty much all the questions I had for today. Was there anything else you wanted to add? Was there anything that we didn’t cover that you would like to talk about?
Juan Baron: No, I mean, this was a very good conversation, so thank you very much for giving me the time.
Michael Sweeney: You’re very welcome. Thank you very much for sharing your insights. It’s always good to learn about data clean rooms.
I feel that every second day, I’m reading some kind of article about data clean rooms, and I always try to make sense of it in my head, and chatting with people like yourself certainly makes that process a lot easier. So it’s really great that you took the time to tell us about Decentriq and how your data clean rooms work, as well as share your thoughts on some of the other topics in the industry.
For today, we can wrap it up. If people want to contact you, they can obviously just head off to Decentriq. I’ll put a link to your website in the description below. I’ll also put a link to your LinkedIn profile as well. If people want to connect with you if they have any questions, that’s probably the best place to contact you.
Juan, thank you very much for your time today, and all the best with Decentriq.
Juan Baron: Thank you, Michael. It was my pleasure.
Michael Sweeney: Perfect, thank you very much, and we’ll speak to you soon.
Juan Baron: Thank you very much. Bye-bye.