Data Clean Rooms Explained: Q&A with Clearcode and Aqilliz [VIDEO]


Data clean rooms have exploded in popularity as a result of the decline in third-party cookies and mobile IDs in programmatic advertising and AdTech.

In this video interview, Michael Sweeney, Head of Marketing at Clearcode, asked Gowthaman Ragothaman, CEO of Aqilliz, a series of questions about data clean rooms.

Get in contact with Michael and Gowthaman on LinkedIn

Below is the transcript from the video interview above.

Michael Sweeney: Hello, everyone. My name is Michael Sweeney and I’m Head of Marketing at Clearcode. In today’s video, I’m joined by Gowthaman, or Gman for short, who is the CEO of Aqilliz.

Today we’re going to be talking about data clean rooms.

Welcome, Gman. Thank you very much for joining me.

Michael Sweeney: Tell us a few words about yourself and Aqilliz, how you started, and then we’ll jump into some questions about data clean rooms.

Gman: Thank you. I think everybody knows me as Gman. I’ve been in the ad industry for the past 30 years.

Three years back, I quit WPP to set up this company with the specific conviction that the marketing and advertising tech industries require a distributed ledger solution: the industry was getting decentralized, and there is a need for a decentralized solution. That was the germ of the thought. Aqilliz was born out of that conviction.

So we are a distributed ledger, or blockchain, solutions provider for marketing. A data clean room is one of the solutions that we offer.

Over and above that, we leverage clean rooms for many things, including cross-media measurement and attribution solutions.

That’s what Aqilliz is all about.

Michael Sweeney: Data clean rooms are a fairly new topic, especially within the programmatic advertising industry, and there are different types of data clean rooms. In your words, what is a data clean room?

Gman: In my view, a data clean room is a place where data owners make data sets available for collaboration, which means it should be cryptographically secure. 

It ensures that whatever is being done with the data, they’re able to record, maintain, and update those capabilities. 

Essentially, a clean room is a place that gives the certificate that this data can be collaborated on.

Michael Sweeney: What is the difference between a centralized and a decentralized data clean room?

Gman: If you look at it today, there are quite a few clean rooms in the marketplace. All the current solutions are built for an enterprise for the purpose of maintaining data. So they offer an enterprise A, or a company A, or a brand A, or a publisher A, a clean room facility where their data is safe, maintained, and kept for collaboration purposes.

When such companies want to engage with each other, let’s say a brand A and a publisher A, or a brand A, a data provider, and a publisher, with more than two or three participants, each of them will have their own clean room, by logic, because that’s how the current leaders have built it. But when they are all collaborating together, they cannot be putting all the data in another centralized repository.

It has to be in a place that is owned by all of them or is made available to all of them for the purpose of clarity, trust, and transparency.

That’s where the differentiation comes in.

A centralized clean room means another entity is taking the data out from the respective locations onto a central server at their own location, and then they take the responsibility of processing the data on the owners’ behalf. The liability shifts from the data owner to the processor. But we don’t know how the record is being kept and pushed back.

In my view, a decentralized clean room essentially means that it uses the techniques of a distributed ledger, and whatever data is being used and processed is made available to all the participants who contribute. Otherwise, the current system is not scalable. It is one plus one. Maximum one plus two. But if I have to partner with more than two participants, centralization will not help.

That’s the difference.

Michael Sweeney: I’d love to learn more about the decentralized part, specifically how you handle that and how the process works, as you said, for example, between a brand and a publisher. 

But maybe some of these questions may be asked when we talk about some of the main uses of a data clean room. 

Many people will have noticed that data clean rooms have started to emerge with all the changes in privacy over the past couple of years, specifically the end of third-party cookies in browsers such as Safari and Firefox, with the end of third-party cookies in Google Chrome not too far away.

Michael Sweeney: What are some of the main use cases or applications of a data clean room in not only advertising and media, but also potentially in some other industries as well?

Gman: The medical industry is one that is using this extensively today to understand patient data a lot better. Patient data sits in multiple sources. A clean room helps to resolve records in a privacy-compliant manner without disclosing sensitive things.

Real estate is another industry that is using it very effectively right now. 

And if you extend this logic, wherever there is a supply chain, and wherever there is a value chain that runs across a series of partners in the supply chain, you will necessarily end up having a clean room.

Somebody is getting the data, they are adding value to it, and they want to know what value others have added to it and rightfully command a price for the job they are doing. But that core data is not being used for any other purposes. 

And that is why today, that industry is desperately looking for such a solution. Privacy is at the heart of it. Consumer data is being abused beyond control, and the supply chain needs to be transparent in the value exchange that it is offered to the system. 

A clean room will become an essential component for any partner who is part of the digital advertising supply chain. Very, very soon.

Michael Sweeney: Staying on the topic of programmatic advertising, I think the typical use case that people think of when they hear about data clean rooms in programmatic advertising and digital marketing is the measurement part, right? Something that happens after an ad has been shown.

Michael Sweeney: Are there applications where data clean rooms can be used for ad targeting, audience targeting, and measurement as well?

Gman: Yeah, I’m going to talk about what we are offering. 

We see three broad buckets.

One is pure-play insights, which helps the participants learn about their consumers a lot better, and I’ll help you with a use case as well.

The second bucket is advertising or activation, where the clean room capabilities can be used for safe and secure personalized advertising.

The third one is measurement or attribution.

In all three use cases, clean rooms are becoming critical.

I’m going to refer to some of our partners as well, so it helps to bring it to life. 

For example, we are working with one of the leading sports franchises in India, an IPL franchise, and they have a sponsor ecosystem made up of all the sponsors of the team: the jersey sponsor, the beverage partner, and so on and so forth.

So they want to create a layer where the sponsors can share their first-party data with all the other member sponsors of the franchise in a compliant manner, which helps them to know the sports fans a lot better. They know much more about the fans because they are all part of the same sponsor ecosystem.

And that helps upselling and cross-selling solutions amongst the sponsor ecosystem. 

It is extremely useful in the world today because sponsorships can go beyond simple vanilla spins and sponsor details. It can bring it closer to the sport, the fan, and the consumer. That’s a fantastic use case.

So it is about insight. It’s about knowing the fan a lot better. And it’s not about advertising, but it’s about simple marketing. You see this initiative, right? 

Even in such a situation, the clean room works very well. 

The way we do it in this case is we install a node at each participant’s native location, so the data doesn’t leave the premises. Nobody pulls data into a central clean room. The fan data remains with each of the sponsors, and a query is made to understand attributes about the fans from all of them; the result sits in the federated layer of the campaign. And then it is pushed back to all the partners for any further activation purposes.

This helps the participants know very well that the data is not being abused for any other purpose. That’s the purpose of the decentralized data layer.

That’s on the insights side of it.
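
To make the node-based setup described above more concrete, here is a minimal Python sketch of a federated query. It is an illustration under stated assumptions, not Aqilliz’s implementation: the `SponsorNode` class, the `federated_count` function, and the sample fan records are all hypothetical. The point it shows is that raw records stay inside each node and only aggregate counts travel to the federated layer.

```python
from collections import Counter


class SponsorNode:
    """Hypothetical node installed on one sponsor's own premises."""

    def __init__(self, name, fans):
        self.name = name
        self._fans = fans  # raw fan records stay inside the node

    def count_by(self, attribute):
        """Answer a query with aggregate counts only, never raw rows."""
        return Counter(fan[attribute] for fan in self._fans if attribute in fan)


def federated_count(nodes, attribute):
    """Hypothetical federated layer: ask every node, then merge the aggregates."""
    total = Counter()
    for node in nodes:
        total += node.count_by(attribute)
    return dict(total)


# Invented sample data for two sponsors of the same franchise.
jersey = SponsorNode("jersey_sponsor", [{"city": "Mumbai"}, {"city": "Pune"}])
beverage = SponsorNode("beverage_partner", [{"city": "Mumbai"}, {"city": "Delhi"}])

result = federated_count([jersey, beverage], "city")
print(result)
```

This sketch does not deduplicate a fan who appears in both sponsors’ data; a real deployment would also need matching on a shared identifier.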

Many brands can use it. 

Any of the CPG brands, or any other brand, can bring in the other partners in its ecosystem.

I’m not saying this is an actual case, but take a brand I worked on for more than two decades in my earlier life. Let’s say PepsiCo.

Pepsi can partner with its pouring rights partners, Pizza Hut, Domino’s, or any of the other sponsors. Together, they can generate more insights about their own consumers than what they have today.

That’s a very powerful proposition for a clean room. Simply on insight.

Decentralization ensures that the number of partners can be as many as you want. So it is not one, not two; that’s one advantage over a centralized clean room. Otherwise, why would ten companies give their data to one company? Doesn’t make any sense. That’s on insights.

Activation is where we are currently working with Airtel. Airtel is one of our strategic investors. We are also working with Zee broadcasting in India, and with a few other publishers as well.

When cookies are deprecated and there is no way you can identify your own consumer, your first-party data is your only source of knowing who your consumers are. A clean room helps you, in a transparent and compliant manner, match your brand’s data with the publisher’s data to understand your consumers better and say: ‘Hey, there’s a match.’ So I will use your platform for retargeting, or create lookalikes to push ads on that platform.

It also helps more publishers come together to offer a marketplace on a decentralized platform. That is what we are trying to do in India right now. That’s the activation use case.

Last but not least, measurement, which is my personal favorite and one which I really love. As a planner in my early life, I always struggled to allocate my money across the platforms. Each one of them is its own walled garden. They would take care of their own attribution and say ‘I am the best.’ But a brand that spends $400 across five such platforms still doesn’t know how to allocate the money between these gardens, right?

This is where the WFA and the industry are really looking at a cross-media measurement solution. And we are partnering with Ipsos in the Middle East in offering a cross-media solution to the industry using clean room technology, where each of the publishers shares their publisher logs in a compliant manner.

Then virtual IDs are created to do deduplication and give the brand a real cross-media measurement solution.
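
As a rough illustration of why deduplication across publisher logs matters, the Python sketch below counts reach across two invented publishers. It assumes a virtual ID is simply a SHA-256 digest of a normalized identifier; the interview does not say how the virtual IDs are actually built, so `virtual_id`, `cross_media_reach`, and the sample logs are hypothetical.

```python
import hashlib


def virtual_id(identifier):
    """Hypothetical virtual ID: SHA-256 digest of a normalized identifier."""
    return hashlib.sha256(identifier.strip().lower().encode("utf-8")).hexdigest()


def cross_media_reach(publisher_logs):
    """Compare naive per-publisher reach with reach deduplicated across publishers.

    publisher_logs maps a publisher name to one identifier per impression.
    In practice each publisher would hash on its own side before sharing.
    """
    per_publisher = {
        name: {virtual_id(i) for i in identifiers}
        for name, identifiers in publisher_logs.items()
    }
    everyone = set().union(*per_publisher.values())
    return {
        "impressions": sum(len(ids) for ids in publisher_logs.values()),
        "summed_reach": sum(len(ids) for ids in per_publisher.values()),
        "deduplicated_reach": len(everyone),
    }


# Invented sample logs: b@example.com was reached on both platforms.
logs = {
    "tv_app": ["a@example.com", "b@example.com", "A@example.com "],
    "news_site": ["b@example.com", "c@example.com"],
}
report = cross_media_reach(logs)
print(report)
```

Adding up each walled garden’s own reach counts the shared person twice, while the deduplicated figure counts them once, which is the number a planner needs when allocating budget.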

To me that’s the most powerful one of the three, Mike. 

The first and second use cases are good to do, nice to do. But with my planner’s hat on, I would say: just ensure that we offer something to the industry where the brands really get their money’s worth, a true return on investment, on measurement.

So measurement is the heart of the problem. And if we can fix it, we unlock so much money in the industry.

Michael Sweeney: Definitely. You know, as you mentioned, it’s critically important, the measurement part and this is really one of the main challenges with the whole end of third-party cookies, IDs, and identity in general.

Gman: That’s right.

Michael Sweeney: Certainly all the other areas are impacted as well. But measurement, as you said, is the key for planning and for understanding whether campaigns worked or not.

Michael Sweeney: A moment ago, when you were talking about the different applications of a data clean room, you talked a lot about first-party data. Let’s say you’ve got a brand and a publisher that want to come together and use a data clean room like Aqilliz. What would they need in order to make that happen?

Michael Sweeney: A lot of companies already have a lot of first-party data that they collect. Obviously, a lot of companies are starting to invest a lot more in building up their first-party data strategies and collecting it more than ever before. But what do they actually need to tie that all together? Is it some kind of ID that needs to be matched up?

Gman: At the very basic level, I think a device ID, a mobile number, or an email address is one of the three key connecting attributes that can be used on both sides of the partnership.

Many publishers today don’t even have that. They are comfortable with consumers coming to their website without actually logging in; they just check in.

The publishers are also looking at finding some kind of identity resolution solution that helps resolve the signals to say: ‘Hey, these are my consumers,’ and they create their own ID, a proprietary ID that represents that consumer base.

That can also be tagged on to these three variables, which helps in creating cohorts or lookalikes a lot better, because it need not be only an extremely deterministic match from the clean room. Apart from the three persistent identifiers, we can find decent attributes when we are talking to each of the partners: what kind of programs they watch, what kind of movies they like.

There are many other attributes: volume of consumption, value of consumption… Any other attributes can also be added to that repository, which can be queried on both sides to find a match.

Today people use third-party cookies for chasing or tracking the consumer on the other side, but still it is only 50% efficient. People think that we are tracking, but we are not tracking very, very well. We all know that data is only half efficient.

When the cookies go away, it is going to be near zero. You’re going to shoot in the dark; you’re going to be blind in identifying the consumer.

Any of these attributes that can be matched is still better than shooting in the dark. And any attribute we can imagine will still be better than the third-party cookies being matched today, because they never really delivered on their promise.

So, to answer your question: an email address, mobile number, or device ID, plus any other attributes that we can bring in, is more than enough to find the corresponding consumers for insights and activation.

Michael Sweeney: You mentioned some quite interesting points there. Even when we look at some of the ID solutions that are on the market, many of those use things like email addresses and phone numbers to create IDs. It’s certainly not done the same way from a privacy perspective as it is in a data clean room, because there’s no real decentralization. You can set the things to be collected. There’s encryption, but there’s still this missing piece of all the other parts, like decentralization or privacy.

Gman: It’s a very important point, Mike, thanks for bringing it up.

So whether it is GDPR, CCPA, our personal data protection bill in India, the one in Indonesia, or any other market, the fundamental thing that everybody is asking for or requiring is this: as a data owner, you need to maintain a record of what’s been done with that data. And that needs to be made available upon request, either by the consumer or by the regulatory authority.

That record can only be maintained in a distributed ledger because you are sharing it with your partner. 

Let’s say I am Gman, and I am a telco user and also a reader of a publication. When I am found in both databases and they say: ‘Hey, this is Gman, tracking,’ then both the publisher and the telco need to update the record that Gman was found and tracked and sold to.

That is simply not possible if it is not on a decentralized ledger, because there has to be somebody else, a neutral layer, that maintains that record and does not have any other intent of monetizing it. And that’s very important.

I remember when we were working with Project Rearc at IAB Tech Lab years back, when we were setting these regulations, this was the first and most important thing.

It has to be a neutral entity that maintains the record of processing activity and does not have any motivation to monetize it. That can only happen if it is a further layer.

That’s the conviction with which Aqilliz was built. I just wanted to bring it to life on this point.
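
One way to picture a shared record of processing activity is an append-only log in which every entry carries the hash of the entry before it, so later tampering is detectable. The Python sketch below assumes that design purely for illustration; `ProcessingLedger`, its field names, and the sample entries are hypothetical, and it runs in a single process rather than being replicated across participants as a real distributed ledger would be.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(body):
    """Hash an entry body deterministically (sorted keys give stable JSON)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ProcessingLedger:
    """Hypothetical append-only record of who did what with which data subject."""

    FIELDS = ("actor", "action", "subject", "prev")

    def __init__(self):
        self.entries = []

    def record(self, actor, action, subject_digest):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "subject": subject_digest, "prev": prev}
        self.entries.append({**body, "hash": entry_hash(body)})

    def verify(self):
        """Recompute every hash and check that each entry links to the one before."""
        prev = GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in self.FIELDS}
            if entry["prev"] != prev or entry["hash"] != entry_hash(body):
                return False
            prev = entry["hash"]
        return True

    def history(self, subject_digest):
        """What a consumer or regulator could be shown on request."""
        return [(e["actor"], e["action"]) for e in self.entries if e["subject"] == subject_digest]


ledger = ProcessingLedger()
ledger.record("telco", "matched", "gman-digest")
ledger.record("publisher", "ad_served", "gman-digest")
print(ledger.verify(), ledger.history("gman-digest"))
```

Because each entry commits to the previous one, editing an old entry breaks verification for the whole chain, which is the property that lets a neutral party keep the record without being trusted blindly.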

Michael Sweeney: I remember when we spoke previously, you mentioned that you utilize blockchain as part of this ledger. If you go back a few years, when blockchain first came out, a lot of people in the industry were talking about the potential applications of blockchain in programmatic advertising, potentially using it for real-time bidding.

It’s interesting to see a real-life application of blockchain and to see it being used in such an appropriate way.

The way blockchain technology is used here, it’s not necessarily to run auctions but, as you said, to provide a path to show records in this ledger.

Gman: It is a great story, Mike. 

Years back, everybody thought blockchain meant jumping into cryptocurrencies and Bitcoin. They said: ‘Hey, what are you going to do with tokens in advertising?’

We steer clear of it. It is a pure SaaS platform. We are not putting the impressions or the consumer data on the public blockchain.

It’s a distributed ledger, a permissioned ledger built for specific participants to do whatever they want to do with it. And periodically, only a Merkle proof or a hash is put on the public blockchain.

So we have built a patented hybrid platform, which has patents in both the U.S. and Singapore right now. We are not a Bitcoin company, and it’s not all about blockchain; that’s why it’s clearer to say distributed ledger technology rather than blockchain technology.
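
For readers unfamiliar with the idea of publishing only a Merkle proof or a hash, here is a small Python sketch of a Merkle tree over a batch of private records. It is a generic textbook construction, not the patented platform described above: the function names and sample records are hypothetical, and odd levels are padded by repeating the last node, which is a simplification. Only the 32-byte root would need to be published; the records themselves stay on the permissioned side.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def _next_level(level):
    """Pair up nodes and hash each pair; an odd level repeats its last node."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Root hash that commits to every leaf without revealing any of them."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Sibling hashes needed to recompute the root from the leaf at `index`."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """Check that `leaf` belongs to the batch summarized by `root`."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


# Invented ledger entries kept on the private side.
records = [b"encrypted", b"matched", b"activated"]
root = merkle_root(records)           # the only value that would be published
proof = merkle_proof(records, 2)      # lets one participant prove one record
print(root.hex(), verify_proof(b"activated", proof, root))
```

A participant holding one record and its proof can show that the record was part of the anchored batch, without exposing the other records.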

Michael Sweeney: It’s always good to draw that distinction. You don’t want to get caught up in the whole ‘crypto net’.

Gman: In any case.

Michael Sweeney: You said that in order for a publisher and a brand to work together, they need to have some kind of common ID, like an email address or phone number. What’s the general process in terms of encrypting those IDs and ensuring that their privacy is maintained? What does that process look like?

Gman: Ideally, the industry universally uses SHA-256 today as an encryption technique, which is a one-way encryption. It ensures that the same input always gives you the same result back.

We use that for matching purposes. By default, our technology ensures data is encrypted. You never get your digital data out. 

The processing log also tells you that you encrypted it. 

Then, when we are matching, it is matched on the encrypted field, and when it is pushed back to the respective participants for activation, they decrypt it and then activate it.

That is also put on the record as proof of activity.

We give a full-fledged provenance ledger to both participants so that they know this data was encrypted, matched, insights generated, activated, and pushed.

So that’s the advantage of the platform. It is encrypted, and it enables encryption.
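
The matching flow described above can be sketched in a few lines of Python. Strictly speaking, SHA-256 is a one-way hash function rather than reversible encryption, so in this sketch the step where a participant recovers its own records is modelled as a lookup in a table that the participant keeps on its own premises. The function names, the normalization rule, and the sample email addresses are assumptions made for illustration only.

```python
import hashlib


def normalize(email):
    """Both sides must normalize identically or equal emails will not match."""
    return email.strip().lower()


def digest(email):
    return hashlib.sha256(normalize(email).encode("utf-8")).hexdigest()


def prepare(emails):
    """Run by each party on its own premises.

    Returns a private lookup table (digest -> original email). Only the
    digests, i.e. the keys, are shared for matching.
    """
    return {digest(e): normalize(e) for e in emails}


def match(digests_a, digests_b):
    """Matching happens on digests only; no raw email is visible here."""
    return set(digests_a) & set(digests_b)


# Invented first-party lists held by a brand and a publisher.
brand_lookup = prepare(["Ana@example.com", "raj@example.com"])
publisher_lookup = prepare(["raj@example.com ", "li@example.com"])

matched = match(brand_lookup.keys(), publisher_lookup.keys())

# Each owner maps matched digests back to its own records using its own table.
brand_matched = sorted(brand_lookup[d] for d in matched)
print(brand_matched)
```

Because the same input always produces the same digest, the two lists can be compared without either side seeing the other’s raw addresses, and only the party that already holds an address can map a matched digest back to it.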

Michael Sweeney: Is it possible to decrypt this? I think a lot of people, especially non-technical people, when they hear of encryption, if they know that I send email addresses encrypted and then it spits out some random string of letters and numbers, is it possible to then decrypt that? Or is that once it’s encrypted, that’s it, you can’t decrypt it?

Gman: You cannot decrypt. Only the source, only the owner can decrypt it.

Michael Sweeney: What kinds of channels are you working across currently? What are some of the clients that you mentioned before? What are the main channels where they are using a data clean room?

Gman: Our current focus is on publishers who are offering this as a solution to brands, to say: ‘I can offer you better targeting.’

They have a rich repository of first-party data, clearly the OTT platforms and CTV.

Compared with any other means of targeting, this gives them much better targeting, because brands can have their own first-party data set up on their own premises, match it, and then serve an ad on the OTT platform, which is very, very deterministic in its nature.

So it is predominantly publishers who are the first set of people showing a lot of interest.

And as I told you, sports franchises are another group showing interest in terms of understanding consumer insights and fan data. That’s the second one. I think that’s where we are today.

The reason for… I wouldn’t say slow adoption. The reason for the current state of adoption is that many of them still don’t have first-party data, or many of them don’t have a structured customer data platform that offers them their data in a manner that can be used.

Those are intermediary headwinds for adoption. But that’s a question of time. And everybody now knows that they need to maintain their database in their own devices.

Michael Sweeney: It’s interesting you mentioned before that a lot of the companies that are using your data clean rooms are publishers. 

I think we’ve seen this with a lot of the other solutions that have been developed in response to the end of third-party cookies, such as Seller Defined Audiences… A lot of it seems to be led by the supply side.

So, I guess, a lot of the companies, agencies, brands on the buy-side are still very much relying on the third-party cookies. And maybe we won’t see as much movement until they’re completely gone and they’ve got no other option.

Gman: There is still a state of denial that it will happen one day.

Till the proverbial cookie totally crumbles, life goes on. But some of them are getting ready for that world, because winter is coming.

Michael Sweeney: Yes, exactly. It’s getting closer and closer. Even though Google has delayed it a few times, there will come a point where there won’t be any more delays. They will just go.

So, I wanted to ask you about the IAB Tech Lab. I think it was a month or two ago that they announced they will be working on some standards around data clean rooms. I understand that Aqilliz is part of this group.

Michael Sweeney: What kinds of things would you like to see in IAB Tech Lab Standards or definitions of data clean rooms?

Gman: We have not discussed this in the group yet, so I would say it’s more from an individual or personal point of view: we need to standardize the provenance ledger.

I would say we need a consistent way in which the participants can see how the data is being used, which can simply be shared with the compliance authority to say: ‘This is what we have been doing with the data.’

The best way to show it: if you go to any product, there is a barcode that shows you the specifications of the product, and it is like a stamp, right? We need to get to that level of sophistication and have some kind of an immediate ledger that shows how your data is being used, like a trail of information.

To my mind, that needs to be standardized, because each one can generate an audit report.

Another one, as we talked about, is encryption: a consistent format of encryption, standards on encryption, and how it needs to be maintained.

It can even be a kit which can be given to all the publishers to say: ‘Look, this is an encryption kit. You just need to be certified with the encryption kit.’

So, yeah, these are my two initial thoughts. I’m sure if you’re listening, you will say: ‘Yes, let’s get down to it!’ 

Michael Sweeney: Yeah, definitely looking forward to those standards from the IAB Tech Lab to see what comes out of that.

Michael Sweeney: The last question I have is about AWS launching a data clean room. Tell us a little bit about what this announcement is? What it means for brands, publishers, etc.? Also, what does it mean for the data clean room providers that are on the market?

Gman: I woke up in the morning with this news. My immediate reaction is ‘wow!’ I think even before the industry has begun, it is getting commoditized.

We already have quite a few players offering data clean rooms today. 

Amazon coming in with its data clean room solution is a fantastic thing. They have natural product-extension capabilities for offering these clean room solutions.

And in fact, if I ignore Azure and GCP, all the other clean room providers, InfoSum, Habu, LiveRamp recently, and there are quite a few others, all use one of these cloud solution providers to offer their clean room solution. The cloud essentially sits with these three big guys.

If the big guys already offer clean rooms as a solution, it is interesting to see how this entire clean room space is going to play out in the longer run, because encryption can become standardized. The ledgers can become standard in the longer run.

So, what is the actual role of a clean room? Apart from generating insights, offering activation or measurement? 

I think I’m seeing an accelerated… It’s like watching a movie in fast-forward to the end. You had it available that it’s going to lead to. 

That was my initial reaction. 

It’s a good thing because there is no category awareness. All of us have been trying to shout from the rooftops: ‘Hey, it is important! It is important for the world to see it evolve. I need a clean room!’

The category building and awareness is definitely good from this. 

From our perspective, we see it as extremely complementary. We already use AWS today as our federated layer because Amazon offers blockchain solutions, so our node installations and our federated layer are already built on innovative solutions today.

I wouldn’t be surprised if you ask the Amazon team and they say: ‘Hey, AWS is different from Amazon Advertising. We are two separate companies. We deal with that separately.’

Amongst them they are still two separate enterprises and it is just a technical solution coming from their technology team.

It’s very good news. For us it is even better news because we are already working with them as a federated layer partner. This helps us in building adoption faster.

Michael Sweeney: Was there anything else you wanted to add? Any other final points or anything?

Gman: Not really, Mike. I think it’s a fantastic opportunity. Thank you for giving me time. It’s a much needed solution for the industry. 

I’m wishing you all the very best for doing a fantastic job of bringing awareness to these solutions from Clearcode.

Michael Sweeney: Thank you so much for the kind words and, likewise, all the best to you and your ventures in the data clean room space.

We’ll leave links to our LinkedIn accounts in the description below. I know you’re very active on LinkedIn talking about data clean rooms.

So once again, thank you very much for your time and we’ll speak again soon.

Gman: Thank you, Mike.
