Confessions of a tag hunter: Uncovering who owns what on your webpage

Entropy is rising on the Internet, which is making it harder to figure out what’s really going on behind the scenes. Mezzobit is witnessing that firsthand.

Our goal is to identify 99.99% of all third-party technology that our Audience Value Platform finds on the Internet, whether on our customers’ websites or in scans of millions of other webpages. In other words, we want to know which company is responsible for the code that comes back from this call:

Mezzobit’s database now has thousands of companies and 10,000+ code signatures to help in this process. (The example code above, incidentally, is from RadiumOne.) While our systems collect more than 100 attributes about each tag, pixel and object we find on the page, if we can’t tie an element back to a specific company, it has limited usefulness to our customers.
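
The simplest kind of code signature matches a tag's hostname against a table of known domains. The Python sketch below shows that idea. The table, the `match_company` function, and the `pixel.gwallet.com` hostname are hypothetical. Only the pairing of gwallet.com with RadiumOne comes from this post. A real signature database would likely also match on URL paths, parameters and script contents.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical signature table: registrable domain -> company.
# Only the gwallet.com -> RadiumOne pairing comes from this post;
# a real database would hold thousands of entries and richer patterns.
SIGNATURES: Dict[str, str] = {
    "gwallet.com": "RadiumOne",
}


def match_company(tag_url: str, signatures: Dict[str, str] = SIGNATURES) -> Optional[str]:
    """Return the company whose domain signature matches the tag's host, or None."""
    host = (urlparse(tag_url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then drop leading labels, so
    # "pixel.gwallet.com" is checked before "gwallet.com".
    # The bare top-level domain ("com") is never checked.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in signatures:
            return signatures[candidate]
    return None


print(match_company("https://pixel.gwallet.com/track?id=1"))  # RadiumOne
print(match_company("https://notgwallet.com/a.js"))           # None
```

Matching whole labels rather than raw string suffixes prevents a look-alike domain such as notgwallet.com from being credited to the wrong company.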

Sometimes, tag providers make it easy. They may use a subdomain of their corporate domain to serve the tag, or redirect the URL to their corporate site (as RadiumOne does: typing in www.gwallet.com takes you to its corporate site). Or they expose their corporate affiliation in the domain registration record, which is viewable through WHOIS services.
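
The WHOIS route can be partly automated by pulling the registrant organization out of a raw WHOIS response. The sketch below assumes the record text has already been fetched by some WHOIS client. It also assumes the common "Key: Value" layout. Field names vary by registry and registrar, so the `Registrant Organization` pattern and the `registrant_org` helper are illustrative assumptions, not a complete parser.

```python
import re
from typing import Optional

# Assumes the raw text of a WHOIS response in the common "Key: Value" layout.
# Field names vary by registry and registrar; "Registrant Organization" is
# typical for gTLDs, so treat this pattern as a starting point only.
_ORG_LINE = re.compile(
    r"^[ \t]*Registrant Organi[sz]ation:[ \t]*(\S.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def registrant_org(whois_text: str) -> Optional[str]:
    """Return the registrant organization from a WHOIS record, if one is listed."""
    match = _ORG_LINE.search(whois_text)
    # strip() also removes a trailing "\r" from CRLF-terminated records.
    return match.group(1).strip() if match else None


record = "Domain Name: EXAMPLE.COM\nRegistrant Organization: Example Corp\nRegistrant Country: US\n"
print(registrant_org(record))  # Example Corp
```

The pattern uses `[ \t]*` rather than `\s*` around the colon. That way, an empty field cannot accidentally capture the next line of the record.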

Hiding digital tracks

But what do you do when you see something like this:

Looks like someone dropped a few Scrabble boards on the floor.

Executing the call for this pixel doesn’t help, nor does calling www or other subdomains. Their domain registration is masked by a privacy service (Moniker Privacy Services in Ft. Lauderdale). This is a common practice to avoid domain-related spam, and there’s an email address that’s proxied to the domain owner. But 99% of the time, queries go unanswered. Another permutation of this problem is when tags are served through content delivery networks such as Akamai or AWS CloudFront without custom domain names.
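
The two dead ends described here can be flagged automatically before the detective work starts: a privacy-masked registration, or a bare CDN hostname. In the sketch below, the marker words and CDN suffixes are illustrative and far from exhaustive. The `attribution_status` function and its labels are hypothetical. The `.example` hostname stands in for an obfuscated domain.

```python
from typing import Optional

# Illustrative, not exhaustive: words that commonly appear in the registrant
# field when a privacy or proxy service masks the real owner.
PRIVACY_MARKERS = ("privacy", "proxy", "redacted", "whoisguard")

# Default CDN hostnames with no customer branding (also illustrative).
CDN_SUFFIXES = ("cloudfront.net", "akamaihd.net", "akamaized.net")


def attribution_status(host: str, registrant_org: Optional[str]) -> str:
    """Classify how far the easy signals get us for one tag host."""
    host = host.lower().rstrip(".")
    # On a default CDN hostname, the registrant is the CDN, not the tag owner.
    if any(host == s or host.endswith("." + s) for s in CDN_SUFFIXES):
        return "cdn-default"
    # No registrant at all, or a privacy service standing in for one.
    if not registrant_org or any(m in registrant_org.lower() for m in PRIVACY_MARKERS):
        return "masked"
    return "registered"


print(attribution_status("d111111abcdef8.cloudfront.net", None))         # cdn-default
print(attribution_status("xqzvbkt.example", "Moniker Privacy Services"))  # masked
```

Anything other than "registered" falls through to the manual investigation described next.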

At this point, it becomes detective work. If it’s a script, you may look at its contents for hints of corporate identity, although most scripts are minified, which strips out a lot of this information. You look at what script calls it, what other tags the mystery object loads, and where else it appears on the page, the site, or across the Mezzobit network. You also may do an Internet search on the tag, but there’s a rich ecosystem of malware vendors who apply search engine optimization to domain names to lure people to their sites (a topic for another day).
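
Even a minified script can leak clues. Some minifiers, Terser among them, keep comments that start with `/*!` or contain `@license` by default. The hosts a script calls out to are also visible in its text. The sketch below collects both kinds of clue. The regular expressions are deliberately crude and the `identity_hints` helper is hypothetical. Its output is a starting point for investigation, not proof of ownership.

```python
import re
from typing import Dict, List

# Block comments; ownership banners often survive minification in this form.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Crude pattern for absolute or protocol-relative URLs inside the script text.
_URL_HOST = re.compile(r"(?:https?:)?//([a-z0-9-]+(?:\.[a-z0-9-]+)+)", re.IGNORECASE)
_OWNERSHIP_WORDS = re.compile(r"copyright|\(c\)|©|@license|all rights reserved", re.IGNORECASE)


def identity_hints(script: str) -> Dict[str, List[str]]:
    """Pull out the few clues about authorship that often survive minification."""
    banners = [c for c in _BLOCK_COMMENT.findall(script) if _OWNERSHIP_WORDS.search(c)]
    hosts = sorted({h.lower() for h in _URL_HOST.findall(script)})
    return {"banner_comments": banners, "referenced_hosts": hosts}


js = ('/*! Acme Widgets (c) 2015 Acme Corp */!function(){'
      'var u="https://cdn.acme-tags.example/t.js";'
      'new Image().src="//px.acme-tags.example/p.gif"}();')
print(identity_hints(js))
```

The referenced hosts can then be run through the same signature and WHOIS checks as the tag itself.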

Nearly all of these mystery tags appear several generations deep in tag chains that sprout from the webpage, and they are often involved in programmatic advertising calls. Site operators know even less about these tags’ identity than we do, because they have no legal relationship with the vast majority of tag providers.
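
"Several generations deep" can be measured if a scanner records which tag loaded which. Treat those records as edges in a graph rooted at the page, then run a breadth-first search. The sketch below assumes such (parent, child) pairs are available. The tag names and the `tag_depths` helper are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def tag_depths(edges: List[Tuple[str, str]], root: str = "page") -> Dict[str, int]:
    """Map each tag to its generation: 1 = placed directly on the page."""
    children: Dict[str, List[str]] = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # First visit is the shortest chain; skipping seen tags also breaks cycles.
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    depth.pop(root)
    return depth


chain = [("page", "tagmanager.js"), ("tagmanager.js", "ssp.js"),
         ("ssp.js", "sync.gif"), ("ssp.js", "mystery.gif")]
print(tag_depths(chain))
```

Here the hypothetical mystery.gif lands at generation 3. The site operator placed only the tag manager, so it has no direct relationship with the mystery tag's provider.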

There are some interesting reasons why tag vendors do this. Some involve a cat-and-mouse game played by anti-fraud or anti-ad-blocking vendors, whose obscure domains become less effective once they are discovered. Among the more lyrical domain names we've seen are FallingFalcon.com and BudgetedBauer.com. One games publisher created 100+ differently named domains that all map to the same site with slight cosmetic differences, likely to get around web filters in schools. Still other sites have privacy-sounding names but actually host malware: a bit of bait-and-switch.

Trade groups representing the ad- and marketing-tech industries have been preaching transparency for years, and a majority of vendors are good actors. But we see a not insignificant number of tags that employ this sort of obfuscation, particularly on small and medium-sized sites with lower-quality traffic. On one such site, we saw 806 distinct tags in a single day’s worth of traffic, and 17.6% of those were of unclear origin. Luckily, those tags touched visitors far less often: they accounted for only 1.4% of all tag executions.
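
The two percentages measure different things. 17.6% of 806 distinct tags is roughly 142 tags. The 1.4% figure, by contrast, weights each tag by how often it actually fired. The sketch below computes both shares from an execution log. The log format (one tag ID per execution) and the toy data are assumptions for illustration, not Mezzobit's data.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def unclear_shares(executions: Iterable[str], unclear: Set[str]) -> Tuple[float, float]:
    """Return (share of distinct tags, share of executions) that are of unclear origin.

    `executions` holds one tag ID per tag execution in the traffic sample.
    """
    counts = Counter(executions)
    if not counts:
        return 0.0, 0.0
    distinct = sum(1 for tag in counts if tag in unclear) / len(counts)
    weighted = sum(n for tag, n in counts.items() if tag in unclear) / sum(counts.values())
    return distinct, weighted


# Toy sample: five tags, two of them unclear but rarely executed.
log = ["a"] * 50 + ["b"] * 30 + ["c"] * 16 + ["d"] * 2 + ["e"] * 2
print(unclear_shares(log, {"d", "e"}))  # (0.4, 0.04)
```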

Encouraging accountability

In the interest of building trust in the Internet ecosystem, we feel that the industry should consider the following guidelines:

  1. Any technology that touches a consumer should be served from a domain that has unmasked registration tying it back to the company.
  2. The www subdomain should redirect to the company’s corporate site or another page clearly identifying the source.
  3. If #1 or #2 are technically infeasible, the tag should have some identifier (e.g., code comments) that establishes its origin.
  4. Demand-side, supply-side, ad exchange and ad server platforms should have the ability to preflight domains and permit publishers to blacklist any offenders from participating in ad transactions.
  5. The self-regulatory schemes of the major trade groups such as the Digital Advertising Alliance (DAA) or Network Advertising Initiative (NAI) should include origin transparency in their code of conduct and permit consumers to file complaints on that basis.
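
Guidelines 1 through 4 are concrete enough that an ad platform could check them mechanically. The sketch below encodes them as a preflight check. The `DomainFacts` record, its field names and the `preflight` function are hypothetical. Gathering the facts (WHOIS lookups, following the www redirect, scanning the tag body) is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DomainFacts:
    """Hypothetical facts a scanner has already gathered about one tag domain."""
    domain: str
    registration_unmasked: bool  # guideline 1: registration ties back to the company
    www_identifies_owner: bool   # guideline 2: www redirects to an identifying page
    origin_in_tag: bool          # guideline 3: e.g. an origin comment in the tag itself


def preflight(facts: DomainFacts, blacklist: Set[str]) -> List[str]:
    """Return reasons to keep this domain out of an ad transaction (empty = pass)."""
    problems: List[str] = []
    if facts.domain in blacklist:  # guideline 4: publisher-controlled blacklist
        problems.append("blacklisted by publisher")
    # Guideline 3 is the fallback when guideline 1 or 2 is not met.
    meets_1_and_2 = facts.registration_unmasked and facts.www_identifies_owner
    if not meets_1_and_2 and not facts.origin_in_tag:
        problems.append("origin cannot be established")
    return problems


print(preflight(DomainFacts("xqzvbkt.example", False, False, False), {"xqzvbkt.example"}))
```

Treating guideline 3 as a fallback follows the wording of the list. A stricter platform could instead require all three.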

These are not unreasonable requests given consumers’ growing unease regarding data collection and tracking, and publishers’ and brand marketers’ increased emphasis on simplifying the ad supply chain and improving user experience. The email world has tried for years to implement parallel measures to stem the tide of spam, with some success.

The recently announced Trustworthy Accountability Group (TAG) takes some steps in this direction, particularly around fraud, but doesn't seem to do much about technology transparency. Its "pay to play" model also discourages adoption beyond the major players.

In an ideal world, we’d love for Internet companies to disclose what each tag or pixel actually does, but that borders on fantasy (and would make our jobs as tag hunters a lot less interesting). Baby steps toward transparency will help reward the up-front corporate citizens and further marginalize the bottom-feeders of questionable value.

If you want to see for yourself which companies are on your website, we can help.

Originally published on the Mezzobit Data Today blog.
