
How will bots see your content?

Your customers are no longer coming to your website the way they used to. Many websites have seen traffic drop as users put their questions to bots and bots supply the answers. Bots click through to web pages far less often than humans do, and the share of referral clicks appears to be falling.

Web publishers are aware of the threat they face. So far, their response has been to make themselves more attractive to bots: optimizing to be noticed by them (GEO, or generative engine optimization), or making their pages easier for bots to work with (Google's WebMCP is a recent example). This legacy thinking still frames the problem as one of visibility: getting recognized in the crowd.

But bots are not human, and they don't need to be courted. The old psychology of wooing no longer works. If bots need something, they will take it from your website, whether you invite them or not. In most cases, they will take the content even if you don't want them to.

The problem websites must solve now is ensuring that bots extract the right content from your site. If your organization cares about the accuracy and relevance of what bots deliver, your existing HTML content, designed for web browsers and human surfers, is not what bots need. And JavaScript, a foundation of modern websites, is a liability for bots.

AI platforms are evolving rapidly. They are moving away from indiscriminately scraping the web for training data and toward retrieval-augmented generation (RAG), where they retrieve relevant information first and then extract answers from it. AI platforms have also adopted the Model Context Protocol (MCP) standard, which, when enabled, lets them access business content directly. Third-party MCP platforms such as Scite and Tollbit have already emerged to connect content publishers with AI platforms.
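The retrieval step in RAG can be illustrated with a toy sketch. This is not any platform's actual method; the document texts and the term-overlap scoring are invented for illustration. The idea is simply that the relevant content is fetched first, and the answer is extracted from it afterward:

```python
def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most terms with the query.

    Real retrievers use embeddings and vector search; plain term
    overlap stands in for that here to keep the sketch self-contained.
    """
    query_terms = set(query.lower().split())
    return max(documents, key=lambda doc: len(query_terms & set(doc.lower().split())))

# Hypothetical publisher content
docs = [
    "Our return policy allows refunds within 30 days.",
    "Shipping takes 3 to 5 business days.",
]
best = retrieve("what is the refund policy", docs)
```

If the right document is never retrieved, no amount of model capability rescues the answer, which is why the quality of the content a bot can access matters so much.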

Publishers will continue to publish web pages for human readers, but they need to ensure that AI platforms are accessing the right content for bot users. Best practices for doing this are still emerging, and there are several initiatives underway to define principles and standards.

What is becoming clear is that MCP will play an important role in controlling both bot access and the content bots receive. The diagram below shows a potential content pipeline for a scholarly publisher. A similar pipeline could be adopted by any website publisher, though some additional steps are needed to convert HTML-centric content into bot-friendly content.

An example of a content pipeline. Source: Scholarly Kitchen

How do publishers prepare? Let's look at how Tollbit helps web publishers. Tollbit works with the Associated Press and other publishers to make their content ready for AI platforms.

The first task is to “clean up” web content to remove inappropriate or unauthorized material. This can be done by filtering the DOM to exclude certain categories of content, such as navigation text, advertising assets, or customer comments.
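As a rough sketch of this kind of DOM filtering (the tag and class names below are common conventions, not Tollbit's actual rules), Python's standard-library HTML parser can skip navigation, ad, and comment blocks while collecting the remaining text:

```python
from html.parser import HTMLParser

SKIP_TAGS = {"nav", "aside", "footer", "script", "style"}
SKIP_CLASSES = {"ad", "ads", "comment", "comments"}  # hypothetical class names

class ContentFilter(HTMLParser):
    """Collect page text while excluding navigation, ads, and comments."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside an excluded subtree
        self.chunks = []

    def _should_skip(self, tag, attrs):
        classes = set(dict(attrs).get("class", "").split())
        return tag in SKIP_TAGS or bool(classes & SKIP_CLASSES)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or self._should_skip(tag, attrs):
            self.skip_depth += 1  # track nesting so the subtree is skipped whole

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

page = """<body><nav>Home | About</nav>
<p>The article text.</p>
<div class="comments"><p>A reader comment.</p></div></body>"""
f = ContentFilter()
f.feed(page)
```

After filtering, `f.chunks` holds only the substantive article text; the navigation links and the reader comment are gone.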

Further filtering can be done at the page level, excluding procedural or administrative pages and index listings so that bots focus on substantive content.

Next, the content must be transformed by stripping away HTML tags to render it in a bot-readable format. Many organizations choose to convert content to Markdown, which preserves the sequence of topics (useful for bots) while removing extraneous markup that bots don't need.
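A minimal sketch of such a conversion, handling only a few common tags (production converters cover far more of HTML, but the principle is the same: keep the topic sequence, drop the markup):

```python
from html.parser import HTMLParser

class MarkdownConverter(HTMLParser):
    """Convert a small subset of HTML (headings, lists, emphasis) to Markdown."""

    PREFIX = {"h1": "# ", "h2": "## ", "li": "- "}   # block-level markers
    INLINE = {"strong": "**", "em": "*"}             # paired inline markers

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX:
            self.out.append(self.PREFIX[tag])
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.PREFIX or tag == "p":
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

    def markdown(self):
        return "".join(self.out).strip()

c = MarkdownConverter()
c.feed("<h1>Pricing</h1><p>Plans start at <strong>$10</strong>.</p>")
```

Here `c.markdown()` yields `# Pricing` followed by `Plans start at **$10**.`, with the heading hierarchy and emphasis preserved in a form bots can consume directly.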

Bots benefit from metadata, but they need help accessing it. The content transformation process must deal with metadata that is invisible to human readers. This includes descriptive metadata about the content (such as schema.org markup) aimed at external systems like search engines, as well as internal administrative and technical metadata (such as geolocation links) used for web page delivery. Converting this metadata, a process known as re-serialization, makes it queryable. The metadata can then be "hydrated" into the output delivered to bots.
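Schema.org metadata, for instance, is often embedded as JSON-LD inside script tags that human readers never see. A short standard-library sketch can pull it out for re-serialization; the sample article record below is invented for illustration:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Extract schema.org JSON-LD records from <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld and data.strip():
            self.records.append(json.loads(data))  # re-serialize into queryable objects

page = """<head><script type="application/ld+json">
{"@type": "Article", "headline": "Bot-ready content", "datePublished": "2024-01-01"}
</script></head>"""
x = JsonLdExtractor()
x.feed(page)
```

Once parsed, the records are plain structured data that can be queried, filtered, or merged back into the content package delivered to bots.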

AI platforms, always motivated to improve the capabilities of their products, will take advantage of these content enhancements.

Providing "bot-friendly" content will become critical as AI platforms expand their agentic capabilities. Publishers will need to define access rights and permissions: what content can bots read, republish, or act on?

Publishers will manage these permissions through both explicit declarations and design decisions that influence how easily bots can perform particular actions.
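One long-standing mechanism for explicit declarations is robots.txt, which Python can evaluate out of the box. The user-agent names and rules below are hypothetical, sketching a policy that admits a search crawler broadly while denying a training crawler entirely:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: SearchBot may crawl everything except /admin/,
# while TrainingBot is denied the whole site.
rules = """
User-agent: SearchBot
Disallow: /admin/

User-agent: TrainingBot
Disallow: /
""".splitlines()

policy = RobotFileParser()
policy.parse(rules)

search_ok = policy.can_fetch("SearchBot", "/articles/bots.html")
training_ok = policy.can_fetch("TrainingBot", "/articles/bots.html")
```

robots.txt is advisory only, which is exactly the article's point: explicit statements need to be backed by design decisions that shape what bots can actually do.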

– Michael Andrews
