How can you identify the pages on your website cited as sources by ChatGPT and LLMs?

Nils Talibart has been an independent SEO consultant since 2012. He supports large organizations (La Poste, Brico Dépôt, SeLoger) and SMEs in optimizing their visibility. In this article, he shares some tips he frequently uses to detect when web pages are used as sources by AI bots!

A growing share of your visibility now plays out directly within the answers generated by ChatGPT, Gemini, Perplexity, and other LLMs (Large Language Models). Yet while traditional SEO benefits from mature measurement tools, GEO (Generative Engine Optimization) remains a black box for many brands. How can you tell whether AI systems are actually using your content as a source? How can you go beyond visibility indicators or share-of-voice metrics, which are often calculated from artificial prompts far removed from real user queries, and focus on concrete data?

In this article, I suggest setting aside uncertain prompt simulations and returning to the only indisputable source of truth: your server logs. Here's how they can help you precisely identify which pages on your site are feeding LLM-generated answers.

Key takeaways

  • Tools that track your visibility in LLMs rely on artificial prompts, which may be far removed from real user queries
  • Your server logs allow you to identify the URLs on your site that have been used as sources for LLM responses
  • Use redirection.io to easily identify the content most frequently cited by ChatGPT, Perplexity, and other LLMs
  • Cross-reference these URLs with your Google Analytics 4 data to connect LLM visibility with traffic and conversions

Measuring share of voice or traffic from LLMs: an imperfect approach

GEO will once again be one of the key topics of 2026. The challenge for brands: measuring their visibility and share of voice on ChatGPT and other LLMs. Many dedicated tools have emerged in recent months. They allow you to enter prompts (or generate them automatically) and then measure whether you are mentioned in responses from ChatGPT, Gemini, or Perplexity, whether your website is used as a source, at what position, etc.

However, this approach has a critical limitation: the prompts these tools use are often unrealistic and do not reflect what your users actually type.

So how can you measure your visibility on ChatGPT and other LLMs? A common approach is to track the traffic these LLMs send to your site (which we'll look at later). But this approach misses a key point: an LLM may use your content as a source in its response without generating any traffic to your site. Yet being cited in a ChatGPT or Perplexity answer can still mean your brand is mentioned and your website highlighted as a source.

The only realistic audience measurement today is to look in your server logs at the number of requests made to your website by LLMs' instant-answer bots.

That's not me saying it; that's Sylvain Peyronnet (co-founder of YourText.guru and Babbar), in this November 2025 interview with Enzo Honoré, which I encourage you to watch.

When an instant-answer bot visits your site, it means your page was considered when building a response in ChatGPT or elsewhere. So if your content is good and has strong E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness), you were very likely cited as a source and may have had an impact. And if you can see which page the bot visited, you know which pages potentially influence your customers along their journey.

And the good news is: you can measure this very easily with redirection.io.

A closer look at LLM instant-answer bots

When you type a query into ChatGPT, it can answer directly from its training data. More often than not, however, it also issues several searches derived from your question (a mechanism known as query fan-out) to retrieve information from external sources. It may therefore visit certain web pages to gather up-to-date, sourced information.

And when ChatGPT visits your website to use it as a source for its answer, it naturally leaves a trace in your server logs, allowing you to detect its visit.

OpenAI uses several bots (or crawlers):

  • GPTBot, which explores content later used for training generative AI models
  • OAI-SearchBot, used for ChatGPT Search results
  • ChatGPT-User, which is triggered in response to a user action in ChatGPT (or a custom GPT)

It's this last user-agent, ChatGPT-User, that particularly interests us: it's the one used when ChatGPT visits a page on your site whose content is used to answer a user's query.
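To make the distinction concrete, here is a minimal Python sketch that maps a user-agent string to one of the three OpenAI crawler roles listed above. The dictionary, the function name `classify_openai_bot`, and the sample user-agent string are illustrative assumptions, not an official OpenAI tool; the sample string is modeled on the format OpenAI documents for ChatGPT-User. Keep in mind that user agents can be spoofed, so a token match is a useful signal rather than proof of a genuine visit.

```python
from typing import Optional

# Illustrative mapping from OpenAI user-agent tokens to the role of each crawler.
OPENAI_BOTS = {
    "ChatGPT-User": "user-triggered fetch (page used to answer a query)",
    "OAI-SearchBot": "ChatGPT Search indexing",
    "GPTBot": "crawling for model training",
}


def classify_openai_bot(user_agent: str) -> Optional[str]:
    """Return the role of the OpenAI bot whose token appears in user_agent, or None.

    Case-insensitive substring match: a heuristic, since user agents can be spoofed.
    """
    ua = user_agent.lower()
    for token, role in OPENAI_BOTS.items():
        if token.lower() in ua:
            return role
    return None


# Illustrative user-agent string modeled on OpenAI's documented ChatGPT-User format.
sample = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; ChatGPT-User/1.0; +https://openai.com/bot"
)
print(classify_openai_bot(sample))  # user-triggered fetch (page used to answer a query)
```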

Example: Let's run a test on my client's site Annuaire Audition. I ask ChatGPT to find me an ENT (ear, nose, and throat doctor) in Angers.

To provide its answer, ChatGPT relies on several sources: Doctolib, Annuaire Audition, Angers University Hospital, Clinique de l'Anjou, Santé.fr, etc. It visited each of these source pages to retrieve the information it presents in its response.

Screenshot of ChatGPT citing its sources

And indeed, when I check the Annuaire Audition server logs, I can see that ChatGPT has just visited the page on my site that appears among the cited sources.

A log line in redirection.io

Detail panel of the log line in redirection.io

The same logic applies to other LLMs. For example, Perplexity has a crawler called “Perplexity-User” that plays the same role and may visit a web page to gather more information when a user asks a question.
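If you want to spot these visits in raw access logs rather than in redirection.io, a few lines of Python are enough. The sketch below assumes your server writes the standard "combined" log format used by default in Apache and Nginx; the regex, the `BotHit` record, and the sample log line (with its documentation-range IP address and `/example-page` path) are hypothetical and would need adapting to your own setup.

```python
import re
from typing import NamedTuple, Optional

# Assumed format: the standard "combined" access log
# (IP, identity, user, [time], "request", status, size, "referer", "user-agent").
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# User-agent tokens of the instant-answer bots discussed in this article.
INSTANT_ANSWER_BOTS = ("ChatGPT-User", "Perplexity-User")


class BotHit(NamedTuple):
    bot: str
    path: str
    status: int
    time: str


def parse_bot_hit(line: str) -> Optional[BotHit]:
    """Return a BotHit if the line is a request from an instant-answer bot, else None."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # not a combined-format line
    for bot in INSTANT_ANSWER_BOTS:
        if bot in m["ua"]:
            return BotHit(bot, m["path"], int(m["status"]), m["time"])
    return None


# Hypothetical log line (documentation IP range, placeholder path).
line = (
    '203.0.113.7 - - [12/Jan/2026:10:15:32 +0000] "GET /example-page HTTP/1.1" '
    '200 5123 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); '
    'compatible; ChatGPT-User/1.0; +https://openai.com/bot"'
)
print(parse_bot_hit(line))
```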

Identifying which of your pages serve as LLM sources with redirection.io

Based on your server logs, you can easily identify the pages on your site most frequently visited by LLM instant-answer crawlers, and therefore the pages most likely to serve as sources in LLM responses.

With redirection.io, this is very easy:

  • Go to your log details (Logs > Logs list)
  • Choose a meaningful time period (for example, I selected the last 7 days)
  • Enter the user-agent you're interested in (here, I want to see which pages have most often served as sources for ChatGPT, so I enter "ChatGPT-User")
  • Group the results by URL to get your top URLs for the period

And here's the result! I get a list of the URLs on my site most frequently visited by ChatGPT-User over the last 7 days—that is, the URLs that most likely served as sources in ChatGPT during that period.

List of ChatGPT-User visits in redirection.io

Remember to save this view if you plan to consult it frequently. You can also export or share these results.
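The same grouping can be reproduced outside redirection.io, for example on requests you have parsed from your own access logs. This sketch assumes each request is already available as a hypothetical `Hit` record with a timezone-aware timestamp, a path, and a user-agent string; `top_source_urls` keeps the requests from the last N days whose user agent contains the given token, then counts them per URL. The sample data is made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """Hypothetical shape of one parsed log request."""
    time: datetime  # timezone-aware
    path: str
    user_agent: str


def top_source_urls(
    hits: Iterable[Hit],
    ua_token: str,
    days: int = 7,
    now: Optional[datetime] = None,
    limit: int = 10,
) -> list[tuple[str, int]]:
    """Count requests per URL from user agents containing ua_token over the last `days` days."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    counts = Counter(
        h.path for h in hits if h.time >= since and ua_token in h.user_agent
    )
    return counts.most_common(limit)  # [(url, count), ...], most visited first


utc = timezone.utc
hits = [
    Hit(datetime(2026, 1, 11, tzinfo=utc), "/guides/a", "... ChatGPT-User/1.0 ..."),
    Hit(datetime(2026, 1, 10, tzinfo=utc), "/guides/a", "... ChatGPT-User/1.0 ..."),
    Hit(datetime(2026, 1, 9, tzinfo=utc), "/orl/b", "... ChatGPT-User/1.0 ..."),
    Hit(datetime(2025, 12, 1, tzinfo=utc), "/old", "... ChatGPT-User/1.0 ..."),  # too old
    Hit(datetime(2026, 1, 11, tzinfo=utc), "/orl/b", "... Perplexity-User/1.0 ..."),
]
print(top_source_urls(hits, "ChatGPT-User", now=datetime(2026, 1, 12, tzinfo=utc)))
# [('/guides/a', 2), ('/orl/b', 1)]
```

Passing "Perplexity-User" instead of "ChatGPT-User" gives the equivalent ranking for Perplexity.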

You can of course go further by filtering by URL type. For example, Annuaire Audition has a content section where URLs start with "/guides/". I want to know more specifically which content pieces are most frequently cited as sources by ChatGPT, so I filter on that URL pattern.

List of ChatGPT-User visits on a specific URL type in redirection.io

Also consider filtering by response code to ensure your source pages are returning 200 status codes—and not errors like 404 or 500.
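Both filters can be combined into a quick audit of a content section. In this sketch, the `Hit` record, the `audit_section` helper, and the paths are hypothetical, and the input is assumed to be already restricted to a single instant-answer bot. The function keeps only requests whose path starts with the given prefix, then separates pages served with a 200 from those that returned anything else (which also flags redirects as worth a look).

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical parsed request, already filtered to one instant-answer bot."""
    path: str
    status: int


def audit_section(
    hits: Iterable[Hit], prefix: str
) -> tuple[list[str], list[tuple[str, int]]]:
    """Split requests under `prefix` into 200 paths and non-200 (path, status) pairs."""
    section = [h for h in hits if h.path.startswith(prefix)]
    ok = sorted({h.path for h in section if h.status == 200})
    problems = sorted({(h.path, h.status) for h in section if h.status != 200})
    return ok, problems


hits = [
    Hit("/guides/page-a", 200),
    Hit("/guides/page-b", 404),
    Hit("/guides/page-a", 200),
    Hit("/other/page-c", 500),  # outside the /guides/ section
]
ok, problems = audit_section(hits, "/guides/")
print(ok)        # ['/guides/page-a']
print(problems)  # [('/guides/page-b', 404)]
```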

You can repeat the same process for other pages, or for another LLM such as Perplexity, simply by replacing the user-agent "ChatGPT-User" with "Perplexity-User":

List of Perplexity-User visits in redirection.io

You now have a clear view of the content most visible across different LLMs—the pages most likely to be cited as sources.

This data is extremely valuable for analyzing your customer journey. Even though traffic generated by LLMs is still relatively modest compared to channels like organic search, citations within LLMs place your site at the very top of the funnel, when users are researching, comparing, and refining their criteria. These users are in the consideration phase and may later take more easily measurable actions, such as branded searches on Google.

If you're optimizing your content for LLMs, monitoring which URLs are cited as sources is therefore critical and allows you to directly assess the impact of your actions.

Measuring traffic from LLMs

We've just seen how to identify which of your pages are most frequently cited by LLMs. But does being used as a source actually generate traffic to your site (and conversions)?

To measure traffic coming from LLMs, you can isolate it in your analytics tool. In Google Analytics 4, for example:

  • Go to Acquisition > Traffic acquisition
  • Click on "Add filter"
  • Select "Session source" as the dimension and "Matches regex" as the match type
  • Enter the value below and click "Apply"
  • In the dropdown menu above the table, replace "Session default channel group" with "Session source / medium" to see which LLMs drive the most traffic

Here is the regular expression to enter in the “Value” field:

.*chatgpt.com.*|.*openai.com.*|.*perplexity.*|.*mistral.ai.*|.*copilot.microsoft.com.*|.*copilot.com.*|.*copilot.cloud.microsoft.*|.*gemini.google.com.*|.*claude.ai.*|.*meta.ai.*|.*grok.com.*

This regex isolates traffic coming from the main LLMs. You can of course extend it with other sources if you wish. Also remember to update it regularly, as these domains are likely to change quickly.

Here's what it looks like:

ChatGPT traffic in Google Analytics
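Before relying on the regular expression in GA4, you can sanity-check it against a few session sources. This sketch reuses the pattern unchanged and assumes that GA4's "Matches regex" option requires the whole value to match, which is why each domain is wrapped in `.*`; Python's `re.fullmatch` mimics that behavior, and the constructs used here (`.*` and `|`) behave the same in Python and in RE2, the regex syntax Google Analytics uses. Note that the dots are not escaped, so each one matches any character: harmless in practice, but slightly broader than the literal domains.

```python
import re

# The GA4 regex from this article, unchanged (split across lines for readability).
LLM_SOURCE_RE = re.compile(
    r".*chatgpt.com.*|.*openai.com.*|.*perplexity.*|.*mistral.ai.*"
    r"|.*copilot.microsoft.com.*|.*copilot.com.*|.*copilot.cloud.microsoft.*"
    r"|.*gemini.google.com.*|.*claude.ai.*|.*meta.ai.*|.*grok.com.*"
)


def is_llm_source(session_source: str) -> bool:
    """Full-match test, assumed to mirror GA4's "Matches regex" option."""
    return LLM_SOURCE_RE.fullmatch(session_source) is not None


for source in ["chatgpt.com", "perplexity.ai", "gemini.google.com", "google", "l.facebook.com"]:
    print(source, is_llm_source(source))
```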

By applying the same method to Engagement > Landing pages, you can see which pages on your site generated the most traffic from LLMs.

ChatGPT landing pages in Google Analytics
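To connect the two halves of this approach, you can set the instant-answer bot hits per page from your logs beside the LLM sessions per landing page from GA4. This is a minimal sketch under stated assumptions: both inputs are hypothetical dictionaries keyed by the same normalized path (no domain, no query string), the figures are made up, and `cross_reference` simply merges them. Pages with many bot hits but no LLM sessions are likely being used as sources without sending visitors.

```python
def cross_reference(
    bot_hits: dict[str, int], llm_sessions: dict[str, int]
) -> list[tuple[str, int, int]]:
    """Merge per-page bot hits (logs) and LLM sessions (GA4) into (page, hits, sessions) rows."""
    pages = set(bot_hits) | set(llm_sessions)
    rows = [(p, bot_hits.get(p, 0), llm_sessions.get(p, 0)) for p in pages]
    # Most bot hits first, then most sessions, then path for a stable order.
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# Made-up figures for illustration only.
bot_hits = {"/guides/page-a": 12, "/guides/page-b": 5, "/page-c": 3}
llm_sessions = {"/guides/page-a": 4, "/page-c": 0, "/contact": 2}

rows = cross_reference(bot_hits, llm_sessions)
cited_without_clicks = [page for page, hits, sessions in rows if hits > 0 and sessions == 0]
print(cited_without_clicks)  # ['/guides/page-b', '/page-c']
```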

You can also create a report in the “Explore” section that you can come back to regularly:

ChatGPT report in Google Analytics

But the simplest approach is still to create a Google Looker Studio report, where you can:

  • Display traffic data coming from LLMs by day or by month (to track trends)
  • Display traffic breakdown by LLM
  • Display the pages generating the most traffic from LLMs
  • Display conversion data
  • Etc.

ChatGPT report in Looker Studio
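For the breakdown by LLM, you need a field that turns raw session sources into LLM names; in Looker Studio this is typically done with a calculated field. The Python sketch below shows the same bucketing logic, for example if you process a GA4 export yourself. The `LLM_NAMES` mapping (based on the domains in the GA4 regex earlier in this article) and the `(date, source, sessions)` row shape are assumptions. It sums sessions per month and per LLM and ignores non-LLM sources.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical mapping from session-source substrings to LLM names,
# based on the domains in the GA4 regex earlier in this article.
LLM_NAMES = [
    ("chatgpt.com", "ChatGPT"),
    ("openai.com", "ChatGPT"),
    ("perplexity", "Perplexity"),
    ("mistral.ai", "Mistral"),
    ("copilot.microsoft.com", "Copilot"),
    ("copilot.com", "Copilot"),
    ("copilot.cloud.microsoft", "Copilot"),
    ("gemini.google.com", "Gemini"),
    ("claude.ai", "Claude"),
    ("meta.ai", "Meta AI"),
    ("grok.com", "Grok"),
]


def llm_name(source: str) -> Optional[str]:
    """Return the LLM name for a session source, or None for non-LLM sources."""
    for needle, name in LLM_NAMES:
        if needle in source:
            return name
    return None


def monthly_breakdown(rows: Iterable[tuple[str, str, int]]) -> dict[tuple[str, str], int]:
    """Sum sessions per (YYYY-MM, LLM) from (YYYY-MM-DD, session source, sessions) rows."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for day, source, sessions in rows:
        name = llm_name(source)
        if name is not None:
            totals[(day[:7], name)] += sessions
    return dict(totals)


rows = [
    ("2026-01-03", "chatgpt.com", 10),
    ("2026-01-20", "chatgpt.com", 5),
    ("2026-01-20", "perplexity.ai", 2),
    ("2026-02-01", "copilot.microsoft.com", 1),
    ("2026-02-01", "google", 50),  # not an LLM source, ignored
]
print(monthly_breakdown(rows))
```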

Conclusion

To measure your visibility in LLMs, instead of guessing which prompts your users might be typing, the best approach is to return to something concrete: analyzing your server logs to identify the pages most frequently visited by LLM instant-answer crawlers. These are very likely the pages most often cited by LLMs to feed their responses. A tool like redirection.io makes this very easy.

By combining this monitoring of URLs cited as sources by LLMs with analysis of traffic and conversions generated by LLMs, you gain solid data to guide your content optimization efforts and improve your visibility within these systems.