
LLMs.txt Generator

robots.txt tells search crawlers what they may fetch. llms.txt is the emerging equivalent for AI: a single, predictable file that tells ChatGPT, Gemini, Perplexity and other assistants what your store is and what it sells — in clean Markdown they can read directly, instead of scraping rendered HTML. This module generates and serves it for you, always from your live catalogue, and always limited to public, active, visible content.


Magento

Open Source 2.4.9 GA (and later 2.4.x).

PHP

Tested on 8.4 and 8.5.

Per store view

Independent URL keys and content per store/locale.

Live

Generated from the catalogue on request — never a stale dump.

/llms.txt is the short manifest: your site description plus a pointer to the full file, /llms-full.txt. This is the entry point a crawler fetches first.

# LLM Crawler Information
User-Agent: *
Disallow:
## Application Description
AgenticEcom — a Magento 2.4.9 store offering 40+ AI-native extensions…
## Detailed Information
Map: https://your-store.com/llms-full.txt
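
To illustrate how a consumer might read the manifest above, here is a minimal Python sketch. It assumes the layout shown in the sample (`## ` section headings and a single `Map:` line); the function name `parse_manifest` and the placeholder description text are hypothetical and not part of the module.

```python
def parse_manifest(text: str) -> dict:
    """Split a manifest into its '## ' sections and pull out the Map: URL.

    Assumes the layout shown in the sample manifest; other layouts
    would need a different parser.
    """
    sections = {}
    current = None
    map_url = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # A new section starts; remember its title.
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("Map:"):
            # Pointer to the full file.
            map_url = line[len("Map:"):].strip()
        elif current is not None and line:
            sections[current].append(line)
    return {
        "sections": {title: "\n".join(body) for title, body in sections.items()},
        "map": map_url,
    }


# Placeholder manifest in the same shape as the sample (description is made up).
SAMPLE = """# LLM Crawler Information
User-Agent: *
Disallow:
## Application Description
Example store description.
## Detailed Information
Map: https://your-store.com/llms-full.txt
"""

manifest = parse_manifest(SAMPLE)
```

Here `manifest["map"]` holds the URL of the full file, which is the next thing a crawler would fetch.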

  1. Go to Stores → Configuration → AgenticEcom · SEO & Content → LLMs.txt Generator.

  2. Under General Settings, switch Enable LLMs.txt to Yes and write a one-paragraph Site Description — this is what an assistant reads first, so describe what you sell and who for.

  3. Under Content to Include, choose which sections appear in the full file: Products, Categories, CMS Pages, Blog Posts. Turn on Enable Deep Attribute Export if you want full specs, variants, stock and images per product.

  4. (Optional) Under Agentic Chat Integration, enable the JSON Structured Endpoint to expose /llms-products.json for vector-DB or AI-chat ingestion.

  5. Visit /llms.txt on your storefront to confirm it renders.
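
The final check in the steps above can also be scripted. The following is a rough sketch, assuming the manifest keeps the default `/llms.txt` path and the section layout from the sample; `looks_like_manifest` and `check_store` are hypothetical helper names, and the heuristic is an assumption rather than anything the module defines.

```python
from urllib.request import urlopen


def looks_like_manifest(body: str) -> bool:
    """Heuristic: a description heading plus a Map: pointer to the full file."""
    lines = [ln.strip() for ln in body.splitlines()]
    has_description = "## Application Description" in lines
    has_map = any(ln.startswith("Map:") for ln in lines)
    return has_description and has_map


def check_store(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch <base_url>/llms.txt and run the heuristic on the body.

    urlopen raises HTTPError for 4xx/5xx responses, so a disabled or
    missing manifest surfaces as an exception rather than False.
    """
    url = base_url.rstrip("/") + "/llms.txt"
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
        return resp.status == 200 and looks_like_manifest(body)
```

Calling `check_store("https://your-store.com")` would perform the request; the URL is the placeholder used elsewhere on this page.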

Products

Filtered to products that are both enabled and visible in site. A product that is disabled or set to "Not Visible Individually" never appears in any of the three outputs: /llms.txt, /llms-full.txt or /llms-products.json.
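
The rule can be pictured with a small Python sketch. The numeric values follow Magento's standard product status and visibility codes (status 1 is enabled; visibility 1 is Not Visible Individually, 2 to 4 are the in-site values); the `Product` class and `exportable` function are hypothetical illustrations, not the module's own code.

```python
from dataclasses import dataclass

STATUS_ENABLED = 1            # Magento: 1 = enabled, 2 = disabled
VISIBLE_IN_SITE = {2, 3, 4}   # Catalog, Search, or both; 1 = Not Visible Individually


@dataclass
class Product:
    sku: str
    status: int
    visibility: int


def exportable(products):
    """Keep only products that are enabled AND visible in site."""
    return [
        p for p in products
        if p.status == STATUS_ENABLED and p.visibility in VISIBLE_IN_SITE
    ]


catalogue = [
    Product("tee-red", status=1, visibility=4),    # enabled, Catalog + Search
    Product("tee-red-s", status=1, visibility=1),  # enabled child, not visible individually
    Product("old-mug", status=2, visibility=4),    # disabled
]
visible_skus = [p.sku for p in exportable(catalogue)]
```

Only `tee-red` passes both conditions, so the hidden variant and the disabled product are left out.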

Categories, CMS & blog

Categories filtered to active, CMS pages to active, blog posts to published — so nothing hidden from the storefront leaks to a crawler.

No system attributes

With Enable Deep Attribute Export on, the export lists only attributes flagged visible on front (or used in listings) and skips a long denylist (meta_*, layout/design fields, tax class, cost), so internal data stays internal.
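
As an illustration of this two-stage filter, the sketch below first applies a denylist and then an allow flag. The denylist patterns are an assumed approximation built from common Magento attribute codes; the module's real list is longer and is not reproduced here. `exportable_attributes` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Assumed approximation of the denylist; the module's actual list is not shown here.
DENYLIST = ["meta_*", "page_layout", "custom_design*", "custom_layout*", "tax_class_id", "cost"]


def exportable_attributes(attributes):
    """Return codes of attributes that are front-visible (or used in listings)
    and do not match any denylist pattern."""
    kept = []
    for attr in attributes:
        code = attr["code"]
        if any(fnmatchcase(code, pattern) for pattern in DENYLIST):
            continue  # denylisted fields are dropped even if flagged visible
        if attr.get("is_visible_on_front") or attr.get("used_in_product_listing"):
            kept.append(code)
    return kept


attrs = [
    {"code": "color", "is_visible_on_front": True},
    {"code": "cost", "is_visible_on_front": True},            # denylisted
    {"code": "meta_title", "used_in_product_listing": True},  # denylisted via meta_*
    {"code": "supplier_ref", "is_visible_on_front": False},   # not front-visible
    {"code": "material", "used_in_product_listing": True},
]
public_codes = exportable_attributes(attrs)
```

Checking the denylist first means a misconfigured "visible on front" flag on a field such as `cost` still cannot leak it.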

JSON is opt-in

The structured /llms-products.json endpoint is off by default and returns 403 until you explicitly enable it, so you decide when full machine-readable product data goes public.
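
A monitoring script could use that 403 to tell whether the feed is currently exposed. This is a minimal sketch under the assumption that the endpoint keeps its default `/llms-products.json` path; `feed_state` and `probe_feed` are hypothetical helpers.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def feed_state(status: int) -> str:
    """Map an HTTP status from /llms-products.json to a human-readable state."""
    if status == 403:
        return "disabled"    # default: endpoint not enabled in configuration
    if status == 200:
        return "enabled"     # product data is publicly readable
    return "unexpected"      # anything else deserves a closer look


def probe_feed(base_url: str, timeout: float = 10.0) -> str:
    """Request the JSON endpoint and classify the response."""
    url = base_url.rstrip("/") + "/llms-products.json"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return feed_state(resp.status)
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is on the exception.
        return feed_state(err.code)
```

Running `probe_feed` against a store after each deployment is one way to confirm the feed has not been switched on by accident.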

What actually is llms.txt — do I need it?

It’s a proposed convention (think robots.txt, but for AI assistants) that gives LLM-powered tools a clean, authoritative description of your site instead of leaving them to scrape rendered HTML. As AI search and shopping assistants grow, an accurate llms.txt means your words and your prices describe your store — not a model’s guess. It costs nothing to publish.

Will it expose disabled products, draft pages or internal fields?

No. Every section filters to active, visible content: enabled and visible-in-site products, active categories, active CMS pages and published blog posts. Deep Attribute Export lists only front-visible attributes and skips meta, layout, tax and cost fields. Hidden content stays hidden.

Is the JSON product feed public to everyone?

Only if you turn it on. /llms-products.json is disabled by default and returns 403 Forbidden until you enable JSON Structured Endpoint in the configuration — so exposing full machine-readable product data is always a deliberate choice.

Can I rename /llms.txt without breaking crawlers that already found it?

Yes — change the URL key in config and a 301 redirect from the previous path is written automatically, per store view, so existing references keep resolving.
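
After a rename, the redirect can be checked without following it. The sketch below assumes the store is served over HTTPS at the domain root; the paths in the example and the helper names `redirect_is_correct` and `probe_redirect` are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def redirect_is_correct(status, location, new_path: str) -> bool:
    """True when the response is a 301 whose Location points at new_path.

    Location may be absolute or relative; only its path is compared.
    """
    if status != 301 or location is None:
        return False
    return urlsplit(location).path == new_path


def probe_redirect(host: str, old_path: str, new_path: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to the old path and inspect the raw response.

    http.client does not follow redirects, so the 301 itself is visible.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", old_path)
        resp = conn.getresponse()
        return redirect_is_correct(resp.status, resp.getheader("Location"), new_path)
    finally:
        conn.close()
```

For example, `probe_redirect("your-store.com", "/llms.txt", "/ai-manifest.txt")` would check a rename to a made-up new key; repeat it per store view, since each one has its own URL keys.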

Does it work with multiple store views / languages?

Yes — the description, content toggles and URL keys are all per store view, so each locale serves its own llms.txt from its own catalogue scope. Verified clean on PHP 8.4 and 8.5.