Storefront MCP for Shopify Plus: How Merchandising Rules Should Surface to AI Agents

Key Takeaways

The Model Context Protocol (MCP) is how an AI agent asks your store for things. It lists the tools your store offers, then calls one with arguments and gets a structured result back.
A storefront MCP server exposes read-only tools an agent uses to search, browse, and fetch recommendations. It should never expose write actions to an untrusted agent.
The risk is not access; it is bypass. An agent that reads a raw catalog endpoint sees an unmerchandised store, so your pins, boosts, and promotions disappear from what it surfaces.
Expose search and browse through your own ranking engine so agents inherit your merchandising logic, then verify on your store what an agent actually receives instead of assuming parity.
Test eight things before you let an agent touch your storefront, including a parity check that compares the agent's results against your storefront for the same query. The checklist is in this post.

Shopify now ships a Storefront MCP server that connects AI assistants to live commerce data, and it was switched on by default for eligible US stores in Q1 2026. So an AI agent can already search your store.

The open question is which store it sees: the one you merchandised, or a raw catalog dump. This post walks the tool contract an agent actually calls, gives you the honest answer on whether agents respect your pins and boosts, and hands you eight things to test before you let one near your storefront.

What is MCP, in five minutes, for merchandisers and not just devs?

The Model Context Protocol (MCP) is an open standard that lets an AI agent use external tools. The agent lists the tools a server offers, then calls one with arguments and receives a structured result. For commerce, an MCP server lets a shopping agent search your catalog, browse collections, and fetch recommendations through defined, read-only tools.

That is the whole loop, and it has three roles. The host is the AI application, a chat assistant or a shopping copilot. The client is the connector that lives inside it. The server is your store, exposing what it can do. The agent first asks the server "what can you do," gets a list of tools back, then calls a specific tool with arguments.

One sentence for your merchandiser: MCP is how an AI shopping agent asks your store questions, and the answers are only as good as what your store chooses to expose.

A few specifics worth holding onto:

The protocol runs on JSON-RPC 2.0 messages. A server can offer three kinds of things: tools (functions the model runs), resources (data), and prompts (templates). Commerce mostly uses tools.
The spec is an open standard, currently at revision 2025-11-25. It is not specific to us and not specific to Shopify, which is exactly why it matters: it is a shared protocol that any agent can speak.
MCP is one of two open standards converging on agentic commerce. The other is the Agentic Commerce Protocol, which covers the checkout side of the same shift. MCP handles how an agent discovers and reads your catalog; both are worth tracking as the surface forms.
Two kinds of server matter for commerce. A management server reads and writes, sits behind OAuth, and is for operators. A storefront server is read-only and is for shopper-facing agents. Keep them mentally separate, because they carry different trust levels.

The first question we get from technical leads is whether this is a new thing to build. Mostly, it is a new way for the things you already configured to be reached. "Isn't this just an API with a new name?" Close. It is an API with a standard discovery and calling convention the model understands natively, so the agent does not need custom integration code per store. That standardization is the point.

What does an agent actually ask for, in tools, arguments, and shape?

An AI agent interacts with a storefront over MCP by listing the available tools, then calling one with arguments. A storefront search tool such as search-text accepts arguments like filter groups, facets, sort order, pagination, and personalization context, and returns a ranked list of products. Storefront MCP tools are read-only, so an agent can search but cannot change your store.

An agent does not browse your site. It calls tools. On a read-only storefront MCP server, the tools an agent has are a small, defined set, and each one takes structured arguments and returns a structured result. Walk the flow once, end to end.

First the agent calls tools/list to discover what is available. Then it calls tools/call with a tool name and arguments. That is the generic MCP shape, defined in the spec: a tool has a name, a description, and an input schema, and each call returns a result.

The storefront tool set is small and documented by name:

search-text, documented as "Semantic text search with query understanding and typo tolerance."
browse-collection, documented as "Browse a collection with sort orders, merchandising, and faceted filtering."
search-similar, search-autocomplete, facets-get, sort-orders-list, blocks-recommendations, and search-content.

All of them are read-only. Notice the difference in those two descriptions, because it matters for the next section. The docs name merchandising explicitly on browse-collection. They do not name it on search-text. We are quoting both descriptions exactly, and we are not going to paraphrase merchandising onto the one that does not claim it.
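
One way to hold a server to that claim is to compare what tools/list returns against the eight documented names. The sketch below is a minimal Python example under stated assumptions: the helper name unexpected_tools and the sample response are hypothetical, and the response follows the generic MCP tools/list result shape (a result.tools array of objects with a name), not output captured from a real store.

```python
# Documented storefront tool names from the list above.
DOCUMENTED_STOREFRONT_TOOLS = frozenset({
    "search-text",
    "browse-collection",
    "search-similar",
    "search-autocomplete",
    "facets-get",
    "sort-orders-list",
    "blocks-recommendations",
    "search-content",
})


def unexpected_tools(tools_list_response: dict) -> list[str]:
    """Return advertised tool names that are not in the documented set."""
    tools = tools_list_response.get("result", {}).get("tools", [])
    names = [tool.get("name", "") for tool in tools]
    return sorted(name for name in names if name not in DOCUMENTED_STOREFRONT_TOOLS)


# Hypothetical response: "product-update" is invented to show a flagged tool.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {"name": "search-text"},
            {"name": "browse-collection"},
            {"name": "product-update"},
        ]
    },
}

print(unexpected_tools(sample_response))
```

An empty list means the server advertises nothing beyond the documented read-only set. Anything else is a tool to investigate before an agent is allowed near it.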

Take search-text and walk a request. The documented common parameters across storefront tools include:

filter_group, for nested filter conditions, so the agent can scope a query the way a shopper narrows a collection.
facets, for the facet values the agent wants returned.
pagination, for paging through results.
sort_order_code, to pick a sort the agent wants applied.
attributes, for the product fields the agent reads back.
context, for personalization.
identity, for behavioral tracking.

What comes back is a ranked list of products, each carrying attributes the agent can reason over. Here is the shape of that exchange, built from the confirmed tool names, the documented common parameters, and the generic MCP tool shape:

// Illustrative shape, per MCP spec + Layers docs. Field names below the
// documented common-parameter level are descriptive, not a verbatim schema.

// 1. Agent discovers tools (JSON-RPC 2.0 envelope: "jsonrpc" and "id")
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

// 2. Agent calls the text-search tool
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "search-text",
    "arguments": {
      "query": "waterproof hiking boots",
      "filter_group": { /* nested filter conditions */ },
      "facets": [ /* facet values to return */ ],
      "sort_order_code": "<a sort order you configured>",
      "pagination": { /* page + size */ },
      "context": { /* personalization */ },
      "identity": { /* behavioral tracking */ }
    }
  }
}

// 3. Result: a ranked list of products with the attributes the agent asked for
//    (described at field-purpose level, not a verbatim response schema)

Treat that block as an illustration of the contract, not a copy of a published schema. We are showing you the confirmed tool name and the documented argument names. We are not inventing response fields.
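
To make the envelope concrete, here is a minimal Python sketch that builds the same tools/call request as a JSON-RPC 2.0 message. build_tool_call is a hypothetical helper, the query argument name follows the illustration above and is not a published field name, and the check covers only the documented common parameters.

```python
import json

# Documented common parameters, plus "query" as used in the illustration above
# (an assumption, not a confirmed field name).
ALLOWED_ARGUMENTS = {
    "query", "filter_group", "facets", "pagination",
    "sort_order_code", "attributes", "context", "identity",
}


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request, rejecting unknown argument names."""
    unknown = sorted(set(arguments) - ALLOWED_ARGUMENTS)
    if unknown:
        raise ValueError(f"unrecognized arguments: {unknown}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(2, "search-text", {"query": "waterproof hiking boots"})
print(json.dumps(request, indent=2))
```

Rejecting unknown argument names early is a design choice for the sketch: it keeps a typo from silently turning a filtered query into an unfiltered one.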

One line answers the technical lead's first security question. The storefront server is read-only by design. An agent can read your catalog; it cannot mutate your store. Mutation lives on the management server behind OAuth with the mcp:use scope. Authentication splits the same way: the storefront MCP uses the same storefront access token as the REST API, passed as a bearer token, while the management server uses OAuth. Two servers, two trust levels.
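
As a rough illustration of the storefront side of that split, the Python sketch below prepares a request that carries the storefront access token as a bearer token. It only builds the request and does not send it. The endpoint URL and token are placeholders, build_storefront_request is a hypothetical helper, and the Accept header assumes the server speaks MCP's Streamable HTTP transport. A real session would also begin with the MCP initialize handshake, which is omitted here.

```python
import json
import urllib.request


def build_storefront_request(
    endpoint: str, storefront_token: str, payload: dict
) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the token as a bearer credential."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {storefront_token}",
            "Content-Type": "application/json",
            # Assumption: Streamable HTTP transport, which expects both types.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


# Placeholder endpoint and token: substitute the values from your own setup.
prepared = build_storefront_request(
    "https://example.com/mcp",
    "STOREFRONT_ACCESS_TOKEN",
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
print(prepared.get_method(), prepared.get_header("Authorization"))
```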

Do agents honor your pins and boosts?

Whether an AI agent honors your pins and boosts depends on which path it reads through, so verify it on your own store. An agent reading a raw catalog endpoint ignores merchandising and returns relevance-only results. To keep your pins, boosts, promotions, and Search Instructions in the picture, expose agent search and browse through the same ranking engine your storefront uses.

The documented collection-browse tool browses "with sort orders, merchandising, and faceted filtering." For text search, run a quick parity check that compares the agent's results against your storefront for the same query.

There are two paths an agent can read through, and they return different stores.

Path A, the agent reads a raw catalog endpoint. This is a generic product-search tool that returns relevance-ranked results with no merchandising layer applied. The agent sees an unmerchandised store. Your pinned launch is not pinned. Your boosted brand is not boosted. Your scheduled promotion is invisible. The agent is technically reading your products and practically ignoring your merchandising.

Path B, the agent reads through your ranking engine. This is a storefront tool wired to the same engine that ranks your live storefront, and it is the path you want, because it is the only one that can carry your configuration: the ranking signal weights, the ranking rules (promote, demote, pin, sort), and the Search Instructions that steer the AI steps. Our docs confirm this directly for one tool. The collection-browse path is documented as browsing "with sort orders, merchandising, and faceted filtering," so it is, on the record, merchandising-aware. The text-search path is documented as "semantic text search with query understanding and typo tolerance," which does not name merchandising. So for that path we recommend the architecture and tell you how to confirm what your store actually returns.

Here is the mechanism the engine carries, confirmed in the docs. Our ranking model scores products across five signal groups: Semantic, Keyword, Engagement, Freshness, and Inventory. On top of that, ranking rules apply promote (1–50%), demote (1–50%), pin (which locks specific products to exact positions, max 50 pinned), and sort overrides (up to 3 weighted expressions, each 5–100%). That engine is what your storefront reads through. The recommendation is to route agent reads through the same engine so the same rules can apply.
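
Those limits are easy to encode as a pre-flight check on a rule set. The Python sketch below assumes a hypothetical RankingRules container and field names of our own choosing. It does not reproduce the engine's configuration format or its scoring; it only checks the documented ranges (promote and demote 1–50%, at most 50 pins, up to 3 sort expressions at 5–100% each).

```python
from dataclasses import dataclass, field


@dataclass
class RankingRules:
    """Hypothetical container for the rule types described above."""
    promote_pct: dict[str, int] = field(default_factory=dict)  # target -> percent
    demote_pct: dict[str, int] = field(default_factory=dict)   # target -> percent
    pins: dict[int, str] = field(default_factory=dict)         # position -> product id
    sort_weights_pct: list[int] = field(default_factory=list)  # one weight per expression


def validate(rules: RankingRules) -> list[str]:
    """Return human-readable violations of the documented limits."""
    problems = []
    for label, table in (("promote", rules.promote_pct), ("demote", rules.demote_pct)):
        for target, pct in table.items():
            if not 1 <= pct <= 50:
                problems.append(f"{label} {target}: {pct}% is outside 1-50%")
    if len(rules.pins) > 50:
        problems.append(f"{len(rules.pins)} pins exceeds the maximum of 50")
    if len(rules.sort_weights_pct) > 3:
        problems.append(
            f"{len(rules.sort_weights_pct)} sort expressions exceeds the maximum of 3"
        )
    for weight in rules.sort_weights_pct:
        if not 5 <= weight <= 100:
            problems.append(f"sort weight {weight}% is outside 5-100%")
    return problems


# Invented example values: one promote and one sort weight are out of range.
rules = RankingRules(
    promote_pct={"brand:trailhead": 60},
    pins={1: "boot-123"},
    sort_weights_pct=[50, 3],
)
for problem in validate(rules):
    print(problem)
```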

Be precise about what is documented and what you should test. The docs do not state that search-text results honor your pins, boosts, and Search Instructions identically to the live storefront, so we will not claim they do.

What the docs do support is narrower and still useful. Your ranking engine applies pins, promote, demote, sort, and your Search Instructions to any search that reads through it, and the collection-browse tool is documented as merchandising-aware. For text search, the sound move is to expose agent reads through your own engine, then run the parity test below to confirm pin, boost, and Search-Instruction behavior on your store before you trust it.

We have watched a team spend a week pinning a launch to the top of every relevant collection, then realize an agent on a generic endpoint never saw a single pin.

So we are not asking you to take our word for whether your pins carry through. We are handing you the test that tells you, on your store, in a few minutes. Our AI Search is the engine we route every surface through, and the parity test is how you confirm an agent is reading through it rather than around it.

How should merchandising rules surface to agents, with Search Instructions as the guardrail?

Merchandising rules should surface to agents the same way they surface to shoppers: through one shared ranking engine. Search Instructions, written in plain English, attach to five AI steps in the search pipeline, so an agent reading through that pipeline picks up the same merchandising guidance a shopper gets. Configure the logic once, then verify a given agent path is reading through it.

The right design is not a separate "agent mode" with its own rules to maintain. It is one engine that ranks every surface you wire to it: the on-site search box, the collection page, the autocomplete, and the agent. Merchandising gets configured once and inherited by everything reading through that engine.

Search Instructions are the clearest expression of this. They are plain-English merchandising guidance that attaches to five AI steps in the search pipeline: query expansion, intent detection, facet value ordering, semantic redirect approval, and search result evaluation. When an agent reads through that pipeline, the same guidance that interprets a shopper's query is available to interpret the agent's.

A few specifics keep this honest:

Search Instructions are a short merchandising brief in English, capped at 5,000 characters, focused on decisions only you can make. They are not long lists of product IDs; those belong in ranking rules and pins.
Product-level control (pins, promote and demote, scheduling) lives in ranking and merchandising rules. Query-level judgment lives in Search Instructions. Both are available to an agent that reads through the ranker, and the parity test below is how you confirm a specific agent path picked them up.
The guardrail framing is simple. You do not write a separate rulebook for agents. You write one rulebook for your store, and the agent is another reader of it. That is the design we argue for, the design we built toward, and the design you verify rather than assume.
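
The two constraints on a brief, the 5,000-character cap and the advice to keep product IDs out of it, can be checked before you save one. This is a minimal Python sketch with a hypothetical review_instructions helper. The ID heuristic (three or more long digit runs) is our assumption about what a pasted ID list looks like, not a documented rule.

```python
import re

MAX_CHARS = 5000  # documented cap on Search Instructions


def review_instructions(text: str) -> list[str]:
    """Return warnings for a draft Search Instructions brief."""
    warnings = []
    if len(text) > MAX_CHARS:
        warnings.append(f"{len(text)} characters exceeds the {MAX_CHARS}-character cap")
    # Heuristic (assumption): runs of 8+ digits usually mean pasted product IDs.
    id_like = re.findall(r"\b\d{8,}\b", text)
    if len(id_like) >= 3:
        warnings.append(
            f"{len(id_like)} ID-like numbers found; consider ranking rules or pins instead"
        )
    return warnings


brief = (
    "Prioritize waterproof styles for hiking queries. "
    "Treat 'boots' without a modifier as fashion intent."
)
print(review_instructions(brief))
```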

One honest boundary, because the technical reader will check. Search Instructions guide AI judgments; they do not hard-code individual promotions. The query-interpretation steps (redirect and SKU detection) ignore merchandising guidance by design, because that work is about understanding the input, not ranking it. We built Search Instructions before agents were a real traffic source, and the happy accident is that an English-language merchandising brief turns out to be exactly the right way to steer a thing that reads English.

What breaks when an agent bypasses your merch logic?

When an AI agent bypasses your merchandising, it can surface out-of-stock products, ignore live promotions, bury pinned launches, flood results with one brand, and show stale items. The root cause is the same in every case: the agent reads your catalog but not your merchandising decisions. Routing agent search through your ranker carries those decisions into the agent's results.

When an agent searches around your merchandising instead of through it, specific, nameable things break:

Out-of-stock surfacing. A raw relevance endpoint can rank a sold-out product first. The agent recommends it, the shopper bounces, and your Inventory signal never got a vote.
Ignored promotions. A scheduled promo is live on the storefront and invisible to the agent, so the agent steers shoppers to full-price or wrong-priced items.
Buried launches. The product you pinned to the top sits somewhere on page three of the agent's results.
Lost diversity. Without diversity controls, an agent's results flood with one brand or one variant family, the exact thing you tune against on the storefront.
Stale or off-catalog results. An agent caching an old catalog snapshot surfaces products you delisted; real-time sync only helps the surfaces wired to it.
Brand-safety and trust. The agent is now a representative of your brand to the shopper. An unmerchandised, off-strategy result set is an off-brand storefront you did not approve. As AI surfaces take a growing share of discovery, this stops being a niche edge case. Google's guidance on generative AI features tells commerce sites that clear product data helps their items show up well in AI responses, which is the same point from the search side: machines, not just people, now read your catalog, and what you feed them is what gets surfaced.

Every one of these is the same root cause: the agent is reading your catalog but not your decisions. The fix is not to lock agents out. It is to make sure the path they read through carries your decisions, so control over what surfaces stays with you.
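
Two of those failures, out-of-stock surfacing and lost diversity, can be spotted mechanically in whatever an agent returns. The Python sketch below is a rough audit under stated assumptions: the id, brand, and in_stock field names are hypothetical, not a documented response schema, and the 60% brand-share threshold is an arbitrary example value.

```python
from collections import Counter


def audit_agent_results(
    results: list[dict], top_n: int = 5, max_brand_share: float = 0.6
) -> list[str]:
    """Flag out-of-stock items and single-brand flooding in the top results."""
    findings = []
    top = results[:top_n]
    for position, product in enumerate(top, start=1):
        if not product.get("in_stock", True):
            findings.append(f"position {position}: {product['id']} is out of stock")
    brands = Counter(product.get("brand") for product in top if product.get("brand"))
    if brands:
        brand, count = brands.most_common(1)[0]
        if count / len(top) > max_brand_share:
            findings.append(f"{brand} fills {count} of the top {len(top)} results")
    return findings


# Invented result set: a sold-out product ranks first and one brand dominates.
results = [
    {"id": "boot-1", "brand": "Trailhead", "in_stock": False},
    {"id": "boot-2", "brand": "Trailhead", "in_stock": True},
    {"id": "boot-3", "brand": "Trailhead", "in_stock": True},
    {"id": "boot-4", "brand": "Summit", "in_stock": True},
    {"id": "boot-5", "brand": "Trailhead", "in_stock": True},
]
for finding in audit_agent_results(results):
    print(finding)
```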

Brittany Csik, eCommerce Manager at Negative Underwear, put the value of that control plainly: "Layers doesn't just merchandise, it also finds the strategy and provides unparalleled control, all while saving us hours we didn't know we were losing." An agent reading around your ranker is exactly how you lose that control without noticing. You can configure all of this through Merchandising and route the agent through the same rules.

How do agents tell you when discovery breaks?

Agentic Feedback lets AI agents report search and discovery problems back to your store. You add an instructions block to your llms.txt telling agents when and where to submit feedback, and the report routes through the Shopify app proxy. Agents flag specific mismatches, such as a query for one product type returning a different type, so you can fix them.

Agents are not only readers; they can be a feedback channel. You add an <AgentInstructions> block to your store's llms.txt that tells agents when to submit feedback, where (the Layers app proxy at /apps/layers/agentic-feedback on your storefront domain), and what to include.

The agent submits issues like "the shopper searched for dress boots, but the top results were mostly hiking boots." It is a structured way to catch the failure modes from the section above at the surface where they actually happen, instead of finding out from a flat conversion chart a week later.
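
For orientation, here is a minimal Python sketch that assembles the feedback URL from a storefront domain and renders an instructions block around it. Only the AgentInstructions tag name and the /apps/layers/agentic-feedback path come from the text above. The helper names and the wording inside the block are illustrative assumptions, not the documented template, so check the docs for the exact format before publishing an llms.txt.

```python
FEEDBACK_PROXY_PATH = "/apps/layers/agentic-feedback"  # path named in the text above


def feedback_url(storefront_domain: str) -> str:
    """Join a storefront domain and the app-proxy path into one URL."""
    domain = (
        storefront_domain.strip()
        .removeprefix("https://")
        .removeprefix("http://")
        .rstrip("/")
    )
    return f"https://{domain}{FEEDBACK_PROXY_PATH}"


def render_agent_instructions(storefront_domain: str) -> str:
    """Render an illustrative block covering when, where, and what to submit."""
    lines = [
        "<AgentInstructions>",
        "When: search or browse results do not match what the shopper asked for.",
        f"Where: {feedback_url(storefront_domain)}",
        "What: the shopper's query, what was expected, and what was returned.",
        "</AgentInstructions>",
    ]
    return "\n".join(lines)


print(render_agent_instructions("example-store.com"))
```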

What should you test before you let agents touch your storefront?

Before exposing your storefront to AI agents, test eight things: endpoint and authentication, read-only enforcement, that pins hold in agent results, that boosts are reflected, that scheduled promotions surface, that out-of-stock items are deprioritized, that agent search matches your storefront search on the same queries, and that agent feedback routes back to you.

Run these eight on your store:

Endpoint and auth. Confirm your storefront MCP endpoint resolves and your storefront access token authenticates as a bearer token. Confirm the management server, which has write access, is on OAuth and is not exposed to shopper-facing agents.
Read-only verification. Confirm no write or mutate tool is reachable on the storefront server. An agent should never be able to change your store.
Pin test. Pin a known product to position 1 for a query, then run that query through the agent's search tool. Confirm the pin holds in the agent's result.
Boost test. Promote a brand or attribute, run the query through the agent path, and confirm the boost is reflected.
Promotion test. With a scheduled promo live, confirm the promoted items surface to the agent the way they do on the storefront.
Out-of-stock test. Confirm sold-out items are deprioritized or flagged in agent results, not ranked first.
Parity test, the load-bearing check. Run the same five queries through your storefront search box and through the agent's search-text tool, then compare the two result sets side by side. Look for merchandising drift. Does a product you pinned hold the same position in the agent's set? Do your promoted and demoted items keep their lift? Do your Search Instructions effects (synonym expansion, intent handling, facet ordering) show up the same way? Where the agent set diverges from your storefront SERP, the agent is reading around your ranker for that query, and you have found exactly what to fix. This is the agent equivalent of an autocomplete-versus-SERP parity audit, and it is how you confirm behavior on your own store rather than assume it from the docs.
Feedback loop test. Confirm Agentic Feedback is wired in llms.txt and a test report routes through the app proxy.
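
The pin and parity checks reduce to comparing two ranked lists of product IDs for the same query. The Python sketch below shows one way to do that, assuming you have already collected the IDs from your storefront results and from the agent's tool result. parity_report and its output keys are hypothetical names, and the sample IDs are invented.

```python
from typing import Optional


def parity_report(
    storefront_ids: list[str],
    agent_ids: list[str],
    pinned: Optional[dict[str, int]] = None,
    top_n: int = 10,
) -> dict:
    """Compare two ranked lists of product IDs for the same query."""
    storefront_top = storefront_ids[:top_n]
    agent_top = agent_ids[:top_n]
    agent_position = {pid: i for i, pid in enumerate(agent_top, start=1)}

    moved = {}    # product id -> (storefront position, agent position)
    missing = []  # on the storefront's top N, absent from the agent's top N
    for position, pid in enumerate(storefront_top, start=1):
        if pid not in agent_position:
            missing.append(pid)
        elif agent_position[pid] != position:
            moved[pid] = (position, agent_position[pid])

    # Pinned products whose agent position differs from the pinned position.
    broken_pins = {
        pid: agent_position.get(pid)
        for pid, expected in (pinned or {}).items()
        if agent_position.get(pid) != expected
    }
    return {
        "identical": storefront_top == agent_top,
        "moved": moved,
        "missing_from_agent": missing,
        "broken_pins": broken_pins,
    }


# Invented example: the pinned launch slips from position 1 to 3 for the agent.
storefront = ["launch-1", "boot-2", "boot-3", "boot-4"]
agent = ["boot-2", "boot-3", "launch-1", "boot-9"]
report = parity_report(storefront, agent, pinned={"launch-1": 1})
print(report)
```

Run it once per query in your set of five. Any non-empty broken_pins or missing_from_agent entry marks a query where the agent path is worth investigating.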

Run the eight on your store. Then book a demo and we will run the parity test live against your catalog, on a search engine where the agent and the shopper read the same merchandising.

Book a demo →

Closing

An agent will search your store whether or not you decided how. The only question is which store it sees: the raw catalog, or the one you merchandised. Wire the agent through your ranker, run the eight tests, and the answer is the second one.

Read the MCP docs

FAQs

1. What is the Storefront MCP endpoint for Shopify? A storefront MCP endpoint is a read-only Model Context Protocol server that lets AI agents search and browse your store. It exposes tools like text search, collection browse, recommendations, and facets, and authenticates with your storefront access token passed as a bearer token. Shopify ships a Storefront MCP server and switched it on by default for eligible US stores in Q1 2026.

2. What is MCP in ecommerce search? The Model Context Protocol (MCP) is an open standard that lets an AI agent use external tools. In ecommerce, an MCP server exposes read-only tools so a shopping agent can search your catalog, browse collections, and fetch recommendations. The agent lists the tools your store offers, then calls one with arguments and receives a structured, ranked result.

3. Can I expose merchandising rules through MCP? You expose merchandising to agents by routing their search and browse through the same ranking engine your storefront uses, rather than through a raw catalog endpoint. The documented collection-browse tool browses "with sort orders, merchandising, and faceted filtering." For text search, route the agent through your engine and run a parity check to confirm your pins, boosts, and Search Instructions carry through on your store.

4. How do AI agents respect promotions and pinned products? It depends on which path the agent reads through. An agent reading a raw catalog endpoint ignores merchandising and returns relevance-only results, so pins and promotions disappear. An agent reading through your ranking engine can inherit your ranking rules and Search Instructions. The honest step is to verify it: run the same query through your storefront and the agent, then compare.

5. Is the Storefront MCP server read-only? Yes. Storefront MCP tools are read-only by design, so an agent can search, browse, and fetch recommendations but cannot change your store. Any write or mutate capability lives on a separate management server behind OAuth, which should never be exposed to shopper-facing agents. Confirming read-only enforcement is one of the pre-flight checks in this post.

6. How do I authenticate an AI agent to my storefront? The storefront MCP server uses the same storefront access token as the REST API, passed as a bearer token. The management server, which can write to your store, uses OAuth with an mcp:use scope. Two servers, two trust levels: keep the write-capable management server off any shopper-facing agent and reserve it for operator tooling.

7. What breaks when an AI agent bypasses my merchandising? An agent that bypasses your merchandising can rank out-of-stock products first, ignore live promotions, bury pinned launches, flood results with one brand, and surface stale or delisted items. The root cause is consistent: the agent reads your catalog but not your merchandising decisions. Routing agent search through your ranker carries those decisions into the agent's results.

8. How do I test my storefront before exposing it to AI agents? Run eight checks: confirm endpoint and authentication, confirm read-only enforcement, test that pins hold, test that boosts are reflected, test that scheduled promotions surface, test that out-of-stock items are deprioritized, run a parity check comparing agent results to your storefront on the same queries, and confirm agent feedback routes back to you. The parity check is the load-bearing one.

About the author

Deb Mukherjee is an Ecom Growth Advisor who writes about ecommerce search and merchandising for Layers, the enterprise search and merchandising platform built for Shopify Plus. He works with Plus brands on search relevance, merchandising control, and the agent-readiness work that decides what a shopping agent sees on your store. Connect on LinkedIn.