No punchline. Just a train ticket, three protocols, and an authorization gap.
I’ve been spending a lot of time inside agentic payment specs lately, building against them, and the thing that keeps striking me is how much of the conversation treats “agent buys a thing” as a single step. It’s a stack of problems that happen to fire in sequence, each one with its own spec being written by a different group right now.
The agent gets an instruction: “Book me business class, Milan to Bologna, tonight, window seat, cheapest option, proceed with payment.”
Let’s say the merchant is Trenitalia, and the agent is booking a Frecciarossa high-speed service. They don’t actually support any of this today, but a real company makes it easier to follow than inventing one, and I love high-speed trains.
Finding the merchant
First problem: the agent has no idea who sells train tickets or how to talk to them.
It could scrape. Agents can absolutely load a browser, parse HTML, fill out forms. It works. It’s also fragile, breaks on every redesign, and honestly a bit embarrassing for an industry that’s solved way harder problems than this. A generation of engineers that streams billions of events per second and figured out real-time settlement shouldn’t be screen-scraping timetable pages.
This is what UCP (Universal Commerce Protocol, backed by Google, Shopify and others) was designed for. The merchant publishes a machine-readable capability document, sort of like a .well-known endpoint, that tells the agent what it sells, what it supports, and how to interact with it. The agent reads that, understands the interface, and knows what’s possible before making a single request.
No guessing, no scraping — the merchant opted into being discoverable, on its own terms.
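To make that concrete, here is a minimal sketch of an agent reading a capability document. Everything specific in it is an assumption for illustration: the field names (capabilities, payment_handlers, endpoints), the merchant.example URLs, and the endpoint_for helper are invented here and are not taken from the UCP spec.

```python
import json
from typing import Optional

# Hypothetical capability document. Every field name here is an assumption
# made for illustration, not the actual UCP schema.
RAW_DOC = """
{
  "merchant": "Trenitalia",
  "capabilities": ["catalog.search", "checkout"],
  "payment_handlers": ["card", "x402"],
  "endpoints": {
    "catalog": "https://merchant.example/ucp/catalog",
    "checkout": "https://merchant.example/ucp/checkout"
  }
}
"""


def parse_capabilities(raw: str) -> dict:
    """Parse the published document into a plain dict."""
    return json.loads(raw)


def endpoint_for(doc: dict, capability: str) -> Optional[str]:
    """Return the endpoint for a capability family, or None if not declared."""
    declared = any(
        name.split(".")[0] == capability for name in doc.get("capabilities", [])
    )
    if not declared:
        return None
    return doc.get("endpoints", {}).get(capability)


doc = parse_capabilities(RAW_DOC)
print(endpoint_for(doc, "catalog"))   # declared, so the agent knows where to query
print(endpoint_for(doc, "refunds"))   # not declared, so the agent does not try
```

The point of the sketch is the order of operations: the agent checks what the merchant declared before it sends any request, instead of probing and hoping.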
What’s available right now?
So the agent found Trenitalia. But what trains run tonight? What does business class cost on the 8pm departure? Is a window seat even available?
UCP covers this too. Structured catalog, queryable inventory. The merchant exposes routes, classes, pricing, seat availability as structured data. The agent queries, filters, compares, picks the cheapest option that matches all the constraints. Everything queryable, everything structured.
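The query-filter-pick step is simple once the data is structured. A rough sketch, assuming the catalog response has already been parsed into records; the Offer fields, train numbers and prices below are made up and do not reflect a real UCP response or a real timetable.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    """One purchasable seat option. Field names are illustrative assumptions."""
    train: str
    departure: str      # "HH:MM", local time
    travel_class: str
    seat: str
    price_cents: int    # integer cents avoid float rounding on money


# Invented sample inventory for tonight's Milan to Bologna services.
OFFERS = [
    Offer("FR 0001", "19:10", "business", "aisle", 5200),
    Offer("FR 0002", "20:00", "business", "window", 5900),
    Offer("FR 0003", "21:15", "business", "window", 5400),
    Offer("FR 0004", "20:30", "standard", "window", 3100),
]


def cheapest_match(
    offers: Sequence[Offer], travel_class: str, seat: str
) -> Optional[Offer]:
    """Return the cheapest offer that satisfies every constraint, or None."""
    matches = [
        o for o in offers if o.travel_class == travel_class and o.seat == seat
    ]
    return min(matches, key=lambda o: o.price_cents, default=None)


choice = cheapest_match(OFFERS, "business", "window")
print(choice)
```

Note that the cheaper aisle seat and the cheaper standard-class seat are both filtered out first: "cheapest" only applies among offers that meet all the user's constraints.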
Checkout
Agent found the ticket, user said go ahead, now things need to get formal.
UCP defines the checkout lifecycle. Session creation, order state, identity collection, tax calculation where it applies. But the part that matters here is payment handler selection. UCP’s checkout is designed to be extensible — the merchant declares which payment methods it accepts, and any handler that implements the interface can plug in. Cards, crypto, stablecoins, loyalty points, whatever. Trenitalia could accept payment via their own loyalty points as long as they implement identity linking. The protocol doesn’t care what settles underneath, it just needs a handler that follows the contract.
At this point both sides have agreed on what’s being bought and what it costs. The session holds that state.
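Handler selection boils down to an intersection between what the merchant accepts and what the agent is able (and allowed) to pay with. A minimal sketch, with hypothetical handler names and a made-up pick_handler helper; the real negotiation in UCP may carry more metadata than a list of strings.

```python
from typing import Optional, Sequence


def pick_handler(
    merchant_accepts: Sequence[str], agent_prefers: Sequence[str]
) -> Optional[str]:
    """Return the agent's most preferred handler that the merchant accepts.

    agent_prefers is ordered from most to least preferred.
    Returns None when there is no overlap, meaning checkout cannot proceed.
    """
    accepted = set(merchant_accepts)
    for handler in agent_prefers:
        if handler in accepted:
            return handler
    return None


# Hypothetical declarations: the merchant takes cards and loyalty points,
# the agent would rather pay on-chain but can fall back to a card.
selected = pick_handler(["card", "loyalty_points"], ["x402", "card"])
print(selected)
```

The agent's preference order drives the result, but the merchant's declared list is the hard constraint.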
Payment, and this is where it gets interesting
Checkout session is open, amount is set, merchant is waiting for money.
On card rails, here’s what actually happens. The merchant sends a charge request to their PSP (payment service provider), and the PSP pulls the amount from the card. The buyer never explicitly signed for that exact amount. The merchant has pull power.
The entire trust model for card payments is built around this. Vet the merchant hard: KYC, PCI compliance, contractual liability. Give them pull access to the buyer’s card. If they cheat, punish them after the fact with chargebacks, penalties, or the loss of their processing rights entirely. Anyone who’s applied as a merchant knows how much work that vetting is, and it still doesn’t prevent issues, it just creates someone to blame afterwards. The model works when the buyer is a human who can read a checkout page, verify the total, and call their bank if something looks off.
When the buyer is an agent, this gets shaky. The agent can’t eyeball the final charge, can’t notice a surcharge that got added after the session was confirmed. It trusts whatever the checkout session returned, but nothing cryptographically binds the actual charge to what was agreed. The merchant could pull a different amount and the payment would still go through.
The authorization gap
This is where AP2 (Agent Payments Protocol, Google and Coinbase, with Mastercard, Revolut, PayPal and others involved) comes in. It’s not a payment protocol, it’s a trust layer that sits alongside the payment to cover the authorization side of things.
During checkout, the merchant returns a checkoutSignature, a detached JWT that signs the hash of the current cart state (items, amounts, terms). When the agent confirms the purchase, the platform produces two verifiable digital credentials:
A CheckoutMandate that binds to the cart hash. This proves exactly what was agreed, down to the line items.
A PaymentMandate that authorizes the payment, cryptographically scoped to that same cart hash.
Both go to the merchant’s PSP at settlement time. If the merchant tries to charge a different amount, or quietly swaps business class for economy at the same price, the hash doesn’t match. The PSP rejects the charge.
So the card token covers who can pay, and AP2 covers what was authorized to be paid.
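The binding idea can be sketched in a few lines. This is a deliberately simplified model: real mandates are signed verifiable credentials, while here they are plain dicts carrying a hash, and the canonicalization (sorted-key JSON plus SHA-256), the field names and the psp_accepts helper are all assumptions for illustration rather than anything from the AP2 spec.

```python
import hashlib
import json


def cart_hash(cart: dict) -> str:
    """Hash a canonical JSON encoding so equal carts always hash equally."""
    canonical = json.dumps(cart, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def psp_accepts(charged_cart: dict, checkout_mandate: dict, payment_mandate: dict) -> bool:
    """Accept the charge only if both mandates bind to the cart being charged."""
    h = cart_hash(charged_cart)
    return h == checkout_mandate["cart_hash"] == payment_mandate["cart_hash"]


# What the agent and merchant agreed on at checkout (invented values).
agreed = {
    "currency": "EUR",
    "items": [{"sku": "MI-BO-business-window", "qty": 1, "amount_cents": 5900}],
    "total_cents": 5900,
}
checkout_mandate = {"cart_hash": cart_hash(agreed)}
payment_mandate = {"cart_hash": cart_hash(agreed)}

# Same price, quietly downgraded product.
swapped = {
    "currency": "EUR",
    "items": [{"sku": "MI-BO-economy-window", "qty": 1, "amount_cents": 5900}],
    "total_cents": 5900,
}

print(psp_accepts(agreed, checkout_mandate, payment_mandate))
print(psp_accepts(swapped, checkout_mandate, payment_mandate))
```

The swapped cart has the same total as the agreed one, and it still fails, because the hash covers the line items and not just the amount.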
On-chain, this problem barely exists
Worth stepping back for a second, because in the crypto payment world the authorization gap is mostly solved already, just at a different layer.
x402 (the protocol that finally uses HTTP 402 for what it was meant for) settles payments on-chain using EIP-3009 transferWithAuthorization. The agent signs a transfer for the exact amount, to the exact recipient, with a nonce and expiry baked in. The smart contract enforces all of it. The merchant can’t change the amount, can’t redirect the funds, can’t replay the authorization. You don’t need to trust the merchant, just the math.
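A toy model of the checks involved, to show why the merchant has so little room to move. This is not the token contract and not real cryptography: the SHA-256 digest below is a stand-in for the typed-data signature that EIP-3009 actually verifies, and ToyToken, its method and the sample values are invented for illustration.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Authorization:
    """The fields the payer signs over in a transferWithAuthorization call."""
    sender: str
    recipient: str
    value: int          # smallest token unit
    valid_after: int    # unix seconds
    valid_before: int   # unix seconds
    nonce: str


def digest(auth: Authorization) -> str:
    """Stand-in for the signature: a hash over every authorized field."""
    fields = (auth.sender, auth.recipient, auth.value,
              auth.valid_after, auth.valid_before, auth.nonce)
    return hashlib.sha256(repr(fields).encode("utf-8")).hexdigest()


class ToyToken:
    """Mimics the contract-side checks; holds no balances."""

    def __init__(self) -> None:
        self.used_nonces = set()

    def transfer_with_authorization(
        self, auth: Authorization, signed_digest: str, now: int
    ) -> bool:
        if digest(auth) != signed_digest:
            return False  # submitted parameters differ from what was signed
        if not (auth.valid_after < now < auth.valid_before):
            return False  # outside the validity window
        if (auth.sender, auth.nonce) in self.used_nonces:
            return False  # replay of an already-used authorization
        self.used_nonces.add((auth.sender, auth.nonce))
        return True


token = ToyToken()
auth = Authorization("agent", "merchant", 59_000_000, 1_000, 2_000, "nonce-1")
signature = digest(auth)

inflated = replace(auth, value=69_000_000)
print(token.transfer_with_authorization(inflated, signature, now=1_500))  # altered amount
print(token.transfer_with_authorization(auth, signature, now=1_500))      # as signed
print(token.transfer_with_authorization(auth, signature, now=1_500))      # replay
```

Amount, recipient, validity window and nonce are all part of what gets signed, so changing any one of them invalidates the authorization instead of producing a different charge.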
So AP2’s payment authorization is mostly redundant when you’re settling on-chain. Though the cart-content binding still adds something even here. The x402 signature locks the amount and the recipient, but it has no idea whether the product was actually business class or economy. If the merchant swapped one for the other at the same price, the x402 payment would still go through just fine. The CheckoutMandate is what catches that.
I’m going deeper on x402 and how it connects to UCP in a follow-up piece. There’s a lot to unpack there, specifically around how x402 fits as a UCP payment handler and what that integration actually looks like at the spec level. For this article, the point is simpler: the authorization gap is a card-rail problem, on-chain signatures handle most of it natively, and AP2 bridges the gap for everything else.
What’s still missing
One purchase, and we’ve already touched three protocols just to get from “buy this” to “paid.” And that’s assuming nothing goes wrong. There are real gaps that none of these specs fully address yet.
What happens when an agent needs a refund? The purchase was autonomous, the agent confirmed, but the user wants to return it. The chargeback model was designed for humans calling their bank. Nobody’s defined what dispute resolution looks like when the buyer was a piece of software acting on a mandate that has since expired.
Multi-merchant transactions. The user says “book me the train and a hotel near Bologna Centrale.” That’s two merchants, two checkouts, one instruction. Right now each is a separate session and there’s no spec for composing multiple purchases into a single agent flow with a shared budget constraint.
Agent identity across sessions. The agent that bought the ticket tonight has no persistent identity that the merchant can recognize next time. No loyalty, no history, no trust accumulation.
These are solvable problems, and people are working on them. OpenAI and Stripe have their own commerce protocol (Agentic Commerce Protocol, ACP) that takes a different approach to some of this. Stripe and Tempo just launched MPP (Machine Payments Protocol) for machine-to-machine settlement. The protocol space is getting crowded fast. Which one wins matters less than whether they can talk to each other: whether you can pick a commerce layer and a settlement layer and a trust layer and have them work together, or whether we end up with competing vertical stacks where you buy into one ecosystem and everything else is incompatible.
I don’t think anyone knows the answer yet, but these seams are where the next few years of agentic commerce will actually be decided.