Build an AI agent that shops online — autonomously.
"Can an AI agent buy something online — with 0 human intervention?"
Give your AI agent a natural language buying task. The agent picks an e-commerce or food ordering site, reasons through the options, generates a one-time virtual Visa card via the MCP server, and completes the purchase — with as little human intervention as possible. The goal is a believable, end-to-end purchase journey driven entirely by the model.
These are just starting points — bring your own scenario.
Find me a USB-C hub for my MacBook Pro. Needs at least 3 USB-A ports and HDMI out. Under $35, and it has to have at least 4-star reviews.
I need a 1TB NVMe SSD. PCIe Gen 4 minimum, with a DRAM cache. Find the best price-to-performance under $80.
I need a last-minute birthday gift for a 10-year-old who likes Minecraft. Under $25, and it needs to arrive by Friday.
Find me a portable monitor — 15–16 inches, 1080p minimum, USB-C powered. Under $150. Prioritize weight and color accuracy over refresh rate.
The payment stack is live. Focus on the agent logic and checkout automation.
Live at card.straitsx.ai/mcp. Plug it into Claude Desktop, Cursor, or any MCP client in 30 seconds.
One tool call: get_virtual_card. A real, spendable Visa card is returned immediately.
On-chain USDC settlement via Base Mainnet. ERC-3009 signatures — no wallet pop-ups, no browser friction.
Already proven on Steam and food ordering sites. Full purchase journey from prompt to receipt.
An AI agent autonomously issues a virtual Visa card and completes a real purchase — zero human intervention.
From zero to your first AI-issued card in under 5 minutes.
Message William on Slack — he'll set you up with a personal API key and spending allowance.
Add this to your claude_desktop_config.json or Cursor MCP settings:
{
  "mcpServers": {
    "x402card": {
      "type": "http",
      "url": "https://card.straitsx.ai/mcp"
    }
  }
}
Call get_virtual_card with your passphrase (your personal API key) and an amount between $5 and $50:
get_virtual_card({ passphrase: "your-api-key", amount_usd: 20 }) // → real virtual Visa card, funded on-chain
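If your agent talks to the server directly rather than through Claude Desktop or Cursor, the call above travels as an MCP `tools/call` message. The sketch below only builds that JSON-RPC payload and checks the $5 to $50 range locally. The helper name `build_card_request` is hypothetical, the session handshake that an MCP client library normally performs is omitted, and the shape of the server's response is not documented here, so nothing is assumed about it.

```python
MIN_USD, MAX_USD = 5, 50  # amount range stated in the step above


def build_card_request(passphrase: str, amount_usd: float, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' message for the get_virtual_card tool.

    Hypothetical helper: it only constructs the payload and does not send it.
    """
    if not MIN_USD <= amount_usd <= MAX_USD:
        raise ValueError(f"amount_usd must be between {MIN_USD} and {MAX_USD}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_virtual_card",
            "arguments": {"passphrase": passphrase, "amount_usd": amount_usd},
        },
    }
```

Validating the amount before the call keeps an out-of-range request from costing the agent a round trip.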
Hand the card details to your agent and let it navigate checkout on any e-commerce or food ordering site.
Record a 5-minute walkthrough and post it to the Slack thread. William will review all submissions after the deadline and announce winners.
As long as the final demo shows a believable end-to-end purchase journey.
Product research, comparison reasoning, checkout navigation — the smarter the agent, the better.
Extend the card MCP or build your own — new tools, card management, transaction history.
Browser control, form filling, order confirmation — across e-commerce or food ordering sites.
Card expiry, merchant restrictions, per-session budgets, audit trails.
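For the guardrails idea, one possible starting point is a local check that runs before the agent ever requests a card. This is a minimal sketch, not part of the Card MCP: the class `SessionBudget`, its fields, and the merchant names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SessionBudget:
    """Hypothetical per-session guard: spend limit, merchant allowlist, audit trail."""
    limit_usd: float = 50.0
    allowed_merchants: frozenset = frozenset()  # empty set means no restriction
    spent_usd: float = 0.0
    audit_log: list = field(default_factory=list)

    def authorize(self, merchant: str, amount_usd: float) -> bool:
        """Approve or reject a purchase and record the decision either way."""
        if amount_usd <= 0:
            reason = "non-positive amount"
        elif self.allowed_merchants and merchant not in self.allowed_merchants:
            reason = "merchant not allowed"
        elif self.spent_usd + amount_usd > self.limit_usd:
            reason = "session budget exceeded"
        else:
            reason = None
        approved = reason is None
        if approved:
            self.spent_usd += amount_usd
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "merchant": merchant,
            "amount_usd": amount_usd,
            "approved": approved,
            "reason": reason,
        })
        return approved
```

Logging rejected attempts as well as approved ones gives you an audit trail that explains why the agent stopped.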
Prizes are awarded per individual. First come, first served — budget remaining is announced after each winner.
Complete a successful end-to-end AI purchase demo. You keep the prize and whatever your AI ordered during the demo.
Convert your app into a Claude skill or ChatGPT Codex skill. One transaction, capped at $50. You keep what was ordered.
Winners announced after the May 17th deadline. William reviews all submissions and announces results together.
5 categories · 1–5 points each · 25 points maximum. Judges score each team independently.
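The arithmetic is simple enough to check in a few lines if you want to self-score before submitting. The category labels below are shorthand guessed from the rubric descriptions, not official names.

```python
# Shorthand labels assumed from the rubric text; the official names may differ.
CATEGORIES = ("autonomy", "real_transaction", "reliability", "task_complexity", "code_quality")


def total_score(scores: dict) -> int:
    """Sum five category scores, each required to be an integer from 1 to 5."""
    if set(scores) != set(CATEGORIES):
        raise ValueError("scores must cover exactly the five categories")
    for name, points in scores.items():
        if not isinstance(points, int) or not 1 <= points <= 5:
            raise ValueError(f"{name}: score must be an integer from 1 to 5")
    return sum(scores.values())
```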
How many human interventions were required? Zero is the goal. Each manual step is a deduction. A truly autonomous agent should be able to receive a high-level instruction like "buy me the cheapest flight to Jakarta departing Friday" and execute it fully without any follow-up input.
Score 1: Mostly manual. The agent is essentially a UI wrapper. The human drives every step: searching, clicking, confirming. The agent contributes little beyond maybe filling a form field.
Score 2: 3+ interventions. The agent handles some steps but frequently stalls, asks for clarification, or requires the user to click through critical decision points like payment confirmation or CAPTCHA.
Score 3: 1–2 interventions. The agent completes most of the flow but hits one or two blockers, typically final payment confirmation or a login/OTP step, where human input is still needed.
Score 4: Near-autonomous. Only a single, understandable intervention is required, e.g. a one-time OTP or biometric confirmation for payment security. The agent handles everything else.
Score 5: Fully autonomous. Zero human input from instruction to receipt. The agent handles authentication, selection, edge cases, and payment entirely on its own.
Did the AI actually complete a real transaction on a real site? Simulation or mock checkout does not count. Judges will look for proof — a confirmation email, an order ID, a transaction record — not just a screenshot of a success screen.
Score 1: Simulation only. The "purchase" never leaves the local environment. The agent navigates a mock store, a local HTML page, or a pre-scripted demo with no real checkout involved.
Score 2: Sandbox / test environment. The agent uses a test API, a staging environment, or a payment sandbox like Stripe test mode. No real money moves, no real merchant involved.
Score 3: Partial real purchase. The agent reaches a real checkout page on a real site but does not complete the final payment step, e.g. it fills the cart and stops, or initiates payment but fails at confirmation.
Score 4: Real purchase with caveats. A real transaction was completed but with notable limitations, e.g. only works on one specific site, required a pre-saved payment method, or the agent was helped by browser extensions pre-filling credentials.
Score 5: Full real transaction. Money moved. A real order was placed on a real merchant site with a real payment method. Proof is available in the form of a confirmation email or order ID. Repeatable across different sites is a bonus.
Does it work more than once, or did it only succeed on the demo run? Judges may ask for a live re-run during judging. A solution that works 100% of the time on one site but fails everywhere else should not score the same as one that works consistently across varied conditions.
Score 1: One-off demo. The team got it to work once, on camera, under ideal conditions. Attempts to reproduce it fail or require significant manual setup each time.
Score 2: ~25% success rate. The agent occasionally succeeds but frequently breaks due to site layout changes, CAPTCHA triggers, session timeouts, or unpredictable agent behavior.
Score 3: ~50% success rate. The agent works roughly half the time. Failures are somewhat predictable and the team can explain why they happen, but no robust fix is in place.
Score 4: ~75% success rate. The agent is mostly reliable with known, documented failure modes. It recovers gracefully from some errors and handles common edge cases.
Score 5: Consistent across runs. The agent works reliably across multiple runs, different products, and ideally different sites. Failure modes are handled with retries or fallbacks. A live re-run in front of judges succeeds.
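Retries and a measured success rate are the two things the top reliability score asks for. Here is a minimal sketch of both, assuming each agent step is wrapped in a callable that returns True on success. The function names are hypothetical, and the `sleep` parameter exists only so the delay can be stubbed out in tests.

```python
import time
from typing import Callable


def run_with_retries(step: Callable[[], bool], max_attempts: int = 3,
                     base_delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run a step until it returns True, backing off exponentially between tries."""
    for attempt in range(max_attempts):
        try:
            if step():
                return True
        except Exception as exc:  # e.g. a timeout or a missing page element
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return False


def success_rate(results: list) -> float:
    """Fraction of runs that succeeded; 0.0 when there are no runs."""
    return sum(1 for ok in results if ok) / len(results) if results else 0.0
```

Only wrap steps that are safe to repeat, such as search or add-to-cart. Blindly retrying the final payment submission risks a duplicate order.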
More complex buying tasks — multi-constraint, time-sensitive, comparison-heavy — score higher than simple single-item purchases. Judges are looking for evidence that the agent can reason about trade-offs, not just execute a known sequence of steps.
Score 1: Single simple item. The agent buys one specific, pre-determined product from a known URL with no decision-making required. Equivalent to automating a bookmark.
Score 2: Simple with one constraint. The agent applies one filter, e.g. "buy the cheapest option" or "pick the fastest shipping", but there is no comparison or reasoning involved beyond a single variable.
Score 3: Moderate complexity. The agent navigates search results, compares a few options across two or more attributes (price, rating, delivery time), and makes a reasoned selection.
Score 4: Multi-constraint task. The agent handles several competing requirements simultaneously, e.g. "under $50, ships by Friday, highest rated, from a seller with 95%+ feedback", and resolves trade-offs when no perfect option exists.
Score 5: Complex, time-sensitive, comparison-heavy. The agent handles real-world purchasing complexity: dynamic pricing, limited availability, cross-site comparison, time pressure, or multi-step workflows like flight + hotel bundling. The task would take a human meaningful effort to do manually.
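To make multi-constraint reasoning concrete, here is one way to encode the USB-C hub scenario from the sample tasks. It is a sketch under assumptions: the `Product` fields and the tie-break order (fewest violated constraints, then higher rating, then lower price) are illustrative choices, not a required approach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    """Hypothetical record the agent fills in from a product page."""
    name: str
    price_usd: float
    rating: float
    usb_a_ports: int
    has_hdmi: bool


def violations(p: Product) -> list:
    """List which constraints from the hub task this product fails."""
    failed = []
    if p.usb_a_ports < 3:
        failed.append("fewer than 3 USB-A ports")
    if not p.has_hdmi:
        failed.append("no HDMI out")
    if p.price_usd >= 35:
        failed.append("not under $35")
    if p.rating < 4.0:
        failed.append("rating below 4 stars")
    return failed


def choose(products: list) -> Optional[tuple]:
    """Return (best product, its violations), or None if there are no candidates."""
    if not products:
        return None
    best = min(products, key=lambda p: (len(violations(p)), -p.rating, p.price_usd))
    return best, violations(best)
```

Returning the violation list alongside the pick lets the agent explain a compromise when no product satisfies every constraint.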
Judges will clone the repository and attempt to run it themselves — if it doesn't work out of the box, that counts against the score. A beautiful demo built on spaghetti code scores lower than a less flashy demo with clean, documented, reusable architecture.
What judges will actually do
Score 1: Throwaway hack. Hardcoded URLs, credentials in plain text, no error handling, no structure. Works for the demo only. Nobody, including the author, could run it without hand-holding from the team. Repo may be missing files or dependencies entirely.
Score 2: Works but messy. The code runs but only after significant troubleshooting. Dependencies are undocumented, setup steps are missing or wrong, and the README either doesn't exist or doesn't reflect reality.
Score 3: Runnable with effort. The repo can be cloned and run but requires some detective work, e.g. undocumented environment variables, a missing .env.example, or setup steps that only work on the author's machine.
Score 4: Plug and play. Clone, follow the README, run, and it works. Environment variables are documented, dependencies install cleanly, and setup takes under 10 minutes for a competent developer.
Score 5: Production-ready, fully documented. Effortless setup with a clear README covering prerequisites, installation, configuration, and usage examples. Includes a .env.example, dependency lockfile, and ideally a one-command setup script. Architecture is modular. Includes meaningful comments, proper error handling, and logging. Could realistically be forked and turned into a real product.
Bonus indicators for score 5: a demo video in the README, docker-compose for zero-dependency local setup, tests that validate core agent behavior, clear separation between agent logic, site adapters, and payment handling.
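A startup check for required environment variables is a cheap way to avoid the "undocumented environment variables" penalty. The sketch below assumes two hypothetical variable names, `CARD_MCP_URL` and `CARD_PASSPHRASE`. Use whatever names your own .env.example documents.

```python
import os
from typing import Mapping, Optional

# Hypothetical names; keep this list in sync with your .env.example.
REQUIRED_VARS = ("CARD_MCP_URL", "CARD_PASSPHRASE")


def missing_env(required=REQUIRED_VARS, env: Optional[Mapping] = None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def check_env_or_exit() -> None:
    """Stop early with a readable message instead of failing mid-checkout."""
    missing = missing_env()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing)
                         + " (see .env.example)")
```

Calling `check_env_or_exit()` as the first line of your entry point means a judge who forgot a variable sees one clear message rather than a stack trace halfway through a purchase.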
Read these before you start. Demos that violate these rules are disqualified.
Use the Card MCP server for payment. No hardcoded card numbers or personal cards. The whole point is the agent issues its own card.
Purchases must be real, on a real site. No mock checkouts, no simulated confirmations. A real order confirmation email is the bar.
$50 per-person spend cap. Purchases beyond that are out of pocket. The cap is per individual, not per team.
No automating login with other people's credentials. Each participant uses their own account on any site the agent shops on.
Purchases must be legal and ship to a real address. No digital-only workarounds to dodge the "keep what you buy" spirit of the prize.
The XUSD budget is finite and first come, first served. Spending it on a failed demo doesn't reset your allocation. Test thoroughly before submitting.