Live on Base Mainnet · $50 per person, keep what you buy · Submissions close May 17th

StraitsX presents AI Commerce Hackathon

Build an AI agent that shops online — autonomously.

"Can an AI agent buy something online — with 0 human intervention?"

🏆 Total pool: 1,000 XUSD

📅 Deadline: May 17th

👥 Team size: 1–5 people

The Challenge

What are we building?

Give your AI agent a natural language buying task. The agent picks an e-commerce or food ordering site, reasons through the options, generates a one-time virtual Visa card via the MCP server, and completes the purchase — with as little human intervention as possible. The goal is a believable, end-to-end purchase journey driven entirely by the model.

Human intervention required · Target: 0 interruptions

Examples

What kind of tasks can it handle?

These are just starting points — bring your own scenario.

Budget-conscious

Find me a USB-C hub for my MacBook Pro. Needs at least 3 USB-A ports and HDMI out. Under $35, and it has to have at least 4-star reviews.

Comparison-heavy

I need a 1TB NVMe SSD. PCIe Gen 4 minimum, with a DRAM cache. Find the best price-to-performance under $80.

Time-sensitive

I need a last-minute birthday gift for a 10-year-old who likes Minecraft. Under $25, and it needs to arrive by Friday.

Multi-factor tradeoffs

Find me a portable monitor — 15–16 inches, 1080p minimum, USB-C powered. Under $150. Prioritize weight and color accuracy over refresh rate.


Infrastructure

What's already built — so you don't start from scratch

The payment stack is live. Focus on the agent logic and checkout automation.

🔌

Card MCP Server

Live at card.straitsx.ai/mcp. Plug it into Claude Desktop, Cursor, or any MCP client in 30 seconds.

💳

Virtual Visa Issuance

One tool call: get_virtual_card. A real, spendable Visa card is returned immediately.

⛓️

x402 Payment Protocol

On-chain USDC settlement via Base Mainnet. ERC-3009 signatures — no wallet pop-ups, no browser friction.

✅

End-to-End Demo

Already proven on Steam and food ordering sites. Full purchase journey from prompt to receipt.
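To make the x402 card above more concrete: ERC-3009 lets a payer sign an authorization off-chain that another party then submits on-chain, which is why no wallet pop-up is needed. The sketch below only assembles the EIP-712 typed data for a `transferWithAuthorization` message; it does not sign or broadcast anything. The function name, the placeholder contract address, and the domain `name` and `version` values are assumptions for illustration, not details taken from the StraitsX server.

```python
import secrets
import time
from decimal import Decimal

BASE_MAINNET_CHAIN_ID = 8453  # chain ID of Base Mainnet
USDC_DECIMALS = 6             # USDC uses 6 decimal places

# Placeholder only, NOT a real contract: substitute the token contract you settle against.
TOKEN_CONTRACT_PLACEHOLDER = "0x" + "00" * 20


def build_transfer_authorization(payer, payee, amount_usdc, ttl_seconds=300, now=None):
    """Assemble EIP-712 typed data for an ERC-3009 transferWithAuthorization."""
    now = int(time.time()) if now is None else now
    # Convert a human-readable amount into the token's smallest unit.
    value = int(Decimal(str(amount_usdc)) * 10**USDC_DECIMALS)
    return {
        "primaryType": "TransferWithAuthorization",
        "types": {
            "TransferWithAuthorization": [
                {"name": "from", "type": "address"},
                {"name": "to", "type": "address"},
                {"name": "value", "type": "uint256"},
                {"name": "validAfter", "type": "uint256"},
                {"name": "validBefore", "type": "uint256"},
                {"name": "nonce", "type": "bytes32"},
            ]
        },
        "domain": {
            # Assumed values: they must match the token contract's own EIP-712 domain.
            "name": "USD Coin",
            "version": "2",
            "chainId": BASE_MAINNET_CHAIN_ID,
            "verifyingContract": TOKEN_CONTRACT_PLACEHOLDER,
        },
        "message": {
            "from": payer,
            "to": payee,
            "value": value,
            "validAfter": 0,                    # usable immediately
            "validBefore": now + ttl_seconds,   # expires after the TTL
            "nonce": "0x" + secrets.token_hex(32),  # random 32-byte nonce
        },
    }
```

The random nonce and the `validBefore` window are what make each authorization single-use and short-lived. Signing the typed data needs a wallet library and a private key, which is left out here on purpose.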

See it in action

End-to-end demo

An AI agent autonomously issues a virtual Visa card and completes a real purchase — zero human intervention.

Setup

How to get started

From zero to your first AI-issued card in under 5 minutes.

Get your API key

Message William on Slack — he'll set you up with a personal API key and spending allowance.

Add the MCP server to your client

Add this to your claude_desktop_config.json or Cursor MCP settings:

mcp config

{
  "mcpServers": {
    "x402card": {
      "type": "http",
      "url": "https://card.straitsx.ai/mcp"
    }
  }
}
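If your config file already lists other MCP servers, pasting the snippet above over it would drop them. A minimal sketch of merging the `x402card` entry into an existing config is shown below. The helper name `add_mcp_server` and the `other-server` entry are hypothetical; the sketch works on an in-memory dict and leaves reading and writing the actual file to you.

```python
import json
from copy import deepcopy

# The server entry from the config snippet above.
X402CARD_ENTRY = {"type": "http", "url": "https://card.straitsx.ai/mcp"}


def add_mcp_server(config, name="x402card", entry=X402CARD_ENTRY):
    """Return a copy of `config` with `entry` added under mcpServers[name]."""
    merged = deepcopy(config)                     # never mutate the caller's dict
    servers = merged.setdefault("mcpServers", {})  # create the section if absent
    servers[name] = dict(entry)
    return merged


# Hypothetical existing config with one unrelated server.
existing = {
    "mcpServers": {
        "other-server": {"type": "http", "url": "https://example.invalid/mcp"}
    }
}
updated = add_mcp_server(existing)
print(json.dumps(updated, indent=2))
```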

Issue your first card

Call get_virtual_card with your passphrase (the API key from the first step) and an amount between $5 and $50:

tool call

get_virtual_card({
  passphrase: "your-api-key",
  amount_usd: 20
})
// → real virtual Visa card, funded on-chain
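Under the hood, an MCP tool call is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below validates the amount against the $5 to $50 range and builds that request body without sending it. The helper name `build_card_request` is hypothetical, and the argument names are taken from the tool call above; an MCP client library would normally build and send this for you after the initial handshake.

```python
import json

MIN_AMOUNT_USD = 5
MAX_AMOUNT_USD = 50


def build_card_request(passphrase, amount_usd, request_id=1):
    """Build the JSON-RPC body for an MCP `tools/call` to get_virtual_card."""
    if not passphrase:
        raise ValueError("passphrase is required")
    if not MIN_AMOUNT_USD <= amount_usd <= MAX_AMOUNT_USD:
        raise ValueError(
            f"amount_usd must be between {MIN_AMOUNT_USD} and {MAX_AMOUNT_USD}"
        )
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_virtual_card",
            "arguments": {"passphrase": passphrase, "amount_usd": amount_usd},
        },
    }


request = build_card_request("your-api-key", 20)
print(json.dumps(request, indent=2))
```

Checking the range locally means an out-of-range amount fails before any funds are touched. Keep the real passphrase out of source control, for example by reading it from an environment variable.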

Complete a purchase

Hand the card details to your agent and let it navigate checkout on any e-commerce or food ordering site.

Submit your demo

Record a 5-minute walkthrough and post it to the Slack thread. William will review all submissions after the deadline and announce winners.

What to build

Any part of the flow is fair game

As long as the final demo shows a believable end-to-end purchase journey.

🤖

Agent + decision logic

Product research, comparison reasoning, checkout navigation — the smarter the agent, the better.

🔧

MCP server extensions

Extend the card MCP or build your own — new tools, card management, transaction history.

🛒

Checkout automation

Browser control, form filling, order confirmation — across e-commerce or food ordering sites.

🛡️

Spend controls + logging

Card expiry, merchant restrictions, per-session budgets, audit trails.
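As one possible starting point for the spend-controls track, here is a minimal sketch of a per-session budget with a merchant allowlist and an audit trail. The class name `SessionBudget`, its fields, and the example merchant names are all hypothetical; amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SessionBudget:
    limit_cents: int = 5000                      # $50 per-person cap
    allowed_merchants: frozenset = frozenset()   # empty set means "any merchant"
    spent_cents: int = 0
    audit_log: list = field(default_factory=list)

    def authorize(self, merchant: str, amount_cents: int) -> bool:
        """Approve or reject a spend and record the decision either way."""
        if amount_cents <= 0:
            reason = "non-positive amount"
        elif self.allowed_merchants and merchant not in self.allowed_merchants:
            reason = "merchant not allowed"
        elif self.spent_cents + amount_cents > self.limit_cents:
            reason = "budget exceeded"
        else:
            reason = "ok"
        approved = reason == "ok"
        if approved:
            self.spent_cents += amount_cents
        # Every decision is logged, including rejections.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "merchant": merchant,
            "amount_cents": amount_cents,
            "approved": approved,
            "reason": reason,
        })
        return approved


budget = SessionBudget(allowed_merchants=frozenset({"shop.example"}))
budget.authorize("shop.example", 3000)   # approved
budget.authorize("shop.example", 2500)   # rejected: would exceed the cap
```

Calling `authorize` before requesting a card keeps the agent from issuing a card it is not allowed to spend.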

Bonus bounty track: Convert your app into a Claude skill or ChatGPT Codex skill and earn an extra +$50 on top of your main prize.

Prizes

Win cash — and keep what your AI bought

Prizes are awarded per individual, first come, first served. The remaining budget is announced after each winner.

$50

Main Prize — per person

Complete a successful end-to-end AI purchase demo. You keep the prize and whatever your AI ordered during the demo.

+$50

Bonus Bounty

Convert your app into a Claude skill or ChatGPT Codex skill. One transaction, capped at $50. You keep what was ordered.

1,000 XUSD

Total Prize Pool

Winners announced after the May 17th deadline. William reviews all submissions and announces results together.

Prizes are awarded per individual in the team — not per team. Each person who contributes to a successful demo is eligible.

Judging

How your demo is scored

5 categories · 1–5 points each · 25 points maximum. Judges score each team independently.
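The arithmetic is simple: five categories, each scored 1 to 5, summed to a total between 5 and 25. A small sketch for teams that want to self-score before submitting is shown below. The category keys and the function name are hypothetical labels for the five categories listed on this page, not an official scoring tool.

```python
CATEGORIES = (
    "autonomy",
    "purchase_success",
    "reliability",
    "scope_of_task",
    "code_quality",
)
MIN_POINTS, MAX_POINTS = 1, 5


def total_score(scores):
    """Sum one judge's per-category scores after validating them."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        points = scores[category]
        if not isinstance(points, int) or not MIN_POINTS <= points <= MAX_POINTS:
            raise ValueError(f"{category} must be an integer from 1 to 5")
    return sum(scores[c] for c in CATEGORIES)


# Example self-assessment (made-up numbers).
example = {
    "autonomy": 4,
    "purchase_success": 5,
    "reliability": 3,
    "scope_of_task": 4,
    "code_quality": 3,
}
print(total_score(example), "out of", MAX_POINTS * len(CATEGORIES))
```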

🤖 Category 1 — Autonomy 1–5 pts

How many human interventions were required? Zero is the goal. Each manual step is a deduction. A truly autonomous agent should be able to receive a high-level instruction like "buy me the cheapest flight to Jakarta departing Friday" and execute it fully without any follow-up input.

Score 1: Mostly manual. The agent is essentially a UI wrapper. The human drives every step: searching, clicking, confirming. The agent contributes little beyond maybe filling a form field.

Score 2: 3+ interventions. The agent handles some steps but frequently stalls, asks for clarification, or requires the user to click through critical decision points like payment confirmation or CAPTCHA.

Score 3: 1–2 interventions. The agent completes most of the flow but hits one or two blockers, typically final payment confirmation or a login/OTP step, where human input is still needed.

Score 4: Near-autonomous. Only a single, understandable intervention is required, e.g. a one-time OTP or biometric confirmation for payment security. The agent handles everything else.

Score 5: Fully autonomous. Zero human input from instruction to receipt. The agent handles authentication, selection, edge cases, and payment entirely on its own.
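Because this category counts interventions, it can help to record them explicitly during a run instead of reconstructing them from the video afterwards. The sketch below is one hypothetical way to do that; the class `RunLog` and its method names are illustrative and not part of any provided tooling.

```python
from dataclasses import dataclass, field


@dataclass
class RunLog:
    task: str
    interventions: list = field(default_factory=list)

    def human_intervened(self, step: str, reason: str) -> None:
        """Call this whenever a person has to step in during the run."""
        self.interventions.append({"step": step, "reason": reason})

    @property
    def intervention_count(self) -> int:
        return len(self.interventions)

    def summary(self) -> dict:
        return {
            "task": self.task,
            "intervention_count": self.intervention_count,
            "fully_autonomous": self.intervention_count == 0,
            "interventions": list(self.interventions),
        }


log = RunLog(task="USB-C hub under $35")
log.human_intervened(step="payment", reason="one-time OTP")
print(log.summary())
```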

🛒 Category 2 — Purchase Success 1–5 pts

Did the AI actually complete a real transaction on a real site? Simulation or mock checkout does not count. Judges will look for proof — a confirmation email, an order ID, a transaction record — not just a screenshot of a success screen.

Score 1: Simulation only. The "purchase" never leaves the local environment. The agent navigates a mock store, a local HTML page, or a pre-scripted demo with no real checkout involved.

Score 2: Sandbox / test environment. The agent uses a test API, a staging environment, or a payment sandbox like Stripe test mode. No real money moves, no real merchant involved.

Score 3: Partial real purchase. The agent reaches a real checkout page on a real site but does not complete the final payment step, e.g. it fills the cart and stops, or initiates payment but fails at confirmation.

Score 4: Real purchase with caveats. A real transaction was completed but with notable limitations, e.g. it only works on one specific site, required a pre-saved payment method, or the agent was helped by browser extensions pre-filling credentials.

Score 5: Full real transaction. Money moved. A real order was placed on a real merchant site with a real payment method. Proof is available in the form of a confirmation email or order ID. Repeatability across different sites is a bonus.

🔁 Category 3 — Reliability 1–5 pts

Does it work more than once, or did it only succeed on the demo run? Judges may ask for a live re-run during judging. A solution that works 100% of the time on one site but fails everywhere else should not score the same as one that works consistently across varied conditions.

Score 1: One-off demo. The team got it to work once, on camera, under ideal conditions. Attempts to reproduce it fail or require significant manual setup each time.

Score 2: ~25% success rate. The agent occasionally succeeds but frequently breaks due to site layout changes, CAPTCHA triggers, session timeouts, or unpredictable agent behavior.

Score 3: ~50% success rate. The agent works roughly half the time. Failures are somewhat predictable and the team can explain why they happen, but no robust fix is in place.

Score 4: ~75% success rate. The agent is mostly reliable with known, documented failure modes. It recovers gracefully from some errors and handles common edge cases.

Score 5: Consistent across runs. The agent works reliably across multiple runs, different products, and ideally different sites. Failure modes are handled with retries or fallbacks. A live re-run in front of judges succeeds.
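The top score mentions retries and fallbacks. A minimal sketch of a retry wrapper with exponential backoff is shown below, assuming each checkout step can be expressed as a function that raises on failure. The names `run_with_retries` and `flaky_checkout` are hypothetical, and the failing step is simulated.

```python
import time


def run_with_retries(attempt, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `attempt` until it succeeds, doubling the wait after each failure."""
    last_error = None
    for n in range(1, max_attempts + 1):
        try:
            return attempt()
        except Exception as exc:  # broad on purpose: any step failure triggers a retry
            last_error = exc
            if n < max_attempts:
                sleep(base_delay * 2 ** (n - 1))  # 1x, 2x, 4x ... the base delay
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Simulated step that fails twice (e.g. a session timeout) and then succeeds.
state = {"calls": 0}


def flaky_checkout():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("checkout timed out")
    return "order-confirmed"


# A no-op sleep keeps the example instant; use the default time.sleep in practice.
result = run_with_retries(flaky_checkout, sleep=lambda seconds: None)
print(result, "after", state["calls"], "attempts")
```

Retries should wrap only steps that are safe to repeat. Retrying a payment submission blindly could place a duplicate order, so that step deserves its own check.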

🎯 Category 4 — Scope of Task 1–5 pts

More complex buying tasks — multi-constraint, time-sensitive, comparison-heavy — score higher than simple single-item purchases. Judges are looking for evidence that the agent can reason about trade-offs, not just execute a known sequence of steps.

Score 1: Single simple item. The agent buys one specific, pre-determined product from a known URL with no decision-making required. Equivalent to automating a bookmark.

Score 2: Simple with one constraint. The agent applies one filter, e.g. "buy the cheapest option" or "pick the fastest shipping", but there is no comparison or reasoning involved beyond a single variable.

Score 3: Moderate complexity. The agent navigates search results, compares a few options across two or more attributes (price, rating, delivery time), and makes a reasoned selection.

Score 4: Multi-constraint task. The agent handles several competing requirements simultaneously, e.g. "under $50, ships by Friday, highest rated, from a seller with 95%+ feedback", and resolves trade-offs when no perfect option exists.

Score 5: Complex, time-sensitive, comparison-heavy. The agent handles real-world purchasing complexity: dynamic pricing, limited availability, cross-site comparison, time pressure, or multi-step workflows like flight + hotel bundling. The task would take a human meaningful effort to do manually.
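To illustrate what handling several constraints at once can look like in code, the sketch below applies the constraints from the USB-C hub example earlier on this page (at least 3 USB-A ports, HDMI out, under $35, at least 4 stars) and then ranks the survivors. The product list is invented sample data, and the tie-break rule (highest rating first, then lowest price) is an assumption; a real agent would populate the candidates from live search results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hub:
    name: str
    price_usd: float
    usb_a_ports: int
    has_hdmi: bool
    rating: float


def meets_constraints(hub: Hub) -> bool:
    """Hard constraints from the example task; failing any one disqualifies."""
    return (
        hub.price_usd < 35
        and hub.usb_a_ports >= 3
        and hub.has_hdmi
        and hub.rating >= 4.0
    )


def pick_best(candidates):
    """Return the best eligible hub, or None when nothing satisfies the task."""
    eligible = [h for h in candidates if meets_constraints(h)]
    if not eligible:
        return None
    # Assumed preference: highest rating first, then lowest price.
    return min(eligible, key=lambda h: (-h.rating, h.price_usd))


# Invented sample data for illustration only.
candidates = [
    Hub("Hub A", 29.99, 3, True, 4.5),
    Hub("Hub B", 24.99, 2, True, 4.7),   # too few USB-A ports
    Hub("Hub C", 33.50, 4, True, 4.6),
    Hub("Hub D", 39.00, 4, True, 4.8),   # over budget
    Hub("Hub E", 19.99, 3, False, 4.4),  # no HDMI
]
choice = pick_best(candidates)
print(choice.name if choice else "no match")
```

Returning `None` instead of the closest miss matters: when no option satisfies the task, the agent should report that and explain the trade-off, not quietly relax a constraint.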

🧱 Category 5 — Code Quality / Extensibility 1–5 pts

Judges will clone the repository and attempt to run it themselves — if it doesn't work out of the box, that counts against the score. A beautiful demo built on spaghetti code scores lower than a less flashy demo with clean, documented, reusable architecture.

What judges will actually do

Clone the repo
Follow the README setup instructions
Run the agent without asking the team for help
Attempt to point it at a different product or site
Evaluate how easy it would be to extend or adapt

Score 1: Throwaway hack. Hardcoded URLs, credentials in plain text, no error handling, no structure. Works for the demo only. Nobody, including the author, could run it without hand-holding from the team. Repo may be missing files or dependencies entirely.

Score 2: Works but messy. The code runs but only after significant troubleshooting. Dependencies are undocumented, setup steps are missing or wrong, and the README either doesn't exist or doesn't reflect reality.

Score 3: Runnable with effort. The repo can be cloned and run but requires some detective work, e.g. undocumented environment variables, a missing .env.example, or setup steps that only work on the author's machine.

Score 4: Plug and play. Clone, follow the README, run, and it works. Environment variables are documented, dependencies install cleanly, and setup takes under 10 minutes for a competent developer.

Score 5: Production-ready, fully documented. Effortless setup with a clear README covering prerequisites, installation, configuration, and usage examples. Includes a .env.example, dependency lockfile, and ideally a one-command setup script. Architecture is modular. Includes meaningful comments, proper error handling, and logging. Could realistically be forked and turned into a real product.

Bonus indicators for score 5: a demo video in the README, docker-compose for zero-dependency local setup, tests that validate core agent behavior, clear separation between agent logic, site adapters, and payment handling.
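Several of the levels above turn on environment variables being documented and credentials staying out of the code. A minimal sketch of a config loader that fails fast with a helpful message is shown below. The variable names `X402CARD_PASSPHRASE`, `X402CARD_MCP_URL`, and `MAX_SPEND_USD` are hypothetical; only the default URL and the $50 cap come from this page.

```python
import os

# Hypothetical variable names; list the same ones in your .env.example.
REQUIRED_VARS = ("X402CARD_PASSPHRASE",)
DEFAULT_MCP_URL = "https://card.straitsx.ai/mcp"


def load_config(env=None):
    """Read settings from the environment and fail early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: "
            + ", ".join(missing)
            + ". Copy .env.example to .env and fill them in."
        )
    return {
        "passphrase": env["X402CARD_PASSPHRASE"],
        "mcp_url": env.get("X402CARD_MCP_URL", DEFAULT_MCP_URL),
        "max_spend_usd": int(env.get("MAX_SPEND_USD", "50")),
    }


# Passing a dict makes the loader easy to test without touching os.environ.
config = load_config({"X402CARD_PASSPHRASE": "your-api-key"})
print(config["mcp_url"], config["max_spend_usd"])
```

A judge who forgets to set a variable then sees exactly which one is missing, with no stack trace from deep inside the agent.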

Boundaries

Guardrails

Read these before you start. Demos that violate these rules are disqualified.

💳

Use the Card MCP server for payment. No hardcoded card numbers or personal cards. The whole point is the agent issues its own card.

🛍️

Purchases must be real, on a real site. No mock checkouts, no simulated confirmations. A real order confirmation email is the bar.

💰

$50 per-person spend cap. Purchases beyond that are out of pocket. The cap is per individual, not per team.

🔐

No automating login with other people's credentials. Each participant uses their own account on any site the agent shops on.

📦

Purchases must be legal and ship to a real address. No digital-only workarounds to dodge the "keep what you buy" spirit of the prize.

⚠️

The XUSD budget is finite and first come, first served. Spending it on a failed demo doesn't reset your allocation. Test thoroughly before submitting.

Rules + FAQ

Everything you need to know

📅 When is the deadline?

May 17th — all demos must be submitted by end of day. No extensions.

👥 What's the team size?

1–5 people per team. Prizes are awarded per individual.

🎥 What does "demo" mean?

A 5-minute video of the end-to-end purchase flow, plus 3 minutes of Q&A. Post it to the Slack thread.

⏱️ Do I need to be online at a set time?

Async format — build on your own schedule. Just submit before the deadline.

🏆 How are winners selected?

William reviews all submissions after the May 17th deadline and announces winners together.

❓ Where do I ask questions?

Post in the Slack thread. William and the team are active there.