May 18, 2026

The Auction Site Test: Where AI Coding Assistants Quietly Fail at Marketplace MVPs

The promise is irresistible: describe what you want, watch the code appear, ship a marketplace in a weekend. By mid-2026, that promise has mostly come true — for the parts of a marketplace that don’t matter very much.

This is a field report on what’s actually happening when founders sit down with Cursor 3, Claude Code, and v0 and try to build a two-sided marketplace. The shorthand we’ll use is “auction site,” because auctions are an unusually honest test: they involve every hard problem in marketplace software — concurrency, money, time, trust — compressed into a single user flow. If AI coding assistants can build a working auction site in a weekend, they can build almost anything. If they can’t, the gap tells you exactly where the technology stops being useful.

The short version: the front end ships in hours. The bidding logic ships in a day, looks correct, and is wrong in ways that don’t surface until two users hit the same lot at the same time. The payments integration takes longer than the entire rest of the build combined. And the moment a real bidder shows up, the founder is in a code review they don’t have the skills to perform.

None of this means AI coding tools are oversold. It means they’re sold without the asterisks.

The seven things AI assistants consistently get wrong in marketplaces

Before the build log, here is a summary of where the failures cluster. Every item on this list has shown up in multiple builder accounts, vendor postmortems, and direct teardowns; the rest of this report unpacks the mechanics.

Concurrent writes to the same record. Bidding, inventory decrements, booking slots — anywhere two users can act on the same row at the same time. The generated code reads, compares, writes. The race condition is invisible until it isn’t.
Payment edge cases beyond the happy path. Webhook idempotency, failed-charge reconciliation, refunds-after-payout, dispute handling, split payments, escrow release on confirmation events. AI gets the Stripe Connect demo right and the operational reality wrong.
Time-bounded events. Auction closes, booking windows, listing expirations, promotion end-dates. Cron-based polling is the default output, and it is wrong in three different ways at once (sniping, unenforceable-time gaps, scale).
Trust and abuse infrastructure. Review-bombing, sockpuppet detection, KYC, fraud signals, dispute resolution. There are no good general patterns for an AI to emit here, so it emits nothing — and the founder doesn’t notice the absence until the first bad actor arrives.
State machines that span users. A listing moves through draft, active, reserved, sold, shipped, delivered, disputed, refunded. AI-generated code tends to represent this as a single status field with ad-hoc transitions, rather than an explicit state machine with guarded transitions. The bugs that result are subtle and expensive.
Architecture for scale that isn’t there yet. Database indexing, connection pooling, rate limiting, caching, queue-based job processing. AI doesn’t think about these because the prompt didn’t ask, and the founder doesn’t ask because the prompt was about features. The system works at ten users and breaks at a thousand.
Security defaults appropriate to a money-handling application. Veracode found that 45% of AI-generated code contained an OWASP Top 10 vulnerability, and Forrester research highlights frequently occurring issues including missing or weak access controls, hardcoded secrets, unsanitized input, and insufficient rate limiting. For a marketplace touching real payments and real PII, the default security posture of AI-generated code falls short of what data-protection law and payment-card security rules expect in most jurisdictions.
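To make the state-machine item concrete, here is a minimal Python sketch of an explicit transition table for the listing lifecycle described above. The state names come from that item; which transitions are legal is an assumption made for illustration, since every marketplace will draw these edges differently.

```python
from enum import Enum


class ListingState(Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    RESERVED = "reserved"
    SOLD = "sold"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    DISPUTED = "disputed"
    REFUNDED = "refunded"


S = ListingState

# Hypothetical transition table: every legal move is listed explicitly,
# so anything missing from this table is rejected instead of silently allowed.
ALLOWED = {
    S.DRAFT: {S.ACTIVE},
    S.ACTIVE: {S.DRAFT, S.RESERVED},
    S.RESERVED: {S.ACTIVE, S.SOLD},         # reservation lapses or converts
    S.SOLD: {S.SHIPPED, S.REFUNDED},
    S.SHIPPED: {S.DELIVERED, S.DISPUTED},
    S.DELIVERED: {S.DISPUTED},
    S.DISPUTED: {S.DELIVERED, S.REFUNDED},  # resolved for seller or for buyer
    S.REFUNDED: set(),                      # terminal state
}


class IllegalTransition(Exception):
    pass


def transition(current: ListingState, target: ListingState) -> ListingState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```

In a database, the same guard is usually applied as a conditional update (for example `UPDATE listings SET status = ? WHERE id = ? AND status = ?`, against a hypothetical `listings` table) so that two concurrent requests cannot both move a listing out of the same state.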

If your marketplace doesn’t touch any of these seven categories, vibe coding will probably take you all the way. The catch is that there is no such marketplace.

The setup most founders are actually using

Before the build, it’s worth being specific about what “vibe coding a marketplace” looks like in 2026, because the term has become slippery.

The dominant stack among non-technical and semi-technical founders right now is some combination of Cursor 3 for the editor surface, Claude Code as the agentic assistant (often routed through Cursor or run directly in the terminal), v0 for initial UI scaffolding, and a starter framework — most often Next.js with Supabase or Convex for the backend. Lovable, Bolt, and Replit’s Agent show up earlier in the funnel for founders who want zero infrastructure decisions; Cursor and Claude Code show up the moment a founder wants to actually own the code.

The stack works. Claude Code runs in your terminal with full file system and CLI access; it reads your codebase, runs tests, commits to Git, and opens pull requests. Cursor, for its part, is commonly cited as roughly 10× faster for greenfield prototyping and MVP creation, thanks to sub-second Tab autocomplete and Cmd+K inline editing. The two are increasingly used together: Cursor for the tight loop, Claude Code for the multi-file refactors and longer-horizon work.

A founder armed with this stack and a clear prompt can have a deployable Next.js application with authentication, a landing page, listing pages, and a seller dashboard in a single sitting. The output looks production-ready. This is the part of the story that gets told.

What ships beautifully

The pattern across builder accounts and our own teardowns is consistent. AI coding assistants are now genuinely excellent at the things a marketplace shares with any other CRUD application:

The marketing surface. Landing pages, hero sections, feature grids, pricing tables, and email capture flows are now a single-prompt operation. v0’s output in particular looks more like the work of a mid-level designer than anything most founders could have built themselves a year ago.

Authentication and account scaffolding. Sign-up, login, password reset, email verification, OAuth — all of it is well-trodden ground. The AI has seen these patterns ten million times, and the generated code is typically correct.

Listing CRUD. Create a listing, edit a listing, browse listings, search listings with basic filters. This is the operational heart of a marketplace from a screen-count perspective, and it’s the part that vibe coding actually does well.

Seller dashboards. Tables, charts, basic analytics, listing management. v0 in particular shines here because the UI patterns are well-documented and the data shapes are simple.

Admin tools. Internal screens for reviewing flagged content, moderating users, or eyeballing transaction logs are low-risk because they’re behind authentication and used by one person. AI-generated admin tools are often the highest-quality part of the codebase precisely because nobody is editing them under pressure.

If a marketplace consisted only of these surfaces, the headlines would be right. A weekend is enough.

Where the weekend ends

The trouble starts at the boundary between “displaying state” and “changing state under contention.” Marketplaces — and auctions in particular — live almost entirely on the wrong side of that boundary.

The bidding race condition

Here is a representative pattern that AI assistants will happily generate when asked to implement a bid:

1. Read the current high bid from the database.
2. Compare the user's bid to the current high bid.
3. If higher, update the auction record with the new high bid.

This is wrong. It is wrong in a way that is invisible in one-user-at-a-time testing on localhost, looks correct in every code review the AI can perform on itself, and produces silent data corruption the first time two bidders click “Place Bid” within the same hundred milliseconds. Both reads see the same “current high bid.” Both comparisons pass. Both writes succeed, and whichever lands last overwrites the other, so the lower of the two bids can become the recorded high bid. The system tells both bidders they are winning.
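The failure is easier to believe when you can watch it. This minimal Python sketch uses an in-memory SQLite database and forces the interleaving by hand, since real request timing is nondeterministic; the table, columns, bidders, and amounts are all hypothetical. Both requests read before either writes, exactly as in the three steps above.

```python
import sqlite3


def naive_race_demo() -> tuple[list[str], tuple[int, str]]:
    """Replay two read-compare-write bid requests with a forced interleaving."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE auctions (id INTEGER PRIMARY KEY, high_bid INTEGER, high_bidder TEXT)"
    )
    conn.execute("INSERT INTO auctions VALUES (1, 100, NULL)")

    def read_high_bid() -> int:
        return conn.execute("SELECT high_bid FROM auctions WHERE id = 1").fetchone()[0]

    def write_bid(bidder: str, amount: int) -> None:
        conn.execute(
            "UPDATE auctions SET high_bid = ?, high_bidder = ? WHERE id = 1",
            (amount, bidder),
        )

    told_they_lead = []
    # Step 1 for both requests happens before step 3 for either.
    alice_saw = read_high_bid()  # 100
    bob_saw = read_high_bid()    # 100, the same value
    # Steps 2 and 3: Alice bids 150, then Bob bids 120 against his stale read.
    for bidder, amount, seen in (("alice", 150, alice_saw), ("bob", 120, bob_saw)):
        if amount > seen:
            write_bid(bidder, amount)
            told_they_lead.append(bidder)

    final = conn.execute(
        "SELECT high_bid, high_bidder FROM auctions WHERE id = 1"
    ).fetchone()
    return told_they_lead, final
```

Calling `naive_race_demo()` reports that both Alice and Bob were told they lead, while the stored high bid ends up as Bob's 120, below Alice's 150.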

The correct implementation requires either a database-level constraint with an atomic compare-and-swap, an advisory lock around the auction row, or an append-only bid log with the winning bid derived rather than stored. None of these patterns reliably appear in AI-generated marketplace code unless the founder knows to ask for them by name — and the founders most likely to use vibe coding are precisely the ones who don’t.

This is not a hypothetical. Studies have found that roughly a quarter of AI-generated Python and JavaScript snippets contain logic flaws or insecure defaults: not syntax errors, but subtle logic bugs that may only surface under specific conditions. Concurrency bugs are the canonical example.
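Of the fixes named above, the atomic compare-and-swap is the smallest change. Here is a minimal sketch, again assuming a hypothetical SQLite `auctions` table. The comparison moves into the `WHERE` clause, so the check and the write become a single statement, and the affected row count tells the caller whether the bid was accepted.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per auction holding the current high bid.
    conn.execute(
        "CREATE TABLE auctions (id INTEGER PRIMARY KEY, "
        "high_bid INTEGER NOT NULL, high_bidder TEXT)"
    )


def place_bid(conn: sqlite3.Connection, auction_id: int, bidder: str, amount: int) -> bool:
    """Compare-and-swap: accept the bid only if it still beats the stored value."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE auctions SET high_bid = ?, high_bidder = ? "
            "WHERE id = ? AND high_bid < ?",
            (amount, bidder, auction_id, amount),
        )
    # 1 row changed: the bid won the swap. 0 rows: outbid, tied, or no such auction.
    return cur.rowcount == 1
```

The same conditional `UPDATE` is also safe in PostgreSQL under its default READ COMMITTED isolation: a second concurrent writer waits for the row lock, then re-evaluates the `WHERE` clause against the newly committed row, so a stale bid simply matches nothing.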

The payment edge cases

Stripe Connect is the standard for marketplace payments, and the happy path — buyer pays, platform takes a cut, seller gets the rest — is documented well enough that AI assistants reliably get it right.

What they reliably get wrong is everything else. Idempotency keys on webhook handlers. Reconciling a charge that succeeded but whose webhook never arrived. Handling a refund after a payout has already happened to the seller. Disputing a charge that touches three different ledger entries. Holding funds in escrow for the duration of a delivery window and releasing them on a confirmation event. The standard formulation is that payment flows require idempotency keys, webhook verification, refund logic, and dispute handling — all areas where AI‑generated code tends to cut corners.
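Webhook idempotency is the first item on that list and the easiest to sketch. The idea, shown here as a minimal Python sketch with SQLite, is to record each event's ID under a uniqueness constraint in the same transaction as its side effect, so a retried or duplicated delivery is detected and skipped. The flat `event` dict with `id`, `type`, and `amount` keys is a simplification assumed for illustration; real Stripe payloads nest the object under `data.object`, and signature verification must happen before this function runs.

```python
import sqlite3

# Hypothetical mapping from event type to ledger sign. The type strings match
# Stripe event names; the flat event shape used below does not.
LEDGER_SIGN = {"payment_intent.succeeded": 1, "charge.refunded": -1}


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE ledger (event_id TEXT NOT NULL, amount INTEGER NOT NULL)")


def handle_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply an event at most once; return False if it was already processed.

    Assumes the payload's signature was verified before this is called.
    """
    with conn:  # dedupe marker and ledger entry commit (or roll back) together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery or retry: no side effects
        sign = LEDGER_SIGN.get(event["type"])
        if sign is not None:
            conn.execute(
                "INSERT INTO ledger (event_id, amount) VALUES (?, ?)",
                (event["id"], sign * event["amount"]),
            )
    return True
```

Because the dedupe row and the ledger row share a transaction, a crash between them cannot leave an event marked as processed without its effect, or applied twice.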

In a marketplace, every one of these edge cases is a real money path that will be exercised in the first month of real users. In an auction specifically, the edge cases multiply: what happens to the second-highest bidder’s authorization when the winner’s payment fails? When does the reserve price get checked, and against what — the bid or the captured charge? If you’re charging a buyer’s premium on top of the hammer price, does it apply before or after sales tax, and which jurisdiction’s rules govern?
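None of these questions has a universal answer, but code forces you to pick one. The sketch below encodes one hypothetical settlement policy in Python: the reserve is checked against the bid amount, the lot goes to the highest eligible bidder at their own bid, and if that charge fails it is offered to the next distinct bidder a limited number of times. `try_charge` stands in for whatever capture call the payment integration makes; it is an assumption, not a real API. Buyer's premium and tax are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    amount: int  # minor currency units, e.g. cents


def settle(
    bids: list[Bid],
    reserve: int,
    try_charge: Callable[[Bid], bool],
    max_fallbacks: int = 1,
) -> Optional[Bid]:
    """Settle a closed auction under one hypothetical policy.

    The reserve is compared to the bid amount, bids are tried from highest to
    lowest with one attempt per bidder, and after the winner's charge fails at
    most `max_fallbacks` further bidders are tried. Returns the charged bid,
    or None if the lot goes unsold.
    """
    # sorted() is stable even with reverse=True, so equal amounts keep their
    # input order (assumed chronological: the earlier bid is tried first).
    eligible = sorted(
        (b for b in bids if b.amount >= reserve), key=lambda b: b.amount, reverse=True
    )
    tried: set[str] = set()
    failures = 0
    for bid in eligible:
        if bid.bidder in tried:
            continue  # a bidder's own lower bids are not a separate chance
        tried.add(bid.bidder)
        if try_charge(bid):
            return bid
        failures += 1
        if failures > max_fallbacks:
            break
    return None
```

Changing any one of those policy lines (second-price charging, checking the reserve against the captured amount, unlimited fallbacks) is a business decision that the code merely records.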

These aren’t AI’s failures specifically. They’re the failures of a code generation process that has no model of the business domain. The AI knows what a Stripe webhook handler looks like. It doesn’t know what an auction is.

The time problem

Auctions have endings. Endings need to be enforced. The naive implementation — a cron job that runs every minute and closes auctions whose end time has passed — is what AI assistants will produce by default, and it fails in three ways at once.

First, a fixed hard close invites “sniping,” where the entire competitive dynamic of the auction collapses into the final second because nobody bids until the last moment. Many modern auction platforms solve this with anti-sniping extensions: any bid in the final N minutes pushes the end time out by N minutes. This takes two sentences to describe and a moderately involved refactor to implement correctly, because it interacts with the bid race condition above.
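A minimal sketch of the anti-sniping rule, assuming a hypothetical SQLite `auctions` table with integer Unix-second `ends_at` timestamps and N of five minutes. It reads “pushes the end time out by N minutes” literally (`ends_at + N`); some platforms instead reset the clock to the bid time plus N. The extension lives in the same conditional `UPDATE` as the bid itself, which is why the two concerns cannot be separated: a bid that loses the race must not extend the auction.

```python
import sqlite3

EXTENSION_SECONDS = 300  # hypothetical N = 5 minutes


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE auctions (id INTEGER PRIMARY KEY, high_bid INTEGER NOT NULL, "
        "high_bidder TEXT, ends_at INTEGER NOT NULL)"
    )


def place_bid(conn: sqlite3.Connection, auction_id: int, bidder: str,
              amount: int, now: int) -> bool:
    """Accept a bid and apply the anti-sniping extension in one atomic UPDATE.

    The WHERE clause rejects bids that are too low or arrive at or after the
    current end time. The CASE (evaluated against the pre-update row) adds
    EXTENSION_SECONDS to ends_at when the bid lands inside the final window.
    """
    with conn:
        cur = conn.execute(
            """
            UPDATE auctions
               SET high_bid = :amount,
                   high_bidder = :bidder,
                   ends_at = CASE WHEN ends_at - :now <= :ext
                                  THEN ends_at + :ext ELSE ends_at END
             WHERE id = :id AND high_bid < :amount AND ends_at > :now
            """,
            {"amount": amount, "bidder": bidder, "now": now,
             "ext": EXTENSION_SECONDS, "id": auction_id},
        )
    return cur.rowcount == 1
```

For an auction ending at t=1000, a bid at t=500 leaves the end time alone, a winning bid at t=900 moves it to 1300, and a bid arriving at t=1300 is rejected.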

Second, it creates a band of unenforceable time. An auction that ends at 14:00:00 might not actually close until 14:00:45 when the cron next runs. Bids placed in that window may or may not count, depending on what the database happened to see at the moment of the eventual close. This is the kind of bug that produces lawsuits, not bug reports.
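One way to close that gap is the append-only bid log named earlier, with the winner derived from stored timestamps instead of from whatever the database holds when the close job happens to run. Here is a minimal Python/SQLite sketch, assuming a hypothetical `bids` table and server-assigned integer timestamps; `ends_at` must be the final end time after any anti-sniping extensions.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical append-only log: rows are inserted, never updated.
    conn.execute(
        "CREATE TABLE bids (id INTEGER PRIMARY KEY, auction_id INTEGER NOT NULL, "
        "bidder TEXT NOT NULL, amount INTEGER NOT NULL, placed_at INTEGER NOT NULL)"
    )


def record_bid(conn: sqlite3.Connection, auction_id: int, bidder: str,
               amount: int, placed_at: int) -> None:
    """Append a bid. placed_at should come from the server clock, never the client."""
    with conn:
        conn.execute(
            "INSERT INTO bids (auction_id, bidder, amount, placed_at) VALUES (?, ?, ?, ?)",
            (auction_id, bidder, amount, placed_at),
        )


def winning_bid(conn: sqlite3.Connection, auction_id: int, ends_at: int):
    """Highest bid placed strictly before ends_at; the earliest bid wins a tie.

    The answer depends only on stored timestamps, not on when this query runs,
    so a close job that fires 45 seconds late reaches the same result.
    """
    return conn.execute(
        "SELECT bidder, amount FROM bids WHERE auction_id = ? AND placed_at < ? "
        "ORDER BY amount DESC, placed_at ASC, id ASC LIMIT 1",
        (auction_id, ends_at),
    ).fetchone()
```

In practice late bids would also be rejected at insert time so bidders get immediate feedback; the derivation is the backstop that makes the close job's timing irrelevant.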

Third, it scales badly. A platform with ten thousand simultaneous auctions ending throughout the day cannot rely on a once-a-minute polling job; it needs scheduled jobs or a queue-driven architecture. AI assistants will generate the polling job. They will not, unprompted, generate the queue.
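A queue-driven design replaces the polling job with something that knows when the next auction ends. As a single-process illustration only, here is a Python min-heap keyed by end time, with lazy deletion so that anti-sniping extensions can simply reschedule an auction. The class and method names are hypothetical; a production system would put the same idea behind a durable job queue or delayed-job mechanism, because an in-memory heap is lost on restart.

```python
import heapq
from typing import Optional


class CloseScheduler:
    """Min-heap of (ends_at, auction_id) with lazy deletion of stale entries."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int]] = []
        self._current: dict[int, int] = {}  # auction_id -> latest ends_at

    def schedule(self, auction_id: int, ends_at: int) -> None:
        """Schedule or reschedule; older entries for this auction become stale."""
        self._current[auction_id] = ends_at
        heapq.heappush(self._heap, (ends_at, auction_id))

    def next_deadline(self) -> Optional[int]:
        """Earliest live deadline, so a worker can sleep exactly until then."""
        while self._heap and self._current.get(self._heap[0][1]) != self._heap[0][0]:
            heapq.heappop(self._heap)  # discard entries superseded by a reschedule
        return self._heap[0][0] if self._heap else None

    def pop_due(self, now: int) -> list[int]:
        """Return ids of auctions whose current deadline is at or before now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            ends_at, auction_id = heapq.heappop(self._heap)
            if self._current.get(auction_id) == ends_at:
                del self._current[auction_id]
                due.append(auction_id)
        return due
```

A worker loop would sleep until `next_deadline()`, call `pop_due(now)`, and close each returned auction by deriving its winner from stored data, so a worker that wakes slightly late still produces the correct result.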

Trust, reviews, and identity

The trust layer of a marketplace — reviews, reputation, identity verification, dispute resolution — is where AI generation degrades from “subtle bugs” to “absent entirely.” There are no good general patterns for review-bombing prevention, sockpuppet detection, or KYC integration that an AI can simply emit. These systems have to be designed against the specific abuse model of the specific marketplace, and that design work is currently outside the capability of any coding assistant.

Founders who skip this layer find out about it when the first bad actor shows up, which in a marketplace with any real volume is typically week two.

The honest assessment, tool by tool

The “Cursor vs Claude Code vs v0” comparison gets framed as a feature war, but for marketplace work the differences matter in specific ways:

v0 is the best of the three for the marketing site, listing UI, and seller dashboard. It is the worst of the three for anything involving server-side logic or stateful interactions, because that isn’t really what it’s for. Founders who treat v0 output as a starting point for a fuller build, rather than a finished application, do well. Founders who try to ship v0 output as-is to real users have a bad time.

Cursor is where most of the actual development happens. It is fast, it is good at the inner loop, and the Tab autocomplete is genuinely the productivity multiplier its users claim. Cursor wins rapid MVP creation; Claude Code wins multi-file operations and complex refactors, and that split is the right way to think about it. For marketplace work, Cursor is the tool you reach for during a building session, not the one you use to plan an architecture.

Claude Code is the only tool of the three that meaningfully helps with the hard parts of a marketplace, and it helps only when the founder knows to invoke it correctly. Asking Claude Code to “review the bidding logic for race conditions” will frequently surface the bug described above; asking it to “implement a bid” will frequently produce the bug. The tool reflects the prompt; the prompt reflects the founder’s understanding of what they’re building. This is the recurring story.

None of these tools, alone or in combination, designs a marketplace architecture. They implement the design they’re given. When the design is missing — and for a non-technical founder building from prompts, the design is always missing — the system that gets built is whichever architecture the model defaulted to. That architecture, on inspection, is rarely fit for production.

The handoff moment

There is a recognizable point in every vibe-coded marketplace build where the founder realizes they have built something they cannot ship. The shape of this moment is consistent across the accounts:

A first paying user or test bidder surfaces a bug that the founder cannot diagnose without reading the code, and the code is unfamiliar.
A second user joins, and something that worked at one-user scale breaks.
A Stripe webhook fires in production that the local development environment never saw, and the application’s state diverges from the payment provider’s state.
The founder asks the AI to fix it, and the fix introduces a new bug. The founder asks the AI to fix that one, and the codebase begins to drift.

This is the handoff moment. It is not a failure of the AI tools; it is the point at which the work moves from “generation” to “engineering,” and the two are not the same activity. The AI can write more code, but it cannot really understand the system it has created. It keeps stacking complexity on top of a fragile foundation until the whole thing collapses.

Founders at the handoff moment have three real options, and the rest of this report is about choosing among them honestly.

What production-grade marketplace infrastructure actually requires

The reason this report is worth writing is that the gap between “vibe-coded MVP” and “marketplace that can take a real transaction” is much larger than the discourse around AI coding tools suggests. Closing that gap requires one of three commitments:

Option 1: Rebuild on a marketplace-specific platform. Tools like Sharetribe, Mirakl, and CS-Cart (and, for auction-specific use cases, purpose-built auction software) exist because the patterns above are solved domain problems that someone has already implemented correctly. The trade-off is loss of architectural flexibility; the benefit is that bidding, payments, escrow, and trust come pre-built rather than as a stack of bugs waiting to be discovered. For most founders, this is the fastest honest path to a marketplace that can take real money.

Option 2: Engage an engineering team to rescue and harden the vibe-coded MVP. This is the path agencies like Roobykon, Outsourcify, and others are now explicitly building “vibe coding rescue” practices around. The work is real: it involves auditing the AI-generated codebase, identifying which pieces can be kept versus rebuilt, and applying the engineering discipline that was skipped during the prompt phase. It is slower than Option 1 and gives the founder more long-term control, at the cost of materially higher spend.

Option 3: Treat the vibe-coded build as throwaway and rebuild from scratch. This is more common than founders want to admit. The MVP served its purpose — it proved demand, attracted a first cohort, validated the transaction model — and the codebase is now technical debt that costs more to rehabilitate than to replace. Founders who can be honest with themselves at the handoff moment, rather than six months later, save substantial money.

None of these options is the mistake. The mistake is continuing to prompt the AI through a problem it cannot reason about, accumulating fixes on top of fixes, until the codebase becomes unmaintainable to humans and unintelligible to the model. Multiple analyses have confirmed what we have seen firsthand: startups that built their initial product with vibe coding tools often found their codebases so poorly structured and undocumented that scaling was impossible. The typical outcome is a complete rewrite once the product gets traction, which means paying for the MVP twice.

The right question

The marketplace test is harsh on AI coding assistants, but it’s harsh on every method of building marketplaces. The interesting question is not “can AI build an auction site.” A weekend with current tools demonstrably produces something that looks like an auction site. The interesting question is what the demonstration is actually for.

If the goal is to test demand, validate a niche, or show an investor that the product concept is real, then the vibe-coded MVP is an overwhelming success, and probably the strongest demo the technology has ever made possible. A founder in 2026 can prove a marketplace concept faster and cheaper than a founder in 2022 could prove a landing page.

If the goal is to take real money from real users, the vibe-coded MVP is a starting condition, not a finish line. The technology has moved the cost of building a demo close to zero. It has not moved the cost of building a system very much at all. Treating the two as the same thing is the single most expensive mistake in this category, and it is happening at scale right now.

The honest framing for any founder reading this: the AI built you a prototype faster than you could have hired anyone to. That is real, that is valuable, and that is roughly 20% of what a marketplace business needs. The other 80% is the same engineering work it has always been. The tools have changed. The job has not.