More Than "Vibe Coding": 1B Tokens and an App That Never Launched

I spent most of a year, and well over a billion tokens, building an Android app that never made it to the store.

And that figure comes only from my 2025 Cursor recap. The 2026 numbers aren't out yet, and it doesn't count the tokens I burned through other tools.

Most write-ups about "vibe coding" stop at the two-hour demo. This one is about everything after that: the part where you actually try to ship, and where I got plenty wrong.

If you only remember one thing, remember this: the hardest problems were never the code. They were judgment, money, regulation, and people. Code was just the part that kept me busy enough to avoid them.

The Idea: A Trust Layer for Tabletop RPGs

Midway through last year, a friend pitched me an idea and asked if I'd be interested, since they needed someone technical. The product was TRPGHub, a platform for tabletop role-playing games (TRPG, 跑团) like Call of Cthulhu and D&D, which have a huge and fast-growing community in China.

If you've never played: a session needs a KP (Keeper, the host who runs the story) and a group of players. The KP designs and narrates the game, and players pay to join. Today most offline matchmaking happens in the wild: QQ groups (think Discord servers, but on China's QQ messenger), Xianyu (China's second-hand marketplace, like Carousell), or just friends passing names to each other. Money changes hands with no guarantees on either side. KPs sometimes skip out on a booking (跑单): they take the order and never show up. Players sometimes finish a session and refuse to pay, or argue that the quality wasn't there.

The basic problem was that nothing protected either side. The payment wasn't held anywhere safe, there was no reputation system to tell a reliable KP from a flaky one, and if the other person bailed you had no way to get your money back. So the pitch was simple: build a marketplace that holds the payment until the session is done and matches players with KPs, so both sides have less to worry about.

Part of what made it look like an opening: nobody had really built for it. There was no mature platform doing this in China, and no real competitor we had to worry about. The market is small enough that the big companies don't bother with it; TRPG matchmaking isn't worth their time, so the space was just sitting there, unserved. At the time that felt like an advantage. Later I learned that "no competitor" can also just mean nobody's proven the market is worth serving yet.

How we'd make money was simple: take a small cut of each booking. The idea came from an app I'd seen for artists taking commissions, which worked the same way: the platform sits in the middle of the transaction and charges a fee for it.
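The hold-then-release flow and the platform cut described above can be sketched as a tiny state machine. This is only an illustration under stated assumptions, not the code TRPGHub actually ran: the `Booking` class, the state names, and the 5% fee (`fee_bps=500`) are all hypothetical, and it is written in Python for brevity rather than in the Java backend's language.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    HELD = "held"          # player has paid; the platform holds the money
    RELEASED = "released"  # session confirmed done; the KP gets paid
    REFUNDED = "refunded"  # KP no-show or cancellation; the player is repaid


@dataclass
class Booking:
    price_fen: int               # amount paid, in integer fen (1/100 CNY)
    fee_bps: int = 500           # hypothetical platform cut: 500 bps = 5%
    state: State = State.HELD

    def release(self) -> tuple[int, int]:
        """Session finished: return (kp_payout, platform_fee) in fen."""
        self._require_held()
        fee = self.price_fen * self.fee_bps // 10_000
        self.state = State.RELEASED
        return self.price_fen - fee, fee

    def refund(self) -> int:
        """KP never showed up: return the full amount to the player."""
        self._require_held()
        self.state = State.REFUNDED
        return self.price_fen

    def _require_held(self) -> None:
        # Money may leave escrow exactly once, so both exits check this.
        if self.state is not State.HELD:
            raise ValueError(f"booking already {self.state.value}")
```

The point of the sketch is that the money has exactly two ways out of `HELD`, and the fee is only taken on the path where the session actually happened.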

I said yes for two reasons that, on paper, looked like real advantages in distribution and design. My friend was deeply embedded in the TRPG community and could line up testers, so we figured a first launch could realistically pull ~100 real users. And she'd worked as an artist, so she understood the look and the audience in a way I never could.

Looking back, "she knows the market" and "we've actually proven people will pay for this specific product" are two very different things. I'll come back to that.

The Team and How We Worked

We split the work cleanly. I owned engineering and infrastructure. My friend owned product and market. A third co-founder owned regulation and part of product.

The process was straightforward: she produced prototypes, requirements, and UI; I scoped, estimated, and built. We tried Jira and dropped it within days because it's built for a team of fifty, not three people doing this on the side. A shared spreadsheet became our backlog, and a weekly sync kept us honest.

That part actually worked fine. The real friction came later, and it had nothing to do with tools.

Building 0→1: This Wasn't Vibe Coding

This was the first time I drove a whole product end to end, alone. Vibe coding was everywhere at the time, and from the outside this probably looked like more of the same. But I want to be precise about it. I have a real engineering background, and I used agents to move fast and to fill in areas I was less fluent in, not to hand over the judgment. That difference turned out to matter more than anything else.

System design came first

I treated this as real system design, not a prompt. And by "system design" I don't mean the interview-style stuff you memorize from a book or a website. I mean the actual engineering kind: designing a system that has to run in the real world. I asked AI a lot about best practices, and that's where I hit the first hard lesson of building with agents: if you aren't already strong in system design yourself, you can't really review the design an AI hands you.

And the design is the most expensive thing to change later. You don't need the perfect architecture on day one; nobody does. But you do need one that holds up in production, because unwinding it later costs you weeks. An agent can give you three plausible architectures in seconds. Which one actually fits your stage is still your call, not the model's.

For the stack, I went with Java + Spring Boot on the backend and React Native on the client, which kept iOS open for later. I picked these mainly because I already knew them well and wanted to prototype fast — nothing more clever than that.

Then get the core loop working

Once the design was set, I leaned on agents hard to stand up the codebase and build features against the requirements. This is where AI actually helped a lot: the core flows came together fast. We had an internal prototype quickly. Honestly it was closer to a toy, since the backend was partly mocked, but it let us validate the core and iterate.

The speed is real. The problem is that it feels like progress, when most of the hard part hasn't even started yet.

From Toy to Something Real

The gap between "works on my machine" and "a stranger can use it" is huge, and almost none of it is feature work.

Deployment. I know AWS well, but our users were in China, so the backend went on Aliyun (Alibaba Cloud). The concepts carry over, but the details don't: much of the workflow is simply different, and the agent only half understood Aliyun, so I wasted a lot of tokens going back and forth with it.

Testing on real hardware. I started with no Android device, working entirely in the emulator. Setting up the Android toolchain from scratch, as someone new to mobile, ate weeks, and a good chunk of the agent's advice was simply wrong, which meant rebuilding over and over. About a month in I finally got a physical phone, and even then every test cycle was slow. Each change meant building a release APK and pushing it onto the device:

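# Build the release APK with the Gradle wrapper (Windows)
./gradlew.bat assembleRelease
# Install on the connected device: -r replaces the existing app, -d allows a version downgrade
adb install -r -d app-release.apk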

Nobody really talks about this part. Build pipelines, device testing, environment config: this kind of unglamorous work is where a solo founder's time actually goes.

The feature surface is bigger than it looks

I assumed the core was small. It wasn't. A "minimum" version of this app still meant auth, the booking flow, chat, listing and publishing sessions, search, payment, and notifications, before you even count the long tail of smaller things. Two of those taught me the most important engineering lesson of the whole project.

Build vs. buy, the lesson that cost me the most

Chat. My instinct was to build it. Every system design book gives you the same recipe: WebSocket plus Kafka, fan-out, and you're done. But a textbook diagram and a real production IM system are not the same thing once you have to deal with delivery guarantees, offline sync, ordering, and stability, all on a deadline. After two weeks of fighting it, I threw out my own version and adopted OpenIM, a mature open-source IM stack.
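To give a sense of what the textbook diagram leaves out, here is a minimal sketch of just one of those concerns: delivering messages in order, exactly once, when the network hands them over late or twice. It assumes the server stamps each message in a conversation with a sequence number starting at 1; the `OrderedInbox` class is hypothetical and is not how OpenIM works internally.

```python
from dataclasses import dataclass, field


@dataclass
class OrderedInbox:
    """Delivers one conversation's messages in sequence order, once each."""
    next_seq: int = 1                                      # next sequence number expected
    pending: dict[int, str] = field(default_factory=dict)  # arrived early, waiting on a gap

    def receive(self, seq: int, body: str) -> list[str]:
        """Accept one message; return whatever is now deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                    # duplicate from a retry: drop it
        self.pending[seq] = body
        ready = []
        # Drain the buffer for as long as there is no gap.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

Even this toy version ignores persistence, offline catch-up, and what to do when a gap never fills, which is roughly where a two-week estimate starts to slip.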

Notifications. Same trap. Push looks buildable right up until you're dealing with device tokens, vendor channels, delivery reliability, and the famously fragmented Android push landscape in China. I used JPush instead.

The lesson underneath both: a system design book teaches you how something works. It doesn't tell you whether you should be the one building it, right now, with this little runway. For the plumbing that isn't your product (chat, push, auth), adopt something proven and spend your scarce hours on the part that actually is your product. In our case, the marketplace and the trust layer.

Why AI couldn't make these calls for me

You can steer an agent with skills, rules, and good prompts. But the genuinely hard part isn't getting it to write correct code. It's direction. Any production feature has many valid implementations, and even among the good ones there are usually several reasonable architectures. Choosing the one that fits your stage, your scale, and your constraints is a judgment call. If you're not strong in system design, the model can't make it for you. Worse, it'll give you a confident, wrong answer and present it like it's the obvious choice.

This is the whole "more than vibe coding" point. You can vibe-code a demo and have something running in two hours. Shipping a real product is different. A demo never has to deal with the database going down, with what a user in another region sees when latency spikes, or with whether you even have alerts when something breaks in the middle of the night. A product has to answer all of it, and an agent won't bring any of it up unless you already know to ask.
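As a small illustration of the kind of question a product has to answer and a demo never does, here is a sketch of a call that survives a briefly unavailable dependency and raises an alert when it cannot. Everything in it is an assumption made for the example: the `call_with_retry` helper, the choice of `ConnectionError` as the retryable failure, and the `alert` callback standing in for whatever paging or chat hook a team actually uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    alert: Callable[[str], None],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op, retrying on ConnectionError with exponential backoff.

    If every attempt fails, call alert once and re-raise the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ConnectionError as exc:
            if attempt == attempts:
                alert(f"giving up after {attempts} attempts: {exc}")
                raise
            # Wait longer after each failure: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the behavior can be exercised without real waiting. An agent will happily write something like this when asked; the catch is that someone has to know to ask.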

Why It Never Launched

The reasons had almost nothing to do with code. Three things killed it.

1. We built before we validated

We had a budget, and not a tiny one for a side project (how we split the funding is a separate story). But cloud infrastructure quietly bleeds money long before you have the users to justify it. CPU, memory, storage, and paid services like JPush all add up. I've heard of people spending thousands of dollars just to put a personal app online that never really got tested against the market, and I've come around to thinking that running real cloud infrastructure before you've validated demand is just bad business.

The right order for a small team is the opposite of what we did. Validate demand first: a landing page, a waitlist, manual matchmaking in a WeChat or QQ group, a no-code or concierge MVP. Prove people will pay before you provision a single server. Then build, and keep the infrastructure sized to reality instead of ambition.

We did it backwards. We built the product first, then went looking for proof that anyone wanted it. That's an expensive way to learn something we could have found out for almost nothing.

2. Regulation, regulation, regulation

Picking your market and platform matters more than it sounds, and we managed to pick the hardest combination without really pricing it in. Android in China is more complex than iOS, and China is more complex than most other markets to begin with. On top of that, two of our features, chat and payment, are exactly the ones that draw the strictest scrutiny.

Realistically that's three months or more just to get listing-ready across the Chinese app stores, plus a stack of paperwork: software copyright registration, an ICP filing, and an EDI license. The EDI is the real wall: it effectively requires a registered company with one million CNY in registered capital. For three people doing this part-time, standing up or borrowing a qualified company is a massive cost and time sink.

The lesson is a classic startup one I should have applied to the engineering too: do the compliance pre-mortem before you write the code. If we'd mapped the ICP and EDI requirements in week one, we'd probably have picked an easier market, a lighter feature set, or a different idea entirely.

3. Working with co-founders

At the time it felt like we did okay here. Looking back, we didn't do as well as I thought. The structural problem was that everyone was part-time, with day jobs and their own lives, which made even a weekly sync hard to protect, and we never landed on a good async rhythm to make up for it.

The deeper issues are the ones every early team eventually runs into. It's hard to keep putting in time and money when you can't see any positive signal from the business yet. Motivation runs out — it gets you through the first month or two, not through quarter after quarter with nothing to show for it. And when the constraints pile up, regulatory ones especially, things just slow down, until one day you realize nobody's pushed a commit in weeks.

What I'd Tell Myself Before Starting

If I could send one note back in time, it'd be this.

Validate demand before you provision infrastructure. A waitlist and a human-powered MVP would have taught us in two weeks what the code took two months to teach.

Buy the parts that aren't your product, build the parts that are. Chat and push were never our product. The marketplace was.

Do the regulatory homework first. ICP, EDI, app-store review, and the high-risk features like chat and payment belong on the whiteboard before the first line of code. And if one market turns out to be this expensive to enter, take that as a signal to go pick an easier one instead.

And remember what AI is actually for. It speeds up execution, but it doesn't give you judgment. The faster the agent makes you, the more your own architecture, cost, and product instincts become the bottleneck, so that's where you have to keep investing.

I didn't ship the app. But it's easily the most I've ever learned from a single project — about systems, about agents, and about all the unglamorous, non-technical reasons that most products quietly never make it to launch.

Next time, I'll spend the first billion tokens making sure anyone wants the thing at all.