Massive Failure on the Product

https://www.reddit.com/r/webdev/comments/1iasb06/massive_failure_on_the_product/

created by Yan_LB on 26/01/2025 at 22:23 UTC*

754 upvotes, 103 top-level comments (showing 25)

I’ve been working with a team of 4 devs for a year on a major product. Unfortunately, today’s failure was so massive that the product might be discontinued.

During the biggest event of the year—a campaign aimed at gaining 20k+ new users—a major backend issue prevented most people from signing up.

We ended up with only about 300 new users. The owners (we work for them, kind of a software house but focusing on one product for now, the biggest one), have already said this failure was so huge that they can’t continue the contract with us.

I'm a frontend dev and almost killed my sanity developing for weeks, working 12/16 hours a day.

So sad :/

More Info:

Tech Stack:

The Problem:

When some users attempted to sign up with new information, the system flagged their credentials as duplicates and failed to save their data. This issue occurred because many of these users had previously made purchases as "non-users" (guests). Their purchase data (personal ID only) had been stored in an overlooked table in the database.

When these "new users" tried to register, the system recognized that their information was already present in the database, linked to their past guest purchases. As a result, it mistakenly identified their credentials as duplicates and rejected the registration attempts.
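To make the conflict concrete, here is a minimal sketch of a signup path that treats a leftover guest record as something to claim rather than as a duplicate. Everything in it is an assumption for illustration: the table names `users` and `guest_purchases`, the `personal_id` column, and the use of SQLite are hypothetical and not the real schema or stack.

```python
import sqlite3


def make_db():
    # Hypothetical schema: real accounts and guest purchases live in separate tables.
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            personal_id TEXT UNIQUE NOT NULL,
            email TEXT NOT NULL
        );
        CREATE TABLE guest_purchases (
            id INTEGER PRIMARY KEY,
            personal_id TEXT NOT NULL,
            user_id INTEGER REFERENCES users(id)
        );
        """
    )
    return db


def register(db, personal_id, email):
    """Reject only real duplicate accounts; adopt earlier guest purchases."""
    with db:  # one transaction: commits on success, rolls back on error
        # The duplicate check looks at accounts only, never at guest rows.
        taken = db.execute(
            "SELECT 1 FROM users WHERE personal_id = ?", (personal_id,)
        ).fetchone()
        if taken:
            raise ValueError("account already exists")
        cur = db.execute(
            "INSERT INTO users (personal_id, email) VALUES (?, ?)",
            (personal_id, email),
        )
        user_id = cur.lastrowid
        # Attach any unclaimed guest purchases to the new account.
        db.execute(
            "UPDATE guest_purchases SET user_id = ? "
            "WHERE personal_id = ? AND user_id IS NULL",
            (user_id, personal_id),
        )
    return user_id
```

A test that seeds `guest_purchases` with a row and then calls `register` with the same `personal_id` is the kind of case that would have exercised this path before launch.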

As a front-end developer, I conducted extensive unit tests and end-to-end tests covering a variety of flows. However, I could not have foreseen the existence of this table conflict on the backend. I’m not trying to place blame on anyone because, at the end of the day, we all go down in the boat together.

Comments

Comment by migumelar at 26/01/2025 at 23:02 UTC*

548 upvotes, 8 direct replies

This screams a project management issue: A team of 4 working 12/16 hours and expecting 20k users on launch. I can sense it has been worked on in a rush, minimum budget, minimum supervision, lack of planning.

Tbh the product manager is the one who should take the most responsibility here.

Comment by AGRYZEN at 26/01/2025 at 22:29 UTC

1088 upvotes, 6 direct replies

I mean if I paid 4 devs full time for a year who didn’t test a production build for its primary purpose, I would stop paying too

Comment by alphex at 26/01/2025 at 22:31 UTC

212 upvotes, 4 direct replies

Did you test the expectations?

Comment by latro666 at 26/01/2025 at 23:27 UTC

46 upvotes, 0 direct replies

Let this be a lesson. If you are having to work 16 hour days, something is already wrong which means something is going to go horribly wrong.

Next time this is happening, speak up, and if you are not heard, run for the hills.

Comment by rzwitserloot at 27/01/2025 at 00:21 UTC*

46 upvotes, 1 direct replies

Chalk this up to a pricey lesson: Death marching is **extremely dangerous, not to be undertaken lightly.**

If that's too nuanced a point and you need it simplified, okay then: **Do not ever deathmarch**.

To explain it in a way that relates to your situation:

After multiple 12+ hour sessions, the state of the delivered product is, *of course it is*, in a fairly precarious, unstable state.

The usual fix is to simply not do that. Not just the 12 hour thing - work 12 hour days if you must. No, the thing that tends to make people work 12-16 hour days: Unreasonable deadlines.

The problem with those is that pretty much by definition, the 'stuff we still have to do' list is too large to fathom in a single human brain, and yet there is clearly no time to take any clarity that is gained when implementing stuff somewhere along the path to the final product and adjust the earlier stuff to take into account this clarity. After all, IF you feel it is necessary to work 12-16 hour days to deliver the stuff that still needs to be done, obviously there is no time to adjust already-done tasks.

So instead you get out your twine, tape, and spit, and you just stumble about a bit, apply a whole bunch of shortcuts and 'works for me', and move on to the next item on the endless, *endless* todolist.

And that, naturally, leads to unstable software. Which has a nasty tendency to fail exactly when it matters: devs testing the stuff they write has the nasty tendency to fail to cover 'real life', because those scenarios tend not to quite match what devs do. One trivial example for websites, as we're in `/r/webdev`: Users tend to connect to your site simultaneously. And yet devs clicking around tend not to generate concurrent situations. Concurrent situations if not written 'properly' tend to cause things to end up in invalid states: Bugs that take down signup forms until someone fixes it.
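A rough sketch of what checking for this could look like: many simultaneous requests for the same signup, with the check-then-insert step done atomically. `SignupStore` and `hammer` are made-up names, and the in-memory set stands in for whatever a real backend would use (typically a database unique constraint).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SignupStore:
    """Hypothetical in-memory stand-in for a signup backend."""

    def __init__(self):
        self._emails = set()
        self._lock = threading.Lock()

    def register(self, email):
        # Check and insert under one lock, so two simultaneous requests
        # cannot both pass the "not taken yet" check.
        with self._lock:
            if email in self._emails:
                return False
            self._emails.add(email)
            return True


def hammer(store, email, attempts=50):
    """Fire many concurrent signups for the same email and collect the results."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(lambda _: store.register(email), range(attempts)))
```

With the lock in place, exactly one of the concurrent attempts should succeed. Removing the lock is a quick way to see how a test like this can surface the invalid states described above, although such races do not show up on every run.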

Hence, **just do not do it**. If you must, because, hey, we've all been there (or at least, I have), you *can do it*, but know a few things:

if you want it stated in a way that is easy to convey to folks who might not really get what software dev is about, here's a parable:

That lumberjack is an idiot. Don't be like that lumberjack.

Comment by zephyy at 26/01/2025 at 23:01 UTC

202 upvotes, 5 direct replies

don't waste your life working 12+ hours a day for someone else.

Comment by IAmRules at 26/01/2025 at 22:57 UTC

70 upvotes, 0 direct replies

Sounds like everyone involved including the marketing people and owners are inexperienced with product launches.

Comment by TScottFitzgerald at 26/01/2025 at 22:38 UTC

58 upvotes, 7 direct replies

What was the issue?

Comment by cuervo_gris at 26/01/2025 at 23:14 UTC

26 upvotes, 2 direct replies

Damn, of course they are not going to continue the contract if the team is not even able to make a proper sign-up

Comment by Kingbotterson at 26/01/2025 at 22:37 UTC

63 upvotes, 2 direct replies

The site went live on a Sunday?

Comment by dragenn at 26/01/2025 at 23:51 UTC

21 upvotes, 4 direct replies

Did you put a "try { ... } catch" around the whole server???

Comment by pottitheri at 26/01/2025 at 22:37 UTC

27 upvotes, 2 direct replies

Could you tell us more about the tech stack and what caused the backend issue?

Comment by EmSixTeen at 26/01/2025 at 22:30 UTC

22 upvotes, 0 direct replies

I don’t know what to say other than Jesus, that’s shite craic. Hope you fall on your feet if it goes to shit.

Comment by SpareBig3626 at 26/01/2025 at 23:25 UTC

9 upvotes, 0 direct replies

Look on the bright side: a project with management this poor is only destined for overtime and poor quality, so this is better than ending up with 80 hours a week and depression because your Project Manager does not know how to manage the project.

Comment by sneaky-pizza at 26/01/2025 at 22:58 UTC

17 upvotes, 1 direct replies

Hopefully you got paid, cause it sounds like the product owners were basically missing in action

Comment by Yan_LB at 27/01/2025 at 00:05 UTC

15 upvotes, 5 direct replies

More Info:

Tech Stack:

The Problem:

When some users attempted to sign up with new information, the system flagged their credentials as duplicates and failed to save their data. This issue occurred because many of these users had previously made purchases as "non-users" (guests). Their purchase data, including unique identifiers (such as email addresses or other personal details), had been stored in an overlooked table in the database.

When these "new users" tried to register, the system recognized that their information was already present in the database, linked to their past guest purchases. As a result, it mistakenly identified their credentials as duplicates and rejected the registration attempts.

As a front-end developer, I conducted extensive unit tests and end-to-end tests covering a variety of flows. However, I could not have foreseen the existence of this table conflict on the backend. I’m not trying to place blame on anyone because, at the end of the day, we all go down in the boat together.

Comment by fjacquette at 26/01/2025 at 23:44 UTC

8 upvotes, 0 direct replies

This is tragically way more common than it should be, and I feel for both you and the owners. My whole career at this point is turning around, or rebuilding after, disasters like this.

Comment by PointandStare at 27/01/2025 at 00:09 UTC

9 upvotes, 1 direct replies

The first issue, going by the OP, is you guys spending 12/16 hours a day working on this.

It's going to fail simply because of this time pressure.

Second, test, test and test again. I can only presume, so correct me, the bosses were pushing for more and more in less and less time.

It's going to fail as corners will be cut.

That said, every project is a learning curve - the lessons here are:

- Never work stupid hours for a badly planned project

- You guys will get the blame/ be sacked or whatever

- The managers will be safe and pass the buck to those at the bottom of the food chain

If I was you, I'd make sure my CV is up to date.

Comment by Expensive-Scar2231 at 26/01/2025 at 23:48 UTC

6 upvotes, 1 direct replies

You need to learn from this and get better bro. I also recommend working with some higher skilled devs, the current team doesn’t sound very skilled.

Comment by TracerBulletX at 27/01/2025 at 04:48 UTC

6 upvotes, 1 direct replies

Seems like a case of some backend engineers that aren't really experienced enough getting in a bit over their heads.

Comment by memetican at 27/01/2025 at 06:47 UTC

6 upvotes, 0 direct replies

35 years of dev under my belt. In mission-critical systems, I've learned to capture and log everything. If the user fills out a form, you save it before you try processing it. Things will always happen outside of your control, and this is the only way to ensure the money isn't torched.
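One way to read "save it before you try processing it", as a minimal sketch: the raw form is written to its own table first, and processing failures only update that row's status. The `raw_submissions` table, the `handle_form` function, and the use of SQLite are assumptions for illustration, not anything described in the thread.

```python
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE raw_submissions ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'received',"
        " error TEXT)"
    )
    return db


def handle_form(db, form, process):
    """Persist the raw form first, then attempt processing."""
    # Step 1: capture the submission in its own committed transaction.
    # (A real system would strip or hash secrets such as passwords first.)
    with db:
        cur = db.execute(
            "INSERT INTO raw_submissions (payload) VALUES (?)",
            (json.dumps(form),),
        )
    submission_id = cur.lastrowid

    # Step 2: process; a failure here no longer loses the user's data.
    try:
        process(form)
    except Exception as exc:
        with db:
            db.execute(
                "UPDATE raw_submissions SET status = 'failed', error = ? WHERE id = ?",
                (str(exc), submission_id),
            )
        return False

    with db:
        db.execute(
            "UPDATE raw_submissions SET status = 'processed' WHERE id = ?",
            (submission_id,),
        )
    return True
```

Rows left in the `failed` state can be replayed once the bug is fixed, so a rejected signup becomes a delayed one instead of a lost one.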

Comment by cellularcone at 27/01/2025 at 01:01 UTC

7 upvotes, 1 direct replies

I wonder how much time was wasted building a bunch of shit from scratch in Flask instead of using Django. Also who even uses Flask for new products at this point?

Comment by ashkanahmadi at 26/01/2025 at 23:00 UTC

5 upvotes, 1 direct replies

Can you edit the post to provide more information? What was the "major backend issue" and more importantly, why was it never picked up during developing and testing?

Comment by shmargus at 27/01/2025 at 02:11 UTC

4 upvotes, 0 direct replies

I hate when I overlook the users table...

Comment by young_millennial at 27/01/2025 at 00:50 UTC

9 upvotes, 1 direct replies

You guys should have hired an experienced QA… I work as one and this is an issue we would have spotted quickly