Regarding the histogram issue, I worked on a project that had a few hundred histograms based on data from over 3 billion data points. It turns out that after a few thousand data points many histograms will stop changing significantly.
So, unless you really need to show exactly how many data points each bucket contains, it's much easier to run the analysis once offline, then serve just the histogram percentage data. From that you can make an SVG and overlay additional user-specific data on top. The point is that this histogram data is small and easy to cache.
You can then rerun the histogram analysis later if you'd like. However, for this project I never saw anything change with more data. It was overkill even to run it as a cron job.
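To make that concrete, here's a rough sketch of the offline step, assuming the raw times are already available as an array of seconds. The bucket width, range, and names are illustrative, not taken from the project above:

```typescript
// Sketch: compute per-bucket percentages offline so clients only ever
// download the small summary, never the raw data points.
// Bucket width and range are illustrative choices.

interface HistogramSummary {
  bucketWidth: number;   // seconds per bucket
  minValue: number;      // lower edge of the first bucket
  percentages: number[]; // share of samples in each bucket, 0..1
}

function summarize(times: number[], minValue = 0, maxValue = 120, bucketWidth = 1): HistogramSummary {
  const bucketCount = Math.ceil((maxValue - minValue) / bucketWidth);
  const counts = new Array<number>(bucketCount).fill(0);

  for (const t of times) {
    if (t < minValue || t >= maxValue) continue;        // drop out-of-range samples
    counts[Math.floor((t - minValue) / bucketWidth)] += 1;
  }

  const total = counts.reduce((a, b) => a + b, 0) || 1; // avoid divide-by-zero
  return { bucketWidth, minValue, percentages: counts.map(c => c / total) };
}
```

The resulting summary is a few hundred numbers at most: cheap to cache, cheap to serve, and enough to draw the SVG and overlay a user's own time on top.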
This is a good example of why understanding the principles of statistics can come in handy.
Right now this project is on the scale of ~100k points, but I'm starting to see a drop in percentage change as you mentioned. In the beginning, though, the trends weren't as clear so I wanted to keep it updating.
Wow, that's >1600 person-hours dedicated to this waiting task.
You can do what Prometheus does and pre-aggregate: size the buckets at 1s or 0.5s, store a count per bucket, and increment the counts as data points come in. You can store the data points individually too, and regenerate the histogram if you really need to, but it's far more efficient to store the histogram's aggregated buckets.
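Roughly, the idea looks like this (a sketch of the pre-aggregation pattern, not Prometheus's actual implementation; the bucket width and names are just for illustration):

```typescript
// Fixed-width buckets whose counters are incremented as each data point
// arrives. Only the counters are stored, so the cost of keeping the
// histogram is independent of how many points have been seen.

class BucketedHistogram {
  private counts = new Map<number, number>(); // bucket index -> count

  constructor(private readonly bucketWidth = 0.5) {}

  observe(seconds: number): void {
    const bucket = Math.floor(seconds / this.bucketWidth);
    this.counts.set(bucket, (this.counts.get(bucket) ?? 0) + 1);
  }

  // Bucket lower edges and their counts, ready to render or persist.
  snapshot(): Array<{ from: number; count: number }> {
    return [...this.counts.entries()]
      .sort(([a], [b]) => a - b)
      .map(([bucket, count]) => ({ from: bucket * this.bucketWidth, count }));
  }
}
```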
You were sending the entire database (all times from all users) to each individual end user to compute histograms client-side after they finished? Ah, yeah, that could get expensive on Firebase.
I guess it wasn't top of mind at the small scale I planned to operate at, but definitely a facepalm when you put it that way.
Eh, YAGNI is a valid development strategy. When you get thousands of users you can make it more efficient, and you did.
I have to brag here. I silently visualized a wall clock ticking off the seconds and got 60.02 seconds.
2 hundredths of a second off from reality!
I could try it again to validate, but I can't be bothered. As far as I'm concerned, I'm super accurate judging the passage of time. No need to find out if it was a lucky random pick in the interval [58, 62] :)
60.38s for me, which also surprised me.
But the real surprise was just how close a lot of people get. I was fully expecting a bell curve that was offset high or low due to some imagined bias people would have to count fast or slow.
Unless you try it again, we have no quantitative basis for estimating your variance. You can brag all you like, but I'm not listening.
Ok, I just did it again right now. This result was 59.48s.
I'm going to acknowledge that result as an outlier, exclude it from my analysis, and stick with my median result of 60.02.
$ np.median([59.48, 60.02])
59.75
$ stats.median_abs_deviation([59.48, 60.02])
0.27
;-)
Just some feedback: I started the challenge, but the constantly changing text like "I'm bored" etc. was enough to throw off my internal clock for judging when to stop it.
Close your eyes.
All I did was count from 20 to 80. Why from 20? It's a trick I learned from driver's license practice. In my language, the words for 1 to 20 are too short, so they aren't suitable for timing off seconds.
I pushed a quick fix to the issue by freezing the data being sent to the client, thereby halting the rapid growth in data consumption.
What do you mean by "freezing the data"?
Regarding the excessive download problem, my first instinct is to periodically (for example, every hour) compute summary statistics for the bar chart and store that in Firebase. This, of course, would require an additional script/service to perform these periodic jobs.
I'm not sure if that's what you ended up doing and I'm curious what your solution is.
> What do you mean by "freezing the data"?
https://github.com/JinayJain/just-a-minute/blob/master/app.j...
As others have mentioned, my solution in the moment was to download the JSON file from Firebase, compute histogram statistics manually, and hardcode the histogram into the JS itself.
Obviously not a scalable solution, and I think I would have done something very similar to the periodic updates you mentioned (if I had more experience with cloud functions etc.)
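For context, the one-off "freeze" step amounts to something like this (a guess at the workflow, not the project's actual code; the export filename and JSON shape are assumptions):

```typescript
// Read the JSON export downloaded from Firebase, bucket the times, and print
// a constant that can be pasted into the client JS in place of the live
// database read. Run once, offline.

import { readFileSync } from 'fs';

const raw = JSON.parse(readFileSync('times-export.json', 'utf8'));
const times: number[] = Object.values(raw);   // assumes { pushId: seconds, ... }

const counts: Record<number, number> = {};
for (const t of times) {
  const bucket = Math.floor(t);               // 1s buckets
  counts[bucket] = (counts[bucket] ?? 0) + 1;
}

console.log(`const FROZEN_HISTOGRAM = ${JSON.stringify(counts)};`);
```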
> This, of course, would require an additional script/service to perform these periodic jobs.
Also worth noting that Firebase has built-in “cloud functions” which have access to the database API. It would be pretty easy to run one on a schedule.
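Something along these lines (a sketch only; the /times and /histogram paths and the 1-second buckets are assumptions about the schema, not taken from the project):

```typescript
// Scheduled Cloud Function (1st-gen firebase-functions API) that rebuilds the
// histogram summary hourly and writes it back to the Realtime Database, so
// clients read the small /histogram node instead of every raw time.

import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';

admin.initializeApp();

export const rebuildHistogram = functions.pubsub
  .schedule('every 60 minutes')
  .onRun(async () => {
    const snapshot = await admin.database().ref('/times').once('value');
    const times: number[] = Object.values(snapshot.val() ?? {});

    // Count 1-second buckets server-side.
    const counts: Record<number, number> = {};
    for (const t of times) {
      const bucket = Math.floor(t);
      counts[bucket] = (counts[bucket] ?? 0) + 1;
    }

    await admin.database().ref('/histogram').set(counts);
    return null;
  });
```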
Probably (and I'm guessing, not the author here) took a snapshot of the data and hardcoded that to be sent to the client instead of live data.
There used to be a very old DOS program that did this, but for 5 seconds.
is there a way to see the results without going through the challenge?
I filter out any data points <5 seconds (as seen in the graph), so completing your attempt in under 5 seconds should do it.
Ideally, people would do the challenge first and then see where they lie on the graph before seeing the data itself.
that's what I ended up doing. I was looking at it, clicked to see your other projects, and then went back and couldn't view the results
Nice bell curve around 60s, but a little higher on the left than the right (proportionally more people underestimate), and there's a spike at 0-8 seconds (from people who just wanted to see the results or decided to quit quickly).
I think there's also a good number of people who didn't get what the site was actually asking them to do and just did what was prompted.