Comment by Academic-Rent7800 on 23/12/2022 at 21:07 UTC

1 upvote, 1 direct reply

Submission: Update on COLO switchover -- bug fixes, reindexing and more


While going over the Pushshift paper, "The Pushshift Reddit Dataset", I found this:

"In this paper, we present the Pushshift Reddit dataset. Pushshift is a social media data collection, analysis, and archiving platform that since 2015 has collected Reddit data and made it available to researchers. Pushshift’s Reddit dataset is updated in real-time, and includes historical data back to Reddit’s inception."

Replies

Comment by safrax on 23/12/2022 at 21:09 UTC

1 upvote, 0 direct replies

It would be literally impossible to monitor the 2.4B+ submissions and keep their scores updated in anything even remotely close to real time without direct access to Reddit's backend databases. Hence each item is collected once and never updated again.
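A rough back-of-the-envelope sketch of why a full re-scan is impractical. It assumes a single API client that can fetch 100 items per request at 60 requests per minute; both figures are illustrative assumptions, not numbers from this thread, and the function name `days_per_full_pass` is hypothetical. Only submissions are counted, so comments would make the picture worse.

```python
import math

# Illustrative assumptions only; not documented Pushshift or Reddit figures.
SUBMISSIONS = 2_400_000_000   # "2.4B+" submissions, as stated in the comment
ITEMS_PER_REQUEST = 100       # assumed number of items fetched per API call
REQUESTS_PER_MINUTE = 60      # assumed per-client rate limit


def days_per_full_pass(items: int, per_request: int, per_minute: int) -> float:
    """Days one client needs to re-fetch every item once (hypothetical helper)."""
    requests = math.ceil(items / per_request)  # total API calls for one pass
    minutes = requests / per_minute            # wall-clock minutes at the rate limit
    return minutes / (60 * 24)                 # convert minutes to days


if __name__ == "__main__":
    days = days_per_full_pass(SUBMISSIONS, ITEMS_PER_REQUEST, REQUESTS_PER_MINUTE)
    print(f"{days:.1f} days per refresh pass")  # prints "277.8 days per refresh pass"
```

Under these assumptions one pass takes about 278 days, so every score would be months stale before it was refreshed. Running N clients in parallel divides the result by N, so getting a single pass under one day would need roughly 278 such clients.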