1 upvotes, 1 direct replies (showing 1)
View submission: Update on COLO switchover -- bug fixes, reindexing and more
While going over the Pushshift paper, "The Pushshift Reddit Dataset", I found this:
"In this paper, we present the Pushshift Reddit dataset.
Pushshift is a social media data collection, analysis, and
archiving platform that since 2015 has collected Reddit
data and made it available to researchers. Pushshift’s Reddit
dataset is updated in real-time, and includes historical data
back to Reddit’s inception."
Comment by safrax at 23/12/2022 at 21:09 UTC
1 upvotes, 0 direct replies
It would be literally impossible to monitor the 2.4B+ submissions and keep their scores updated in anything even remotely close to real time without direct access to Reddit's backend databases. Hence scores are fetched once and never again.
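
For a rough sense of scale, here is a back-of-envelope sketch of how long one full refresh pass would take through a public API. Only the 2.4B figure comes from the comment above. The batch size (100 items per request) and the rate limit (60 requests per minute) are assumptions picked for illustration, and `full_refresh_days` is a hypothetical helper, not anything Pushshift is known to use.

```python
# Back-of-envelope estimate; the rate-limit figures are assumptions, not from the thread.
TOTAL_SUBMISSIONS = 2_400_000_000   # "2.4B+" from the comment above
ITEMS_PER_REQUEST = 100             # assumed batch size per API call
REQUESTS_PER_MINUTE = 60            # assumed per-client rate limit


def full_refresh_days(total: int, per_request: int, per_minute: int) -> float:
    """Days needed to re-fetch every item once at the given request rate."""
    requests_needed = -(-total // per_request)  # ceiling division
    minutes = requests_needed / per_minute
    return minutes / (60 * 24)


days = full_refresh_days(TOTAL_SUBMISSIONS, ITEMS_PER_REQUEST, REQUESTS_PER_MINUTE)
print(f"One full pass over all submissions: about {days:.0f} days")
```

Under these assumed limits a single client would need roughly 278 days for one pass, during which every score would keep changing, so the "once and never again" approach follows from the arithmetic.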