1 upvotes, 11 direct replies (showing 11)
View submission: Update on COLO switchover -- bug fixes, reindexing and more
Going to try and keep track of all the main breaking changes/bugs/notable changes here.
Metadata/total results
`"total_results": 28462`
The new API now returns a cheaper, estimated count of results by default, but in many applications the count is the only part you want.
Add `&track_total_hits=true` to the query to get a real count; otherwise, for large queries, the estimate maxes out at 10000.
Clients will also need to be updated to find the total results in a different section, as it now looks like `{"total":{"value":28462,"relation":"eq"}}`
~~PMAW uses the field in its pagination process and needs to be updated to use the new field to work properly, among other changes. IIUC there are a couple of pull requests on the GitHub page that bypass the field but none that adapt it to use the new field yet. PMAW should be updated this week.[1] - 2022-12-19~~ PMAW has been updated for the API changes 2022-12-24
1: https://www.reddit.com/r/pushshift/comments/zovqr9/pushshift_appears_to_return_0_results/j0swmxg/
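For anyone patching their own client, here is a rough sketch of reading the count under both response shapes. It assumes you pass in whichever dict actually holds the count (where the new `total` object is nested in the full response isn't pinned down here), and that `relation` follows the Elasticsearch convention where anything other than `"eq"` means a lower bound. The function names are made up for illustration.

```python
from typing import Any, Optional


def extract_total(metadata: dict) -> Optional[int]:
    """Return the result count from old- or new-style metadata, or None."""
    # Old API shape: a flat integer, e.g. {"total_results": 28462}
    old_value: Any = metadata.get("total_results")
    if isinstance(old_value, int):
        return old_value
    # New API shape: an object, e.g. {"total": {"value": 28462, "relation": "eq"}}
    total: Any = metadata.get("total")
    if isinstance(total, dict) and isinstance(total.get("value"), int):
        return total["value"]
    return None


def is_exact(metadata: dict) -> bool:
    """True only when the new-style count is marked as exact ("eq")."""
    total: Any = metadata.get("total")
    return isinstance(total, dict) and total.get("relation") == "eq"
```

If `is_exact` comes back false on a large query, that is probably the capped estimate and the request needs `track_total_hits=true`.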
--------------------------------------------------------------------------------
`after` and `before` no longer accept YYYY-MM-DD. Support could still be added later, but at least for now it isn't there.
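Until date strings are supported again, one workaround is to convert them client-side. This is a minimal sketch that assumes the API still accepts epoch seconds for these parameters; `date_to_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def date_to_epoch(date_str: str) -> int:
    """Convert 'YYYY-MM-DD' to epoch seconds at 00:00:00 UTC."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d")
    # Pin the timezone explicitly; a naive datetime would be read as local time.
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())
```

Pinning UTC keeps the cutoff the same no matter which machine the script runs on.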
--------------------------------------------------------------------------------
Sort/order
`sort` is now `order` and `sort_type` is now `sort` so it's unlikely to be fixed with an alias later
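Because the old `sort` name is reused with a new meaning, renaming the keys one at a time can clobber a value. A small sketch of translating an old-style parameter dict in a single pass (`translate_sort_params` is a hypothetical helper, not part of any wrapper):

```python
def translate_sort_params(old_params: dict) -> dict:
    """Map old-API sort parameters to their new-API names."""
    renames = {"sort": "order", "sort_type": "sort"}
    # Build a fresh dict in one pass so the value moved into `sort`
    # never overwrites the old `sort` value before it reaches `order`.
    return {renames.get(key, key): value for key, value in old_params.items()}
```

For example, `{"sort": "desc", "sort_type": "created_utc"}` becomes `{"order": "desc", "sort": "created_utc"}`.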
--------------------------------------------------------------------------------
/meta
The meta page no longer exists but SITM had not been updating it anyway. The intent was to have a dynamic page where clients like PSAW could get the current rate limit but SITM never updated it.
PSAW requires some modification to work around the changes
https://www.reddit.com/r/pushshift/comments/zlryw1/ive_been_getting_response_status_code_404_since/j0bss25/[2]
Otherwise PSAW is no longer maintained and the GitHub page recommends using PMAW instead. I was not able to find any active forks.
--------------------------------------------------------------------------------
The `https://api.pushshift.io/reddit/search` comment search endpoint is no longer functional. Move to `https://api.pushshift.io/reddit/comment/search` or `https://api.pushshift.io/reddit/search/comment` instead.
It may still be aliased into being functional again later, but that seems unlikely as the other endpoints are much more intuitive at a glance.
--------------------------------------------------------------------------------
`full_link` is no longer included in submission results, suggest building url via `permalink` - 2022-12-26
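A minimal sketch of rebuilding the link, assuming `permalink` is the usual site-relative path (something like `/r/<subreddit>/comments/<id>/<slug>/`); `submission_url` is a hypothetical helper name.

```python
def submission_url(permalink: str) -> str:
    """Rebuild a full submission URL from a site-relative permalink."""
    # Normalise to exactly one leading slash in case the path arrives without it.
    return "https://www.reddit.com/" + permalink.lstrip("/")
```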
--------------------------------------------------------------------------------
It is no longer possible to sort submissions by `num_comments`. Considering we're supposed to be getting aggs back once all of this is working again, I think this is just an oversight on SITM's part rather than an intentional change, but with so much else broken I'm not going to ask about it until I start seeing some of this being fixed. - 2022-12-31
--------------------------------------------------------------------------------
Searching by `url` doesn't work. It is not listed in any current documentation I can find, so it may no longer be supported, or it could just be something that got left out by accident. Will check after things start getting fixed. -- 2023-01-19
--------------------------------------------------------------------------------
size is supposed to be aliased to limit but doesn't work the same
size=0 returns 10 results
limit=0 returns 0
--------------------------------------------------------------------------------
author search has problems with dashes.
author search is now contains rather than an exact match.
--------------------------------------------------------------------------------
subreddit search has similar problems to author search and appears to be returning results as contains rather than exact match. As an example https://api.pushshift.io/reddit/search/submission?subreddit=science&author=science[3] is returning results from user self post subreddits like u/Inner-Science-5658 - 2023-02-01
3: https://api.pushshift.io/reddit/search/submission?subreddit=science&author=science
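Until exact matching comes back, the false positives can be dropped client-side. A rough sketch, assuming each result is a dict carrying `author` and `subreddit` keys and that names compare case-insensitively; `exact_matches` is a hypothetical helper.

```python
def exact_matches(results: list, field: str, wanted: str) -> list:
    """Keep only results whose `field` equals `wanted`, ignoring case."""
    wanted_lower = wanted.lower()
    return [
        item
        for item in results
        # Items missing the field compare as "" and are filtered out.
        if str(item.get(field, "")).lower() == wanted_lower
    ]
```

This only removes the extra hits. It does not get back any results that the contains-style query pushed past the size limit, so a larger request size may be needed.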
--------------------------------------------------------------------------------
~~submission search currently only goes back like 45 days, the data isn't there, it's supposed to be loaded from the old API this week - 2022-12-19 submissions are slowly being reloaded from the beginning currently there is a gap from 2022-01-09 to 2022-11-03. Minibug made a page to track the progress here[4] - 2023-03-29~~
Back submissions reloading appears to be complete as of 2023-04-06
4: https://minibug1021.github.io/pushshift.html
--------------------------------------------------------------------------------
`fields` is now `filter` although this is supposed to be aliased so either works later.
--------------------------------------------------------------------------------
redditsearch.io is now broken entirely. It still loads, but the search function doesn't work: the comment search had already been broken for a while and now the submission search doesn't work either.
Suggest using one of the other maintained front ends like:
https://camas.unddit.com/[5]
~~https://redditsearchtool.com/~~[6] broken by an API change resulting in a redirect 2023-01-05
https://adhesivecheese.github.io/chearch/[7]
6: https://redditsearchtool.com/
7: https://adhesivecheese.github.io/chearch/
--------------------------------------------------------------------------------
`!` negation no longer works, suggest using `-` instead~~, not sure if intended change or bug~~. Neither works on author or subreddit searches, ~~seems like a bug.~~ --confirmed bug 2022-12-21.
--------------------------------------------------------------------------------
querying `link_id` is only working in base 10 format[8] instead of the normal base 36 - 2023-01-07
--------------------------------------------------------------------------------
api is giving parent_ids for comments in base 10 instead of base 36 -- 2023-01-12
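Converting between the two ID formats is simple enough to do client-side while this is broken. A minimal sketch with hypothetical helper names; it works on the bare ID, so strip any `t1_`/`t3_` style prefix first if yours has one.

```python
import string

# The 36 symbols used by base 36 IDs: 0-9 followed by a-z.
_DIGITS = string.digits + string.ascii_lowercase


def base36_to_int(value: str) -> int:
    """Convert a base 36 ID such as 'j38wnwm' to its base 10 integer."""
    return int(value, 36)


def int_to_base36(number: int) -> str:
    """Convert a non-negative base 10 integer to a lowercase base 36 string."""
    if number < 0:
        raise ValueError("expected a non-negative integer")
    if number == 0:
        return "0"
    digits = []
    while number:
        # Peel off the least significant base 36 digit each pass.
        number, remainder = divmod(number, 36)
        digits.append(_DIGITS[remainder])
    return "".join(reversed(digits))
```

Use `base36_to_int` on a `link_id` before querying, and `int_to_base36` on a returned `parent_id` to get something you can look up.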
--------------------------------------------------------------------------------
The `metadata=true` flag seems to be ignored now and is always enabled regardless of setting.
--------------------------------------------------------------------------------
`until` is the new `before` and `since` is the new `after` but both seem to be functional.
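A rough sketch of building a date-range query with the new names, assuming both parameters take epoch seconds. The endpoint is the submission search URL used elsewhere in this post; `build_range_query` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Submission search endpoint as it appears in the examples in this post.
BASE_URL = "https://api.pushshift.io/reddit/search/submission"


def build_range_query(since: datetime, until: datetime, **extra: str) -> str:
    """Build a search URL for results created between `since` and `until`."""
    # Assumption: `since`/`until` take epoch seconds, like `after`/`before` did.
    params = {"since": int(since.timestamp()), "until": int(until.timestamp())}
    params.update(extra)
    return BASE_URL + "?" + urlencode(params)


url = build_range_query(
    datetime(2019, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    subreddit="science",
)
```

Passing timezone-aware datetimes avoids the range shifting with the local clock of whatever machine runs the script.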
https://api.pushshift.io/redoc
and
If it's not here I've missed it, so please let me know. I aim for this to be a comprehensive list.
Comment by Security_Chief_Odo at 20/12/2022 at 01:08 UTC
6 upvotes, 1 direct replies
Author search **really** needs to be changed back to 'exact match' or given a way to make it exact match only. This 'contains' matching will ruin a lot of searches with false positives.
Comment by bwburke94 at 20/12/2022 at 22:00 UTC
2 upvotes, 1 direct replies
At least on unddit, negative filtering (with ! signs) still isn't working properly.
Comment by [deleted] at 21/12/2022 at 09:12 UTC
2 upvotes, 2 direct replies
[removed]
Comment by MisterCrazy8 at 19/12/2022 at 23:23 UTC
1 upvotes, 1 direct replies
Could you rephrase "cheaper"?
It isn't a term I'm personally familiar with in a professional context (granted, my degree is in Computer Science, not Data Science).
I'm assuming you are saying that `total_results` returns only the count it would return given the limit of returnable items specified, and therefore would at most equal the limit. Whereas `track_total_hits=true` would result in it returning the actual total number of results, not just the limit of the items it would return at a time.
Thanks for the sticky update. It clarifies things and consolidates answers to the questions flying about.
Comment by angelafischer at 20/12/2022 at 18:03 UTC
1 upvotes, 1 direct replies
I can't access subreddits files[1]. Is this normal or are the raw files for the subreddit just never uploaded?
1: https://files.pushshift.io/reddit/subreddits
Comment by forbabylon at 21/12/2022 at 08:54 UTC
1 upvotes, 1 direct replies
Please include `/search/submission?ids=` not working in the bugs section (it currently returns an empty data set).
Comment by TEbejer at 26/12/2022 at 05:55 UTC
1 upvotes, 1 direct replies
With the changes from before/after to until/since, can I still use code such as:
`import datetime as dt`
`until = int(dt.datetime(2020,1,1,0,0).timestamp())`
`since = int(dt.datetime(2019,1,1,0,0).timestamp())`
I have looked up both commands in the new API documentation at both new API documentation links above and I don't understand from the descriptions how to use them.
I understand that the API will return no results with the dates I've written in the code above because they aren't loaded yet. Mostly just wondering how to use until and since for when the data has been loaded.
Thank you for your hard work!
Comment by Beginning_Flan3921 at 12/01/2023 at 12:34 UTC
1 upvotes, 1 direct replies
Thank you. What is `parent_id: 41556640685` for a comment? I can't find this id in a parent comment and cannot associate them.
Example:
Parent comment https://api.pushshift.io/reddit/search/comment?ids=j38wnwm[1]
Reply to parent https://api.pushshift.io/reddit/search/comment?ids=j3a77sa[2]
1: https://api.pushshift.io/reddit/search/comment?ids=j38wnwm
2: https://api.pushshift.io/reddit/search/comment?ids=j3a77sa
Comment by forbabylon at 19/01/2023 at 11:28 UTC
1 upvotes, 1 direct replies
Can we please add the `url` search parameter no longer working to the bug list?
Comment by shiruken at 01/02/2023 at 03:28 UTC
1 upvotes, 2 direct replies
Not sure it's been reported, but it appears that `subreddit` filtering on the submissions endpoint is suffering from similar problems as `author` search. The following query for submissions from r/science is returning submissions from user profiles that contain the string "science" in their username:
https://api.pushshift.io/reddit/search/submission?subreddit=science[1]
1: https://api.pushshift.io/reddit/search/submission?subreddit=science
Comment by grejty at 13/04/2023 at 18:21 UTC
1 upvotes, 1 direct replies
order="asc" seems not to be working for me
sort="created_utc"
order="asc"
NotImplementedError: Support for non-default order has not been implemented as it may cause unexpected results