Comment by Wiskkey on 06/10/2020 at 20:57 UTC*

4 upvotes, 2 direct replies (showing 2)

View submission: This user is posting with GPT-3: /u/thegentlemetre

I made a copy of the 1000 most recent comments from that account (the maximum number that Reddit gives), in 4 parts:

Part 1: https://pastebin.com/9qe528AW

Part 2: https://pastebin.com/bhr96fJU

Part 3: https://pastebin.com/4M5QAcvD

Part 4: https://pastebin.com/iMdqTBA0


I then tallied how many of those 1000 comments have each score in points (which I believe equals 1 plus the number of upvotes from other accounts minus the number of downvotes from other accounts); a sketch for reproducing the tally follows the table:

Comments  Points
       1     347
       1     157
       1     120
       1      97
       1      51
       1      24
       1      20
       2      18
       1      13
       2      11
       1      10
       1       9
       1       8
       4       7
       9       6
       9       5
      16       4
      83       3
     252       2
     424       1
     118       0
      29      -1
      19      -2
       3      -3
       4      -4
       2      -5
       1      -6
       3      -7
       3      -8
       1      -9
       2     -15
       1     -18
       1     -23
       1     -34
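For reference, a minimal sketch of how such a tally could be reproduced with PRAW; the credentials below are placeholders, and Reddit's ~1000-item listing cap is why the archive stops where it does:

    # Tally the scores of an account's most recent comments.
    from collections import Counter

    import praw

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder
        client_secret="YOUR_CLIENT_SECRET",  # placeholder
        user_agent="score-tally-sketch",
    )

    # Reddit listings cap out at roughly 1000 items.
    scores = [
        c.score
        for c in reddit.redditor("thegentlemetre").comments.new(limit=1000)
    ]

    tally = Counter(scores)
    for points, count in sorted(tally.items(), reverse=True):
        print(f"{count}\t{points} points")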


388 of the 1000 comments have 2 or more points.

424 of the 1000 comments have 1 point.

188 of the 1000 comments have 0 or fewer points.


If my understanding of the comment points system is correct, the last set of numbers lets us conclude:

At least 38.8% of the 1000 comments have at least one upvote (a score of 2 or more means upvotes exceed downvotes, so there is at least one upvote).

At least 18.8% of the 1000 comments have at least one downvote (a score of 0 or less means downvotes exceed upvotes by at least one, so there is at least one downvote).

Since those two groups are disjoint, the bounds add: at least 38.8% + 18.8% = 57.6% of the 1000 comments have at least one upvote or downvote.
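The bounds can be checked mechanically from the tally; a short sketch, where the tally dict simply transcribes the table above:

    # Verify the lower bounds from the points tally.
    # tally maps points -> number of comments with that score.
    tally = {
        347: 1, 157: 1, 120: 1, 97: 1, 51: 1, 24: 1, 20: 1, 18: 2, 13: 1,
        11: 2, 10: 1, 9: 1, 8: 1, 7: 4, 6: 9, 5: 9, 4: 16, 3: 83, 2: 252,
        1: 424, 0: 118, -1: 29, -2: 19, -3: 3, -4: 4, -5: 2, -6: 1, -7: 3,
        -8: 3, -9: 1, -15: 2, -18: 1, -23: 1, -34: 1,
    }

    total = sum(tally.values())                                # 1000
    upvoted = sum(n for pts, n in tally.items() if pts >= 2)   # 388
    downvoted = sum(n for pts, n in tally.items() if pts <= 0) # 188

    print(f"at least {upvoted / total:.1%} have an upvote")    # 38.8%
    print(f"at least {downvoted / total:.1%} have a downvote") # 18.8%
    # The two groups are disjoint, so the bounds add:
    print(f"at least {(upvoted + downvoted) / total:.1%} have a vote")  # 57.6%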

Replies

Comment by Phylliida at 08/10/2020 at 16:56 UTC

2 upvotes, 1 direct reply

Studying this distribution is actually a fairly objective way of determining “humanlike” in a Turing test type way (nuances about edge cases aside). I’d be interested to see if we could optimize prompts to boost this up, and it also provides an additional way to compare previous and future GPT models.

Unfortunately I imagine this kind of use is probably discouraged by OpenAI, and I’m not sure they’re wrong to discourage it, seeing some of the dark sides of the interactions here. The ideal scenario is probably to train a model that predicts upvotes well, then simply use it for evaluation, but don’t optimize against it.
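One way Phylliida's distribution comparison could be made concrete is a two-sample Kolmogorov–Smirnov test between the score distributions of two accounts (or two models); a sketch, with placeholder score lists standing in for real scraped data:

    # Compare two comment-score distributions; a small p-value suggests
    # the bot's comments are received differently from the baseline.
    from scipy.stats import ks_2samp

    bot_scores = [2, 1, 1, 0, 3, 1, -1, 2, 1, 0]    # hypothetical placeholder data
    human_scores = [5, 2, 1, 7, 0, 3, 12, 1, 2, 4]  # hypothetical placeholder data

    stat, p_value = ks_2samp(bot_scores, human_scores)
    print(f"KS statistic={stat:.3f}, p={p_value:.3g}")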

Comment by schmieroslav at 26/11/2020 at 09:39 UTC

1 upvote, 1 direct reply

I took the liberty of cleaning the data and packing it into a JSON file[1]:

1: https://gist.github.com/bmurauer/8486c6fca0f6b50d046f47dbe7156299
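For anyone wanting to rerun the tally on the cleaned data, a sketch; the raw URL is derived from the gist link above via GitHub's usual /raw convention, and the "score" field name is an assumption about the file's schema, not something confirmed here:

    # Re-run the points tally on the cleaned JSON dump.
    import json
    import urllib.request
    from collections import Counter

    RAW_URL = (
        "https://gist.githubusercontent.com/bmurauer/"
        "8486c6fca0f6b50d046f47dbe7156299/raw"
    )

    with urllib.request.urlopen(RAW_URL) as resp:
        comments = json.load(resp)

    tally = Counter(c["score"] for c in comments)  # "score" field assumed
    for points, count in sorted(tally.items(), reverse=True):
        print(f"{count}\t{points} points")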