
Wikimedia is moving to Gitlab

Author: input_sh

Score: 881

Comments: 384

Date: 2020-10-28 15:04:50

________________________________________________________________________________

zzzeek wrote at 2020-10-28 16:49:20:

I think I have some relevant experience here.

We host all of our projects on github:

https://github.com/sqlalchemy/

yet we also use gerrit!

https://gerrit.sqlalchemy.org/

users send us pull requests, and they never have to deal with Gerrit ever. We use a custom integration, the source code to which is here:

https://github.com/sqlalchemyorg/publishthing/tree/master/pu...

and we have mostly bidirectional synchronization between Gerrit and GitHub pull requests: code changes can move freely from a GitHub PR to Gerrit, comments and code review comments are posted bidirectionally, and Gerrit status changes are synchronized into the PR. Example:

https://github.com/sqlalchemy/sqlalchemy/pull/5662
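
The linked publishthing repository is the actual integration; as a rough illustration of how the comment-mirroring half of such a bridge can work, a GitHub webhook handler might look something like the sketch below. This is illustrative only, not the real code: Gerrit's standard "set review" REST endpoint is real, but the instance URL, credentials, and helper function are assumptions.

```python
import re
import requests

GERRIT = "https://gerrit.example.org"  # hypothetical Gerrit instance

def extract_change_id(text: str) -> str:
    """Pull the Gerrit Change-Id (I + 40 hex chars) out of a PR body."""
    m = re.search(r"Change-Id: (I[0-9a-f]{40})", text)
    if not m:
        raise ValueError("no Change-Id found")
    return m.group(1)

def on_github_comment(event: dict) -> None:
    """Mirror a GitHub PR comment onto the matching Gerrit change
    using Gerrit's standard 'set review' REST endpoint."""
    change_id = extract_change_id(event["issue"]["body"])
    message = "{} (via GitHub): {}".format(
        event["comment"]["user"]["login"], event["comment"]["body"])
    requests.post(
        f"{GERRIT}/a/changes/{change_id}/revisions/current/review",
        json={"message": message},
        auth=("bot", "http-password"),  # Gerrit HTTP credentials
    )
```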

I continue to find Gerrit's code review to be vastly better than GitHub's. GitLab would take tremendous server resources to run internally, and I like GitHub much better for the front-facing experience.

I wrote about an earlier form of our integration here:

https://techspot.zzzeek.org/2016/04/21/gerrit-is-awesome/

to sum up:

1. your project benefits massively by being on Github

2. Gerrit is awesome (for me)

3. gitlab is not very appealing to me UX-wise and self-hosting-wise

4. you can still use pull requests from outside users and use gerrit for code reviews.

augustohp wrote at 2020-10-28 20:21:50:

I have yet to meet people who review code and use Gerrit who can name a better solution.

I belonged to a team that used Gerrit for review and hosting; we changed to hosted GitLab because people missed the "GitHub-like UI" they were used to. It was unanimous that code review on Gerrit was way better:

1. You start by reviewing the commit message, which is the first touch point everyone has with a change

2. Navigation is done from file to file

3. On Gerrit you don't get two people commenting on the same thing, because:

3.a. Messages from different people on the same changed line are displayed together, not as different threads.

3.b. The review of a previous version is displayed with the next version, so you can continue the same discussion

I understand that the GitHub/GitLab interface is friendlier, but their code review really stands in the way of producing good software by not favoring good commit messages and long discussions.

u801e wrote at 2020-10-28 22:04:21:

> I have yet to meet people who review code and use Gerrit who can name a better solution.

What about reviews via patches sent to a mailing list?

I haven't looked into Gerrit for a while, so one question I have is how it handles related commits. The mailing list approach can group them in a single thread tied together by a cover letter message, where each commit, along with the associated diff from the parent working tree, is a message posted as a reply to the cover letter.
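
For readers unfamiliar with that workflow: the threading comes from git itself. A minimal sketch, with the stock git commands wrapped in Python for illustration (the list address is made up):

```python
import glob
import subprocess

# Generate a threaded series: a cover letter (0/N) plus one mail per commit,
# each sent as a reply to the cover letter (--thread=shallow). -v2 marks the
# second iteration of the series, as in [PATCH v2 1/3].
subprocess.run(
    ["git", "format-patch", "--cover-letter", "--thread=shallow", "-v2",
     "origin/master"],
    check=True,
)

# Mail the series; replies to these messages become the review thread.
subprocess.run(
    ["git", "send-email", "--to=dev-list@example.org",
     *sorted(glob.glob("v2-*.patch"))],
    check=True,
)
```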

AaronFriel wrote at 2020-10-28 22:33:18:

To be polite, I think the target audience of these tools might not include you. While that workflow works for you and, apparently, scales for some very large projects like the Linux kernel, it isn't a good solution for an enormous number of people, which is _why_ tools like GitHub, Gerrit, GitLab, and others exist.

u801e wrote at 2020-10-28 22:38:45:

> To be polite, I think the target audience of these tools might not include you.

Insinuating that I'm a special case doesn't add to the discussion.

AaronFriel wrote at 2020-10-28 22:56:34:

You aren't a special case, but if you don't see any flaws with your method of code review then I don't think you're the target market for code review tools.

h0l0cube wrote at 2020-10-28 23:55:28:

I have no dog in this fight, but you stated:

> it isn't a good solution for an enormous number of people

... without providing any insight or justification for that belief. So an interesting and productive tangent might be to elucidate what you believe those flaws are?

Unpacking this question might even lead to some good UX ideas that can be applied to today's review systems?

fireattack wrote at 2020-10-29 03:22:42:

I'll provide one, as someone who is not a dev or SE.

I honestly never fully understood how to read mailing-list patches. The plain text format makes it very hard to understand what's going on. I'm sure I will understand it better or even prefer it if I use it long enough, but then again I can instantly understand code reviews on GitHub/GitLab/Gerrit.

h0l0cube wrote at 2020-10-29 03:36:04:

I've never had to use a mailing-list patch myself, but from the ones I've glanced at, I have the same problem: formatting and syntax highlighting are absent. It seems like this would be an easy problem to fix with a better viewer, so that might be enough to make it comprehensible.

u801e wrote at 2020-10-29 04:58:49:

> I honestly never fully understood how to read mailing-list patches. The plain text format makes it very hard to understand what's going on.

That may be due to the settings in your mail client. If it's displaying plain text in a variable-width font and/or not applying syntax highlighting (showing added and removed lines in different colors), that would make the diff more difficult to read.

But some mail clients can do that and it makes reading the diff much easier.

> I'm sure I will understand it better or even prefer it if I use it long enough, but then again I can instantly understand code reviews on GitHub/GitLab/Gerrit.

Essentially, code reviews in a mailing list are much like a discussion thread in Hacker News or Reddit where the thread structure is very similar. The only difference is that most mail clients only allow you to display one message at a time.

In my mail client, Thunderbird, you can see the overall thread structure of a patchset discussion [1]. This is the root message for the thread, which serves as the cover letter (the equivalent of the PR description) [2]. The first commit in the patch is displayed here [3] (note that I have a plugin that enables diff syntax highlighting). The email subject is the commit message title (with the [PATCH v2 1/3] tag prepended). The commit message itself is the beginning of the email, and the diff follows.

Unlike GitHub (and maybe GitLab), the commit message and diffstat are treated at the same level as the diff itself. That means you can comment on them just like you would on the diff.

Here, you can see Junio C Hamano's comments on the second commit in the patch set [4]. He's commenting on the diffstat line, which shows 391 lines added to the builtin/submodule--helper.c file. Further down in the same message [5], he's commenting on the code inline, much like someone would quote a message here on HN and reply inline to multiple sections of it. It's not really that different from comments on a diff in GitHub or GitLab, other than the fact that it's a reply to an email message rather than a web page.

[1]

https://i.imgur.com/QmqUWR8.png

[2]

https://i.imgur.com/mILREtf.png

[3]

https://i.imgur.com/gdoy5zs.png

[4]

https://i.imgur.com/BcTdRRe.png

[5]

https://i.imgur.com/cCpqsOL.png

fireattack wrote at 2020-10-29 05:30:12:

I will be honest, I don't even use an email client :/

u801e wrote at 2020-10-29 05:36:29:

True, I suppose most people use Gmail or one of the other major email providers through a webmail interface. I haven't been able to get Gmail or Hotmail to display threaded messages the way they're displayed in Thunderbird and they tend to display messages using a variable width font.

In that context, reviewing code would be difficult, if not impossible, to do via email.

eru wrote at 2020-10-30 04:36:58:

That should be an easy problem to fix:

Just send out email in HTML format with the code portions set in a fixed-width font and syntax highlighting already applied. It should display just fine in Gmail.
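
A minimal sketch of that idea with Python's standard email library; the naive +/- coloring stands in for a real syntax highlighter:

```python
import html
from email.message import EmailMessage

def diff_to_html(diff: str) -> str:
    """Wrap a unified diff in <pre>, coloring added/removed lines."""
    colors = {"+": "green", "-": "red"}
    spans = []
    for line in diff.splitlines():
        color = colors.get(line[:1], "black")
        spans.append(f'<span style="color:{color}">{html.escape(line)}</span>')
    return "<pre>" + "\n".join(spans) + "</pre>"

diff = "+added line\n-removed line\n unchanged context"
msg = EmailMessage()
msg["Subject"] = "[PATCH 1/1] example change"
msg.set_content(diff)                                    # plain-text fallback
msg.add_alternative(diff_to_html(diff), subtype="html")  # what Gmail renders
```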

h0l0cube wrote at 2020-10-29 05:51:58:

That is one advantage of mail patches, though. A lack of vendor lock-in means you can view the patch using whatever application you want. And there's room for a better mail-patch viewer, if anyone could be bothered to make one.

I'm guessing one disadvantage is that the diff format is hard-coded into the patch. It would be good to switch to word-diff, or ignore whitespace, but I'd imagine these could be applied as transformations on the generic format.

u801e wrote at 2020-10-29 06:19:24:

> I'm guessing one disadvantage is that the diff format is hard-coded into the patch. It would be good to switch to word-diff, or ignore whitespace, but I'd imagine these could be applied as transformations on the generic format.

The plugin I use in Thunderbird can switch between unified, context, and side-by-side diff views based on the same email. Adding the transformations you mentioned could be done.

But one limitation email has compared to other review tools is the inability to expand the view of the context within the client. The only way I can think of is to have git format-patch generate the diff with the entire context included and then have the client limit the display of that context. But that would not have a reasonable fallback for those using clients that aren't capable of, or configured for, doing that.

izietto wrote at 2020-10-29 15:17:49:

Which plugin are you using? It seems life-changing to me.

u801e wrote at 2020-10-29 15:27:44:

This is the one I'm using:

https://github.com/Qeole/colorediffs

I installed it several years ago, so I'm not entirely sure whether you can install it on a current version of Thunderbird, but it's still working with my installation.

basicexploit wrote at 2020-10-30 14:08:51:

I might try it out as well! Thanks for it.

u801e wrote at 2020-10-28 23:37:22:

Now you're claiming that I'm making an argument that I never made.

To be clear, the original statement was whether there was a better review tool compared to Gerrit for those who review code and use that tool for that purpose. I responded by suggesting the patch review via mailing list method and asked a follow up question about how Gerrit handled related commits and explained how that case was handled by the mailing list method.

Then you, not the person I originally responded to, decided to interject and, while claiming not to be rude, claimed I was saying something entirely different from what I actually stated.

Personally, I found that very off-putting and extremely rude on your part.

AaronFriel wrote at 2020-10-29 22:36:01:

I'm sorry I was rude to you, I didn't mean that and I apologize.

I didn't mean anything more than what I literally wrote, which is I don't think you're the target market for code review tools and your suggestion may not be a good fit for people looking for review tools.

waheoo wrote at 2020-10-29 01:17:06:

Gerrit handles related commits in a similar way I guess.

A Gerrit changeset is like a GitHub PR: it has many revisions, and commits are usually rebased against master, which is crucial for tracking the review properly over time. (It's very easy to switch to earlier revisions of the same changeset, and you never see external changes.)

Honestly, I don't think Gerrit is anything special; I think it simply has the right approach to development (rebased/ff-only), which enables easy review.

u801e wrote at 2020-10-29 06:35:33:

Based on the documentation I read, it looks like Gerrit handles this by grouping changes by topics [1]. I'm not sure whether that can be done automatically when pushing up changes that span multiple commits in a local branch.

If I create a branch with several commits, where one commit adds a new method with associated unit tests and a subsequent commit adds several calls to that new method in the code base (while updating any affected tests), then how would Gerrit handle the ordering of those commits? Even if they're in the same topic, I don't know if there's a way to ensure that the first commit is reachable from the subsequent commit.

[1]

https://gerrit-review.googlesource.com/Documentation/intro-u...

atombender wrote at 2020-10-29 00:06:13:

Gerrit is the one where each "pull request" has to be a single commit, right?

I'm not particularly happy about GitHub, but I think it's less about GH and more about workflow and source code evolution, and I'm not sure if Gerrit solves anything here.

First, PRs being heavyweight encourages large branches instead of smaller incremental changes. Secondly, a large change (such as a big new feature) ends up living in a branch until it's ready to be _released_, not until it's reviewed. This means that any code that cannot be immediately merged to master needs to live on that branch for a while, and it gets rapidly out of date, requiring constant rebasing to keep it from rotting.

I'd prefer to merge as soon as something is accepted and use master as the main development branch, but that causes challenges. If you've merged something you don't want to release yet, you're faced with having to build release branches through cherry-picking, which can be really difficult or even impossible. You can hide features behind flags, but sometimes a branch is a big risky refactoring or some structural change that isn't providing features that can be isolated. Plus, once you merge something, any following changes, even if unrelated, often end up depending on the things you want to exclude.

I think something like Pijul (with some discipline, like always doing small incremental commits) could make this easier by being able to treat individual commits as moving pieces that can be rearranged for a release, but it wouldn't solve everything.

Any thoughts on this and how Gerrit would fit in?

bawolff wrote at 2020-10-29 00:39:17:

> Gerrit is the one where each "pull request" has to be a single commit, right?

Yes, but you can have changesets depend on each other, so it's not that big a deal (though you can end up in rebase hell if you do that). You also get a version history of all the different versions of your commit.

Anecdotally, during my use of Gerrit, I never really wished to have multiple commits on a single changeset.

> I'd prefer to merge as soon as something is accepted and use master as the main development branch

That's what Wikimedia did, mostly. (There were weekly deployment branches, but it was unusual to have something in master but reverted out of the deploy branch.) It seemed to mostly work fine AFAIK (of course, I wasn't on the team doing deploys; for all I know they might have horror stories).

JoshTriplett wrote at 2020-10-29 03:51:11:

> Anecdotally, during my use of Gerrit, I never really wished to have multiple commits on a single changeset.

People's tools shape their workflows; projects that use Gerrit tend to do more squashing of commits, because there's much more per-commit overhead. When I encountered Gerrit, I found it really frustrating to work with for this exact reason. Other aspects of it were great, but if you're used to "one logical change per commit" and end up with a dozen commits in a PR, that can be painful with Gerrit.

lima wrote at 2020-10-29 10:41:51:

The whole point of Gerrit is to keep doing "one logical change per commit", but reviewing them individually. Anecdotally, this results in much higher-quality reviews.

You can still group them by "feature" and merge them atomically by using topics.

LockAndLol wrote at 2020-10-28 19:00:47:

> # Why

> For the past two years, our developer satisfaction survey has shown that there is some level of dissatisfaction with Gerrit, our code review system. This dissatisfaction is particularly evident for our volunteer communities. The evident dissatisfaction with code review, coupled with an internal review of our CI tooling and practice makes this an opportune moment to revisit our code review choices.

and then further down

> # FAQ

> * Why is GitHub not considered?

> - GitHub would be the first tool required to participate in the Wikimedia technical community that would be non Free Software and non self-hosted.

> - GitHub also does not meet all of our needs; for example, GitHub grants little control of metadata, no influence over privacy policy/data retention, sanctions and bans, little control over backups and data integrity checks, and no long-term guaranteed access to underlying repository settings and configuration.

bawolff wrote at 2020-10-29 00:35:11:

Wikimedia was already using mirroring to github (however we didn't accept pull requests).

I'm pretty sure most of the anti-gerrit sentiment at wikimedia was about gerrit as a code review tool.

My personal experience with it (as a MediaWiki developer) is that Gerrit has a lot of UI bugs (although it has gotten better). I also suspect it encourages a code review culture that is overly nitpicky and risk-averse (but perhaps that is just cultural forces at Wikimedia).

zzzeek wrote at 2020-10-29 01:34:05:

I'm not familiar with any UI bugs of any kind, but we don't use the "live editing" feature; maybe that's where you had problems. The big issue with Gerrit is the "getting plugins to work" side of things, as they are kind of ad hoc and almost totally undocumented, and the access model is too complicated. But once that's all working, there is no need to deal with it.

As for nitpicky culture, we maybe have that problem with OpenStack, where there are thousands of developers; but for our projects in SQLAlchemy we're a team of about five people and I'm more or less in a BDFL type of role. To the degree that we are nitpicky about things, it only prevents much bigger problems from happening later. If a review has little things that are bugging me, I'll just fix them myself and push a new change up rather than bothering them with it, which is also something you can't usually do with pull requests.

bawolff wrote at 2020-10-29 01:56:41:

Wikimedia was using a super old version of Gerrit for a long time. They upgraded recently (although the upgrade happened at roughly the same time as I left my job and took a step back from Wikimedia, so I don't have much experience with the new version).

I think Wikimedia struggles a lot with code review culture in general. Different people have conflicting ideas about what good code looks like. It used to be very nitpicky (I've had code rejected in the past for using (PHP's) intval() instead of casting to int. I've also had code rejected for casting to int instead of using intval().) But that's improved quite a bit with better precommit lint tools. The length of the feedback cycle is very long and sometimes feels like it's mostly about who you know (e.g. the last patch I submitted was on Sept 9, for a decently serious bug. The first actionable feedback, relatively minor things of the form "use a constant named ONE_MINUTE instead of 60", came on Oct 15. That's kind of a long time to wait for code review, IMO). Anyway, it's just not fun to contribute when code review is so unpredictable and slow.

Hmm. Guess I got off on a bit of a tangent there. I do think Gerrit has some usability issues, but I think that's hardly the main problem.

zzzeek wrote at 2020-10-29 14:18:53:

Those sound like managerial/organizational/social issues. Technology isn't going to solve those without good guidance and controls for the overall system. Building that up for a very large organization is extremely difficult; I'd not want to have to do that :).

lima wrote at 2020-10-29 10:43:18:

> _My personal experience with it (as a MediaWiki developer) is that Gerrit has a lot of UI bugs (although it has gotten better). I also suspect it encourages a code review culture that is overly nitpicky and risk-averse (but perhaps that is just cultural forces at Wikimedia)_

I can strongly recommend removing "-1 code review" and requiring all comments to be resolved instead. It accomplishes the same goal while being more positive.

aprdm wrote at 2020-10-28 22:16:11:

I wonder how gerrit compares to reviewboard (

https://www.reviewboard.org/

)

jancsika wrote at 2020-10-28 16:06:13:

Anyone thinking of moving to their own GitLab instance with GitLab CE: either stay on GitHub or prepare to waste your time dealing with user spam bots that pollute your site's search results.

In other words, if you want the common use case for a FOSS project:

1. publicly viewable main repository with publicly viewable issue tracker

2. requirement to log in to view all snippets, user profiles, perhaps even other repos as enforced by administrator settings (otherwise SEO bots will leverage these features to eat your search results)

3. anyone with an email can sign up to post issues to the main repo's issue tracker

There is no combination of settings in Gitlab CE to achieve this. Any sane approach has to leave out step #2. That means that your Gitlab instance gets hammered with user spam from bots which then get indexed in Google search results for your site.

Worse, Gitlab has no tools to make it easy to remove the user spam (and obviously no tools to prevent it from happening).

Just run a public-facing Gitlab CE instance for a few days. Search for one of the spam snippets you collect, and you'll find results for all the FOSS projects out there running their own Gitlab instances.

I've never seen any solutions offered by GitLab for this, nor, frankly, any interest in the myriad bug reports asking them to address it.

Edit: typo

phikai wrote at 2020-10-28 16:49:22:

Hi! I'm the PM at GitLab who works on Snippets, so thanks for providing this feedback. We do have Recaptcha support which can be configured - are you seeing these kinds of issues with that enabled/configured?

One item on the roadmap that may be of interest is `Optional Admin Approval for local user sign up`:

https://gitlab.com/groups/gitlab-org/-/epics/4491

I'm not in the group working on that, but it does appear to be coming soon, and it would prevent newly created accounts from doing anything until they're approved.

protoduction wrote at 2020-10-28 17:06:41:

Hi phikai,

I built a privacy-friendly alternative to ReCaptcha called FriendlyCaptcha [1]. Is there a possibility of seeing this integrated as a more user-friendly alternative?

Happy to chat (e-mail in profile)

[1]

https://friendlycaptcha.com/

barnabask wrote at 2020-10-28 17:26:54:

Man this needs more attention, cool project. I see you tried to submit to HN a couple of times and didn't get traction, that's too bad. Don't give up!

ognarb wrote at 2020-10-29 12:41:56:

Your website mentions that FriendlyCaptcha is open source, but looking at the license in the repository, it is a custom license that can't be defined as open source. Can you change it to "source available"?

aeyes wrote at 2020-10-28 18:28:24:

Is the demo somehow tweaked to be less hard?

On my machine it doesn't take any time to solve, and I see no signs of CPU usage, even after trying a couple of times in incognito mode and watching CPU immediately after loading the page for the first time.

On many sites, creating a profile takes a few seconds. Loading one of my CPU cores for another 5 seconds wouldn't really bother me if I wanted to create massive amounts of profiles/posts. I'd still do over 100 per minute on a standard desktop PC.

protoduction wrote at 2020-10-28 18:50:30:

The default difficulty is set to a level that makes sense on websites that have a varied audience (which includes some ancient browsers on old devices).

The solver runs in WebAssembly and is really, really fast (~4M hashes per second) - but not every browser supports WASM yet (around 0.3% don't, empirically). The JS fallback is around 10 times slower (more in 5+ year old browsers) - for those users you want at least a decent solve time too.

For GitLab's audience the difficulty can probably be increased a lot - it all depends on the website and use case. I'm sure the JS fallback's performance can be improved (it involves a lot of operations on 64-bit ints that need to be represented as two numbers in JS); happy to accept PRs [1] :)

[1]:

https://github.com/FriendlyCaptcha/friendly-pow/blob/master/...

thinkloop wrote at 2020-10-28 19:24:02:

What are your thoughts on performing a quick initial test on each client to measure its performance, then tailoring the puzzle to be difficult enough for each?

unilynx wrote at 2020-10-28 19:30:51:

Once the spammer figures out what you're doing, he'll just throttle the CPU for the duration of the quick test.

Depending on how smart the test is, just having Date.now() return values with -12000, -11000, -10000 offsets for the first few calls might even do it.

sytse wrote at 2020-10-28 17:14:14:

That looks cool! Can someone create an issue to add support for this to GitLab? And maybe we can consider switching GitLab.com to this as well.

robotmay wrote at 2020-10-28 18:25:48:

I'm personally interested in this too so I've created one :D

https://gitlab.com/gitlab-org/gitlab/-/issues/273480

sytse wrote at 2020-10-28 19:11:11:

Thanks for creating this! I think adding support for this in GitLab is a no-brainer. After that we can consider enabling it for GitLab.com

birdsbirdsbirds wrote at 2020-10-28 17:37:34:

Hopefully you are successful, but how can you scale? If it takes 5 seconds on a desktop, then a server can solve about 500,000 captchas per month (a month is ~2.6M seconds, so at 5 s per solve a single core manages roughly that many). At $5 per month, a spammer can still send 1,000 messages for a cent.

protoduction wrote at 2020-10-28 17:52:39:

It's not enabled yet in production - but the main mechanism is increasing the difficulty as more requests are made from an IP in a certain timeframe (it's basically rate limiting at that point). Think: every 3rd request in a minute doubles the difficulty, with some cooldown period.

With that, the cost (and complexity) of an attack can hopefully be in the same ballpark as ReCaptcha (or higher) - without your end user having to label cars or send data to Google.

But in the end a determined spammer will get through any captcha cheaply (for reference: ReCaptcha solves are sold by the thousands for $1) - we just hope we can do better than ReCAPTCHA, especially UX-wise.
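
A minimal sketch of that rate-limiting scheme (illustrative only, not FriendlyCaptcha's actual code; the window size and base difficulty are made-up numbers):

```python
import time
from collections import defaultdict, deque

WINDOW = 60     # seconds of look-back (the "cooldown")
BASE_BITS = 18  # base difficulty: required leading zero bits in the hash

recent = defaultdict(deque)  # rate-limit key -> recent request timestamps

def difficulty_for(key: str) -> int:
    """Every 3rd request inside the window doubles the expected work."""
    now = time.monotonic()
    q = recent[key]
    while q and now - q[0] > WINDOW:  # expire requests older than the window
        q.popleft()
    q.append(now)
    return BASE_BITS + len(q) // 3    # each extra bit doubles average hashes
```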

coder543 wrote at 2020-10-28 18:14:04:

The obvious follow-up question is how IPv6 impacts this, because I think it's supposed to be easy for someone to get their hands on a decent chunk of IPv6 addresses.

Maybe the difficulty could scale as a property of how similar the IP address is to previously seen addresses... so the addresses in the same /64 block would be very closely related, for example. (I think that's how IPv6 works... but definitely something I haven't researched lately, so I could just sound very confused)
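
For illustration, grouping addresses by prefix is straightforward with Python's standard ipaddress module; treating a whole /64 (the prefix size discussed here) as one rate-limiting key would look roughly like this:

```python
import ipaddress

def rate_limit_key(ip: str) -> str:
    """Collapse an address to a rate-limit key: /64 for IPv6, /32 for IPv4."""
    addr = ipaddress.ip_address(ip)
    prefix = 64 if addr.version == 6 else 32
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

print(rate_limit_key("2001:db8::abcd:1"))  # 2001:db8::/64
print(rate_limit_key("203.0.113.7"))       # 203.0.113.7/32
```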

protoduction wrote at 2020-10-28 18:31:16:

I don't have all the answers yet, but indeed rate limiting a larger block (at least /64), or even at multiple prefix sizes with different weighting makes sense.

zahllos wrote at 2020-10-29 00:18:27:

So the way this is supposed to work is that providers hand out /48s and each site should be allocated a /64. In practice if you for example rent a VPS, you'll be handed a /64 for it by your service provider from their /48.

I would personally treat any /64 as the same. Depending on your local network setup the second half of the address could be anything and could change frequently. You might also get multiple addresses. Whereas getting a new /64, or /48, requires slightly more effort.

Of course there's a risk you'll block a /64 and take out some whole company or whatever, but I've seen that happen to corporate proxies that got flagged as a source of spam as well, so this is not an easy problem even without the 2^128 address space.

leonidasv wrote at 2020-10-29 01:52:31:

I love this concept of proof-of-work captchas, but there's a growing number of tools and ways to bypass IP blocks via IP rotation [1], especially after the explosion of IaaS providers. How do you intend to tackle this?

[1] Some examples:

https://rhinosecuritylabs.com/aws/bypassing-ip-based-blockin...

https://oxylabs.io/products/real-time-crawler

https://github.com/alex-miller-0/Tor_Crawler

https://www.scrapinghub.com/crawlera/

njitram wrote at 2020-10-29 11:35:44:

There are free and paid lists of all IP addresses from datacenters, like

https://udger.com/resources/datacenter-list

They probably exist specifically to prevent this, so maybe that's an option here.

typenil wrote at 2020-10-28 21:15:24:

Love to see this. ReCaptcha is nothing short of a menace. I'll take a shot at this for my next project

NorwegianDude wrote at 2020-10-28 23:51:11:

Cool project, but I do find it quite ironic that it's named friendly captcha when it's not a captcha.

Eldt wrote at 2020-10-28 23:59:01:

How would you define "CAPTCHA"?

perryizgr8 wrote at 2020-10-29 04:02:26:

The original expansion was "Completely Automated Public Turing test to tell Computers and Humans Apart".

jimmydorry wrote at 2020-10-29 03:52:04:

CAPTCHA: a computer program or system intended to distinguish _human_ from _machine input_, typically as a way of thwarting spam and automated extraction of data from websites

I would say this Oxford Languages dictionary definition is close enough.

laughinghan wrote at 2020-10-28 19:57:22:

There doesn't appear to be any discussion on your website or on GitHub about why, to be blunt, this is even a good idea in the first place.

A classic 2004 paper, "Proof-of-Work" Proves Not to Work [0], explained that the fundamental problem with proof-of-work bot filters is that attackers will always be able to solve the cryptographic puzzle faster than legitimate users. A touch of security-through-obscurity can help at the margins, but you chose Blake2b, which is used by cryptocurrencies like Zcash, Siacoin, and Nano [1], and as a result there are optimized GPU algorithms (first Google result [2]) and FPGA designs (one of the top Google results [3]). Have you run the numbers on any of those?

The closest to any discussion of these numbers that I saw was a mention on your website that it may take up to 20s on mobile; for comparison, the much-hated image CAPTCHA takes about 6-12s on average for native English speakers, and 7-14s for non-native speakers [4].

In another comment you bring up the idea of starting with a lower difficulty, and increasing it with repeated requests from the same IP address (IPv4, I assume). Unfortunately, access to unique IPv4 addresses is highly correlated with access to more compute power: laptops and desktops in developed countries are most likely to be in a household with a unique IPv4 address, whereas mobile devices on 4G internet and households in developing countries are more likely to be behind Carrier-Grade NAT [5], where thousands or millions [6] of hosts share a pool of a handful or dozens of IPv4 addresses. (The exact same concern applies to IPv6 /64 prefixes.)

This means that mobile devices will face a "double-jeopardy": your service will present them with higher proof-of-work difficulties because the same IPv4 address is shared by more people, and at the same time, the mobile device solves the proof-of-work slower for the same difficulty than a desktop.

Do you have documented anywhere on your website or GitHub how you address these concerns?

[0]:

https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf

[1]:

https://en.bitcoinwiki.org/wiki/Blake2b

[2]:

https://github.com/zhq1/sgminer-blake2b

[3]:

https://xilinx.github.io/Vitis_Libraries/security/2020.1/gui...

[4]:

http://theory.stanford.edu/people/jcm/papers/captcha-study-o...

[5]:

https://en.wikipedia.org/wiki/Carrier-grade_NAT

[6]: Yes, millions. RFC 6598 reserved a /10 for them, which is 4 million unique IPv4 addresses:

https://tools.ietf.org/html/rfc6598

coder543 wrote at 2020-10-28 20:16:51:

I'm not associated with the project in any way, but your well-researched comment did miss at least one important point.

This comment:

> The closest to any discussion of these numbers that I saw was a mention that it may take up to 20s on mobile; for comparison, the much-hated image CAPTCHA takes about 6-12s on average for native English speakers, and 7-14s for non-native speakers.

Missed this quote from the website:

> As soon as the user starts filling the form it starts getting solved

> By the time the user is ready to submit, the puzzle is probably already solved.

The time spent solving reCAPTCHA is active user involvement. The time being spent on Friendly Captcha is passive and can overlap with time being spent filling out a form.

"up to 20 seconds" was also seemingly presented as a worst-case scenario. Most users' devices would presumably be faster than that, but I don't know how the author researched that conclusion on how performance scales. Friendly Captcha does report back some information on how long it is taking users to solve the captcha, and it looks like website owners could use that to adjust the difficulty based on the needs of their specific audience and how tolerant they are of untargeted spam.

The stuff you point out about Blake2b seems entirely legitimate, and I wonder if an Argon variant would be more appropriate to avoid specialized hardware being quite so problematic.

Personally, I really like the idea of Friendly Captcha. Certainly, there are problems with any captcha implementation. People can rant for many, many paragraphs about websites that use reCAPTCHA... I'm not surprised to see someone ripping apart a different captcha system. The ideal solution would be for spammers to just stop being so obnoxious... but good luck with that plan.

laughinghan wrote at 2020-10-28 20:41:37:

_The time being spent on Friendly Captcha is passive and can overlap with time being spent filling out a form._

Great point!

_I wonder if an Argon variant would be more appropriate_

The creators of Argon2 actually also created a memory-hard proof-of-work function they call MTP (for "Merkle Tree Proof", which is a terrible name, totally un-Googleable; I always have to search for the title of their paper, "Egalitarian Computing"):

https://arxiv.org/pdf/1606.03588.pdf

A bug bounty for it was sponsored by Zcoin, which is nice. Zcoin is actually considering moving away from it, but mainly because the proof size of 200kb is prohibitive, which is less of a concern for a captcha system:

https://forum.zcoin.io/t/should-we-change-pow-algorithm/477

_I'm not surprised to see someone ripping apart a different captcha system_

I really don't mean to rip it apart. I just wanted to see some discussion, any discussion, of the well-known flaws with the idea and what ideas OP has to address them.

protoduction wrote at 2020-10-28 21:40:33:

It is also important to note that the 6-12 seconds and 7-14 seconds reported in the paper are for the garbled-text CAPTCHAs, not for image labeling tasks (fire hydrants, cars, etc.).

protoduction wrote at 2020-10-28 21:20:21:

I'll try to provide my thoughts on each of the issues you've mentioned; let me know if there's something I missed.

On using blake2b:

I chose blake2b as I was looking for a hash function that is small in implementation, readily available, and already optimized. With WebAssembly the solver can achieve close-to-native speeds and at least be an order of magnitude or two closer to optimized GPU algorithms.
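
For readers unfamiliar with this kind of puzzle: a hashcash-style blake2b proof-of-work amounts to searching for a nonce whose hash clears a difficulty target. A minimal sketch (the puzzle encoding and difficulty here are invented for illustration, not FriendlyCaptcha's wire format):

```python
import hashlib
import os

def solve(puzzle: bytes, bits: int) -> int:
    """Find a nonce so blake2b(puzzle || nonce) has `bits` leading zero bits."""
    target = 1 << (256 - bits)  # digest is truncated to 32 bytes = 256 bits
    nonce = 0
    while True:
        digest = hashlib.blake2b(puzzle + nonce.to_bytes(8, "little"),
                                 digest_size=32).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

puzzle = os.urandom(16)   # in the real widget this comes from the server
print(solve(puzzle, 16))  # ~2^16 hashes expected on average
```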

As for specialized hardware: image tasks (and even more so audio tasks, which must be present for accessibility reasons) have the same issue, in that they can be solved by GPU algorithms (i.e. machine learning, where even a low success rate would already be enough). If you search on GitHub you will find there are more ML captcha-cracking repos than captcha implementations - they are probably even easier to get started with than adapting GPU miner code.

Image/audio captchas vs. ML is an arms race that can be beaten for split seconds of compute (even on CPU) or cheap human labeling: it's just as broken. FriendlyCaptcha optimizes for the end user (privacy + effort + accessibility) by not engaging in the arms race - I think it makes a better trade-off. As the sibling comment pointed out, the captcha solving can happen entirely in the background, so hopefully it doesn't even make the user wait.

As for rate limiting/difficulty adjustment: it's not perfect, and it could lead to problems if you share an IP with a spammer (but let's be realistic: even with a million users on one IP, there won't be tens of users signing up to some forum per minute). Normal captchas have problems here too, though: users from these locales already get presented with much more difficult and frequent ReCaptcha tasks (I also doubt they are localized: American sidewalks are harder to label if you've never seen one in real life). Setting a reasonable upper limit on difficulty may be good enough here.

On not using blake2b:

I have considered randomly mutating the hashing algorithm every day to make writing an optimized solver for it all the more difficult - but that would mean one could no longer self-host the JS+WASM and be done with it. I won't rule it out for FriendlyCaptcha v2 if this ever becomes a real problem.

Swapping out the hash function should be easy (the puzzles are versioned to allow for this). If you have a different function in mind and someone implements it in AssemblyScript (so we also have a JS fallback), then we can definitely consider it.

webphineas wrote at 2020-10-28 17:50:56:

Really nice! Finally someone is using blockchain technology in a meaningful way!

laughinghan wrote at 2020-10-28 19:10:31:

This doesn't use a blockchain; it uses a Hashcash-style proof-of-work function (an idea that predates Bitcoin by more than a decade):

https://en.wikipedia.org/wiki/Hashcash

redbergy wrote at 2020-10-28 19:07:41:

Awesome work, I will be giving this a try in my next project

remram wrote at 2020-10-28 22:01:39:

> up to 20 seconds on old smartphones

That sounds like a very battery-unfriendly idea.

protoduction wrote at 2020-10-28 22:25:00:

It's not perfect, but maxing a single core for 20 seconds on an older smartphone is a necessary evil for this kind of captcha.

The alternative, loading a third-party script and multiple images (~2MB) to label for ReCAPTCHA and spending time performing the task, also takes some battery (and mental) power.

Max70 wrote at 2020-10-28 17:41:00:

Wow! Thumbs up! I have just checked it out and FriendlyCaptcha seems to be a true game changer. I hope that it will replace every f*cking Google reCAPTCHA out there. Such a great idea!

jancsika wrote at 2020-10-28 17:11:32:

> We do have Recaptcha support which can be configured - are you seeing these kinds of issues with that enabled/configured?

Thanks, I have used Recaptcha for a long time now. It made no difference.

> One item on the roadmap that may be of interest is `Optional Admin Approval for local user sign up`:

https://gitlab.com/groups/gitlab-org/-/epics/4491

Yes, that would be a very sensible solution and welcome feature for my use case here.

Unfortunately, from the bottom of that issue tracker:

"Yikes. I'm glad we did the further breakdown and pre-work. It's a bit cringeworthy looking back and seeing I estimated a 5"

mushakov wrote at 2020-10-28 18:15:32:

Hi! I'm a PM at GitLab. Please see my reply above for more details, but TL;DR: we shipped the first iteration of the `Optional Admin Approval for local user sign up` feature in 13.5. I'd love your feedback! Please comment on the epic if there are other changes for this feature that would help your use case:

https://gitlab.com/groups/gitlab-org/-/epics/4491

jancsika wrote at 2020-10-28 19:45:00:

Thanks for the update. I can certainly manage user sign-up from the admin tab for the time being. Once it's hooked into email, I believe that will make things maintainable again for me.

From a UX standpoint it's still sub-par. Someone who wants to report an issue doesn't want to wait an arbitrary amount of time to be allowed to report an issue. They are ready to report it at that moment.

And as an admin, I don't want to have to approve new users on a schedule to ensure the delay is low enough that they are still willing to submit the issue after I approve them. I'd much prefer they go ahead and submit the content, especially so that I can use it in my review of whether to approve the sign up or not.

I seem to remember some pattern in GitLab where my login session timed out before I finished making a comment. When I logged back in, GitLab had somehow saved my comment content so that I could then post it for others to see. Is there any way to use that pattern for users who haven't been approved yet? So that they can post content, but with a warning shown to them that other users won't see it until the sign-up is approved.

mushakov wrote at 2020-10-29 01:28:26:

That's a really interesting idea! Users could have limited interactions with the instance and content queued up until approved by an administrator. I created an issue to capture this.

https://gitlab.com/gitlab-org/gitlab/-/issues/273542

rightbyte wrote at 2020-10-28 17:07:56:

Relying on Google's Spying-as-a-Service tooling is not very FOSS at all.

There need to be other ways to reach out to users who block Google.

encom wrote at 2020-10-28 17:23:51:

I immediately back out whenever I encounter Recaptcha.

The other day I was forced to endure it, because I wanted to delete my ancient Minecraft account, since Microsoft pulled a Facebook and is going to require a Microsoft account to play going forward. Without exaggeration, it took me 15 minutes of training Google's surveillance AI (I had to solve it three times) for Recaptcha to let me in. I guess Google really hates me.

dalmo3 wrote at 2020-10-28 23:50:50:

Yesterday I spent the longest ever with a recaptcha, about 2-3 minutes, at a frigging checkout page. I decided to endure it just because I really needed that ergonomic kb+mouse combo.

Hopefully they'll allow me to solve captchas for longer without getting RSI.

wolco2 wrote at 2020-10-28 18:12:46:

Are you sure you are human?

myself248 wrote at 2020-10-28 18:34:07:

I'm human enough, and I've been a licensed driver long enough, to recognize that rumble strips at the side of a road are not crosswalks. But apparently enough bots thought they were that the system is now trained on that 'fact', and I as a human am forced to misidentify rumble strips as crosswalks to pass as human.

It's bizarre.

ignoranceprior wrote at 2020-10-29 13:19:08:

ReCaptcha also thinks that mailboxes are parking meters, for some reason.

encom wrote at 2020-10-28 18:29:06:

Yes, definitely.

https://v.redd.it/uaefcc2mztj31/DASH_720

WalterSear wrote at 2020-10-28 20:41:09:

This sounds like it has the potential to be a modern version of the credit score: avoid it enough, and you become persona non grata. That is, for more than 15 minutes.

db48x wrote at 2020-10-28 20:11:14:

I do the same thing.

Kwpolska wrote at 2020-10-28 21:43:06:

Try reCAPTCHA’s audio version (the headphones icon), it’s much easier than guessing what images it wants you to click (if you speak English, have headphones, and are not hearing-impaired).

meibo wrote at 2020-10-29 01:23:53:

You're doing something very wrong if you take 15 minutes to solve these and aren't on Tor. Even on a public VPN with Firefox this doesn't usually happen.

I know people that pick the wrong options to fuck with their models though, and then go on HN to complain about recaptcha being annoying.

randunel wrote at 2020-10-29 12:54:16:

I have similar issues. I do not pick the wrong options, and it doesn't take me too long to solve the captchas, yet I still get "too many queries from your IP address".

This is what internet users deal with when blocking most google services.

mushakov wrote at 2020-10-28 18:11:13:

Thanks for bringing up this epic in the conversation, phikai. I'm a PM at GitLab for our Auth group and am working on the `Optional Admin Approval for local user sign up` feature. I'm happy to tell y'all that we shipped the first iteration of this in our 13.5 release. You can find more information in our release blog:

https://about.gitlab.com/releases/2020/10/22/gitlab-13-5-rel...

I've also updated the epic with more information about its current status:

https://gitlab.com/groups/gitlab-org/-/epics/4491#status-upd...

anarcat wrote at 2020-10-29 12:55:32:

We (at torproject.org) also adopted GitLab CE recently, and we had to close down registrations because of abuse. Tens (hundreds?) of seemingly fake accounts were created in the two weeks we had registrations open, and we had to go through each one of them to make sure they were legitimate. In our case, snippets were not directly the problem: user profiles were used as spam directly.

We can't use ReCAPTCHA or Akismet for obvious privacy reasons. The new "admin approval" process in 13.5 is interesting, but doesn't work so well for us, because it's hard to judge if an account should be allowed or not.

As a workaround, we implemented a "lobby": a simple Django app that sits in front of GitLab to moderate admissions.

https://gitlab.torproject.org/tpo/tpa/gitlab-lobby/

The idea is that people have to provide a _reason_ (a free-form text field) to justify their account. We'd also like people to be able to file bugs from there directly, in one shot.

We're also thinking of enabling the service desk to have that lower bar for entry, but we're worried about abuse there as well.

Having alternatives to ReCAPTCHA would be quite useful for us as well.

kemayo wrote at 2020-10-28 21:18:26:

For this specific case, the Wikimedia Foundation has explicitly stated that "It is the Free Software release of GitLab that runs optional non-free software such as Google Recaptcha to block abuse, which we do not plan to use." So, not incredibly helpful at the moment.

Also, is manual approval for new signups a good idea for a large FOSS project? It seems like a pretty big barrier to legitimate discussion.

MrStonedOne wrote at 2020-10-28 19:18:38:

You have to remove the incentives. Block the viewing of these snippets by logged-out users by default, and require opt-in plus a way to whitelist snippets by snippet or by user. Same for user profiles.

noizejoy wrote at 2020-10-28 20:13:25:

I don't think this is targeting human views - it's targeting Google for a SERP (Search Engine Results Page) boost.

MrStonedOne wrote at 2020-10-28 21:10:29:

That's the point. Having a way to disable search engines would also work, but wouldn't be obvious to spammers so they would still try to spam. Disabling all users by default works to remove the incentive to try

gaba wrote at 2020-10-29 19:36:42:

Is this something that we will have in the CE version (the openly licensed one), or will it only go to the enterprise one?

67868018 wrote at 2020-10-28 17:57:30:

None of your captcha settings work, not even the invisible captcha setting that requires enabling a feature flag.

xiphias2 wrote at 2020-10-28 20:06:35:

Have you thought of the option of disabling links? That would make SEO spam impossible

pitay wrote at 2020-10-28 21:24:20:

Just adding the attribute rel="nofollow ugc" to any links in submitted content may be good enough. This tells search engines not to follow the links and marks them as user-generated content, allowing them to identify SEO spam more easily. [1]

Having both options would be great.

[1]

https://support.google.com/webmasters/answer/96569
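
As a rough illustration, tagging user-submitted links that way could be as simple as the naive sketch below (a real implementation should hook the Markdown renderer or HTML sanitizer rather than running a regex over rendered HTML):

```python
import re

def add_nofollow_ugc(rendered_html: str) -> str:
    """Naive pass: stamp rel="nofollow ugc" onto every anchor tag."""
    return re.sub(r"<a\s", '<a rel="nofollow ugc" ', rendered_html)

print(add_nofollow_ugc('see <a href="https://example.com/">spam</a>'))
# see <a rel="nofollow ugc" href="https://example.com/">spam</a>
```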

gramakri wrote at 2020-10-28 17:24:50:

The spam is infuriating (not GitLab's fault, of course). At least on our instance at

https://git.cloudron.io

, we got massive snippet spam. After we disabled snippets, we got massive spam on the issue tracker (!). The way we "fixed" it was by turning on mandatory 2FA for all users.

As a general lesson, what we learned is that these are not bots. These are real humans working in some poor country, manually creating accounts (always Gmail accounts) and pasting all sorts of random text. Some of these people even set up 2FA and open issues with junk text; it's amazing. Unfortunately, from what I can tell, GitLab cannot make issues read-only to non-project members (i.e. I only want project members to open issues; others can just read and watch them).

Currently, our forum spam (

https://forum.cloudron.io

) is way worse than our GitLab spam. On the forum we even have a captcha enabled (something we despise), but even that doesn't help when there are real humans at work.

Symbiote wrote at 2020-10-28 19:30:24:

We had one of the "real humans" write to us (in issues) asking us to leave his spam up for "just a few hours".

We implemented a filter anyway.

(This was not Gitlab, but a specific form on our unique website.)

packetlost wrote at 2020-10-28 21:41:55:

> asking us to leave his spam up for "just a few hours"

What... why? What is their goal???

vvpan wrote at 2020-10-28 21:46:44:

Feeding their family?

packetlost wrote at 2020-10-28 21:51:01:

But like... who's _paying_ for that kind of spam??

Symbiote wrote at 2020-10-29 07:47:48:

In our case, mostly pirate TV streaming services. (Watch NBA games live etc.)

mgbmtl wrote at 2020-10-28 22:26:29:

A lot of people unfortunately seem to bite on those "SEO experts" kind of emails. I had a few clients ask me if they should give it a try, since, "why not, it's cheap".

csdreamer7 wrote at 2020-10-28 17:29:21:

Why are they posting random text in Gitlab?

grey-area wrote at 2020-10-28 19:24:35:

This is a typical spam profile. Usually they contain links, which search engines follow.

https://forum.cloudron.io/user/cardioaseg

edflsafoiewq wrote at 2020-10-28 21:09:10:

The link contains rel=nofollow.

grey-area wrote at 2020-10-29 08:37:34:

That doesn't matter; see other comments below on Google's changing treatment of this attribute.

Also you'll find spambots posting on any open form on the internet even if it doesn't do them any good, because much of it is automated, so even if you hide the results the spam will still come in.

leipert wrote at 2020-10-28 21:54:39:

I don’t know for sure, but I think our Markdown implementation adds nofollow.

brlewis wrote at 2020-10-29 03:22:10:

I used to think that spammers would stop if their spamming didn't win them any results. But they don't care. They spread their spam as widely as possible without trying to prune out the places where it does them no good.

gramakri wrote at 2020-10-28 19:04:29:

I am not entirely sure. See

https://forum.cloudron.io/users

If you go to, say, page 10, you will see all sorts of nonsense. I am still trying to figure out the best way to fight this spam (because a captcha is enabled and required even to create accounts). But these are real people, not bots. I know this because they even post new messages all the time.

fancyfish wrote at 2020-10-28 19:23:42:

Definitely the SEO backlinks - for example, one profile I see links to an Indian escort service.

coder543 wrote at 2020-10-28 20:40:00:

Maybe GitLab needs an option to disable external linking, and to automatically filter any comment that contains an external link.

csdreamer7 wrote at 2020-10-28 22:04:21:

Or a nofollow option (add rel=nofollow)

dnsmichi wrote at 2020-10-29 10:57:42:

That's a great idea. We have discussed ways of establishing a trust level and enabling this for specific groups. Discourse uses the same system for preventing spam. "Good" bots detect the rel=nofollow and do not come back.

See my proposal here:

https://gitlab.com/gitlab-org/gitlab/-/issues/14156#note_258...

dnsmichi wrote at 2020-10-29 17:18:35:

Iterating on my original thought, here is a smaller feature request for self-hosted GitLab instances. This can help GitLab.com too:

https://gitlab.com/gitlab-org/gitlab/-/issues/273618

coder543 wrote at 2020-10-29 17:22:54:

I still think an even better path to success is to allow entirely disabling linking for non-admins.

Google no longer treats "nofollow" as strongly as it used to:

https://webmasters.googleblog.com/2019/09/evolving-nofollow-...

dnsmichi wrote at 2020-10-29 20:50:51:

Thanks for sharing; I have added it to the issue. Maybe you want to join the discussion there :)

https://gitlab.com/gitlab-org/gitlab/-/issues/273618#note_43...

Just so that I can follow - URLs posted by non-admins should not render as HTML URLs at all? Wouldn't that be quite limiting for OSS project members for example?

coder543 wrote at 2020-10-29 20:58:54:

My opinion on the topic isn't definitive by any means, but I think a lot of projects would do just fine without allowing arbitrary hyperlinks to be added by non-admins.

I think being able to link to related issues and link into the code is still important, for example.

It's certainly a trade off, but spammers want it to be rendered as a link.

riffic wrote at 2020-10-28 17:34:00:

getting that sweet sweet seo backlink juice

gkop wrote at 2020-10-28 17:37:35:

Isn't that why we add rel=nofollow to low-friction, user-submitted links on our platforms?

ivank wrote at 2020-10-28 17:53:12:

Google changed the interpretation of those a year ago.

https://webmasters.googleblog.com/2019/09/evolving-nofollow-...

nurettin wrote at 2020-10-28 18:29:06:

> Looking at all the links we encounter can also help us better understand unnatural linking patterns.

It appears as though they want to mark these links in order to prevent inorganic SEO, not help it.

nerdponx wrote at 2020-10-28 18:48:08:

I don't get it. They post all this spam in the hopes that people click on the links therein, thereby boosting the ranking of those sites? Does that actually work at all?

rudedogg wrote at 2020-10-28 19:20:15:

It doesn’t actually require anyone clicking on the links. Google sees inbound links and uses that as a factor when calculating the ranking of the linked page.

IggleSniggle wrote at 2020-10-28 21:53:16:

I thought that was how it worked like a decade or more ago, but not today.

technion wrote at 2020-10-28 23:47:10:

Regardless of whether it works, people still pay for it. I have a Facebook ad right now that says "Get over 500,000 backlinks for $29.99". No doubt it's someone with a bot that spams comment forms.

mpol wrote at 2020-10-28 21:47:58:

A service like Stop Forum Spam might be a solution to this. It checks the IP address and email address and assigns a score based on how likely they are to belong to a spammer.

When they have to set up a new email account, and maybe even a new IP address, for every few accounts, it quickly becomes a lot of work.

https://www.stopforumspam.com/

Siira wrote at 2020-10-28 18:16:10:

How do you know they are real humans? I imagine bots doing 2FA would still be cheaper.

gramakri wrote at 2020-10-28 19:06:32:

I know this because in our forum we have LOTS of "spam" users -

https://forum.cloudron.io/users

These users will go into posts and actually make "helpful" comments, like: "Oh, I tried this solution but I found that my disk was full. Deleting data fixed my problem." It almost seems genuine, but they build reputation, and once they have some votes, they go back and edit all the comments to include links.

eznzt wrote at 2020-10-28 19:12:31:

Banning entire countries helps a lot. I don't want to name certain countries, but let's assume it's one where it's common to see human corpses floating on a big river.

jychang wrote at 2020-10-28 21:32:42:

That doesn't help narrow it down. I live in Seattle, and the first thing I thought of was a popular TikTok of teens finding a corpse in the river last month.

reaperducer wrote at 2020-10-28 21:56:36:

The word you missed was "common."

d3nj4l wrote at 2020-10-29 01:02:52:

Apart from the fact that banning an entire country from contributing to their code would be antithetical to the Wikimedia Foundation, if you're implying the country which I think you're implying (which is also where I live, btw), you'll:

1. Ban a burgeoning tech industry which has produced over 20 unicorns, receives billions in funding from across the world, and produces world-class tech talent;

2. Ban millions of other OSS developers from contributing; and

3. Just lead to SEO spammers picking other impoverished countries to spam from, which means you'll finally end up with only people from the "west" being able to contribute in any way.

boneitis wrote at 2020-10-28 18:29:20:

Many bots are likely still powered under the hood by humans.

On my backlog of projects is a browser extension that solves the more obnoxious captchas for me, as I'm regularly behind a VPN and fall into ridiculously long solve loops.

On the most popular API I could find, $10 buys you a shocking lot of solves (not that I've tested it yet). It is automatable, but ultimately still powered by humans.

dannyw wrote at 2020-10-28 18:41:21:

It’s incredibly sad how the open web is being destroyed by google’s recaptcha.

Kalium wrote at 2020-10-28 20:51:09:

Without google's recaptcha, do you think there would be less spam?

Personally, I suspect there would be more without at least some speed bumps to raise the cost of spamming. I would _absolutely love_ for there to be better options than recaptcha that meet the same needs around bot-detection, price, implementation effort, and accessibility. It is, sadly, the best option I've seen on offer.

You're right. The scenario we're in is incredibly sad. It would be wonderful if the individual actors involved had better options to meet their needs.

nerdkid93 wrote at 2020-10-28 19:00:47:

I'd argue that it's equally sad to see the open web get destroyed by massive DDoS attacks and malicious actors. How would you keep your own website up if it was constantly being attacked?

hombre_fatal wrote at 2020-10-29 01:56:13:

You're barking up the wrong tree. Bad actors create abuse and spam which they can do because of fundamental weaknesses in the design of the internet. People trying to solve that reality with Recaptcha (and Cloudflare for that matter) aren't the ones destroying the internet.

boneitis wrote at 2020-10-29 02:29:15:

I don't think it's so much the wrong tree as one tree in a whole forest worth barking up.

All the maturely developed bot filters frequently throw me in an endless battery of tests that have me giving up in frustration before finally making it through to content I'm requesting.

> aren't the ones destroying the internet

IMO they are every bit as much destroying it as the abusers they're claiming to fend off.

boneitis wrote at 2020-10-28 18:56:44:

I'm totally in that camp of opinion, although I'll acknowledge the escalating abuses carried out by both "sides."

In the meantime, I hope to have the savviness to program my own way out of unsolvable captchas.

FredFS456 wrote at 2020-10-28 18:56:27:

Already exists:

https://github.com/dessant/buster

Edit: on re-read, you meant solving using humans. Buster uses speech-to-text APIs to solve.

boneitis wrote at 2020-10-28 18:59:29:

Every lead to solve my problems is probably worth a peek. I'll take a look, thank you.

pcmaffey wrote at 2020-10-28 22:00:18:

Could add a Disallow rule to your robots.txt and advertise on the signup page that search engines won't find this content.

sytse wrote at 2020-10-28 16:29:33:

At GitLab Inc. we have a Trust and Safety team

https://about.gitlab.com/handbook/engineering/security/opera...

that prevents spam.

So far that functionality has lived in separate repositories from the core codebase since few people needed it, the cycle time was quicker, and it is an advantage to not have the spammers see the code.

If there is strong interest in collaborating on this I'm sure they will be happy to engage. I'll ask them how best to structure this.

ran3824692 wrote at 2020-10-28 16:44:21:

There's been a gitlab bug for almost 3 years to stop relying on recaptcha,

https://gitlab.com/gitlab-org/gitlab-foss/-/issues/45684

Debian, KDE and Gnome have never wanted to make their users run Google's nonfree JavaScript blob to contribute on their GitLab instances. There's been interest, but GitLab has done very little about it. Edit: other bugs about this can be found here

https://gitlab.com/gitlab-org/gitlab-foss/-/issues/46548

GLJHunt wrote at 2020-10-28 20:00:40:

We have a team currently working on improving the detection and mitigation of spam. We continue to look for ways to improve the security and user experience of our product. Our product includes the Akismet Spam filter which you can read more about in our handbook:

https://about.gitlab.com/handbook/support/workflows/managing...

. Further, Gitlab.com includes the ability to report abuse directly to our trust & safety team here:

https://about.gitlab.com/handbook/engineering/security/opera...

However, on self-managed instances the report abuse feature reports back to the instance admin. We are also currently developing an anti-spam feature intended to further improve spam detection & mitigation. This is set to be enabled on GitLab.com within 3 months.
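For those curious, the Akismet check itself is a single HTTP POST; a rough sketch with placeholder values (not GitLab's actual integration code):

    $ curl -d 'blog=https://gitlab.example.com' \
           -d 'user_ip=198.51.100.1' \
           -d 'comment_content=Buy cheap watches here' \
           https://YOUR_API_KEY.rest.akismet.com/1.1/comment-check
    # response body is the string "true" (spam) or "false" (not spam)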

nico_h wrote at 2020-10-28 23:23:53:

As mentioned multiple times above in the thread, maybe a simpler solution to reduce spam is to remove the incentives by:

- removing links (rendering them as plain text, forcing users to copy-paste them),

- hiding links from non-registered users (plain text to non-registered users, clickable for registered users),

- blocking links from search engine crawlers (robots.txt / rel=nofollow...).

Maybe these fall under "for each complex problem there is a simple but wrong solution", but it sounds like it's worth a try.

mpol wrote at 2020-10-28 21:54:10:

(I already replied on a different thread but this might make more sense)

A service like Stop Forum Spam might be a solution to this. It checks the IP address and email address and gives each a score for how likely it is to belong to a spammer.

When they have to set up a new email account and maybe even a new IP address for every few accounts, it quickly becomes a lot of work for them.

https://www.stopforumspam.com/

It has a very simple API and is not that hard to implement (really, I have done it myself :) )
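For a sense of how simple: a lookup is a single GET request; a minimal sketch (the IP and email here are placeholders):

    $ curl 'https://api.stopforumspam.org/api?ip=198.51.100.1&email=test@example.com&json'
    # returns JSON with per-field "appears", "frequency" and "confidence" values

You then reject or flag the signup when the confidence exceeds whatever threshold you pick.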

GLJHunt wrote at 2020-10-28 22:11:30:

Appreciate the response - I'll look into it now

mpol wrote at 2020-10-28 23:11:19:

Okay, thank you. I see Gitlab is mostly Ruby. Just to give a general idea of the code, this is a simple PHP function that uses it:

https://plugins.trac.wordpress.org/browser/gwolle-gb/trunk/f...

That function can be called when the register form has been submitted. It will return true or false. Forget about the transient stuff, that is just WordPress caching.

You don't need an API key like with Akismet. You would only need one if you want to add or remove entries from the SFS database. It really is much simpler. Of course you might want to have a checkbox in the settings. But still, you might be able to finish this in an afternoon :)

Wish you the best.

lbierner wrote at 2020-10-29 00:35:54:

Great suggestion, this looks like a very straightforward service and implementation. All open source as well.

sytse wrote at 2020-10-28 17:00:27:

I think the core of this problem is that it is hard to identify if a user is a bot or a human. I've not seen any elegant free solutions to this.

ran3824692 wrote at 2020-10-28 17:27:27:

That is not the core of the problem. Spammers are humans, and sometimes they will solve recaptchas in large quantities to get their spam through. It's about having a multipronged approach for administrators to stay ahead of them. For some examples of free solutions see

https://www.mediawiki.org/wiki/Manual:Combating_spam

It's even possible to connect spamassassin to forms. Gitlab needs tools and automation that detect and roll back spam and ban users, plus knobs to tune restrictions and rate limits based on how spammers are acting. GitLab Inc. just hasn't seemed to care much about helping people who are trying to use Gitlab and keep their software freedom.
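Connecting spamassassin to a form can be as simple as wrapping the submitted text in a mail-like envelope and piping it to spamc; a rough sketch (the variable name is illustrative):

    $ printf 'Subject: forum post\n\n%s\n' "$POST_BODY" | spamc -c
    # -c prints score/threshold and exits non-zero if the text scores as spam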

sytse wrote at 2020-10-28 17:49:57:

I think the focus of our Trust and Safety team has been on GitLab.com and not on all GitLab instances. We'll discuss changing this.

ran3824692 wrote at 2020-10-28 18:11:49:

Thank you.

edchan wrote at 2020-10-30 05:28:09:

GitLab team member here. We just added a new page to our Handbook where we share approaches to preventing, detecting and mitigating spam on self-managed instances of GitLab.

https://about.gitlab.com/handbook/engineering/security/opera...

We want to hear from you! Instructions on how to contact us:

https://about.gitlab.com/handbook/engineering/security/opera...

robotmay wrote at 2020-10-28 18:51:29:

I'm curious about the spamassassin integration. Do you know of any open source projects currently using it for a web application?

trynewideas wrote at 2020-10-28 22:17:07:

I'll be curious to see whether they even use GitLab user auth. For Gerrit (and Phabricator), Wikimedia already requires contributors to have a dev account on Wikimedia's LDAP system:

https://wikitech.wikimedia.org/wiki/Help:Create_a_Wikimedia_...

jefftk wrote at 2020-10-28 16:19:27:

Can you say more about how it's a problem if people can view things without logging in? Naively I would have seen that as a plus.

dfabulich wrote at 2020-10-28 16:44:22:

If you allow new users to create user profiles with links, and those user profiles are visible to Google, spammers will create a bunch of new user accounts and fill them with spam links.

The easiest way to prevent this is to block Google from seeing user profiles by requiring login to see the profiles.

chrisweekly wrote at 2020-10-28 17:37:33:

> "and those user profiles are visible to Google"

Googlebot adheres to robots.txt, right?

In which case, couldn't self-hosted GitLab admins add a robots.txt entry for the profile page URL?
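In principle, yes; a minimal robots.txt sketch (the paths are assumptions, so check where your instance actually serves profiles and snippets):

    User-agent: *
    Disallow: /users/
    Disallow: /-/snippets/

Though this only keeps well-behaved crawlers from indexing the spam; it doesn't stop the spam from being posted.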

jefftk wrote at 2020-10-28 18:21:15:

That requires the spammers to notice that it's blocked in robots.txt, which seems optimistic.

pitay wrote at 2020-10-28 21:28:29:

There's also adding rel="nofollow ugc" to user-submitted links, which removes the benefit of linking for spammers.

https://support.google.com/webmasters/answer/96569
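In markup terms it's just an attribute on each user-submitted link, e.g. (the href is made up):

    <a href="https://example.com/spam" rel="nofollow ugc">user link</a>

Search engines that honor the hint won't count the link as an endorsement of the target.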

dspillett wrote at 2020-10-28 19:02:02:

Most bots aren't going to bother checking if you've done that, so while they won't get the expected benefit, you'll still get the spam.

jancsika wrote at 2020-10-28 16:54:46:

Because spammers fill the publicly viewable things-- like snippets and user profiles-- with spam. If it can be viewed without logging in, then they get it indexed with Google and it dilutes the search results.

Nobody views snippets and user profiles as a common part of daily development, so it takes time away from development to investigate those things to prune them. And if you don't prune it fast enough, it gets into the search results at which point it's even more of a pain in the ass to remove (even using Google's webmaster tools).

dragonwriter wrote at 2020-10-28 17:07:08:

> Because spammers fill the publicly viewable things-- like snippets and user profiles-- with spam. If it can be viewed without logging in, then they get it indexed with Google and it dilutes the search results.

Isn't then the problem not that it is viewable but that it's not excluded from indexing by robots.txt?

LeifCarrotson wrote at 2020-10-28 16:54:40:

Anything visible without login will be visible to people who want to follow a bare URL to your tracker and visible to a search engine crawler, making it visible to people without a URL who just search for the issue. That is, indeed, a plus.

But even if you require login to post stuff to the issue tracker, creating a login and posting a comment has been trivially automated.

You're no longer running a useful issue tracker, you're running a free ad network: you're hosting a dozen useful issues and a thousand advertisements and blackhat SEO comments for spammers.

If it's not visible to search engines, and your repo doesn't get much traffic, it's not nearly as valuable to spammers. It's kind of like cutting off your nose to spite your face, but those basic economics of cost to spammers, cost to users, value to spammers, and value to users are the only rules that you can really apply when hosting content on the Internet.

jefftk wrote at 2020-10-28 17:06:35:

_> creating a login and posting a comment has been trivially automated_

Isn't this what various CAPTCHA tools handle?

You can also require email address validation.

marcinzm wrote at 2020-10-28 16:24:26:

1. Create user account

2. Create spam content (snippets, profile, etc.)

3. Get spam content search indexed

4. Profit

dtech wrote at 2020-10-28 16:39:29:

a free user sign up is not going to prevent scraping...

cortesoft wrote at 2020-10-28 16:44:09:

I think the idea is that if you can't view issues without logging in, then Google won't index your issues (because it can't view them), so you won't get people spamming in order to get into Google.

noizejoy wrote at 2020-10-28 20:19:36:

Over the years I've frequently seen Google search results showing things that require login to the indexed site. Has that changed?

marcinzm wrote at 2020-10-28 23:28:52:

I believe that requires the site to give Google's bot a logged-in view, rather than being something Google does itself.

wolco2 wrote at 2020-10-28 18:20:05:

It also means 95% of people will stop casually viewing issues.

wyldfire wrote at 2020-10-28 16:44:42:

The problem isn't scraping, it's spam.

ognarb wrote at 2020-10-28 16:29:01:

Strange, I never saw this behaviour on our GitLab instance, invent.kde.org.

marcinzm wrote at 2020-10-28 18:46:30:

You seem to be using a central login system (

https://identity.kde.org/

) that requires going to a separate website to create an account which presumably is non-standard enough to throw off most bots.

ran3824692 wrote at 2020-10-28 17:01:56:

invent.kde.org uses the nonfree Google reCAPTCHA; that mostly prevents it. Not very nice for KDE to make people run a nonfree software blob in their browser that gives up their freedom, gives up their privacy to Google, and trains Google's proprietary machine learning models.

mattl wrote at 2020-10-29 16:27:44:

Where does it use that?

atomi wrote at 2020-10-28 16:43:33:

Because it's Microsoft FUD.

wdb wrote at 2020-10-28 18:20:00:

Personally, I also think GitHub's Checks API and GitHub bots give a much better experience compared to GitLab.

On a daily basis I am confused by how diffs are rendered in GitLab merge requests; it has a weird way of rendering ${blah} inside strings.

Also, when you want to check whether an issue exists in GitLab's own repository, you always end up in some jungle of redirected tickets. Just now I got redirected through three different tickets, as they switched projects or something. Really annoying.

boleary-gl wrote at 2020-10-28 18:27:16:

Can you share an example or screenshot of what you mean around the rendering issue in merge requests?

As for the issue redirects - it does stink. It's an artifact of our move to a single code base for CE and EE [1]. A lot of issues have long-standing SEO so the "old" issue often comes up in a Google search.

[1]

https://about.gitlab.com/blog/2019/08/23/a-single-codebase-f...

wdb wrote at 2020-10-30 17:09:01:

Yes, for example, in today's pull request, I added a new file and now in my TypeScript file it renders things like this, which I find confusing:

https://imgur.com/a/C82hcMK

zeeZ wrote at 2020-10-28 16:35:40:

In general settings, you can check "Public" under "Restricted visibility levels". According to the blurb, "selected levels cannot be used by non-admin users for groups, projects or snippets".

Is that not what you want with #2?

jancsika wrote at 2020-10-28 16:46:44:

It's what I want for #2, but it has the unfortunate side-effect of restricting visibility for my main public repo. My #1 goal above is for people to be able to clone from my main repo without logging in.

zeeZ wrote at 2020-10-28 16:57:19:

I see. And as soon as you make the repo public you end up with public issues again, unless you restrict them to project members...

And a workaround of auto syncing just the code to a public repo where issues and stuff is disabled isn't available natively in CE.

mgbmtl wrote at 2020-10-28 22:31:19:

If you can, place your Gitlab CE instance behind an LDAP server. Have another site handle signups. (Admittedly, setting up something with LDAP is often a massive pain. I duct-tape around it by using LdapJS on top of a CMS.)

I've had a handful of projects where human spammers will bother to create an account and jump through the hoops, but in the 2-3 years of running a Gitlab instance, which has 1300 users, I only had 2-3 incidents (we keep an eye on recent projects, snippets, etc).

chillfox wrote at 2020-10-29 13:30:09:

The GitLab LDAP config is pretty easy.

kodah wrote at 2020-10-28 19:52:20:

I would encourage folks to look at Gitea.io. I run that on Kubernetes alongside Drone and it basically replicates all the most important parts of GitHub.

ben0x539 wrote at 2020-10-28 17:06:58:

You'd think Wikimedia in particular has experience with the issue of spam bots polluting a site's search results.

abbe98 wrote at 2020-10-28 17:21:54:

Is this mainly a concern for the Gitlab issue tracker? Wikimedia will continue to use Phabricator for issue tracking; Gitlab CE will only be used for CI and code review/hosting...

ran3824692 wrote at 2020-10-28 17:34:31:

No. Spammers will create repos and user profiles and snippets and anything they can with spam in them.

abbe98 wrote at 2020-10-28 21:33:59:

I would imagine authentication being done through Wikimedia's existing LDAP or Mediawiki solution, and I hope that features that already exist in Phabricator (such as snippets) will be disabled.

oconnor663 wrote at 2020-10-28 16:13:14:

Is it possible to configure a robots.txt file to accomplish #2?

clscott wrote at 2020-10-28 16:20:18:

No, robots.txt is for well-behaved bots like Bingbot and Googlebot, not bots that will spam your forums (and Git repo, apparently).

detaro wrote at 2020-10-28 16:28:17:

The suggestion, I guess, was that it doesn't stop the spam from being created, but it does stop the spam from ruining your site's reputation in search results.

_ikke_ wrote at 2020-10-28 20:46:09:

But the goal is to prevent spam in the first place. I don't think these bots will verify robots.txt to see if the spamming is effective. They just spam anything they can get their hands on.

remram wrote at 2020-10-28 22:09:17:

They probably don't have general code that spams any form; it's more likely that they have code specific to GitLab CE instances that knows how to post snippets. If GitLab changes their default configuration so that those snippets are no longer indexed by Google, the spammers are likely to stop using that GitLab CE spamming script, after a while.

marcinzm wrote at 2020-10-28 16:27:38:

But if the issue is SEO bots, then robots.txt would block the search engines, meaning the spam content is of no importance (it's effectively private) and doesn't cause issues for the main site's SEO (nor help the spammers).

jancsika wrote at 2020-10-28 16:43:27:

To be honest, I'm not certain of the purpose of the spam.

Some portion of it would end up in the search results, sure.

But I don't know if there's some secondary benefit to, say, a casino showing a link coming from my site even if my site has a robots.txt saying that the address for that link isn't to be directly indexed.

Is there such a benefit? If not then I'll just set up the robots.txt and observe whether that does indeed solve the problem. But I'd much prefer to just set up the permissions I know I want on my own running instance than spend time making inferences about the reasons bots are abusing my instance's inputs.

jancsika wrote at 2020-10-28 16:33:28:

That's right. I'm talking about SEO spam. Basically anything that has a URL where the content includes input from the user will be spammed.

I'm fine with, say, the spammers hammering the main repo's merge requests and issue tracker. Those are things any healthy project will check regularly-- I'm even fine just pruning the spam there by hand (and historically I haven't gotten a lot there anyway).

But I don't regularly look at the global view of snippets, and I don't want to regularly prune the global user list for SEO spam in the user profiles. There's no good reason most FOSS projects need those things to be publicly viewable, anyway. But AFAICT Gitlab's admin settings only have a single setting that affects all these things across the board. So if you make snippets viewable only to logged in users, then nobody can clone from the main repo without logging in.

It's quite frustrating, and Gitlab shows no interest in disabling or hiding features like snippets and user profiles.

tylersmith wrote at 2020-10-28 16:26:30:

robots.txt is nothing more than a request. It's essentially a "no dog on lawn" sign in the yard of a vacation home.

tyteen4a03 wrote at 2020-10-28 18:42:47:

I wonder how effective QuestyCaptcha would be on GitLab.

bawolff wrote at 2020-10-29 00:55:50:

Generally speaking, QuestyCaptcha works great if you fly under the radar enough that nobody is putting any effort in and it's just automated bots. It tends to fall apart the more high-profile you are.

eznzt wrote at 2020-10-28 19:15:51:

Can't you just put the gitlab instance behind an nginx proxy to achieve this? Like, if you are requesting ^/user/$, check for a cookie; if invalid, return 403
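Roughly, yes; a minimal nginx sketch (the path, cookie name, and upstream name are assumptions to verify against your instance):

    location ~ ^/users/ {
        # deny anonymous requests to profile pages
        if ($cookie__gitlab_session = "") {
            return 403;
        }
        proxy_pass http://gitlab;  # assumed upstream for the GitLab backend
    }

Since a session cookie can be present even for anonymous visitors, treat this as a speed bump rather than real authentication.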

908B64B197 wrote at 2020-10-28 16:24:26:

I honestly can't see why someone would go through the trouble of making sure their instance is correctly configured and available when there are solutions (like GitHub) that just work out of the box.

paulryanrogers wrote at 2020-10-28 16:33:01:

Control. Github can change the behavior, pricing, availability, or security of their offerings at any time. If they get hacked then you could suffer. GH is also closed source.

For many, a self-hosted solution is better, despite the costs.

modoc wrote at 2020-10-28 18:30:55:

It's not a lot of trouble. To be fair, my GitLab instance isn't set up for large numbers of public contributors. I have a somewhat limited network connection and work on projects that often have large-ish codebases, building Docker images, etc., and I do 90% of that on my home network (local servers, storage, and so on). So running GitLab locally allows me (and a few other folks) to get all those nice features without relying on the world-facing internet connection and without lots of delays moving large files up and down.

908B64B197 wrote at 2020-10-28 20:50:57:

That's the use-case I can understand (if someone has a large number of machines or a fast internal network behind a VPN). My comment was more aimed at public facing projects that are open and accept contributions from anyone.

languagehacker wrote at 2020-10-28 15:54:46:

In the past, I've found Gerrit to be reasonably good. Phabricator, on the other hand, not so much.

Having worked with MediaWiki in the past on CRs, I think this will be a good move to modernize things for them.

When faced with a similar task around the same time at Wikia (now Fandom), we chose GitHub while we were moving off of SVN. I'm glad we did at the time, even without all the additional features GitHub has.

I understand why WMF didn't choose GitHub. Compared to their current stack, Gitlab is going to feel like a serious upgrade.

20after4 wrote at 2020-10-28 17:27:30:

What issues did you have with Phabricator? I'm maintaining phabricator for the WMF and I'm interested in anything that could improve the user experience.

kevincox wrote at 2020-10-29 11:02:34:

I was reviewing code in Mercurial's Phabricator and it was awful. The most notable issue was that when a new version was uploaded, the comments stayed on the same line number instead of sticking to the same code.

There were other annoyances but it would move at least to "ok", maybe even "good" if that was fixed.

epriest wrote at 2020-10-29 18:42:18:

This is not (and has never been) the behavior of Phabricator.

See <https://secure.phabricator.com/T7447> for discussion of why this feature can never work the way you think it should work in the general case and why I believe other implementations, particularly GitHub's implementation, make the wrong tradeoffs (GitHub simply discards comments it can't find an exact matching line for).

If you believe this feature is possible to implement the way you imagine, I invite you to suggest an implementation. I am confident I can easily provide a counterexample which your implementation gets wrong (by either porting the inline forward to a line a human user would not choose, or by failing to port an inline which is still relevant forward).

kevincox wrote at 2020-10-29 18:54:06:

I don't need perfect, I just need good. GitHub and GitLab both have good implementations, as does every other good code review system I have used. GitHub annoyingly tries its hardest to hide the "outdated" comments, but GitLab has the option to keep them open (they are no longer visible in the code, but remain on the discussion tab).

So I appreciate your opinion that it is impossible, but as a reviewer I much prefer when the tool tries.

epriest wrote at 2020-10-29 18:57:54:

GitHub's implementation does not do what you claim it does. GitHub has _no_ behavior around porting and placing comments (while Phabricator does); GitHub just hides anything it can't place exactly. See my link above for a detailed description of GitHub's very simple implementation. I believe this is absolutely the wrong tradeoff.

kevincox wrote at 2020-10-29 19:50:48:

I use GitHub every day, I've definitely seen it preserve some comments. Sure, it drops a lot. But I still prefer this to dropping them all, or showing the comments on the wrong lines.

epriest wrote at 2020-10-29 21:02:54:

I mean that GitHub does not "try", in the sense of looking at the interdiff, doing fuzzy matching, trying to identify line-by-line similarity, etc. It places comments only if the hunk is exactly unchanged and gives up otherwise.

Phabricator does "try", in the sense that it examines the interdiff and attempts (of course, imperfectly, because no implementation can be perfect) to track line movement across hunk mutations.

My claim is that all comments which GitHub places correctly, Phabricator also places correctly. And some comments which GitHub drops, Phabricator places correctly (on the same line a human would select)! However, some comments which GitHub drops, Phabricator places incorrectly (on a line other than the line a human would select).

So the actual implementation you prefer is not one that tries, but one that doesn't try! Phabricator could have approximately GitHub's behavior by just deleting a bunch of code.

That's perfectly fine: many other users also prefer comments be discarded rather than tracked to a possibly-wrong line, too. I strongly believe this isn't a good behavior for code review software, which is why Phabricator doesn't do it -- but Phabricator puts substantially more effort into trying to track and place comments correctly than GitHub does.

epriest wrote at 2020-10-29 18:54:42:

In particular, see <https://secure.phabricator.com/T7447#112231> for a specific example which I believe GitHub's implementation gets egregiously wrong, by silently discarding an inline which is highly relevant to discussing the change.

Cthulhu_ wrote at 2020-10-28 16:08:21:

After reviewing some tools we went with Phabricator ourselves; it's not ideal, but it's open source (read: free, we can't afford a $x / seat license) and self-hosted.

jchook wrote at 2020-10-28 20:02:21:

Phab has been great for my side projects.

- You can host Phabricator on a $5/mo VPS and have CPU to spare, whereas GitLab Ruby is a big hog that requires a $20/mo box minimum.

- Able to deploy + configure it within a few hours, even with fancy features like emails via mailgun, using pygmentize to highlight code, and observed + managed repos. Defaults are all reasonable and get you moving quickly. Haven't had to touch config since I set it up.

- The Kanban + stories + PR flow is wholly sufficient. Arcanist grows on you fast. It totally abstracts the PR workflow for most any VCS and can help enforce practices (e.g. reviews, sign-off, merging, etc). "Projects as tags" feels weird at first but ends up giving you fantastic cross-sectional views of your issues.

WrtCdEvrydy wrote at 2020-10-28 18:17:41:

Phabricator is amazing as an all-in-one solution for a small shop.

nuritzi wrote at 2020-10-28 20:02:25:

I think Phabricator is a really powerful tool for engineering teams, but when you try to do more cross-functional team collaboration, it's not as user-friendly as GitLab.

I used Phabricator at a previous company and miss some functionality, like Phabricator's ability to show issue dependencies in a more intuitive and granular way -- but at that company, we had a lot of trouble getting the Design team to use Phabricator, for example.

As OSS communities continue to onboard newcomers, they're faced with a generation that expects modern interfaces that are user-friendly. Having user-friendly tooling also helps promote diversity of OSS communities since it's easier to onboard people with all sorts of backgrounds, since the technical adoption barrier is lowered.

I think GitLab is a clear winner here since it's user friendly and designed for cross-functional team collaboration (GitLab dogfoods its own product in all departments of the team, so you have HR, Marketing, Finance, etc. all using it, in addition to the full product teams).

Full disclosure: I work at GitLab as the OSS Program Manager. Part of the reason I joined was because I feel really strongly about GitLab's ability to lower the contribution barrier and get more people involved in OSS.

epage wrote at 2020-10-29 02:16:53:

I'm curious about this because I've wanted to get off Phab as soon as I started using it

- What are your thoughts on "arc"? It seems like a whole can of worms of problems you can run into with basic branch flows. I know teams that have complex branch flows and it is a nightmare. Same for Windows users.

- What do you use for a CI? How well does the integration work for you?

- How is it with tracking conversations on Diffs?

- Any particular plugins or bots for it that help make the difference?

nuritzi wrote at 2020-10-29 21:56:49:

@epage -- Unfortunately I can't speak to the branch flow or bots question. Perhaps someone else here can? Other answers are below.

--

Re: CI --

We dogfood GitLab CI via gitlab.com -- so no need for integrations.

GitLab non-engineering teams use CI all the time because we constantly update the handbook to document all of our work.

I would love to see this practice more often in OSS orgs, and other companies for that matter. Having a handbook-first approach (

https://about.gitlab.com/company/culture/all-remote/handbook...

) really helps enable remote team collaboration and makes it easier for newcomers to jump in. I think OSS orgs have done a good job of recognizing the importance of documentation for development projects, but there's an opportunity to increase documentation around workflows and community operations.

--

Re: tracking conversations on Diffs --

Admittedly, I don't have a lot of experience with this outside of GitLab. But maybe that's the point. It's easy to chime in on diffs on merge requests on GitLab, and one of my favorite features is "suggesting changes" where you can add in a suggested update to a diff and the author can choose whether or not to apply it.

Here's a link with info about suggested changes:

https://docs.gitlab.com/ee/user/discussions/#suggest-changes

It's within the larger doc addressing discussions in GitLab in general:

https://docs.gitlab.com/ee/user/discussions/
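To make that concrete: a suggestion is just a fenced block inside a diff comment, which the author can apply with one click (creating a commit on the branch). A minimal sketch, with a made-up replacement line:

    ```suggestion
    retry_limit = 3
    ```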

--

Btw, for anyone interested, here's more info on using GitLab for project management:

-

https://about.gitlab.com/solutions/project-management/

-

https://www.buggycoder.com/project-management-with-gitlab/

-

https://thenewstack.io/gitlab-issue-board-project-management...

I gave a presentation about cross-functional team collaboration using GitLab at GNOME's GUADEC this year. Here are the slides:

https://events.gnome.org/event/1/contributions/70/

.. As a program manager, I'm generally really excited about this topic!

--

Some of the features I talk about are not available as part of the Community Edition, but there's the GitLab for Open Source program which gives OSS projects access to our top tiers, plus 50K CI mins per month, for free.

I'm hoping to make the program's page more discoverable, but in the meantime, here's the link:

https://about.gitlab.com/solutions/open-source/

nwah1 wrote at 2020-10-28 16:02:36:

What features do you think Gitlab lacks compared to GitHub?

I haven't used the CI/CD features of either, but PR/MR features seem comparable. Is it the advanced workflow stuff and CI/CD integration where GitHub is better? Bots?

I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

But I would be a lot more worried about being locked-in to GitHub than Gitlab.

brennen wrote at 2020-10-28 18:24:04:

> I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

A few paragraphs I recently wrote elsewhere:

The entire state of code forges as a general thing in 2020 is all the evidence you could possibly want that version control systems (Git, I'm talking about Git) are themselves massively deficient in design.

I rant about this all the time, but there is an entire class of argument about how & whether to use GitHub / GitLab / Gitea / Phabricator / Gerrit / sourcehut / mailing lists / whatever that would mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development. Because it's not, we find ourselves in a situation where no widely used DVCS is actually distributed in practice, and the tooling around version control is subject to platform monopolization by untrustworthy actors and competitive moats.

Code review should itself be distributed/federated, but few of the people involved have incentives to make that happen. It's possible something like

https://github.com/forgefed/forgefed

will eventually get traction, and Git has been dominant for long enough that I wonder all the time when we might see a viable successor that learns from its fundamental mistake. In the meantime we're forced to choose from a frankly pretty terrible lot of options in the broad structural sense.

(For clarity, I'm a WMF employee and am involved in the decision to migrate to GitLab.)

pbronez wrote at 2020-10-29 12:33:26:

> an entire class of argument [...] mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development.

Interesting idea. You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

That’s certainly ambitious, and I’d love to see it. For the moment it seems that Git has won for source code (in a pretty crowded field) because just that part was hard and it was a big improvement. The collaboration tools it includes, mostly around email, appear to be inadequate for most projects. So now we see a healthy ecosystem that adds rich collaboration on top of / next to Git.

> no widely used DVCS is actually distributed in practice

I think this is due to economic and social factors rather than technical ones. Fully distributing a Git repo is very doable, but harder to think about than the Github model. Plus you have all the normal P2P problems around who’s online and how good their connection is.

> tooling around version control is subject to platform monopolization

Again, I think this is simply the social network effect more than anything else. Making a website for your project lets people find it, use it, and contribute to it. The bar to entry is lowered further if it’s a common platform, where people already have accounts and know how it works, and where they can get a consolidated view of all their activity.

Centralized hosting makes even more sense as projects grow and you only want a subset of the code on any given development machine. Eventually big monorepos present serious scaling challenges.

Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.

brennen wrote at 2020-10-29 20:46:55:

> You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

Realistically, not exactly, given how much space some of those things cover.

I do think that entities like code review are as much a part of the history of a project as the deltas to code. Reviews not being first-class objects in the VCS itself has turned out to be a crack into which you can wedge an entire GitHub.

I won't claim I know where best to draw the line here. Better handling of large static files by default and a robust way to model relationships _between_ projects obviously belong within the VCS. On the other hand, relationships modeled in issue tracking systems and the like are also part of the software's history, but past some level of complexity it gets much harder to imagine wedging them into something that you can pass around like you clone a Git repo. All I can really say for sure is that it feels broken that all of this stuff lives in competing application silos.

(As a sidebar: Not that you can't jam things like review data into git-as-data-store. Gerrit does just that. But nobody's going to mistake that for a usable interface to code review.)

Anyhow, I don't think you're wrong about the social & economic factors, but I think a different landscape with less concentration of power could have shaken out if (for example) easy code review had been baked in and host-agnostic early on. Fully p2p architectures aren't feasible, or even necessarily desirable, for a lot of problems - but it shouldn't be too much to ask that things are able to be federated and resistant to capture by a single vendor.

> Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.

Yeah, fair enough. I am myself boundlessly pessimistic about the future of computing generally.

judge2020 wrote at 2020-10-28 23:59:46:

To me, it sounds like the issue is that you need a central source of truth that everyone can pull from for their purposes, and distributing the code review part doesn't sound like it'll add much. In the current climate, most anyone requesting code review is probably trying to merge into the main central source of truth anyways, so what actual benefit does it bring to either the maintainers or the contributors?

brennen wrote at 2020-10-30 03:03:03:

Version control for a genuinely long-lived project is a problem that often outlasts:

- Dominant version control and code review system(s) / paradigms.

- The current configuration of institutional owners.

- Users' trust in an owner / sponsor / maintainer. (Forks happen for reasons.)

- The involvement of developers who remember why and how decisions were made.

- The trustworthiness of the entities that control services, applications, and network real estate used for development.

Some central source of truth is usually necessary, but maintainers and contributors don't benefit when that source of truth is subject to vendor lock-in or can otherwise only migrate at great cost. For all the collaborative benefit that GitHub has undeniably wrought, platform monopolies are eventually a failure mode for end users, at least as for-profit enterprises. With the exception of the dominant silo vendors, nobody in the ecosystem really benefits from being forced to choose a silo that will be hard (and lossy) to escape later. The silos are engineered to limit mobility and channel interoperability to their own ends, for business reasons that run directly contrary to the interests of their users.

If the protocol at hand were actually up to the task, we'd spend less effort and anxiety on the problems of all the non-protocol platform tooling that's been built up around it.

bawolff wrote at 2020-10-29 01:01:55:

I feel like the git model makes a lot of sense when viewed as an extension to the mailing-list code review system. But most people don't want that model. Trying to fit git to other models is a bit of a round peg in a slightly square hole, IMO.

brennen wrote at 2020-10-29 18:37:10:

Yeah, from that angle and from the perspective of 2005 it's a reasonable design, and I think what I describe above as a massive deficiency only really becomes visible in the light of everything that's happened since.

boogies wrote at 2020-10-28 16:11:55:

> I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

It does include git send-email, and I think Sourcehut’s use of that for issues is nice (and they quote customer claims that “SourceHut mailing lists are the best thing since the invention of reviewing patches.”).

darkcha0s wrote at 2020-10-28 16:12:05:

Let's not kid ourselves here, it's because GH is owned by MS.

bawolff wrote at 2020-10-29 01:06:11:

It's not. This isn't 2000s /. "M$ is teh evil!!!11".

It's because GH is not available as self-hosted open source. It doesn't matter who owns it. Github was discussed and rejected by Wikimedia back in 2012 as well, which was before MS bought them.

rock_hard wrote at 2020-10-28 16:11:24:

Gitlab is worse from almost every angle compared to Github.

It simply lacks the attention to detail... you can tell that Github goes the extra mile to get the UX right.

We used Gitlab for a year and then migrated to Github... it's a joy!

frenchyatwork wrote at 2020-10-28 16:34:35:

Having used both a fair bit, I don't know what you're talking about. If anything my experience has been the opposite. Gitlab had the second mover advantage on a few things, while Github's interface has some weird oddities that seem to stem from the fact that that's how they've always been.

cortesoft wrote at 2020-10-28 16:48:22:

Interesting... we used GHE for 7 years and have now switched to gitlab. Gitlab CI, container repos, and Kubernetes integration has been amazing.

abbe98 wrote at 2020-10-28 17:27:48:

WMF is only replacing Gerrit for now, Phabricator will continue to be the issue tracker.

q3k wrote at 2020-10-28 15:16:32:

This is sad. In my experience, Gerrit is a much better code review system than Gitlab merge requests. But it is different from what people are used to.

dietrichepp wrote at 2020-10-28 15:24:39:

Probably because you got used to Critique at Google ;)

I agree though. I think the most important thing in a code review system is inline comments in the diff itself, and that’s something you get from Gerrit, Phabricator (Differential), etc. It encourages people to discuss the particulars of a diff. Merge approval can be made contingent on resolving minor issues within a diff. Diffs are also approved on a per-diff basis, and it’s less typical to merge a stack of diffs.

I think the pull request / merge request model makes sense with the “trusted lieutenants” development model that the Linux kernel uses, but for other projects you would be more likely to want a workflow where someone submits a single commit/diff and then someone approves it (after comments).

When I review PRs on services like GitHub I _very often_ think, “This should be several different reviews” and the discussion thread in a PR is often not a high-quality discussion. I don’t use GitLab as much but my experience is that it has the same problems. What I would love is to review a stack of commits and approve / make comments on the commits individually.

(For those reading: Mondrian -> Rietveld -> Gerrit, and also Mondrian -> Critique. Mondrian and Critique are internal tools at Google. Phabricator originated at Facebook, which has a lot of ex-Google engineers on staff.)

nemetroid wrote at 2020-10-28 16:09:10:

I don't use Github much, but Gitlab allows for multiple threads in a merge request. These threads may reference diffs/commits, but can also be directed at the merge request in general. Each thread has to be explicitly resolved before merging.

kevincox wrote at 2020-10-29 11:08:14:

I think the pull request model still makes sense. Of course, if you stick to small changes it tends towards the patch model. However, there are still some cases where two or three commits at once make sense. Even rarer, there are cases where merging a bunch of changes into a "staging" branch before merging to master makes sense. I think this added flexibility is valuable; of course, keeping the "patch style" single-commit review great should probably be the priority.

eatthoselemons wrote at 2020-10-28 16:36:10:

Do you think that gitlab will ever add inline diff comments? I don't know if it would even be feasible to add to gitlab

e12e wrote at 2020-10-28 17:24:15:

More inline than comments on the changes in the MR? Like:

https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/2...

(there's some old discussion on there, random example)

huskyr wrote at 2020-10-28 15:25:24:

I guess you can get the hang of Gerrit, but I tried it out a couple of times (I occasionally do work on projects in the Wikimedia git repo) and it's pretty evident that the interface was written by developers with little knowledge of user experience. Moving stuff to Gitlab will probably increase the number of volunteer contributions; at least I'm more interested in contributing now.

q3k wrote at 2020-10-28 15:27:04:

I really don't mind the Gerrit UX - it seems to be optimized for daily use by programmers, not for onboarding speed. That's a tradeoff I'm very much okay with.

20after4 wrote at 2020-10-28 17:40:28:

Gerrit is so off-putting to new users that many never get over the learning curve. At Wikimedia we want to be welcoming to new contributors. We also have to consider the on-boarding experience of new staff members, as well as the productivity of staff and long-time contributors. Gerrit satisfies some people who have used it for a long time but it is almost universally disliked by newcomers. When your users are volunteers, you can't force them to use Gerrit until they get used to it. If the experience is bad enough then they will choose to spend their time on something else instead.

bawolff wrote at 2020-10-29 01:15:53:

I'm personally somewhat doubtful that that is really the aspect of the new dev experience at wikimedia that is actually turning off newbies.

shandor wrote at 2020-10-28 22:15:49:

Curious, did you consider also moving the repos and reviews to Phabricator? I recall WMF using it for task management.

badrequest wrote at 2020-10-28 15:51:11:

You're okay with the tool being difficult to use because you already know how to use it.

q3k wrote at 2020-10-28 15:54:14:

I'm okay with it being slightly more difficult to get started with, in return for higher productivity in the long run, yes. Right now, after spending a similar amount of time both with GitHub PRs (and Gitlab MRs) and Gerrit, I still find Gerrit much easier and faster to use.

llimllib wrote at 2020-10-28 18:00:44:

I'm generally okay with that tradeoff too, but we tried it early on at our company and the juice was not worth the squeeze, at least in our case.

Designers and even many developers found it essentially impossible to use and the developers who were reasonably comfortable with it spent way too much time assisting others in attempting to use it.

(fwiw I found myself somewhere in the middle - I like the model and understood the ideas but also found it annoying to work with in practice)

dionian wrote at 2020-10-28 23:28:34:

Same. Cool idea in concept. Not something I have enough time to be interested in using heavily.

lawtalkinghuman wrote at 2020-10-28 16:03:17:

I'm okay with Vim being slightly harder to learn to use than VS Code. A tougher learning curve in exchange for more powerful tools can be a good tradeoff.

huskyr wrote at 2020-10-29 23:44:00:

I think this is a good comparison, and exactly shows the problem: Vim is only used by a minority of developers, the majority use some kind of graphical editor (like VS Code). That doesn't mean learning Vim isn't a good tradeoff, it's just not a good tradeoff for the majority of people.

dionian wrote at 2020-10-28 23:27:48:

I already know how to use it and I hate how proprietary it is, I have so many other problems on my plate than customizing my Git to work a certain way. I like using the off the shelf tools that work nicely with the normal git workflows using temporary branches for MRs. And a nice UI that anyone can use with minimal effort or training.

parliament32 wrote at 2020-10-28 16:30:08:

"Because it's hard" is a bad reason to shy away from something.

marcinzm wrote at 2020-10-28 19:45:26:

No it's not; you have limited time, and devoting X hours to gain back X/10 hours of productivity in the future is a bad investment. Don't do something hard for the sake of doing it unless the gains outweigh the cost.

https://xkcd.com/1205/

kortilla wrote at 2020-10-28 19:53:39:

Difficult to learn != difficult to use.

noizejoy wrote at 2020-10-28 20:39:31:

I agree with you - for tools I use on a daily basis.

However, since my interests vary considerably, and therefore I dabble with lots of different tools, the difficult-to-learn tools never get enough traction in my limited human memory to get me to the easy-to-use stage.

If a community doesn't want to engage occasional users, it's probably fine (maybe even desirable) to have a higher barrier to entry to make daily use really fast.

If a community benefits meaningfully from occasional users, a high learning barrier may not be a good thing.

svrb wrote at 2020-10-28 15:54:31:

Software should be written for users, not non-users. You'd think this would be self-evident and yet here we are.

zimpenfish wrote at 2020-10-28 16:00:29:

How do you then convert a non-user into a user with the least friction?

Cthulhu_ wrote at 2020-10-28 16:13:07:

Force. I for one never looked at Gerrit and thought "I should push this at my employer". I'll probably never use it unless I'm forced to.

bawolff wrote at 2020-10-29 01:14:05:

Gerrit's UX is full of bugs that get in the way of daily use (maybe improved over time). Things like overriding ctrl-f in the browser but then having the overridden search bar not work, or not being able to effectively type inline comments on mobile.

I don't really think the choices they did make that work are any better for optimized daily use than intuitive choices would have been.

(Yes probably much of this is fixed)

delsarto wrote at 2020-10-28 22:17:34:

The git-review [1] tool makes it trivial to interface with Gerrit (it's likely packaged for your distro, see [2]). I've found many people struggling with Gerrit don't know about it and it has made their life significantly easier. It handles all the magic of pushing to refs so that you never need to know about it. You drop a .gitreview in your project and then your work-flow is literally

    $ git checkout -b my-feature-branch
    $ emacs ... <edit edit edit>
    $ git add -i ...
    $ git commit
    $ git review           # push the change for review
    # read reviews, edit
    $ git add -i ...
    $ git commit --amend
    $ git review           # push new change revisions

You can download an upstream change to use locally with "git review -d 123456"

[1]

https://docs.openstack.org/infra/git-review/

[2]

https://www.mediawiki.org/wiki/Gerrit/git-review

bawolff wrote at 2020-10-29 01:10:06:

Honestly, I find git-review much more annoying than just memorizing `git push origin HEAD:refs/for/master` (if that's too much, it's easy to create a normal alias), and the Gerrit web interface gives you the command to download a specific changeset. git-review tends to break in unclear ways and sometimes does things other than what I expect it to.
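For example, a minimal sketch of such an alias (the name is arbitrary):

    $ git config --global alias.gpush 'push origin HEAD:refs/for/master'
    $ git gpush    # pushes the current HEAD for review on master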

u801e wrote at 2020-10-28 22:12:17:

> Moving stuff to Gitlab will probably increase the number of volunteer contributions, at least i'm more interested in contributing more now.

Is there a project out there that originally used something like Gerrit, Phabricator, Reviewboard, or a mailing list that moved to Gitlab or Github where the number of contributions increased after the change?

epistasis wrote at 2020-10-28 15:19:15:

What advantages do you see in Gerrit? Do they require a lot of experience in order to be realized?

q3k wrote at 2020-10-28 15:22:50:

Clear mapping of change request == commit, allowing for easy building of multiple in-flight change requests, rearranging them using git rebase, and updating with a push to refs/for/master. Easy diffing between states of the CR (patch sets), so you can see what changed since the last round of comments, even if it spanned multiple updates. This is the feature I miss the most from other code review systems: being able to easily work on another commit that bases on one that I just sent out for review (even starting review on the new one while the parent still hasn't finished being reviewed!), and rebasing my current change as the parent change gets reviewed/updated.

Possibility to send out a PR for review using just a push (change message in commit, push to refs/for/master%r=foo).
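Concretely, that's a one-liner; a minimal sketch (the reviewer address and topic are placeholders):

    $ git push origin HEAD:refs/for/master%r=reviewer@example.org,topic=my-feature

This creates or updates the change and adds the reviewer in one step.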

Snappy and compact code review experience (no space wasted for whitespace, avatars, pretty buttons). Full coverage with keyboard shortcuts.

Powerful review rule system based in Prolog, allowing for things like code owners, experimental subdirectories without the need for review, etc.

dionian wrote at 2020-10-28 15:31:30:

> Clear mapping of change request == commit,

I was forced to use Gerrit by a client. I could never get the hang of this; I like making frequent commits on short-lived branches using vanilla git. I never wanted any features other than a nice UI to encourage people to review.

q3k wrote at 2020-10-28 15:35:24:

The nice UI for review works well exactly because it limits the functionality available to users and enforces a particular commit model. If you don't do that, you get the code review mess that is Github/GitLab PRs/MRs: difficult to tell how commits relate to the change and how it progresses through review, because the entire branch history is free-form.

kevincox wrote at 2020-10-29 11:10:54:

It's possible to allow users to have multiple commits but still show review between the submitted "tip" commits. If you want you don't even need to show the other commits in the UI.

detaro wrote at 2020-10-28 15:35:43:

One thing one has to get away from is the idea that one change request == one issue. Multiple small commits as separate change requests for one story are fine as long as they work standalone (which is generally a good idea, e.g. to enable bisecting).

js2 wrote at 2020-10-28 15:25:50:

I much prefer the model of force-pushing your development branch to create change sets. It lets you more easily see how development evolves in response to feedback. And the final state of the branch which gets merged leaves behind all the in-progress work that no one cares about, preserved only in Gerrit.

With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that. Alternately, if you just keep adding commits, then the final branch that gets merged (unless you squash) has all the in-progress work which pollutes the repo's history.

Gerrit also has a finer grained permission model, but I don't care as much about that.

Gerrit definitely expects the user to understand how git works conceptually a bit more than Github/lab.

nemetroid wrote at 2020-10-28 16:12:13:

> With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that.

That's not quite true. Gitlab lets you compare any two "versions" of the force pushed branch.

js2 wrote at 2020-10-28 16:52:06:

Thanks. I haven’t used gitlab. I assumed it worked the same as GitHub. That’s good to know.

kevincox wrote at 2020-10-29 11:13:52:

Yup, GitLab works as expected here. It's always surprised me how quickly the old commit is garbage-collected when you force-push a branch on GitHub. It causes weird errors in CI runs and breaks viewing the old commits.

Seriously I'll pay for those couple of KiB of space, just keep it around. (at least until the PR is closed)

0x0 wrote at 2020-10-28 16:13:41:

Gitlab DOES let you compare different versions of changesets in a merge request:

https://docs.gitlab.com/ee/user/project/merge_requests/versi...

u801e wrote at 2020-10-28 22:31:21:

> With Github/lab's model, if you force push your PR, you lose the ability to view its previous state and diff against that.

I'm not sure about Gitlab, but Github has recently added a feature where you can view the diff between the old branch head and the new one. But, as far as I'm aware, there's no way to check out the previous branch head from the repo, due to the lack of a remote branch pointing to it.

At least git itself provides a range-diff command that allows you to see a diff between the commits of two versions of a given branch.
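
For example (branch names hypothetical; `feature@{1}` is the branch's previous position, taken from the local reflog):

    # compare the commit series before and after a rebase/force-push
    git range-diff main feature@{1} feature

    # equivalent shorthand when both versions share the same base
    git range-diff feature@{1}...feature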

depressedpanda wrote at 2020-10-28 17:03:08:

But you don't force push with Gerrit?

Just update your change set, then push to refs/for/<branch_name> again.
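
Roughly, the loop looks like this (remote and branch names are just examples):

    # address review feedback by amending the commit under review;
    # the Change-Id trailer in its message ties it to the same change
    git commit --amend

    # pushing to the magic refs/for/ namespace creates a new patch set
    # for that change - no --force required
    git push origin HEAD:refs/for/master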

js2 wrote at 2020-10-28 18:55:15:

It's been a while since I used it, but I guess the more important point is that it's a rebase-based workflow.

My memory was that you had to force-push your working branch to refs/for. Thank you for the correction.

I've actually set up and run instances of it at two companies, but as I say, it's been a while.

depressedpanda wrote at 2020-10-29 12:11:06:

Yes, I very much agree: the rebase-based workflow is what makes Gerrit superior to all other systems I've tried (of which I find Github and Bitbucket particularly loathsome).

I felt I needed to correct you because with Gerrit you reserve the concept of force pushing for exceptional cases, which I think is the correct mental model. Force pushing should not be done frivolously.

20after4 wrote at 2020-10-28 17:55:14:

It's effectively similar to a force-push.

depressedpanda wrote at 2020-10-29 12:02:41:

I would say it's quite different, since you don't overwrite anything. Old patch sets are still available should you wish to roll back to or reference either a particular commit or a whole chain of commits.

kortilla wrote at 2020-10-28 19:57:27:

It is a force push under the hood. You’re removing an old commit and creating a new one.

lacksconfidence wrote at 2020-10-28 22:30:15:

It's not a force push under the hood. Under the hood a new ref is created for each patch set (the first patch set of change nnnn lives at refs/changes/nn/nnnn/1, the second at refs/changes/nn/nnnn/2, etc., where nn is the last two digits of the change number). From the end user's perspective the result is quite plausibly similar to a force push.
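
Those per-patch-set refs are also what make earlier revisions retrievable later, e.g. (change and patch set numbers hypothetical):

    # fetch and inspect the first patch set of change 1234;
    # nothing was overwritten when later patch sets were pushed
    git fetch origin refs/changes/34/1234/1
    git checkout FETCH_HEAD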

kortilla wrote at 2020-10-28 22:37:34:

The branch model must have changed. I used to push directly to a branch named after the change number (not the revision number), and of course I had to force-push because it was the same branch each time.

So what happens when you push to changes/nnnn/12 when revision 11 hasn't been created?

lacksconfidence wrote at 2020-10-29 02:13:25:

At least the way our installation works, we don't push to changes/nnnn/12; rather, changes are pushed to refs/for/{branch} (often refs/for/master). This endpoint isn't an actual branch. It triggers Gerrit to look up the change number nnnn for the Change-Id in the commit message and create the appropriate ref for the next patch set.
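
For the curious, that Change-Id trailer comes from Gerrit's standard commit-msg hook, installed once per clone (server name illustrative):

    # fetch the hook from the Gerrit server and make it executable
    curl -Lo .git/hooks/commit-msg https://gerrit.example.org/tools/hooks/commit-msg
    chmod +x .git/hooks/commit-msg

    # from now on every commit message ends with a trailer like
    #   Change-Id: I0123456789abcdef0123456789abcdef01234567
    # which Gerrit uses to group patch sets into a single change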

finnthehuman wrote at 2020-10-28 16:07:58:

It's depressing how git tooling has to squish out all the flexibility that makes git attractive in the name of being unsurprising to someone who doesn't understand git.

I was excited to finally bring in git as we ramp down a legacy project and move onto something new. Then I started thinking about the developers who have never touched git and whom I need to support. I looked at the tools available and the workflows they dictate. Then there's the drive to do something similar to the rest of the company; autonomy only goes so far without a good reason.

Fuck me. I'm going to pick GitLab and hate it.

okhan wrote at 2020-10-28 19:05:47:

I had to use Gerrit in a previous job, and _hated_ it. The UX is abysmal. Some folks loved it though, especially engineers working mostly on the backend. People with more of a frontend focus couldn't get past the awful user experience.

alessioalex wrote at 2020-10-29 10:06:48:

Can confirm, it's extremely off-putting to say the least. Having previously used Github, Gitlab, and Bitbucket, I find Gerrit very unusual. I have to Google how to do basic stuff.

Entaroadun87 wrote at 2020-10-28 18:42:28:

Why is it sad? They've identified a better experience for their target audience, new community devs - just perhaps not for you, haha.

PieUser wrote at 2020-10-28 15:19:55:

This is sad. In my experience, GitHub is a much better code review system than GitLab merge requests. But it is different from what people are used to.

modernlearner wrote at 2020-10-28 17:07:06:

I've been using Gitlab and honestly it's pretty awesome; using a mono-repo setup and everything just works. The integration of CI jobs is also excellent.

zimbatm wrote at 2020-10-29 14:10:10:

It's a shame that Gerrit doesn't get more TLC. Now that GitHub is owned by Microsoft, it would be a good way for Google to combat the ongoing centralization of open source.

Fundamentally, the Gerrit changes model is superior to the pull-request model. I also really like how Gerrit stores all its information in git repos: comments, settings and everything else.

The issue is everything that surrounds it - speaking as somebody who maintains an instance for a customer. The UI has all sorts of UX problems and takes a while to get into. The notification system is super noisy by default; we had to build a system on top of it, and it's still not as good as GitHub's (which is also bad in its own respects).

celeritascelery wrote at 2020-10-28 15:26:44:

It was interesting to read their reasons for not using GitHub, especially the lack of control over bans or sanctions. Microsoft could pull the rug out from under them at any time if it got pressure from somebody like China.

tmotwu wrote at 2020-10-28 15:53:06:

Interesting to see China used as the example. Github recently took action on DMCA requests. Meanwhile, Github has actively been used as a safe haven from Chinese censors when it came to the 996 protests and COVID information.

OskarS wrote at 2020-10-28 17:35:51:

For sure, Github should be lauded for its policy towards China, they have absolutely done the right thing here. Even so, there's no guarantee that that would last forever, so it makes sense for a company like Wikimedia to host this themselves.

mxskelly wrote at 2020-10-28 16:07:03:

Oh right yeah pressure from China, like what got youtube-dl entirely removed... Oh wait no that was the US government.

driverdan wrote at 2020-10-28 16:31:28:

I downvoted you because it wasn't the government that filed the takedown. Perhaps you mean "a private company using laws enacted by the US government?"

mxskelly wrote at 2020-10-28 16:57:18:

What's the difference?

frenchyatwork wrote at 2020-10-28 16:27:20:

It's naive to believe that Microsoft would stand up to the CCP and not the US government.

mxskelly wrote at 2020-10-28 16:58:10:

It's more naive to treat the CCP like some kind of horrible boogeyman who wants to destroy freedom while completely ignoring the US's own issues.

frenchyatwork wrote at 2020-10-28 18:01:55:

> It's more naive to treat the CCP like some kind of horrible boogeyman

This is a valid point some of the time, but I don't think it really applies here. First off, Wikipedia has had problems with censorship by various states, China being the most notable by a long shot:

https://en.wikipedia.org/wiki/Censorship_of_Wikipedia

It's very much true that the US government (or some other non-China state) could be a threat to Wikipedia in the future, and I'm sure the folks at Wikimedia are aware of that too.

simlevesque wrote at 2020-10-28 15:39:59:

I'm not sure how GitLab is a better option. China could pressure them too.

haolez wrote at 2020-10-28 15:40:41:

GitLab is open source and can be self-hosted by Wikimedia.

detaro wrote at 2020-10-28 15:41:47:

They're self-hosting.

bredren wrote at 2020-10-28 18:24:18:

I was able to see that Wikipedia uses something called OpenStack, but not the details of what infrastructure they host on. Anyone know what facilities these repos will ultimately be served from?

bawolff wrote at 2020-10-29 01:25:35:

OpenStack is used for Wikimedia Cloud, which is a project to give volunteers compute resources to do cool projects (this includes the domains wmcloud.org, toolserver.org and wmflabs.org). Production does not use OpenStack.

Things like code review tools would be hosted in Virginia [eqiad] (with backup in Texas [codfw]) on hardware owned by the Wikimedia Foundation.

Docs about how Gerrit is hosted:

https://wikitech.wikimedia.org/wiki/Gerrit

If you want the nitty-gritty details, see also:

https://github.com/wikimedia/puppet

easton wrote at 2020-10-28 21:28:47:

Wikimedia uses colocated boxes around the world at different providers, and the user facing stuff is backed by Cloudflare.

https://meta.wikimedia.org/wiki/Wikimedia_servers

bawolff wrote at 2020-10-29 01:19:46:

The user-facing stuff is not backed by Cloudflare. (Cloudflare's Magic Transit has at some points been used to mitigate DDoS attacks at the IP layer, but otherwise Cloudflare is not used.) Wikimedia operates its own Varnish servers in Virginia, Texas, San Francisco, Singapore and Amsterdam to do frontend caching.

addicted wrote at 2020-10-28 16:02:53:

Is this GitLab open source, or is it GitLab enterprise on the free plan?

Because they are different products, even if the GitLab marketing pages make it appear as if they are the same.

simlevesque wrote at 2020-10-28 16:04:43:

Community Edition.

simlevesque wrote at 2020-10-28 15:53:18:

Oh, I get it now, thank you. I had only read the outcome part.

bastardoperator wrote at 2020-10-28 16:06:41:

You can self-host GitHub too.

parliament32 wrote at 2020-10-28 16:28:06:

It's not open source; at best they ship you a locked-down VM image to run yourself.

Edit: Confirmed, "GitHub Enterprise is delivered as a virtual appliance that includes all software required to get up and running. The only additional software required is a compatible virtual machine environment."

https://enterprise.github.com/faq

saxonww wrote at 2020-10-28 16:45:33:

This is not true; they publish the code at

https://gitlab.com/gitlab-org/gitlab

.

If you want to use premium features, you do have to pay to unlock them, and depending on your deployment you would need to get the gitlab-ee image instead of the gitlab-ce one.

wizzwizz4 wrote at 2020-10-28 16:57:36:

We're talking about GitHub.

parliament32 wrote at 2020-10-28 17:37:54:

https://enterprise.github.com/faq

marcinzm wrote at 2020-10-28 16:34:40:

Which costs $21/user/month versus $0 for the free plan and $4 for the team plan. That's a steep price increase.

Cthulhu_ wrote at 2020-10-28 16:10:20:

If you pay the enterprise fee, which is likely some amount per seat. They MIGHT give it out for free to Wikimedia as a favor / goodwill, but there would be strings attached.

(I'm not saying Gitlab doesn't have strings attached)

meesles wrote at 2020-10-28 16:10:15:

Source? I don't think that's true. Github has on-premises enterprise solutions but one should not confuse that with running free and open software on your own machines -

https://enterprise.github.com/faq

bastardoperator wrote at 2020-10-28 16:28:06:

I personally don't think anyone would confuse the two, but the fact remains: you can self-host GitHub. Maybe it costs, maybe it has strings attached, but ultimately the functionality is there... which makes it true.

snnn wrote at 2020-10-28 15:51:12:

Gitlab has no business in China.

pengaru wrote at 2020-10-29 00:11:10:

I'll generally support using gitlab over github on principle, but I have found gitlab to be basically useless without JavaScript enabled, while github still kind of works, at least for read-only visits to shared github URLs. Whenever someone shares a gitlab URL with me, it's very rare that I can make any use of it without enabling JS; it's quite annoying.

nichos wrote at 2020-10-29 00:27:22:

If a simple page that works without JavaScript is of interest to you, you might want to check out sourcehut.

pengaru wrote at 2020-10-29 01:10:28:

If I could compel everyone migrating to gitlab to instead use sourcehut, I would.

Alas this isn't about my personal choice of git hosting, I already have my needs covered with a dedicated server.

mixologic wrote at 2020-10-28 22:45:51:

"This raises the question: if Gerrit has identifiable problems, why can't we solve those problems in Gerrit? Gerrit is open source (Apache licensed) software; modifications are a simple matter of programming."

... And nuclear power is a simple matter of splitting atoms.

gravypod wrote at 2020-10-28 15:24:01:

One thing stopping me from moving my company's code to GitLab's cloud offering is the storage limits for repos being extremely small. I heard from a rep that this will change in November. I'm wondering if this purchase-more-storage change relates to this?

alex_reg wrote at 2020-10-28 15:27:39:

Wikimedia will self-host Gitlab. They can use whatever limits they want.

q3k wrote at 2020-10-28 15:30:49:

They are still very limited when it comes to functionality though - only the first column in the self-managed feature comparison, right?

https://about.gitlab.com/pricing/self-managed/feature-compar...

Deukhoofd wrote at 2020-10-28 15:39:08:

They probably use the free Open Source option, which gives access to the top tiers for free.

https://about.gitlab.com/solutions/open-source/

mikey_p wrote at 2020-10-28 17:19:11:

That means they are running non-free code, which seems to be one of their main points against Github.

20after4 wrote at 2020-10-28 17:55:49:

It's community edition.

mikey_p wrote at 2020-10-28 21:28:44:

The Community Edition doesn't have any tiers or paid features; those are only in the Enterprise Edition.

67868018 wrote at 2020-10-28 18:01:33:

This requires asking for a specific number of license seats for your open source project, and that's impossible to work with. Every possible user account/contributor takes up a license. How many contributors will I have tomorrow? I don't know. How many spam accounts will I have that waste license seats? Too many, and they're impossible to clean up.

alex_reg wrote at 2020-10-28 15:34:45:

All of those are self-hosted.

"Core" is the free community edition. The others are not open source/free and require payment.

Core is plenty for most use cases. That table makes it look like it has almost no features, but most items in the list are either advanced or pretty niche.

q3k wrote at 2020-10-28 15:36:34:

Right, so is Wikimedia going to pay $$ to GitLab for one of the more advanced licenses?

jonp888 wrote at 2020-10-28 15:40:55:

According to their pricing page: "We provide free Gold and Ultimate licenses to qualifying open source projects and educational institutions. Find out more by visiting our GitLab for Open Source and GitLab for Education program pages."

I'm sure Wikimedia has the dosh to pay for licenses themselves, but it's hard to see how the per-user pricing model would work for any open source project.

boleary-gl wrote at 2020-10-28 16:01:53:

No, Wikimedia is going to be using the Community Edition (CE) of GitLab, which is free and open source under an MIT license. This decision and the reasons for it are described in more detail in the FAQ section of the linked article.

Grimm1 wrote at 2020-10-28 15:34:32:

My understanding is you can self host and pay a sub and get access to all the other features. I may be wrong.

boleary-gl wrote at 2020-10-28 16:00:52:

That's correct - with the GitLab Enterprise Edition you can self-host GitLab and get access to all of the features - both those from the Core open source version as well as our proprietary features.

mikey_p wrote at 2020-10-28 17:20:29:

Yes, but that's only if you self-host the Enterprise Edition which contains non-free code, not the Community Edition which only contains free code.

SparkyMcUnicorn wrote at 2020-10-28 15:54:51:

The limit is 10GB, while GitHub's limit is "ideally less than 1 GB, and less than 5 GB is strongly recommended".

I don't consider 10GB to be "extremely small", especially since that's a larger than usual limit for a hosted solution.

naavis wrote at 2020-10-28 17:14:32:

You can buy tons more space for Git LFS on GitHub. 10GB seems to be a hard limit on Gitlab.
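
For anyone unfamiliar, moving large binaries into LFS is a small one-time setup per file pattern (the pattern and file name here are just examples):

    git lfs install          # once per machine
    git lfs track "*.bin"    # store matching files in LFS, not the repo itself
    git add .gitattributes   # the tracking rule is itself versioned
    git add model.bin
    git commit -m "Move large binaries to LFS"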

YorickPeterse wrote at 2020-10-28 15:54:55:

The repository storage size is 10GB [1]. I wouldn't consider that "extremely small".

[1]:

https://docs.gitlab.com/ee/user/gitlab_com/index.html#accoun...

gravypod wrote at 2020-10-28 16:30:46:

That's for the container registry, packages, code, artifacts - everything. I have a single project in my monorepo that produces a 512MB binary file and stores it as an asset. In 20 CI runs, assuming there was zero code in the repo or anywhere else, we'd use up the entire budget. We make more than 20 commits/day.

remram wrote at 2020-10-28 22:18:09:

I don't know about those other features, but that definitely does not include the container registry.

detaro wrote at 2020-10-28 15:27:05:

They're hosting Gitlab themselves.

BerislavLopac wrote at 2020-10-28 15:31:51:

Where can one find the repo sizes?

Wezl wrote at 2020-10-28 17:02:05:

I need clarification: is this Mediawiki or Wikimedia moving to gitlab? It seems the Wikimedia foundation is moving their code, so why is this on mediawiki.org?

detaro wrote at 2020-10-28 17:05:12:

> _so why is this on mediawiki.org?_

Probably because mediawiki is the main Wikimedia software project.

unixsheikh wrote at 2020-10-28 20:47:19:

Google's non-free JavaScript reCAPTCHA is an absolute showstopper! At least roll out a custom-made solution if nothing else.

brennen wrote at 2020-10-28 22:57:56:

We are not going to use reCAPTCHA.[0]

[0].

https://www.mediawiki.org/wiki/GitLab_consultation/Discussio...

zdw wrote at 2020-10-28 17:42:52:

I'd be interested in the CI aspects of this transition, which seem to be glossed over.

The combination of Gerrit with Jenkins and Jenkins Job Builder blows everything else I've seen out of the water with how easy it is to make both per-patchset and post-integration changes in an infrastructure-as-code manner across multiple repos, once you get over the learning curve.

brennen wrote at 2020-10-28 19:00:40:

We recently conducted an extensive evaluation of CI options[0], concluding at the time that GitLab could meet our needs but didn't make a whole lot of practical sense unless we also migrated to it for code review. Other considerations (Gerrit user experience and sustainability, onboarding costs, a de facto migration of many projects from our Gerrit instance to GitHub, etc.) led us to re-evaluate whether a migration for code review would make sense, and that's what the decision linked here addresses.

I am on the team that maintains our existing CI system[1] (Zuul, Jenkins, JJB), though I mostly work on other things. While this system is certainly quite _powerful_, I would not personally describe it as easy. We have a lot of work in front of us in migrating it to GitLab, but so far I've found the experience there quite a bit more pleasant than grepping through JJB definitions and the like.

At any rate, if you're interested in how all of this pans out, we will as ever be doing the work in a very public fashion.

[0].

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering...

[1].

https://www.mediawiki.org/wiki/Continuous_integration

zdw wrote at 2020-10-28 20:55:33:

To clarify, is your GitLab-based job system done with configuration-as-code, and does it share definitions of jobs across repos?

The solution we came up with using Gerrit/Jenkins was to have a common test invocation (in our case `make test`) that glossed over all the details of a project's build process and was expected to output test and coverage results in specific formats Jenkins could consume (junit/xunit and cobertura). We have jobs that run `make test` no matter whether the code is C, Go, Python, Javascript, etc.

This also had the beneficial side effect of lowering the barrier to entry for someone working on any random project - make papers over all those toolchain differences.

brennen wrote at 2020-10-28 21:57:41:

> To clarify, is your GitLab-based job system done with configuration-as-code and does it share definitions of jobs across repos?

We announced our decision to migrate to GitLab on Monday, so we don't have much of a GitLab-based job system yet.

Nevertheless, yes, GitLab CI jobs will be defined by files checked into version control, and we'll reuse things where appropriate.
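
For readers unfamiliar with GitLab CI, here is a minimal sketch of what such checked-in, shared definitions can look like (the project path and job contents are made up, not our actual config):

    cat > .gitlab-ci.yml <<'EOF'
    # pull a shared job template maintained in another repo
    include:
      - project: 'shared/ci-templates'
        file: '/templates/make-test.yml'

    # repo-specific job, run on every push
    test:
      image: debian:stable
      script:
        - make test
      artifacts:
        reports:
          junit: junit.xml
    EOF

The include: block is the piece that addresses sharing across repos; each repo then only declares what's unique to it.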

vaccinator wrote at 2020-10-29 01:04:31:

Since Microsoft is pro open-source now, they probably should move too.

dfalzone wrote at 2020-10-28 21:52:11:

Would it make sense to implement a peer-to-peer protocol for sharing git repositories, similar to torrenting, with the goal of overcoming these kinds of issues?

gerdesj wrote at 2020-10-29 01:14:07:

Things move so fast! git was initially developed on the back of a fag (cigarette) packet by Mr Torvalds and his mates for Linux code development. It seems to work quite well because we have a lot of gits these days.

github or lab or gerrit? I don't have a favourite, but I did notice a lot more coloured lines with gerrit stuff and a feeling of bewilderment. However, I also felt like the adults seemed to know what was going on.

Let's see how this pans out. It'll probably be fine but I suspect we will lose something by creeping towards the "mainstream" and ignoring diversity. That sounds a bit odd for a tool designed for Linux 8)

chovyfu wrote at 2020-10-29 03:38:13:

No more shithub.

sitzkrieg wrote at 2020-10-28 19:16:44:

why is this news

slim wrote at 2020-10-28 17:53:35:

Is the deal with Wikimedia the reason gitlab blocked users from Iran?

lanevorockz wrote at 2020-10-28 16:22:59:

Microsoft will slowly kill github

throw_m239339 wrote at 2020-10-28 16:32:09:

Depends. MS used to have its own code hosting service (Codeplex I believe), and it wasn't that bad, but Github was more "social".

I think it's mostly a question of how independent Github will be under MS. Now, to be fair, MS has good services: Azure is nice, and Office 365 is pretty good.

syrrim wrote at 2020-10-28 16:44:16:

They might make money off of it, but I believe GP was referring to github's prominence as the home of so many free software projects.

A_No_Name_Mouse wrote at 2020-10-28 15:18:39:

A company moves its code repository to a different provider. Honest question: why is this newsworthy? Am I missing something?

vorpalhex wrote at 2020-10-28 15:25:27:

Wikimedia is very cautious in making changes because they take very seriously the values of sustainability and predictability.

That they are moving from Gerrit to Gitlab is a blow against Gerrit and a boon for Gitlab (assuming it goes well).

zeitg3ist wrote at 2020-10-28 16:02:14:

I don’t know about Wikimedia being super cautious when adopting technologies... I host a MediaWiki instance and there’s been a lot of “not so cautious” tech decisions in the past. Jumping early on the HHVM train (which they eventually had to leave); adopting Lua for wiki modules; developing Parsoid as a Node service (now rewritten in PHP)... None of these was the “safe option”; in some cases it worked out well but in others it didn’t.

TonyTrapp wrote at 2020-10-28 17:03:04:

None of that was forced on anyone, though. Everything you mention are purely optional components. I think it makes sense for them to explore new technologies like that while still keeping their general requirements conservative.

zeitg3ist wrote at 2020-10-28 23:56:46:

Yeah, but those components were heavily used on Wikipedia and other Wikimedia websites, and some - especially Lua modules - are fundamental once a wiki grows to a certain size, since wikicode templates are quite limited and suffer from performance problems.

sayhar wrote at 2020-10-28 16:20:12:

What's wrong with HHVM? Honestly asking, I have no real context beyond knowing it is a FB invention

SahAssar wrote at 2020-10-28 22:16:27:

HHVM got very little community adoption and never had strong promises to stay compatible with the rest of the PHP ecosystem.

A_No_Name_Mouse wrote at 2020-10-28 15:32:40:

Thanks. I'm not a developer so I guess it's relevant, just not for me.

melling wrote at 2020-10-28 15:24:33:

It helps to follow trends. For a while I thought Gitlab was part of GitHub.

When I see enough news about something on HN, I look into it.

I even did a little Rust tutorial recently.

Buzz makes a difference

paulus_magnus2 wrote at 2020-10-28 15:54:52:

^F gittorrent.

It is like working in a big corporation. Eventually you will do or say something that a person in power or a competitor won't like, and you will be cancelled. Community/publicly supported projects, and free speech, are only safe from outside influence on community infrastructure. It's time to start decentralizing.

JeremyNT wrote at 2020-10-28 16:02:29:

Canceled by whom? They are self-hosting their own instance of Gitlab CE. If Gitlab the company disappears tomorrow, they can fork the project and continue using it.

krelian wrote at 2020-10-28 16:17:10:

I never checked out gitlab, but the name always made it seem like a github copycat. If you want to differentiate your product, why not choose a name that rings different from your competitor's?

And now that I've finally gone and browsed some repositories, it's clear that they were very much "inspired" by Github's design; it's practically a github clone, at least in that regard (although something is a bit off in the smoothness).

Maybe one day I'll find myself in a similar position and see things differently, but this blatant copying always seemed ridiculous to the point where I would feel ashamed to lead this type of strategy.

whoisjuan wrote at 2020-10-28 16:46:15:

I don't want to sound pedantic, but this comment makes you look extremely ignorant and unaware of this space and its solutions, use cases, etc. Basically you sound completely oblivious to what's going on here.

If today was the first day you saw GitLab, perhaps you don't have the domain knowledge to make any of these claims. Nothing you said here makes sense. The Git prefix in the name is just a technicality, the same way I could create an operating system called YeahOS and everyone would understand that OS is a suffix that blends into the branding.

But what makes you sound particularly ignorant is calling GitLab a clone of Github. They are both git platforms, that's it. By that metric BitBucket, Gitea, Gogs, CodeCommit, Phabricator, etc. are all clones of GitHub.

If anything, recent GitHub functionalities like GitHub Actions are a clone of mature GitLab functionality like GitLab's CI.

krelian wrote at 2020-10-28 20:33:21:

>the same way I could create an operating system called YeahOS and everyone would understand that OS is a suffix that blends into the branding

Yet if you named it myOS and copied the look and feel of iOS, everyone could see that you had created an iOS clone.

That's the only point I was making, but it seems many feelings were hurt in the process.

whoisjuan wrote at 2020-10-28 20:50:51:

It's all fair. Looking back at my comment, I sounded a little bit harsh, and I apologize for that.

What I was trying to establish is that there's really no solid ground to claim that any Git platform is a copy of another one, since they are all essentially productivity and teamwork wrappers around Git.

If you're talking about the general information architecture, that's how SourceForge looked even before Git existed, so it's hardly an original idea.

krelian wrote at 2020-10-28 21:41:48:

>no solid ground to claim that any Git platform is a copy of another

I think that's fairly obvious. I guess I didn't explain clearly enough the point I was trying to make in the original post. I kind of regret writing it now.

tantalic wrote at 2020-10-28 16:29:10:

I think it's fair to say that a few years ago GitLab was very much inspired by Github's design. However, the project has focused on adding additional, tightly-integrated features, and I'd say in the last couple of years Github has been more inspired by GitLab than the other way around.

GordonS wrote at 2020-10-28 16:23:37:

GitHub and GitLab, Gitea etc are all centred around git, so including "Git" in the name seems like an obvious, sensible even, idea.

Personally I don't think the GitLab UI looks that similar to GitHub's, and to me it looks and feels kind of clunky, and lacks contrast.

spacechild1 wrote at 2020-10-28 16:27:36:

The important difference is that GitLab is open source! That's a bit like criticizing LibreOffice for "copying" MS Office and using the word "Office" in its name.

dnsmichi wrote at 2020-10-28 16:51:27:

Hi, Developer Evangelist at GitLab here.

GitLab Core is open source which is the base for the Community Edition. The paid tiers are based on Core and add Enterprise licensed proprietary code, following our open core business model.

More about our Open Source stewardship:

https://about.gitlab.com/company/stewardship/

globalproctd wrote at 2020-10-28 16:21:26:

You could not be farther from the truth. Gitlab is an improvement on Github in every way; they offer a suite of features that Github doesn't have, which are otherwise only available through third-party integrations.

Gitlab is not a Github clone.

dudul wrote at 2020-10-28 19:32:02:

A few years ago, Gitlab was indeed mostly a clone of Github. The thing is, in the years since, Gitlab has in my opinion become far superior to Github as a "turnkey" solution for managing your software. Now Github is the one "stealing" ideas from Gitlab, for example with its Actions.

gilrain wrote at 2020-10-28 16:22:57:

That two services based on git have git in their names seems reasonable. That's the only commonality... "hub" and "lab" are both one syllable, but very different in sound and connotation.

ganoushoreilly wrote at 2020-10-28 16:21:49:

Both are just fancy wrappers and tools built around Git. It's not surprising that one looks like the other. Given historic exoduses from one, it's only logical the other would build a similar toolset, with the intention of allowing for easy migrations between platforms.

While similar, the companies differ in their approaches, and the platforms differ in their pricing (each having impacted the other, which has ultimately helped consumers).

Your argument would hold true of any word processor. I think it's an unfair assessment of two products targeting the same specific users.

Buttons840 wrote at 2020-10-28 16:21:04:

People wanted a decentralized GitHub. That's why they built it like GitHub, except decentralized.

boogies wrote at 2020-10-28 16:21:06:

If you think Gitlab’s a Github clone you must not have seen Gitea.

Wezl wrote at 2020-10-28 17:09:05:

I agree that Gitea looks a lot more like Github (but with a dark theme!), and I think this is a good thing. I also like Gitea because it doesn't have reCaptcha or other proprietary components, but it doesn't seem to have the advanced ('Enterprise') features that Gitlab does (I program as a hobby so I've never needed these). When Gitea has grown and is more accepted, I hope the Wikimedia Foundation will consider using it.