The PITA Threshold: GitHub vs. CPAN

I recently attended an excellent talk by the GitHub founders about the history and rapid success of GitHub. The talk was highly entertaining and informative, and it also put a name to a concept I've often encountered: the pain-in-the-ass threshold. As of this writing, there are only 61 Google hits and 22 Yahoo! Search hits for the term.

The PITA threshold

The Pain-In-The-Ass threshold is the point at which, after making a certain amount of effort, one draws the line and says "I give up".

In the particular case of GitHub, what propelled its tremendous success was the ease with which a user could contribute to a project. The talk showed how, after several projects moved from Google Code to GitHub, contributions to them simply skyrocketed. That was because casual contributors could now easily fork the project, make their changes, then issue a pull request. With Google Code, making a contribution was much more complicated.

Some GitHub statistics as of February 2009

  • Has been online for about a year
  • 46,000 public repos
  • 17,000 in the last month
  • 6,200 have been forked at least once
  • 4,600 merged from fork
    • 75% of forked projects have work contributed back
  • 18,000 projects with at least one watcher
    • 25% get outside contributions
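
The two percentages in the list can be roughly reproduced from the counts next to them. The sketch below is only a sanity check, and it assumes (my reading of the list's nesting, not something the talk states) that both percentages use the 4,600 "merged from fork" figure as the numerator. The variable and function names are made up for illustration.

```python
# Figures quoted in the February 2009 list above.
forked_at_least_once = 6_200
merged_from_fork = 4_600
projects_with_watchers = 18_000


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: "work contributed back" means "merged from fork".
contributed_back = share(merged_from_fork, forked_at_least_once)

# Assumption: "outside contributions" is also counted as "merged from fork".
outside_contributions = share(merged_from_fork, projects_with_watchers)

print(f"forked projects with work merged back: {contributed_back}%")
print(f"watched projects with outside contributions: {outside_contributions}%")
```

Under those assumptions the ratios come out at about 74% and 26%, which is consistent with the rounded 75% and 25% quoted in the list.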

... and an update as of 2009-Jul-27

From Tom Preston-Werner, Chris Wanstrath and Scott Chacon - Git, GitHub and Social Coding:

  • 90,000 unique public repositories,
  • of which 10,000 in the last month
  • 12,000 repositories have been forked at least once,
  • for a total of 135,000 repositories.

... and how CPAN crosses the PITA Threshold line

The reason I started writing this article is that I wanted to submit a trivial typo fix to Text::Wrap, a core Perl module. To date, you can still see the typo in Text::Wrap.pm:

If you just to preserve existing newlines but...

It's a one-word typo. But this is what it took me just to attempt to fix it:

  • find the module author
  • e-mail him
  • wait a day - no reply
  • go on #p5p, where I was told that he hasn't been heard from in a while
  • scour the Internet for his other e-mail addresses
  • e-mail him again
  • get a distracted reply

Since Text::Wrap is in core, I then took the other route, and in roughly 10 minutes,

Shortly after, I got my tiny Text::Wrap patch mirrored to Perl's git repo. Keep in mind that this was the first time I forked a project on GitHub, and I also found a bug about forking Perl.

What if CPAN moved the source code repo to GitHub?

If the example above doesn't convincingly explain why I'm proposing that, here's my current conundrum:

I was using Test::Differences for its table output format, which is very useful when you have two large pieces of text that don't match and you need to find out exactly how they differ. The only problem with Test::Differences is that it can't wrap lines that are too long, and the table output in those cases looks completely messed up in the terminal.

It turned out that I wasn't the first to encounter the problem: someone else submitted a comprehensive patch for a maximum width option in the Text::Diff dependency three years ago. Text::Diff has 6 bugs in total on RT, one of which was submitted in 2004 and is an easy fix. However, Text::Diff hasn't been updated since 2002!

This, I think, illustrates CPAN's PITA threshold problem again. Here is a project that has generated interest, but which is now stuck, because of how CPAN contributions work. Quote:

In some circumstances, we can grant co-maintenance permissions to you or others if the current maintainer of a module has entirely disappeared. You have to understand that is not a decision we make lightly. We are essentially giving write access to somebody else's work to third parties without explicit consent from the missing author. Since almost all code on CPAN has a free license, this is likely unproblematic from a legal point of view, but any violation of a contributor's trust in the PAUSE/CPAN mechanisms is a serious blow against the work of everybody who contributes to CPAN. For this reason, we try to tread very lightly, make the least possible use of the administrative privileges and attempt to protect voluntary contributors like yourself or the author of the module at hand from any unnecessary burden.

This is a very respectful policy, except that it reads as if it were written 15 years ago, when social coding was far less popular and the quest for personal glory trumped the desire to collaborate. It also assumes that the module author must have a really hard time accepting contributions. With CPAN, though, that assumption is largely true. As a module author, one has to:

  1. keep an eye on RT in case its email notifications end up as spam
  2. review the ticket
  3. no patch? -> forget about the issue (often)
  4. if there's a patch, download it and try to apply it (that is, if you're not simply oblivious to it)
  5. build a new module version and publish it

Now, if module sources were on GitHub, any contribution would ALREADY be in the form of a patch. In the case of a typo like the one in Text::Wrap, the author could even accept it online with one click. For a more complex patch, five commands let the module maintainer check out the patch and, after reviewing it and running the tests, merge it into the mainline:

$ gh network fetch
$ git checkout -b testpatch
$ git merge user/branch
$ ... tests ...
$ git checkout master
$ git merge testpatch

Under the PITA Threshold? You bet.

Update

In the meantime, I found out that there exists a recent project aiming to "make trivially easy the process of grabbing any distribution off CPAN, stuffing it in a local git repository and, once gleeful hacking has been perpetrated, sending back patches to its maintainer": Git::CPAN::Patch.
