deborah: The management regrets that it was unable to find a Gnomic Utterance that was suitably irrelevant. (gnomic)
deborah ([personal profile] deborah) wrote in [site community profile] dw_dev 2012-04-29 02:40 pm

Git, Mercurial, github, bitbucket

I want to spin off a new post from the log of last night's IRC developer meeting. The topic of GitHub came up in the meeting, and some concerns with that idea have been raised in the comments of the previous post. [personal profile] vlion's concerns largely address the difference between mercurial and git, whereas [profile] kareila's concerns also address that difference but touch incidentally on the hypothetical benefit of working in the more public environment of Github.

I was talking to [personal profile] allen and he pointed out that there are really two different issues in play here, because we can go to a shared, public, relatively popular, FLOSS-friendly environment without ever leaving mercurial, namely, Bitbucket.

I'd actually say there are three questions:
  1. Are there benefits to git over mercurial, and if so, are those benefits enough to outweigh the cost of switching to a new source control system?
  2. Would we like to move our source control management to a public, shared, FLOSS-friendly environment? If so, why? Do we think it would be more friendly to our current developers, do we think it would make it easier to bring in new developers, some combination of the two, or something else?
  3. If we want to move to a shared environment, do we feel that there is a strong reason that it should be Github? What are those reasons, if so? If we think git is worse than mercurial, but we do think there's a benefit to moving to Github, which reason should prevail?


Actually, we should probably add a fourth question, which is "would any of our needs be better served by using mercurial more in the fashion for which it was intended?"

Keep in mind when I write these questions that I use github for other projects and like it, and I have never used mercurial intensely enough to have strong feelings about it either way. Personally I fell in love with Perforce at an early date and find all other VCS systems to be its pale yet free imitations. But I do think that if we make a switch like this, these are the questions we need to answer.
pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)

[personal profile] pauamma 2012-04-29 07:10 pm (UTC)
Ewww. Perforce. :-)

Other than that, what are you and allen doing in my brain? :-) (Time of the first meeting made it impossible for me to attend, but I definitely intend to revisit those questions at the 2nd meeting.)
exor674: Computer Science is my girlfriend (Default)

[personal profile] exor674 2012-04-29 11:10 pm (UTC)
I think part of the problem is that mercurial's branching/merging is inferior, so we'd still have to do weird things.
pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)

[personal profile] pauamma 2012-04-30 07:35 am (UTC)
I think part of the problem is that there's no visible, objective set of requirements (what we need or want of a VCS, and why we need or want that) that we can judge statements like "mercurial's branching/merging is inferior" against. Is there such a document that you and/or Mark can pass around so we can go over it at the next meeting?
kareila: (Default)

[personal profile] kareila 2012-04-30 12:25 pm (UTC)
Cosigned.

kareila: Rosie the Riveter "We Can Do It!" with a DW swirl (dw)

[personal profile] kareila 2012-04-30 12:31 am (UTC)
I agree these are the questions we need to be asking ourselves - thanks for articulating them.

I think the issue of using branches in Mercurial ties nicely into your question four.

[personal profile] alexbayleaf 2012-04-30 11:23 am (UTC)
My only real thought on this is that "give me a link to your github" has become almost standard in hiring in open source land. I love the idea that DW is giving our developers skills they can use in their careers, especially developers who might be coming into software development via non-traditional channels (eg. not via a comp sci degree), and there's definitely something to be said for making DW developers' work more visible in that way.

Wait, I just realised I have another point. The Perl community at large uses github extensively, so cross-fertilisation with them would be a plus (from both directions: our developers would be better equipped to submit patches to other Perl projects, and external Perl developers might be drawn into DW development). Similarly, the OTW uses github for the AO3, so skills gained on DW would be transferable to that project and vice versa, which can only be a good thing for both projects.
baggyeyes: Bugs Bunny and the Bull (barcode)

[personal profile] baggyeyes 2012-04-30 04:21 pm (UTC)
Excellent points. I've been trying to learn PHP and Drupal - the Drupal community just adopted Git after years of using CVS. They aren't on Github, but a lot of their developers swear by it.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2012-04-30 07:10 pm (UTC)
First, it's useful to note that comparing VCS systems is much like asking which is better: emacs or vim. Everybody has an opinion, it's hard to reach agreement, and at the end of the day they can all do the job of editing files. :-)

Re: moving to a hosted solution (whether that is Bitbucket, GitHub, or Launchpad):

* More open and community oriented. GitHub and similar tools have really helped push contributions up and lowered the barriers to entry. You can look at things online without having to clone the repository somewhere.

* The online UI is much easier to use and look at than command line. Particularly for people who aren't quite as knowledgeable about the tools. A web page gives people a lot of help in understanding what is going on and how to proceed, whereas a command line gives you no real help and lets you do terrible things to your repository.

* I don't really want to be running all this. Managing the commit system is kind of annoying and it needs some work and I haven't really put the time into it. I'd rather not have it hosted by some individual when there are great options out there.

Primary reasons for GitHub, for me:

* I feel that GitHub has become the market winner in easy to use, approachable, well documented online code repository systems. This means that people who become familiar with it also gain skills that are broadly useful in other projects. I.e., learning to do Dreamwidth development means you can easily drop in on any project that uses GitHub and start doing development. This is in keeping with our training ethos.

* The available GUI tools for interacting with git repositories and GitHub are pretty good. There are even iOS applications (and probably Android ones too) that help you remotely use GitHub. It's very convenient and easy. Git Tower, GitHub's application, gitx, etc etc. People are really building great tools for git since it's becoming so popular.

* The company behind GitHub spends a lot of time doing education. They have tutorial videos, documentation, and have optimized their site for ease of use. It's very approachable by newbies. They really emphasize this and they do a good job at it.

* git itself is an extremely powerful system. I've been using it in open source and professionally for years now and I think it has won compared to hg and others.

Now as to GitHub versus BitBucket versus whatever:

* Honestly, I think that this comes down to personal preference and experience. Every platform has some advantages and some disadvantages. In this case, though, it's clear that GitHub has become the market winner -- so it's in our best interests to go there.

* As mentioned above, many Perl projects have gone in the direction of git. This knowledge and familiarity with the ecosystem will help our developers become productive on other projects.

Finally, the largest argument I've heard against git is that some of our volunteers have gotten used to using Mercurial Queues. IMO, git rebase --interactive gives you 90% of that functionality and should be a good replacement for most people who need that power.

And if not -- there's nothing saying you can't use hg locally with mq and then just generate a diff file to upload. You can still use the tool locally if you want to.
pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)

[personal profile] pauamma 2012-05-06 07:36 pm (UTC)
2 questions:
- For a developer (or a tester, or a reviewer), how much of the workflow (synchronizing/merging in changes, preparing a patch or a series of patches, pushing/uploading those patches) will actually use the web-based UI you mention, and how much will still use the command line from the developer/tester/reviewer's dreamhack?
- If (as it sounds from what you're saying), you're planning to deprecate the "upload patch to Bugzilla" step of the workflow, what replaces it, and how does that let a reviewer see which patches are waiting for a review and attach said review to the patches?
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2012-05-06 07:45 pm (UTC)
With GitHub, you don't have to make patches anymore. The process morphs to look something like this:

1) You make your changes in your local repo on your hack or whatever environment you use. You use a separate branch for each bug/feature you're working on.

2) When you're ready to submit, you push your changes up to your forked repository on GitHub. This is done with git push with some arguments and is really easy.

3) On GitHub's web site, you submit a pull request to the main Dreamwidth repository.

4) Admins can then review your pull request and request changes and/or merge it in to the base repo.

5) You close out your branches. Or keep them open if you want to do more work on this feature/bug -- you can submit another pull request later!
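
For anyone who wants to see the command-line half of those steps spelled out, here is a minimal sketch. It only builds and prints the git commands rather than running them. The helper name pull_request_commands, the branch name bug-1234, the commit message, and the remote name origin (assumed to point at your fork) are all illustrative assumptions, not anything Dreamwidth has settled on.

```python
import shlex

# Hypothetical helper: builds (but does not run) the git commands for the
# branch-per-bug workflow. Branch, message, and remote names are assumptions.

def pull_request_commands(branch, message, remote="origin"):
    """Return the command-line part of the workflow as argument lists."""
    return [
        # Step 1: create a separate branch for this bug/feature.
        ["git", "checkout", "-b", branch],
        # Step 1: commit changes to already-tracked files on that branch.
        ["git", "commit", "-a", "-m", message],
        # Step 2: push the branch up to your fork.
        ["git", "push", remote, branch],
    ]


for command in pull_request_commands("bug-1234", "Fix the widget"):
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(command))
```

Steps 3 to 5 (opening the pull request, review, and merging) happen on the GitHub web site, so they have no command in this sketch.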

For more information on pull requests, and why they're so awesome:

http://help.github.com/send-pull-requests/

They really are amazing. You can upload code, discuss it, upload more code, preview things, bounce it back and forth, and eventually merge it in when it's ready to go. It's brilliant.

I wrote up a thing on how to use git flow with GitHub here:

http://qq.is/article/git-flow-on-github

It's pretty easy. There is no "create a patch" and upload files. You make your changes in a branch, commit them, and create a pull request. It's pretty easy.
pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)

[personal profile] pauamma 2012-05-06 11:40 pm (UTC)
How can a reviewer (who may or may not be an admin, whatever that means for a git or github repo) know which pull requests have been submitted but not yet merged?

allen: "Badass Dreamwidth Dev" on a green background (dwdev)

[personal profile] allen 2012-05-01 12:44 am (UTC)
I do like the idea of moving to a hosted repository, if only so it would be easy to fork for larger coding projects. Would be nice to be able to point to somewhere public to test out an in-progress feature. For that matter, it would be nice to be able to make a shared fork so that more than one developer could easily work on a single feature/bugfix.

I've not really used git, so I can't comment on the relative advantages/disadvantages of it versus Mercurial.

I will say that while git and GitHub are very popular right now, Mercurial is better known among our most important constituency (our current devs).
fu: Close-up of Fu, bringing a scoop of water to her mouth (Default)

[personal profile] fu 2012-05-08 07:53 am (UTC)
Agree entirely on the first paragraph. I want those, and think that the long-term benefits will outweigh the pain of moving.

Re: devs, it's not quite so cut and dried. Our devs are split between knowing cvsreport.pl -d (which has its own sets of problems) and Mercurial. On the other hand, the ones who actually use Mercurial are also more often the ones that are more active as devs.
foxfirefey: A wee rat holds a paw to its mouth. Oh, the shock! (myword)

[personal profile] foxfirefey 2012-05-08 04:27 pm (UTC)
Since cvsreport.pl -d is in the process of going away anyway, those folks will have to be changing no matter what we go with!
vlion: cut of the flammarion woodcut, colored (Default)

[personal profile] vlion 2012-05-01 01:50 am (UTC)
As I noted before, I work full-time as a version control support guy. I am, in fact, an expert... or serve as one. Feel free to laugh. I don't quite believe it myself. Most of that is foxfirefey's fault for getting me into hg years ago.

I actively use github, bitbucket, and will dork with other systems from time to time, not to mention my daily job (supporting two mercurial installs for the company as well as misc. ClearCase work).

Github absolutely is the wave of F/OSS and that should not be discounted. That alone weighs toward using it. Other considerations abide:

- git Windows support has always been awful for everyone I've dealt with (bar one chap who loves Git Extensions)

- git is a much sharper and two-edged tool. People gonna screw up. Let's not make that too easy.

- hg is what people know. This is a *big* deal, particularly when you are not a source control geek.

- hg gives its power out incrementally.

- bitbucket is distinctly a second-class site compared to github.

---

My thoughts on (1)

I do *not* believe there are technical reasons to choose git over hg or hg over git, aside from storage space being slightly better in git. Either will work, modulo fiddling and fussing.

And for (2); It's very nice seeing things simply THERE on a webapp page. I believe it does enhance existing devs and help provide a nice public face. There might be very good reasons not to use a shared hosting environment, but I can't think of any for an open source project.

(3) Github is the best right now, and IMO, it is likely to stay that way. This should be the last item perhaps in the decision chain, for reasons I give below.


---


Reviewing dw-free's repo, I note it's very SVN-looking.

I would suggest that DW devs think about using a branching sort of model. Right now it 'looks' as if patches are sent in, which does not leverage the power of hg as much as it might.

I am a big proponent of branchy development. Here's why. Say you are working away for a month on your su-weet patch. Patch is rocking, yo. You have squashed it and mq'd it and it is sitting there ready to be sent up. Night before you send it out, your hard drive dies. And you ain't got backups. You lost yo' work! Bad news. You go home and have a sad.

On the third hand, in a branchtastic model, you make your branch (aka fork on github basically), and you can send your commits up every night. Then, at the end, you can tell the project maintainer, "BRANCH READY MERGE IN". It's more robust. If you have a bad case of real life, people can see what's going on and what version of the code it's tied to. Or it will survive machine failure. The flip side is that your history is Out There to get embarrassed by. Well... yeah. :-/


---

Deborah is right in that tools need to come second. First, what problems are there? Second, how do you solve them? Third, what tools support them? Fourth, what tools support the tools?

fu: Close-up of Fu, bringing a scoop of water to her mouth (Default)

[personal profile] fu 2012-05-03 10:45 am (UTC)
I would like to switch to a more branch-y model. You're right we haven't been using mercurial as a DVCS, or at least we haven't as much as we could. And we really should!

The flip side of embarrassing history out there is that embarrassing dangerous code in a small chunk where the chunks are distributed out over several weeks is easier to notice and correct than embarrassing dangerous code hidden in one big chunk that's just come out. Also, I... man hopefully if I make a mistake, and correct it, that makes it easier for someone newer to the project to not worry so much about their own mistakes, because mistakes happen, even among senior devs.



I'd like to pick your brain about branching if you don't mind. Do you have any experience with mercurial branching? One big reason I'm gravitating toward Git (personal opinion right now, not to be construed as DW's official direction) is that we've had a lot of trouble getting the Mercurial branching to work for relatively long-lived, but not permanent, feature branches for large projects which will eventually get merged back into the main branch.

The most common suggestion I've seen has been to clone a new repository and do all the work there, but it's not easy to switch which repository's code will be used by the web server. I can see how the cloned repo thing would work for, say, a command line tool but I'm having a harder time figuring out how to apply it to DW code/environment. (The pending death of cvsreport.pl *may* make this point moot? But I think it won't)


vlion: cut of the flammarion woodcut, colored (Default)

[personal profile] vlion 2012-05-03 03:11 pm (UTC)
Yes, we branch very heavily at my job.

Clone branches are, imo, mega-sketchy due to a variety of issues. hg bookmarks are very similar to git branches.

Let's find each other on im and chat about it.

vlion.geek@aim, online evenings pacific time usa

foxfirefey: Fox stealing an egg. (mischief)

[personal profile] foxfirefey 2012-05-03 09:03 pm (UTC)
Ironic side note: I told you about Mercurial because I learned about it working on Dreamwidth. FULL CIRCLE!
fu: Close-up of Fu, bringing a scoop of water to her mouth (Default)

[personal profile] fu 2012-05-04 07:37 am (UTC)
One thing I'd like to do is figure out a workflow that would make it easy to:

1) branch off big features like https://bitbucket.org/anall/dw-free-work/changesets, and let someone work on them separate from the flow of day-to-day commits
2) merge those big features back into the main repo (branch?)
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2012-05-06 07:57 pm (UTC)
That link you gave illustrates one of the most annoying things about hg for me: merge, merge, merge! It's one of the real powers of git that it can do rebasing, which makes for a much, much cleaner thing.

So -- let's say someone is doing lots of development in their branch foobar and later they want to bring in changes from master... well, in hg and such, you can merge master in to foobar and that's great, but then you have this commit showing that you merged changes in, and it's kind of messy.

Let's imagine commits like this:

master: A, B, C.

At that point you create your branch and start doing some work:

foobar: A, B, C, D, E.

But meanwhile, people are still committing to master:

master: A, B, C, F.

Now you want to merge that change, F, into your foobar branch. In hg and similar systems, you can do it by creating commit G so it looks like this:

foobar: A, B, C, D, E, G.

But in reality, G doesn't contain any of your code. You end up with this branch showing that you made some changes, then you merged in a bunch of stuff. Diffs can get annoying now.

With git, you can instead do a rebase. In essence, this takes changes that you have made and puts them on top of the other changes. This ends up with your branch looking like this:

foobar: A, B, C, F, D, E.

Now you have a perfect repository with a perfect history with no messy merge commits. It's beautiful, easy to read, and exactly what you want. At the end of the day, nobody cares that you have worked on this for a month and merged things back and forth -- that's messy. What we care about is that you took the repository and you made your commit on top of it.
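
To make the two histories concrete, here is a toy Python sketch. It treats a branch as a plain ordered list of commit labels, which is an assumption made purely for illustration (real git and hg store a graph of commits), and the function names merge_in and rebase_onto are hypothetical. Note that in the rebased result the branch's own commits, D and E, land after F, because a rebase replays them on top of master.

```python
# Toy model: a branch is an ordered list of commit labels.
# Illustration only; real git and hg store a graph of commits.

def merge_in(branch, merge_label):
    """Model a merge: the branch keeps its commits and gains one merge commit."""
    return branch + [merge_label]


def rebase_onto(branch, upstream):
    """Model a rebase: replay the branch's own commits on top of upstream."""
    shared = 0
    # Count the commits both histories have in common at the start.
    while (shared < len(branch) and shared < len(upstream)
           and branch[shared] == upstream[shared]):
        shared += 1
    own_commits = branch[shared:]
    return upstream + own_commits


master = ["A", "B", "C", "F"]
foobar = ["A", "B", "C", "D", "E"]

print(merge_in(foobar, "G"))        # ['A', 'B', 'C', 'D', 'E', 'G']
print(rebase_onto(foobar, master))  # ['A', 'B', 'C', 'F', 'D', 'E']
```

The merged list carries the extra commit G, while the rebased list contains only master's commits followed by your own.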

The interactive rebase even lets you do lots of ugly checkpoint commits that let you say "blah blah, typo" and not worry about it. Then before you submit your pull request, you do a rebase and squash all of your checkpoints into the one final commit with a great commit message.

You then submit this in the pull request, and instead of 45 different commits you made over the last month, you submit one commit that has the changes for your feature. It's easier to review, easier to go back and read when you look at the repository, and generally just a win for clarity.
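
The squash step can be pictured with another toy sketch. It assumes each checkpoint commit is a (message, changed files) pair; the squash function, the commit messages, and the file names are all hypothetical, and real git combines the actual diffs rather than just a list of file names.

```python
# Toy model of squashing: many checkpoint commits become one commit.
# Each commit is a (message, changed_files) pair; all names are made up.

def squash(commits, final_message):
    """Combine all checkpoint commits into a single commit with one message."""
    changed = set()
    for _message, files in commits:
        changed.update(files)
    # The checkpoint messages are dropped; only the final message survives.
    return (final_message, sorted(changed))


checkpoints = [
    ("start feature", ["cgi-bin/Example.pm"]),
    ("blah blah, typo", ["cgi-bin/Example.pm"]),
    ("add template", ["views/example.tt"]),
]

print(squash(checkpoints, "Add example feature"))
```

The reviewer sees one commit touching two files, not three commits with throwaway messages.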
pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)

[personal profile] pauamma 2012-05-06 10:57 pm (UTC)
That's what MQ is for.

fu: Close-up of Fu, bringing a scoop of water to her mouth (Default)

[personal profile] fu 2012-05-08 07:59 am (UTC)
Trying to figure this out. When you squash, is the commit history still in your branch, just not in the master, or does it all go away?

crschmidt: (Default)

[personal profile] crschmidt 2012-05-05 03:25 pm (UTC)
I know that I'm reiterating some things that others have already said, but I do want to say that although I am personally opposed to Github for a number of reasons (among them the fact that it is itself proprietary), I do feel strongly that GitHub is a huge win for open source development projects.

Note that I think that Git is a *bad* thing in general for increasing developer involvement: it is full of sharp edges that are confusing at best, and actively harmful at worst. (hg is better in this regard, in my personal experience, but I've used git more, so it has more chances to shoot me in the foot.)

However, Github:
- Hides much of the pain. By having UI-level options for forking, the distributed model is not just encouraged, it's de facto; many people use github effectively without having any idea on how git really works or is meant to work.
- Is the clear winner on 'network effect' -- huge swaths of the open source community already have GitHub accounts, which removes obstacle #1 to contributing development to any code base.

Despite being against Git, and GitHub, I helped a project transition from Subversion to GitHub in September of last year. Since then, we have seen a huge number of new developers participating in the development of the project; the project in question is now reported on Github as having more than 140 'forks' where people have made changes to the code, and that's excluding the hundred more or so pull requests that have been integrated into trunk and are no longer forks.

Git is painful. GitHub hides much of the pain, tries to do outreach to solve the remaining pieces, and makes many parts of interacting with an open source project trivially simple in a way that nothing else does. As good as the other tools may be, I haven't seen any evidence that other tools compete when it comes to the massive success that Github has had in getting open source development opened up to a massively larger number of people.
mark: A photo of Mark kneeling on top of the Taal Volcano in the Philippines. It was a long hike. (Default)

[staff profile] mark 2012-05-06 07:49 pm (UTC)
Yeah, I totally agree that git is more painful than hg and other options. For me though, the real wins from git come because it has that power and flexibility. I love and abuse rebase and it's really something amazing that I can't live without anymore, and to my knowledge, hg doesn't come close to giving you that power.

That said, I think that git is ultimately going to disappear underneath management tools that people use and people will hopefully not use git itself. It's a nightmare of usability, so I don't want us to be using git directly much -- git-flow + github is the winning combination for me, not git itself.