Git, Mercurial, GitHub, Bitbucket
I want to spin off a new post from the log of last night's IRC developer meeting. The topic of GitHub came up in the meeting, and some concerns with that idea have been raised in the comments of the previous post.
vlion's concerns largely address the difference between Mercurial and git, whereas karelia's concerns also address that difference but touch incidentally on the hypothetical benefit of working in the more public environment of GitHub.
I was talking to allen and he pointed out that there are really two different issues in play here, because we can go to a shared, public, relatively popular, FLOSS-friendly environment without ever leaving Mercurial: namely, Bitbucket.
I'd actually say there are three questions:
- Are there benefits to git over Mercurial, and if so, are those benefits enough to outweigh the cost of switching to a new source control system?
- Would we like to move our source control management to a public, shared, FLOSS-friendly environment? If so, why? Do we think it would be more friendly to our current developers, do we think it would make it easier to bring in new developers, some combination of the two, or something else?
- If we want to move to a shared environment, do we feel that there is a strong reason that it should be GitHub? What are those reasons, if so? If we think git is worse than Mercurial, but we do think there's a benefit to moving to GitHub, which reason should prevail?
Actually, we should probably add a fourth question, which is "would any of our needs be better served by using Mercurial more in the fashion for which it was intended?"
Keep in mind when I write these questions that I use GitHub for other projects and like it, and I have never used Mercurial intensely enough to have strong feelings about it either way. Personally I fell in love with Perforce at an early date and find all other version control systems to be its pale yet free imitations. But I do think that if we make a switch like this, these are the questions we need to answer.
no subject
Other than that, what are you and allen doing in my brain? :-) (Time of the first meeting made it impossible for me to attend, but I definitely intend to revisit those questions at the 2nd meeting.)
no subject
What are our problems, what are our goals?
Then, what tools would serve those needs?
(I have no problem with saying that the features of the available tools should be helping to drive our problem statements. 10 years ago I never would have said "public repository that makes it really easy to share code with other people in open source" is a feature, because while there was SourceForge, SourceForge never made any of that easy.)
no subject
I think the issue of using branches in Mercurial ties nicely into your question four.
no subject
Wait, I just realised I have another point. The Perl community at large uses GitHub extensively, so cross-fertilisation with them would be a plus (from both directions: our developers would be better equipped to submit patches to other Perl projects, and external Perl developers might be drawn into DW development). Similarly, the OTW uses GitHub for the AO3, so skills gained on DW would be transferable to that project and vice versa, which can only be a good thing for both projects.
no subject
Re: moving to a hosted solution (whether that is Bitbucket, GitHub, or Launchpad):
* More open and community oriented. GitHub and similar tools have really helped push contributions up and lowered the barriers to entry. You can look at things online without having to clone the repository somewhere.
* The online UI is much easier to use and look at than the command line, particularly for people who aren't quite as knowledgeable about the tools. A web page gives people a lot of help in understanding what is going on and how to proceed, whereas a command line gives you no real help and lets you do terrible things to your repository.
* I don't really want to be running all this. Managing the commit system is kind of annoying and it needs some work and I haven't really put the time into it. I'd rather not have it hosted by some individual when there are great options out there.
Primary reasons for GitHub, for me:
* I feel that GitHub has become the market winner in easy to use, approachable, well documented online code repository systems. This means that people who become familiar with it also gain skills that are broadly useful in other projects. I.e., learning to do Dreamwidth development means you can easily drop in on any project that uses GitHub and start doing development. This is in keeping with our training ethos.
* The available GUI tools for interacting with git repositories and GitHub are pretty good. There are even iOS applications (and probably Android ones too) that help you remotely use GitHub. It's very convenient and easy. Git Tower, GitHub's application, gitx, etc etc. People are really building great tools for git since it's becoming so popular.
* The company behind GitHub spends a lot of time doing education. They have tutorial videos, documentation, and have optimized their site for ease of use. It's very approachable by newbies. They really emphasize this and they do a good job at it.
* git itself is an extremely powerful system. I've been using it in open source and professionally for years now and I think it has won compared to hg and others.
Now as to GitHub versus Bitbucket versus whatever:
* Honestly, I think that this comes down to personal preference and experience. Every platform has some advantages and some disadvantages. In this case, though, it's clear that GitHub has become the market winner -- so it's in our best interests to go there.
* As mentioned above, many Perl projects have gone in the direction of git. This knowledge and familiarity with the ecosystem will help our developers become productive on other projects.
Finally, the largest argument I've heard against git is that some of our volunteers have gotten used to using Mercurial Queues. IMO, git rebase --interactive gives you 90% of that functionality and should be a good replacement for most people who need that power.
And if not -- there's nothing saying you can't use hg locally with mq and then just generate a diff file to upload. You can still use the tool locally if you want to.
no subject
- For a developer (or a tester, or a reviewer), how much of the workflow (synchronizing/merging in changes, preparing a patch or a series of patches, pushing/uploading those patches) will actually use the web-based UI you mention, and how much will still use the command line from the developer/tester/reviewer's dreamhack?
- If (as it sounds from what you're saying) you're planning to deprecate the "upload patch to Bugzilla" step of the workflow, what replaces it, and how does that let reviewers see which patches are waiting for a review and attach said review to the patches?
no subject
1) You make your changes in your local repo on your hack or whatever environment you use. You use a separate branch for each bug/feature you're working on.
2) When you're ready to submit, you push your changes up to your forked repository on GitHub. This is done with git push with some arguments and is really easy.
3) On GitHub's web site, you submit a pull request to the main Dreamwidth repository.
4) Admins can then review your pull request and request changes and/or merge it in to the base repo.
5) You close out your branches. Or keep them open if you want to do more work on this feature/bug -- you can submit another pull request later!
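As a rough sketch of what steps 1, 2, and 5 look like on the command line (steps 3 and 4 happen on GitHub's web site): the branch name `bug-1234`, the file name, and the commit messages below are made up for illustration, and a local bare repository stands in for your GitHub fork so the script runs without an account or network access. With a real fork, `origin` would point at your fork's GitHub URL instead.

```shell
#!/bin/sh
set -eu

# Scratch area so the sketch runs without GitHub or a network connection.
work=$(mktemp -d)

# Stand-in for your fork on GitHub (assumption: normally a github.com URL).
git init -q --bare "$work/fork.git"

# Stand-in for the repository on your hack.
git init -q "$work/hack"
cd "$work/hack"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.org"
git remote add origin "$work/fork.git"

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q origin master

# Step 1: a separate branch for the bug you're working on.
git checkout -q -b bug-1234
echo "fix" >> file.txt
git commit -q -a -m "Fix bug 1234"

# Step 2: push the branch up to your fork.
git push -q origin bug-1234

# Steps 3 and 4 (pull request, review, merge) happen on the web site.

# Step 5, once the pull request has been merged, would be:
#   git checkout master
#   git branch -d bug-1234
#   git push origin --delete bug-1234
```

The only commands a developer types day to day are the `checkout -b`, `commit`, and `push` lines; everything else here is scaffolding for the sketch.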
For more information on pull requests, and why they're so awesome:
http://help.github.com/send-pull-requests/
They really are amazing. You can upload code, discuss it, upload more code, preview things, bounce it back and forth, and eventually merge it in when it's ready to go. It's brilliant.
I wrote up a thing on how to use git flow with GitHub here:
http://qq.is/article/git-flow-on-github
It's pretty easy. There is no "create a patch and upload files" step. You make your changes in a branch, commit them, and create a pull request.
no subject
uploaded patches are waiting for review / pull requests have been submitted but not yet merged?
no subject
I've not really used git, so I can't comment on the relative advantages/disadvantages of it versus Mercurial.
I will say that while git and GitHub are very popular right now, Mercurial is better known among our most important constituency (our current devs).
no subject
Re: devs, it's not quite so cut and dried. Our devs are split between knowing cvsreport.pl -d (which has its own sets of problems) and Mercurial. On the other hand, the ones who actually use Mercurial are also more often the ones that are more active as devs.
no subject
I actively use github, bitbucket, and will dork with other systems from time to time, not to mention my daily job (supporting two mercurial installs for the company as well as misc. ClearCase work).
GitHub absolutely is the wave of F/OSS and that should not be discounted. That alone weighs toward using it. Other considerations abide:
- git Windows support has always been awful for everyone I've dealt with (bar one chap who loves Git Extensions)
- git is a much sharper and two-edged tool. People gonna screw up. Let's not make that too easy.
- hg is what people know. This is a *big* deal, particularly when you are not a source control geek.
- hg gives its power out incrementally.
- bitbucket is distinctly a second-class site compared to github.
---
My thoughts on (1)
I do *not* believe there are technical reasons to choose git over hg or hg over git, aside from storage space being slightly better in git. Either will work, modulo fiddling and fussing.
And for (2); It's very nice seeing things simply THERE on a webapp page. I believe it does enhance existing devs and help provide a nice public face. There might be very good reasons not to use a shared hosting environment, but I can't think of any for an open source project.
(3) GitHub is the best right now, and IMO, it is likely to stay that way. This should perhaps be the last item in the decision chain, for reasons I give below.
---
Reviewing dw-free's repo, I note it's very SVN-looking.
I would suggest that DW devs think about using a branching sort of model. Right now it 'looks' as if patches are sent in, which does not leverage the power of hg as much as it might.
I am a big proponent of branchy development. Here's why. Say you are working away for a month on your su-weet patch. Patch is rocking, yo. You have squashed it and mq'd it and it is sitting there ready to be sent up. Night before you send it out, your hard drive dies. And you ain't got backups. You lost yo' work! Bad news. You go home and have a sad.
On the third hand, in a branchtastic model, you make your branch (aka fork on GitHub basically), and you can send your commits up every night. Then, at the end, you can tell the project maintainer, "BRANCH READY MERGE IN". It's more robust. If you have a bad case of real life, people can see what's going on and what version of the code it's tied to. Or it will survive machine failure. The flip side is that your history is Out There to get embarrassed by. Well... yeah. :-/
---
Deborah is right in that tools need to come second. First, what problems are there? Second, how do you solve them? Third, what tools support them? Fourth, what tools support the tools?
no subject
The flip side of embarrassing history out there is that embarrassing dangerous code in a small chunk where the chunks are distributed out over several weeks is easier to notice and correct than embarrassing dangerous code hidden in one big chunk that's just come out. Also, I... man hopefully if I make a mistake, and correct it, that makes it easier for someone newer to the project to not worry so much about their own mistakes, because mistakes happen, even among senior devs.
I'd like to pick your brain about branching if you don't mind. Do you have any experience with mercurial branching? One big reason I'm gravitating toward Git (personal opinion right now, not to be construed as DW's official direction) is that we've had a lot of trouble getting the Mercurial branching to work for relatively long-lived, but not permanent, feature branches for large projects which will eventually get merged back into the main branch.
The most common suggestion I've seen has been to clone a new repository and do all the work there, but it's not easy to switch which repository's code will be used by the web server. I can see how the cloned repo thing would work for, say, a command line tool but I'm having a harder time figuring out how to apply it to DW code/environment. (The pending death of cvsreport.pl *may* make this point moot? But I think it won't)
no subject
Clone branches are, imo, mega-sketchy due to a variety of issues. hg bookmarks are very similar to git branches.
Let's find each other on im and chat about it.
vlion.geek@aim, online evenings pacific time usa
no subject
1) branch off big features like https://bitbucket.org/anall/dw-free-work/changesets, and let someone work on them separate from the flow of day-to-day commits
2) merge those big features back into the main repo (branch?)
no subject
So -- let's say someone is doing lots of development in their branch foobar and later they want to bring in changes from master... well, in hg and such, you can merge master into foobar and that's great, but then you have this commit showing that you merged changes in, and it's kind of messy.
Let's imagine commits like this:
master: A, B, C.
At that point you create your branch and start doing some work:
foobar: A, B, C, D, E.
But meanwhile, people are still committing to master:
master: A, B, C, F.
Now you want to merge that change, F, into your foobar branch. In hg and similar systems, you can do it by creating commit G so it looks like this:
foobar: A, B, C, D, E, G.
But in reality, G doesn't contain any of your code. You end up with this branch showing that you made some changes, then you merged in a bunch of stuff. Diffs can get annoying now.
With git, you can instead do a rebase. In essence, this takes changes that you have made and puts them on top of the other changes. This ends up with your branch looking like this:
foobar: A, B, C, F, D, E.
Now you have a perfect repository with a perfect history with no messy merge commits. It's beautiful, easy to read, and exactly what you want. At the end of the day, nobody cares that you have worked on this for a month and merged things back and forth -- that's messy. What we care about is that you took the repository and you made your commit on top of it.
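To make the difference concrete, here is a small sketch that builds the A through F history in a scratch repository and then tries both approaches on copies of foobar. The helper function `commit` and the branch names `foobar-merged` and `foobar-rebased` are invented for the demo. Note that after the rebase, D and E are replayed on top of F, so F sits below them in the history.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.org"

# Hypothetical helper: record one commit that adds a file named after it.
commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

# master: A, B, C
commit A; commit B; commit C

# foobar: A, B, C, D, E
git checkout -q -b foobar
commit D; commit E

# Meanwhile, master moves on: A, B, C, F
git checkout -q master
commit F

# Option 1: merge master into a copy of foobar. This records merge commit G.
git checkout -q -b foobar-merged foobar
git merge -q -m "G" master
merge_count=$(git rev-list --merges --count foobar-merged)

# Option 2: rebase a copy of foobar onto master. D and E are replayed after F.
git checkout -q -b foobar-rebased foobar
git rebase -q master
rebase_merge_count=$(git rev-list --merges --count foobar-rebased)
rebased=$(git log --reverse --format=%s foobar-rebased | xargs)

echo "merge commits after merging:  $merge_count"
echo "merge commits after rebasing: $rebase_merge_count"
echo "rebased history: $rebased"
```

The merged branch carries one merge commit and the rebased branch carries none, which is the "no messy merge commits" point. The trade-off is that the rebased D and E are new commits with new hashes, so this is best done before the branch has been shared widely.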
The interactive rebase even lets you do lots of ugly checkpoint commits that let you say "blah blah, typo" and not worry about it. Then before you submit your pull request, you do a rebase and squash all of your checkpoints into the one final commit with a great commit message.
You then submit this in the pull request, and instead of the 45 different commits you made over the last month, you submit one commit that has the changes for your feature. It's easier to review, easier to go back and read when you look at the repository, and generally just a win for clarity.
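A minimal sketch of that squash, assuming a feature branch called `my-feature` (a made-up name) with three checkpoint commits on top of master. Normally `git rebase --interactive master` opens the todo list in your editor and you change `pick` to `squash` by hand; here `sed` makes that edit (using `sed -i` as in GNU sed) so the script can run unattended.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.org"

echo "base" > feature.txt
git add feature.txt
git commit -q -m "Initial commit"

# Feature branch with ugly checkpoint commits.
git checkout -q -b my-feature
for msg in "start feature" "blah blah, typo" "more typos"; do
    echo "$msg" >> feature.txt
    git commit -q -a -m "$msg"
done

# Keep the first "pick" in the todo list and turn the rest into "squash".
# GIT_EDITOR=true accepts the combined commit message without prompting.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -q --interactive master

# Replace the combined message with one good commit message.
git commit -q --amend -m "Add my feature"

count=$(git rev-list --count master..my-feature)
echo "commits that would go into the pull request: $count"
```

All three checkpoints end up as a single commit on top of master, and the file contents are unchanged by the squash.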
no subject
Note that I think that Git is a *bad* thing in general for increasing developer involvement: it is full of sharp edges that are confusing at best, and actively harmful at worst. (hg is better in this regard, in my personal experience, but I've used git more, so it has more chances to shoot me in the foot.)
However, Github:
- Hides much of the pain. By having UI-level options for forking, the distributed model is not just encouraged, it's de facto; many people use GitHub effectively without having any idea of how git really works or is meant to work.
- Is the clear winner on 'network effect' -- huge swaths of the open source community already have GitHub accounts, which removes obstacle #1 to contributing development to any code base.
Despite being against Git, and GitHub, I helped a project transition from Subversion to GitHub in September of last year. Since then, we have seen a huge number of new developers participating in the development of the project; the project in question is now reported on GitHub as having more than 140 'forks' where people have made changes to the code, and that's excluding the hundred more or so pull requests that have been integrated into trunk and are no longer forks.
Git is painful. GitHub hides much of the pain, tries to do outreach to solve the remaining pieces, and makes many parts of interacting with an open source project trivially simple in a way that nothing else does. As good as the other tools may be, I haven't seen any evidence that other tools compete when it comes to the massive success that GitHub has had in getting open source development opened up to a massively larger number of people.
no subject
That said, I think that git is ultimately going to disappear underneath management tools, and people will hopefully not use git itself. It's a nightmare of usability, so I don't want us to be using git directly much -- git-flow + GitHub is the winning combination for me, not git itself.