Some time ago I wrote about my assessing of Using Git versus using SVN in the company that I work for. It’s been about an year this month since we’ve successfully adopted and moved all the company’s source code to Git and GitHub in particular. I wanted to talk about the process and the issues we had in doing this. Before I begin I must say that I and probably the whole team already are pretty convinced that moving to DVCS and Git is one of the best things that we did. I helped us so much with our feature development that the problems we first had with it were quickly forgotten.
In my previous post that I mentioned above I’ve laid out the problems we had. Now I’ll explain how we work now – one year later.
The company I work for is product company with a couple of web projects(products). Our development team is about 7-8 people in total(including me) and each person is responsible for the development of 1 main feature at a time. He’s also maybe supporting other feature’s he’s built that are live or not actively developing at the moment.
For this way of development we have one main repository where we keep most of our code. The repository is a single one for all our projects(not one for each project) due to the fact that we have a couple of libraries that are shared between all the projects and the most painless scheme to support for developing/sharing the libraries between the projects is all the projects being in one repository.
That said, the way we use the repository that makes us super flexible is by having one branch for each of the features we’re developing. That gives us the flexibility that when we work on a feature, let’s call it Feature-X, we’re able to develop it on Branch-X by the developer in charge. When someone else(a colleague or a manager) needs to see what it’s going with Feature-X he could just switch the branches, load the project in Visual Studio and hit F5. There is always one ‘master’ branch where the ‘live’ version is and that give us the ability to quickly fix a bug and deploy to production without anyone having to do any extra work. It’s awesome. And super flexible.
Anyway I wanted to talk about the challenges we had when integrating Git in our work process.
There is one particular challenge with core .NET developers – they don’t trust technology that is not coming from Microsoft. And Git is not a Microsoft technology. It something that is open-source and “free” and not-to-be-trusted. For example there is no good Visual tool for working with Git, there is no good integration with Visual Studio(like a plugin), and the official GitHub For Windows app is very nice but sadly still pretty limited in its capabilities. So the best way to use and learn Git is to work with the console. Which my teammates felt like a step back rather than a step forward in our tooling.
Frankly there are a number of nice open source Git GUI applications out there the best in my opinion is Git Extensions. But there is a big challenge with these applications. If you come from an SVN or TFS experience background and you don’t know what’s going on with Git - you could get VERY frustrated with the visual applications. They give you all the power of all the Git commands at your fingertips but when you don’t know what the commands do… well - it becomes a mess, trust me. So the best way to learn git is by using it through a console. That way - when you need something done, you'll look up the command you need, learn what it does and then use it. And that way you’ll know Git a little bit better.
Well, to be frank - there are couple of things that are common everyday work which is Commit, Push to server, Pull from server and Switch Branches. The GitHub For Windows app that I mentioned above does these three things very well so I use a combination of that GitHub app for the basic operation along with command line for all the advanced merging, rebasing, cherry-pick etc.
The Learning Curve
Talking about merging, rebasing, cherry-pick… Well GIT is not like SVN… Git is really simple but if you don’t ‘get’ it – it become really hard. There are a lot of commands and a lot of stuff you need to understand if you don’t want to bang your head against a wall. It’ll be a lot easier if there is a person in your team(or in your company) who could explain the theory behind DVCS and graph representation of commits and branches and what is Git doing behind the scenes with those hash keys on each commit… well there’s a lot of stuff. But it’s a lot simpler than what it sounds like.
If you’re want to learn more about Git there are a couple of materials that I found are very good.
Ohhh… the line endings. This is one of the most frustrating things in git – dealing with the different line endings for the files in your repository. The problem here is that Unix, Mac OS and Linux operating systems use LF for denoting the end of a line, while Windows uses CLRF. That results in very frustrating situations in Git where you haven’t changed a file but nevertheless git picks it up as a changed file and wants you to commit it.
To fix the problem with line endings there is a simple fix that you must do before you make a repository and that is to setup a .gitattributes file in your repository with the proper configuration.
Here’s the .gitattributes of one of my projects:
The paid version
I have to note also that there are a couple of companies that offer premium DVCS for the Enterprise and help you deal with all the challenges that I talk about in here, but they come at a price – $25-$30 for person/per month. That is not so much, considering the salaries and overall expenses that a company has for a single developer, but… there are different companies and for some it’s acceptable and for some it’s not. Maybe for more developers this price adds up or some companies just don’t want to have to depend on such a company for the core of the business – the source code. But it’s good to know that there is this kind of premium solutions on the market.
As a conclusion I have to say that yes – transferring to a DVCS is not an easy thing. It takes time and effort to get it right. But it’s TOTALY worth it. It gives you the freedom to work as you like on the features you’re developing – use branching and merging without any fear and not have to conform with the tools you use(and you don’t need to shout in the room – “Everyone – no more commits because I’m merging ”).
I’ve gotten used to the easiness of work that much, that actually cannot imagine working in an non-DVCS environment so I strongly recommend it to every company out there. And even if your company doesn't want to transfer to a DVCS source control – go ahead and make an account at GitHub or BitBucket for your personal project. It’s a good experience and you never know when you’re going to need it or how it’s going to change your life.
A while ago I wrote about how to import an existing SVN repository to GitHub. The approach I used there describes how to import a folder with its full revision history. That however may not be needed or wanted in a lot of scenarios(for example - the SVN repository being too old). In that case we could do a partial import – import the changes made in the last year or a couple of months or the last “x” revisions of the SVN.
In that case we must do a couple of things:
- We must decide from which revision number on we want to import the history. You could check the revision numbers and dates in the “Show Log” option in TortoiseSVN or a similar one depending on the SVN software you use. We’ll call the revision number BeginNum.
- We must find the latest revision number. We’ll call it EndNum.
- We must change the “svn clone” command we used in my previous post adding a “–r “ option:
git svn clone –r BeginNum:EndNum –A <path-to-users.txt> <SVN-path-to-clone>
And that is it! Git will now import all the revisions beginning from revision number BeginNum to revision number EndNum without the ones before BeginNum.
A few days ago I blogged about Why bigger teams need Git/Mercurial. Now I’ll be showing how to import a simple SVN folder to a repository in GitHub.
Disclaimer: The method I’m describing here is the very basic way of importing a normal SVN folder to a GitHub repository. There may be more advanced configurations(if you used branching and tagging) of importing your SVN folder to Git which require different “git svn” command parameters. You can find more information about the different other options here.
The logical steps of the import are:
- Make a users.txt file with all the users that committed to the SVN folder.
- Clone the SVN repository to a local folder with “git svn”.
- Create a GitHub repository.
- Make the GitHub repository a “remote” to the cloned SVN repository.
- “Pull” the changes from the GitHub repository locally.
- “Push” the local repository with the merged changes and SVN history to GitHub.
We’ll now look at the specifics of each step.
For the import we’ll be needing a text file with a list of the users who contributed – who made the commits in the folder in SVN. The list should have the information for the username in SVN, real name of the person and the email address of the person. It should look like that:
After that we open a PowerShell in an empty folder and clone the SVN repository with the following command:
git svn clone –-no-metadata –A c:\path\to\users.txt svn://url-to-svn-server/path/to/folder
The command should look similar to this on your screen, given that you change the paths to the SVN and local users.txt file:
Then we can open the newly created folder with
After that we must make an GitHub repository. I’ll call the one I’ll be using “Git-SVN-Clone”. Then via the following command in PowerShell we can make a “remote” to the newly created GitHub repository so we would be able to push the current local repository with its history from SVN to GitHub. We’ll add the “remote” with the name of “server”:
git remote add server https://github.com/asapostolov/Git-SVN-Clone.git
In my case the remote repository url is “https://github.com/asapostolov/Git-SVN-Clone.git” but you can find yours in the page of you repository in GitHub:
After we’ve added a remote named “server” we must pull(pull is “fetch and merge”) the remote’s changes(which in our case is the initial commit for creating the repository on the server) to the local folder.
git pull server master
And then we must push the local changes to the server with:
git push --set-upstream server master
And that’s all. We now have a GitHub repository with all the history of SVN commits in it. In just a few lines of commands.
An issue came up, a while ago, in the company I work for. The issue was that the development team was expanding. It was expanding because some of the projects got in later and more mature stages of development, which is a natural thing.
The expansion of the teams led to a interesting thing however – using SVN was becoming a pain in the ass. What do I mean by that?
Before the expansion - every project in the company had one developer who was working on it. That meant a linear development of the projects – if someone finishes a functionality, bug or feature, the project easily becomes ready-to-publish. With the introduction of more than one developer on the project - the company was able to build more than one feature per project at the same time. That process however broke the linear progression of the projects. When one person finished their feature – the second person could be half way through his, so if you need to publish the new feature you either have to wait for everyone to end their work, or hide the unfinished functionality.
I want to clarify that the way the company is using SVN is to have a folder for each project and commit your changes to that folder. A feature is a series of commits. It’s pretty standard actually and I think a lot of people use SVN that way.
In this context I began searching for solutions. And I actually found several solutions. The more interesting are:
- Introduction of branches in SVN – when you need to make a big feature – you make a branch and do the feature. Then you merge with the main branch. After a bit of research on the matter however I found that this is not such a good option. Why? Well when merging in SVN, SVN tries to clash the two versions of the file you’re merging and combine them in one. So if someone has changed that file a little bit and you changed it – you get a merge conflict. Now imagine a three or four weeks of work of one man trying to clash in three or four weeks of work of another. Disaster. This leads to the second option.
- Working with no commits. I mean you update frequently and you commit when you’re ready with your feature. However – this means no one would be able to see your progress and no one will be able to help you with the feature. Also if you do something wrong at the end of the feature – you won’t be able to ‘revert’ your mistake. And it feels wrong too.
- Use a Distributed source control system(Git or Mercurial) instead of Subversion. I knew the distributed source control systems were better, but didn’t know why actually. So I began researching on the subject and found some great articles like the one from Joel Spolsky and this question in stackexchange, and a few others.
The big difference between SVN and Git\Mercurial and the solution of our problems was in the way the two source control systems worked. SVN tracks the VERSIONS of files and when merging - tries to unite the two versions you’re merging into one. Distributed source control systems like Git and Mercurial track the CHANGES in your files and when they begin merging they try to apply and combine every change you’ve made in your file with the changes the other people made to the file. That means that in theory – if you move a method and someone else changes the method contents without moving it – the source control should be able to figure that out and give you the result – the method moved and changed – without a conflict. That makes the merging of branches possible with very few conflicts. Nice. And that enables a lot of people working on the same features and changing the same files with a little effort and less pain.
It seems using a distributed source control system means you can SCALE your development very efficiently. Very nice.
I think there may be a fourth solution to the problem – trying to implement a part of Scrum agile methodology – the Sprint. We set a target for a sprint and make changes(commits) throughout that sprint and in the end of the sprint we have a stable version of the system which we publish. However that requires significant change in the processes of the company – of how we work, communicate, estimate and plan etc. It’s a lot more work and a lot more risk associated with this option.
After all we chose to use a distributed source control system – Git and particularly GitHub. It seems to have so much less pain associated with it compared to SVN.
However there is still the price of introducing it to the whole team, importing the source code from SVN and learning how to use Git PROPERLY. After all the source control is just a tool in the toolbox of a developer. And like every tool – if you don’t use it right it could give you a lot of headaches. But I think the whole effort will be well worth it at the end.