KARPACH

WEB DEVELOPER BLOG

How to migrate from Subversion to Git with almost no downtime?


Last year I was in charge of the SVN to Git migration at the company I work for. We wanted to migrate the history as well; in our case, that was about 40,000 revisions made over the last 8 years. To minimize developers’ downtime, I did a lot of scripting and preparation ahead of time. The actual switch from SVN to Git took less than 2 hours. Here are the steps we took.

1. Retrieve a list of all committers

Subversion records only the username of each committer, while Git expects a name and an email address, so you need a list of everyone who has committed to the SVN repo, converted to the Git format. To retrieve the list from SVN, create a new folder, right-click it and select Git Bash Here to open a Git command window, then run the following command:

svn log http://url/to/svn/repository -q | awk -F '|' '/^r/ {sub("^ ", "", $2); 
sub(" $", "", $2); 
print $2" = "$2" <"$2">"}' | sort -u > users.txt

Note: this can take a couple of minutes, depending on the size of your repository, the number of commits, and the number of committers.
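To try the awk step without touching the SVN server, here is a minimal sketch that wraps it in a function reading svn log -q output from standard input. The function name committers_from_log and the sample log lines are made up for illustration; the awk program itself is the one from the command above.

```shell
#!/bin/sh
# Sketch: the awk step from the command above, wrapped in a function that
# reads `svn log -q` output on stdin so it can be tried on sample text.
committers_from_log() {
    awk -F '|' '/^r/ {
        sub("^ ", "", $2)   # strip the space after the first "|"
        sub(" $", "", $2)   # strip the space before the second "|"
        print $2 " = " $2 " <" $2 ">"
    }' | sort -u
}

# Hypothetical sample of `svn log -q` output (names and dates are made up).
committers_from_log <<'EOF'
------------------------------------------------------------------------
r3 | vkarpach | 2013-05-03 10:00:00 -0400 (Fri, 03 May 2013)
------------------------------------------------------------------------
r2 | jdoe | 2013-05-02 09:30:00 -0400 (Thu, 02 May 2013)
------------------------------------------------------------------------
r1 | vkarpach | 2013-05-01 09:00:00 -0400 (Wed, 01 May 2013)
------------------------------------------------------------------------
EOF
```

Only lines starting with r (the revision lines) are used, and sort -u collapses repeated committers into one entry each.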

The text file has one line per committer. Each line needs to be changed from vkarpach = vkarpach <vkarpach> to vkarpach = Viktar Karpach <vkarpach@company.com>.
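If everyone's address follows the username@domain pattern, the email half of that edit can be scripted. This is only a sketch under that assumption: add_email_domain is a hypothetical helper name, and the full names still have to be filled in by hand.

```shell
#!/bin/sh
# Sketch: append an e-mail domain to every "<username>" placeholder.
# Assumes all addresses look like username@domain.
# Possible usage (not run here):
#   add_email_domain company.com < users.txt > users-with-email.txt
add_email_domain() {
    domain=$1
    # Only rewrite placeholders that do not already contain an "@".
    sed "s/<\([^@>]*\)>\$/<\1@${domain}>/"
}
```

Lines that already carry a real address are left alone, so the helper can be re-run after manual edits.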

2. Clone the repository using git-svn

Note: this step takes hours to complete, so I suggest running it overnight on a dedicated box. Run the following command to convert the SVN repository to a Git repository:

git svn clone --stdlayout --no-metadata -A users.txt http://url/to/svn/repository dest_dir-tmp

3. Make a copy of this folder

git svn clone takes a lot of time. For our main project, it took 48 hours for about 18000 commits. Make a copy of this folder, so you don’t need to do it again. Then create scripts for the next steps, so that you can switch quickly when you are ready.
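One way to make repeated dry runs cheap is to always work on a throwaway copy of the clone. The sketch below assumes that approach; fresh_working_copy and the dest_dir-work folder name are hypothetical, and dest_dir-tmp is the clone from step 2.

```shell
#!/bin/sh
# Sketch: recreate a scratch copy of the pristine git-svn clone before each
# dry run of the migration scripts.
fresh_working_copy() {
    pristine=$1
    work=$2
    # Refuse to run with missing arguments or a source that is not a clone.
    [ -n "$pristine" ] && [ -n "$work" ] || { echo "usage: fresh_working_copy SRC DST" >&2; return 1; }
    [ -d "$pristine/.git" ] || { echo "not a git clone: $pristine" >&2; return 1; }
    rm -rf "$work"
    cp -Rp "$pristine" "$work"
}

# Possible usage (not run here):
#   fresh_working_copy dest_dir-tmp dest_dir-work && cd dest_dir-work
```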

4. Fetch the latest commits

The team continued to use Subversion until the very last moment, so while I was working on the migration scripts I had to fetch the latest commits from time to time:

git svn fetch
git reset --hard trunk

5. Clean up the repository

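git svn imports SVN tags as remote branches. Convert them to Git tags and delete the remote tag branches:

for t in $(git branch -r | grep 'tags/' | sed 's_tags/__'); do
    # Tag the parent (^) so the tag skips the commit that created the tag in SVN
    git tag "$t" "tags/$t^"
    git branch -d -r "tags/$t"
done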
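Before running a loop that deletes refs, it can help to preview the commands it would issue. The sketch below does that with plain text processing on git branch -r output; tag_commands is a hypothetical name, and it assumes the remote refs are named tags/<name> with no remote prefix, as in the loop above.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands the tag-conversion loop would issue.
# Reads `git branch -r` output on stdin.
tag_commands() {
    grep 'tags/' | sed 's_tags/__' | while read -r t; do
        printf 'git tag %s tags/%s^\n' "$t" "$t"
        printf 'git branch -d -r tags/%s\n' "$t"
    done
}

# Possible usage (not run here):
#   git branch -r | tag_commands
```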

Delete trunk, since we will use master from now on.

git branch -d -r trunk

Remove SVN references:

git config --remove-section svn-remote.svn
rm -rf .git/svn .git/{logs/,}refs/remotes/svn/

And finally, convert the remaining remote branches to local branches:

git config remote.origin.url .
git config --add remote.origin.fetch +refs/remotes/*:refs/heads/*
git fetch

Remove remote branches:

for t in `git branch -r` ; do
    git branch -d -r $t
done

Git doesn’t allow spaces in branch names, so git svn replaced them with %20. I think an underscore looks better than %20:

for t in $(git branch -a | grep '%20'); do
    newName=$(echo "$t" | sed 's/%20/_/g')
    git branch -m "$t" "$newName"
done
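As with the tags, the renames can be previewed before anything is changed. In this sketch rename_commands is a hypothetical helper; the replacement character is a parameter that defaults to an underscore, so pass a different one if you prefer.

```shell
#!/bin/sh
# Sketch: print (do not run) the renames for branches containing "%20".
# Reads `git branch` output on stdin; $1 is the replacement (default "_").
rename_commands() {
    replacement=${1:-_}
    # Drop the two-character prefix ("* " marks the current branch).
    sed 's/^[* ] //' | grep '%20' | while read -r old; do
        new=$(printf '%s\n' "$old" | sed "s/%20/${replacement}/g")
        printf 'git branch -m %s %s\n' "$old" "$new"
    done
}

# Possible usage (not run here):
#   git branch | rename_commands
```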

You might want to delete some unused branches:

for t in `cat ../list_of_branches_for_deletion.txt`; do 
    git branch -D $t
done

Here list_of_branches_for_deletion.txt contains the names of the branches to delete. Use the following command to populate this file:

git branch -a > ../list_of_branches_for_deletion.txt

Then manually edit list_of_branches_for_deletion.txt and leave only the branches that you want to delete.
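One thing to watch for: git branch -a marks the current branch with a leading *, and an unquoted * in a for loop is expanded by the shell into file names. A small filter, sketched below under the hypothetical name clean_branch_list, reduces the output to one plain branch name per line before it goes into the file.

```shell
#!/bin/sh
# Sketch: normalise `git branch -a` output into one plain branch name per line.
clean_branch_list() {
    # 1. drop the two-character prefix ("* " marks the current branch)
    # 2. drop symbolic-ref lines such as "remotes/origin/HEAD -> origin/master"
    # 3. drop empty lines
    sed -e 's/^[* ] //' -e '/ -> /d' -e '/^$/d'
}

# Possible usage (not run here):
#   git branch -a | clean_branch_list > ../list_of_branches_for_deletion.txt
#   while read -r t; do git branch -D "$t"; done < ../list_of_branches_for_deletion.txt
```

Reading the file with while read instead of for also avoids word splitting on the names.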

6. Replace any svn externals with git submodules

git submodule add ssh://git@git.company.com:7999/ProjectName/external_repo.git ExternalFolderName
git commit -m "Added submodules"

Only use git submodules for external projects that don’t change very often. We had to combine our internal projects in one git repository since it is hard to maintain submodules for rapidly changing projects. Each project gets its own directory in the git repository:

Before migration:

svn_main_project
	external_1
		external_1_folder_1
		external_1_folder_2
	external_2
		external_2_folder_1
		external_2_folder_2
	svn_main_project_folder_1
	svn_main_project_folder_2

Here svn_main_project has two externals: external_1 and external_2.

After migration:

git
    svn_main_project
        svn_main_project_folder_1
        svn_main_project_folder_2
    external_1
        external_1_folder_1
        external_1_folder_2
    external_2
        external_2_folder_1
        external_2_folder_2	

You can use the following bash script to move everything into sub_folder, so that you can combine the repositories later. The script rewrites the commit history as well, so past commits also show the files under sub_folder.

git filter-branch --index-filter \
    'git ls-files -s | sed "s-\t\"*-&sub_folder/-" |
        GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
            git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE" || true' HEAD
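The sed expression is the least obvious part of that filter: it inserts the folder name right after the tab that separates the mode, hash, and stage from the path in git ls-files -s output. Here is a minimal sketch of just that step, so it can be tried on a sample line. prefix_paths is a hypothetical name, the sample entry is made up, and a literal tab is used in place of \t so it does not depend on GNU sed.

```shell
#!/bin/sh
# Sketch: the path-rewriting step of the index filter, in isolation.
# Reads `git ls-files -s` style lines on stdin and prefixes every path
# with the folder given as $1 (which must not contain a "-").
prefix_paths() {
    prefix=$1
    tab=$(printf '\t')
    # Match the tab plus an optional opening quote, then append the prefix.
    sed "s-${tab}\"*-&${prefix}/-"
}

# Hypothetical index entry: mode, blob hash, stage, TAB, path.
printf '100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tREADME.txt\n' |
    prefix_paths sub_folder
```

Paths that Git prints in double quotes keep their opening quote in front of the prefix, which is why the pattern allows for it.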

7. Get your repository onto the server

Create a repository on your git server.

Initialize a local repository:

git init

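Use the following if you are combining repositories, repeating it for each one. With Git 2.9 or later, every pull after the first needs --allow-unrelated-histories, because the repositories share no common commit: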

git remote add external_1 ../external_1/
git pull external_1 master
git remote rm external_1

Add a .gitignore file:

cp ../gitignore.txt .gitignore
git add .
git commit -m "Added .gitignore"

Push all branches in one shot:

git remote add origin ssh://git@git.company.com:7999/repo.git
git push --all origin
Posted on January 27, 2014 by