ROSEdu Techblog

Application process for the Community Development Lab

ROSEdu — Mon, 23 Mar 2015 00:00:00 UT

Published on March 23, 2015 by Alex Palcuie
Tagged: algorithms, ROSEdu, write-up, JSON

The Community and Development Lab is a traditional yearly ROSEdu project in which we teach students how to start contributing to open source software. This year we had 117 applicants and could select only 19 of them. To do this, we gave them an algorithms problem, which filtered the field down to 60 candidates who were invited to an interview.

In the Community and Development Lab students have the chance to participate in 9 weekly sessions during which they can learn real industry skills. Every week there is a 2-hour presentation on a topic such as Linux, Git, Python, OOP or Raspberry Pi, and then for another 2 hours they stay with an assigned mentor and write patches for open source projects.

To select the best students, we gave them an ACM-style algorithms problem to solve on the Infoarena judge. They had to code their solution in C, C++ or Java and submit it online; the judge would run it against 10 tests, checking the output and measuring the time and memory used by their implementation. Most of the students who tried to solve this problem were in their 1st or 2nd undergraduate year in Computer Science or Computer Engineering in Bucharest.

Problem Statement

You can read the Romanian version on Infoarena

Ada, Calin and Andrei got bored of learning algorithms at their University and want to learn more practical stuff. To do this, they have decided to apply at ROSEdu CDL. However, the organisers cannot separate the applicants, so they decided to give an algorithms problem for them to solve. Luckily, you don’t need lots of knowledge about time complexities.

You are given a JSON file that contains a list of objects. Every object contains a list of entries of key-value type, where the value can be a string or an integer. You have to transform it into a CSV.

Restrictions

Every JSON line will contain maximum 1,024 characters.
You have maximum 0.1s of time for every test on a dual core 2.93GHz
You have maximum 4,906 KB of memory for each test

Example input:

[{
"id": 1,
"language": "Ruby",
"usage": "Mainly by hipsters.",
"power": 4
}, {
"id": 2,
"language": "Python",
"usage": "Computer scientists and some wannabe hipsters.",
"power": 2
}, {
"id": 3,
"language": "C++",
"usage": "Hardcore people who love dangling pointers.",
"power": 100
}, {
"id": 4,
"language": "Haskell",
"usage": "A lonely dude in Massachusetts.",
"power": 999999
}]

Example output:

id,language,usage,power,
1,Ruby,Mainly by hipsters.,4,
2,Python,Computer scientists and some wannabe hipsters.,2,
3,C++,Hardcore people who love dangling pointers.,100,
4,Haskell,A lonely dude in Massachusetts.,999999,

However, since we want to simulate a real life problem better, the JSON file won’t be formatted with the same whitespace. But, we guarantee it will be correct.

[ { "name": "Ruby on Rails", "commits": 49507, "contributors": 429,
"last commit" : "an hour ago" }, {"name": "jQuery", "commits":  5745,
"contributors" : 213, "last commit":  "4 days ago" }, {"name": "React",
"commits" : 3557,  "contributors": 288, "last commit": "5 hours ago"} ]

name,commits,contributors,last commit,
Ruby on Rails,49507,429,an hour ago,
jQuery,5745,213,4 days ago,
React,3557,288,5 hours ago,

We also guarantee that:

there are no nested objects
a string surrounded by quotes will only contain alphanumeric characters
every object has the same keys
the keys will be in the same order

Solutions

I originally thought of the solution to this problem as a finite-state machine. You keep a single pointer, go through the characters one by one, and for each one you decide whether to print it, print a comma or do nothing. You first do this to print the first row of the CSV with the column names, then reset the pointer to the top of the file and traverse it again, printing the values. My solution is here and Ada Solcan helped me with a more beautiful version here.
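A minimal sketch of the two-pass idea described above, not the solution linked in the post. It assumes the guarantees from the problem statement (no nested objects, no escaped quotes inside strings, identical keys in identical order), at least one key per object, and an output buffer at least as large as the input. The names json_to_csv and emit_object are made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Scan one object starting at its '{'. Append either its keys or its
 * values to *out, each followed by a comma, then a newline.
 * Returns a pointer just past the closing '}'. */
static const char *emit_object(const char *p, int want_keys, char **out)
{
    char *o = *out;
    int in_string = 0;   /* currently between two double quotes */
    int is_value = 0;    /* 0: left of ':', 1: right of ':' */

    for (++p; *p != '\0'; ++p) {
        char c = *p;
        int wanted = want_keys ? !is_value : is_value;

        if (in_string) {
            if (c == '"')
                in_string = 0;
            else if (wanted)
                *o++ = c;
        } else if (c == '"') {
            in_string = 1;
        } else if (c == ':') {
            is_value = 1;
            if (want_keys)
                *o++ = ',';          /* a key has just ended */
        } else if (c == ',' || c == '}') {
            is_value = 0;
            if (!want_keys)
                *o++ = ',';          /* a value has just ended */
            if (c == '}') {
                ++p;
                break;
            }
        } else if (wanted && !want_keys && !isspace((unsigned char)c)) {
            *o++ = c;                /* character of a bare integer */
        }
    }
    *o++ = '\n';
    *out = o;
    return p;
}

/* First pass prints the header from the first object, second pass
 * prints one row of values per object. */
void json_to_csv(const char *json, char *out)
{
    const char *p = strchr(json, '{');

    if (p != NULL)
        emit_object(p, 1, &out);
    while (p != NULL) {
        p = emit_object(p, 0, &out);
        p = strchr(p, '{');
    }
    *out = '\0';
}
```

Calling json_to_csv on the second example input from the statement should produce the rows shown there, trailing commas included.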

For generating the tests, I hacked a Python script that generated a JSON. Three tests were special because they had random whitespace. One test was a corner case where there was only one object with lots of keys, and another one had lots of objects with a single key.

The problem gathered 2812 submissions from 158 students.

Statistics about the online submissions

I was expecting more people to fail the problem, but over 50% of the scores are perfect.

The average number of submissions for a perfect score was 11. We had a participant who submitted 124 times.

The most failed test was number 8. What’s special about it is that it contains a single object, and most of the students assume that after the first object ending curly brace, they will have a comma.

A reason why the last 3 tests have the most failing submissions is that they are really big and most of the students preferred to use getchar for I/O. Doing that, you overwhelm the operating system with lots of calls. A better approach is to use a buffer.
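One way to read "use a buffer" is to pull the input in large chunks with fread and hand out characters from memory. The sketch below is only an illustration: the struct reader type, the reader_next function and the 64 KB buffer size are assumptions, not something taken from the submissions.

```c
#include <stdio.h>

#define READER_BUF_SIZE 65536

/* A tiny buffered reader: one fread call refills the whole buffer. */
struct reader {
    FILE *fp;
    size_t pos;                          /* next unread byte in buf */
    size_t len;                          /* number of valid bytes in buf */
    unsigned char buf[READER_BUF_SIZE];
};

/* Return the next character, or EOF when the input is exhausted. */
int reader_next(struct reader *r)
{
    if (r->pos == r->len) {
        r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
        r->pos = 0;
        if (r->len == 0)
            return EOF;
    }
    return r->buf[r->pos++];
}
```

A zero-initialised struct reader with fp set to stdin can then replace getchar calls one for one.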

Also, I’ve estimated the time students spent solving the problem. The average time for any student was 6 hours and 15 minutes, and the average time for a student who successfully passed all the tests was 3 hours and 45 minutes. To estimate the time spent, I added an hour for the first submission, and whenever the difference between 2 consecutive submissions was more than 4 hours, I also added one hour for the later submission. In total, students spent 40 days trying to get all the tests passed.
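The estimation rule can be sketched as follows. This is one interpretation, not the script used for the statistics: it assumes timestamps in minutes, sorted ascending, and that gaps of at most 4 hours count in full while longer gaps count as one hour. The name estimate_minutes is hypothetical.

```c
#include <stddef.h>

/* Estimate the minutes one student spent on the problem from the
 * timestamps of the submissions (in minutes, sorted ascending). */
long estimate_minutes(const long *stamps, size_t n)
{
    const long hour = 60;
    const long max_gap = 4 * 60;
    long total;
    size_t i;

    if (n == 0)
        return 0;
    total = hour;                    /* work done before the first submission */
    for (i = 1; i < n; i++) {
        long gap = stamps[i] - stamps[i - 1];
        /* assumption: a long break counts as one hour of work */
        total += (gap > max_gap) ? hour : gap;
    }
    return total;
}
```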

Hands-on interviews

After we eliminated the students who didn’t score 100 points on the problem and the students who left the “What project are you most proud of?” question blank in their application form, only 60 students remained. We sent a call-to-action to the ROSEdu community and the mentors, and 8 people replied that they could help us with the interviews.

Each interview took 30 minutes. In the first part we asked questions about the non-technical side of the application, then we asked the applicant to talk about how he approached the problem, and finally we asked him the dreaded technical questions.

Technical questions

We would first ask the student to present his solution. Then we would ask what would happen if we modified the problem statement so that keys could contain special characters, like parentheses or colons, and how easy it would be to modify the source code to support those edge cases.

From what I’ve observed, the shorter and cleaner the student’s solution was, the easier it was for him to give us a version adapted to the new requirements.

After these warm-up questions, we would ask him the important one: what would happen if the keys of the objects were not in the same order? For example:

[{
"id": 1,
"language": "Ruby",
"usage": "Mainly by hipsters.",
"power": 4
}, {
"usage": "Computer scientists and some wannabe hipsters.",
"language": "Python",
"id": 2,
"power": 2
}]

All the complexities below assume that comparing two strings takes O(1).

I conducted about 20 interviews. Almost every student would find the O(N^2) algorithm. The solution would be to print the keys of the first object, read the second object, and for every key search for it naively in the first object.

From here, only half of the students would get to a better solution on their own. The first hint I gave was to try and see if having the keys in a certain order might help. Some of them caught the idea, sorted the keys and said that they would now use binary search to find the position of the keys in O(N*log(N)).

Then I might ask them if they knew a data structure with lookups faster than O(log(n)). Most of them knew about hashes and gave the solution with the correct complexity. However, when I asked them how a hash works, they shrugged and had no idea. I then explained to them that a simple implementation of a hash is a long array with a smart hash function.

I must say that there were some smart students who knew what hashes were and how they worked behind the scenes, and who applied them to this problem naturally, without any help. I asked these students why you would sometimes prefer a binary search tree to a hash. The answer is that a BST uses less memory. Another topic of discussion was how you would implement a hash function for this problem. Two students knew that using a base of 26 for the characters of the keys and then taking the result modulo a big prime number would be a simple and elegant solution.
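The hash function those two students described could look roughly like this. The modulus 1000000007 and the choice to feed raw character codes (instead of mapping letters to 0..25) are assumptions made for the sketch, and hash_key is a hypothetical name.

```c
/* Polynomial hash of a key: treat the characters as digits in base 26
 * and reduce modulo a large prime after every step. */
unsigned long long hash_key(const char *key)
{
    const unsigned long long base = 26;
    const unsigned long long mod = 1000000007ULL;   /* assumed prime modulus */
    unsigned long long h = 0;

    for (; *key != '\0'; ++key)
        h = (h * base + (unsigned char)*key) % mod;
    return h;
}
```

The slot in the "long array" is then hash_key(key) modulo the array size, and two keys that land in the same slot still have to be compared as strings.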

All in all, I was surprised by how few students had grasped the concept of a hash and could apply it to the problem, but I had some interesting discussions with some of them.

Acknowledgements

Ada Solcan, Calin Cruceru and Andrei Dinu for organising the whole CDL
Gabriel Ivănică, Alexandru Răzvan Căciulescu, Călin Cruceru, Mihai Brănescu, Nicu Bădescu, Vlad Fulgeanu, Dan Șerban, Iulian Radu and the Wyliodrin team for being mentors and voluntarily teaching the accepted students how to contribute to the open source world
infoarena because they let us host the problem

Here be Dragons - The Interesting Realm of Floating Point Operations

ROSEdu — Sun, 30 Mar 2014 00:00:00 UT

Published on March 30, 2014 by Mihai Maruseac
Tagged: floating point, numerical methods, approximate algorithms, fast transcendental functions, fast inverse square root

In every programmer’s life there comes a time when he has to leave the realm of integers and tread into the dangerous land of rational numbers. He/she might do some scientific computation, or work on a financial application or a game rendering pipeline or even in some artificial intelligence or data-mining algorithm – in all of these cases and many others, restricting oneself to using only integers is no longer feasible.

And, as soon as one starts using floating point a lot of interesting things happen, starting from results which don’t show up nicely and bad equality testing and going towards subtler and subtler bugs.

Even experts and common sense are at fault in this realm. For example, did you know that always comparing two floating point numbers like in the following code is bad?

if (fabs(a - b) < 0.0001)
    do_something_with_equal_numbers(a);

Without being a complete guide, this article shows some of the beauties and dangers of the floating-point realm.

A common pitfall

Beginner programmers expect floating point numbers to act like real fractional numbers: no errors involved. Slightly more experienced programmers know that this is not the case, yet even the most careful and experienced ones make mistakes from time to time. We will focus more on the common pitfalls and not on the occasional mistreatments given by experts.

For example, someone unprepared might write the following code

#include <stdio.h>
#include <stdlib.h>

int main ()
{
    float a = 0.1;
    float b = 0.2;
    float c = a + b;
    if (c != 0.3)
        printf("%f\n%f\n%f\n%f\n", c, a + b, 3 * a, 1.5 * b);
    return 0;
}

and be surprised to see that results are

$ ./a.out
0.300000
0.300000
0.300000
0.300000

Note: Your results on your machine might vary. Later in the article we will discuss this aspect at length.

Of course, the problem in here is pretty simple: all floating point constants use double precision thus the code should at least read

if (c != 0.3f)
    printf(...)

I say at least because even if on my architecture I got the exact value of 0.3, this is not the case on all of them. Why? Because none of the 0.1, 0.2 and 0.3 values have an exact representation in base 2. One can see that by trying to convert the number into base 2. Let’s follow the example of 0.3:

the integral part of 0.3 is 0, so it is also 0 in base 2
double the number, we get 0.6, its integral part is 0 thus the first binary digit after decimal point of 0.3 is still a 0.
double this result, we get 1.2 so the next digit is a 1 and we are left with 0.2
double it, get 0.4, next binary digit is 0
double it, get 0.8, next binary digit is 0
double it, get 1.6, next binary digit is 1 and we’re back to 0.6

Thus, the binary representation of 0.3 would be 0.01001100110011001... Repeating the same algorithm with 0.1 and 0.2 will end in the same loop between 0.2, 0.4, 0.8 and 0.6. So, none of 0.1, 0.2 or 0.3 has an exact representation. Thus, operations with these numbers will in general not give an exact answer.
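The doubling procedure above can be sketched in a few lines. To keep the demonstration itself free of rounding, the fraction is passed as an integer numerator and denominator (3/10 for 0.3); the function name binary_fraction and this interface are assumptions of the sketch.

```c
/* Write the first `digits` binary digits after the point of num/den
 * (with 0 <= num < den) into out, which must hold digits + 1 bytes. */
void binary_fraction(unsigned num, unsigned den, int digits, char *out)
{
    int i;

    for (i = 0; i < digits; i++) {
        num *= 2;                /* "double the number" */
        if (num >= den) {        /* integral part is 1 */
            out[i] = '1';
            num -= den;          /* keep only the fractional part */
        } else {
            out[i] = '0';
        }
    }
    out[digits] = '\0';
}
```

For 3/10 the first ten digits come out as 0100110011, matching the expansion above, and the remainder keeps cycling through 6/10, 2/10, 4/10 and 8/10.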

But, then, why did we get the exact answer in here? The two sensible answers are that either the compiler generates code which uses a higher level of precision than the space reserved for float or the printing routine does hard work to properly display the numbers. We can test these hypotheses using gdb:

$ gdb -q ./a.out 
Reading symbols from /tmp/fps/a.out...done.
(gdb) b main
Breakpoint 1 at 0x400538: file 1.c, line 6.
(gdb) r
Starting program: /tmp/fps/a.out 

Breakpoint 1, main () at 1.c:6
6       float a = 0.1;
(gdb) n
7       float b = 0.2;
(gdb) p a
$1 = 0.100000001
(gdb) n
8       float c = a + b;
(gdb) p b
$2 = 0.200000003

As you can see, printing the values from memory shows that they are not 0.1 and 0.2 but values close to that.

Let’s see now what the assembly code around c = a + b looks like:

(gdb) disass
Dump of assembler code for function main:
   0x0000000000400530 <+0>:     push   %rbp
   0x0000000000400531 <+1>:     mov    %rsp,%rbp
   0x0000000000400534 <+4>:     sub    $0x10,%rsp
   0x0000000000400538 <+8>:     mov    0x142(%rip),%eax        # 0x400680
   0x000000000040053e <+14>:    mov    %eax,-0x4(%rbp)
   0x0000000000400541 <+17>:    mov    0x13d(%rip),%eax        # 0x400684
   0x0000000000400547 <+23>:    mov    %eax,-0x8(%rbp)
=> 0x000000000040054a <+26>:    movss  -0x4(%rbp),%xmm0
   0x000000000040054f <+31>:    addss  -0x8(%rbp),%xmm0
   0x0000000000400554 <+36>:    movss  %xmm0,-0xc(%rbp)
---Type <return> to continue, or q <return> to quit---q
Quit

The last three lines are the assembly instructions generated for float c = a + b (you can check that by running objdump -CDgS a.out | less and searching for float c). a is stored on the stack at -0x4(%rbp) and b at -0x8(%rbp). The instructions used (addss and movss) and the register involved (xmm0) show that we are working with Streaming SIMD Extensions (SSE). This register is 128 bits wide, 4 times the 32 bits used by the float datatype. We are tempted now to think that the computation uses the full width of the register. The SIMD part of the name suggests that this is not the case, but we want real proof based on the memory/register contents.

Continuing the execution, we see:

(gdb) n
9       if (c != 0.3)
(gdb) p $xmm0
$3 = {v4_float = {0.300000012, 0, 0, 0}, v2_double = {5.18894283457103e-315,
0}, v16_int8 = {-102, -103, -103, 62, 0 <repeats 12 times>}, v8_int16 = {
-26214, 16025, 0, 0, 0, 0, 0, 0}, v4_int32 = {1050253722, 0, 0, 0}, v2_int64 =
{1050253722, 0}, uint128 = 1050253722}
(gdb) p c
$4 = 0.300000012

Indeed, our c is not 0.3, and the contents of xmm0 are no closer to the truth.

So the 0.3 in the output is not caused by the 128-bit wide registers but by the printing routine: %f shows only six decimals by default, which rounds 0.300000012 to 0.300000. More generally, printing floating point numbers precisely, an unsolved problem until recently, is no longer one.

The floating point standard

Before we further investigate the realm of floating points, let’s have a look at the standard used for storing and working with these numbers: IEEE-754. We will not go into full detail since we are only interested in some minor aspects.

First of all, the standard defines the way in which we can store a floating point number as three integer numbers: one for the sign (which is always 0 or 1), one for an exponent which gives us access to a wider range than [0..2^32] and one for the mantissa. The final number is just the product of the mantissa, the base (2 in case of binary numbers, 10 in case of decimal numbers – the standard defines some way to store decimal numbers too) raised to the exponent power and (-1) raised to the sign value.

Depending on the sizes of these numbers we have the basic float type (or binary32) in which the total size of the three numbers is 32 bits. In this case 1 bit is reserved for the sign, 8 for the exponent and the other 23 for the mantissa.

The C double type is defined by the binary64 format: 1 bit of sign, 11 bits for the exponent and 52 bits for the mantissa for a total of 64. There is also a binary128 format, in which 15 bits are reserved for the exponent and 112 for the mantissa; whether the C long double type maps to it depends on the platform.
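To make the binary32 layout concrete, here is a small sketch that splits a float into its three fields. It assumes that float is the IEEE-754 binary32 type on the target platform; struct float_fields and split_float are names invented for this illustration.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, stored with an offset of 127 */
    uint32_t mantissa;   /* 23 bits */
};

/* Split a binary32 value into sign, stored exponent and mantissa. */
struct float_fields split_float(float f)
{
    struct float_fields out;
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);   /* copy the raw bit pattern */
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFF;
    out.mantissa = bits & 0x7FFFFF;
    return out;
}
```

For 0.5f this gives sign 0, stored exponent 126 (that is, 126 - 127 = -1) and mantissa 0.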

The standard committee has come up with a clever way of storing these numbers in binary format. For example, they don’t store the exponent in 2’s complement but modified via an offset. Thus, the bit patterns of two adjacent representable floats represent two consecutive integer values. This allows us to do some interesting tricks with the two representations of real numbers.

The standard also defines $\infty$ and $-\infty$, two values for 0 (+0 and -0 and how they should be tested equal but treated differently in operations) and a full sequence of values which don’t represent a number but some exception – the sometimes dreaded NaN values.

Knowing these details about the IEEE-754 standard we can go forward in our exploration. Because from now on we will use the binary representation and won’t rely on the base 10 view of numbers, we will use an online analyzer to investigate interesting values.

Back to the castle and a final conclusion

Returning to our code, we want to see what values are stored in memory for a, b and c and also in register xmm0:

(gdb) x $rbp - 0x4
0x7fffffffdfac: 0x3dcccccd
(gdb) x $rbp - 0x8
0x7fffffffdfa8: 0x3e4ccccd
(gdb) x $rbp - 0xc
0x7fffffffdfa4: 0x3e99999a
(gdb) p/x $xmm0
$4 = {.... uint128 = 0x0000000000000000000000003e99999a}

Looking through the analyzer, 0x3dcccccd (the value for a) is 1.00000001490116119384765625E-1 which is both close to the original value of 0.1 and to the displayed value of 0.100000001. Same for b and c. However, looking at the xmm0 register we see that the last 32 bits have the same pattern as -0xc(%rbp). Thus, the SSE 128 bits registers are not using the binary128 standard! If they were using it, the last value displayed there should have been 3FFD3333333333333333333333333333. As pointed out in the reddit thread for this article, excess precision, where it occurs, comes from the x87 coprocessor, which uses 80 bits of precision.

Now it is time to see some other aspects of working with floating point numbers.

Testing them all

Since there is a perfect isomorphism between float values and int ones and there are only 2^32 ints (on normal architectures), sometimes it is easy and desirable to test a new function on all of the possible values. Unfortunately, this doesn’t properly work for functions with more than one argument because one would have to spend ages for that. But for one single argument things are pretty nice: it only takes 16 seconds on my machine to run the following code which tests that changing the sign twice gives the same value:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned int i = 0;
    float x;

    do {
        x = *((float*)&i);
        if (x != -(-x))
            printf("%f %u\n", x, i);
        i++;
    } while (i != 0);

    return 0;
}

Running it we see:

$ gcc -Wall -Wextra -O0 -g 2.c 
$ ./a.out  | head -n 5
nan 2139095041
nan 2139095042
nan 2139095043
nan 2139095044
nan 2139095045

It seems that our hypothesis fails when the initial number was a NaN value. For now, let us filter all of these values and test the hypothesis on the remaining domain.

$ time ./a.out | grep -v nan

real    0m15.895s
user    0m17.977s
sys 0m0.163s

Nothing is printed, which is what we would have expected.

Note: Compiling with optimisations on might make the compiler issue the following warning:

warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    ^

This is because the C/C++ standards say that the compiler can assume that pointers to different types don’t point to the same memory. Knowing that a pointer to an array of integers and a pointer to an array of doubles don’t overlap opens the way for some optimizations. Break this rule at your own risk. See also the documentation for the -fstrict-aliasing flag of gcc.
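A way to reinterpret the bits without tripping over strict aliasing is to copy them with memcpy, which compilers typically turn into a plain register move. The helper names float_from_bits and bits_from_float are assumptions of this sketch, and it assumes a 32-bit IEEE-754 float.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without pointer casts. */
float float_from_bits(uint32_t bits)
{
    float f;

    memcpy(&f, &bits, sizeof f);
    return f;
}

/* The opposite direction: get the bit pattern of a float. */
uint32_t bits_from_float(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

In the test loop above, x = float_from_bits(i) would then replace the cast-and-dereference expression.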

The NaN problem

You might be wondering why we have so many NaN values (the 5 above are but a small sample of them all). The thing is, the standard allows some NaN values to carry an exception code within them so that the programmer debugging the code can know why he got this value. We will not go into details regarding this aspect though.

A more interesting question is how these NaN values arise. One example is doing asin(1+smth) or sqrt(0-smth_else). You might say: “but I will never do that” to which I will reply that since every floating point operation has some rounding and errors tend to propagate you might on some occasions find yourself doing exactly that.

Now, the question is how to filter out these values from code. The standard states that the NaN values have form s1111111 1axxxxxx xxxxxxxx xxxxxxxx so one might just check the first few bits of the number (s is the sign and is ignored and a is used to differentiate between a quiet NaN and a signalling one while x represent payload bits showing why the signalling NaN was produced). So we change the code to read

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned int i = 0;
    float x;

    do {
        x = *((float*)&i);
        if (x != -(-x))
            printf("%f %u\n", x, i);
        i++;
        if (i > 0x7f800000)
            break;
    } while (i != 0);

    return 0;
}

If you don’t remember the bit pattern you can still filter them out by knowing that all NaN values are required to compare unequal, even to themselves. Thus, a test x == x is always false for NaN values.
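Both filters can be written as small predicates. This is a sketch assuming binary32 floats and a compiler that is not told to ignore NaN semantics (for example through -ffast-math); the names is_nan_bits and is_nan_compare are made up here.

```c
#include <stdint.h>

/* NaN by bit pattern: all exponent bits set and a non-zero mantissa. */
int is_nan_bits(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u
        && (bits & 0x007FFFFFu) != 0;
}

/* NaN by comparison: a NaN is the only value unequal to itself. */
int is_nan_compare(float x)
{
    return x != x;
}
```

Note that 0x7f800000 itself (positive infinity) has a zero mantissa and is therefore not a NaN, which is why the loop above stops only after that value.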

The Associativity Problem

One of the ideas behind this post was this StackOverflow question. We can test this to see on how many floats the output is wrong:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned int i = 0;
    float x, y, z;
    unsigned long long s = 0;

    do {
        x = *((float*)&i);
        y = x * x * x * x * x * x;
        z = (x * x * x);
        z = z * z;
        if (y != z)
            printf("%f %u\n", x, i);
        s += i;
        i++;
        if (i > 0x7f800000)
            break;
    } while (i != 0);

    printf("%llu\n", s);

    return 0;
}

Since we are compiling with -O3 we don’t want the compiler to optimize our loop away. Thus we have an s variable in which we store the sum of all values of i. Also, the code already removes the NaN values. Running it we get:

$ time ./a.out | wc -l
163049703

real    1m58.114s
user    1m59.005s
sys 0m3.148s

That is, 3.79% of all 2^32 bit patterns, or about 7.6% of the values the loop actually tests (it stops after positive infinity), give a different result on this machine when the optimization in question is applied.

Equality testing done right

Finally, we have arrived at an interesting aspect: how do we check whether two floats are almost the same? We already know that doing a comparison with == is bad. Let us now pick two numbers, 10000 and the next representable float, and compare them using the standard method:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main ()
{
    int expectedAsInt = 1176256512;
    int resultAsInt = expectedAsInt + 1;
    float expectedResult = *((float*)&expectedAsInt);
    float result = *((float*)&resultAsInt);

    printf("%f %f\n", result, expectedResult);

    if (fabs(result - expectedResult) < 0.0001)
        printf("Numbers are close\n");

    return 0;
}

The output

$ ./a.out
10000.000977 10000.000000

So the above test fails to consider two floating point numbers which are neighbors as being the same. If your algorithm produced a result between these two floats and it were rounded to the wrong one, you would get the impression that your algorithm is wrong.

Anyway, even if this method was correct, what value should one use for the bound in the test? float.h defines FLT_EPSILON so one might decide to test using that:

#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int closeFloats(float number, float target)
{
    return fabs(number - target) < FLT_EPSILON;
}

static inline float getFloatFromInt(int value)
{
    return *((float*)&value);
}

void testFloatTesting(int src)
{
    float target = getFloatFromInt(src);
    float next = getFloatFromInt(src + 1);

    printf("src=%d target=%f next=%f compare=%d\n", src, target, next,
            closeFloats(next, target));
}

int main ()
{
    /* 0.5 and next float */
    testFloatTesting(0x3F000000);

    /* 1.5 and next float */
    testFloatTesting(0x3FC00000);

    /* 100.5 and next float */
    testFloatTesting(0x42C90000);

    /* 10000.5 and next float */
    testFloatTesting(0x461C4200);

    return 0;
}

A proper closeFloats function is what we are looking for. We use testFloatTesting to test this on two floats which come from two neighboring integers (a more formal definition is floats which differ by 1ULP – units in last place). Running it, we get:

$ ./a.out
src=1056964608 target=0.500000 next=0.500000 compare=1
src=1069547520 target=1.500000 next=1.500000 compare=0
src=1120468992 target=100.500000 next=100.500008 compare=0
src=1176257024 target=10000.500000 next=10000.500977 compare=0

All of the initial numbers were chosen to be exactly representable but this is not vital. What’s interesting is that only the numbers between 0 and 1 show as being close when using the FLT_EPSILON absolute method.

Let’s try now to use a relative error and compare that with FLT_EPSILON:

int closeFloats(float number, float target)
{
    return fabs(number - target) / target < FLT_EPSILON;
}

Using the above gives the following results:

$ ./a.out 
src=1056964608 target=0.500000 next=0.500000 compare=0
src=1069547520 target=1.500000 next=1.500000 compare=1
src=1120468992 target=100.500000 next=100.500008 compare=1
src=1176257024 target=10000.500000 next=10000.500977 compare=1

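We get better results above 1 but a worse one below. At 0.5 the relative distance between the two neighbors is exactly FLT_EPSILON, so the strict comparison fails; moreover, as target gets closer to 0 we get closer to a division by 0. So, don’t use the above method either.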

Let’s try with a third option:

int closeFloats(float number, float target)
{
    float diff = fabs(number - target);
    float largest;

    number = fabs(number);
    target = fabs(target);
    largest = (target > number) ? target : number;

    return diff <= largest * FLT_EPSILON;
}

This time, instead of dividing we use multiplication. Also, to ensure some more safety, we pick the largest absolute value as being the mark around which we compute the relative error. Running this test we finally get:

$ ./a.out
src=1056964608 target=0.500000 next=0.500000 compare=1
src=1069547520 target=1.500000 next=1.500000 compare=1
src=1120468992 target=100.500000 next=100.500008 compare=1
src=1176257024 target=10000.500000 next=10000.500977 compare=1

However, the story is not yet finished. What happens if the FLT_EPSILON is too large a gap in relative error? You might be tempted to say just multiply FLT_EPSILON with 0.1 and be done. Test it and you’ll see that all of the results turn to 0: it is as if we didn’t use any bound at all and tested using ==. So we are thus restricted to having a relative gap no smaller than FLT_EPSILON.

Now, let’s turn to the other side: what if the gap is too small? You can multiply FLT_EPSILON by a small factor greater than 1 for this. However, finding out which value to use is hard because this way of computing the error is not linked at all with the representation of the floating point numbers. So, let’s try using ULPs:

int closeFloats(float number, float target)
{
    int numberULP = *((int *) &number);
    int targetULP = *((int *) &target);

    if ((numberULP >> 31) != (targetULP >> 31))
        return number == target;
    return abs(numberULP - targetULP) < 5;
}

In the above we consider numbers which differ by fewer than 5 ULPs as being close. Also, observe the first check, which tests if the numbers have different signs. If they do, we compare the floating point numbers using == to ensure that we catch the case +0 == -0.

Running it we get:

$ ./a.out
src=1056964608 target=0.500000 next=0.500000 compare=1
src=1069547520 target=1.500000 next=1.500000 compare=1
src=1120468992 target=100.500000 next=100.500008 compare=1
src=1176257024 target=10000.500000 next=10000.500977 compare=1

which was somewhat obvious (since the numbers are already one ULP apart).

Now you might raise one more question: which of the two methods is faster? Let’s test:

void testFloatTesting(int src)
{
    float target = getFloatFromInt(src);
    float next = getFloatFromInt(src + 1);

    if (closeFloats(next, target) != 1)
        printf("src=%d target=%f next=%f compare=%d\n", src, target,
                next, closeFloats(next, target));
}

int main ()
{
    unsigned int i = 0;

    do {
        testFloatTesting(i++);
        if (i > 0x7f800000)
            break;
    } while (i != 0);

    return 0;
}

Using ULP we get these results:

$ time ./a.out

real    0m32.343s
user    0m32.290s
sys 0m0.007s

Using the floating point - relative method we get:

$ time ./a.out | wc -l
4194305

real    1m4.161s
user    1m4.137s
sys 0m0.204s

We seem to be getting some wrong results (about 0.2% of the tested values). Indeed, around 0 both comparison methods fail. The relative error method fails because we are close to dividing by 0 and because of catastrophic cancellation. The ULP method fails because there are many numbers between 0 and FLT_MIN (the minimum properly representable float); these values are denormalized and using them might slow down your computation quite a lot. So, what should we use in this case? It turns out that if you want to compare with 0 the absolute error method is the best.

Also note that on my machine the relative method is twice as slow as the ULP one.

To conclude this part:

when you compare two numbers which are far from 0 (properly representable) use either the relative error method (with multiplication) or the ULP one, depending on which is faster (on machines with SSE this would most certainly be the ULP one).
when comparing a number against 0 use the absolute error method
in all other cases take care to split the comparison into the above two cases
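The three rules above can be combined into a single helper, sketched here under some assumptions: the name nearly_equal, the two tolerance parameters and the decision to try the absolute test first are choices made for this illustration, not part of the measurements above.

```c
#include <float.h>

static float abs_float(float x)
{
    return x < 0 ? -x : x;
}

/* Close if the absolute difference is tiny (covers values around 0),
 * otherwise fall back to the relative test with multiplication. */
int nearly_equal(float a, float b, float abs_tol, float rel_tol)
{
    float diff = abs_float(a - b);
    float largest = abs_float(a) > abs_float(b) ? abs_float(a) : abs_float(b);

    if (diff <= abs_tol)
        return 1;
    return diff <= largest * rel_tol;
}
```

A call such as nearly_equal(x, y, 1e-6f, FLT_EPSILON) treats 10000 and its neighboring float as close and still accepts two tiny values near 0; suitable tolerances depend on the computation being checked.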

Determinism, Correctness and Fastness

Up to this point, this article focused on the correctness aspect of floating point operations where by correctness one means giving results as close as possible to the real truth. Not mentioned in here but on the same topic we have the field of numerically stable algorithms and the entire mathematics/CS branch of numerical analysis.

However, there is another aspect which needs to be considered. We have already noted in this article that the results you get might differ depending on the architecture you use. And indeed, neither the IEEE nor the C/C++ standards define what precision should be used for intermediate computations. Even though the IEEE-754-2008 standard says “Together with language controls it should be possible to write programs that produce identical results on all conforming systems”, this is just a possibility, not yet mandated across architectures.

When is this important? Three domains come to mind: games (network games and game replays), research (reproducibility), cloud computing (migration of live virtual machines). All of them are important enough to make this problem an interesting one.

There are settings which change the rounding mode, the handling of denormals or of exceptions. There are a lot of flags to control and you can find them all described in the fenv.h header. These values are per-thread but they might change if you call a library function which has the side effect of modifying one of these flags without changing it back to the previous value (another strong point of referential immutability).

Finally, floating point results might also change depending on the compilation flags passed (-ffast-math) or even depending on whether you are running your code inside a debugger or in production mode. We’ll leave this topic by giving a link to a comprehensive article about it. If one really needs reproducible floating point results, Streflop or even MPFR might be used.

Now, let’s turn to the third topic: fastness. It turns out that all floating point operations are slow. To alleviate this problem several CPU extensions were introduced – that’s why we have SSE. But it turns out that we can do even better than that if we leave some room for some errors.

Games and Artificial Intelligence use quite a lot of floating point operations with transcendental functions (sin, log, exp). These have been the subject of optimizations over time. We have the fast-inverse-square-root trick as a powerful example of that. We have fast approximations of the exponential function, which is commonly used in neural networks and radial basis functions. And we even have libraries ([1], [2]) dedicated to optimizing the speed of these functions to the detriment of precision. At first look, all of these look like clever algorithms with a lot of magical constants which arise from (seemingly) nowhere. However, most of them are simply applications of numerical methods for computing roots of equations (the Newton-Raphson method is used in Carmack’s trick) or of series expansions of the functions involved, coupled with clever uses of the integer representation of the floating point number. Describing these algorithms would take an article twice as long as this one, so we won’t do it now. However, keep in mind Knuth’s saying:

Premature optimization is the root of all evil

Don’t just go and replace all of your transcendental calls from libm with calls from one of the libraries bent on optimizing the speed of some floating point operations. Check first if this is exactly what you want and if the errors stemming from the approximations have no impact on your code/results.
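
To make the trade-off concrete, here is a sketch of the fast-inverse-square-root trick mentioned above, assuming 32-bit IEEE-754 floats. The function name fast_rsqrt is ours; 0x5f3759df is the well-known constant popularized by the Quake III source code. This is an illustration of the idea, not a recommendation to use it.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive, normal x.
 * Assumes float is a 32-bit IEEE-754 single precision number. */
float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    uint32_t bits;
    float y;

    /* Reinterpret the float as an integer (memcpy avoids aliasing issues). */
    memcpy(&bits, &x, sizeof bits);

    /* Magic constant minus half of the integer representation gives a
     * first guess for 1/sqrt(x). */
    bits = 0x5f3759df - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step on f(y) = 1/y^2 - x refines the guess. */
    y = y * (1.5f - half * y * y);
    return y;
}
```

With a single Newton-Raphson step the result is only accurate to a fraction of a percent (fast_rsqrt(4.0f) is close to, but not exactly, 0.5), which is precisely the kind of error you need to check against your requirements.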

To end this section, it seems that in the realm of floating point, precision, reproducibility and speed are the vertices of an Iron Triangle: one cannot get all of them at once and must make compromises.

Fun trivia

To conclude the article on a funny note, one can compute the base-two logarithm of any float just by looking at its representation from the integer point of view: since multiplying a float by 2 increases the exponent (which is stored in the middle of the representation) by 1, increasing the value of the logarithm by 1 is just increasing the integer representation by 0x800000.
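
A small sketch of this observation, assuming 32-bit IEEE-754 floats and positive, normal inputs. The helper names are ours. Note that reading only the exponent field yields the integer part of the logarithm, that is floor(log2(x)).

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of a float (assumed to be 32 bits wide). */
static uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* floor(log2(x)) for positive, normal x: the biased exponent minus 127. */
int floor_log2(float x)
{
    return (int)((float_bits(x) >> 23) & 0xFF) - 127;
}

/* Adding 0x800000 to the representation increments the exponent field,
 * which doubles the value (as long as the exponent does not overflow). */
float double_via_bits(float x)
{
    uint32_t bits = float_bits(x) + 0x800000u;
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

For instance, floor_log2(8.0f) is 3, floor_log2(0.75f) is -1 and double_via_bits(1.5f) is 3.0f.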

Another interesting fact is that since $\sin(\pi-x) = \sin(x)$ and for small values of x $\sin(x) \approx x$ we get that $\sin(\pi) \approx \epsilon(\pi)$ (the error in representing $\pi$ as a float). Thus, a nice method to compute $\pi$ is to repeatedly compute pi + sin(pi) up to the highest precision available. Don’t try this in production code, the xkcd reference in the beginning of the article should be warning enough: sin(pi) is not a rational function thus this method can quickly lead to catastrophic errors.

Conclusions

This article is quite a long one and filled with seemingly disjoint pieces of information. They are but mere glimpses into the dangers of using floating point arithmetic without considering all of the aspects involved with it. For a more comprehensive reading the obligatory Oracle Appendix D is essential but it is filled with mathematical formulas and equations which are daunting to the less brave readers. Some more details can be found in The Floating Point Guide.

In the end, keep in mind that floating point math is not mystical but neither should it be treated carelessly.

Daemonizing Processes - Part 1

ROSEdu — Sun, 23 Mar 2014 00:00:00 UT

Published on March 23, 2014 by Matei Oprea
Tagged: C, daemon, fork, setsid, nohup, disown

A special category of processes in Linux is that formed by daemon processes.

What is a daemon?

A daemon is a program that runs as a long-lived background process, without being under the direct control of an interactive user. Let’s run a command to see examples of some daemons.

We need to run a command which will tell us what processes are started by the init process (they must have a PPID of 1):

ps -ef | awk '$3 == 1'

The trimmed result will be the following:

$ ps -ef | awk '$3 == 1'
root       367     1  0 Mar08 ?        00:00:00 upstart-udev-bridge --daemon
root       398     1  0 Mar08 ?        00:00:00 /sbin/udevd --daemon
syslog     521     1  0 Mar08 ?        00:00:03 rsyslogd -c5
102        525     1  0 Mar08 ?        00:00:15 dbus-daemon --system --fork --activation=upstart
avahi      816     1  0 Mar08 ?        00:00:00 avahi-daemon: running [matei-Satellite-C660.local]

In Linux, the parent process of a daemon is often the init process. To create a daemon, we need to fork() a child process and then exit in the parent (after that the child is an orphan process), causing init to adopt it as a child. A daemon is often started at boot time, having the task of handling network requests or hardware activity.

Let’s code a Daemon

In this article we will show how one can write a simple daemon process. First, we will show the long path of using fork() and setsid().

The first step (see this FAQ for reference) is to fork and exit such that the process is no longer a process group leader:

/*
 * Fork the parent process
 */
pid = fork();
/* On failure, -1 is returned in the parent
 * No child process is created
 */
if (pid < 0) {
    perror("fork");
    exit(EXIT_FAILURE);
}

/* We are now killing the parent process
 * parent exits -> init "takes the lead"
 */
if (pid > 0)
    exit(EXIT_SUCCESS);

Because we want to detach from the controlling terminal, we need to make our process a session leader using setsid(). The fork above was needed just to allow this to succeed (setsid() fails if the caller is already a process group leader):

sessionID=setsid();
if (sessionID < 0) {
    perror("setsid");
    exit(EXIT_FAILURE);
}

The next step is to fork()/exit() again. Since the session leader has now exited and our process is not a session leader, it can never acquire a controlling terminal:

pid = fork();
if (pid < 0) {
    perror("fork");
    exit(EXIT_FAILURE);
}

if (pid > 0)
    exit(EXIT_SUCCESS);

Now, we will switch to the directory which contains the files needed for this daemon to run (for example, in the case of dovecot we would switch to /run/dovecot, which contains the sockets for different mail queues). Or, we could switch to / (like apache2 and sshd do, for example) if we don’t want to change to a specific directory. Either way, it is essential to change the current working directory: if the program was started from a directory on a different partition, that partition could not be unmounted while the daemon is running.

change_dir = chdir("/");
if (change_dir < 0 ) {
    perror("chdir");
    exit(EXIT_FAILURE);
}

Though the following steps are optional, it is better to do them too, to ensure a reproducible behaviour of our executable no matter what state the system was in when we started it.

Because a child process inherits the open file descriptors of its parent, we need to close them. We use sysconf to get the maximum number of open file descriptors in order to close all of them and prevent leaks. Then, we set the umask to 0 so that the permissions we request for anything we create are not masked.

maxfd = sysconf(_SC_OPEN_MAX);
if (maxfd < 0) {
    perror("sysconf _SC_OPEN_MAX");
    exit(EXIT_FAILURE);
}

for (fd = 0; fd < maxfd; fd++)
    /* note that we ignore return code here */
    close(fd);

umask(0);

Now we should reopen the 3 standard file descriptors. We can point them to /dev/null or to specific log files. Here we open all of them to /dev/null:

/* O_RDWR: stdin must be readable, stdout and stderr must be writable */
fd = open("/dev/null", O_RDWR);
if (fd < 0) {
    perror("open /dev/null");
    exit(EXIT_FAILURE);
}

status = dup2(fd, 0);
if (status < 0) {
    perror("dup 0");
    exit(EXIT_FAILURE);
}
status = dup2(fd, 1);
if (status < 0) {
    perror("dup 1");
    exit(EXIT_FAILURE);
}
status = dup2(fd, 2);
if (status < 0) {
    perror("dup 2");
    exit(EXIT_FAILURE);
}

We now have a fully working daemon, created by us. However, certain considerations must be taken into account:

First, if our code is to be launched by inetd then only the chdir and umask steps are useful. No fork and setsid should be called (otherwise inetd will get confused) and all other steps are already done by inetd.
Second, all of the above code is already implemented in the daemon function call with slightly less control over the end-result. It might be easier to use it instead of all of the above steps.
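
As a minimal sketch of the second point, the daemon function can replace most of the steps above. The wrapper name become_daemon is ours, and the _DEFAULT_SOURCE guard is an assumption about the target libc (glibc needs it when compiling in strict standard mode).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes daemon() on glibc; an assumption */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper: returns 0 in the daemonized process, -1 on error.
 * On success the original (calling) process has already exited. */
int become_daemon(void)
{
    /* First argument (nochdir) 0: change the working directory to "/".
     * Second argument (noclose) 0: redirect stdin, stdout and stderr
     * to /dev/null. */
    if (daemon(0, 0) < 0) {
        perror("daemon");
        return -1;
    }
    return 0;
}
```

Note that, unlike the manual steps, this does not close other inherited file descriptors and does not touch the umask, which is part of the “slightly less control” mentioned above.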

All is good and nice but what if we want to daemonize a normal process? Well, we can use nohup, disown or start-stop-daemon. Or we could resort to special services to start our daemons like inetd and upstart.

Using nohup for daemonizing processes

nohup runs a command in such a way that it ignores the HUP (hangup) signal. The HUP signal is used by a terminal to warn dependent processes of logout. Thus, processes started with nohup won’t be killed after the tty is destroyed.

$ nohup sleep 10000 &
[1] 5470
nohup: ignoring input and appending output to ‘nohup.out’
$ exit

Now open a new terminal and

$ pgrep sleep
5470

We can simulate nohup inside our C code too. Let’s configure signal handlers:

memset(&sig_act, 0, sizeof(sig_act));
/* Ignore SIGHUP signal */
sig_act.sa_handler = SIG_IGN;
if (sigaction(SIGHUP, &sig_act, NULL) < 0) {
    perror("sigaction");
    exit(EXIT_FAILURE);
}

Now, if stdout is a terminal we have to redirect output to a file, just like the original command does:

if(isatty(fileno(stdout))) {
    /* O_APPEND: like nohup, append to nohup.out instead of overwriting it */
    rc = open("nohup.out", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (rc < 0) {
        perror("open");
        exit(EXIT_FAILURE);
    }
    rc = dup2(rc, STDOUT_FILENO);
    if (rc < 0) {
        perror("dup2");
        exit(EXIT_FAILURE);
    }
}

Then we can fork and exec to get to our new process.
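
A minimal sketch of this fork and exec step could look as follows. The helper names spawn_command and wait_for_command are ours (hypothetical), and the waiting helper is only there to make the sketch easy to try out; a nohup-like tool would not wait for the child.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start argv[0] (looked up in PATH) in a child
 * process. Returns the child's PID, or -1 if fork() failed. */
pid_t spawn_command(char *const argv[])
{
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image with the requested command. */
        execvp(argv[0], argv);
        /* We only get here if execvp() failed. */
        perror("execvp");
        _exit(127);
    }
    /* Parent: the command is now running as process pid. */
    return pid;
}

/* Hypothetical helper: wait for the child and return its exit status,
 * or -1 if it did not exit normally. */
int wait_for_command(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the test binary used below, argv + 1 from main would be passed to such a helper, after the SIGHUP and nohup.out setup shown above.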

Let’s test it now (./a.out is our test binary, it receives as arguments the command line to execute in exec):

$ ./a.out gedit &
[1] 22727

After we close the terminal and open a new one, we clearly see that the process has been adopted by init:

$ ps -ef | grep gedit
matei    22727     1  2 18:25 ?        00:00:01 gedit

Disowning a process

What if we already started the process and forgot to use nohup? We can use disown to remove the process from the current session hierarchy, thus making it able to survive when the tty is closed.

Our first job is to use ^Z to stop/pause the program and to go back to terminal. Then we have to use bg to run it in the background.

$ gedit
^Z
[1]+  Stopped                 gedit
$ bg
[1]+ gedit &

Now, we use disown. With the -h option, the job is only marked so that SIGHUP is not sent to it. Without the -h option, the job is also removed from the current jobs table, which is something we want to do anyway:

$ disown

If we go to another terminal, we should see that the process has been adopted by init:

$ ps -ef | grep gedit
matei    23087 22921  8 18:38 pts/6    00:00:01 gedit

What happened? Our gedit process is not adopted by init. This is because we haven’t yet closed the terminal in which we have launched it. After closing it we have:

$ ps -ef | grep gedit
matei    12490     1  1 07:07 ?        00:00:00 gedit

To conclude:

we need to use daemons for autonomous tasks
we have multiple ways for creating daemons
we can control daemons using signals and config files

All other methods will be presented in the second part of this article.

Unix portability. Autoconf, Automake, Libtool

ROSEdu — Sat, 08 Mar 2014 00:00:00 UT

Published on March 8, 2014 by Alexandru Goia
Tagged: Unix, C, portable code, autoconf, automake, libtool

The purpose of this article is to give a general presentation of Autoconf, Automake and Libtool, utilities which greatly ease the process of installing software packages or libraries from sources, from the users’ point of view. It is assumed that the user runs some Unix/Unix-like system, and that the developer who chooses these GNU tools wants to make the installation on the user’s system as easy as possible.

If you are a serious Unix/Linux user, you have surely been in the situation of compiling a program or a library from sources. You called ./configure; make; make install and you got an executable or a library. In this article we change the point of view: we assume that we are the software’s developers and not the users, and our purpose is to understand, in principle, how we can generate the configure script and the other additional files.

This article aims to explain the raison d’être of these very useful tools (Autoconf, Automake, Libtool) and to present generic configuration files for a simple program and for a simple library, that is, the base syntax of these GNU tools, with the aim of creating an easy install experience for the users.

The Theory

Autoconf is a tool used when building a software package from sources. When run, it generates shell scripts which will run on the user’s system, independently of any Autoconf version installed there (thus the user does not need to install Autoconf too). These shell scripts run without manual intervention on any Unix/POSIX system. Thus, Autoconf makes it easier to port source programs to various Unix/POSIX systems, by determining the characteristics of the user’s system right before the compilation takes place.

For each software package on which we run it, Autoconf generates a configuration script from a template file, named configure.ac or configure.in, which lists the options of the user’s system, options that the software package needs or uses.

It is said of Autoconf, as it is of Unix, that those who do not understand it are destined to reinvent it. Autoconf doesn’t make the life of the developer easier (the developer is supposed to be an experienced one), but it eases the installation of the software on the users’ Unix/Unix-like systems, which are, at least a priori, varied.

So, Autoconf solves the problem of determining the pieces of information about the system needed for compilation, right before it. It is only the first part of a larger problem which is the ‘perfect’ compilation on the user’s system, in other words, the development of portable software. Here enters the GNU Build System, which continues and completes what Autoconf started, by another two GNU tools: Automake and Libtool.

The make tool (GNU make, gmake, etc.) is present on every Unix/Unix-like system. Automake allows developers to describe in a file named Makefile.am the build specifications, with a syntax simpler and richer than that of a regular Makefile. From the file Makefile.am, after running Automake (automake), a file named Makefile.in will be generated, which in turn will be used by the configure script to generate on the user’s system the classic file Makefile. Automake is very useful in the situation of software packages with multiple subdirectories or with multiple sources, but even for simple programs the reached portability is a gain.

Sometimes, we don’t want to generate only executables, we want to also generate libraries, in order to let them be further used by other developers. We want to generate shared (dynamic) libraries, and do this in a portable way. This is the task of the Libtool tool. One of the most used features of Libtool is the coexistence of multiple versions of a library, so the user may install or upgrade the library, without destroying the binary compatibility. Libtool is used by default by Automake, when we want to generate dynamic libraries, and there is no need to know its syntax.

These GNU tools are based on the macro-preprocessor GNU M4, but in this article we will not talk about it.

The Practice

A configure.ac file has the following structure:

autoconf requirements
AC_INIT(package, version, [bug-report-email], [tarname], [url])
information on the package
checks for programs
checks for libraries
checks for header files
checks for types
checks for structures
checks for compiler characteristics
checks for library functions
checks for system services
AC_CONFIG_FILES([file...])
AC_OUTPUT

For example, for a simple project, configure.ac/configure.in can look like this (comments inline for each directive):

dnl Comments start with dnl
dnl This file will be processed with autoconf command, in order to
dnl generate the configure script

AC_INIT([hello], [1.0])
dnl Do not put spaces between AC_* / AM_* and the open parenthesis!
dnl AC_* : declarations for Autoconf
dnl AM_* : declarations for Automake

AM_INIT_AUTOMAKE([-Wall -Werror foreign])
dnl for Automake, in order to create Makefile.in
dnl -Wall -Werror request Automake to activate warnings
dnl and to report them as errors (Automake warnings, not compiler
dnl warnings).

dnl If in a configure.in file there is an AM_* directive then
dnl Autoconf will automatically call Automake.

dnl foreign specifies that the program does not adhere to
dnl the GNU standard: there are no ChangeLog, AUTHORS, NEWS and
dnl README files.

dnl We determine the standardized name of the machine
AC_CANONICAL_HOST
dnl for Linux on Intel 32-bit this is i686-pc-linux-gnu

AC_LANG_C
dnl or AC_LANG([C])
dnl specifies the C language as the programming language
dnl or AC_LANG([C++]) or AC_LANG_CPLUSPLUS for C++ language

AC_PROG_CC
dnl verifies the existence of the C compiler
AC_PROG_CXX
dnl verifies the existence of the C++ compiler
dnl insert it if the sources are written in the C++ language

AC_PROG_MAKE_SET
dnl verifies the existence of make program

AC_HEADER_STDC
dnl verifies the existence of the standard C header files

AC_CHECK_HEADERS([stdio.h])
dnl verifies the existence of header file stdio.h

AC_CONFIG_HEADERS([config.h])
dnl the configure script will generate at runtime the config.h file
dnl which will contain useful #define directives for the program.

dnl config.h can be big enough, because every feature tested on
dnl the user's system is added as a #define in config.h

AC_CONFIG_FILES([
   Makefile
   src/Makefile
])
dnl AC_CONFIG_FILES declares the list of files that will be
dnl generated from their templates with .in extension

AC_OUTPUT
dnl the ending line, which produces a sequence of commands
dnl in the configure script, a sequence that will generate
dnl the registered files from AC_CONFIG_HEADERS and 
dnl AC_CONFIG_FILES

Of course, the configure.in file can be much more complex, with many AC_* and AM_* lines. For this, we recommend the Autoconf manual.

We should also have Makefile.am files in every subdirectory in the source tree. For example, for a tree with few source files (hello program with sources: main.c and functions.c):

# src/Makefile.am
bin_PROGRAMS = hello
hello_SOURCES = main.c functions.c

If the sources of the program contain an ./include/ directory which has header files, then the Makefile.am file from that directory must contain this line:

# include/Makefile.am
include_HEADERS=header1.h header2.h header3.h ... headerN.h

In the Makefile.am from the root directory we must list the subdirectories to build, for example:

SUBDIRS = doc src

or, if the project has no doc directory:

SUBDIRS = src

SUBDIRS is a special variable which lists all directories in which make will enter, before processing the current directory.

In the case of a simple library, the Makefile.am from the respective directory will look like this:

# lib/Makefile.am
lib_LTLIBRARIES = libaa.la
libaa_la_SOURCES = library-aa.c

We will run the command libtoolize, after which two new files will be created: ltconfig and ltmain.sh.

For configure.in and Makefile.am, we will run the commands aclocal, autoconf, and, then, automake.

Also, the source code, which must be portable, must cooperate with these tools, typically by testing the macros that the configure script writes to config.h.

Examples

This example checks if ncurses is installed and saves the name of the terminal inside the s variable (if ncurses is installed then the name can be obtained by calling termname(), otherwise the value of the TERM environment variable is used):

#include "config.h"
...
#ifdef HAVE_NCURSES
strcpy(s, termname());
#else
strcpy(s, getenv("TERM"));
#endif // HAVE_NCURSES

This example checks and includes the proper header for string functions, if there is one:

#include "config.h"
...
#ifdef HAVE_STRING_H
#include <string.h>
#else
#ifdef HAVE_STRINGS_H
#include <strings.h>
#endif
#endif
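
One more hedged sketch in the same spirit: AC_INIT records the package name and version in config.h as macros such as PACKAGE_STRING, and the generated Makefile passes -DHAVE_CONFIG_H to the compiler when AC_CONFIG_HEADERS is used. The fallback string and the function names below are our own assumptions, added so that the file also compiles outside the build system.

```c
#include <stdio.h>

/* Only include config.h when the build system says it exists. */
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

/* Fallback for builds that did not go through ./configure
 * (hypothetical value, chosen only for this sketch). */
#ifndef PACKAGE_STRING
#define PACKAGE_STRING "hello (unconfigured build)"
#endif

/* Returns "name version" as recorded by AC_INIT, or the fallback. */
const char *package_string(void)
{
    return PACKAGE_STRING;
}

/* Prints the package identification, e.g. for a --version option. */
void print_version(void)
{
    printf("%s\n", package_string());
}
```

With the configure.ac shown earlier, a configured build would report hello 1.0 here.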

Bibliography. Conclusions

For more information, we recommend the free online book GNU Autoconf, Automake and Libtool, and also Using GNU Autotools, from the page of Alexandre Duret-Lutz.

After this article, we hope that young Unix/Unix-like/Linux programmers are aware of the existence of these tools, and keep them in mind when they have to develop open-source programs which are intended to be portable across all Unix systems.

A superficial exploration of Haskell, part 2: Lazy by default

ROSEdu — Sun, 02 Mar 2014 00:00:00 UT

Published on March 2, 2014 by Dan Șerban
Tagged: haskell

Haskell uses lazy evaluation by default, but what does that mean exactly?

We’re going to state the abstract definition of laziness, behold its nonsensical beauty for a few seconds, and then conclude that a concrete example is necessary in order to understand the concept:

Laziness is the separation of equation from execution.

Before we look at an example, let me remind you of a bit of syntactic sugar that Haskell provides in order to quickly define a list of successive integers:

λ: [20..70]
[20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70]

As you might remember from part 1, λ: is my custom GHCi prompt, so we’re effectively looking at the result of how GHCi interprets the notation [20..70].

OK then. To start, let’s define two lists:

λ: let list1 = [20..70]
λ: let list2 = map (+1) list1

At this point you might be thinking “Oh look, list1 is the enumeration of all integers between 20 and 70, and list2 is the enumeration of all integers between 21 and 71”.

Well, no. Not yet, at least.

GHCi provides a command called :sprint that allows us to take a peek at how far along the evaluation of a given expression has progressed.

λ: :sprint list1
list1 = _
λ: :sprint list2
list2 = _

So what :sprint is telling us in the above snippet is that both list1 and list2 are unevaluated at this point. To establish some terminology, an underscore in the context of :sprint output represents a thunk. Formally defined, a thunk is an expression that hasn’t yet been evaluated. You may think of it as a value wrapped in a function of zero arguments. When the function is called, the value springs into existence.

We are now going to ask increasingly “intrusive” questions about list2 and then each step along the way examine what has been evaluated and what hasn’t.

The simplest and least intrusive question we can ask about a list is whether or not it’s empty.

λ: null list2
False
λ: :sprint list1
list1 = 20 : _
λ: :sprint list2
list2 = _ : _

In order to answer that question, GHCi needs to know whether or not the first element exists, and as a result, list2 is no longer unevaluated, it is now partially evaluated. GHCi now knows something about the structure of list2 - it knows that it consists of something “consed onto” (prepended to) something else. In the next round, that “something” will turn out to be the value 21, but for right now this fact is irrelevant to the process of answering the question “is list2 empty”. However, we notice the value 20 is fully evaluated as the head of list1 - this particular evaluation was necessary in order to construct the thunk (+1) 20.

Next, we ask for the first element of list2:

λ: head list2
21
λ: :sprint list1
list1 = 20 : _
λ: :sprint list2
list2 = 21 : _

We notice that the expression (+1) 20 mentioned above is now fully evaluated and therefore no longer a thunk.

Next, let’s ask for the first 5 elements of list2:

λ: take 5 list2
[21,22,23,24,25]
λ: :sprint list1
list1 = 20 : 21 : 22 : 23 : 24 : _
λ: :sprint list2
list2 = 21 : 22 : 23 : 24 : 25 : _

Next, let’s ask for the 18th element of list2:

λ: list2 !! 17
38
λ: :sprint list1
list1 = 20 : 21 : 22 : 23 : 24 : 25 : 26 : 27 : 28 : 29 :
        30 : 31 : 32 : 33 : 34 : 35 : 36 : 37 : _
λ: :sprint list2
list2 = 21 : 22 : 23 : 24 : 25 : _ : _ : _ : _ : _ :
        _ : _ : _ : _ : _ : _ : _ : 38 : _

Now this is interesting.

Elements from the 6th to the 17th are fully evaluated in list1 but unevaluated in list2. Because of how list1 is defined, it is going to be evaluated in small, close-proximity increments from left to right. But all values in list2 are evaluated by applying a transformation on elements in the corresponding positions in list1. That is why we are starting to see gaps in list2.

This example really drives home the essence of lazy evaluation. By default, Haskell will evaluate as little as possible, as late as possible. This is in contrast to traditional, imperative programming languages which evaluate as much as possible, as soon as possible.

Next, let’s ask for the length of list2:

λ: length list2
51
λ: :sprint list1
list1 = [20,21,22,23,24,25,26,27,28,29,
         30,31,32,33,34,35,36,37,38,39,
         40,41,42,43,44,45,46,47,48,49,
         50,51,52,53,54,55,56,57,58,59,
         60,61,62,63,64,65,66,67,68,69,70]
λ: :sprint list2
list2 = [21,22,23,24,25,_,_,_,_,_,
         _,_,_,_,_,_,_,38,_,_,
         _,_,_,_,_,_,_,_,_,_,
         _,_,_,_,_,_,_,_,_,_,
         _,_,_,_,_,_,_,_,_,_,_]

At this point, list1 is fully evaluated because there are no two ways around it - in order to compute the length of list2, GHCi needs to keep track of each and every one of its thunks, therefore it needs to generate the entire “spine” of the list. The process of generating the thunks (+1) 25 through (+1) 70 will require all elements of list1 to be fully evaluated.

Finally, there is only one thing left for us to do such that list2 is fully evaluated too - compute the sum of its elements:

λ: sum list2
2346
λ: :sprint list2
list2 = [21,22,23,24,25,26,27,28,29,30,
         31,32,33,34,35,36,37,38,39,40,
         41,42,43,44,45,46,47,48,49,50,
         51,52,53,54,55,56,57,58,59,60,
         61,62,63,64,65,66,67,68,69,70,71]

Conclusion

Laziness can be a tremendously helpful device for designing Haskell programs that run in constant space and feature a clean separation of pure code vs. side-effecting code. However, care must be taken to avoid what is known as “space leaks”, which we are going to cover in the next instalment of this series.

Editorial note

This blog post was inspired by chapter 2 of Simon Marlow’s excellent book “Parallel and Concurrent Programming in Haskell” which is available both in e-book format as well as free of charge online.

Update

In recent versions of GHC, the Monomorphism Restriction is off by default in GHCi (in contrast with the version used when writing this article), so some of the examples might look a little different. See the discussion on reddit.

Unix standards and implementations. Unix portability

ROSEdu — Sun, 02 Feb 2014 00:00:00 UT

Published on February 2, 2014 by Alexandru Goia
Tagged: Unix, C, portable code, POSIX, SUS, libc

The purpose of this article is to present, in a general way, the Unix standards and how we can write portable code for Unix systems, not only for Linux ones.

In the Unix world at present, there are three important standards:

the C language (ISO C standard) and the standard C library (libc), which are included in the POSIX standard
the POSIX standard (Portable Operating System Interface for Unix), whose latest version dates from 2008
the SUS standard (Single Unix Specification), which includes the POSIX standard as a subset, and whose latest version dates from 2010 (SUSv4).

The POSIX standard consists of:

POSIX.1: core services
POSIX.1b: real-time extensions
POSIX.1c: threads extensions
POSIX.2: shell and utilities

From the whole POSIX standard, in this article we are interested only in POSIX.1 (latest version: POSIX.1-2008, or IEEE Std 1003.1-2008).

As implementations of the standard, we can name: the GNU/Linux-based operating systems; the systems which descend from BSD Unix (FreeBSD, NetBSD, OpenBSD, DragonflyBSD); the certified, commercial UNIX systems based on UNIX System V Release 4, in some cases with BSD elements: Oracle Solaris (previously known as Sun Solaris), HP-UX and Tru64 UNIX (HP), AIX (IBM), IRIX (SGI), Unixware and OpenServer (SCO); and also Mac OS X, which is officially certified as a UNIX system but is based on FreeBSD elements rather than on UNIX System V.

The most “popular” Unix systems at present are Linux, FreeBSD, Solaris and Mac OS X. With regard to the C language, all these operating systems (Linux 3.x, FreeBSD >= 8.0, Mac OS X >= 10.6.8, Solaris >= 10) support the following libc headers:

assert.h: verify program assertion
complex.h: complex arithmetic support
ctype.h: character classification and mapping support
errno.h: error codes
fenv.h: floating-point environment
float.h: floating-point constants and characteristics
inttypes.h: integer type format conversion
iso646.h: macros for assignment, relational, and unary operators
limits.h: implementation constants
locale.h: locale categories and related definitions
math.h: mathematical functions and type declarations and constants
setjmp.h: nonlocal goto
signal.h: signals
stdarg.h: variable argument lists
stdbool.h: boolean type and values
stddef.h: standard definitions
stdint.h: integer types
stdio.h: standard I/O library
stdlib.h: utility functions
string.h: string operations
tgmath.h: type-generic math macros
time.h: time and date
wchar.h: extended multibyte and wide character support
wctype.h: wide character classification and mapping support

They also support the following POSIX headers (in the C language):

aio.h: asynchronous I/O
cpio.h: cpio archive values
dirent.h: directory entries
dlfcn.h: dynamic linking
fcntl.h: file control
fnmatch.h: filename-matching types
glob.h: pathname pattern-matching and generations
grp.h: group file
iconv.h: codeset conversion utility
langinfo.h: language information constants
monetary.h: monetary types and functions
netdb.h: network database operations
nl_types.h: message catalogs
poll.h: poll() function
pthread.h: threads
pwd.h: password file
regex.h: regular expressions
sched.h: execution scheduling
semaphore.h: semaphores
strings.h: string operations
tar.h: tar archive values
termios.h: terminal I/O
unistd.h: symbolic constants
wordexp.h: word-expansion definitions
arpa/inet.h: Internet definitions
net/if.h: socket local interfaces
netinet/in.h: Internet address family
netinet/tcp.h: TCP definitions
sys/mman.h: memory management declarations
sys/select.h: select() function
sys/socket.h: sockets interface
sys/stat.h: file status
sys/statvfs.h: file system information
sys/times.h: process times
sys/types.h: primitive system data types
sys/un.h: UNIX domain socket definitions
sys/utsname.h: system name
sys/wait.h: process control
fmtmsg.h: message display structures
ftw.h: file tree walking
libgen.h: pathname management functions
ndbm.h: database operations (exception: Linux)
search.h: search tables
syslog.h: system error logging
utmpx.h: user accounting database (exception: FreeBSD)
sys/ipc.h: inter-processes communication
sys/msg.h: XSI message queues
sys/resource.h: resource operations
sys/sem.h: XSI semaphores
sys/shm.h: XSI shared memory
sys/time.h: time types
sys/uio.h: vector I/O operations
mqueue.h: message queues (exception: Mac OS X)
spawn.h: real-time spawn interface.

The SUS standard (the whole set of UNIX functions and constants) can be found online for SUSv2 (year 1997, branded UNIX 98), SUSv3 (years 2001-2002, branded UNIX 03) and SUSv4 (year 2010).

To write portable code which can be executed on any Unix system we must know the C headers (defined by libc and by POSIX) which are recognized by the Unix systems. We can ask the system headers to expose only POSIX.1 elements, or also SUSv1, SUSv2, SUSv3, or SUSv4 ones, using the so-called “feature test macros”:

_POSIX_SOURCE and _POSIX_C_SOURCE, to activate POSIX functionality
_XOPEN_SOURCE, which activates SUSv1/2/3/4 functionality.

For older POSIX functionality we have to declare the following in our source file:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 1 /* for POSIX 1990 */
/* use 2 for POSIX C bindings 1003.2-1992 */

For POSIX 2008 functionality, we define:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 200809L

Or, we can compile with:

cc -D_POSIX_SOURCE -D_POSIX_C_SOURCE=200809L filename.c

If our code is written for, or will run on, UNIX-certified systems (that is, systems which follow SUSv1, SUSv2, SUSv3, or SUSv4), we must also define _XOPEN_SOURCE.

Thus, we would have to use

for SUSv1:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 2
#define _XOPEN_SOURCE
#define _XOPEN_SOURCE_EXTENDED 1

for SUSv2:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 199506L
#define _XOPEN_SOURCE 500

for SUSv3:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 200112L
#define _XOPEN_SOURCE 600

for SUSv4:

#define _POSIX_SOURCE
#define _POSIX_C_SOURCE 200809L
#define _XOPEN_SOURCE 700

If we write code only for Linux platforms, we can use the feature test macro _GNU_SOURCE, which activates GNU LIBC functionality that sometimes isn’t POSIX compatible. There are also the feature test macros _SVID_SOURCE (to activate System V functionality) and _BSD_SOURCE (to activate BSD functionality). One important note is that a UNIX system (which follows some SUSvX) can be asked, through these macros, to offer the functionality of any SUSvX version.
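
As a minimal sketch of how these macros are used in practice, the following C file requests SUSv4 and then reports which standard levels the headers advertise. It assumes a system whose unistd.h defines _POSIX_VERSION and _XOPEN_VERSION; the helper names posix_version and report_standards are made up for this illustration.

```c
/* Request SUSv4 / POSIX.1-2008 interfaces; this must come before any #include. */
#define _XOPEN_SOURCE 700

#include <stdio.h>
#include <unistd.h>

/* Returns the POSIX revision advertised by <unistd.h>, or -1 if there is none. */
long posix_version(void)
{
#ifdef _POSIX_VERSION
  return (long)_POSIX_VERSION;
#else
  return -1L;
#endif
}

/* Prints the standard levels the headers advertise for this build. */
void report_standards(void)
{
  printf("_POSIX_VERSION = %ld\n", posix_version());
#ifdef _XOPEN_VERSION
  printf("_XOPEN_VERSION = %ld\n", (long)_XOPEN_VERSION);
#else
  printf("_XOPEN_VERSION is not defined\n");
#endif
}
```

Calling report_standards() from main prints the two values. The exact numbers depend on the C library and on the level that was requested.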

This is the way we can write Unix portable code. Other methods to find out more about the operating system on which we compile are:

LIBC functions: sysconf(3), pathconf(3), fpathconf(3) – functions which determine system constants
autoconf, automake and libtool: utilities which determine at compile time, with scripts, what system and libc functions the operating system offers. (These will be part of the content of a following article.)
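
The run-time queries from the list above can be sketched in C as follows. This is only an illustration: the helper names page_size, max_name_length and print_limits are hypothetical, and the three limits queried were picked as examples.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the memory page size in bytes, or -1 if it cannot be determined. */
long page_size(void)
{
  return sysconf(_SC_PAGESIZE);
}

/* Returns the longest file name allowed in directory `dir`, or -1 when the
 * limit is indeterminate or an error occurred. */
long max_name_length(const char *dir)
{
  return pathconf(dir, _PC_NAME_MAX);
}

/* Prints a few limits that may differ between Unix systems. */
void print_limits(void)
{
  printf("page size:      %ld bytes\n", page_size());
  printf("open files max: %ld\n", sysconf(_SC_OPEN_MAX));
  printf("name max in /:  %ld\n", max_name_length("/"));
}
```

Because the values are obtained at run time, the same binary can adapt to the system it runs on instead of relying on constants fixed at compile time.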

Happy Unix programming!

Lambda Functions in C++

ROSEdu — Tue, 28 Jan 2014 00:00:00 UT

Published on January 28, 2014 by Mădălina-Andreea Grosu, Matei Oprea
Tagged: lambda, higher-order functions, c++

The C++ 2011 standard introduced the lambda expression syntax, causing some people to ask why it was needed. In reality, it was not a new use case: people have been using the same idea under different names since C was created. You had functors (C++ terminology) and pointers to functions, for example. A basic use case was applying the same transformation to all elements of a collection (the functor’s widely shared example) or sorting the elements of an array (via qsort in C). But, in reality, all of these cases can be reduced to using higher-order functions.
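
The qsort case mentioned above can be sketched in C like this: the comparison logic is an ordinary named function that is handed to qsort through a function pointer. The names compare_ints and sort_ints are hypothetical and exist only for this illustration.

```c
#include <stdlib.h>

/* Comparison callback: returns a negative, zero or positive value,
 * as qsort expects. */
static int compare_ints(const void *a, const void *b)
{
  int x = *(const int *)a;
  int y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Sorts `n` integers in ascending order by passing compare_ints to qsort. */
void sort_ints(int *values, size_t n)
{
  qsort(values, n, sizeof values[0], compare_ints);
}
```

The callback has to be defined and named somewhere away from the call site, which is exactly the inconvenience that lambda expressions remove.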

1. Higher-order functions

A higher-order function is a function that takes one or more functions as input or returns a function as its result. For example, we can use higher-order functions to map, filter, fold and sort lists.

Let’s start with a simple example of a higher-order function, in Haskell:

zipWith1 :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith1 _ [] _ = []
zipWith1 _ _ [] = []
zipWith1 f (x:xs) (y:ys) = f x y : zipWith1 f xs ys

This function takes a function and two lists as parameters and joins the lists by applying the function to corresponding elements. Let’s see a little demonstration of the function written above:

ghci> zipWith1 (+) [1,2,3,4] [5,6,7,8]
[6,8,10,12]
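
For comparison, a rough C counterpart of zipWith1 can be written with a function pointer. This is a sketch restricted to int arrays; the names binary_op, add and zip_with are hypothetical.

```c
#include <stddef.h>

/* Type of the binary function handed to zip_with. */
typedef int (*binary_op)(int, int);

/* Plays the role of (+) in the Haskell call. */
int add(int x, int y)
{
  return x + y;
}

/* Applies `f` to corresponding elements of `a` and `b` and stores the
 * results in `out`. Stops at the end of the shorter input, like zipWith1,
 * and returns the number of elements written. */
size_t zip_with(binary_op f, const int *a, size_t na,
                const int *b, size_t nb, int *out)
{
  size_t n = na < nb ? na : nb;
  for (size_t i = 0; i < n; i++)
    out[i] = f(a[i], b[i]);
  return n;
}
```

Calling zip_with(add, a, 4, b, 4, out) with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} fills out with the same four sums as the GHCi session.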

So we found out what a higher-order function is. Now, what is a lambda function? The term comes from the Lambda Calculus and refers to anonymous functions in programming. With a lambda function you can write quick functions without naming them.

Let’s see the above call written using a lambda instead of (+):

zipWith (\x y -> x + y ) [1,2,3,4] [5,6,7,8]

If we evaluate this expression in GHCi the result will be the same as above:

Prelude> zipWith (\x y -> x + y ) [1,2,3,4] [5,6,7,8]
[6,8,10,12]

Now, to see the equivalence, the following functions are one and the same:

f x y = x + y
f x = \y -> x + y
f = \x y -> x + y

Now we know what a lambda function and a higher-order function are. Let’s see how we can use lambda functions in C++.

2. Lambdas in C++

A lambda function, in C++, starts with [ and it has a specific syntax:

[capture] (params) -> return_type { function_body }

Let’s see a short example of a lambda function in C++:

[](int x, int y) -> int { return x * y; }

This function simply multiplies two integers.

Consider now the following Haskell example of applying a function to a list, using map:

map (\x -> x + 1) [1, 2, 3]

In C++, we have the function transform which does the same thing as the map function from Haskell:

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

int main (){
    /* declare 2 vectors */
    vector <int> vector1;
    vector <int> vector2;

    /* fill vector1 with the values 1, 2, 3 */
    for (int i=1;i<4;i++)
        vector1.push_back (i);

    /* allocate memory in vector2 */
    vector2.resize(vector1.size());

    /* applies our lambda function to each element
     * in vector1 and stores the result in vector2
     */
    transform (vector1.begin(), vector1.end(), vector2.begin(),
        [] (int i) { return ++i; });

    /* output the result */
    cout << "Vector2 contains:";
    for (vector<int>::iterator it=vector2.begin();
        it!=vector2.end(); ++it)
        cout << ' ' << *it;
    cout << endl;

    return 0;
}

And the output is:

Vector2 contains: 2 3 4

You can see that our result is the same as in Haskell. We used a lambda function to increment the value of each element from the first vector and then we printed the results to standard output.

3. Conclusions

So, why should you use lambda functions?

You can write fast functions and use them in your production code
You can replace macros (because macros are evil – citation needed)
Because λ rocks
Because you can use it when you want a short-term functionality that you do not want to have to name

Inspecting library calls for fun and profit

ROSEdu — Sat, 18 Jan 2014 00:00:00 UT

Published on January 18, 2014 by Mihai Maruseac
Tagged: trace, ltrace, strace, ptrace, debugging

Two years ago this blog had a series of articles on debugging tools. We presented tools like Valgrind and GDB and we stopped with an introduction to strace. At the end of that article we mentioned that there are other tools useful for debugging beyond the three already mentioned. After two years of silence, the debugging series is back with an article on ltrace.

Ask developers around you and you’ll see that those who know about ltrace are at most as common among strace users as strace users are among those who know how to use gdb and valgrind.

But how is ltrace different? Why is it a useful tool? This article will try to shed some light on this while also providing comparisons with the strace tool.

Basic Example

The simplest way to use both ltrace and strace is to put the tool in front of the command you’re tracing. We will illustrate this with the same example used for strace:

$ ltrace ls
__libc_start_main(0x402c60, 1, 0x7fffa36d7038, 0x412bb0 <unfinished ...>
strrchr("ls", '/')                               = nil
setlocale(LC_ALL, "")                            = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale") = "/usr/share/locale"
textdomain("coreutils")                          = "coreutils"
__cxa_atexit(0x40ace0, 0, 0, 0x736c6974756572)   = 0
isatty(1)                                        = 1
getenv("QUOTING_STYLE")                          = nil
getenv("COLUMNS")                                = nil
ioctl(1, 21523, 0x7fffa36d6bd0)                  = 0
getenv("TABSIZE")                                = nil
getopt_long(1, 0x7fffa36d7038, "abcdfghiklmnopqrstuvw:xABCDFGHI:"..., 0x61a5e0, -1)         = -1
getenv("LS_BLOCK_SIZE")                          = nil
...
opendir(".")                                     = 0x2789c30
readdir(0x2789c30)                               = 0x2789c60
readdir(0x2789c30)                               = 0x2789c78
readdir(0x2789c30)                               = 0x2789c90
strlen("a.out")                                  = 5
malloc(6)                                        = 0x2791c70
memcpy(0x2791c70, "a.out\0", 6)                  = 0x2791c70
readdir(0x2789c30)                               = 0x2789cb0
strlen("out.9373")                               = 8
malloc(9)                                        = 0x2791c90
memcpy(0x2791c90, "out.9373\0", 9)               = 0x2791c90
...
closedir(0x2789c30)                              = 0
free(0)                                          = <void>
malloc(432)                                      = 0x2789c30
_setjmp(0x61b640, 0x400000, 0x2785e50, 0x2789cc0)= 0
__errno_location()                               = 0x7f95ad5916c0
strcoll("out.9307", "1.c")                       = 23
...
fwrite_unlocked("1.c", 1, 3, 0x3573db9400)       = 3
...
fwrite_unlocked("out", 1, 3, 0x3573db9400)       = 3
...
exit(0 <unfinished ...>
__fpending(0x3573db9400, 0, 64, 0x3573db9eb0)    = 0
fileno(0x3573db9400)                             = 1
__freading(0x3573db9400, 0, 64, 0x3573db9eb0)    = 0
__freading(0x3573db9400, 0, 2052, 0x3573db9eb0)  = 0
fflush(0x3573db9400)                             = 0
fclose(0x3573db9400)                             = 0
__fpending(0x3573db91c0, 0, 0x3573dbaa00, 0xfbad000c)= 0
fileno(0x3573db91c0)                             = 2
__freading(0x3573db91c0, 0, 0x3573dbaa00, 0xfbad000c)= 0
__freading(0x3573db91c0, 0, 4, 0xfbad000c)       = 0
fflush(0x3573db91c0)                             = 0
fclose(0x3573db91c0)                             = 0
+++ exited (status 0) +++

Looking at the trace we see that the ls process starts by acknowledging the current locale, after which several environment variables which control the output are read (only a few of them are shown, the others are elided by ...). Then opendir is called on . (since ls had no other arguments) and each entry is read via readdir and then copied into a vector of entries (using strdup, seen here as a triple of strlen, malloc and memcpy). The next step is to sort all of these entries according to the current locale (strcoll, which is controlled by the LC_COLLATE variable). This allows sorting the filenames in alphabetical order. Then, each filename is written to file descriptor 1 (stdout) using the non-locking fwrite_unlocked. The last step is to call exit and flush all open streams.
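
The sequence of library calls described above can be approximated by a small C function. This is a simplified sketch and not the actual ls source: the names by_locale and list_directory are hypothetical, hidden entries are skipped the way plain ls does, and each name is printed on its own line.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Orders two names according to LC_COLLATE, like the strcoll calls in the trace. */
static int by_locale(const void *a, const void *b)
{
  return strcoll(*(char *const *)a, *(char *const *)b);
}

/* Prints the visible entries of `path` in locale order.
 * Returns the number of entries printed, or -1 if the directory cannot be opened. */
int list_directory(const char *path)
{
  char **names = NULL;
  size_t count = 0, capacity = 0;
  struct dirent *entry;
  DIR *dir;

  setlocale(LC_ALL, "");              /* pick up the locale from the environment */

  dir = opendir(path);
  if (dir == NULL)
    return -1;

  while ((entry = readdir(dir)) != NULL) {
    size_t length;
    char *copy;

    if (entry->d_name[0] == '.')
      continue;                       /* skip hidden entries */

    if (count == capacity) {          /* grow the vector of entries */
      size_t new_capacity = capacity ? capacity * 2 : 16;
      char **grown = realloc(names, new_capacity * sizeof *grown);
      if (grown == NULL)
        break;
      names = grown;
      capacity = new_capacity;
    }

    /* the strlen, malloc, memcpy triple seen in the trace */
    length = strlen(entry->d_name);
    copy = malloc(length + 1);
    if (copy == NULL)
      break;
    memcpy(copy, entry->d_name, length + 1);
    names[count++] = copy;
  }
  closedir(dir);

  if (count > 0)
    qsort(names, count, sizeof *names, by_locale);

  for (size_t i = 0; i < count; i++) {
    puts(names[i]);
    free(names[i]);
  }
  free(names);
  return (int)count;
}
```

Running ltrace on a program that calls list_directory(".") should show a pattern of opendir, readdir, strlen, malloc, memcpy and strcoll calls similar to the one above, although the details will differ from the real ls.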

By now you know more about what ls does than before reading this part. With this information you can do things like changing the way file names are quoted (I found the accepted values by providing an invalid one and looking at the output of QUOTING_STYLE='-' ltrace ls to see which strings it is compared against):

$ ls a*
a file  a.out

$ QUOTING_STYLE="shell" ls a*
'a file'  a.out

$ QUOTING_STYLE="c" ls a*
"a file"  "a.out"

The next question we are interested in is “Can ltrace trace syscalls as well?”. Luckily, the answer is yes, by using the -S flag:

$ ltrace -S ls
SYS_brk(0)                               = 0x1d4b000
SYS_mmap(0, 4096, 3, 34)                 = 0x7f4d8b352000
SYS_access("/etc/ld.so.preload", 04)     = -2
SYS_open("/etc/ld.so.cache", 524288, 01) = 3
SYS_fstat(3, 0x7fff9f3a4110)             = 0
SYS_mmap(0, 0x246b0, 1, 2)               = 0x7f4d8b32d000
...

Contrast with the results of strace:

$ strace ls
execve("/usr/bin/ls", ["ls"], [/* 48 vars */]) = 0
brk(0)                                  = 0x1190000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fcf80794000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=149168, ...}) = 0
mmap(NULL, 149168, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fcf8076f000
...

The differences are easy to see. The main ones are that ltrace prefixes each syscall with SYS_ and prints constants as plain numbers instead of their symbolic names (so instead of PROT_READ|PROT_WRITE you have 3). Even the number of arguments shown is different. For readability, it is better to use strace for tracing system calls and ltrace for tracing library calls.
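
To make the relation between the two outputs concrete, here is a small C sketch that turns an mmap protection value back into its symbolic form, which is roughly the decoding strace does for us. The function name prot_to_string is hypothetical. The numeric values of the constants are platform specific; on the Linux system used for the traces, PROT_READ|PROT_WRITE happens to be 3.

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Writes the symbolic form of an mmap protection value into `buf`
 * (for example "PROT_READ|PROT_WRITE") and returns `buf`. */
char *prot_to_string(int prot, char *buf, size_t size)
{
  static const struct { int bit; const char *name; } flags[] = {
    { PROT_READ,  "PROT_READ"  },
    { PROT_WRITE, "PROT_WRITE" },
    { PROT_EXEC,  "PROT_EXEC"  },
  };

  if (size == 0)
    return buf;
  buf[0] = '\0';

  if (prot == PROT_NONE) {
    snprintf(buf, size, "PROT_NONE");
    return buf;
  }

  for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
    if (prot & flags[i].bit) {
      size_t used = strlen(buf);
      /* separate the names with '|' once the buffer is non-empty */
      snprintf(buf + used, size - used, "%s%s", used ? "|" : "", flags[i].name);
    }
  }
  return buf;
}
```

Passing PROT_READ | PROT_WRITE yields the string "PROT_READ|PROT_WRITE", which is what strace prints where ltrace -S prints the raw number.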

Why Is ltrace Useful?

From the above section you have seen that we can use ltrace to understand undocumented behavior of an application. For example, the QUOTING_STYLE variable was found neither in the ls manual nor in the bash one.

Another way ltrace is useful is when one of the libraries your application depends on is faulty. Instead of trying to debug a full-scale application you might want to isolate the culprit into a minimal application which exhibits only the bad behaviour. For that, you can use ltrace in the same way we used strace in its own article in the past.

I Have Too Much Output

As in the case of strace, ltrace produces a long list of output lines and it is quite hard to find what you’re looking for or to understand what’s happening while they are scrolling on the screen.

Just like with strace, we can save the output to a file, using -o:

$ ltrace -o ltraceout ls
$ wc -l ltraceout
523 ltraceout
$ head ltraceout
__libc_start_main(0x402c60, 1, 0x7fffbc2e3348, 0x412bb0 <unfinished ...>
strrchr("ls", '/')                              = nil
setlocale(LC_ALL, "")                           = "en_US.UTF-8"
bindtextdomain("coreutils", "/usr/share/locale")= "/usr/share/locale"
textdomain("coreutils")                         = "coreutils"
__cxa_atexit(0x40ace0, 0, 0, 0x736c6974756572)  = 0
isatty(1)                                       = 1
getenv("QUOTING_STYLE")                         = nil
getenv("COLUMNS")                               = nil
ioctl(1, 21523, 0x7fffbc2e2ee0)                 = 0

As with strace, we can also use -e to filter on specific calls.

In the following examples we will use the following C source file, which computes 41^41 and 42^42 using both the floating-point pow from libm and the multi-precision integers from libgmp. Two threads compute 42^42 (one with each method), while the main function computes 41^41 with both methods.

#include <math.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#include <gmp.h>

void *do_double_thread(void *data)
{
  double x = 42;
  x = pow(x, x);
}

void *do_mpz_thread(void *data)
{
  mpz_t x;

  mpz_init_set_ui(x, 42);
  mpz_pow_ui(x, x, 42);

  mpz_clear(x);
}

int main()
{
  pthread_t double_thread, mpz_thread;
  pthread_attr_t attr;

  double y = 41;
  mpz_t x;

  mpz_init_set_ui(x, 41);
  mpz_pow_ui(x, x, 41);

  mpz_clear(x);

  y = pow(y, y);

  /* initialize the attribute */
  if (pthread_attr_init(&attr) != 0) {
    perror("pthread_attr_init");
    pthread_exit(NULL);
  }

  /* set detached state */
  if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE) != 0) {
    perror("pthread_attr_setdetachstate");
    pthread_exit(NULL);
  }

  if (pthread_create(&double_thread, &attr, do_double_thread, NULL)) {
    perror("pthread_create");
    exit(EXIT_FAILURE);
  }

  if (pthread_create(&mpz_thread, &attr, do_mpz_thread, NULL)) {
    perror("pthread_create");
    exit(EXIT_FAILURE);
  }

  pthread_attr_destroy(&attr);

  if (pthread_join(double_thread, NULL))
    perror("pthread_join");

  if (pthread_join(mpz_thread, NULL))
    perror("pthread_join");

  return 0;
}

To compile, we have to link against libmath, libpthread and libgmp:

$ gcc test.c -o test -lm -lpthread -lgmp

Running ltrace on the program without any filter gives the following:

$ ltrace ./test
__libc_start_main(0x400aeb, 1, 0x7fff6afa8b78, 0x400c60 <unfinished ...>
__gmpz_init_set_ui(0x7fff6afa8a30, 41, 0x7fff6afa8b88, 0x400c60)     = 1
__gmpz_pow_ui(0x7fff6afa8a30, 0x7fff6afa8a30, 41, 0x7fff6afa8a30)    = 0
__gmpz_clear(0x7fff6afa8a30, 0x6bb020, 0, 0x129c08be7ca69)           = 0
pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)                 = 0x4da9465e5d9d1629
pthread_attr_init(0x7fff6afa8a40, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
pthread_attr_setdetachstate(0x7fff6afa8a40, 0, 0x7fefffffffffffff, 0)= 0
pthread_create(0x7fff6afa8a80, 0x7fff6afa8a40, 0x400a60, 0)          = 0
pthread_create(0x7fff6afa8a78, 0x7fff6afa8a40, 0x400aa8, 0)          = 0
pthread_attr_destroy(0x7fff6afa8a40, 0x7f60ece77fb0, 0x7f60ece789d0, -1)= 0
pthread_join(0x7f60ed679700, 0, 0x7f60ece789d0, -1)                  = 0
pthread_join(0x7f60ece78700, 0, 0x7f60ed679700, 0x3574418290)        = 0
+++ exited (status 0) +++
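
The arguments and result of pow in the trace above look odd because they appear to be the raw bit patterns of double values printed as integers. Assuming IEEE 754 doubles, as on the x86-64 machine used here, a hypothetical helper such as bits_to_double below can decode them:

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets a 64-bit pattern as a double without changing the bits. */
double bits_to_double(uint64_t bits)
{
  double value;
  memcpy(&value, &bits, sizeof value);
  return value;
}
```

With this helper, 0x4044800000000000 decodes to 41.0 and the return value 0x4da9465e5d9d1629 decodes to roughly 1.33e66, which is 41^41.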

If we want to capture only the bignum operations we can use the -e flag:

$ ltrace -e '*gmpz*' ./test
test->__gmpz_init_set_ui(0x7ffffb01f830, 41, 0x7ffffb01f988, 0x400c60)= 1
test->__gmpz_pow_ui(0x7ffffb01f830, 0x7ffffb01f830, 41, 0x7ffffb01f830 <unfinished ...>
libgmp.so.10->__gmpz_n_pow_ui(0x7ffffb01f830, 0xbf0010, 1, 41 <unfinished ...>
libgmp.so.10->__gmpz_realloc(0x7ffffb01f830, 7, 42, 7)         = 0xbf0010
<... __gmpz_n_pow_ui resumed> )                                  = 0
<... __gmpz_pow_ui resumed> )                                    = 0
test->__gmpz_clear(0x7ffffb01f830, 0xbf0020, 0, 0x129c08be7ca69) = 0
+++ exited (status 0) +++

From this output we see that __gmpz_pow_ui from our code calls __gmpz_n_pow_ui from libgmp.so.10 which in turn calls __gmpz_realloc to expand the space allocated to the number.

However, in some cases one library might call functions from another or you might want to filter and keep only the calls done by your application. Fortunately, we can still do that:

$ ltrace -e '*gmpz*-@libgmp.so*' ./test
test->__gmpz_init_set_ui(0x7fff45c5cd70, 41, 0x7fff45c5cec8, 0x400c60) = 1
test->__gmpz_pow_ui(0x7fff45c5cd70, 0x7fff45c5cd70, 41, 0x7fff45c5cd70)= 0
test->__gmpz_clear(0x7fff45c5cd70, 0xc02020, 0, 0x129c08be7ca69)       = 0
+++ exited (status 0) +++

If you want to trace all calls inside a library then it is better to use -x.

$ ltrace -x '@libgmp.so.*' ./test
__libc_start_main(0x400aeb, 1, 0x7fff656660b8, 0x400c60 <unfinished ...>
__gmpz_init_set_ui(0x7fff65665f70, 41, 0x7fff656660c8, 0x400c60 <unfinished ...>
__gmpz_init_set_ui@libgmp.so.10(0x7fff65665f70, 41, 0x7fff656660c8, 0x400c60 <unfinished ...>
__gmp_default_allocate@libgmp.so.10(8, 41, 0x7fff656660c8, 0x400c60)= 0x222a010
<... __gmpz_init_set_ui resumed> )= 1
<... __gmpz_init_set_ui resumed> )= 1
__gmpz_pow_ui(0x7fff65665f70, 0x7fff65665f70, 41, 0x7fff65665f70 <unfinished ...>
__gmpz_pow_ui@libgmp.so.10(0x7fff65665f70, 0x7fff65665f70, 41, 0x7fff65665f70 <unfinished ...>
__gmpz_n_pow_ui@libgmp.so.10(0x7fff65665f70, 0x222a010, 1, 41 <unfinished ...>
__gmpz_realloc@libgmp.so.10(0x7fff65665f70, 7, 42, 7 <unfinished ...>
__gmp_default_reallocate@libgmp.so.10(0x222a010, 8, 56, 7)= 0x222a010
<... __gmpz_realloc resumed> )= 0x222a010
__gmpn_sqr@libgmp.so.10(0x222a010, 0x7fff65665e80, 2, 48 <unfinished ...>
__gmpn_sqr_basecase@libgmp.so.10(0x222a010, 0x7fff65665e80, 2, 48)= 0x3562f3ea0787ecff
<... __gmpn_sqr resumed> )= 0
__gmpn_mul_1@libgmp.so.10(0x222a010, 0x222a010, 3, 0x129c08be7ca69)= 0xca32f2e
<... __gmpz_n_pow_ui resumed> )= 0
<... __gmpz_pow_ui resumed> )= 0
<... __gmpz_pow_ui resumed> )= 0
__gmpz_clear(0x7fff65665f70, 0x222a020, 0, 0x129c08be7ca69 <unfinished ...>
__gmpz_clear@libgmp.so.10(0x7fff65665f70, 0x222a020, 0, 0x129c08be7ca69 <unfinished ...>
__gmp_default_free@libgmp.so.10(0x222a010, 56, 0, 0x129c08be7ca69)= 0
<... __gmpz_clear resumed> )= 0
<... __gmpz_clear resumed> )= 0
pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
pthread_attr_init(0x7fff65665f80, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
pthread_attr_setdetachstate(0x7fff65665f80, 0, 0x7fefffffffffffff, 0)= 0
pthread_create(0x7fff65665fc0, 0x7fff65665f80, 0x400a60, 0)= 0
pthread_create(0x7fff65665fb8, 0x7fff65665f80, 0x400aa8, 0)= 0
pthread_attr_destroy(0x7fff65665f80, 0x7f3164bc1fb0, 0x7f3164bc29d0, -1)= 0
pthread_join(0x7f31653c3700, 0, 0x7f3164bc29d0, -1)= 0
pthread_join(0x7f3164bc2700, 0, 0x7f31653c3700, 0x3574418290)= 0
_fini@libgmp.so.10(0x358cc761f0, 0, 0xffffffff, 0)= 0x358ca5edc4
+++ exited (status 0) +++

To catch only the calls made inside the specific library, use -L, which makes ltrace skip the calls made from the main binary (MAIN):

$ ltrace -L -x '@libgmp.so.*' ./test
__gmpz_init_set_ui@libgmp.so.10(0x7fffbf630930, 41, 0x7fffbf630a88, 0x400c60 <unfinished ...>
__gmp_default_allocate@libgmp.so.10(8, 41, 0x7fffbf630a88, 0x400c60)= 0x17b5010
<... __gmpz_init_set_ui resumed> )= 1
__gmpz_pow_ui@libgmp.so.10(0x7fffbf630930, 0x7fffbf630930, 41, 0x7fffbf630930 <unfinished ...>
__gmpz_n_pow_ui@libgmp.so.10(0x7fffbf630930, 0x17b5010, 1, 41 <unfinished ...>
__gmpz_realloc@libgmp.so.10(0x7fffbf630930, 7, 42, 7 <unfinished ...>
__gmp_default_reallocate@libgmp.so.10(0x17b5010, 8, 56, 7)= 0x17b5010
<... __gmpz_realloc resumed> )= 0x17b5010
__gmpn_sqr@libgmp.so.10(0x17b5010, 0x7fffbf630840, 2, 48 <unfinished ...>
__gmpn_sqr_basecase@libgmp.so.10(0x17b5010, 0x7fffbf630840, 2, 48)= 0x3562f3ea0787ecff
<... __gmpn_sqr resumed> )= 0
__gmpn_mul_1@libgmp.so.10(0x17b5010, 0x17b5010, 3, 0x129c08be7ca69)= 0xca32f2e
<... __gmpz_n_pow_ui resumed> )= 0
<... __gmpz_pow_ui resumed> )= 0
__gmpz_clear@libgmp.so.10(0x7fffbf630930, 0x17b5020, 0, 0x129c08be7ca69 <unfinished ...>
__gmp_default_free@libgmp.so.10(0x17b5010, 56, 0, 0x129c08be7ca69)= 0
<... __gmpz_clear resumed> )= 0
_fini@libgmp.so.10(0x358cc761f0, 0, 0xffffffff, 0)= 0x358ca5edc4
+++ exited (status 0) +++

Attaching To Other Processes

As with strace, we can use -p to attach to running processes:

$ ./test &
[1] 26026

$ ltrace -p 26026
__gmpz_clear(0x7fff1fa3bb50, 1, 0, 0x1b9b000)= 0
pow(0x7f2c3dca8000, 0x49ff000, 0x4044800000000000, -1)= 0x4da9465e5d9d1629
pthread_attr_init(0x7fff1fa3bb60, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
pthread_attr_setdetachstate(0x7fff1fa3bb60, 0, 0x7fefffffffffffff, 0)= 0
pthread_create(0x7fff1fa3bba8, 0x7fff1fa3bb60, 0x400a60, 0)= 0
pthread_create(0x7fff1fa3bba0, 0x7fff1fa3bb60, 0x400aa8, 0)= 0
pthread_attr_destroy(0x7fff1fa3bb60, 0x7f2c42073fb0, 0x7f2c420749d0, -1)= 0
pthread_join(0x7f2c42875700, 0, 0x7f2c420749d0, -1)= 0
pthread_join(0x7f2c42074700, 0, 0x7f2c42875700, 0x3574418290)= 0
+++ exited (status 0) +++
[1]+  Done                    ./test

In fact, just as with strace, we can use multiple -p arguments to attach to multiple processes simultaneously:

$ ./test & ./test &
[1] 26149
[2] 26150

$ ltrace -p 26149 -p 26150
__gmpz_clear(0x7fff52a4fed0, 1, 0, 0xa2c000)= 0
pow(0x7f85fb6f0000, 0x49ff000, 0x4044800000000000, -1)= 0x4da9465e5d9d1629
pthread_attr_init(0x7fff52a4fee0, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
pthread_attr_setdetachstate(0x7fff52a4fee0, 0, 0x7fefffffffffffff, 0)= 0
pthread_create(0x7fff52a4ff28, 0x7fff52a4fee0, 0x400a60, 0)= 0
pthread_create(0x7fff52a4ff20, 0x7fff52a4fee0, 0x400aa8, 0)= 0
pthread_attr_destroy(0x7fff52a4fee0, 0x7f85ffabbfb0, 0x7f85ffabc9d0, -1)= 0
pthread_join(0x7f86002bd700, 0, 0x7f85ffabc9d0, -1)= 0
pthread_join(0x7f85ffabc700, 0, 0x7f86002bd700, 0x3574418290)= 0
+++ exited (status 0) +++
__gmpz_clear(0x7fff4cbac6e0, 1, 0, 0x1207000)= 0
pow(0x7fbf03640000, 0x49ff000, 0x4044800000000000, -1)= 0x4da9465e5d9d1629
pthread_attr_init(0x7fff4cbac6f0, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
pthread_attr_setdetachstate(0x7fff4cbac6f0, 0, 0x7fefffffffffffff, 0)= 0
pthread_create(0x7fff4cbac738, 0x7fff4cbac6f0, 0x400a60, 0)= 0
pthread_create(0x7fff4cbac730, 0x7fff4cbac6f0, 0x400aa8, 0)= 0
pthread_attr_destroy(0x7fff4cbac6f0, 0x7fbf07a0bfb0, 0x7fbf07a0c9d0, -1)= 0
pthread_join(0x7fbf0820d700, 0, 0x7fbf07a0c9d0, -1)= 0
pthread_join(0x7fbf07a0c700, 0, 0x7fbf0820d700, 0x3574418290)= 0
+++ exited (status 0) +++
[1]-  Done                    ./test
[2]+  Done                    ./test

Though this is useful only when debugging multiple programs which need to communicate with each other, it is nice to know that it is possible.

Tracing the Threads and Children of a Process

The strace tool allows attaching to subprocesses of a process using -f. Also, you can use -ff together with -o to get the output of each thread in a separate file.

However, ltrace knows only the -f option. Lines from different processes are prefixed with the PID of that process.

$ ltrace -f ./test
[pid 26192] __libc_start_main(0x400aeb, 1, 0x7fffc406b9c8, 0x400c60 <unfinished ...>
[pid 26192] __gmpz_init_set_ui(0x7fffc406b880, 41, 0x7fffc406b9d8, 0x400c60)= 1
[pid 26192] __gmpz_pow_ui(0x7fffc406b880, 0x7fffc406b880, 41, 0x7fffc406b880)= 0
[pid 26192] __gmpz_clear(0x7fffc406b880, 0x1b21020, 0, 0x129c08be7ca69)= 0
[pid 26192] pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
[pid 26192] pthread_attr_init(0x7fffc406b890, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
[pid 26192] pthread_attr_setdetachstate(0x7fffc406b890, 0, 0x7fefffffffffffff, 0)           = 0
[pid 26192] pthread_create(0x7fffc406b8d0, 0x7fffc406b890, 0x400a60, 0)= 0
[pid 26193] pow(0, 0, 0x4045000000000000, -1 <unfinished ...>
[pid 26192] pthread_create(0x7fffc406b8c8, 0x7fffc406b890, 0x400aa8, 0 <unfinished ...>
[pid 26193] <... pow resumed> )= 0x4e1646505f35a847
[pid 26193] +++ exited (status 0) +++
[pid 26192] <... pthread_create resumed> )= 0
[pid 26192] pthread_attr_destroy(0x7fffc406b890, 0x7fc1a1041fb0, 0x7fc1a10429d0, -1)= 0
[pid 26192] pthread_join(0x7fc1a1843700, 0, 0x7fc1a10429d0, -1)= 0
[pid 26192] pthread_join(0x7fc1a1042700, 0, 0x7fc1a1843700, 0x3574418290 <unfinished ...>
[pid 26194] __gmpz_init_set_ui(0x7fc1a1041f00, 42, 0x59a85877c49edc2b, -1)= 1
[pid 26194] __gmpz_pow_ui(0x7fc1a1041f00, 0x7fc1a1041f00, 42, 0x7fc1a1041f00)= 0
[pid 26194] __gmpz_clear(0x7fc1a1041f00, 0x7fc19c0008c0, 0, 42)= 0
[pid 26192] <... pthread_join resumed> )= 0
[pid 26194] +++ exited (status 0) +++
[pid 26192] +++ exited (status 0) +++

Thus, if you want to filter only a single child you have to resort to text filter utilities like grep.

Profiling

One nice thing about strace is that you can use the -c flag to get a table with all syscalls used in a program, the time needed to execute them and the count of error results. ltrace has a similar -c flag, which counts the time and the number of calls for each library call and reports a summary when the program exits. Finer-grained measurements can be obtained by using the other timing options and text filters.

Both strace and ltrace allow you to get timestamps around any call by using -r, -t, -tt or -ttt:

-r shows a relative timestamp: the time elapsed since the previous traced line

$ ltrace -r ./test
  0.000000 __libc_start_main(0x400aeb, 1, 0x7fff2a51a328, 0x400c60 <unfinished ...>
  0.000418 __gmpz_init_set_ui(0x7fff2a51a1e0, 41, 0x7fff2a51a338, 0x400c60)= 1
  0.000296 __gmpz_pow_ui(0x7fff2a51a1e0, 0x7fff2a51a1e0, 41, 0x7fff2a51a1e0)= 0
  0.000166 __gmpz_clear(0x7fff2a51a1e0, 0x1f66020, 0, 0x129c08be7ca69)= 0
  0.000137 pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
  0.000168 pthread_attr_init(0x7fff2a51a1f0, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
  0.000147 pthread_attr_setdetachstate(0x7fff2a51a1f0, 0, 0x7fefffffffffffff, 0)= 0
  0.000216 pthread_create(0x7fff2a51a230, 0x7fff2a51a1f0, 0x400a60, 0)= 0
  0.000409 pthread_create(0x7fff2a51a228, 0x7fff2a51a1f0, 0x400aa8, 0)= 0
  0.000474 pthread_attr_destroy(0x7fff2a51a1f0, 0x7f25016c5fb0, 0x7f25016c69d0, -1)= 0
  0.000250 pthread_join(0x7f2501ec7700, 0, 0x7f25016c69d0, -1)= 0
  0.000257 pthread_join(0x7f25016c6700, 0, 0x7f2501ec7700, 0x3574418290)= 0
  0.000735 +++ exited (status 0) +++

-t shows the time of day when the call was made

$ ltrace -t ./test
14:50:42 __libc_start_main(0x400aeb, 1, 0x7fff84229b38, 0x400c60 <unfinished ...>
14:50:42 __gmpz_init_set_ui(0x7fff842299f0, 41, 0x7fff84229b48, 0x400c60)= 1
14:50:42 __gmpz_pow_ui(0x7fff842299f0, 0x7fff842299f0, 41, 0x7fff842299f0)= 0
14:50:42 __gmpz_clear(0x7fff842299f0, 0x1d02020, 0, 0x129c08be7ca69)= 0
14:50:42 pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
14:50:42 pthread_attr_init(0x7fff84229a00, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
14:50:42 pthread_attr_setdetachstate(0x7fff84229a00, 0, 0x7fefffffffffffff, 0)= 0
14:50:42 pthread_create(0x7fff84229a40, 0x7fff84229a00, 0x400a60, 0)= 0
14:50:42 pthread_create(0x7fff84229a38, 0x7fff84229a00, 0x400aa8, 0)= 0
14:50:42 pthread_attr_destroy(0x7fff84229a00, 0x7f48e7ec0fb0, 0x7f48e7ec19d0, -1)= 0
14:50:42 pthread_join(0x7f48e86c2700, 0, 0x7f48e7ec19d0, -1)= 0
14:50:42 pthread_join(0x7f48e7ec1700, 0, 0x7f48e86c2700, 0x3574418290)= 0
14:50:42 +++ exited (status 0) +++

-tt also displays the microseconds

$ ltrace -tt ./test
14:50:45.465708 __libc_start_main(0x400aeb, 1, 0x7fff83373968, 0x400c60 <unfinished ...>
14:50:45.465942 __gmpz_init_set_ui(0x7fff83373820, 41, 0x7fff83373978, 0x400c60)= 1
14:50:45.466216 __gmpz_pow_ui(0x7fff83373820, 0x7fff83373820, 41, 0x7fff83373820)= 0
14:50:45.466400 __gmpz_clear(0x7fff83373820, 0x192e020, 0, 0x129c08be7ca69)= 0
14:50:45.466584 pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
14:50:45.466764 pthread_attr_init(0x7fff83373830, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
14:50:45.466932 pthread_attr_setdetachstate(0x7fff83373830, 0, 0x7fefffffffffffff, 0)= 0
14:50:45.467101 pthread_create(0x7fff83373870, 0x7fff83373830, 0x400a60, 0)= 0
14:50:45.467417 pthread_create(0x7fff83373868, 0x7fff83373830, 0x400aa8, 0)= 0
14:50:45.468024 pthread_attr_destroy(0x7fff83373830, 0x7fc1e7ebdfb0, 0x7fc1e7ebe9d0, -1)= 0
14:50:45.468253 pthread_join(0x7fc1e86bf700, 0, 0x7fc1e7ebe9d0, -1)= 0
14:50:45.468480 pthread_join(0x7fc1e7ebe700, 0, 0x7fc1e86bf700, 0x3574418290)= 0
14:50:45.469108 +++ exited (status 0) +++

-ttt displays microseconds as above but uses the number of seconds since the epoch instead of the time of day.

$ ltrace -ttt ./test
1390074648.833755 __libc_start_main(0x400aeb, 1, 0x7fff5b1c8e28, 0x400c60 <unfinished ...>
1390074648.833981 __gmpz_init_set_ui(0x7fff5b1c8ce0, 41, 0x7fff5b1c8e38, 0x400c60)= 1
1390074648.834289 __gmpz_pow_ui(0x7fff5b1c8ce0, 0x7fff5b1c8ce0, 41, 0x7fff5b1c8ce0)= 0
1390074648.834481 __gmpz_clear(0x7fff5b1c8ce0, 0x1e7c020, 0, 0x129c08be7ca69)= 0
1390074648.834678 pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)= 0x4da9465e5d9d1629
1390074648.834858 pthread_attr_init(0x7fff5b1c8cf0, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0
1390074648.835033 pthread_attr_setdetachstate(0x7fff5b1c8cf0, 0, 0x7fefffffffffffff, 0)= 0
1390074648.835242 pthread_create(0x7fff5b1c8d30, 0x7fff5b1c8cf0, 0x400a60, 0)= 0
1390074648.835935 pthread_create(0x7fff5b1c8d28, 0x7fff5b1c8cf0, 0x400aa8, 0)= 0
1390074648.836327 pthread_attr_destroy(0x7fff5b1c8cf0, 0x7fc3da214fb0, 0x7fc3da2159d0, -1)= 0
1390074648.837980 pthread_join(0x7fc3daa16700, 0, 0x7fc3da2159d0, -1)= 0
1390074648.838436 pthread_join(0x7fc3da215700, 0, 0x7fc3daa16700, 0x3574418290)= 0
1390074648.839230 +++ exited (status 0) +++

Also, both tools allow you to time each individual call by using -T:

$ ltrace -T ./test
__libc_start_main(0x400aeb, 1, 0x7fffc4512768, 0x400c60 <unfinished ...>
__gmpz_init_set_ui(0x7fffc4512620, 41, 0x7fffc4512778, 0x400c60)    = 1 <0.000290>
__gmpz_pow_ui(0x7fffc4512620, 0x7fffc4512620, 41, 0x7fffc4512620)   = 0 <0.000167>
__gmpz_clear(0x7fffc4512620, 0x21cc020, 0, 0x129c08be7ca69)         = 0 <0.000142>
pow(0x3573db8760, 0xffffffff, 0x4044800000000000, 0)                = 0x4da9465e5d9d1629 <0.000209>
pthread_attr_init(0x7fffc4512630, 213, 0x7fefffffffffffff, 0x7fffffffffffffff)= 0 <0.000130>
pthread_attr_setdetachstate(0x7fffc4512630, 0, 0x7fefffffffffffff, 0)= 0 <0.000139>
pthread_create(0x7fffc4512670, 0x7fffc4512630, 0x400a60, 0)         = 0 <0.000304>
pthread_create(0x7fffc4512668, 0x7fffc4512630, 0x400aa8, 0)         = 0 <0.000421>
pthread_attr_destroy(0x7fffc4512630, 0x7f09988a1fb0, 0x7f09988a29d0, -1)= 0 <0.000266>
pthread_join(0x7f09990a3700, 0, 0x7f09988a29d0, -1)                 = 0 <0.000181>
pthread_join(0x7f09988a2700, 0, 0x7f09990a3700, 0x3574418290)       = 0 <0.000467>
+++ exited (status 0) +++

Though you can profile applications using ltrace and strace, a much better tool to use is perf, which will be presented in a future article.

Blaming it on the Culprit Line

It is possible to use ltrace and strace to show you the line numbers of the caller by using the -i flag to get the value of the instruction pointer at the time of the call (the EIP register on x86, RIP on x86-64) and then using addr2line to get the exact line (compile with -g):

$ ltrace -i ./test
...
[0x400bfb] pthread_create(0x7fff0804c998, 0x7fff0804c960, 0x400aa8, 0)= 0
[0x400c1f] pthread_attr_destroy(0x7fff0804c960, 0x7f708d112fb0, 0x7f708d1139d0, -1)= 0
...
[0xffffffffffffffff] +++ exited (status 0) +++

$ addr2line -iCse ./test 0x400c1f
test.c:63

This is useful when your code makes repeated calls to the same subset of functions but only a few of them cause problems.

Nicer Output

One interesting feature of ltrace is that you can get a nice call tree when functions from one library call other traced functions. For that, you would use the -n option.

$ ltrace -n 3 -L -x '@libgmp.so.*' ./test
__gmpz_init_set_ui@libgmp.so.10(0x7fff7bb2e810, 41, 0x7fff7bb2e968, 0x400c60 <unfinished ...>
   __gmp_default_allocate@libgmp.so.10(8, 41, 0x7fff7bb2e968, 0x400c60)= 0x12b2010
<... __gmpz_init_set_ui resumed> )= 1
__gmpz_pow_ui@libgmp.so.10(0x7fff7bb2e810, 0x7fff7bb2e810, 41, 0x7fff7bb2e810 <unfinished ...>
   __gmpz_n_pow_ui@libgmp.so.10(0x7fff7bb2e810, 0x12b2010, 1, 41 <unfinished ...>
      __gmpz_realloc@libgmp.so.10(0x7fff7bb2e810, 7, 42, 7 <unfinished ...>
         __gmp_default_reallocate@libgmp.so.10(0x12b2010, 8, 56, 7)= 0x12b2010
      <... __gmpz_realloc resumed> )= 0x12b2010
      __gmpn_sqr@libgmp.so.10(0x12b2010, 0x7fff7bb2e720, 2, 48 <unfinished ...>
         __gmpn_sqr_basecase@libgmp.so.10(0x12b2010, 0x7fff7bb2e720, 2, 48)= 0x3562f3ea0787ecff
      <... __gmpn_sqr resumed> )= 0
      __gmpn_mul_1@libgmp.so.10(0x12b2010, 0x12b2010, 3, 0x129c08be7ca69)= 0xca32f2e
   <... __gmpz_n_pow_ui resumed> )= 0
<... __gmpz_pow_ui resumed> )= 0
__gmpz_clear@libgmp.so.10(0x7fff7bb2e810, 0x12b2020, 0, 0x129c08be7ca69 <unfinished ...>
   __gmp_default_free@libgmp.so.10(0x12b2010, 56, 0, 0x129c08be7ca69)= 0
<... __gmpz_clear resumed> )= 0
_fini@libgmp.so.10(0x358cc761f0, 0, 0xffffffff, 0)= 0x358ca5edc4
+++ exited (status 0) +++

If ltrace was compiled with libunwind support, you can also use the -w option to get a backtrace of a given number of frames for each traced call. If not (as in our case), you can still fall back on -i or -n, depending on what you are interested in.

Conclusions

Though very rarely used, ltrace is a nice program to have in your toolbox. It will greatly help you in those hard to debug cases caused by undocumented behaviors of third-party libraries.

Note that ltrace shares most of the shortcomings of strace:

a program with setuid doesn’t have euid privileges while being traced
a program is slow while being traced
the -i support is weak

The next article in this series will present tools for profiling applications and solving timing bugs.

A superficial exploration of Haskell - part 1

ROSEdu — Tue, 07 Jan 2014 00:00:00 UT

A superficial exploration of Haskell - part 1

Published on January 7, 2014 by Dan Șerban
Tagged: haskell

This series of blog posts is aimed at experienced programmers who have heard that Haskell is an interesting programming language, but have not had the chance to invest any time in researching it.

In this series I am going to highlight a few remarkable things at a high level, while glossing over some implementation details that would take too long to explain properly. Therefore, expect a lot of “here’s a practical application of Haskell and here’s some sample code, but don’t ask to see the gory details” hand-waving.

For the purposes of this series, I will simply assume that it’s easy for the experienced reader to jump into a new imperative programming language after a few hours or days of becoming familiar with its syntax. And I’ll start with an example that illustrates how you have to adopt a completely different mindset when you start learning Haskell.

Part 1 of this series covers:

Mutability
Upside Down Maps
Tokenizing Kernel Code

Before you ask: All the Haskell snippets I’m showing here consist of GHCi interactive console sessions. I have configured a custom prompt for myself, by placing the line :set prompt "λ: " in GHCi’s configuration file ~/.ghc/ghci.conf. The prompt is going to look different if you’re just starting out with a freshly installed copy of Haskell.

Mutability

To start with, here’s a Python code sample, cut and pasted from a Python 2.7 REPL (interactive console session):

>>> x = 1
>>> x = x + 1
>>> x
2
>>>

Nothing could be simpler!

OK then. Time to port this snippet of code over to Haskell. I’m just going to go with the flow and naively assume – just as many newcomers to Haskell would – that porting Python code is a direct 1-to-1 syntactic translation, in other words, an easy, straightforward thing to do.

The following is what happens in the Haskell REPL (called GHCi). By the way, here we have to prepend the keyword let - it’s the law of the land in GHCi:

λ: let x = 1
λ: let x = x + 1
λ: x
^CInterrupted.
λ:

Huh? What just happened? I was expecting Haskell to compute the value 2. It took forever for the GHCi interactive console to evaluate x, so I got bored and pressed Ctrl-C. What’s happening? Explain this to me.

Well, as one Reddit commenter once observed, this is just one of the many things Haskell does to haze you during your initiation.

What you’re actually doing is giving Haskell a puzzle (x = x + 1) and saying “Go find me a solution”. Mathematically speaking, no finite number is equal to its own successor; the only values that would fit are $\infty$ and $- \infty$. So when you saw it hang, Haskell wasn’t merely taking its time, for no good reason, before giving you back the value 2. In a Haskell let, the x on the right-hand side refers to the very x being defined, not to the x from the previous line. To evaluate x, the runtime has to evaluate x + 1, which requires x, which requires x + 1 again, and so on without ever reaching a value.
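If you prefer to think in Python terms, one loose analogue of let x = x + 1 is a function whose body calls itself before it can return anything. This is only a sketch of the idea, not of how GHC evaluates things: Python gives up with a RecursionError, whereas GHCi keeps going until you interrupt it.

```python
def x():
    # Like Haskell's `let x = x + 1`: x is defined in terms of itself,
    # so computing it requires computing it first.
    return x() + 1


try:
    x()
    outcome = "finished"
except RecursionError:
    outcome = "never reaches a value"

print(outcome)
```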

Just to be clear, there is a way to emulate the behavior of the Python snippet we saw above, and the Haskell code for doing that looks like this:

λ: let x = 1
λ: x <- return $ x + 1
λ: x
2
λ:

As you can see, the syntax is much more verbose (and uglier) than in Python, and for good reason – in Haskell, you are strongly discouraged from using variables and mutation as the primary means of expressing algorithms.

Haskell is divided into two major parts: a crystal palace of unspeakable beauty and mathematical purity, and an imperative ghetto for doing I/O and dealing with mutation. The equal sign in x = x + 1 lives in the beautiful palace and symbolizes mathematical unification, while the construct <- return $ lives in the ghetto and means “evaluate the right hand side and shove the result into the identifier on the left hand side, thus replacing what was there beforehand, in true imperative style” (strictly speaking, GHCi creates a new x that shadows the old one rather than mutating it).

Reverse Map? Upside Down Map? You decide

OK, for the next segment I’m going to assume that your beloved programming language of choice has a construct called map, and that you know how to use it.

We start again with some Python code. While Python does indeed offer a higher-order function called map, it’s much more common for experienced Python developers to prefer using a list comprehension, like this:

>>> list = range(20,31)
>>> list
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> [ x + 1 for x in list ]
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
>>>

Nothing new or earth-shattering - I would hope - so here’s the Haskell equivalent before we jump into the interesting stuff:

λ: let list = [20..30]
λ: list
[20,21,22,23,24,25,26,27,28,29,30]
λ: map (+1) list
[21,22,23,24,25,26,27,28,29,30,31]
λ:

So the basic idea I’m getting at here is that in the beginning we have:

one single operation (compute an integer’s successor)
a list of integer values

Now for the interesting part.

Let’s take those bullet points above and turn them upside down, such that in the beginning we have:

one single integer value
a list of unary integer-to-integer operations

Python can still deal with this situation fairly well, since Python’s functions are first-class values (you can place several of them inside of a list). But the Python code wouldn’t be as concise or expressive as the Haskell equivalent, shown here:

λ: let function_list_1 = [(+2),(*3),(^2)]
λ: let function_list_2 = [(*5),(+7),(*4),(subtract 10)]
λ: let i = 12
λ: import Control.Applicative
λ: function_list_1 <*> [i]
[14,36,144]
λ: function_list_2 <*> [i]
[60,19,48,2]
λ:

Side note: Due to brevity concerns, it is not practical to go into a detailed explanation of the <*> operator in Control.Applicative (a module in Haskell’s standard library). That is a topic for another blog post. The point here was to show how concisely you can express non-trivial computations with Haskell.
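For comparison, here is roughly what the same upside-down map could look like in Python. This is a sketch, not the only way to write it: the lambdas stand in for Haskell's operator sections, and the list comprehension applies every function in the list to the single value i.

```python
# Python stand-ins for the Haskell sections (+2), (*3), (^2) and so on.
function_list_1 = [lambda n: n + 2, lambda n: n * 3, lambda n: n ** 2]
function_list_2 = [lambda n: n * 5, lambda n: n + 7, lambda n: n * 4, lambda n: n - 10]
i = 12

# The "upside-down map": one value, many functions.
result_1 = [f(i) for f in function_list_1]
result_2 = [f(i) for f in function_list_2]

print(result_1)  # [14, 36, 144]
print(result_2)  # [60, 19, 48, 2]
```

It works, but each function needs its own lambda n: boilerplate, which is the verbosity the Haskell version avoids.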

Fun fact: implementing the “upside-down map” described above was recently a requirement for admittance into WebDev (an extracurricular course organized by ROSEdu). Candidates sent us solutions they had written in various programming languages, with varying degrees of conciseness. We found that the most verbose implementations were predominantly Java-based.

Let’s tokenize some kernel code

For the next segment I’ll just grab a snippet of code from the Linux kernel and demonstrate how concisely you can express a tokenizer for it in Haskell.

The code for Linux’s completely fair scheduler is stored in a file called fair.c; I’ll just grab a small function from it (function __enqueue_entity, which starts at line 507) and store it locally in a file called enqueue_entity.c.

Here’s what I do subsequently, step by step:

λ: sample_cfs_code <- readFile "enqueue_entity.c"
λ:

I just slurped the contents of the file into sample_cfs_code. This is our raw material, let’s look at it:

λ: sample_cfs_code
"static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)\n{\n        struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;\n        struct rb_node *parent = NULL;\n        struct sched_entity *entry;\n        int leftmost = 1;\n        while (*link) {\n                parent = *link;\n                entry = rb_entry(parent, struct sched_entity, run_node);\n                if (entity_before(se, entry)) {\n                        link = &parent->rb_left;\n                } else {\n                        link = &parent->rb_right;\n                        leftmost = 0;\n                }\n        }\n        if (leftmost)\n                cfs_rq->rb_leftmost = &se->run_node;\n\n        rb_link_node(&se->run_node, parent, link);\n        rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);\n}\n\n"
λ:

We now define our tokenizing function in Haskell (I trust you will appreciate how concise it is):

λ: import Data.List
λ: let tokenize_this = unfoldr (\s -> case lex s of [("","")] -> Nothing; tok:_ -> Just tok; [] -> Nothing)
λ:

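The most interesting name here is lex, which is not a keyword but an ordinary function: it is available from the Prelude and is defined in Haskell’s standard library as part of the GHC.Read module. Given a string, it splits off the first token and returns it together with the rest of the input. The unfoldr function also deserves some explanation, but just as before, it wouldn’t be practical to go into much detail here.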
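To give a feel for how the two pieces fit together without going into Haskell details, here is a rough Python sketch of the same shape. Both functions are hand-written stand-ins, not ports of the real ones: this lex is a crude regular expression that only approximates what Haskell's lex accepts, and this unfoldr uses None where Haskell uses Nothing. The idea is that unfoldr keeps calling a step function on the remaining input, collecting one token per call, until the step function says there is nothing left.

```python
import re

# One token: an identifier or number, a run of operator characters,
# or any other single non-space character. Leading whitespace is skipped.
_TOKEN = re.compile(r"\s*(\w+|[-+*/<>=&|!]+|\S)")


def lex(text):
    """Split off the first token; return (token, rest), or None when no token is left."""
    match = _TOKEN.match(text)
    if match is None:
        return None
    return match.group(1), text[match.end():]


def unfoldr(step, seed):
    """Build a list by calling step repeatedly until it returns None."""
    result = []
    while True:
        produced = step(seed)
        if produced is None:
            return result
        value, seed = produced
        result.append(value)


line = "struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;"
tokens = unfoldr(lex, line)
print(tokens)
```

The Haskell one-liner above does the same job with library functions only, which is where its brevity comes from.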

So far, so good. Let’s apply our tokenizing function to the C code:

λ: let tokenized_cfs_code = tokenize_this sample_cfs_code
λ:

Finally, let’s view the resulting stream of tokens:

λ: mapM_ print tokenized_cfs_code
"static"
"void"
"__enqueue_entity"
"("
"struct"
"cfs_rq"
"*"
"cfs_rq"
","
"struct"
"sched_entity"
"*"
"se"
")"
"{"
"struct"
"rb_node"
"**"
"link"
"="
"&"
"cfs_rq"
"->"
[ ... many more tokens I'm not showing here ... ]
"rb_insert_color"
"("
"&"
"se"
"->"
"run_node"
","
"&"
"cfs_rq"
"->"
"tasks_timeline"
")"
";"
"}"
λ:

You can use this approach to help your language design efforts, if you plan on inventing your own DSL, or even your own general-purpose programming language. Once your source code is tokenized, you can then parse the stream of tokens into the target Haskell data structures using a technique called combinator parsing, which is where the Haskell programming language really shines.

End of part 1

That’s it for part 1 – there will be more to come.

If you’re interested in picking up Haskell, there are a number of very good free online resources (1, 2, 3), as well as classes and workshops held in various locations.

Speaking of Haskell classes and workshops, allow me to draw your attention to a project called lambda.rosedu.org, which is an instructor-led, in-depth, hands-on workshop on functional programming centered around Haskell, Scala and Clojure. The workshop is free of charge, but the standards for admittance are fairly high (you will need to solve a few programming as well as logic problems). The workshop will be hosted by ROSEdu at the department for Computer Science of the POLITEHNICA University of Bucharest some time during the summer of 2014. The topics I glossed over – due to brevity concerns – in this blog post will be covered in depth during the workshop.

Facebook Hackathon Live Blogging

ROSEdu — Sat, 19 Oct 2013 00:00:00 UT

Facebook Hackathon Live Blogging

Published on October 19, 2013 by Alex Palcuie, Marius Ungureanu
Tagged: facebook, hackathon, live, blogging, coders

3:30PM

Ladies and gentlemen, fast hackers and coder perfectionists, web developers and mobile app creators, we present you the first edition of the Facebook hackathon in Romania. Organized by your favorite open-source community ROSEdu, the volunteers have been busy all morning preparing the workspace for the 15 participating teams. We have pizza, beer and a mountain of bean bags for people who move fast and break things.

3:30PM

People have started their IDEs (or text editors for more hardcore people) and started installing their gems (Ruby guy here, sorry). After a quick intro from the organizers about the rules, the Facebook engineers presented their skills and their expectations: it’s fun to code, but it’s awesome to ship. So happy shipping, hackers!

5:30PM

A brief pause and all the keyboard presses have stopped. The Facebook representatives have given out a random prize! One Facebook T-shirt. Congratulations to Andrei Duma! People are now back to coding and making their ideas come to life: done is better than perfect.

First team

Only 4 hours into the event! We have interviewed some of the participants while they were coding, designing and laying the foundations of their applications! The first team we interviewed is 3_awesome_guys_and_a_llama. These students from the University “Politehnica” of Bucharest are writing an Event Planner. From what they told us, it’s an application which tries to help people organize events for themselves and their friends for a night out. It’s focused more on location than on time, so they can make it a planned drink-up or dance-off. They integrate it with the Facebook Places API and would like to have bars, clubs and restaurants use their app so people can make reservations. As their technology stack, they have Python on top of Google App Engine. One of the devs said that he learned about it in a Udacity course, which I recommend to you. They also plan to use Twitter’s Bootstrap library because they do not have enough frontend experience.

Be green, recycle

You are a human, walking down the street, and you see a big pile of garbage. It’s a common scenario here in Romania. But what if you had an app for cleaning it up? That’s what sudoRecycle is trying to do with their Android idea. You see the junk, take a photo, tag it with the GPS location and send it to their servers. Using their backend written in PHP, they will send teams of robots that will clean the area. Because we human beings are really lazy, they plan to use the Facebook API for gamification, so you could level up in cleaning the world.

Explore the underground

We’ve all struggled to find our way around Bucharest, if we haven’t lived here. But dark_side_of_the_moon is going to remedy this with their offline mobile app for subway connections. You want to get from X to Y using the shortest route. The app also wants to tell you what ground-level public transportation is available and what you can visit. Furthermore, they want it to tell your friends where you’ve been after you use it to check in at your destination. Under the hood, it uses the Android 4.0+ API and they want to integrate with the Facebook API to see the places your friends have visited. The coolest feature they want to code will tell you when the next tube will arrive.

GRails

Did you know that in the year 2013, if you apply to MIT, you must send the papers by fax or postal mail? And after you send them, a person will manually go through them and tell you that the papers have arrived? Or if you get into a university you must write 6 papers with about 60% redundant information? That’s what GRails, the only team made entirely of girls, is trying to solve, fighting bureaucracy with Rails 4. Now with 100% less paper involved!

Hiking

Everybody knows that Romania has some of the best hiking routes, beautiful views and mysterious mountains. And who doesn’t want to know what trips you can make in the wild nature? Well, you can now check out a map and see what is available for adventurers! The map also shows you elevation, so you know if it’s a long road and also an abrupt road. A Django platform by saltaretii should be enough to support this paradise for nature’s explorers!

I want to ride my bicycle, I want to ride my bike

2 wheels, foot power and long distance travelling made easy! These two guys are building an awesome tool, a dream app come true for bikers! Using complex algorithms, they want to give bikers many possible routes from one place to another. You can choose the kind of road that fits you, either steep and short or longer and gentler. If that is not enough, these 2 guys are doing this client side with ClojureScript… yeah, it’s the new functional kid in town which tries to solve the event driven callback hell. FlatRide on, people!

Jackson Gabbard

From an English major in Tennessee, to the 300th Facebook employee, to the 4th one to move into the new London office. He works on developer tools for the engineers and oversees some of the most important components, like Tasks, which devs open daily to get their job done. He is a self-taught hacker and he had a moment of enlightenment about the power of programming the first time he used the array structure.

He was really communicative and willing to share his opinions about the event, mentioning that he’s amazed by the main focus of the students: ‘Transportation’, ‘Finding things’ and ‘Group organization’ are recurrent themes. He said some of his coworkers are Romanian and he thinks Romania is a land where lots of engineers are being created. Proud to be full-time hackers around here!

We also asked him about the Bootcamp in London, which is about learning to code. And guess what? Even executives go through these preparations to get into Facebook. The engineering team has lots of fun hacking in that period of education. It teaches you how to love the company, and you get to learn the ropes while communicating and interacting with other like-minded people.

Finally he has participated in lockdowns each year. These are periods of time when teams gather in a room and stay there for several days (usually 30) and ship a big feature. Pretty hardcore, but that’s life at Facebook.
