Benchmarking CouchDB (2)

April 15th, 2008

This is a plot of the number of documents created per bulk update against the average number of documents created per second that it yields.

On the App Engine lock-in

April 15th, 2008

Some people are afraid that Google App Engine won’t be a really big success because it’s a lock-in. Give it a month and no doubt there will be a project that lets you run App Engine applications on your own servers, which (in my opinion) is even more interesting than App Engine itself.

Benchmarking CouchDB (1)

April 2nd, 2008

I’ve written a small benchmark for CouchDB to test its document creation performance. A script creates $N$ documents in total, using bulk updates to create $B$ documents at a time, with $T$ concurrent threads. The following graph shows the time it takes to create a number of documents against that number of documents, for different values of $B$ with $T=1$.

And for $T=2$ (two concurrent threads, tested on a dual-core machine):

The values of $B$ are 1, 2, 4, 5, 8, 11, 16, 22, 32, 45, 64, 90, 128, 181, 256, 362, 512, 724 and 1024.
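
These values look like the integer parts of successive powers of $\sqrt{2}$, with duplicates dropped. That is an observation about the numbers, not necessarily how the benchmark script produced them. A minimal Python 3 sketch (the name `bulk_sizes` is hypothetical) that reproduces the list:

```python
import math

def bulk_sizes(max_exponent=20):
    """Integer parts of sqrt(2)**k for k = 0..max_exponent, duplicates dropped."""
    sizes = []
    for k in range(max_exponent + 1):
        # floor(sqrt(2**k)) equals floor(sqrt(2)**k), computed in exact integers.
        b = math.isqrt(2 ** k)
        if b not in sizes:
            sizes.append(b)
    return sizes

print(bulk_sizes())
```

Spacing the batch sizes like this gives evenly spread points on a logarithmic axis.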

As you can see, a higher value of $B$ shifts the graph to the right, which means a larger $N$ in the same time. Bulk update really does make a difference. Or non-bulk-update really sucks. Adding threads also helps a bit, but not as much as expected.
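
The original benchmark script is not reproduced here, so the following is only a sketch of the setup described above, written for Python 3. It assumes CouchDB's `_bulk_docs` endpoint for the bulk updates. The names `post_bulk` and `run_benchmark` and the database URL in the comment are hypothetical.

```python
import json
import threading
import time
import urllib.request

def post_bulk(db_url, docs):
    """POST one batch of documents to the database's _bulk_docs endpoint."""
    request = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps({"docs": docs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()

def run_benchmark(n, b, t, send):
    """Create n empty documents in batches of at most b, using t threads.

    `send` is called once per batch with a list of documents.
    Returns the elapsed wall-clock time in seconds.
    """
    # Batch sizes: b, b, ..., and a possibly smaller last batch.
    batches = [min(b, n - start) for start in range(0, n, b)]

    def worker(batch_sizes):
        for size in batch_sizes:
            send([{} for _ in range(size)])

    # Deal the batches out round-robin over the t threads.
    threads = [threading.Thread(target=worker, args=(batches[i::t],))
               for i in range(t)]
    started = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - started

# Against a real server (hypothetical URL, authentication not handled):
#   url = "http://localhost:5984/benchmark"
#   elapsed = run_benchmark(1000, 100, 2, lambda docs: post_bulk(url, docs))
#   print(1000 / elapsed, "documents per second")
```

Passing the sender in as a function keeps the batching logic testable without a running server: `run_benchmark(10, 4, 2, some_list.append)` only records the batches.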

There are some more interesting graphs to plot ($B$ against $\overline {N \over \Delta T} $). More graphs tomorrow.

(For those interested, the raw data from which these graphs were plotted.)

CouchDB document creation performance

March 30th, 2008

CouchDB is a non-relational database which uses MapReduce-inspired views to query data. There are lots of cool things to tell about its design, but I’d rather talk about its performance.

Today I’ve been busy hacking together a little script to import all the e-mails of a long e-mail thread into a CouchDB database, so that I can write views to extract all kinds of statistics. I already imported these e-mails into a MySQL database a few months ago, but was quite disappointed by the (performance) limitations of SQL. The thread contains over 20,000 messages, which wasn’t a real problem for MySQL. When importing, however, CouchDB was adding them at a rate of only a few dozen per second, with a lot of (seek) noise from my HDD.

So I decided to do a simple benchmark. First off, a simple script (ser.py) that adds empty documents sequentially. It averages 16 per second. It occurred to me that CouchDB waits for an fsync before sending a response, and that performance would be much better with asynchronous requests. One simple modification to the script later (par.py), it still averaged 16 creations per second.

My guess, since I haven’t yet figured out how to get strace to tell me, is that the fsync after each document creation causes the mess. CouchDB itself doesn’t write or seek a lot, but my journaling filesystem (XFS) does on an fsync.
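
To get a feeling for how much an fsync per write can cost, independent of CouchDB, here is a small Python 3 sketch. It does not reproduce CouchDB's storage code. It only compares syncing after every tiny record with syncing once at the end. The function names and file names are made up for the illustration, and the numbers depend entirely on the disk and filesystem.

```python
import os
import tempfile
import time

def write_records(path, count, fsync_each):
    """Write `count` tiny records; fsync after each one, or only once at the end."""
    started = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"{}\n")
            if fsync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the disk
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - started

def compare(count=20):
    """Return (seconds with an fsync per record, seconds with a single fsync)."""
    with tempfile.TemporaryDirectory() as directory:
        each = write_records(os.path.join(directory, "each.dat"), count, True)
        once = write_records(os.path.join(directory, "once.dat"), count, False)
    return each, once

each, once = compare()
print("fsync per record: %.4f s, single fsync: %.4f s" % (each, once))
```

On a filesystem that has to seek for every sync, the first number should be much larger than the second; on a RAM-backed temporary directory the difference may vanish.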

Can anyone test it on a different filesystem?

Update: around 17/sec with reiserfs.

Update: I had some trouble with the bulk update feature, so I switched from svn to the 0.7.2 release. I got about 600/sec, which dropped to a steady-ish 350/sec, when using sequential bulk updates of 100 docs. Two bulk updates in parallel yield about 950/sec initially, dropping to 550/sec after a while. Three parallel updates yield similar performance.

Interrupting a select without a timeout

March 11th, 2008

select is a POSIX syscall which allows you to wait on several different file descriptors (including sockets) until one of them can be written to without blocking, can be read from without blocking, or is in an error state. This syscall is very convenient when you’re writing a server.

When I want to shut down an instance of the server, I have to interrupt the select. I have yet to find a satisfying way of doing this. At the moment I create a pair of linked sockets with socketpair. I add one of them to the set of sockets the select call waits on for readable data. To interrupt, I simply write some data to the other socket. That makes data available on the first socket, which in turn makes the select return.
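
A minimal Python 3 sketch of this trick, using only the standard `select`, `socket` and `threading` modules. The helper name `wait_for_activity` is hypothetical, and a second socket pair stands in for the server's real client connections.

```python
import select
import socket
import threading

def wait_for_activity(sockets, wake_r):
    """Block in select() without a timeout until something is readable.

    Returns (ready_sockets, interrupted).
    """
    readable, _, _ = select.select(list(sockets) + [wake_r], [], [])
    interrupted = wake_r in readable
    if interrupted:
        wake_r.recv(1)  # drain the wake-up byte
    return [s for s in readable if s is not wake_r], interrupted

# The linked pair: writing to wake_w makes wake_r readable.
wake_r, wake_w = socket.socketpair()

# Stand-in for a client connection the server would normally wait on.
client_side, server_side = socket.socketpair()

result = {}

def serve():
    ready, interrupted = wait_for_activity([server_side], wake_r)
    result["ready"] = ready
    result["interrupted"] = interrupted

thread = threading.Thread(target=serve)
thread.start()
wake_w.send(b"x")  # interrupt the select from another thread
thread.join()
print("interrupted:", result["interrupted"])

for s in (wake_r, wake_w, client_side, server_side):
    s.close()
```

Because the wake-up byte stays queued until it is read, it does not matter whether the write happens before or after the other thread has entered select.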

There must be a more elegant solution.

wlan0_rename

March 8th, 2008

If udev has problems assigning wlan0 to wlan0 (as given by the kernel) and instead leaves you with an ugly wlan0_rename or the like, you might want to check whether you’ve got stray udev config files in /etc/udev/rules.d/, like 70-persistent-net.rules.

X61s (2)

March 7th, 2008

And here are some pictures of my X61s.

X61s (1)

March 6th, 2008

I’ve been the lucky owner of a Thinkpad X61s for a bit more than a week now. It’s a light and small 12.1 inch notebook. It’s structurally very solid and has got almost the same full-size keyboard as my 14.1 inch T60. (The enter, tab, shift and similar keys are narrower.)

The installation of Gentoo went quite smoothly. For those interested: the xorg.conf, make.conf and kernel .config I use.

Internal mic/speaker and external jacks; video; DRI and AIGLX; USB; PCMCIA; wireless; fingerprint reader and ethernet all seem to work just fine. I haven’t tested the firewire, ssd slot, bluetooth and n-capabilities of the wireless yet.

Unlike on my T60, the volume and backlight buttons aren’t hardware controlled. Gnome recognizes the volume buttons, but not the backlight ones. I’m still working on those.

ssmtp segfault in sendmail on Gentoo

February 26th, 2008

Disabling the md5sum USE flag might fix it.

Thanks to Bram for the tip.

Fosdem (2)

February 25th, 2008

I have just returned from Fosdem 2008. The trip by car to Bruxelles was, by Dutch standards, quite long: 3 hours. Volker, who did the hotel bookings for the FSFE, was kind enough to also book a room for me and my brother. For the 3-star hotel, which normally would cost you 200,– per night, we only had to pay about 70,– p/n. I did expect a group discount, but that it would be so enormous was a surprise. I’m not complaining though :).

After the check-in we went for a quick dinner and straight off to the Delirium Café for the “Beer Event”. The café, even though it’s the largest one I’ve ever visited, was barely big enough for the masses of geeks. One could buy a ‘red dot’ from one of the organizers for 20,– euros. With it you could order any beer (one of the 25 on tap) at the bar without paying, until you had spent the 20,– euros. This wasn’t checked, though. I find it hard to imagine, and my head felt even harder the next morning, that one would consume more. On that next morning the real magnitude of the conference became apparent. The first talk, in a large (really, large) lecture hall at the Free University of Bruxelles, was attended by at least a thousand, if not two thousand, people.

The talk about Perl 6 was the most interesting for me. I didn’t really like Perl <5, primarily because it has too many ways to do exactly the same thing with only a different syntax. I knew that Perl 6 was a total and backwards-incompatible redesign of the language, built on top of a good, generic virtual machine called Parrot. Parrot, which I hadn’t given a proper look yet, turned out to be a lot greater than expected. You write support for a new language in Parrot by writing in a subset of Perl 6, which, with its new regular expressions and specializations (tokens: regexes without backtracking, etc.), looked very well suited for it.
Besides all the new and very, very sweet syntactic sugar they added in Perl 6 (on which I won’t (yet) elaborate), the greatest feature (which is actually more of a Parrot thing) is being able to extend Perl at runtime by writing new parser rules. One application is defining ‘!’ as a factorial operator. I’m itching to play with it.

Another very interesting talk was the one about Gallium3D, an effort to rework the 3D API on Linux and Xorg for 3D accelerator drivers. It primarily abstracts most of the operations of the video card into a very specialized language processor and, not entirely surprisingly, makes heavy use of LLVM. The abstraction was effective enough that a driver originally written for Linux could be reworked for Gallium3D and then run on a Windows version of Gallium. That it runs on Windows probably won’t interest that many people. That it now also runs on the Cell’s SPEs might. The speaker joked that you can finally play 3D games on your PS3 [on Linux].

Most of the talks I attended were pretty interesting. Most of the speakers, though, weren’t great speakers. What I missed most at FOSDEM was an easy way to get in touch with the people of a certain project. There were booths for a lot of projects, but the people behind them were just standing there to show you stuff, in a hallway that was too narrow. There were separate rooms where talks were given about a project, but there was no real time in between. A few hours with nothing but the people of a project meeting in its project room would have been great.

Miscellaneously, the OLPC is a great toy; the EEE is even uglier than it looks on the web; the MacBook Air is even thinner than you think and the FSFE people with whom we stayed at the hotel were great. I could tell a lot more, but quite frankly I’m very tired and sleep calls.

Fosdem (1)

February 22nd, 2008

Tomorrow I’ll travel the short distance to Bruxelles to visit Fosdem. I’m pretty excited :).

Codeyard 2008

February 12th, 2008

Codeyard is a project of the RU to get high school students in the Netherlands enthusiastic about Free and Open Source Software. Primarily they offer hosting and guidance for setting up projects. They also organize (great!) monthly meetings at the university where participants meet.

To motivate people to join, they organize a yearly award, sponsored by Capgemini, for the best project. Together with a friend, Noud, and my brother Bram, I participated in their first year (2006) and we won the award. This year, I have the honor of joining as a member of the jury.

If you’re interested, you can visit one of the “community days”. The next one is this Saturday. There will also be a presentation by the people who cracked the “OV-kaart”, which should be quite interesting.

Evolving the Object Paradigm

January 31st, 2008

Kaja is writing a series of articles on the shortcomings of the current object paradigm and solutions to them. Very interesting.

Bye bye Reiser4

January 23rd, 2008

A few days ago the root partition (formatted as Reiser4) on my notebook got corrupted [the usual IO hangups and nasty output in dmesg], probably due to the usual wear and tear a notebook has to suffer, or a faulty suspend cycle causing bogus IO. I had suffered this a few times before and didn’t think it would be a big deal. This time, though, fsck.reiser4 said everything was OK. That meant I was pretty screwed, for I knew it didn’t work correctly.

I borrowed a USB HDD, booted into a fallback installation on a separate ext2 partition and tried to copy everything over to the USB HDD. It was quite tricky to copy as much as I could while remembering the points where reading started to crash. Luckily, I salvaged my whole /home. /var, /bin, /usr/share and a lot of other trees weren’t that lucky.

I formatted the partition as XFS, copied everything I had salvaged back to the HDD and unpacked a Gentoo stage 3 tarball over it. A stage 3 tarball contains a minimal installation that can be chrooted into and then booted, and from which the rest of the system can be built: the usual method to install Gentoo. I didn’t lose my world file or /etc/make.conf. A small script later, I got portage re-emerging every package I had installed on the system. Still 200 to go at the moment, but at least I’m now in a partially functioning GNOME desktop, which is a lot more usable than TWM (the ugly default WM of Xorg).

XFS performs quite well. Its latency under load is a lot smaller than Reiser4’s. (It’s a pity I haven’t yet gotten around to trying the new patches in -mm that help Reiser4 a bit with that problem, also because Reiser4 seems so close to inclusion, judging by Andrew’s merge plans.) In contrast, XFS sucks at handling a lot of small files compared to Reiser4. This is all just a feeling though. I haven’t tested anything. The most important characteristic of a FS, though, is only apparent after long use: the influence of fragmentation. Having looked around a bit, btrfs seems interesting.

On a side note on latency: my mom runs Ubuntu with EXT3, and even though EXT3 sucks in practically every single performance benchmark, it seems to have superb responsiveness. Ah, 150 packages to go.

Linux 2.6.24-bw-r34

January 20th, 2008

Again an update for my bw-tree. There isn’t a tree that includes reiser4 and TuxOnIce without a lot of other bloat, so I created one myself.

Download the big diff or the separate patches broken out.

Note: I pulled in new patches for reiser4 from the -mm tree against -rc8, which should fix the Reiser4 flush problem a bit.

Update: patched against 2.6.24 stable. Added new TuxOnIce patches and the genpatches. Please note that I haven’t tested Reiser4 thoroughly enough on this version.

2008

January 1st, 2008

Happy New Year!

Tickle

December 28th, 2007

Tickle is a small Python serializer like Pickle. It aims, however, at generating smaller output:

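>>> import pickle, StringIO
>>> from tickle import tickle  # assumes tickle.py is on the import path
>>> len(tickle('hello'))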
7
>>> s = StringIO.StringIO()
>>> pickle.dump('hello', s)
>>> len(s.getvalue())
13

Though the difference is, and remains, quite small, this alone is useful for serializing small things, for instance in the case of RPC. Usually, however, you already know what kind of data to expect and you don’t really care about the type information. In that case the type information can be left out by specifying a template:

>>> obj = []
>>> for i in xrange(100):
...     obj.append((i, str(i)))
...
>>> len(tickle(obj))
629
>>> len(tickle(obj, template=(tuple,
...     ((tuple, ((int,), (str,))),) * 100)))
390

(Instead of the *100 an iterator could be constructed, but that would clutter the example even more than it already is.) In comparison:

>>> s = StringIO.StringIO(); pickle.dump(obj, s)
>>> len(s.getvalue())
1680

One big disadvantage of Tickle is speed. Pickle has got a nice C implementation, which is quite fast. Psyco helps a bit, but not really enough for really big things. Moreover, pickle is a bit smarter: it builds a lookup table (LUT) of instances to avoid duplicate data. However, in the situations where Tickle will be used (by me at least) that isn’t too big of an issue.
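
The effect of pickle's lookup table is easy to see with the standard library alone. The following sketch is written for Python 3 and does not use Tickle. It pickles a list that contains the same object twice, and a list that contains two equal but separate objects.

```python
import pickle

a = list(range(100))
b = list(range(100))  # equal to a, but a separate object

shared = pickle.dumps([a, a])    # the second entry becomes a reference to the first
separate = pickle.dumps([a, b])  # both lists are written out in full

print(len(shared), len(separate))

# The sharing survives a round trip: both entries are one and the same list again.
restored = pickle.loads(shared)
print(restored[0] is restored[1])
```

The first output is smaller because the second occurrence of `a` is stored as a back-reference rather than as a second copy of the data.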

You can download tickle.py via gitweb.

Merry Christmas

December 24th, 2007

Have a happy Christmas!

linux 2.6.24-bw-r12

December 20th, 2007

There isn’t a (stable-ish) tree that includes reiser4 and TuxOnIce (suspend2), so I made one myself, based on 2.6.24-rc5-rc67.

You can grab it as one big patch, bw-r12-for-2.6.24-rc5-rc67.diff.bz2, or broken out: bw-r12-for-2.6.24-rc5-rc67.tar.bz2.

Java crashing on xcb_xlib_unlock.

December 17th, 2007

When Java applications crash on Linux with xcb_xlib_unlock: Assertion `c->xlib.lock' failed, you should upgrade to libxcb-1.1 and add LIBXCB_ALLOW_SLOPPY_LOCK=1 to the environment.