Welcome to Arkanis Development

Touch table blob detection - normalized spider algorithm

Published

It has been quite a while since I wrote anything technical, or anything at all about the stuff I actually do. Well, this post is about something I'm currently working on: a blob detection program for an old touch table at my university. We also have an MS Surface table here, but I don't really do Windows programming for fun anymore. Maybe I should try booting the MS Surface table from a Linux live USB stick…

Ah, well, the old touch table was assembled a few years ago by students I unfortunately never met (before my time). Inside are infrared LEDs that flood the table with infrared light, as well as a camera that continuously films the surface of the table (from below, that is). When a finger touches the table, the surface and the finger reflect additional infrared light, which is then captured by the camera. The purpose of the program I'm currently working on is to detect the infrared light blobs that belong to fingers.

Touch table from below without and with a hand on it

The small figure shows what the touch table looks like to the camera. The left side shows the "background", basically what the camera always sees. No matter what is above the table, each pixel is at least as bright as in this "background". It's actually the mean image of 60 consecutive images, which filters out the noise of the camera. The right side shows a normal input image with 5 fingers touching the surface and a lot of noise in it.

Just in case someone notices: these images were taken while the table was partially disassembled. I played around with different cameras (another story) and this was a more or less comfortable way to do it. That's the reason why the projector is almost in the center of the image and only a part of the actual surface is visible. But it was enough for testing and development, so I used a few hundred captured frames as sample material.

The ultimate goal is to throw the right picture into the detector program and get the coordinates of the 5 fingers in return. Well, for a human this is an easy task. Unfortunately I can't afford to hire someone to watch the table from below all the time and enter the coordinates into a computer with about 17 ms delay. So I have to teach computers to do it for me. It turns out our brain does a lot of magic when we see the 5 fingers, and it's a lot harder to teach a computer to do something similar. It took me the better part of the last 1½ weeks.

I already tried the same thing on the same table about 1½ years ago, back then with more "established" approaches: filtering the image with a Gaussian blur or doing some temporal filtering (e.g. only counting fingers that are present for 5 consecutive frames). These techniques do work but bring their own set of problems. I never got the Gaussian blur fast enough on one core to filter a 640×480 video at 30 frames per second. And the temporal filtering makes fast movements and short touches almost impossible to detect.

But back then I primarily failed because of two problems:

  • The touch table is not uniformly lit and not uniformly sensitive. When someone touches the table, the camera sees different brightness values depending on where you touch it. This makes it very hard to use a simple brightness threshold to detect a finger. For example, in one area pixels only get brighter than 140 if someone touches the table. In other areas this value can be quite a bit off (e.g. 100 or 190).
  • The arm itself is actually about as bright as the fingers. Looking for the centers of bright areas will also detect the arm as a finger, albeit a gigantic one. In the worst case it almost absorbs the fingers.

Since this first failed attempt I had some ideas to tackle at least the first problem. I thought I could avoid the second problem, but I wasn't that lucky, so I had to solve it. I tried quite a few different ideas, for example filtering for specific brightness increases followed by matching brightness decreases (I think one can call that the second spatial derivative). And about 2 hours ago I finally succeeded in that endeavour. So here is how it works right now…

Normalization

I need to determine some characteristics of the table before actually detecting stuff. You need to know your enemy, right? On my first attempt this only involved two steps:

  • Capture a background image
  • Figure out what pixel on the camera image corresponds to what position of the table surface (haven't done that yet)

The background, the touch map and the difference between them

To get hold of the differences in sensitivity across the table I added a third characteristic: a map of how strongly the brightness changes when someone touches the table. I simply called it the "touch map" (well, naming things is difficult). Basically it records the maximal brightness of each pixel while you touch the table everywhere. With that data we know the brightness value when no one touches the table (the background) and when someone touches it (the touch map). And these values can be different for each pixel.

With these two per pixel values we can "normalize" the brightness of any pixel into a uniform one. If a pixel is as bright as the background the value will be 0, and if it is as bright as the touch map it will be 1 (or 255 for 8 bit integers). Any further code can work on this "normalized" brightness and is independent of the actual sensitivity of the table.

And the sensitivity can vary quite a bit. The second figure shows the background and the touch map as well as the difference between them. I only touched half of the visible surface and you can see that area nicely in the rightmost image. Some variations in the sensitivity come from my lazy "touching the table everywhere". I did this a bit too often, so it's not the best touch map, but it works.

As extra fine tuning the touch map is blurred slightly: each pixel is replaced by the mean value of the 3×3 pixel square centered on it. This is fast and gives a nice 1 pixel "safety margin" around especially sharp edges in the touch map. For example, the areas surrounding the white "blind spot" of the table reduce the sensitivity around them. This prevents the normalization from amplifying noise around these areas into insanely bright spots.
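The 3×3 mean blur described above (the same kind of blur shows up again later in the detection pipeline) could look roughly like the following C sketch. It assumes 8 bit grayscale images stored row by row; the function name `blur3x3` is made up for illustration, and copying the border pixels unchanged is an assumption, since the post doesn't say how the image edges are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3x3 mean blur for an 8 bit grayscale image stored row by row.
 * Every inner pixel becomes the mean of the 3x3 square centered on it.
 * Border pixels are copied unchanged (an assumption, not from the post). */
void blur3x3(const uint8_t *src, uint8_t *dst, size_t width, size_t height)
{
    assert(src != dst); /* in-place blurring would mix old and new values */

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t i = y * width + x;
            if (x == 0 || y == 0 || x + 1 == width || y + 1 == height) {
                dst[i] = src[i];
                continue;
            }
            const uint8_t *above = src + (y - 1) * width;
            const uint8_t *row = src + y * width;
            const uint8_t *below = src + (y + 1) * width;
            unsigned sum = above[x - 1] + above[x] + above[x + 1]
                         + row[x - 1] + row[x] + row[x + 1]
                         + below[x - 1] + below[x] + below[x + 1];
            dst[i] = (uint8_t)(sum / 9);
        }
    }
}
```

Blurring spreads every single pixel over its 8 neighbours, so a sharp edge in the touch map becomes a softer transition one pixel wide, which is where the "safety margin" comes from.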

I also added a "touch range threshold". A pixel needs a meaningful difference between its background and touch map value. If this difference is below a threshold (e.g. 10), the pixel is ignored from then on. This prevents the normalization from amplifying noise in regions that were never touched.
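Put together, the normalization and the touch range threshold boil down to a small per-pixel formula. The following C sketch is one way to write it, not the actual program: the function names, the linear scaling to 0 to 255 and the clamping of values outside the background/touch map range are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical normalization of one pixel:
 *   0   = as bright as the background
 *   255 = as bright as the (blurred) touch map
 * Pixels whose touch range (touch - background) is below min_range,
 * the "touch range threshold", are ignored and yield 0. */
uint8_t normalize_pixel(uint8_t value, uint8_t background, uint8_t touch, uint8_t min_range)
{
    if (touch <= background || touch - background < min_range)
        return 0; /* never touched there, or not sensitive enough */
    if (value <= background)
        return 0; /* no brighter than the background */

    unsigned range = (unsigned)(touch - background);
    unsigned diff = (unsigned)(value - background);
    if (diff >= range)
        return 255; /* clamp: at least as bright as the touch map */
    return (uint8_t)(diff * 255 / range);
}

/* Normalize a whole frame; all buffers hold pixel_count 8 bit pixels. */
void normalize_frame(const uint8_t *frame, const uint8_t *background,
                     const uint8_t *touch_map, uint8_t *out,
                     size_t pixel_count, uint8_t min_range)
{
    assert(frame != NULL && background != NULL && touch_map != NULL && out != NULL);
    for (size_t i = 0; i < pixel_count; i++)
        out[i] = normalize_pixel(frame[i], background[i], touch_map[i], min_range);
}
```

For example, with a background of 100 and a touch map value of 180, an input of 140 lies halfway and maps to 127 (integer division). With a threshold of 10, a pixel whose touch map is only 5 above its background always maps to 0.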

Background with noise

After many noise filtering experiments (simple blurring, eliminating specific 1 and 2 pixel patterns, etc.) I settled on a somewhat counterintuitive approach. Usually the background is captured by calculating the mean brightness of several consecutive frames for each pixel. In this mean brightness the camera noise cancels itself out (it's evenly distributed in time). This gives a nice and noise free background.

However, when you subtract this mean background from an input frame you get the difference from the background… and a lot of noise. I found myself spending more and more time on filtering that noise. Out of curiosity I started to capture minimal and maximal noise values for the background as well. For each pixel that is the smallest (min) and largest (max) brightness observed while capturing the background. As it turns out, this noise is not evenly distributed over the table (spatially). And anyway, I'm only interested in pixels that are above the noise level of the camera. Otherwise it's very difficult to distinguish between noise and a valid increase in brightness.

End of story: Right now I'm just capturing the maximal observed brightness when creating the background. This way the background also contains the maximum noise level. Now when we subtract the background from an incoming frame we get a simple noise filter for free. It's not great but actually does the job without resorting to temporal filtering and is as fast as it can get.
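The max-instead-of-mean background and the subtraction step could be sketched in C like this. The names are hypothetical; the sketch assumes the accumulator starts zeroed and is fed every frame captured while recording the background. Recording the touch map (the maximal brightness while touching the table everywhere) presumably works the same way, but that is an assumption about the actual program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: fold one captured frame into a per-pixel maximum.
 * Start with acc zeroed and call this for every frame recorded for the
 * background; afterwards acc holds the highest brightness (including the
 * strongest camera noise) seen at each pixel. */
void max_accumulate(uint8_t *acc, const uint8_t *frame, size_t pixel_count)
{
    assert(acc != NULL && frame != NULL);
    for (size_t i = 0; i < pixel_count; i++)
        if (frame[i] > acc[i])
            acc[i] = frame[i];
}

/* Subtract the max background from a frame, saturating at 0.
 * Anything not brighter than the strongest recorded noise becomes 0,
 * which acts as the simple noise filter described above. */
void subtract_background(const uint8_t *frame, const uint8_t *background,
                         uint8_t *out, size_t pixel_count)
{
    assert(frame != NULL && background != NULL && out != NULL);
    for (size_t i = 0; i < pixel_count; i++)
        out[i] = frame[i] > background[i] ? (uint8_t)(frame[i] - background[i]) : 0;
}
```

With a mean background, noise peaks above the mean survive the subtraction; with the max background they are cut off, at the price of also losing the weakest real brightness increases.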

The actual blob detection

Stages of the detection pipeline

Whenever the camera sends a new frame, it runs through the following pipeline:

  • Original input frame: The stuff from the camera.
  • Subtract background: Remove the brightness we know is always there as well as the noise. Note that the brightness in this frame of the figure is raised by 32 to make the actual changes more visible.
  • Normalize difference: Use the touch map to map the difference to a brightness uniform across the entire table.
  • Simple blur: Do a simple 3×3 mean blur on the uniform brightness values. This step actually served a different purpose for another idea (smoothing the brightness gradients so slope, high point and low point detection would work better). However it also serves well to suppress some hardcore 1 and 2 pixel noise as well as occasional 1 pixel spikes from the normalization (pretty rare thanks to the same blur on the touch map itself).
  • "Spider" amplification… sorry for the lack of a better name. This step detects pixels that belong to fingers and suppresses pixels that belong to other stuff (e.g. arms).

The figure shows the frame after each of these steps. The order of the images in the figure is left to right and top to bottom.

The "spider" amplification is the part I just finished several hours ago. For each pixel the program looks into 4 different directions, like the legs of a spider, hence the name. OK, a spider has 8 legs, but this is what I thought of when first visualizing the algorithm. We go e.g. 8 pixels in each direction and compare the brightness value there with the brightness of our own pixel. If the brightness difference between the outer and center pixel is above a threshold (e.g. 75), the "leg" is counted. If it's below, the "leg" is ignored.

The new brightness value of the center pixel is then calculated as the mean value of all counted "legs". This alone gets rid of almost the entire arm and leaves the fingers very bright. However, it also generates some "ghost" artifacts: 4 different versions of the image, just a bit darker and offset into each of the 4 directions. These can be suppressed by only keeping center pixels with at least 3 "legs", since the ghosts are created by center pixels with only one intact "leg".

The distance or "leg" length has to be configured to match the maximum blob size. Right now I got the best results with a value of 8 but this is very dependent on the size of the table and resolution of the camera. The last image (lower left) in the pipeline figure shows the results of the spider amplification with a distance value of 8.
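Here is a rough C sketch of the spider amplification as I read the description above. Several details are assumptions not stated in the post: the 4 legs point up, down, left and right; a leg counts when the center pixel is brighter than the pixel at the end of the leg by more than the threshold; the new value is the mean of those counted brightness differences; and pixels whose legs would leave the image are simply set to 0. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spider amplification on a normalized 8 bit image.
 * leg_length: distance to look in each direction (e.g. 8)
 * leg_threshold: minimal center - outer difference for a leg (e.g. 75)
 * min_legs: legs needed to keep a pixel, suppresses ghosts (e.g. 3) */
void spider_amplify(const uint8_t *src, uint8_t *dst, size_t width, size_t height,
                    size_t leg_length, int leg_threshold, int min_legs)
{
    assert(src != dst);
    assert(leg_length > 0 && min_legs >= 1); /* min_legs >= 1 avoids dividing by 0 */

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t i = y * width + x;
            dst[i] = 0;

            /* Legs that would leave the image: leave the pixel at 0. */
            if (x < leg_length || y < leg_length ||
                x + leg_length >= width || y + leg_length >= height)
                continue;

            int center = src[i];
            int outer[4] = {
                src[i - leg_length * width], /* up */
                src[i + leg_length * width], /* down */
                src[i - leg_length],         /* left */
                src[i + leg_length],         /* right */
            };

            int legs = 0;
            int sum = 0;
            for (int l = 0; l < 4; l++) {
                int diff = center - outer[l];
                if (diff > leg_threshold) {
                    legs++;
                    sum += diff;
                }
            }

            /* Too few legs: likely a ghost, an edge or part of an arm. */
            if (legs >= min_legs)
                dst[i] = (uint8_t)(sum / legs);
        }
    }
}
```

With a leg length of 8, a threshold of 75 and at least 3 legs, a small bright blob keeps its brightness because all 4 legs reach dark pixels. A bright area wider than the legs (like an arm) typically gets no counted legs, and a long bright stripe gets at most 2, so both drop to 0.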

Further stuff

All parts of the detector program are quite simple right now (code wise). Therefore I'm hopeful I can process a 640×480 video at 60 fps in realtime. The algorithms themselves at least look quite well suited for SIMD or GPU parallelization. However, I would prefer to have it run on a single CPU core. That means less data transfer, and it keeps the GPU and most of the CPU free for the game. Yes, the entire thing is only to make the touch table usable for a specific game I have in mind. Let's hope it works out. :)

Now the sun has risen again. It's time to go to bed.

To the Moon

Published

Well, it has been a while since I last wrote something. I overdid the programming stuff a bit during my thesis and I'm currently kind of recovering from that, rediscovering that programming can be a lot of fun and a rewarding experience (it's about time the fun finally comes back). Anyway, I'm not writing this because of programming stuff. This time it's about a game.

There are games I like, for example Deus Ex and the Unreal series, because, well, they were "cool" in their time. I spent way too much time with them as a child and they helped inspire me to do all that technical stuff I really enjoy today. Then there are other games… games that have a soul, that have something really unique about them. I only know very few of them and it's probably a very personal matter which games you would consider to belong to that class. For me Darwinia is one of them. It really made me think about a lot of stuff, and seeing a sorrowing Darwinian was a heartbreaking experience.

And then, a day ago, I discovered that there is another kind of game. Games that do something I never thought games could do. Games that make you think… no, that let you experience emotions in a way I never imagined possible for a game. A few years ago I read the first (and until now only) book that made me cry: Nation. Yesterday I played the first game that made me cry: To the Moon. It's hard to describe why. Every life has its own path, its valleys, its unique story. There are things you have to do and decisions you have to make. And these two (To the Moon and Nation) are not just a game or a book… they are imprints of life. And now they have become a part of me, probably for the rest of my life.

I'm sorry but that is the best I can come up with to describe the experience. Thanks to these two marvellous pieces of art I feel like I have lived two and a half lives by now. And if that's not true art I don't know what is.

ps.: A small Star Trek remark that just popped into my mind: there is a Star Trek: The Next Generation episode (The Inner Light) where they find an old probe. This probe lets Captain Picard relive the last few decades in the life of a scientist on a long-dead world: discovering that their sun is dying, having children and finally realising that their entire culture will die along with their sun. At the end the probe is launched to convey the memories of their culture to the first one who finds it (and that is Captain Picard). I think after the game I felt like Captain Picard at the end of the episode, when they found the flute in the probe…

pps.: I bought the game at Good Old Games.

Programming without a text editor

Published

I was just thinking about the tools one can use to write programs. Sure, text editors are not especially great compared to professional IDEs like Visual Studio. But somehow I had to think about punch cards and that it probably required much more discipline to program with these. One hole punched in the wrong place… too bad, start over with a new card.

Through thought patterns only known to unconscious regions of my brain I remembered a nice xkcd comic: Real Programmers. In there someone says "Real programmers use cat". Hm, that is as low level as I could imagine going with software. But what if even cat isn't there to help you? What if you attack a server and for some unknown reason get a root shell with a full gcc environment but without cat?

gcc -x c -

This will instruct gcc to read a C program (-x c) from standard input (the last -). Now you can write down your program line by line. Made an error in the line above? Too bad, start over with a new program. :)

Well, it's not as cool as using a magnetized needle, but I'm sure it's a nice toy for real programmers. Now I think I just invented another discipline real programmers can compete in: line buffered programming. Thank god I'm not one of them…

ps.: The really real programmers can write line buffered Java.

pps.: I'm currently writing my bachelor thesis. So please don't be surprised if some craziness leaks out of my skull here and there…

NNTP-Forum now on GitHub

Published

When the NNTP-Forum was first published two of my friends wanted to see it on GitHub. Well, it's there now: arkanis/nntp-forum.

The French developer Mr Xhark helped me to continue development on the project. Thanks to his relentless testing the NNTP-Forum is now internationalized. It supports three languages: English, German and French (by Mr Xhark). It's also quite easy to write your own language file.

Funny how the experience from the long gone Simple Localization Rails plugin helped with this PHP project: the i18n code is about 30 lines. It always amazes me how little code is necessary if you know what you need. Ironically, the function that searches the HTTP headers for the user's preferred language is about the same size. :D

Anyway, maybe you accidentally have an NNTP server running. In that case you can give the NNTP-Forum a try. Be sure to check out the setup and configuration guide.

Printing the LLVM Language Reference Manual

Published

LLVM is a very handy piece of software with a quite long manual for its intermediate language. Since I don't like reading long documents on screen I invested 3 hours to generate and print a nice PDF version of the manual. It's not perfect but still comfortable to read.

Thanks to PrinceXML all that is necessary is a small CSS style sheet with printing styles. I also used the Gentium fonts because they are easy to read on paper. Anyone interested in generating an up to date PDF can download the style sheet archive. It contains the style sheet and the Gentium fonts. Extract it into a directory and with PrinceXML installed this command will generate the PDF:

prince --no-author-style --style llvm-lang-ref.css -o llvm-lang-ref.pdf \
    http://www.llvm.org/docs/LangRef.html

Have fun reading. :)

The plains project

Published

Who doesn't know this situation: You made a note and some time later you need it again. And then it happens: "Where did I put that?!". And the search begins…

Having some stacks of notes flying around my desk, I'm very prone to that problem. I've tried some software solutions to this over the last few years: offline wikis, address books (Thunderbird, Opera, etc.), some desktop notes (Tomboy, Opera), files sorted into different directory structures, tagging, desktop search engines, etc. Nothing really helped me much. It's often quite easy to store data but finding it… well, that didn't work that well for me.

Please don't get me wrong. It's not that I cannot find something I put into a directory 10 minutes ago. It's just the sheer amount of data that makes it difficult to organize. Many projects, many ideas, notes and bookmarks. A small library of sorts, but well, everyone has that today I suppose.

Anyway, I finally took the time to write a little tool to help me organize my stuff:

The plain project

The core idea is the map metaphor. Put everything on a 2D space so it forms some recognizable patterns. Then your spatial sense of direction will help you to get around.

This works very well with classical maps. You know, these things on paper we all used before Google Maps and navigation systems were around. Once you've found a place on a map it's very easy to find it again. The same principle has been partially used in graphical user interfaces since 1968 (or maybe even earlier, don't know), albeit for other purposes. Well, usability principles are something for another post. Let's just say I've tested out one extreme end of the design space: put everything on a map.

The plain project is a small web application that allows you to put notes and ideas on a 2D space. You can group related stuff into new areas I somehow started to call "plains". Don't ask me why, the name just stuck. Basically it's just a different representation of files (notes and ideas) and directories (plains). However, the content is shown directly and the user can move it around on the plains. Even if it sounds stupid, that is the important part: after a little while you build your own map. And since you know every spot, it's pretty easy to find something. In hindsight this reminds me of some memorization techniques.

You can give it a try on the sample installation. The code is hosted on GitHub, but be aware: It's a tool for personal usage and therefore not very clean.

A part of my personal plains

How well this map metaphor will scale with large amounts of information I don't know. Right now I've only moved the stuff of the last year or so into my personal plains. Over the next few weeks I'll move over my contacts and complete bookmarks. Especially the bookmarks contain some tricky to sort stuff but only time will tell how well that works. Maybe I can use symlinks in some cases to sort something into multiple categories.

Everything is stored in text files and directories right now. This is primarily to keep the data "free" of any application. If the plain project should not work out, I don't want to have a big SQLite file lying around. There's also the benefit that I can use the file manager to organize stuff in case the web application doesn't work well (e.g. for large documents nothing beats your text editor of choice).

And this is one of the major problems right now: I used CSS transforms for the panning and zooming. This causes bugs in pretty much every browser I use. Opera has redraw issues, Firefox jumps around when you click the edit button and Chromium does not allow scrolling in textareas that are off center. The performance is abysmal in every browser. Right now I use Chromium most of the time.

Further ideas

While I have something I can work with now, the map metaphor has massive potential, not only for small text snippets but for files, pictures, etc. I want to explore that form of usability a bit more, but browsers might become too much of a bottleneck. I don't want to store everything on my server either.

Therefore the next iteration will probably be done as a native Linux application based on X11 and OpenGL. This should allow much better controls and other very interesting stuff like organizing hundreds of pictures and files. Maybe it's even possible to add normal applications as entries to plains. Well, only time will tell.

Crazy experiments with CELT

Published

Just a few hours ago a good friend of mine and I took CELT out for a test drive. CELT is a new audio codec created by the Xiph.Org Foundation. Well, actually we were discussing continuations over an XMPP audio chat and the quality was a bit bad. I mentioned CELT just as a joke, but a few minutes later both of us had somehow downloaded and compiled the CELT codec.

Basically the idea was the following:

  • Get a sound stream from the microphone
  • Pipe it through the CELT encoder
  • Use netcat to transfer the encoded sound stream over the internet
  • Pipe the received sound stream through the CELT decoder
  • Play the decoded sound on the other computer

Well, it was just for fun but it worked extraordinarily well. We grabbed the mic input with the parec command, and pacat plays back such a recorded stream. So this command basically plays back whatever your mic receives:

parec --latency-msec=1 | pacat --latency-msec=1

The --latency-msec option makes sure the latency of PulseAudio is not too high. If you omit it, PulseAudio will choose a very high latency (about 2 seconds in our case). Therefore we just set it to a very small value.

Now we put CELT into the game. Fortunately the CELT library from the Xiph.Org Foundation already contains a fully functional encoder and decoder program: celtenc in the tools directory of the library is the encoder, celtdec the decoder. We both just compiled the library (with ./configure and make) and used the encoder and decoder directly from the tools directory. With CELT in the game our local test command looked like this:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
./celtdec --stereo - - | pacat --latency-msec=1

Now we encode and directly decode the sound stream with CELT. The only thing to watch out for here is the --stereo parameter of the CELT tools. The PulseAudio tools use 2 channels by default, so the --stereo option is needed to match that. Alternatively you can use the --channels=1 parameter of parec and pacat to make PulseAudio use just one channel and then omit the --stereo option.

Anyway, more important is the bitrate. We played around a bit and 50 seems to be quite OK. Lower bitrates still sound good but increased the latency in our case, probably not because of the codec but because of the audio frame size. We tried to reduce the frame size but that did not work. Anyway, 50 kbit/s was a value we were happy with.

Now the most important part of all: move the data over the internet. Well, thanks to netcat that is a piece of cake. We first did it in one direction and later on extended it into a real bidirectional voice chat.

For the one who wants to listen:

netcat -l 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

And the one who likes to send their mic sound:

parec --latency-msec=1 | ./celtenc --stereo --bitrate 50 - - | \
netcat [target-ip] 50123

Basically we just put a netcat pair between the celtenc and celtdec programs. [target-ip] is the IP address of the listener computer. 50123 is just a random port number, you can use whatever port you like. The idea here is that the listener starts the netcat server and receives the encoded sound stream.

Now let's make it two way, so it will actually be useful for something. An easy approach would be to run the same two commands again, just in the other direction. That is also the best solution we could come up with.

However it seems a bit wasted to use two connections when we just send one data stream and receive another one. We can use one netcat connection for that:

Computer A:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat -l 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

Computer B:

parec --latency-msec=1 | ./celtenc --stereo - - | \
netcat 127.0.0.1 50123 | \
./celtdec --stereo - - | pacat --latency-msec=1

Now the commands get a bit more complex. We send one encoded sound stream into netcat and decode and play back the received sound stream. I couldn't resist drawing a nice little diagram of this setup:

The data flow of the sound streams between the programs and over the internet

The dashed lines represent CELT encoded data, while normal lines are raw uncompressed audio. The problem with this setup is the latency. As soon as computer A runs the command, parec and celtenc start to record and encode data, even if no connection has been established yet. The pipe to netcat will buffer this data as long as possible, and when computer B connects it will first receive all buffered data. For example, if computer B runs the command 2 seconds after computer A, there will be a 2 second delay in the sound stream from computer A to B (but only in that direction).

We stopped at that point. It was working quite well and the sound quality was absolutely marvellous. For something hacked together in about 4 hours it works quite well. Most of our time went into working around NAT problems of our routers anyway. Maybe I'll play around a bit with the library and write some basic UDP transport system. The API looks quite easy to work with. :)

NNTP Forum download and installation guide

Published

A while ago I wrote a forum-like frontend for NNTP newsgroups. Since then I was asked twice to make it publicly available. Today I finally had the time to clean up the code and write some documentation. The result is a small project page with the download: NNTP Forum project page.

There you will find the download and a small installation guide. If something does not work or you need help with the setup, just drop me a mail or leave a comment. Right now the procedure is anything but perfect because the NNTP Forum was not made to be portable. Instead it was made to make the best use of the available infrastructure. While this keeps the internals simple, the setup suffers a bit. Sorry for that.

NNTP forum - a forum like NNTP frontend

Published

The NNTP forum is a modern frontend to newsgroups. It's not a real forum on its own but rather displays the messages of a newsgroup server like a forum does.

The message list of a topic with the form to post a reply

Here's the short overview:

  • Presents a newsgroup server as a forum
    • First displays all newsgroups on a server
    • Then all topics in a newsgroup
    • And all messages in a topic
  • Post new topics and messages
  • Supports attachments
  • The tree structure of the messages is preserved
    • You can answer to a specific message
    • You can hide all answers to a specific message (collapse a topic branch)
  • Markdown is used to display a message as HTML
  • Automatically collapses large quote blocks (e.g. the quote of the previous message)
  • Highlights unread messages
    • The unread tracker data is stored on the server, one small file per user
    • The tracker data does not grow over time but only with the number of users (only the 50 newest messages in each group are tracked)
  • Provides freely configurable newsfeeds for the newsgroups

This is the basic stuff. Right now the forum is optimized for the environment of my university. Therefore the authorization of users is handled in an unusual way:

  • Uses HTTP authorization to get user credentials: The authorization method can be freely configured in the webserver, e.g. Apache. Right now we use an LDAP lookup. The downside of this is that the user is always asked for credentials. Guest access would need to be whitelisted in the virtual host config with a proper guest user. These guest user credentials can then be used by the NNTP forum to access the newsgroups.
  • When posting a message the username is translated to a display name with an LDAP lookup.

Ok, now to the technical stuff:

  • Written in PHP 5.3
  • Messages are parsed per line so almost no memory is required (important for large attachments)
  • Caching is used extensively to reduce load on the NNTP server
  • The frontend does not copy the newsgroup content to a new database
  • Almost no new data is stored in the frontend (except the unread tracker data and the cache)
  • Supports encrypted NNTP connections
  • URLs are rewritten to nice and short URLs using Apache's mod_rewrite

The NNTP forum has been up and running at my university for several weeks. Unfortunately there is no demo page because you would need an account at my university. You can view some more screenshots on the project page. If you're interested and want to play with the software, just let me know: leave a comment or drop me a mail. I usually make my projects publicly available, but this project is optimized for a specific deployment right now. I'm a bit tired of releasing projects without getting any feedback. Therefore I will only take the time to generalize the code base if there is someone who wants it.

Why another NNTP frontend?

My university uses newsgroups to announce official information. These newsgroups can also be used by students to discuss different topics (pretty much everything). Unfortunately the acceptance among the students wasn't very high during the last years. Many students (me included) don't know newsgroups when they start studying. For some people it can be hard to configure the newsgroups in their mail client because our university uses certificates and its own certificate authority. Thanks to that, Thunderbird just refuses to accept the news server until you import the certificate authority in a somewhat complicated way. It's a bit disappointing that the result after all that looks just like a bunch of ordinary mails.

People could use the Web-News frontend for the newsgroups as an alternative to a real mail client. However, Web-News is not really the most up to date software and the internet has changed a little bit since it received its last major update. This doesn't make Web-News a bad NNTP frontend. It's just not perfect for new students who are used to internet forums and Facebook. The usability just isn't as good as it could be for young students.

Because of that, not many students checked the newsgroups regularly. Important news went unnoticed, lectures were not attended because only a few knew they existed… well, I exaggerate quite a bit, but this was the direction we were heading in. These acceptance problems led to a discussion about the future of newsgroups at our university.

One idea was to replace the newsgroups with a forum. However, newsgroups already are a forum: there are some newsgroups (forums) with threads and messages. It's just the presentation in mail clients and Web-News that makes newsgroups look “old”. To test that I created a prototype of the NNTP forum within two days. It was read only and did only very simple and inefficient message parsing. But it was already quite intuitive to handle and much more like the forums people are used to today.

Finished my practical term

Published

It's finally done! I submitted my practical term report yesterday. The 6 months at the supercomputing center in Stuttgart (HLRS) were a very interesting time. My task there was to port a computational fluid dynamics (CFD) code to graphics processors. The CFD code came from the Institute of Aerodynamics and Gas Dynamics of the University of Stuttgart.

The people there were very helpful and tried to explain CFD to me. The topic was (and largely still is) totally new to me. It's very interesting to be a programmer in a world of physics and mathematics. I can highly recommend spending 6 months in a field that is totally new to you. It's a great experience to see the world and its problems from different points of view. But just one hint: don't start with fluid dynamics. ;)

Porting the code to GPUs turned out to be a major bug hunt. I chased some bugs for up to 3 weeks. However, I wasn't desperate enough to give them names… so it wasn't as bad as in the IE 6 days.

The CFD code, called “Strukti”, is written in Fortran and was not multithreaded when the project started. One of the HLRS supercomputers contains GPUs as calculation accelerators, and the basic idea of the project was to use them for Strukti. Since the GPUs are nVidia Tesla chips, CUDA was the main focus on the GPU side of things. So for 6 months I tried to understand CFD and learned Fortran, CUDA and a whole lot of other stuff.

Fortran is a very interesting language. Despite its somewhat dusty reputation it's quite alive. Some ideas behind the language really fascinated me (modules, optimization, pointer handling, …). While the ideas are nice, the language itself shows its long tradition of backward compatibility and compromises. Basically it's like C with a much nicer syntax and with its ugly corners in other places. Thanks to that syntax it sometimes even felt like writing Ruby (but only for very short moments). I know some people will scream when they read this, but well, I'm tired of detailed language comparisons.

Learning CUDA was interesting because of its focus on the hardware. The language itself is not very impressive in my opinion, but the GPU architectures are. It's fascinating how they evolved in parallel with CPUs yet share some of the same ideas. I wrote about that in the background studies section of the report: GPGPU origins and GPU hardware architecture. I was somewhat tired of the topic when I wrote the report, so unfortunately it's not my usual level of writing. But I don't have the time to rewrite it properly.

If you're interested in what I actually did, feel free to look into my report: Evaluation of CUDA Fortran for the CFD code “Strukti”. It's a bit better since I wrote it first. It covers how we mixed CUDA and Fortran with the PGI CUDA Fortran compiler extension. The compiler bugs that consumed most of the project's time are also listed and explained in detail.

Recording of the project presentation held in German

I presented the project at the institute some weeks ago. You can watch the video here or download it. A high quality version is available at events.mi.

It's a basic introduction to general purpose computation on the GPU as well as to CUDA Fortran. It also covers the bugs encountered during the project. The discussion towards the end is more about how one can manage the complexity of current hardware, which is the more interesting part in my opinion. GPUs and CPUs are merging on other platforms like notebooks and desktop systems, too. Therefore this might be interesting for other programmers, not only for high performance computing people.

Looking back, it was a very interesting time. I learned a lot, gained insight into the scientific world and worked on some very interesting supercomputers. I don't regret making that choice. :)
