Simple Chat: the details

This post describes the technical details of the design and implementation of the Simple Chat project I already wrote about. The idea for that kind of chat had been lingering in my mind for some time. What would an absolutely simple chat require and look like on the technical level? Well, about 20 lines of PHP and about 40 lines of JavaScript later I had an answer and a chat that doesn't need Flash, Java, a database or any other fancy stuff. In this post I will explain the basic workings behind it as well as the HTML, PHP and JavaScript code. If you're curious you can take a look at the example.

Basic idea

For small stuff (e.g. a chat for people who watch the live-stream of an event) only basic chat functionality is needed: sending a message and a list with messages from everyone. I already built such a chat during the first GamesDay project but it used a SQLite database back then and had its troubles. Being on the simplicity trip lately I refined the concept and made everything work together nicely. So this is what I came up with:

  • Every new message is sent to a page on the server via POST. There a few lines of PHP insert the message into a JSON text file that contains the last 10 messages (older messages are discarded). Optionally the message is also appended to a chat log.
  • Every client requests this JSON file every 2 seconds and displays new messages. If no new messages came in since the last request the webserver will usually answer with a 304 "Not Modified" response and we only get the data if it really changed. HTTP caching at work. :)
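The buffer behaviour from the first point can be sketched in a few lines of JavaScript. This is only an illustration of the idea (the real work happens in PHP further down); appendToBuffer is a made-up helper name and the array stands in for the JSON file.

```javascript
// Hypothetical helper: append a message and keep only the newest `size` entries.
function appendToBuffer(buffer, message, size) {
    return buffer.concat([message]).slice(-size);
}

// Simulate 12 incoming messages with a buffer size of 10.
let buffer = [];
for (let i = 0; i < 12; i++) {
    buffer = appendToBuffer(buffer, { name: 'tester', content: 'message ' + i }, 10);
}

// The two oldest messages have been discarded.
console.log(buffer.length, buffer[0].content);
```

No matter how many messages come in, the buffer never grows beyond its fixed size, which is why the file stays small.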

So there is absolutely nothing overwhelmingly complex about this chat and every component involved contributes something to the functionality… even the webserver itself which is often forgotten in "dynamic" stuff like this. This design has several advantages:

  • The very frequent polling requests are very cheap since only a static text file is served. This allows the webserver to make effective use of HTTP caching and its own caching abilities.
  • Posting a new message is a bit more expensive since PHP is involved but it always is a constant amount of work. There's nothing that can eat up memory and if you don't record a chat log disk space stays more or less constant as well.
  • You can adjust the chat to different situations easily: If you expect many readers you can increase the polling interval to 4 seconds, effectively reducing the polling load to 50%. If very much is going on (more than 10 new messages in 2 seconds) you can increase the number of messages stored in the JSON file. However keep in mind that this is not a heavy traffic chat by design. If there is something big on the road go for the real stuff like IRC.
  • No maintenance for some kind of database or application server needed. No strange extensions… just basic PHP.
  • It's so simple that you can add it to almost any kind of existing website. One PHP page, that's it.
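To get a feeling for the tuning knobs mentioned above, here is a rough back-of-the-envelope calculation in JavaScript. Both function names are made up for this sketch, and it assumes that every client sends exactly one polling request per interval.

```javascript
// Polling load: every client sends one request per interval.
function pollsPerSecond(clients, intervalSeconds) {
    return clients / intervalSeconds;
}

// Smallest buffer that does not lose messages between two polls
// at a given message rate.
function minBufferSize(messagesPerSecond, intervalSeconds) {
    return Math.ceil(messagesPerSecond * intervalSeconds);
}

console.log(pollsPerSecond(150, 2)); // 75 requests per second
console.log(pollsPerSecond(150, 4)); // 37.5, half the load
console.log(minBufferSize(8, 2));    // 16 messages needed in the buffer
```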

However there are also some things to look out for:

  • Webservers handle many requests concurrently (in multiple threads or processes) and therefore we have to be careful with the text file. If the timing is bad one thread could overwrite the new message of another one. With few new messages coming in this isn't much of a problem but with 50 simulated clients (about 12 new messages more or less at once) such a "lost update" already occurred roughly every 10 seconds. Fortunately PHP provides easy file locking we can use to avoid these things.
  • There is no easy way to get a proper list of users that are in the chat. To maintain such a list we would have to add a bit of PHP to every polling request which would kick out the HTTP caching, making the polling requests more expensive.
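The "lost update" from the first point is easier to see when it's played through step by step. The following JavaScript sketch uses a plain string as a stand-in for messages.json and interleaves two requests by hand; it only simulates the timing problem, it is not how the server code works.

```javascript
// In-memory stand-in for messages.json (an assumption for this sketch).
let file = '[]';
function read() { return JSON.parse(file); }
function write(messages) { file = JSON.stringify(messages); }

// Without locking: both requests read the old state before either one writes.
const seenByA = read();
const seenByB = read();
seenByA.push({ name: 'A', content: 'hello from A' });
write(seenByA);
seenByB.push({ name: 'B', content: 'hello from B' });
write(seenByB); // overwrites the file, the message of A is lost
const unlocked = read();

// With locking: B can only read after A has finished writing.
file = '[]';
const first = read();
first.push({ name: 'A', content: 'hello from A' });
write(first);
const second = read();
second.push({ name: 'B', content: 'hello from B' });
write(second);
const locked = read();

console.log(unlocked.length, locked.length);
```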

HTML skeleton

But enough about theory, let's get started on the work. First we create a little HTML code that we will later extend with PHP and JavaScript. Since this stuff is meant to be an example for you on how to build your own little chat when the time comes we'll only use the basic code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Simple chat</title>
</head>
<body>

<h1>Simple chat</h1>

<ul id="messages">
    <li>loading…</li>
</ul>

<form action="<?= htmlentities($_SERVER['PHP_SELF'], ENT_COMPAT, 'UTF-8'); ?>" method="post">
    <p>
        <input type="text" name="content" id="content" />
    </p>
    <p>
        <label>
            Name:
            <input type="text" name="name" id="name" value="Anonymous" />
        </label>
        <button type="submit">Send</button>
    </p>
</form>

</body>
</html>

It's a basic XHTML 1.0 strict page with a list (ul#messages) that will contain all messages as well as a form to write and send new messages. There are however some details:

  • The message list contains one "loading…" list item. This is necessary since every list has to contain at least one item. But don't worry, we will remove this one as soon as the page loads. If you don't need your website to validate feel free to omit this list item.
  • The form should send any data to the page itself (that's what $_SERVER['PHP_SELF'] contains, a path to the current page). This way we can just insert our PHP code at the top of this page and don't need an extra file. We will do the actual send with a background POST request later on but it's nice to have the URL in the HTML code to clarify things. Note that we escape any special HTML characters in the URL in case someone gets the idea of appending HTML code to the URL (thanks to Craig Francis for the heads up). Again, if you have other plans or another structure feel free to change things.
  • The two input fields are straightforward: one for the content of the message and one for the name of the user. Since we provide a default value for the name (Anonymous) we don't need a "you forgot to enter your name" error. If someone clears this field on purpose he or she can expect that the message won't be sent.
  • There are some details like the UTF-8 encoding (use what you like but there are not many reasons against UTF-8) and the type attribute of the button element (needed for IE 6 to understand that we want to submit the form) but they are not the main point here.

With this HTML skeleton we have the basics in place. The message list will be updated by the polling requests and when the form is submitted we will kick off a POST request in the background that sends the message to the server.

Server side code

With the HTML code in place let's build the message buffer that stores the last 10 sent messages in a JSON file. Before we dive into the code two things:

First the clients get the buffer every time it was modified, meaning that one or more messages have been added to the buffer since the last polling request. We could just append all messages of the buffer to the message list (ul#messages) but this would add old messages multiple times. So the clients need a way to know exactly which messages in the buffer are new.

This can be achieved by numbering all incoming messages (like an autoincrement key in a database). The client then only needs to remember the ID of the last message it added to the list and can ignore any messages in the buffer with an older ID. If the buffer contains no messages we simply start at an ID of 0.
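As a small illustration of the ID idea, here is a minimal JavaScript sketch. selectNewMessages is a made-up helper (the real client code further down does the same check inside a loop) and the sample data is invented.

```javascript
// Hypothetical helper: keep only messages that are newer than the last known ID.
function selectNewMessages(messages, lastMessageId) {
    return messages.filter(function (msg) {
        return msg.id > lastMessageId;
    });
}

// Sample buffer as it might arrive from the server.
const received = [
    { id: 3, name: 'arkanis', content: 'already shown' },
    { id: 4, name: 'tester', content: 'new one' },
    { id: 5, name: 'arkanis', content: 'another new one' }
];

let lastMessageId = 3; // the client has already displayed message 3
const fresh = selectNewMessages(received, lastMessageId);
if (fresh.length > 0)
    lastMessageId = fresh[fresh.length - 1].id;

console.log(fresh.length, lastMessageId);
```

Polling the same buffer again with the updated ID yields no messages, so nothing is added twice.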

Second, in our PHP code we need to read the old buffer to get the 9 old messages and to calculate the next ID used for the new message. We then append the new message, remove any overflowing messages and write the new buffer to the JSON file. Now this is a typical race condition where actually two things can go wrong. The first is the well known lost update: some other thread reads the old message buffer before we could write down our new one and then writes its own version, effectively overwriting our added message. However it's also possible that another thread tries to read the message buffer file while we're writing to it. In that case the read will fail and this can look like an empty file, making it restart at an ID of 0 and effectively blocking all clients from updating (since all messages after that get lower IDs again and are therefore ignored). I didn't check for any lost updates but I observed the second problem when a little test script put the chat under some load (about 50 simulated clients, each one posting a message randomly every 8 seconds).

If you couldn't follow every detail of that: it isn't a problem. Race conditions tend to be hard to understand. The bottom line however is that we need to lock the message buffer from the read until the write. Thanks to PHP this isn't hard but adds some code lines.
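The second problem (the ID counter falling back to 0) can be shown with a few lines of JavaScript. This is just a model of the situation with invented sample values; nextId mirrors the ID calculation described above.

```javascript
// Same rule as on the server: continue after the last ID or start at 0.
function nextId(messages) {
    return messages.length > 0 ? messages[messages.length - 1].id + 1 : 0;
}

const healthyBuffer = [{ id: 56 }, { id: 57 }];
const afterFailedRead = []; // a read that hit the file mid-write looks like an empty buffer

const clientLastId = 57; // what a client has already displayed
function clientShows(id) { return id > clientLastId; }

console.log(nextId(healthyBuffer), clientShows(nextId(healthyBuffer)));
console.log(nextId(afterFailedRead), clientShows(nextId(afterFailedRead)));
```

After the failed read the new message gets an ID the clients consider old, so they silently ignore it and everything that follows.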

Now to the code itself:

<?php

$messages_buffer_file = 'messages.json';
$messages_buffer_size = 10;

if ( isset($_POST['content']) and isset($_POST['name']) )
{
    // Open and lock the message buffer
    $buffer = fopen($messages_buffer_file, 'r+b');
    flock($buffer, LOCK_EX);
    $buffer_data = stream_get_contents($buffer);
    
    // Append new message to the message buffer
    $messages = $buffer_data ? json_decode($buffer_data, true) : array();
    $next_id = (count($messages) > 0) ? $messages[count($messages) - 1]['id'] + 1 : 0;
    $messages[] = array('id' => $next_id, 'time' => time(), 'name' => $_POST['name'], 'content' => $_POST['content']);
    
    // Remove old messages
    if (count($messages) > $messages_buffer_size)
        $messages = array_slice($messages, count($messages) - $messages_buffer_size);
    
    // Rewrite and unlock the message file
    ftruncate($buffer, 0);
    rewind($buffer);
    fwrite($buffer, json_encode($messages));
    flock($buffer, LOCK_UN);
    fclose($buffer);
    
    // Append message to log file or omit it if you don't need it
    // (date() is used here because strftime() is deprecated as of PHP 8.1)
    file_put_contents('chatlog.txt', date('Y-m-d H:i:s') . "\t" . strtr($_POST['name'], "\t", ' ') . "\t" . strtr($_POST['content'], "\t", ' ') . "\n", FILE_APPEND);
    
    exit();
}

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
…

First we check if we received data from our form (content and name fields were sent via POST). If so we got a new message that we will append to our message buffer. In exchange we kick out the oldest message if the buffer is already at its maximum size.

The code only reacts if we really got a new message and serves the HTML content after it on normal GET requests. When a new message comes in we open, lock and read the current message buffer, append the new message, cut off any old messages to maintain the buffer size and overwrite the file with the new buffer. Now on to the interesting parts:

  • flock is used to exclusively lock the message buffer file so that no other thread can read or write to the buffer until we're done. However because of flock we have to stick to functions that work with file handles. Therefore stream_get_contents is used to read all the file content (could also be done by calling fread in a loop) and ftruncate, rewind and fwrite overwrite the file with the new message buffer encoded in JSON.
  • We want to read and rewrite the buffer in one operation. Therefore fopen is called with the r+b mode (read and write, in binary mode on Windows machines). This is the best fitting mode but unfortunately the buffer file is not automatically created if it does not already exist. Therefore it's important that you create the messages.json file by hand or add some code that creates it automatically (e.g. a call to touch).
  • If the buffer is empty we start with an empty array as the buffer data, and if we have an empty array we also start with an ID of 0. This way it will automatically be filled by the script. In fact you can clear the message buffer file (e.g. with the truncate Linux command) to reset the chat history.

    Unfortunately we also get this behavior if the file does not exist or is not readable and writable by the webserver. In that case you have to create it and set the proper rights (read/write for the webserver). If you have no idea what this is about just create an empty file named messages.json and set its permissions to 666 or ugo=rw.

  • The last call to file_put_contents appends the message with the current timestamp and user name to a chat log. It just appends a new line to this log and because small appends like this are effectively atomic on typical local filesystems we don't need to lock anything.
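The chat log format (three fields separated by tabs, one message per line) is easy to read back later. Here is a small JavaScript sketch of the round trip; toLogLine and parseLogLine are made-up helper names and the timestamp is just a sample value. It mirrors what the PHP line does: tabs inside the name or content are replaced with spaces so a line always has exactly three fields.

```javascript
// Hypothetical helper: build one log line, tabs in the data become spaces.
function toLogLine(timestamp, name, content) {
    const clean = function (text) { return text.replace(/\t/g, ' '); };
    return [timestamp, clean(name), clean(content)].join('\t') + '\n';
}

// Hypothetical helper: split a log line back into its three fields.
function parseLogLine(line) {
    const fields = line.replace(/\n$/, '').split('\t');
    return { timestamp: fields[0], name: fields[1], content: fields[2] };
}

const line = toLogLine('2010-08-18 21:35:33', 'sneaky\tname', 'hello\tworld');
const entry = parseLogLine(line);
console.log(entry.name, '|', entry.content);
```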

That code gives us a nice and small messages.json file looking like this:

[
    {"id": 0, "time": 1282167333, "name": "arkanis", "content": "hello world!"},
    {"id": 1, "time": 1282167335, "name": "tester", "content": "hello moon"},
    …
]

Spaces and line breaks were inserted for clarity. Usually everything will be in one line with no wasted spaces. The names of the keys are important because we will use them to access the message data with JavaScript on the client side.

A word on security

Also note that no real input validation is done on the server. Unfortunately I have to disappoint any paranoid reader: we will not do some overwhelmingly complex filtering but will just make sure that all incoming data is properly encoded when it's going out. json_encode is one of those steps and no data can break out of it. In the chat log we use tabs as field separators and therefore we replace any tabs in the name or content with spaces. We will do some further escaping on the client later to prevent XSS attacks.

Since we use proper escaping you should also disable PHP's Magic Quotes. It will only mess up the original data. (Magic Quotes were removed in PHP 5.4, so this only matters on older installations.)
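To make the "encode on the way out" idea a bit more concrete, here is a JavaScript sketch. escapeHtml is a hand-written helper for illustration only (the chat itself relies on jQuery's text() method instead), and the hostile message is invented.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

const hostile = '<script>alert("xss")</script>';

// JSON encoding stores the data unchanged, nothing can break out of the string.
const stored = JSON.stringify([{ name: 'x', content: hostile }]);
const restored = JSON.parse(stored)[0].content;

// Only when the text goes into HTML it needs HTML escaping.
console.log(escapeHtml(restored));
```

The data is kept in its original form everywhere and each output format (JSON, chat log, HTML) applies its own encoding.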

The magic in between: client side code

We will use the jQuery framework to make the JavaScript code more fun to write. Still, this will be a somewhat bigger bunch of lines than the PHP stuff. To not scare you away with one large code block I'll divide it into several small blocks, one for each purpose.

First the usual stuff when using jQuery: include the jQuery framework itself and then do something as soon as the DOM tree is ready (using $(document).ready()). Our first action as the new ruler of the client is to remove the "loading" list entry:

<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <title>Simple chat</title>
    <script type="text/javascript" src="jquery.js"></script>
    <script type="text/javascript">
        // <![CDATA[
        $(document).ready(function(){
            $('ul#messages > li').remove();

            // code to send a message goes here…

            // placeholder for polling code…
        });
        // ]]>
    </script>
</head>
<body>

Sending a message… or: say something

For this occasion we hijack the submit event of the message form to kick off a POST request in the background. We then insert a "pending" message into the message list to let the user know that we actually did something. "pending" because we sent the message but have not yet received new messages from the server. As soon as the next bunch of messages comes in we will remove this pending message.

$('form').submit(function(){
    var form = $(this);
    var name =  form.find("input[name='name']").val();
    var content =  form.find("input[name='content']").val();
    if (name == '' || content == '')
        return false;

    $.post(form.attr('action'), {'name': name, 'content': content}, function(data, status){
        $('<li class="pending" />').text(content).prepend($('<small />').text(name)).appendTo('ul#messages');
        $('ul#messages').scrollTop( $('ul#messages').get(0).scrollHeight );
        form.find("input[name='content']").val('').focus();
    });
    return false;
});

If name or content is blank we just stop. It might be a good idea to do something to hint the user that one or both of these fields are missing but since we have a default value for the name the user would have to deliberately clear the name field. In that case the user can expect it to not work, just as when trying to send an empty message.

After kicking off the POST request we then stop the event processing with return false;. This will keep the page from being reloaded by the browser.

As soon as we know the POST request succeeded (our callback runs) we insert the pending message into the list. Actually we first build an li element with the class of pending and set the message content as its text. Since we do this with the text() method it's clear this string can not contain other elements and therefore every HTML stuff is escaped automatically (jQuery actually inserts a text node into the DOM tree and the browsers do the escaping themselves). Into this li element we insert a small element with some additional information such as the user's name which is also inserted as text. Now we got something like that:

<li class="pending">
    <small>Anonymous</small>
    An example message text
</li>

At the end of the line we use appendTo to… well, append the built li element to the message list.

After this we just scroll the list down to show the new message and clear the message text field of the form so the user can start writing the next message.

Receiving messages… or: hear something

As explained above we will ask the server every 2 seconds for the messages.json file and insert any new messages into the message list. To do this we first create a function that does our GET request and make sure it's called every two seconds:

var poll_for_new_messages = function(){
    $.ajax({url: 'messages.json', dataType: 'json', ifModified: true, timeout: 2000, success: function(messages, status){
        if (!messages)
            return;
        
        $('ul#messages > li.pending').remove();
        var last_message_id = $('ul#messages').data('last_message_id');
        if (last_message_id == null)
            last_message_id = -1;
        
        for(var i = 0; i < messages.length; i++)
        {
            var msg = messages[i];
            if (msg.id > last_message_id)
            {
                var date = new Date(msg.time * 1000);
                $('<li/>').text(msg.content).
                    prepend( $('<small />').text(date.getHours() + ':' + date.getMinutes() + ':' + date.getSeconds() + ' ' + msg.name) ).
                    appendTo('ul#messages');
                $('ul#messages').data('last_message_id', msg.id);
            }
        }
        
        $('ul#messages > li').slice(0, -50).remove();
        $('ul#messages').scrollTop( $('ul#messages').get(0).scrollHeight );
    }});
};

poll_for_new_messages();
setInterval(poll_for_new_messages, 2000);

There isn't much about the GET request itself. The ifModified: true parameter makes sure that we only get message data if the message data has actually been modified. We also set a timeout of 2 seconds because after that time we start a new GET request anyway.
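What happens on the server side of such a conditional request can be modelled roughly like this. It's only a JavaScript sketch of the decision a webserver makes based on the If-Modified-Since header, not real server code; conditionalGet and the sample timestamps are made up for the illustration.

```javascript
// Hypothetical model: answer 304 if the file was not modified since the given date.
function conditionalGet(fileModifiedMs, ifModifiedSince) {
    // HTTP dates only have a resolution of one second.
    const modifiedSec = Math.floor(fileModifiedMs / 1000);
    const lastModified = new Date(modifiedSec * 1000).toUTCString();

    if (ifModifiedSince) {
        const since = Date.parse(ifModifiedSince);
        if (!Number.isNaN(since) && modifiedSec * 1000 <= since)
            return { status: 304, lastModified: lastModified };
    }
    return { status: 200, lastModified: lastModified };
}

const written = Date.UTC(2010, 7, 18, 21, 35, 33);

// First poll: no header yet, the client gets the full file.
const firstPoll = conditionalGet(written, null);
// Second poll: file unchanged, the client only gets a "Not Modified".
const secondPoll = conditionalGet(written, firstPoll.lastModified);
// A new message was written 4 seconds later: full file again.
const afterNewMessage = conditionalGet(written + 4000, firstPoll.lastModified);

console.log(firstPoll.status, secondPoll.status, afterNewMessage.status);
```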

The message handler itself is aborted if our incoming data (messages) is undefined. This happens when the data was not modified. In case we got new stuff the action begins:

  • First remove all pending messages from the list.
  • Then figure out the ID of the last inserted message. We store the ID in the message list's DOM node later on, so here we read out the node's data. A variable local to the polling function would not work because it is created anew on every call. A variable in an enclosing scope (or a global one) could do the job too, but storing the data where it belongs is always a bit more elegant. If there wasn't any ID set yet we start with -1, which nicely fits the start value of 0 on the server. Every incoming message will then be newer and added to the message list.
  • Next we cycle through every message in the received data and insert it into the message list if it really is newer than the last known ID. We insert the new message in the same way as we did insert the pending message (plus the time of the message) and therefore all incoming text will be escaped automatically.
  • At long last we make sure that the list only contains 50 entries at most. slice(0, -50) selects every list item except the last 50 and these selected items are removed, leaving the 50 last ones alive. This prevents browsers from slowing down because of lists that contain hundreds or thousands of messages. The final line scrolls down the message list to the bottom so the user can read the new messages.
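One small detail of the time display: getMinutes() and getSeconds() return plain numbers, so a message from five past nine shows up as "9:5:3". If that bothers you, a little helper can pad the values. The following is just a sketch, formatTime is a made-up name and padStart assumes a reasonably modern browser.

```javascript
// Hypothetical helper: format a Unix timestamp as HH:MM:SS in local time.
function formatTime(unixSeconds) {
    const date = new Date(unixSeconds * 1000);
    const pad = function (n) { return String(n).padStart(2, '0'); };
    return pad(date.getHours()) + ':' + pad(date.getMinutes()) + ':' + pad(date.getSeconds());
}

// Build a sample timestamp for 09:05:03 local time.
const sample = new Date(2010, 7, 18, 9, 5, 3).getTime() / 1000;
console.log(formatTime(sample));
```

In the polling code the helper would take the place of the three concatenated getter calls.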

And that's it for the basic functionality. Throw it at a PHP enabled webserver and create the messages.json file (don't forget: the webserver needs read and write permission). Your own basic chat should now be running along smoothly.

Some CSS for the eye

While the chat now already works perfectly it might look a bit strange. Because every website is styled differently I suggest you leave the styling of the chat to your own creativity. However if you just want a quick starting point take these lines of CSS:

<style type="text/css">
    ul#messages { overflow: auto; height: 15em; margin: 1em 0; padding: 0 3px; list-style: none; border: 1px solid gray; }
    ul#messages li { margin: 0.35em 0; padding: 0; }
    ul#messages li small { display: block; font-size: 0.59em; color: gray; }
    ul#messages li.pending { color: #aaa; }

    form { font-size: 1em; margin: 1em 0; padding: 0; }
    form p { position: relative; margin: 0.5em 0; padding: 0; }
    form p input { font-size: 1em; }
    form p input#name { width: 10em; }
    form p button { position: absolute; top: 0; right: -0.5em; }

    ul#messages, form p, input#content { width: 40em; }
</style>

These CSS rules are a safe start, even for poor IE 6 users. The most important part is the overflow: auto property paired with a fixed height. This transforms the message list into a box with its own scrollbars. Another little trick is to position the submit button on the right side of its enclosing paragraph (using position: relative and position: absolute). There are many other ways to do this but when confronted with IE 6 it's one of the few ways without many "strange" side effects.

With this we're done writing any code and you should have something very similar to the example chat page. My congratulations if you really read this far. :)

About performance

While the main aspect of this chat is its simplicity you don't really understand a technology if you don't know when it breaks. To explore the locking I created a small Ruby script that puts some load on the server: test.rb

The number of clients and the URLs are hard coded so you have to modify the script for your own setup if you want to do some testing. However it's just a quick 5 minute script and neither well written nor scalable. It doesn't distribute the requests very well over time and instead a big bunch of requests floods the webserver every two seconds. However when examining the locking this was quite useful since this behavior stresses the locking quite heavily.

Performance wise I couldn't really test more than 150 clients. At that load the webserver (Apache with PHP) needed negligible CPU and IO on my development machine (an old Intel Core 2 E6300) but in the browser the time for one polling request went up to 200ms. However the test script was eating up all other CPU time.

With even more clients the Ruby script hit an expected "threading barrier" on my system. Even with 300 clients I didn't see any messages with client IDs above about 150. I suppose the other 150 threads just starved and never came to run. I don't really know what was going on there but some experiments around keep-alive requests and a better test program might help. Also during that test a chat in one browser (Firefox 3.6 with Firebug) stopped working because all polling requests timed out.

The bottom line is: I don't really know the upper limit but the chat can take more load than such a simple thing will ever get. 150 users in one chat room is nothing but a total mess anyway. If you ever want to use a chat for something big please go for the real stuff like IRC.

Further ideas

Depending on what you need there are many ways you can modify or extend the chat. To mention just a few:

  • Adding multiple chat rooms is as simple as putting the messages into different JSON files, one for each chat room. Even broadcasts are simple then: just add the same message to multiple JSON files.
  • If you want a user list you can add some PHP code to the polling requests to track who is listening to the chat right now. This however will disable the HTTP caching for the polling requests. Also keep in mind that real user management is way more complex (e.g. check for duplicated user names, etc.).
  • If you don't need/want HTTP caching anyway you can also use any other kind of data storage for the message buffer. For example the Alternative PHP Cache or Redis might fit. Redis already implements atomic list operations but I'm not sure about the Alternative PHP Cache. It features a compare-and-swap function so you can implement your own lock-free list if you like.
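For the chat room idea you need some way to get from a room name to its JSON file. Here is one possible JavaScript sketch. The naming scheme (messages-ROOM.json) and the roomFile helper are assumptions of this example, not part of the chat. The important part is to accept only harmless room names so nobody can sneak a path like ../something into the file name.

```javascript
// Hypothetical helper: map a room name to its buffer file, or null if the name is not allowed.
function roomFile(room) {
    // Only lowercase letters, digits, underscores and dashes, 1 to 32 characters.
    if (!/^[a-z0-9_-]{1,32}$/.test(room))
        return null;
    return 'messages-' + room + '.json';
}

console.log(roomFile('lobby'));
console.log(roomFile('../etc/passwd'));
```

The same check would of course have to be done in the PHP code that writes the messages.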

However the goal of this project was to see what is not needed. It's all too easy to build something that explodes in complexity so you might want to think about a feature twice before you really add it. ;)

2 comments for this post

#1 by Christophe

Hello, I'm really a newbie in PHP/JavaScript. I tried to install Simple Chat. When I click on "send", the page becomes white and I must reload it. The file "messages.json" seems OK, messages and names are recorded, but nothing is showing in Simple Chat… Can you help me please? Friendly, Christophe

#2 by Stephan

Hi Christophe,

this happens if for some reason JavaScript is broken or disabled. Then the code that usually handles the form submission never gets executed. The browser uses the default behaviour which is more or less to reload the page.

Make sure that your page doesn't contain JavaScript errors and that JavaScript is enabled. Best take a look into the console of the developer tools. All errors should be listed there. The error messages usually provide good hints to fix these errors.
