Simple good quality subpixel text rendering in OpenGL with stb_truetype and dual source blending

Now that title is a handful. In the last post I looked at rendering UI rectangles with OpenGL 4.5. The next step on that journey is text rendering, more specifically subpixel font rendering. So instead of rather boring rectangles we're now also throwing individual characters ("glyphs") at the screen. Way more interesting. :)

I had wanted to look into subpixel font rendering for quite a while, especially to test out dual source blending (where you can blend each subpixel individually). In the end I think I found a pretty good sweet spot between complexity and quality and wanted to document that. And of course I also went down some interesting rabbit holes (e.g. gamma correction) with unexpected results. It's surprisingly difficult to find hard or complete information on subpixel text rendering and as a result this post became rather… long.

This post isn't an introduction to font rendering. Rather it describes the steps necessary to get a good result in OpenGL and sometimes the stuff nobody talks about. If you stumble over an unknown word (e.g. hinting or kerning) feel free to look it up. This post is way too massive as it is.

In case anyone asks because of 4k displays and stuff: All (except one) of my displays are 1080p and 99.99…% of the text I read is horizontal. Subpixel anti-aliased text is a lot more comfortable to read for me but that depends on your eyesight (a friend of mine can hardly tell the difference). If you can't tell the difference: Just don't bother. But if you write UI software for others you might want to occasionally borrow someone else's eyeballs. On mobile and tablets it's a different story and I don't really care much about it. Grayscale anti-aliasing seems good enough there and the subpixel layouts would drive you crazy anyway.

Ok, with all that out of the way some results first:

Click the buttons to switch between various images.
The first 4 images: Ubuntu and DejaVu Serif, rendered output vs. FreeType (via FontManager). I didn't bother to line up the exact line positions so they jump around a bit.
Text block: DejaVu Sans 8pt.
Text over image: Ubuntu Italic 12px from black to white with white to black 1px text shadow over "Haruhism - The Melancholy of Haruhi Suzumiya" (cycles through a lot of colors, useful to spot artifacts). Only the first few and last few lines make sense, the rest is just gray text on a gray text shadow on top of an image.
Source code & fonts: Dark and light source code (Source Code Pro, Sublime Monokai & Breakers color schemes, 12px) and various different fonts at 10pt.
Increased weight: Same as before but with the subpixel coverages adjusted (linearly) for increased weight.

The quality is ok for the most part (at least for my eyes). Not perfect, not as sharp as it could be, but ok. Good enough for home-cooked UIs. But on smaller font sizes the hinting of FreeType clearly makes a big difference.

What it doesn't do:

What it offers:

Still interested? Here's how it works (pretty images below):

Pretty much your run-of-the-mill texture atlas thing with just 3 subpixels in a pixel so to speak. Ok, maybe the LCD filter and subpixel positioning adds a bit extra, but that's it. Each of the steps is explained in detail below, as well as a few interesting side quests (e.g. gamma correction), some future ideas and interesting references. This will be a long one, here's a small table of contents to get around:

At this point I want to apologize: The blog was never designed to cope with posts as massive as this one. It might feel a bit cramped and endless. Sorry about that.

Demo program

A simplified demo program is available on GitHub. It uses OpenGL 4.5 and is written for Linux and Windows (might work on MacOS, don't know).

The demo is pretty much just one large piece of code in the main() function that renders a given string. The code is mostly meant as a reference in case you're interested in the nitty gritty details. Hence no abstractions that you just have to crack open to see what exactly is going on. It's probably a bit unusual and not "pretty" but you can slice the steps down later on however you want. I'll directly link to the relevant pieces of code later on.

The project uses stb_truetype.h for glyph rasterization, mostly because it's simple and just one self-contained header file. But it doesn't support hinting and that's the reason I don't do hinting. Otherwise I would probably do vertical hinting as mentioned in Higher Quality 2D Text Rendering by Nicolas P. Rougier. Horizontal hinting is a more complex story, though.

After writing most of this post I remembered that rounding the font size to the nearest whole pixel can serve as poor vertical hinting (read that somewhere, not sure where). I played around with that but it wasn't convincing. 8pt looked a bit better but other sizes sometimes worse. And I'm planning on using pixel font sizes instead of points anyway. It wasn't worth redoing all the images so I tucked it in here. It won't be mentioned again in the rest of the post.

The texture atlas used by the demo is just a mockup that's horribly limited and inefficient. But it's simple enough to not distract from the text rendering. If you need a texture atlas you might want to look at Improving texture atlas allocation in WebRender by Nical for some ideas.

How it works

Before we go into each step I just wanted to show the effects and purpose of each one.

Use the buttons to flip through the effects of each step

These images illustrate that you kind of need the whole set to get a good looking result without visible artifacts. Maybe you can get away with skipping subpixel positioning, but reading that kind of text is annoying as heck. Sometimes a word just falls apart, like "live" into "li" and "ve" above. For me at least it boils down to: Do all of them or neither. Only using pixel resolution at least looks sharp.

Subpixel resolution

I mention this step mostly for completeness and to avoid confusion. Just rasterize the glyph as a "grayscale image" with 3 times the horizontal resolution, that's it (demo code).

That gives us a value between 0..255 for each subpixel. Those values are linear coverages, 0 means the subpixel is 0% covered by the glyph, 128 means 50% covered and 255 means 100% covered.

A note on padding: The LCD filter below needs 1px padding to the left and right. Subpixel positioning might also shift the glyph almost 1px to the right, so we need another 1px padding to the left for that as well. It makes sense to take that padding into account when you rasterize the glyph into a bitmap. Then you don't have to move the glyph around after rasterization by doing error-prone coordinate calculations and extensive bounds checking in the filtering step.
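Just as a rough sketch of how that rasterization with padding might look with stb_truetype (the variable names and padding handling are mine, the demo organizes this differently):

// Sketch: rasterize one glyph with 3x horizontal resolution into a padded bitmap.
// Assumes `font` is an initialized stbtt_fontinfo and `size_px` the font size in pixels.
float scale_y = stbtt_ScaleForPixelHeight(&font, size_px);
float scale_x = scale_y * 3;  // 3 subpixels per pixel

int x0, y0, x1, y1;
stbtt_GetCodepointBitmapBox(&font, codepoint, scale_x, scale_y, &x0, &y0, &x1, &y1);

// 2px padding on the left (LCD filter + subpixel shift), 1px on the right (LCD filter),
// all measured in subpixels.
int pad_left = 2 * 3, pad_right = 1 * 3;
int width  = (x1 - x0) + pad_left + pad_right;
int height = (y1 - y0);

uint8_t* coverages = calloc(width * height, 1);
stbtt_MakeCodepointBitmap(&font, coverages + pad_left, x1 - x0, height, width,
                          scale_x, scale_y, codepoint);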

stb_truetype has subpixel functions as well. But these do a lot of what we do on the CPU and you can't really do a simple texture atlas thing with them. Also we don't need them since we have dual source blending. :)

FreeType LCD filter

Straightforward subpixel rendering can be a bit jarring for the eye. I hope the images above showed that. All those subpixels do have a color and this causes the jarring color fringes. The FreeType LCD filter is basically a slight horizontal blur. It distributes the brightness of one subpixel across neighboring subpixels (that have different colors) and thus gets rid of most color fringes.

To apply the filter you basically look from two subpixels to the left to two subpixels to the right (5 subpixels in total) and add up their values. The further away a subpixel is from the current one the less it contributes to the sum (aka it's a weighted sum or a 1D kernel with 5 weights).

The filter weights are documented in the FreeType docs as FT_LCD_FILTER_DEFAULT: [0x08 0x4D 0x56 0x4D 0x08] in 1/256 units, or about [0.031, 0.302, 0.337, 0.302, 0.031]. Meaning the outermost subpixels contribute ~3%, the neighboring subpixels ~30% and our own subpixel ~34%. It all adds up to 100% so a fully covered area stays 100% covered except at the fringes (demo code).
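In code that's just a tiny 1D convolution along each row of subpixels. A minimal sketch (reading from the rasterized bitmap and writing into a second one, more on that below; the variable names are mine):

// Sketch: apply the FreeType default LCD filter along each row of subpixels.
// `src` and `dst` are width*height subpixel coverage bitmaps (width in subpixels).
// The glyph needs at least 1px (3 subpixels) of empty padding left and right.
static const int weights[5] = { 0x08, 0x4D, 0x56, 0x4D, 0x08 };  // sums to 256

for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int i = -2; i <= 2; i++) {
            int xi = x + i;
            if (xi >= 0 && xi < width)
                sum += src[y * width + xi] * weights[i + 2];
        }
        dst[y * width + x] = sum / 256;
    }
}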

A note on padding pixels again: Just add 1px padding at the left and right side. The filter is 5 subpixels wide, so the subpixels at the edge of a glyph can distribute their values at most 2 subpixels into that left and right padding. Hence the 1px (3 subpixel) horizontal padding. The subpixel positioning step below also adds 1px padding at the left (coming up right below).

I also recommend using the same bitmap size for glyph rasterization and filtering. Then you can skip error-prone coordinate calculations and extensive bounds checking. It makes the filtering code a lot simpler even if the bitmap for glyph rasterization is a few (3) pixels wider than necessary.

At first I did the filtering in place until I realized that the filter then reads its own output of the previous subpixels (late-night coding…). That distorts the brightness of the text a bit but at least to me it wasn't really noticeable. Instead the filter should read from the glyph bitmap and write the filtered output into a 2nd bitmap. Anyway, if you really want to you can probably get away with just using one bitmap and do the filtering in-place. Here is a small comparison and diff.

The difference between filtering in-place and into a 2nd bitmap. Play spot the difference or look at the diff if you can't. The diff was created using GIMP's "Difference" blend mode.

Subpixel positioning

Subpixel positioning is relevant in two places:

  1. When iterating over all the characters (glyphs) and calculating the rectangle positions for each one.

    Just don't round there. Use a float to keep track of the current x position, for advancing from glyph to glyph and when doing kerning (kerning in the demo code). When calculating the glyph's rectangle position you can round down the x coordinate and send the remaining fraction (between 0…1) off to the shader (aka the subpixel shift in the demo code).

  2. The fragment shader then takes the subpixel shift and, well, shifts the glyph coverages by that amount to the right.

    This is done with a linear interpolation between neighboring subpixels (demo code). Technically this means we can shift a glyph by fractions much smaller than a subpixel. But at some point no one can tell the difference anyway. I found this in the paper Higher Quality 2D Text Rendering by Nicolas P. Rougier which even contains the GLSL code for it (section "2.3. Subpixel Positioning").
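As a rough sketch of that interpolation (my own reformulation, not the paper's or the demo's exact code; `current` and `previous` are the RGB subpixel coverages of this pixel and the pixel one to the left):

// Sketch: shift the glyph coverages right by subpixel_shift (0..1, in pixels)
// via linear interpolation between neighboring subpixels.
vec3 shift_subpixels_right(vec3 current, vec3 previous, float subpixel_shift) {
    float t = subpixel_shift * 3.0;  // shift in subpixels, 0..3
    if (t < 1.0)
        return mix(current, vec3(previous.b, current.rg), t);
    else if (t < 2.0)
        return mix(vec3(previous.b, current.rg), vec3(previous.gb, current.r), t - 1.0);
    else
        return mix(vec3(previous.gb, current.r), previous, t - 2.0);
}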

The paper also has a small section (and nice image) on kerning if you want to read up on that.

You might notice that the GLSL shader code looks like it shifts the subpixels to the left. For whatever reason this confused me in this context but it's just a matter of perspective. To shift something to the right on the screen a pixel at a fixed position has to show data to the left of itself. It's the same way with texture coordinates and a lot of other stuff in fragment shaders.

That's also the reason why we need an extra 1px padding to the left. The first pixel might look almost 1px to the left of itself and we don't want it to access anyone else's pixels.

You can also do the position calculations on the CPU in font units (ints) if you like. But a float will do. Even at 8k you still have a precision of 0.001 (a 1000th of a pixel).
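A minimal CPU-side sketch of that bookkeeping (the names are mine, the metrics calls are the usual stb_truetype ones):

// Sketch: track the pen position as a float and split off the subpixel shift per glyph.
float pen_x = start_x;
for (size_t i = 0; text[i] != '\0'; i++) {
    int   glyph_x        = (int)floorf(pen_x);      // integer pixel position of the glyph rectangle
    float subpixel_shift = pen_x - (float)glyph_x;  // 0..1, sent to the fragment shader

    // ... queue the glyph rectangle at glyph_x with subpixel_shift ...

    int advance, left_side_bearing;
    stbtt_GetCodepointHMetrics(&font, text[i], &advance, &left_side_bearing);
    pen_x += advance * scale_y;
    if (text[i + 1] != '\0')
        pen_x += stbtt_GetCodepointKernAdvance(&font, text[i], text[i + 1]) * scale_y;
}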

No fancy images this time. The images in How it works already show the effect pretty well.

Subpixel blending aka dual-source blending

Now we're probably getting to the star of the show. Usual alpha blending works with one blend weight per pixel (alpha) to blend the fragment shader output with the contents of the framebuffer. But with dual-source blending we can give the hardware one blend weight per component. Meaning we can blend each subpixel individually. :)

With that we can directly use the subpixel coverages we get from the subpixel positioning. Just blend each subpixel of the text color on top of the framebuffer with its own weight. That's it. No magic involved. Relevant demo code: Setup in OpenGL, setup in shader and setting the blend weights.

It's core in "desktop" OpenGL since 3.3. For mobile systems GL_EXT_shader_framebuffer_fetch might be worth a look if you feel adventurous.

Unfortunately almost nobody mentions dual-source blending in regards to subpixel text rendering. I found it here quite a few years ago and had it on my "I have to try this!" list ever since. Only took me 5 years or so to finally get around to it. :D

The OpenGL wiki also covers it reasonably well (here and here). Basically you have to define another output for the blend weights and then tell the blend function to use those weights (switching from SRC_ALPHA to SRC1_COLOR):

// Fragment shader
layout(location = 0, index = 0) out vec4 fragment_color;
layout(location = 0, index = 1) out vec4 blend_weights;

// OpenGL setup for "normal" alpha blending (no pre-multiplied alpha)
glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR);
// Or for pre-multiplied alpha (see below)
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR);
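
And to make the connection to the coverages concrete, here's a minimal fragment shader sketch for the straight (non-pre-multiplied) alpha case. The names are mine and compute_subpixel_coverage() is just a placeholder for the steps described above:

// Sketch: feed the per-subpixel coverages into the dual-source blend weights.
layout(location = 0, index = 0) out vec4 fragment_color;
layout(location = 0, index = 1) out vec4 blend_weights;

in vec4 text_color;  // straight-alpha text color from the vertex shader (assumed)

void main() {
    vec3 coverage = compute_subpixel_coverage();  // LCD-filtered, subpixel-shifted coverages

    // With glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR) the hardware computes
    // fragment_color * blend_weights + framebuffer * (1 - blend_weights), per component.
    fragment_color = vec4(text_color.rgb, 1.0);
    blend_weights  = vec4(text_color.a * coverage, text_color.a);
}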

Again no fancy pictures. Jump back to How it works to satisfy your visual cravings. But on the bright side: With this step you're done! The rest of this post is purely optional and informative (and maybe entertaining).

Optional: Coverage adjustment for a thinner or bolder look

In some situations you might want a bolder look (e.g. for a bit more contrast in colored source code) or a thinner look (e.g. black on white in some fonts). This time there are fancy pictures again. Lots of them actually. :) Maybe first think about different use cases (terminal, notes, code editor, game UI, etc.) and then try to look for the most pleasant option for each one.

Effects of various ways of coverage adjustment to get a thinner or bolder look.
First image: Unmodified coverages.
Linear: Linear modification of coverages, positive = bolder, negative = thinner.
Exponent: pow() function (coverage^exponent), < 1 is bolder, > 1 is thinner.

I hope you picked your favourite. :) If so maybe let me know via a comment at the end of the post.

How is it done? Well, the gradient from the outside of a glyph to the inside is maybe 2 to 3 subpixels wide. 1 subpixel from the rasterization with 3x horizontal resolution. The FreeType LCD filter then blurs that to 2 or maybe 3 subpixels. For me at least this was easier to understand with a bit of sketching:

Sketch of coverage values along the X-axis, from the outside of the glyph (coverage 0) to the inside (coverage 1.0). Subpixels are marked for scale on the X-axis.

This isn't how it really looks. The FreeType LCD filter blurs it so it's more of a soft transition. But I found it easier to reason about it this way.

So far I've come up with the above two approaches to distort that gradient line a bit: Via an exponent or via a linear modification of the gradients slope.

Coverage adjustment via an exponent
Coverage values along the X-axis when distorted by a power function (coverage^exponent).

The sketch overemphasizes a bit for melodramatic effect by using an exponent of ~2.2. If you throw it in a graph plotter and look at x^1.43 the effect is much more subtle. In GLSL code this simply is:

pixel_coverages = pow(pixel_coverages, vec3(coverage_adjustment));

Where pixel_coverages is a vec3 of the subpixel coverages along the x-axis, basically our subpixel values after subpixel positioning is done. If coverage_adjustment is 1.43 the font becomes thinner (like the lower curve), if it's 0.70 (or 1/1.43) it becomes bolder (like the upper curve). You can flip the effect with 1 / coverage_adjustment, e.g. to do thinning by the same amount you previously did thickening and vice versa. A value of 1.0 does nothing. I've included it in the demo's fragment shader but it's commented out there in favor of the next approach.

To some of you this might look eerily familiar, and yes, this came out of the gamma correction rabbit hole (more about that later). Basically I wanted to understand what the hell "gamma-correct" blending with a gamma of 1.43 actually does, because it doesn't do gamma correction. And after a lot of head-scratching, experiments and color-space stuff I think that this outline distortion effect is probably what most people are after when they do that "gamma-correct" blending with a gamma of 1.43. Btw. blending in that 1.43 color space unbalances light and dark fonts a bit (black on white becomes a bit thinner while white on black becomes bolder). We don't have that problem here since we're only adjusting the coverages and don't do any fancy distorted blending. I intentionally called it "coverage adjustment" to make clear that this isn't gamma correction.

Linear coverage adjustment

After I finally understood what was actually happening (hopefully) I came up with another approach: Simply change the slope of the gradient.

Changing the gradient slope. Left side: Steeper slope with reference point at 0 coverage (appears bolder).
Right side: Steeper slope with reference point at 1.0 coverage (appears thinner), including steps of the calculation.

Additionally I built it so that positive coverage_adjustment values go into the left case (bolder) and negative values go into the right case (thinner). For +1.2 this gives us a slightly bolder appearance. The gradient line's slope changes from 1.0 to 1.2, reference point at 0 coverage. For -1.2 we get a slightly thinner one. The slope changes from 1.0 to 1.2 again, but with 1.0 coverage as reference point. But anything between -1.0 .. 1.0 doesn't make much sense that way.

So we take that out and do slope = 1 + coverage_adjustment instead. Meaning +0.2 becomes a slope of 1.2 (bolder), 0 becomes a slope of 1 (does nothing) and -0.2 becomes a slope of 1.2 with the reference point at 1.0 coverage (thinner). And this finally is our linear coverage adjustment value, the +0.20, etc. you saw above. I've chosen the value range that way so it's easy to interpolate (e.g. based on the font size).

In GLSL it looks like this:

if (coverage_adjustment >= 0) {
    pixel_coverages = min(pixel_coverages * (1 + coverage_adjustment), 1);
} else {
    pixel_coverages = max((1 - (1 - pixel_coverages) * (1 + -coverage_adjustment)), 0);
}

The min() and max() just make sure that the output stays in the range 0..1. That version is in the demo code as well, but coverage_adjustment is set to 0 (do nothing) by default.

In case you're allergic to if statements in your shaders, here is a branchless version. No idea if it's faster or slower, though.

// cond is 1 for negative coverage_adjust_linear values, 0 otherwise.
// Couldn't think of a good name.
float cond = float(coverage_adjust_linear < 0);
float slope = 1 + abs(coverage_adjust_linear);
pixel_coverage = clamp(cond - (cond - pixel_coverage) * slope, 0, 1);

So which approach to use?

I don't know. In my experiments so far the linear coverage adjustment tends to be sharper and it's probably faster as well. But on the other hand the pow() approach makes a softer falloff that maybe gives a bit better anti-aliasing. I have to use this for a while to actually come to a conclusion about that. My current plan is to use no coverage adjustment by default, except for colored source code. There I'm planning to use the linear coverage adjustment with +0.2.

You can send it along as a vertex attribute (or rather per rectangle attribute) instead of a uniform so each glyph can do its own thing. Then you can select what you need for the occasion while everything still (usually) happens in one draw call.

Maybe I can use linear +0.2 or even +0.4 as a very poor hinting hack for small font sizes like 8pt? Start at 10pt with +0 and then scale it up to +0.4 at 6pt or so? No idea, more testing is needed (think that thought in a GLaDOS voice if you want).

Another note: By distorting the outline we're reducing the range where we're actually doing anti-aliasing. If you overdo it the font will look less and less smooth and more and more jagged. Small font features might even vanish if they don't have a fully filled subpixel in them. But even with those caveats it can make text quite a bit more readable in some situations.

Optional: Pre-multiplied alpha

This part isn't really necessary for text rendering so if you're here just for the text rendering feel free to skip it.

Pre-multiplied alpha becomes very useful as soon as stuff in your UI can get transparent (backgrounds, borders, text, etc.). Then you have to blend transparent stuff on top of each other within the shader and normal alpha blending kinda breaks apart at that point. I stumbled upon the topic while figuring out how to properly blend transparent borders on top of a transparent background color.

After a lot of hit-and-miss I finally found the W3C Compositing and Blending Level 1 spec. More specifically, section 5.1 Simple alpha compositing spells out the formulas for alpha compositing:

The formula for simple alpha compositing is

co = Cs x αs + Cb x αb x (1 - αs)

Where
    co: the premultiplied pixel value after compositing
    Cs: the color value of the source graphic element being composited
    αs: the alpha value of the source graphic element being composited
    Cb: the color value of the backdrop
    αb: the alpha value of the backdrop

The formula for the resultant alpha of the composite is

αo = αs + αb x (1 - αs)

Where
    αo: the alpha value of the composite
    αs: the alpha value of the graphic element being composited
    αb: the alpha value of the backdrop

If you know your blend equations you will have noticed that the destination term (aka backdrop) contains two multiplications instead of just one: Cb x αb x (1 - αs). The hardware blender can't do that. Or at least OpenGL doesn't provide the blend equations for that as far as I know. And here the pre-multiplied alpha trick comes in handy: Just multiply the RGB components of a color with the alpha value before blending:

Often, it can be more efficient to store a pre-multiplied value for the color and opacity. The pre-multiplied value is given by

cs = Cs x αs

with
    cs: the pre-multiplied value
    Cs: the color value
    αs: the alpha value

Thus the formula for simple alpha compositing using pre-multiplied values becomes

co = cs + cb x (1 - αs)

And that's why the blend function above was set to glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR). The demo converts the text color to pre-multiplied alpha in the vertex shader.

Pre-multiplied alpha has a lot of nice properties and uses, especially in gaming (e.g. mipmaps don't cause color artifacts). Alan Wolfe wrote a good (and short) summary on StackExchange. And What is Premultiplied Alpha? A Primer for Artists by David Hart shows quite nicely the effects of pre-multiplication as masking. The post contains nice pictures and gives you a good intuitive understanding of how to use pre-multiplied alpha for compositing. No need to repeat all that here.

A note on the API and implementation, though: I only use 8 bit straight alpha colors (aka not pre-multiplied) in the entire API and for all CPU-side code. This keeps the API simpler and a lot less confusing. The vertex shader then converts those to pre-multiplied alpha. In the vertex shader the colors are float vectors (vec4) and we don't lose precision if the multiplication happens there. Doing the pre-multiply on the CPU with 8 bit ints probably would lose a lot of precision and might cause banding artifacts in dark regions. Never really tested that though, so it might be a bit over-the-top.
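The conversion itself is a one-liner in the vertex shader, something along these lines (a sketch, assuming color is the straight-alpha attribute):

// Sketch: convert a straight-alpha color attribute to pre-multiplied alpha.
vec4 premultiplied_color = vec4(color.rgb * color.a, color.a);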

For images (icons, etc.) I do the pre-multiplication in the fragment shader after the texture read. You could do the pre-multiplication once on the CPU (with the same potential problems as above) but one multiplication in the shader doesn't kill me. On mobile you might think differently.

Paths not taken

As I became a more experienced programmer I realized more and more that the things you don't do are actually way more important than the things you do. It's oh so easy to build a complex mess of stuff that becomes magical simply because nobody understands it anymore. But building something that does the job well enough, where everybody thinks "well, that's easy, how about adding X?", that is hard.

Programmers (and many other disciplines) don't talk nearly enough about that. So here I do and hopefully fewer people have to walk the same paths in vain, repeating the exact same errors for the exact same wrong reasons.

Paths not taken: The gamma correction rabbit hole

If you read about subpixel text rendering you'll pretty quickly stumble upon gamma-correction.

What is it? Stewart Lynch from PureDev Software has a good introduction for programmers: Gamma Encoding. In a nutshell: "The brain can distinguish more dark shades than light [shades]". Instead of directly storing the brightness of each color channel in normal RGB colors the values are distorted a bit towards darker shades. With 8 bit integers this gives us a lot more values for darker shades, and for our perception this is where it counts.

Figure 1 from Stewart Lynch's article: "Shows how brightness values are encoded into 8 bits (0 – 255)"

A word about naming and nomenclature here. Various posts use various names for the different color spaces but here I'll stick to those two:

Perceptual sRGB colors are your standard run-of-the-mill RGB values you see everywhere. Doubling a value gives you something that looks twice as bright to our human eyes (well, somewhat).

With linear sRGB doubling the value doubles the amount of physical brightness (photons) that reaches your eyes. Doesn't necessarily look like that to us humans, see the above image. But when you calculate light intensities in 3D games this is what you want to work with.

You can translate between the two color spaces with a "transfer function" (fancy name):

linear_srgb_color     = perceptual_srgb_color ^ 2.2
perceptual_srgb_color = linear_srgb_color ^ (1/2.2)

Here 2.2 is "gamma" or "γ". And when you do stuff with linear sRGB colors it's usually called "gamma correction".

Note: This is a simplified transfer function that works most of the time. The official function is a bit more complicated but the results are nearly the same, so most don't care. But when you use OpenGL's sRGB support you should also use the official one or dark shades get distorted, see below.

Blending artifacts in perceptual sRGB

John Novak wrote an article that shows the effects of gamma correction pretty nicely: What every coder should know about gamma. In there he makes a pretty good point that blending two perceptual sRGB colors causes some artifacts.

Figure 8 from John Novaks article: "Effects of gamma-incorrect colour blending. On the left gamma-correct image, the option Blend RGB Colors Using Gamma 1.0 was enabled in Photoshop CS6, on the right it was disabled (that’s the default gamma-incorrect legacy mode)."

Doing the blending with linear sRGB colors instead avoids those artifacts but often results in a brighter look. A linear color space is really important in computer graphics where you actually calculate light intensities and stuff. Not using linear sRGB can really mess things up there and ruin your weekend (or rather many weekends) so it's a rather sensitive topic for some.

Applying that to font rendering

Ok, but back to font rendering now. After reading all this my reasoning was: We have a coverage value for each subpixel. A coverage of 50% means 50% of light comes through from the background. So we should use our coverage values as light intensities and do the blending with linear sRGB colors, right?

But when you do subpixel font rendering and blend in linear sRGB this happens:

Effect of gamma correction on subpixel font rendering on two different examples.
Linear sRGB: Blending is done in linear space using OpenGL's GL_FRAMEBUFFER_SRGB. Input colors are converted with the official sRGB transfer function.
Perceptual sRGB: Blending in perceptual sRGB color space, aka alpha blending normal colors.
Gamma 1.43: Manual blending in the fragment shader with a set background color and a custom gamma of 1.43. Transparent text colors don't work here (never bothered to implement them in this mode).

Notice the unbalanced font weights in "linear sRGB" and "gamma 1.43"? Black on white looks thinner while white on black looks almost bold. The "light to dark text" example shows this especially well.

But this is the "correct" way, right? This led me on a merry chase to restore that black and white balance while still blending in linear sRGB color space. Adjusting the blending weights based on the text- and/or background color in various (sometimes scary) ways, deriving (or number crunching) polynomials to nudge the blend equation in different directions, etc. All very complicated, increasingly obsure and usually causing other artifacts that need a new set of workarounds. Needless to say, at some point I just pulled the plug on that. After a few days that felt like crazy scientist experiments, that is. :D

What the Skia people say about this

Skia is the graphics library used by Firefox and Chrome and they do a pretty good job of font rendering. It's interesting what they say about the topic. Taken from The Raster Tragedy in Skia, with the relevant part emphasized. Note: "Linear blend function" means alpha blending here, I think.

Skia does not convert into a linear space, apply the linear blend, and convert back to the encoded space. If the destination color space does not have a linear encoding this will lead to ‘incorrect’ blending. The idea is that there are essentially two kinds of users of Skia. First there are existing systems which are already using a non-linear encoding with a linear blend function. While the blend isn’t correct, these users generally don’t want anything to change due to expectations. Second there are those who want everything done correctly and they are willing to pay for a linearly encoded destination in which the linear blend function is correct.

For bi-level glyph rendering a pixel is either covered or not, so there are no coverage blending issues.

For regular full pixel partial coverage (anti-aliased) glyph rendering the user may or may not want correct linear blending. In most non-linear encodings, using the linear blend function tends to make black on white look slightly heavier, using the pixel grid as a kind of contrast and optical sizing enhancement. It does the opposite for white on black, often making such glyphs a bit under-covered. However, this fights the common issue of blooming where light on dark on many displays tends to appear thicker than dark on light. (The black not being fully black also contributes.) If the pixels are small enough and there is proper optical sizing and perhaps anti-aliased drop out control (these latter two achieved either manually with proper font selection or ‘opsz’, automatically, or through hinting) then correct linear blending tends to look great. Otherwise black on white text tends to (correctly) get really anemic looking at small sizes. So correct blending of glyph masks here should be left up to the user of Skia. If they’re really sophisticated and already tackled these issues then they may want linear blending of the glyphs for best effect. Otherwise the glyphs should just keep looking like they used to look due to expectations.

For subpixel partial coverage (subpixel anti-aliased) glyph masks linear blending in a linear encoding is more or less required to avoid color fringing effects. The intensity of the subpixels is being directly exploited so needs to be carefully controlled. The subpixels tend to alleviate the issues with no full coverage (though still problematic if blitting text in one of the display’s primaries). One will still want optical sizing since the glyphs will still look somewhat too light when scaled down linearly.

How I interpret this: If you don't go all in and do "proper optical sizing and perhaps anti-aliased drop out control" don't blend in the linear sRGB color space.

The 2nd part is interesting, though. I'm doing subpixel anti-aliasing here but the color fringes are pretty much gone. My guess is that this part refers to subpixel anti-aliasing with full-pixel alpha blending. There the color fringes are a lot harder to fight. You can see this in the "Subpixel pos." image back in How it works.

Well, this whole project is about a sweet spot between quality and simplicity, so linear sRGB isn't the right tool for the job here. Perceptual sRGB colors just do a better job in our situation with fewer artifacts. At least when combined with dual source blending. Artifacts are still there, just fewer than with a linear sRGB color space.

Another way to look at it: Human perception of gradients

The perceptual and linear sRGB color spaces both have their problems. Björn Ottosson wrote a nice post about that: How software gets color wrong.

Both color spaces have parts where they come close to human perception and parts where they fail, even if linear sRGB does better with most colors. Of special interest however is the human perception of the black and white gradient, because that is essentially what we blend across for black on white or white on black text.

Comparisons - White and black from Björn Ottossons article:
Perceptual blend: A smooth transition using a model designed to mimic human perception of color. The blending is done so that the perceived brightness and color varies smoothly and evenly.
Linear blend: A model for blending color based on how light behaves physically. [I call this linear sRGB in this post.] This type of blending can occur in many ways naturally, for example when colors are blended together by focus blur in a camera or when viewing a pattern of two colors at a distance.
sRGB blend: This is how colors would normally be blended in computer software, using sRGB to represent the colors. [Called perceptual sRGB in this post.]

With that gradient in mind the result of blending in linear sRGB becomes somewhat apparent: In there almost all parts of the gradient are bright. Now we apply that to the gradient at the outline of a glyph (inside 100%, outside 0%): Most of the outline will be the bright part of the gradient.

If the glyph is bright this adds to the area of the glyph, meaning it becomes bolder. If the background is bright this adds area to the background, or put another way, removes area from the dark glyph, meaning it becomes thinner. That fits how blending in linear sRGB distorts black on white or white on black text.

Black to white gradients in perceptual sRGB are way closer to human perception, thus the fonts have a balanced font weight.

Maybe the results would be better if we do the blending in a more perceptually uniform color space. Meaning a color space where "identical spatial distance between two colors equals identical amount of perceived color difference" (from the linked Wikipedia page). Basically the "perceptual blend" above.

The Oklab color space by Björn Ottosson again seems an interesting candidate for that. To be honest perceptual sRGB does a pretty good job already and we could only do custom blending in the fragment shader with a known background color anyway. But I couldn't get it out of my head, so I did a quick test:

Effects of blending in the Oklab color space when doing subpixel font rendering. Note that transparent colors are broken in both versions (again, didn't bother to implement that).
Oklab: Convert colors into Oklab, blend and convert back. Each component is blended individually which probably breaks something.
Perceptual sRGB: Manual per-channel alpha blend in perceptual sRGB color space.

So, yeah, that didn't work out. :D I guess it's better to not take that gradient metaphor too seriously.

But then, we're talking about subpixels here and I'm really not an expert on any of those topics. Maybe that gradient thing is a good indicator of what's going on, maybe not. Color and brightness perception is a weird thing. Take everything in this subsection with a grain of salt (or maybe with a whole pitcher of it).

Other (unintended) consequences

Blending in linear sRGB gives different results by design. Usually we want that… but sometimes we don't.

Effects of blending in the linear sRGB color space on not text rendering stuff for a change.
Use the buttons to switch between linear and perceptual to get an impression of the changes.
Colors: Text (top) and color (middle) blending tests inspired by images from John Novak's article What every coder should know about gamma. The bottom part is a border and corner radius test I used for debugging.
Icons and images: Various image blending tests, Aerial Photography of Rock Formation by Ian Beckley, the icon for "C" files from GNOME Desktop icons (not sure it's from there) and a blending test image from Eric Haines' article PNG + sRGB + cutout/decal AA = problematic.

The "Colors" scenario reflects my experience with linear sRGB blending so far: Yay, some artifacts are gone (top), oh, the transparent colors look slightly different (middle), oh, the borders don't look like they should (bottom). The border colors in the bottom part are a transparent blue. With perceptual sRGB blending this looks just as designed but linear sRGB blending results in a different color that's a lot closer to the overall background color. Pretty unfortunate in this case.

But that's the kicker: If your artwork (color scheme, images, icons, …) is made for perceptual sRGB blending it will look off with linear sRGB blending (usually too bright). And it seems most stuff is made for perceptual sRGB blending. So if you do linear sRGB blending all the artwork has to be made for it as well. It's not just about doing "correct" blending, you're literally blending differently and your artwork needs to reflect that.

"Icons and images" shows the same dilemma but with images: The shadow of the "C" icon is made for perceptual sRGB blending. With linear sRGB blending the shadow loses a lot of its function. Small thing but can be suprisingly annoying.

The big image at the bottom is a pretty cool test image from Eric Haines' article PNG + sRGB + cutout/decal AA = problematic. It tests if blending is done according to the PNG spec. I've added two white rectangles behind the right side so I can see all relevant cases at once. With linear sRGB blending it comes out correct (most of it is as bright as the 50% coverage) and with perceptual sRGB blending you get a different result (not matching the PNG spec). Needless to say, most do perceptual sRGB blending (e.g. browsers).

Same dilemma: Blending and artwork have to match up, otherwise it looks off. As long as everyone is doing (or assuming) the same thing (perceptual sRGB) it works out. But as soon as some mix it up confusion ensues.

Bonus trick: In GIMP you can change how a layer is blended via its context menu. "Compositing Space" → "RGB (linear)" or "RGB (perceptual)". Nice to quickly check what looks right. With GIMP 2.10.30 the default is "RGB (linear)" to blend layers on top of each other but I have no idea what happens when the alpha channel is exported (e.g. as a PNG).

That gamma 1.43 thing

I picked it up in Sub-Pixel, Gamma Correct Font Rendering by Stewart Lynch. It's also mentioned in John Novak's article as 1.42 and it seems to originate from a Photoshop text rendering feature. As the story goes fonts look too thin with gamma correction and a gamma of 2.2 so you use a gamma of 1.43 or so instead. Now they look as intended. Arguably because the fonts were designed for non-gamma-correct font rasterizers. Personally I guess font renderers in the past did way more aggressive hinting and fonts (as well as displays and resolutions) were generally smaller. Hence fonts were designed to compensate for that and were generally bolder.

Anyway, I have no idea what Photoshop really did there. Maybe they simply adjusted the coverages like my first approach above. Or maybe they actually did blend in a color space with gamma 1.43. I don't know. But blending in a color space with gamma 1.43 is neither perceptual sRGB nor linear sRGB. At that point you're not correcting anything, you just blend in a weird mixed up color space that has fewer artifacts. Light and dark fonts still become unbalanced, just not by as much (see the "light to dark text" gamma 1.43 image in Applying that to font rendering).

But this gave me an interesting idea: From a math point of view, multiplying a color with coverages in a color space with gamma 1.43 is the same as multiplying the color directly with coverages^(1/1.43):

  (color^1.43 * coverages)^(1/1.43)
= (color^1.43)^(1/1.43) * coverages^(1/1.43)
= color^(1.43 * 1/1.43) * coverages^(1/1.43)
= color^1 * coverages^(1/1.43)
= color * coverages^(1/1.43)

And at some point that strange idea clicked with what that means for the coverages themselves (distorting the gradient at the glyph outline). Hence the coverage adjustment was born and I left the blending alone (read: I'm doing it in perceptual sRGB space). And lo and behold, the results were finally uniform again between light and dark text. Balance was restored. :)

And I could adjust the exponent for thinner or bolder looks and get predictable results across different situations.

Final gamma aside: OpenGL's sRGB support

By default OpenGL simply assumes linear behaviour and doesn't care about color spaces and such things. If you feed perceptual sRGB colors into OpenGL it will do blending in that color space. If you feed it linear sRGB colors it blends in this space. There is no magic happening behind your back.

But if you set it up carefully OpenGL will do most of the color conversions for you. You can leave your image, texture and framebuffer data in perceptual sRGB but your shader can work in linear sRGB and the blending happens in linear sRGB as well. Oh, and the conversions are hardware accelerated as well. You just have to tell OpenGL what's what so it can do the work for you.

For textures: You can create textures with sRGB formats (e.g. GL_SRGB8_ALPHA8). This tells OpenGL that the texture data is in the perceptual sRGB color space. When you read the texture data in your shader OpenGL will now automatically convert the texture data to linear sRGB for you. Just have to set the proper texture format, and that's that.

For framebuffers: With glEnable(GL_FRAMEBUFFER_SRGB) you tell OpenGL that your framebuffer is in the perceptual sRGB color space while your fragment shader outputs linear sRGB colors. Blending is then done in linear sRGB color space: Read previous framebuffer value in perceptual sRGB, convert it to linear sRGB and then blend it with your fragment shader output. After that the blend result is converted back to perceptual sRGB and written back to the framebuffer.

You might have to ask for an sRGB capable default framebuffer on window creation, though. In SDL you can do this via SDL_GL_SetAttribute(SDL_GL_FRAMEBUFFER_SRGB_CAPABLE, 1). Not all drivers seem to need that. And you don't need that when you render into a frame- or renderbuffer of your own. Just use an sRGB texture format for that buffer. Haven't done that in my experiments though.
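Pulled together, the setup is just a few calls (a sketch of the pieces mentioned above, not the demo's exact code):

// Sketch: OpenGL sRGB support in a nutshell.
SDL_GL_SetAttribute(SDL_GL_FRAMEBUFFER_SRGB_CAPABLE, 1);  // before creating the window

// Texture data stays perceptual sRGB, reads in the shader return linear sRGB
glTexImage2D(GL_TEXTURE_2D, 0, GL_SRGB8_ALPHA8, width, height, 0,
    GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// Fragment shader outputs linear sRGB, blending happens in linear sRGB,
// results are converted back to perceptual sRGB on framebuffer write
glEnable(GL_FRAMEBUFFER_SRGB);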

For vertex shader inputs: You have to manually convert colors you pass as uniforms or vertex attributes. OpenGL doesn't know which vec4 is a color and which is a position or whatever, so it can't help you there. But you have to do that with the high-precision gamma transfer function so your work lines up with what OpenGL does.

Often people just use pow(rgb, 2.2) and pow(rgb, 1/2.2) as transfer functions because they're simpler and are mostly good enough. But here this leads to a mismatch: We convert our colors to linear sRGB with a different function than OpenGL uses to convert them back to perceptual sRGB later. As a result darker shades get distorted a bit (starting at brightness ~120 and visible at ~16 or ~8). Using the high-quality transfer functions avoids the artifacts and everything lines up nicely between our code and OpenGL / the hardware.

I've implemented them in GLSL based on the linked Wikipedia page above. Finally a nice chance to use mix() to select via a bool vector. :)

vec3 color_srgb_to_linear(vec3 rgb) {
    return mix( rgb / 12.92 , pow((rgb + 0.055) / 1.055, vec3(2.4)) , greaterThan(rgb, vec3(0.04045)) );
}

vec3 color_linear_to_srgb(vec3 rgb) {
    return mix( 12.92 * rgb , 1.055 * pow(rgb, vec3(1 / 2.4)) - 0.055 , greaterThan(rgb, vec3(0.0031308)) );
}

And here is how it looks:

Effects of using the inaccurate gamma 2.2 sRGB transfer function with OpenGL's sRGB support.

Notice that between the official sRGB transfer function and the gamma 2.2 shortcut the text stays the same but the darker background colors on the right get even darker with 2.2. If you then compare them both to the perceptual sRGB output you see that the background colors line up with the official sRGB transfer function but gamma 2.2 distorts the darker shades.

The EXT_framebuffer_sRGB extension also contains an approximation of the transfer function. You might want to use that as well but I haven't tested it (only found it later). It's probably an even closer match to what OpenGL does.

I thought this was pretty neat for UI work and I used it for some of the experiments shown above. But it took some searching to assemble all the parts so I wanted to mention it all in one place.

Paths not taken: Manual blending with a known background color

In some situations the background color is known. For example when displaying source code we probably want to use the color scheme's background color. Then we can tell the fragment shader that background color and it can do any kind of blending it wants, just returning the final (non-transparent) color.

You can use this to implement gamma-correct blending even if everything else happens in perceptual sRGB (e.g. so the images and colors look like the artists expect). Or you can do fancy stuff to enhance contrast between the text and background color.

I did a lot of fancy and scary experiments but in the end nothing was really convincing. Blending in a linear sRGB color space caused more trouble than it fixed (see above) and the Oklab color space didn't help either. Even if Oklab looks great for color gradients.

And remember: This only works on top of a known solid background color. And I would really like to blend on top of images, a game or use transparent UI elements. So I shelved the entire line of inquiry for now.

Just as an utterly over-the-top note: If someone finds an exceptionally epic blending function for subpixel rendering and you really want that for everything you can use order-independent transparency. For each pixel you first collect all the colors of e.g. the topmost 4 layers and then combine them in a dedicated resolve pass. And in that resolve pass you have the colors for foreground and background and can blend them however you want.

Order-independent transparency is usually a thing in games to render transparent geometry when sorting doesn't work (or you don't want to do it). Christoph Kubisch describes that in his presentation Order Independent Transparency In OpenGL 4.x. Specifically the "Atomic Loop 64-bit" approach looks pretty promising.

We can abuse it for UIs if we really want to invest that complexity. Insanely over-the-top for normal UI work, so I thought I'd mention it. If someone is insane enough to do that, please let me know. :)

Paths not taken: Dynamically choosing coverage adjustment based on text and background color

I did that mostly as a workaround for artifacts caused by various kinds of "gamma correction".

I experimented with a coverage adjustment based on the perceived brightness of the text color, based on the old HSP Color Model — Alternative to HSV (HSB) and HSL by Darel Rex Finley. Oklab's L component might have been a better choice but I didn't know Oklab at that time. It kinda worked anyway, but usually those workarounds just made one situation look good (e.g. colored source code) and caused artifacts in others (color bleeding, normal text becoming too bold, etc.).
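For reference, the HSP "perceived brightness" from that article is just a weighted root mean square of the channels, roughly this (a sketch, not the exact code I used back then):

// Sketch: perceived brightness of an sRGB color according to the HSP color model.
float perceived_brightness(vec3 rgb) {
    return sqrt(dot(rgb * rgb, vec3(0.299, 0.587, 0.114)));
}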

Pretty much the same happened when I played around with a known background color or even the difference between the text color and the background color.

I found nothing that worked well in the general case. And if the user knows about a special case (e.g. colored source code) they can simply set some special text rendering parameters (e.g. making the font slightly bolder with coverage adjustment). So in the end I just settled on the coverage adjustments described above. They're simple, relatively easy to understand and have a uniform result.

Ideas for the future

I still have a lot of interesting stuff bouncing around and here is a short list of the highlights. Some of you probably think about similar things right now (that is if someone ever arrives down here).

Signed distance fields for text shadows

I don't want to use them for normal text rendering because finding the right field size to avoid artifacts is pretty difficult. But at the same time I want a diffuse shadow behind text to enhance the contrast above images. In other words: a blurry text shadow.

And for that signed distance fields should work just fine since fine details should become irrelevant anyway. stb_truetype can generate them as well.

But I don't want to have a 50px signed distance field when someone requests a large blur radius. Instead I'm planning to add just 2px padding or so outside of the glyph (filled with signed distances). When someone needs a distance outside of that I'll just look for the closest pixels and approximate the distance based on the field's direction there.

With a bit of luck this should approximate the distance as a line (one outermost position) or a circle (two outermost positions one pixel apart from one another). No idea if that will work, though.

Signed distance fields might also be a nice fallback for very large font sizes. If a glyph would be larger than e.g. 128px we can rasterize it as a signed distance field with a height of 128px and store that in an atlas texture. Then users can zoom in as much as they want while we simply keep using that same 128px field all the time. Sure, we still might lose some glyph detail for extreme fonts but it's probably simpler than writing an entire curve renderer for outlines. Should also work well with subpixel positioning and dual source blending, but at those font sizes it's probably a moot point.

2x2 oversampling for animated text

Right now I only need horizontal text and it's not even animated. But once I want animated text I'll try Font character oversampling for rendering from atlas textures from stb_truetype. Basically just render the glyph with double the horizontal and vertical resolution and use linear interpolation. Sounds neat and simple and fits in nicely with the grayscale atlas I'll need for signed distance fields anyway.

Use DirectWrite and Pango when available

This is very far in the future. Mostly I want them for complex text layout. Pango provides an API for that and I've seen something about that in DirectWrite but I'm not sure if it's usable. It would also provide automatic replacement glyphs in case a font doesn't offer a glyph. I guess we would get a more authentic system look, too.

It's also a good point to take a closer look at horizontal hinting for small font sizes. The 8pt output of FreeType looks pretty convincing to me and if they do horizontal hinting to achieve that so would I. But it's probably tricky to pull off without derailing the texture atlas approach. Maybe let FreeType do hinting on subpixel boundaries and then do something in the shader to nudge those boundaries to full pixel boundaries. Or try to round the subpixel shift to full subpixels and see how that looks.

Do something useful when the background color is known

As a final part it still feels like you can do a lot more if the text and background colors are known. I couldn't find anything that actually made things better, but that doesn't mean there is no such thing. Granted, dual source blending probably eliminates the need for most tricks in that regard, but still… it feels like wasted potential.

Other approaches and interesting references

I'll mostly just link to other stuff here. Explaining it all would take the post even further past the endurance threshold. Anyway, Aras Pranckevičius wrote a pretty nice post in 2017: Font Rendering is Getting Interesting. It provides a good overview and has some pictures. It boils down to:

The CPU rasterization and texture atlas approach is still the workhorse as far as I know. FreeType, stb_truetype, DirectWrite, etc. do the rasterization and UIs like GTK then do the texture atlas thing. Same as I described in this post. Pretty good quality but someone has to manage the texture atlas. It's the "classic" GPU assisted font rendering approach.

Distance fields are pretty useful if you have a lot of different text sizes floating around. Or for animations and effects like shadows, outlines, etc. But they lose small glyph details depending on the size of the distance field and in UIs their advantages don't weigh that heavily. But in games and especially on 3D surfaces they're pretty awesome. I stopped following that field a few years ago but back then multi-channel distance fields by Viktor Chlumský were the most advanced approach. There you use RGB for 3 distance fields and combine them in the shader to better represent sharp features in glyphs. Avoids a lot of artifacts and lets you get away with smaller field sizes. Constructing the multi-channel field is rather complex, but fortunately described in detail in his master's thesis.

There are also approaches that rasterize the glyphs directly on the GPU. Aras' blog post above lists a few. As far as I can tell Slug by Eric Lengyel came out of that. There you store the font description on the GPU and each pixel looks at it to build the glyph. It's the first approach that actually convinced me. The website also links a paper and slides with a lot of details about the algorithm. But it's a bit too ambitious for a hobby project, even for my tastes.

Easy Scalable Text Rendering on the GPU by Evan Wallace also seems pretty interesting but requires a lot of triangles and overdraw per glyph. It's neat, but not sure if that is worth it.

Apart from those low-level libraries and approaches there are of course libraries like Skia that do text rendering. And Skia does a pretty good job of it. They also have a very interesting design document where they talk about their font rendering: The Raster Tragedy in Skia that I quoted from above.

And this brings me to The Raster Tragedy at Low-Resolution Revisited by Beat Stamm. He worked on a lot of font rendering stuff at Microsoft, including ClearType. The Raster Tragedy is a vast treasure trove and I still haven't finished reading it all. But it focuses a lot on hinting as a tool to adjust fonts and I'm not that interested in the internals of glyph rasterization. I'm leaning more towards the GPU side of things.

Interesting tidbit: As far as I can remember none of them mentioned subpixel blending aka dual source blending. I wouldn't be surprised if Skia does it but I couldn't find any mention of it from them.

Also worth mentioning is FreeType's Harmony LCD rendering as mentioned in their documentation. It can handle different subpixel positions and is a pretty neat idea.

A note on something not strictly font-rendering related: stb_truetype is a small self-contained single header C file. No dependencies. Skia is a huge project with a lot of dependencies and I don't even want to think about building it. Depending on the project those aspects can be more important than quality (in my case they are).

The End

Phew! If anyone other than myself reads those last few words: You've just unlocked the endurance achievement, congrats. :) I hope those mad ramblings were interesting or at least entertaining. And maybe they spared you a bit of pain and suffering. If you have any related tidbits or ideas please drop a comment or send me a mail. May all your glyphs look sharp enough.
