<?xml version="1.0" encoding="UTF-8"?> 
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>Arkanis blog posts</title>
	<subtitle>A blog about random stuff, but mostly programming.</subtitle>
	<id>http://arkanis.de/weblog.xml</id>
	<link href="https://arkanis.de/weblog/posts.xml" rel="self" />
	<link href="https://arkanis.de/" rel="alternate" />
	<updated>2026-04-26T16:18:50+00:00</updated>
	<entry>
		<title>Lots of gambling all around</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2026-04-26-lots-of-gambling-all-around</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2026-04-26-lots-of-gambling-all-around/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2026-04-26-lots-of-gambling-all-around/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2026-04-26-lots-of-gambling-all-around/comments.xml" />
		<updated>2026-04-26T16:11:39+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2026-04-26-lots-of-gambling-all-around/">
&lt;p&gt;I&#039;m sure many people have noticed this already:
Gambling with investor money (very fashionable), loot boxes and other gambling mechanics in games, dark patterns in UIs like endless scrolling, likes on social media, prediction markets, wondering what some LLM will output next, the list goes on and on.
There sure are a lot of ways to get your dopamine fix.&lt;/p&gt;
&lt;p&gt;For me it&#039;s most obvious in games. Call it RNG, random draws, dice rolls or whatever. Maybe with a nice animation with something moving or flying around and hitting the right spot. Or not. If you make or play games you might think that this is just the way some games work. More and more I don&#039;t. I see slot machines instead.&lt;/p&gt;
&lt;p&gt;Rolling for characters in Genshin Impact kind of threw that dynamic right into my face.
At some point I could feel myself being conditioned. I haven&#039;t played it since.
I only played it for a few weeks, and that was years ago by now. So no harm done. But it really soured an otherwise interesting world and story for me.&lt;/p&gt;
&lt;p&gt;I bet (no pun intended) you&#039;ve seen the same or similar mechanics in other games more and more over recent years.&lt;/p&gt;
&lt;p&gt;Various perspectives on that slosh around in my head:&lt;/p&gt;
&lt;h1&gt;Technical perspective&lt;/h1&gt;
&lt;p&gt;Randomness (aka RNG) is just a tool to keep games interesting. Less repetitive. Sometimes it&#039;s a good way to approximate a more complex world, suggesting a richer and more dynamic world than is actually there. To deepen the immersion.
Alas, for me that only works when the randomness is invisible. As soon as it&#039;s thrown in my face, I see a slot machine.&lt;/p&gt;
&lt;p&gt;Like with many things, it&#039;s easier to picture it as a spectrum: Randomness as a technical tool on one end, an obvious slot machine on the other.&lt;/p&gt;
&lt;h1&gt;Business&lt;/h1&gt;
&lt;p&gt;Well, gambling addiction is bad for people. I hope that&#039;s a given. It&#039;s great for business, though. Dopamine dependency keeps your players playing. Keeps users using your app. Even if they know it&#039;s bad for them. And the number of active concurrent users is something every investor likes to see going up. Anyway, win-win, right? More money for your business, and the users have an engaging experience.&lt;/p&gt;
&lt;p&gt;Yeah, by now it&#039;s kind of horrifying how many people still swallow that. Never mind &lt;a href=&quot;https://en.wikipedia.org/wiki/Problem_gambling&quot;&gt;the consequences&lt;/a&gt;. There&#039;s a reason gambling was regulated so tightly.&lt;/p&gt;
&lt;p&gt;At least with loot boxes some have woken up. But with regard to general gameplay or UIs I honestly see no progress so far. All those little dopamine triggers? Still there.&lt;/p&gt;
&lt;p&gt;But I probably have a strange taste in UIs. For me the best UI doesn&#039;t divert any attention, doesn&#039;t leave any impression. I just want to do what I opened the app for and be done with it. I shouldn&#039;t &lt;em&gt;feel&lt;/em&gt; anything when using a good UI. It shouldn&#039;t consume enough attention to evoke feelings, because any attention consumed is diverted from my original task. It should just communicate the information needed for my next decision. It should be boring, predictable, invisible. UX people will likely disagree… very much so, I assume.&lt;/p&gt;
&lt;p&gt;And by now more than a few have noticed that work with LLMs can sometimes follow a similar pattern: Well, it almost got it right… just a few tweaks here and there… better, but problems there… and after who knows how many rounds it finally works. Or not. But no matter the final outcome, a bit of a dopamine roller coaster.&lt;/p&gt;
&lt;p&gt;Again, this is a spectrum: A little bit of eye candy doesn&#039;t hurt. But if you ever found yourself opening up an app without meaning to… well, that&#039;s the other end. Where Pavlov&#039;s dogs and &lt;a href=&quot;https://en.wikipedia.org/wiki/Operant_conditioning_chamber&quot;&gt;Skinner boxes&lt;/a&gt; reside.&lt;/p&gt;
&lt;h1&gt;Social and systemic&lt;/h1&gt;
&lt;p&gt;I&#039;m probably a bit more aware of this dynamic than most.
At least judging by the fact that no one ever talks about it.
But maybe everyone else thinks that, too. So who knows.&lt;/p&gt;
&lt;p&gt;But honestly, to me it feels like the financial industry and part of the software industry have become the new dopamine dealers.
The software and games industries are more like schoolyard dealers (hook them when they&#039;re young), and the financial industry is for adults with money to burn (or not).&lt;/p&gt;
&lt;p&gt;Maybe those industries are just &amp;quot;exploring new revenue sources&amp;quot; until there is push back from society.
And it&#039;s a lot easier to try that with something insubstantial like money, software and entertainment than with e.g. buildings. Not for lack of trying. Various property bubbles sure felt like land developers gambling a lot.
And there&#039;s always the &lt;a href=&quot;https://en.wikipedia.org/wiki/Overton_window&quot;&gt;Overton window&lt;/a&gt; to push ever so slowly.&lt;/p&gt;
&lt;p&gt;Maybe it&#039;s just that over the last few decades we&#039;ve grown up more and more used to our daily dopamine fixes.
And where there&#039;s a demand, supply will soon follow. Consequences be damned.&lt;/p&gt;
&lt;h1&gt;Final words&lt;/h1&gt;
&lt;p&gt;Anyway, I wish I had something positive to end on. But I don&#039;t.
It&#039;s just one of the many small things that still leave me baffled when observing how people behave.
Something I keep noticing but sure wish I didn&#039;t.&lt;/p&gt;
&lt;p&gt;If you&#039;re reading this: Thank you, nonetheless.
Maybe it helps to spot and avoid trouble.
Or if you make games or apps maybe it even keeps you from putting slot machines into them by accident.
Anyway, take a cookie. Can&#039;t leave you without any conditioning now, can I. 😉&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>JavaScript plotting tool for interactive plots</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2025-11-27-javascript-plotting-tool-for-interactive-plots</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2025-11-27-javascript-plotting-tool-for-interactive-plots/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2025-11-27-javascript-plotting-tool-for-interactive-plots/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2025-11-27-javascript-plotting-tool-for-interactive-plots/comments.xml" />
		<updated>2025-11-27T18:48:03+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2025-11-27-javascript-plotting-tool-for-interactive-plots/">
&lt;p&gt;I wrote another &lt;a href=&quot;https://arkanis.de/projects/js2plot/&quot;&gt;function plotter&lt;/a&gt; where you can write your function in simple JavaScript code. Something like &lt;code class=&quot;lang_js&quot;&gt;&lt;span class=&quot;variation_c&quot;&gt;func&lt;/span&gt;(&lt;span class=&quot;variation_b&quot;&gt;&amp;quot;f&amp;quot;&lt;/span&gt;, x &lt;span class=&quot;landmark_a&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;variation_c&quot;&gt;sin&lt;/span&gt;(x))&lt;/code&gt;. No new special math-like syntax you have to learn, just JavaScript.&lt;/p&gt;
&lt;figure&gt;
    &lt;video controls src=&quot;demo.webm&quot; poster=&quot;demo.jpg&quot;&gt;&lt;/video&gt;
&lt;/figure&gt;
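&lt;p&gt;A minimal sketch of how a plotter could run code like that, assuming a hypothetical &lt;code&gt;runPlotScript&lt;/code&gt; helper (this is not the tool&#039;s actual implementation): every &lt;code&gt;Math.*&lt;/code&gt; member is passed into &lt;code&gt;new Function&lt;/code&gt; as a named parameter, so the script can write &lt;code&gt;sin(x)&lt;/code&gt; instead of &lt;code&gt;Math.sin(x)&lt;/code&gt;. The &lt;code&gt;func&lt;/code&gt; callback only records the named functions, and a separate, also hypothetical, &lt;code&gt;sample&lt;/code&gt; helper evaluates them at evenly spaced x values for drawing.&lt;/p&gt;

```javascript
// Hypothetical sketch, not the tool's actual code: run a user script such as
//   func("f", x => sin(x))
// with all Math.* members available without the "Math." prefix.
function runPlotScript(code) {
  const plots = [];
  // func() only records the named function; drawing happens elsewhere.
  const func = (name, fn) => plots.push({ name, fn });

  // String-keyed own properties of Math: sin, cos, PI, E, hypot, ...
  const mathNames = Object.getOwnPropertyNames(Math);
  const mathValues = mathNames.map(name => Math[name]);

  // Each Math member becomes a named parameter of the generated function.
  const run = new Function("func", ...mathNames, code);
  run(func, ...mathValues);
  return plots;
}

// Sample a recorded function at steps + 1 evenly spaced x values in [x0, x1].
function sample(fn, x0, x1, steps) {
  const points = [];
  for (let i = 0; steps >= i; i++) {
    const x = x0 + ((x1 - x0) * i) / steps;
    points.push({ x, y: fn(x) });
  }
  return points;
}

const plots = runPlotScript('func("f", x => sin(x))');
console.log(plots[0].name, sample(plots[0].fn, 0, Math.PI, 2));
```

&lt;p&gt;The same trick would work for GLSL-style helpers like &lt;code&gt;clamp&lt;/code&gt; or &lt;code&gt;mix&lt;/code&gt;: define them as plain functions and pass them in as additional parameters.&lt;/p&gt;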
&lt;p&gt;Features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plot updates each time you change the code&lt;/li&gt;
&lt;li&gt;Multiple plots&lt;/li&gt;
&lt;li&gt;Non-function objects like points, lines, rects and circles.&lt;/li&gt;
&lt;li&gt;Draggable objects (set &lt;code&gt;draggable: true&lt;/code&gt; then shift+click to drag)&lt;/li&gt;
&lt;li&gt;Save and load to local files&lt;/li&gt;
&lt;li&gt;The tool is a single self-contained HTML file, you can download it and use it as an offline tool.&lt;/li&gt;
&lt;li&gt;Shareable links that contain all the data (&lt;code&gt;deflate-raw&lt;/code&gt; compressed)&lt;/li&gt;
&lt;li&gt;All &lt;code&gt;Math.*&lt;/code&gt; functions and many GLSL 2D functions directly available, no &lt;code&gt;Math&lt;/code&gt; namespace required.&lt;/li&gt;
&lt;li&gt;Resizable code panel (use the resize handle in the lower right corner)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Downsides / limits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No 2D vectors or operator overloading. Right now I&#039;m using &lt;code&gt;{x, y}&lt;/code&gt; objects for that and duplicating the calculation for each component. Not great, not terrible, but a bit annoying.&lt;/li&gt;
&lt;li&gt;No plots for implicit functions, where a value is calculated (and plotted) for each pixel of the plot. I didn&#039;t need them, and not having 2D vectors gets really painful there.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For me the most important parts are draggable items and code that can react to those user interactions. And that I can use that tool offline and save the scripts into files. That&#039;s why it&#039;s a single self-contained HTML file, no backend anywhere. 🙂&lt;/p&gt;
&lt;h1&gt;Why it exists&lt;/h1&gt;
&lt;p&gt;In one of my current side-projects I had a strange math problem. Somehow, Palpatine returned… err, ok, no. Somehow a discontinuity appeared somewhere and was causing visual artifacts. A lot less dramatic but a lot more relevant to me. Time to investigate.&lt;/p&gt;
&lt;p&gt;Unfortunately I&#039;m not good enough at math to find the cause via derivation in this case. It&#039;s just too complicated for me. And anyway, the math might be fine and it just doesn&#039;t look like I expect or want it to. Usually I use a function plotter to build each part of the math I need, check the plot to see if it does what I want and build an intuition about each part.&lt;/p&gt;
&lt;p&gt;I was mostly using &lt;a href=&quot;https://graphtoy.com/&quot;&gt;Graphtoy&lt;/a&gt; in this project but was constantly running into limits. For example, it only supports up to 6 functions, which sounds like plenty, but once you do some meaty stuff it gets rather limiting. If you know Graphtoy you might guess I&#039;m doing something with &lt;a href=&quot;https://iquilezles.org/articles/distfunctions2d/&quot;&gt;SDFs&lt;/a&gt; (Signed Distance Fields), more specifically the &lt;a href=&quot;https://iquilezles.org/articles/smin/&quot;&gt;soft union of 2D SDFs&lt;/a&gt;. But with a lot of twists and constraints. Pretty meaty stuff for my level. I was juggling at least 3 distance functions, transforming them into different spaces, doing some smoothing and normalizing there, applying the inverse transformation and other stuff. More like an algorithm than just (relatively simple) functions. So yeah, I was reaching the limits of Graphtoy.&lt;/p&gt;
&lt;p&gt;I also considered using &lt;a href=&quot;https://www.desmos.com/calculator&quot;&gt;Desmos&lt;/a&gt; instead. You can drag points around and do some interactive stuff, but when I used it for another project, complex stuff quickly became rather cumbersome. A lot of time went into &amp;quot;how do I tell Desmos I want this…&amp;quot;. Most of the time, actually. Not what I wanted here.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.shadertoy.com/&quot;&gt;Shadertoy&lt;/a&gt; was also a possibility, but I wanted to quickly try different point and circle positions. In other projects, dragging stuff around on a 2D visualization really helped me find ugly corner cases quickly. And with Shadertoy I would have had to build all that myself. I could use GLSL that way, but plotting lines (function graphs) would be a real pain there.&lt;/p&gt;
&lt;p&gt;For a few hours I looked for a function plotter that did what I needed… and found none. Well, you already know the answer: I built the tool I needed myself. Ok, not quite. I already had a function plotter &lt;a href=&quot;https://arkanis.de/weblog/2018-06-14-js2plot-plot-2d-functions-written-in-javascript/&quot;&gt;lying around&lt;/a&gt; which already did most of what I wanted. So I just added what I needed and was done. Ok, again, who am I kidding. I wrote a lot of code for the new features and the code I reused was extensively refactored. You know the drill.&lt;/p&gt;
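&lt;p&gt;For readers who haven&#039;t met SDFs before, here is a small sketch of the two basic building blocks, written as one might type them into a JavaScript-based plotter: a circle SDF on &lt;code&gt;{x, y}&lt;/code&gt; objects and the quadratic polynomial smooth minimum described in the linked article. It only shows the plain textbook version with hypothetical helper names, none of the twists and constraints from this project.&lt;/p&gt;

```javascript
// Signed distance from point p to a circle with center c and radius r:
// negative inside, zero on the edge, positive outside.
function sdCircle(p, c, r) {
  return Math.hypot(p.x - c.x, p.y - c.y) - r;
}

// Quadratic polynomial smooth minimum ("soft union") of two distances.
// k is the blend width: if |a - b| >= k this is just min(a, b),
// otherwise the shapes blend and the result dips below the plain min.
function smin(a, b, k) {
  const h = Math.max(k - Math.abs(a - b), 0) / k;
  return Math.min(a, b) - h * h * k * 0.25;
}

// Soft union of two circles, evaluated along the line y = 0 so it can be
// plotted as an ordinary function of x.
const blob = x =>
  smin(
    sdCircle({ x, y: 0 }, { x: -1, y: 0 }, 0.6),
    sdCircle({ x, y: 0 }, { x: 1, y: 0 }, 0.6),
    0.5
  );
```

&lt;p&gt;The parameter &lt;code&gt;k&lt;/code&gt; controls how wide the blend region is. Plotting &lt;code&gt;blob&lt;/code&gt; together with the two individual circle distances shows where the soft union deviates from a plain &lt;code&gt;min&lt;/code&gt;.&lt;/p&gt;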
&lt;h1&gt;Random ramblings&lt;/h1&gt;
&lt;p&gt;Honestly, by now working with browser APIs reminds me of working with the Win32 API. You salvage bits and pieces to get something done that&#039;s trivial in most other languages or environments. It feels like finding your way through a maze of APIs that were never quite intended to do what you want. And with luck it even works and isn&#039;t too buggy. It&#039;s either that or drowning in an unmaintainable mess of fashionable dependencies (that wasn&#039;t an option with the Win API… so progress?).&lt;/p&gt;
&lt;p&gt;Anyway, I didn&#039;t want to embed a heavy-weight &amp;quot;mini IDE&amp;quot; into this project. So instead I took an old syntax highlighter I wrote a few years ago and extended it a bit. Similar to the simple server-side highlighting &lt;a href=&quot;https://arkanis.de/weblog/2025-07-24-arkanis-development-version-5/#markdown-and-syntax-highlighting&quot;&gt;I use in my blog&lt;/a&gt;, it uses one large RegExp. Each matching named group is then wrapped in an HTML element and styled with CSS. But JavaScript doesn&#039;t support extended regular expressions, so I wrote a small &lt;a href=&quot;https://github.com/arkanis/useful_functions/blob/main/js_extended_regexp.js&quot;&gt;translation function&lt;/a&gt;. Might be useful for others.&lt;/p&gt;
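&lt;p&gt;To illustrate the idea of such a translation, here is a minimal sketch (not the linked function, whose rules may differ): outside of character classes it drops whitespace and &lt;code&gt;#&lt;/code&gt; comments, copies escape sequences unchanged and hands the result to the normal &lt;code&gt;RegExp&lt;/code&gt; constructor. Treating &lt;code&gt;\ &lt;/code&gt; and &lt;code&gt;\#&lt;/code&gt; as a literal space and hash is an assumption borrowed from the &lt;code&gt;x&lt;/code&gt; flag in Perl and PCRE.&lt;/p&gt;

```javascript
// Minimal sketch: translate an extended-style pattern into a plain RegExp.
// Outside of [...] classes, whitespace is dropped and "#" starts a comment that
// runs to the end of the line. "\ " and "\#" produce a literal space or hash,
// every other escape sequence is copied unchanged.
function extendedRegExp(source, flags = "") {
  let out = "";
  let inClass = false;
  let i = 0;
  while (source.length > i) {
    const c = source[i];
    if (c === "\\") {
      const next = source[i + 1] ?? "";
      out += (next === " " || next === "#") ? next : c + next;
      i += 2;
    } else if (inClass) {
      // Inside a character class everything is literal until the closing "]".
      if (c === "]") inClass = false;
      out += c;
      i += 1;
    } else if (c === "[") {
      inClass = true;
      out += c;
      i += 1;
    } else if (c === "#") {
      // Skip the comment; the newline itself is dropped as whitespace below.
      const end = source.indexOf("\n", i);
      i = end === -1 ? source.length : end;
    } else if (/\s/.test(c)) {
      i += 1;
    } else {
      out += c;
      i += 1;
    }
  }
  return new RegExp(out, flags);
}

const number = extendedRegExp(String.raw`
  (\d+)             # integer part
  (?: \. (\d+) )?   # optional fraction
`);
console.log(number.source);
```

&lt;p&gt;Writing patterns with &lt;code&gt;String.raw&lt;/code&gt; template literals avoids doubling every backslash, which matters for the long, heavily commented patterns a highlighter needs.&lt;/p&gt;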
&lt;p&gt;I&#039;ve also extracted the textarea with syntax highlighting as well as the plotting stuff into web components. They still need a bit of cleaning up, but I hope to reuse them in other projects in the future. We&#039;ll see how those side-side-side projects turn out.&lt;/p&gt;
&lt;p&gt;There&#039;s a lot of other stuff going on in the plotting tool. But I want to end this side-side project for now and go back to my side-project. So if you have any questions, just ask in the comments. 🙂&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>The Popularity Trap of Software Development</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2025-09-25-the-popularity-trap-of-software-development</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2025-09-25-the-popularity-trap-of-software-development/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2025-09-25-the-popularity-trap-of-software-development/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2025-09-25-the-popularity-trap-of-software-development/comments.xml" />
		<updated>2025-09-25T18:16:50+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2025-09-25-the-popularity-trap-of-software-development/">
&lt;p&gt;I&#039;ve been talking to students about this for a few years now.
But I&#039;ve never heard anyone else talk about it, so it might be worth writing about it here.
Or not. You&#039;ll be the judge of that. 😉&lt;/p&gt;
&lt;p&gt;When programmers need a library (framework, programming language, tool, …) they often just grab a popular one.
GitHub stars, NPM downloads, Stack Overflow posts or frameworks &amp;quot;everyone uses&amp;quot;.
After all, if a lot of people use them, they can&#039;t be bad. Right?
I&#039;ve had my phase of that as well.&lt;/p&gt;
&lt;p&gt;But after a while it turned out to be… somewhat misleading.
How do libraries become popular?
Sometimes they have nice descriptions or promise a lot. Let&#039;s call that good marketing.
I hope that makes it clear that you should not take that stuff at face value. After all, a pretty idea in a readme doesn&#039;t tell you much about what the code actually does. Or if it works out at all.
Anyway, that is one dynamic leading to popularity and I hope everyone has at least some resistance to it. Or at least understands that it&#039;s something you need resistance against.&lt;/p&gt;
&lt;p&gt;Another dynamic is simply how popularity works on the internet. At least right now, but probably also way into the future.
Search engines and statistics make stuff &amp;quot;popular&amp;quot; when people talk and write about it a lot.
So, why would you write about a library?
Hm, maybe the API isn&#039;t quite clear, so someone asks a question. Maybe it isn&#039;t clear at all, so a lot of people ask questions.
Maybe there are bugs. Those need to be discovered, triaged, discussed, solved, etc.
Maybe you have a &amp;quot;can do anything&amp;quot; library that gets complicated once you apply it to something that isn&#039;t a trivial toy problem. Pretty much any use case would need to be discussed. Known working recipes discovered, shared, fought over, the whole tribal shebang.&lt;/p&gt;
&lt;p&gt;If you&#039;ve been programming for a few years, you&#039;ve probably seen that play out in many different variations.
Competitions over GitHub stars between projects, fights over &amp;quot;best practices&amp;quot;, stuff like that.&lt;/p&gt;
&lt;p&gt;But you know what doesn&#039;t help much to make a library popular?
If it solves the specific problem you currently have.&lt;/p&gt;
&lt;p&gt;Sounds strange? Let&#039;s play it through:
You need a library to solve a specific problem and through various ways you stumble upon a project.
The readme specifies what problem the library solves, the assumptions that went into it and the limits where it breaks (when it should not be used).
It&#039;s a reasonable fit for your problem and will work in your context. Nothing to ask or talk about here.
The API is clear and simple, not much that can be used wrong. Again, nothing to ask about.
After doing a quick experiment everything checks out. It works to solve your problem and there are no obvious bugs. Again, nothing to talk about.
You use it to solve your problem and move on to the next thing you have to do.&lt;/p&gt;
&lt;p&gt;How much would such a library be talked about?
Maybe a bit here and there. Maybe someone would recommend it for a while after using it. Or maybe not. Maybe it just worked well enough and didn&#039;t leave much of an impact.
A lone project page, a bit of documentation and a few posts about it here and there.
But if you measure popularity by how much people talk about something, it would sure fly under the radar.
At least compared to other libraries with a lot more unknowns that need to be discussed.&lt;/p&gt;
&lt;p&gt;The first time I thought about that it felt very strange to me. Not at all like I felt it should work. Maybe one of those social heuristics evolved a long time ago that doesn&#039;t translate well into modern times. Who knows.&lt;/p&gt;
&lt;p&gt;Anyway, what I want to demonstrate with that is that &amp;quot;popular&amp;quot; doesn&#039;t mean &amp;quot;solves my problem&amp;quot;.&lt;/p&gt;
&lt;p&gt;I can understand why people take that mental shortcut, though.
Searching for a library that&#039;s a good match for your problem takes time and effort.
Just taking a popular one? A lot less effort. And it gives you a handy excuse to justify your decision.
In a world where we often have to work under pressure, a strategy that can shave off days of work is attractive.&lt;/p&gt;
&lt;p&gt;Another advantage: You don&#039;t have to understand much about your problem.
Again, it saves time if you don&#039;t have to think about the details. Or you&#039;re new and don&#039;t know them and are afraid to ask questions (and are still in the phase where you conflate not asking questions with competence).
I&#039;ve met a lot of students who are simply afraid in those situations. And this is one way to deal with that fear. So there&#039;s also a big emotional component here.&lt;/p&gt;
&lt;p&gt;But as anyone who has lived life for a while knows: Decisions have consequences.
Those important details you haven&#039;t thought about? They will catch up to you. Sooner or later.
If you&#039;re lucky, sooner. Then you can at least understand what went wrong, fix it and avoid it the next time.
If you&#039;re unlucky, later. Too late to fix. Or maybe you&#039;ve moved on and don&#039;t realize the mistake and happily repeat it again and again. Or even teach it and have others duplicate your mistake.&lt;/p&gt;
&lt;p&gt;If we really get extreme, we can make an entire business out of running away fast enough. Had any inexperienced &amp;quot;consultants&amp;quot; in house lately who told you how to solve a problem they don&#039;t have a clue about?&lt;/p&gt;
&lt;p&gt;Ok, I&#039;ve had my little rant now. Back to the topic at hand. So searching for a library boils down to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Understand your problem and what you need.&lt;/li&gt;
&lt;li&gt;Look for a library that does what you need. As simple as possible, as complex as necessary.&lt;/li&gt;
&lt;li&gt;Test and check if it actually does that.&lt;/li&gt;
&lt;li&gt;Repeat until you find one that fits.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And popularity doesn&#039;t much factor into that decision. Which makes it hard to find good candidates.
But I hope by now you understand that discarding something because of an ugly project page or little activity isn&#039;t a good strategy.&lt;/p&gt;
&lt;p&gt;For many students this is a very strange way of thinking.
Sometimes I have the impression that our current programming society encourages using the fashion part of our brains to make that decision.
And it takes some training to use the problem solving part instead.
But hey, you have to learn it somewhere.&lt;/p&gt;
&lt;p&gt;I think I&#039;ll leave it here. I&#039;ve had a whole other train of thought about the synergies between all that and simplicity, but that will have to wait for another time.
Or maybe never. At least the advantages of simplicity are well known compared to the traps of popularity. 🙂&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>Arkanis Development Version 5</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2025-07-24-arkanis-development-version-5</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2025-07-24-arkanis-development-version-5/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2025-07-24-arkanis-development-version-5/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2025-07-24-arkanis-development-version-5/comments.xml" />
		<updated>2025-07-24T18:30:39+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2025-07-24-arkanis-development-version-5/">
&lt;p&gt;For a few years now I&#039;ve been toying with the idea of redoing my website.
And I&#039;m moving home at the moment and I needed something to stay sane (at least somewhat).
So a new website it is. 😄&lt;/p&gt;
&lt;p&gt;Anyway, it&#039;s been ~7 years since the last time I touched this website (apart from writing content).
And over the years I collected a small list of ideas I wanted to play around with.
But before starting with new toys I did a small retrospective of what worked well for those 7 years:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The old page chugged along for 7 years without &lt;em&gt;any&lt;/em&gt; manual maintenance. Well, Debian auto-updates, but that&#039;s it.&lt;/p&gt;
&lt;p&gt;This was thanks to only using bare PHP. No framework, no extra package manager and no developer packages.
Just Debian&#039;s own PHP package for Apache2. This approach worked exceptionally well, so I&#039;m keeping that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Storing posts and comments as simple files and using the file browser (via SSH) as an admin interface.
Basically to write new posts and to moderate comments.&lt;/p&gt;
&lt;p&gt;This also worked remarkably well, but had some small teething issues. I just open my server in my local file browser
and edit files like local files. But creating and saving files over SSH like that can change
file and directory permissions and the webserver might not be happy about that. But that only became a problem when I
wanted to publish a post (which I did by dragging it into another directory). So that will be done via a console command
in the future and it can update permissions accordingly.&lt;/p&gt;
&lt;p&gt;But apart from that it&#039;s a hell of a complexity-to-usefulness ratio. 🙂 No database, no dedicated admin interface and
user management (and hence a lot less attack surface), stupidly easy backup &amp;amp; restore, …. So yeah, sticking with that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Newsfeed for notifications&lt;/p&gt;
&lt;p&gt;This website is a low traffic page (obviously). I also don&#039;t post that often (obviously), and comments also don&#039;t
happen that often (obviously). An (Atom) newsfeed was all I needed to stay up-to-date. It&#039;s simple, easy and something
also useful to visitors. Keeping that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Simple HTML &amp;amp; CSS based design&lt;/p&gt;
&lt;p&gt;It didn&#039;t break in 15 years. 7 years ago I rewrote the &amp;quot;backend&amp;quot;, but the design I reused from the previous version.
So yeah, it&#039;s mostly 15 years old (I already said it&#039;s time for a change 😉 ). Browsers take backwards compatibility very
seriously. At least for the core stuff. I can&#039;t emphasize enough what it means in modern web development that a
website looked basically the same for 15 years without any glitches.&lt;/p&gt;
&lt;p&gt;On the other hand browsers have grown into very complex things. And I have the impression that this stability is
bought with the sanity (and increasing insanity) of browser developers. I feel a bit guilty about that but at least
want to say &amp;quot;thank you&amp;quot; for that sacrifice.&lt;/p&gt;
&lt;p&gt;Also, I don&#039;t need to maintain a complex development environment. Just open the developer tools (especially Firefox&#039;s
&amp;quot;Styles&amp;quot; tab) and I&#039;m good to go. I work in a lot of different environments and toolchain maintenance is a factor. But
not here. So I&#039;ll stick to that approach.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Style switcher&lt;/p&gt;
&lt;p&gt;The old website (4th version) had two different designs. An even older one (2nd version) had 3 designs. And you
could switch between them via a style switcher. At first the design switched automatically based on the time of day
(green for morning, blue for the day, red for sunset and the night). But that confused people and annoyed myself, so
I disabled it.&lt;/p&gt;
&lt;p&gt;I also stumbled upon a &lt;a href=&quot;https://arkanis.de/weblog/2018-05-21-chromium-and-webkit-bug-when-switching-stylesheets/&quot;&gt;Chrome bug&lt;/a&gt;
when I implemented the style switcher in JavaScript in 2018 (alternate styles are actually a pretty old browser
feature). It took a few years to get fixed, but then again it was a low-priority bug with a known workaround.
Ok, strike that, I just opened the old test page in Chromium 138 and it&#039;s still buggy (link in the linked post).
So much for that.&lt;/p&gt;
&lt;p&gt;Anyway, no point in having multiple designs anymore. Meaning no point in having a style switcher. Not keeping that.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;List of personal projects&lt;/p&gt;
&lt;p&gt;The old website had a list of my projects. At least of those that were somewhat interesting.
Anyway, I didn&#039;t spend any time keeping that list up-to-date. And honestly, creating it was mostly a nice trip down
memory lane for myself. Not keeping that, it just needs maintenance work I&#039;m not going to invest.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
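&lt;p&gt;The console command for publishing mentioned above could be as small as the following sketch. It is written in JavaScript for Node.js purely for illustration (the site itself runs on bare PHP), and the directory layout as well as the modes 0755 for directories and 0644 for files are assumptions: move the draft into the published directory, then walk it and reset the permissions so the web server can read everything again.&lt;/p&gt;

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical "publish" step: move a draft (a file or a post directory) into
// the published directory and reset permissions afterwards.
// Assumed modes: 0o755 for directories, 0o644 for files.
function publish(draftPath, publishedDir) {
  const target = path.join(publishedDir, path.basename(draftPath));
  fs.renameSync(draftPath, target);
  fixPermissions(target);
  return target;
}

// Recursively apply the assumed modes; statSync follows symlinks.
function fixPermissions(p) {
  if (fs.statSync(p).isDirectory()) {
    fs.chmodSync(p, 0o755);
    for (const entry of fs.readdirSync(p)) fixPermissions(path.join(p, entry));
  } else {
    fs.chmodSync(p, 0o644);
  }
}
```

&lt;p&gt;A tiny CLI wrapper would then call &lt;code&gt;publish()&lt;/code&gt; with the draft and target paths. Changing the owner as well would need &lt;code&gt;fs.chownSync&lt;/code&gt; and sufficient privileges.&lt;/p&gt;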
&lt;p&gt;Ok, writing about that got a lot longer than I expected. Sorry about that.
Now on to the fun part. New toys and ideas to play around with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The design&lt;/li&gt;
&lt;li&gt;Static page generation&lt;/li&gt;
&lt;li&gt;What to do with tags&lt;/li&gt;
&lt;li&gt;Markdown and syntax highlighting&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;A new design. It&#039;s about time.&lt;/h1&gt;
&lt;figure&gt;
    &lt;div style=&quot;display: flex; justify-content: center; gap: 0.5lh;&quot;&gt;
        &lt;a style=&quot;width: 30%;&quot; href=&quot;ciechanow.ski_2025-07-23.png&quot;&gt;&lt;img src=&quot;ciechanow.ski_2025-07-23.png&quot;&gt;&lt;/a&gt;
        &lt;a style=&quot;width: 30%;&quot; href=&quot;chirpy.cotes.page_2025-07-23.png&quot;&gt;&lt;img src=&quot;chirpy.cotes.page_2025-07-23.png&quot;&gt;&lt;/a&gt;
    &lt;/div&gt;
    &lt;figcaption&gt;The two main inspirations for the design.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For the overall structure I wanted something like &lt;a href=&quot;https://ciechanow.ski/&quot;&gt;Bartosz Ciechanowski&lt;/a&gt;. Something simple
that directs the reader&#039;s attention towards the content, not the design itself. A kind of minimalist aesthetic.&lt;/p&gt;
&lt;p&gt;The old design and page structure were also meant for many small blog posts, e.g. showing multiple posts on a single
page. But I&#039;ve gravitated more towards fewer but longer posts (like the one about &lt;a href=&quot;https://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending/&quot;&gt;subpixel text rendering&lt;/a&gt;)
and the structure of Bartosz Ciechanowski just seems like a good fit for that.&lt;/p&gt;
&lt;p&gt;Bartosz Ciechanowski has a pretty bright color scheme (except for the parts where he uses dark backgrounds like
about the moon), but I wanted to do a dark-mode like color scheme this time. While looking around for inspirations I
found &lt;a href=&quot;https://chirpy.cotes.page/posts/text-and-typography/&quot;&gt;Chirpy&lt;/a&gt;, which looks very impressive. For a short time I
even considered playing around with Jekyll. But a look at the dependencies ended that train of thought pretty fast.
Anyway, I liked the colors and shades.&lt;/p&gt;
&lt;p&gt;Did I already mention that Chirpy looks slick? Well, after sketching around for
a bit I noticed that Chirpy has a lot of bells and whistles. They look nice, but also draw a lot of attention away from
the content towards the design (like the animations in the table of contents). I get why it&#039;s like that, but not
something I want for my page.
You should remember the content, not the design.&lt;/p&gt;
&lt;p&gt;Of course there were a lot of other inspirations, like &lt;a href=&quot;https://www.universetoday.com/articles/watch-the-sun-unleash-a-solar-flare&quot;&gt;Universe Today&lt;/a&gt;, other &lt;a href=&quot;https://miketuritzin.com/post/rendering-particles-with-compute-shaders/&quot;&gt;blog&lt;/a&gt; &lt;a href=&quot;https://giannirosato.com/blog/post/lossless-data-comp/&quot;&gt;posts&lt;/a&gt; and
&lt;a href=&quot;https://github.com/mzlogin/rouge-themes/tree/master?tab=readme-ov-file#base16dark&quot;&gt;color schemes&lt;/a&gt; like &lt;code&gt;base16-edge-dark&lt;/code&gt; from &lt;a href=&quot;https://highlightjs.org/demo&quot;&gt;highlight.js&lt;/a&gt;. But the two above were
the main ones.&lt;/p&gt;
&lt;p&gt;Here&#039;s what I came up with in the end. And to make it fun (and somewhat embarrassing) I dragged screenshots of all
previous versions out of my personal archive:&lt;/p&gt;
&lt;figure&gt;
    &lt;div style=&quot;display: flex; justify-content: center; align-items: start; gap: 0.5lh;&quot;&gt;
        &lt;a style=&quot;flex: 1 1 0px;&quot; href=&quot;arkanis_v5.png&quot;&gt;&lt;img src=&quot;arkanis_v5.jpg&quot;&gt;&lt;/a&gt;
        &lt;a style=&quot;flex: 1 1 0px;&quot; href=&quot;arkanis_v4.png&quot;&gt;&lt;img src=&quot;arkanis_v4.jpg&quot;&gt;&lt;/a&gt;
        &lt;a style=&quot;flex: 1 1 0px;&quot; href=&quot;arkanis_v3.png&quot;&gt;&lt;img src=&quot;arkanis_v3.jpg&quot;&gt;&lt;/a&gt;
        &lt;a style=&quot;flex: 1 1 0px;&quot; href=&quot;arkanis_v2.png&quot;&gt;&lt;img src=&quot;arkanis_v2.jpg&quot;&gt;&lt;/a&gt;
        &lt;a style=&quot;flex: 1 1 0px;&quot; href=&quot;arkanis_v1.png&quot;&gt;&lt;img src=&quot;arkanis_v1.jpg&quot;&gt;&lt;/a&gt;
    &lt;/div&gt;
    &lt;figcaption&gt;
        All designs over the years. In my defense, I knew even less about design back then than I do now (which still isn&#039;t much).&lt;br&gt;
        Left to right: v5 from 2025-07, v4 from 2018-05, v3 from 2010-07, v2 from 2006-07 and v1 from 2005-10. v2 is a blend of all 3 color schemes.
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Scoped styles and CSS nesting&lt;/h2&gt;
&lt;p&gt;Now on to more technical aspects of the design. While implementing it and migrating old content I stumbled upon &lt;a href=&quot;https://chriscoyier.net/2023/10/19/style-scoped/&quot;&gt;&lt;code&gt;&amp;lt;style scoped&amp;gt;&lt;/code&gt;&lt;/a&gt;.
It would have allowed you to put all article-specific styles inside the &lt;code&gt;&amp;lt;article&amp;gt;&lt;/code&gt; element, and the styles would only affect that article. Neat, simple and it would have been useful to me.
But alas, it isn&#039;t available, and it seems to have morphed into &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/CSS/@scope&quot;&gt;&lt;code&gt;@scope&lt;/code&gt;&lt;/a&gt;, which can do the same, albeit with a bit more boilerplate.
But that isn&#039;t ready yet either. Maybe the next time I redo my website I can use something like that.&lt;/p&gt;
&lt;p&gt;I&#039;ve also had some funny situations with &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_nesting/Using_CSS_nesting&quot;&gt;CSS nesting&lt;/a&gt; and specificity.
This was the first time I could use CSS nesting while redoing a complete website.
And usually I start with the general rules that make up the design and then add the special cases. And in a reasonably complex design there are a lot of special cases.
Human perception of color and spacing is complex and sometimes you have to apply different spacing to make it look consistent (even if it isn&#039;t on a technical level).
Line heights of fonts are not consistent and you have to nudge that for some font combinations. The list goes on.&lt;/p&gt;
&lt;p&gt;CSS selectors fit this pattern pretty well: General rules have simple selectors, hence a low specificity.
The special cases usually have more complex selectors, hence a higher specificity, and override the general rules.
Actually, I never had to think about specificity. It just worked for me. Specific rules override general ones. How else should it be?&lt;/p&gt;
&lt;p&gt;But with CSS nesting this no longer worked for me.
I used nesting to document the relevant HTML structure while writing the general rules.
This makes the interplay of multiple elements more obvious (e.g. to configure layout models like grid, flexbox, positioning, …).
But all those nested selectors compound their specificity and started to override the rules for special cases.
All in all I think I just have to adjust my mental model and how I write stylesheets. But it simply surprised me.&lt;/p&gt;
&lt;p&gt;Combining the rules-based nature of CSS with the block-based structure of nesting led to unexpected complexity for me.
Maybe I have to look into &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Learn_web_development/Core/Styling_basics/Cascade_layers&quot;&gt;CSS layers&lt;/a&gt;, but that would only make it more complex.
Another concept to juggle around and a more complex mental model.
Well, something to experiment with in the next few projects.&lt;/p&gt;
&lt;h1&gt;Static page generation&lt;/h1&gt;
&lt;p&gt;This was something I had wanted to play around with for a long time. I saw it during my apprenticeship back in 2003 and used it myself back in ~2006 (in Ruby with ERB).
But back then this wasn&#039;t trendy and I was stupid, so I stopped doing it. What can I say, this was my &amp;quot;frameworks are awesome&amp;quot; phase.&lt;/p&gt;
&lt;p&gt;Anyway, the plan this time was to generate static pages for the entire website.
If I write a new post or someone adds a comment, just generate a new static page and reload.
Simple concept and if something breaks you can always use the static pages as a read-only version of the website to fall back on.&lt;/p&gt;
&lt;p&gt;The only difference to a &amp;quot;pure&amp;quot; static page generator is that there&#039;s still some code on the server that regenerates pages on demand.
I could have gone with pure static pages and done the comments via JavaScript and some extra service that takes care of them.
But I wanted to keep it simple (and static) and wanted to keep the comments on my own server. No point in spewing user data around unnecessarily.&lt;/p&gt;
&lt;p&gt;At first I thought about writing it in Ruby and using ERB again. But the packages and dependencies required to parse Markdown and do syntax highlighting discouraged me from that (more on syntax highlighting in a bit).&lt;/p&gt;
&lt;p&gt;Well, PHP has a &lt;a href=&quot;https://www.php.net/control-structures.alternative-syntax&quot;&gt;built-in template syntax&lt;/a&gt; and using that to build a template system is about 5 lines of code.
Something to render static pages is about 25 lines.
Also there&#039;s &lt;a href=&quot;https://parsedown.org/&quot;&gt;Parsedown&lt;/a&gt;, a single-file library with no dependencies. Ok, two files if you count ParsedownExtra.
Still, a lot less complexity to get acquainted with and check.
So PHP it is this time.&lt;/p&gt;
&lt;h2&gt;Strange language and library selection&lt;/h2&gt;
&lt;p&gt;That approach to language and library selection might seem strange to you. After all, libraries are there to make development easy.
But to put it bluntly: If I use a library, I execute code from someone else.
Does this code contain a crypto miner? I don&#039;t know, I have to check.
Does it exfiltrate data my users entrusted to me? I don&#039;t know, I have to check.
Will it join a botnet and turn my server into a zombie? I don&#039;t know, I have to check.&lt;/p&gt;
&lt;p&gt;In a perfect world I could just call a library function in its own little sandbox.
Then it could only access the data I gave to it and give me back the result.
Just like you could on the old &lt;a href=&quot;https://en.wikipedia.org/wiki/CAP_computer&quot;&gt;Cambridge CAP computer&lt;/a&gt; from 1970.
But alas, we don&#039;t live in that world and there seems to be little interest in getting there.
Which annoys the hell out of me btw., because it would be cheap by today&#039;s hardware standards.
So I have to find another way to handle the trust users implicitly put into software while still working with 3rd party code (because I don&#039;t want to reinvent the wheel).&lt;/p&gt;
&lt;p&gt;Some projects are under enough public scrutiny to trust them. Stuff like the Linux kernel, the PHP or Ruby interpreters, compilers, etc.
But most libraries are not.
The only way I found for my own projects is to use simple libraries and check manually.
I know, very high-tech. I hope we&#039;ll arrive in the 1970s before I retire, but honestly I&#039;ve lost hope about that. I did say this situation annoys the hell out of me for a reason. 😉
But that is one of the reasons why I avoid complexity like the plague.
Because if a library needs 50 classes and who knows how many methods to solve a simple problem, checking that code quickly becomes impractical.
But simple libraries, those I can check.&lt;/p&gt;
&lt;p&gt;At least doing all that comes with fringe benefits.
Knowing details about a library makes it easier to work with and extend.
And when you find bugs in the libraries you have an easier time fixing them.&lt;/p&gt;
&lt;p&gt;Funnily enough this was the main reason why I started to read library code years and years ago.
Back then I found library bugs in pretty much every project I did. Usually more than one.
Or the documentation was incomplete and I had to read the code to figure out how to use the API.
This got better the more I focused on simple libraries, but sometimes you have to use complex libraries like ffmpeg or x265.&lt;/p&gt;
&lt;p&gt;Then software supply chain attacks became a thing and that put a much more serious spin on things.
And here we are. 🙁&lt;/p&gt;
&lt;p&gt;Sorry about that ad-hoc rant. I put a heading over it to make it look like I planned to write about it, but I didn&#039;t.
Anyway, moving on.&lt;/p&gt;
&lt;h1&gt;What to do with tags&lt;/h1&gt;
&lt;p&gt;Actually I wanted to remove them.
I don&#039;t know about you, but I never found it very useful to see a list of posts that have a given tag.
That&#039;s what the old page and most blogs do when you click on a tag.&lt;/p&gt;
&lt;figure class=&quot;inline&quot; style=&quot;width: 30%;&quot;&gt;
    &lt;a href=&quot;tags.png&quot;&gt;&lt;img src=&quot;tags.png&quot;&gt;&lt;/a&gt;
    &lt;figcaption&gt;Posts similar to the brute-force substring search post.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I get the impression they&#039;re meant to be used like categories.
There you put each post into just one category and have a nice list of mutually exclusive categories.
Unfortunately the real world is complex and posts usually touch multiple topics, meaning they don&#039;t fit nicely into just one category.
Instead what I usually want is: Are there similar posts to this one? And maybe explore a little.&lt;/p&gt;
&lt;p&gt;But then I had a strange thought: What if I just build what I want?
Yeah, sometimes my mind works in strange ways.
Just show posts ranked by how many tags they have in common with a given post.
Then throw in the old tag cloud to easily explore tag combinations.
That was what first came to mind and it worked surprisingly well (see the screenshot or tags page).&lt;/p&gt;
&lt;p&gt;I guess it depends a lot on how you use tags. So this won&#039;t work for everyone.
But I had all those tagged posts lying around and it seemed like a waste to throw that away.&lt;/p&gt;
&lt;h1&gt;Markdown and syntax highlighting&lt;/h1&gt;
&lt;p&gt;This was an odd side quest.
Before implementing a project in earnest I usually do little isolated experiments to check out all critical parts.
Markdown processing is one of those parts for this project.&lt;/p&gt;
&lt;p&gt;Parsedown is a reasonably simple library that&#039;s easy to extend. But it doesn&#039;t do syntax highlighting, and this time around I wanted to do that during static page generation.
I found a few libraries that combine Parsedown with various syntax highlighting libraries, but while looking them over they all seemed way too complex for what I needed (mostly the syntax highlighters).&lt;/p&gt;
&lt;p&gt;With syntax highlighting there are two ends of the spectrum:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Just color some parts of the source code to make it pretty and to provide recognizable visual patterns.&lt;/li&gt;
&lt;li&gt;Properly parse the code with the programming language&#039;s grammar.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you&#039;re building an IDE and developers use your syntax highlighting for direct feedback you probably want to be closer to number 2.
But this can get quite complicated and requires a lot of code.
For a blog where we just want to make source code look pretty? Pretty much number 1. And with regular expressions we really don&#039;t need much complexity / code to do that.&lt;/p&gt;
&lt;p&gt;To my surprise most syntax highlighting libraries I looked at were leaning towards complex parsing. And hence had large and complex code bases. Not what I need for this project.
Other libraries were mostly concerned with smashing the logic into pieces and squirreling them away into quite a few classes. Not what I want for this project.
Wrapping parts of a string into &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt; elements and coloring them isn&#039;t that complex of a problem.&lt;/p&gt;
&lt;p&gt;So after a day or two of searching, reading code and experimenting, I gave up.
Just to make this clear: All libraries I looked at worked. I just wasn&#039;t happy with how much complexity / code they pulled in to solve my relatively simple problem.
You can probably guess the rest of the story. I wrote a small syntax highlighting function myself.&lt;/p&gt;
&lt;p&gt;A regex to match interesting parts of a given programming language, each part as a named pattern.
Then a PHP function that applies that regex to the given source code and wraps each found named pattern into a &lt;code&gt;&amp;lt;span&amp;gt;&lt;/code&gt;.
With the name of the pattern becoming the class name, more or less.&lt;/p&gt;
&lt;p&gt;Code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[section name]
name=value&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Regex:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;(
    ^ (?&amp;lt;name&amp;gt; \w+ ) = (?&amp;lt;value&amp;gt; .+ )
|   ^ (?&amp;lt;section&amp;gt; \[ [^]]* \] )
)xm&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Result:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_html&quot;&gt;&lt;span class=variation_c&gt;&amp;lt;span&lt;/span&gt;&lt;span class=lang_html_attr&gt; &lt;span class=variation_a&gt;class&lt;/span&gt;=&lt;span class=variation_b&gt;section&lt;/span&gt;&lt;/span&gt;&lt;span class=variation_c&gt;&amp;gt;&lt;/span&gt;[section name]&lt;span class=variation_c&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;
&lt;span class=variation_c&gt;&amp;lt;span&lt;/span&gt;&lt;span class=lang_html_attr&gt; &lt;span class=variation_a&gt;class&lt;/span&gt;=&lt;span class=variation_b&gt;name&lt;/span&gt;&lt;/span&gt;&lt;span class=variation_c&gt;&amp;gt;&lt;/span&gt;name&lt;span class=variation_c&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;=&lt;span class=variation_c&gt;&amp;lt;span&lt;/span&gt;&lt;span class=lang_html_attr&gt; &lt;span class=variation_a&gt;class&lt;/span&gt;=&lt;span class=variation_b&gt;value&lt;/span&gt;&lt;/span&gt;&lt;span class=variation_c&gt;&amp;gt;&lt;/span&gt;value&lt;span class=variation_c&gt;&amp;lt;/span&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I tried a few different variations and (of course) profiled a lot, but in the end it boiled down to ~40 lines of PHP code.
And a 20 - 30 line regex for each supported programming language (sometimes split into two).
I&#039;ll spare you the details; if you&#039;re interested, take a look at the &lt;a href=&quot;https://github.com/arkanis/php_simple_syntax_highlighting&quot;&gt;GitHub repo&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Anyway, I spent about a day on the PHP side of things. Then one or two days (don&#039;t remember) writing the language definitions.
That was actually something I quite enjoyed.
I learned some nifty little details of some languages and I&#039;m still amazed at how many useful little things Ruby has (just look at those &lt;a href=&quot;https://ruby-doc.org/3.4.1/syntax/literals_rdoc.html#label-Percent+Literals&quot;&gt;percent literals&lt;/a&gt;!).
The whole thing is only meant for code on my website, but that&#039;s still GLSL, Ruby, Bash, C, HTML, CSS, JS, PHP, SQL and Java.&lt;/p&gt;
&lt;p&gt;But it only needs to be pretty, not correct, and this makes writing something like that a lot faster (and enjoyable, especially in &lt;a href=&quot;https://regex101.com/&quot;&gt;regex101&lt;/a&gt;).
Doing it by myself also came in handy in some not-so-popular cases I used in some articles, for example Java&#039;s &lt;a href=&quot;https://docs.oracle.com/javase/specs/jls/se24/html/jls-3.html#jls-3.10.6&quot;&gt;text blocks&lt;/a&gt;.
A lot of the highlighters I tried didn&#039;t process them, but I could just implement them and be done with it.
I also played around with some rather unusual highlighting ideas, e.g. highlighting GLSL vector swizzle patterns like &lt;code class=&quot;lang_glsl&quot;&gt;blend_weights.&lt;span class=&quot;variation_b&quot;&gt;xxyy&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That was a fun little (unexpected) detour.&lt;/p&gt;
&lt;h1&gt;Other stuff&lt;/h1&gt;
&lt;p&gt;I added reactions to posts and comments. Mostly because I wanted to give readers a quicker form of feedback.
It was surprisingly difficult to find a good set of emojis with filled and outline variants.
In the end I went with a small set from FontAwesome. Not perfect, but gets the job done.&lt;/p&gt;
&lt;p&gt;Comments also got extended into a tree, mostly because I found that useful.
Reply chains without branches are flattened when displayed. This avoids those annoying reply cascades for simple conversations.&lt;/p&gt;
&lt;p&gt;Both are a bit of overkill for this blog, but I wanted to play around with it. So I did. 🙂&lt;/p&gt;
&lt;p&gt;Anyway, I better stop here or I&#039;ll never get this post done. If anyone is interested in more details, feel free to ask.&lt;/p&gt;
&lt;p&gt;In the end I was quite happy that Parsedown itself was by far the most complex part of the entire website.
If you want some very rough numbers:
Common PHP code for the website is ~320 &lt;abbr title=&quot;Lines of Code&quot;&gt;LoC&lt;/abbr&gt;,
pages and templates that render to HTML or newsfeeds are ~570 LoC and then there are ~100 LoC that rerender pages on-demand e.g. when someone posts a comment.
Syntax highlighting is ~300 LoC (including all language definitions).
Parsedown and ParsedownExtra come to about ~2100 LoC.
Measured highly professionally by scrolling through the code and eyeballing how many comments and real code there is. So don&#039;t take them too seriously. 😉&lt;/p&gt;
&lt;p&gt;We&#039;ll see which of those ideas will survive the next rewrite. If you&#039;re still with me, thanks for dropping by and reading all the way to the end. 🙂&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>Can you host Reddit on a Raspberry Pi?</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2024-09-17-can-you-host-redit-on-a-raspberry-pi</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2024-09-17-can-you-host-redit-on-a-raspberry-pi/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2024-09-17-can-you-host-redit-on-a-raspberry-pi/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2024-09-17-can-you-host-redit-on-a-raspberry-pi/comments.xml" />
		<updated>2024-09-17T15:01:49+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2024-09-17-can-you-host-redit-on-a-raspberry-pi/">
&lt;p&gt;A few years ago I read somewhere that you can probably run a Reddit-sized forum on a Raspberry Pi.
It was a Reddit comment and I had my doubts about that.
But a few weeks ago I watched &lt;a href=&quot;https://www.youtube.com/watch?v=RgBYohJ7mIk&quot;&gt;What if you tried to print Wikipedia?&lt;/a&gt; and it mentions that the English Wikipedia receives about 100 edits per minute (1.7 edits/sec).
For some reason that really surprised me (I expected way more) and brought me back to that Reddit on a Pi question from years back.
So I gave the &amp;quot;spherical cow in a vacuum&amp;quot; version of that scenario a go.&lt;/p&gt;
&lt;p&gt;To my surprise it actually somewhat checked out.
At least when you ignore little details like images, videos and everything else except reading and posting posts and comments.
Also I assumed you would use some read cache like &lt;a href=&quot;https://varnish-cache.org/&quot;&gt;Varnish&lt;/a&gt; and our Raspberry Pi only gets the requests that miss that cache.&lt;/p&gt;
&lt;p&gt;Under those (and a lot more) assumptions I got about 6000 - 7000 req/sec on a Raspberry Pi 4b with about 150% - 200% CPU load.
Yes, those are thousands of requests per second.
1 in 100 requests inserted a new post (basically executing an &lt;code&gt;INSERT&lt;/code&gt; statement).
99 in 100 queried all the posts of a random topic (a &lt;code&gt;SELECT&lt;/code&gt; statement).
The whole thing was kind of funny and maybe even interesting enough to write about. So here we go.&lt;/p&gt;
&lt;figure&gt;
    &lt;figcaption&gt;
        The Raspberry Pi I used for the experiment. I just wanted an image here.
    &lt;/figcaption&gt;
    &lt;img src=&quot;pi4b.jpg&quot;&gt;
&lt;/figure&gt;
&lt;h1&gt;Setting expectations&lt;/h1&gt;
&lt;p&gt;This is just me having some fun here.
Don&#039;t expect nice graphs or accurate measurements.
I just wanted to know what order of magnitude of requests per second a Raspberry Pi 4b could handle.
So my highly professional measurement procedure was &amp;quot;look at &lt;code&gt;htop&lt;/code&gt; for 10 seconds and write down the average&amp;quot;.
Well, that and measuring requests per second with a benchmarking tool because that&#039;s hard to eyeball.
That&#039;s the level of rigor for this post. You have been warned. 😉&lt;/p&gt;
&lt;p&gt;As any good grown-up software developer I started by picking the toys I wanted to play with: C and SQLite.
And, as is customary today, I also came up with a good story to justify my decision:&lt;/p&gt;
&lt;p&gt;Software spends a lot of time (and code) on shuffling data around between various components.
HTTP implementation, webserver, backend logic, database, etc. all have their interfaces and someone needs to glue that stuff together.
Sometimes this is just calling a few functions or repackaging data into other data structures.
But in the case of most databases this actually involves serializing queries and data and shipping that whole stuff to and from another process (the database).
If the database runs on the same machine that is.
Usually there&#039;s also a lot of management involved to keep track of everything in flight and not going crazy while doing that.
How about we just don&#039;t do any of that this time?&lt;/p&gt;
&lt;p&gt;We want to process HTTP GET and POST requests.
HTTP is just a (somewhat) simple TCP protocol (at least the basics, &amp;quot;spherical cow in a vacuum&amp;quot;, remember?).
And TCP is handled via the operating system&#039;s socket API. This is a C API, so that means we&#039;ll code in C.&lt;/p&gt;
&lt;p&gt;For the database we want something that doesn&#039;t do serialization.
In fact it should just run directly in our own process and just do any I/O right there.
SQLite does that, so SQLite it is. Mind you, I never used the SQLite C API before.&lt;/p&gt;
&lt;p&gt;Neat story, no?&lt;/p&gt;
&lt;h1&gt;Step 1: Looking for a benchmark tool&lt;/h1&gt;
&lt;p&gt;I think I&#039;m too lazy to write much about this part. Long story short: I ended up using &lt;a href=&quot;https://github.com/wg/wrk&quot;&gt;wrk&lt;/a&gt;.
It was the fastest from a bunch of tools I looked at, easy to compile and use.
It also has LuaJIT support to customize the requests you send. Which I only realized later but came in quite handy.
And the source code looks adequately state-machiny, which to me makes it look like someone actually knew what they were doing.&lt;/p&gt;
&lt;p&gt;To get a baseline I wrote 3 very simple &amp;quot;Hello World!&amp;quot; webservers and took the benchmark tool that could squeeze the most requests per second from those servers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One webserver in plain node.js. Pretty much the &amp;quot;Hello World!&amp;quot; example on the node.js website.&lt;/li&gt;
&lt;li&gt;One is a simple C program. It just accepts one connection, does one &lt;code&gt;read()&lt;/code&gt;, writes one string and then closes the connection. That string happens to be an HTTP response. All with simple synchronous blocking syscalls.&lt;/li&gt;
&lt;li&gt;And another C program which does the same but using an epoll based eventloop.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not something I would call a web server under normal circumstances, but &amp;quot;spherical cow in a vacuum&amp;quot; again. 😉&lt;/p&gt;
&lt;p&gt;I did that on my PC, an 8-core AMD Ryzen 7 5700X running Linux Mint 21.1 (kernel 6.5.0-21-generic x86_64).
No Raspberry Pi stuff yet. But still some numbers to give you an idea what a single-threaded &amp;quot;Hello World!&amp;quot; webserver can handle:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Server&lt;/th&gt;
&lt;th&gt;req/sec&lt;/th&gt;
&lt;th&gt;Traffic&lt;/th&gt;
&lt;th&gt;Server CPU util.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;node.js&lt;/td&gt;
&lt;td&gt;~45k&lt;/td&gt;
&lt;td&gt;~7.6 MB/s&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;consecutive accept() loop&lt;/td&gt;
&lt;td&gt;~40k&lt;/td&gt;
&lt;td&gt;~5.8 MB/s&lt;/td&gt;
&lt;td&gt;~25% - ~80%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;eventloop with epoll&lt;/td&gt;
&lt;td&gt;~60k&lt;/td&gt;
&lt;td&gt;~5.5 MB/s&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note that 100% CPU utilization means that just 1 CPU core runs at full utilization.
Even if those are just dummies to select a benchmark tool I was pretty impressed by the node.js performance.
Especially since it&#039;s the only one that actually parses the HTTP stuff.
The other two just throw a string back at you and basically only masquerade as HTTP servers.&lt;/p&gt;
&lt;p&gt;On the other hand the ~40k req/sec of a very very simple endless loop of blocking &lt;code&gt;accept()&lt;/code&gt;, &lt;code&gt;read()&lt;/code&gt;, &lt;code&gt;write()&lt;/code&gt;, &lt;code&gt;close()&lt;/code&gt; calls also surprised me.
Yeah, it&#039;s wonky, but you can write it in just a few minutes (that includes the time to look stuff up in the man pages).
That complexity-to-performance ratio is hard to beat.&lt;/p&gt;
&lt;p&gt;Misc. details:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This is the command used for benchmarking: &lt;code&gt;./wrk -t 12 -c 400 -d 10 http://localhost:8000/&lt;/code&gt;. 12 threads, 400 concurrent connections and run for 10 seconds.&lt;/li&gt;
&lt;li&gt;I had to start the epoll server and &lt;code&gt;wrk&lt;/code&gt; with &lt;code&gt;ulimit -n 8192&lt;/code&gt;. Otherwise there were too many connections in flight and the default file descriptor limit for a process was reached.&lt;/li&gt;
&lt;li&gt;Telling the compiler to optimize the epoll server (compiling with &lt;code&gt;-O2&lt;/code&gt;) didn&#039;t make a significant difference.&lt;/li&gt;
&lt;li&gt;Instead the server is totally dominated by syscalls. 96% of the CPU time it turns out. Highly professionally measured by running the epoll server with the &lt;code&gt;time&lt;/code&gt; command in bash (&lt;code&gt;user 0m0,366s&lt;/code&gt; and &lt;code&gt;sys 0m9,681s&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Step 2: Fooling around with SQLite&lt;/h1&gt;
&lt;p&gt;With the HTTP and benchmarking side covered, it&#039;s time to switch toys to SQLite.
I simply started with &lt;a href=&quot;https://sqlite.org/quickstart.html&quot;&gt;SQLite In 5 Minutes Or Less&lt;/a&gt; in the documentation&#039;s &amp;quot;Programming Interfaces&amp;quot; part and went from there.
And I have to admit, the SQLite documentation is quite excellent. Concise and well written.&lt;/p&gt;
&lt;p&gt;First stop: Create a database and write a program that fills it with some test data (100k posts will do).
That&#039;s the extent of our &amp;quot;spherical cow in a vacuum&amp;quot; version of Reddit:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_sql&quot;&gt;&lt;span class=landmark_a&gt;CREATE&lt;/span&gt; &lt;span class=landmark_a&gt;TABLE&lt;/span&gt; posts (
    id       &lt;span class=landmark_b&gt;INTEGER&lt;/span&gt; &lt;span class=landmark_a&gt;PRIMARY&lt;/span&gt; &lt;span class=landmark_a&gt;KEY&lt;/span&gt;,
    topic_id &lt;span class=landmark_b&gt;INTEGER&lt;/span&gt;,
    body     &lt;span class=landmark_b&gt;TEXT&lt;/span&gt;
);
&lt;span class=landmark_a&gt;CREATE&lt;/span&gt; &lt;span class=landmark_a&gt;INDEX&lt;/span&gt; posts_topic_index &lt;span class=landmark_a&gt;ON&lt;/span&gt; posts (topic_id);&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every topic is filled up with 5 - 1000 posts.
For the post text I took the first 2772 characters from &amp;quot;Lorem ipsum&amp;quot; and each post takes the first 57 - 2772 bytes from that.
Not very original and the distribution is probably totally off, but I just wanted some data to play around with.
Incidentally that came to about 175 MB of data (but I might remember that wrong).&lt;/p&gt;
&lt;p&gt;Generating those posts took around 2 seconds, so ~50k inserts/sec.
But all in one big transaction. And that transaction part worried me, but more on that later.&lt;/p&gt;
&lt;p&gt;Serious thoughts aside, the most complicated part of that program was choosing the &lt;a href=&quot;https://www.pcg-random.org/download.html&quot;&gt;PCG random number generator&lt;/a&gt; I wanted.
Of course I went with the &amp;quot;Minimal C Implementation&amp;quot; and copied the whole ~7 lines of it.
Well, that, and using the neat &lt;code&gt;ldexp()&lt;/code&gt; trick from &lt;a href=&quot;https://www.pcg-random.org/using-pcg-c-basic.html#generating-doubles&quot;&gt;Generating doubles&lt;/a&gt; to get random numbers in the range I wanted.&lt;/p&gt;
&lt;p&gt;Now we have some test data and a rough understanding of how many &lt;code&gt;INSERT&lt;/code&gt;s we can throw at SQLite with a single thread.
Time to figure out how many &lt;code&gt;SELECT&lt;/code&gt; queries we can throw at it.&lt;/p&gt;
&lt;p&gt;The next program just executed 213 &lt;code&gt;SELECT&lt;/code&gt; queries, each selecting all posts of one topic.
All executed one after the other and the output is printed via &lt;code&gt;printf()&lt;/code&gt;.
In the end we want to return JSON to some hypothetical browser, and that gave me the bright idea to make SQLite do all that JSON work for me (escaping, string concatenation, etc.).
Also that way the CPU can stay within the SQLite bytecode interpreter all the time, but I&#039;m not sure that actually makes any difference.
Hence I tested two queries:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_sql&quot;&gt;&lt;span class=landmark_a&gt;SELECT&lt;/span&gt; id, body &lt;span class=landmark_a&gt;FROM&lt;/span&gt; posts &lt;span class=landmark_a&gt;WHERE&lt;/span&gt; topic_id = ?;

&lt;span class=landmark_a&gt;SELECT&lt;/span&gt; &lt;span class=variation_c&gt;json_group_array&lt;/span&gt;(&lt;span class=variation_c&gt;json_object&lt;/span&gt;(&lt;span class=variation_b&gt;&amp;#039;id&amp;#039;&lt;/span&gt;, id, &lt;span class=variation_b&gt;&amp;#039;body&amp;#039;&lt;/span&gt;, body))
&lt;span class=landmark_a&gt;FROM&lt;/span&gt; posts &lt;span class=landmark_a&gt;WHERE&lt;/span&gt; topic_id = ?;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The first just fetches the data as-is, the second accumulates everything into a JSON array and gives us that as a string. Here are the &lt;code&gt;time&lt;/code&gt; measurements (with output redirected to &lt;code&gt;/dev/null&lt;/code&gt; of course):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;walltime&lt;/th&gt;
&lt;th&gt;queries/sec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple &lt;code&gt;SELECT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.099 s&lt;/td&gt;
&lt;td&gt;213 queries / 0.099 sec = ~2150 queries/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON aggregated &lt;code&gt;SELECT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;0.168 s&lt;/td&gt;
&lt;td&gt;213 queries / 0.168 sec = ~1250 queries/sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Using JSON takes about 1.7x longer here. Maybe that&#039;s worth it, or maybe we could create our own format and let clients decode that. I mean, ~2150 vs. ~1250 queries per second is a hefty difference. JSON might be worth &lt;a href=&quot;https://sqlite.org/json1.html#performance_considerations&quot;&gt;optimizing a bit&lt;/a&gt;, but of course we would use our own format. So from now on I&#039;ll ignore JSON.&lt;/p&gt;
&lt;p&gt;The queries are actually relatively slow compared to the ~40k req/sec we were playing around with before.
But keep in mind that the average topic has ~500 posts, which is way too much.
This was my first inkling that my test data distribution was probably garbage.
But I didn&#039;t want to generate new test data and just rolled with it.&lt;/p&gt;
&lt;h1&gt;Step 3: Bolting HTTP and SQLite together&lt;/h1&gt;
&lt;p&gt;Now, let&#039;s just take our epoll HTTP server and SQLite and bolt them together.
Well, that was the plan. Reality said &amp;quot;no&amp;quot;.
Turns out SQLite doesn&#039;t have an async API. At least I couldn&#039;t find one.
In fact I think it&#039;s pretty much built for synchronous operation.
I looked at a few bindings for various async systems (e.g. node.js) but they all seem to use a threadpool to make it look async.
Not this time, not for this experiment.
I almost scrapped the entire experiment at that point, basically because two of my toys didn&#039;t fit together.&lt;/p&gt;
&lt;h2&gt;Prefork to the rescue&lt;/h2&gt;
&lt;p&gt;Instead I dumped epoll and did the 2nd most &amp;quot;Hello World&amp;quot; style server design: Prefork (but with threads).
The basic idea is to start a server socket and then spawn a predefined number of worker threads (e.g. 8).
Each thread then calls &lt;code&gt;accept()&lt;/code&gt; to wait for an incoming connection.
When one arrives, it is handled with good old synchronous, blocking I/O calls, and when done the connection is closed.
Then the thread calls &lt;code&gt;accept()&lt;/code&gt; again and the cycle repeats.
It&#039;s basically the simplest single-threaded webserver you can imagine, but run several times in parallel.&lt;/p&gt;
&lt;p&gt;That server design relies on the operating system to distribute incoming connections to the worker threads waiting in &lt;code&gt;accept()&lt;/code&gt;.
Also the CPU load can be somewhat erratic and finding the right number of worker threads can be a bit finicky.
At least that&#039;s what I&#039;ve read somewhere.
But: It&#039;s simple and fits the bill for this experiment. So into the pod it goes.&lt;/p&gt;
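&lt;p&gt;For illustration, here is a minimal sketch of such a prefork-with-threads accept loop in C, assuming POSIX sockets and pthreads. It is not the code from this experiment: the port (8000), the worker count (8) and the hypothetical &lt;code&gt;handle_connection()&lt;/code&gt;, which just sends a canned response, are made up for the example, and error handling is kept to a minimum.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_c&quot;&gt;#include &amp;lt;arpa/inet.h&amp;gt;
#include &amp;lt;netinet/in.h&amp;gt;
#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
#include &amp;lt;sys/socket.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;

#define WORKER_COUNT 8

// Hypothetical handler: reads (and ignores) the request and sends a canned response.
// A real server would parse the request and query SQLite here.
static void handle_connection(int client) {
	char buffer[4096];
	if (read(client, buffer, sizeof(buffer)) &amp;lt;= 0)
		return;
	const char response[] = &quot;HTTP/1.1 200 OK\r\nContent-Length: 3\r\nConnection: close\r\n\r\nok\n&quot;;
	write(client, response, sizeof(response) - 1);
}

// Each worker blocks in accept(), handles one connection with plain blocking I/O,
// closes it and goes back to accept(). The kernel hands every incoming
// connection to one of the threads waiting in accept().
static void* worker(void* arg) {
	int server = *(int*)arg;
	for (;;) {
		int client = accept(server, NULL, NULL);
		if (client == -1) {
			perror(&quot;accept&quot;);
			continue;
		}
		handle_connection(client);
		close(client);
	}
	return NULL;
}

int main(void) {
	int server = socket(AF_INET, SOCK_STREAM, 0);
	if (server == -1) {
		perror(&quot;socket&quot;);
		return 1;
	}
	int reuse = 1;
	setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &amp;amp;reuse, sizeof(reuse));

	struct sockaddr_in addr;
	memset(&amp;amp;addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(8000);
	// 128 is an arbitrary backlog for pending connections.
	if (bind(server, (struct sockaddr*)&amp;amp;addr, sizeof(addr)) == -1 || listen(server, 128) == -1) {
		perror(&quot;bind/listen&quot;);
		return 1;
	}

	// &quot;Prefork&quot;: spawn all worker threads up front, sharing the one listening socket.
	pthread_t workers[WORKER_COUNT];
	for (int i = 0; i &amp;lt; WORKER_COUNT; i++)
		pthread_create(&amp;amp;workers[i], NULL, worker, &amp;amp;server);
	for (int i = 0; i &amp;lt; WORKER_COUNT; i++)
		pthread_join(workers[i], NULL);
	return 0;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compile with something like &lt;code&gt;gcc -pthread prefork.c&lt;/code&gt;. Note that there is no scheduling logic in the program itself: which thread gets the next connection is entirely up to the kernel.&lt;/p&gt;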
&lt;p&gt;Funny aside: I read about that server design maybe 20 years ago in the Apache 2 documentation.
But I never used it… until now.
Apache 2 uses worker processes instead of threads and child processes are created by &lt;code&gt;fork()&lt;/code&gt;, hence the name &amp;quot;prefork&amp;quot;.
That gives you better isolation between the workers and you can easily recover when one worker crashes.
But we don&#039;t need that here.
None of that code will ever see production anyway (famous last words…).&lt;/p&gt;
&lt;h2&gt;Write-Ahead Logging&lt;/h2&gt;
&lt;p&gt;While I was searching the SQLite documentation for async stuff I also stumbled across something very interesting: &lt;a href=&quot;https://sqlite.org/wal.html&quot;&gt;Write-Ahead Logging&lt;/a&gt; (aka WAL mode).&lt;/p&gt;
&lt;p&gt;I had an admittedly very outdated mental image of transactions in SQLite:
Each transaction creates a new file and when that transaction is done that file is cleaned up.
Many transactions = many file operations = lots of overhead = slow.&lt;/p&gt;
&lt;p&gt;That is why I was worried about transactions when generating the posts.
For this experiment I wanted every 100th request to insert a new post.
Meaning 99 of 100 requests query all the posts of a random topic, 1 of 100 inserts a new post into a random topic.
With that there would be a lot of transactions with &lt;code&gt;INSERT&lt;/code&gt; statements going on.
Maybe enough to become a bottleneck.&lt;/p&gt;
&lt;p&gt;With WAL mode this isn&#039;t much of a problem anymore. To greatly oversimplify:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The changes of all transactions go into one extra log file.&lt;/li&gt;
&lt;li&gt;When that reaches a given size (e.g. 4 MB) the changes are copied back into the main database file and the log gets reset. That occasional copying back of changes is called a &amp;quot;checkpoint&amp;quot; in the documentation.&lt;/li&gt;
&lt;li&gt;What&#039;s really great about this: Readers and writers don&#039;t block each other.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Every thread can have its own database connection doing its own reading and writing, without stepping on each other&#039;s toes.
Only during a checkpoint do writers get blocked for a bit (or not, see &lt;a href=&quot;https://sqlite.org/c3ref/wal_checkpoint_v2.html&quot;&gt;SQLITE_CHECKPOINT_PASSIVE&lt;/a&gt;).
And if I remember correctly it&#039;s only one &lt;code&gt;fsync()&lt;/code&gt; syscall per checkpoint (instead of one per transaction).&lt;/p&gt;
&lt;p&gt;Seriously, this is awesome! Kudos to whoever thought of this and implemented it.&lt;/p&gt;
&lt;p&gt;To top it off you can even take control of the checkpointing and decide for yourself when and how to do it.
So I did (hey, we&#039;re playing around here and I just found a new toy).
Usually each of the database connections (worker threads in our case) would occasionally push the WAL log above the threshold (of e.g. 4 MB) and do a checkpoint.
Instead I just spawned another thread that does a checkpoint every 1 second.
So this dedicated &amp;quot;checkpointing&amp;quot; thread can do all the slow real I/O work while the readers and writers live it up.
Does it make any sense to do it that way? I don&#039;t know, and didn&#039;t try anything else. &amp;quot;Spherical cow in a vacuum&amp;quot;, remember?&lt;/p&gt;
&lt;p&gt;Getting checkpointing to work that way required some attentive reading of the documentation, though.
In the end I used the &lt;code&gt;SQLITE_DEFAULT_WAL_SYNCHRONOUS=1&lt;/code&gt; &lt;a href=&quot;https://sqlite.org/compile.html&quot;&gt;compile time option&lt;/a&gt; and the dedicated checkpointer thread periodically calls &lt;code&gt;sqlite3_wal_checkpoint_v2()&lt;/code&gt; with mode &lt;code&gt;SQLITE_CHECKPOINT_FULL&lt;/code&gt;.&lt;/p&gt;
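&lt;p&gt;A minimal sketch of such a setup with the SQLite C API might look like the following. The database file name is made up, &lt;code&gt;PRAGMA synchronous=NORMAL&lt;/code&gt; is the per-connection equivalent of the default that &lt;code&gt;SQLITE_DEFAULT_WAL_SYNCHRONOUS=1&lt;/code&gt; sets for WAL databases, and switching off automatic checkpoints with &lt;code&gt;sqlite3_wal_autocheckpoint(db, 0)&lt;/code&gt; is an assumption of how to leave all checkpointing to the dedicated thread, not necessarily what the experiment did.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_c&quot;&gt;#include &amp;lt;pthread.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;
#include &amp;lt;sqlite3.h&amp;gt;

// Hypothetical database file name for this sketch.
#define DB_PATH &quot;posts.db&quot;

// Opens a connection in WAL mode. Every thread gets its own connection.
static sqlite3* open_connection(void) {
	sqlite3* db = NULL;
	if (sqlite3_open(DB_PATH, &amp;amp;db) != SQLITE_OK) {
		fprintf(stderr, &quot;open failed: %s\n&quot;, sqlite3_errmsg(db));
		sqlite3_close(db);
		return NULL;
	}
	// synchronous=NORMAL is the default SQLITE_DEFAULT_WAL_SYNCHRONOUS=1 sets for WAL databases.
	sqlite3_exec(db, &quot;PRAGMA journal_mode=WAL; PRAGMA synchronous=NORMAL;&quot;, NULL, NULL, NULL);
	// Assumption: disable automatic checkpoints so only the dedicated thread does them.
	sqlite3_wal_autocheckpoint(db, 0);
	return db;
}

// Dedicated checkpointer thread: one FULL checkpoint every second.
static void* checkpointer(void* arg) {
	sqlite3* db = arg;
	for (;;) {
		sleep(1);
		int log_frames = 0, checkpointed_frames = 0;
		int rc = sqlite3_wal_checkpoint_v2(db, NULL, SQLITE_CHECKPOINT_FULL, &amp;amp;log_frames, &amp;amp;checkpointed_frames);
		// SQLITE_BUSY means the checkpoint could not run to completion, retry next second.
		if (rc != SQLITE_OK &amp;amp;&amp;amp; rc != SQLITE_BUSY)
			fprintf(stderr, &quot;checkpoint failed: %s\n&quot;, sqlite3_errmsg(db));
	}
	return NULL;
}

int main(void) {
	sqlite3* checkpoint_db = open_connection();
	if (checkpoint_db == NULL)
		return 1;
	pthread_t thread;
	pthread_create(&amp;amp;thread, NULL, checkpointer, checkpoint_db);
	// The worker threads would each call open_connection() here and run their queries.
	pthread_join(thread, NULL);
	return 0;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Build it with &lt;code&gt;-pthread&lt;/code&gt; and either &lt;code&gt;-lsqlite3&lt;/code&gt; or the amalgamation&#039;s &lt;code&gt;sqlite3.c&lt;/code&gt; compiled alongside.&lt;/p&gt;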
&lt;p&gt;A little before that I also discovered that I had stored the whole thing on an old HDD in my PC. I heard a lot of faint HDD seeking sounds while running the tests (or maybe &lt;code&gt;fsync()&lt;/code&gt; noises, I don&#039;t know).
And with the above configuration I just got one faint HDD seeking sound per second.
Auditory I/O feedback and debugging, now that is something an SSD can&#039;t do. 😄&lt;/p&gt;
&lt;p&gt;Just for completeness&#039; sake I ran the whole thing on an SSD.
It didn&#039;t make a significant difference.
But then the whole database was ~200 MB in size, meaning it was completely in the kernel&#039;s page cache anyway.
All normal I/O thus just became in-memory operations and only the occasional &lt;code&gt;fsync()&lt;/code&gt; did any real I/O.
When the whole database doesn&#039;t fit into memory you&#039;ll likely get a completely different picture.&lt;/p&gt;
&lt;h2&gt;Compiling and source code&lt;/h2&gt;
&lt;p&gt;Oh, just for reference:
I was using the SQLite v3.46.0 amalgamation with the recommended compile-time options, gcc v11.4.0 and without optimization.
The amalgamation packs the entire SQLite source code into a single C file (9 MB or so).&lt;/p&gt;
&lt;p&gt;It takes 2 or 3 seconds to compile a program with that but I was too lazy to add a line to my Makefile.
Those were some of the longest compile times I had experienced in years.
Usually when I do my little experiments in C everything compiles in a fraction of a second.
One of the reasons I like C.
On second thought, maybe I should have spent those few seconds to add a line to my Makefile…&lt;/p&gt;
&lt;p&gt;Another fun anomaly: For some reason compiling with &lt;code&gt;-O2&lt;/code&gt; or even &lt;code&gt;-O1&lt;/code&gt; made the programs quite a bit slower (from ~4.6k req/sec down to ~830 req/sec on my PC).
Maybe some funny combination of SQLite and gcc versions.
Or gcc doesn&#039;t like 9 MB of C code in one large piece.
Since I was I/O bound anyway I didn&#039;t investigate it and just didn&#039;t compile with optimization.&lt;/p&gt;
&lt;p&gt;In the end the whole server program came down to ~450 lines of C code (including comments, blank lines, etc.).
If you&#039;re interested in the code, just ask.
I just don&#039;t want some random person taking some crappy throw-away code from somewhere on the internet and putting it into some mission critical system.
That would be like taking the mad ramblings of some person, putting them in a book (or social network) and selling them as the truth.
And our society doesn&#039;t have the best track record in that regard.
There is good documentation on all the parts I played around with, no reason to add my crappy code to that.&lt;/p&gt;
&lt;h1&gt;Running the server on the Raspberry Pi&lt;/h1&gt;
&lt;p&gt;Ok, now on to the real meat. The test setup:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Raspberry Pi 4b with 4 GB RAM&lt;/li&gt;
&lt;li&gt;Using Raspberry Pi OS Lite (64-bit, released 2024-07-04)&lt;/li&gt;
&lt;li&gt;Compiled the program directly on the Raspberry Pi with gcc v12.2.0&lt;/li&gt;
&lt;li&gt;Server uses 8 worker threads, 1 dedicated checkpointing thread.&lt;/li&gt;
&lt;li&gt;Ran &lt;code&gt;wrk&lt;/code&gt; on my PC, connected to the Raspberry Pi via a 1 GBit/s switch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As mentioned before 1 in 100 is a POST request inserting a new post.
99 in 100 are GET requests querying all posts of a random topic.
I used a &lt;code&gt;wrk&lt;/code&gt; Lua script to do the POST request with a 1 in 100 chance (&lt;code&gt;math.random(1, 100) == 1&lt;/code&gt;).
This is the full &lt;code&gt;wrk&lt;/code&gt; command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_bash&quot;&gt;&lt;span class=variation_c&gt;./wrk&lt;/span&gt; &lt;span class=landmark_c&gt;-t&lt;/span&gt; &lt;span class=variation_a&gt;12&lt;/span&gt; &lt;span class=landmark_c&gt;-c&lt;/span&gt; &lt;span class=variation_a&gt;400&lt;/span&gt; &lt;span class=landmark_c&gt;-d&lt;/span&gt; &lt;span class=variation_a&gt;10&lt;/span&gt; &lt;span class=landmark_c&gt;--script&lt;/span&gt; wrk_script.lua http://raspberrypi:&lt;span class=variation_a&gt;8000&lt;/span&gt;/&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At first I used my original test data with that pretty messed up distribution. Because I&#039;m lazy.
Posts have between 57 and 2772 characters (avg. around 1400) and each topic has between 5 and 1000 posts (avg. around 500).
Way too many posts per topic.
Anyway, here are the results (3 runs):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Requests/sec:  161.43    162.39    163.83
Transfer/sec:  111.36MB  111.41MB  111.72MB
CPU usage: 80% - 100% (of 400% because 4 cores)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the full &lt;code&gt;wrk&lt;/code&gt; output for the 2nd run (just as an example):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Running 10s test @ http://raspberrypi:8001/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   304.81ms  311.84ms   1.97s    87.31%
    Req/Sec    19.01     14.06   110.00     84.91%
  1638 requests in 10.09s, 1.10GB read
  Socket errors: connect 0, read 0, write 0, timeout 46
Requests/sec:    162.39
Transfer/sec:    111.41MB&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The observant reader will notice that ~112 MB/s is the maximum transfer speed for one 1 GBit/s Ethernet connection.
Also we only utilize about a fourth of the CPU.
So yeah, totally I/O bound on the network connection.&lt;/p&gt;
&lt;p&gt;I was hoping for more, but in retrospect this shouldn&#039;t be surprising with my messed up test data.
99 of 100 requests fetch all posts of a topic.
And ignoring any HTTP and encoding overhead, that comes to an average of:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;500 posts/topic * 1400 bytes/post = 700000 bytes/topic = 700 KByte/topic&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a 1 GBit/s Ethernet connection this can give us:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;112 MB/s / 0.7 MB/topic = 160 topics/s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Those are all just back of the envelope estimates.
But they match up with the measurements and confirm that we&#039;re I/O bound because of our data.&lt;/p&gt;
&lt;p&gt;At this point I could have started to think about compression (hey, 3 of 4 cores are idle!), pre-compressing entire topics and only updating them when changed, etc.
But the theme of this experiment is to be lazy and avoid difficult or complex things.
So I just changed my data distribution!&lt;/p&gt;
&lt;h2&gt;Using different test data&lt;/h2&gt;
&lt;p&gt;I looked around for a bit, but the only thing I could find was &lt;a href=&quot;https://www.reddit.com/r/dataisbeautiful/comments/g469ui/how_the_average_comment_length_compares_between/&quot;&gt;How the average comment length compares between subreddits&lt;/a&gt; by tigeer on r/dataisbeautiful.
From what I&#039;ve read his source were comments posted in October July-2019 gathered using the pushshift.io API.
This gives us at least a little bit of information to cook up a new distribution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;100 - 200 characters per post&lt;/li&gt;
&lt;li&gt;2 - 50 posts per topic (no particular source, just fewer than before)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Again, highly professionally eyeballing it here.
How does the Raspberry Pi handle test data with that distribution? 4 runs this time:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Requests/sec:  7204.74    6049.65    7177.06    6087.99
Transfer/sec:    32.68MB    28.30MB    34.79MB    31.41MB
CPU usage: 150% - 200%&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And one of the &lt;code&gt;wrk&lt;/code&gt; outputs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Running 30s test @ http://raspberrypi:8001/
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    17.96ms   69.00ms   1.67s    95.44%
    Req/Sec   512.65    185.41     1.63k    73.87%
  183166 requests in 30.09s, 0.92GB read
  Socket errors: connect 0, read 0, write 0, timeout 21
Requests/sec:   6087.99
Transfer/sec:     31.41MB&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#039;s interesting that we&#039;re not limited by the network bandwidth (only ~30 MB/s of the available ~112 MB/s).
But we&#039;re also not limited by the CPU since it doesn&#039;t reach 400%.&lt;/p&gt;
&lt;p&gt;This is probably the point where the prefork server model breaks down.
To test this assumption I ran it again with 16 worker threads instead of 8:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Requests/sec:  7363.26    5899.52    5907.11    6461.15
Transfer/sec:    40.19MB    33.00MB    33.87MB    39.12MB
CPU usage: 270% - 300%&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&#039;s hard to tell, but it doesn&#039;t make much of a difference in req/sec.
And this is a common theme of thread-based synchronous I/O systems:
You throw a lot more resources at them but at some point it barely makes a difference (diminishing returns).
There&#039;s so much blocking, waiting, task switching, etc. going on that you can&#039;t really utilize the hardware resources efficiently.&lt;/p&gt;
&lt;p&gt;Event loops and async I/O are better in that regard, but all in all they&#039;re a lot more complex.
We already get 6k - 7k req/sec with what amounts to an almost &amp;quot;Hello World!&amp;quot; server bolted together with SQLite. That gives it a hell of a complexity-to-performance ratio.&lt;/p&gt;
&lt;p&gt;By the way: All of this I/O was going to the SD card of the Pi.
Not sure how much that matters since all the data would be in the kernel&#039;s page cache, but &lt;code&gt;fsync()&lt;/code&gt; probably has something to say about that.&lt;/p&gt;
&lt;h2&gt;Running it on the PC&lt;/h2&gt;
&lt;p&gt;Just for the lols I ran the same benchmark on my PC (8-core AMD Ryzen 7 5700X) with 8 worker threads.
This time everything runs on the same machine, so we won&#039;t be limited by the network bandwidth:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Requests/sec:  51716.15    42627.91    32928.37    27801.58
Transfer/sec:    265.33MB    270.44MB    240.95MB    245.43MB
CPU usage of server process:  420% - 580%
CPU usage of wrk process:    ~250%&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Full CPU utilization would be 1600% because of hyper-threading.
Honestly, those numbers make me shudder to think what a more server-like system with a Threadripper, 256 GB of RAM and several 10 GBit/s network ports could handle.&lt;/p&gt;
&lt;h1&gt;Very rough estimates of Reddit read/write ops&lt;/h1&gt;
&lt;p&gt;Up until now I did what every (professional) (modern) (web-)developer does: Completely ignore the purpose of the whole thing and just play around with my favorite tools. I sincerely hope this is irony on my part, but I&#039;ll let you be the judge of that. Someone once told me that &amp;quot;professional&amp;quot; only means you get paid for what you do, not that you&#039;re doing something in a proficient manner.&lt;/p&gt;
&lt;p&gt;Anyway, I tried to get some numbers about Reddit. After all I wanted to know if one could host a greatly over-simplified Reddit-like thing on a Raspberry Pi (&amp;quot;spherical cow in a vacuum&amp;quot;, remember?). Specifically how many new posts and comments come in every second. Well, I didn&#039;t find much. In the end the best I found was &lt;a href=&quot;https://backlinko.com/reddit-users&quot;&gt;Reddit User and Growth Stats&lt;/a&gt; which seems to be based on the Reddit SEC filings and Statista. I tried to look at the SEC filings but couldn&#039;t make heads or tails of it. And the Statista data is behind a paywall.&lt;/p&gt;
&lt;p&gt;For whatever reason they mostly care about active users, revenue and other unimportant stuff. But thankfully the page listed posts and comments by year and I only cared about 2024:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 550 million posts    in 2024
2.72 billion comments in 2024&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In keeping with the theme of this experiment that estimate is good enough.&lt;/p&gt;
&lt;p&gt;The numbers sound awfully impressive.
Nice, large numbers for which at least I don&#039;t have any reference point.
Except maybe the world population or something equally abstract.
But then, that stuff was made to impress investors so they give you money.&lt;/p&gt;
&lt;p&gt;So… let&#039;s do some really sophisticated analysis and divide those yearly numbers by the number of seconds in a year.
For reference: A year has more or less &lt;code&gt;365 * 24 * 60 * 60&lt;/code&gt; = &lt;code&gt;31 536 000&lt;/code&gt; = 31.5 million seconds.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; 550 million posts    =   550 000 000 / (365*24*60*60) = 17.4 posts/s
2.72 billion comments = 2 720 000 000 / (365*24*60*60) = 86.25 comments/s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&#039;s call it ~100 inserts/sec in total.&lt;/p&gt;
&lt;p&gt;For some reason I expected that number to be much much higher.
Sure, this is an average and there will be spikes.
And the distribution across different subreddits will be very uneven.
But even then, what&#039;s going to prevent you from giving each bigger subreddit-alike its own Raspberry Pi?&lt;/p&gt;
&lt;h1&gt;Closing words&lt;/h1&gt;
&lt;p&gt;In a kind-of funny way this number really took the wind out of the entire experiment.
Realistically nothing I&#039;ll ever build will come close to needing the capacity a single Raspberry Pi can provide.
My motivation to try different configurations (e.g. with checkpointing) just evaporated.
And mind you, this was just a few days of fooling around, avoiding complexity and maybe gluing stuff together in a slightly less stupid way than usual.
At least in regards to requests per second.&lt;/p&gt;
&lt;p&gt;Bandwidth, caching, storing accumulated data, images, videos, and so on are matters outside this experiment&#039;s particular &amp;quot;spherical cow in a vacuum&amp;quot;.
And I hope you keep that in mind.&lt;/p&gt;
&lt;p&gt;Anyway, the next time someone wants to use &amp;lt;insert your favorite overcomplicated stack here&amp;gt; because we need to &amp;quot;scale&amp;quot; or &amp;quot;that&#039;s how everyone else does it&amp;quot; I can ask if we need to scale beyond one Raspberry Pi.
And can back that up with some numbers. 😄
Which honestly don&#039;t matter anyway in such discussions, it&#039;s just about favorite toys.
Might as well throw in my own as well.&lt;/p&gt;
&lt;p&gt;I would much rather support the Raspberry Pi foundation with my money, not AWS.
But I have to give the sales people of cloud providers credit, they trained web developers well.
Pavlov would be proud. 😄&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>Simple good quality subpixel text rendering in OpenGL with stb_truetype and dual source blending</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending/comments.xml" />
		<updated>2023-08-14T16:23:21+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2023-08-14-simple-good-quality-subpixel-text-rendering-in-opengl-with-stb-truetype-and-dual-source-blending/">
&lt;p&gt;Now that title is a handful.
In the last post I looked at rendering UI rectangles with OpenGL 4.5.
The next step on that journey is text rendering, more specifically &lt;a href=&quot;https://en.wikipedia.org/wiki/Subpixel_rendering&quot;&gt;subpixel font rendering&lt;/a&gt;.
So instead of rather boring rectangles we&#039;re now also throwing individual characters (&amp;quot;glyphs&amp;quot;) at the screen. Way more interesting. 🙂&lt;/p&gt;
&lt;p&gt;I wanted to look into subpixel font rendering for quite a while now, especially to test out dual source blending (where you can blend each subpixel individually).
In the end I think I found a pretty good sweet spot between complexity and quality and wanted to document that.
And of course I also went down some interesting rabbit holes (e.g. gamma correction) with unexpected results.
It&#039;s surprisingly difficult to find hard or complete information on subpixel text rendering and as a result this post became rather… long.&lt;/p&gt;
&lt;p&gt;This post isn&#039;t an introduction into font rendering. Rather it describes the steps necessary to get a good result in OpenGL and sometimes the stuff nobody talks about.
If you stumble over an unknown word (e.g. hinting or kerning) feel free to look it up. This post is way too massive as it is.&lt;/p&gt;
&lt;p&gt;In case anyone asks because 4k displays and stuff: All (except one) of my displays are 1080p and 99.99…% of the text I read is horizontal.
Subpixel anti-aliased text is a lot more comfortable to read for me, but that depends on your eyesight (a friend of mine can hardly tell the difference).
If you can&#039;t tell the difference: Just don&#039;t bother. But if you write UI software for others you might want to occasionally borrow someone else&#039;s eyeballs.
On mobile and tablets it&#039;s a different story and I don&#039;t really care much about it. Grayscale anti-aliasing seems good enough there and the subpixel layouts would drive you crazy anyway.&lt;/p&gt;
&lt;p&gt;Ok, with all that out of the way some results first:&lt;/p&gt;
&lt;script&gt;
    const article = document.currentScript.closest(&quot;article&quot;)

    // Preload all images for seamless switching
    for (const button of article.querySelectorAll(&quot;figure button[data-img-src]&quot;)) {
        const img = new Image()
        img.src = button.dataset.imgSrc
        img.decode()
    }

    // Handle image switch buttons
    article.addEventListener(&quot;click&quot;, event =&gt; {
        if ( event.target.matches(&quot;figure:has(img) button[data-img-src]&quot;) ) {
            event.target.closest(&quot;figure&quot;).querySelector(&quot;img&quot;).src = event.target.dataset.imgSrc
            event.preventDefault()
        }
    })
&lt;/script&gt;
&lt;figure class=&quot;long-caption&quot;&gt;
    &lt;img src=&quot;01_results_02_waterfall_ubuntu.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;01_results_02_waterfall_ubuntu.png&quot;&gt;Ubuntu&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_01_fontmanager-freetype_ubuntu.png&quot;&gt;Ubuntu (FreeType)&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_04_waterfall_dejavu-serif.png&quot;&gt;Dejavu Serif&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_03_fontmanager-freetype_dejavu-serif.png&quot;&gt;Dejavu Serif (FreeType)&lt;/button&gt;
        &lt;/p&gt;&lt;p&gt;
            &lt;button data-img-src=&quot;01_results_05_textblock_dejavu-sans.png&quot;&gt;Text block&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_06_text-over-image_ubuntu-italic.png&quot;&gt;Text over image&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_07_source-code-font-gallery.png&quot;&gt;Source code &amp;amp; fonts&lt;/button&gt;
            &lt;button data-img-src=&quot;01_results_08_source-code-font-gallery_coverage-adjust_linear-0.20.png&quot;&gt;Increased weight&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Click the buttons to switch between various images.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;The first 4 images&lt;/i&gt;: Ubuntu and Dejavu Serif, rendered output vs. FreeType (via FontManager). I didn&#039;t bother to line up the exact line positions so they jump around a bit.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Text block&lt;/i&gt;: Dejavu Sans 8pt.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Text over image&lt;/i&gt;: Ubuntu Italic 12px from black to white with white to black 1px text shadow over &quot;Haruhism - The Melancholy of Haruhi Suzumiya&quot; (cycles through a lot of colors, useful to spot artifacts).
           Only the first few and last few lines make sense, the rest is just gray text on a gray text shadow on top of an image.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Source code &amp;amp; fonts&lt;/i&gt;: Dark and light source code (Source Code Pro, Sublime Monokai &amp;amp; Breakers color schemes, 12px) and various different fonts at 10pt.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Increased weight&lt;/i&gt;: Same as before but with the subpixel coverages adjusted (linearly) for increased weight.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The quality is ok for the most part (at least for my eyes). Not perfect, not as sharp as it could be, but ok. Good enough for home-cooked UIs. But on smaller font sizes the hinting of FreeType clearly makes a big difference.&lt;/p&gt;
&lt;p&gt;What it offers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simplicity&lt;/li&gt;
&lt;li&gt;Can blend text over arbitrary background pixels (on top of images, for semi-transparent UIs, as a HUD in games, etc.)&lt;/li&gt;
&lt;li&gt;Just needs one texture atlas entry per glyph and font size (no need to store variants for subpixel shifts, background colors, etc.)&lt;/li&gt;
&lt;li&gt;No dependency on system libraries (Pango, DirectWrite, etc.)&lt;/li&gt;
&lt;li&gt;Only uses stb_truetype (self-contained single header library) for glyph rasterization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What it doesn&#039;t do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hinting (neither vertical nor horizontal, and it shows)&lt;/li&gt;
&lt;li&gt;Different subpixel layouts (just standard RGB)&lt;/li&gt;
&lt;li&gt;Anything fancy with signed distance fields&lt;/li&gt;
&lt;li&gt;Ligatures&lt;/li&gt;
&lt;li&gt;Emojis or glyphs with multiple colors&lt;/li&gt;
&lt;li&gt;Variable fonts&lt;/li&gt;
&lt;li&gt;Fancy effects like text shadow&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Still interested? Here&#039;s how it works (pretty images &lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;below&lt;/a&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rasterize the glyph with 3x the horizontal resolution using &lt;a href=&quot;https://github.com/nothings/stb/blob/master/stb_truetype.h&quot;&gt;stb_truetype&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apply the &lt;a href=&quot;https://freetype.org/freetype2/docs/reference/ft2-lcd_rendering.html&quot;&gt;FreeType LCD filter&lt;/a&gt; to mitigate color fringes&lt;/li&gt;
&lt;li&gt;Upload the filtered glyph into an RGB texture atlas&lt;/li&gt;
&lt;li&gt;Do subpixel positioning while iterating over the glyphs and &lt;a href=&quot;https://jcgt.org/published/0002/01/04/&quot;&gt;later in the shader&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Blend each subpixel individually with the framebuffer by using &lt;a href=&quot;https://stackoverflow.com/questions/48491340/use-rgb-texture-as-alpha-values-subpixel-font-rendering-in-opengl#answer-51679209&quot;&gt;dual-source blending&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Optional: Adjust the subpixel coverages before blending to make the text a bit thinner or bolder&lt;/li&gt;
&lt;li&gt;Optional: I also use pre-multiplied alpha for the text color. Not really necessary here but makes life a lot easier once you have transparent borders, backgrounds, etc. and have to combine them.&lt;/li&gt;
&lt;/ul&gt;
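&lt;p&gt;To give an idea of how small the LCD filter step is, here is a sketch in C of a 5-tap filter over one row of subpixel coverages. It assumes FreeType&#039;s default filter weights (&lt;code&gt;FT_LCD_FILTER_DEFAULT&lt;/code&gt;: 0x08 0x4D 0x56 0x4D 0x08 in 1/256 units); the function name and buffer layout are made up for this example and it is not the code from the demo program.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_c&quot;&gt;#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

// FreeType&#039;s default LCD filter weights in 1/256 units. They sum to 256,
// so the filter keeps the total coverage roughly intact.
static const uint8_t lcd_weights[5] = { 0x08, 0x4D, 0x56, 0x4D, 0x08 };

// Filters one row of subpixel coverages rasterized at 3x horizontal resolution.
// Each output subpixel is a weighted sum of itself and its two neighbours on each
// side. Samples outside the row count as 0, so the row needs 2 subpixels of
// empty padding on each side to not cut off the spread of the filter.
static void lcd_filter_row(const uint8_t* in, uint8_t* out, int width) {
	for (int x = 0; x &amp;lt; width; x++) {
		unsigned int sum = 0;
		for (int k = 0; k &amp;lt; 5; k++) {
			int src = x + k - 2;
			if (src &amp;gt;= 0 &amp;amp;&amp;amp; src &amp;lt; width)
				sum += lcd_weights[k] * in[src];
		}
		out[x] = (uint8_t)((sum + 128) / 256);  // divide by 256 with rounding
	}
}

int main(void) {
	// One fully covered subpixel in the middle of a padded row.
	uint8_t row[9] = { 0, 0, 0, 0, 255, 0, 0, 0, 0 };
	uint8_t filtered[9];
	lcd_filter_row(row, filtered, 9);
	for (int x = 0; x &amp;lt; 9; x++)
		printf(&quot;%d &quot;, filtered[x]);
	printf(&quot;\n&quot;);  // prints: 0 0 8 77 86 77 8 0 0
	return 0;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The single bright subpixel gets spread over its neighbours, which is what softens the color fringes. Since the weights sum to 256, the filtered values still add up to roughly the original coverage.&lt;/p&gt;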
&lt;p&gt;Pretty much your run-of-the-mill texture atlas thing, just with 3 subpixels in a pixel, so to speak. Ok, maybe the LCD filter and subpixel positioning add a bit extra, but that&#039;s it.
Each of the steps is explained in detail below, as well as a few interesting side quests (e.g. gamma correction), some future ideas and interesting references.
This will be a long one, here&#039;s a small table of contents to get around:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-demo&quot;&gt;Demo program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-subpixel-res&quot;&gt;Subpixel resolution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-lcd-filter&quot;&gt;FreeType LCD filter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-subpixel-pos&quot;&gt;Subpixel positioning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-subpixel-blend&quot;&gt;Subpixel blending aka dual-source blending&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-coverage-adjust&quot;&gt;Optional: Coverage adjustment for a thinner or bolder look&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-pre-mult-alpha&quot;&gt;Optional: Pre-multiplied alpha&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-paths-not-taken&quot;&gt;Paths not taken&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-gamma-correction&quot;&gt;The gamma correction rabbit hole&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-manual-blending-bg&quot;&gt;Manual blending with a known background color&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-dynamic-outline-adjustment&quot;&gt;Dynamically choose coverage adjustment based on text and background color&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-future-ideas&quot;&gt;Ideas for the future&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-future-sdf-shadows&quot;&gt;Signed distance fields for text shadows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-future-oversampling&quot;&gt;2x2 oversampling for animated text&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-future-pango-directwrite&quot;&gt;Use DirectWrite and Pango when available&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-future-known-bg-color&quot;&gt;Do something useful when the background color is known&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gl45-subpixel-text-rendering-references&quot;&gt;Other approaches and interesting references&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this point I want to apologize: the blog was never designed to cope with posts as massive as this one. It might feel a bit cramped and endless. Sorry about that.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-demo&quot;&gt;Demo program&lt;/h1&gt;
&lt;p&gt;A simplified demo program is &lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering&quot;&gt;available on GitHub&lt;/a&gt;.
It uses OpenGL 4.5 and is written for Linux and Windows. I haven&#039;t tried macOS, and since macOS only supports OpenGL up to 4.1 it probably won&#039;t run there without changes.&lt;/p&gt;
&lt;p&gt;The demo is pretty much just one large piece of code in the &lt;code&gt;main()&lt;/code&gt; function that renders a given string.
The code is mostly meant as a reference in case you&#039;re interested in the nitty-gritty details.
Hence no abstractions that you just have to crack open to see what exactly is going on.
It&#039;s probably a bit unusual and not &amp;quot;pretty&amp;quot; but you can slice the steps down later on however you want.
I&#039;ll directly link to the relevant pieces of code later on.&lt;/p&gt;
&lt;p&gt;The project uses &lt;a href=&quot;https://github.com/nothings/stb/blob/master/stb_truetype.h&quot;&gt;stb_truetype.h&lt;/a&gt; for glyph rasterization, mostly because it&#039;s simple and just one self-contained header file.
But it doesn&#039;t support hinting, which is why the demo doesn&#039;t do any. Otherwise I would probably do vertical hinting as mentioned in &lt;a href=&quot;https://jcgt.org/published/0002/01/04/&quot;&gt;Higher Quality 2D Text Rendering&lt;/a&gt; by Nicolas P. Rougier.
Horizontal hinting is a more complex story, though.&lt;/p&gt;
&lt;p&gt;After writing most of this post I remembered that rounding the font size to the nearest whole pixel can serve as a poor man&#039;s vertical hinting (read that somewhere, not sure where).
I played around with that but it wasn&#039;t convincing: 8pt looked a bit better, but other sizes sometimes looked worse. And I&#039;m planning on using pixel font sizes instead of points anyway.
It wasn&#039;t worth redoing all the images so I tucked it in here. It won&#039;t be mentioned again in the rest of the post.&lt;/p&gt;
&lt;p&gt;The texture atlas used by the demo is just a mockup that&#039;s horribly limited and inefficient.
But it&#039;s simple enough to not distract from the text rendering.
If you need a texture atlas you might want to look at &lt;a href=&quot;https://mozillagfx.wordpress.com/2021/02/04/improving-texture-atlas-allocation-in-webrender/&quot;&gt;Improving texture atlas allocation in WebRender&lt;/a&gt; by Nical for some ideas.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/h1&gt;
&lt;p&gt;Before we go into each step I just wanted to show the effects and purpose of each one.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;02_features_01_res-off_pos-off_blend_off.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;02_features_01_res-off_pos-off_blend_off.png&quot;&gt;Pixel res.&lt;/button&gt;
            &lt;button data-img-src=&quot;02_features_02_res-on_pos-off_blend_off.png&quot;&gt;Subpixel res.&lt;/button&gt;
            &lt;button data-img-src=&quot;02_features_03_res-filtered_pos-off_blend_off.png&quot;&gt;LCD filter&lt;/button&gt;
            &lt;button data-img-src=&quot;02_features_04_res-filtered_pos-on_blend-off.png&quot;&gt;Subpixel pos.&lt;/button&gt;
            &lt;button data-img-src=&quot;02_features_05_res-filtered_pos-on_blend-on.png&quot;&gt;Dual source blend&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Use the buttons to flip through the effects of each step.&lt;/p&gt;
    &lt;/figcaption&gt;

&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;Subpixel resolution triples the horizontal resolution (from ~100dpi to ~300dpi horizontally on 1080p displays). This makes the text sharper but causes color fringes.&lt;/li&gt;
&lt;li&gt;The FreeType LCD filter then exchanges some of that resolution to avoid most color fringes. It&#039;s basically a slight 1D blur.&lt;/li&gt;
&lt;li&gt;Subpixel positioning allows us to use the enhanced resolution to get rid of uneven letter spacing, which is especially visible in the name of the &amp;quot;Ubuntu Bold&amp;quot; font and in the light-on-dark text.&lt;/li&gt;
&lt;li&gt;Dual source blending then gets rid of those ugly fringy blending artifacts, especially on bright backgrounds. It helps sharpness and colors quite a bit, too.
All previous steps just used the average of the subpixel coverages as alpha. There are far more complex ways to do this, but it illustrates that there will be artifacts if you don&#039;t use dual source blending.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These images illustrate that you kind of need the whole set to get a good looking result without visible artifacts.
Maybe you can get away with skipping subpixel positioning, but reading that kind of text is annoying as heck. Sometimes a word just falls apart like &amp;quot;live&amp;quot; into &amp;quot;li&amp;quot; and &amp;quot;ve&amp;quot; above.
For me at least it boils down to: Do all of them or neither.
Only using pixel resolution at least looks sharp.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-subpixel-res&quot;&gt;Subpixel resolution&lt;/h2&gt;
&lt;p&gt;I mention this step mostly for completeness and to avoid confusion.
Just rasterize the glyph as a &amp;quot;grayscale image&amp;quot; with 3 times the horizontal resolution, that&#039;s it (&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L597&quot;&gt;demo code&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;That gives us a value in the range 0..255 for each subpixel.
Those values are linear coverages: 0 means the subpixel is 0% covered by the glyph, 128 means roughly 50% covered and 255 means 100% covered.&lt;/p&gt;
&lt;p&gt;A note on padding: The LCD filter below needs 1px padding to the left and right. Subpixel positioning might also shift the glyph almost 1px to the right, so we need another 1px padding &lt;em&gt;to the left&lt;/em&gt; for that as well.
It makes sense to take that padding into account when you rasterize the glyph into a bitmap.
Then you don&#039;t have to move the glyph around after rasterization by doing error prone coordinate calculations and extensive bounds checking in the filtering step.&lt;/p&gt;
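&lt;p&gt;A minimal C sketch of that padding layout, assuming the numbers from above: 2px on the left (1px for the LCD filter plus 1px for subpixel positioning) and 1px on the right (LCD filter). Widths are in subpixels since the bitmap has 3 times the horizontal resolution. The names are made up for this example, and rounding the glyph width up to whole pixels is an assumption of this sketch (it keeps the bitmap width a multiple of 3 so it maps onto RGB texels later). With stb_truetype you could then rasterize into &lt;code&gt;bitmap + layout.x_offset&lt;/code&gt; with a row stride of &lt;code&gt;layout.width&lt;/code&gt; and a horizontal scale 3 times the vertical one, e.g. via &lt;code&gt;stbtt_MakeGlyphBitmap()&lt;/code&gt;.&lt;/p&gt;

```c
/* Hypothetical helper: layout of a padded glyph bitmap with 3x horizontal
   resolution. x values are in subpixels, y values in pixels. */
#define SUBPIXELS_PER_PIXEL 3
#define PAD_LEFT_PX  2  /* 1px for the LCD filter + 1px for subpixel positioning */
#define PAD_RIGHT_PX 1  /* 1px for the LCD filter */

typedef struct {
    int width;     /* bitmap width in subpixels, padding included */
    int height;    /* bitmap height in pixels, no vertical padding needed */
    int x_offset;  /* subpixel column where the rasterized glyph starts */
} padded_glyph_layout;

/* glyph_width_subpx: glyph width when rasterized at 3x horizontal scale */
static padded_glyph_layout padded_glyph_layout_for(int glyph_width_subpx, int glyph_height_px) {
    padded_glyph_layout layout;
    /* Assumption: round up to whole pixels so the width stays a multiple of 3 */
    int glyph_width_px = (glyph_width_subpx + SUBPIXELS_PER_PIXEL - 1) / SUBPIXELS_PER_PIXEL;
    layout.width    = (PAD_LEFT_PX + glyph_width_px + PAD_RIGHT_PX) * SUBPIXELS_PER_PIXEL;
    layout.height   = glyph_height_px;
    layout.x_offset = PAD_LEFT_PX * SUBPIXELS_PER_PIXEL;
    return layout;
}
```

&lt;p&gt;For example, a glyph that is 20 subpixels wide gets a 30 subpixel (10px) wide bitmap and starts at subpixel 6.&lt;/p&gt;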
&lt;p&gt;stb_truetype has subpixel functions as well. But these do on the CPU a lot of what we do in the shader (the subpixel shift is baked into the glyph bitmap), so you can&#039;t really use them with a simple texture atlas. Also we don&#039;t need them since we have dual source blending. 🙂&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-lcd-filter&quot;&gt;FreeType LCD filter&lt;/h2&gt;
&lt;p&gt;Straightforward subpixel rendering can be a bit jarring for the eye. I hope the images above showed that.
All those subpixels &lt;em&gt;do&lt;/em&gt; have a color and this causes the jarring color fringes.
The FreeType LCD filter is basically a slight horizontal blur. It distributes the brightness of one subpixel across neighboring subpixels (that have different colors) and thus gets rid of most color fringes.&lt;/p&gt;
&lt;p&gt;To apply the filter you basically look from 2 subpixels to the left to 2 subpixels to the right (5 subpixels in total) and add up their values.
The further away a subpixel is from the current one the less it contributes to the sum (aka it&#039;s a weighted sum or a 1D kernel with 5 weights).&lt;/p&gt;
&lt;p&gt;The filter weights are documented in the &lt;a href=&quot;https://freetype.org/freetype2/docs/reference/ft2-lcd_rendering.html&quot;&gt;FreeType docs&lt;/a&gt; as &lt;code&gt;FT_LCD_FILTER_DEFAULT&lt;/code&gt;: &lt;code&gt;[0x08 0x4D 0x56 0x4D 0x08]&lt;/code&gt; in 1/256 units, or about [0.031, 0.301, 0.336, 0.301, 0.031].
Meaning the outermost subpixels contribute ~3%, our neighboring subpixels ~30% and our own subpixel ~34%. The weights add up to 256/256 = 100%, so a fully covered area stays 100% covered except at the fringes
(&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L617&quot;&gt;demo code&lt;/a&gt;).&lt;/p&gt;
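&lt;p&gt;Here is a sketch of that filter for one row of subpixel coverages in C. It reads from one buffer and writes into another (see the in-place pitfall further down), uses the integer weights in 1/256 units and rounds the result. Anything outside the row counts as 0, which is what the padding subpixels contain anyway, so the bounds check is only a cheap safety net. The function name is made up; FreeType itself isn&#039;t involved, only its default weights.&lt;/p&gt;

```c
/* FT_LCD_FILTER_DEFAULT weights in 1/256 units: 8 + 77 + 86 + 77 + 8 = 256 */
static const int lcd_filter_weights[5] = { 0x08, 0x4D, 0x56, 0x4D, 0x08 };

/* Filters one row of subpixel coverages (0..255) from src into dst.
   src and dst must be different buffers, otherwise the filter reads
   values it already wrote for the subpixels to its left. */
static void lcd_filter_row(const unsigned char *src, unsigned char *dst, int width) {
    for (int x = 0; x != width; x++) {
        int sum = 0;
        for (int k = -2; k != 3; k++) {
            int sx = x + k;
            /* Outside the row counts as 0 (a negative sx wraps to a huge unsigned value) */
            if ((unsigned)sx >= (unsigned)width)
                continue;
            sum += lcd_filter_weights[k + 2] * src[sx];
        }
        /* Divide by 256 with rounding; a fully covered area stays at 255 */
        dst[x] = (unsigned char)((sum + 128) / 256);
    }
}
```

&lt;p&gt;A single fully covered subpixel (255) in an otherwise empty row comes out as 8, 77, 86, 77, 8, i.e. the filter weights themselves.&lt;/p&gt;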
&lt;p&gt;A note on padding pixels again: Just add 1px padding at the left and right side.
The filter is 5 subpixels wide, so the subpixels at the edge of a glyph can spread their values at most 2 subpixels into that left and right padding.
Hence the 1px (3 subpixel) horizontal padding.
The subpixel positioning step (coming up right below) adds another 1px padding at the left.&lt;/p&gt;
&lt;p&gt;I also recommend using the same bitmap size for glyph rasterization and filtering.
Then you can skip error prone coordinate calculations and extensive bounds checking.
It makes the filtering code a lot simpler even if the bitmap for glyph rasterization is a few (3) pixels wider than necessary.&lt;/p&gt;
&lt;p&gt;At first I did the filtering in place until I realized that the filter then reads its own output of the previous subpixels (late-night coding…).
That distorts the brightness of the text a bit but at least to me it wasn&#039;t really noticeable.
Instead the filter should read from the glyph bitmap and write the filtered output into a 2nd bitmap.
Anyway, if you really want to, you can probably get away with just using one bitmap and doing the filtering in-place.
Here is a small comparison and diff.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;03_filter_01_in-place.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;03_filter_01_in-place.png&quot;&gt;in-place filter&lt;/button&gt;
            &lt;button data-img-src=&quot;03_filter_02_buffered.png&quot;&gt;filter into extra buffer&lt;/button&gt;
            &lt;button data-img-src=&quot;03_filter_03_diff.png&quot;&gt;diff&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;The difference between filtering in-place and into a 2nd bitmap. Play spot the difference or look at the diff if you can&#039;t. The diff was created using GIMP&#039;s &quot;Difference&quot; blend mode.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;aside&gt;
Update: A few months later I stumbled upon a situation where the in-place filter caused very visible (and ugly) color fringes. Unfortunately I no longer remember the details.
But I still recommend applying the LCD filter properly with an extra buffer.
&lt;/aside&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-subpixel-pos&quot;&gt;Subpixel positioning&lt;/h2&gt;
&lt;p&gt;Subpixel positioning is relevant in two places:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;When iterating over all the characters (glyphs) and calculating the rectangle positions for each one.&lt;/p&gt;
&lt;p&gt;Just don&#039;t round there.
Use a &lt;code&gt;float&lt;/code&gt; to keep track of the current x position, for advancing from glyph to glyph and when doing kerning (&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L550&quot;&gt;kerning in the demo code&lt;/a&gt;).
When calculating the glyph&#039;s rectangle position you can round down the x coordinate and send the remaining fraction (between 0…1) off to the shader (aka the subpixel shift in the &lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L684&quot;&gt;demo code&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The fragment shader then takes the subpixel shift and, well, shifts the glyph coverages by that amount to the right.&lt;/p&gt;
&lt;p&gt;This is done with a linear interpolation between neighboring subpixels (&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L347&quot;&gt;demo code&lt;/a&gt;).
Technically this means we can shift a glyph by fractions much smaller than a subpixel.
But at some point no one can tell the difference anyway.
I found this in the paper &lt;a href=&quot;https://jcgt.org/published/0002/01/04/&quot;&gt;Higher Quality 2D Text Rendering&lt;/a&gt; by Nicolas P. Rougier which even contains the GLSL code for it (section &amp;quot;2.3. Subpixel Positioning&amp;quot;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
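&lt;p&gt;The CPU side of step 1 could look roughly like this C sketch. It assumes advances and kerning values are already scaled to pixels, ignores the glyph&#039;s bearing and the bitmap padding for brevity, and uses made-up names. The pen position stays a float the whole time; only when placing a rectangle is it split into a whole pixel position and the fractional subpixel shift.&lt;/p&gt;

```c
/* Hypothetical per-glyph placement: integer pixel x for the rectangle plus
   the fractional subpixel shift (0 up to but excluding 1) for the shader. */
typedef struct {
    int   rect_x;          /* left edge of the glyph rectangle in whole pixels */
    float subpixel_shift;  /* remaining fraction of a pixel */
} glyph_placement;

/* advances[i] and kerning[i] are in pixels; kerning[i] is the adjustment
   between glyph i and glyph i + 1. Assumes x stays non-negative so the
   int cast rounds down. */
static void place_glyphs(float start_x, const float *advances, const float *kerning,
                         int count, glyph_placement *out) {
    float pen_x = start_x;  /* never rounded while advancing */
    for (int i = 0; i != count; i++) {
        int whole = (int)pen_x;  /* floor for non-negative values */
        out[i].rect_x = whole;
        out[i].subpixel_shift = pen_x - (float)whole;
        pen_x += advances[i] + kerning[i];
    }
}
```

&lt;p&gt;With an advance of 7.4px the first three glyphs land at x = 0, 7 and 14 with subpixel shifts of about 0, 0.4 and 0.8.&lt;/p&gt;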
&lt;p&gt;The paper also has a small section (and nice image) on kerning if you want to read up on that.&lt;/p&gt;
&lt;p&gt;You might notice that the GLSL shader code looks like it shifts the subpixels to the left. For whatever reason this confused me in this context but it&#039;s just a matter of perspective.
To shift something to the right on the screen a pixel at a fixed position has to show data to the left of itself. It&#039;s the same way with texture coordinates and a lot of other stuff in fragment shaders.&lt;/p&gt;
&lt;p&gt;That&#039;s also the reason why we need an extra 1px padding &lt;em&gt;to the left&lt;/em&gt;. The first pixel might look almost 1px to the left of itself and we don&#039;t want it to access anyone else&#039;s pixels.&lt;/p&gt;
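&lt;p&gt;As a CPU-side reference of the same idea (not a copy of the demo&#039;s shader), here is a C sketch for one row of subpixel coverages with made-up names. Each output subpixel reads the value &lt;code&gt;3 * shift&lt;/code&gt; subpixels to its left and linearly interpolates between the two subpixels around that position. Positions left of the row read as 0, which is what the extra 1px padding on the left guarantees in a real glyph bitmap.&lt;/p&gt;

```c
/* Reads one subpixel; everything outside the row counts as uncovered. */
static float sample_row(const float *row, int width, int i) {
    if ((unsigned)i >= (unsigned)width)
        return 0.0f;
    return row[i];
}

/* Shifts a row of subpixel coverages (0..1) to the right by shift_px
   pixels (0 up to but excluding 1), i.e. by up to 3 subpixels.
   Every output subpixel looks to the LEFT of itself. */
static void shift_row(const float *src, float *dst, int width, float shift_px) {
    float shift_subpx = shift_px * 3.0f;        /* 0 .. 3 subpixels */
    int   whole       = (int)shift_subpx;       /* full subpixels to move */
    float frac        = shift_subpx - (float)whole;
    for (int i = 0; i != width; i++) {
        float nearer  = sample_row(src, width, i - whole);
        float further = sample_row(src, width, i - whole - 1);
        /* Same as GLSL mix(nearer, further, frac) */
        dst[i] = nearer + (further - nearer) * frac;
    }
}
```

&lt;p&gt;Shifting a single covered subpixel by half a pixel (1.5 subpixels) spreads it evenly over the subpixels 1 and 2 positions to its right.&lt;/p&gt;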
&lt;p&gt;You can also do the position calculations on the CPU in font units (ints) if you like. But a float will do: even at 8K the precision is still better than 0.001 (a 1000th of a pixel).&lt;/p&gt;
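&lt;p&gt;To put a number on that: a float has 24 bits of precision (23 stored mantissa bits), so between 4096 and 8192, which covers every x coordinate on an 8K display (7680px wide), neighboring floats are 4096 / 2&lt;sup&gt;23&lt;/sup&gt; = 2&lt;sup&gt;-11&lt;/sup&gt; ≈ 0.00049px apart. This small C helper (a made-up name, only valid for positive, finite, normal floats) computes that spacing.&lt;/p&gt;

```c
/* Distance from x to the next larger float, for positive normal floats.
   Within [2^e, 2^(e+1)) neighboring floats are 2^e / 2^23 apart. */
static float float_spacing_at(float x) {
    float power = 1.0f;  /* ends up as 2^e, the largest power of two not above x */
    while (x >= power * 2.0f)
        power *= 2.0f;
    while (power > x)
        power *= 0.5f;
    return power / 8388608.0f;  /* 2^23 */
}
```

&lt;p&gt;&lt;code&gt;float_spacing_at(7680.0f)&lt;/code&gt; returns 2&lt;sup&gt;-11&lt;/sup&gt; (about 0.000488), so a 1/1000px step is still representable there.&lt;/p&gt;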
&lt;p&gt;No fancy images this time. The images in &lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/a&gt; already show the effect pretty well.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-subpixel-blend&quot;&gt;Subpixel blending aka dual-source blending&lt;/h2&gt;
&lt;p&gt;Now we&#039;re probably getting to the star of the show.
Usual alpha blending works with one blend weight per pixel (alpha) to blend the fragment shader output with the contents of the framebuffer.
But with dual-source blending we can give the hardware one blend weight per component. Meaning we can blend each subpixel individually. 🙂&lt;/p&gt;
&lt;p&gt;With that we can directly use the subpixel coverages we get from the subpixel positioning.
Just blend each subpixel of the text color on top of the framebuffer with its own weight.
That&#039;s it. No magic involved.
Relevant demo code:
&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L717&quot;&gt;Setup in OpenGL&lt;/a&gt;,
&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L344&quot;&gt;setup in shader&lt;/a&gt; and
&lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L398&quot;&gt;setting the blend weights&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It&#039;s core in &amp;quot;desktop&amp;quot; OpenGL since 3.3. On OpenGL ES the GL_EXT_blend_func_extended extension provides it, and GL_EXT_shader_framebuffer_fetch might be worth a look if you feel adventurous.&lt;/p&gt;
&lt;p&gt;Unfortunately almost nobody mentions dual-source blending with regard to subpixel text rendering.
I found it &lt;a href=&quot;dual-source-blending-post&quot;&gt;here&lt;/a&gt; quite a few years ago and had it on my &amp;quot;I have to try this!&amp;quot; list ever since. Only took me 5 years or so to finally get around to it. 😄&lt;/p&gt;
&lt;p&gt;The OpenGL wiki also covers it reasonably well (&lt;a href=&quot;https://www.khronos.org/opengl/wiki/Blending#Dual_Source_Blending&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://www.khronos.org/opengl/wiki/Fragment_Shader#Dual-source_blending&quot;&gt;here&lt;/a&gt;).
Basically you have to define another output for the blend weights and then tell the blend function to use those weights (switching from SRC_ALPHA to SRC1_COLOR):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_glsl&quot;&gt;&lt;span class=backdrop_a&gt;// Fragment shader&lt;/span&gt;
&lt;span class=landmark_a&gt;layout&lt;/span&gt;(location = &lt;span class=variation_a&gt;0&lt;/span&gt;, index = &lt;span class=variation_a&gt;0&lt;/span&gt;) &lt;span class=landmark_a&gt;out&lt;/span&gt; &lt;span class=landmark_c&gt;vec4&lt;/span&gt; fragment_color;
&lt;span class=landmark_a&gt;layout&lt;/span&gt;(location = &lt;span class=variation_a&gt;0&lt;/span&gt;, index = &lt;span class=variation_a&gt;1&lt;/span&gt;) &lt;span class=landmark_a&gt;out&lt;/span&gt; &lt;span class=landmark_c&gt;vec4&lt;/span&gt; blend_weights;

&lt;span class=backdrop_a&gt;// OpenGL setup for &amp;quot;normal&amp;quot; alpha blending (no pre-multiplied alpha)&lt;/span&gt;
&lt;span class=variation_c&gt;glBlendFunc&lt;/span&gt;(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR);
&lt;span class=backdrop_a&gt;// Or for pre-multiplied alpha (see below)&lt;/span&gt;
&lt;span class=variation_c&gt;glBlendFunc&lt;/span&gt;(GL_ONE, GL_ONE_MINUS_SRC1_COLOR);&lt;/code&gt;&lt;/pre&gt;
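&lt;p&gt;To make the difference tangible, here is the blend math written out in C for a single pixel. It&#039;s a sketch with made-up names, straight (not pre-multiplied) colors in the 0..1 range, and it ignores the text color&#039;s own alpha. &lt;code&gt;blend_dual_source()&lt;/code&gt; is what &lt;code&gt;glBlendFunc(GL_SRC1_COLOR, GL_ONE_MINUS_SRC1_COLOR)&lt;/code&gt; computes with the subpixel coverages as blend weights; &lt;code&gt;blend_average_alpha()&lt;/code&gt; is normal alpha blending with the average coverage as alpha, like in the earlier steps of &lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/a&gt;.&lt;/p&gt;

```c
typedef struct { float r, g, b; } rgb;

/* Dual-source blending: one weight per channel,
   result = src * weight + dst * (1 - weight) */
static rgb blend_dual_source(rgb text_color, rgb coverages, rgb framebuffer) {
    rgb out;
    out.r = text_color.r * coverages.r + framebuffer.r * (1.0f - coverages.r);
    out.g = text_color.g * coverages.g + framebuffer.g * (1.0f - coverages.g);
    out.b = text_color.b * coverages.b + framebuffer.b * (1.0f - coverages.b);
    return out;
}

/* Normal alpha blending: all channels share one weight, here the average
   coverage, so the per-subpixel information is lost. */
static rgb blend_average_alpha(rgb text_color, rgb coverages, rgb framebuffer) {
    float alpha = (coverages.r + coverages.g + coverages.b) / 3.0f;
    rgb out;
    out.r = text_color.r * alpha + framebuffer.r * (1.0f - alpha);
    out.g = text_color.g * alpha + framebuffer.g * (1.0f - alpha);
    out.b = text_color.b * alpha + framebuffer.b * (1.0f - alpha);
    return out;
}
```

&lt;p&gt;For black text on white where only the blue subpixel is covered, dual-source blending darkens just that subpixel (result 1, 1, 0), while the averaged alpha turns the whole pixel light gray (about 0.67 on all channels).&lt;/p&gt;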
&lt;p&gt;Again no fancy pictures. Jump back to &lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/a&gt; to satisfy your visual cravings.
But on the bright side: With this step you&#039;re done! The rest of this post is purely optional and informative (and maybe entertaining).&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-coverage-adjust&quot;&gt;Optional: Coverage adjustment for a thinner or bolder look&lt;/h2&gt;
&lt;p&gt;In some situations you might want a bolder look (e.g. for a bit more contrast in colored source code) or a thinner look (e.g. black on white in some fonts).
This time there are fancy pictures again. Lots of them actually. 🙂
Maybe first think about different use cases (terminal, notes, code editor, game UI, etc.) and then try to find the most pleasant option for each one.&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;04_coverage-adjust_01_unmodified.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;Effects of various ways of coverage adjustment to get a thinner or bolder look.&lt;/p&gt;
        &lt;p&gt;First image: Unmodified coverages.&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_01_unmodified.png&quot;&gt;unmodified&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Linear: Linear modification of coverages, negative = thinner, positive = bolder.&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_06_linear_-0.40.png&quot;&gt;-0.4&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_05_linear_-0.20.png&quot;&gt;-0.2&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_04_linear_-0.10.png&quot;&gt;-0.1&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_01_unmodified.png&quot;&gt; 0.0&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_02_linear_+0.20.png&quot;&gt;+0.2&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_03_linear_+0.40.png&quot;&gt;+0.4&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Exponent: &lt;code&gt;pow()&lt;/code&gt; function (coverage&lt;sup&gt;exponent&lt;/sup&gt;), &amp;gt; 1 is thinner, &amp;lt; 1 is bolder.&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_08_exponent_1.43.png&quot;&gt;1.43&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_07_exponent_1.20.png&quot;&gt;1.20&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_01_unmodified.png&quot;&gt;1.00&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_09_exponent_0.83.png&quot;&gt;0.83&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_10_exponent_0.70.png&quot;&gt;0.70&lt;/button&gt;
            &lt;button data-img-src=&quot;04_coverage-adjust_11_exponent_0.50.png&quot;&gt;0.50&lt;/button&gt;
        &lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I hope you picked your favourite. 🙂 If so maybe let me know via a comment at the end of the post.&lt;/p&gt;
&lt;p&gt;How is it done? Well, the gradient from the outside of a glyph to the inside is maybe 2 to 3 subpixels wide.
About 1 subpixel comes from the rasterization with 3x horizontal resolution. The FreeType LCD filter then blurs that to 2 or maybe 3 subpixels.
For me at least this was easier to understand with a bit of sketching:&lt;/p&gt;
&lt;figure&gt;
    &lt;figcaption&gt;Sketch of coverage values along the X-axis, from the outside of the glyph (coverage 0) to the inside (coverage 1.0). Subpixels are marked for scale on the X-axis.&lt;/figcaption&gt;
    &lt;img src=&quot;05_coverage-adjust-sketches_01_unmodified.png&quot;&gt;
&lt;/figure&gt;
&lt;p&gt;This isn&#039;t how it really looks. The FreeType LCD filter blurs it so it&#039;s more of a soft transition. But I found it easier to reason about it this way.&lt;/p&gt;
&lt;p&gt;So far I&#039;ve come up with the above two approaches to distort that gradient line a bit: via an exponent or via a linear modification of the gradient&#039;s slope.&lt;/p&gt;
&lt;h3&gt;Coverage adjustment via an exponent&lt;/h3&gt;
&lt;figure class=&quot;inline important&quot;&gt;
    &lt;figcaption&gt;Coverage values along the X-axis when distorted by a power function (coverage&lt;sup&gt;exponent&lt;/sup&gt;).&lt;/figcaption&gt;
    &lt;img src=&quot;05_coverage-adjust-sketches_02_exponent.png&quot;&gt;
&lt;/figure&gt;
&lt;p&gt;The sketch overemphasizes a bit for melodramatic effect by using an exponent of ~2.2.
If you throw it in a graph plotter and look at &lt;code&gt;x&lt;sup&gt;1.43&lt;/sup&gt;&lt;/code&gt; the effect is much more subtle.
In GLSL code this simply is:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_glsl&quot;&gt;pixel_coverages = &lt;span class=variation_c&gt;pow&lt;/span&gt;(pixel_coverages, &lt;span class=landmark_c&gt;vec3&lt;/span&gt;(coverage_adjustment));&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Where &lt;code&gt;pixel_coverages&lt;/code&gt; is a &lt;code&gt;vec3&lt;/code&gt; of the subpixel coverages along the x-axis, basically our subpixel values after subpixel positioning is done.
If &lt;code&gt;coverage_adjustment&lt;/code&gt; is 1.43 the font becomes thinner (like the lower curve), if it&#039;s 0.70 (or 1/1.43) it becomes bolder (like the upper curve).
You can flip the effect with &lt;code&gt;1 / coverage_adjustment&lt;/code&gt;, e.g. to do thinning by the same amount you previously did thickening and vice versa.
A value of 1.0 does nothing.
I&#039;ve included it in the demo&#039;s &lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L381&quot;&gt;fragment shader&lt;/a&gt; but it&#039;s commented out there in favor of the next approach.&lt;/p&gt;
&lt;p&gt;To some of you this might look eerily familiar, and yes, this came out of the gamma correction rabbit hole (more about that later).
Basically I wanted to understand what the hell &amp;quot;gamma-correct&amp;quot; blending with a gamma of 1.43 actually does, because it doesn&#039;t do gamma correction.
And after a lot of head-scratching, experiments and color-space stuff I think that this outline distortion effect is probably what most people are after when they do that &amp;quot;gamma-correct&amp;quot; blending with a gamma of 1.43.
By the way, blending in that &amp;quot;gamma 1.43&amp;quot; color space unbalances light and dark text a bit (black on white becomes thinner while white on black becomes bolder).
We don&#039;t have that problem here since we&#039;re only adjusting the coverages and don&#039;t do any fancy distorted blending.
I intentionally called it &amp;quot;coverage adjustment&amp;quot; to make clear that this isn&#039;t gamma correction.&lt;/p&gt;
&lt;h3&gt;Linear coverage adjustment&lt;/h3&gt;
&lt;p&gt;After I finally understood what was actually happening (hopefully) I came up with another approach: Simply change the slope of the gradient.&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;05_coverage-adjust-sketches_03_linear.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;Changing the gradient slope.&lt;/p&gt;
        &lt;p&gt;Left side: Steeper slope with reference point at 0 coverage (appears bolder).&lt;/p&gt;
        &lt;p&gt;Right side: Steeper slope with reference point at 1.0 coverage (appears thinner), including steps of the calculation.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Additionally I built it so that positive &lt;code&gt;coverage_adjustment&lt;/code&gt; values go into the left case (bolder) and negative values go into the right case (thinner).
For +1.2 this gives us a slightly bolder appearance: the slope of the gradient line changes from 1.0 to 1.2, with the reference point at 0 coverage.
For -1.2 we get a slightly thinner one: the slope again changes from 1.0 to 1.2, but with 1.0 coverage as the reference point.
But anything between -1.0 .. 1.0 doesn&#039;t make much sense that way.&lt;/p&gt;
&lt;p&gt;So we take that out and use &lt;code&gt;slope = 1 + abs(coverage_adjustment)&lt;/code&gt; instead, with the sign selecting the reference point. Meaning +0.2 becomes a slope of 1.2 with reference point 0 (bolder), 0 becomes a slope of 1 (does nothing) and -0.2 becomes a slope of 1.2 with reference point 1.0 (thinner).
And this finally is our linear coverage adjustment value, the +0.20, etc. you saw above.
I&#039;ve chosen the value range that way so it&#039;s easy to interpolate (e.g. based on the font size).&lt;/p&gt;
&lt;p&gt;In GLSL it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_glsl&quot;&gt;&lt;span class=landmark_b&gt;if&lt;/span&gt; (coverage_adjustment &amp;gt;= &lt;span class=variation_a&gt;0&lt;/span&gt;) {
    pixel_coverages = &lt;span class=variation_c&gt;min&lt;/span&gt;(pixel_coverages * (&lt;span class=variation_a&gt;1&lt;/span&gt; + coverage_adjustment), &lt;span class=variation_a&gt;1&lt;/span&gt;);
} &lt;span class=landmark_b&gt;else&lt;/span&gt; {
    pixel_coverages = &lt;span class=variation_c&gt;max&lt;/span&gt;((&lt;span class=variation_a&gt;1&lt;/span&gt; - (&lt;span class=variation_a&gt;1&lt;/span&gt; - pixel_coverages) * (&lt;span class=variation_a&gt;1&lt;/span&gt; + -coverage_adjustment)), &lt;span class=variation_a&gt;0&lt;/span&gt;);
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;min()&lt;/code&gt; and &lt;code&gt;max()&lt;/code&gt; just make sure that the output stays in the range 0..1.
That version is &lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L370&quot;&gt;in the demo code&lt;/a&gt; as well, but &lt;code&gt;coverage_adjustment&lt;/code&gt; is set to 0 (do nothing) by default.&lt;/p&gt;
&lt;p&gt;In case you&#039;re allergic to &lt;code&gt;if&lt;/code&gt; statements in your shaders, here is a branchless version. No idea if it&#039;s faster or slower, though.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_glsl&quot;&gt;&lt;span class=backdrop_a&gt;// cond is 1 for negative coverage_adjustment values, 0 otherwise.&lt;/span&gt;
&lt;span class=backdrop_a&gt;// Couldn&amp;#039;t think of a good name.&lt;/span&gt;
&lt;span class=landmark_c&gt;float&lt;/span&gt; cond = &lt;span class=landmark_c&gt;float&lt;/span&gt;(coverage_adjustment &amp;lt; &lt;span class=variation_a&gt;0&lt;/span&gt;);
&lt;span class=landmark_c&gt;float&lt;/span&gt; slope = &lt;span class=variation_a&gt;1&lt;/span&gt; + &lt;span class=variation_c&gt;abs&lt;/span&gt;(coverage_adjustment);
pixel_coverages = &lt;span class=variation_c&gt;clamp&lt;/span&gt;(cond - (cond - pixel_coverages) * slope, &lt;span class=variation_a&gt;0&lt;/span&gt;, &lt;span class=variation_a&gt;1&lt;/span&gt;);&lt;/code&gt;&lt;/pre&gt;
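&lt;p&gt;If you want to play with the numbers outside of a shader, here are both GLSL variants ported to C for a single coverage value (made-up function names; the GLSL built-ins &lt;code&gt;min()&lt;/code&gt;, &lt;code&gt;max()&lt;/code&gt; and &lt;code&gt;clamp()&lt;/code&gt; are replaced by conditionals, and &lt;code&gt;abs()&lt;/code&gt; by a multiplication with the sign picked via &lt;code&gt;cond&lt;/code&gt;). For the same input they should give the same result, which is a cheap way to check the branchless version.&lt;/p&gt;

```c
/* Branching version: positive adjustments steepen the slope around
   coverage 0 (bolder), negative ones around coverage 1.0 (thinner). */
static float adjust_coverage(float coverage, float adjustment) {
    if (adjustment >= 0.0f) {
        float c = coverage * (1.0f + adjustment);
        return c > 1.0f ? 1.0f : c;   /* min(c, 1) */
    } else {
        float c = 1.0f - (1.0f - coverage) * (1.0f + -adjustment);
        return c > 0.0f ? c : 0.0f;   /* max(c, 0) */
    }
}

/* Branchless version: cond picks the reference point (0 or 1.0). */
static float adjust_coverage_branchless(float coverage, float adjustment) {
    float cond  = (float)(0.0f > adjustment);                /* 1 for negative adjustments */
    float slope = 1.0f + adjustment * (1.0f - 2.0f * cond);  /* 1 + abs(adjustment) */
    float c = cond - (cond - coverage) * slope;
    return c > 1.0f ? 1.0f : (c > 0.0f ? c : 0.0f);         /* clamp(c, 0, 1) */
}
```

&lt;p&gt;With +0.2 a coverage of 0.5 becomes 0.6 (bolder), with -0.2 it becomes 0.4 (thinner); values that would leave the 0..1 range are clamped.&lt;/p&gt;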
&lt;h3&gt;So which approach to use?&lt;/h3&gt;
&lt;p&gt;I don&#039;t know.
In my experiments so far the linear coverage adjustment tends to be sharper and it&#039;s probably faster as well.
But on the other hand the &lt;code&gt;pow()&lt;/code&gt; approach makes a softer falloff that maybe gives a bit better anti-aliasing.
I have to use this for a while to actually come to a conclusion about that.
My current plan is to use no coverage adjustment by default, except for colored source code. There I&#039;m planning to use the linear coverage adjustment with +0.2.&lt;/p&gt;
&lt;p&gt;You can send it along as a vertex attribute (or rather per rectangle attribute) instead of a uniform so each glyph can do its own thing.
Then you can select what you need for the occasion while everything still (usually) happens in one draw call.&lt;/p&gt;
&lt;p&gt;Maybe I can use linear +0.2 or even +0.4 as a very poor hinting hack for small font sizes like 8pt?
Start at 10pt with +0 and then scale it up to +0.4 at 6pt or so? No idea, more testing is needed (read that in a GLaDOS voice if you want).&lt;/p&gt;
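&lt;p&gt;That idea is untested, but the interpolation itself is simple. Here is a C sketch with the numbers from above (made-up function name; +0 at 10pt and larger, +0.4 at 6pt and smaller, linear in between). The result could then be sent along as the per-rectangle coverage adjustment mentioned above.&lt;/p&gt;

```c
/* Hypothetical size-dependent linear coverage adjustment:
   10pt and larger: +0, 6pt and smaller: +0.4, linear in between. */
static float coverage_adjustment_for_size(float size_pt) {
    const float no_adjust_at   = 10.0f;  /* font size with no adjustment */
    const float full_adjust_at = 6.0f;   /* font size with the full adjustment */
    const float full_adjust    = 0.4f;
    float t = (no_adjust_at - size_pt) / (no_adjust_at - full_adjust_at);
    t = t > 1.0f ? 1.0f : (t > 0.0f ? t : 0.0f);  /* clamp(t, 0, 1) */
    return t * full_adjust;
}
```

&lt;p&gt;With these numbers 8pt ends up at +0.2.&lt;/p&gt;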
&lt;p&gt;Another note:
By distorting the outline we&#039;re reducing the range where we&#039;re actually doing anti-aliasing.
If you overdo it the font will look less and less smooth and more and more jagged.
Small font features might even vanish if they don&#039;t have a fully filled subpixel in them.
But even with those caveats it can make text quite a bit more readable in some situations.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-pre-mult-alpha&quot;&gt;Optional: Pre-multiplied alpha&lt;/h2&gt;
&lt;p&gt;This part isn&#039;t really necessary for text rendering so if you&#039;re here just for the text rendering feel free to skip it.&lt;/p&gt;
&lt;p&gt;Pre-multiplied alpha becomes very useful as soon as stuff in your UI can get transparent (backgrounds, borders, text, etc.).
Then you have to blend transparent stuff on top of each other within the shader and normal alpha blending kinda breaks apart at that point.
I stumbled upon the topic while figuring out how to properly blend transparent borders on top of a transparent background color.&lt;/p&gt;
&lt;p&gt;After a lot of hit-and-miss I finally found the &lt;a href=&quot;https://www.w3.org/TR/compositing-1/&quot;&gt;W3C Compositing and Blending Level 1 spec&lt;/a&gt;.
More specifically in section &lt;a href=&quot;https://www.w3.org/TR/compositing-1/#simplealphacompositing&quot;&gt;5.1. Simple alpha compositing&lt;/a&gt; it spells out the formulas for alpha compositing:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The formula for simple alpha compositing is&lt;/p&gt;
&lt;p&gt;co = Cs x αs + Cb x αb x (1 - αs)&lt;/p&gt;
&lt;p&gt;Where&lt;br /&gt;
    co: the premultiplied pixel value after compositing&lt;br /&gt;
    Cs: the color value of the source graphic element being composited&lt;br /&gt;
    αs: the alpha value of the source graphic element being composited&lt;br /&gt;
    Cb: the color value of the backdrop&lt;br /&gt;
    αb: the alpha value of the backdrop&lt;/p&gt;
&lt;p&gt;The formula for the resultant alpha of the composite is&lt;/p&gt;
&lt;p&gt;αo = αs + αb x (1 - αs)&lt;/p&gt;
&lt;p&gt;Where&lt;br /&gt;
    αo: the alpha value of the composite&lt;br /&gt;
    αs: the alpha value of the graphic element being composited&lt;br /&gt;
    αb: the alpha value of the backdrop&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you know your blend equations you will have noticed that the destination term (aka backdrop) contains two multiplications instead of just one: Cb x αb x (1 - αs).
The hardware blender can&#039;t do that. Or at least OpenGL doesn&#039;t provide the blend equations for that as far as I know.
And here the pre-multiplied alpha trick comes in handy: Just multiply the RGB components of a color with the alpha value before blending:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Often, it can be more efficient to store a pre-multiplied value for the color and opacity. The pre-multiplied value is given by&lt;/p&gt;
&lt;p&gt;cs = Cs x αs&lt;/p&gt;
&lt;p&gt;with&lt;br /&gt;
    cs: the pre-multiplied value&lt;br /&gt;
    Cs: the color value&lt;br /&gt;
    αs: the alpha value&lt;/p&gt;
&lt;p&gt;Thus the formula for simple alpha compositing using pre-multiplied values becomes&lt;/p&gt;
&lt;p&gt;co = cs + cb x (1 - αs)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And that&#039;s why the blend function above was set to &lt;code&gt;glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR)&lt;/code&gt;.
The demo converts the text color to pre-multiplied alpha &lt;a href=&quot;https://github.com/arkanis/gl-4.5-subpixel-text-rendering/blob/17f4af4df858c52092ccad7c4292e7e4cd08091b/main.c#L318&quot;&gt;in the vertex shader&lt;/a&gt;.&lt;/p&gt;
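&lt;p&gt;To illustrate what the blender does with &lt;code&gt;glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR)&lt;/code&gt;, here is a CPU emulation in C: the first fragment shader output is added as-is and the backdrop is scaled per channel by one minus the second output. How the two outputs are filled is an assumption for this sketch (pre-multiplied text color times subpixel coverage, and text alpha times subpixel coverage), not necessarily what the demo does. All names are hypothetical.&lt;/p&gt;

```c
/* Hypothetical RGB triple, one float per subpixel (0..1) */
typedef struct { float r, g, b; } rgb_t;

/* Emulates the fixed function blender for
 * glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC1_COLOR):
 *   result = src0 * 1 + dst * (1 - src1), per channel */
rgb_t blend_dual_source(rgb_t src0, rgb_t src1, rgb_t dst) {
	rgb_t o = {
		src0.r + dst.r * (1 - src1.r),
		src0.g + dst.g * (1 - src1.g),
		src0.b + dst.b * (1 - src1.b)
	};
	return o;
}

/* Assumed shader outputs for one pixel of text: straight text color and
 * alpha, plus the coverage of each subpixel. Every subpixel gets its own
 * alpha (alpha * coverage) in the second output. */
rgb_t blend_subpixel_text(rgb_t color, float alpha, rgb_t coverage, rgb_t dst) {
	rgb_t src0 = { color.r * alpha * coverage.r, color.g * alpha * coverage.g, color.b * alpha * coverage.b };
	rgb_t src1 = { alpha * coverage.r, alpha * coverage.g, alpha * coverage.b };
	return blend_dual_source(src0, src1, dst);
}
```

&lt;p&gt;Because &lt;code&gt;GL_ONE_MINUS_SRC1_COLOR&lt;/code&gt; is a per-channel factor, every subpixel gets its own blend weight. A single alpha value per pixel could not express that.&lt;/p&gt;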
&lt;p&gt;Pre-multiplied alpha has a lot of nice properties and uses, especially in gaming (e.g. mipmaps don&#039;t cause color artifacts).
Alan Wolfe wrote a &lt;a href=&quot;https://computergraphics.stackexchange.com/a/5612&quot;&gt;good (and short) summary&lt;/a&gt; on StackExchange.
And &lt;a href=&quot;https://limnu.com/premultiplied-alpha-primer-artists/&quot;&gt;What is Premultiplied Alpha? A Primer for Artists&lt;/a&gt; by David Hart shows quite nicely the effects of pre-multiplication as masking.
The post contains nice pictures and gives you a good intuitive understanding of how to use pre-multiplied alpha for compositing.
No need to repeat all that here.&lt;/p&gt;
&lt;p&gt;A note on the API and implementation, though:
I only use 8 bit straight alpha colors (aka not pre-multiplied) in the entire API and for all CPU-side code.
This keeps the API simpler and a lot less confusing.
The vertex shader then converts those to pre-multiplied alpha.
In the vertex shader the colors are float vectors (&lt;code&gt;vec4&lt;/code&gt;) and we don&#039;t lose precision if the multiplication happens there.
Doing the pre-multiplication on the CPU with 8 bit ints would probably lose a lot of precision and might cause banding artifacts in dark regions.
I never really tested that, though, so it might be a bit over-the-top.&lt;/p&gt;
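&lt;p&gt;To get a rough feeling for that precision loss (a counting exercise, not a test for actual banding), the following C sketch pre-multiplies all 256 possible 8 bit values of a channel by a given 8 bit alpha, rounds the result back to 8 bit and counts how many distinct values are left. The function name is made up.&lt;/p&gt;

```c
/* Counts how many distinct 8 bit values survive when every possible 8 bit
 * channel value (0..255) is pre-multiplied by alpha/255 and rounded back to
 * 8 bit, like a CPU-side pre-multiplication with integers would do. */
int distinct_premultiplied_values(int alpha) {
	int seen[256] = { 0 };
	int count = 0;
	for (int value = 0; value != 256; value++) {
		/* integer rounding of value * alpha / 255 */
		int premultiplied = (value * alpha + 127) / 255;
		if (!seen[premultiplied]) {
			seen[premultiplied] = 1;
			count++;
		}
	}
	return count;
}
```

&lt;p&gt;With an alpha of 26 (about 10%) only 27 of the 256 input values stay distinguishable, while a float multiplication in the shader keeps all 256 of them distinct.&lt;/p&gt;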
&lt;p&gt;For images (icons, etc.) I do the pre-multiplication in the fragment shader after the texture read.
You could do the pre-multiplication once on the CPU (with the same potential problems as above) but one multiplication in the shader doesn&#039;t kill me.
On mobile you might think differently.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-paths-not-taken&quot;&gt;Paths not taken&lt;/h1&gt;
&lt;p&gt;As I became a more experienced programmer I realized more and more that the things you &lt;em&gt;don&#039;t&lt;/em&gt; do are actually way more important than the things you do.
It&#039;s oh so easy to build a complex mess of stuff that becomes magical simply because nobody understands it anymore.
But building something that does the job well enough and where everybody thinks &amp;quot;well, that&#039;s easy, how about adding X?&amp;quot;. Well, that is hard.&lt;/p&gt;
&lt;p&gt;Programmers (and many other disciplines) don&#039;t talk nearly enough about that.
So here I do, and hopefully fewer people will have to walk the same paths in vain, repeating the exact same errors for the exact same wrong reasons.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-gamma-correction&quot;&gt;Paths not taken: The gamma correction rabbit hole&lt;/h2&gt;
&lt;p&gt;If you read about subpixel text rendering you&#039;ll pretty quickly stumble upon gamma-correction.&lt;/p&gt;
&lt;p&gt;What is it? Stewart Lynch from PureDev Software has a good introduction for programmers: &lt;a href=&quot;http://www.puredevsoftware.com/blog/2019/01/21/gamma-encoding/&quot;&gt;Gamma Encoding&lt;/a&gt;.
In a nutshell: &amp;quot;The brain can distinguish more dark shades than light [shades]&amp;quot;.
Instead of directly storing the brightness of each color channel in normal RGB colors the values are distorted a bit towards darker shades.
With 8 bit integers this gives us a lot more values for darker shades, and for our perception this is where it counts.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;06_gamma_gamma-encoded-values.png&quot; style=&quot;max-width: 100%;&quot;&gt;
    &lt;figcaption&gt;
        Figure 1 from Stewart Lynch&#039;s article: &quot;Shows how brightness values are encoded into 8 bits (0 – 255)&quot;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;A word about naming here: various posts use various names for the different color spaces, but I&#039;ll stick to these two:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Perceptual sRGB aka: &amp;quot;normal&amp;quot; RGB colors, gamma-encoded values, &amp;quot;encoded&amp;quot;, &amp;quot;in sRGB colorspace&amp;quot;, &amp;quot;RGB (perceptual)&amp;quot; in GIMP&lt;/li&gt;
&lt;li&gt;Linear sRGB aka: linear colors, decoded values, &amp;quot;in linear colorspace&amp;quot;, &amp;quot;RGB (linear)&amp;quot; in GIMP&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Perceptual sRGB colors are your standard run-of-the-mill RGB values you see everywhere. Doubling a value gives you something that looks twice as bright to our human eyes (well, somewhat).&lt;/p&gt;
&lt;p&gt;With linear sRGB, doubling the value doubles the amount of physical brightness (photons) that reaches your eyes. It doesn&#039;t necessarily look like that to us humans, see the above image.
But when you calculate light intensities in 3D games this is what you want to work with.&lt;/p&gt;
&lt;p&gt;You can translate between the two color spaces with a &amp;quot;transfer function&amp;quot; (fancy name):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;linear_srgb_color     = pow(perceptual_srgb_color, 2.2)
perceptual_srgb_color = pow(linear_srgb_color, 1.0 / 2.2)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here 2.2 is &amp;quot;gamma&amp;quot; or &amp;quot;γ&amp;quot;. And when you do stuff with linear sRGB colors it&#039;s usually called &amp;quot;gamma correction&amp;quot;.&lt;/p&gt;
&lt;p&gt;Note: This is a simplified transfer function that works most of the time.
The &lt;a href=&quot;https://en.wikipedia.org/wiki/SRGB#Transformation&quot;&gt;official function&lt;/a&gt; is a bit more complicated but the results are nearly the same, so most don&#039;t care.
But when you use OpenGL&#039;s sRGB support you should also use the official one, otherwise dark shades get distorted, see &lt;a href=&quot;#gl45-subpixel-text-rendering-gamma-opengl-srgb&quot;&gt;below&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-perceptual-blending-artifacts&quot;&gt;Blending artifacts in perceptual sRGB&lt;/h3&gt;
&lt;p&gt;John Novak wrote a good article that shows the effects of gamma correction nicely: &lt;a href=&quot;https://blog.johnnovak.net/2016/09/21/what-every-coder-should-know-about-gamma/&quot;&gt;What every coder should know about gamma&lt;/a&gt;.
In there he makes a good point that blending two perceptual sRGB colors causes some artifacts.&lt;/p&gt;
&lt;figure&gt;
    &lt;img src=&quot;06_gamma_color-blending.jpg&quot; style=&quot;max-width: 100%;&quot;&gt;
    &lt;figcaption&gt;
        Figure 8 from John Novak&#039;s article: &quot;Effects of gamma-incorrect colour blending.
        On the left gamma-correct image, the option Blend RGB Colors Using Gamma 1.0 was enabled in Photoshop CS6, on the right it was disabled (that’s the default gamma-incorrect legacy mode).&quot;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Doing the blending with linear sRGB colors instead avoids those artifacts but often results in a brighter look.
A linear color space is really important in computer graphics where you actually calculate light intensities and stuff.
Not using linear sRGB can really mess things up there and ruin your weekend (or rather many weekends) so it&#039;s a rather sensitive topic for some.&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-in-font-rendering-gamma-results&quot;&gt;Applying that to font rendering&lt;/h3&gt;
&lt;p&gt;Ok, but back to font rendering now. After reading all this my reasoning was:
We have a coverage value for each subpixel. A coverage of 50% means 50% of light comes through from the background.
So we should use our coverage values as light intensities and do the blending with linear sRGB colors, right?&lt;/p&gt;
&lt;p&gt;But when you do subpixel font rendering and blend in linear sRGB this happens:&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;07_gamma-correction_03_linear-srgb_official-transfer-function.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;Various scenarios:&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_03_linear-srgb_official-transfer-function.png&quot;&gt;linear sRGB&lt;/button&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_01_perceptual-srgb.png&quot;&gt;perceptual sRGB&lt;/button&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_04_gamma-1.43.png&quot;&gt;gamma 1.43&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Light to dark text:&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_07_light-to-dark_linear-srgb_official-transfer-function.png&quot;&gt;linear sRGB&lt;/button&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_05_light-to-dark_perceptual-srgb.png&quot;&gt;perceptual sRGB&lt;/button&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_08_light-to-dark_gamma-1.43.png&quot;&gt;gamma 1.43&lt;/button&gt;
        &lt;/p&gt;
        &lt;!--
        &lt;p&gt;Gamma 2.2 shortcut:&lt;/p&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_02_linear-srgb_gamma-2.2-shortcut.png&quot;              &gt;various scenarios&lt;/button&gt;
            &lt;button data-img-src=&quot;07_gamma-correction_06_light-to-dark_linear-srgb_gamma-2.2-shortcut.png&quot;&gt;light to dark text&lt;/button&gt;
        &lt;/p&gt;
        --&gt;
        &lt;p&gt;Effect of gamma correction on subpixel font rendering on two different examples.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Linear sRGB&lt;/i&gt;: Blending is done in linear space using OpenGL&#039;s &lt;code&gt;GL_FRAMEBUFFER_SRGB&lt;/code&gt;. Input colors are converted with the &lt;a href=&quot;https://en.wikipedia.org/wiki/SRGB#From_sRGB_to_CIE_XYZ&quot;&gt;official sRGB transfer function&lt;/a&gt;.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Perceptual sRGB&lt;/i&gt;: Blending in perceptual sRGB color space, aka alpha blending normal colors.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Gamma 1.43&lt;/i&gt;: Manual blending in the fragment shader with a set background color and a custom gamma of 1.43. Transparent text colors don&#039;t work here (never bothered to implement them in this mode).&lt;/p&gt;
        &lt;!--
        &lt;p&gt;&lt;i&gt;Gamma 2.2 shortcut&lt;/i&gt;: Same as &quot;linear sRGB&quot; but input colors are converted with the simpler &lt;code&gt;pow(rgb, 2.2)&lt;/code&gt; transfer function instead of the more complex official one which OpenGL uses. The missmatch distorts darker shades.&lt;/p&gt;
        --&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Notice the unbalanced font weights in &amp;quot;linear sRGB&amp;quot; and &amp;quot;gamma 1.43&amp;quot;?
Black on white looks thinner while white on black looks almost bold.
The &amp;quot;light to dark text&amp;quot; example shows this especially well.&lt;/p&gt;
&lt;p&gt;But this is the &amp;quot;correct&amp;quot; way, right? This led me on a merry chase to restore that black and white balance while still blending in linear sRGB color space.
Adjusting the blending weights based on the text- and/or background color in various (sometimes scary) ways, deriving (or number crunching) polynomials to nudge the blend equation in different directions, etc.
All very complicated, increasingly obscure and usually causing other artifacts that need a new set of workarounds.
Needless to say, at some point I just pulled the plug on that. After a few days that felt like crazy scientist experiments, that is. 😄&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-skia&quot;&gt;What the Skia people say about this&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://skia.org/&quot;&gt;Skia&lt;/a&gt; is the 2D graphics library used by Firefox and Chrome, and they do a pretty good job of font rendering.
It&#039;s interesting what they say about the topic. Taken from &lt;a href=&quot;https://skia.org/docs/dev/design/raster_tragedy/&quot;&gt;The Raster Tragedy in Skia&lt;/a&gt;, emphasis mine.
Note: &amp;quot;Linear blend function&amp;quot; means alpha blending here I think.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Skia does not convert into a linear space, apply the linear blend, and convert back to the encoded space. If the destination color space does not have a linear encoding this will lead to ‘incorrect’ blending. The idea is that there are essentially two kinds of users of Skia. First there are existing systems which are already using a non-linear encoding with a linear blend function. While the blend isn’t correct, these users generally don’t want anything to change due to expectations. Second there are those who want everything done correctly and they are willing to pay for a linearly encoded destination in which the linear blend function is correct.&lt;/p&gt;
&lt;p&gt;For bi-level glyph rendering a pixel is either covered or not, so there are no coverage blending issues.&lt;/p&gt;
&lt;p&gt;For regular full pixel partial coverage (anti-aliased) glyph rendering the user may or may not want correct linear blending. &lt;strong&gt;In most non-linear encodings, using the linear blend function tends to make black on white look slightly heavier, using the pixel grid as a kind of contrast and optical sizing enhancement. It does the opposite for white on black, often making such glyphs a bit under-covered. However, this fights the common issue of blooming where light on dark on many displays tends to appear thicker than dark on light. (The black not being fully black also contributes.) If the pixels are small enough and there is proper optical sizing and perhaps anti-aliased drop out control (these latter two achieved either manually with proper font selection or ‘opsz’, automatically, or through hinting) then correct linear blending tends to look great. Otherwise black on white text tends to (correctly) get really anemic looking at small sizes.&lt;/strong&gt; So correct blending of glyph masks here should be left up to the user of Skia. If they’re really sophisticated and already tackled these issues then they may want linear blending of the glyphs for best effect. Otherwise the glyphs should just keep looking like they used to look due to expectations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For subpixel partial coverage (subpixel anti-aliased) glyph masks linear blending in a linear encoding is more or less required to avoid color fringing effects.&lt;/strong&gt; The intensity of the subpixels is being directly exploited so needs to be carefully controlled. The subpixels tend to alleviate the issues with no full coverage (though still problematic if blitting text in one of the display’s primaries). One will still want optical sizing since the glyphs will still look somewhat too light when scaled down linearly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;How I interpret this: If you don&#039;t go all in and do &amp;quot;proper optical sizing and perhaps anti-aliased drop out control&amp;quot; don&#039;t blend in the linear sRGB color space.&lt;/p&gt;
&lt;p&gt;The 2nd part is interesting, though. I&#039;m doing subpixel anti-aliasing here but the color fringes are pretty much gone. My guess is that this part refers to subpixel anti-aliasing with full-pixel alpha blending. There the color fringes are a lot harder to fight. You can see this in the &amp;quot;Subpixel pos.&amp;quot; image back in &lt;a href=&quot;#gl45-subpixel-text-rendering-how-it-works&quot;&gt;How it works&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Well, this whole project is about a sweet spot between quality and simplicity, so linear sRGB isn&#039;t the right tool for the job here.
Perceptual sRGB colors just do a better job in our situation, with fewer artifacts. At least when combined with dual source blending.
Artifacts are still there, just fewer than with a linear sRGB color space.&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-gradients&quot;&gt;Another way to look at it: Human perception of gradients&lt;/h3&gt;
&lt;p&gt;The perceptual and linear sRGB color spaces both have their problems.
Björn Ottosson wrote a nice post about that: &lt;a href=&quot;https://bottosson.github.io/posts/colorwrong/&quot;&gt;How software gets color wrong&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Both color spaces have parts where they come close to human perception and parts where they fail, even if linear sRGB does better with most colors.
Of special interest, however, is how we perceive the gradient between black and white, because that is essentially what we blend along for black on white or white on black text.&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;08_gamma_whiteblack.png&quot; style=&quot;background: white; max-width: 100%;&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;&lt;a href=&quot;https://bottosson.github.io/posts/colorwrong/#comparisons&quot;&gt;Comparisons&lt;/a&gt; - White and black from Björn Ottosson&#039;s article:&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Perceptual blend&lt;/i&gt;: A smooth transition using a model designed to mimic human perception of color. The blending is done so that the perceived brightness and color varies smoothly and evenly.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Linear blend&lt;/i&gt;: A model for blending color based on how light behaves physically. [I call this linear sRGB in this post.] This type of blending can occur in many ways naturally, for example when colors are blended together by focus blur in a camera or when viewing a pattern of two colors at a distance.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;sRGB blend&lt;/i&gt;: This is how colors would normally be blended in computer software, using sRGB to represent the colors. [Called perceptual sRGB in this post.]&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;With that gradient in mind the result of blending in linear sRGB becomes somewhat apparent:
In the linear blend almost all parts of the gradient are bright.
Now apply that to the gradient at the outline of a glyph (100% coverage inside, 0% outside): most of the outline ends up in the bright part of the gradient.&lt;/p&gt;
&lt;p&gt;If the glyph is bright, this adds to the area of the glyph, meaning it becomes bolder.
If the background is bright, this adds area to the background or, put another way, removes area from the dark glyph, meaning it becomes thinner.
That fits how blending in linear sRGB distorts black on white or white on black text.&lt;/p&gt;
&lt;p&gt;Black to white gradients in perceptual sRGB are way closer to human perception, thus the fonts have a balanced font weight.&lt;/p&gt;
&lt;p&gt;Maybe the results would be better if we do the blending in a more &lt;a href=&quot;https://en.wikipedia.org/wiki/Color_appearance_model&quot;&gt;perceptually uniform&lt;/a&gt; color space.
Meaning a color space where &amp;quot;identical spatial distance between two colors equals identical amount of perceived color difference&amp;quot; (from the linked Wikipedia page).
Basically the &amp;quot;perceptual blend&amp;quot; above.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://bottosson.github.io/posts/oklab/&quot;&gt;Oklab color space&lt;/a&gt; by Björn Ottosson again seems like an interesting candidate for that.
To be honest perceptual sRGB does a pretty good job already and we could only do custom blending in the fragment shader with a known background color anyway.
But I couldn&#039;t get it out of my head, so I did a quick test:&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;09_gamma_oklab_combined_on.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            Various scenarios:
            &lt;button data-img-src=&quot;09_gamma_oklab_combined_on.png&quot;&gt;Oklab&lt;/button&gt;
            &lt;button data-img-src=&quot;09_gamma_oklab_combined_off.png&quot;&gt;perceptual sRGB&lt;/button&gt;
        &lt;/p&gt;&lt;p&gt;
            Light to dark text:
            &lt;button data-img-src=&quot;09_gamma_oklab_light-to-dark_on.png&quot;&gt;Oklab&lt;/button&gt;
            &lt;button data-img-src=&quot;09_gamma_oklab_light-to-dark_off.png&quot;&gt;perceptual sRGB&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Effects of blending in the Oklab color space when doing subpixel font rendering.
           Note that transparent colors are broken in both versions (again, didn&#039;t bother to implement that).&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Oklab&lt;/i&gt;: Convert colors into Oklab, blend and convert back. Each component is blended individually which probably breaks something.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Perceptual sRGB&lt;/i&gt;: Manual per-channel alpha blend in perceptual sRGB color space.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;So, yeah, that didn&#039;t work out. 😄 I guess it&#039;s better not to take that gradient metaphor too seriously.&lt;/p&gt;
&lt;p&gt;But then, we&#039;re talking about subpixels here and I&#039;m really not an expert on any of those topics.
Maybe that gradient thing is a good indicator of what&#039;s going on, maybe not. Color and brightness perception is a weird thing.
Take everything in this subsection with a grain of salt (or maybe with a whole pitcher of it).&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-unintended-consequences&quot;&gt;Other (unintended) consequences&lt;/h3&gt;
&lt;p&gt;Blending in linear sRGB gives different results by design. Usually we want that… but sometimes we don&#039;t.&lt;/p&gt;
&lt;figure class=&quot;side-by-side&quot;&gt;
    &lt;img src=&quot;10_gamma-effects_blending1_perceptual.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            Colors:
            &lt;button data-img-src=&quot;10_gamma-effects_blending1_perceptual.png&quot;&gt;perceptual&lt;/button&gt;
            &lt;button data-img-src=&quot;10_gamma-effects_blending1_linear.png&quot;&gt;linear&lt;/button&gt;
        &lt;/p&gt;&lt;p&gt;
            Icons and images:
            &lt;button data-img-src=&quot;10_gamma-effects_blending2_perceptual.png&quot;&gt;perceptual&lt;/button&gt;
            &lt;button data-img-src=&quot;10_gamma-effects_blending2_linear.png&quot;&gt;linear&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Effects of blending in the linear sRGB color space on things other than text rendering, for a change.&lt;/p&gt;
        &lt;p&gt;Use the buttons to switch between linear and perceptual to get an impression of the changes.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Colors&lt;/i&gt;: Text (top) and color (middle) blending tests inspired by images from John Novak&#039;s &lt;a href=&quot;https://blog.johnnovak.net/2016/09/21/what-every-coder-should-know-about-gamma/&quot;&gt;What every coder should know about gamma&lt;/a&gt; article.
           The bottom part is a border and corner radius test I used for debugging.&lt;/p&gt;
        &lt;p&gt;&lt;i&gt;Icons and images&lt;/i&gt;: Various image blending tests, 
           &lt;a href=&quot;https://www.pexels.com/photo/aerial-photography-of-rock-formation-2440079/&quot;&gt;Aerial Photography of Rock Formation&lt;/a&gt; by Ian Beckley,
           the icon for &quot;C&quot; files from &lt;a href=&quot;https://commons.wikimedia.org/wiki/GNOME_Desktop_icons&quot;&gt;GNOME Desktop icons&lt;/a&gt; (not sure it&#039;s from there)
           and a blending test image from Eric Haines&#039; article &lt;a href=&quot;https://www.realtimerendering.com/blog/png-srgb-cutoutdecal-aa-problematic/&quot;&gt;PNG + sRGB + cutout/decal AA = problematic&lt;/a&gt;.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The &amp;quot;Colors&amp;quot; scenario reflects my experience with linear sRGB blending so far: Yay, some artifacts are gone (top), oh, the transparent colors look slightly different (middle), oh, the borders don&#039;t look like they should (bottom).
The border colors in the bottom part are a transparent blue. With perceptual sRGB blending this looks just as designed but linear sRGB blending results in a different color that&#039;s a lot closer to the overall background color.
Pretty unfortunate in this case.&lt;/p&gt;
&lt;p&gt;But that&#039;s the kicker: If your artwork (color scheme, images, icons, …) is made for perceptual sRGB blending, it will look off with linear sRGB blending (usually too bright).
And it seems most stuff is made for perceptual sRGB blending.
So if you do linear sRGB blending all the artwork has to be made for it as well.
It&#039;s not just about doing &amp;quot;correct&amp;quot; blending, you&#039;re literally blending differently and your artwork needs to reflect that.&lt;/p&gt;
&lt;p&gt;&amp;quot;Icons and images&amp;quot; shows the same dilemma but with images: The shadow of the &amp;quot;C&amp;quot; icon is made for perceptual sRGB blending. With linear sRGB blending the shadow loses a lot of its function.
A small thing, but it can be surprisingly annoying.&lt;/p&gt;
&lt;p&gt;The big image at the bottom is a pretty cool test image from Eric Haines&#039; article &lt;a href=&quot;https://www.realtimerendering.com/blog/png-srgb-cutoutdecal-aa-problematic/&quot;&gt;PNG + sRGB + cutout/decal AA = problematic&lt;/a&gt;.
It tests if blending is done according to the PNG spec. I&#039;ve added two white rectangles behind the right side so I can see all relevant cases at once.
With linear sRGB blending it comes out correct (most of it is as bright as the 50% coverage) and with perceptual sRGB blending you get a different result (not matching the PNG spec).
Needless to say, most do perceptual sRGB blending (e.g. browsers).&lt;/p&gt;
&lt;p&gt;Same dilemma: Blending and artwork have to match up, otherwise it looks off.
As long as everyone is doing (or assuming) the same thing (perceptual sRGB) it works out.
But as soon as someone mixes it up, confusion ensues.&lt;/p&gt;
&lt;p&gt;Bonus trick: In GIMP you can change how a layer is blended via its context menu. &amp;quot;Compositing Space&amp;quot; → &amp;quot;RGB (linear)&amp;quot; or &amp;quot;RGB (perceptual)&amp;quot;. Nice to quickly check what looks right.
With GIMP 2.10.30 the default is &amp;quot;RGB (linear)&amp;quot; to blend layers on top of each other but I have no idea what happens when the alpha channel is exported (e.g. as a PNG).&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-1.43&quot;&gt;That gamma 1.43 thing&lt;/h3&gt;
&lt;p&gt;I picked it up in &lt;a href=&quot;https://www.puredevsoftware.com/blog/2019/01/22/sub-pixel-gamma-correct-font-rendering/&quot;&gt;Sub-Pixel, Gamma Correct Font Rendering&lt;/a&gt; by Stewart Lynch. It&#039;s also mentioned in &lt;a href=&quot;https://blog.johnnovak.net/2016/09/21/what-every-coder-should-know-about-gamma/&quot;&gt;John Novak&#039;s article&lt;/a&gt; as 1.42 and it seems to originate from a Photoshop text rendering feature. As the story goes, fonts look too thin with gamma correction and a gamma of 2.2, so you use a gamma of 1.43 or so instead. Now they look as intended. Arguably because the fonts were designed for non-gamma-correct font rasterizers.
Personally, I guess font renderers in the past did way more aggressive hinting and fonts (as well as displays and resolutions) were generally smaller. Hence fonts were designed to compensate for that and were generally bolder.&lt;/p&gt;
&lt;p&gt;Anyway, I have no idea what Photoshop really did there.
Maybe they simply adjusted the coverages like my first approach above.
Or maybe they actually did blend in a color space with gamma 1.43. I don&#039;t know.
But blending in a color space with gamma 1.43 is neither perceptual sRGB nor linear sRGB.
At that point you&#039;re not correcting anything, you just blend in a weird mixed up color space that has fewer artifacts.
Light and dark fonts still become unbalanced, just not by as much (see the &amp;quot;light to dark text&amp;quot; gamma 1.43 image in &lt;a href=&quot;#gl45-subpixel-text-rendering-gamma-in-font-rendering-gamma-results&quot;&gt;Applying that to font rendering&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;But this gave me an interesting idea:
From a math point of view multiplying a color with coverages in a color space with gamma 1.43 is the same as multiplying the color directly with the coverages^(1/1.43).&lt;/p&gt;
&lt;!--
&lt;math display=block style=&quot;font-family: SourceSans;&quot;&gt;
    &lt;mtable&gt;
        &lt;mtr&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mrow&gt;
                        &lt;mo&gt;(&lt;/mo&gt;
                        &lt;msup&gt;
                            &lt;mi&gt;color&lt;/mi&gt;
                            &lt;mn&gt;1.43&lt;/mn&gt;
                        &lt;/msup&gt;
                        &lt;mo&gt;*&lt;/mo&gt;
                        &lt;mi&gt;coverages&lt;/mi&gt;
                        &lt;mo&gt;)&lt;/mo&gt;
                    &lt;/mrow&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;=&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mrow&gt;
                        &lt;mo&gt;(&lt;/mo&gt;
                        &lt;msup&gt;
                            &lt;mi&gt;color&lt;/mi&gt;
                            &lt;mn&gt;1.43&lt;/mn&gt;
                        &lt;/msup&gt;
                        &lt;mo&gt;)&lt;/mo&gt;
                    &lt;/mrow&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;*&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;coverages&lt;/mi&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
        &lt;/mtr&gt;
        &lt;mtr&gt;
            &lt;mtd&gt;&lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;=&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;color&lt;/mi&gt;
                    &lt;mrow&gt;
                        &lt;mo&gt;(&lt;/mo&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                        &lt;mo&gt;*&lt;/mo&gt;
                        &lt;mfrac&gt;
                            &lt;mn&gt;1&lt;/mn&gt;
                            &lt;mn&gt;1.43&lt;/mn&gt;
                        &lt;/mfrac&gt;
                        &lt;mo&gt;)&lt;/mo&gt;
                    &lt;/mrow&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;*&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;coverages&lt;/mi&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
        &lt;/mtr&gt;
        &lt;mtr&gt;
            &lt;mtd&gt;&lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;=&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;color&lt;/mi&gt;
                    &lt;mrow&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                    &lt;/mrow&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;*&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;coverages&lt;/mi&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
        &lt;/mtr&gt;
        &lt;mtr&gt;
            &lt;mtd&gt;&lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;=&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mi&gt;color&lt;/mi&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;mo&gt;*&lt;/mo&gt;
            &lt;/mtd&gt;
            &lt;mtd&gt;
                &lt;msup&gt;
                    &lt;mi&gt;coverages&lt;/mi&gt;
                    &lt;mfrac&gt;
                        &lt;mn&gt;1&lt;/mn&gt;
                        &lt;mn&gt;1.43&lt;/mn&gt;
                    &lt;/mfrac&gt;
                &lt;/msup&gt;
            &lt;/mtd&gt;
        &lt;/mtr&gt;
    &lt;/mtable&gt;
&lt;/math&gt;
--&gt;
&lt;pre&gt;
&lt;span style=&quot;width: 16em; display: inline-block;&quot;&gt;(color&lt;sup&gt;1.43&lt;/sup&gt; * coverages)&lt;sup&gt;1/1.43&lt;/sup&gt;&lt;/span&gt; = &lt;span style=&quot;width: 11em; display: inline-block;&quot;&gt;(color&lt;sup&gt;1.43&lt;/sup&gt;)&lt;sup&gt;1/1.43&lt;/sup&gt;&lt;/span&gt; * coverages&lt;sup&gt;1/1.43&lt;/sup&gt;
&lt;span style=&quot;width: 16em; display: inline-block;&quot;&gt;&lt;/span&gt; = &lt;span style=&quot;width: 11em; display: inline-block;&quot;&gt; color&lt;sup&gt;(1.43 * 1/1.43)&lt;/sup&gt;&lt;/span&gt; * coverages&lt;sup&gt;1/1.43&lt;/sup&gt;
&lt;span style=&quot;width: 16em; display: inline-block;&quot;&gt;&lt;/span&gt; = &lt;span style=&quot;width: 11em; display: inline-block;&quot;&gt; color&lt;sup&gt;1&lt;/sup&gt;&lt;/span&gt; * coverages&lt;sup&gt;1/1.43&lt;/sup&gt;
&lt;span style=&quot;width: 16em; display: inline-block;&quot;&gt;&lt;/span&gt; = &lt;span style=&quot;width: 11em; display: inline-block;&quot;&gt; color&lt;/span&gt; * coverages&lt;sup&gt;1/1.43&lt;/sup&gt;
&lt;/pre&gt;
&lt;p&gt;At some point it clicked what that strange idea means for the coverages themselves: they end up raised to the power of 1/1.43, which distorts the gradient at the glyph outline.
Hence the coverage adjustment was born, and I left the blending alone (read: I&#039;m doing it in perceptual sRGB space).
And lo and behold, the results were finally uniform again between light and dark text.
Balance was restored. 🙂&lt;/p&gt;
&lt;p&gt;And I could adjust the exponent for thinner or bolder looks and get predictable results across different situations.&lt;/p&gt;
&lt;h3 id=&quot;gl45-subpixel-text-rendering-gamma-opengl-srgb&quot;&gt;Final gamma aside: OpenGL&#039;s sRGB support&lt;/h3&gt;
&lt;p&gt;By default OpenGL simply assumes linear behaviour and doesn&#039;t care about color spaces and such things.
If you feed perceptual sRGB colors into OpenGL it will do blending in that color space. If you feed it linear sRGB colors it blends in this space.
There is no magic happening behind your back.&lt;/p&gt;
&lt;p&gt;But if you set it up carefully OpenGL will do most of the color conversions for you.
You can leave your image, texture and framebuffer data in perceptual sRGB but your shader can work in linear sRGB and the blending happens in linear sRGB as well.
Oh, and the conversions are hardware accelerated, too.
You just have to tell OpenGL what&#039;s what so it can do the work for you.&lt;/p&gt;
&lt;p&gt;For textures: You can create textures with sRGB formats (e.g. &lt;code&gt;GL_SRGB8_ALPHA8&lt;/code&gt;).
This tells OpenGL that the texture data is in the perceptual sRGB color space.
When you read the texture data in your shader OpenGL will now automatically convert the texture data to linear sRGB for you.
You just have to set the proper texture format, and that&#039;s that.&lt;/p&gt;
&lt;p&gt;For framebuffers: With &lt;code&gt;glEnable(GL_FRAMEBUFFER_SRGB)&lt;/code&gt; you tell OpenGL that your framebuffer is in the perceptual sRGB color space while your fragment shader outputs linear sRGB colors.
Blending is then done in linear sRGB color space: Read previous framebuffer value in perceptual sRGB, convert it to linear sRGB and then blend it with your fragment shader output.
After that the blend result is converted back to perceptual sRGB and written back to the framebuffer.&lt;/p&gt;
&lt;p&gt;You might have to ask for an sRGB capable default framebuffer on window creation, though.
In SDL you can do this via &lt;code&gt;SDL_GL_SetAttribute(SDL_GL_FRAMEBUFFER_SRGB_CAPABLE, 1)&lt;/code&gt;.
Not all drivers seem to need that.
And you don&#039;t need that when you render into a frame- or renderbuffer of your own.
Just use an sRGB texture format for that buffer. I haven&#039;t done that in my experiments, though.&lt;/p&gt;
&lt;p&gt;For vertex shader inputs:
You have to manually convert colors you pass as uniforms or vertex attributes.
OpenGL doesn&#039;t know which vec4 is a color and which is a position or whatever, so it can&#039;t help you there.
But you have to do that with the &lt;a href=&quot;https://en.wikipedia.org/wiki/SRGB#From_sRGB_to_CIE_XYZ&quot;&gt;high-precision gamma transfer function&lt;/a&gt; so your work lines up with what OpenGL does.&lt;/p&gt;
&lt;p&gt;Often people just use &lt;code&gt;pow(rgb, 2.2)&lt;/code&gt; and &lt;code&gt;pow(rgb, 1/2.2)&lt;/code&gt; as transfer functions because they&#039;re simpler and mostly good enough.
But here this leads to a mismatch: We convert our colors to linear sRGB with a different function than the one OpenGL uses to convert them back to perceptual sRGB later.
As a result darker shades get distorted a bit (starting at an 8-bit brightness of ~120 and visible at ~16 or ~8).
Using the official high-precision transfer functions avoids those artifacts and everything lines up nicely between our code and OpenGL / the hardware.&lt;/p&gt;
&lt;p&gt;I&#039;ve implemented them in GLSL based on the linked Wikipedia page above. Finally a nice chance to use &lt;code&gt;mix()&lt;/code&gt; to select via a &lt;code&gt;bool&lt;/code&gt; vector. 🙂&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;lang_glsl&quot;&gt;&lt;span class=landmark_c&gt;vec3&lt;/span&gt; &lt;span class=variation_c&gt;color_srgb_to_linear&lt;/span&gt;(&lt;span class=landmark_c&gt;vec3&lt;/span&gt; rgb) {
    &lt;span class=landmark_b&gt;return&lt;/span&gt; &lt;span class=variation_c&gt;mix&lt;/span&gt;( rgb / &lt;span class=variation_a&gt;12.92&lt;/span&gt; , &lt;span class=variation_c&gt;pow&lt;/span&gt;((rgb + &lt;span class=variation_a&gt;0.055&lt;/span&gt;) / &lt;span class=variation_a&gt;1.055&lt;/span&gt;, &lt;span class=landmark_c&gt;vec3&lt;/span&gt;(&lt;span class=variation_a&gt;2.4&lt;/span&gt;)) , &lt;span class=variation_c&gt;greaterThan&lt;/span&gt;(rgb, &lt;span class=landmark_c&gt;vec3&lt;/span&gt;(&lt;span class=variation_a&gt;0.04045&lt;/span&gt;)) );
}

&lt;span class=landmark_c&gt;vec3&lt;/span&gt; &lt;span class=variation_c&gt;color_linear_to_srgb&lt;/span&gt;(&lt;span class=landmark_c&gt;vec3&lt;/span&gt; rgb) {
    &lt;span class=landmark_b&gt;return&lt;/span&gt; &lt;span class=variation_c&gt;mix&lt;/span&gt;( &lt;span class=variation_a&gt;12.92&lt;/span&gt; * rgb , &lt;span class=variation_a&gt;1.055&lt;/span&gt; * &lt;span class=variation_c&gt;pow&lt;/span&gt;(rgb, &lt;span class=landmark_c&gt;vec3&lt;/span&gt;(&lt;span class=variation_a&gt;1&lt;/span&gt; / &lt;span class=variation_a&gt;2.4&lt;/span&gt;)) - &lt;span class=variation_a&gt;0.055&lt;/span&gt; , &lt;span class=variation_c&gt;greaterThan&lt;/span&gt;(rgb, &lt;span class=landmark_c&gt;vec3&lt;/span&gt;(&lt;span class=variation_a&gt;0.0031308&lt;/span&gt;)) );
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And here is how it looks:&lt;/p&gt;
&lt;figure class=&quot;long-caption&quot;&gt;
    &lt;img src=&quot;11_gamma_transfer-func_official-srgb.png&quot;&gt;
    &lt;figcaption&gt;
        &lt;p&gt;
            &lt;button data-img-src=&quot;11_gamma_transfer-func_official-srgb.png&quot;&gt;Official sRGB transfer function&lt;/button&gt;
            &lt;button data-img-src=&quot;11_gamma_transfer-func_gamma-2.2-shortcut.png&quot;&gt;gamma 2.2 shortcut&lt;/button&gt;
            &lt;button data-img-src=&quot;11_gamma_transfer-func_perceptual.png&quot;&gt;perceptual sRGB&lt;/button&gt;
        &lt;/p&gt;
        &lt;p&gt;Effects of using the inaccurate gamma 2.2 sRGB transfer function with OpenGL&#039;s sRGB support.&lt;/p&gt;
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Notice that between the official sRGB transfer function and the gamma 2.2 shortcut the text stays the same but the darker background colors on the right get even darker with 2.2.
If you then compare them both to the perceptual sRGB output you see that the background colors line up with the official sRGB transfer function but gamma 2.2 distorts the darker shades.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://registry.khronos.org/OpenGL/extensions/EXT/EXT_framebuffer_sRGB.txt&quot;&gt;EXT_framebuffer_sRGB&lt;/a&gt; extension also contains an approximation of the transfer function.
You might want to use that as well but I haven&#039;t tested it (only found it later).
It&#039;s probably an even closer match to what OpenGL does.&lt;/p&gt;
&lt;p&gt;I thought this was pretty neat for UI work and I used it for some of the experiments shown above.
But it took some searching to assemble all the parts so I wanted to mention it all in one place.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-manual-blending-bg&quot;&gt;Paths not taken: Manual blending with a known background color&lt;/h1&gt;
&lt;p&gt;In some situations the background color is known. For example when displaying source code we probably want to use the color scheme&#039;s background color.
Then we can tell the fragment shader that background color and it can do any kind of blending it wants, just returning the final (non-transparent) color.&lt;/p&gt;
&lt;p&gt;You can use this to implement gamma-correct blending even if everything else happens in perceptual sRGB (e.g. so the images and colors look like the artists expect).
Or you can do fancy stuff to enhance contrast between the text and background color.&lt;/p&gt;
&lt;p&gt;I did a lot of fancy and scary experiments but in the end nothing was really convincing.
Blending in a linear sRGB color space caused more trouble than it fixed (see above) and the Oklab color space didn&#039;t help either.
Even though Oklab looks &lt;a href=&quot;https://aras-p.info/blog/2021/11/29/Gradients-in-linear-space-arent-better/&quot;&gt;great for color gradients&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And remember: This only works on top of a known solid background color.
And I would really like to blend on top of images, a game or use transparent UI elements.
So I shelved the entire line of inquiry for now.&lt;/p&gt;
&lt;p&gt;Just as an utterly over-the-top note:
If someone finds an exceptionally epic blending function for subpixel rendering and you really want to use it for everything, you can use order-independent transparency.
For each pixel you first collect all the colors of e.g. the topmost 4 layers and then combine them in a dedicated resolve pass.
And in that resolve pass you have the colors for foreground and background and can blend them however you want.&lt;/p&gt;
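&lt;p&gt;A CPU-side C sketch of that idea with a simple k-buffer: the collection pass keeps the 4 nearest fragments per pixel sorted by depth, and the resolve pass composites them back to front onto the background. This is a generic illustration, not one of the specific approaches from the presentation mentioned below. On the GPU the collection pass would typically need atomics. All names are hypothetical, and the resolve pass uses the plain &amp;quot;over&amp;quot; operator where a fancy subpixel blend function could go.&lt;/p&gt;

```c
#include "assert.h"

#define K 4 /* number of nearest layers kept per pixel */

typedef struct {
    float depth;   /* smaller is closer to the viewer */
    float rgb[3];
    float alpha;
} fragment_t;

typedef struct {
    fragment_t frags[K]; /* sorted front to back */
    int count;
} pixel_list_t;

/* Collection pass: insert a fragment, keeping only the K nearest ones. */
void collect_fragment(pixel_list_t *list, fragment_t f)
{
    assert(K >= list->count);
    int i = list->count;
    if (i == K) {
        if (f.depth >= list->frags[K - 1].depth)
            return;      /* behind all kept layers, drop it */
        i = K - 1;       /* the farthest kept layer gets replaced */
    } else {
        list->count++;
    }
    /* move farther fragments one slot back to keep the list sorted */
    while (i > 0) {
        if (!(list->frags[i - 1].depth > f.depth))
            break;
        list->frags[i] = list->frags[i - 1];
        i--;
    }
    list->frags[i] = f;
}

/* Resolve pass: foreground and background colors are both known here, so
   any blend function could be used. This one is the plain "over" operator. */
void resolve_pixel(const pixel_list_t *list, const float background[3], float out[3])
{
    for (int c = 0; c != 3; c++)
        out[c] = background[c];
    for (int i = list->count - 1; i >= 0; i--) {
        const fragment_t *f = list->frags + i;
        for (int c = 0; c != 3; c++)
            out[c] = f->rgb[c] * f->alpha + out[c] * (1.0f - f->alpha);
    }
}
```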
&lt;p&gt;Order-independent transparency is usually a thing in games to render transparent geometry when sorting doesn&#039;t work (or you don&#039;t want to do it).
Christoph Kubisch describes that in his presentation &lt;a href=&quot;https://on-demand.gputechconf.com/gtc/2014/presentations/S4385-order-independent-transparency-opengl.pdf&quot;&gt;Order Independent Transparency In OpenGL 4.x&lt;/a&gt;.
Specifically the &amp;quot;Atomic Loop 64-bit&amp;quot; approach looks pretty promising.&lt;/p&gt;
&lt;p&gt;We can abuse it for UIs if we really want to invest that complexity.
It&#039;s insanely over-the-top for normal UI work, but I thought I&#039;d mention it.
If someone is insane enough to do that, please let me know. 🙂&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-dynamic-outline-adjustment&quot;&gt;Paths not taken: Dynamically choosing coverage adjustment based on text and background color&lt;/h1&gt;
&lt;p&gt;I did that mostly as a workaround for artifacts caused by various kinds of &amp;quot;gamma correction&amp;quot;.&lt;/p&gt;
&lt;p&gt;I experimented with a coverage adjustment based on the perceived brightness of the text color, based on the old &lt;a href=&quot;http://alienryderflex.com/hsp.html&quot;&gt;HSP Color Model — Alternative to HSV (HSB) and HSL&lt;/a&gt; by Darel Rex Finley.
Oklab&#039;s L component might have been a better choice but I didn&#039;t know Oklab at that time.
It kinda worked anyway, but usually those workarounds just made one situation look good (e.g. colored source code) and caused artifacts in others (color bleeding, normal text becoming too bold, etc.).&lt;/p&gt;
&lt;p&gt;Pretty much the same happened when I played around with a known background color or even the difference between the text color and the background color.&lt;/p&gt;
&lt;p&gt;I found nothing that worked well in the general case.
And if the user knows about a special case (e.g. colored source code) they can simply set some special text rendering parameters (e.g. making the font slightly bolder with coverage adjustment).
So in the end I just settled on the coverage adjustments described above.
They&#039;re simple, relatively easy to understand and have a uniform result.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-future-ideas&quot;&gt;Ideas for the future&lt;/h1&gt;
&lt;p&gt;I still have a lot of interesting stuff bouncing around and here is a short list of the highlights.
Some of you are probably thinking about similar things right now (that is, if anyone ever arrives down here).&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-future-sdf-shadows&quot;&gt;Signed distance fields for text shadows&lt;/h2&gt;
&lt;p&gt;I don&#039;t want to use them for normal text rendering because finding the right field size to avoid artifacts is pretty difficult.
But at the same time I want a diffuse shadow behind text to enhance the contrast above images. In other words: a blurry text shadow.&lt;/p&gt;
&lt;p&gt;And for that signed distance fields should work just fine since fine details should become irrelevant anyway. stb_truetype can generate them as well.&lt;/p&gt;
&lt;p&gt;But I don&#039;t want to have a 50px signed distance field when someone requests a large blur radius.
Instead I&#039;m planning to add just 2px padding or so outside of the glyph (filled with signed distances).
When someone needs a distance outside of that I&#039;ll just look for the closest pixels and approximate the distance based on the field&#039;s direction there.&lt;/p&gt;
&lt;p&gt;With a bit of luck this should approximate the distance as a line (one outermost position) or a circle (two outermost positions one pixel apart from one another).
No idea if that will work, though.&lt;/p&gt;
&lt;p&gt;Signed distance fields might also be a nice fallback for very large font sizes.
If a glyph would be larger than e.g. 128px we can rasterize it as a signed distance field with a height of 128px and store that in an atlas texture.
Then the user can zoom in as much as they want while we simply keep using that same 128px field all the time.
Sure, we still might lose some glyph detail for extreme fonts but it&#039;s probably simpler than writing an entire curve renderer for outlines.
Should also work well with subpixel positioning and dual source blending, but at those font sizes it&#039;s probably a moot point.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-future-oversampling&quot;&gt;2x2 oversampling for animated text&lt;/h2&gt;
&lt;p&gt;Right now I only need horizontal text and it&#039;s not even animated.
But once I want animated text I&#039;ll try &lt;a href=&quot;https://github.com/nothings/stb/blob/master/tests/oversample/README.md&quot;&gt;Font character oversampling for rendering from atlas textures&lt;/a&gt; from stb_truetype.
Basically just render the glyph with double the horizontal and vertical resolution and use linear interpolation.
Sounds neat and simple and fits in nicely with the grayscale atlas I&#039;ll need for signed distance fields anyway.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-future-pango-directwrite&quot;&gt;Use DirectWrite and Pango when available&lt;/h2&gt;
&lt;p&gt;This is very far in the future. Mostly I want them for complex text layout.
Pango provides an API for that and I&#039;ve seen something about that in DirectWrite but I&#039;m not sure if it&#039;s usable.
It would also provide automatic replacement glyphs in case a font doesn&#039;t offer a glyph.
I guess we would get a more authentic system look, too.&lt;/p&gt;
&lt;p&gt;It&#039;s also a good point to take a closer look at horizontal hinting for small font sizes.
The 8pt output of FreeType looks pretty convincing to me and if they do horizontal hinting to achieve that so would I.
But it&#039;s probably tricky to pull off without derailing the texture atlas approach.
Maybe let FreeType do hinting on subpixel boundaries and then do something in the shader to nudge those boundaries to full pixel boundaries.
Or try to round the subpixel shift to full subpixels and see how that looks.&lt;/p&gt;
&lt;h2 id=&quot;gl45-subpixel-text-rendering-future-known-bg-color&quot;&gt;Do something useful when the background color is known&lt;/h2&gt;
&lt;p&gt;As a final part it still feels like you can do a lot more if the text and background colors are known.
I couldn&#039;t find anything that actually made things better, but that doesn&#039;t mean there is no such thing.
Granted, dual source blending probably eliminates the need for most tricks in that regard, but still… it feels like wasted potential.&lt;/p&gt;
&lt;h1 id=&quot;gl45-subpixel-text-rendering-references&quot;&gt;Other approaches and interesting references&lt;/h1&gt;
&lt;p&gt;I&#039;ll mostly just link to other stuff here. Explaining it all would take the post even further past the endurance threshold. Anyway, Aras Pranckevičius wrote a pretty nice post in 2017: &lt;a href=&quot;https://aras-p.info/blog/2017/02/15/Font-Rendering-is-Getting-Interesting/&quot;&gt;Font Rendering is Getting Interesting&lt;/a&gt;. It provides a good overview and has some pictures. It boils down to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rasterize on the CPU (FreeType, stb_truetype, whatever) and cache the result in a texture atlas&lt;/li&gt;
&lt;li&gt;Put signed distance fields in the atlas so you can scale the glyph on the GPU and still get sharp glyph shapes&lt;/li&gt;
&lt;li&gt;Rasterize the glyphs directly on the GPU&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The CPU rasterization and texture atlas approach is still the workhorse as far as I know.
FreeType, stb_truetype, DirectWrite, etc. do the rasterization and UIs like GTK then do the texture atlas thing. Same as I described in this post.
Pretty good quality but someone has to manage the texture atlas. It&#039;s the &amp;quot;classic&amp;quot; GPU assisted font rendering approach.&lt;/p&gt;
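&lt;p&gt;To give an idea of what managing the atlas involves, here is a C sketch of a very simple shelf packer: glyphs are placed left to right in rows, and a new row starts when the current one is full. This is just one common strategy, the names are hypothetical, and eviction or growing the texture when it fills up is left out.&lt;/p&gt;

```c
#include "assert.h"
#include "stdbool.h"

typedef struct {
    int width, height;      /* atlas size in texels */
    int shelf_x, shelf_y;   /* next free position in the current shelf */
    int shelf_height;       /* height of the tallest glyph in the current shelf */
} shelf_atlas_t;

void atlas_init(shelf_atlas_t *atlas, int width, int height)
{
    atlas->width = width;
    atlas->height = height;
    atlas->shelf_x = 0;
    atlas->shelf_y = 0;
    atlas->shelf_height = 0;
}

/* Finds a place for a w x h glyph. Returns false if it doesn't fit anymore. */
bool atlas_place(shelf_atlas_t *atlas, int w, int h, int *out_x, int *out_y)
{
    assert(w > 0);
    assert(h > 0);
    if (w > atlas->width)
        return false;
    if (atlas->shelf_x + w > atlas->width) {
        /* current shelf is full, open a new one below it */
        atlas->shelf_y += atlas->shelf_height;
        atlas->shelf_x = 0;
        atlas->shelf_height = 0;
    }
    if (atlas->shelf_y + h > atlas->height)
        return false;
    *out_x = atlas->shelf_x;
    *out_y = atlas->shelf_y;
    atlas->shelf_x += w;
    if (h > atlas->shelf_height)
        atlas->shelf_height = h;
    return true;
}
```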
&lt;p&gt;Distance fields are pretty useful if you have a lot of different text sizes floating around.
Or for animations and effects like shadows, outlines, etc.
But they lose small glyph details depending on the size of the distance field, and in UIs their advantages don&#039;t weigh that heavily.
But in games and especially on 3D surfaces they&#039;re pretty awesome.
I stopped following that field a few years ago but back then &lt;a href=&quot;https://github.com/Chlumsky/msdfgen&quot;&gt;multi-channel distance fields&lt;/a&gt; by Viktor Chlumský were the most advanced approach.
There you use RGB for 3 distance fields and combine them in the shader to better represent sharp features in glyphs.
Avoids a lot of artifacts and lets you get away with smaller field sizes.
Constructing the multi-channel field is rather complex, but fortunately described in detail in his master&#039;s thesis.&lt;/p&gt;
&lt;p&gt;There are also approaches that rasterize the glyphs directly on the GPU. Aras&#039; blog post above lists a few.
As far as I can tell &lt;a href=&quot;https://sluglibrary.com/&quot;&gt;Slug&lt;/a&gt; by Eric Lengyel came out of that.
There you store the font description on the GPU and each pixel looks at it to build the glyph. It&#039;s the first approach that actually convinced me.
The website also links a &lt;a href=&quot;http://jcgt.org/published/0006/02/02/&quot;&gt;paper&lt;/a&gt; and &lt;a href=&quot;http://terathon.com/i3d2018_lengyel.pdf&quot;&gt;slides&lt;/a&gt; with &lt;em&gt;a lot&lt;/em&gt; of details about the algorithm.
But it&#039;s a bit too ambitious for a hobby project, even for my tastes.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://medium.com/@evanwallace/easy-scalable-text-rendering-on-the-gpu-c3f4d782c5ac&quot;&gt;Easy Scalable Text Rendering on the GPU&lt;/a&gt; by Evan Wallace also seems pretty interesting but requires a lot of triangles and overdraw per glyph.
It&#039;s neat, but not sure if that is worth it.&lt;/p&gt;
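&lt;p&gt;For straight edges, the core trick of that article fits into a small C sketch: draw a triangle from an anchor point to every edge of the contour and count how often a point is covered. An odd count means inside (even-odd fill). On the GPU the counting is done by rasterizing the triangles rather than in a loop, and the curve handling from the article is left out here. All names are hypothetical.&lt;/p&gt;

```c
#include "assert.h"
#include "stdbool.h"

typedef struct { double x, y; } point_t;

/* Twice the signed area of triangle abc (the sign tells the winding). */
static double cross(point_t a, point_t b, point_t c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* True if triangle abc covers p, regardless of winding. */
static bool triangle_covers(point_t a, point_t b, point_t c, point_t p)
{
    double d1 = cross(a, b, p), d2 = cross(b, c, p), d3 = cross(c, a, p);
    bool has_negative = 0.0 > d1 || 0.0 > d2 || 0.0 > d3;
    bool has_positive = d1 > 0.0 || d2 > 0.0 || d3 > 0.0;
    return !has_negative || !has_positive;
}

/* Even-odd test via a triangle fan from anchor over all contour edges.
   Each covering triangle flips the result, like toggling a pixel. */
bool fan_parity_inside(const point_t *contour, int count, point_t anchor, point_t p)
{
    assert(count >= 3);
    bool inside = false;
    for (int i = 0; i != count; i++) {
        point_t a = contour[i];
        point_t b = contour[(i + 1) % count];
        if (triangle_covers(anchor, a, b, p))
            inside = !inside;
    }
    return inside;
}
```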
&lt;p&gt;Apart from those low-level libraries and approaches there are of course libraries like &lt;a href=&quot;https://skia.org/&quot;&gt;Skia&lt;/a&gt; that do text rendering.
And Skia does a pretty good job of it.
They also have a very interesting design document where they talk about their font rendering: &lt;a href=&quot;https://skia.org/docs/dev/design/raster_tragedy/&quot;&gt;The Raster Tragedy in Skia&lt;/a&gt; that I quoted from above.&lt;/p&gt;
&lt;p&gt;And this brings me to &lt;a href=&quot;http://rastertragedy.com/&quot;&gt;The Raster Tragedy at Low-Resolution Revisited&lt;/a&gt; by Beat Stamm.
He worked on a lot of font rendering stuff at Microsoft, including ClearType.
The Raster Tragedy is a vast treasure trove and I still haven&#039;t finished reading it all.
But it focuses a lot on hinting as a tool to adjust fonts and I&#039;m not &lt;em&gt;that&lt;/em&gt; interested in the internals of glyph rasterization.
I&#039;m leaning more towards the GPU side of things.&lt;/p&gt;
&lt;p&gt;Interesting tidbit: As far as I can remember none of them mentioned subpixel blending aka dual source blending.
I wouldn&#039;t be surprised if Skia does it but I couldn&#039;t find any mention of it from them.&lt;/p&gt;
&lt;p&gt;Also worth mentioning is FreeType&#039;s Harmony LCD rendering as described in &lt;a href=&quot;https://freetype.org/freetype2/docs/reference/ft2-lcd_rendering.html&quot;&gt;their documentation&lt;/a&gt;.
It can handle different subpixel positions and is a pretty neat idea.&lt;/p&gt;
&lt;p&gt;A note on something not strictly font-rendering related: &lt;a href=&quot;https://github.com/nothings/stb/blob/master/stb_truetype.h&quot;&gt;stb_truetype&lt;/a&gt; is a small self-contained single header C file. No dependencies.
Skia is a huge project with a lot of dependencies and I don&#039;t even want to think about building it.
Depending on the project those aspects can be more important than quality (in my case they are).&lt;/p&gt;
&lt;h1&gt;The End&lt;/h1&gt;
&lt;p&gt;Phew! If anyone other than myself reads those last few words: You&#039;ve just unlocked the endurance achievement, congrats. 🙂
I hope those mad ramblings were interesting or at least entertaining. And maybe they spared you a bit of pain and suffering.
If you have any related tidbits or ideas please drop a comment or send me a mail.
May all your glyphs look sharp enough.&lt;/p&gt; 
		</content>
	</entry>
	<entry>
		<title>Rectangle rendering benchmark</title>
		<author>
			<name>Stephan Soller</name>
		</author>
		<id>http://arkanis.de/weblog/2023-07-10-rectangle-rendering-benchmark</id>
		<link rel="alternate" href="https://arkanis.de/weblog/2023-07-10-rectangle-rendering-benchmark/" />
		<link rel="replies" type="text/html" href="https://arkanis.de/weblog/2023-07-10-rectangle-rendering-benchmark/#comments" />
		<link rel="replies" type="application/atom+xml" href="https://arkanis.de/weblog/2023-07-10-rectangle-rendering-benchmark/comments.xml" />
		<updated>2023-07-10T21:46:49+00:00</updated>
		<content type="html" xml:base="https://arkanis.de/weblog/2023-07-10-rectangle-rendering-benchmark/">
&lt;p&gt;Well, it happened again. I was just writing a UI system in C (different story) and thought about the performance of different ways to render rectangles. Most people would just look around a bit and choose a suitably impressive or &amp;quot;modern&amp;quot; approach and be done with it. Some would even research the topic for a bit and then choose the appropriate approach for their environment. Well, I did a bit of both, got distracted for a few weeks and somehow ended up with another small benchmark about UI rectangle rendering… I didn&#039;t do it on purpose! It just happened! Could happen to anybody, right? Ok… maybe there was a bit of purpose involved. 😄&lt;/p&gt;
&lt;p&gt;Aaanyway, most UIs today don&#039;t generate the pixels you see on screen directly. Instead they create a list of what needs to be drawn and then tell the GPU to do it. I wanted to know how long that GPU rendering part takes. Given the UI for a text editor, does it take 10ms, 1ms or 0.1ms? After all, everything before that point depends on how I choose to design the UI system. But I can&#039;t choose to create a graphical user interface that doesn&#039;t show anything. So that rendering part is my hard performance limit and I can&#039;t get faster than that.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; On a desktop system it takes between 1ms and 0.1ms to render a text editor UI with a lot of characters (6400 mostly small rects). At least when rendering is done using one large buffer for rectangle data and one draw call. ~1ms on low-end systems (AMD A10-7850K Radeon R7 and i7-8550U with UHD 620) and ~0.1ms on more decent GPUs (Radeon RX 480, Arc A770, GeForce GTX 1080).&lt;/p&gt;
&lt;h1&gt;Results (without context)&lt;/h1&gt;
&lt;p&gt;In case you just want to dive in, here are the results for a text editor and mediaplayer dummy UI. Those charts are huge, you have been warned:&lt;/p&gt;
&lt;figure&gt;
    &lt;a href=&quot;sublime.svg&quot;&gt;&lt;img src=&quot;sublime.svg&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
&lt;figure&gt;
    &lt;a href=&quot;mediaplayer.svg&quot;&gt;&lt;img src=&quot;mediaplayer.svg&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
&lt;p&gt;Yes, you&#039;ll have to scroll around a lot. Those two images contain the measured times for different approaches (on the left) on different hardware (on the top).&lt;/p&gt;
&lt;p&gt;I was also interested in how the different approaches compare to one another on the same hardware. Basically which approach I should use for my UI system. So here are the same diagrams but normalized to the &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; approach. Meaning that the values for &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; are always &lt;code&gt;1.0&lt;/code&gt;. But if e.g. the draw calls of &lt;code&gt;ssbo_instr_list&lt;/code&gt; take 3 times longer on the GPU that bar would now show a &lt;code&gt;3.00&lt;/code&gt;. Note that the visual bars are capped at 5.0, otherwise the diagram would be mindbogglingly huge.&lt;/p&gt;
&lt;figure&gt;
    &lt;a href=&quot;sublime-normalized.svg&quot;&gt;&lt;img src=&quot;sublime-normalized.svg&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
&lt;figure&gt;
    &lt;a href=&quot;mediaplayer-normalized.svg&quot;&gt;&lt;img src=&quot;mediaplayer-normalized.svg&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
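&lt;p&gt;The normalization itself is simple arithmetic. A C sketch with hypothetical names: the ratio to the reference approach is kept as is for the label and only the length of the drawn bar is capped (keeping the label uncapped is an assumption about how the charts are drawn).&lt;/p&gt;

```c
#include "assert.h"

typedef struct {
    double ratio;      /* time relative to the reference approach */
    double bar_length; /* same value, but capped for drawing */
} normalized_t;

normalized_t normalize_time(double time, double reference_time, double bar_cap)
{
    assert(reference_time > 0.0);
    normalized_t n;
    n.ratio = time / reference_time;
    n.bar_length = n.ratio > bar_cap ? bar_cap : n.ratio;
    return n;
}
```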
&lt;p&gt;Please don&#039;t use the data to argue that one GPU is better than another. Rectangle rendering is a trivial use case compared to games and the workload is very different. The first set of charts is there to get a rough feeling for how long it takes to render a UI, the second set of charts to compare the approaches with each other.&lt;/p&gt;
&lt;p&gt;I&#039;ve put the &lt;a href=&quot;https://github.com/arkanis/gl-4.5-rectangle-drawing-benchmark/tree/master&quot;&gt;raw data and source code&lt;/a&gt; into a repository. Take a look if you want, but be warned, this is a hobby project. The code is pretty ugly and there&#039;s a lot of data. I don&#039;t get paid enough for that to think about how other programmers read the source code of my crazy experiments.&lt;/p&gt;
&lt;h1&gt;Context (aka what all that means)&lt;/h1&gt;
&lt;p&gt;My target systems were Linux and Windows desktops. No smartphones, no tablets, no Macs. Mostly because developing for them is a pain and at this point I&#039;m doing this for fun.&lt;/p&gt;
&lt;p&gt;I used OpenGL 4.5 as an API. It&#039;s well supported on Linux and Windows and the Direct State Access API makes it quite pleasant to work with (no more binding stuff). I thought about using Vulkan but it would be a lot more hassle, and when I made that decision there was no Mesa driver for nVidia hardware (I&#039;m using an AMD GPU but I didn&#039;t want to close off that option). Mac only supports OpenGL 4.1, but I don&#039;t care about that system either.&lt;/p&gt;
&lt;h2&gt;UI scenarios&lt;/h2&gt;
&lt;p&gt;There are 2 dummy UI scenarios. Both look like crap because this isn&#039;t about the quality of font rendering, etc. I just wanted to get somewhat realistic lists of rectangles and textures.&lt;/p&gt;
&lt;figure&gt;
    &lt;figcaption&gt;&lt;q&gt;sublime&lt;/q&gt; scenario, 6400 rectangles with an average area of 519px². Mostly small glyph rectangles but also some larger ones (window, sidebar, etc.) that push up the average.&lt;/figcaption&gt;
    &lt;a href=&quot;benchmark-scenario-sublime.png&quot;&gt;&lt;img src=&quot;benchmark-scenario-sublime.png&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
&lt;figure&gt;
    &lt;figcaption&gt;&lt;q&gt;mediaplayer&lt;/q&gt; scenario, 53 rects with an average area of 55037px². Not a lot of text, hence the bigger rectangles dominate.&lt;/figcaption&gt;
    &lt;a href=&quot;benchmark-scenario-mediaplayer.png&quot;&gt;&lt;img src=&quot;benchmark-scenario-mediaplayer.png&quot;&gt;&lt;/a&gt;
&lt;/figure&gt;
&lt;p&gt;Both scenarios are pretty extreme cases. &lt;code&gt;sublime&lt;/code&gt; has a lot of small textured rectangles that show glyphs, meaning more load on vertex shaders. &lt;code&gt;mediaplayer&lt;/code&gt; has a lot fewer rects where the larger ones really dominate, resulting in more load on fragment shaders. I figured most real UIs would be somewhere in between those two extremes.&lt;/p&gt;
&lt;p&gt;Btw. that image in the mediaplayer is &lt;a href=&quot;https://www.deviantart.com/arsenixc/art/Clouds-Battle-99516571&quot;&gt;Clouds Battle by arsenixc&lt;/a&gt;. I just wanted something interesting to look at during development.&lt;/p&gt;
&lt;h2&gt;Measurement and benchmark procedure&lt;/h2&gt;
&lt;p&gt;From a measurement perspective each approach basically looks like this:&lt;/p&gt;
&lt;figure&gt;
    &lt;figcaption&gt;Benchmarked stages of the rendering process&lt;/figcaption&gt;
    &lt;img src=&quot;benchmark-diagram.png&quot;&gt;
&lt;/figure&gt;
&lt;p&gt;For each of those stages I measured different things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU walltime, measured with &lt;code&gt;gettimeofday()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Consumed CPU time of the process, measured with &lt;code&gt;clock_gettime()&lt;/code&gt; (Linux) and &lt;code&gt;GetProcessTimes()&lt;/code&gt; (Windows).&lt;/li&gt;
&lt;li&gt;GPU elapsed time, measured with &lt;code&gt;glBeginQuery(GL_TIME_ELAPSED, …)&lt;/code&gt; and &lt;code&gt;glEndQuery(GL_TIME_ELAPSED)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;GPU timestamps with &lt;code&gt;glQueryCounter(…, GL_TIMESTAMP)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The GPU times are read after each frame, so there is a pipeline stall there. I was too lazy to use multiple buffers for that. Also &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt; timers seem to have a much better resolution, so those are used in the benchmark results above. If something takes 100µs (a µs is a thousandth of a millisecond) it doesn&#039;t really matter, e.g. 111.200µs &lt;code&gt;GL_TIMESTAMP&lt;/code&gt; vs. 107.680µs &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt;. But for very short operations it can differ drastically: 3.840µs &lt;code&gt;GL_TIMESTAMP&lt;/code&gt; vs. 0.640µs &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt;. That was on a Radeon RX 480 on Linux (Mesa driver), and the Intel Arc A770 on Windows looks similar. On a GeForce GTX 1080 the difference wasn&#039;t quite as drastic (2.048µs &lt;code&gt;GL_TIMESTAMP&lt;/code&gt; vs. 1.024µs &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt;), but its timer resolution seems to be only about a tenth of the other cards&#039;.&lt;/p&gt;
&lt;p&gt;Measuring the consumed CPU time was a similar problem. That is the CPU time all threads in the entire process consumed. I wanted to know that to catch any processing that doesn&#039;t happen during the frame (driver overhead, pipelined drivers, etc.) and it&#039;s generally a nice indicator of how costly a given approach is on the CPU in total. Unfortunately on Windows the resolution of &lt;code&gt;GetProcessTimes()&lt;/code&gt; isn&#039;t good enough to do that each frame: for most frames the value simply came out as 0. Linux doesn&#039;t have that problem.&lt;/p&gt;
&lt;p&gt;My workaround was to measure the walltime and consumed CPU time for 100 frames and divide them by 100. Unfortunately this includes the benchmarking code itself, which does some pretty hefty &lt;code&gt;printf()&lt;/code&gt; calls to log the data. But that overhead is the same for all approaches. So if one approach consumes 40 times more CPU time than another (I kid you not), it&#039;s pretty inefficient, benchmarking overhead or not.&lt;/p&gt;
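&lt;p&gt;A minimal sketch of that batching workaround, assuming a POSIX system like Linux: &lt;code&gt;gettimeofday()&lt;/code&gt; for the walltime and &lt;code&gt;clock_gettime(CLOCK_PROCESS_CPUTIME_ID, …)&lt;/code&gt; for the CPU time of all threads in the process. The &lt;code&gt;measure_batch()&lt;/code&gt; helper and the &lt;code&gt;render_frame_fn&lt;/code&gt; callback are hypothetical names, not taken from the benchmark source, and the Windows side (&lt;code&gt;GetProcessTimes()&lt;/code&gt;) is left out.&lt;/p&gt;

```c
// Sketch: average walltime and process CPU time over a batch of frames.
// Assumes a POSIX system; render_frame_fn is a hypothetical callback type.
#define _POSIX_C_SOURCE 200809L
#include "stddef.h"
#include "sys/time.h"
#include "time.h"

typedef void (*render_frame_fn)(void);

typedef struct {
    double walltime_us;  // average wall-clock time per frame
    double cputime_us;   // average CPU time of all threads per frame
} frame_avg_t;

// Wall-clock time in microseconds, via gettimeofday().
static double walltime_now_us(void) {
    struct timeval tv[1];  // one-element array, passed as a pointer
    gettimeofday(tv, NULL);
    return (double)tv->tv_sec * 1e6 + (double)tv->tv_usec;
}

// CPU time consumed by all threads of this process, in microseconds.
static double cputime_now_us(void) {
    struct timespec ts[1];
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ts);
    return (double)ts->tv_sec * 1e6 + (double)ts->tv_nsec / 1e3;
}

// Renders frame_count frames (must be > 0) and divides the totals by
// frame_count, so a coarse timer only has to resolve the whole batch.
static frame_avg_t measure_batch(render_frame_fn render_frame, int frame_count) {
    double wall_start = walltime_now_us();
    double cpu_start = cputime_now_us();
    for (int i = 0; i != frame_count; i++)
        render_frame();
    frame_avg_t avg;
    avg.walltime_us = (walltime_now_us() - wall_start) / frame_count;
    avg.cputime_us = (cputime_now_us() - cpu_start) / frame_count;
    return avg;
}
```

&lt;p&gt;Called with a frame count of 100, this yields the per-frame averages. Because the totals cover the whole batch, a coarse timer only needs to tick a few times per batch rather than once per frame.&lt;/p&gt;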
&lt;p&gt;Dumping all that data into a bar chart would be useless, so the result charts only show the interesting measurements. Those are also depicted in the stage diagram above. And yet the result charts are still pretty unwieldy. 🙁&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Approach walltime (with overhead): The CPU walltime to render 100 frames with a given approach, including the time it took to measure and log the data. Then divided by 100 to get an average per-frame time.&lt;/li&gt;
&lt;li&gt;Approach CPU time (with overhead): Consumed CPU time of all threads in the process to render 100 frames. Again including the benchmarking overhead, again divided by 100.&lt;/li&gt;
&lt;li&gt;Frame CPU walltime: The CPU walltime from the start of the frame until after the call to &lt;code&gt;SDL_GL_SwapWindow()&lt;/code&gt;. Pretty much how long the CPU takes to tell the GPU what to do. This does not include the pipeline stall to read the GPU times (but maybe &lt;code&gt;SDL_GL_SwapWindow()&lt;/code&gt; does a pipeline stall, I don&#039;t know).&lt;/li&gt;
&lt;li&gt;Buffer generation CPU walltime: Almost all approaches need to prepare one large buffer with all the rectangle data for the GPU. This is how long the CPU takes for only that stage.&lt;/li&gt;
&lt;li&gt;Buffer upload GPU elapsed time: GPU time spent transferring the rectangle buffer from CPU memory into GPU memory. Measured with &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Draw calls CPU walltime: CPU walltime spent issuing the OpenGL draw calls. Basically how expensive it is to tell the OpenGL driver what we want from it.&lt;/li&gt;
&lt;li&gt;Draw calls GPU elapsed time: GPU time spent actually executing those draw calls on the GPU. This is how expensive the operation actually is for the hardware. Measured with &lt;code&gt;GL_TIME_ELAPSED&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Present GPU elapsed time: The GPU time spent in &lt;code&gt;SDL_GL_SwapWindow()&lt;/code&gt;, whatever that is. I hope it&#039;s just the driver handing off the rendered frame to the window manager or DWM. But I added it to catch drivers that defer some work into that stage. Not sure what to make of the measurements, though. On Linux it&#039;s pretty much nothing, on Windows it can sometimes take longer than the rectangle rendering itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benchmark itself simply runs every approach, each one rendering 100 frames without vsync. The benchmark program is then run 5 times in a row. Sometimes it&#039;s just slower, maybe because the benchmark starts on an unlucky core or something. Hence the 5 repeats. The average of those 500 samples is then taken to get the per-frame values in the result charts. Except for &amp;quot;approach walltime&amp;quot; and &amp;quot;approach CPU time&amp;quot;, there it&#039;s just an average of 5 values divided by 100.&lt;/p&gt;
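&lt;p&gt;The two kinds of averages might look roughly like this in C. The fixed 5 × 100 array layout and the function names are assumptions for illustration only.&lt;/p&gt;

```c
// Sketch of the two averages; the fixed array layout is an assumption.
#include "stddef.h"

#define RUNS 5              // benchmark program runs
#define FRAMES_PER_RUN 100  // frames rendered per approach and run

// Per-frame metrics: plain mean over all RUNS * FRAMES_PER_RUN samples.
static double per_frame_mean(double samples[RUNS][FRAMES_PER_RUN]) {
    double sum = 0.0;
    for (size_t r = 0; r != RUNS; r++)
        for (size_t f = 0; f != FRAMES_PER_RUN; f++)
            sum += samples[r][f];
    return sum / (RUNS * FRAMES_PER_RUN);
}

// Approach walltime / CPU time: one total per run, each covering
// FRAMES_PER_RUN frames. Average the totals, then divide by the frames.
static double per_approach_mean(const double run_totals[RUNS]) {
    double sum = 0.0;
    for (size_t r = 0; r != RUNS; r++)
        sum += run_totals[r];
    return sum / RUNS / FRAMES_PER_RUN;
}
```

&lt;p&gt;Both functions produce per-frame values. The first averages 500 individual frame samples. The second averages 5 batch totals and then divides by the 100 frames each batch contains.&lt;/p&gt;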
&lt;p&gt;Thanks to my brother we could run the benchmark on a lot of different systems. He has quite a lot of different hardware and that way I got data for the Intel Arc A770 GPU, older and newer nVidia GPUs and a lot of different AMD GPUs. 🙂&lt;/p&gt;
&lt;h2&gt;Approaches&lt;/h2&gt;
&lt;p&gt;Now we&#039;re on to the meat of the thing.&lt;/p&gt;
&lt;p&gt;The approaches I implemented are a somewhat random mishmash of ideas. Not all approaches do the same thing. Some just implement basic textured rectangles while others offer more complex rendering features (like signed distance fields, rounded corners, etc.). That was just my laziness and the observation that it doesn&#039;t make that much of an impact, and my data is pretty noisy anyway. Remember, I&#039;m doing this just for fun.&lt;/p&gt;
&lt;p&gt;I&#039;ll compare most approaches to the &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; approach. That was the most promising one when I had the idea to normalize the data, so it became the reference point.&lt;/p&gt;
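&lt;p&gt;Normalizing here means dividing each measurement by the matching measurement of the reference approach. A value of 1.0 then means &amp;quot;as fast as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;&amp;quot; and 2.0 means &amp;quot;takes twice as long&amp;quot;. Here is a small sketch using a hypothetical struct that holds three of the measurements:&lt;/p&gt;

```c
// Sketch: normalize measurements against the reference approach.
// 1.0 = as fast as the reference, 2.0 = takes twice as long.
// The struct and its fields are hypothetical.
typedef struct {
    const char *name;
    double frame_cpu;   // frame CPU walltime (ms, or a ratio once normalized)
    double buffer_gen;  // buffer generation CPU walltime
    double draw_gpu;    // draw calls GPU elapsed time
} approach_times_t;

static approach_times_t normalize(approach_times_t t, approach_times_t ref) {
    approach_times_t n = t;  // keeps the name
    n.frame_cpu = t.frame_cpu / ref.frame_cpu;
    n.buffer_gen = t.buffer_gen / ref.buffer_gen;
    n.draw_gpu = t.draw_gpu / ref.draw_gpu;
    return n;
}
```

&lt;p&gt;Fed with the Radeon RX 480 Linux numbers of &lt;code&gt;complete_vbo&lt;/code&gt; from below (0.340ms, 0.123ms and 0.254ms against 0.095ms, 0.032ms and 0.142ms), this gives roughly 3.6, 3.8 and 1.8.&lt;/p&gt;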
&lt;p&gt;For details feel free to look into &lt;a href=&quot;https://github.com/arkanis/gl-4.5-rectangle-drawing-benchmark/blob/master/source/26-bench-rect-drawing.c&quot;&gt;26-bench-rect-drawing.c&lt;/a&gt;. Again, be warned, this is ugly code. 😉&lt;/p&gt;
&lt;h3&gt;1rect_1draw&lt;/h3&gt;
&lt;p&gt;One draw call per rectangle. The rectangle data is set via uniforms and no buffer is needed. But that approach is pretty much maximum API overhead personified. Doesn&#039;t make much sense, but I wanted to see how spectacular the fireball is. Implemented textured rectangles, colored glyphs and rounded rectangles with borders for that one.&lt;/p&gt;
&lt;p&gt;In the &lt;code&gt;mediaplayer&lt;/code&gt; scenario with its 53 rects it doesn&#039;t really matter. It consumes a bit more &amp;quot;Draw calls CPU walltime&amp;quot; but that&#039;s it. This scenario performs similarly for pretty much all approaches, so from here on out I&#039;ll only talk about the &lt;code&gt;sublime&lt;/code&gt; scenario.&lt;/p&gt;
&lt;p&gt;That one has 6400 rects and this approach just dies, with frame CPU walltimes being 20x to almost 40x slower than &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; (but on one slow system it was just 2x slower). The draw calls CPU walltime explodes even more spectacularly, mostly being a few &lt;em&gt;hundred&lt;/em&gt; times slower than &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. Now that is API overhead. 😄&lt;/p&gt;
&lt;p&gt;To put it into somewhat real numbers, on my Radeon RX 480 Linux system this means the following (&lt;code&gt;1rect_1draw&lt;/code&gt; vs. &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;):&lt;br /&gt;
Frame CPU walltime: 3.699ms vs. 0.095ms&lt;br /&gt;
Draw calls CPU walltime: 3.651ms vs. 0.004ms&lt;br /&gt;
Draw calls GPU elapsed time: 0.236ms vs. 0.142ms&lt;/p&gt;
&lt;h3&gt;complete_vbo&lt;/h3&gt;
&lt;p&gt;The simplest &amp;quot;classic&amp;quot; way to render rectangles with one large vertex buffer in OpenGL. Six vertices per rectangle (2 triangles, each 3 vertices) with the per-rectangle data duplicated in each vertex. A vertex array object (VAO) and vertex attributes are set up to feed the buffer into the vertex shader. Again implemented textured rectangles, colored glyphs and rounded rectangles with borders.&lt;/p&gt;
&lt;p&gt;Buffer generation CPU walltime is mostly 3 to 4 times slower than &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; which is no surprise. It generates pretty much 6 times as much data because &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; just uses a list of rectangles instead of vertices. Frame CPU walltime is mostly 2x that of &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;.&lt;/p&gt;
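&lt;p&gt;To make the data duplication concrete, here is a sketch of how a rect list could be expanded into such a vertex buffer on the CPU. The &lt;code&gt;rect_t&lt;/code&gt;/&lt;code&gt;vertex_t&lt;/code&gt; layouts and the corner order are assumptions, not the structs from the benchmark.&lt;/p&gt;

```c
// Sketch of a complete_vbo style layout: every rectangle becomes 6 vertices
// (two triangles) and each vertex carries a full copy of the per-rect data.
// Field names and types are assumptions.
#include "stddef.h"
#include "stdint.h"

typedef struct {
    int16_t x, y, w, h;  // screen rectangle in pixels
    uint16_t u, v;       // texture coordinates of the top-left corner
    uint32_t color;      // RGBA8 packed into 32 bit
} rect_t;

typedef struct {
    int16_t pos_x, pos_y;  // this vertex's corner on screen
    rect_t rect;           // duplicated per-rect data
} vertex_t;

// Corner offsets for the triangles top-left, top-right, bottom-left
// and bottom-left, top-right, bottom-right.
static const int corner_x[6] = {0, 1, 0, 0, 1, 1};
static const int corner_y[6] = {0, 0, 1, 1, 0, 1};

// Writes 6 * rect_count vertices into out and returns the number written.
static size_t expand_rects(const rect_t *rects, size_t rect_count, vertex_t *out) {
    size_t n = 0;
    for (size_t i = 0; i != rect_count; i++) {
        for (int c = 0; c != 6; c++) {
            out[n].pos_x = (int16_t)(rects[i].x + corner_x[c] * rects[i].w);
            out[n].pos_y = (int16_t)(rects[i].y + corner_y[c] * rects[i].h);
            out[n].rect = rects[i];
            n++;
        }
    }
    return n;
}
```

&lt;p&gt;Every vertex carries a full copy of the rect plus its own position. With this layout the buffer is therefore at least six times the size of the plain rect list.&lt;/p&gt;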
&lt;p&gt;And in numbers on my Radeon RX 480 Linux system (&lt;code&gt;complete_vbo&lt;/code&gt; vs. &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;):&lt;br /&gt;
Frame CPU walltime: 0.340ms vs. 0.095ms&lt;br /&gt;
Buffer generation CPU walltime: 0.123ms vs. 0.032ms&lt;br /&gt;
Draw calls CPU walltime: 0.004ms vs. 0.004ms&lt;br /&gt;
Draw calls GPU elapsed time: 0.254ms vs. 0.142ms&lt;/p&gt;
&lt;h3&gt;one_ssbo&lt;/h3&gt;
&lt;p&gt;That approach was inspired by how compute shaders read their input (a more &amp;quot;modern&amp;quot; approach). It uses one shader storage buffer object (SSBO) with a list of rectangle data. An empty VAO is set up without any vertex attributes, so no data is fed into the vertex shader by OpenGL. But each shader invocation can access the SSBO. Then 6 vertex shader invocations per rectangle are spawned. Each one uses &lt;code&gt;gl_VertexID&lt;/code&gt; to calculate the index of the rectangle it belongs to, reads the rectangle data and puts itself on the proper screen position.&lt;/p&gt;
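&lt;p&gt;The index math of such a vertex shader is simple enough to model on the CPU. This sketch mirrors what the GLSL code would do with &lt;code&gt;gl_VertexID&lt;/code&gt;. The rect struct, the float fields and the corner order are assumptions. The draw call would then presumably request 6 vertices per rect with the empty VAO bound, e.g. &lt;code&gt;glDrawArrays(GL_TRIANGLES, 0, 6 * rect_count)&lt;/code&gt;, where &lt;code&gt;rect_count&lt;/code&gt; is a hypothetical variable.&lt;/p&gt;

```c
// CPU-side model of an SSBO-driven vertex shader: no vertex attributes,
// just the vertex ID. Struct names and corner order are assumptions.
#include "stdint.h"

typedef struct { float x, y; } vec2_t;

typedef struct {
    float x, y, w, h;  // rectangle as stored in the (hypothetical) SSBO
} ssbo_rect_t;

// In GLSL roughly: int rect_index = gl_VertexID / 6;
//                  int corner = gl_VertexID % 6;
static vec2_t vertex_position(const ssbo_rect_t *rects, uint32_t vertex_id) {
    static const float corner_x[6] = {0, 1, 0, 0, 1, 1};
    static const float corner_y[6] = {0, 0, 1, 1, 0, 1};
    uint32_t rect_index = vertex_id / 6;        // 6 vertices per rectangle
    uint32_t corner = vertex_id % 6;            // position within the two triangles
    const ssbo_rect_t *r = rects + rect_index;  // "read the rectangle data"
    vec2_t p = { r->x + corner_x[corner] * r->w, r->y + corner_y[corner] * r->h };
    return p;
}
```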
&lt;p&gt;This bypasses the whole VAO and vertex attribute mess of the OpenGL API, which is nice. But on the flip side we can only read &lt;code&gt;uint&lt;/code&gt;s, &lt;code&gt;float&lt;/code&gt;s, &lt;code&gt;vec4&lt;/code&gt;s and stuff. Not single bytes. So we have to unpack 8bit values from 32bit (e.g. colors) via bitshifts and bitmasks or &lt;code&gt;bitfieldExtract()&lt;/code&gt; or &lt;code&gt;unpackUnorm4x8()&lt;/code&gt;.&lt;/p&gt;
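&lt;p&gt;For reference, &lt;code&gt;unpackUnorm4x8()&lt;/code&gt; takes the first component from the least significant byte and divides each byte by 255. A C model of that behavior could look like the sketch below. It can be handy for checking the packing code on the CPU side. The function names are made up for illustration.&lt;/p&gt;

```c
// C model of GLSL unpackUnorm4x8(): the least significant byte becomes
// the first component, each byte is mapped to 0.0 .. 1.0.
#include "stdint.h"

typedef struct { float r, g, b, a; } vec4_t;

// Extracts byte index (0 = least significant) from a 32 bit word.
// "% 256u" keeps the low 8 bits, same as masking with 0xFF.
static uint32_t extract_byte(uint32_t word, unsigned index) {
    return (word >> (8 * index)) % 256u;
}

static vec4_t unpack_unorm4x8(uint32_t packed) {
    vec4_t c = {
        extract_byte(packed, 0) / 255.0f,
        extract_byte(packed, 1) / 255.0f,
        extract_byte(packed, 2) / 255.0f,
        extract_byte(packed, 3) / 255.0f,
    };
    return c;
}
```

&lt;p&gt;So a color packed as 0xAABBGGRR comes out as (R, G, B, A) in the shader.&lt;/p&gt;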
&lt;p&gt;Again implemented textured rectangles, colored glyphs and rounded rectangles with borders.&lt;/p&gt;
&lt;p&gt;This one is usually about as fast as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. Sometimes a bit faster, sometimes a bit slower. The draw calls GPU elapsed time is usually a bit on the faster side but &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; can do more complex rendering so this is to be expected.&lt;/p&gt;
&lt;p&gt;On my Radeon RX 480 Linux system:&lt;br /&gt;
Frame CPU walltime: 0.084ms vs. 0.095ms&lt;br /&gt;
Buffer generation CPU walltime: 0.025ms vs. 0.032ms&lt;br /&gt;
Draw calls CPU walltime: 0.004ms vs. 0.004ms&lt;br /&gt;
Draw calls GPU elapsed time: 0.137ms vs. 0.142ms&lt;/p&gt;
&lt;p&gt;All in all a pretty robust approach. Works well and is fast on pretty much any system the benchmark was run on, but I hope you like bit twiddling and memory layouts. 😉&lt;/p&gt;
&lt;h3&gt;ssbo_instr_list&lt;/h3&gt;
&lt;p&gt;This approach also uses one SSBO to store the rectangle data. But what kind of processing the fragment shader should do is stored in a second &amp;quot;instruction list&amp;quot; SSBO. For example if you just want to render a glyph there would be one entry in the rect list and one glyph rendering instruction for that rect in the instruction list. If you want to render a rounded rect with two borders you can do that with 3 instructions.&lt;/p&gt;
&lt;p&gt;This makes each entry in the rect list smaller since it doesn&#039;t need to contain the data for all possible rendering features. Each instruction was also packed into 64 bit, making them fairly small. But the primary idea behind that approach was flexibility. I added instructions for different kinds of signed distance fields (rounded rect, circle, lines, etc.) and wanted to combine them with border drawing, drop shadow drawing and so on. And you would only have to pay that memory and processing cost for rects that actually use those instructions. Again, inspired by how you would do it with compute shaders. Sounds nice, doesn&#039;t it?&lt;/p&gt;
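&lt;p&gt;Here is a sketch of what such an instruction list could look like, assuming each rect stores an offset and a count into the second buffer. The opcodes, the field split of the 64 bit instruction and the struct names are hypothetical. The loop only tallies the work a real fragment shader would do for each instruction.&lt;/p&gt;

```c
// Sketch of an instruction-list layout: each rect points to a range of
// 64 bit instructions in a second buffer. Opcodes and fields are hypothetical.
#include "stdint.h"

enum { OP_GLYPH = 1, OP_ROUNDED_RECT = 2, OP_BORDER = 3 };

typedef struct {
    uint8_t opcode;  // what the fragment shader should do
    uint8_t flags;
    uint16_t arg0;   // e.g. corner radius or border width
    uint16_t arg1;
    uint16_t arg2;
} instr_t;           // 1 + 1 + 2 + 2 + 2 bytes = 64 bit, no padding

_Static_assert(sizeof(instr_t) == 8, "instructions must be 64 bit");

typedef struct {
    int16_t x, y, w, h;
    uint32_t first_instr;  // index into the instruction list
    uint32_t instr_count;  // number of instructions for this rect
} instr_rect_t;

typedef struct { int glyph_samples, sdf_evals, borders; } work_t;

// Models the per-fragment loop over this rect's instructions.
static work_t run_instructions(const instr_rect_t *rect, const instr_t *list) {
    work_t work = {0, 0, 0};
    for (uint32_t i = 0; i != rect->instr_count; i++) {
        const instr_t *in = list + rect->first_instr + i;
        switch (in->opcode) {
        case OP_GLYPH:        work.glyph_samples++; break;  // sample glyph texture
        case OP_ROUNDED_RECT: work.sdf_evals++;     break;  // evaluate rounded rect SDF
        case OP_BORDER:       work.borders++;       break;  // blend a border
        default:              break;                        // unknown opcode: ignore
        }
    }
    return work;
}
```

&lt;p&gt;A glyph rect touches one instruction and a rounded rect with two borders touches three. That is the variable-length loop the fragment shader has to run for every fragment.&lt;/p&gt;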
&lt;p&gt;The frame CPU walltime is mostly a bit faster than &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. But for draw calls GPU elapsed time it depends. AMD mostly agrees with that approach. There the GPU time is roughly the same as with &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. Sometimes it just takes 0.6 times, sometimes 2.0 times as much GPU time. nVidia and Intel disagree. 🙁 On nVidia it takes about 2.0 times as much GPU time, on Intel 3 to 4 times. Seems like they really don&#039;t like that the fragment shader reads a variable number of instructions, even if most rects just read one (glyph instruction).&lt;/p&gt;
&lt;p&gt;On my Radeon RX 480 Linux system (funnily enough on Windows this approach was twice as fast):&lt;br /&gt;
Frame CPU walltime: 0.136ms vs. 0.095ms&lt;br /&gt;
Buffer generation CPU walltime: 0.019ms vs. 0.032ms&lt;br /&gt;
Draw calls CPU walltime: 0.004ms vs. 0.004ms&lt;br /&gt;
Draw calls GPU elapsed time: 0.288ms vs. 0.142ms&lt;/p&gt;
&lt;p&gt;So this approach offers great flexibility and seems to be about as fast as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt; from the CPU&#039;s point of view, but it stresses Intel and nVidia GPUs a lot more.&lt;/p&gt;
&lt;h3&gt;ssbo_inlined_instr_6&lt;/h3&gt;
&lt;p&gt;Well, what if we read those processing instructions in the vertex shader instead of the fragment shader? That way each rect would only read the instructions 6 times from GPU global memory (once for each vertex). Of course we then need to transfer those instructions from the vertex shaders to the fragment shaders. In this case I used a &lt;code&gt;uvec2[6]&lt;/code&gt; array, i.e. six 64 bit values. This limits the flexibility somewhat, but with 6 instructions you can do a lot of fancy stuff (e.g. drawing a rounded rect with 5 different borders).&lt;/p&gt;
&lt;p&gt;AMD and Intel GPUs like it. For those it usually takes ~1.1 times (AMD) or ~1.5 times (Intel) as much GPU time as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. Still not great, but think about the flexibility those instructions allow! 🙂 Well, nVidia just says no. They take 3 to 4 times as much GPU time. 🙁&lt;/p&gt;
&lt;p&gt;At that point I was very tempted to just ignore nVidia GPUs. They have who knows how many engineers and enough marketing people to tell the world 10 times over how great their GPUs are. With all that, I should not have to worry about iterating over an instruction list in the fragment shader. Maybe it&#039;s the data transfer from vertex to fragment shader (but other approaches transfer more and are a lot faster), the bit unpacking in the fragment shader, maybe it&#039;s just the instruction loop, I don&#039;t know. In case anyone has an idea, feel free to look at the &lt;a href=&quot;https://github.com/arkanis/gl-4.5-rectangle-drawing-benchmark/blob/master/source/26-bench-rect-drawing.c#L1707&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then, grudgingly, my professional attitude came back in. I wanted to see what works well on every GPU… well, the instruction list doesn&#039;t. 🙁&lt;/p&gt;
&lt;h3&gt;one_ssbo_ext_one_sdf&lt;/h3&gt;
&lt;p&gt;This is basically back to &lt;code&gt;one_ssbo&lt;/code&gt; but extended to support more fancy shapes within a rectangle. That works via signed distance functions to draw rounded rects, pie segments, polygons, etc. However it&#039;s limited to one fancy shape per rect, one border, etc. No more instruction list in there.&lt;/p&gt;
&lt;p&gt;It works reasonably well on all GPUs, hence it became the reference point for comparing the approaches.&lt;/p&gt;
&lt;p&gt;On my Radeon RX 480 Linux system:&lt;br /&gt;
Frame CPU walltime: 0.095ms&lt;br /&gt;
Buffer generation CPU walltime: 0.032ms&lt;br /&gt;
Draw calls CPU walltime: 0.004ms&lt;br /&gt;
Draw calls GPU elapsed time: 0.142ms&lt;/p&gt;
&lt;h3&gt;inst_div&lt;/h3&gt;
&lt;p&gt;One last experiment. I hacked that one together pretty much as my brother was starting to run the benchmarks on various machines. 😄&lt;/p&gt;
&lt;p&gt;Anyway, it&#039;s mostly the same code as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;, but the data is not read from an SSBO. Instead it is fed into the vertex shader via vertex attributes and instancing. Back to the &amp;quot;classic&amp;quot; OpenGL style. The idea is to have two vertex buffers: A small one with just 6 vertices (1 rectangle) and a larger one with per-rectangle data. The small vertex buffer is then drawn with instancing, one instance per rectangle. The larger vertex buffer is set up in the VAO to advance by 1 for each instance (aka an instance divisor of 1). This lets the OpenGL driver generate the buffer decoding shader code, which is hopefully more optimized than my stuff. The API functions to set up the VAO (what stuff in the buffers goes into what vertex shader attribute) can be a bit confusing, but the Direct State Access API makes that tolerable.&lt;/p&gt;
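&lt;p&gt;The core of the instance divisor trick is which buffer element each vertex shader invocation gets. The following C model spells out that rule for the two buffers described above: divisor 0 for the 6-vertex buffer and divisor 1 for the rect buffer. It assumes the draw starts at vertex 0 and instance 0. On the OpenGL side this presumably corresponds to something like &lt;code&gt;glVertexArrayBindingDivisor(vao, 1, 1)&lt;/code&gt; for the rect buffer binding and &lt;code&gt;glDrawArraysInstanced(GL_TRIANGLES, 0, 6, rect_count)&lt;/code&gt;. The binding index and &lt;code&gt;rect_count&lt;/code&gt; are hypothetical.&lt;/p&gt;

```c
// CPU-side model of which buffer element an invocation reads when one
// binding has divisor 0 (per vertex) and another divisor 1 (per instance).
// This only mimics the fetch rule (first vertex and base instance = 0);
// it is not GL code.
#include "stdint.h"

typedef struct {
    uint32_t corner_index;  // element of the small 6-vertex buffer
    uint32_t rect_index;    // element of the per-rectangle buffer
} fetch_t;

// divisor 0: advance per vertex; divisor N: advance once every N instances.
static uint32_t fetch_index(uint32_t divisor, uint32_t vertex_id, uint32_t instance_id) {
    return divisor == 0 ? vertex_id : instance_id / divisor;
}

static fetch_t fetch_for(uint32_t vertex_id, uint32_t instance_id) {
    fetch_t f = {
        fetch_index(0, vertex_id, instance_id),  // corner buffer, divisor 0
        fetch_index(1, vertex_id, instance_id),  // rect buffer, divisor 1
    };
    return f;
}
```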
&lt;p&gt;Turns out the Intel Arc A770 really likes it. The GPU time goes down to 0.34x of &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;. On the GTX 1080 it&#039;s 0.67x. For the rest it&#039;s pretty much the same as &lt;code&gt;one_ssbo_ext_one_sdf&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I&#039;ll probably use that approach in the end, even if it&#039;s less &amp;quot;modern&amp;quot;. Simply because I don&#039;t have to write the whole bit unpacking stuff in the vertex shader and the VAO setup isn&#039;t &lt;em&gt;that&lt;/em&gt; bad. At least as long as I use nice round types like 16 bit signed int coordinates. If I need to pack the data a bit tighter (e.g. 12 bit per coordinate) it&#039;s back to SSBOs and manual unpacking.&lt;/p&gt;
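&lt;p&gt;As an illustration of such tighter packing, two 12 bit coordinates could share the low 24 bits of a 32 bit word. The vertex attribute formats have no 12 bit integer type, so the shader has to unpack this manually. The layout below is an assumption and uses unsigned coordinates for simplicity.&lt;/p&gt;

```c
// Sketch: two 12 bit unsigned coordinates in the low 24 bits of a 32 bit
// word (x in bits 0..11, y in bits 12..23). The layout is an assumption.
#include "stdint.h"

static uint32_t pack_12x2(uint32_t x, uint32_t y) {
    // Same as OR-ing in y shifted left by 12 bits.
    return (x % 4096u) + (y % 4096u) * 4096u;
}

static void unpack_12x2(uint32_t word, uint32_t *x, uint32_t *y) {
    *x = word % 4096u;            // low 12 bits
    *y = (word / 4096u) % 4096u;  // next 12 bits
}
```

&lt;p&gt;In GLSL the unpacking would be the same arithmetic, or &lt;code&gt;bitfieldExtract(word, 12, 12)&lt;/code&gt; for the y coordinate.&lt;/p&gt;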
&lt;h1&gt;Quirks and closing remarks&lt;/h1&gt;
&lt;p&gt;I got sidetracked for way too long on this thing. But I guess it&#039;s nice to know how SSBOs and classical vertex attributes stack up against each other. Spoiler: It doesn&#039;t matter for this workload, use whatever floats your boat.&lt;/p&gt;
&lt;p&gt;Another interesting aside was the difference between Linux and Windows OpenGL drivers. I&#039;m using Linux and Windows on my Radeon RX 480 PC in a dual boot configuration (Linux for productive things, Windows for games or fooling around with the WinAPI… which is also like playing games). The result charts contain the benchmark on Linux and Windows, directly next to each other. So there&#039;s a direct comparison there. Some years ago the open source Mesa drivers had the reputation of being relatively slow. Doesn&#039;t look like it. They&#039;re basically just as fast as the Windows driver on my machine. Except for the &lt;code&gt;1rect_1draw&lt;/code&gt; approach. All that API overhead takes almost 3x as long on Linux. The benchmark also consumes a lot less CPU time on Linux but I have no idea what part of that is the OpenGL driver, window manager, just a more optimized &lt;code&gt;printf()&lt;/code&gt; function or whatever.&lt;/p&gt;
&lt;p&gt;During development I also looked at tooling. I&#039;ve been using &lt;a href=&quot;https://renderdoc.org/&quot;&gt;RenderDoc&lt;/a&gt; for a few years now and it&#039;s pretty nice, but it can&#039;t really tell you why a shader is slow. The &lt;a href=&quot;https://gpuopen.com/rgp/&quot;&gt;Radeon GPU Profiler&lt;/a&gt; looks pretty neat, especially those instruction timings, but it only supports Vulkan and DirectX 12. If anyone knows such a shader instruction timing tool for OpenGL 4.5, please let me know. On the other hand that might be a reason to rewrite the thing in Vulkan in a few years.&lt;/p&gt;
&lt;p&gt;I&#039;ve omitted a lot of detail (like the different CPUs), but the post is already too long as it is. In case you have any questions feel free to drop a comment or send me a mail.&lt;/p&gt;
&lt;p&gt;If you read the whole thing until here, color me impressed. Take a cookie, you deserve it. 😉&lt;/p&gt; 
		</content>
	</entry>
</feed>