… and how to waste two days. This post is meant for all developers having the same kind of problem (searching for
options of the to_yaml method) and because of that it's written in English.
While I was working on the Simple Localization plugin for Ruby on Rails, a mail exchange with Roman Gonzalez
produced a nice idea: if a user accesses a key of the language file (a file containing YAML code) and that key does not
exist, it should be added to the language file automatically.
So far so good. Loading YAML is easy:
YAML.load_file …
And saving YAML isn't hard either:
File.open(target_file, 'wb') do |file|
  YAML.dump data, file
end
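Putting load and save together, the "add a missing key automatically" idea could look roughly like the sketch below. This is only an illustration, not the plugin's actual code: the helper name fetch_or_add, the flat key layout, and the sample keys are assumptions made for this example.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical helper (not part of the Simple Localization plugin):
# returns the value stored under `key`. If the key is missing, it stores
# `default` under that key and writes the whole file back to disk.
def fetch_or_add(path, key, default)
  data = YAML.load_file(path) || {}
  return data[key] if data.key?(key)

  data[key] = default
  File.open(path, 'wb') { |file| YAML.dump(data, file) }
  default
end

# Usage with a throwaway language file.
lang_file = Tempfile.new(['lang', '.yml'])
lang_file.write("greeting: Hallo\n")
lang_file.close

existing = fetch_or_add(lang_file.path, 'greeting', 'unused default')
added    = fetch_or_add(lang_file.path, 'farewell', 'Auf Wiedersehen')
reloaded = YAML.load_file(lang_file.path)

puts existing
puts added
puts reloaded.inspect
```

Note that every missing key rewrites the complete file, which is exactly why the way the emitter formats its output matters so much here.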
However, trying to change the way the data is converted to YAML… is a waste of time.
Why care?
Well, because the language files of the plugin are written by hand and therefore it is important
to keep the YAML code as clean as possible. Let's demonstrate this. Here is a snippet of the German language file:
dates:
  abbr_monthnames: [Jan, Feb, Mär, Apr, Mai, Jun, Jul, Aug, Sep, Oct, Nov, Dez]
After loading it into Ruby and dumping it as YAML again, it looks like this:
---
dates:
  abbr_monthnames:
  - Jan
  - Feb
  - "M\xC3\xA4r"
  - Apr
  - Mai
  - Jun
  - Jul
  - Aug
  - Sep
  - Oct
  - Nov
  - Dez
I wouldn't say it screws up the language file. The YAML emitter (the thing constructing the YAML code) does everything
right, and the generated YAML works. However, the keys are not ordered, so after new YAML code is saved to a language
file you cannot be sure to find a key where you left it. Special characters (like German umlauts) are escaped because the
generated YAML code is encoded in plain old ASCII. Most readers will also notice that the generated YAML code does not
use inline collections (e.g. [a, b, c, …]) and writes everything as ordinary block sequences (- a\n- b\n- c). The last
thing the emitter adds is the document separator (---).
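The round trip is easy to reproduce. The following is a minimal sketch with a shortened, ASCII-only month list (an assumption made to keep the example small). On a current Ruby the YAML backend is Psych rather than Syck, so the umlaut escaping described above may not show up, but the inline list is still rewritten as a block sequence and the document separator is still added.

```ruby
require 'yaml'

# Hand-written source using an inline (flow) sequence.
source = <<~YML
  dates:
    abbr_monthnames: [Jan, Feb, Mar]
YML

data   = YAML.load(source)   # parse into plain Ruby hashes and arrays
dumped = YAML.dump(data)     # emit YAML again

puts dumped
```

The data survives unchanged; only the presentation chosen by the author of the file is lost.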
Please don't get me wrong. There's nothing wrong with the data stored in the YAML code, only the presentation isn't as clear
as it could be. The Simple Localization plugin depends heavily on hand-written YAML files (as a place to store the localization
information), and I decided to use YAML for this because it's a very powerful and, more importantly, a simple way to write
information down. It makes writing the language files almost painless, sometimes even fun. This is a very important part of the
plugin, if not the core itself. But all the things that happen to a YAML file when it's loaded and saved back make it very
annoying to work with the language files.
So, what to do? Usually Ruby libraries offer all kinds of options to manipulate their behavior. A short look at Ruby's core docs
for the to_yaml method looks promising: to_yaml( opts = {} ). So there's an opts parameter, probably a hash
which makes it possible to alter the way the YAML code is generated. A few searches later the documentation of the yaml4r
project shows the desired information: The Options Hash.
Nice, everything we need. At least until you try it out. Options are fine, but they are useless if they don't change anything.
I tried passing the options to every YAML object that is involved in generating YAML code. A bit later I found a mail
that at least mentions the problem:
Q3.
The following options are defined in 'yaml/constants.rb':
:Indent => 2, :UseHeader => false, :UseVersion => false, :Version => '1.0',
:SortKeys => false, :AnchorFormat => 'id%03d', :ExplicitTypes => false,
:WidthType => 'absolute', :BestWidth => 80,
:UseBlock => false, :UseFold => false, :Encoding => :None
These options are intended to be used with Object#to_yaml(),
YAML::Stream.new(), and YAML::Store.new().
But all except :Indent, :SortKeys, :ExplicitTypes and :UseBlock are not
available when I tried on Ruby 1.8.2.
What options are available in Ruby 1.8.2 and 1.8.3?
Most of these constants are now singleton methods in Ruby. I need to update documentation and remove the constants from yaml/constants.rb.
OK, so I fired up irb and searched everything YAML-like for singleton methods. No luck: nothing there either.
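Such a search can be scripted instead of typed into irb by hand. The sketch below is an assumption about how one might do it, not what I actually ran: it lists the singleton methods of the YAML module and filters them by a guessed name pattern based on the option names from the mail. Which methods show up depends on the Ruby version and its YAML backend.

```ruby
require 'yaml'

# Guessed pattern (an assumption for this example), derived from the option
# names quoted above: Indent, SortKeys, BestWidth, UseHeader, and so on.
OPTION_LIKE = /indent|sort|width|header|version|encoding|block|fold/i

all_methods    = YAML.singleton_methods.sort
option_methods = all_methods.grep(OPTION_LIKE)

puts "#{all_methods.size} singleton methods on YAML"
puts "option-like names: #{option_methods.inspect}"
```

The same filter can be applied to any class or module under the YAML namespace by replacing the receiver of singleton_methods.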
After more than a day of searching for the YAML options, enough was enough. I started to read the source files in Ruby's
yaml directory. A bit of reading later I found out that Ruby uses the Syck library as the backend for its YAML support.
Syck grew out of the pure Ruby YAML library used before the days YAML was added to Ruby's core. Some files in the
yaml directory are nothing more than old, unused files left there for backward compatibility. The main work is done by the
pure C Syck library and a wrapper used to talk to Ruby.
Now I hoped that the pure Ruby interface just used an outdated way to pass the options to the Syck objects. However,
analyzing the Ruby and C code for a few hours (I've never written a C extension for Ruby) only revealed this:
The wrapper around the Syck objects does not pass the options on at all. It just stores them in an instance variable and leaves
them there.
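The visible symptom of options that are never passed on is that the output is identical with and without them. A minimal check, assuming the option names from the mail quoted above, might look like this. On a current Ruby the backend is Psych rather than Syck, and as far as I can tell it simply does not know these Syck-era keys, so the same symptom appears there for a different reason.

```ruby
require 'yaml'

# Keys are deliberately not in alphabetical order, so a working :SortKeys
# option would change the output.
data = { 'b' => 1, 'a' => [1, 2, 3] }

plain        = data.to_yaml
with_options = data.to_yaml(:SortKeys => true, :UseHeader => false, :Indent => 4)

if plain == with_options
  puts 'options had no effect'
else
  puts 'options changed the output'
end
```

Comparing the two strings is a cheap way to test any single option before building code that relies on it.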
I ended up in the ext/syck/rubyext.c source file of the current Ruby trunk, in the syck_emitter_reset function
(YAML::Syck::Emitter#initialize in Ruby). It does receive the options but does not use them to configure the Syck
emitter object. A list of supported options can be found in the syck_new_emitter function in ext/syck/emitter.c.
Since the Syck functions can only be accessed from C code, this can only be fixed in C. Maybe I'll create a patch for
Ruby to fix this, but that will not solve my problems with the Simple Localization plugin (at least not in time). Maybe
Ruby Inline can help here, but for now I'll focus on other features still on the plugin's todo list.
I just wrote all this down to spare others from searching for this for too long. Please note that I'm not experienced in
reading C extensions for Ruby and therefore I may have missed a thing or two. If there is a way to make this work that I've
overlooked, please let me know.
In the end I still like Ruby and YAML. Ruby's support for YAML is pretty impressive nevertheless. Because of Ruby's and
YAML's powerful natures, the little details can just get very tricky sometimes…