Welcome to Arkanis Development

Options for Ruby's to_yaml method…


… and how to waste two days. This post is meant for all developers having the same kind of problem (searching for options of the to_yaml method) and because of that it's written in English.

While working on the Simple Localization plugin for Ruby on Rails, a mail exchange with Roman Gonzalez produced a nice idea: if a user accesses a key of the language file (a file containing YAML code) and that key does not exist, it should be added to the language file automatically.

So far so good. Loading YAML is easy:

YAML.load_file …

And saving YAML isn't hard either:

File.open(target_file, 'wb') do |file|
  YAML.dump data, file
end
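Put together, the "add missing keys automatically" idea could look roughly like the sketch below. The helper name fetch_or_add and its arguments are made up for this illustration; this is not the plugin's actual code, just the load and dump calls from above combined.

```ruby
require 'yaml'

# Hypothetical helper (not part of the plugin): return the value stored under
# `key` in the YAML language file at `path`. If the key is missing, add it
# with `default`, write the whole file back and return `default`.
def fetch_or_add(path, key, default)
  data = YAML.load_file(path) || {}   # an empty file does not load as a hash
  return data[key] if data.key?(key)

  data[key] = default
  File.open(path, 'wb') { |file| YAML.dump(data, file) }
  default
end
```

Note that the whole file is rewritten by the emitter every time a key is added, which is exactly where the trouble starts.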

However trying to change the way the data is converted to YAML… is a waste of time.

Why care? Well, because the language files of the plugin are written by hand and therefore it is important to keep the YAML code as clean as possible. Let's demonstrate this. Here is a snippet of the German language file:

  abbr_monthnames: [Jan, Feb, Mär, Apr, Mai, Jun, Jul, Aug, Sep, Oct, Nov, Dez]

After loading it into Ruby and dumping it as YAML again it looks like this:

  - Jan
  - Feb
  - "M\xC3\xA4r"
  - Apr
  - Mai
  - Jun
  - Jul
  - Aug
  - Sep
  - Oct
  - Nov
  - Dez

I wouldn't say it screws up the language file. The YAML emitter (the thing constructing the YAML code) does everything right, and the generated YAML works. However, the keys are not ordered, so after new YAML code is saved to a language file you cannot be sure to find a key where you left it. Special characters (like German umlauts) are escaped because the generated YAML code is encoded in plain old ASCII. Most readers will also notice that the generated YAML code does not use inline collections (e.g. [a, b, c, …]) and writes everything as ordinary block sequences (- a\n- b\n- c). The final new part added by the emitter is the document separator (---).
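The round trip is easy to reproduce. The following is a minimal sketch with a shortened, made-up month list. It assumes a current Ruby, where the YAML backend is Psych rather than the emitter described in this post, so details such as the escaping of umlauts may differ from the output shown above. The document separator and the block-style sequence show up either way.

```ruby
require 'yaml'

# A shortened version of the language file snippet, using an inline collection.
source = "abbr_monthnames: [Jan, Feb, Mär, Apr]\n"

data   = YAML.load(source)   # => {"abbr_monthnames" => ["Jan", "Feb", "Mär", "Apr"]}
dumped = YAML.dump(data)

# The data survives the round trip, only the presentation changes:
# a leading "---" and one "- item" line per entry instead of [a, b, c].
puts dumped
```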

Please don't get me wrong. There's nothing wrong with the data stored in the YAML code, only the presentation isn't as clear as it could be. The Simple Localization plugin depends heavily on handwritten YAML files (as a place to store the localization information), and I decided to use YAML for this because it's a very powerful and, more importantly, a simple way to write down information. It makes writing the language files almost painless, sometimes even fun. This is a very important part of the plugin, if not the core itself. But all the things that happen to a YAML file when it's loaded and saved back make it very annoying to work with the language files.

So, what to do? Usually Ruby libraries offer all kinds of options to manipulate their behavior. A short look at Ruby's core docs for the to_yaml method looks promising: to_yaml( opts = {} ). So there's an opts parameter, probably a hash which makes it possible to alter the way the YAML code is generated. A few searches later the documentation of the yaml4r project shows the desired information: The Options Hash.

Nice, everything we need. At least until you try it out. Options are fine, but they are useless if they don't change anything. I tried passing the options to every YAML object that is involved in generating YAML code. A bit later I found a mail that at least mentions this:


The following options are defined in 'yaml/constants.rb':

  :Indent => 2, :UseHeader => false, :UseVersion => false, :Version => '1.0',
  :SortKeys => false, :AnchorFormat => 'id%03d', :ExplicitTypes => false,
  :WidthType => 'absolute', :BestWidth => 80, :UseBlock => false,
  :UseFold => false, :Encoding => :None

These options are intended to be used with Object#to_yaml(), YAML::Stream.new(), and YAML::Store.new(). But all except :Indent, :SortKeys, :ExplicitTypes and :UseBlock are not available when I tried on Ruby 1.8.2.

What options are available in Ruby 1.8.2 and 1.8.3?

Most of these constants are now singleton methods in Ruby. I need to update documentation and remove the constants from yaml/constants.rb.

OK, I fired up irb and searched everything YAML-like for singleton methods. No luck. Nothing there either.
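For anyone who wants to repeat that kind of search, here is a rough sketch of how it can be done. The helper name yaml_singleton_methods is made up for this example, and what it lists depends entirely on the Ruby version and YAML backend in use, so don't expect it to match what I saw.

```ruby
require 'yaml'

# Collect the singleton methods defined directly on YAML and on every
# module or class nested one level below it.
def yaml_singleton_methods
  modules = [YAML] + YAML.constants.map { |name| YAML.const_get(name) }
                                   .select { |const| const.is_a?(Module) }
  modules.each_with_object({}) do |mod, found|
    found[mod.name] = mod.singleton_methods(false).sort
  end
end

# Print one line per module that defines any singleton methods at all.
yaml_singleton_methods.each do |name, methods|
  puts "#{name}: #{methods.join(', ')}" unless methods.empty?
end
```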

After more than a day of searching for the YAML options, enough was enough. I started to read the source files in Ruby's yaml directory. A bit of reading later I found out that Ruby uses the Syck library as the backend for its YAML support. Syck grew out of the pure Ruby YAML library that was used before YAML was added to Ruby's core. Some files in the yaml directory are nothing more than old, unused files left there for backward compatibility. The main work is done by the pure C Syck library and a wrapper used to talk to Ruby.

Now I hoped that the pure Ruby interface just used an outdated way to pass the options to the Syck objects. However, analyzing Ruby and C code for a few hours (I've never written a C extension for Ruby) only brought this:

The wrapper around the Syck objects does not pass the options at all. It just stores them in an instance variable and leaves them there.

I ended up in the ext/syck/rubyext.c source file of the current Ruby trunk in the syck_emitter_reset function (YAML::Syck::Emitter#initialize in Ruby). It does get the options but does not use them to set options to the Syck emitter object. A list of supported options can be found in the syck_new_emitter function in ext/syck/emitter.c.
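To make the effect concrete, here is a toy model of that situation in plain Ruby. ToyEmitter and ToyWrapper are invented for this illustration and have nothing to do with the real Syck classes or their C code. They only mimic the pattern: the options are accepted and stored, but never reach the object that does the formatting.

```ruby
# Purely illustrative stand-ins; these are NOT the real Syck classes.
class ToyEmitter
  attr_accessor :indent

  def initialize
    @indent = 2   # built-in default
  end

  # Render the items as an indented block sequence.
  def emit(items)
    items.map { |item| "#{' ' * indent}- #{item}\n" }.join
  end
end

class ToyWrapper
  def initialize(options = {})
    @options = options          # stored ...
    @emitter = ToyEmitter.new   # ... but never handed to the emitter
  end

  def emit(items)
    @emitter.emit(items)
  end
end

# The :indent option is silently ignored, the default indentation is used.
puts ToyWrapper.new(:indent => 8).emit(%w[Jan Feb])
```

No error, no warning, just output that looks as if no options had been given.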

Since the Syck functions can only be accessed via C code, this can only be fixed in C. Maybe I'll create a patch for Ruby to fix this, but that will not solve my problems with the Simple Localization plugin (at least not in time). Maybe Ruby Inline can help here, but for now I'll focus on other features still on the plugin's to-do list.
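For the key order alone, a workaround outside the emitter is conceivable: sort the data before dumping it. The sketch below rests on assumptions that do not hold for the Ruby 1.8 setup described in this post. It needs a Ruby where hashes keep their insertion order (1.9 or later) and an emitter that writes keys in that order. The helper name deep_sort_keys is made up, and it does nothing about inline collections or escaping.

```ruby
require 'yaml'

# Hypothetical helper: return a copy of `value` in which every hash has its
# keys sorted (compared as strings), recursing into nested hashes and arrays.
def deep_sort_keys(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .map { |key, val| [key, deep_sort_keys(val)] }
         .to_h
  when Array
    value.map { |item| deep_sort_keys(item) }
  else
    value
  end
end

# Made-up sample data with keys in non-alphabetical order.
data = { 'monthnames' => %w[Januar Februar], 'abbr_daynames' => %w[So Mo] }
puts YAML.dump(deep_sort_keys(data))
```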

I just wrote all this down to spare others from searching for this for too long. Please note that I'm not experienced in reading C extensions for Ruby and therefore I may have missed a thing or two. If there is a way to make this work that I've overlooked, please let me know.

In the end I still like Ruby and YAML. Ruby's support for YAML is pretty impressive nevertheless. Because of Ruby's and YAML's powerful natures the little details can just get very tricky sometimes…

7 comments for this post


#1 by
Evan Pon

Thanks a bunch - I'm ending my YAML options research now - only wasted an hour or so. Much better than 2 days. Extra thanks for writing in English, as my German is almost completely forgotten (and was never good in the first place).

#2 by

You're welcome. Good to see that this helped someone. Maybe RbYAML (a pure Ruby YAML parser and emitter, http://rbyaml.rubyforge.org/) can solve your problems. However, I haven't looked into this yet and I don't know if you can modify the emitted YAML code with this library.

#3 by

I also want to thank you -- I was trying to figure out if one of the constants of the options Hash would fix an entirely different problem (YAML is storing information that it can't read back out, and I'm not sure why yet -- was trying to figure out if :UseBlock would do the trick).

#4 by
Gabe da Silveira

Thanks for the info. This is just incredibly irritating. I'm resorting to my own hacked-together to_yaml function with all kinds of hackery just to emulate the SortKeys option.

#5 by

Much thanks!!! You saved me a ton of time!

#7 by

Thanks for the information. Works well. Unfortunately sorting hash keys does not seem to work by overwriting the to_yaml_style method.
