Archive for the 'Soft Hacks' Category

Testing recipe definitions with chefspec

Last time [1], we wrote chefspec tests to cover the behavior of the nginx::default recipe.  Another component the cookbook provides is the nginx_site definition.  As before, we cover the definition’s resources for each path the definition can take.  The following is the output when the specs succeed.

  #nginx_site (definition)
    #enable => true (default)
      should execute command "/usr/sbin/nxensite foo"
      should notify "service[nginx]" and "reload"
    #enable => false
      should execute command "/usr/sbin/nxdissite foo"
      should notify "service[nginx]" and "reload"

To cover the behavior, we drive the work TDD-style by writing failing specs, starting with the default case:

context "#nginx_site (definition)" do
  context "#enable => true (default)" do
    it { expect(chef_run).to execute_command "/usr/sbin/nxensite foo" }
    it do
      expect(chef_run.execute("nxensite foo"))
        .to notify("service[nginx]", "reload")
    end
  end
end

The above specs will fail because nginx::default doesn’t make an nginx_site(“foo”) call.  In minitest or test-kitchen, the approach was to create a test recipe that conducts this integration test.  However, for isolated chefspecs, this is too heavyweight.  Diving into the Chef 11.4.4 documentation and code [2], we find that every recipe file is loaded by an instance_eval call on a Recipe object.  Hence we can use the same approach to inject a fake recipe.  Below, we create a recipe called “nginx_spec::default” using the existing run context from the “nginx::default” run.

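def fake_recipe(run, &block)
  # Build a recipe object that shares the run context of the existing run.
  recipe = Chef::Recipe.new("nginx_spec", "default", run.run_context)
  # Evaluate the block as if it were the body of a recipe file.
  recipe.instance_eval(&block)
  recipe
end

# call to the spec example

recipe = fake_recipe(chef_run) do
  nginx_site "foo" do
    enable enabled
  end
end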
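
To see why instance_eval is enough to fake a recipe, here is a minimal sketch in plain Ruby with no Chef involved. ToyRecipe, toy_fake_recipe, and the keyword-style nginx_site method are hypothetical stand-ins made up for illustration; they are not Chef's API. The only point is that a block evaluated with instance_eval can call the object's DSL methods without a receiver, which is what a recipe file does.

```ruby
# Toy stand-in for a recipe object; class and methods are hypothetical.
class ToyRecipe
  attr_reader :cookbook_name, :recipe_name, :resources

  def initialize(cookbook_name, recipe_name)
    @cookbook_name = cookbook_name
    @recipe_name = recipe_name
    @resources = []
  end

  # DSL method: callable without a receiver inside an instance_eval'd block.
  def nginx_site(name, enable: true)
    command = enable ? "nxensite" : "nxdissite"
    @resources << "execute[#{command} #{name}]"
  end
end

def toy_fake_recipe(&block)
  recipe = ToyRecipe.new("nginx_spec", "default")
  recipe.instance_eval(&block) # self inside the block is now the recipe
  recipe
end

recipe = toy_fake_recipe do
  nginx_site "foo"
  nginx_site "bar", enable: false
end

puts recipe.resources
```

The block never mentions the recipe object, yet both resources end up recorded on it, which is the same trick fake_recipe relies on.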

Next, we create a new ChefSpec::ChefRunner instance by appending our internally-created “nginx_spec::default” recipe to the existing “nginx::default” run.  For this we monkey-patch chefspec to converge the added recipe:

class ChefSpec::ChefRunner
  def append(recipe)
    runner =

# usage in our examples:
new_run = chef_run.append(recipe)

Now we can change our specs to use this new runner context instead to make our expectations.  Below is the whole context that makes the spec pass:

context "#enable => true (default)" do
  let(:run) do
    recipe = fake_recipe(chef_run) do
      nginx_site "foo"
    end
    # Converge the fake recipe on top of the existing nginx::default run.
    chef_run.append(recipe)
  end

  it { expect(run).to execute_command "/usr/sbin/nxensite foo" }

  it do
    expect(run.execute("nxensite foo"))
      .to notify("service[nginx]", "reload")
  end
end

And now we have the nginx_site definition covered.  I committed the changes to my fork [3] of the opscode cookbook.  Although the approach is useful, this intrusive monkey-patching of chefspec (which is itself a monkey-patch on chef) shows why folks at Opscode recommend using LWRPs in new recipe development, since you can monitor the state of the new resource itself.  With definitions, you have to track the state of the resources made inside the definition and provide the necessary specs.  This also has implications when you are driving recipes that use the nginx_site definition via TDD.  I will cover testing that in another post.


Behavior-driven development of research code

I’ve been reading a lot of papers on optimizing DAG-based workflows, plus supporting papers on scheduling, for my research. Most of the time, I just characterized existing software on existing systems. This is my first time writing scheduler code from scratch. I started by programming by wishful thinking but got stumped on what a scheduler workflow actually looks like.

In parallel, I have been reading The Cucumber Book for use in a startup I’m bootstrapping with some friends. I thought that maybe it could help my graduate thought process as well. So I decided to drive my scientific-method-ish thinking with a Gherkin file. So here it goes:

Feature: Pipeline workflow optimizations
  We hypothesize that optimizing a pipeline will have a better load balance
  than a data-aware scheduler.

  Scenario: Load balance comparison of DAG optimization and data-aware scheduler
    Given A pipeline workload with parameterized data
    When We obtain the load from a data-aware scheduler
    And We obtain the load from the DAG-optimized version
    Then DAG-optimized load is more balanced

Here I used the “Then” clause to describe my hypothesis and the “Given” and “When” clauses to describe the experiment that attempts to verify it. In my BDD thought process, based on the chapter “Working with Legacy Code”, if “Then” succeeds we accept the hypothesis; otherwise we reject it and write new Given and When steps.
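
The “Then” step only becomes checkable once “more balanced” has a number behind it. As a rough sketch, one option is the coefficient of variation (standard deviation divided by mean) of the per-worker load, where lower means more balanced. The metric, the method names load_imbalance and more_balanced?, and the sample loads are all my assumptions for illustration; they are not measurements or code from the actual scheduler.

```ruby
# Coefficient of variation (stddev / mean) of per-worker load.
# Lower means more balanced; 0.0 means perfectly even.
def load_imbalance(loads)
  raise ArgumentError, "loads must not be empty" if loads.empty?

  mean = loads.sum.to_f / loads.size
  return 0.0 if mean.zero?

  variance = loads.sum { |l| (l - mean)**2 } / loads.size
  Math.sqrt(variance) / mean
end

# True when the candidate distribution is more balanced than the baseline.
def more_balanced?(candidate, baseline)
  load_imbalance(candidate) < load_imbalance(baseline)
end

# Made-up per-worker loads, for illustration only.
data_aware    = [10, 2, 9, 3]
dag_optimized = [6, 6, 7, 5]

puts more_balanced?(dag_optimized, data_aware)
```

A step definition for “Then DAG-optimized load is more balanced” could then simply assert that more_balanced? returns true for the two loads gathered in the “When” steps.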

If you take a look at my Git commit history, I first wrote a lower-level .feature file that describes how a data-aware scheduler should work. After taking a short walk and rereading my Gherkin features, my ‘stakeholder hat’ started to kick in. Hopefully I’m Cuking from the outside correctly.

Building (hackishly) rubygems from CMake-based tarballs

Here is my attempt at making a user-space bundle install of SimGrid’s Ruby bindings. I created a gemspec for the source tarball in a hackish way. Basically I just made a system “cmake .” call in the extconf.rb file when building the gem extensions. Here’s the result:

$ cat Gemfile
source :rubygems

gem "simgrid-ruby"
$ SIMGRID_ROOT=/opt/simgrid-3.6.2 bundle install --path vendor

make install
[ 66%] Built target SG_ruby
[100%] Built target SG_rubydag
Install the project...
-- Install configuration: ""
CMake Error at cmake_install.cmake:47 (FILE):
file cannot create directory: /usr/local/ruby/1.9.1/x86_64-linux. Maybe
need administrative privileges.

make: *** [install] Error 1

-edit Gemfile
$ cat Gemfile
source :rubygems

gem "simgrid-ruby", :path => "~/tmp/simgrid-ruby" # where I have simgrid-ruby.git cloned
$ SIMGRID_ROOT=/opt/simgrid-3.6.2 bundle install
/home/aespinosa/LocalDisk/opt/ruby/lib/ruby/1.9.1/yaml.rb:56:in `’:
Using simgrid-ruby (0.0.4890e3f) from source at ~/tmp/simgrid-ruby
Using bundler (1.0.21)
Updating .gem files in vendor/cache
* simgrid-ruby at `~/tmp/simgrid-ruby` will not be cached.
Your bundle is complete! It was installed into ./vendor
$ cd vendor/ruby/1.9.1/gems/simgrid-ruby-0.0.4890e3f/examples/
$ bundle exec ruby MasterSlave.rb
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[0]=20
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[1]=50000000
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[2]=1000000
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[3]=4
[Tremblay:Master:(1) 0.000000] [ruby/INFO] Master Sending Task_0 to
slave 0 with Compute Size 50000000.0
[Tremblay:Master:(1) 0.215872] [ruby/INFO] Master Sending Task_1 to
slave 1 with Compute Size 50000000.0
[Tremblay:Master:(1) 0.381834] [ruby/INFO] Master Sending Task_2 to
slave 2 with Compute Size 50000000.0

[Bourassa:Slave:(2) 6.234851] [ruby/INFO] Slave ‘slave 0’ done
executing task Task_16.
[Fafard:Slave:(4) 6.243210] [ruby/INFO] Slave ‘slave 2’ done executing
task Task_18.
[Ginette:Slave:(5) 6.759426] [ruby/INFO] Slave ‘slave 3’ done
executing task Task_19.
[Tremblay:Master:(1) 6.772657] [ruby/INFO] Master : Everything’s Done
Simulation time : 6.77266

Sort of works! Since rubygems installed the entire manifest (I placed
everything in the tarball) into the gem directory, the lib/*.so and lib/*.rb files
somehow fall into place.
Problems arise in the build when the gem installer cannot replace the
RUBYARCHDIR string in the Makefile to install the extension in isolation.

I made a fork of a repository with the gemspec package in . For now this
hackish-setup satisfies my needs of a user-space installation of
simgrid-ruby bindings.
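
For reference, the “cmake .” trick in extconf.rb can be sketched roughly as below. RubyGems runs extconf.rb and then calls make, so the file only has to leave a Makefile behind. This is not the actual file from my gemspec: generate_makefile and its injectable runner parameter are hypothetical names, added so the function can be exercised without cmake installed.

```ruby
# extconf.rb (sketch). RubyGems runs this file and then calls `make`,
# so all it needs to do is leave a Makefile in the extension directory.

# Runs `cmake .` inside dir and returns the path of the expected Makefile.
# The runner defaults to Kernel.system; it is injectable for testing.
def generate_makefile(dir, runner: Kernel.method(:system))
  ok = runner.call("cmake", ".", chdir: dir)
  raise "cmake failed in #{dir}" unless ok

  File.join(dir, "Makefile")
end

# In a real extconf.rb the last line would be:
#   generate_makefile(Dir.pwd)
```

Since the generated Makefile comes from CMake rather than mkmf, its install target does not know about the directory RubyGems wants to install into, which is where the RUBYARCHDIR problem above comes from.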

ADEM on github

For my first Ruby project, I reimplemented ADEM from scratch. ADEM is a tool for automatically installing applications on the Open Science Grid via pacman. Here is a glimpse of the interface.

ADEM is a tool for deploying and managing software on the Open Science Grid.

    adem command [options]

    adem config --display
    adem sites --update
    adem app --avail

  Further help:
    adem config -h/--help        Configure ADEM
    adem sites -h/--help         Manipulate the site list
    adem app -h/--help           Application installation
    adem help                    This help message

The original subversion repository is found in A Github repository is also mirrored in git://

Happy grid computing!

SwiftScript Vim syntax file

One weekend I was reading an article on how to make a Vim syntax file. To apply it, I decided to make one for the Swift workflow system. Most of the script contains simple word matches for some Swift keywords. Then I copied the matching rules for comments from the C syntax files in the standard Vim distribution. For a preview, check out the screenshot below, which uses the desert256.vim colorscheme in a gnome-terminal:

oops_swift by yecartes, on Flickr

Fig. A syntax-highlighted Swift workflow

The syntax file can be downloaded from my graduate student code-shanty page.

memcache-ing everything in an ActiveRecord model.

I was writing a script that interacted with a database. The script was being invoked 400 times by a Java program. It wasn’t too happy forking too many processes at a time. At first I thought that MySQL could only handle so many remote connections. Here is my attempt to reduce database load to almost 0 percent.

cache_fu was too dependent on being installed in a Rails environment. Commenting out the code that referred to Rails variables made the Ruby interpreter stop complaining. But according to the memcached server logs, nothing was being cached at all! I ended up using the low-level access API from fauna.

After wrapping all the ActiveRecord::Base.find…() operations in the fauna-documented recipe, all of the SELECT * statements are now cached. The remaining problem is that the models still make ‘SHOW FIELDS..’ queries to the database whenever a model is first used. And since the script is invoked 400 times, the caching effort saved only bandwidth and round-trip times.

I poked through the metaprogramming examples in the Pickaxe book and resorted to overriding my models like this:

require 'memcached'
require 'digest/md5'
require 'active_record'

class Variation < ActiveRecord::Base
  def self.digest

  def self.columns=(cached_columns)
    @cached_columns= cached_columns
    def self.columns

cache =
  Variation.columns = cache.get(Variation.digest)
rescue Memcached::NotFound
  cache.set(Variation.digest, Variation.columns)
cache = nil

Now everything is cached!
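
The idea can be tried without a database or a memcached server. The sketch below is a self-contained approximation, not the code from my script: FakeCache is a hypothetical in-memory stand-in for the memcached client (it raises on a miss, as the client does), this Variation is a plain class whose columns method simulates the SHOW FIELDS query, and the digest key and load_columns helper are made up for illustration.

```ruby
require 'digest/md5'

# In-memory stand-in for a memcached client (hypothetical).
# Like the real client, get raises on a cache miss.
class FakeCache
  class NotFound < StandardError; end

  def initialize
    @store = {}
  end

  def get(key)
    @store.fetch(key) { raise NotFound, key }
  end

  def set(key, value)
    @store[key] = value
  end
end

# Stand-in for an ActiveRecord model; columns simulates SHOW FIELDS.
class Variation
  @queries = 0

  class << self
    attr_reader :queries

    # Cache key for this model's column metadata (assumed format).
    def digest
      Digest::MD5.hexdigest("#{name}-columns")
    end

    # The expensive lookup: counts how often the "database" is hit.
    def columns
      @queries += 1
      %w[id name]
    end

    # Assigning columns swaps the lookup for a reader of the cached value.
    def columns=(cached_columns)
      @cached_columns = cached_columns
      define_singleton_method(:columns) { @cached_columns }
    end
  end
end

# On a hit, install the cached columns; on a miss, query once and store.
def load_columns(model, cache)
  model.columns = cache.get(model.digest)
rescue FakeCache::NotFound
  cache.set(model.digest, model.columns)
end

cache = FakeCache.new
load_columns(Variation, cache) # miss: runs the query and fills the cache
load_columns(Variation, cache) # hit: columns now come from the cache
Variation.columns
puts Variation.queries
```

The query counter stays at 1 after the first miss, which is the effect I was after: later invocations read the column metadata from the cache instead of asking the database again.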

Pasting Excel Charts in Word via Crossover Linux

Pasting a chart from MS Excel 2007 into Word 2007 yields the error “To insert a chart, you must first close any open dialog boxes or cancel editing mode in Microsoft Office Excel”.  This bug is found in Crossover Linux Pro v8.0.0.  I assume that it also occurs in publicly available versions of Wine.

An alternative to booting into Windows is to use the Paste Special feature and paste the chart as a Picture (Enhanced Metafile). This preserves the vector information of the chart, so zooming in and out of the document object won’t be a problem.