Posts Tagged 'ruby'

Building (hackishly) rubygems from CMake-based tarballs

Here is my attempt at making a user-space bundle install of SimGrid’s Ruby bindings. I created a gemspec for the source tarball in a hackish way: the extconf.rb file simply makes a system “cmake .” call when the gem extensions are built. Here’s the result:

$ cat Gemfile
source :rubygems

gem "simgrid-ruby"
$ SIMGRID_ROOT=/opt/simgrid-3.6.2 bundle install --path vendor


make install
[ 66%] Built target SG_ruby
[100%] Built target SG_rubydag
Install the project...
-- Install configuration: ""
CMake Error at cmake_install.cmake:47 (FILE):
file cannot create directory: /usr/local/ruby/1.9.1/x86_64-linux. Maybe
need administrative privileges.

make: *** [install] Error 1

-edit Gemfile
$ cat Gemfile
source :rubygems

gem "simgrid-ruby", :path => "~/tmp/simgrid-ruby" # where I have simgrid-ruby.git cloned
$ SIMGRID_ROOT=/opt/simgrid-3.6.2 bundle install
/home/aespinosa/LocalDisk/opt/ruby/lib/ruby/1.9.1/yaml.rb:56:in `’:
Using simgrid-ruby (0.0.4890e3f) from source at ~/tmp/simgrid-ruby
Using bundler (1.0.21)
Updating .gem files in vendor/cache
* simgrid-ruby at `~/tmp/simgrid-ruby` will not be cached.
Your bundle is complete! It was installed into ./vendor
$ cd vendor/ruby/1.9.1/gems/simgrid-ruby-0.0.4890e3f/examples/
$ bundle exec ruby MasterSlave.rb
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[0]=20
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[1]=50000000
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[2]=1000000
[Tremblay:Master:(1) 0.000000] [ruby/INFO] args[3]=4
[Tremblay:Master:(1) 0.000000] [ruby/INFO] Master Sending Task_0 to slave 0 with Compute Size 50000000.0
[Tremblay:Master:(1) 0.215872] [ruby/INFO] Master Sending Task_1 to slave 1 with Compute Size 50000000.0
[Tremblay:Master:(1) 0.381834] [ruby/INFO] Master Sending Task_2 to slave 2 with Compute Size 50000000.0


[Bourassa:Slave:(2) 6.234851] [ruby/INFO] Slave 'slave 0' done executing task Task_16.
[Fafard:Slave:(4) 6.243210] [ruby/INFO] Slave 'slave 2' done executing task Task_18.
[Ginette:Slave:(5) 6.759426] [ruby/INFO] Slave 'slave 3' done executing task Task_19.
[Tremblay:Master:(1) 6.772657] [ruby/INFO] Master : Everything's Done
Simulation time : 6.77266
$

Sort of works! Since rubygems installed the entire manifest (I listed
everything in the tarball) into the gem directory, the lib/*.so and lib/*.rb
files fall into place.
Problems arise during the build because the gem installer cannot replace the
RUBYARCHDIR string in the CMake-generated Makefile, so it cannot redirect
"make install" into the gem's isolated directory
(https://github.com/rubygems/rubygems/blob/master/lib/rubygems/ext/builder.rb).
That is presumably why the first attempt above tried to write to
/usr/local/ruby/1.9.1/x86_64-linux.
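
To illustrate the failure mode, here is a minimal sketch of that kind of textual substitution. It assumes the installer rewrites a line that begins with RUBYARCHDIR. The helper name redirect_archdir, the regular expression, and the sample Makefile fragments are invented for illustration and are not copied from rubygems. An mkmf-style Makefile has such a line, while a CMake-generated one typically does not, so the substitution silently does nothing.

```ruby
# Hypothetical helper: point the RUBYARCHDIR assignment at dest_path.
# The block form of gsub keeps dest_path from being read as a pattern.
def redirect_archdir(makefile_text, dest_path)
  makefile_text.gsub(/^RUBYARCHDIR\s*=.*$/) { "RUBYARCHDIR = #{dest_path}" }
end

# Made-up fragment in the style of an mkmf-generated Makefile.
mkmf_style = "RUBYARCHDIR = $(sitearchdir)\ninstall:\n\tcp ext.so $(RUBYARCHDIR)\n"

# Made-up fragment in the style of a CMake-generated Makefile.
cmake_style = "install:\n\t$(CMAKE_COMMAND) -P cmake_install.cmake\n"

dest = "/tmp/gems/simgrid-ruby/lib"
patched_mkmf  = redirect_archdir(mkmf_style, dest)   # first line is rewritten
patched_cmake = redirect_archdir(cmake_style, dest)  # nothing to rewrite
```

With the CMake fragment the text comes back unchanged, so the install destination stays whatever cmake_install.cmake decided.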

I put a fork of the repository, with the gemspec added, at
https://github.com/aespinosa/simgrid-ruby . For now this
hackish setup satisfies my need for a user-space installation of the
simgrid-ruby bindings.
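
For reference, the extconf.rb trick described at the top of this post can be sketched roughly as follows. This is a minimal illustration rather than the actual file in the repository: the method name configure_with_cmake and the injectable runner argument are made up here so the idea can be tried without CMake installed. SIMGRID_ROOT is not handled explicitly, since a child process started with system inherits the environment.

```ruby
# Sketch of an extconf.rb that delegates configuration to CMake instead of mkmf.
# `runner` defaults to Kernel#system; it is a parameter only so that the
# sketch can be exercised with a fake runner.
def configure_with_cmake(source_dir, runner: Kernel.method(:system))
  # Equivalent to running `cmake .` inside source_dir.
  ok = runner.call("cmake", ".", chdir: source_dir)
  raise "cmake failed in #{source_dir}" unless ok
  # The gem installer looks for a Makefile once extconf.rb has finished.
  File.join(source_dir, "Makefile")
end

# In a real extconf.rb this would be the only top-level call:
#   configure_with_cmake(File.dirname(File.expand_path(__FILE__)))
```

Raising on failure matters here: if cmake fails quietly, the installer only notices later when it cannot find a Makefile.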

memcache-ing everything in an ActiveRecord model.

I was writing a script that interacts with a database. The script was being invoked 400 times by a Java program. It wasn’t too happy forking that many processes at a time. At first I thought that MySQL could only handle so many remote connections. Here is my attempt to reduce the database load to almost 0 percent.

cache_fu was too dependent on being installed in a Rails environment. Commenting out the code that referred to Rails variables stopped the Ruby interpreter from complaining, but according to the memcached server logs nothing was being cached at all! I ended up using fauna’s low-level access API.

After wrapping all the ActiveRecord::Base.find…() operations in the fauna-documented recipe, all of the SELECT * statements are now cached. The remaining problem is that the models still make ‘SHOW FIELDS..’ queries to the database whenever a model is first used. And since the script is invoked 400 times, the caching effort so far saved only bandwidth and round-trip time: every invocation still hit the database.
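
The find-wrapping recipe mentioned above follows the usual get-or-compute pattern. Here is a minimal sketch of it, not the recipe from the fauna documentation itself: FakeCache, NotFound, the cached helper, and the key "Variation:42" are stand-ins invented so the example runs without a memcached server.

```ruby
# Stand-ins so the sketch runs without a memcached server. With the real
# gem, use Memcached.new and rescue Memcached::NotFound instead.
class NotFound < StandardError; end

class FakeCache
  def initialize
    @store = {}
  end

  # Mimics a client that raises on a cache miss.
  def get(key)
    raise NotFound unless @store.key?(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Return the cached value for key, or compute it with the block and store it.
def cached(cache, key)
  cache.get(key)
rescue NotFound
  value = yield
  cache.set(key, value)
  value
end

cache = FakeCache.new
queries = 0
2.times do
  # The block plays the role of a find call that hits the database.
  cached(cache, "Variation:42") { queries += 1; { id: 42 } }
end
# queries is 1 here: the second lookup never reaches the block
```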

I poked through the metaprogramming examples in the pick-axe book and resorted to overriding my models like this:

require 'memcached'
require 'digest/md5'
require 'active_record'

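class Variation < ActiveRecord::Base
  # Cache key for this model's column metadata
  def self.digest
    Digest::MD5.hexdigest(self.to_s)
  end

  # Install pre-fetched column metadata and redefine Variation.columns to
  # return it, so the database is no longer asked for the fields
  def self.columns=(cached_columns)
    @cached_columns = cached_columns
    def self.columns
      @cached_columns
    end
    @cached_columns
  end
end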

cache = Memcached.new
begin
  # Cache hit: reuse the stored column metadata
  Variation.columns = cache.get(Variation.digest)
rescue Memcached::NotFound
  # Cache miss: query the database once, then store the result
  cache.set(Variation.digest, Variation.columns)
end
cache = nil

Now everything is cached!
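
The override trick can be seen in isolation with a plain Ruby class, no ActiveRecord or memcached required. In this sketch the class Model and its lookups counter are invented for the demonstration, and define_singleton_method stands in for the nested def used above. Assigning to columns swaps out the expensive reader.

```ruby
class Model
  @lookups = 0

  class << self
    attr_reader :lookups

    # Stand-in for the expensive reader (think: a SHOW FIELDS round trip).
    def columns
      @lookups += 1
      %w[id name]
    end

    # Store the given columns and replace the reader with a cheap one.
    def columns=(cached_columns)
      @cached_columns = cached_columns
      define_singleton_method(:columns) { @cached_columns }
    end
  end
end

fresh = Model.columns   # goes through the expensive reader once
Model.columns = fresh   # from here on the reader is overridden
Model.columns
Model.columns
# Model.lookups is still 1 at this point
```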

FASTA splitting with BioRuby

In reference to my previous post, here’s the splitter rewritten with BioRuby. Note that I also changed the outer loop to write one file per iteration, instead of following some crazy rules for when to create a new file.

#!/usr/bin/env ruby
#
# Script: dumpseq.rb [file] [N] [prefix]
# Description: Splits a FASTA file evenly across N files.  Dumps the files in
#              the [prefix] directory
require 'bio'
require 'fileutils'

include Bio


seqs =  FlatFile.open(ARGV[0])
ncpus = ARGV[1].to_i
prefix = ARGV[2]

# Remove this loop and hardwire n_seqs if you know the number of sequences in
# the file beforehand.  Saves read time
n_seqs = 0
seqs.each do |seq|
  n_seqs += 1
end
seqs.rewind

overflow = n_seqs % ncpus
split_size = n_seqs / ncpus

ncpus.times do |i|
  filename = sprintf "%s/D%07d/seq%07d.fasta", prefix, i, i
  FileUtils.mkdir_p File.dirname(filename)
  File.open(filename, "w") do |dump|
    split_size.times do
      dump << seqs.next_entry.to_s
    end
    # The first `overflow` files take one extra sequence each
    dump << seqs.next_entry.to_s if i < overflow
  end
end
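
The split arithmetic in the script is easy to check on its own. As a small sketch (the helper name chunk_sizes is made up for this example and is not part of BioRuby), the per-file counts work out like this:

```ruby
# Sizes of the output files when n_seqs records are spread over n_files files:
# every file gets the quotient, and the first `overflow` files get one more.
def chunk_sizes(n_seqs, n_files)
  split_size, overflow = n_seqs.divmod(n_files)
  Array.new(n_files) { |i| i < overflow ? split_size + 1 : split_size }
end

sizes = chunk_sizes(10, 3)   # => [4, 3, 3]
```

The sizes always add up to n_seqs, so every sequence lands in exactly one file and no two files differ by more than one sequence.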