ADEM on github

For my first Ruby project, I reimplemented ADEM from scratch. ADEM is a tool for automatically installing applications on the Open Science Grid via pacman. Here is a glimpse of the interface.


ADEM is a tool for deploying and managing software on the Open Science Grid.

  Usage:
    adem command [options]

  Examples:
    adem config --display
    adem sites --update
    adem app --avail

  Further help:
    adem config -h/--help        Configure ADEM
    adem sites -h/--help         Manipulate the site list
    adem app -h/--help           Application installation
    adem help                    This help message

The original Subversion repository is at https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/adem-osg. It is also mirrored on GitHub at git://github.com/aespinosa/adem.git

Happy grid computing!

Correspondence on presidentiable’s science agenda

A few days ago, I sent Facebook messages to all of the presidential candidates. It was hard to distinguish between fan-generated accounts and the real ones. To my amazement at my own Facebook stalking skills, I got replies from Dick Gordon and Eddie Villanueva. Here was my message to all of them:

hi,

I’m deciding on my ballot as an absentee voter. as a scientist, please point me to pages/documents about your science & innovation agenda. I sent this message to all the candidates’ facebook pages.

also, please make a response aside from “increase the R & D” budget. it’s like saying ‘world peace’ in a beauty pageant.

thank you

Response from Richard Gordon. Most of the initiatives he mentioned are covered in highly googlable press articles. This guy knows his technology stuff:

Allan, kindly look up track record on the net. My Subic cybercity program as well as GIS and GPS mapping implemented in the 90’s. Check also Project 143 and the geo-hazard mapping initiatives with the Red Cross to address disasters and calamities. V-12 program for Filipinos abroad to help promote tourism with telcos using sms is another initiative. Also go to www. senate.gov.ph for transcript of debates re my position on agricultural modernization, and ST and IT development. Thanks.

Response from Eddie Villanueva. I think the reply was referring to a section of his platform page. Some of the party’s stands reiterate the current government’s policies, such as mother tongue-based education. Also, I’m still waiting for a reply from the email address he gave in the response:

hello! please email your concern to bagongpilipinas@gmail.com
you may also want to visit http://broeddie.ph for the information that you might need.

Bangon Pilipinas!

I will update this page as soon as I get responses from the other candidates, provided that it is their real Facebook accounts that I corresponded with.

SwiftScript Vim syntax file

One weekend I was reading an article on how to make a Vim syntax file. To apply it, I decided to make one for the Swift workflow system. Most of the script consists of simple word matches for some Swift keywords. I then copied the matching rules for comments from the C syntax file in the standard Vim distribution. For a preview, check out the screenshot below, which uses the desert256.vim colorscheme in a gnome-terminal:

oops_swift by yecartes, on Flickr

Fig. A Swift workflow with syntax highlighting

The syntax file can be downloaded from my graduate student code-shanty page.

memcache-ing everything in an ActiveRecord model.

I was writing a script that interacted with a database. The script was being invoked 400 times by a Java program, and the setup wasn’t too happy with so many processes being forked at once. At first I thought that MySQL could only handle so many remote connections. Here is my attempt to reduce the database load to almost 0 percent.

cache_fu was too dependent on being installed in a Rails environment. Commenting out the code that referred to Rails variables stopped the Ruby interpreter from complaining, but according to the memcached server logs, nothing was being cached at all! I ended up using the low-level access API from fauna.

After wrapping all the ActiveRecord::Base.find…() operations in the fauna-documented recipe, all of the SELECT * statements are now cached. The remaining problem is that the models still make ‘SHOW FIELDS..’ queries to the database whenever a model is first used. And since the script is invoked 400 times, the caching effort saved only bandwidth and round-trip times.
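
The wrapping recipe amounts to a read-through cache: ask the cache first, and only on a miss run the query and store its result. Here is a minimal sketch of that pattern. FakeCache, FakeCache::NotFound, and cached_find are hypothetical names made up for this illustration so that it runs without a memcached server; they are not part of any library. With the real client, the miss would presumably be signaled by Memcached::NotFound instead.

```ruby
# Hypothetical in-memory stand-in for a memcached client, so the sketch
# runs without a server. Like the real client, get raises on a miss.
class FakeCache
  class NotFound < StandardError; end

  def initialize
    @store = {}
  end

  def get(key)
    raise NotFound unless @store.key?(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Read-through helper (hypothetical name): return the cached value for
# key, or run the block, cache its result, and return it.
def cached_find(cache, key)
  cache.get(key)
rescue FakeCache::NotFound
  value = yield
  cache.set(key, value)
  value
end

cache = FakeCache.new
queries = 0
2.times do
  cached_find(cache, "Variation/42") do
    queries += 1        # stands in for a real Variation.find(42)
    { :id => 42 }
  end
end
# The block (the "database query") ran only on the first call;
# the second call was answered from the cache.
```

Keying on something like the model name plus the id keeps entries from different models apart; the key format here is only an assumption.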

I poked through the metaprogramming examples in the pick-axe book and resorted to overriding my models like this:

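require 'memcached'
require 'digest/md5'
require 'active_record'

class Variation < ActiveRecord::Base
  # Cache key for this model: the MD5 hex digest of the class name.
  def self.digest
    Digest::MD5.hexdigest(self.to_s)
  end

  # Store the given column list and redefine Variation.columns to return
  # it, so ActiveRecord no longer asks the database for the fields.
  def self.columns=(cached_columns)
    @cached_columns= cached_columns
    def self.columns
      @cached_columns
    end
    @cached_columns
  end
end

cache = Memcached.new
begin
  # Cache hit: install the cached column list.
  Variation.columns = cache.get(Variation.digest)
rescue Memcached::NotFound
  # Cache miss: let ActiveRecord load the columns once, then store them.
  cache.set(Variation.digest, Variation.columns)
end
cache = nil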

Now everything is cached!
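
The override trick can be tried without ActiveRecord or memcached. In this sketch, the hypothetical class SlowModel stands in for a model whose columns call is expensive, and the lookups counter exists only to show whether the expensive path was taken. Assigning through columns= replaces the class-level reader, so later calls never reach the original method.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
class SlowModel
  @lookups = 0
  class << self
    attr_reader :lookups
  end

  # Stands in for the column introspection (the SHOW FIELDS query).
  def self.columns
    @lookups += 1
    [:id, :name]
  end

  # Same shape as the override in the post: store the value, then
  # redefine the reader on the class so it returns the stored value.
  def self.columns=(cached_columns)
    @cached_columns = cached_columns
    def self.columns
      @cached_columns
    end
    @cached_columns
  end
end

SlowModel.columns = [:id, :name]   # pretend this came from the cache
SlowModel.columns                  # served from @cached_columns
puts SlowModel.lookups             # the expensive path was never taken
```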

Pasting Excel Charts in Word via Crossover Linux

Pasting a chart from MS Excel 2007 into Word 2007 yields the error “To insert a chart, you must first close any open dialog boxes or cancel editing mode in Microsoft Office Excel”. This bug is found in Crossover Linux Pro v8.0.0. I assume that it also occurs in publicly available versions of Wine.

An alternative to booting into Windows is to use the Paste Special feature and paste the chart as a Picture (Enhanced Metafile). This preserves the vector information of the chart, so zooming in and out of the document object won’t be a problem.

FASTA splitting with BioRuby

In reference to my previous post, here’s the splitter using BioRuby. Note that I also changed the outer loop to write one file per iteration, instead of using some crazy rules to decide when to create a file.

#!/usr/bin/env ruby
#
# Script: dumpseq.rb [file] [N] [prefix]
# Description: Splits a FASTA file evenly across N files and dumps the files
#              in the [prefix] directory.
require 'bio'
require 'fileutils'

include Bio


seqs =  FlatFile.open(ARGV[0])
ncpus = ARGV[1].to_i
prefix = ARGV[2]

# Remove and hardwire n_seqs if you know beforehand the number of sequences in
# a file.  Saves readtime
n_seqs = 0
seqs.each do |seq|
 n_seqs += 1
end
seqs.rewind

overflow = n_seqs % ncpus
split_size = n_seqs / ncpus

ncpus.times do |i|
  filename = sprintf "%s/D%07d/seq%07d.fasta", prefix, i, i
  FileUtils.mkdir_p File.dirname(filename)
  dump = File.new(filename, "w")
  split_size.times do |j|
    dump << seqs.next_entry.to_s
  end
  if i < overflow 
    dump << seqs.next_entry.to_s
  end
  dump.close
end
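
The split arithmetic in the script can be checked on its own. The sketch below uses a hypothetical helper, split_sizes, to compute how many sequences each of the N files receives: every file gets n_seqs / ncpus entries, and the first n_seqs % ncpus files get one extra.

```ruby
# Hypothetical helper: sizes of the N output files for n_seqs sequences,
# using the same quotient/remainder rule as the script.
def split_sizes(n_seqs, ncpus)
  split_size, overflow = n_seqs.divmod(ncpus)
  (0...ncpus).map { |i| i < overflow ? split_size + 1 : split_size }
end

puts split_sizes(10, 4).inspect   # → [3, 3, 2, 2]
```

The sizes always add up to n_seqs, so every sequence lands in exactly one file. When there are fewer sequences than files, the trailing files are simply empty.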

Splitting bioinformatics FASTA files

I keep forgetting where my scripts are in my home directories. Below is my Ruby script to split a large FASTA [1] file into files of N sequences each:

#!/usr/bin/env ruby
#
# Script: dumpseq.rb
# Description: Parses a BLAST FASTA file and dumps every two sequences to a
#              file.
# Usage: dumpseq.rb [fasta_file]

require 'fileutils'


fasta_db  = File.new(ARGV[0])

sno = 0
d = 0

file = nil

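# Read one record at a time, using "\n>" as the record separator.
fasta_db.each_line("\n>") do |x|
  # Drop the ">" that belongs to the next record, and restore the leading
  # ">" that the previous read consumed.
  x = x.sub(/>\z/, "")
  x = ">" + x unless x.start_with?(">")
  if sno % 2 == 0 # 2 seqs per query
    file.close if file != nil
    dir = sprintf("D%04d000", d / 1000)
    FileUtils.mkdir_p dir
    # short filenames
    fname = sprintf "SEQ%07d.fasta", d
    d += 1
    file = File.new("#{dir}/#{fname}","w")
  end
  file << x
  sno += 1
end
file.close if file != nil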

It’s pretty hackish-looking. But then I found out that BioRuby [2] has wrappers for parsing FASTA files.

[1] http://en.wikipedia.org/wiki/Fasta
[2] http://www.bioruby.org