Loading data from multiple files in Ruby using Hash

Syed Aslam

Lately, I have been working on applications that require data from an external source to be loaded into the local database. One such application is Sydrea (in the making!), which requires drug information to be loaded. Drugs@FDA is a freely downloadable zip archive containing about 9 CSV files of drug-related data.

The usual path to data loading is through a Rails runner script. I was tired of writing and testing a different script for every load, so I wanted one general script which, with minimal customization, could load data into any table.

I created the required models and tables; the database schema for the tables holding drug-related information is available on the Drugs@FDA website. I started by defining a Hash that maps each data file to the name of the model class its rows belong to.

files = {
  'AppDoc.txt' => 'AppDoc',
  'AppDocType_Lookup.txt' => 'AppDocTypeLookup',
  'application.txt' => 'Application',
  'ChemTypeLookup.txt' => 'ChemicalTypeLookup',
  'DocType_lookup.txt' => 'DocTypeLookup',
  'product.txt' => 'Product',
  'Product_tecode.txt' => 'ProductTECode',
  'RegActionDate.txt' => 'RegActionDate',
  'ReviewClass_Lookup.txt' => 'ReviewClassLookup'
}

What I need to do now is iterate over this hash, read the file named by each key, and create new objects of the type named by the value. ActiveRecord's new method accepts a hash whose keys match the column names of the associated table, and here the column names are given by the file headers. So I need a method that takes two arrays and returns a hash. For example, given

arr1 = ['col_name1', 'col_name2']
arr2 = ['val1', 'val2']

would return

{
  'col_name1' => 'val1',
  'col_name2' => 'val2'
}

I came up with a method that does exactly that and I added it to the Array class:

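class Array
  # Builds a hash from two parallel arrays. Header names are converted
  # to snake_case with ActiveSupport's String#underscore.
  def self.to_hash(headers, values)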
    hsh = Hash.new
    headers.each_with_index do |h, i|
      hsh[h.underscore] = values[i]
    end

    hsh
  end
end
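
As a side note, the same pairing can probably be expressed with Array#zip, without reopening the Array class. The snippet below is only a minimal sketch, not the script used in this post: `snake_case` is a hypothetical stand-in for ActiveSupport's `underscore` so that the example runs outside Rails, `headers_to_hash` is an illustrative name, and the header names and values are made up.

```ruby
# Hypothetical stand-in for ActiveSupport's String#underscore, so the
# sketch runs in plain Ruby. It only handles simple CamelCase names.
def snake_case(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Pair each (snake_cased) header with the value at the same position.
def headers_to_hash(headers, values)
  Hash[headers.map { |h| snake_case(h) }.zip(values)]
end

headers = ['ApplNo', 'DocType']   # made-up header names
values  = ['000004', 'Letter']    # made-up values

p headers_to_hash(headers, values)
```

As with the index-based loop, a row that has fewer values than headers ends up with nil for the missing columns, because zip pads the shorter array.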

Then, we're all set to put everything together and load data:

require 'rubygems'
require 'fastercsv'

files.each do |key, value|
  file = "#{RAILS_ROOT}/db/drugsatfda/" + key
  recs = 0

  puts "Working with #{value.pluralize}.."
  FasterCSV.foreach(file, :headers => true) do |row|
    begin
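      obj = value.constantize.new(Array.to_hash(row.headers, row.fields))
      # save! raises if the record is invalid, so a failed row reaches the
      # rescue below instead of being counted as loaded
      obj.save!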

      recs += 1
    rescue => e
      puts "Rows processed: " + recs.to_s
      puts e
    end
  end
  puts "Loaded #{recs} #{value.pluralize}"
end
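
FasterCSV later became Ruby's standard CSV library (from Ruby 1.9 on), so the same loop can be tried outside Rails. The following is a rough sketch under several assumptions: `Product` is a Struct standing in for an ActiveRecord model, `Object.const_get` replaces `constantize`, and the CSV content is an invented inline string rather than a real Drugs@FDA file.

```ruby
require 'csv'

# Hypothetical stand-in for an ActiveRecord model; a real model would
# take the attribute hash in `new` and be persisted with `save`.
Product = Struct.new(:appl_no, :drug_name)

# Invented sample data, used here instead of a file under db/drugsatfda.
SAMPLE = "appl_no,drug_name\n000001,DrugA\n000002,DrugB\n"

# Maps a data source to a class name, mirroring the files hash.
sources = { SAMPLE => 'Product' }

loaded = []
sources.each do |data, class_name|
  klass = Object.const_get(class_name)   # plain-Ruby analogue of constantize
  CSV.parse(data, headers: true).each do |row|
    attrs = row.to_h                     # header => field, both strings
    loaded << klass.new(attrs['appl_no'], attrs['drug_name'])
  end
end

puts "Loaded #{loaded.size} records"
```

Note that `row.to_h` already returns the header-to-value hash, so a helper like `Array.to_hash` is only needed when the header names have to be transformed first.
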
Technicals: Rails 2.3.8, PostgreSQL, FasterCSV