Lately, I have been working applications which require data from an external source to be loaded into the local database. One such application is Sydrea (in the making!), which requires the Drug information to be loaded. Drugs@FDA is freely downloadable compressed zip file which contains about 9 CSV files having Drug related data.
The usual path to data loading is through a rails runner script. I was tired of writing, testing different scripts to load data. I had to come up with one general script which, with minimal customizations, could load data into whatever table.
I created the required models and tables. Database schema for the tables to hold drug related information is available at the Drugs@FDA website. I started with defining a Hash mapping the file containing data, to the Object.
files = {
'AppDoc.txt' => 'AppDoc',
'AppDocType_Lookup.txt' => 'AppDocTypeLookup',
'application.txt' => 'Application',
'ChemTypeLookup.txt' => 'ChemicalTypeLookup',
'DocType_lookup.txt' => 'DocTypeLookup',
'product.txt' => 'Product',
'Product_tecode.txt' => 'ProductTECode',
'RegActionDate.txt' => 'RegActionDate',
'ReviewClass_Lookup.txt' => 'ReviewClassLookup'
}
What I need to do now is iterate over this hash, read file whose name is given by the Key
and create new objects of the type Value
. To the AR's new
method you can pass a hash with key names matching the associated table column names. The column names are given by the file headers. Now, I need a method which would take two arrays and return me a hash. Like given
arr1 = ['col_name1', 'col_name2']
arr2 = ['val1', 'val2']
would return
{
'col_name1' => 'val1',
'col_name2' => 'val2'
}
I came up with a method that does exactly that and I added it to the Array class:
class Array
def self.to_hash(headers, values)
hsh = Hash.new
headers.each_with_index do |h, i|
hsh[h.underscore] = values[i]
end
hsh
end
end
Then, we're all set to put everything together and load data:
require 'rubygems'
require 'fastercsv'
files.each do |key, value|
file = "#{RAILS_ROOT}/db/drugsatfda/" + key
recs = 0
puts "Working with #{value.pluralize}.."
FasterCSV.foreach(file, :headers => true) do |row|
begin
obj = value.constantize.new(Array.to_hash(row.headers, row.fields))
obj.save
recs += 1
rescue => e
puts "Rows processed: " + recs.to_s
puts e
end
end
puts "Loaded #{recs} #{value.pluralize}"
end