SA

Loading data from multiple files in Ruby using Hash

Photo on Unsplash.

Lately, I have been working applications which require data from an external source to be loaded into the local database. One such application is Sydrea (in the making!), which requires the Drug information to be loaded. Drugs@FDA is freely downloadable compressed zip file which contains about 9 CSV files having Drug related data.

The usual path to data loading is through a rails runner script. I was tired of writing, testing different scripts to load data. I had to come up with one general script which, with minimal customizations, could load data into whatever table.

I created the required models and tables. Database schema for the tables to hold drug related information is available at the Drugs@FDA website. I started with defining a Hash mapping the file containing data, to the Object.


files = {
	'AppDoc.txt' => 'AppDoc',
	'AppDocType_Lookup.txt' => 'AppDocTypeLookup',
	'application.txt' => 'Application',
	'ChemTypeLookup.txt' => 'ChemicalTypeLookup',
	'DocType_lookup.txt' => 'DocTypeLookup',
	'product.txt' => 'Product',
	'Product_tecode.txt' => 'ProductTECode',
	'RegActionDate.txt' => 'RegActionDate',
	'ReviewClass_Lookup.txt' => 'ReviewClassLookup'
}

What I need to do now is iterate over this hash, read file whose name is given by the Key and create new objects of the type Value. To the AR's new method you can pass a hash with key names matching the associated table column names. The column names are given by the file headers. Now, I need a method which would take two arrays and return me a hash. Like given


arr1 = ['col_name1', 'col_name2']
arr2 = ['val1', 'val2']

would return


{
	'col_name1' => 'val1',
	'col_name2' => 'val2'
}

I came up with a method that does exactly that and I added it to the Array class:


class Array
	def self.to_hash(headers, values)
		hsh = Hash.new
		headers.each_with_index do |h, i|
			hsh[h.underscore] = values[i]
		end
	hsh
end

end

Then, we're all set to put everything together and load data:


require 'rubygems'
require 'fastercsv'

files.each do |key, value| file = "#{RAILS_ROOT}/db/drugsatfda/" + key recs = 0

puts "Working with #{value.pluralize}.."
FasterCSV.foreach(file, :headers => true) do |row|
	begin
		obj = value.constantize.new(Array.to_hash(row.headers, row.fields))
		obj.save

		recs += 1
	rescue => e
		puts "Rows processed: " + recs.to_s
		puts e
	end
end
puts "Loaded #{recs} #{value.pluralize}"

end

Technicals: Rails 2.3.8, PostgreSQL, FasterCSV