
Working with Files in Ruby

Syed Aslam

It all starts with the IO class. The IO class is the basis for all input and output in Ruby either by itself or via its descendant classes, particularly File. To a large extent, IO's API consists of wrappers around system library calls, with some enhancements and modifications.

A File is an abstraction of any file object accessible by the program. File includes the methods of module FileTest as class methods, allowing you to write (for example) File.exist?("foo").

Opening a File

Before we can do anything with the file, we need to open it. This signals our intent to read from or write to the file, allowing Ruby to do the low-level work that makes that intention happen on the filesystem. Once it has done those things, Ruby gives us a File object that we can use to read from the file, write to it, inspect its permissions, derive file information and much more.

Ruby provides a way to open files in the form of the open method, which optionally takes a code block. With no associated block, File.open is synonymous with calling the constructor new. If you call File.open with a block, the block receives the File object as its single argument. When the block ends, the File object is automatically closed.

File.open("file.txt") do |file|
  ...
end
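The automatic close can be checked directly. The following is a minimal sketch, assuming a scratch file created in a temporary directory; the helper name block_open_demo and the file name are hypothetical, chosen only for illustration. It also shows the non-block form, where closing the file is left to the caller.

```ruby
require "tmpdir"

# Returns [closed_inside_block, closed_after_block] for a file opened with a block.
def block_open_demo(path)
  handle = nil
  inside = nil
  File.open(path) do |file|
    handle = file
    inside = file.closed?   # false: the file is still open inside the block
  end
  [inside, handle.closed?]  # the handle is closed once the block returns
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "file.txt")   # hypothetical sample file
  File.write(path, "hello\n")
  p block_open_demo(path)             # prints [false, true]

  # Without a block, closing is our job; ensure runs even if reading raises.
  file = File.open(path)
  begin
    puts file.read
  ensure
    file.close
  end
end
```

The block form is usually the safer default, because the file is closed even when the block exits through an exception.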

The open method takes modes and options as additional arguments along with the file path:

Mode   Meaning
"r"    Read-only, starts at beginning of file (default mode).
"r+"   Read-write, starts at beginning of the file.
"w"    Write-only, truncates the existing file to zero length or creates a new file for writing.
"w+"   Read-write, truncates existing file to zero length or creates a new file for reading and writing.
"a"    Write-only, starts at end of the file if file exists, otherwise creates a new file for writing.
"a+"   Read-write, starts at end of the file if file exists, otherwise creates a new file for reading and writing.
"b"    Binary file mode (may appear with any of the key letters listed above). Suppresses EOL <-> CRLF conversion on Windows and sets the external encoding to ASCII-8BIT unless explicitly specified.
"t"    Text file mode (may appear with any of the key letters listed above except "b").
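To make the difference between truncating, appending, and read-write modes concrete, here is a small sketch. The helper mode_demo and the file name modes.txt are hypothetical; the file lives in a temporary directory so nothing real is overwritten.

```ruby
require "tmpdir"

# Writes to a scratch file with three different modes and returns the
# file's contents after each step.
def mode_demo(path)
  steps = []
  File.open(path, "w") { |f| f.write("one\n") }   # creates or truncates the file
  steps << File.read(path)
  File.open(path, "a") { |f| f.write("two\n") }   # appends at the end
  steps << File.read(path)
  File.open(path, "r+") { |f| f.write("ONE") }    # overwrites from byte 0, no truncation
  steps << File.read(path)
  steps
end

Dir.mktmpdir do |dir|
  p mode_demo(File.join(dir, "modes.txt"))
  # prints ["one\n", "one\ntwo\n", "ONE\ntwo\n"]
end
```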

Reading from a File

Reading from a file can be performed one byte at a time, a specific number of bytes at a time, one line at a time, or the entire file at once. You can also change the position of the next read operation in the file by moving forward or backwards a certain number of bytes or by advancing the File object's internal pointer to a specific byte offset in the file.

All of these operations are provided by the File class. So, first, you need to create a File object. The simplest way to do this is with File.new. Pass a filename to this constructor, and, assuming the file exists, you'll get back a file handle opened for reading:

> file = File.new("README.md")
=> #<File:README.md>

Reading the whole file at once

The easiest way to access the contents of a file in Ruby is to read the entire file in one go. The read method returns a string containing the file's contents.

> file.read
=> "# Syed Aslam | [syedaslam.com](https://syedaslam.com)\n\n[![Netlify Status](https://api.netlify.com/api/v1/badges/09787514-8dbf-4d0b-b4b8-024f8f32c8ff/deploy-status)](https://app.netlify.com/sites/syedaslam/deploys)\n"

Also, if all you're doing is reading the file and you have no further use for the File object once you've done so, Ruby offers a shortcut. The File class defines a read class method that works the same way. Given the path of the file, it will open the file, read it, and close it, returning the contents:

> contents = File.read("README.md")
=> "# Syed Aslam | [syedaslam.com](https://syedaslam.com)\n\n[![Netlify Status](https://api.netlify.com/api/v1/badges/09787514-8dbf-4d0b-b4b8-024f8f32c8ff/deploy-status)](https://app.netlify.com/sites/syedaslam/deploys)\n"

Although using read is tempting in many situations and appropriate in some, it can be inefficient when working with bigger files and doesn't help much if you need more granularity in your data-reading and processing tasks.

Line-based file reading

Lots of plain-text formats - log files, for instance - use the lines of a file as a way of structuring content where each line represents a distinct item or record. The easiest way to read the next line from a file is with gets.

> file.gets
=> "# Syed Aslam | [syedaslam.com](https://syedaslam.com)\n"

There is also the readline method, which does much the same as gets: it reads one line from the file. The difference between these two methods lies in how they behave when you try to read beyond the end of a file: gets returns nil, whereas readline raises an EOFError.

> file.gets
=> "[![Netlify Status](https://api.netlify.com/api/v1/badges/09787514-8dbf-4d0b-b4b8-024f8f32c8ff/deploy-status)](https://app.netlify.com/sites/syedaslam/deploys)\n"
> file.gets
=> nil
> file.readline
Traceback (most recent call last):
        ...
EOFError (end of file reached)

You can also get the entire file at once, much like read, but with the content broken up into individual lines in an array, by using readlines.

> file.rewind
=> 0
> file.readlines
=> ["# Syed Aslam | [syedaslam.com](https://syedaslam.com)\n", "\n", "[![Netlify Status](https://api.netlify.com/api/v1/badges/09787514-8dbf-4d0b-b4b8-024f8f32c8ff/deploy-status)](https://app.netlify.com/sites/syedaslam/deploys)\n"]
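As with read, there is also a class-level File.readlines that opens, reads, and closes the file in one call. The sketch below assumes a small scratch file and uses the chomp: true keyword (available in Ruby 2.4 and later) to strip the trailing newlines; the helper name lines_without_newlines is hypothetical.

```ruby
require "tmpdir"

# Returns the lines of the file as an array, without their trailing newlines.
def lines_without_newlines(path)
  File.readlines(path, chomp: true)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "notes.txt")   # hypothetical sample file
  File.write(path, "first\n\nthird\n")
  p lines_without_newlines(path)       # prints ["first", "", "third"]
end
```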

Treating files as streams

Reading the entire contents of a file in one go isn't always the best solution. Keeping all the contents in memory might merely be wasteful with smaller files, but it can turn out to be plain impossible with larger files. Imagine wanting to process a 500GB file on a computer with 4GB of memory. It would be impossible for us to read the file at once.

The solution is to treat the file as a stream. Instead of reading from the beginning to the end in one go, you can read only a small amount at a time. Read the first line, process it, move on to the next line, and so on until you reach the end of the file. Or you could read the file character by character or word by word. The point is that you never have the full file in memory.

By varying exactly how much you read, you can also step through the file in a way that reflects its structure. If you know the file has many lines, each of which represents a record, then you can read one line at a time. And, if you know that the file is one enormous line, but the fields are separated by commas, you can read up to the next comma each time, processing text one field at a time.
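As a rough illustration of line-at-a-time streaming, the sketch below counts matching lines with File.foreach, which yields one line per iteration and closes the file when done. The helper name count_matching_lines and the sample log content are made up for the example.

```ruby
require "tmpdir"

# Counts lines containing `needle`, holding only one line in memory at a time.
def count_matching_lines(path, needle)
  count = 0
  File.foreach(path) do |line|
    count += 1 if line.include?(needle)
  end
  count
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "app.log")   # hypothetical log file
  File.write(path, "INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
  puts count_matching_lines(path, "ERROR")   # prints 2
end
```

The same loop works unchanged on a file far larger than available memory, since only the current line is ever held.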

File enumerability

You have probably heard of the Enumerable module. Ruby's Enumerable module defines methods like map, find_all, count, reduce, and so on. The purpose of Enumerable is to make it easy to search within, filter, sort, iterate over, and otherwise query collections.

File objects are enumerable: IO, the parent class of File, includes the Enumerable module. As enumerables, File objects can perform many of the same functions that arrays, hashes, and other collections do. Understanding how file enumeration works requires a slightly different mental model: whereas an array already exists in memory and walks through its elements during iteration, a File object has to manage line-by-line reading behind the scenes as you iterate. But the similarity of the idioms (the common use of methods from Enumerable) means you don't have to think in much detail about the file-reading process when you iterate through a file.

The each method of File objects (also known by the synonym each_line) serves this purpose. It yields one line, or one separator-delimited record, at a time, which lets us work with enormous files (gigabytes in size, if necessary) without consuming much memory.

File.open("commas.txt") do |file|
  file.each(",") do |record|
    puts record
  end
end
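Because File objects respond to the Enumerable methods, collection idioms can be applied to a file directly. This is a sketch under assumed sample data; the helper names longest_line and first_matching are hypothetical. The second helper uses a lazy enumerator so that reading stops as soon as enough matches are found.

```ruby
require "tmpdir"

# Returns the longest line in the file, using Enumerable#max_by on the File object.
def longest_line(path)
  File.open(path) { |file| file.max_by(&:length) }
end

# Returns up to `limit` lines matching `pattern`, reading no further than needed.
def first_matching(path, pattern, limit)
  File.open(path) do |file|
    file.each_line.lazy.grep(pattern).first(limit)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "words.txt")   # hypothetical sample file
  File.write(path, "a\nbbb\ncc\n")
  p longest_line(path)                 # prints "bbb\n"
  p first_matching(path, /b|c/, 1)     # prints ["bbb\n"]
end
```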

Similarly, the getc method reads and returns one character from the file at a time:

> file.getc
=> "#"

You can also un-get a character, that is, put a specific character back onto the file-input stream so that it's the first character read on the next read:

> file.getc
=> "#"
> file.ungetc("|")
=> nil
> file.gets
=> "| Syed Aslam | [syedaslam.com](https://syedaslam.com)\n"

Every character is represented by one or more bytes. How bytes map to characters depends on the encoding. Whatever the encoding, you can move byte-wise as well as character-wise through a file, using getbyte. Depending on the encoding, the number of bytes and number of characters in your file may or may not be equal, and getc and getbyte, at a given position in the file, may or may not return the same thing.

> file.rewind
=> 0
> file.read(15)
=> "# Syed Aslam | "
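The gap between characters and bytes shows up as soon as a file contains a multi-byte character. The sketch below assumes a UTF-8 scratch file whose first character is "é" (written as the escape \u00E9); the helper name first_char_and_byte is hypothetical.

```ruby
require "tmpdir"

# Returns [first_char, first_byte] read from the start of a UTF-8 file.
def first_char_and_byte(path)
  File.open(path, "r:UTF-8") do |file|
    char = file.getc      # one character, possibly several bytes long
    file.rewind
    byte = file.getbyte   # a single byte, returned as an Integer
    [char, byte]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "utf8.txt")   # hypothetical sample file
  File.write(path, "\u00E9!")         # "é" occupies two bytes in UTF-8: 0xC3 0xA9
  char, byte = first_char_and_byte(path)
  puts char.bytesize                  # prints 2
  puts byte                           # prints 195 (0xC3, only the first half of "é")
end
```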

Just as readline differs from gets in that readline raises an EOFError if you use it at the end of the file, the methods readchar and readbyte differ from getc and getbyte, respectively, in the same way.
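A short sketch of that difference, assuming a small scratch file: after the whole file has been consumed, getc quietly returns nil while readchar raises. The helper name at_eof_behaviour is hypothetical.

```ruby
require "tmpdir"

# Consumes the whole file, then returns what getc gives back at end of file
# and the class of the exception that readchar raises there.
def at_eof_behaviour(path)
  File.open(path) do |file|
    file.read                  # move the pointer to the end of the file
    soft = file.getc           # nil at end of file
    hard = begin
      file.readchar
    rescue EOFError => e
      e.class                  # readchar raises instead of returning nil
    end
    [soft, hard]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "short.txt")   # hypothetical sample file
  File.write(path, "hi")
  p at_eof_behaviour(path)             # prints [nil, EOFError]
end
```

Which variant to use is mostly a matter of whether reaching the end of the file is an expected condition (check for nil) or an error (let the exception propagate).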

Seeking through a file

Until now we've advanced through the file as a stream, starting at the beginning and moving through the file. But just as we use read to advance through and consume a portion of the file, we can also move to a specific location without consuming anything.

The File object has a sense of where in the file it has left off reading. You can both read and change this internal pointer using the File object's pos (position) attribute and/or the seek method.

With pos, you can tell where in the file the pointer is currently pointing:

> file.rewind
=> 0
> file.pos
=> 0
> file.gets
=> "# Syed Aslam | [syedaslam.com](https://syedaslam.com)\n"
> file.pos
=> 54

Here, the position is 0 after a rewind and 54 after reading one line. You can assign to the position value, which moves the pointer to a specific location in the file:

> file.pos = 13
=> 13
> file.gets
=> "| [syedaslam.com](https://syedaslam.com)\n"

The string returned is what the File object considers a "line" as of byte 13: everything from that position onward until the next occurrence of the input record separator $/ (a newline by default).

The seek method lets you move around in a file by moving the position pointer to a new location. The location can be a specific offset into the file, or it can be relative to either the current pointer position or the end of the file. You can specify what you want using special constants from the IO class:

file.seek(20, IO::SEEK_SET)
file.seek(15, IO::SEEK_CUR)
file.seek(-10, IO::SEEK_END)
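To see where each of those three calls leaves the pointer, the sketch below runs them in sequence against a 54-byte scratch file and records pos after each one. The helper name seek_demo and the file content are assumptions made for the example.

```ruby
require "tmpdir"

# Performs the three seeks in order and returns the position after each.
def seek_demo(path)
  File.open(path) do |file|
    positions = []
    file.seek(20, IO::SEEK_SET)    # absolute: 20 bytes from the start
    positions << file.pos
    file.seek(15, IO::SEEK_CUR)    # relative: 15 bytes past the current position
    positions << file.pos
    file.seek(-10, IO::SEEK_END)   # relative to the end: 10 bytes before it
    positions << file.pos
    positions
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "data.bin")   # hypothetical 54-byte file
  File.write(path, "x" * 54)
  p seek_demo(path)                   # prints [20, 35, 44]
end
```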

Writing to files

Writing to a file involves using puts, print, or write on a File object that's opened in write or append mode. Write mode is indicated by "w" as the second argument to new. See the table above for a list of all modes a file can be opened in.

> f = File.new("data.out", "w")
=> #<File:data.out>
> f.puts "Syed Aslam"
=> nil
> f.close
=> nil
> puts File.read("data.out")
Syed Aslam
=> nil
> f = File.new("data.out", "a")
=> #<File:data.out>
> f.puts "Ruby, Rails and JavaScript developer."
=> nil
> f.close
=> nil
> puts File.read("data.out")
Syed Aslam
Ruby, Rails and JavaScript developer.
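The three writing methods differ mainly in how they treat newlines and in what they return. Here is a minimal sketch using the block form of File.open so the file is closed automatically; the helper name write_demo and the output file name are hypothetical.

```ruby
require "tmpdir"

# Writes with puts, print, and write, then returns the resulting file contents.
def write_demo(path)
  File.open(path, "w") do |f|
    f.puts "Syed Aslam"    # puts appends a newline if the string lacks one
    f.print "Ruby, "       # print writes the string as-is
    f.write("Rails\n")     # write also adds nothing, and returns the byte count
  end
  File.read(path)
end

Dir.mktmpdir do |dir|
  p write_demo(File.join(dir, "data.out"))
  # prints "Syed Aslam\nRuby, Rails\n"
end
```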

Querying File objects

The File class, the FileTest module, and the File::Stat class offer numerous query methods that can give lots of information about a file.

File::Stat encapsulates common status information for File objects. Many of its methods return platform-specific values, and not all values are meaningful on all systems. The FileTest module also implements file test operations similar to those used in File::Stat. It exists as a standalone module, and its methods are also included in the File class. You can call these methods on either the File class or the FileTest module.

> File.file?("/Users/syed/Private/syedaslam.com/README.md")
=> true
> File.executable?("/Users/syed/Private/syedaslam.com/README.md")
=> false
> FileTest.directory?("/Users/syed/Work")
=> true
> FileTest.size?("/Users/syed/Private/syedaslam.com/README.md")
=> 214

All of these methods return either true or false except size?, which returns the file size in bytes as an integer, or nil if the file does not exist or is empty.
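The same kind of information is available from a File::Stat object, which File.stat returns for a given path. The sketch below is illustrative only: the helper name describe and the scratch files are assumptions, and only a handful of the available stat methods are shown.

```ruby
require "tmpdir"

# Collects a few status facts about a path from its File::Stat object.
def describe(path)
  stat = File.stat(path)
  {
    file: stat.file?,
    directory: stat.directory?,
    size: stat.size,      # size in bytes
    zero: stat.zero?      # true only for an empty file
  }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "README.md")   # hypothetical sample file
  File.write(path, "hello")
  p describe(path)
  # prints a hash with file: true, directory: false, size: 5, zero: false

  empty = File.join(dir, "empty.txt")
  File.write(empty, "")
  p File.size?(empty)                  # prints nil: size? treats empty as "no size"
end
```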

Wrapping Up

That's about it for the basics of working with files in Ruby. Ruby offers many more facilities for working with files and with IO streams in general. The file-handling facilities provided by the FileUtils, Pathname, and StringIO libraries are powerful enough to become indispensable if you do any kind of file-intensive Ruby programming.

File docs

File::Stat docs

FileTest docs

Finally, go deeper and understand how it works under the hood in Under the Hood: "Slurping" and Streaming Files in Ruby.