Managing Database Files with Ruby on Rails
Filesystem Storage
The reality is that filesystem storage is the best option, as a general rule. Filesystems are optimized to handle large amounts of binary and/or character data, and they are fast at it. The Linux kernel has syscalls such as sendfile() that work on physical files. There are hundreds of third-party utilities that you can only leverage when using physical files:
Image processing is arguably the most popular application for storing binary data. Programs like ImageMagick are much easier to use in their command-line form, operating on files, rather than getting often-problematic libraries like RMagick to work with Ruby.
Physical files can be shared with NFS or AFS, put on a MogileFS host, or otherwise clustered. Achieving high availability or load balancing with database large objects can be tricky.
Any other utility that works on files will have to be integrated or otherwise modified to work from a database.
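As an illustration of the image-processing point, here is a minimal sketch of driving ImageMagick's command-line tool from Ruby on a file that lives on disk. The helper names (thumbnail_command, make_thumbnail) and the default geometry are hypothetical, and the sketch assumes the convert program is installed and on the PATH (newer ImageMagick releases also provide a magick command).

```ruby
# Build the argument list for ImageMagick's `convert` command.
# Passing separate arguments (not one string) to system bypasses the shell,
# so file names with spaces or quotes need no escaping.
def thumbnail_command(source_path, dest_path, geometry = "128x128")
  ["convert", source_path, "-resize", geometry, dest_path]
end

# Run the command. Kernel#system returns true on success, false on a
# non-zero exit status, and nil if the program could not be executed.
def make_thumbnail(source_path, dest_path, geometry = "128x128")
  system(*thumbnail_command(source_path, dest_path, geometry))
end
```

Because the image is an ordinary file, no Ruby binding to the imaging library is needed; the same pattern applies to any other file-based utility.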
Why Is Filesystem Storage So Fast?
The short answer is that web servers are optimized for throwing binary files down a TCP socket. And the most common thing you do with binary files is throw them down a TCP socket.
Long answer: the secret to this performance, under Linux and various BSDs, is the kernel sendfile() syscall (not to be confused with X-Sendfile, discussed later). The sendfile() function copies data quickly from a file descriptor (which represents an open file) to a socket (which is connected to the client). This happens in kernel mode, not user mode--the entire process is handled by the operating system. The web server doesn't even have to think about it. When sendfile() is invoked, the process looks a bit like Figure 4-1.
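To make the file-to-socket path concrete, the following sketch streams a file to a connected socket with Ruby's IO.copy_stream. This is only an illustration, not what a production web server does internally: the method name send_file_to_socket is hypothetical, and whether the copy is actually delegated to sendfile() depends on the Ruby implementation and the platform.

```ruby
require "socket"

# Copy a file on disk to an already-connected socket.
# IO.copy_stream moves the data between the two IO objects without
# building the file contents as a Ruby string; on some platforms the
# interpreter may hand the copy to the kernel (for example via sendfile()).
# Returns the number of bytes copied.
def send_file_to_socket(path, socket)
  File.open(path, "rb") do |file|
    IO.copy_stream(file, socket)
  end
end
```

The application code never touches the file's bytes; it only hands over two descriptors, which is the property that makes the kernel-level shortcut possible.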
On the other hand, Rails is necessarily involved with the whole process when reading data from the database. The file must be passed, chunk by chunk, from the database to Rails, which creates a response and sends the whole thing (including the file) to the web server. The web server then sends the response to the client. Using sendfile() would be impossible here because the data does not exist as a file. The data must be buffered in memory, and the whole operation runs in user mode. The entire file is processed several times by user-mode code, which is a much more complicated process, as shown in Figure 4-2.
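For contrast, here is a rough sketch of what a user-mode copy looks like. It is a simplified stand-in rather than Rails' actual code: the method name copy_in_user_mode and the 16 KB chunk size are assumptions, and any IO-like objects (such as StringIO) can play the roles of the database large object and the outgoing response.

```ruby
require "stringio"

# Copy data from a source to a destination entirely in user-mode Ruby code.
# Each chunk is materialized as a Ruby string in the process's memory and
# then written out again, which is the extra work a file-to-socket copy
# inside the kernel avoids.
# Returns the total number of bytes copied.
def copy_in_user_mode(source, destination, chunk_size = 16 * 1024)
  total = 0
  # read(n) returns nil at end of input, which ends the loop.
  while (chunk = source.read(chunk_size))
    destination.write(chunk)
    total += chunk.bytesize
  end
  total
end
```

In the database-backed case a loop like this effectively runs more than once per request (database to Rails, Rails to web server, web server to client), which is why the same bytes are handled several times.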