Home arrow Ruby-on-Rails arrow Page 4 - Databases and Ruby on Rails
RUBY-ON-RAILS

Databases and Ruby on Rails


Ruby-on-Rails supports more database management systems than just MySQL these days. This five-part series will take a look at the other systems and tell you what you need to know to choose which one to use (and help you manage whichever one you choose). It is excerpted from chapter four of the book Advanced Rails, written by Brad Ediger (O'Reilly; ISBN: 0596510322).Copyright © 2008 O'Reilly Media, Inc. All rights reserved. Used with permission from the publisher. Available from booksellers or direct from O'Reilly Media.

Author Info:
By: O'Reilly Media
Rating: 5 stars5 stars5 stars5 stars5 stars / 1
January 29, 2010
TABLE OF CONTENTS:
  1. · Databases and Ruby on Rails
  2. · MySQL
  3. · SQLite
  4. · Large/Binary Objects

print this article
SEARCH DEVARTICLES

TOOLS YOU CAN USE

advertisement
Databases and Ruby on Rails - Large/Binary Objects
(Page 4 of 4 )

Sooner or later, many web applications must deal with the issue of LOB (large object) data. LOB data may be small, but it is usually large compared to other attributes being stored (tens of kilobytes to hundreds of gigabytes or larger). The defining characteristic of LOB data, however, is that the application has no knowledge of the semantics of the internal structure of the data.

The canonical example is image data; a web application usually has no need to know the data in a JPEG file representing a user's avatar as long as it can send it to the client, replace it, and delete it when needed.

LOB storage is usually divided into CLOB (character large object) for text data and BLOB (binary large object) for everything else. Some DBMSs separate the two as separate data types. CLOB types can often be indexed, collated, and searched; BLOBs cannot.

Database Storage

The DBA types among us might prefer database storage of large objects. From a theoretical standpoint, storing binary data in the database is the most clean and straightforward solution. It offers some immediate advantages:

  1. All of your application data is in the same place: the database. There is only one interface to the data, and one program is responsible for managing the data in all its forms.
  2. You have greater flexibility with access control, which really helps when working with large-scale projects. DBMS permitting, different permissions may be assigned to different tables within the same database. 
     
  3. The binary data is not tied to a physical file path; when using filesystem storage, you must update the file paths in the referring database if you move the storage location.

There are many practical considerations, though, depending on your DBMS's implementation of large objects.

PostgreSQL

PostgreSQL has some downright weird support for binary data. There are two ways to store binary data in a PostgreSQL database: the BYTEA data type and large objects.

The BYTEA* type is the closest thing PostgreSQL has to a BLOB type--just a sequence of bytes--but it is really terrible for large amounts of binary data. The protocol for shuttling BYTEA types back and forth from the database requires escaping all non-printable bytes, so a single null byte would be encoded as the ASCII string \000 (4 bytes). Needless to say, this causes unnecessary expansion of the data. In addition, it is impossible to stream data from the database to the web browser without running it through an unescape filter. Pullinga 2 MB binary file from the database usually means streaming somewhere around 6 MB of data through the unescape code. The nave method runs all of the data through Ruby strings, where it balloons tremendously in memory. A better option would be to have the postgres C library handle quoting and unquoting, but this is a lot of work and still suboptimal. Up to 1 GB of data can be stored in a BYTEA column.

The other option is large objects. The large object features in PostgreSQL work well enough, but they are also a little bit clunky. Files are kept in the pg_largeobject system catalog in small pages.A pointer is kept in the referring table to the OID (object ID) of the file. Up to 2 GB of data may be stored in a large object. This method is fast, and has good APIs, but there are drawbacks. There is no per-table or per-object access control; the pg_largeobject catalog is global to the database, and accessible by anyone with permission to connect to the database. The large object mechanism is also slightly deprecated in favor of in-table storage, as the TOAST storage technique allows values of up to 1 GB in length to be stored directly as attributes within the table.

My recommendation is to use filesystem storage for all binary objects if you use PostgreSQL. Although the database might be the more proper place for this type of data, it just does not work well enough yet. If you have to use the database, large objects actually perform pretty well. Avoid BYTEA at all costs.

MySQL

MySQL does a fairly good job with binary data. LOB-type columns (including the TEXT types) can store up to 4 GB of data, using the LONGBLOB type. Actual storage and performance depend on the wire protocol being used, buffer size, and available memory. Storage is efficient, using up to 4 bytes to store the data length, followed by the binary data itself. However, MySQL suffers from issues similar to PostgreSQL with streaming data, and it is always more awkward for a web application to stream data from the database than from the filesystem.

Oracle

Oracle supports the BLOB data type, for objects up to 4 GB. It is supported by a fairly mature API, and can be used directly from Rails.

Oracle also provides the BFILE type, which is a pointer to a binary file on disk. Consider it a formalization of the filesystem storage method discussed below. This may prove to be of value in some situations.

Please check back next week for the continuation of this article.


DISCLAIMER: The content provided in this article is not warranted or guaranteed by Developer Shed, Inc. The content provided is intended for entertainment and/or educational purposes in order to introduce to the reader key ideas, concepts, and/or product reviews. As such it is incumbent upon the reader to employ real-world tactics for security and implementation of best practices. We are not liable for any negative consequences that may result from implementing any information covered in our articles or tutorials. If this is a hardware review, it is not recommended to open and/or modify your hardware.

blog comments powered by Disqus
RUBY-ON-RAILS ARTICLES

- Adding Style with Action Pack
- Handling HTML in Templates with Action Pack
- Filters, Controllers and Helpers in Action P...
- Action Pack and Controller Filters
- Action Pack Categories and Events
- Logging Out, Events and Templates with Actio...
- Action Pack Sessions and Architecture
- More on Action Pack Partial Templates
- Action Pack Partial Templates
- Displaying Error Messages with the Action Pa...
- Action Pack Request Parameters
- Creating an Action Pack Registration Form
- Ruby on Rails Templates and Layouts
- Action Pack Controller Creation
- Writing an Action Pack Controller

Dev Articles Forums 
 RSS  Articles
 RSS  Forums
 RSS  All Feeds
Weekly Newsletter
 
Developer Updates  
Free Website Content 
Contact Us 
Site Map 
Privacy Policy 
Support 



© 2003-2012 by Developer Shed. All rights reserved. DS Cluster 9 - Follow our Sitemap
Popular Web Development Topics
All Web Development Tutorials