I really wish somebody had suggested hashed file paths to me when I was planning out my application. Let me be that guy.
Anybody building an application that allows file uploads should seriously consider using hashed file paths. Why? Everybody dreams of their app becoming the next big thing, but what happens when that dream turns into a reality? It may not happen overnight, but there is a limit to the number of files you can fit into one directory.
A few weeks ago I woke up to an inbox full of emails from my Rails app titled “Errno::EMLINK (Too many links)”. After some quick research I learned that most Linux distributions use the ext3 file system, which has an upper limit of 32,000 links per inode. In practice, that caps a single directory at roughly 32,000 subdirectories. At the time of my little meltdown I had been storing every user upload in its own subdirectory of one directory. Apparently I had reached the limit.
Further research revealed that the most painless solution to this problem was to use hashed file paths. Instead of saving files to “files/1/resized/image.jpg” files would be stored in “files/cdd/01d/2cf/resized/image.jpg”.
I ran the idea by the guys at my data center and they agreed that it was the correct course of action. Wanting to further understand how a hashed file path would differ from my current method, I did some more digging. Here’s how Nick, one of the data center guys, explained it:
“If I understand the way an MD5 hash operates, it will use the first few hex numerals to make the first folder, then the next few for another, and do a couple of nested layers. The idea being that while each md5 sum itself is unique, there are far fewer than 32,000 combinations for each pair, triplet, etc of hex digits, so you don’t have to worry about hitting the directory limitation. For example, if you have 10 files that all start with “1c”, let’s say, they would go into the same directory, and their nested directories would be based off of the next characters in the md5. This way, what is 10 folders in a pure :id system have just become one.
In order to hit that limit again, you’d need to have 32,000 uploads that all share the same X number of characters in their md5 sum (where x is however many are being used to generate the hash), and the chances of that happening are extremely small.”
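To put rough numbers on Nick's explanation: three hex digits allow at most 16^3 = 4,096 directory names per level, comfortably under the 32,000 limit. Here is a minimal sketch using only Ruby's standard library. The `bucket_for` helper and the sample of 100,000 sequential IDs are made up for illustration and are not part of my app:

```ruby
require 'digest/md5'

# Hypothetical helper: the first three hex digits of the MD5 of an ID.
def bucket_for(id)
  Digest::MD5.hexdigest(id.to_s)[0, 3]
end

# Upper bound on distinct directory names per level with three hex digits.
NAMES_PER_LEVEL = 16**3 # 4096

# Group a sample of sequential IDs by their top-level bucket.
buckets = (1..100_000).group_by { |id| bucket_for(id) }

puts "possible names per level:   #{NAMES_PER_LEVEL}"
puts "top-level directories used: #{buckets.size}"
puts "largest bucket:             #{buckets.values.map(&:size).max} IDs"
```

However many uploads you have, the top level can never hold more than 4,096 entries, and the same cap applies to each nested level below it.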
Anyways, on to the code! As I previously stated, I have been using Paperclip in my Rails application. Using Paperclip interpolations we can easily create a hash path and make it available to our models. First, we need to create a file named “paperclip-md5-file-paths.rb” in config/initializers/.
If you’re using a newer version of Paperclip (one that provides Paperclip.interpolates), try something like this:

    require 'digest/md5'

    Paperclip.interpolates :hash_path do |attachment, style|
      # FINAL_ID must be defined elsewhere: set it to the ID of the latest
      # attachment that existed at the time of the transition.
      next "content/#{attachment.instance.id}" if attachment.instance.id <= FINAL_ID
      hash = Digest::MD5.hexdigest(attachment.instance.id.to_s + 'secret')
      hash_path = ''
      3.times { hash_path += '/' + hash.slice!(0..2) }
      hash_path[1..-1] # drop the leading slash
    end

If you’re using an older version of Paperclip, this may be more useful:

    require 'digest/md5'

    Paperclip::Attachment.interpolations[:hash_path] = lambda do |attachment, style|
      # Same FINAL_ID guard as above: older records keep their original paths.
      return "content/#{attachment.instance.id}" if attachment.instance.id <= FINAL_ID
      hash = Digest::MD5.hexdigest(attachment.instance.id.to_s + 'secret')
      hash_path = ''
      3.times { hash_path += '/' + hash.slice!(0..2) }
      hash_path[1..-1] # drop the leading slash
    end
The guard at the top checks whether the current record's ID dates from before or after our migration to hashed file paths. This was necessary in order to support the previous 32,000 files; you may or may not need something like this. The next line converts the ID to a string, appends an optional secret word, and takes the MD5 hash of the result. Finally, we slice the first nine hex digits off the hash in three groups of three and join them with slashes. The result is our “cdd/01d/2cf” path.
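If you want to try the logic in irb without loading Paperclip, here is a rough standalone sketch of the same steps. The `hash_path_for` name and its `secret` and `final_id` parameters are placeholders I made up for this example, and it assumes `final_id` is the last ID stored under the old scheme:

```ruby
require 'digest/md5'

# Hypothetical standalone version of the interpolation (no Paperclip needed).
def hash_path_for(id, secret: 'secret', final_id: 0)
  # Records created before the switch keep their original ID-based path.
  return "content/#{id}" if id <= final_id

  hash = Digest::MD5.hexdigest(id.to_s + secret)
  # Take the first nine hex digits and split them into three directory levels.
  hash[0, 9].scan(/.{3}/).join('/')
end

puts hash_path_for(42)                # three levels of three hex digits each
puts hash_path_for(42, final_id: 100) # content/42
```

The same ID and secret always produce the same path, which is what lets Paperclip find the file again later without storing the path anywhere.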
Now we have a :hash_path available to us…let’s do something with it! In your model, you can now access this new path like so:
    has_attached_file :image,
      :url  => "/content/:hash_path/:style/:basename.:extension",
      :path => ":rails_root/public/content/:hash_path/:style/:basename.:extension"
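To see what a file ends up being called on disk, here is a toy expansion of the :path template. This is not how Paperclip implements interpolation; the `expand` helper and every value in `values` (including the /var/www/app root) are made up for illustration:

```ruby
# Toy template expander: replaces each :name token with a value from a hash.
# Purely illustrative; Paperclip's real interpolation works differently.
def expand(template, values)
  template.gsub(/:([a-z_]+)/) { values.fetch($1.to_sym).to_s }
end

template = ":rails_root/public/content/:hash_path/:style/:basename.:extension"

# Hypothetical values for a single upload.
values = {
  rails_root: "/var/www/app",
  hash_path:  "cdd/01d/2cf",
  style:      "resized",
  basename:   "image",
  extension:  "jpg"
}

puts expand(template, values)
# /var/www/app/public/content/cdd/01d/2cf/resized/image.jpg
```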
Why this isn’t a default configuration in Paperclip is beyond me. Enjoy!