Rackspace Cloud Files + CDN + FileConveyor + Drupal 6 = Not as easy as it seems

Drupal 6 - CDN support

To get the most out of the CDN module in Drupal 6, you will need to either apply a patch or use Pressflow. Pressflow is a customised Drupal core that provides various performance tweaks under the hood. Many of these were later included in Drupal 7.

A few things in Pressflow may put people off, or you may simply not want to run a fully customised core. In that case you can apply the patch that ships with the CDN module. The patch changes how URLs are created by various functions, allowing the CDN module to override them. The same change is already included in Pressflow and Drupal 7.

To apply the patch, copy the drupal6.patch file, which is included in the CDN module, to the document root of your Drupal install. Then run the following command from that directory:

patch -p0 < drupal6.patch

Rackspace Cloud - Not like the others

The most common type of CDN is an origin-pull CDN, which pulls content from your server automatically.

When your page loads, there will be various images, CSS and JavaScript files that can be served from the CDN. The CDN module converts all of these URLs to point to your CDN instead of your local Drupal install. If a file doesn't exist on the CDN yet, the CDN downloads it from your Drupal install and then serves it to the user's browser. Any later request for the same file is served from the CDN without the CDN needing to fetch it from your server again.
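
To make the URL conversion concrete, here is a minimal sketch of the idea in Python 3. It is not the CDN module's code (that is PHP inside Drupal); the host name cdn.example.com, the extension list and the function name to_cdn_url are all assumptions made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values, for illustration only.
CDN_HOST = "cdn.example.com"
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif")


def to_cdn_url(url):
    """Point a static file URL at the CDN host; leave every other URL alone."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return url  # not a static asset, keep serving it from Drupal
    # Keep path and query string, swap only the host (default to http).
    return urlunsplit(
        (parts.scheme or "http", CDN_HOST, parts.path, parts.query, parts.fragment)
    )


print(to_cdn_url("http://www.example.com/sites/default/files/logo.png"))
print(to_cdn_url("http://www.example.com/node/1"))
```

With an origin-pull CDN this rewrite is all the site has to do, because the CDN fetches anything it is missing from the original host by itself.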

Origin-pull CDNs are easy to work with because they need almost no setup.

Rackspace Cloud Files isn't an origin-pull CDN, which is where the complexities come in. Rackspace Cloud Files is a push CDN. That means you have to send each file to Rackspace, get the resulting URL back from Rackspace, and then provide this URL to the CDN module, which updates the URLs on the page. Push CDNs can require a fair bit of setup and banging your head on your desk.

Wim Leers, the creator of the CDN module, has provided a tool called File Conveyor, which automatically scans a selected directory and uploads files, based on rules, as and when they are created or updated. It then gets the CDN URL from Rackspace and stores this information in an SQLite database. If a file doesn't exist in the SQLite database, Drupal uses the standard URL to your files directory.
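
The lookup-with-fallback behaviour described above can be sketched roughly as follows. This is only an illustration in Python 3: the table name synced_files and its columns input_file and url are assumptions for the example, so check the real database created by File Conveyor before relying on them.

```python
import sqlite3


def resolve_url(conn, input_file, local_url):
    """Return the CDN URL for a synced file, or the local URL if it is not synced."""
    row = conn.execute(
        "SELECT url FROM synced_files WHERE input_file = ?", (input_file,)
    ).fetchone()
    return row[0] if row is not None else local_url


# Build a throwaway in-memory database with the assumed (hypothetical) schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synced_files (input_file TEXT PRIMARY KEY, url TEXT)")
conn.execute(
    "INSERT INTO synced_files VALUES (?, ?)",
    ("/var/www/sites/default/files/logo.png", "http://cdn.example.com/logo.png"),
)

# A synced file resolves to its CDN URL.
print(resolve_url(conn, "/var/www/sites/default/files/logo.png",
                  "/sites/default/files/logo.png"))
# A file that has not been uploaded yet falls back to the local URL.
print(resolve_url(conn, "/var/www/sites/default/files/new.png",
                  "/sites/default/files/new.png"))
```

The useful property is that a file which has not been pushed yet never produces a broken link; it is simply served from your own server until the sync catches up.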

File Conveyor

File Conveyor is a daemon written in Python to detect, process and sync files.

I had no end of problems getting File Conveyor set up on our Rackspace dedicated server running Red Hat. One of the main issues seems to be that Rackspace UK uses slightly different settings, notably the auth URL.

First things first: you need to get Python 2.5 or later installed. Red Hat comes with Python 2.4, and unfortunately File Conveyor requires at least 2.5.

Installing an upgraded Python

To avoid any potential problems with overwriting Python 2.4, I chose to do an alt install from source so that the two versions can run side by side, with 2.4 remaining the default.

yum -y install gcc
wget http://python.org/ftp/python/2.6.5/Python-2.6.5.tgz
tar zxvf Python-2.6.5.tgz -C /usr/src && cd /usr/src/Python-2.6.5
./configure
make
make altinstall

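If you run the python command you should be told that you are running Python 2.4.3. If you run the command python26 (note the 26 on the end) you should be told that you are running Python 2.6.5. Note that make altinstall normally names the binary python2.6, so if python26 is not found you may need to create a symlink or alias for it.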
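
If you want to confirm which interpreter a command actually starts, a small script like the one below does the job. It is just a sketch; the file name check_version.py and the helper meets_minimum are made up for this example, and the 2.5 minimum is the File Conveyor requirement mentioned above.

```python
import sys

# File Conveyor needs at least Python 2.5 (see above).
MINIMUM = (2, 5)


def meets_minimum(version_info, minimum=MINIMUM):
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Running Python %d.%d" % tuple(sys.version_info[:2]))
if meets_minimum(sys.version_info):
    print("OK: new enough for File Conveyor")
else:
    print("Too old: File Conveyor needs Python %d.%d or later" % MINIMUM)
```

Save it as check_version.py and run it once with python check_version.py and once with python26 check_version.py; the first should complain and the second should report that the version is new enough.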

Git

You will need Git to clone the File Conveyor files from GitHub.

I found that the most recent Wim Leers release (2012-11-20) didn't work for me: it failed without giving any error message. It will probably work for people who aren't using the UK Rackspace Cloud Files service.

The version that I did find to work is Chris Ivens' no-delete fork.

Change to the directory where you want to run File Conveyor and run the following command:

git clone https://github.com/chrisivens/fileconveyor.git -b no-delete

This will download all the files from GitHub. You will then need to run the setup.py file to install all the necessary dependencies.

python26 setup.py install

This will install Django and all the dependencies that are required for the various transport mechanisms provided by File Conveyor.

Once it has finished, change to the fileconveyor directory and create a copy of the config.sample.xml file and call it config.xml.

The settings in this file should be pretty self-explanatory, although the sample file doesn't include an example of the "cumulus" transport settings under servers. View full config on PasteBin.

By creating different rules and servers you can set up different containers for different files, so that any files in directory XXX are put into container YYY while all other files are added to container ZZZ. If you do this, make sure that your theme CSS files are put into the same container as the images they point to, otherwise you will end up with broken relative links. The alternative is to use full URLs to images in your CSS files.
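
The relative-link problem is easier to see with a small example. The sketch below is plain Python 3 and has nothing to do with File Conveyor's own rule engine; the prefixes, the container names and both function names are hypothetical. It maps paths to containers by prefix and reports image references in a CSS file that would end up in a different container than the stylesheet itself.

```python
import posixpath

# Hypothetical rules: first matching path prefix wins.
RULES = [
    ("sites/default/files/videos/", "videos-container"),
    ("sites/all/themes/", "theme-container"),
]
DEFAULT_CONTAINER = "default-container"


def container_for(path):
    """Return the container a file would be uploaded to."""
    for prefix, container in RULES:
        if path.startswith(prefix):
            return container
    return DEFAULT_CONTAINER


def broken_css_links(css_path, image_refs):
    """List relative references that land in a different container than the CSS file."""
    css_dir = posixpath.dirname(css_path)
    css_container = container_for(css_path)
    broken = []
    for ref in image_refs:
        # Resolve the reference relative to the stylesheet's directory.
        target = posixpath.normpath(posixpath.join(css_dir, ref))
        if container_for(target) != css_container:
            broken.append(ref)
    return broken


refs = ["../images/bg.png", "../../../../default/files/banner.png"]
print(broken_css_links("sites/all/themes/mytheme/css/style.css", refs))
```

Here the theme's own background image is fine, but the banner in the files directory is routed to a different container, so the relative link from the stylesheet would break once both are served from the CDN.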

I would recommend changing the location of the fileconveyor.pid in the settings.py file. I used a relative path of:

PID_FILE = './fileconveyor.pid'

Then all you need to do is run python26 arbitrator.py. It can take a while to scan all the files in your directory, especially if your site is large and has a lot of uploads.

Once you have File Conveyor up and running properly you can run arbitrator.py in the background by running the following command:

nohup python26 arbitrator.py &

Configuration of CDN module

In your Drupal site, go to admin/settings/cdn/details. On this page, set the Mode to File Conveyor.

For the Mode-specific settings, set the paths for the PID file and the Synced files database. These should be the full paths, including the filename. For example:

/var/run/fileconveyor/fileconveyor/fileconveyor.pid

Save and go to the General tab and enable the Testing mode to make sure everything is working as expected. When on the CDN pages you will get a message letting you know the current status of File Conveyor.
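
If the status message says File Conveyor is not running and you want to check the PID file by hand, something like the following works on Linux. This is my own rough sketch in Python, not the check the CDN module performs, and the function name daemon_status is made up for the example.

```python
import errno
import os


def daemon_status(pid_file):
    """Report whether the process recorded in a PID file appears to be alive."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return "not running (no usable PID file)"
    if pid <= 0:
        return "not running (invalid PID)"
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it sends nothing
    except OSError as err:
        if err.errno == errno.ESRCH:
            return "not running (stale PID file)"
        if err.errno == errno.EPERM:
            return "running (owned by another user)"
        raise
    return "running"


# Point this at the same full path you entered in the CDN module settings.
print(daemon_status("/var/run/fileconveyor/fileconveyor/fileconveyor.pid"))
```

A stale PID file usually means the daemon was killed without cleaning up after itself, in which case starting arbitrator.py again is the next step.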

Comments

Great article, I've been desperately trying to find a good solution for working with a shared file system in AWS/EC2 for a Drupal 6 site and this is extremely useful!