Squirm - A redirector for Squid

Squirm is a fast & configurable redirector for the Squid Internet Object Cache. It requires the GNU Regex Library (now included in the Squirm source), and of course, a working Squid. It is available free under the terms of the GNU GPL.

  • Features
  • Download Squirm
  • Installing Squirm
  • Configuring Squirm
  • squirm.patterns Examples
  • Testing Squirm Interactively
  • Squirm Log Files
  • Reconfiguring Squirm
  • Credits & Copyright

  Note: This web page documents version 1.0-BetaB. There is a newer version, squirm-1.26, which I haven't documented yet. It has some new features:
  • A new squirm.conf file allows different redirection lists for different sets of source network addresses in CIDR notation.
  • Automagic accelerator string generation.
  • It compiles with a simple 'make'.
  • A few other minor features.
  • It uses your system's regex library.

    Features

    Squirm has the following features:

    I started writing it because the existing redirector scripts used too much memory and were too slow for Squids that receive a lot of requests.

    On my Pentium Pro 200 running Linux, it manages to do 16,440 lines per second (that's 59 million lines per hour!) using my squirm.local and squirm.patterns config files.

    It can handle nifty things like file mirrors with the regex pattern replacement strings, and do site blocking, which is useful for schools. It could also do such things as banner ad rewriting, and just about anything else :-)

    Download Squirm

    The latest version is squirm-1.0betaB which you can download as a normal tar file or a gzipped tar file.

    The most recent version is always available from this page at http://squirm.foote.com.au/

    Installing Squirm

    1. Untar the Squirm tar file

    2. Compile the GNU Regex library by doing:
      cd regex
      ./configure
      make clean
      make
      cp -p regex.o regex.h ..
      This step is a bit ugly. I welcome anyone who has experience with the configure script to incorporate this directly into Squirm. Anyone?

    3. Search for cache_effective_user in your Squid configuration file (usually /usr/local/squid/etc/squid.conf) and take note of the user and group id that Squid runs under. (Squirm won't work if Squid executes as root!)

    4. Edit the Makefile and find the install: section. You will need to change the installation user and group id to the ones Squid executes as. (The default is user squid, group squid). If you don't want to install Squirm in /usr/local, you'll need to change the directory paths as well.

    5. If you changed the directory path for Squirm in the Makefile above, then you will need to edit the file paths.h to reflect the new path for log files.

    6. Type make

    7. su to root and type make install

    8. Try running squirm to make sure the installation worked:
      orbit:/usr/local/src/squirm-1.0betaB# whoami
      root
      orbit:/usr/local/src/squirm-1.0betaB# /usr/local/squirm/bin/squirm
      Squirm running as UID 0: writing logs to stderr
      Wed Mar 11 13:20:37 1998:unable to open local addresses file [/usr/local/squirm/etc/squirm.local]
      Wed Mar 11 13:20:37 1998:unable to open redirect patterns file
      Wed Mar 11 13:20:37 1998:Invalid condition - continuing in DODO mode
      Wed Mar 11 13:20:37 1998:Squirm (PID 29760) started
      [Ctrl + C]
      (Yep, it did work: the errors above just indicate that the config files don't exist yet :-)

    9. Once you have Squirm up and running, to get Squid to pass requests through Squirm, you need to add a couple of lines to your squid.conf file:
      redirect_program /usr/local/squirm/bin/squirm
      redirect_children 10
      The number of children is dependent on the load on your Squid box. Try 10 and use the cachemgr.cgi CGI to see whether all redirector processes get used; if they do, you can raise this number.

    Configuring Squirm

    By default, the two config files are located at /usr/local/squirm/etc/squirm.local and /usr/local/squirm/etc/squirm.patterns. You need to create these two files from scratch with the aid of the following instructions:

    Local Addresses

    You need to place abbreviations for class C networks in the squirm.local file for your clients. Here's an example:

    These are used to determine if Squirm should rewrite a URL. You wouldn't normally want any Squid neighbours to be able to use your redirector as the extra load of ICP requests would bog down your machine, so don't include them in the file.

    Requests to Squid from clients on the listed networks are accepted, whilst requests from any other address are ignored.

    There is currently no plan to implement CIDR notation because Squirm uses simple integer comparisons to make lookups really quick.
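
    The integer comparison mentioned above can be pictured with a short Python sketch. This is not Squirm's code (Squirm is written in C), and the file format is an assumption: it supposes each squirm.local entry is the first three octets of a class C network. The prefixes 192.0.2 and 198.51.100 are reserved documentation ranges chosen purely for illustration, and prefix_to_int and is_local are hypothetical names.

```python
# Illustrative sketch only: not Squirm's actual code or file format.
# Assumption: each squirm.local entry is a three-octet prefix like "192.0.2".


def prefix_to_int(prefix):
    """Pack a three-octet class C prefix into a single 24-bit integer."""
    a, b, c = (int(part) for part in prefix.split("."))
    return (a << 16) | (b << 8) | c


def is_local(client_ip, local_prefixes):
    """Return True if the client's first three octets match a listed prefix."""
    first_three = client_ip.rsplit(".", 1)[0]
    return prefix_to_int(first_three) in local_prefixes


# Hypothetical local networks (reserved documentation ranges).
LOCAL = {prefix_to_int(p) for p in ("192.0.2", "198.51.100")}

print(is_local("192.0.2.17", LOCAL))   # client on a listed network
print(is_local("203.0.113.5", LOCAL))  # client on an unlisted network
```

    Because each prefix collapses to one integer, the per-request check is a single integer lookup rather than a netmask calculation, which is the trade-off the paragraph above describes.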

    Squirm Patterns


    Lines in the squirm.patterns file take one of the following forms:

    	regex|regexi pattern replacement [[^]accelerator_string[$]]
    	abort .filename_extension


    Full regex matching and replacement is made available by the use of the GNU Regex library. It also supports pattern buffers.

    Let's say you want to redirect requests to a local URL for a common file, where it's matched case sensitively:

    regex  ^.*/n32e301\.exe$ http://www.mydomain/path_to/n32e301.exe
    This means: replace URLs ending in /n32e301.exe with the URL of your local copy.

    To do the same as above except case insensitively, you would use regexi instead of regex at the start of the line.
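
    The difference between regex and regexi can be tried out with Python's re module. This is only an approximation: Python's regex engine is not the GNU Regex library, although a pattern this simple should behave the same way in both. The rewrite function and the ftp.example.com request URLs are hypothetical.

```python
import re

# Pattern and replacement taken from the squirm.patterns line above.
PATTERN = r"^.*/n32e301\.exe$"
REPLACEMENT = "http://www.mydomain/path_to/n32e301.exe"


def rewrite(url, keyword="regex"):
    """Return the replacement URL if the pattern matches, else the URL unchanged.

    keyword is "regex" (case sensitive) or "regexi" (case insensitive).
    """
    flags = re.IGNORECASE if keyword == "regexi" else 0
    if re.search(PATTERN, url, flags):
        return REPLACEMENT
    return url


# Hypothetical request URLs used only for illustration.
for url in ("http://ftp.example.com/pub/n32e301.exe",
            "http://ftp.example.com/pub/N32E301.EXE"):
    print(url, "->", rewrite(url, "regex"), "|", rewrite(url, "regexi"))
```

    With regex only the lower-case request is rewritten; with regexi both are.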

    Accelerator Strings

    The accelerator string is used to avoid regex comparisons of URLs unless they are close to the expected pattern. Squirm first compares a URL against the accelerator string before it bothers to do a proper regex comparison, which saves many CPU cycles on a busy machine. Note: you should always use accelerator strings if possible on a busy box!

    For the above example, a speedup is achieved through the use of the accelerator string n32e301.exe$, so the line would look like:

    regex  ^.*/n32e301\.exe$ http://www.mydomain/path_to/n32e301.exe n32e301.exe$

    The accelerator string can have a leading caret '^' OR a trailing dollar '$' to indicate that the rough match should search at the start or end of the URL respectively.
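
    A rough picture of the accelerator check, written in Python rather than Squirm's C: a plain string comparison that has to succeed before the regex is tried at all. Two details are assumptions, not statements about Squirm: that the comparison is case sensitive, and that an accelerator with neither '^' nor '$' may match anywhere in the URL. accelerator_matches is a hypothetical name.

```python
def accelerator_matches(accel, url):
    """Cheap string pre-check performed before any regex comparison.

    "^text" must appear at the start of the URL, "text$" at the end,
    and plain "text" (assumed) anywhere within it.
    """
    if accel.startswith("^"):
        return url.startswith(accel[1:])
    if accel.endswith("$"):
        return url.endswith(accel[:-1])
    return accel in url


# Accelerator from the example line above, tried against made-up URLs.
print(accelerator_matches("n32e301.exe$", "http://ftp.example.com/pub/n32e301.exe"))
print(accelerator_matches("n32e301.exe$", "http://ftp.example.com/pub/index.html"))
```

    Only URLs that pass this check go on to the full regex comparison, which is where the saving comes from.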

    Abort Extensions

    Abort extensions give a massive speedup by aborting pattern searches for URLs that end in a certain filename extension. (Why traverse the entire patterns list and do comparisons when they won't be matched anyway?)

    Let's say we don't need to traverse the list for files ending in .gif. The line needed is:

    abort .gif
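
    In Python terms the abort test amounts to a suffix check on the URL. This is a sketch under the assumption that the comparison is a plain, case-sensitive "ends with" test; aborts_search is a hypothetical name and the URLs are made up.

```python
def aborts_search(url, abort_extensions):
    """Return True if the URL ends in one of the listed filename extensions."""
    return url.endswith(tuple(abort_extensions))


ABORT = [".gif"]  # from the "abort .gif" line above

print(aborts_search("http://www.example.com/images/logo.gif", ABORT))
print(aborts_search("http://www.example.com/files/setup.exe", ABORT))
```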

    squirm.patterns Examples

    An example for an ISP

    regexi ^http://tucows\.[^/]*/(.*$) http://tucows.mymirror.com/\1 ^http://tucows.
    abort .gif
    abort .html
    abort .jpg
    abort .htm
    regex .*/c16e401\.jar$ http://redirector1.senet.com.au/c16e401.jar c16e401.jar$
    regexi .*/c32e401\.jar$ http://redirector1.senet.com.au/c32e401.jar c32e401.jar$
    regex .*/cb16e401\.exe$ http://redirector1.senet.com.au/cb16e401.exe cb16e401.exe$
    regex .*/cb32e401\.exe$ http://redirector1.senet.com.au/cb32e401.exe cb32e401.exe$
    regex .*/cc16e401\.exe$ http://redirector1.senet.com.au/cc16e401.exe cc16e401.exe$
    regex .*/cc32e401\.exe$ http://redirector1.senet.com.au/cc32e401.exe cc32e401.exe$

    The first line contains the accelerator string ^http://tucows. so Squirm has to do the regex comparison only if the URL matches it. Because this is the first line in the squirm.patterns file, much time is saved by not having to do a regex comparison for every single URL. (Accelerator strings are not compulsory on a config line, but the speed improvement is quite large.)

    The first regex comparison uses a case insensitive pattern which matches HTTP URLs for any hostname beginning with tucows. It stores the path information in a pattern buffer, which is later replayed in the replacement URL by using \1 (up to 10 replays are possible).

    The abort lines stop the search for any URL whose filename extension they list, so the patterns on the following lines are only compared against URLs that do not end in one of those extensions. It is wise to list the filename extensions that occur most frequently in requests, provided none of the later patterns needs to match them, and to leave out extensions that occur infrequently. .gif, .jpg, .html and .htm are good candidates for the abort extension.
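
    How these lines interact can be sketched in Python. The sketch assumes, as the paragraphs above suggest, that lines are tried from top to bottom, that an abort line ends the search for URLs with that extension, and that the first matching pattern wins. Squirm's real implementation (in C, using GNU Regex) may differ in detail, and the case-sensitive accelerator comparison is also an assumption. redirect and accelerator_matches are hypothetical names, and the request URLs are made up.

```python
import re

# A shortened copy of the ISP example above.
PATTERNS = r"""
regexi ^http://tucows\.[^/]*/(.*$) http://tucows.mymirror.com/\1 ^http://tucows.
abort .gif
abort .html
regex .*/c16e401\.jar$ http://redirector1.senet.com.au/c16e401.jar c16e401.jar$
"""


def accelerator_matches(accel, url):
    """Cheap string pre-check: ^ anchors at the start, $ at the end."""
    if accel.startswith("^"):
        return url.startswith(accel[1:])
    if accel.endswith("$"):
        return url.endswith(accel[:-1])
    return accel in url


def redirect(url, config=PATTERNS):
    """Walk the config top to bottom and return the (possibly rewritten) URL."""
    for line in config.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "abort":
            if url.endswith(fields[1]):
                return url            # stop searching: later lines are skipped
            continue
        keyword, pattern, replacement = fields[:3]
        accel = fields[3] if len(fields) > 3 else None
        if accel is not None and not accelerator_matches(accel, url):
            continue                  # regex comparison avoided entirely
        flags = re.IGNORECASE if keyword == "regexi" else 0
        match = re.search(pattern, url, flags)
        if match:
            return match.expand(replacement)  # replays \1 from the pattern buffer
    return url


for url in ("http://tucows.com/files/logo.gif",
            "http://www.example.com/pic.gif",
            "http://ftp.example.com/pub/c16e401.jar"):
    print(url, "->", redirect(url))
```

    Note that the tucows request is still rewritten even though it ends in .gif, because its pattern sits above the abort lines; the order of lines in the file matters.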

    Examples for Schools

    You may wish to have a way of blocking access to sites which contain material unsuitable for viewing by children, returning instead a web page which lets them know they have requested a site which is blocked.

    Simple Block List

    regexi ^http://www\.playboy\.com/.* http://www/notallowed.html
    regexi ^http://www\.xxx\.com/.* http://www/notallowed.html

    This will return the URL http://www/notallowed.html to anyone requesting URLs starting with http://www.playboy.com or http://www.xxx.com.

    For long lists of sites to block, the use of accelerator strings may help, in which case the above example would be:

    regexi ^http://www\.playboy\.com/ http://www/notallowed.html ^http://www.playboy.com
    regexi ^http://www\.xxx\.com/ http://www/notallowed.html ^http://www.xxx.com

    Block List with URL notification

    If you would like to include the blocked URL requested in the resulting page (something like "The URL http://www.playboy.com/file.jpg has been blocked"), you could create a CGI which takes the URL as an argument, and add the request to the pattern replacement.

    regexi ^(http://www\.playboy\.com/.*) http://www/cgi-bin/na?url=\1
    This is a convenient form if you already have a list of hostnames to add, for example:
    cat list-of-banned-sites \
    	| sed -e "s/\./\\\./g" \
    	| awk '{ print "regexi ^(http://" $1 "/.*) http://www/cgi-bin/na?url=\\1" }' \
    	>> /usr/local/squirm/etc/squirm.patterns

    Again, adding accelerator strings to long lists may help with speed.
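
    As one possible way of doing that, the following Python sketch builds the same kind of line from a list of hostnames and appends an accelerator string in the style of the block list shown earlier. block_lines is a hypothetical helper, not part of Squirm, and www.example.com stands in for a real entry in your list.

```python
def block_lines(hostnames, cgi="http://www/cgi-bin/na"):
    """Build one regexi line per hostname, with an accelerator string appended."""
    lines = []
    for host in hostnames:
        host = host.strip()
        if not host:
            continue                          # ignore blank lines in the list
        escaped = host.replace(".", r"\.")    # dots are literal in the regex
        lines.append(
            "regexi ^(http://%s/.*) %s?url=\\1 ^http://%s" % (escaped, cgi, host)
        )
    return lines


# Placeholder hostname used only for illustration.
for line in block_lines(["www.example.com"]):
    print(line)
```

    The output can be appended to squirm.patterns in the same way as the output of the shell pipeline above.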

    Testing Squirm Interactively

    When Squirm is run as root, it goes into interactive mode which echoes all information that would normally be logged to standard error output. This gives the opportunity to test a configuration file modification before restarting the current squirm processes on the machine.

    Optionally, you can supply the path of a squirm patterns config file, if it's not in the default location, for the first argument.

    Squid sends requests to the standard input of a redirector process with the form:

    	URL   src_address/hostname   ident   method
    The ident field is usually a dash '-'. The hostname is normally a dash too, since Squid is normally configured not to look up hostnames for proxy requests. For Squirm to do any redirection, the method must be GET and the src_address must match an address from the squirm.local file.
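
    The request line format can be taken apart as in the Python sketch below. It assumes the four fields are separated by whitespace exactly as shown above; Request and parse_request are hypothetical names, and the sample line uses a made-up URL and a reserved documentation address.

```python
from collections import namedtuple

Request = namedtuple("Request", "url src_address hostname ident method")


def parse_request(line):
    """Split one redirector input line into its fields.

    Raises ValueError if the line has fewer than four fields.
    """
    url, source, ident, method = line.split()[:4]
    src_address, _, hostname = source.partition("/")
    return Request(url, src_address, hostname or "-", ident, method)


# Hypothetical input line in the format described above.
request = parse_request("http://www.example.com/file.exe 192.0.2.17/- - GET")
print(request)
print(request.method == "GET")  # only GET requests are candidates for redirection
```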

    The following is an example of running squirm interactively. Each request line typed as input is followed by the line Squirm writes back:
    frog:~\:# whoami
    root
    frog:~\:# /usr/local/squirm/bin/squirm
    Squirm running as UID 0: writing logs to stderr
    Tue Mar 10 22:00:34 1998:Loading IP List
    Tue Mar 10 22:00:34 1998:Reading Patterns from config /usr/local/squirm/etc/squirm.patterns
    Tue Mar 10 22:00:34 1998:Squirm (PID 16955) started
    http://tucows.com/downloads/win95/n32e301p.exe - GET
    http://tucows.senet.com.au/downloads/win95/n32e301p.exe - GET
    Tue Mar 10 22:00:57 1998:http://tucows.com/downloads/win95/n32e301p.exe:http://tucows.senet.com.au/downloads/win95/n32e301p.exe
    http://www.somewhere.com/path/file - GET
    http://www.somewhere.com/path/file - GET
    [Ctrl + D]
    Alternatively you can provide input from a file by using the syntax:

    /usr/local/squirm/bin/squirm < filename

    Squirm Log Files

    There are several log files in /usr/local/squirm/logs which are normally only viewable by the squid user id and root:

    Contains verbose info if DEBUG is defined when compiling
    Contains messages for invalid config or other alert conditions
    Lists instances where length of URL was too short (< 4 chars)
    Squirm restarts and reconfigurations
    Shows URLs which were successfully replaced by a pattern replacement
    The /usr/local/squirm/logs directory *must* be writeable by the user id that Squid executes as. This was set up for you when make install was executed.

    Reconfiguring Squirm

    When you have modified either squirm.local or squirm.patterns, all of the running squirm processes need to be restarted by a HUP signal.

    (Restarting Squid will do this (by sending squid a HUP signal), but this usually isn't convenient because it makes squid become unavailable for a period of time.)

    Under Linux, you can do this by typing:

    	killall -HUP squirm

    On other systems you may have to write a small script:

    for PID in `ps -aux | grep redirector | grep -v grep | awk '{ print $2 }'`
    do
    	kill -HUP $PID
    done
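
    The same idea can be sketched in Python for systems without killall. The sketch follows the script above in looking for "redirector" in the command column, and assumes `ps aux`-style output where the PID is the second column and the command is the eleventh; the column layout varies between systems. redirector_pids and send_hup are hypothetical names, and the sample ps output is made up.

```python
import os
import signal


def redirector_pids(ps_output, name="redirector"):
    """Extract PIDs from ps aux-style text for commands containing name."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)        # 11th field is the full command
        if len(fields) == 11 and name in fields[10]:
            pids.append(int(fields[1]))
    return pids


def send_hup(pids):
    """Send SIGHUP to each PID, as the kill -HUP loop above does."""
    for pid in pids:
        os.kill(pid, signal.SIGHUP)


# Made-up sample output; real column layout varies between systems.
SAMPLE = """\
USER   PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
squid  100  0.1  2.0 9000 4000 ?   S    10:00 0:05 squid
squid  101  0.0  0.1 1000  500 ?   S    10:00 0:00 (redirector)
squid  102  0.0  0.1 1000  500 ?   S    10:00 0:00 (redirector)
"""

print(redirector_pids(SAMPLE))
```

    send_hup is deliberately not called here; on a live system it would be passed the PIDs parsed from real ps output.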

    Credits & Copyright

    Maintained by Chris Foote, chris@foote.com.au
    Copyright (C) 1998 Chris Foote & Wayne Piekarski
    If you find it useful, I'd like to know - please send email
    to chris@foote.com.au - Ta!
    Includes the GNU Regex library written by many authors - see
    regex/AUTHORS for details.
        This program is free software; you can redistribute it and/or modify
        it under the terms of the GNU General Public License as published by
        the Free Software Foundation; either version 2 of the License, or
        (at your option) any later version.
        This program is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU General Public License for more details.
        You should have received a copy of the GNU General Public License
        along with this program; if not, write to the Free Software
        Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
    Please see the file GPL in the source directory for full copyright

    File Last Modified: Aug 21 2005

    This site is sponsored by Inetd and HostExpress, written by Chris Foote.
