Squirm - A redirector for Squid
Squirm is a fast & configurable redirector for the Squid Internet Object Cache. It requires the GNU Regex Library (now included in the Squirm source), and of course, a working Squid. It is available free under the terms of the GNU GPL.
Squirm has the following features:
- full GNU regex pattern matching and replacement, including pattern buffers
- case sensitive (regex) and case insensitive (regexi) rules
- accelerator strings and abort extensions to keep pattern searches fast
- a squirm.local file to control which client addresses get redirected
- an interactive mode for safely testing configuration changes
- configuration reloads via a HUP signal, without restarting Squid
I started writing it because the existing redirector scripts used too much memory and all were too slow for Squids that receive a lot of requests.
On my Pentium Pro 200 running Linux, it manages to do 16,440 lines per second (that's 59 million lines per hour!) using my squirm.local and squirm.patterns config files.
It can handle nifty things like file mirrors with the regex pattern replacement strings, and do site blocking - useful for schools. It could also do things like banner ad rewriting, and just about anything else :-)
The latest version is squirm-1.0betaB which you can download as a normal tar file or a gzipped tar file.
The most recent version is always available from this page at http://squirm.foote.com.au/
cd regex
./configure
make clean
make
cp -p regex.o regex.h ..
This step is a bit ugly - I welcome anyone who has experience with the configure script to incorporate this directly into Squirm - Anyone?
orbit:/usr/local/src/squirm-1.0betaB# whoami
root
orbit:/usr/local/src/squirm-1.0betaB# /usr/local/squirm/bin/squirm
Squirm running as UID 0: writing logs to stderr
Wed Mar 11 13:20:37 1998:unable to open local addresses file [/usr/local/squirm/etc/squirm.local]
Wed Mar 11 13:20:37 1998:unable to open redirect patterns file
Wed Mar 11 13:20:37 1998:Invalid condition - continuing in DODO mode
Wed Mar 11 13:20:37 1998:Squirm (PID 29760) started
[Ctrl + C]
redirect_program /usr/local/squirm/bin/squirm
redirect_children 10
The number of children you need depends on the load on your Squid box. Try 10 and use the cachemgr.cgi CGI to see if all redirector processes get used; if they do, you can raise this number.
By default, the two config files are located at /usr/local/squirm/etc/squirm.local and /usr/local/squirm/etc/squirm.patterns. You need to create these two files from scratch with the aid of the following instructions:
127.0.0
10.2.3
192.168.1
These are used to determine if Squirm should rewrite a URL. You wouldn't normally want any Squid neighbours to be able to use your redirector as the extra load of ICP requests would bog down your machine, so don't include them in the file.
For the above config file, requests to the Squid from 10.2.3.4 would be accepted, whilst requests from 1.2.3.4 would be ignored.
There is currently no plan to implement CIDR notation because Squirm uses simple integer comparisons to make lookups really quick.
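For example, if your clients all sit on the 192.168.5 network and you also test from the local host, your squirm.local would presumably contain just:

127.0.0
192.168.5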
The syntax of lines in the squirm.patterns file is of the form:
regex|regexi pattern replacement [[^]accelerator_string[$]]
or
abort .filename_extension
Full regex matching and replacement is made available by the use of the GNU Regex library. It also supports pattern buffers.
Let's say you want to redirect requests to a local URL for a common file, where it's matched case sensitively:
regex ^.*/n32e301\.exe$ http://www.mydomain/path_to/n32e301.exe
This means: replace URLs ending in /n32e301.exe with the URL of your local copy.
To do the same as above except case insensitively, you would use regexi instead of regex at the start of the line.
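For example, the same rule matched case insensitively would be:

regexi ^.*/n32e301\.exe$ http://www.mydomain/path_to/n32e301.exe

so that a request for /N32E301.EXE would also be caught.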
The accelerator string is used to avoid regex comparisons of URLs unless they are close to the pattern expected. Squirm first compares a URL against the accelerator string before it bothers to do a proper regex comparison, which saves many CPU cycles on a busy machine. Note: you should always use accelerator strings if possible on a busy box!
For the above example, a speedup is achieved through the use of the accelerator string n32e301.exe$, so the line would look like:
regex ^.*/n32e301\.exe$ http://www.mydomain/path_to/n32e301.exe n32e301.exe$
The accelerator string can have a leading caret '^' OR a trailing dollar '$' to indicate that the rough match should search at the start or end of the URL respectively.
The reason behind the use of the abort extension is the massive speedup gained by aborting pattern searches for URLs that end in a certain filename extension. (Why traverse the entire patterns list and do comparisons when they won't be matched anyway?)
Let's say we don't need to traverse the list for files ending in .gif. The line needed is:
abort .gif
regexi ^http://tucows\.[^/]*/(.*$) http://tucows.mymirror.com/\1 ^http://tucows.
abort .gif
abort .html
abort .jpg
abort .htm
regex .*/c16e401\.jar$ http://redirector1.senet.com.au/c16e401.jar c16e401.jar$
regexi .*/c32e401\.jar$ http://redirector1.senet.com.au/c32e401.jar c32e401.jar$
regex .*/cb16e401\.exe$ http://redirector1.senet.com.au/cb16e401.exe cb16e401.exe$
regex .*/cb32e401\.exe$ http://redirector1.senet.com.au/cb32e401.exe cb32e401.exe$
regex .*/cc16e401\.exe$ http://redirector1.senet.com.au/cc16e401.exe cc16e401.exe$
regex .*/cc32e401\.exe$ http://redirector1.senet.com.au/cc32e401.exe cc32e401.exe$
The first line contains an accelerator string ^http://tucows. so Squirm has to do the regex comparison only if the URL matches it. Because this is the first line in the squirm.patterns file, much time is saved by not having to do a regex comparison for every single URL. (Accelerator strings are not compulsory on a config line, but the speed improvement is quite large.)
The first regex comparison uses a case insensitive pattern which matches HTTP URLs for any hostname beginning with tucows. It stores the path information in a pattern buffer which is later replayed in the URL replacement by using \1 (up to 10 replays are possible).
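As a purely hypothetical illustration (the hostnames here are made up), a pattern using two buffers might look like:

regexi ^http://www\.oldhost\.com/([^/]*)/(.*$) http://www.newhost.com/\1/\2 ^http://www.oldhost.com

where \1 replays the first path component and \2 replays the rest of the path.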
The abort extensions are used so that, once a URL's filename extension matches an abort line, no comparisons against the remaining lines are made at all. It is wise to use the abort extension for the most frequently requested filename extensions, but not for filename extensions that occur only rarely. .gif, .jpg, .html and .htm are good candidates for the abort extension.
You may wish to have a way of blocking access to sites which contain material unsuitable for viewing by children, and to return a web page which lets them know they have requested a site which is blocked.
regexi ^http://www\.playboy\.com/.* http://www/notallowed.html
regexi ^http://www\.xxx\.com/.* http://www/notallowed.html
This will return the URL http://www/notallowed.html to anyone requesting URLs starting with http://www.playboy.com or http://www.xxx.com
For long lists of sites to block, the use of accelerator strings may help, in which case the above example would be:
regexi ^http://www\.playboy\.com/ http://www/notallowed.html ^http://www.playboy.com
regexi ^http://www\.xxx\.com/ http://www/notallowed.html ^http://www.xxx.com
If you would like to include the blocked URL in the resulting page (something like "The URL http://www.playboy.com/file.jpg has been blocked"), you could create a CGI which takes the URL as an argument, and add the request to the pattern replacement.
regexi ^(http://www\.playboy\.com/.*) http://www/cgi-bin/na?url=\1
This might be a good approach when you already have a list of hostnames to add to the patterns file, for example:
cat list-of-banned-sites \
| sed -e "s/\./\\\./g" \
| awk '{ print "regexi ^(http://" $1 "/.*) http://www/cgi-bin/na?url=\1" }' \
>> /usr/local/squirm/etc/squirm.patterns
Again, adding accelerator strings to long lists may help with speed.
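For completeness, here is a minimal sketch of what such a CGI (the hypothetical na script used above) could look like, assuming it is written as a plain /bin/sh CGI and that the blocked URL arrives in the query string as url=...:

#!/bin/sh
# Hypothetical "na" CGI: tells the user which URL was blocked.
# Assumes a query string of the form url=http://www.playboy.com/file.jpg
echo "Content-type: text/html"
echo ""
echo "<html><body><h1>Access denied</h1>"
echo "<p>The URL ${QUERY_STRING#url=} has been blocked.</p>"
echo "</body></html>"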
When Squirm is run as root, it goes into interactive mode which echoes all information that would normally be logged to standard error output. This gives the opportunity to test a configuration file modification before restarting the current squirm processes on the machine.
Optionally, you can supply the path of a squirm patterns config file, if it's not in the default location, as the first argument.
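For example (this path is just for illustration), to test a copy of your patterns file kept elsewhere:

/usr/local/squirm/bin/squirm /tmp/squirm.patterns.test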
Squid sends requests to the standard input of a redirector process with the form:
URL src_address/hostname ident method
The ident field is usually a dash '-'. The hostname is normally a dash too, since Squid is normally configured not to look up hostnames for proxy requests. For Squirm to do any redirection, the method must be GET and the src_address must match an address from the squirm.local file.
The following text is an example of running Squirm interactively (the URL request lines are typed in as input):
frog:~# whoami
root
frog:~# /usr/local/squirm/bin/squirm
Squirm running as UID 0: writing logs to stderr
Tue Mar 10 22:00:34 1998:Loading IP List
Tue Mar 10 22:00:34 1998:Reading Patterns from config /usr/local/squirm/etc/squirm.patterns
Tue Mar 10 22:00:34 1998:Squirm (PID 16955) started
http://tucows.com/downloads/win95/n32e301p.exe 127.0.0.1/- - GET
http://tucows.senet.com.au/downloads/win95/n32e301p.exe 127.0.0.1/- - GET
Tue Mar 10 22:00:57 1998:http://tucows.com/downloads/win95/n32e301p.exe:http://tucows.senet.com.au/downloads/win95/n32e301p.exe
http://www.somewhere.com/path/file 127.0.0.1/- - GET
http://www.somewhere.com/path/file 127.0.0.1/- - GET
[Ctrl + D]
You can also feed test requests to Squirm from a file by redirecting standard input:
/usr/local/squirm/bin/squirm < filename
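For example (the filename and contents here are just an illustration), a file called test-urls containing request lines in the format shown above:

http://tucows.com/downloads/win95/n32e301p.exe 127.0.0.1/- - GET
http://www.somewhere.com/path/file 127.0.0.1/- - GET

could be fed to Squirm with:

/usr/local/squirm/bin/squirm < test-urls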
There are several log files in /usr/local/squirm/logs; they are normally only viewable by the squid user id and root.
When you have modified either squirm.local or squirm.patterns, all of the running squirm processes need to be restarted by sending them a HUP signal.
(Restarting Squid by sending it a HUP signal will also do this, but that usually isn't convenient because it makes Squid unavailable for a period of time.)
Under Linux, you can do this by typing:
killall -HUP squirm
On other systems you may have to write a small script:
#!/bin/sh
for PID in `ps -aux | grep redirector | grep -v grep | awk '{ print $2 }'`
do
  kill -HUP $PID
done
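Note that this script greps for processes named redirector; if your redirector binary is installed as /usr/local/squirm/bin/squirm, as in the squid.conf example above, you would presumably grep for squirm instead:

ps -aux | grep squirm | grep -v grep | awk '{ print $2 }'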
Maintained by Chris Foote, chris@foote.com.au
Copyright (C) 1998 Chris Foote & Wayne Piekarski
If you find it useful, I'd like to know - please send email to chris@foote.com.au - Ta!
Includes the GNU Regex library written by many authors - see regex/AUTHORS for details.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Please see the file GPL in the source directory for full copyright information.
File Last Modified: Aug 21 2005