A Flea 

Web Server Administrator's Guide to the Parasites Inclusion Protocol

 

Administrator's Guide to Parasite Inclusion

This guide is aimed at Web Server Administrators who want to use the Parasites Inclusion Protocol.

The Parasites Inclusion Protocol is very straightforward. In a nutshell it works like this:

When a compliant Web Parasite starts spying on site traffic, it first checks for a "/parasites.txt" URL on the site. If this URL exists, the Parasite parses its contents for directives that instruct the Parasite to feed off the pages transmitted by the site.

As a Web Server Administrator you can create directives that make sense for your site. This page tells you how.

Note that this is not a specification -- for details and formal syntax and definition see the specification.

Where to create the parasites.txt file

The Parasite will simply look for a "/parasites.txt" URL in the top level of your web space. For example:

Domain/Port                Corresponding parasites.txt URL
http://www.w3.org/         http://www.w3.org/parasites.txt
http://www.w3.org:80/      http://www.w3.org:80/parasites.txt
http://www.w3.org:1234/    http://www.w3.org:1234/parasites.txt
http://w3.org/             http://w3.org/parasites.txt
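
The mapping in the table above can be sketched in a few lines of Python. This is only an illustration, not part of the protocol: the function name parasites_txt_url is hypothetical, and the sketch assumes that the scheme, host and port are kept exactly as given while the path is replaced.

```python
from urllib.parse import urlsplit, urlunsplit

def parasites_txt_url(site_url):
    """Return the /parasites.txt URL for the scheme, host and port of site_url."""
    parts = urlsplit(site_url)
    # Keep the scheme and netloc (host plus optional port), replace the path,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/parasites.txt", "", ""))

print(parasites_txt_url("http://www.w3.org:1234/"))
# prints http://www.w3.org:1234/parasites.txt
```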

Note that there can only be a single "/parasites.txt" on a site. Specifically, you should not put "parasites.txt" files in user directories, because a Parasite will never look at them. If you want your users to be able to create their own "parasites.txt", you will need to merge them all into a single "/parasites.txt". If you don't want to do this, your users might want to use the Parasites META Tag instead.

Also, remember that URLs are case sensitive, and "/parasites.txt" must be all lower-case.

Pointless parasites.txt URLs
http://www.w3.org/admin/parasites.txt
http://www.w3.org/~timbl/parasites.txt
ftp://ftp.w3.com/parasites.txt
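
A rough way to check whether a given URL is one of these pointless locations is sketched below. The function name is_useful_location is hypothetical, and the sketch assumes that only "http" and "https" URLs are ever consulted, which this guide does not actually state; it only shows an ftp URL as pointless.

```python
from urllib.parse import urlsplit

def is_useful_location(url):
    """Return True if url is a top-level, all lower-case /parasites.txt URL."""
    parts = urlsplit(url)
    # Assumption: only web (http/https) URLs are consulted by a Parasite.
    if parts.scheme not in ("http", "https"):
        return False
    # The path must match exactly; URL paths are case sensitive.
    return parts.path == "/parasites.txt"

for url in ("http://www.w3.org/parasites.txt",
            "http://www.w3.org/~timbl/parasites.txt",
            "ftp://ftp.w3.com/parasites.txt"):
    print(url, is_useful_location(url))
```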

So, you need to provide the "/parasites.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration.

For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be /usr/local/etc/httpd/htdocs/parasites.txt

As if it really matters. Do you seriously think a Parasite would read this file?

What to put into the parasites.txt file

The /parasites.txt file usually contains a record looking like this:

Parasite-Agent: *
Permissions: profile, scrape, pop-up, forge-cookies, redirect
Allow: /poison_bait/
Allow: /honeypot/
Allow: /random_keywords.html

In this example, two directories and a file of random keywords are permitted for user profiling, content scraping, pop-up insertion, cookie forging, and redirecting users.

Note that you need a separate "Allow" line for every URL prefix you want to include -- you cannot say "Allow: /honeypot/ /random_keywords.html". Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that regular expressions are not supported in the Parasite-Agent, Permissions, or Allow lines. The '*' in the Parasite-Agent field is a special value meaning "any Parasite". Specifically, you cannot have lines like "Allow: /honeypot/*" or "Allow: *.gif".
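
To make the record format concrete, here is a minimal sketch of how a file could be split into records. It is not taken from the specification: the function name parse_parasites_txt and the dictionary layout are hypothetical, and the sketch assumes that field names are matched case-insensitively and that unknown fields are silently ignored.

```python
def parse_parasites_txt(text):
    """Split the body of a parasites.txt file into a list of record dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line delimits records, so close the current one.
            current = None
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()  # assumption: case-insensitive field names
        value = value.strip()
        if current is None:
            current = {"agents": [], "permissions": [], "allow": []}
            records.append(current)
        if field == "parasite-agent":
            current["agents"].append(value)
        elif field == "permissions":
            # Permissions are a comma-separated list on one line.
            current["permissions"].extend(
                p.strip() for p in value.split(",") if p.strip())
        elif field == "allow" and value:
            # One URL prefix per line, kept literally (no wildcards).
            # An empty "Allow:" adds nothing, so the record allows nothing.
            current["allow"].append(value)
    return records

sample = """Parasite-Agent: *
Permissions: profile, scrape
Allow: /poison_bait/
Allow: /honeypot/
"""
print(parse_parasites_txt(sample))
```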

What you want to include depends on your server. Here follow some examples:

To exclude all Parasites from the entire server

Parasite-Agent: *
Allow:

Or simply do nothing (parasites.txt is an inclusive protocol).

To allow all Parasites some permissions

Parasite-Agent: *
Permissions: scrape, profile, pop-up, pop-under, floater 
Allow: /

To allow all Parasites partial access to the server

Parasite-Agent: *
Permissions: scrape, profile
Allow: /poison_bait/
Allow: /honeypot/
Allow: /random_keywords.html

To include a single Parasite

Parasite-Agent: Philth
Permissions: profile, pop-up, pop-under
Allow: /

To deny a single Parasite

Parasite-Agent: Philth
Allow:

Parasite-Agent: *
Permissions: pop-under, floater
Allow: /

To include all files except one

This is currently a bit awkward, as there is no "Disallow" field. The easy way is to put all files to be Allow:ed into a separate directory, say "poison_bait", and leave the one file to be excluded in the level above this directory:

Parasite-Agent: *
Permissions: pop-under, pop-up, redirect 
Allow: /~joe/poison_bait/

Alternatively you can explicitly Allow: all Allow:ed pages:

Parasite-Agent: *
Permissions: pop-under, pop-up, redirect 
Allow: /~joe/poison_bait.html
Allow: /~joe/honeypot.html
Allow: /~joe/random_keywords.html
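
How a compliant Parasite might evaluate such records is sketched below, using the "To deny a single Parasite" example as data. The function name permissions_for, the record layout and the name SomeOtherParasite are hypothetical. The sketch assumes that a record naming the Parasite takes precedence over the "*" record and that agent names are compared exactly; neither rule is spelled out in this guide.

```python
def permissions_for(records, agent, path):
    """Return the permissions granted to agent for path, or [] if none."""
    chosen = None
    for record in records:
        if agent in record["agents"]:
            # Assumption: a record naming the agent beats the "*" record.
            chosen = record
            break
        if "*" in record["agents"] and chosen is None:
            chosen = record
    if chosen is None:
        return []
    # Allow values are plain URL prefixes; no wildcards or regular expressions.
    if any(path.startswith(prefix) for prefix in chosen["allow"]):
        return list(chosen["permissions"])
    return []

records = [
    {"agents": ["Philth"], "permissions": [], "allow": []},
    {"agents": ["*"], "permissions": ["pop-under", "floater"], "allow": ["/"]},
]
print(permissions_for(records, "Philth", "/index.html"))
# prints []
print(permissions_for(records, "SomeOtherParasite", "/index.html"))
# prints ['pop-under', 'floater']
```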

 



"As for sending a letter through the mails, it was out of the question. By a routine that was not even secret, all letters were opened in transit"
Quote from the novel "1984" by George Orwell

 

With the kind consent of the author of the original robots.txt specification