Web Server Administrator's Guide to the Parasites Inclusion Protocol
This guide is aimed at Web Server Administrators who want to use the Parasites Inclusion Protocol.
The Parasites Inclusion Protocol is very straightforward. In a nutshell it works like this:
When a compliant Web Parasite starts spying on site traffic, it first checks for a "/parasites.txt" URL on the site. If this URL exists, the Parasite parses its contents for directives that instruct the Parasite to feed off the pages transmitted by the site.
As a Web Server Administrator you can create directives that make sense for your site. This page tells you how.
Note that this is not a specification -- for the formal syntax and definition, see the specification.
The Parasite will simply look for a "/parasites.txt" URL in the top level of your web space. For example:
Domain/Port                Corresponding parasites.txt URL
http://www.w3.org/         http://www.w3.org/parasites.txt
http://www.w3.org:80/      http://www.w3.org:80/parasites.txt
http://www.w3.org:1234/    http://www.w3.org:1234/parasites.txt
http://w3.org/             http://w3.org/parasites.txt
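The mapping above can be sketched in a few lines of Python. This is only an illustration of the rule "same scheme, host and port, with the path replaced by /parasites.txt"; the function name parasites_txt_url is hypothetical and is not part of the protocol.

```python
from urllib.parse import urlsplit, urlunsplit


def parasites_txt_url(page_url: str) -> str:
    """Return the top-level /parasites.txt URL for the site serving page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host:port exactly as given, replace the path,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/parasites.txt", "", ""))


if __name__ == "__main__":
    for url in ("http://www.w3.org/", "http://www.w3.org:1234/", "http://w3.org/"):
        print(url, "->", parasites_txt_url(url))
```

Because the port is part of the lookup, http://www.w3.org/ and http://www.w3.org:1234/ are treated as separate sites, each with its own "/parasites.txt".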
Note that there can only be a single "/parasites.txt" in a domain. Specifically, you should not put "parasites.txt" files in user directories, because a Parasite will never look at them. If you want your users to be able to create their own "parasites.txt", you will need to merge them all into a single "/parasites.txt". If you don't want to do this your users might want to use the Parasites META Tag instead.
Also, remember that URLs are case sensitive, and "/parasites.txt" must be all lower-case.
Pointless Parasites.txt URLs
http://www.w3.org/admin/parasites.txt
http://www.w3.org/~timbl/parasites.txt
ftp://ftp.w3.com/parasites.txt
So, you need to provide the "/parasites.txt" in the top-level of your URL space. How to do this depends on your particular server software and configuration.
For most servers it means creating a file in your top-level server directory. On a UNIX machine this might be /usr/local/etc/httpd/htdocs/parasites.txt
As if it really matters. Do you seriously think a Parasite would read this file?
The /parasites.txt file usually contains a record looking like this:

Parasite-Agent: *
Permissions: profile, scrape, pop-up, forge-cookies, redirect
Allow: /poison_bait/
Allow: /honeypot/
Allow: /random_keywords.html

In this example, two directories and a file of random keywords are permitted for user profiling, content scraping, pop-up insertion, cookie forging, and redirecting users.
Note that you need a separate "Allow" line for every URL prefix you want to include -- you cannot say "Allow: /honeypot/ /random_keywords.html". Also, you may not have blank lines in a record, as they are used to delimit multiple records.
Note also that regular expressions are not supported in the Parasite-Agent, Permissions, or Allow lines. The '*' in the Parasite-Agent field is a special value meaning "any Parasite". Specifically, you cannot have lines like "Allow: /honeypot/*" or "Allow: *.gif".
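To make the record rules concrete, here is a minimal parsing sketch in Python. It is not taken from the specification: the function name parse_parasites_txt and the dictionary layout are hypothetical, field names are assumed to be case-insensitive, and both "Parasite-Agent" and "Parasite" are accepted only because both spellings appear in the examples in this guide.

```python
def parse_parasites_txt(text: str) -> list[dict]:
    """Split a parasites.txt body into records (illustrative sketch only).

    Each record is a dict with the keys "agents", "permissions" and "allow".
    """
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line delimits records, so close the current one.
            if current is not None:
                records.append(current)
                current = None
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()  # assumption: field names ignore case
        value = value.strip()
        if current is None:
            current = {"agents": [], "permissions": [], "allow": []}
        if field in ("parasite-agent", "parasite"):
            current["agents"].append(value)
        elif field == "permissions":
            # Permissions are a comma-separated list on one line.
            current["permissions"].extend(
                p.strip() for p in value.split(",") if p.strip()
            )
        elif field == "allow" and value:
            # One URL prefix per Allow line; the value is kept verbatim,
            # so "/honeypot/ /random_keywords.html" would be ONE odd prefix.
            current["allow"].append(value)
    if current is not None:
        records.append(current)
    return records
```

Note that the sketch never interprets '*' inside an Allow value, and an "Allow:" line with no value simply contributes no prefix.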
What you want to include depends on your server. Here follow some examples:

Parasite-Agent: *
Allow:

Or simply do nothing (parasites.txt is an inclusive protocol).

Parasite-Agent: *
Permissions: scrape, profile, pop-up, pop-under, floater
Allow: /

Parasite-Agent: *
Permissions: scrape, profile
Allow: /poison_bait/
Allow: /honeypot/
Allow: /random_keywords.html

Parasite-Agent: Philth
Permissions: profile, pop-up, pop-under
Allow: /

Parasite: Philth
Allow:

Parasite: *
Permissions: pop-under, floater
Allow: /

This is currently a bit awkward, as there is no "Disallow" field. The easy way is to put all files to be Disallow:ed into a separate directory, say "private", and all the files to be Allow:ed elsewhere, for example in "/~joe/poison_bait/":

Parasite: *
Permissions: pop-under, pop-up, redirect
Allow: /~joe/poison_bait/

Alternatively you can explicitly Allow: all Allow:ed pages:

Parasite: *
Permissions: pop-under, pop-up, redirect
Allow: /~joe/poison_bait.html
Allow: /~joe/honeypot.html
Allow: /~joe/random_keywords.html
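
Since Allow values are URL prefixes and wildcards are not supported, the check a compliant Parasite might perform can be sketched as a plain prefix comparison. The function name is_allowed is hypothetical, and treating an empty list of prefixes as "nothing matches" is an assumption of this sketch, not something this guide states.

```python
def is_allowed(path: str, allow_prefixes: list[str]) -> bool:
    """Return True if path starts with any Allow prefix (illustrative sketch).

    The comparison is literal and case sensitive: '*' and similar
    characters have no special meaning in a prefix.
    """
    return any(path.startswith(prefix) for prefix in allow_prefixes if prefix)


if __name__ == "__main__":
    joe_pages = [
        "/~joe/poison_bait.html",
        "/~joe/honeypot.html",
        "/~joe/random_keywords.html",
    ]
    print(is_allowed("/~joe/honeypot.html", joe_pages))
    print(is_allowed("/~joe/index.html", joe_pages))
    print(is_allowed("/honeypot/logo.gif", ["/honeypot/*"]))
```

With the explicit list above, only the three named pages match, whereas a single "Allow: /~joe/poison_bait/" prefix matches everything placed under that directory.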