From d86b9f73126adcaefc521b51db62825c7d9837b5 Mon Sep 17 00:00:00 2001
From: Stuart Gathman <stuart@gathman.org>
Date: Tue, 9 Sep 2008 23:07:48 +0000
Subject: [PATCH] Whitelists and Blacklists

---
 doc/milter.ht | 62 +++++---------------------------------------
 doc/policy.ht | 71 +++++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 69 insertions(+), 64 deletions(-)
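The whitelist/blacklist expiry described in the policy.ht changes below can be sketched in Python as follows. Only the 60-day and 30-day lifetimes come from the documentation. The `SenderList` class, its in-memory dict, and the address-then-domain lookup are hypothetical, and the sketch does not read or write `auto_whitelist.log` or `blacklist.log`. It also expires every entry, including ones an administrator adds by hand; the documentation does not say whether those expire.

```python
import time

DAY = 24 * 60 * 60
WHITELIST_DAYS = 60  # auto-whitelist lifetime stated in policy.ht
BLACKLIST_DAYS = 30  # auto-blacklist lifetime stated in policy.ht


class SenderList(object):
    """Hypothetical in-memory sender list whose entries expire."""

    def __init__(self, lifetime_days):
        self.lifetime = lifetime_days * DAY
        self.entries = {}  # lowercased address or domain -> time it was added

    def add(self, sender, now=None):
        self.entries[sender.lower()] = time.time() if now is None else now

    def contains(self, sender, now=None):
        """True if the full address or its domain has an unexpired entry."""
        now = time.time() if now is None else now
        sender = sender.lower()
        keys = [sender]
        if '@' in sender:
            keys.append(sender.split('@', 1)[1])  # fall back to the domain
        for key in keys:
            added = self.entries.get(key)
            if added is not None and now - added < self.lifetime:
                return True
        return False


whitelist = SenderList(WHITELIST_DAYS)
blacklist = SenderList(BLACKLIST_DAYS)
```

Matching on the domain as well as the full address mirrors the documentation's statement that both senders and sending domains can be listed.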
diff --git a/doc/milter.ht b/doc/milter.ht
index 682a0f3..844d662 100644
--- a/doc/milter.ht
+++ b/doc/milter.ht
@@ -25,7 +25,7 @@ Mascot by <a href="http://alphard.ethz.ch/hafner/lebl.htm">Christian Hafner</a>
   Stuart D. Gathman</a><br>
 This web page is written by Stuart D. Gathman<br>and<br>sponsored by
 <a href="http://www.bmsi.com">Business Management Systems, Inc.</a> <br>
-Last updated Mar 30, 2007</h4>
+Last updated Aug 26, 2008</h4>
 
 See the <a href="faq.html">FAQ</a> | <a href="http://sourceforge.net/project/showfiles.php?group_id=139894">Download now</a> |
 <a href="http://bmsi.com/mailman/listinfo/pymilter">Subscribe to mailing list</a> |
@@ -44,7 +44,9 @@ Sendmail 8.12 officially releases libmilter.
 Version 8.12 seems to be more robust, and includes new privilege
 separation features to enhance security.  Even better, sendmail 8.13
 supports socket maps, which makes <a href="pysrs.html">pysrs</a> much more
-efficient and secure.  I recommend upgrading.
+efficient and secure.  Sendmail 8.14 finally supports modifying
+MAIL FROM via the milter API.  Unfortunately, I haven't gotten around
+to supporting that yet in Python milter.
 </td></tr>
 </table>
 
@@ -209,60 +211,8 @@ href="http://www.duh.org/cvsweb.cgi/~checkout~/pmilter/doc/milter-protocol.txt?r
 <h3> Confirmed Installations </h3>
 
 Please <a href="mailto:%73%74%75%61%72%74%40%62%6D%73%69%2E%63%6F%6D">email</a>
-me if you successfully install milter on a system not mentioned below.
-<p>
-<table>
-<tr>
-<th>Operating System</th> <th>Compiler</th> <th>Python</th> <th>Sendmail</th>
-<th>milter</th>
-<tr>
-<td>Mandrake 8.0</td><td>gcc-3.0.1</td><td>2.1.1</td><td>8.12.0</td>
-<td>0.3.3</td><tr>
-<td>Mandrake 8.0</td><td>gcc-2.96</td><td>2.0</td><td>8.11.2</td>
-<td>0.3.6</td><tr>
-<td>RedHat 6.2</td><td>egcs-1.1.2</td><td>2.2.2</td><td>8.11.6</td>
-<td>0.5.4</td><tr>
-<td>RedHat 7.1</td><td>gcc-2.96</td><td>?</td><td>8.12.1</td>
-<td>0.3.5</td><tr>
-<td>RedHat 7.3</td><td>gcc-2.96</td><td>2.2.2</td><td>8.11.6</td>
-<td>0.5.5</td><tr>
-<td>RedHat 7.3</td><td>gcc-2.96</td><td>2.3.3</td><td>8.13.1</td>
-<td>0.7.2</td><tr>
-<td>RedHat 7.3</td><td>gcc-2.96</td><td>2.4.1</td><td>8.13.5</td>
-<td>0.8.4</td><tr>
-<td>RedHat 8.0</td><td>gcc-3.2</td><td>2.2.1</td><td>8.12.6</td>
-<td>0.5.2</td><tr>
-<td>RedHat 9.0</td><td>gcc-3.2.2</td><td>2.4.1</td><td>8.13.1</td>
-<td>0.8.2</td><tr>
-<td>RedHat EL3</td><td>gcc-3.2.3</td><td>2.4.1</td><td>8.13.5</td>
-<td>0.8.4</td><tr>
-<td>Debian Linux</td><td>gcc-2.95.2</td><td>2.1.1</td><td>8.12.0</td>
-<td>0.3.7</td><tr>
-<td>Debian Linux</td><td>gcc-3.2.2</td><td>2.2.2</td><td>8.12.7</td>
-<td>0.5.4</td><tr>
-<td>AIX-4.1.5</td><td>gcc-2.95.2</td><td>2.1.1</td><td>8.11.5</td>
-<td>0.3.3</td><tr>
-<td>AIX-4.1.5</td><td>gcc-2.95.2</td><td>2.1.1</td><td>8.12.1</td>
-<td>0.3.4</td><tr>
-<td>AIX-4.1.5</td><td>gcc-2.95.2</td><td>2.1.3</td><td>8.12.3</td>
-<td>0.4.2</td><tr>
-<td>AIX-4.1.5</td><td>gcc-2.95.2</td><td>2.4.1</td><td>8.13.1</td>
-<td>0.8.4</td><tr>
-<td>Slackware 7.1</td><td>?</td><td>?</td><td>8.12.1</td>
-<td>0.3.8</td><tr>
-<td>Slackware 9.0</td><td>gcc-3.2.2</td><td>2.2.3</td><td>8.12.9</td>
-<td>0.5.4</td><tr>
-<td>OpenBSD</td><td>?</td><td>2.3.3?</td><td>8.13.1?</td>
-<td>0.7.2</td><tr>
-<td>SuSE 7.3</td><td>gcc-2.95.3</td><td>2.1.1</td><td>8.12.2</td>
-<td>0.3.9</td><tr>
-<td>FreeBSD</td><td>gcc-2.95.3</td><td>2.2.1</td><td>8.12.3</td>
-<td>0.4.0</td><tr>
-<td>FreeBSD</td><td>gcc-2.95.3</td><td>2.2.2</td><td>?</td>
-<td>0.5.5</td><tr>
-<td>FreeBSD 4.4</td><td>gcc-2.95.3</td><td>?</td><td>8.12.10</td>
-<td>0.6.6</td>
-</table>
+me if you do <i>not</i> successfully install milter.  The confirmed
+installations are too numerous to list at this point.
 
 <h2> Enough Already! </h2>
 
diff --git a/doc/policy.ht b/doc/policy.ht
index 750cf08..3e57bc6 100644
--- a/doc/policy.ht
+++ b/doc/policy.ht
@@ -82,7 +82,7 @@ to determine the official SPF policy result.
 The offical SPF result is then logged in the Received-SPF header field,
 but certain results are subjected to further processing to create
 an effective result for policy purposes.
-
+<p>
 If the official result is 'none', we try to turn it into an effective result of
 'pass' or 'fail'.  First, we check for a local substitute SPF record
 under the domain defined in the <code>[spf]delegate</code> configuration.  
@@ -91,12 +91,12 @@ too clueless to add their own.  If there is no local substitute, we use a "best
 guess" SPF record of "v=spf1 a/24 mx/24 ptr" for MAIL FROM or "v=spf1 a/24
 mx/24" for HELO.  In addition, a HELO that is a subdomain of MAIL FROM and
 resolves to the connect IP results in an effective result of 'pass'.
-
+<p>
 If there is no local SPF record, and the effective result is still not
 'pass', we check for either a valid HELO name or a valid PTR record for
 the connect IP.  A valid HELO or PTR cannot look like a dynamic name
 as determined by the heuristic in <code>Milter.dynip</code>.
-
+<p>
 If HELO has an SPF record, and the result is anything but pass, we reject
 the connection:
 <pre>
@@ -107,16 +107,16 @@ the connection:
 </pre>
 Note that HELO does not have any forwarding issues like MAIL FROM, and so
 any result other than 'pass' or 'none' should be treated like 'fail'.
-
+<p>
 Only if nothing about the SMTP envelope can be validated does the effective
 result remain 'none.  I call this the "3 strikes" rule.
-
+<p>
 If the official result is 'permerror' (a syntax error in the sender's
 policy), we use the 'lax' option in pyspf to try various heuristics to guess 
 what they really meant.  For instance, the invalid mechanism "ip:1.2.3.4" is
 treated as "ip4:1.2.3.4".  The result of lax processing is then used
 as the effective result for policy purposes.
-
+<p>
 With an effective SPF result in hand, we consult the sendmail access
 database to find our receiver policy for the sender.  
 
@@ -159,7 +159,7 @@ SPF-Fail:abeb@adelphia.net     DSN
 This says to accept mail from that adelphia.net user despite the
 SPF fail, but only after annoying them with a DSN about their ISP's broken
 policy. 
-
+<p>
 If there is no match on the full sender, the domain is checked:
 <pre>
 SPF-Neutral:aol.com     REJECT
@@ -168,7 +168,7 @@ This says to reject mail from AOL with an SPF result of neutral.
 This means AOL users can't use their AOL address with another mail service
 to send us mail.  This is good because the other mail service is 
 likely a badly configured greeting card site or a virus.
-
+<p>
 Finally, a default policy for the result is checked.  While there are program
 defaults, you should have defaults in the access database for SPF results:
 <pre>
@@ -192,3 +192,58 @@ independently for each SPF result and sender combination.  So aol.com:neutral
 might have a really bad reputation, while aol.com:pass would be ok.
 Furthermore, when a sender finally publishes an SPF policy and starts
 getting SPF pass, their reputation is effectively reset.
+
+<h2> Whitelists and Blacklists </h2>
+
+The administrator can whitelist or blacklist senders and sending domains by
+appending them to <code>${datadir}/auto_whitelist.log</code> or
+<code>${datadir}/blacklist.log</code>, respectively.  In addition,
+recipients of mail from internal senders (except automatic replies such as vacation
+messages and return receipts) are automatically whitelisted for 60 days, and
+senders that fail CBV or DSN checks are automatically blacklisted for 30 days.
+Messages from whitelisted and blacklisted senders automatically train the
+Bayesian content filter before being delivered or rejected, respectively.
+<p>
+Real Soon Now, users will be able to maintain their own whitelists and
+blacklists that apply only when they are the recipient.
+
+<h2> Content Filter </h2>
+
+Most messages have been rejected or delivered by now, but spammers
+are always finding new places to send their junk from.  For instance,
+we receive around 10,000 emails a day, of which around 500 come from
+first-time spam senders.  A Bayesian filter, trained by the whitelists and
+blacklists, scores the message.  Likely spam is either rejected or
+quarantined.  If the sender has an effective SPF result of pass, they
+get a DSN notifying them that their message has been quarantined.
+(A DSN failure gets the sender automatically blacklisted.)
+Otherwise, if the <code>reject_spam</code> option is set, the message is rejected.
+If neither applies, a CBV is done (a failure gets the sender automatically
+blacklisted) and the message is silently quarantined.
+<p>
+Normally, you don't want email messages to silently disappear into
+a black hole, so you should set the <code>reject_spam</code> option.  However,
+if you don't want your correspondents' email to be rejected, you can leave it
+unset and check your quarantine frequently instead.
+
+<h3> Honeypot </h3>
+
+You can also blacklist recipients by listing them as aliases of the
+'honeypot' dspam user; collectively, these recipients are called
+the honeypot.  Any email to them is used to train the
+spam filter as spam and to chalk up a reputation demerit for the sender, and is then
+discarded.  It might be a good idea to also blacklist the sender if it has an SPF pass,
+but I'm afraid of accidents.
+
+<h3> Reputation </h3>
+
+Reputation is tracked by sending domain and effective SPF result.
+The GOSSiP server tracks the spam/ham status of the last 1024 messages
+for each domain:result combination.  When the server is queried during
+the SMTP envelope phase (MAIL FROM), it also queries any configured
+peers, and the scores are combined.  Domains with a history of spam for
+a given SPF result are rejected at MAIL FROM.  The GOSSiP system has
+a command-line utility to reset (delete) a reputation, for instance when a
+sender whose system was infected with malware has been cleaned up.  In addition,
+the confidence score of a reputation decays with time, so a bad sender
+will eventually be able to try again without manual intervention.
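The Content Filter section above describes how likely spam is disposed of. That logic can be sketched as a single decision function, under stated assumptions. `dispose_likely_spam`, its parameters, and the 'quarantine'/'reject' return values are hypothetical. DSN and CBV are modelled as synchronous callables that return True on success. In practice, a DSN failure may only surface later as a bounce.

```python
def dispose_likely_spam(sender, effective_spf, reject_spam,
                        send_dsn, do_cbv, blacklist):
    """Hypothetical sketch: decide what happens to a message scored as spam.

    send_dsn and do_cbv are caller-supplied callables that take the sender
    and return True on success; failing senders are added to `blacklist`.
    Returns 'quarantine' or 'reject'.
    """
    if effective_spf == 'pass':
        # Authenticated sender: quarantine and notify them with a DSN.
        if not send_dsn(sender):
            blacklist.add(sender)  # DSN failure -> automatic blacklist
        return 'quarantine'
    if reject_spam:
        # Unauthenticated sender and reject_spam set: refuse the message.
        return 'reject'
    # Otherwise verify the sender with a CBV and quarantine silently.
    if not do_cbv(sender):
        blacklist.add(sender)  # CBV failure -> automatic blacklist
    return 'quarantine'
```

The SPF result is checked first. As a result, a sender with an effective SPF pass is never rejected outright at this step, whatever the reject_spam setting.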

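The Reputation section above gives two rules: a window of the last 1024 messages for each domain:result pair, and a confidence that decays over time. Both can be approximated with the sketch below. It is not the GOSSiP implementation. The half-life, the `PRIOR` constant, the rejection thresholds, and the class and method names are all hypothetical, and combining scores from configured peers is left out.

```python
import time
from collections import defaultdict, deque

WINDOW = 1024           # messages remembered per domain:result (from policy.ht)
HALF_LIFE = 30 * 86400  # hypothetical: an observation's weight halves every 30 days
PRIOR = 2.0             # hypothetical: evidence needed before confidence reaches 0.5


class ReputationTracker(object):
    """Hypothetical per-domain:result spam/ham history with time decay."""

    def __init__(self, window=WINDOW, half_life=HALF_LIFE):
        self.half_life = half_life
        # Each key keeps only its most recent `window` observations.
        self.history = defaultdict(lambda: deque(maxlen=window))

    @staticmethod
    def _key(domain, spf_result):
        return '%s:%s' % (domain.lower(), spf_result)

    def record(self, domain, spf_result, is_spam, now=None):
        now = time.time() if now is None else now
        self.history[self._key(domain, spf_result)].append((now, is_spam))

    def reputation(self, domain, spf_result, now=None):
        """Return (spam_fraction, confidence), both in [0, 1]."""
        now = time.time() if now is None else now
        total = spam = 0.0
        for stamp, is_spam in self.history.get(self._key(domain, spf_result), ()):
            weight = 0.5 ** ((now - stamp) / float(self.half_life))
            total += weight
            if is_spam:
                spam += weight
        if total == 0.0:
            return 0.0, 0.0
        # Confidence grows with decayed evidence, so it fades as entries age.
        return spam / total, total / (total + PRIOR)

    def should_reject(self, domain, spf_result, now=None):
        # Hypothetical thresholds: mostly spam, with reasonable confidence.
        frac, conf = self.reputation(domain, spf_result, now)
        return frac >= 0.9 and conf >= 0.5

    def reset(self, domain, spf_result):
        """Forget a reputation, e.g. after a compromised sender is cleaned up."""
        self.history.pop(self._key(domain, spf_result), None)
```

The tracker keys on domain and SPF result together. This lets aol.com:neutral carry a bad reputation while aol.com:pass stays clean.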