pymilter/doc/policy.ht

Title: Python Milter Mail Policy

<h1> Python Milter Mail Policy </h1>

These are the policies implemented by the <code>bms.py</code> milter
application.  The milter and Milter modules do not implement any policies
by themselves.

<h3> Classify connection </h3>

When the SMTP client connects, the connection IP address is
saved for later verification, and the connection
is classified as INTERNAL or EXTERNAL by matching the ip
address against the <code>internal_connect</code> configuration.
IP addresses with no PTR, and PTR names that look like
the kind assigned to dynamic IPs (as determined by a heuristic
algorithm) are flagged as DYNAMIC.  IPs that match the
<code>trusted_relay</code> configuration are flagged as TRUSTED.
<p>
Examples from the log file (<i>not</i> the SMTP error message returned):
<pre>
2005Jul29 13:56:53 [71207] connect from p50863492.dip0.t-ipconnect.de at ('80.134.52.146', 1858) EXTERNAL DYN
2005Jul29 18:10:15 [74511] connect from foopub at ('1.2.3.4', 46513) EXTERNAL TRUSTED
2005Jul29 14:41:00 [71805] connect from foobar at ('192.168.0.1', 41205) INTERNAL
2005Jul29 14:41:15 [71806] connect from cncln.online.ln.cn at ('218.25.240.137', 35992) EXTERNAL
</pre>
<p>
Certain obviously evil PTR names are blocked at this point:
"localhost" (when IP is not 127.*) and ".".
<pre>
2005Jul29 14:49:50 [71918] connect from localhost at ('221.132.0.6', 50507) EXTERNAL
2005Jul29 14:49:50 [71918] REJECT: PTR is localhost
</pre>

<h3> HELO Check </h3>

The HELO name provided by the client is saved for later verification
(for example by SPF).  We could validate the HELO at this point
by verifying that an A record for the HELO name matches the connect ip.
However, currently we only block certain obvious problems.
HELO names that look like an IP4 address
and ones that match the <code>hello_blacklist</code> configuration
are immediately rejected.  The hello_blacklist typically contains
the current MTAs own HELO name or email domains.
Clients that attempt to skip HELO are immediately rejected.
<pre>
2005Jul29 18:10:15 [74512] hello from example.com
2005Jul29 18:10:15 [74512] REJECT: spam from self: example.com
2005Jul29 18:17:09 [74581] hello from 80.191.244.69
2005Jul29 18:17:09 [74581] REJECT: numeric hello name: 80.191.244.69
</pre>

<h3> MAIL FROM Check </h3>

Before calling our milter, sendmail checks a DNS blacklist to
block banned sender domains.  We never see a blocked domain.
<p>
The MAIL FROM address is saved for possible use by the smart-alias
feature.  First, the <code>internal_domains</code> is used for
a simple screening if defined.  If the MAIL FROM for an INTERNAL connection
is NOT in <code>internal_domains</code>, then it is rejected (the
PC is most likely infected and attempting to send out spam).
If the MAIL FROM for an EXTERNAL connection IS in
<code>internal_domains</code>, then the message is immediately rejected.
This is quick and effective for most small company MTAs.  For more
complex mail networks, it is too simplistic, and should not be defined.
SPF will handle the complex cases.

<h4> wiretap </h4>

The wiretap feature can screen and/or monitor mail to/from certain
users.  If the MAIL FROM is being wiretapped, the recipients are
altered accordingly.

<!--table-stop-->

<h2> SPF check </h2>

The MAIL FROM, connect IP, and HELO name are checked against
any SPF records published via DNS for the alleged sender (MAIL FROM)
to determine the official SPF policy result.
The offical SPF result is then logged in the Received-SPF header field,
but certain results are subjected to further processing to create
an effective result for policy purposes.
<p>
If the official result is 'none', we try to turn it into an effective result of
'pass' or 'fail'.  First, we check for a local substitute SPF record
under the domain defined in the <code>[spf]delegate</code> configuration.
It is often useful to add local SPF records for correspondents that are
too clueless to add their own.  If there is no local substitute, we use a "best
guess" SPF record of "v=spf1 a/24 mx/24 ptr" for MAIL FROM or "v=spf1 a/24
mx/24" for HELO.  In addition, a HELO that is a subdomain of MAIL FROM and
resolves to the connect IP results in an effective result of 'pass'.
<p>
If there is no local SPF record, and the effective result is still not
'pass', we check for either a valid HELO name or a valid PTR record for
the connect IP.  A valid HELO or PTR cannot look like a dynamic name
as determined by the heuristic in <code>Milter.dynip</code>.
<p>
If HELO has an SPF record, and the result is anything but pass, we reject
the connection:
<pre>
2005Jul30 19:45:16 [93991] connect from [221.200.41.54] at ('221.200.41.54', 3581) EXTERNAL DYN
2005Jul30 19:45:18 [93991] hello from adelphia.net
2005Jul30 19:45:19 [93991] mail from <wendy.stubbsua@link-it.com> ()
2005Jul30 19:45:19 [93991] REJECT: hello SPF: fail 550 access denied
</pre>
Note that HELO does not have any forwarding issues like MAIL FROM, and so
any result other than 'pass' or 'none' should be treated like 'fail'.
<p>
Only if nothing about the SMTP envelope can be validated does the effective
result remain 'none.  I call this the "3 strikes" rule.
<p>
If the official result is 'permerror' (a syntax error in the sender's
policy), we use the 'lax' option in pyspf to try various heuristics to guess
what they really meant.  For instance, the invalid mechanism "ip:1.2.3.4" is
treated as "ip4:1.2.3.4".  The result of lax processing is then used
as the effective result for policy purposes.
<p>
With an effective SPF result in hand, we consult the sendmail access
database to find our receiver policy for the sender.

<table border=1>
<tr><th>REJECT</th><td>
Reject the sender with a 550 5.7.1 SMTP code.  The SMTP rejection
includes a detailed description of the problem.
</td></tr>
<tr><th>CBV</th><td>
Do a Call Back Validation by connecting to an MX of the sender
and checking that using the sender as the RCPT TO is not rejected.
We quit the CBV connection before actualling sending a message.
If the CBV is rejected, our SMTP connection is rejected with the
same error code and message.  CBV results are cached.
</td></tr>
<tr><th>DSN</th><td>
Do a Call Back Validation by connecting to an MX of the sender
and checking that using the sender as the RCPT TO is not rejected.
Unlike a CBV, we continue on to data and send a detailed message
explaining the problem.  This can be useful for reporting PermError
or SoftFail to the sender.  Keep in mind that for any result other
than 'pass', the sender could be forged, and your DSN could annoy the
wrong person.  However, a SoftFail result is requesting such feedback
for debugging and a PermError result needs to be fixed by the sender ASAP
whether forged or not.  DSN results are cached so that senders are
annoyed only weekly.
</td></tr>
<tr><th>OK</th><td>
Accept the sender.  The message may still be rejected via reputation
or content filtering.
</td></tr>
</table>

<h3> SPF policy syntax </h3>

First, the full sender is checked:
<pre>
SPF-Fail:abeb@adelphia.net     DSN
</pre>
This says to accept mail from that adelphia.net user despite the
SPF fail, but only after annoying them with a DSN about their ISP's broken
policy.
<p>
If there is no match on the full sender, the domain is checked:
<pre>
SPF-Neutral:aol.com     REJECT
</pre>
This says to reject mail from AOL with an SPF result of neutral.
This means AOL users can't use their AOL address with another mail service
to send us mail.  This is good because the other mail service is
likely a badly configured greeting card site or a virus.
<p>
Finally, a default policy for the result is checked.  While there are program
defaults, you should have defaults in the access database for SPF results:
<pre>
SPF-Neutral:            CBV
SPF-Softfail:           DSN
SPF-PermError:          DSN
SPF-TempError:          REJECT
SPF-None:               REJECT
SPF-Fail:               REJECT
SPF-Pass:               OK
</pre>

<h2> Reputation </h2>

If the sender has not been rejected by this point, and if a GOSSiP server is
configured, we consult GOSSiP for the reputation score of the sender and
SPF result.  The score is a number from -100 to 100 with a confidence
percentage from 0 to 100.  A really bad reputation (less than -50 with
confidence greater than 3) is rejected.  Note that the reputation is tracked
independently for each SPF result and sender combination.  So aol.com:neutral
might have a really bad reputation, while aol.com:pass would be ok.
Furthermore, when a sender finally publishes an SPF policy and starts
getting SPF pass, their reputation is effectively reset.

<h2> Whitelists and Blacklists </h2>

The administrator can whitelist or blacklist senders and sending domains by
appending them to <code>${datadir}/auto_whitelist.log</code> or
<code>${datadir}/blacklist.log</code> respectively.  In addition,
recipients of internal senders (except for automatic replies like vacation
messages and return receipts) are automatically whitelisted for 60 days, and
senders that fail CBV or DSN checks are automatically blacklisted for 30 days.
Whitelisted and blacklisted senders are used to automatically train the
bayesian content filter before being delivered or rejected, respectively.
<p>
Real Soon Now users will be able to maintain their own whitelist and
blacklist that applies only when they are the recipient.

<h2> Content Filter </h2>

Most messages have been rejected or delivered by now, but spammers
are always finding new places to send their junk from.  For instance,
we get around 10000 emails a day, of which around 500 are first time
spam senders.  A bayesian filter is trained by the whitelists and
blacklists, and scores the message.  What is likely spam is either
rejected or quarantined.  If the sender is an effective SPF pass,
then they get a DSN notifying them that their message has been
quarantined.  (A DSN failure gets the sender auto blacklisted.)
Else, if the reject_spam option is set, the message is rejected.
Otherwise, a CBV is done (failure gets the sender auto blacklisted)
and the message is silently quarantined.
<p>
Normally, you don't want email messages to silently disappear into
a black hole, so you should set the reject_spam option.  However,
if you don't want your correspondent's email to get rejected, you can
check your quarantine frequently instead.

<h3> Honeypot </h3>

You can also blacklist recipients by listing them as aliases of the
'honeypot' dspam user.  These are collectively called
the honeypot.  Any email to these recipients is used to train the
spam filter as spam and chalk up a reputation demerit for the sender, then
discarded.  It might be a good idea to blacklist the sender if it has SPF pass
as well, but I'm afraid of accidents.

<h3> Reputation </h3>

Reputation is tracked by sending domain and effective SPF result.
The GOSSiP server tracks the spam/ham status of the last 1024 messages
for each domain:result combination.  When the server is queried during
the SMTP envelope phase (MAIL FROM), it also queries any configured
peers, and the scores are combined.  Domains with a history of spam for
a given SPF result are rejected at MAIL FROM.  The GOSSiP system has
a command line utility to reset (delete) a reputation for cases where a
sender that was infected with malware is repaired.  In addition,
the confidence score of a reputation decays with time, so a bad sender
will eventually be able to try again without manual intervention.