From f55ddbce836222cd83bbc422060137b6b61497f8 Mon Sep 17 00:00:00 2001
From: Stuart Gathman <stuart@gathman.org>
Date: Sat, 13 Dec 2008 20:45:30 +0000
Subject: [PATCH] Split off milter applications.

---
 TODO       | 254 +----------------------------------------------------
 rejects.py |  38 --------
 report.py  | 138 -----------------------------
 rhsbl.m4   |  44 ----------
 4 files changed, 3 insertions(+), 471 deletions(-)
 delete mode 100644 rejects.py
 delete mode 100644 report.py
 delete mode 100644 rhsbl.m4

diff --git a/TODO b/TODO
index ccfbf23..f273c02 100644
--- a/TODO
+++ b/TODO
@@ -1,251 +1,3 @@
-The recent feature to let a REJECT policy for SPF None be overridden
-by whitelisting is working for CSI and CMS.  However, there could be
-a sender that we want to REJECT even when whitelisted - because they
-normally get a guessed PASS.  Need another policy name - or else just
-add them to local SPF so they won't ever get 'None'.
-
-When policy is OK, do not use cbv_cache for blacklist.
-
-Add postmaster option or general rcpt list to dsn.  Can send dsn to
-user and postmaster on the same connection.
-
-Check ESMTP NOTIFY before sending real DSNs.  Just use CBV if DSNs are
-not wanted.
-
-Support CBV to local domains and cache results so that invalid users
-can be rejected without maintaining valid user lists.
-
-Now that we blacklist IPs for too many bad rcpts, delay SPF until RCPT TO.
-
-When content filtering is not installed, reject BLACKLISTed MFROM
-immediately.  There is no use waiting until EOM.
-
-Configuration is problematic when handling incoming, but not outgoing mail.
-The problem comes when alice@example.com sends mail to bill@example.com,
-and we are the MX for example.com, but alice is sending from some other
-MTA.  The mail is flagged external, so we don't list example.com in
-internal_domains (or we would get "spam from self").  But, if we try to do a
-CBV, we get "fraudulent MX", because the MX is ourself!  So we need to 
-avoid doing CBV on such domains.  Currently, we try to make sure the SPF
-policies don't do CBV.  The real solution is for users to use SMTP AUTH,
-but some of them are stubborn.
-
-We now don't check internal domains for incoming mail if there is an
-SPF record.
-
-On the other hand, if alice is sending internally, or with SMTP AUTH, she
-*does* need the domain to be in internal_domains.  The solution to that 
-is to use the new SMTP AUTH access configuration to specify which domains
-can be used by smtp AUTH (by user if desired).
-
-It would be cleaner if CBV would know which domains we have agreed to
-be MX for. Some ideas for external connections:
-
-a) check access file for To:example.com	RELAY
-b) check mailertable
-c) check mx_domains config list
-d) if there is an SPF record, don't check internal_domains
-   (let SPF block unauthorized machines)
-
-But that still doesn't handle the roaming user, who won't use SMTP
-AUTH, but sends through some hotel MTA.  Maybe we don't want to support
-him?
-
-When setting up pydspam, both sender and rcpt must resolve to dspam users
-for falsepositive recognition.  Usually, this means adding 
-honeypot@mail.example.com to alias list for honeypot in pymilter.cfg.
-This needs to be documented.  I was caught by it setting up a new site.
-
-Add signature (x-sig=AB7485f=TS) to Received-SPF, so it can be used
-to blacklist sources of delayed DSNs.
-
-rcpt-addr may let us know when a recipient is unknown.  That should count
-against reputation.
-
-Need to use wildcards in blacklist.log: *.madcowsrecord.net
-Need to exclude emails like !*-admin@example.com in whitelist_sender.
-Need to exclude robot users from autowhitelist.  Don't want to have to
-list all users, so implement something like !*-admin@bmsi.com,@bmsi.com.
-
-GOSSiP feedback from user training is ignored because UMIS has already been
-removed from queue.  Maybe keep UMIS in queue, and add method to 
-alter last feedback for ID.
-
-Generate DSNs according to RFC 3464
-
-Get temperror policy from access file.
-
-Reporting explanation for failure should show source if sender
-provided explanation.
-
-Bug in Auto-whitelist.  Recent Auto-whitelist doesn't override expired entry.
-
-SPF permerror diagnostics should include corrected mechanism.
-
-Delay SPF check until RCPT TO.  Cache result to avoid repeating
-for multiple RCPT.  This avoids overhead for invalid RCPT, and
-allows for per RCPT local policy.
-
-Check SPF for outgoing mail (including local policy for internal addresses).
-This could also solve the second part of the mail from relay problem below.
-
-Whitelisted senders from trusted relay get PROBATION.  Need to extracted
-SPF result from headers - and in the case of mail internal to relay
-(e.g. bmsi.com), supply 'pass' result.  
-
-Add auto-blacklisted senders to blacklist.log with timestamp.
-Add emails blacklisted via CBV so that they are remembered across milter
-restarts.
-
-Make all dictionaries work like honeypot.  Do not train as ham unless
-whitelisted.  Train on blacklisted messages, or spam feedback.  This
-can be called Train On Error.  Should be possible to startup
-with training on everything to get dictionary built fast, then switch
-to train on error to minimize labor.
-
-Allow unsigned DSNs from selected domains (that don't accept signed MFROM,
-e.g. verizon.net).
-
-Allow verified hostnames for trusted_relay.  E.g. HELO name that
-passes SPF.
-
-When do we get two hello calls?  STARTTLS is one reason.
-
-Option: accept mail from auto-whitelisted senders even with spf-fail,
-but do not update dspam.  This can be done for individual senders or domains 
-using the access file.
-
-pysrs: SRS doesn't get applied to proper recipients when there are
-multiple recipients.  This requires debugging cf scripts - yuk.
-
-auto_whitelist false_positives from quarantine - perhaps only when
-user selects special button (use special header to communicate
-that from dspamcgi.py to milter.)
-
-Use send_dsn.log for blacklist also.  AddrCache needs localpart
-wildcard (e.g. empty localpart).
-
-Quarantined mail is missing headers modified/added by milter after
-checking dspam.
-
-Send DSN for permerror before processing extended result.  An additional
-DSN may be sent based on extended result.  Send permerror DSN to
-postmaster@sending_domain.
-
-Rescind whitelist for banned extensions, in case sender is infected.
-
-Train honeypot on error only.
-
-Find rfc2822 policy for MFROM quoting.
-
-Support explicit errors for SPF policy in access file:
-SPF-Neutral:aol.com	ERROR:"550 AOL mail must get SPF PASS"
-
-Defer TEMPERROR in SPF evaluation - give precedence to security
-(only defer for PASS mechanisms).
-
-Create null config that does nothing - except maybe add Received-SPF
-headers.  Many admins would like to turn features on one at a time.
-
-Can't output messages with malformed rfc822 attachments.
-
-Move milter,Milter,mime,spf modules to pymilter
-milter package will have bms.py application
-
-Web admin interface
-message log for automated stats and blacklisting
-Skip dspam when SPF pass? NO
-Report 551 with rcpt on SPF fail?
-check spam keywords with character classes, e.g.
-	{a}=[a@ãä], {i}=[i1í], {e}=[eë], {o}=[o0ö]
-
-Implement RRS - a backdoor for non-SRS forwarders.  User lists non-SRS 
-forwarder accounts, and a util provides a special local alias for the
-user to give to the forwarder.  (Or user just adds arbitrary alias
-unique to that forwarder to a database.)  Alias only works for mail from that
-forwarder.  Milter gets forwarder domain from alias and uses it to
-SPF check forwarder.
-
-Framework for modular Python milter components within a single VM.
-Python milters can be already be composed through sendmail by running each in
-a separate process.  However, a significant amount of memory is wasted
-for each additional Python VM, and communication between milters
-is cumbersome (e.g., adding mail headers, writing external files).
-
-Copy incoming wiretap mail, even though sendmail alias works perfectly
-for the purpose, to avoid having to change two configs for a wiretap.
-
-Provide a way to reload milter.cfg without stopping/restarting milter.
-
-Allow selected Windows extensions for specific domains via milter.cfg
-
-Fix setup.py so that _FFR_QUARANTINE is automatically defined when
-available in libmilter.
-
-Keep separate ismodified flag for headers and body.  This is important
-when rejecting outgoing mail with viruses removed (so as not to
-embarrass yourself), and also removing Received headers with hidepath.
-
-Need a test module to feed sample messages to a milter though a live 
-sendmail and SMTP.  The mockup currently used is probably not very accurate,
-and doesn't test the threading code.
-
-DONE Table of sendmail macros for documentation.  In API docs on milter.org.
-
-DONE For selected domains, check rcpts via CBV before accepting mail.  Cache
-results.  This will kick out dictonary attacks against a mail domain
-behind a gateway sooner.
-
-DONE Convert DSN to REJECT unless sender gets SPF pass or best guess pass.  Make
-configurable by SPF result with NOTSPAM policy (reject or deliver without DSN).
-Maybe policy should be NODSN - still verify sender with CBV.
-
-DONE Add parseaddr test case for 'foo@bar.com <baz@barf.biz>'
-
-DONE Require signed MFROM for all incoming bounces when signing all outgoing
-mail - except from trusted relays.
-
-DONE Added Message-ID header to DSN with SRS signed sender.  When seen on
-incoming rfc ignorant failure message, blacklist sender.
-
-DONE Option to add Received-SPF header, but never reject on SPF.
-  I think the above will handle this.
-
-DONE Received-SPF header field should show identity that was checked.
-
-DONE When training with spam, REJECT after data so that mistakenly blacklisted
-senders at least get an error.
-
-DONE Milter won't start when it can't change permissions on *.lock to match
-*.log.  Should maybe ignore that error - the effect will be to set
-the permissions to default.
-
-DONE Milter won't start when a whitelist/blacklist file is missing.
-
-DONE Delayed failure detection should parse From header to find email address.
-
-DONE When bms.py can't find templates, it passes None to dsn.create_msg(),
-which uses local variable as backup, which no longer exist.  Do plain
-CBV in that case instead.
-
-DONE Find and use X-GOSSiP: header for SPAM: and FP: submissions.  Would need
-to keep tags longer.
-
-DONE Parse incoming 3464 DSNs for "Action: failed" to recognize delayed
-failures.  This works regardless of Subject.
-
-DONE Reports PROBATION even when rejecting message (works, but confusing in
-log).
-
-DONE Delayed_failure detection needs to handle multi-line header fields.  
-Also, delayed_failure should be recognized when addressed to
-postmaster@helodomain 
-
-DONE DSN for Permerror shows 'None' for error under some condition.
-
-DONE Allow blacklisted emails as well as domains in blacklist.log.  Use same
-data structure as autowhitelist.log.  
-
-DONE Backup copies for outgoing/incoming mail.
-
-DONE Don't match dynamic ptr in bestguess.
+Support smfi_negotiate and auto negotiate only those callbacks for which 
+Milter.Milter methods have been overridden.  (Python should be able to
+do that.)
diff --git a/rejects.py b/rejects.py
deleted file mode 100644
index 1460ceb..0000000
--- a/rejects.py
+++ /dev/null
@@ -1,38 +0,0 @@
-# Analyze milter log to find abusers
-
-fp = open('/var/log/milter/milter.log','r')
-subdict = {}
-ipdict = {}
-spamcnt = {}
-for line in fp:
-  a = line.split(None,4)
-  if len(a) < 4: continue
-  dt,tm,id,op = a[:4]
-  if op == 'Subject:':
-    if len(a) > 4: subdict[id] = a[4].rstrip()
-  elif op == 'connect':
-    ipdict[id] = a[4].rstrip()
-  elif op in ('eom','dspam'):
-    if id in subdict: del subdict[id]
-    if id in ipdict: del ipdict[id]
-  elif op in ('REJECT:','DSPAM:','SPAM:','abort'):
-    if id in subdict:
-      if id in ipdict:
-        ip = ipdict[id]
-	del ipdict[id]
-	f,host,raw = ip.split(None,2)
-	if host in spamcnt:
-	  spamcnt[host] += 1
-	else:
-	  spamcnt[host] = 1
-      else: ip = ''
-      print dt,tm,op,a[4].rstrip(),subdict[id]
-      del subdict[id]
-    else:
-      print line.rstrip()
-print len(subdict),'leftover entries'
-
-spamlist = filter(lambda x: x[1] > 1,spamcnt.items())
-spamlist.sort(lambda x,y: x[1] - y[1])
-for ip,cnt in spamlist:
-  print cnt,ip
diff --git a/report.py b/report.py
deleted file mode 100644
index a84a842..0000000
--- a/report.py
+++ /dev/null
@@ -1,138 +0,0 @@
-# Analyze milter log to find abusers
-import traceback
-import sys
-
-def parse_addr(a):
-  beg = a.find('<')
-  end = a.find('>')
-  if beg >= 0:
-    if end > beg: return a[beg+1:end]
-  return a
-
-class Connection(object):
-  def __init__(self,dt,tm,id,ip=None,conn=None):
-    self.dt = dt
-    self.tm = tm
-    self.id = id
-    if ip:
-      _,self.host,self.ip = ip.split(None,2)
-    elif conn:
-      self.ip = conn.ip
-      self.host = conn.host
-      self.helo = conn.helo
-    self.subject = None
-    self.rcpt = []
-    self.mfrom = None
-    self.helo = None
-    self.innoc = []
-    self.whitelist = False
-
-def connections(fp):
-  conndict = {}
-  termdict = {}
-  for line in fp:
-    if line.startswith('{'): continue
-    a = line.split(None,4)
-    if len(a) < 4: continue
-    dt,tm,id,op = a[:4]
-    if (id,op) == ('bms','milter'):
-      # FIXME: optionally yield all partial connections in conndict
-      conndict = {}
-      termdict = {}
-      continue
-    if id[0] == '[' and id[-1] == ']':
-      try:
-	key = int(id[1:-1])
-      except:
-        print >>sys.stderr,'bad id:',line.rstrip()
-	continue
-    else: continue
-    if op == 'connect':
-      ip = a[4].rstrip()
-      conn = Connection(dt,tm,id,ip=ip)
-      conndict[key] = conn
-    elif op in (
-	'DISCARD:','TAG:','CBV:','Large','No',
-	'NOTE:','From:','Sender:','TRAIN:'):
-      continue
-    else:
-      op = op.lower()
-      try:
-	conn = conndict[key]
-      except KeyError:
-        try:
-	  conn = termdict[key]
-	  del termdict[key]
-	  conndict[key] = conn
-	except KeyError:
-	  print >>sys.stderr,'key error:',line.rstrip()
-	  continue
-      try:
-	if op == 'subject:':
-	  if len(a) > 4:
-	    conn.subject = a[4].rstrip()
-	elif op == 'innoc:':
-	  conn.innoc.append(a[4].rstrip())
-	elif op == 'whitelist':
-	  conn.whitelist = True
-	elif op == 'x-mailer:':
-	  if len(a) > 4:
-	    conn.mailer = a[4].rstrip()
-        elif op == 'x-guessed-spf:':
-          conn.spfguess = a[4]
-	elif op == 'received-spf:':
-	  conn.spfres,conn.spfmsg = a[4].rstrip().split(None,1)
-	elif op == 'received:':
-	  conn.received = a[4].rstrip()
-	elif op == 'temp':
-	  _,conn.tempfile = a[4].rstrip().split(None,1)
-	elif op == 'srs':
-	  _,conn.srsrcpt = a[4].rstrip().split(None,1)
-	elif op == 'mail':
-	  _,conn.mfrom = a[4].rstrip().split(None,1)
-	elif op == 'rcpt':
-	  _,rcpt = a[4].rstrip().split(None,1)
-	  conn.rcpt.append(rcpt)
-	elif op == 'hello':
-	  _,conn.helo = a[4].rstrip().split(None,1)
-	elif op in ('eom','dspam','abort'):
-	  del conndict[key]
-	  conn.enddt = dt
-	  conn.endtm = tm
-	  conn.result = op
-	  yield conn
-	  termdict[key] = Connection(conn.dt,conn.tm,conn.id,conn=conn)
-	elif op in ('reject:','dspam:','tempfail:','reject','fail:','honeypot:'):
-	  del conndict[key]
-	  conn.enddt = dt
-	  conn.endtm = tm
-	  conn.result = op
-	  conn.resmsg = a[4].rstrip()
-	  yield conn
-	  termdict[key] = Connection(conn.dt,conn.tm,conn.id,conn=conn)
-	elif op in ('fp:','spam:'):
-	  del conndict[key]
-	  termdict[key] = Connection(conn.dt,conn.tm,conn.id,conn=conn)
-	else:
-	  print >>sys.stderr,'unknown op:',line.rstrip()
-      except Exception:
-	print >>sys.stderr,'error:',line.rstrip()
-        traceback.print_exc()
-
-if __name__ == '__main__':
-  import gzip
-  for fn in sys.argv[1:]:
-    if fn.endswith('.gz'):
-      fp = gzip.open(fn)
-    else:
-      fp = open(fn)
-    for conn in connections(fp):
-      if conn.rcpt and conn.mfrom:
-        for r in conn.rcpt:
-	  if r.lower().find('iancarter') > 0: break
-	else:
-	  if conn.mfrom.lower().find('iancarter') < 0: continue
-	print >>sys.stderr,conn.result,conn.dt,conn.tm,conn.id,conn.subject,parse_addr(conn.mfrom),
-	for a in conn.rcpt:
-	  print parse_addr(a),
-	print
diff --git a/rhsbl.m4 b/rhsbl.m4
deleted file mode 100644
index 8f886c7..0000000
--- a/rhsbl.m4
+++ /dev/null
@@ -1,44 +0,0 @@
-divert(-1)
-#
-# Copyright (c) 2002 Derek J. Balling
-#	All rights reserved.
-#
-# Permission to use granted for all purposes. If modifications are made
-# they are requested to be sent to <dredd@megacity.org> for inclusion in future
-# versions 
-#
-# Allows (hopefully) for checking of access.db whitelisting now. This ONLY
-# works on sendmail-8.12.x ... use on any other version may require tinkering
-# by you the downloader.
-#
-# Incorporates many changes by Sergey S. Mokryshev <mokr@mokr.net>
-#
-#
-
-divert(0)
-ifdef(`_RHSBL_R_',`dnl',`dnl
-VERSIONID(`$Id$')
-define(`_RHSBL_R_',`')
-ifdef(`_DNSBL_R_',`dnl',`dnl
-LOCAL_CONFIG
-# map for DNS based blacklist lookups based on the sender RHS
-Kdnsbl host -T<TMP>')')
-divert(-1)
-define(`_RHSBL_SRV_', `_ARG_')dnl
-define(`_RHSBL_MSG_', `ifelse(len(X`'_ARG2_),`1',`"550 Mail from " $`'&{RHS} " refused by blackhole site '_RHSBL_SRV_`"',`_ARG2_')')dnl
-define(`_RHSBL_MSG_TMP_', `ifelse(_ARG3_,`t',`"451 Temporary lookup failure of " $`'&{RHS} " at '_RHSBL_SRV_`"',`_ARG3_')')dnl
-
-MAILER_DEFINITIONS
-
-SLocal_check_mail
-# DNS based RHS spam list blackholes.bmsi.com
-R$*			$: <?> $>CanonAddr $1
-R<?> $*<@$+.>		$: <?> $1<@$2.> $| $>SearchList <+ rhs> $| <F:$1@$2> <D:$2> <>
-R<?> $* $| <$={Accept}>	$: OKSOFAR
-R<?> $*<@$+.> $| $*	$: <?> $(dnsbl $2._RHSBL_SRV_. $: OK $) $(macro {RHS} $@ $2 $)
-R<?> OK			$: OKSOFAR
-R<?> $*<@$*>		$: OKSOFAR
-ifelse(len(X`'_ARG3_),`1',
-`R<?>$+<TMP>		$: TMPOK',
-`R<?>$+<TMP>		$#error $@ 4.7.1 $: _RHSBL_MSG_TMP_')
-R<?>$+			$#error $@ 5.7.1 $: _RHSBL_MSG_