[Pymilter] exploding messages

Fri Apr 23 22:38:48 EDT 2004

Stuart D. Gathman wrote:

> On Fri, 23 Apr 2004, Eric S. Johansson wrote:
> 
> 
>>As I described earlier, in camram, there can be either individual or 
>>aggregate filters.  a mail message may hit multiple individual or 
>>aggregate filters.  The message must be compared to each filter and will 
>>get a different rating as to whether or not it is spam.  At this point, 
>>the message will get different additional headers and therefore create 
>>multiple, almost identical, copies of the same message.
> 
> 
> The dspam filter creates a "TAG" that is filter (dictionary) specific and
> is used to lookup a token stats record (signature) to change the status
> of a message.
> 
> Rather than copy the entire message, I simply add all the tags to the
> message.  If the user changes the status (marks as spam), the tags
> not belonging to them are simply ignored.  This way, there
> is only one message delivered.  For smart message stores (like Exchange -
> but don't buy it, it sucks in too many other ways), the message is
> only stored once as well.
> 
> If you simply make it easy to recognize which headers go with a 
> filter/individual (perhaps by including a filter id), you can add 'em all and
> not duplicate the entire message.  If you are worried about users finding
> out about Bcc recipients due to information leakage in the extra headers,
> then only duplicate the message for Bcc recipients.  I suppose you
> might also be concerned that officemates getting the same email would see each
> other's spam score ("Your spam score for that pr0n spam is 0.00??"), but
> that wouldn't bother me.

wish we had had this conversation before I wrote the code to aggregate 
by filter group and replicate everything per group. :-)

The information I store in each message is the recipient (for reinjecting 
the message), the score, and the spamtrap ID.  So if I have a message which 
is interpreted by three different sets of filter rules, I could potentially 
end up with three recipient address groups, three different scores and 
up to three spamtrap IDs.  I must point out that if a message has a 
spamtrap ID it is not propagated to the end-user's mailbox.

While this could work, it would seriously bollix up the internals, which 
I perceive as MTA independent.  For example, the headers I add to a 
message are used during the reinjection process once someone has 
approved a message as "good".  The header information is used during 
the spamtrap message list generation process as well.  Having multiple 
copies of the same headers is unpleasant at best, and it forces the user 
interface to do more testing per message than I am comfortable with.
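
For reference, the tagging approach suggested in the quoted text could 
look roughly like the sketch below.  It is only an illustration, not 
camram or dspam code: the header name X-Filter-Result, the field names 
(id, score, rcpt, trap) and both helper functions are hypothetical.  Each 
filter group adds one header carrying its own ID, and a reader picks out 
only the header that belongs to its group.

```python
from email.message import Message

# Hypothetical header name, chosen only for this sketch.
HEADER = "X-Filter-Result"


def add_filter_result(msg, filter_id, score, recipients, spamtrap_id=None):
    """Append one result header for a single filter group."""
    fields = [
        f"id={filter_id}",
        f"score={score:.2f}",
        "rcpt=" + ",".join(recipients),
    ]
    if spamtrap_id is not None:
        fields.append(f"trap={spamtrap_id}")
    # add_header appends, so earlier groups' headers are kept.
    msg.add_header(HEADER, "; ".join(fields))


def results_for(msg, filter_id):
    """Return the parsed fields for one filter group, or None."""
    for raw in msg.get_all(HEADER, []):
        fields = dict(part.strip().split("=", 1) for part in raw.split(";"))
        if fields.get("id") == filter_id:
            fields["rcpt"] = fields["rcpt"].split(",")
            return fields
    return None


msg = Message()
add_filter_result(msg, "groupA", 0.12, ["a@example.com", "b@example.com"])
add_filter_result(msg, "groupB", 0.97, ["c@example.com"], spamtrap_id="t42")
```

The cost described above is visible here: every consumer (reinjection, 
spamtrap list generation, the user interface) has to go through a lookup 
like results_for() instead of reading a single header.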

On the plus side, it's forced me to redo the filter front-end so that 
it's a little more general.  I think it needs a couple more passes (i.e. 
postfix, exim) before it's really decent, but it's definitely better.

I will admit to being disappointed that the sendmail folks didn't see 
far enough in the future to see the need for "forking" messages like 
we're talking about doing.  It would have been quite useful.

I was also playing with trying to see what address the traffic came in 
on and it looks like getsymval is the way to go but

Milter: connect something?? redweb.harvee.org None at None

is what I get for the receiver, interface_name and interface_address 
when I connect from localhost.  I noticed something in the 
documentation saying that it doesn't give back a valid answer for 
localhost and I guess this is what they mean...grumble.  They could at 
least give the port number, but I guess I'll have to look into macros a 
little more closely and modify my mc file at the appropriate time.
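
A small sketch of how the missing values could be handled.  It assumes 
the three values in the log line come from the sendmail macros j, 
{if_name} and {if_addr}; that mapping is a guess from the output above, 
and describe_receiver() is a hypothetical helper.  It takes any 
getsymval-style callable (in a pymilter connect callback that would be 
self.getsymval) and substitutes a placeholder when a macro comes back 
as None, as happens for loopback connections.

```python
# Assumed mapping from log labels to sendmail macro names.
MACROS = {
    "receiver": "j",
    "interface_name": "{if_name}",
    "interface_address": "{if_addr}",
}


def describe_receiver(getsymval, missing="unknown"):
    """Look up each macro and replace absent values with a placeholder."""
    info = {}
    for label, macro in MACROS.items():
        value = getsymval(macro)
        info[label] = value if value else missing
    return info


# Stand-in for a loopback connection where only j is defined.
fake_macros = {"j": "redweb.harvee.org"}
info = describe_receiver(fake_macros.get)
print("connect %(receiver)s %(interface_name)s at %(interface_address)s" % info)
```

Which macros sendmail passes to the milter at connect time is set in the 
mc file, so any additional macro would have to be exported there before 
a lookup like this can return it.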

---eric
