[Pymilter] general gossip questions

Fri Feb 26 15:21:47 EST 2010

On Thu, 25 Feb 2010, Todd Lyons wrote:

> On Mon, Feb 22, 2010 at 9:00 PM, Stuart D. Gathman <stuart at bmsi.com> wrote:
> >
> >> don't have a need for, so now I'll see about getting pygossip up and
> >> running on a test mail system.
> > Beware the unsolved problem of REUSE_ADDR not working for TCPServer.
> 
> I googled to try to understand a little bit about this.  What prevents
> you from just adding:
> SocketServer.TCPServer.allow_reuse_address = True
> right after the import SocketServer?  Or does that not fix the issue?

Doesn't fix the issue.

> I was able to telnet multiple times to the daemon in my testing and
> never had problems with concurrency, so I'm not sure that I even
> comprehend what the REUSE_ADDR issue actually is.

Telnet to daemon.  Restart daemon with session still active (something
that will almost always be the case in production).

Daemon will shut down for restart, but won't start again for 5 mins.
Error in log is "socket in use".  The SO_REUSEADDR socket option is
supposed to let you immediately reuse the address without trying to
shut down active connections.

Somehow, the allow_reuse_address flag doesn't actually result in the
socket option getting set.  I have spent a little bit of time debugging the
TCPServer python code, and can't see where it goes wrong.
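
One way to check whether the option is really being set is to ask the
listening socket directly.  The following is only a minimal sketch, not
pygossip code: ReusableTCPServer, EchoHandler and reuseaddr_is_set are
made-up names, and the import falls back from the Python 3 module name
(socketserver) to the Python 2 name (SocketServer) used above.  It sets
the flag on a subclass, so it is in place before the constructor binds,
then reads SO_REUSEADDR back with getsockopt.

```python
import socket

try:
    import socketserver                    # Python 3 module name
except ImportError:
    import SocketServer as socketserver    # Python 2 name used in this thread


class ReusableTCPServer(socketserver.TCPServer):
    # Hypothetical subclass.  server_bind() reads this class attribute,
    # so it has to be set before the constructor binds the socket.
    allow_reuse_address = True


class EchoHandler(socketserver.BaseRequestHandler):
    # Placeholder handler: echo back whatever the client sends first.
    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data)


def reuseaddr_is_set(server):
    # Ask the kernel whether SO_REUSEADDR is on for the listening socket.
    value = server.socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
    return value != 0


# Port 0 lets the OS pick a free port, so the check cannot collide
# with a running daemon.
server = ReusableTCPServer(("127.0.0.1", 0), EchoHandler)
flag = reuseaddr_is_set(server)
server.server_close()
```

If flag comes back true and the restart still fails with "socket in
use", the cause presumably lies somewhere other than the flag itself.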

> I plan to put pygossip on two servers, configure them as peers, and
> set my TTL to 2.  Then I've got 8 servers that I will point at those
> two instances.  I figured I'd have 4 point to one pygossip server and
> the other 4 point to the other pygossip server.  Since the TTL is 2,
> they should talk to each other and trade reputation info back and
> forth.  This should also result in the data being evenly split between
> the two machines.
> 
> Or am I misunderstanding how the peer system works, and is all data
> stored on both nodes?  I apologize for the lack of understanding this
> question reveals.

I think you have a slight misunderstanding.  But your setup should be
reasonable.  For each query, the pygossip server looks up the reputation
in its own database, then queries peers for their "opinion".  The
peer opinions are weighted by how often the peer agrees with the
local server (one man's spam is another man's daily entertainment),
and combined with the local reputation for the final score.
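
The weighting described above can be illustrated with a small sketch.
This is NOT the formula pygossip uses (the message does not give one);
it assumes a plain weighted average, an agreement weight between 0 and
1 for each peer, and a fixed weight of 1.0 for the local reputation.
The name combine_reputation and its parameters are hypothetical.

```python
def combine_reputation(local_score, peer_opinions, local_weight=1.0):
    """Blend a local reputation with peer opinions.

    peer_opinions is a list of (score, agreement) pairs, where agreement
    is an assumed weight in [0, 1] for how often that peer has agreed
    with the local server.  local_weight must be greater than zero.
    """
    total = local_weight * local_score
    weight_sum = local_weight
    for score, agreement in peer_opinions:
        # A peer that never agrees (agreement 0) contributes nothing.
        total += agreement * score
        weight_sum += agreement
    return total / weight_sum


# Local view is mildly positive; one peer that agrees half the time is
# more positive, and one peer that never agrees is ignored.
blended = combine_reputation(40.0, [(80.0, 0.5), (-20.0, 0.0)])
```

With no peers at all the function simply returns the local score.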

So while the databases will be different, there will be a lot of overlap.

However, adding a "load sharing peer" variation should be pretty
straightforward to design.

Re your scoring plans.  Gossip tracks just a spam/notspam vote for each
UMIS.  (A bitmap tracks the last N UMISs - where N defaults to 1024.)
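
As an illustration of the idea of a fixed window of yes/no votes, here
is a minimal sketch.  It is not taken from the pygossip source: the
class name VoteWindow, its methods, and the choice of a Python integer
as the bitmap are all assumptions.  Only the default window size of
1024 comes from the description above.

```python
class VoteWindow(object):
    """Remember the last `size` spam/notspam votes in a circular bitmap."""

    def __init__(self, size=1024):
        self.size = size
        self.bits = 0    # bit i set means the vote in slot i was "spam"
        self.count = 0   # number of votes stored so far, at most size
        self.pos = 0     # slot that the next vote will overwrite

    def record(self, is_spam):
        mask = 1 << self.pos
        if is_spam:
            self.bits |= mask
        else:
            self.bits &= ~mask
        # Advance circularly so the oldest vote is the next one replaced.
        self.pos = (self.pos + 1) % self.size
        self.count = min(self.count + 1, self.size)

    def spam_count(self):
        return bin(self.bits).count("1")

    def ham_count(self):
        return self.count - self.spam_count()


# Tiny window to show the wrap-around: the fifth vote replaces the first.
window = VoteWindow(size=4)
for vote in (True, True, False, True, False):
    window.record(vote)
```

How the two counts are turned into a reputation score is a separate
step that this sketch does not attempt.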

So your feedback to gossip has to be yea/nay.  You could combine
the gossip score with the other scores in your main filter.

Note that Gossip is compatible with "AOL style" user feedback as well
(where user complaints about an email form a reputation that will eventually
get a sender kicked off AOL; AOL tracks IPs rather than domains).

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.