From andrew at azarov.com Wed Oct 24 03:11:07 2007
From: andrew at azarov.com (Andrew Azarov)
Date: Wed, 24 Oct 2007 09:11:07 +0200
Subject: [Pymilter] milter.error: cannot add header
Message-ID: <471EF00B.10701@azarov.com>

Traceback (most recent call last):
  File "/usr/local/lib/python2.5/site-packages/Milter/__init__.py", line 203, in
    milter.set_eom_callback(lambda ctx: ctx.getpriv().eom())
  File "./avlmilter", line 259, in eom
    self.addheader("X-AVLMail-Status", "Passed")
  File "/usr/local/lib/python2.5/site-packages/Milter/__init__.py", line 110, in addheader
    return self.__ctx.addheader(field,value,idx)
milter.error: cannot add header

Where can it come from?

From stuart at bmsi.com Wed Oct 24 14:10:44 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Wed, 24 Oct 2007 14:10:44 -0400 (EDT)
Subject: [Pymilter] milter.error: cannot add header
In-Reply-To: <471EF00B.10701@azarov.com>
Message-ID:

On Wed, 24 Oct 2007, Andrew Azarov wrote:

> Traceback (most recent call last):
> File "/usr/local/lib/python2.5/site-packages/Milter/__init__.py", line
> 203, in
> milter.set_eom_callback(lambda ctx: ctx.getpriv().eom())
> File "./avlmilter", line 259, in eom
> self.addheader("X-AVLMail-Status", "Passed")
> File "/usr/local/lib/python2.5/site-packages/Milter/__init__.py", line
> 110, in addheader
> return self.__ctx.addheader(field,value,idx)
> milter.error: cannot add header
>
> Where can it come from?

Did you include Milter.ADDHDRS in Milter.set_flags()? See FAQ #8.

http://bmsi.com/python/faq.html

--
Stuart D. Gathman
Business Management Systems Inc.
Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

From andrew at azarov.com Wed Oct 24 14:10:36 2007
From: andrew at azarov.com (Andrew Azarov)
Date: Wed, 24 Oct 2007 20:10:36 +0200
Subject: [Pymilter] Forking pymilter?
Message-ID: <471F8A9C.3020309@azarov.com>

Is there any way to make a forking python milter?

From dwayne at oscl.ca Wed Oct 24 14:38:59 2007
From: dwayne at oscl.ca (Dwayne Litzenberger)
Date: Wed, 24 Oct 2007 12:38:59 -0600
Subject: [Pymilter] Forking pymilter?
In-Reply-To: <471F8A9C.3020309@azarov.com>
References: <471F8A9C.3020309@azarov.com>
Message-ID: <200710241239.00049.dwayne@oscl.ca>

On October 24, 2007 12:10:36 pm Andrew Azarov wrote:
> Is there any way to make a forking python milter?

The milter API insists on having re-entrant, thread-safe milters. I had
oh-so-much fun writing a milter that uses python-ldap, which isn't
thread-safe in any meaningful way.

If you need to do something complicated, it might be easier to make a
milter that communicates with a separate daemon process over a Unix
domain socket.

--
Dwayne Litzenberger, B.A.Sc.
Information Technology Analyst
Open Systems Canada Limited
1627 Broad Street
Regina, SK S4P1X3
Office: 306.359.6725
http://www.oscl.ca/

From stuart at bmsi.com Wed Oct 24 14:40:39 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Wed, 24 Oct 2007 14:40:39 -0400 (EDT)
Subject: [Pymilter] Forking pymilter?
In-Reply-To: <471F8A9C.3020309@azarov.com>
Message-ID:

On Wed, 24 Oct 2007, Andrew Azarov wrote:

> Is there any way to make a forking python milter?

Short answer: Sure. Call os.fork().

Long answer: The above would need a process pool to be reasonably
efficient. Libmilter, on which pymilter is based, is threaded, so the
libmilter process must have a thread per open connection. Those threads
could be thin wrappers talking to forked servers, however.

The real question is, why? Depending on the goal, other solutions may
be better. Possible reasons:

1) python threading doesn't utilize multiple CPUs.

At 200000 connections per day, pymilter CPU for my milter, including
Bayesian content filtering, is negligible on my low end Dell SC440.
Even if it weren't, there are plenty of other processes to keep dual
cores busy.
If you're looking to process multi-millions of messages per day, you
could use multiple low end machines instead of one super computer. If
you do want the single super computer running pymilter, feel free to
contribute the process pool manager. :-)

2) You are worried about memory corruption.

Pure python is fairly immune to that.

3) You are worried about too many threads.

There is one thread per open connection (unless you create more).
Sendmail limits open connections. There was a bug where some libmilter
threads weren't getting recycled, but that seems to be fixed now. A
forking server would still be subject to that bug.

--
Stuart D. Gathman

From stuart at bmsi.com Wed Oct 24 14:48:13 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Wed, 24 Oct 2007 14:48:13 -0400 (EDT)
Subject: [Pymilter] Forking pymilter?
In-Reply-To: <200710241239.00049.dwayne@oscl.ca>
Message-ID:

On Wed, 24 Oct 2007, Dwayne Litzenberger wrote:

> On October 24, 2007 12:10:36 pm Andrew Azarov wrote:
> > Is there any way to make a forking python milter?
>
> The milter API insists on having re-entrant, thread-safe milters. I had
> oh-so-much fun writing a milter that uses python-ldap, which isn't
> thread-safe in any meaningful way.
>
> If you need to do something complicated, it might be easier to make a milter
> that communicates with a separate daemon process over a Unix domain socket.

I found it easy enough to wrap non-thread-safe APIs with mutexes from
the thread module. I needed to do that with libdspam, for example.
If it comes up a lot, an @synchronized decorator would be easy enough
to do, if it isn't in 2.5 already (I'm still on 2.4).

Keeping all your data in the connection object makes the actual milter
code easy to keep threadsafe.

--
Stuart D.
Gathman

From dwayne at oscl.ca Wed Oct 24 15:29:07 2007
From: dwayne at oscl.ca (Dwayne Litzenberger)
Date: Wed, 24 Oct 2007 13:29:07 -0600
Subject: [Pymilter] Forking pymilter?
In-Reply-To:
References:
Message-ID: <200710241329.07893.dwayne@oscl.ca>

On October 24, 2007 12:48:13 pm Stuart D. Gathman wrote:
> I found it easy enough to wrap non-thread-safe APIs with mutexes from
> the thread module. I needed to do that with libdspam, for example.
> If it comes up a lot, an @synchronized decorator would
> be easy enough to do, if it isn't in 2.5 already (I'm still on 2.4).

python-ldap already wraps its API in a mutex. That's useless, though,
because when you make a call to, say, ldap.search_s ("_s" stands for
synchronous), python-ldap acquires a library-wide lock. Any other
thread that tries to access the ldap library in the meantime (including
to access other LDAP servers) blocks until the previous call completes.
At that point, we've done away with concurrency, which defeats the
purpose of using threads in the first place.

I ended up spending a lot of time writing code to pass messages between
the threads created by libmilter and a thread I created specifically to
access python-ldap. If I had been able to fork a server for each
connection, I could have simply used ldap.search_s and been done with
it.

libmilter demands a particular threading model that happens to be
incompatible with OpenLDAP. What's not clear to me is why the Sendmail
people thought this was necessary. IMHO, it would have been better to
publish a document specifying how to communicate over the milter
socket, and to provide automatic thread creation as an optional library
feature.

> Keeping all your data in the connection object makes the actual milter
> code easy to keep threadsafe.
Which is fine, unless you want to reuse code that already exists for
other purposes and therefore doesn't do that.

In any case, at least I can deal with this (albeit unnecessary)
complexity in Python rather than in C. Thanks for that.

--
Dwayne Litzenberger, B.A.Sc.

From stuart at bmsi.com Wed Oct 24 15:54:09 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Wed, 24 Oct 2007 15:54:09 -0400 (EDT)
Subject: [Pymilter] Forking pymilter?
In-Reply-To: <200710241329.07893.dwayne@oscl.ca>
Message-ID:

On Wed, 24 Oct 2007, Dwayne Litzenberger wrote:

> python-ldap already wraps its API in a mutex. That's useless, though, because
> when you make a call to, say, ldap.search_s ("_s" stands for synchronous),
> python-ldap acquires a library-wide lock. Any other thread that tries to
> access the ldap library in the meantime (including to access other LDAP
> servers) blocks until the previous call completes. At that point, we've done
> away with concurrency, which defeats the purpose of using threads in the
> first place.

This is another example of globals == evil. But it sounds like a
process-pool class would be a useful addition to pymilter.

> I ended up spending a lot of time writing code to pass messages between the
> threads created by libmilter and a thread I created specifically to access
> python-ldap. If I had been able to fork a server for each connection, I
> could have simply used ldap.search_s and been done with it.

Is there any concurrency difference between a single thread calling
python-ldap on behalf of multiple threads, and multiple threads calling
python-ldap with a mutex? If you want an asynchronous call (start
query, wait for query to finish and get result), then simply have the
wrapper return a lazy-result proxy object, which waits for the result
when an attribute is queried.
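
A minimal sketch of the lazy-result proxy idea, written for Python 3
rather than the 2.4/2.5 interpreters discussed in this thread. The
class name LazyResult and the stand-in slow_lookup function are
hypothetical, and running the call on a helper thread is an assumption;
a python-ldap wrapper might instead start an asynchronous search and
wait for its result inside the proxy.

```python
import threading
import time


class LazyResult:
    """Hypothetical proxy: starts a call now, blocks only when it is used."""

    def __init__(self, func, *args, **kwargs):
        self._done = threading.Event()
        self._value = None
        self._error = None
        worker = threading.Thread(target=self._run, args=(func, args, kwargs))
        worker.daemon = True
        worker.start()

    def _run(self, func, args, kwargs):
        # Runs on the helper thread; records either the value or the error.
        try:
            self._value = func(*args, **kwargs)
        except Exception as exc:
            self._error = exc
        finally:
            self._done.set()

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        # Private names are refused so a half-built proxy cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        self._done.wait()          # block until the call has finished
        if self._error is not None:
            raise self._error      # re-raise in the caller's thread
        return getattr(self._value, name)


def slow_lookup(uid):
    """Stand-in for a blocking directory query (not a real LDAP call)."""
    time.sleep(0.05)
    return {"uid": uid}


result = LazyResult(slow_lookup, "alice")   # returns immediately
# ... other per-connection work could happen here ...
print(result.get("uid"))                    # waits for the lookup if needed
```

Special methods such as len() or indexing are looked up on the type and
bypass __getattr__, so a fuller proxy would need to forward those
explicitly.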
Are you forking several processes to serve ldap query requests? That
would certainly be a win, performance wise - and I can see it would
require some effort to wrap the calls (is there a python RMI
equivalent?)

> libmilter demands a particular threading model that happens to be
> incompatible with OpenLDAP. What's not clear to me is why the Sendmail
> people thought this was necessary. IMHO, it would have been better to
> publish a document specifying how to communicate over the milter socket, and
> to provide automatic thread creation as an optional library feature.

Yeah. Especially since libmilter threading is incompatible with Java VM
threading. That is what drove me to python milter - python could
handle libmilter threading, perl at the time could not, ruby at the
time could not, and Java threads are not compatible (signal use, etc).
And I wanted a high level garbage collected language. It turned out to
be a good choice. Python is higher level than Java, CPU has not been
an issue, at least with my milters, and it is easy to add C libraries
to python.

If there is a threadsafe C ldap API, it would not be hard to wrap it
for a threadsafe python API.

--
Stuart D. Gathman

From dwayne at oscl.ca Wed Oct 24 16:29:00 2007
From: dwayne at oscl.ca (Dwayne Litzenberger)
Date: Wed, 24 Oct 2007 14:29:00 -0600
Subject: [Pymilter] Forking pymilter?
In-Reply-To:
References:
Message-ID: <200710241429.00483.dwayne@oscl.ca>

On October 24, 2007 01:54:09 pm Stuart D. Gathman wrote:
> Is there any concurrency difference between a single thread calling
> python-ldap on behalf of multiple threads, and multiple threads calling
> python-ldap with a mutex?
> If you want an asynchronous call (start query,
> wait for query to finish and get result), then simply have the wrapper
> return a lazy-result proxy object, which waits for the result when an
> attribute is queried.

My LDAP thread is a bit of a hack that emulates synchronous behaviour
using the asynchronous calls provided by python-ldap[1].

The thread starts by waiting for a serialized request on a Queue.Queue
(using Queue.get() with an infinite timeout). Once the LDAP thread
receives a request, it makes the request via the asynchronous
LDAPObject.search() function, and stores the resulting msgid in a
'pendingRequests' dictionary (along with the request itself, and a
per-request Queue object used to deliver the result). It then begins
alternately polling the request queue and LDAPObject.result2(), with
10ms timeouts for each. (The polling is done because we don't have
something like WaitForMultipleObjects that would allow us to wait on
both the Queue.get() and the LDAPObject.result2() calls.)

When the thread gets a response from LDAPObject.result2() (which might
be an exception), it looks up the msgid in the pendingRequests
dictionary, serializes the response, and sends it over the per-request
Queue to the calling thread, which decodes it and returns the result
(or raises the associated exception).

There was also some logic to attempt to reconnect to the server and
re-issue the pending searches if the connection died. That was also
more complicated than it would have been had I been able to make real
synchronous calls.

> Are you forking several processes to serve ldap query
> requests? That would certainly be a win, performance wise - and I can see
> it would require some effort to wrap the calls (is there a python RMI
> equivalent?)

I didn't actually do any forking. os.fork() duplicates all running
threads, right?

I've never heard of any Python RMI equivalent, which is why I
more-or-less ended up inventing one for this project.
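
The dedicated-thread pattern described above (a shared request queue
plus a per-request reply queue) could look roughly like this in
Python 3, where the Queue module is named queue. This is a simplified
sketch, not the code from the milter in question: the names
SerializedWorker, call and fake_search are hypothetical, the worker
handles one request at a time instead of polling for asynchronous LDAP
results, and the exception object itself is passed back in place of
sys.exc_info().

```python
import queue
import threading


class SerializedWorker:
    """Run every call to a non-thread-safe API on one dedicated thread."""

    def __init__(self, handlers):
        self._handlers = handlers            # maps request type -> callable
        self._requests = queue.Queue()       # shared by all calling threads
        thread = threading.Thread(target=self._loop, daemon=True)
        thread.start()

    def _loop(self):
        while True:
            # Block with no timeout until some thread submits a request.
            kind, args, kwargs, reply = self._requests.get()
            try:
                value = self._handlers[kind](*args, **kwargs)
                reply.put((True, value))     # success flag + return value
            except Exception as exc:
                reply.put((False, exc))      # failure flag + exception

    def call(self, kind, *args, **kwargs):
        """Called from any thread; blocks until the worker has answered."""
        reply = queue.Queue(maxsize=1)       # per-request return channel
        self._requests.put((kind, args, kwargs, reply))
        ok, payload = reply.get()
        if ok:
            return payload
        raise payload                        # re-raise in the caller


def fake_search(base, filterstr="(objectClass=*)"):
    """Stand-in for a blocking search; not a real python-ldap call."""
    return [(base, {"filter": filterstr})]


worker = SerializedWorker({"search": fake_search})
print(worker.call("search", "dc=example,dc=com", filterstr="(uid=alice)"))
```

Because only the worker thread ever touches the wrapped API, the
calling threads need no locking of their own; the cost is that calls
are handled strictly one after another.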
Cheers,
 - Dwayne

[1] See http://python-ldap.sourceforge.net/doc/python-ldap/ldap-objects.html

--
Dwayne Litzenberger, B.A.Sc.

From stuart at bmsi.com Wed Oct 24 17:21:33 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Wed, 24 Oct 2007 17:21:33 -0400 (EDT)
Subject: [Pymilter] Forking pymilter?
In-Reply-To: <200710241429.00483.dwayne@oscl.ca>
Message-ID:

On Wed, 24 Oct 2007, Dwayne Litzenberger wrote:

> I didn't actually do any forking. os.fork() duplicates all running threads,
> right?
>
> I've never heard of any Python RMI equivalent, which is why I more-or-less
> ended up inventing one for this project.

No serialization should be needed if your server is in the same
process. Just pass the object directly. RMI and equivalents are for
inter-process and inter-machine calls. Or did you want to have it
immediately scalable to multiple processes?

--
Stuart D. Gathman

From dwayne at oscl.ca Wed Oct 24 17:41:28 2007
From: dwayne at oscl.ca (Dwayne Litzenberger)
Date: Wed, 24 Oct 2007 15:41:28 -0600
Subject: [Pymilter] Forking pymilter?
In-Reply-To:
References:
Message-ID: <200710241541.28444.dwayne@oscl.ca>

On October 24, 2007 03:21:33 pm Stuart D. Gathman wrote:
> On Wed, 24 Oct 2007, Dwayne Litzenberger wrote:
> > I didn't actually do any forking. os.fork() duplicates all running
> > threads, right?
> >
> > I've never heard of any Python RMI equivalent, which is why I
> > more-or-less ended up inventing one for this project.
>
> No serialization should be needed if your server is in the same process.
> Just pass the object directly. RMI and equivalents are for inter-process
> and inter-machine calls.
> Or did you want to have it immediately scalable
> to multiple processes?

Sorry, "serialization" isn't the right word. I just meant creating a
tuple that contains the request type, args, kwargs, and a newly-created
Queue.Queue object used to return the result. On the way back, I pass
a tuple containing Success/Fail, followed by either the return value or
the result of sys.exc_info().

--
Dwayne Litzenberger, B.A.Sc.

From andrew at azarov.com Thu Oct 25 13:27:13 2007
From: andrew at azarov.com (Andrew Azarov)
Date: Thu, 25 Oct 2007 19:27:13 +0200
Subject: [Pymilter] timeout before data read; init failed to open
Message-ID: <4720D1F1.9040506@azarov.com>

I've noticed this

Oct 25 21:30:42 mail-1 sendmail[22477]: l9PHUWUm022477: Milter (avlmilter): timeout before data read
Oct 25 21:30:42 mail-1 sendmail[22477]: l9PHUWUm022477: Milter (avlmilter): to error state
Oct 25 21:30:42 mail-1 sendmail[22477]: l9PHUWUm022477: Milter (avlmilter): init failed to open
Oct 25 21:30:42 mail-1 sendmail[22477]: l9PHUWUm022477: Milter (avlmilter): to error state

in my logs... what can this be?????

From stuart at bmsi.com Thu Oct 25 14:06:24 2007
From: stuart at bmsi.com (Stuart D. Gathman)
Date: Thu, 25 Oct 2007 14:06:24 -0400 (EDT)
Subject: [Pymilter] timeout before data read; init failed to open
In-Reply-To: <4720D1F1.9040506@azarov.com>
Message-ID:

On Thu, 25 Oct 2007, Andrew Azarov wrote:

> Oct 25 21:30:42 mail-1 sendmail[22477]: l9PHUWUm022477: Milter
> (avlmilter): timeout before data read
>
> in my logs... what can this be?????

http://www.sendmail.org/doc/sendmail-current/libmilter/docs/installation.html

You are hitting the R: timeout. You might want to increase it,
assuming your milter isn't in a loop or something. In the following
example, the R: timeout is set to 5 minutes.

INPUT_MAIL_FILTER(`pythonfilter', `S=local:/var/run/milter/pythonsock, F=T, T=C:5m;S:20s;R:5m;E:5m')

--
Stuart D. Gathman