Message tagger

Mailman has a topics system which works like this: a mailing list administrator sets up one or more topics, each of which is essentially a named regular expression. The topic name can be any arbitrary string, and the name serves double duty as the topic tag. Each message that flows through the mailing list has its Subject: and Keywords: headers compared against these regular expressions. The message then gets tagged with the name of every topic that matches.

>>> mlist = create_list('_xtest@example.com')

Topics must be enabled for Mailman to do any topic matching, even if topics are defined.

>>> mlist.topics = [('bar fight', '.*bar.*', 'catch any bars', False)]
>>> mlist.topics_enabled = False
>>> mlist.topics_bodylines_limit = 0

>>> msg = message_from_string("""\
... Subject: foobar
... Keywords: barbaz
...
... """)
>>> msgdata = {}

>>> from mailman.handlers.tagger import process
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: foobar
Keywords: barbaz


>>> msgdata
{}

However, once topics are enabled, messages will be tagged. There are two artifacts of tagging: an X-Topics: header is added with the topic name, and the message metadata gets a topichits key with a list of matching topic names.

>>> mlist.topics_enabled = True
>>> msg = message_from_string("""\
... Subject: foobar
... Keywords: barbaz
...
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: foobar
Keywords: barbaz
X-Topics: bar fight


>>> msgdata['topichits']
['bar fight']
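
The matching step can be illustrated outside Mailman with only the standard library. This is a minimal sketch, not the handler's actual code: the helper name match_topics is hypothetical, and the use of re.search without any flags is an assumption, so Mailman's own matching may differ in detail. Topics are taken as tuples whose first two items are the name and the pattern, as in the list assigned to mlist.topics above.

```python
import re
from email import message_from_string


def match_topics(msg, topics):
    """Return the names of topics whose pattern matches a header value.

    Hypothetical helper for illustration only; not part of Mailman.
    """
    # Gather every Subject: and Keywords: value present on the message.
    values = msg.get_all('subject', []) + msg.get_all('keywords', [])
    hits = []
    for name, pattern, *_rest in topics:
        # Assumption: a plain, unflagged search is enough for this sketch.
        if any(re.search(pattern, value) for value in values):
            hits.append(name)
    return hits


msg = message_from_string("Subject: foobar\nKeywords: barbaz\n\n")
topics = [('bar fight', '.*bar.*', 'catch any bars', False)]
print(match_topics(msg, topics))
```

A topic is reported once no matter how many header values it matches, which mirrors the single 'bar fight' entry seen in the example above.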

Scanning body lines

The tagger can also look at a certain number of body lines, but only for Subject: and Keywords: header-like lines. When topics_bodylines_limit is set to zero, no body lines are scanned.

>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... X-Ignore: something else
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all

X-Ignore: something else
Subject: foobar
Keywords: barbaz

>>> msgdata
{}

But let the tagger scan a few body lines and the matching headers will be found.

>>> mlist.topics_bodylines_limit = 5
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... X-Ignore: something else
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all
X-Topics: bar fight

X-Ignore: something else
Subject: foobar
Keywords: barbaz

>>> msgdata['topichits']
['bar fight']

However, scanning stops at the first body line that doesn’t look like a header.

>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... This is not a header
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all

This is not a header
Subject: foobar
Keywords: barbaz
>>> msgdata
{}

When the limit is set to a negative number, all body lines will be scanned.

>>> mlist.topics_bodylines_limit = -1
>>> lots_of_headers = '\n'.join(['X-Ignore: zip'] * 100)
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... %s
... Subject: foobar
... Keywords: barbaz
... """ % lots_of_headers)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> # Rather than print out 100 X-Ignore: headers, let's just prove that
>>> # the X-Topics: header exists, meaning that the tagger did its job.
>>> print(msg['x-topics'])
bar fight
>>> msgdata['topichits']
['bar fight']
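
The three limit settings shown above (zero, positive, negative) and the stop-at-first-non-header rule can be summarized in a small standalone sketch. The function scan_body_lines and the HEADER_LIKE pattern are hypothetical names introduced here, and the definition of a header-like line (a token without whitespace, followed by a colon) is an assumption; this is not how the handler is necessarily written.

```python
import re

# Assumption: a header-like line is "Name: value" with no whitespace in Name.
HEADER_LIKE = re.compile(r'^([^\s:]+):\s*(.*)$')


def scan_body_lines(lines, limit):
    """Collect Subject:/Keywords: values from leading header-like lines.

    Hypothetical helper for illustration only; not part of Mailman.
    """
    found = []
    if limit == 0:
        # A limit of zero disables body scanning entirely.
        return found
    for lineno, line in enumerate(lines):
        # A positive limit caps how many lines are examined;
        # a negative limit means there is no cap.
        if limit > 0 and lineno >= limit:
            break
        match = HEADER_LIKE.match(line)
        if match is None:
            # Stop at the first line that does not look like a header.
            break
        name, value = match.groups()
        if name.lower() in ('subject', 'keywords'):
            found.append(value)
    return found


body = ['X-Ignore: something else', 'Subject: foobar', 'Keywords: barbaz']
print(scan_body_lines(body, 0))
print(scan_body_lines(body, 5))
print(scan_body_lines(['This is not a header'] + body[1:], 5))
```

The values collected this way would then be matched against the topic patterns in the same way as the real Subject: and Keywords: headers.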

Scanning sub-parts

The tagger will also scan the body lines of text subparts in a multipart message, using the same rules as if all those body lines lived in a single text payload.

>>> msg = message_from_string("""\
... Subject: Was
... Keywords: Raw
... Content-Type: multipart/alternative; boundary="BOUNDARY"
...
... --BOUNDARY
... From: sabo
... To: obas
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY--
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: Was
Keywords: Raw
Content-Type: multipart/alternative; boundary="BOUNDARY"
X-Topics: bar fight

--BOUNDARY
From: sabo
To: obas

Subject: farbaw
Keywords: barbaz

--BOUNDARY--

>>> msgdata['topichits']
['bar fight']

But the tagger will not descend into non-text parts.

>>> msg = message_from_string("""\
... Subject: Was
... Keywords: Raw
... Content-Type: multipart/alternative; boundary=BOUNDARY
...
... --BOUNDARY
... From: sabo
... To: obas
... Content-Type: message/rfc822
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY
... From: sabo
... To: obas
... Content-Type: message/rfc822
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY--
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg['x-topics'])
None
>>> msgdata
{}
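
One way to picture the sub-part rule is a recursive walk that descends only into multipart containers and collects lines only from text parts. This is a sketch built on the standard library email package, under the assumption that "text" means a text/* content type; the helper name text_body_lines is hypothetical and Mailman's own selection of parts may be stricter.

```python
from email import message_from_string


def text_body_lines(msg):
    """Return the body lines of text/* parts, skipping all other parts.

    Hypothetical helper for illustration only; not part of Mailman.
    """
    maintype = msg.get_content_maintype()
    if maintype == 'multipart' and msg.is_multipart():
        # Descend into each sub-part of a multipart container.
        lines = []
        for part in msg.get_payload():
            lines.extend(text_body_lines(part))
        return lines
    if maintype == 'text':
        # A text leaf: its payload is a string of body lines.
        return msg.get_payload().splitlines()
    # Anything else, such as message/rfc822, is not examined.
    return []


multipart = message_from_string("""\
Subject: Was
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
From: sabo

Subject: farbaw
Keywords: barbaz

--BOUNDARY--
""")
print([line for line in text_body_lines(multipart) if line.strip()])
```

Parts without a Content-Type: header default to text/plain, which is why the first sub-part in the earlier example is scanned while the message/rfc822 parts are not.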