Developing Mailman

The following documentation is generated from the internal developer documentation. This documentation is also used by the test suite. Another good source of architectural information is available in the chapter written by Barry Warsaw for the Architecture of Open Source Applications.

For now, this will have to suffice as an overview of the Mailman system.

Object model

Every major component of the system is defined in an interface. Look through src/mailman/interfaces for an understanding of the system components. Mailman objects which are stored in the database, are defined by model classes. Objects such as mailing lists, users, members, and addresses are primary objects within the system.

Obviously, the mailing list is the central object which holds all the configuration settings for a particular mailing list. A mailing list is associated with a domain, and all mailing lists are managed by the mailing list manager.

Users represent people, and have a user id and a display name. Users are linked to addresses which represent a single email address. One user can be linked to many addresses, but an address is only linked to one user. Addresses can be verified or not verified. Mailman will deliver email only to verified addresses.

Users and addresses are managed by the user manager.

A member is created when a user subscribes to a mailing list, either via their preferred address or explicitly via one of their linked and verified addresses. Members link an address to a mailing list, and represent regular members, digest members, list owners, and list moderators. Members can even represent non-members (i.e. people not yet subscribed to the mailing list) for various moderation purposes.

Process model

Messages move around inside the Mailman system by way of queue directories managed by the switchboard. For example, when a message is first received by Mailman, it is moved to the in (for “incoming”) queue. During the processing of this message, it – or copies of it – may be moved to other queues such as the out queue (for outgoing email), the archive queue (for sending to the archivers), the digest queue (for composing digests), etc.

A message in a queue is represented by a single file, a .pck file. This file contains two objects, serialized as Python pickles. The first object is the message being processed, already parsed into a more efficient internal representation. The second object is a metadata dictionary that records additional information about the message as it is being processed.

.pck files only exist for messages moving between different system queues. There is no .pck file for messages while they are actively being processed.

Each queue directory is associated with a runner process which wakes up every so often. When the runner wakes up, it examines all the .pck files in FIFO order, deserializing the message and metadata objects, processing them. If the message needs further processing in a different queue, it will be re-serialized back into a .pck file. If not (e.g. because processing of the message is complete), then no .pck file is written.

The Mailman system uses a few other runners which don’t process messages in a queue. You can think of these as fairly typical server process, and examples include the LMTP server, and the HTTP server for processing REST commands.

All of the runners are managed by a master watcher process. When you type mailman start you are actually starting the master. Based on configuration options, the master will start the appropriate runners as subprocesses, and it will watch for the clean exiting of these subprocesses when mailman stop is called.

Rules and chains

When a message is first received for posting to a mailing list, Mailman processes the message to determine whether the message is appropriate for the mailing list. If so, it approves the message and it gets posted. Mailman can also discard the message so that no further processing occurs. Mailman can also reject the message, bouncing it back to the original sender, usual with some indication of why the message was rejected. Mailman can also hold the message for moderator approval.

Moderation is the phase of processing that determines which of the above four dispositions will occur for the newly posted message. Moderation does not generally change the message, but it may record information in the metadata dictionary. Moderation is performed by the in queue runner.

Each step in the moderation phase is performed by applying a rule to the message and asking whether the rule hits or misses. Each rule is linked to an action which is taken if the rule hits (i.e. matches). If the rule misses (i.e. doesn’t match), then the next rule is tried. All of the rule/action links are strung together sequentially into a chain, and every mailing list has a start chain where rule processing begins.

Actually, every mailing list has two start chains, one for regular postings to the mailing list, and another for posting to the owners of the mailing list.

To recap: when a message comes into Mailman for posting to a mailing list, the incoming runner finds the destination mailing list, determines whether the message is for the entire list membership, or the list owners, and retrieves the appropriate start chain. The message is then passed to the chain, where each link in the chain first checks to see if its rule matches, and if so, it executes the linked action. This action is usually one of the typical accept, reject, discard, and hold, but other actions are possible.

As you might imagine, you can write new rules, compose them into new chains, and configure a mailing list to use your custom chain when processing the message during the moderation phase.

Pipeline of handlers

Once a message is accepted for posting to the mailing list, the message is usually modified in a number of different ways. For example, some message headers may be added or removed, some MIME parts might be scrubbed, added, or rearranged, and various informative headers and footers may be added to the message.

The process of preparing the message for the list membership (as well as the digests, archivers, and NNTP) falls to the pipeline of handlers managed by the pipeline queue.

The pipeline of handlers is similar to the processing chain, except here, a handler can make any modifications to the message it wants, and there is no rule decision or action. The message and metadata simply flow through a sequence of handlers arranged in a named pipeline. Some of the handlers modify the message in ways described above, and others copy the message to the outgoing, NNTP, archiver, or digester queues.

As with chains, each mailing list has two pipelines, one for posting to the list membership, and the other for posting to the list’s owners.

Of course, you can define new handlers, compose them into new pipelines, and change a mailing list’s pipelines.

Other bits and pieces

There are lots of other pieces to the Mailman puzzle, such as the REST API, the set of core functionality (logging, initialization, event handling, etc.), mailing list styles, the API for integrating external archivers and mail servers. The database layer is an critical piece, and Mailman has an extensive set of command line commands, and email commands.