============================= Mailman 3 Core architecture ============================= This is a brief overview of the internal architecture of the Mailman 3 core delivery engine. You should start here if you want to understand how Mailman works at the 1000 foot level. Another good source of architectural information is available in the chapter written by Barry Warsaw for the `Architecture of Open Source Applications`_. User model ========== Every major component of the system is defined by an interface. Look through ``src/mailman/interfaces`` for an understanding of the system components. Mailman objects which are stored in the database, are defined by *model* classes. Objects such as *mailing lists*, *users*, *members*, and *addresses* are primary objects within the system. The *mailing list* is the central object which holds all the configuration settings for a particular mailing list. A mailing list is associated with a *domain*, and all mailing lists are managed (i.e. created, destroyed, looked up) via the *mailing list manager*. *Users* represent people, and have a *user id* and a *display name*. Users are linked to *addresses* which represent a single email address. One user can be linked to many addresses, but an address is only linked to one user. Addresses can be *verified* or *not verified*. Mailman will deliver email only to *verified* addresses. Users and addresses are managed by the *user manager*. A *member* is created by linking a *subscriber* to a mailing list. Subscribers can be: * A user, which becomes a member through their *preferred address*. * An address, which can be linked or unlinked to a user, but must be verified. Members also have a *role*, representing regular members, digest members, list owners, and list moderators. Members can even have the *non-member* role (i.e. people not yet subscribed to the mailing list) for various moderation purposes. Process model ============= Messages move around inside the Mailman system by way of *queue* directories managed by the *switchboard*. For example, when a message is first received by Mailman, it is moved to the *in* (for "incoming") queue. During the processing of this message, it -or copies of it- may be moved to other queues such as the *out* queue (for outgoing email), the *archive* queue (for sending to the archivers), the *digest* queue (for composing digests), etc. A message in a queue is represented by a single file, a ``.pck`` file. This file contains two objects, serialized as `Python pickles`_. The first object is the message being processed, already parsed into a `more efficient internal representation`_. The second object is a metadata dictionary that records additional information about the message as it is being processed. ``.pck`` files only exist for messages moving between different system queues. There is no ``.pck`` file for messages while they are actively being processed. Each queue directory is associated with a *runner* process which wakes up every so often. When the runner wakes up, it examines all the ``.pck`` files in FIFO order, deserializing the message and metadata objects, and processing them. If the message needs further processing in a different queue, it will be re-serialized back into a ``.pck`` file. If not (e.g. because processing of the message is complete), then no ``.pck`` file is written. The Mailman system uses a few other runners which don't process messages in a queue. You can think of these as fairly typical server process, and examples include the LMTP server, and the HTTP server for processing REST commands. All of the runners are managed by a *master watcher* process. When you type ``mailman start`` you are actually starting the master. Based on configuration options, the master will start the appropriate runners as subprocesses, and it will watch for the clean exiting of these subprocesses when ``mailman stop`` is called. Rules and chains ================ When a message is first received for posting to a mailing list, Mailman processes the message to determine whether the message is appropriate for the mailing list. If so, it *accepts* the message and it gets posted. Mailman can *discard* the message so that no further processing occurs. Mailman can also *reject* the message, bouncing it back to the original sender, usually with some indication of why the message was rejected. Or, Mailman can *hold* the message for moderator approval. *Moderation* is the phase of processing that determines which of the above four dispositions will occur for the newly posted message. Moderation does not generally change the message, but it may record information in the metadata dictionary. Moderation is performed by the *in* queue runner. Each step in the moderation phase applies a *rule* to the message and asks whether the rule *hits* or *misses*. Each rule is linked to an *action* which is taken if the rule hits (i.e. matches). If the rule misses (i.e. doesn't match), then the next rule is tried. All of the rule/action links are strung together sequentially into a *chain*, and every mailing list has a *start chain* where rule processing begins. Actually, every mailing list has *two* start chains, one for regular postings to the mailing list, and another for posting to the owners of the mailing list. To recap: when a message comes into Mailman for posting to a mailing list, the incoming runner finds the destination mailing list, determines whether the message is for the entire list membership, or the list owners, and retrieves the appropriate start chain. The message is then passed to the chain, where each link in the chain first checks to see if its rule matches, and if so, it executes the linked action. This action is usually one of *accept*, *reject*, *discard*, and *hold*, but other actions are possible, such as executing a function, deferring action, or jumping to another chain. As you might imagine, you can write new rules, compose them into new chains, and configure a mailing list to use your custom chain when processing the message during the moderation phase. Pipeline of handlers ==================== Once a message is accepted for posting to the mailing list, the message is usually modified in a number of different ways. For example, some message headers may be added or removed, some MIME parts might be scrubbed, added, or rearranged, and various informative headers and footers may be added to the message. The process of preparing the message for the list membership (as well as the digests, archivers, and NNTP) falls to the *pipeline of handlers* managed by the *pipeline* queue. The pipeline of handlers is similar to the processing chain, except here, a handler can make any modifications to the message it wants, and there is no rule decision or action. The message and metadata simply flow through a sequence of handlers arranged in a named pipeline. Some of the handlers modify the message in ways described above, and others copy the message to the outgoing, NNTP, archiver, or digester queues. As with chains, each mailing list has two pipelines, one for posting to the list membership, and the other for posting to the list's owners. Of course, you can define new handlers, compose them into new pipelines, and change a mailing list's pipelines. Integration and control ======================= Humans and external programs can interact with a running Core system in many different ways. There's an extensive command line interface that provides useful options to a system administrator. For external applications such as the Postorius web user interface, and the HyperKitty archiver, the `administrative REST API ` is the most common way to get information into and out of the Core. **Note**: The REST API is an administrative API and as such it must not be exposed to the public internet. By default, the REST server only listens on ``localhost``. Internally, the Python API is extensive and well-documented. Most objects in the system are accessed through the `Zope Component Architecture`_ (ZCA). If your Mailman installation is importable, you can write scripts directly against the internal public Python API. Other bits and pieces ===================== There are lots of other pieces to the Mailman puzzle, such as the set of core functionality (logging, initialization, event handling, etc.), mailing list *styles*, the API for integrating external archivers and mail servers. The database layer is a critical piece, and Mailman has an extensive set of command line commands, and email commands. Almost the entire system is documented in these pages, but it may be a bit of a spelunking effort to find it. Improvements are welcome! .. _`Architecture of Open Source Applications`: http://www.aosabook.org/en/mailman.html .. _`Python pickles`: https://docs.python.org/3/library/pickle.html .. _`more efficient internal representation`: https://docs.python.org/3/library/email.html .. _`Zope Component Architecture`: https://pypi.python.org/pypi/zope.component