-*- text -*-

		       Cyrus IMAP server design
			  September 12, 1995

* Introduction

This document describes the design of the Cyrus IMAP server.  The
Cyrus IMAP server will provide access to personal mail and system-wide
bboards through the IMAP protocol.  It differs from other IMAP server
implementations in that it is intended to be run on "sealed" servers,
where ordinary users are not normally permitted to log in.  It will
support extensions to allow administrative functions such as ACLs and
quota control and will cooperate with the Cyrus IMSP server.

* Mailbox namespace

The Cyrus IMAP server presents mailboxes using the "netnews" namespace
convention.  Mailbox names match in a case-sensitive manner, with the
exception of "INBOX".  In mailbox names starting with "INBOX.", the
"INBOX" part of the name is case-insensitive, while the rest of the
name is case-sensitive.  A mailbox name may not start or end with a
"." character, nor may it contain two "." characters in a row.
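A minimal Python sketch of these naming rules follows. The function names is_valid_mailbox_name, canonical_name and same_mailbox are hypothetical; the sketch covers only the "." placement rules and the case handling of "INBOX", not the character restrictions described next.

```python
def is_valid_mailbox_name(name: str) -> bool:
    """Apply the "." placement rules: none at either end, never two in a row."""
    if not name or name.startswith(".") or name.endswith("."):
        return False
    return ".." not in name


def canonical_name(name: str) -> str:
    """Uppercase a leading INBOX component; leave everything else as-is."""
    if name.upper() == "INBOX":
        return "INBOX"
    if name[:6].upper() == "INBOX.":
        return "INBOX." + name[6:]
    return name


def same_mailbox(a: str, b: str) -> bool:
    """Case-sensitive comparison except for the INBOX component."""
    return canonical_name(a) == canonical_name(b)
```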

In the initial implementation, non-ASCII characters, shell
metacharacters, and "/" are not permitted in mailbox names.  Later
versions of the server may make the set of permissible characters
site-policy configurable.

All personal mailboxes for user "bovik" begin with the string
"user.bovik.". For example, if user "bovik" had a personal "work"
mailbox, it would be called "user.bovik.work". To user "bovik",
however, the prefix "user.bovik." normally appears as "INBOX.". The
mailbox "user.bovik.work" would therefore appear as "INBOX.work". If
the access control list of the mailbox permitted other users to see
that mailbox, it would appear to them as "user.bovik.work".

The mailbox "user.bovik" is where the user "bovik" normally receives
new mail, and normally appears to user "bovik" as "INBOX".
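The mapping between internal names and the names a user sees can be sketched as below. The function names are hypothetical, and the sketch ignores access-control checks; it only rewrites the "user.<userid>" prefix.

```python
def to_user_view(internal: str, user: str) -> str:
    """Map an internal mailbox name to the name that `user` sees."""
    own = "user." + user
    if internal == own:
        return "INBOX"
    if internal.startswith(own + "."):
        return "INBOX." + internal[len(own) + 1:]
    return internal  # other users' mailboxes keep their full name


def to_internal(visible: str, user: str) -> str:
    """Inverse mapping; the INBOX component is matched case-insensitively."""
    if visible.upper() == "INBOX":
        return "user." + user
    if visible[:6].upper() == "INBOX.":
        return "user." + user + "." + visible[6:]
    return visible
```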

* IMAP protocol extensions and implementation decisions

The Cyrus IMAP server supports the following protocol extensions:

	QUOTA (STORAGE only)
	ACL
	partitions on CREATE
	DUMP/RESTORE		[unimplemented, to be defined]
	NAMESPACE
	LITERAL+
	UIDPLUS

The EXPUNGE operation is allowed at any time, even when other server
processes are reading the mailbox.  If some other process performs an
EXPUNGE operation on a mailbox, the server will continue to use the
data in the old index and cache files until it is permitted to send
unsolicited EXPUNGE replies.

When executing commands other than FETCH, STORE, and SEARCH, the
server will check to see if the index file has been replaced or
modified since the last status update.  If the index file was
replaced, the server will read the new index file and issue any
necessary unsolicited EXPUNGE replies.  If the index file was replaced
or modified, the server will scan for flags that have changed since
the client was notified about them and issue any necessary unsolicited
FETCH replies.  If the index file was removed, the server assumes the
mailbox was deleted or renamed and sends EXPUNGE notifications for all
messages.  If any EXPUNGE replies were sent or if new messages have
been appended, the server will issue unsolicited EXISTS and RECENT
replies.
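One way to tell a replaced index file from a modified one is sketched below. It assumes that replacement happens by renaming a new file over cyrus.index (as the expunge procedure does), so a changed inode number means "replaced" and a changed size or modification time means "modified". The names IndexStamp, stamp and classify_change are hypothetical.

```python
import os
from typing import NamedTuple, Optional


class IndexStamp(NamedTuple):
    ino: int
    mtime_ns: int
    size: int


def stamp(path: str) -> Optional[IndexStamp]:
    """Snapshot identifying information, or None if the file is gone."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return IndexStamp(st.st_ino, st.st_mtime_ns, st.st_size)


def classify_change(old: IndexStamp, path: str) -> str:
    """Return "removed", "replaced", "modified" or "unchanged"."""
    new = stamp(path)
    if new is None:
        return "removed"    # mailbox deleted or renamed: EXPUNGE everything
    if new.ino != old.ino:
        return "replaced"   # reread index, send EXPUNGE replies
    if new != old:
        return "modified"   # rescan flags, look for appended messages
    return "unchanged"
```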

The server has compile-time-selectable plug-compatible modules which
determine the site's methods for:

	ACL interpretation (afs or posix)
	Authorization namespace (kerberos, unix, or trivial)
	Authentication mechanism (kerberos, unix-passwd, or trivial)
	Storage of \Seen state (local-file or remote-database)

Posix ACL interpretation and remote database storage of \Seen state
are not implemented in the first versions of the server.

* Mailbox storage format

It is presumed that users will only be able to access mailboxes
through the use of network protocols.  The Cyrus server implementation
is given the freedom to store mailboxes on disk in a format best
suited to its needs.

Each mailbox is represented by a directory in the filesystem.  Within
the directory, each message is stored in its own file in RFC 822
format.  The filenames of the message files are the
sequentially-assigned UIDs, with a period appended.  Lines are
terminated by CRLF.

In order to allow the server to export netnews spool directories,
there is a per-mailbox netnews compatibility mode.  Mailboxes in this
mode have message files named without an appended period.  Lines in
message files are terminated by just LF.  Messages may not be appended
to or copied into these mailboxes in the normal manner.
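The two naming and line-ending conventions can be sketched as follows; the helper names message_filename and to_storage_bytes are hypothetical.

```python
def message_filename(uid: int, netnews: bool = False) -> str:
    """File name for message `uid`: "UID." normally, "UID" in netnews mode."""
    return str(uid) if netnews else f"{uid}."


def to_storage_bytes(data: bytes, netnews: bool = False) -> bytes:
    """Normalize line endings: CRLF normally, bare LF in netnews mode."""
    lines = data.replace(b"\r\n", b"\n").split(b"\n")
    return (b"\n" if netnews else b"\r\n").join(lines)
```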

The directory contains the following additional files.  All binary
values are in network byte order:

cyrus.header	Contains variable-length information about the mailbox itself.

- magic-number
- Name of the quota root for this mailbox
- The names of the user-defined flags
- Redundant copy of ACL


cyrus.index	Contains a header and a sequence of fixed-length
		records, one record per message in the mailbox.  The
		header contains:

- generation-no	4 bytes		incremented each time cyrus.index is
				rewritten by an expunge operation
- format	4 bytes		nonzero if mailbox in netnews format
- minor-version 4 bytes		mailbox format minor version number
- start-offset	4 bytes		offset of first per-message record
- record-size	4 bytes		size of per-message records
- exists	4 bytes		number of messages in mailbox
- last-date	4 bytes		Unix time of the last insertion
				of a message into the mailbox
- last-uid	4 bytes		UID of the last message appended to
				the mailbox
- quota-used	4 bytes		storage quota usage of this mailbox
- pop3-last	4 bytes		UID of last message accessed via POP3
- uidvalidity	4 bytes		UIDVALIDITY value

		Each per-message record contains:

- uid		4 bytes		UID
- internaldate	4 bytes		(in unix date format)
- sentdate	4 bytes		parsed Date: header, in unix date format
- rfc822.size	4 bytes
- header-size	4 bytes		size of message header
- body-offset	4 bytes		offset in file of message body
- cache-offset	4 bytes		offset in cyrus.cache
- last-updated	4 bytes		unix date of last modification
				of flags
- system-flags	4 bytes		bit-vector of system flags:
				\Answered \Flagged \Deleted \Draft
- user-flags	16 bytes	bit-vector of user-defined flags
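A sketch of decoding these binary layouts with Python's struct module follows; "!" selects network byte order with no alignment padding, giving a 44-byte header and 52-byte records. The class and function names are hypothetical, and the sketch reads start-offset and record-size from the header rather than assuming them, so that a later minor-version with larger records still parses.

```python
import struct
from typing import NamedTuple

HEADER_FMT = "!11I"    # eleven 4-byte big-endian header fields
RECORD_FMT = "!9I16s"  # nine 4-byte fields, then the 16-byte user-flag vector


class IndexHeader(NamedTuple):
    generation_no: int
    format: int
    minor_version: int
    start_offset: int
    record_size: int
    exists: int
    last_date: int
    last_uid: int
    quota_used: int
    pop3_last: int
    uidvalidity: int


class IndexRecord(NamedTuple):
    uid: int
    internaldate: int
    sentdate: int
    size: int
    header_size: int
    body_offset: int
    cache_offset: int
    last_updated: int
    system_flags: int
    user_flags: bytes


def parse_header(data: bytes) -> IndexHeader:
    return IndexHeader(*struct.unpack_from(HEADER_FMT, data, 0))


def record_offset(hdr: IndexHeader, msgno: int) -> int:
    """Byte offset of the record for 1-based message number `msgno`."""
    return hdr.start_offset + (msgno - 1) * hdr.record_size


def parse_record(data: bytes, hdr: IndexHeader, msgno: int) -> IndexRecord:
    return IndexRecord(*struct.unpack_from(RECORD_FMT, data,
                                           record_offset(hdr, msgno)))
```

The same record_offset arithmetic is what a FETCH uses to locate a message's record before following cache-offset into cyrus.cache.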


The minor-version is used to allow backward-compatible changes to the
mailbox format.  If a later version of the software adds additional
information to the cache file, it could detect an out-of-date
minor-version on open/check and know to rebuild the cache file.  On
append/expunge, the minor-version is lowered, if necessary, to the
level that the writing software supports.


cyrus.cache	Contains a header and a sequence of variable-length
		records, one record per message in the mailbox.  The
		header contains a "generation number" corresponding to
		the one in cyrus.index.  Each record contains:

- IMAP "envelope", in format suitable for use in FETCH reply
- IMAP "bodystructure", in format suitable for use in FETCH reply
- IMAP "body", in format suitable for use in FETCH reply
- size, file offsets, charsets, and encodings of the various MIME body sections
- From, To, Cc, and BCC strings for searching

Each field contains a 4 byte length in network byte order, the data,
and padding to a 4 byte boundary.
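A sketch of this length-prefixed encoding follows. The value of the padding bytes is not specified above; zero bytes are an assumption here, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple


def pack_field(data: bytes) -> bytes:
    """4-byte big-endian length, the data, then padding to a 4-byte boundary."""
    pad = (-len(data)) % 4
    return struct.pack("!I", len(data)) + data + b"\0" * pad  # zero pad assumed


def unpack_field(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (data, offset of the next field)."""
    (length,) = struct.unpack_from("!I", buf, offset)
    start = offset + 4
    end = start + length
    return buf[start:end], end + (-length) % 4


def unpack_fields(buf: bytes, offset: int, count: int) -> List[bytes]:
    out = []
    for _ in range(count):
        data, offset = unpack_field(buf, offset)
        out.append(data)
    return out
```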


cyrus.seen	Contains the \Seen and \Recent information of users.
		One record per user, sorted by user, each record contains:

- Userid
- Time when mailbox last opened (for arbitron)
- UID of last message that is not \Recent
- Time \Seen state last changed (not present in older versions)
- SEQUENCE of UIDs of \Seen messages
- Optional space character padding, to avoid having to rewrite file on updates
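The document does not spell out the SEQUENCE syntax; assuming it is an IMAP-style sequence set such as "1:3,7" (RFC 3501), the conversion between a set of UIDs and that string can be sketched as follows. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def format_sequence(uids: Iterable[int]) -> str:
    """Compress UIDs into an IMAP-style sequence set, e.g. "1:3,7"."""
    runs: List[List[int]] = []
    for uid in sorted(set(uids)):
        if runs and uid == runs[-1][1] + 1:
            runs[-1][1] = uid          # extend the current run
        else:
            runs.append([uid, uid])    # start a new run
    return ",".join(str(a) if a == b else f"{a}:{b}" for a, b in runs)


def parse_sequence(seq: str) -> Set[int]:
    """Expand a sequence set back into a set of UIDs."""
    uids: Set[int] = set()
    for part in filter(None, seq.split(",")):
        lo, _, hi = part.partition(":")
        uids.update(range(int(lo), int(hi or lo) + 1))
    return uids
```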

In order to support replicated mailboxes in a future implementation,
access to cyrus.seen is through an interface which can later be
optionally replaced by a distributed database.


* Mailbox locking semantics

In order to update information in a mailbox, the implementation must
follow a fixed locking procedure to prevent concurrency errors.

The locking order is: cyrus.header, cyrus.index, quota-root,
cyrus.seen.  To prevent deadlocks, an implementation may not lock a
given file while holding a lock on a later file.
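The ordering rule can be checked mechanically. The hypothetical LockTracker below only does bookkeeping (it takes no real locks) and rejects any acquisition that would lock a file while a later file in the order is held.

```python
from typing import List

LOCK_ORDER = ["cyrus.header", "cyrus.index", "quota-root", "cyrus.seen"]
RANK = {name: i for i, name in enumerate(LOCK_ORDER)}


class LockTracker:
    """Bookkeeping only: refuses acquisitions that violate LOCK_ORDER."""

    def __init__(self) -> None:
        self.held: List[str] = []

    def acquire(self, name: str) -> None:
        if any(RANK[h] > RANK[name] for h in self.held):
            raise RuntimeError(f"lock order violation: {name} while holding {self.held}")
        self.held.append(name)

    def release(self, name: str) -> None:
        self.held.remove(name)
```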

After obtaining a lock on any file, the implementation must
re-read any data previously obtained from the file.

It is necessary to lock cyrus.header before locking the quota-root in
order to guard against the possibility of another process changing the
quota root for the mailbox.

The file cyrus.cache is locked by the POP server in order to prevent
multiple concurrent POP sessions on a mailbox.  As POP doesn't allow
for the possibility of concurrent expunges, the expunge operation must
obtain a lock on cyrus.cache after cyrus.index in order to prevent an
expunge operation from occurring concurrently with a POP session.
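The non-blocking attempt used by expunge can be sketched with flock(2) through Python's POSIX-only fcntl module. The document does not name the locking primitive, so flock is an assumption here, and try_lock_cache is a hypothetical helper. Because flock locks belong to an open file description, a second open of the same file conflicts even within one process, which makes the behavior easy to demonstrate.

```python
import fcntl
import os
from typing import Optional


def try_lock_cache(path: str) -> Optional[int]:
    """Try an exclusive, non-blocking lock on cyrus.cache.

    Returns the open fd on success (closing it releases the lock), or None
    if another holder, such as a POP session, already has the lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A POP session would take the same lock and hold it for the whole session, so an expunge that gets None simply fails as described.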


The order of the specific update operations is:

To append one or more messages:
	open/lock cyrus.header
	check ACL
	open/lock cyrus.index and cyrus.cache
	open/lock appropriate quota root
	check quota
	write cyrus.header if creating new user flags
	assign uid as one plus last-uid
	write or hard-link message file(s)
	append info to cyrus.cache
	append info to cyrus.index
	update last-date, last-uid, and quota-used
	update quota root
	release locks
	perform status update if permitted
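The bookkeeping part of the append procedure (quota check, UID assignment and counter updates) is sketched below on in-memory objects. Locking and file I/O are omitted, the class and function names are hypothetical, and quota limits are assumed to use the same units as message sizes.

```python
from dataclasses import dataclass, field
from typing import List


class QuotaExceeded(Exception):
    pass


@dataclass
class Mailbox:
    last_uid: int = 0
    last_date: int = 0
    quota_used: int = 0
    uids: List[int] = field(default_factory=list)


@dataclass
class QuotaRoot:
    limit: int
    used: int = 0


def append(mbox: Mailbox, root: QuotaRoot, sizes: List[int], now: int) -> List[int]:
    """Check quota, assign UIDs and update counters for new messages.

    The caller is assumed to hold the cyrus.header, cyrus.index/cyrus.cache
    and quota-root locks, in that order.
    """
    total = sum(sizes)
    if root.used + total > root.limit:
        raise QuotaExceeded(f"{root.used} + {total} > {root.limit}")
    new_uids = []
    for _ in sizes:
        mbox.last_uid += 1               # one plus last-uid
        new_uids.append(mbox.last_uid)
    mbox.uids.extend(new_uids)
    mbox.last_date = now
    mbox.quota_used += total
    root.used += total
    return new_uids
```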

To set or clear flags:
	if adding new user flag, lock cyrus.header, update, release lock
	lock already-open cyrus.index file
	if cyrus.index has been replaced since last opened
		open and lock the newer cyrus.index file.
		locate the record
	read previous flags
	calculate changes to flags
	update flags and last-updated
	notify client
	release locks
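The flag arithmetic can be sketched as bit operations. The bit values chosen for the system flags and the byte order of the 16-byte user-flag vector are assumptions, since the document lists the flags but not their encoding. This sketch also chooses to leave last-updated alone when a STORE changes nothing, so the status-update scan sends no FETCH for it.

```python
from typing import NamedTuple

# Hypothetical bit assignments for \Answered, \Flagged, \Deleted, \Draft.
ANSWERED, FLAGGED, DELETED, DRAFT = 1, 2, 4, 8
USER_FLAG_BITS = 16 * 8  # a 16-byte vector holds 128 user-defined flags


class Flags(NamedTuple):
    system: int
    user: int          # user-flag vector held as a 128-bit integer
    last_updated: int


def store(old: Flags, system_mask: int, user_mask: int, mode: str, now: int) -> Flags:
    """Apply a STORE: mode "+" adds, "-" removes, "" replaces the flags."""
    if user_mask >> USER_FLAG_BITS:
        raise ValueError("user flag index out of range")
    if mode == "+":
        system, user = old.system | system_mask, old.user | user_mask
    elif mode == "-":
        system, user = old.system & ~system_mask, old.user & ~user_mask
    else:
        system, user = system_mask, user_mask
    changed = (system, user) != (old.system, old.user)
    return Flags(system, user, now if changed else old.last_updated)


def user_flags_to_bytes(user: int) -> bytes:
    return user.to_bytes(16, "big")  # byte/bit order is an assumption
```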

To expunge:
	lock cyrus.header
	open cyrus.index and cyrus.cache if not already reading them
	lock already-open cyrus.index file
	if cyrus.index has been replaced since last opened, close,
		reopen, lock, and repeat.
	if reopened cyrus.index file, reopen cyrus.cache file.
	nonblocking lock cyrus.cache
	if didn't get lock, fail (mailbox opened via POP)
	create files cyrus.index.NEW and cyrus.cache.NEW
	copy header, incrementing generation-no
	copy records without \Deleted to new files, remembering UIDs and sizes
	[remove unused user flags from cyrus.header?]
	update quota-used in cyrus.index.NEW
	lock appropriate quota root
	rename cyrus.index.NEW to cyrus.index
	rename cyrus.cache.NEW to cyrus.cache
	update quota root
	release locks
	remove deleted message files
	release locks
	perform status update
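The data transformation at the heart of expunge, together with the write-then-rename step, can be sketched as follows. Records are reduced to UID, size and a deleted marker; the names are hypothetical, and os.replace stands in for the rename step (it is atomic on POSIX when both names are on the same filesystem).

```python
import os
from typing import List, NamedTuple, Tuple


class Rec(NamedTuple):
    uid: int
    size: int
    deleted: bool


def expunge(generation_no: int, quota_used: int, records: List[Rec]
            ) -> Tuple[int, int, List[Rec], List[int], int]:
    """Compute the contents of cyrus.index.NEW.

    Returns (new generation-no, new quota-used, kept records,
    expunged UIDs, amount freed from the quota root).
    """
    kept = [r for r in records if not r.deleted]
    gone = [r for r in records if r.deleted]
    freed = sum(r.size for r in gone)
    return generation_no + 1, quota_used - freed, kept, [r.uid for r in gone], freed


def write_new_then_rename(path: str, data: bytes) -> None:
    """Write path.NEW, flush it to disk, then rename it over path."""
    tmp = path + ".NEW"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
```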

To update recent/seen info:
	lock cyrus.seen
	find record
	if user appears to have read everything and previously hadn't
		dropoff request for IMSP SEEN command
	else if user previously had read everything but no longer does
		dropoff request for IMSP SEEN (0) command
	if enough room to update in place
		update record in place
	else
		create cyrus.seen.NEW
		copy data, updating record
		rename cyrus.seen.NEW to cyrus.seen
	release lock
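Two decisions in this procedure are sketched below: whether the new record fits in the space (including padding) of the old one, and which IMSP SEEN value, if any, to drop off. The record contents in the example are illustrative only, the function names are hypothetical, and the padded record is assumed to exclude any record terminator.

```python
from typing import Optional


def plan_in_place(old_record: bytes, new_record: bytes) -> Optional[bytes]:
    """Return new_record space-padded to the old record's length, or None
    when it no longer fits and cyrus.seen.NEW must be written instead."""
    if len(new_record) > len(old_record):
        return None
    return new_record + b" " * (len(old_record) - len(new_record))


def imsp_seen_value(had_read_all: bool, has_read_all: bool,
                    last_uid: int) -> Optional[int]:
    """UID for an IMSP SEEN dropoff, 0 to retract it, or None if unchanged."""
    if has_read_all and not had_read_all:
        return last_uid
    if had_read_all and not has_read_all:
        return 0
    return None
```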

To delete a mailbox:
	if deleting a user's inbox
		deny permission if not admin
		recursively delete all sub-mailboxes
		remove user's subscriptions
	open/lock mailboxes
	check ACL
	create mailboxes.NEW
	copy over mailboxes, removing entry
	open/lock cyrus.header
	open/lock cyrus.index
	open/lock appropriate quota root
	delete cyrus.seen
	update quota root
	remove all files in mailbox directory
	remove any empty mailbox directories
	rename mailboxes.NEW to mailboxes

To rename a mailbox:
	determine if renaming INBOX
	open/lock mailboxes
	check ACL
	create mailboxes.NEW
	copy over mailboxes, changing entry
	open/lock cyrus.header
	open/lock cyrus.index
	open/lock appropriate quota root
	create new mailbox
	copy keyword names, write new mailbox header
	if quota roots different
		check space available in new quota root
	link/copy over cyrus.index
	link/copy over cyrus.cache
	link/copy over message files
	copy over cyrus.seen
	update new quota root
	if renaming INBOX
		expunge all messages from old mailbox
	else
		delete old mailbox
	if expunge/delete failed
		back out change to new quota root
	rename mailboxes.NEW to mailboxes
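Both procedures build mailboxes.NEW early and rename it over mailboxes only at the end, so a failure partway through leaves the old list in place. A sketch of that pattern follows, assuming a hypothetical line format of tab-separated name, partition and ACL. The helper names are hypothetical, and the sketch does not re-sort the list after a rename.

```python
import os
from typing import Callable, Optional

Edit = Callable[[str], Optional[str]]  # returns None to drop a line


def prepare_mailboxes(path: str, edit: Edit) -> str:
    """Copy the mailbox list to path.NEW, passing each line through edit."""
    tmp = path + ".NEW"
    with open(path, encoding="utf-8") as src, open(tmp, "w", encoding="utf-8") as dst:
        for line in src:
            new = edit(line)
            if new is not None:
                dst.write(new)
    return tmp


def commit_mailboxes(path: str) -> None:
    """Final step: rename mailboxes.NEW to mailboxes."""
    os.replace(path + ".NEW", path)


def delete_entry(name: str) -> Edit:
    return lambda line: None if line.split("\t", 1)[0] == name else line


def rename_entry(old: str, new: str) -> Edit:
    def edit(line: str) -> Optional[str]:
        mbox, sep, rest = line.partition("\t")
        return new + sep + rest if mbox == old else line
    return edit
```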

[dump/restore, to implement IMSP MOVE]

* Reading semantics

In order to avoid presenting stale data, the following operations
should be performed when implementing these IMAP commands.

Select/Examine:
	open cyrus.header, check ACL
	open cyrus.index and cyrus.cache, repeat if generation-no doesn't match
	open cyrus.seen
	calculate FLAGS from information in cyrus.header
	perform status update
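The generation-number check can be sketched as a bounded retry loop. It assumes the cache file's generation number is its first 4 bytes, in network byte order like the index; the reader callbacks and the max_tries limit are hypothetical.

```python
import struct
from typing import Callable, Tuple


def read_consistent(read_index: Callable[[], bytes],
                    read_cache: Callable[[], bytes],
                    max_tries: int = 5) -> Tuple[bytes, bytes]:
    """Reread both files until their generation numbers agree."""
    for _ in range(max_tries):
        index = read_index()
        cache = read_cache()
        # generation-no is the first header field of cyrus.index; the same
        # position in cyrus.cache is an assumption of this sketch.
        (gen_index,) = struct.unpack_from("!I", index, 0)
        (gen_cache,) = struct.unpack_from("!I", cache, 0)
        if gen_index == gen_cache:
            return index, cache
    raise RuntimeError("cyrus.index and cyrus.cache did not settle")
```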

Fetch/Partial:
	calculate offset into cyrus.index
	follow pointers to cyrus.cache and/or message file

Search:
	order the search criteria from cheapest to most expensive
	perform searches, narrowing return set
	send SEARCH reply

	The search criteria are ordered in the following categories:

	- cyrus.seen based
		NEW, OLD, RECENT, SEEN, UNSEEN

	- cyrus.index based
		ANSWERED, BEFORE, DELETED, FLAGGED, KEYWORD, LARGER,
		ON, SENTBEFORE, SENTON, SENTSINCE, SINCE, SMALLER,
		UNANSWERED, UNDELETED, UNFLAGGED, UNKEYWORD

	- sequence based
		sequence, UID sequence

	- cyrus.cache based
		BCC, CC, FROM, SUBJECT, TO

	- recursive criteria
		NOT search_criteria, OR search_criteria search_criteria

	- must search message file contents
		BODY, HEADER, TEXT
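A sketch of cost-ordered evaluation follows. "SEQUENCE" is a placeholder key for a bare sequence-set criterion, and the per-message predicate "matches" is supplied by the caller; both are assumptions for illustration, as are the function names.

```python
from typing import Any, Callable, Iterable, List, Tuple

Criterion = Tuple[str, Any]

# Criteria grouped by the cost categories listed above, cheapest first.
CATEGORIES = [
    ["NEW", "OLD", "RECENT", "SEEN", "UNSEEN"],                  # cyrus.seen
    ["ANSWERED", "BEFORE", "DELETED", "FLAGGED", "KEYWORD", "LARGER",
     "ON", "SENTBEFORE", "SENTON", "SENTSINCE", "SINCE", "SMALLER",
     "UNANSWERED", "UNDELETED", "UNFLAGGED", "UNKEYWORD"],     # cyrus.index
    ["SEQUENCE", "UID"],                                         # sequence sets
    ["BCC", "CC", "FROM", "SUBJECT", "TO"],                      # cyrus.cache
    ["NOT", "OR"],                                               # recursive
    ["BODY", "HEADER", "TEXT"],                                  # message files
]
COST = {key: rank for rank, keys in enumerate(CATEGORIES) for key in keys}


def order_criteria(criteria: List[Criterion]) -> List[Criterion]:
    """Cheapest first; sorted() is stable within a category."""
    return sorted(criteria, key=lambda c: COST[c[0]])


def run_search(criteria: List[Criterion], candidates: Iterable[Any],
               matches: Callable[[Any, Criterion], bool]) -> List[Any]:
    """Narrow the candidate set one criterion at a time."""
    result = list(candidates)
    for crit in order_criteria(criteria):
        result = [m for m in result if matches(m, crit)]
        if not result:
            break  # nothing left, skip the expensive tests
    return result
```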

Status update:
	if newer cyrus.index
		open cyrus.index and cyrus.cache
		send EXPUNGE notifications
	scan for changed flags (using last-updated), send
		unsolicited FETCH FLAGS replies
	read EXISTS from cyrus.index header
	calculate RECENT by info in cyrus.seen and binary search of
		cyrus.index 
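Because records in cyrus.index are kept in ascending UID order, the RECENT count is one binary search, and the flag scan is a comparison against last-updated. A hypothetical sketch; its use of ">=" in the scan, so that a change made in the same second as the previous update is resent rather than missed, is an assumption.

```python
from bisect import bisect_right
from typing import List


def count_recent(index_uids: List[int], last_nonrecent_uid: int) -> int:
    """Count messages whose UID is above the user's last non-Recent UID."""
    # index_uids must be ascending, as the records in cyrus.index are.
    return len(index_uids) - bisect_right(index_uids, last_nonrecent_uid)


def changed_flags(last_updated: List[int], last_status: int) -> List[int]:
    """Message numbers (1-based) whose flags changed at or after last_status."""
    return [n for n, t in enumerate(last_updated, start=1) if t >= last_status]
```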

* Configuration directory

On each server, there is a directory which contains files specifying
the server configuration.  The files are:

- /etc/imapd.conf

Specifies site configuration and policy options, such as:

	Location of configuration directory
	The umask to set
	The partition names and their corresponding directory roots.
	Threshold for quota warning messages
	The hostnames of any IMSP servers
	Which Kerberos realms to allow logins from
	The location of the Kerberos srvtab file
	Whether to allow anonymous logins
	Whether to automatically create INBOX mailboxes for users
	Whether to support the SUBSCRIBE/UNSUBSCRIBE IMAP commands

- mailboxes

Lists the names, partitions, and ACLs of all the mailboxes.

- delivered.dir & delivered.pag

Database of delivered messages, for duplicate delivery elimination.
Stores mailbox name, message-id, and delivery time.  The delivery time
is used for purging the database of old entries.

- delivered.lock

File locked before writing to delivered.dir/delivered.pag
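The duplicate-elimination logic can be sketched with a plain dictionary standing in for the delivered.dir/delivered.pag database. The function names and the max_age purge parameter are hypothetical, and the caller is assumed to hold delivered.lock around each update.

```python
from typing import Dict, Tuple

DeliveredDB = Dict[Tuple[str, str], int]  # (mailbox, message-id) -> delivery time


def check_and_record(db: DeliveredDB, mailbox: str, message_id: str, now: int) -> bool:
    """Return True if the message should be delivered (not seen before)."""
    key = (mailbox, message_id)
    if key in db:
        return False
    db[key] = now
    return True


def purge(db: DeliveredDB, now: int, max_age: int) -> int:
    """Drop entries older than max_age seconds; return how many were removed."""
    old = [k for k, t in db.items() if now - t > max_age]
    for k in old:
        del db[k]
    return len(old)
```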


* User directory

The subdirectory "user" under the configuration directory contains
files with per-user information.  For each user "USER" with an INBOX, it
may contain the file "USER.sub".  The file contains a list of the
user's subscribed mailboxes.

* Process registration directory

The subdirectory "proc" under the configuration directory contains one
file per active server process.  The filename is the ASCII
representation of the process id and the file contains the following
tab-separated fields:

- hostname of client
- login name of user, if logged in
- selected mailbox, if mailbox selected


* Quota root directory

The subdirectory "quota" under the configuration directory contains
one file per quota root.  The filename is the name of the quota root.
Each file contains the quota root's limit and usage.

* Telemetry log directory

The subdirectory "log" under the configuration directory may contain
zero or more subdirectories, each named after a user.  If a
subdirectory exists for a user, the server will keep a telemetry log
of protocol sessions authenticating as that user.  The telemetry log
is stored in the subdirectory with a filename of the server
process-id.


* IMSP dropoff directory

Contains one file for each SEEN or LAST command that needs to be sent
to an IMSP server.

Files for LAST commands contain in their filenames:

- The character 'L'
- base64-encoded data (without any '=' padding) containing the
  following 32-bit values, in network byte order
  - the UID of the last message appended
  - the time the mailbox was last modified
  - the number of messages in the mailbox
- The character '='
- The name of the mailbox, converted to lower case and with any
  occurrences of the '=' character replaced with 'B'.

Files for SEEN commands contain in their filenames:

- The character 'S'
- base64-encoded data (without any '=' padding) containing the
  following 32-bit values, in network byte order
  - the UID of the last message \Seen, or 0 if user has
    not \Seen all messages in the mailbox
  - the time the user's \Seen data was last modified
- The character '='
- The name of the mailbox, converted to lower case and with any
  occurrences of the '=' character replaced with 'B'.
- The character '='
- The name of the user, converted to lower case and with any
  occurrences of the '/' character replaced with 'A' and any
  occurrences of the '=' character replaced with 'B'.
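Building these filenames can be sketched as below. The document does not say which base64 alphabet is used; since a "/" cannot appear in a filename, the URL-safe alphabet ("-" and "_") is assumed here. The function names are hypothetical.

```python
import base64
import struct
from typing import Iterable


def _b64(values: Iterable[int]) -> str:
    """32-bit network-order values, base64-encoded with "=" padding removed."""
    raw = b"".join(struct.pack("!I", v) for v in values)
    # Assumption: URL-safe alphabet, because "/" cannot occur in a filename.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def _mangle(name: str, is_user: bool = False) -> str:
    """Lowercase, then "=" -> "B" (and "/" -> "A" for user names)."""
    name = name.lower().replace("=", "B")
    return name.replace("/", "A") if is_user else name


def last_filename(mailbox: str, last_uid: int, mtime: int, exists: int) -> str:
    return "L" + _b64([last_uid, mtime, exists]) + "=" + _mangle(mailbox)


def seen_filename(mailbox: str, user: str, seen_uid: int, mtime: int) -> str:
    # seen_uid is 0 when the user has not seen every message.
    return ("S" + _b64([seen_uid, mtime]) + "=" + _mangle(mailbox)
            + "=" + _mangle(user, is_user=True))
```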

The order of operations for the update daemon is:

	list all files in directory
	sort in following order
		last before seen
		by mailbox
		by user, for SEEN
		by last update time
	give IMSP SEEN and LAST commands
	unlink files		
	sleep, repeat
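The sort step can be sketched as a key function over the filenames. Because "=" never occurs in the unpadded base64 data or in the mangled names, a filename can be split on "=". The sketch assumes the URL-safe base64 alphabet, since standard base64 may contain "/", and its function names are hypothetical.

```python
import base64
import struct
from typing import Tuple


def _decode(b64: str, count: int) -> Tuple[int, ...]:
    """Restore "=" padding and unpack `count` 32-bit network-order values."""
    padded = b64 + "=" * (-len(b64) % 4)
    return struct.unpack("!%dI" % count, base64.urlsafe_b64decode(padded))


def sort_key(filename: str) -> Tuple[int, str, str, int]:
    """LAST before SEEN, then mailbox, then user (SEEN only), then time."""
    kind, fields = filename[0], filename[1:].split("=")
    if kind == "L":
        _uid, mtime, _exists = _decode(fields[0], 3)
        return (0, fields[1], "", mtime)
    _uid, mtime = _decode(fields[0], 2)
    return (1, fields[1], fields[2], mtime)
```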


* Programs

The following programs will be written:

- imapd
	Exports the IMAP protocol to a connecting client.
	Called by inetd or equivalent. 

- deliver
	Called by the mail system to deliver message to mailbox.
	Will be backward-compatible with the rmail interface.

- updateimsp [not completed]
	Daemon to monitor the dropoff directory and give SEEN/LAST
	commands to an IMSP server.

- collectnews
	Called by the netnews system to build cyrus header files in
	the news spool area.

- rmnews
	Called by the netnews system to remove and EXPUNGE articles
	either expired or cancelled.

- syncnews
	Run occasionally to synchronize the netnews active file
	with the Cyrus netnews mailbox list.

- expire [not implemented]
	Expires old mailbox posts

- arbitron
	Reports readership statistics, optionally prunes old \Seen
	data.

- pop3d
	Exports POP3/KPOP protocol to a connecting client.

- reconstruct
	Reconstructs/rebuilds a corrupted/missing set of mailbox
	header/index/cache/seen files.

- quota
	Generates a quota-usage report.  Optionally does a consistency
	fix on the quota roots.


* Monitoring

The following things need to be monitored.

In real time:

- the number of connections
	Can use existing tools to count files in process registry
- disk usage, load average
	Already have programs to do this


In log files:

- where connections come from

- logins

- feature usage (count) and timing (total/min/max)
	Can log on close of mailbox, with user and mailbox name


On periodic basis:

- quota usage
	Can write program to walk partitions and read quota roots.


