Understanding Information Store Essentials (Part 2)
By shajifiroz
Created 2005-11-24 21:28

  • Exchange 2000
As an Exchange administrator, in-depth knowledge of the Information Store helps in many areas such as capacity planning, backup and disaster recovery, database defragmentation, mail transport and other troubleshooting scenarios. So it's a must-read, again and again, until the Information Store concepts and terms are firmly registered in your mind.


As you know, I have already written Part 1 of this three-part series. In case you missed it, you can read it here: Understanding Information Store Essentials (Part1) [1]

If you would like to be notified when Shaji Firoz releases Understanding Information Store Essentials (Part3), please sign up to our Newsletter [2] and choose Exchange Server articles update.

Information Store Structure

From a user's standpoint, the store is organised into a hierarchy of folders containing items such as mail messages, contacts and calendar appointments. Internally, the store is structured in a similar way and provides the following features:

  • The store must be able to locate all of a user's folders quickly and efficiently.
  • Folders can be accessed quickly and easily through different types of views.
  • Single Instance Storage: a message sent to several mailboxes in the same database is stored only once, and is copied once to each database that contains a target mailbox.

As Figure 1-1 below shows, the store is structured into various tables which are linked to each other. Storage of the tables is handled by ESE, which I will cover later in this article. The store maintains the relationships between the tables. When accessing data, clients use these tables to locate and view messages. For example, Outlook builds the folder list from the Folders Table.


Figure 1-1

It is important to note that the tables are stored only in the EDB file; the STM file contains only raw message data. If a message is stored in the STM file, the tables in the EDB file point to the location in the STM file where that message is held.

The main tables are described below, giving each table's significant columns and a comment on its role.

Mailbox Table

Significant columns:
- Root folder ID
- Mailbox DN
- Display Name

Comment: There is one row for each mailbox in this store. Each entry points to an entry in the Folders Table representing the root folder of the mailbox. This table holds only basic information; it contains no message data.

Folders Table

Significant columns:
- Folder ID
- Parent Folder
- Child Folders
- Display Name

Comment: This is one big table containing an entry for every single folder in the entire store. Folders have pointers to their children and parent to provide the hierarchical structure. The Folders Table contains only the properties of the folders; message data is not stored here.

Message Folder Tables

Significant columns:
- Message ID
- Subject
- Priority
- Received Date
- etc.

Comment: There are many Message Folder Tables (MFTs): one for each view of each folder in the Folders Table, so each row in the Folders Table may have one or more MFTs associated with it. For example, if a user's Inbox has six views, there will be six MFTs, one for each view. Each MFT contains the message IDs of the messages in that view, plus the properties of those messages as defined by the view. The actual message body is not stored in the MFT.

Note: Each time you create a new view in Outlook, the store creates a new MFT to represent that view. You may notice a delay the first time you create the view, but subsequent access to that view is much quicker because the MFT already exists. Any MFT that is unused for 7 days is deleted.

Messages Table

Significant columns:
- Message ID
- Subject, etc.
- RTF Compressed
- Pointer to STM

Comment: This is one big table containing every single message in the entire store. It contains the actual message body in compressed RTF format if the message is in the EDB file, or a pointer to the location of the message in the STM file. Each message in this table has one or more rows in the MFTs pointing to it. When several MFT entries point to the same message in the Messages Table, that is Single Instance Storage, which saves disk space.

Attachments Table

Significant columns:
- Attachment ID
- Attachment Data

Comment: Similar to the Messages Table, but contains message attachments. Messages in the Messages Table point to entries in the Attachments Table if they have attachments.
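To make the relationships between these tables concrete, here is a minimal in-memory sketch. It is only a model under stated assumptions: the class names, field names and the mailbox DNs are hypothetical, and the real tables live inside ESE B+trees rather than Python dictionaries. It shows how the Mailbox Table points at a root folder, how folders link to parents and children, and how Single Instance Storage falls out of several MFT rows referencing one Messages Table row.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Folder:                        # one row in the Folders Table
    folder_id: int
    display_name: str
    parent_id: Optional[int] = None  # parent pointer gives the hierarchy
    child_ids: List[int] = field(default_factory=list)


@dataclass
class Message:                        # one row in the Messages Table
    message_id: int
    subject: str
    rtf_body: Optional[bytes] = None  # body kept in the EDB file...
    stm_offset: Optional[int] = None  # ...or a pointer into the STM file


@dataclass
class Store:
    mailboxes: Dict[str, int] = field(default_factory=dict)     # Mailbox Table: DN -> root folder ID
    folders: Dict[int, Folder] = field(default_factory=dict)    # Folders Table
    messages: Dict[int, Message] = field(default_factory=dict)  # Messages Table
    # Message Folder Tables, keyed by (folder ID, view name) -> message IDs
    mfts: Dict[Tuple[int, str], List[int]] = field(default_factory=dict)

    def add_folder(self, folder: Folder) -> None:
        self.folders[folder.folder_id] = folder
        if folder.parent_id is not None:
            self.folders[folder.parent_id].child_ids.append(folder.folder_id)

    def deliver(self, msg: Message, folder_ids: List[int], view: str = "Default") -> None:
        # Single instance: the message row is written once...
        self.messages[msg.message_id] = msg
        # ...and each target folder's MFT only stores a reference to it.
        for fid in folder_ids:
            self.mfts.setdefault((fid, view), []).append(msg.message_id)


# Usage: two mailboxes in the same store, one message sent to both Inboxes.
store = Store()
for root_id, inbox_id, dn in [(1, 2, "/o=Org/cn=alice"), (10, 11, "/o=Org/cn=bob")]:
    store.mailboxes[dn] = root_id
    store.add_folder(Folder(root_id, "Top of Information Store"))
    store.add_folder(Folder(inbox_id, "Inbox", parent_id=root_id))
store.deliver(Message(100, "Hello", rtf_body=b"compressed RTF"), [2, 11])
```

After delivery there is exactly one row in `messages`, while both Inbox MFTs hold message ID 100.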

Extensible Storage Engine

In this section I would like to give an overview of ESE as it is used by Exchange 2000 Server. It includes details of how ESE works and how it provides recovery features, and it outlines the optimum disk configuration for running ESE databases.

The Extensible Storage Engine, or ESE, is the heart of Exchange storage. It is responsible for physically storing Exchange data on the hard disk. ESE is a very robust storage technology, designed to be completely recoverable in a disaster scenario, and it uses transaction logging to provide that recoverability.

In this section I will also briefly cover transactions, transaction logs, ESE operation, new log file creation, the checkpoint file, file signatures, soft recovery and optimum disk configuration.

ESE Overview 

Figure 1-2 below shows the ESE file structure in the Exchange store. ESE holds the data in two files, EDB and STM (the purpose of the STM file is described later, but for ESE both files have the same structure).

  • In Exchange, ESE stores data in two storage files, EDB and STM.
  • Data is stored in B+trees.
  • B+trees are divided into 4KB pages.
       - ESE reads and writes the files in units of pages.
  • Pages are cached in memory, and each page contains a checksum.


Figure 1-2

ESE File Structure 

Everything within the file is stored in B+trees (a variation on balanced trees), which provide fast searching and efficient storage. Each store table (as described in the previous section) is a collection of B+trees. There is a master tree which references all other tables and their B+trees; this 'tree of trees' is called the System Catalogue. Because this table is critical, it is stored twice within the file (one copy starting at page 4 and the other at page 24).

B+trees are designed to provide fast access to data on the disk. Going to the disk is expensive for the store, so a B+tree is designed so that ESE can get to the data it wants using the minimum number of disk I/Os.

A B+tree is broken down into 4KB pages. Each of these pages contains either pointers to other pages or the actual data that is being stored in the B+tree. All data is read from and written to the disk in units of 4KB. To increase performance the pages are cached in memory buffers for as long as possible, thus reducing the need to go to the disk. Pages are numbered within the ESE file from 0 up to the size of the file.
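Why large pages keep disk I/O low can be shown with a little arithmetic. The sketch below uses a deliberately simplified model in which every page, internal or leaf, holds the same number of entries; the 20-byte entry size is a hypothetical figure, not ESE's real record layout. It counts how many page reads are needed to reach a record when nothing is cached, comparing wide 4KB pages with a binary tree.

```python
PAGE_SIZE = 4 * 1024  # ESE page size, per the article


def btree_levels(num_records: int, fanout: int) -> int:
    """Page reads needed to reach a record in a tree where every page holds
    `fanout` entries, assuming nothing is cached (a simplified model)."""
    if num_records < 1 or fanout < 2:
        raise ValueError("need at least one record and a fanout of 2 or more")
    levels, reachable = 1, fanout
    while reachable < num_records:  # add a level until the tree can hold them all
        reachable *= fanout
        levels += 1
    return levels


ENTRY_SIZE = 20                     # hypothetical bytes per key + page pointer
fanout = PAGE_SIZE // ENTRY_SIZE    # 204 entries per 4KB page

print(btree_levels(1_000_000, fanout))  # 3
print(btree_levels(1_000_000, 2))       # 20
```

Under these assumptions a million records are three page reads away, versus twenty for a binary tree, and the top levels are small enough to stay in the memory buffers, which reduces the real cost further.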

The structure of the ESE B+trees is not important to the store process, and in fact the store does not see this structure. Instead it only sees its own set of tables as described in the previous section. When the store saves a message to ESE, that message will be written to one or more pages within a B+tree in the ESE files.

The Checksum and -1018 Errors

The first four bytes of every page contain the checksum of the page. When a page is ready to be written to the disk, the last thing ESE does is calculate a checksum based on the data in the page and write it into those four bytes. The page header also contains the page number of the page itself.
Each time a page is read off the disk, the first thing ESE does is recalculate the checksum of the data and compare it to the checksum stored in the first four bytes of the page; the two should be identical. ESE also checks the page number of the page to make sure it is the page that was requested. If either of these tests fails, ESE reports a -1018 error (a Jet Read Verify error). Basically, ESE is saying that the data it got off the disk was not the same data that was written to the disk.
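The two read-time tests can be sketched as follows. This is an illustration only: the page layout (checksum in bytes 0 to 3, page number in bytes 4 to 7) and the XOR-of-32-bit-words checksum are simplified stand-ins chosen for the example, not ESE's real on-disk format or checksum algorithm, and `ReadVerifyError` is a hypothetical name for the -1018 condition.

```python
import struct

PAGE_SIZE = 4096


class ReadVerifyError(Exception):
    """Stand-in for ESE's -1018 (read verify) error."""


def page_checksum(after_checksum_field: bytes) -> int:
    # Simplified stand-in checksum: XOR of the 32-bit little-endian words
    # that follow the 4-byte checksum field, seeded with an arbitrary constant.
    value = 0x89ABCDEF
    for (word,) in struct.iter_unpack("<I", after_checksum_field):
        value ^= word
    return value


def write_page(pgno: int, data: bytes) -> bytes:
    if len(data) > PAGE_SIZE - 8:
        raise ValueError("data does not fit in one page")
    # Hypothetical layout: [checksum:4][page number:4][data, zero-padded]
    body = struct.pack("<I", pgno) + data.ljust(PAGE_SIZE - 8, b"\0")
    # The checksum is the last thing computed before the page goes to disk.
    return struct.pack("<I", page_checksum(body)) + body


def read_page(raw: bytes, expected_pgno: int) -> bytes:
    stored_checksum, pgno = struct.unpack_from("<II", raw)
    # Test 1: recompute the checksum. Test 2: is this the page we asked for?
    if stored_checksum != page_checksum(raw[4:]) or pgno != expected_pgno:
        raise ReadVerifyError(f"-1018 reading page {expected_pgno}")
    return raw[8:]
```

Flipping any byte of a written page, or asking for page 101 when page 100 came back, makes `read_page` raise the stand-in -1018 error.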

Note:
You can dump any page from an offline ESE file using ESEUTIL as follows:
Example: ESEUTIL /m I:\Exchsrvr\mdbdata\SG1MS1.edb /p100




Figure 1-3

The above command will read and dump page 100 from SG1MS1.edb. For example, if you dump page 4 and page 24 you will see that they are identical because that is where the System Catalogue is mirrored for resilience as described earlier.

Transient / Hard Faults and Retries

Sometimes -1018 errors are returned because a heavy workload exposes a bug in the hardware or in a software driver, while the data physically on the disk is actually fine. This is considered a transient fault.

A hard fault means that the data is actually wrong (or corrupt) on the hard disk surface. It means something went wrong when the page was physically written or the disk became damaged afterwards.

Through experience with customers over the years, Microsoft concluded that a lot of -1018 errors are actually transient. So in Exchange 5.5 SP2 retry logic was added to ESE; if a read fails with -1018 then it retries up to 16 times with a short pause between each read. Each attempt is logged in the Application log. If after 16 retries ESE still cannot successfully read the page then the store will be dismounted. If you find transient faults in the event logs you should investigate your disk immediately as there could be an imminent failure.
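The retry behaviour described above can be sketched as a simple loop. The function and exception names are hypothetical, `log` stands in for writing to the Application log, and the pause length is an assumption; the article only states "a short pause". The loop makes one initial read plus up to 16 retries before giving up.

```python
import time
from typing import Callable


class ReadVerifyError(Exception):
    """Stand-in for a -1018 error returned by a single page read."""


class StoreDismounted(Exception):
    """Hypothetical: raised when the page is still unreadable after all retries."""


MAX_RETRIES = 16  # per the article, ESE retries a -1018 read up to 16 times


def read_with_retry(read_page: Callable[[int], bytes], pgno: int,
                    log: Callable[[str], None], pause: float = 0.01) -> bytes:
    for attempt in range(1 + MAX_RETRIES):          # initial read + retries
        try:
            data = read_page(pgno)
        except ReadVerifyError:
            log(f"page {pgno}: -1018 on attempt {attempt + 1}")  # Application log entry
            if attempt < MAX_RETRIES:
                time.sleep(pause)                    # short pause before retrying
            continue
        if attempt:
            log(f"page {pgno}: read succeeded after {attempt} retries (transient fault)")
        return data
    raise StoreDismounted(f"page {pgno} unreadable after {MAX_RETRIES} retries")
```

A read that fails a few times and then succeeds leaves a trail of log entries ending in "transient fault", which is exactly the pattern that should prompt a disk investigation.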

ESE Transactions

Changes to the database are applied through transactions. A transaction is a series of operations which when complete leave the database in a healthy (consistent) state. An operation is the smallest unit of change to a database, and a transaction must always end with a commit operation to indicate that it has finished. By bundling operations into a transaction we can ensure that operations which leave the database in an inconsistent state can be rolled back if there is a crash halfway through a transaction.

Transactions have the ACID properties defined below:

Atomic – A transaction is an indivisible unit of change: either all of its operations are completed, or they are all rolled back, leaving the database as it was after the last completed transaction.
Consistent – A committed (i.e. completed) transaction always leaves the database in a consistent state.
Isolated – The operations of a transaction are not visible to other processes until the whole transaction is complete. In other words, during a transaction, other processes see the database as it was before the transaction started. Only when the transaction is complete do they see the changes, all in one go.
Durable – Once a transaction has completed, it is permanent. In other words, you will not lose that transaction. Transaction log files help to achieve this by ensuring that a database can always be recovered to its last committed transaction when a failure occurs.

For example, consider moving a message from one folder to another. This involves many modifications: deleting the message from the source folder, adding it to the target folder, and updating the properties of each folder (e.g. item count, read/unread status, etc.). Either all of these operations must be completed or none at all; otherwise we end up with inconsistencies, such as the message disappearing if we crashed halfway through. ESE ensures that none of the operations are permanently applied until the transaction is committed, and when it is committed all of the operations are permanently applied.
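The all-or-nothing behaviour of the folder move can be illustrated with a small undo-based transaction. This is a hypothetical sketch of atomicity and rollback only: it is not how ESE implements transactions internally, it says nothing about durability (which the logs provide), and the dictionary keys such as "Inbox.count" are invented for the example.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class Transaction:
    """Hypothetical undo-based sketch: each change records a before-image;
    on failure the changes are undone in reverse order."""

    def __init__(self, db: dict):
        self.db = db
        self.undo = []

    def set(self, key, value):
        self.undo.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:               # crashed mid-transaction: roll back
            for key, old in reversed(self.undo):
                if old is _MISSING:
                    del self.db[key]
                else:
                    self.db[key] = old
        self.undo.clear()                      # no exception: commit, keep changes
        return False                           # never swallow the exception


def move_message(db: dict, msg: str, src: str, dst: str, crash_midway: bool = False) -> None:
    with Transaction(db) as tx:
        tx.set(src, [m for m in db[src] if m != msg])  # delete from source folder
        if crash_midway:
            raise RuntimeError("simulated crash")
        tx.set(dst, db[dst] + [msg])                    # add to target folder
        tx.set(src + ".count", len(db[src]))            # update folder properties
        tx.set(dst + ".count", len(db[dst]))
```

If the simulated crash fires after the message has been removed from the source folder, the rollback restores the source folder, so the message never "disappears".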

Transaction Logs

  • Changes to the database occur through transactions.
  • Transactions are executed in memory for performance.
  • Changed pages in memory are at risk of loss.
  • Writing transactions to the database files is slow.
  • Therefore, every transaction is copied to fast, sequential log files as it is committed.
  • Later, in the background, the changes are written to the database files.
  • A mounted database is always "inconsistent"; a clean shutdown results in a "consistent" database.

For performance reasons, ESE performs all transactions in memory. However, we must consider what will happen if ESE or the machine crashes; all data in memory will be lost. Remember that once a transaction has committed it must be permanent otherwise we could lose data or introduce corruption.

  • All Exchange transaction log files have a fixed size of 5MB.
  • The current log file is named Exx.log
      - where xx is the storage group identifier (e.g. 00, 01, 02 or 03)
  • Previous logs are named Exxnnnnn.log
      - where nnnnn is a 5-digit hex number that is incremented for each new log

 

EDB/STM files are Slow

Writing the transaction to the EDB/STM files is an expensive process. These files are large random-access files, and ESE would spend the majority of its time waiting for the disk heads to visit all the pages affected by the transaction (remember that data may be fragmented across multiple pages in the database file).

Transaction Log Files

Instead, ESE uses very fast (and small), sequential transaction log files. Every transaction that occurs in memory must be immediately written to the end of the log file before the transaction is considered committed. That way we can guarantee that all committed transactions are indeed permanent, even if we have a crash and lose everything in memory.

Transaction log files are always 5MB in size for Exchange databases. Transaction log files are shared by all databases within each storage group. The current transaction log file that ESE is using is called Exx.log where xx is the storage group identifier (E00.log for the first storage group, E01.log for the second, E02.log for the third and E03.log for the fourth).

When a transaction log file becomes full, it is renamed using a 5-digit sequential Hex number. A new Exx.log is then created. Previous log files are critical to recovery procedures and should never be deleted manually because they can be used to reconstruct information which may be missing from a backup. See the Disaster Recovery section for more details.
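The naming scheme can be captured in a few helper functions. These are a sketch with hypothetical names; two details are assumptions rather than statements from the article: that the sequence starts at 1, and that the hex digits are written in uppercase (the parser accepts either case).

```python
import re

MAX_GENERATION = 0xFFFFF  # largest value that fits in 5 hex digits


def current_log_name(sg: int) -> str:
    """Exx.log, where xx is the storage group identifier (00-03)."""
    if not 0 <= sg <= 3:
        raise ValueError("storage group identifiers run from 00 to 03")
    return f"E{sg:02d}.log"


def previous_log_name(sg: int, generation: int) -> str:
    """Exxnnnnn.log, where nnnnn is the 5-digit hex sequence number."""
    if not 1 <= generation <= MAX_GENERATION:  # assumed to start at 1
        raise ValueError("generation must fit in 5 hex digits")
    return current_log_name(sg)[:3] + f"{generation:05X}.log"


def generation_of(filename: str) -> int:
    """Recover the sequence number so previous logs can be put in order."""
    m = re.fullmatch(r"E(\d\d)([0-9A-Fa-f]{5})\.log", filename)
    if m is None:
        raise ValueError(f"not a previous log file name: {filename}")
    return int(m.group(2), 16)
```

For example, the 26th log of the second storage group would be `E010001A.log`, and sorting file names with `key=generation_of` recovers the log sequence that recovery depends on.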

Database Consistency

Transactions have to be applied to the EDB/STM files at some point in time, and a background process handles this task. Modifications to the database may remain in memory for many seconds before being written to the database file. This means that while the store is mounted, the database file does not necessarily contain all of the data. We say that the database is inconsistent. Consistency in this sense has nothing to do with the health of the database file; the database is perfectly fine. We are simply indicating that not all of the data is in the file yet. In fact, the EDB/STM header contains a consistency bit, which is always set to False while the database file is online.

If a database is dismounted (or shut down) cleanly, ESE flushes all transactions in memory to the file and marks the file as consistent. This indicates that all the data in the database is contained in the EDB/STM files.

If the database crashes, then the file would be inconsistent and ESE would discover this when it tried to mount the database. In this case ESE would initiate a Soft Recovery which is covered in a later section.

To view the consistency bit of a database file, run the following command to dump the database header (the file must be offline for ESEUTIL to work):

ESEUTIL /mh priv1.edb

Look for the property called State, which indicates whether the database file is consistent. An example screenshot is given below:

 

Figure 1-4
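If you check many databases, reading the State line can be scripted. The sketch below assumes you have saved the header dump to a text file (for example `ESEUTIL /mh priv1.edb > header.txt`) and that the line reads `State: Clean Shutdown` or `State: Dirty Shutdown`; check the exact wording against your own ESEUTIL output before relying on it. The sample text is an abbreviated, illustrative excerpt, not a real dump.

```python
import re


def database_state(header_dump: str) -> str:
    """Return the value of the 'State:' field from ESEUTIL /mh output."""
    match = re.search(r"^[ \t]*State:[ \t]*(.+?)[ \t]*$", header_dump, re.MULTILINE)
    if match is None:
        raise ValueError("no State: field found in the header dump")
    return match.group(1)


def is_consistent(header_dump: str) -> bool:
    # Assumption: a consistent file reports "Clean Shutdown"; anything else
    # (e.g. "Dirty Shutdown") means soft recovery is still needed.
    return database_state(header_dump).lower() == "clean shutdown"


# Abbreviated, illustrative excerpt; real output contains many more fields.
sample = """
                 State: Dirty Shutdown
"""
```

With a real dump you would pass the file's contents, e.g. `is_consistent(open("header.txt").read())`.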

Deleting Log Files

Log files should never be deleted manually if you want to be able to recover data. However, as log files are generated they take up disk space. An Exchange online backup removes log files older than the checkpoint once the database has been backed up to tape.

If you need to recover disk space quickly, you can dismount all the databases in a storage group. This ensures that all transactions have been flushed to the database files. You can then delete all the log files. When the stores are remounted, ESE creates a new Exx.log file and starts a new log series. You should immediately perform a full backup of Exchange, because previous backups are now invalidated: the log sequence has been reset.

ESE Operation


Figure 1-5

Figure 1-5 shows a typical cycle that occurs when a transaction needs to be executed. In this example the transaction involves modifying page number 7 in the EDB file. The process is as follows:

  1. Page 7 is read into a memory buffer (ESE verifies the checksum as described earlier).
  2. The operations are applied to page 7.
  3. The transaction is immediately written to the transaction log file (only at this stage do we consider the transaction to be committed).
  4. At some point in the future (indeterminate), the page is flushed to the database file (only committed transactions, i.e. those written to the log file, can be flushed to the database file).
  5. The checkpoint is rolled forward to indicate up to which point in the transaction logs information has been flushed to the database files.
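The five steps can be modelled with a toy engine. Everything here is a simplification with hypothetical names: pages are single integers, the log stores the resulting page value rather than real log records, and checksum verification is omitted. It does show the two rules that matter: a change is committed only once it is in the log, and the checkpoint can only advance to the oldest change that is still not on disk, because pages are flushed in any order.

```python
from typing import Callable, Dict, List, Tuple


class MiniESE:
    """Toy model of the cycle above (not ESE's real implementation)."""

    def __init__(self, disk: Dict[int, int]):
        self.disk = disk                       # page number -> value (the EDB file)
        self.cache: Dict[int, int] = {}        # memory buffers
        self.dirty: Dict[int, int] = {}        # page -> log position of its oldest unflushed change
        self.log: List[Tuple[int, int]] = []   # sequential transaction log
        self.checkpoint = 0                    # everything before this log position is flushed

    def commit(self, pgno: int, op: Callable[[int], int]) -> None:
        if pgno not in self.cache:
            self.cache[pgno] = self.disk[pgno]        # 1. read the page into memory
        self.cache[pgno] = op(self.cache[pgno])       # 2. apply the operation in memory
        self.log.append((pgno, self.cache[pgno]))     # 3. append to the log: now committed
        self.dirty.setdefault(pgno, len(self.log) - 1)

    def flush(self, pgno: int) -> None:
        self.disk[pgno] = self.cache[pgno]            # 4. lazy write, in any page order
        del self.dirty[pgno]
        # 5. roll the checkpoint forward to the oldest change not yet on disk
        self.checkpoint = min(self.dirty.values(), default=len(self.log))
```

If page 9 is flushed before page 7, the checkpoint stays at page 7's log position; it only moves to the end of the log once page 7 has also been written.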

ESE repeats this cycle continuously for transactions during normal operations. Note two important points:

  1. Transactions are written to disk twice, first to the log and then eventually to the database files.
  2. During normal operations ESE only ever writes to the transaction logs; it reads from them only during recovery. This becomes significant when we look at optimising disk configurations later in this article.

Creating New Log Files

Once the transaction log is full (5MB), Exchange needs to create a new one. The process:
  • Must keep old logs
  • Must not result in a corrupt Exx.log
  • Must remember the sequence of logs
  • Therefore, renames the current log using a sequence number

Create new log:

  • Create Exxtemp.log
  • Initialise Exxtemp.log with header information
  • Rename Exx.log to Exxnnnnn.log
  • Rename Exxtemp.log to Exx.log
  • Start using Exx.log

 


When the current transaction log becomes full, ESE must create a new file. While doing this, ESE must ensure that the current log file does not become corrupt, which could happen if ESE crashed during the creation of the new file. To prevent this, ESE creates a temporary log file and initialises it with the necessary header information. Once that file is ready, the full log is renamed and the temporary file is renamed as the current log file.
Previous log files are named with a 5-digit hex number. Because it is critical that the sequence of transactions is known, this number is incremented every time a new log file is created.
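The create-then-rename sequence can be sketched with ordinary file operations. This is a hypothetical illustration, not ESE code: the temporary file name follows the slide above, the header bytes are whatever the caller passes in, and the caller is assumed to track the generation number of the log being retired.

```python
import os

LOG_SIZE = 5 * 1024 * 1024  # Exchange transaction logs are always 5MB


def roll_log(log_dir: str, sg: int, full_generation: int, header: bytes) -> str:
    """Replace a full Exx.log using the temp-file-then-rename sequence."""
    prefix = f"E{sg:02d}"
    current = os.path.join(log_dir, prefix + ".log")
    temp = os.path.join(log_dir, prefix + "temp.log")   # temp name as given above
    archived = os.path.join(log_dir, f"{prefix}{full_generation:05X}.log")

    # 1-2. Create the temporary log and initialise it with header information,
    #      making sure it is fully on disk before anything is renamed.
    with open(temp, "wb") as f:
        f.write(header.ljust(LOG_SIZE, b"\0"))
        f.flush()
        os.fsync(f.fileno())
    # 3. Rename the full Exx.log to Exxnnnnn.log.
    os.rename(current, archived)
    # 4. Rename the temporary file to Exx.log; 5. the caller starts using it.
    os.rename(temp, current)
    return archived
```

The point of the ordering is that a crash at any step never leaves a half-written Exx.log: before step 3 the old log is untouched, and after it the archived log is intact while the replacement is already fully initialised.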

Checkpoint File



Figure 1-6

The checkpoint file is a small (8KB) file which records which transactions in the log files have already been flushed to the database file. The checkpoint points to the next un-flushed transaction in the log series. In other words, every transaction older than the checkpoint is known to have already been written to the database file.

Transactions after the checkpoint may or may not have been flushed. Remember that transactions are not flushed in the same order in which they occur in the logs; ESE uses its own algorithm, not the log order, to decide what to flush in order to free up memory.

The checkpoint is only ever used during Soft Recovery (see Soft Recovery later in this article) and is in fact not essential. If the checkpoint is not available during soft recovery, ESE can still recover the database, but the operation may take much longer. Hard Recovery (see the Disaster Recovery section) does not use the checkpoint at all.
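How the checkpoint shortens recovery can be sketched as the choice of where replay starts. The function name is hypothetical, log files are represented only by their generation numbers, and raising an error when the checkpointed log is missing is this sketch's own choice rather than documented ESE behaviour.

```python
from typing import List, Optional


def logs_to_replay(available: List[int], checkpoint_generation: Optional[int]) -> List[int]:
    """Log generations soft recovery would replay, in order."""
    generations = sorted(available)
    if checkpoint_generation is None:
        # No checkpoint file: start from the oldest log (correct, just slower).
        return generations
    if checkpoint_generation not in generations:
        # Sketch's own choice: refuse to guess if the checkpointed log is missing.
        raise FileNotFoundError(f"log generation {checkpoint_generation:#x} is missing")
    # Everything older than the checkpoint is already in the database file.
    return [g for g in generations if g >= checkpoint_generation]
```

With logs 1 to 4 and the checkpoint in log 3, only logs 3 and 4 are replayed; without a checkpoint all four are, which is why recovery then takes longer.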

The checkpoint file is called Exx.chk, where xx is the storage group designator.

Note:
To see the state of the checkpoint file you can use the following:

ESEUTIL /mk E00.chk

You can run this command even while the databases are online. Look for the field labelled Checkpoint, which shows three values separated by commas: the first value indicates the transaction log file that the checkpoint is at, and the other two values give the offset into that transaction log file.

File Signatures


  • Each series of log files has its own signature.
  • Each database storage file has its own signature.
  • It is vital that log file operations are only replayed into the database for which they were originally destined.
  • ESE cross-checks signatures:
       - Checks for a log file signature match
       - Checks for a database signature match


It is vital that ESE associates database files with their own transaction logs. Introducing transaction logs from a different set of database files (e.g. during recovery) will generate corruption in the database.

For this reason ESE uses file signatures to verify that the correct log files are being used. Each set of log files has a unique signature. This signature is recorded in the header of every log file in a particular series. If a new series is generated (e.g. if you manually deleted log files), a new file signature is also generated.

The database files contain a reference to the log file signatures. They also have their own file signatures, which are in turn referenced by the log files. ESE cross-matches both sets of signatures when mounting a store to ensure that the log files do indeed belong to the current database files. If there is a signature mismatch, the store will not be mounted and errors are logged in the event log.

The signatures themselves consist of a timestamp and a random number to ensure that they are unique.
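The two-way cross-check can be sketched with a pair of header records. The class and field names are hypothetical and the real headers hold far more information; the sketch only models what the text describes: each signature is a timestamp plus a random number, the database records its log series' signature, and the log series records the database's signature.

```python
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    timestamp: float   # when the file / log series was created
    random_part: int   # random number making collisions very unlikely


def new_signature() -> Signature:
    return Signature(time.time(), random.getrandbits(32))


@dataclass(frozen=True)
class DatabaseHeader:
    own_signature: Signature
    log_signature: Signature   # the log series this database belongs to


@dataclass(frozen=True)
class LogHeader:
    own_signature: Signature   # same value in every log file of the series
    db_signature: Signature    # the database this series was written for


def can_replay(db: DatabaseHeader, log: LogHeader) -> bool:
    # Cross-check both directions before replaying anything.
    return db.log_signature == log.own_signature and log.db_signature == db.own_signature
```

A log from a different series, or one written for a different database, fails the check, and in that case the store would refuse to mount rather than risk corrupting the database.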

Soft Recovery

There are two types of recovery that ESE can perform, Soft recovery and Hard recovery. Here we explain only Soft recovery. Soft recovery is an automatic process where Exchange can recover data after an unexpected shutdown such as a computer crash or forced power down.

What Happens When the Store Crashes?

Remember that when a database is online, its state is set to not consistent. If there is a crash or an unexpected stop, the database will not have shut down cleanly. This has the following consequences:

  • Dirty transactions in memory will not have been flushed to the database file.
  • Dirty transactions will exist in the transaction logs.
  • The checkpoint will indicate up to which point in the log files entries have been flushed to the database file. The checkpoint in this case lies somewhere before the end of the log series because of the un-clean shutdown.
  • The state property in the database header will be set to not consistent.

Recovering from the Crash

As soon as Exchange is restarted and the database is remounted, recovery of the transactions that were in memory just before the crash starts automatically. The following process happens when the store is brought back online:

  • The database is mounted, and as soon as ESE sees that the database state is set to not consistent, ESE goes into soft recovery mode.
  • ESE looks at the checkpoint and starts with the transaction log file indicated by the checkpoint. (If the checkpoint is not available, ESE simply starts with the oldest log file, which takes longer to complete.)
  • ESE reads each transaction from the log files right up to the end. For each transaction, ESE reads the affected page from the database file and compares the page's dbTime* value with the dbTime recorded for the transaction in the log file; if the transaction in the log file is newer, ESE executes it.
  • When ESE finishes replaying the transactions, it will perform a clean dismount of the database. This causes all transactions in memory to be flushed to the database file, and sets the file state to consistent.
  • ESE then remounts the database. Because the state is consistent ESE enters normal mode and regular operations can continue.

*dbTime is a number which starts at zero and is incremented every time there is a change in the database. Each ESE database has a dbTime which is recorded in the EDB file header. Every time a page is modified, ESE increments dbTime and stamps the new value on the page. This allows ESE to work out whether a transaction in a log file is newer or older than the page it is trying to modify on the disk.
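The replay decision can be sketched as follows. This is a simplified model with hypothetical names: each log record carries a page number, the dbTime of the change and the resulting page contents, whereas real log records describe operations. The `start` parameter plays the role of the checkpoint position.

```python
from typing import Dict, List, Tuple

Page = Tuple[int, str]            # (dbTime stamped on the page, page contents)
LogRecord = Tuple[int, int, str]  # (page number, dbTime of the change, new contents)


def soft_recover(pages: Dict[int, Page], log: List[LogRecord], start: int = 0) -> int:
    """Redo every logged change, from `start` (the checkpoint) to the end,
    that is newer than the copy of the page on disk. Returns the redo count."""
    redone = 0
    for pgno, change_time, contents in log[start:]:
        page_time, _ = pages.get(pgno, (0, ""))
        if change_time > page_time:
            # The page on disk predates this change: apply it again.
            pages[pgno] = (change_time, contents)
            redone += 1
        # Otherwise the change was flushed before the crash: skip it.
    return redone
```

Because the comparison skips changes already on disk, running the replay a second time redoes nothing; this is what makes it safe for ESE to start replaying from the checkpoint, or even from the oldest log.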

You can view the current dbTime of a database with the following command:

ESEUTIL /mh priv1.edb

Look for the dbTime: value. You can also view the dbTime on an individual page in the database. This value indicates when the page was last modified (or dirtied):

ESEUTIL /m priv1.edb /p150

The above example will dump page number 150 from priv1.edb. Look for the value called dbTimeDirtied.

Circular Logging



Figure 1-7

By default, circular logging is disabled on Exchange 2000. This means that when ESE needs a new transaction log file, it creates a brand new one by allocating 5MB of disk space. In other words, it keeps all previous log files, which of course takes up disk space. The only safe way of removing previous log files is to perform regular online backups.

With circular logging enabled, when ESE needs a new transaction log file it will simply rename and overwrite an existing previous log whose transactions have already been flushed to the database file. In other words it will overwrite log files which are older than the checkpoint. If there are no log files older than the checkpoint (maybe because of high load the server has not had time to flush information to the database file) only then will it create a new log file.
Typically with circular logging you will see a handful of log files (four or five) which ESE is constantly circulating through. It is important to note that although soft recovery is still available with circular logging (because ESE only overwrites logs older than the checkpoint), hard recovery is not.

This means that if you lose the database completely you can only recover to when the database was backed up. Without circular logging, ESE can roll forward changes made after the backup and bring the database to its most current state.
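The decision ESE makes when it needs a new log, with and without circular logging, can be sketched as a small function. The name and return values are hypothetical; log files are identified by generation number, and a log counts as reusable only if it lies entirely before the checkpoint, as described above.

```python
from typing import List, Tuple


def next_log_file(previous: List[int], checkpoint_generation: int,
                  circular: bool, new_generation: int) -> Tuple[str, int]:
    """Decide how ESE obtains its next log file, per the description above."""
    if circular:
        # Only logs entirely older than the checkpoint are safe to overwrite.
        reusable = sorted(g for g in previous if g < checkpoint_generation)
        if reusable:
            return ("reuse", reusable[0])
    # Circular logging off, or nothing reusable yet: allocate a new 5MB file.
    return ("create", new_generation)
```

With circular logging off the answer is always "create", so every log survives for a later roll-forward; with it on, old logs are recycled, which is exactly why hard recovery past the last backup is no longer possible.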

When to Use Circular Logging

In short: never, if your data is important enough to be backed up. The cost of an extra disk to hold transaction logs is probably much less than the cost of losing days of data after a restore.
However, you should use circular logging on databases which are never backed up. If you do not, then your disks will quickly fill up and cause the store to stop because there is no other process which removes the log files. Examples of databases which do not need to be backed up include dedicated connector servers and public stores holding NNTP newsfeeds.

Optimum Disk Configuration

  • ESE can recover from failed database files or from failed log files, but not both
      - Keep log files and database files separate.
  • Exchange continuously writes to log files sequentially
      - Keep log files by themselves on write-optimised disks for maximum performance (preferably on mirrored drives).
      - This applies to each storage group.
      - This can increase log file performance significantly (up to 40%).
  • EDB/STM/CHK files are accessed randomly
      - These can be kept together, even from different storage groups, without affecting performance (usually on a RAID 5 array).
      - For extra fault tolerance, keep each storage group separate.


When designing a disk configuration for an Exchange server, there are two main criteria to keep in mind:

  1. Maximum Fault Tolerance – We want to be able to recover fully from a failure, i.e. we do not want to lose any data, even if that data is not on backup media.
  2. Maximum Performance – We want to minimise disk bottlenecks as much as possible.

Understanding how ESE operates under normal conditions and how it can recover lost data is essential in order to design the optimum disk configuration. The points are as follows:

Fact 1

ESE can fully recover from a failure in the database files, or from a failure in the transaction log files, but not from both failures occurring at the same time. When we say ESE cannot fully recover from a simultaneous failure, we mean that in such a scenario you can only recover up to the time of the last backup, and you will lose data that was introduced after the backup. A full recovery restores all data right up to the point of failure:

  • Conclusion 1 – Keep the database files and their associated log files on physically separate disks from each other. That way a single disk failure will not prevent ESE from fully recovering its data.
  • Conclusion 2 – Use fault tolerant disk drives (e.g. RAID).

Fact 2

ESE is constantly writing to the log files in a sequential manner. ESE never reads from the log files under normal operations:

  • Conclusion 1 – Keep the log files on write-optimised disks, i.e. if configurable, reserve all cache for write operations. The best write-performing configuration is a striped volume (RAID 0), but this does not provide fault tolerance. RAID 0+1 is a mirrored stripe volume and is ideal. An alternative is standard mirroring (RAID 1).

Important:
If using write-back caching, ensure that you use battery backed up controllers to minimise the risk of data loss during a power failure.

  • Conclusion 2 – Ensure that no files other than the transaction logs for a specific storage group are on this disk. In other words, there should be a separate log disk for each storage group on your server. The idea is to keep the disk head at the end of the current transaction log file (Exx.log) as much as possible, because physically moving the heads across the surface of the disk is one of the slowest parts of disk operations. If there are other files on the disk, the head will keep jumping away from the current transaction log to reach them. This includes the checkpoint file (Exx.chk), which should be placed with the database files and not with the transaction logs.

Fact 3

The database files (EDB and STM) are accessed randomly and will be the largest files on an Exchange server. Also, the majority of disk I/O on the database files consists of reads: approximately twice as many reads as writes, or more:

  • Conclusion 1 – Keeping the database files from several storage groups together on one disk will not affect performance because all access is random. The large nature of EDB and STM files will mean that RAID 5 (striping with parity) offers the most efficient fault tolerant configuration.

  • Conclusion 2 – Optimise mainly for read operations. Remember that controller caching will have a limited effect on performance because ESE has its own in-memory caching anyway, although caching will still improve performance overall. Also, for write-back caching, ensure that you use battery-backed controllers.

  • Conclusion 3 – For maximum fault tolerance, and to reduce the scope of disk failures it is ideal to place each storage group’s database files on their own disks. This means that if a database disk fails, it will only affect users on that storage group, other storage groups will remain online, even during the recovery process.

Fact 4

The OS (system and boot partitions) and the Exchange binary files (\Program Files\Exchsrvr) are fairly static files. The paging file (Pagefile.sys) is accessed fairly regularly by Windows. Finally, the SMTP Queue folder is accessed heavily by Exchange, especially on a connector server sending and receiving many messages via SMTP. This folder is by default located with the Exchange binaries (\Program Files\Exchsrvr\Mailroot\vs1\Queue).

  • Conclusion 1 – Place the OS and Exchange binaries on their own partition, ideally fault tolerant. A RAID 1 (mirror) configuration is sufficient.

  • Conclusion 2 – If possible, place the page file on its own dedicated disk to increase performance.

  • Conclusion 3 – On bridgehead servers and servers which send and receive large amounts of SMTP messages, move the SMTP Queue folder to a faster disk partition (if not already there).

STM File

In previous versions of Exchange:
- All mail was stored in MAPI format (compressed rich text).
- The IMAIL process was used to convert between MAPI and Internet formats (MIME and uuencode).
- In an Internet-based environment (e.g. an ISP) this results in a big performance hit.
In Exchange 2000:
- Messages submitted by a MAPI client are stored in MAPI format.
- Messages submitted by other clients (e.g. SMTP/HTTP/WebDAV/IFS) are stored in their native format, i.e. not converted.
- This reduces IMAIL conversion in both Internet and MAPI environments.
- In mixed environments, conversions are still needed.


The STM file was introduced to increase the efficiency of storing and accessing Internet-based data in Exchange stores. In previous versions of Exchange, all data was stored in the EDB file as MAPI properties. This meant that any message arriving from an Internet client (e.g. via SMTP) had to be converted to MAPI format.

IMAIL, a component of the Exchange Information Store, performs this conversion between MAPI and Internet formats (MIME and UUencode). When an Internet client reads a message, that message must go through IMAIL to be converted to the correct Internet format before it is given to the client. Therefore, in an Internet scenario (i.e. very few or no MAPI clients), the Exchange store spends a lot of processing time converting messages.

IMAIL overhead was minimal in previous versions of Exchange because the majority of Exchange organisations were corporate based and used Outlook as their primary email client. Internet clients were normally used as secondary clients, on a much less regular basis.

Exchange 2000/2003 Environments:

Since Exchange 2000/2003 are targeted at ISPs and Hosted Exchange providers, many such organisations are now in a pure Internet environment with very few or no MAPI clients. Therefore Exchange needs to be able to store data in its native (in this case Internet) format. However, at the same time we still need to cater for corporate organisations which are predominantly MAPI based. This is resolved in Exchange by implementing two database files for each store:

  • EDB – This stores data in MAPI format.
  • STM – Stores data in its native format (without conversion).

Where does Exchange Store the Data?

Exchange decides whether to store each incoming message in the EDB file or the STM file as it arrives. To do this, Exchange must predict who is likely to access the message. If it is likely to be accessed by an Internet user, it should be placed in the STM file; if it is likely to be accessed by a MAPI client, it should go in the EDB file. Exchange uses the following, fairly unsophisticated, algorithm:

  • If the message came from a MAPI client then it will probably be accessed by a MAPI client (i.e. EDB file).
  • If message came from SMTP then it will probably be accessed by an Internet client (i.e. STM file).

Of course this simple logic will not always get it right. An incoming Internet message may be read by a MAPI client or vice-versa. In this case IMAIL has to be used to convert the message before being given to the client, as before.

Where the algorithm really succeeds is in environments which are predominantly MAPI or predominantly Internet based. A mixed environment will see relatively little gain, i.e. IMAIL will still be used extensively. Most Exchange organisations are predominantly one type or the other, so in most cases Exchange succeeds in reducing IMAIL activity.
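The effect described in the last two paragraphs can be checked with a small simulation. This is a deliberately simplified model, not Exchange's code: it assumes a message is converted once on arrival if it is stored in a different format from the one it arrived in, and once more on a read if the reader needs the other format. It ignores the TNEF exception covered later, and the version labels are just tags for the two behaviours.

```python
MAPI, INTERNET = "MAPI", "Internet"


def stored_format(version: str, origin: str) -> str:
    """Format the store keeps a newly arrived message in (simplified model)."""
    if version == "5.5":
        return MAPI  # earlier versions: everything lives in the EDB as MAPI properties
    # Exchange 2000: MAPI submissions go to the EDB, everything else to the STM unconverted.
    return MAPI if origin == MAPI else INTERNET


def imail_conversions(version: str, traffic) -> int:
    """Count IMAIL conversions for (origin, reader) pairs, one pair per message read."""
    total = 0
    for origin, reader in traffic:
        stored = stored_format(version, origin)
        total += stored != origin  # converted on the way in
        total += stored != reader  # converted again when read
    return total


corporate = [(MAPI, MAPI)] * 100
isp = [(INTERNET, INTERNET)] * 100
mixed = [(INTERNET, MAPI)] * 50 + [(MAPI, INTERNET)] * 50

for name, traffic in (("corporate", corporate), ("ISP", isp), ("mixed", mixed)):
    print(name, imail_conversions("5.5", traffic), imail_conversions("2000", traffic))
```

In this model the pure Internet workload drops from 200 conversions to none, the corporate workload needs none either way, and a workload where every message is read by the other kind of client stays at 100 under both behaviours, which matches the point that mixed environments gain relatively little.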

STM File Operation



Figure 1-13

As you have seen, the Store decides where to store a message based on its origin: the EDB file for MAPI clients and the STM file for Internet clients.

Folder View Tables

There is an additional consideration. The folder view tables are all stored in the EDB file. All clients, Internet (e.g. POP3, IMAP4 and NNTP) and MAPI, use these tables to see what messages are available to download.

A client cannot download a message unless it can see it in a view table. For example, the POP3 command ‘LIST’ will result in a list of messages in the Inbox showing a message number and the message size. This is derived from a view table in the EDB file.
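To illustrate why LIST never needs the message bodies, here is a minimal Python sketch that builds an RFC 1939 scan listing from view-table rows alone. The `ViewRow` type and its fields are hypothetical stand-ins for the real view table, which ESE stores in its own internal format; the text after `+OK` is free-form in the protocol, so the wording used here is just one choice.

```python
from dataclasses import dataclass


@dataclass
class ViewRow:
    """Hypothetical view-table row: just enough to answer LIST."""
    message_id: str
    size: int  # message size in octets


def pop3_list(rows: list) -> str:
    """Build a multi-line POP3 LIST response from view-table rows only."""
    total = sum(row.size for row in rows)
    lines = [f"+OK {len(rows)} messages ({total} octets)"]
    # Scan listing: "<message-number> <size>", numbering starts at 1.
    lines += [f"{n} {row.size}" for n, row in enumerate(rows, start=1)]
    lines.append(".")  # a lone dot terminates a multi-line response
    return "\r\n".join(lines) + "\r\n"


print(pop3_list([ViewRow("msg-a", 1200), ViewRow("msg-b", 5400)]), end="")
```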

Property Promotion

Therefore, when a message arrives from the Internet, although it resides in its entirety in the STM file, some of its header properties are copied to the EDB file in order to populate the view tables for that folder. This process is referred to as Property Promotion. The entry in the view table contains a pointer to the location of the actual message (which is in the STM file).

So in the above example, after issuing a LIST command, the POP3 client may issue a RETR command to read the message off the server. The store will locate the message in the STM file using the pointer in the view table (which is in the EDB file), and stream the message directly to the client without the need to convert it using IMAIL.

MAPI messages are placed directly into the EDB file where again their properties are copied to the view tables. However, this all happens within the EDB file. Promotion only ever occurs in one direction, from the STM to the EDB file.
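The promotion-plus-pointer arrangement can be modelled with a toy store: an append-only byte buffer stands in for the STM file and a list of dictionaries for the EDB view table. Everything here is a hypothetical simplification (`ToyStore` is not an Exchange API, and the real store uses ESE tables rather than byte offsets); it only shows that RETR can be answered by following the pointer and returning the original bytes untouched.

```python
from email.parser import BytesHeaderParser


class ToyStore:
    """Hypothetical model: STM as raw bytes, EDB view table as promoted properties."""

    def __init__(self):
        self.stm = bytearray()  # native Internet content, stored unconverted
        self.view_table = []    # one row per message, kept "in the EDB"

    def deliver_smtp(self, raw: bytes) -> None:
        offset = len(self.stm)
        self.stm += raw  # the whole message goes to the STM file
        headers = BytesHeaderParser().parsebytes(raw)  # parse headers only
        # Property promotion: copy a few header fields plus a pointer to the content.
        self.view_table.append({
            "subject": headers.get("Subject", ""),
            "from": headers.get("From", ""),
            "size": len(raw),
            "stm_offset": offset,
        })

    def retr(self, number: int) -> bytes:
        """POP3 RETR: follow the pointer and stream the original bytes, no IMAIL."""
        row = self.view_table[number - 1]  # POP3 message numbers start at 1
        start = row["stm_offset"]
        return bytes(self.stm[start:start + row["size"]])


store = ToyStore()
store.deliver_smtp(b"From: alice@example.com\r\nSubject: hello\r\n\r\nFirst body\r\n")
store.deliver_smtp(b"From: bob@example.com\r\nSubject: lunch?\r\n\r\nSecond body\r\n")
print(store.view_table[1]["subject"], store.retr(2))
```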

STM File Operation with TNEF Messages



Figure 1-14

There is one caveat to the process described above. Exchange 2000 uses SMTP to communicate between servers. So what happens when a MAPI client sends a message to a user on a different server? The message should end up in the EDB file, but because it arrives via SMTP, the store logic would place it in the STM file, defeating the purpose.

To handle this situation, the Store makes an exception for messages that originate from MAPI clients but arrive via SMTP. Such messages always carry a header indicating that the content is MAPI, implemented as Content-Type: application/ms-tnef. A message of this kind goes through the following procedure when it arrives on an Exchange store:

  • Message is streamed to STM because it arrived from SMTP
  • The header is promoted to the EDB file to populate the view table
  • The Store sees that this is a TNEF message
  • The Store promotes the entire message to the EDB file by converting it using IMAIL and deletes it from the STM file.

This process ensures that MAPI messages always reside in the EDB file.
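The arrival procedure above boils down to a content-type check. The following sketch uses Python's standard `email` package to look for an `application/ms-tnef` part; the `stm` and `edb` lists and the function names are hypothetical, and the real store's IMAIL conversion of the TNEF content into MAPI properties is not modelled.

```python
from email import message_from_bytes

TNEF_TYPE = "application/ms-tnef"


def is_tnef(raw: bytes) -> bool:
    """True if the message, or any MIME part of it, is application/ms-tnef."""
    msg = message_from_bytes(raw)
    return any(part.get_content_type() == TNEF_TYPE for part in msg.walk())


def place_smtp_arrival(raw: bytes, stm: list, edb: list) -> str:
    """Toy version of the arrival steps; returns where the message ends up."""
    stm.append(raw)  # 1. arrived via SMTP, so it is streamed to the STM
    # 2. header promotion to the view table is omitted in this sketch
    if is_tnef(raw):  # 3. MAPI content wrapped as TNEF?
        edb.append(stm.pop())  # 4. move it to the EDB (the real store converts it here)
        return "EDB"
    return "STM"
```

For example, a multipart message containing an `application/ms-tnef` part ends up in the `edb` list, while a plain `text/plain` message stays in `stm`.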

TNEF

In Exchange 2000, content conversion is done on the server where the user submits the message. Because SMTP is the native transport for Exchange 2000, every message that leaves the server (destined for another Exchange 2000 server) needs to be in Internet (SMTP/MIME) message format.

TNEF stands for Transport-Neutral Encapsulation Format, which is essentially a set of MAPI properties encapsulated in a MIME body part. It is also referred to as a Binary Large Object, or BLOB. Exchange 2000 uses two types of TNEF format, depending on the location of the recipient.

  • Summary TNEF (S-TNEF) – This format is binary (8-bit) encoded and contains no plain-text representation of the message. It is used only if the recipient is in the same routing group as the sender; Exchange 2000 SMTP always uses 8-bit MIME when communicating between servers in the same routing group.

  • Standard TNEF (TNEF) – This format encodes the MAPI message in 7-bit form (quoted-printable or Base64) and also includes a plain-text version of the message. It is used if the recipient is in a different routing group, because Exchange cannot guarantee that the path to that routing group contains only Exchange 2000 servers. For example, an Exchange 5.5 IMS or an ISP could be used to connect the routing groups together. Therefore the 7-bit encoding that standard SMTP services understand must be used.
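The two encodings can be illustrated with Python's standard `email` package. This is only an analogy: `build_tnef_message` is a hypothetical helper, the blob is opaque bytes rather than a real TNEF stream, and Exchange's own MIME generation is not being reproduced. Base64 is used for the 7-bit case, although quoted-printable is also possible as noted above.

```python
from email import encoders
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_tnef_message(blob: bytes, plain_text: str, same_routing_group: bool) -> Message:
    if same_routing_group:
        # Summary TNEF: binary body, no plain-text part; relies on 8-bit MIME on the wire.
        return MIMEApplication(blob, _subtype="ms-tnef", _encoder=encoders.encode_7or8bit)
    # Standard TNEF: plain-text version plus a 7-bit-safe (Base64) TNEF part.
    msg = MIMEMultipart("mixed")
    msg.attach(MIMEText(plain_text, "plain"))
    msg.attach(MIMEApplication(blob, _subtype="ms-tnef", _encoder=encoders.encode_base64))
    return msg


blob = bytes(range(256))  # stand-in for an opaque binary MAPI blob
summary = build_tnef_message(blob, "Hello", same_routing_group=True)
standard = build_tnef_message(blob, "Hello", same_routing_group=False)
print(summary["Content-Transfer-Encoding"])  # transfer encoding of the binary form
print(standard.as_string().isascii())        # is the whole standard form 7-bit ASCII?
```

The summary form keeps the blob as raw 8-bit data, while the standard form can cross any 7-bit SMTP hop, at the cost of a plain-text copy and roughly a third more bytes for the Base64 part.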

Conclusion

We have drilled down into the internal structure of the Information Store. At its simplest, it is a relational database: it stores information in tables and uses matching values in those tables to relate information between them. If you understand Exchange Server's database technology, you can head off problems and optimize performance. In this Part 2 of the three-part series we covered a lot of basics: the Exchange Information Store's structure, how transaction logging works, and the inside of the .EDB and .STM files.

In Part 3 I'll show you how full-text indexing indexes the content of an Exchange database, resulting in faster content searching. You will also see how Exchange full-text indexing gives your Outlook users fast searching of their e-mails and public folders.

Related Links

Exchange Information Store Service Architecture
[3]
[4]Responsibilities of the Information Store

[4]How to Start the Microsoft Exchange Information Store Service (MSExchangeIS)

Microsoft Whitepaper "Best Practices for Deploying Full-Text Indexing"

[5]Microsoft Exchange Server Information Store Viewer (MDBVU32) [6]
[5]

© 2005 MessagingTalk.org. All rights reserved.

Source URL: http://www.messagingtalk.org/content/252.html

Links:
[1] http://www.messagingtalk.org/content/227.html
[2] http://www.messagingtalk.org/user/5/edit/newsletter
[3] http://www.microsoft.com/technet/prodtechnol/exchange/guides/E2k3TechRef/b5b94b4d-02d3-49e4-959f-b8bcf53d340b.mspx
[4] http://www.microsoft.com/technet/prodtechnol/exchange/guides/E2k3TechRef/8bc90fa8-4f2d-4ccc-81a7-3434ee1656c2.mspx
[5] http://www.microsoft.com/downloads/details.aspx?FamilyID=d7d73256-459c-4b5e-827f-256fa21dd38a&displaylang=en
[6] http://www.microsoft.com/downloads/details.aspx?familyid=3D1C7482-4C6E-4EC5-983E-127100D71376&displaylang=en#overview