In this guide, you'll see how to access and parse emails from Outlook and Thunderbird mailboxes using Python.

Here you can find the short answer:

(1) Access Outlook with win32com

import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application")
inbox = outlook.GetNamespace("MAPI").GetDefaultFolder(6)  # 6 = Inbox

(2) Parse Thunderbird mailbox

import mailbox
# Path to a Thunderbird folder file (no extension), e.g. the "Inbox" file
mbox = mailbox.mbox('Inbox')
for message in mbox:
    print(message['subject'])

(3) Parse email with email library

import email
# raw_email: the complete message source as a string (e.g. the text of a .eml file)
msg = email.message_from_string(raw_email)
print(msg['From'], msg['Subject'])
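The quick example above parses with the legacy compat32 policy, so encoded headers such as =?utf-8?q?...?= come back undecoded. A minimal sketch of the same step with email.policy.default, which decodes headers and provides get_body(); the sample message and the parse_raw_email helper are hypothetical:

```python
import email
from email import policy

# Hypothetical sample message; in practice this would come from a .eml file
raw_email = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: =?utf-8?q?Caf=C3=A9_order?=\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Two espressos, please.\n"
)


def parse_raw_email(raw: str) -> dict:
    """Parse a raw RFC 5322 message and return a few decoded fields."""
    msg = email.message_from_string(raw, policy=policy.default)
    # get_body() picks the preferred text part: plain text first, then HTML
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),  # encoded words are already decoded
        "body": body,
    }


fields = parse_raw_email(raw_email)
print(fields["subject"])  # Café order
```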

So let's see how to connect to email clients and extract messages programmatically.

1: Access Outlook mailbox on Windows

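Let's start by accessing Outlook emails on Windows through the win32com.client module (part of pywin32). It drives the desktop Outlook application over COM, so Outlook must be installed and set up with a mail profile: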

Install required package:

pip install pywin32

Basic Outlook connection:

import win32com.client
from itertools import islice

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

inbox = outlook.GetDefaultFolder(6)  # 6 = Inbox

# Keep a reference to the Items collection so the sort order sticks
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)  # True = newest first

print(f"Total emails in Inbox: {messages.Count}")

# COM collections don't support slicing, so take the first 5 with islice
for i, message in enumerate(islice(messages, 5), 1):
    print(f"\n--- Email {i} ---")
    print(f"From: {message.SenderName}")
    print(f"Subject: {message.Subject}")
    print(f"Received: {message.ReceivedTime}")
    print(f"Size: {message.Size / 1024:.2f} KB")

result will be:

Total emails in Inbox: 1247

--- Email 1 ---
From: John Smith
Subject: Q4 Budget Review Meeting
Received: 2024-12-15 09:30:00
Size: 12.45 KB

--- Email 2 ---
From: Sarah Johnson
Subject: Project Alpha Status Update
Received: 2024-12-15 08:15:00
Size: 8.92 KB

--- Email 3 ---
From: Microsoft Teams
Subject: New message in Engineering channel
Received: 2024-12-14 16:45:00
Size: 5.67 KB

Outlook default folder constants (OlDefaultFolders) accepted by GetDefaultFolder():

  • 3 = Deleted Items
  • 4 = Outbox
  • 5 = Sent Items
  • 6 = Inbox
  • 9 = Calendar
  • 10 = Contacts
  • 16 = Drafts
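To avoid magic numbers like 6 in your scripts, the constants above can live in one lookup table. A small sketch; OL_DEFAULT_FOLDERS and default_folder_id are hypothetical names, and only the values come from the list above:

```python
# Values from Outlook's OlDefaultFolders enumeration (see the list above)
OL_DEFAULT_FOLDERS = {
    "deleted": 3,
    "outbox": 4,
    "sent": 5,
    "inbox": 6,
    "calendar": 9,
    "contacts": 10,
    "drafts": 16,
}


def default_folder_id(name: str) -> int:
    """Return the GetDefaultFolder() constant for a friendly folder name."""
    try:
        return OL_DEFAULT_FOLDERS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(OL_DEFAULT_FOLDERS))
        raise ValueError(f"Unknown folder {name!r}; expected one of: {known}") from None


# On Windows with pywin32 you would then write, for example:
# namespace.GetDefaultFolder(default_folder_id("sent"))
print(default_folder_id("Inbox"))  # 6
```

If you create the Outlook object with win32com.client.gencache.EnsureDispatch("Outlook.Application"), pywin32 should also expose these values through win32com.client.constants (for example, constants.olFolderInbox), which avoids maintaining the table yourself.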

2: Extract email content and attachments

Access email body, attachments, and metadata from Outlook messages:

import win32com.client
import os
from itertools import islice

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)

messages = inbox.Items
messages.Sort("[ReceivedTime]", True)

attachment_folder = "C:/email_attachments/"
os.makedirs(attachment_folder, exist_ok=True)

OL_MAIL = 43  # olMail: skip meeting requests, delivery reports, etc.
processed = 0

# COM collections don't support slicing, so take the first 10 items with islice
for message in islice(messages, 10):
    if message.Class != OL_MAIL:
        continue
    processed += 1
    print(f"\nSubject: {message.Subject}")
    print(f"From: {message.SenderEmailAddress}")
    print(f"To: {message.To}")
    print(f"CC: {message.CC}")
    print(f"Date: {message.ReceivedTime}")

    print("\nBody preview (first 200 chars):")
    body = message.Body or ""
    print(body[:200] + "...")

    if message.Attachments.Count > 0:
        print(f"\nAttachments ({message.Attachments.Count}):")
        for attachment in message.Attachments:
            print(f"  - {attachment.FileName}")
            # Note: a later attachment with the same file name may overwrite this one
            attachment_path = os.path.join(attachment_folder, attachment.FileName)
            attachment.SaveAsFile(attachment_path)
            print(f"    Saved to: {attachment_path}")

print(f"\n✓ Processed {processed} emails")

result:

Subject: Invoice #12345 - December 2024
From: [email protected]
To: [email protected]
CC: 
Date: 2024-12-15 10:20:00

Body preview (first 200 chars):
Dear John,

Please find attached your invoice for December 2024. The total amount due is $1,250.00. Payment is due by December 31st.

Thank you for your business.

Best regards,
Accounting Team...

Attachments (2):
  - Invoice_12345.pdf
    Saved to: C:/email_attachments/Invoice_12345.pdf
  - Payment_Details.xlsx
    Saved to: C:/email_attachments/Payment_Details.xlsx

✓ Processed 10 emails
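The Outlook loop above saves each attachment under its original FileName, so two attachments with the same name can overwrite each other, and the name is used as-is in the path. For standard MIME messages (such as those read from a Thunderbird mbox), here is a sketch with the stdlib email package that strips directory parts from the name and numbers duplicates; save_attachments is a hypothetical helper, and the _1, _2 suffix scheme is just one choice:

```python
import os
import tempfile
from email.message import EmailMessage


def save_attachments(msg: EmailMessage, folder: str) -> list:
    """Save every attachment of msg into folder and return the saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        # basename() drops any directory components from the declared name
        name = os.path.basename(part.get_filename() or "attachment.bin")
        path = os.path.join(folder, name)
        stem, ext = os.path.splitext(name)
        counter = 1
        while os.path.exists(path):  # don't overwrite an earlier file
            path = os.path.join(folder, f"{stem}_{counter}{ext}")
            counter += 1
        with open(path, "wb") as fh:
            fh.write(part.get_payload(decode=True))
        saved.append(path)
    return saved


# Test message with two attachments that declare the same file name
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 demo", maintype="application",
                   subtype="pdf", filename="Invoice.pdf")
msg.add_attachment(b"%PDF-1.4 copy", maintype="application",
                   subtype="pdf", filename="../Invoice.pdf")

out_dir = tempfile.mkdtemp()
for p in save_attachments(msg, out_dir):
    print(os.path.basename(p))
```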

3: Parse Thunderbird mailbox files

By default, Thunderbird stores each mail folder as an mbox file (a file without an extension that sits next to a .msf index file), which Python's built-in mailbox module can read:

Locate the Thunderbird profile folder:

  • Windows: C:\Users\YourName\AppData\Roaming\Thunderbird\Profiles\
  • macOS: ~/Library/Thunderbird/Profiles/
  • Linux: ~/.thunderbird/
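Rather than hardcoding one of the paths above, a helper can pick the default profile root for the current OS. A sketch assuming the default install locations listed above (snap, Flatpak, or portable installs use other folders); thunderbird_profiles_root and find_profiles are hypothetical names, and treating any subfolder with a dot in its name as a profile is a heuristic:

```python
import os
import platform
from pathlib import Path
from typing import List, Optional


def thunderbird_profiles_root(system: Optional[str] = None,
                              home: Optional[Path] = None) -> Path:
    """Return the default Thunderbird profiles folder for an OS name."""
    system = system or platform.system()  # "Windows", "Darwin" (macOS) or "Linux"
    home = home or Path.home()
    if system == "Windows":
        # %APPDATA% normally points to C:\Users\<name>\AppData\Roaming
        appdata = os.environ.get("APPDATA")
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "Thunderbird" / "Profiles"
    if system == "Darwin":
        return home / "Library" / "Thunderbird" / "Profiles"
    return home / ".thunderbird"


def find_profiles(root: Path) -> List[Path]:
    """List subfolders that look like profiles (e.g. 'abcd1234.default')."""
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir() and "." in p.name)


root = thunderbird_profiles_root()
print(root, find_profiles(root))
```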

Parse mbox file:

import mailbox
from email.header import decode_header, make_header

# Local Folders inbox; IMAP accounts are stored under ...\ImapMail\<server>\INBOX
mbox_path = r"C:\Users\YourName\AppData\Roaming\Thunderbird\Profiles\xxx.default\Mail\Local Folders\Inbox"

mbox = mailbox.mbox(mbox_path)


def decode_str(value):
    """Decode RFC 2047 encoded words (=?utf-8?q?...?=) into plain text."""
    return str(make_header(decode_header(value))) if value else ""


def text_of(part):
    """Decode a text part with its declared charset (UTF-8 if none or unknown)."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name Python doesn't recognize
        return payload.decode("utf-8", errors="replace")


print(f"Total messages in mailbox: {len(mbox)}")

for idx, message in enumerate(mbox, 1):
    print(f"\n--- Message {idx} ---")
    print(f"From: {decode_str(message['from'])}")
    print(f"Subject: {decode_str(message['subject'])}")
    print(f"Date: {message['date']}")

    body = ""
    if message.is_multipart():
        for part in message.walk():
            # First text/plain part that is not an attached file
            if (part.get_content_type() == "text/plain"
                    and part.get_content_disposition() != "attachment"):
                body = text_of(part)
                break
    else:
        body = text_of(message)
    print(f"Body preview: {body[:150]}...")

    if idx >= 5:
        break

mbox.close()
print("\n✓ Mailbox parsing complete")

result:

Total messages in mailbox: 3892

--- Message 1 ---
From: Sarah Chen <[email protected]>
Subject: Team Meeting Notes - Project Alpha
Date: Mon, 16 Dec 2024 14:30:25 +0000

Body preview: Hi Team,

Here are the key takeaways from today's meeting:

1. Sprint 12 completed successfully
2. Demo scheduled for Friday
3. Client feedback session...

--- Message 2 ---
From: GitHub <[email protected]>
Subject: [repo-name] Pull Request #234 merged
Date: Mon, 16 Dec 2024 09:15:42 +0000

Body preview: Your pull request "Add authentication middleware" has been merged into main branch by @john-doe.

Changes included:
- Added JWT token validation
- Implement...

✓ Mailbox parsing complete
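Instead of decoding headers and payloads by hand, mailbox.mbox accepts a factory, so each message can be parsed with email.policy.default into an EmailMessage whose headers are already decoded. A sketch under that assumption; open_mbox and preview are hypothetical helpers, and the throwaway mbox exists only so the example runs without Thunderbird:

```python
import email
import mailbox
import os
import tempfile
from email import policy
from email.message import EmailMessage


def open_mbox(path: str) -> mailbox.mbox:
    """Open an existing mbox whose messages are parsed with the modern policy."""
    return mailbox.mbox(
        path,
        factory=lambda f: email.message_from_binary_file(f, policy=policy.default),
        create=False,
    )


def preview(msg: EmailMessage, length: int = 150) -> str:
    """Return the start of the plain-text (or HTML) body, or '' if none."""
    body = msg.get_body(preferencelist=("plain", "html"))
    return body.get_content()[:length] if body is not None else ""


# Build a throwaway mbox with one message
path = os.path.join(tempfile.mkdtemp(), "Inbox")
demo = mailbox.mbox(path)
m = EmailMessage()
m["From"] = "Sarah Chen <sarah@example.com>"
m["Subject"] = "Café meeting notes"
m.set_content("Hi Team,\nHere are the notes.")
demo.add(m)
demo.close()

box = open_mbox(path)
for msg in box:
    print(msg["Subject"], "|", preview(msg, 40))
box.close()
```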

4: Filter and search emails by criteria

Implement email filtering to find specific messages by sender, subject keyword, and received date:

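import win32com.client
from datetime import datetime, timedelta

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)

def search_emails(folder, sender=None, subject_keyword=None, days_back=7):
    """
    Search emails by received date and sender (via Items.Restrict),
    then by subject keyword (in Python)
    """
    messages = folder.Items

    date_filter = datetime.now() - timedelta(days=days_back)
    # Restrict expects a date string without seconds, e.g. "12/08/2024 09:30 AM";
    # the exact format Outlook accepts depends on the Windows regional settings
    date_filter_str = date_filter.strftime("%m/%d/%Y %I:%M %p")

    filter_str = f"[ReceivedTime] >= '{date_filter_str}'"

    if sender:
        # For Exchange senders, SenderEmailAddress may be an X.500 ("/O=...")
        # address rather than the SMTP address, so this match can miss them
        filter_str += f" AND [SenderEmailAddress] = '{sender}'"

    filtered = messages.Restrict(filter_str)
    filtered.Sort("[ReceivedTime]", True)  # newest first

    results = []
    for message in filtered:
        if subject_keyword:
            if subject_keyword.lower() in (message.Subject or "").lower():
                results.append(message)
        else:
            results.append(message)

    return results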

print("=== Search Example 1: Emails from specific sender (last 7 days) ===")
sender_emails = search_emails(inbox, sender="[email protected]")
print(f"Found {len(sender_emails)} emails from [email protected]")
for msg in sender_emails[:3]:
    print(f"  - {msg.Subject} ({msg.ReceivedTime})")

print("\n=== Search Example 2: Emails with 'invoice' in subject (last 30 days) ===")
invoice_emails = search_emails(inbox, subject_keyword="invoice", days_back=30)
print(f"Found {len(invoice_emails)} emails with 'invoice' in subject")
for msg in invoice_emails[:3]:
    print(f"  - {msg.Subject} ({msg.SenderName})")

print("\n=== Search Example 3: Recent meeting invites ===")
meeting_emails = search_emails(inbox, subject_keyword="meeting", days_back=14)
print(f"Found {len(meeting_emails)} emails with 'meeting' in subject")
for msg in meeting_emails[:3]:
    print(f"  - {msg.Subject} ({msg.ReceivedTime})")

result:

=== Search Example 1: Emails from specific sender (last 7 days) ===
Found 12 emails from [email protected]
  - Weekly Status Report (2024-12-15 16:30:00)
  - Q4 Budget Approval (2024-12-14 10:15:00)
  - Team Lunch This Friday (2024-12-13 09:45:00)

=== Search Example 2: Emails with 'invoice' in subject (last 30 days) ===
Found 8 emails with 'invoice' in subject
  - Invoice #12345 - December 2024 (Accounting Department)
  - Payment Received - Invoice #12300 (Finance Team)
  - Overdue Invoice Reminder (Billing System)

=== Search Example 3: Recent meeting invites ===
Found 15 emails with 'meeting' in subject
  - Team Meeting Notes - Project Alpha (2024-12-16 14:30:00)
  - All Hands Meeting Reminder (2024-12-15 08:00:00)
  - Client Meeting Rescheduled (2024-12-13 11:20:00)
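For Thunderbird there is no Restrict(), so the same search has to be done in Python (the "Manual implementation" row in the comparison below). A sketch that filters an mbox by sender address, subject keyword, and Date header; search_mbox is a hypothetical function, and skipping messages with a missing or unparseable Date is a design choice rather than a rule:

```python
import mailbox
import os
import tempfile
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime, parseaddr, parsedate_to_datetime


def search_mbox(path, sender=None, subject_keyword=None, days_back=7, now=None):
    """Return (date, subject) pairs matching every given criterion, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    box = mailbox.mbox(path, create=False)
    results = []
    try:
        for msg in box:
            try:
                sent = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or malformed Date header
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)  # assume UTC when no offset
            if sent < cutoff:
                continue
            _, addr = parseaddr(msg["From"] or "")
            if sender and addr.lower() != sender.lower():
                continue
            subject = msg["Subject"] or ""
            if subject_keyword and subject_keyword.lower() not in subject.lower():
                continue
            results.append((sent, subject))
    finally:
        box.close()
    return sorted(results, reverse=True)


# Throwaway mbox so the sketch runs without a Thunderbird profile
now = datetime(2024, 12, 16, 12, 0, tzinfo=timezone.utc)
path = os.path.join(tempfile.mkdtemp(), "Inbox")
demo = mailbox.mbox(path)
for days_ago, sender, subject in [
    (1, "boss@example.com", "Weekly status report"),
    (3, "billing@example.com", "Invoice #100"),
    (40, "billing@example.com", "Invoice #090"),
]:
    m = EmailMessage()
    m["From"] = sender
    m["Subject"] = subject
    m["Date"] = format_datetime(now - timedelta(days=days_ago))
    m.set_content("...")
    demo.add(m)
demo.close()

for sent, subject in search_mbox(path, subject_keyword="invoice", days_back=30, now=now):
    print(sent.date(), subject)
```

The keyword test runs on the raw Subject header, so non-ASCII keywords only match after decoding the subject (for example with make_header(decode_header(...)) as in section 3).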

5: Parse email headers and extract metadata

Extract detailed email headers, reply-to addresses, and routing information:

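import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)

# Sort first so "the first item" means the most recently received message
items = inbox.Items
items.Sort("[ReceivedTime]", True)
message = items.GetFirst()

print(f"Subject: {message.Subject}")
print(f"From: {message.SenderName} <{message.SenderEmailAddress}>")
print(f"To: {message.To}")
print(f"CC: {message.CC}")
print(f"BCC: {message.BCC}")
# ReplyRecipients is a Recipients collection, so list its addresses explicitly
reply_to = ", ".join(f"<{r.Address}>" for r in message.ReplyRecipients)
print(f"Reply To: {reply_to}")
print(f"Importance: {message.Importance}")    # 0 = low, 1 = normal, 2 = high
print(f"Sensitivity: {message.Sensitivity}")  # 0 = normal, 1 = personal, 2 = private, 3 = confidential
print(f"Size: {message.Size / 1024:.2f} KB")
print(f"Unread: {message.UnRead}")
print(f"Received Time: {message.ReceivedTime}")
print(f"Sent On: {message.SentOn}")
print(f"Message Class: {message.MessageClass}")

# PR_TRANSPORT_MESSAGE_HEADERS: the raw Internet headers, if the message has them
PR_TRANSPORT_MESSAGE_HEADERS = "http://schemas.microsoft.com/mapi/proptag/0x007D001E"
try:
    headers = message.PropertyAccessor.GetProperty(PR_TRANSPORT_MESSAGE_HEADERS)
    print(f"\nFull Headers:\n{headers}")
except Exception:  # a COM error when the property is missing
    print("\nFull headers not available for this message")

if message.Categories:
    print(f"\nCategories: {message.Categories}")

if hasattr(message, 'FlagRequest'):
    print(f"Flag Status: {message.FlagRequest}")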

result:

Subject: Q4 Budget Review Meeting
From: John Smith <[email protected]>
To: [email protected]
CC: [email protected]
BCC: 
Reply To: <[email protected]>
Importance: 1
Sensitivity: 0
Size: 12.45 KB
Unread: False
Received Time: 2024-12-15 09:30:00
Sent On: 2024-12-15 09:29:45
Message Class: IPM.Note

Full Headers:
Received: from mail.company.com (10.0.1.5) by exchange.company.com
Date: Sun, 15 Dec 2024 09:29:45 -0800
From: John Smith <[email protected]>
To: [email protected]
Subject: Q4 Budget Review Meeting
Message-ID: <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Categories: Important, Work
Flag Status: Follow up
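The header block returned by PropertyAccessor is plain RFC 5322 text, so the stdlib email package can turn it into structured data (Received hops, a parsed Date, recipient addresses). A sketch using email.parser.HeaderParser on a hypothetical header string shaped like the sample above; parse_transport_headers is an illustrative name:

```python
from email.parser import HeaderParser
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical header block, shaped like the "Full Headers" sample above
raw_headers = (
    "Received: from mail.company.com (10.0.1.5) by exchange.company.com\r\n"
    "Date: Sun, 15 Dec 2024 09:29:45 -0800\r\n"
    "From: John Smith <john.smith@example.com>\r\n"
    "To: team@example.com\r\n"
    "Subject: Q4 Budget Review Meeting\r\n"
    "Message-ID: <abc123@example.com>\r\n"
    "Content-Type: text/plain; charset=\"utf-8\"\r\n"
)


def parse_transport_headers(text: str) -> dict:
    """Turn a raw RFC 5322 header block into a few structured fields."""
    headers = HeaderParser().parsestr(text)
    return {
        "received_hops": headers.get_all("Received", []),
        "sent": parsedate_to_datetime(headers["Date"]),
        "to": [addr for _, addr in getaddresses(headers.get_all("To", []))],
        "message_id": headers["Message-ID"],
    }


info = parse_transport_headers(raw_headers)
print(info["message_id"], info["sent"].isoformat(), len(info["received_hops"]))
```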

6: Export emails to CSV for analysis

Create a CSV export of email metadata for reporting and analysis:

import win32com.client
import csv
from itertools import islice

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)

messages = inbox.Items
messages.Sort("[ReceivedTime]", True)

csv_file = "outlook_emails_export.csv"
exported = 0

with open(csv_file, 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)

    writer.writerow([
        'Subject', 'From', 'To', 'Received Date',
        'Size (KB)', 'Has Attachments', 'Attachment Count', 'Unread'
    ])

    # COM collections don't support slicing, so take the newest 100 with islice
    for message in islice(messages, 100):
        try:
            writer.writerow([
                message.Subject,
                message.SenderEmailAddress,
                message.To,
                message.ReceivedTime.strftime('%Y-%m-%d %H:%M:%S'),
                f"{message.Size / 1024:.2f}",
                'Yes' if message.Attachments.Count > 0 else 'No',
                message.Attachments.Count,
                'Yes' if message.UnRead else 'No'
            ])
            exported += 1

            if exported % 25 == 0:
                print(f"Processed {exported} emails...")

        except Exception as e:
            # e.g. items without these properties, such as delivery reports
            print(f"Error processing message: {e}")
            continue

print(f"\n✓ Exported {exported} emails to {csv_file}")

result:

Processed 25 emails...
Processed 50 emails...
Processed 75 emails...
Processed 100 emails...

✓ Exported 100 emails to outlook_emails_export.csv

CSV Output:

Subject,From,To,Received Date,Size (KB),Has Attachments,Attachment Count,Unread
Q4 Budget Review,[email protected],[email protected],2024-12-15 09:30:00,12.45,Yes,1,No
Invoice #12345,[email protected],[email protected],2024-12-15 10:20:00,89.23,Yes,2,No
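For Thunderbird, a similar export works on an mbox file with mailbox and csv.DictWriter. A sketch, assuming a local mbox file; mbox_to_csv is a hypothetical helper, it only counts MIME parts marked "Content-Disposition: attachment" (inline images are not counted), and the Size and Unread columns are left out to keep it short:

```python
import csv
import mailbox
import os
import tempfile
from email.message import EmailMessage

FIELDS = ["Subject", "From", "To", "Date", "Has Attachments", "Attachment Count"]


def mbox_to_csv(mbox_path, csv_path, limit=100):
    """Write one CSV row per message (at most `limit`) and return the row count."""
    box = mailbox.mbox(mbox_path, create=False)
    written = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            for msg in box:
                if written >= limit:
                    break
                # Parts explicitly marked "Content-Disposition: attachment"
                attachments = [p for p in msg.walk()
                               if p.get_content_disposition() == "attachment"]
                writer.writerow({
                    "Subject": msg["Subject"] or "",
                    "From": msg["From"] or "",
                    "To": msg["To"] or "",
                    "Date": msg["Date"] or "",
                    "Has Attachments": "Yes" if attachments else "No",
                    "Attachment Count": len(attachments),
                })
                written += 1
    finally:
        box.close()
    return written


# Throwaway mbox: one message with an attachment, one without
path = os.path.join(tempfile.mkdtemp(), "Inbox")
demo = mailbox.mbox(path)
for subject, with_file in [("Q4 Budget Review", True), ("Lunch on Friday", False)]:
    m = EmailMessage()
    m["Subject"] = subject
    m["From"] = "john@example.com"
    m["To"] = "team@example.com"
    m.set_content("Body text")
    if with_file:
        m.add_attachment(b"demo bytes", maintype="application",
                         subtype="octet-stream", filename="budget.xlsx")
    demo.add(m)
demo.close()

csv_path = os.path.join(tempfile.mkdtemp(), "thunderbird_export.csv")
print(mbox_to_csv(path, csv_path))  # 2
```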

Common Use Cases

Email Automation: Auto-download attachments, archive old emails, organize by sender

Data Extraction: Parse order confirmations, invoices, shipping notifications

Reporting: Generate email activity reports, response time analysis

Backup: Export emails to local storage for archival

Integration: Connect Outlook/Thunderbird to CRM, ticketing systems, databases

Monitoring: Track emails from specific senders, flag urgent messages
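The Data Extraction case usually comes down to pattern matching on the subject and body. A minimal sketch that pulls the invoice number and amount from text shaped like the invoice email in section 2; the regular expressions and the extract_invoice name are hypothetical and would need adjusting for real senders:

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical patterns tuned to the sample invoice email in section 2
INVOICE_NO = re.compile(r"Invoice\s*#\s*(\d+)", re.IGNORECASE)
AMOUNT_DUE = re.compile(r"amount due is \$([\d,]+\.\d{2})", re.IGNORECASE)


def extract_invoice(subject: str, body: str) -> Optional[dict]:
    """Pull an invoice number and amount out of an email, or return None."""
    number = INVOICE_NO.search(subject) or INVOICE_NO.search(body)
    amount = AMOUNT_DUE.search(body)
    if not (number and amount):
        return None
    return {
        "invoice": number.group(1),
        # Decimal avoids float rounding for money; drop thousands separators
        "amount": Decimal(amount.group(1).replace(",", "")),
    }


print(extract_invoice(
    "Invoice #12345 - December 2024",
    "Please find attached your invoice for December 2024. "
    "The total amount due is $1,250.00.",
))
```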

Comparison: Outlook vs Thunderbird Access

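| Feature      | Outlook (win32com)                      | Thunderbird (mailbox)  |
|--------------|-----------------------------------------|------------------------|
| Platform     | Windows only, desktop Outlook required  | Cross-platform         |
| Format       | Proprietary (PST/OST)                   | mbox (standard)        |
| Live access  | Real-time, through the running Outlook  | File-based             |
| Attachments  | Easy extraction (SaveAsFile)            | Requires MIME parsing  |
| Filtering    | Built-in Items.Restrict() / Find()      | Manual implementation  |
| Installation | pip install pywin32                     | Built-in library       |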

Best Practices

  • Close mailboxes when you're done (mbox.close() also releases any lock taken with mbox.lock())
  • Wrap per-message processing in try-except so one malformed email doesn't stop the run
  • Process large mailboxes in batches instead of loading thousands of emails at once
  • Narrow results before iterating (Items.Restrict() in Outlook) to reduce processing time
  • Cache results if you access the same emails repeatedly
  • Don't modify Thunderbird mbox files while Thunderbird is running; read them, or work on a copy
  • Avoid hardcoding mailbox paths; read them from environment variables or a config file
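Several of these practices (batching, closing the mailbox, not hardcoding paths) fit in a few lines. A sketch that reads the mbox path from an environment variable and yields messages in fixed-size batches; MBOX_PATH and iter_batches are hypothetical names:

```python
import itertools
import mailbox
import os
import tempfile
from email.message import EmailMessage


def iter_batches(mbox_path, batch_size=500):
    """Yield lists of at most batch_size messages from an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        keys = iter(box.keys())
        while True:
            batch_keys = list(itertools.islice(keys, batch_size))
            if not batch_keys:
                break
            yield [box[key] for key in batch_keys]
    finally:
        box.close()  # runs once the generator is exhausted or closed


# Demo only: create a small mbox and point the hypothetical MBOX_PATH at it
demo_path = os.path.join(tempfile.mkdtemp(), "Inbox")
demo = mailbox.mbox(demo_path)
for n in range(5):
    m = EmailMessage()
    m["Subject"] = f"Message {n}"
    m.set_content("...")
    demo.add(m)
demo.close()
os.environ["MBOX_PATH"] = demo_path

# A real script would only need this part
mbox_path = os.environ["MBOX_PATH"]
for batch in iter_batches(mbox_path, batch_size=2):
    print([msg["Subject"] for msg in batch])
```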

Resources