In this guide, you'll see how to access and parse emails from Outlook and Thunderbird mailboxes using Python.
Here you can find the short answer:
(1) Access Outlook with win32com
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application")
inbox = outlook.GetNamespace("MAPI").GetDefaultFolder(6)
(2) Parse Thunderbird mailbox
import mailbox
mbox = mailbox.mbox('Inbox')
for message in mbox:
    print(message['subject'])
(3) Parse email with email library
import email
msg = email.message_from_string(raw_email)
print(msg['From'], msg['Subject'])
So let's see how to connect to email clients and extract messages programmatically.
1: Access Outlook mailbox on Windows
Let's start with accessing Outlook emails on Windows using the win32com.client library:
Install required package:
pip install pywin32
Basic Outlook connection:
import win32com.client
from itertools import islice

# Connect to the default Outlook profile through MAPI
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)  # 6 = Inbox
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)  # True = descending, newest first
print(f"Total emails in Inbox: {len(messages)}")
# Items is a COM collection and cannot be sliced, so take the first 5 with islice
for i, message in enumerate(islice(messages, 5), 1):
    print(f"\n--- Email {i} ---")
    print(f"From: {message.SenderName}")
    print(f"Subject: {message.Subject}")
    print(f"Received: {message.ReceivedTime}")
    print(f"Size: {message.Size / 1024:.2f} KB")
result will be:
Total emails in Inbox: 1247
--- Email 1 ---
From: John Smith
Subject: Q4 Budget Review Meeting
Received: 2024-12-15 09:30:00
Size: 12.45 KB
--- Email 2 ---
From: Sarah Johnson
Subject: Project Alpha Status Update
Received: 2024-12-15 08:15:00
Size: 8.92 KB
--- Email 3 ---
From: Microsoft Teams
Subject: New message in Engineering channel
Received: 2024-12-14 16:45:00
Size: 5.67 KB
Outlook folder indices:
- 3 = Deleted Items
- 4 = Outbox
- 5 = Sent Items
- 6 = Inbox
- 9 = Calendar
- 10 = Contacts
- 16 = Drafts
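The folder numbers above are easier to read in code when they have names. The following is a minimal sketch, assuming a hypothetical `OutlookFolder` enum and `folder_label` helper (neither is part of pywin32); the values are the ones listed above.

```python
from enum import IntEnum


class OutlookFolder(IntEnum):
    """Named constants for the default-folder indices listed above (hypothetical helper)."""
    DELETED_ITEMS = 3
    OUTBOX = 4
    SENT_ITEMS = 5
    INBOX = 6
    CALENDAR = 9
    CONTACTS = 10
    DRAFTS = 16


def folder_label(index: int) -> str:
    """Return a readable label for a folder index, or a fallback for unknown values."""
    try:
        return OutlookFolder(index).name.replace("_", " ").title()
    except ValueError:
        return f"Unknown folder ({index})"


print(folder_label(OutlookFolder.INBOX))
print(folder_label(3))
```

Because `IntEnum` members are integers, passing `OutlookFolder.INBOX` to `GetDefaultFolder` should behave like passing 6; wrap it in `int()` if the COM layer rejects it.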
2: Extract email content and attachments
Access email body, attachments, and metadata from Outlook messages:
import win32com.client
import os
from itertools import islice

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
attachment_folder = "C:/email_attachments/"
os.makedirs(attachment_folder, exist_ok=True)
# Items is a COM collection and cannot be sliced, so take the first 10 with islice
for message in islice(messages, 10):
    print(f"\nSubject: {message.Subject}")
    print(f"From: {message.SenderEmailAddress}")
    print(f"To: {message.To}")
    print(f"CC: {message.CC}")
    print(f"Date: {message.ReceivedTime}")
    print(f"\nBody preview (first 200 chars):")
    body = message.Body if hasattr(message, 'Body') else ""
    print(body[:200] + "...")
    if message.Attachments.Count > 0:
        print(f"\nAttachments ({message.Attachments.Count}):")
        for attachment in message.Attachments:
            print(f" - {attachment.FileName}")
            # Note: a file with the same name in the folder is overwritten
            attachment_path = os.path.join(attachment_folder, attachment.FileName)
            attachment.SaveAsFile(attachment_path)
            print(f" Saved to: {attachment_path}")
print(f"\n✓ Processed {min(10, len(messages))} emails")
result:
Subject: Invoice #12345 - December 2024
From: [email protected]
To: [email protected]
CC:
Date: 2024-12-15 10:20:00
Body preview (first 200 chars):
Dear John,
Please find attached your invoice for December 2024. The total amount due is $1,250.00. Payment is due by December 31st.
Thank you for your business.
Best regards,
Accounting Team...
Attachments (2):
- Invoice_12345.pdf
Saved to: C:/email_attachments/Invoice_12345.pdf
- Payment_Details.xlsx
Saved to: C:/email_attachments/Payment_Details.xlsx
✓ Processed 10 emails
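Two attachments with the same file name would overwrite each other when saved into one folder. Below is a small sketch of one way to avoid that, assuming a hypothetical `safe_attachment_path` helper (not part of pywin32) whose result would be passed to `SaveAsFile`. The demo uses a temporary folder as a stand-in for the attachment folder.

```python
import os
import re
import tempfile


def safe_attachment_path(folder: str, filename: str) -> str:
    """Build a path inside `folder` that does not collide with an existing file."""
    # Drop directory components and replace characters Windows does not allow.
    name = os.path.basename(filename.replace("\\", "/"))
    name = re.sub(r'[<>:"/\\|?*]', "_", name) or "attachment"
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(folder, name)
    counter = 1
    # Append _1, _2, ... until the name is unused.
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


# Demo in a temporary folder.
demo_folder = tempfile.mkdtemp()
first = safe_attachment_path(demo_folder, "Invoice_12345.pdf")
open(first, "wb").close()  # pretend the attachment was saved here
second = safe_attachment_path(demo_folder, "Invoice_12345.pdf")
print(os.path.basename(first), os.path.basename(second))
```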
3: Parse Thunderbird mailbox files
By default, Thunderbird stores emails in mbox format, which can be read with the mailbox module from Python's standard library:
Locate Thunderbird profile:
- Windows: C:\Users\YourName\AppData\Roaming\Thunderbird\Profiles\
- macOS: ~/Library/Thunderbird/Profiles/
- Linux: ~/.thunderbird/
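The three locations above can be resolved in code rather than typed by hand. This is a rough sketch, assuming a hypothetical `thunderbird_profiles_dir` helper and the default install locations listed above; custom or portable installs may use other paths.

```python
import os
import sys
from pathlib import Path
from typing import Optional


def thunderbird_profiles_dir(platform: str = sys.platform,
                             home: Optional[Path] = None,
                             appdata: Optional[str] = None) -> Path:
    """Return the default Thunderbird profile root for the given platform."""
    home = home or Path.home()
    if platform.startswith("win"):
        # %APPDATA% normally points at C:\Users\<name>\AppData\Roaming
        base = Path(appdata or os.environ.get("APPDATA", home / "AppData" / "Roaming"))
        return base / "Thunderbird" / "Profiles"
    if platform == "darwin":
        return home / "Library" / "Thunderbird" / "Profiles"
    # Linux and other Unix-like systems
    return home / ".thunderbird"
```

Calling `thunderbird_profiles_dir()` with no arguments uses the current platform and home directory; the explicit parameters exist mainly so the function can be tried out on any machine.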
Parse mbox file:
import mailbox
from email.header import decode_header, make_header

mbox_path = r"C:\Users\YourName\AppData\Roaming\Thunderbird\Profiles\xxx.default\Mail\Local Folders\Inbox"
mbox = mailbox.mbox(mbox_path)
print(f"Total messages in mailbox: {len(mbox)}")

def decode_payload(part):
    """Decode a text part with its declared charset, replacing undecodable bytes."""
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "utf-8", errors="replace")

for idx, message in enumerate(mbox, 1):
    from_addr = message['from']
    date = message['date']
    # Decode every RFC 2047 chunk with its own charset; a missing subject becomes ""
    subject_decoded = str(make_header(decode_header(message['subject'] or "")))
    print(f"\n--- Message {idx} ---")
    print(f"From: {from_addr}")
    print(f"Subject: {subject_decoded}")
    print(f"Date: {date}")
    if message.is_multipart():
        # Use the first text/plain part as the body preview
        for part in message.walk():
            if part.get_content_type() == "text/plain":
                body = decode_payload(part)
                print(f"Body preview: {body[:150]}...")
                break
    else:
        body = decode_payload(message)
        print(f"Body preview: {body[:150]}...")
    if idx >= 5:
        break
mbox.close()
print("\n✓ Mailbox parsing complete")
result:
Total messages in mailbox: 3892
--- Message 1 ---
From: Sarah Chen <[email protected]>
Subject: Team Meeting Notes - Project Alpha
Date: Mon, 16 Dec 2024 14:30:25 +0000
Body preview: Hi Team,
Here are the key takeaways from today's meeting:
1. Sprint 12 completed successfully
2. Demo scheduled for Friday
3. Client feedback session...
--- Message 2 ---
From: GitHub <[email protected]>
Subject: [repo-name] Pull Request #234 merged
Date: Mon, 16 Dec 2024 09:15:42 +0000
Body preview: Your pull request "Add authentication middleware" has been merged into main branch by @john-doe.
Changes included:
- Added JWT token validation
- Implement...
✓ Mailbox parsing complete
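The parsing logic can be tried without a Thunderbird profile by building a throwaway mbox file first. The sketch below is one way to do that; the message content, the example.com address, and the helper names `decode_mime_header` and `first_text_body` are made up for illustration.

```python
import mailbox
import os
import tempfile
from email.header import decode_header, make_header


def decode_mime_header(value) -> str:
    """Decode an RFC 2047 encoded header value into readable text."""
    if value is None:
        return ""
    return str(make_header(decode_header(value)))


def first_text_body(message) -> str:
    """Return the first text/plain part, decoded with its declared charset."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


# A made-up message whose subject is Q-encoded UTF-8.
RAW = (
    b"From: Sender <sender@example.com>\n"
    b"Subject: =?utf-8?q?=C3=9Cbersicht_Q4?=\n"
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"\n"
    b"Hello from a test mailbox.\n"
)

# Write the message into a temporary mbox file, then close it.
path = os.path.join(tempfile.mkdtemp(), "Inbox")
box = mailbox.mbox(path)
box.add(RAW)
box.flush()
box.close()

# Reopen the file and read it back the same way a real mailbox would be read.
box = mailbox.mbox(path, create=False)
subjects = [decode_mime_header(m["subject"]) for m in box]
bodies = [first_text_body(m) for m in box]
box.close()

print(subjects)
print(bodies[0].strip())
```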
4: Filter and search emails by criteria
Implement email filtering to find specific messages based on sender, subject, date, or content:
import win32com.client
from datetime import datetime, timedelta

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)

def search_emails(folder, sender=None, subject_keyword=None, days_back=7):
    """
    Search emails with multiple criteria
    """
    messages = folder.Items
    date_filter = datetime.now() - timedelta(days=days_back)
    date_filter_str = date_filter.strftime("%m/%d/%Y")
    filter_str = f"[ReceivedTime] >= '{date_filter_str}'"
    if sender:
        filter_str += f" AND [SenderEmailAddress] = '{sender}'"
    filtered = messages.Restrict(filter_str)
    results = []
    for message in filtered:
        if subject_keyword:
            if subject_keyword.lower() in message.Subject.lower():
                results.append(message)
        else:
            results.append(message)
    return results

print("=== Search Example 1: Emails from specific sender (last 7 days) ===")
sender_emails = search_emails(inbox, sender="[email protected]")
print(f"Found {len(sender_emails)} emails from [email protected]")
for msg in sender_emails[:3]:
    print(f" - {msg.Subject} ({msg.ReceivedTime})")

print("\n=== Search Example 2: Emails with 'invoice' in subject (last 30 days) ===")
invoice_emails = search_emails(inbox, subject_keyword="invoice", days_back=30)
print(f"Found {len(invoice_emails)} emails with 'invoice' in subject")
for msg in invoice_emails[:3]:
    print(f" - {msg.Subject} ({msg.SenderName})")

print("\n=== Search Example 3: Recent meeting invites ===")
meeting_emails = search_emails(inbox, subject_keyword="meeting", days_back=14)
print(f"Found {len(meeting_emails)} emails with 'meeting' in subject")
for msg in meeting_emails[:3]:
    print(f" - {msg.Subject} ({msg.ReceivedTime})")
result:
=== Search Example 1: Emails from specific sender (last 7 days) ===
Found 12 emails from [email protected]
- Weekly Status Report (2024-12-15 16:30:00)
- Q4 Budget Approval (2024-12-14 10:15:00)
- Team Lunch This Friday (2024-12-13 09:45:00)
=== Search Example 2: Emails with 'invoice' in subject (last 30 days) ===
Found 8 emails with 'invoice' in subject
- Invoice #12345 - December 2024 (Accounting Department)
- Payment Received - Invoice #12300 (Finance Team)
- Overdue Invoice Reminder (Billing System)
=== Search Example 3: Recent meeting invites ===
Found 15 emails with 'meeting' in subject
- Team Meeting Notes - Project Alpha (2024-12-16 14:30:00)
- All Hands Meeting Reminder (2024-12-15 08:00:00)
- Client Meeting Rescheduled (2024-12-13 11:20:00)
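The filter string passed to `Restrict` is plain text, so it can be built and checked without Outlook running. Here is a minimal sketch, assuming a hypothetical `build_restrict_filter` helper that mirrors the date and sender clauses used above. Doubling single quotes inside the value is, as far as I know, how Outlook filter strings escape them; treat that as an assumption to verify against your Outlook version.

```python
from datetime import datetime, timedelta
from typing import Optional


def build_restrict_filter(days_back: int = 7,
                          sender: Optional[str] = None,
                          now: Optional[datetime] = None) -> str:
    """Build a filter string in the same form as the search function above."""
    now = now or datetime.now()
    since = (now - timedelta(days=days_back)).strftime("%m/%d/%Y")
    clauses = [f"[ReceivedTime] >= '{since}'"]
    if sender:
        # Double any single quote so it cannot end the quoted value early.
        escaped = sender.replace("'", "''")
        clauses.append(f"[SenderEmailAddress] = '{escaped}'")
    return " AND ".join(clauses)


# A fixed "now" keeps the example reproducible.
print(build_restrict_filter(7, now=datetime(2024, 12, 16)))
print(build_restrict_filter(30, sender="o'brien@example.com", now=datetime(2024, 12, 16)))
```

Keeping the string construction in its own function makes it easy to print and inspect the filter when `Restrict` returns nothing.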
5: Parse email headers and extract metadata
Extract detailed email headers, reply-to addresses, and routing information:
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
# Outlook collections are 1-based; after sorting, Item(1) is the newest message
message = messages.Item(1)
print(f"Subject: {message.Subject}")
print(f"From: {message.SenderName} <{message.SenderEmailAddress}>")
print(f"To: {message.To}")
print(f"CC: {message.CC}")
print(f"BCC: {message.BCC}")
# ReplyRecipients is a collection, so join the individual addresses
reply_to = ", ".join(r.Address for r in message.ReplyRecipients)
print(f"Reply To: {reply_to}")
print(f"Importance: {message.Importance}")
print(f"Sensitivity: {message.Sensitivity}")
print(f"Size: {message.Size / 1024:.2f} KB")
print(f"Unread: {message.UnRead}")
print(f"Received Time: {message.ReceivedTime}")
print(f"Sent On: {message.SentOn}")
print(f"Message Class: {message.MessageClass}")
try:
    # MAPI property that holds the raw transport (internet) headers
    headers = message.PropertyAccessor.GetProperty(
        "http://schemas.microsoft.com/mapi/proptag/0x007D001E"
    )
    print(f"\nFull Headers:\n{headers}")
except Exception:
    print("\nFull headers not available for this message")
if message.Categories:
    print(f"\nCategories: {message.Categories}")
if hasattr(message, 'FlagRequest'):
    print(f"Flag Status: {message.FlagRequest}")
result:
Subject: Q4 Budget Review Meeting
From: John Smith <[email protected]>
To: [email protected]
CC: [email protected]
BCC:
Reply To: <[email protected]>
Importance: 1
Sensitivity: 0
Size: 12.45 KB
Unread: False
Received Time: 2024-12-15 09:30:00
Sent On: 2024-12-15 09:29:45
Message Class: IPM.Note
Full Headers:
Received: from mail.company.com (10.0.1.5) by exchange.company.com
Date: Sun, 15 Dec 2024 09:29:45 -0800
From: John Smith <[email protected]>
To: [email protected]
Subject: Q4 Budget Review Meeting
Message-ID: <[email protected]>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Categories: Important, Work
Flag Status: Follow up
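The full header block is returned as one string, and the standard email package can split it into individual fields. The sketch below assumes a header string shaped like the output above; the example.com addresses and the Message-ID are placeholders, not values from a real mailbox.

```python
from email.parser import HeaderParser
from email.utils import parseaddr, parsedate_to_datetime

# Placeholder header block in the same shape as the "Full Headers" output above.
RAW_HEADERS = (
    "Received: from mail.example.com (10.0.1.5) by exchange.example.com\n"
    "Date: Sun, 15 Dec 2024 09:29:45 -0800\n"
    "From: John Smith <john.smith@example.com>\n"
    "To: me@example.com\n"
    "Subject: Q4 Budget Review Meeting\n"
    "Message-ID: <abc123@example.com>\n"
    "MIME-Version: 1.0\n"
)

# HeaderParser reads only the headers and does not try to parse a body.
headers = HeaderParser().parsestr(RAW_HEADERS)

sender_name, sender_addr = parseaddr(headers["From"])   # split display name and address
sent_at = parsedate_to_datetime(headers["Date"])        # timezone-aware datetime
received_hops = headers.get_all("Received", [])         # one entry per mail server hop

print(sender_name, sender_addr)
print(sent_at.isoformat())
print(f"{len(received_hops)} Received header(s)")
```

In the Outlook script, the `headers` string returned by `PropertyAccessor.GetProperty` would take the place of `RAW_HEADERS`.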
6: Export emails to CSV for analysis
Create a CSV export of email metadata for reporting and analysis:
import win32com.client
import csv
from itertools import islice

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)
csv_file = "outlook_emails_export.csv"
exported = 0  # counts rows actually written, so failures are not reported as exports
with open(csv_file, 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow([
        'Subject', 'From', 'To', 'Received Date',
        'Size (KB)', 'Has Attachments', 'Attachment Count', 'Unread'
    ])
    # Items is a COM collection and cannot be sliced, so take the first 100 with islice
    for message in islice(messages, 100):
        try:
            writer.writerow([
                message.Subject,
                message.SenderEmailAddress,
                message.To,
                message.ReceivedTime.strftime('%Y-%m-%d %H:%M:%S'),
                f"{message.Size / 1024:.2f}",
                'Yes' if message.Attachments.Count > 0 else 'No',
                message.Attachments.Count,
                'Yes' if message.UnRead else 'No'
            ])
            exported += 1
            if exported % 25 == 0:
                print(f"Processed {exported} emails...")
        except Exception as e:
            print(f"Error processing message: {e}")
            continue
print(f"\n✓ Exported {exported} emails to {csv_file}")
result:
Processed 25 emails...
Processed 50 emails...
Processed 75 emails...
Processed 100 emails...
✓ Exported 100 emails to outlook_emails_export.csv
CSV Output:
| Subject | From | To | Received Date | Size (KB) | Has Attachments | Attachment Count | Unread |
|---|---|---|---|---|---|---|---|
| Q4 Budget Review | [email protected] | [email protected] | 2024-12-15 09:30:00 | 12.45 | Yes | 1 | No |
| Invoice #12345 | [email protected] | [email protected] | 2024-12-15 10:20:00 | 89.23 | Yes | 2 | No |
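Once the export exists, the csv module can read it back for simple reporting. The following is a minimal sketch, assuming the column names written by the export script; the sample rows and example.com addresses are invented, and `summarize` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented sample in the same column layout as the export above.
SAMPLE = """Subject,From,To,Received Date,Size (KB),Has Attachments,Attachment Count,Unread
Q4 Budget Review,sender-a@example.com,me@example.com,2024-12-15 09:30:00,12.45,Yes,1,No
Invoice #12345,sender-b@example.com,me@example.com,2024-12-15 10:20:00,89.23,Yes,2,No
"""


def summarize(csv_stream):
    """Return (emails per sender, total size in KB, total attachment count)."""
    per_sender = Counter()
    total_kb = 0.0
    attachments = 0
    for row in csv.DictReader(csv_stream):
        per_sender[row["From"]] += 1
        total_kb += float(row["Size (KB)"])
        attachments += int(row["Attachment Count"])
    return per_sender, total_kb, attachments


per_sender, total_kb, attachments = summarize(io.StringIO(SAMPLE))
print(per_sender.most_common(1))
print(f"{total_kb:.2f} KB across {attachments} attachments")
```

For the real file, `open("outlook_emails_export.csv", newline="", encoding="utf-8")` would replace the `io.StringIO` object.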
Common Use Cases
Email Automation: Auto-download attachments, archive old emails, organize by sender
Data Extraction: Parse order confirmations, invoices, shipping notifications
Reporting: Generate email activity reports, response time analysis
Backup: Export emails to local storage for archival
Integration: Connect Outlook/Thunderbird to CRM, ticketing systems, databases
Monitoring: Track emails from specific senders, flag urgent messages
Comparison: Outlook vs Thunderbird Access
| Feature | Outlook (win32com) | Thunderbird (mailbox) |
|---|---|---|
| Platform | Windows only | Cross-platform |
| Format | Proprietary | mbox (standard) |
| Live access | Real-time | File-based |
| Attachments | Easy extraction | Requires parsing |
| Filtering | Built-in MAPI filters | Manual implementation |
| Installation | pip install pywin32 | Built-in library |
Best Practices
- Close connections properly to avoid locking mailbox files
- Handle errors with try-except for malformed emails
- Limit batch size when processing thousands of emails
- Use filters to reduce processing time
- Cache results for repeated access to same emails
- Don't modify Thunderbird mbox files while Thunderbird is running
- Avoid hardcoding mailbox paths - use environment variables
- Don't process all emails at once - use pagination
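The advice on batch size and pagination can be sketched with a small generator that works on any iterable, including an mbox object or an Outlook Items collection. The helper name `in_batches` is hypothetical and the batch size is only an example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in data; in practice `items` would be the mailbox being processed.
for page_number, batch in enumerate(in_batches(range(7), 3), 1):
    print(f"Page {page_number}: {batch}")
```

Processing one batch at a time keeps memory use bounded and gives a natural place to log progress or pause between pages.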