I'm playing with Python's imaplib (Python 2.6) to fetch emails from Gmail. Every time I fetch an email with the fetch method (http://docs.python.org/library/imaplib.html#imaplib.IMAP4.fetch), I get the whole email. I need only the text part, and I also want to parse the names of the attachments without downloading them. How can this be done? I see that the emails returned by Gmail follow the same format that browsers send to HTTP servers.
-
I'm afraid you're out of luck. According to this post, there are only two parts to an email: the header and the body. The body is where the attachments are, if there are any, and you have to download the whole body before extracting only the message text. The info about the FETCH command found here also supports this opinion. While it says you can extract partials of the body, these are specified in terms of octets, which doesn't really help.
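For reference, the partial form mentioned above can be sketched as follows. The helper name `partial_fetch_spec` is made up for illustration, and `connection` in the trailing comment is assumed to be an already-authenticated imaplib connection. The range is a byte offset and a byte count, so it can cut a MIME part anywhere, which is why it does not isolate the text part by itself.

```python
def partial_fetch_spec(section, start, length):
    """Build an IMAP FETCH item for `length` octets of `section` from `start`.

    BODY.PEEK is used instead of BODY so the message is not marked as read.
    """
    return "(BODY.PEEK[%s]<%d.%d>)" % (section, start, length)


# First 1024 octets of the message text (everything after the headers).
spec = partial_fetch_spec("TEXT", 0, 1024)
print(spec)

# With a live connection (hypothetical variable `connection`), this would be:
# typ, data = connection.fetch("1", spec)
```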
Viet : +1 thanks for the pointers. I'll do more research on that. -
Take a look at this recipe: http://code.activestate.com/recipes/498189/
I adapted it slightly to print the From, Subject, Date, names of attachments, and message body (just plaintext for now; it's trivial to add HTML messages).
I used the Gmail POP3 server in this case, but it should work for IMAP as well.
    # Written for Python 2.6, as in the question.
    import poplib, email

    mailserver = poplib.POP3_SSL('pop.gmail.com')
    mailserver.user('recent:YOURUSERNAME')  # use 'recent mode'
    mailserver.pass_('YOURPASSWORD')  # consider not storing in plaintext!

    numMessages = len(mailserver.list()[1])
    for i in reversed(range(numMessages)):
        message = ""
        body = ""  # reset per message so an earlier body is never reused
        msg = mailserver.retr(i + 1)  # POP3 message numbers start at 1
        raw = "\n".join(msg[1])
        mail = email.message_from_string(raw)

        # A missing header comes back as None, so fall back to "".
        message += "From: " + (mail["From"] or "") + "\n"
        message += "Subject: " + (mail["Subject"] or "") + "\n"
        message += "Date: " + (mail["Date"] or "") + "\n"

        for part in mail.walk():
            if part.is_multipart():
                continue
            if part.get_content_type() == 'text/plain':
                # decode=True undoes base64 / quoted-printable encoding
                body = "\n" + part.get_payload(decode=True) + "\n"
            dtypes = part.get_params(None, 'Content-Disposition')
            if not dtypes:
                # No Content-Disposition: look for a name in Content-Type
                if part.get_content_type() == 'text/plain':
                    continue
                ctypes = part.get_params()
                if not ctypes:
                    continue
                for key, val in ctypes:
                    if key.lower() == 'name':
                        message += "Attachment:" + val + "\n"
                        break
            else:
                attachment, filename = None, None
                for key, val in dtypes:
                    key = key.lower()
                    if key == 'filename':
                        filename = val
                    if key == 'attachment':
                        attachment = 1
                if not attachment:
                    continue
                # An attachment may have no filename parameter at all
                message += "Attachment:" + (filename or "(unnamed)") + "\n"

        # Append the body once, after all parts have been inspected
        if body:
            message += body + "\n"
        print message
        print
This should be enough to get you heading in the right direction.
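The same walk works on bytes fetched over IMAP. Below is a minimal Python 3 sketch, assuming `raw` holds a complete message (for example from `fetch(id, '(RFC822)')`). `summarize` is a hypothetical helper name, and `get_filename()` is swapped in for the manual parameter loop above. This approach still needs the whole message downloaded, so it answers the parsing half of the question only.

```python
import email
from email.message import EmailMessage


def summarize(raw):
    """Return (headers, plain_text, attachment_names) for raw message bytes."""
    mail = email.message_from_bytes(raw)
    headers = {name: mail.get(name, "") for name in ("From", "Subject", "Date")}
    body_parts, attachments = [], []
    for part in mail.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename()
        if filename:
            # Named parts are treated as attachments.
            attachments.append(filename)
        elif part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body_parts.append(payload.decode(charset, errors="replace"))
    return headers, "\n".join(body_parts), attachments


# Build a small test message locally instead of contacting a server.
demo = EmailMessage()
demo["From"] = "alice@example.com"
demo["Subject"] = "Report"
demo.set_content("See the attached file.")
demo.add_attachment(b"a,b\n1,2\n", maintype="text", subtype="csv",
                    filename="data.csv")

headers, text, names = summarize(demo.as_bytes())
print(headers["Subject"], names)
```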
Viet : +1 thanks! that's great to see sample code! -
You can get only the plain text of the email by doing something like:
connection.fetch(id, '(BODY[1])')
For the Gmail messages I've seen, section 1 has the plaintext, including multipart junk. This may not be so robust.
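A slightly fuller sketch of that call for Python 3, assuming `connection` is a logged-in imaplib connection with a mailbox selected. `fetch_section` is a hypothetical helper; it uses `BODY.PEEK` so the message is not flagged as read, and it returns the raw bytes of the section, which may still be quoted-printable or base64 encoded. `FakeConnection` only imitates the shape of an imaplib response so the example runs offline.

```python
def fetch_section(connection, msg_id, section="1"):
    """Fetch one body section of a message and return its raw bytes."""
    typ, data = connection.fetch(msg_id, "(BODY.PEEK[%s])" % section)
    if typ != "OK":
        raise RuntimeError("FETCH failed: %r" % (data,))
    for item in data:
        # imaplib returns literals as (envelope, payload) tuples.
        if isinstance(item, tuple):
            return item[1]
    return None


class FakeConnection:
    """Stand-in that mimics the shape of imaplib's fetch() result."""

    def fetch(self, msg_id, spec):
        return "OK", [(b"1 (BODY[1] {5}", b"hello"), b")"]


print(fetch_section(FakeConnection(), "1"))
```

When the first part is itself multipart, the plain text usually sits at a nested section such as `1.1`, which would explain the "multipart junk" seen when asking for section 1.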
I don't know how to get the name of the attachment without all of it. I haven't tried using partials.
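One option worth trying is the IMAP `BODYSTRUCTURE` fetch item, which returns the MIME structure of a message, including `name` and `filename` parameters, without transferring the part bodies. The Python 3 sketch below pulls those parameters out of a response with a regular expression. `attachment_names` is a hypothetical helper, the sample response is hand-written rather than captured from Gmail, and the regex is a shortcut that ignores literals and RFC 2231 encoded names, so a real parser would be more robust.

```python
import re

# Matches "name" "value" or "filename" "value" pairs in a BODYSTRUCTURE reply.
_NAME_PARAM = re.compile(rb'"(?:file)?name"\s+"((?:[^"\\]|\\.)*)"', re.IGNORECASE)


def attachment_names(bodystructure):
    """Return the distinct file names mentioned in a BODYSTRUCTURE response."""
    names = []
    for match in _NAME_PARAM.finditer(bodystructure):
        name = match.group(1).decode("utf-8", errors="replace")
        if name not in names:
            names.append(name)
    return names


# Hand-written sample: a text part plus one PDF attachment.
sample = (
    b'1 (BODYSTRUCTURE (("TEXT" "PLAIN" ("CHARSET" "UTF-8") NIL NIL "7BIT" 42 2 NIL NIL NIL)'
    b'("APPLICATION" "PDF" ("NAME" "report.pdf") NIL NIL "BASE64" 12345 NIL '
    b'("ATTACHMENT" ("FILENAME" "report.pdf")) NIL) "MIXED" ("BOUNDARY" "xyz") NIL NIL))'
)
print(attachment_names(sample))

# With a live connection (hypothetical variable `connection`):
# typ, data = connection.fetch("1", "(BODYSTRUCTURE)")
# data[0] is typically a bytes line; it can be a tuple if the server sends literals.
# names = attachment_names(data[0])
```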