Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - What are the "parts" in a multipart email?

A bit of context...

Some time ago, I wrote a Python program that deals with email messages. One thing that always comes up is the need to know whether an email is "multipart" or not.

After a bit of research, I learned that it has something to do with emails containing HTML, attachments, etc. But I didn't really understand it.

My usage of it was limited to 2 instances:

1. When I had to save the attachment from the raw email

I just found this on the internet (probably on here - Sorry for not crediting the person who wrote it but I can't seem to find him again :/) and pasted it in my code

import os

def downloadAttachments(emailMsg, pathToSaveFile):
    r"""
    Save attachments to pathToSaveFile (Example: pathToSaveFile = "C:\Program Files")
    and return the list of file paths.
    """
    att_path_list = []
    for part in emailMsg.walk():
        # multipart parts are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # parts without a Content-Disposition header are not treated as attachments
        if part.get('Content-Disposition') is None:
            continue

        # a part can have a disposition but no filename (e.g. an inline body)
        filename = part.get_filename()
        if not filename:
            continue

        # basename() drops any directory components in a sender-supplied filename
        att_path = os.path.join(pathToSaveFile, os.path.basename(filename))

        # only write the file if it is not already there
        if not os.path.isfile(att_path):
            with open(att_path, 'wb') as fp:
                fp.write(part.get_payload(decode=True))
        att_path_list.append(att_path)
    return att_path_list

2. When I had to get the text from the raw email

Also pasted from someone on the internet without really understanding how it works.

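def get_text(emailMsg):
    """
    Output: body of the email (text content), as bytes
    """
    if emailMsg.is_multipart():
        # a multipart message holds a list of parts; descend into the first one
        return get_text(emailMsg.get_payload(0))
    else:
        # leaf part: undo the transfer encoding (base64, quoted-printable) and return bytes
        return emailMsg.get_payload(decode=True)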

What I do understand...

Is that if the email message is multipart, the parts can be iterated over.

My question is

What exactly are these parts? How do you know which one is html for example? Or which one is an attachment? Or just the body?



1 Reply


There is no strict hierarchy or guidance for how exactly to use multipart messages. MIME simply defines a way to collect multiple payloads into a single email message. One of the original motivations, I believe, was to be able to embed pictures in text. Attaching binaries to a text message and, more generally, creating structured messages whose payloads are related in arbitrary ways is something which has simply been there for applications to use in whatever way they see fit.

A common misunderstanding is to postulate a hierarchy of a "main part" and "subordinate" parts. It's certainly possible to create this structure, but it is by no means universally done. In fact, most multipart messages simply have a sequence of parts without any hierarchy. The user's email client will commonly pick one of the "inline" parts as the preferred "main" part to display in a message pane, but this is neither dictated by the standard nor something the sending party can enforce.
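To see what structure a given message actually has, it can help to print its tree of parts. The following is a minimal sketch using only the standard library `email` package; the helper name `describe_structure` and the sample message are made up for illustration.

```python
from email.message import EmailMessage

def describe_structure(msg, depth=0):
    """Return one indented line per MIME part, containers included."""
    lines = ["  " * depth + msg.get_content_type()]
    if msg.is_multipart():
        # For a multipart container, get_payload() returns the list of sub-parts
        for child in msg.get_payload():
            lines.extend(describe_structure(child, depth + 1))
    return lines

# Hypothetical sample: a text body plus one binary attachment
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("Plain-text body")
msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                   subtype="pdf", filename="report.pdf")

print("\n".join(describe_structure(msg)))
# multipart/mixed
#   text/plain
#   application/pdf
```

Here the two leaf parts are simply siblings inside one `multipart/mixed` container; nothing in the message itself marks the text part as the "main" one.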

Each MIME part has a set of headers which tell you the type, encoding, and disposition; for parts of type text/* the default disposition is "inline" (so it is often not explicitly spelled out) whereas most other parts have a default disposition of "attachment". You'll need to refer to the pertinent standards for a strict definition, but probably take it with a grain of salt, because many real-world applications are not particularly RFC-conformant.
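As a rough illustration of reading those headers in Python, the sketch below lists the type, disposition, and filename of every leaf part. The helper name `classify_parts` and the sample message are hypothetical, and the fallback for a missing `Content-Disposition` header is an assumption that simply mirrors the defaults described above, not something taken from a standard.

```python
from email.message import EmailMessage

def classify_parts(msg):
    """Return (content_type, disposition, filename) for every leaf part."""
    result = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        disposition = part.get_content_disposition()  # lowercased value, or None
        if disposition is None:
            # Assumed default, as described above: text/* is inline, the rest attachment
            disposition = "inline" if part.get_content_maintype() == "text" else "attachment"
        result.append((part.get_content_type(), disposition, part.get_filename()))
    return result

# Hypothetical sample: plain and HTML alternatives plus an image attachment
sample = EmailMessage()
sample.set_content("Hello")
sample.add_alternative("<p>Hello</p>", subtype="html")
sample.add_attachment(b"\x89PNG dummy bytes", maintype="image",
                      subtype="png", filename="logo.png")

for row in classify_parts(sample):
    print(row)
```

Real-world mail may omit or mangle these headers, so treat the result as a hint rather than a guarantee.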

For your concrete question, find the topmost leaf parts which are (implicitly or explicitly) inline, and display one which supports your use case as the "main" one. If you want to enforce HTML as the preferred format, you can do that; but many email applications defer this to the user to decide, and some users will definitely -- because of technical necessity, physical disabilities, or personal taste -- prefer plain-text when it's available.
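One possible way to implement that selection is sketched below. The function name `pick_body` and its `preferred` parameter are made up for this example; it takes the first text leaf of each subtype that is not explicitly marked as an attachment and returns the first match in the caller's preference order.

```python
from email.message import EmailMessage

def pick_body(msg, preferred=("plain", "html")):
    """Return the decoded text of the preferred inline text/* part, or None."""
    candidates = {}
    for part in msg.walk():
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # e.g. an attached .txt file is not the body
        # keep only the first (topmost) part seen for each subtype
        candidates.setdefault(part.get_content_subtype(), part)
    for subtype in preferred:
        part = candidates.get(subtype)
        if part is not None:
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            return payload.decode(charset, errors="replace")
    return None

# Hypothetical sample with both alternatives
mail = EmailMessage()
mail.set_content("Hello in plain text")
mail.add_alternative("<p>Hello in <b>HTML</b></p>", subtype="html")

print(pick_body(mail))                               # plain-text version
print(pick_body(mail, preferred=("html", "plain")))  # HTML version
```

If the message was parsed with `policy=email.policy.default`, the standard library's `EmailMessage.get_body(preferencelist=...)` performs a similar selection and may be a better starting point than a hand-written loop.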

Unfortunately, common practice by message producers recently has been to create a multipart/alternative container with text/plain and text/html members, but then provide a completely useless text/plain part and have all the actual content in a text/html part. The correct arrangement in this situation would be to simply not supply a text/plain part if you can't put anything useful in it (but I guess they only care about getting past some misguided spam filter, not about actually accommodating the preferences of the recipients).

