Breaking DKIM - on Purpose and by Chance

October 2017, Update 08/2018

For the impatient
See here for how to create a mail which looks like it comes from DHL, passes DKIM and DMARC validation, but shows content which is fully controlled by the attacker. Or see here how DKIM gets broken accidentally in practice, making an innocent message look spoofed.

Summary

DKIM is, together with DMARC and SPF, one of the major ways currently used to combat sender spoofing in e-mail, and thus to combat phishing attacks. The main idea of DKIM is that the sending mail server applies a digital signature to the mail which can then be validated by the recipient. This is considered proof that the mail was actually sent by the mail server responsible for the sender's domain.
This article questions the quality of this proof by showing how fragile DKIM is as used in practice. It shows how, in relevant cases, the content of a mail can be changed without invalidating the DKIM signature, which severely undermines the trust one should have in the signature. It also shows how easily DKIM breaks by chance, making the recipient believe that the mail was spoofed even though it was not. Finally, it shows how DKIM can be used properly so that it actually delivers most of the trust expected from it.

In reaction to a post which dismissed the relevance of this research

I only just now (08/2018) became aware of a post dismissing the relevance of this research. See below for my opinion on this post.

Please note that republishing this article in full or in part is only allowed under the conditions described here.

What is DKIM

Sender Spoofing as Nuisance and Attack Vector
Preventing Sender Spoofing with SPF, DKIM and DMARC
A Quick Introduction Into DKIM
Which Parts of the Mail Header Should Be Signed

Breaking DKIM on Purpose

Spoofing Mail Headers: Subject, Content-Type, ...
Spoofing the Mail Body: Displayed Content Fully Controlled by Attacker

Breaking DKIM by Chance
How to Fix the Problems
About claims that this research is irrelevant
Summary

What is DKIM

This chapter gives a short introduction to DKIM, its role in preventing sender spoofing and how it basically works. If you are already familiar with this you can skip directly to Breaking DKIM on Purpose.

Sender Spoofing as Nuisance and Attack Vector

The ability to easily spoof the sender of an e-mail is both a nuisance and a risk. It is regularly done when delivering spam, which results in bounced mails or mails from angry users filling the mailbox of the alleged sender. But it is also used to make a phishing mail more credible, since it seems to come from a known and trusted sender. Such phishing mails claim to come from Amazon, Apple, DHL, banks or other companies and typically try to steal credentials from the user or infect the user's computer with ransomware or other malware.

Preventing Sender Spoofing with SPF, DKIM and DMARC

Because of this, preventing or at least detecting sender spoofing is important, and several technologies have been developed in recent years. The major technologies used in practice are SPF, DKIM and DMARC. With SPF the receiving mail server checks if the sender's IP address is the expected one. With DKIM the mail server for the sender's domain adds a digital signature to the mail so that the recipient can verify that the mail was sent by the expected server and was not modified. DMARC then builds on top of SPF and DKIM by making sure that the sender domain as displayed to the end user matches the one claimed in SPF and DKIM. DMARC also adds a policy on how to deal with mails which don't match the expectations and provides a way to send reports about such problems to the owner of the domain. All three technologies rely on DNS to provide the policies, i.e. the owner of the domain adds the needed policies in special TXT (or SPF) records in the DNS settings of the domain.

Since SPF can easily result in false positives if mail forwarding or mailing lists are involved, and DKIM is not as easily deployed as SPF, DMARC only requires that either the SPF or the DKIM check provides a positive result. But this also means that an attacker only needs to bypass one of DKIM or SPF, not both.
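The "either SPF or DKIM" rule can be sketched in a few lines. This is a simplified illustration, not an implementation of DMARC: the `AuthResult` type and the function names are made up for this example, and alignment is reduced to an exact domain match (real DMARC also knows a relaxed mode based on the organizational domain).

```python
from dataclasses import dataclass


@dataclass
class AuthResult:
    """Outcome of one check (hypothetical helper type for this sketch)."""
    passed: bool  # did the SPF or DKIM check itself succeed?
    domain: str   # domain the check vouches for (SPF: envelope sender, DKIM: d= tag)


def is_aligned(auth_domain: str, from_domain: str) -> bool:
    # Assumption: strict alignment only, i.e. the domains must be identical.
    return auth_domain.lower() == from_domain.lower()


def dmarc_pass(from_domain: str, spf: AuthResult, dkim: AuthResult) -> bool:
    """DMARC passes if at least one aligned check passed."""
    spf_ok = spf.passed and is_aligned(spf.domain, from_domain)
    dkim_ok = dkim.passed and is_aligned(dkim.domain, from_domain)
    return spf_ok or dkim_ok


# A mail relayed through a foreign server fails SPF, but a still valid
# DKIM signature with a matching d= domain is enough for a DMARC pass.
spf = AuthResult(passed=False, domain="attacker.example")
dkim = AuthResult(passed=True, domain="dhl.com")
print(dmarc_pass("dhl.com", spf, dkim))
```

This is why a DKIM signature that stays valid after the mail was modified is sufficient for the attacks described later: SPF does not need to pass at all.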

A Quick Introduction Into DKIM

The basic idea of DKIM is that the mail server of the sender's domain adds a signature to the mail which can be used by the recipient to verify that the mail was sent by this mail server. Such a signature might look like this:

    DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
      d=dhl.com; l=1850; s=20140901; t=1452769712;
      h=date:from:to:message-id:subject:mime-version;
      bh=yCbsFBJJ9k2VYBxKGgyNILalBP3Yzn1N8cMPQr92+zw=;
      b=bnuXrH/dSnyDR/kciZauK4HTgbcDbSFzmHR78gq+8Cdm20G56Ix169SA...

The most important parts of the signature in the context of this article are:

d - the domain of the signer. This part is used in connection with DMARC to check if the signature's domain matches the sender domain visible in the mail client.
h - the list of fields from the mail header which should be included in the signature.
bh - a hash over the mail body.
l - the number of body bytes covered by the body hash. This is optional. If not given, bh covers the full body.
b - the signature itself. It covers the header fields given with 'h' and also the DKIM-Signature header itself, and thus indirectly signs the body too, since this header includes the body hash.

Apart from this the signature above also contains the signature algorithm (a), the selector (s) used to find the RSA key in the DNS (by getting the TXT record for 20140901._domainkey.dhl.com in this case), the canonicalization methods (c) for header and body and the optional time stamp (t).
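To make the structure of the header more tangible, here is a minimal sketch which splits the value of the DKIM-Signature shown above into its tags and derives the DNS name where the public key is looked up. The function names are made up for this example, the truncated 'b' value is left out, and stripping all whitespace from the values is a simplification which is fine for the tags used here but not for every tag defined in the standard.

```python
def parse_dkim_tags(value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # empty piece after the final ';'
        name, _, val = part.partition("=")  # split at the first '=' only
        # Simplification: drop folding whitespace inside the value.
        tags[name.strip()] = "".join(val.split())
    return tags


def key_record_name(tags: dict[str, str]) -> str:
    """DNS name of the TXT record holding the signer's public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"


signature = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed;\n"
    "  d=dhl.com; l=1850; s=20140901; t=1452769712;\n"
    "  h=date:from:to:message-id:subject:mime-version;\n"
    "  bh=yCbsFBJJ9k2VYBxKGgyNILalBP3Yzn1N8cMPQr92+zw=;"
)
tags = parse_dkim_tags(signature)
print(key_record_name(tags))   # DNS name to query for the key
print(tags["h"].split(":"))    # header fields covered by the signature
print(int(tags["l"]))          # number of body bytes covered by bh
```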

Which Parts of the Mail Header Should Be Signed

As described in the previous chapter, the signature includes the body and also specific header fields. Which header fields are included is given in the parameter 'h'. It is important to understand that each occurrence of a field name in 'h' matches only a single occurrence of this field in the mail header, starting from the bottom of the mail header. Thus, if the header contains two 'To' fields and both should be protected, then 'to' needs to be included twice in 'h'.
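The bottom-up matching can be illustrated with a small sketch. It only models which header instances get selected for signing, not the signing itself; the function name and the example addresses are made up.

```python
def covered_instances(headers: list[tuple[str, str]], h_tag: str) -> list[int]:
    """Return the indices of the header lines selected by the h= tag.

    headers lists the mail header from top to bottom as (name, value).
    Each name in h= consumes one not yet used instance of that field,
    searching from the bottom of the header upwards.
    """
    used = set()
    covered = []
    for name in h_tag.lower().split(":"):
        name = name.strip()
        for idx in range(len(headers) - 1, -1, -1):
            if idx not in used and headers[idx][0].lower() == name:
                used.add(idx)
                covered.append(idx)
                break
        # If nothing matched, the entry refers to a non-existing field.
    return covered


headers = [
    ("To", "first@example.org"),     # index 0, topmost line
    ("From", "sender@example.org"),  # index 1
    ("To", "second@example.org"),    # index 2, bottommost line
]
print(covered_instances(headers, "from:to"))     # topmost To is not covered
print(covered_instances(headers, "from:to:to"))  # both To lines are covered
```

With `h=from:to` only the lower 'To' line is part of the signature, so the upper one can be changed or replaced freely.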

The only requirement in the standard on which fields should be included in the signature is that 'From' must be included. Apart from that the standard is vague, i.e. section 5.4. Determine the Header Fields to Sign of RFC 6376 mainly says:

The choice of which header fields to sign is non-obvious...signing fields present in the message such as Date, Subject, Reply-To, Sender, and all MIME header fields are highly advised.

Interestingly, the following section 5.4.1 gives examples of fields considered useful for signing. However, these examples partly contradict the statements in 5.4: several new fields are added while others are omitted. Still, OpenDKIM treats the list in 5.4.1 as the recommended fields and thus misses important fields like Content-Type or Content-Transfer-Encoding. Even stranger, RFC 4871, the predecessor of the current DKIM standard RFC 6376, has a more extensive list of header fields in section 5.5 and even defines more clearly that these SHOULD be signed, instead of presenting them as mere examples.

Apart from being vague about which header fields should be signed in the first place, the current standard is even more vague on how to protect against extra header fields added later. While 8.15. Attacks Involving Extra Header Fields acknowledges that this can lead to serious attacks, it mainly holds the recipient responsible for dealing with this problem, even though section 5.4 offers a way to protect against added header fields by "oversigning":

A header field name need only be listed once more than the actual number of that header field in a message at the time of signing in order to prevent any further additions.
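A minimal sketch of how a signer could build an oversigned 'h' value following this rule: every field to protect is listed once more than it occurs in the message. The function name and the example headers are made up and not taken from any existing DKIM implementation.

```python
from collections import Counter


def oversigned_h(headers: list[tuple[str, str]], names: list[str]) -> str:
    """Build an h= value which lists each field once more than present."""
    present = Counter(name.lower() for name, _ in headers)
    fields = []
    for name in names:
        key = name.lower()
        # count + 1 entries: one per existing instance, plus one entry
        # which matches nothing and thereby blocks later additions.
        fields.extend([key] * (present[key] + 1))
    return ":".join(fields)


headers = [
    ("From", "sender@example.org"),
    ("To", "rcpt@example.org"),
    ("Subject", "hello"),
]
print(oversigned_h(headers, ["from", "to", "subject", "content-type"]))
```

Note that a field which is not present at all, like Content-Type in this example, still gets listed once, so that adding it later invalidates the signature.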

Breaking DKIM on Purpose

The vagueness in the DKIM standard and the lack of secure defaults, combined with the complexity, flexibility and brokenness of the MIME standard and its implementations, make it possible to spoof important information in the mail like the subject, or even to change the whole body, including adding new (and potentially malicious) attachments.

Spoofing Mail Headers: Subject, Content-Type, ...

My research shows that header signing as done in practice is insufficient and makes spoofing possible in many cases. Although about 97% of the mails I've analyzed included the subject in the signature, only 3% protected against an additional subject header by oversigning. But, for example, the GMail and AOL webmail implementations and also Thunderbird display the content of the first subject line in case of multiple subject lines, while the DKIM signature covers only the last subject line. This way an attacker can easily change the displayed subject without affecting the validity of the DKIM signature.

And, when additionally spoofing the Content-Type (which is covered only by 56% of the signatures in the mails I've analyzed and only protected against extra headers in 2% of the mails) it might also be possible with some clients to show an empty mail body even though there was one before.

For example take the following simple mail:

    DKIM-Signature: v=1; h=from:to:cc:subject:content-type; ...
    From: <dkim-test@chksum.de>
    To: knurrt.hase@gmail.com
    Subject: 20170920:1755 - good
    Content-type: multipart/mixed; boundary=foo
    Date: Wed, 20 Sep 2017 17:55:18 +0200
    
    --foo
    Content-type: text/plain
    
    some text
    --foo--

Using the mail client Thunderbird with the DKIM plugin installed it gets rendered like this:

But, by adding an additional Subject and Content-Type with a different and non-existing boundary on top of the original mail it gets rendered differently:

    Subject: Urgent Update at http://foo
    Content-type: multipart/mixed; boundary=bar
    DKIM-Signature: v=1; h=from:to:cc:subject:content-type; ...
    From: <dkim-test@chksum.de>
    To: knurrt.hase@gmail.com
    Subject: 20170920:1755 - good
    Content-type: multipart/mixed; boundary=foo
    Date: Wed, 20 Sep 2017 17:55:18 +0200
    
    --foo
    Content-type: text/plain
    
    some text
    --foo--

Note that the subject is different and the body has vanished, but the original DKIM signature is still successfully validated:
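The mismatch between what gets signed and what gets displayed can be reproduced with a few lines of Python. This is only a sketch: it feeds the modified mail from above to Python's standard email parser and compares the first and the last instance of the duplicated headers. That a given mail client behaves like "first instance wins" is an assumption which has to be verified for each client.

```python
import email

raw = (
    "Subject: Urgent Update at http://foo\n"
    "Content-type: multipart/mixed; boundary=bar\n"
    "DKIM-Signature: v=1; h=from:to:cc:subject:content-type; ...\n"
    "From: <dkim-test@chksum.de>\n"
    "To: knurrt.hase@gmail.com\n"
    "Subject: 20170920:1755 - good\n"
    "Content-type: multipart/mixed; boundary=foo\n"
    "Date: Wed, 20 Sep 2017 17:55:18 +0200\n"
    "\n"
    "--foo\n"
    "Content-type: text/plain\n"
    "\n"
    "some text\n"
    "--foo--\n"
)
msg = email.message_from_string(raw)

# All instances of a field, in the order they appear in the header.
subjects = msg.get_all("Subject")
content_types = msg.get_all("Content-type")

displayed_subject = subjects[0]   # topmost line, added by the attacker
signed_subject = subjects[-1]     # bottommost line, selected by h=

print("displayed:", displayed_subject, "|", content_types[0])
print("signed:   ", signed_subject, "|", content_types[-1])
```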

Spoofing the Mail Body: Displayed Content Fully Controlled by Attacker

Given the right circumstances one can not only spoof essential mail headers but also spoof the body of the mail, including changing the displayed text or adding attachments of one's own. And again, the DKIM signature which should protect against this stays valid.

This more harmful kind of spoofing is possible if the sender uses the 'l' attribute in the signature to restrict which parts of the body are covered by the signature. This feature is usually used to keep the signature valid even if mail servers or filters on the way add their own text at the end of the body, e.g. unsubscribe information in mailing lists or something like "this mail was scanned by product XYZ" which some antivirus products like to add.

Usually the value of 'l' as set by the sending server covers the whole body. It thus guarantees that no changes are made to the original body but allows additions after the body. But I've also stumbled over a misconfigured system at a large German company where all DKIM signatures covered only the first 10 bytes of the body, no matter how long the body actually was. Such a misconfiguration makes attacks even easier but is not required in most cases.
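The effect of 'l' on the body hash can be sketched as follows. This is a simplified illustration which assumes that the body is already canonicalized and that SHA-256 is used (as with a=rsa-sha256); the function name and the example bodies are made up.

```python
import base64
import hashlib
from typing import Optional


def body_hash(body: bytes, length: Optional[int] = None) -> str:
    """Compute a bh= style value over the first `length` bytes of the body.

    Assumption: `body` is already in canonicalized form. Without a
    length the whole body is hashed, as if l= were not given.
    """
    signed_part = body if length is None else body[:length]
    return base64.b64encode(hashlib.sha256(signed_part).digest()).decode("ascii")


original = b"The real shipment digest\r\n"
limit = len(original)  # l= covering exactly the original body

# The attacker appends data after the signed part of the body.
tampered = original + b"--BAD\r\nContent-type: text/plain\r\n\r\nfaked\r\n--BAD--\r\n"

print(body_hash(original, limit) == body_hash(tampered, limit))  # hash with l=
print(body_hash(original) == body_hash(tampered))                # hash of full body
```

With the length limit both hashes are identical, so the signature cannot notice the appended data. Without the limit the hashes differ and the signature would break.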

As an example we take an actual mail sent from DHL.com at the beginning of 2016. The DKIM signature still validates successfully in September 2017, since DHL neither added an expiration to the signature nor changed the RSA key used for signing. The original mail as seen in Gmail webmail looks like this:

When looking at the source code of the mail below, it can be seen that some fields are covered by the signature but are not protected by oversigning against adding another field. Other important fields are not even covered by the signature. And the body hash covers only a specific part of the mail, so that anything added to the original body will not invalidate the signature.

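Specifically, this means that we can add another Date, To and Message-Id on top of the mail, change the existing Content-Type and add arbitrary data to the body without invalidating the signature. In the listing below, the first line of each duplicated Date, To, Message-ID and Content-Type pair and everything from the --BAD boundary onward were added by the attacker (shown in red in the original article), while the remaining lines belong to the original mail: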

   DKIM-Signature: v=1; l=1850; d=dhl.com; s=20140901;
     h=date:from:to:message-id:subject:mime-version; 
     b=...; bh=...
   Date: Thu, 24 Sep 2017 19:08:23 +0800 (MYT)
   Date: Thu, 14 Jan 2016 19:08:23 +0800 (MYT)
   From: DHL Customer Support <support@dhl.com>
   To: knurrt.hase@outlook.de
   To: auftrag@original-company-not-shown
   Message-ID: <9953648784.9145749@dhl.com>
   Message-ID: <1453648784.9145749.1452769703900.JavaMail...dhl.com>
   Subject: DHL Shipment Digest
   MIME-Version: 1.0
   Content-Type: multipart/mixed; boundary=BAD
   Content-Type: multipart/mixed; boundary=----=_Part_9145747_2082645767.1452769703900
   
   ------=_Part_9145747_2082645767.1452769703900
   Content-type: text/plain
   
   The real DHL Shipment Digest ...
   ------=_Part_9145747_2082645767.1452769703900
   --BAD
   Content-type: text/plain
   
   This is a faked mail with valid DKIM signature from DHL.
   --BAD--

The magic in replacing the shown body lies in redefining the Content-Type with a different MIME boundary. Anything before the first occurrence of this boundary will be treated as MIME preamble and ignored by any MIME-compatible mail client (essentially all of today's clients). This means the resulting mail will show the body added by the attacker instead of the original body:
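The preamble trick can be observed with Python's standard email parser, which here serves only as a stand-in for a MIME-compatible mail client. In this sketch the long original boundary is replaced by the made-up short boundary ORIG, and all headers except Content-Type are left out.

```python
import email

raw = (
    "Content-Type: multipart/mixed; boundary=BAD\n"   # added by the attacker
    "Content-Type: multipart/mixed; boundary=ORIG\n"  # original header
    "\n"
    "--ORIG\n"
    "Content-type: text/plain\n"
    "\n"
    "The real DHL Shipment Digest ...\n"
    "--ORIG--\n"
    "--BAD\n"
    "Content-type: text/plain\n"
    "\n"
    "This is a faked mail with valid DKIM signature from DHL.\n"
    "--BAD--\n"
)
msg = email.message_from_string(raw)

# Collect the payloads of all leaf parts, i.e. what a client would show.
shown = [part.get_payload() for part in msg.walk() if not part.is_multipart()]

print("parts shown:", shown)
print("ignored preamble:", repr(msg.preamble))
```

The parser uses the first Content-Type header, finds only the part delimited by the BAD boundary and files the complete original content under `preamble`, which is not meant to be displayed.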

The DKIM signature is still valid, as is shown in the "signed-by" information. Moreover, if we look at the source of the mail, Gmail provides a nice summary which includes the attacker-set Date and Message-Id and also shows that DKIM passes successfully. And since the domain dhl.com in the DKIM signature matches the domain of the displayed sender, DMARC also passes, even though the mail was not sent through DHL's mail server:

dhl.com is not the only domain using 'l' inside the DKIM signature and thus affected by this problem. In the past I've also seen such mails from cisco.com, deutschepost.de, dpdhl.com and others.

Interestingly, the authors of the DKIM standard were already kind of aware of the problems with the 'l' attribute. From 8.2. Misuse of Body Length Limits ("l=" Tag):

Use of the "l=" tag might allow display of fraudulent content without appropriate warning to end users. ... An example of such an attack includes altering the MIME structure, ...
To avoid this attack, Signers should be extremely wary of using this tag, and Assessors might wish to ignore signatures that use the tag.

Given the known potential for misuse and the coy recommendation to ignore mails using this feature it makes you wonder why this feature was included in the standard in the first place.

Breaking DKIM by Chance

The previous chapters have shown how existing mails can be used to create spoofed mails without invalidating the DKIM signature. This undermines the trustworthiness of DKIM, i.e. one cannot be sure that the mail was not spoofed even though the DKIM signature is valid.

But there is also a problem in the other direction: due to the peculiarities of the SMTP protocol it can happen that a DKIM signature becomes invalid even though the content of the mail was not changed. This means that the mail looks spoofed although it is not, thus undermining trust in DKIM further.

Traditionally mails are restricted to ASCII only (i.e. 7 bit clean) and a line length of 1000 characters. The MIME standard defines the Content-Transfer-Encodings base64 and quoted-printable, which allow long lines, non-ASCII characters and also binary data to be represented within the restrictions of the original mail delivery. But these encodings can be inefficient, and it would be much nicer if the client could ignore the historic restrictions and transfer the mail using the full 8 bits.

This was made possible by the 8BITMIME extension. If a mail server supports this extension, the client can ignore the restriction to ASCII only, although not the restriction of a limited line length. But since mail delivery is not end-to-end but hop-by-hop, it can happen that the first mail server (MTA) in the path supports 8BITMIME and accepts such a mail, while another MTA in the path does not support 8BITMIME. In this case the sending MTA needs to convert the mail to ASCII-only, i.e. to fit the historic restrictions. Unfortunately, this conversion breaks any existing DKIM signatures:
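Why the conversion breaks the signature can be sketched with a body hash computed before and after re-encoding. The example text is made up, canonicalization is ignored, and the quoted-printable encoder from Python's standard library stands in for the conversion an MTA would do.

```python
import base64
import hashlib
import quopri


def body_hash(body: bytes) -> str:
    """SHA-256 body hash in the base64 form used by the bh= tag."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


# Body as signed by the sender: UTF-8 text, sent as 8bit.
body_8bit = "Grüße aus Köln".encode("utf-8")

# Body after an MTA on the path converted it for a 7-bit-only next hop.
body_7bit = quopri.encodestring(body_8bit)

print(body_7bit)
print(body_hash(body_8bit) == body_hash(body_7bit))
```

The decoded text is the same for the reader, but the bytes on the wire differ, and so does the body hash. The 'bh' value in the signature no longer matches and the validation fails.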

This problem is not new. In fact the DKIM standard itself mentions in section 5.3 this problem and shows how to deal with it:

Some messages, particularly those using 8-bit characters, are subject to modification during transit, notably conversion to 7-bit form. Such conversions will break DKIM signatures. In order to minimize the chances of such breakage, Signers SHOULD convert the message to a suitable MIME content-transfer encoding such as quoted-printable or base64 as described in [RFC2045] before signing. Such conversion is outside the scope of DKIM; the actual message SHOULD be converted to 7-bit MIME by an MUA or MSA prior to presentation to the DKIM algorithm.

Still, several major senders seem not to be fully aware of the issue and are thus affected by this conversion problem. For example, I have mails from Paypal and Booking.com which are affected by this problem, although most of their mails seem to be fine. And there are major mail providers which don't support 8BITMIME and are thus affected by this problem as recipients. This includes for example 1&1 (i.e. kundenserver.de, Web.de, GMX,...) and AOL. But providers of security services around mail like Mimecast or Spamfence are affected as well, and these might use an invalid DKIM signature as an indicator of spoofing and classify the message accordingly.

How to Fix the Problems

While the DKIM standard tries to shift most of the work in fixing such problems to the recipient, history shows that this does not work. Instead both sides should do their best: The sender should make sure that the mail cannot be changed without breaking the signature in the first place. And the recipient should check if the signature is good enough so that the mail is definitely not spoofed.

On the sender side this means first to make sure that the mail conforms to the historic restrictions mails have, i.e. all-ASCII and a line length of at most 1000 characters. If the mail is not there yet it needs to be converted before any DKIM signature gets added.
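A signer could check this precondition before applying the signature. The following is only a sketch with a made-up function name. It assumes the limit of 998 characters per line excluding the line ending, which corresponds to 1000 characters including CRLF.

```python
def needs_7bit_conversion(raw: bytes, max_line: int = 998) -> bool:
    """Check if a raw mail violates the historic transport restrictions.

    Returns True if the mail contains non-ASCII bytes or overlong lines
    and thus should be re-encoded (e.g. as quoted-printable or base64)
    BEFORE the DKIM signature gets added.
    """
    if any(byte > 127 for byte in raw):
        return True  # 8-bit data, might get converted in transit
    # splitlines() drops the line endings, so only the content is counted.
    return any(len(line) > max_line for line in raw.splitlines())


print(needs_7bit_conversion(b"Subject: hi\r\n\r\nhello\r\n"))
print(needs_7bit_conversion("Subject: hi\r\n\r\nGrüße\r\n".encode("utf-8")))
print(needs_7bit_conversion(b"Subject: hi\r\n\r\n" + b"a" * 1200 + b"\r\n"))
```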

The signature itself needs to include all mail headers which might affect the display of the message. Each of these should be oversigned to protect against an attacker adding extra headers. The headers which obviously need to be signed are any headers directly displayed to the user, i.e. Subject, From, To, Date and Sender. Additionally any headers affecting the display of the message should be included, i.e. Content-Type, Content-Transfer-Encoding, Content-Disposition and Mime-Version. And there are also headers which affect the future message flow or how this message is displayed in the context of others, i.e. Reply-To, In-Reply-To and References. It might also be useful to add the length of the body with the 'l' attribute, as long as all headers which might affect the display of the message are included in the signature and oversigned.

On the recipient side it should be checked that each relevant header is actually included in the signature. Any headers which are not included in the signature should be treated with utmost care and should preferably not be relied on when displaying the message. Given that this is not possible in many cases, one should at least signal to the user that the DKIM signature does not include critical headers and that the message thus might be spoofed even if the signature looks valid. Also, if the 'l' attribute is set, only the part covered by the limited body hash should be shown to the end user, or the part outside the hash should be explicitly displayed as untrusted.
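Such a check on the recipient side could look like the following sketch. The list of critical headers is a subset of the ones named above, and the function name and the status strings are made up for this example.

```python
from collections import Counter

# Assumption: headers considered critical for this example.
CRITICAL = ("from", "to", "subject", "date", "content-type",
            "content-transfer-encoding", "mime-version", "reply-to")


def audit_signature(headers, h_tag, critical=CRITICAL):
    """Report how well each critical header is protected by the h= tag."""
    signed = Counter(name.strip().lower() for name in h_tag.split(":"))
    present = Counter(name.lower() for name, _ in headers)
    report = {}
    for name in critical:
        if signed[name] == 0:
            report[name] = "unsigned"          # can be changed or added
        elif signed[name] < present[name]:
            report[name] = "partially signed"  # some instances unprotected
        elif signed[name] == present[name]:
            report[name] = "not oversigned"    # another one can be added on top
        else:
            report[name] = "oversigned"        # additions break the signature
    return report


# Header fields and h= value of the DHL mail shown earlier (values omitted).
headers = [(name, "...") for name in
           ("Date", "From", "To", "Message-ID", "Subject",
            "MIME-Version", "Content-Type")]
report = audit_signature(headers, "date:from:to:message-id:subject:mime-version")
for name, status in report.items():
    print(f"{name}: {status}")
```

For the DHL mail this reports Content-Type as unsigned and From, To, Subject and Date as not oversigned, which are exactly the weaknesses used in the attack above.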

Of course, it would be best if all senders signed their messages using S/MIME or PGP and all clients checked this end-to-end signature. But this will probably remain a dream for years to come, and thus we need to make the current workarounds like SPF, DKIM and DMARC more reliable.

About claims that this research is irrelevant

I only recently became aware of the post Breaking DKIM? Or Simply Misunderstanding How It Works In Practice, which dismisses this research. Let me reply to some points I think are wrong in this post:

... his test system uses the independently developed Thunderbird DKIM plugin for validation, not the DKIM signature result from GMail. That’s because modern mail services like GMail, Office 365, and others are aware of these kind of issues.

Actually I've tested Google, Outlook (i.e. Microsoft) and AOL as mail providers, since these included at least an Authentication-Results header with the result of the check. And all of these claimed that the DKIM signature was fine, i.e. contrary to the author's claim about modern mail services being aware of the problem. And to repeat - I've tested with GMail, contrary to the author's claim, and I even got a small bug bounty. There are even images in this post which show how the message is displayed in GMail, and these images were there from the beginning.

I did not show the failures of the others since, contrary to GMail, these services did not even have a user interface to display the status of DKIM or DMARC. One had to dig into the source code of the mail and check the headers instead - which probably nobody would do anyway. And to say it again - these headers showed DKIM pass and DMARC pass.

From the desktop clients I've tried Outlook and Thunderbird. Only, I could not find any DKIM plugin or similar for Outlook so I showed only the examples from Thunderbird which actually had such a plugin.

A second potential vulnerability Ulrich highlights depends on the use of the optional “l” (lowercase L, for “length”) attribute in the DKIM signature, which can be used to limit how much of the message body is signed. ... As Ulrich notes, even the authors of the DKIM standard recognized that the l attribute was a risky and not particularly effective way to solve that problem. ... As a result, almost no senders today use the attribute, and email best practices advise against it.

It would be nice if almost no senders used it, as the author claims. But of the mails I've analyzed about 8% had a DKIM signature with an 'l' attribute, and among these were not-so-irrelevant companies like DHL and Cisco. I've even shown an example for DHL.

... and his post doesn’t outline a workable vector that an attacker could actually exploit.

I'm not sure if the author of the post was aware of the images in this post from the GMail user interface, which show a clear pass for DKIM and DMARC even though the content of the mail was heavily changed. And a DMARC pass should actually mean that the mail is not spoofed at all. And DHL actually has a DMARC policy of "reject", so this mail should never have made it to the recipient in the first place (again, same problem with Google, Outlook and AOL).

If creating a mail with different content and a spoofed sender and using it with a valid DKIM signature and making the mail clients believe that the sender is not spoofed at all - if this is not a vector against the very thing DKIM and DMARC tried to protect against then I'm not sure what kind of vector the author expects.

Summary

DKIM tries to address sender spoofing by having the sending MTA sign the mail. While the idea is sound in theory, the standard is overly flexible. It only issues vague recommendations and then relies on the specific implementation and configuration to provide the necessary security and resilience. Given the lack of clear requirements and secure defaults, it is no surprise that DKIM as used in practice fails to provide the expected trust in many cases.