Breaking DKIM - on Purpose and by Chance
October 2017
For the impatient
See here for how to create a mail which looks like it comes from DHL, passes DKIM and DMARC validation, but shows content which is fully controlled by the attacker. Or see here how DKIM gets broken accidentally in practice, making an innocent message look spoofed.
Summary
DKIM is, together with DMARC and SPF, one of the major ways currently used to combat sender spoofing in e-mail, and thus to combat phishing attacks. The main idea of DKIM is that the sending mail server applies a digital signature to the mail which can then be validated by the recipient. This is considered proof that the mail was actually sent by the mail server responsible for the sender's domain.
This article questions the quality of this proof by showing how fragile DKIM is as used in practice. It shows how in relevant cases the content of a mail can be changed without invalidating the DKIM signature, thus severely undermining the trust one should have in the signature. It also shows how easily DKIM breaks by chance and makes the recipient believe that the mail was spoofed even though it was not. And finally it shows how DKIM can be used properly to actually meet most of the trust expected from it.
Please note that republishing this article in full or in part is only allowed under the conditions described here.
- What is DKIM
- Sender Spoofing as Nuisance and Attack Vector
- Preventing Sender Spoofing with SPF, DKIM and DMARC
- A Quick Introduction Into DKIM
- Which Parts of the Mail Header Should Be Signed
- Breaking DKIM on Purpose
- Spoofing Mail Headers: Subject, Content-Type, ...
- Spoofing the Mail Body: Displayed Content Fully Controlled by Attacker
- Breaking DKIM by Chance
- How to Fix the Problems
This gives some introduction to DKIM, its role in preventing sender spoofing and how it basically works. If you are already familiar with this you can skip directly to Breaking DKIM on Purpose.
The ability to easily spoof the sender of an e-mail is both a nuisance and a risk. It is regularly done when delivering spam, which results in bounced mails or mails from angry users filling the mailbox of the alleged sender. But it is also used to make a phishing mail more credible since it seems to come from a known and trusted sender. Such phishing mails claim to come from Amazon, Apple, DHL, banks or other companies and typically try to steal credentials from the user or infect the user's computer with ransomware or other malware.
Because of this, preventing or at least detecting sender spoofing is important and several technologies have been developed in recent years. The major technologies used in practice are SPF, DKIM and DMARC. With SPF the receiving mail server checks if the sender's IP address is the expected one. With DKIM the mail server for the sender's domain adds a digital signature to the mail so that the recipient can verify that the mail was sent by the expected server and was not modified. DMARC then builds on top of SPF and DKIM by making sure that the sender domain as displayed to the end user matches the one claimed in SPF and DKIM. DMARC also adds a policy on how to deal with mails which don't match the expectations and provides a way to send reports about such problems to the owner of the domain. All three technologies rely on DNS to provide the policies, i.e. the owner of the domain adds the needed policies in special TXT (or SPF) records in the DNS settings of his domain.
Since SPF can easily result in false positives if mail forwarding or mailing lists are involved, and DKIM is not as easily deployed as SPF, DMARC only requires that either the SPF or the DKIM check provides a positive result. But this also means that an attacker only needs to bypass one of DKIM or SPF, not both.
The basic idea of DKIM is that the mail server of the sender's domain adds a signature to the mail which can be used by the recipient to verify that the mail was sent by this mail server. Such a signature might look like this:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dhl.com; l=1850; s=20140901; t=1452769712; h=date:from:to:message-id:subject:mime-version; bh=yCbsFBJJ9k2VYBxKGgyNILalBP3Yzn1N8cMPQr92+zw=; b=bnuXrH/dSnyDR/kciZauK4HTgbcDbSFzmHR78gq+8Cdm20G56Ix169SA...
The most important parts of the signature in the context of this article are:
- d - the domain of the signer. This part is used in connection with DMARC to check if the signature's domain matches the sender domain visible in the mail client.
- h - the list of fields from the mail header which should be included in the signature.
- bh - a hash over the mail body.
- l - the number of body bytes covered by the body hash. This is optional. If not given, bh covers the full body.
- b - the signature itself, which includes the header fields given with 'h' and also the DKIM-Signature header itself and thus also signs the body since the header includes the body hash.
Apart from this the signature above also contains the signature algorithm (a), the selector (s) used to find the RSA key in the DNS (by getting the TXT record for 20140901._domainkey.dhl.com in this case), the canonicalization methods (c) for header and body and the optional time stamp (t).
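The tag list is simple enough to take apart by hand. The following is a minimal sketch in Python, not a full RFC 6376 parser: the helper names parse_dkim_tags and key_record_name are made up for this illustration, and whitespace is stripped from all values, which is a simplification.

```python
import re

def parse_dkim_tags(value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for item in value.split(";"):
        if "=" not in item:
            continue  # skip empty segments, e.g. after a trailing ';'
        # Split on the first '=' only, base64 values may end in '='.
        name, _, val = item.partition("=")
        # Simplification: drop all whitespace (folding) inside the value.
        tags[name.strip()] = re.sub(r"\s+", "", val)
    return tags

def key_record_name(tags: dict[str, str]) -> str:
    """DNS name of the TXT record holding the public key."""
    return f"{tags['s']}._domainkey.{tags['d']}"

sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=dhl.com; l=1850; s=20140901; "
       "t=1452769712; h=date:from:to:message-id:subject:mime-version; "
       "bh=yCbsFBJJ9k2VYBxKGgyNILalBP3Yzn1N8cMPQr92+zw=; b=...")
tags = parse_dkim_tags(sig)
print(key_record_name(tags))   # 20140901._domainkey.dhl.com
print(tags["h"].split(":"))    # the signed header fields, in order
```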
As described in the previous chapter, the signature includes the body and also specific header fields. Which header fields are included is given in the parameter 'h'. It is important to understand that each occurrence of a field in 'h' matches only a single occurrence in the mail header, starting from the bottom of the mail header. Thus if the header contains two 'To' fields and both should be protected then 'to' needs to be included twice in 'h'.
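The bottom-up matching can be sketched in a few lines of Python. This is an illustration of the selection rule only, with a hypothetical function name and no canonicalization or hashing; an entry in 'h' with no instance left is reported as None here.

```python
def select_signed_headers(headers, h_names):
    """Return the index of the header instance each entry of 'h' covers.

    headers: list of (name, value) tuples in the order they appear in the mail.
    h_names: field names from the 'h' tag, in order.
    """
    used = set()
    selected = []
    for name in h_names:
        # Search from the bottom for the next instance not yet consumed.
        for idx in range(len(headers) - 1, -1, -1):
            if idx not in used and headers[idx][0].lower() == name.lower():
                used.add(idx)
                selected.append(idx)
                break
        else:
            selected.append(None)  # no instance left for this entry
    return selected

headers = [
    ("Subject", "Urgent Update at http://foo"),  # put on top by an attacker
    ("From", "<email@example.com>"),
    ("Subject", "20170920:1755 - good"),         # original field
]
print(select_signed_headers(headers, ["from", "subject"]))             # [1, 2]
print(select_signed_headers(headers, ["from", "subject", "subject"]))  # [1, 2, 0]
```

With 'subject' listed once, the Subject added on top (index 0) is never looked at, so the signature stays valid. With 'subject' listed twice, the second entry was empty at signing time but now picks up the added field, so verification would fail.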
The only requirement in the standard on which fields should be included in the signature is that 'From' must be included. Apart from that the standard is vague, i.e. section 5.4. Determine the Header Fields to Sign of RFC 6376 mainly says:
The choice of which header fields to sign is non-obvious...signing fields present in the message such as Date, Subject, Reply-To, Sender, and all MIME header fields are highly advised.
Interestingly, the following section 5.4.1 gives examples for fields considered useful for signing. Only, these examples partly contradict the statements in 5.4 in that several new fields are added but others omitted. Still, opendkim is treating the list in 5.4.1 as the recommended fields and thus misses important fields like Content-Type or Content-Transfer-Encoding. Even more strange is that RFC 4871 as the predecessor of the current DKIM standard RFC 6376 has a more extensive list of header fields in section 5.5 and even defines these more clearly as SHOULD be signed instead of just examples.
Apart from being vague about which header fields should be signed in the first place, the current standard is even more vague on how to protect against extra header fields added later. While 8.15. Attacks Involving Extra Header Fields acknowledges that this can enable serious attacks, it mainly sees the recipient as responsible for dealing with this problem, even though section 5.4 offers a way to protect against added header fields by "oversigning":
A header field name need only be listed once more than the actual number of that header field in a message at the time of signing in order to prevent any further additions.
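As a sketch of what this rule means for a signer, the following Python helper (build_h is a made-up name, not part of any DKIM library) lists every field one more time than it occurs in the message. A field that is absent at signing time is listed once, which already prevents it from being added later.

```python
from collections import Counter

def build_h(present_fields, fields_to_sign):
    """Build an oversigned 'h' value.

    present_fields: names of all header fields in the message, with repeats.
    fields_to_sign: names of the fields the signer wants to protect.
    """
    counts = Counter(name.lower() for name in present_fields)
    h = []
    for name in fields_to_sign:
        name = name.lower()
        # One entry per existing instance plus one extra entry.
        h.extend([name] * (counts[name] + 1))
    return ":".join(h)

present = ["From", "To", "To", "Subject", "Date"]
print(build_h(present, ["from", "to", "subject", "content-type"]))
# from:from:to:to:to:subject:subject:content-type
```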
The vagueness in the DKIM standard and the lack of secure defaults, combined with the complexity, flexibility and brokenness of the MIME standard and its implementations, make it possible to spoof important information in the mail like the subject, or even change the whole body including adding new (and potentially malicious) attachments.
My research shows that header signing as done in practice is insufficient and makes spoofing possible in many cases. Although about 97% of the mails I've analyzed included the subject in the signature, only 3% protected against an additional subject header with oversigning. But for example the GMail and AOL webmail implementations and also Thunderbird display the content of the first subject line in case of multiple subject lines, while the DKIM signature covers the last subject line only. This way an attacker can easily change the displayed subject without affecting the validity of the DKIM signature.
And, when additionally spoofing the Content-Type (which is covered only by 56% of the signatures in the mails I've analyzed and only protected against extra headers in 2% of the mails) it might also be possible with some clients to show an empty mail body even though there was one before.
For example take the following simple mail:
    DKIM-Signature: v=1; h=from:to:cc:subject:content-type; ...
    From: <email@example.com>
    To: firstname.lastname@example.org
    Subject: 20170920:1755 - good
    Content-type: multipart/mixed; boundary=foo
    Date: Wed, 20 Sep 2017 17:55:18 +0200

    --foo
    Content-type: text/plain

    some text
    --foo--
Using the mail client Thunderbird with the DKIM plugin installed it gets rendered like this:
But by adding an additional Subject and an additional Content-Type with a different, non-existent boundary on top of the original mail, it gets rendered differently:
    Subject: Urgent Update at http://foo
    Content-type: multipart/mixed; boundary=bar
    DKIM-Signature: v=1; h=from:to:cc:subject:content-type; ...
    From: <email@example.com>
    To: firstname.lastname@example.org
    Subject: 20170920:1755 - good
    Content-type: multipart/mixed; boundary=foo
    Date: Wed, 20 Sep 2017 17:55:18 +0200

    --foo
    Content-type: text/plain

    some text
    --foo--
Note that the subject is different and the body has vanished but the original DKIM signature is still successfully validated:
Given the right circumstances one can not only spoof essential mail headers but also spoof the body of the mail, including changing the displayed text or adding one's own attachments. And again, the DKIM signature which should protect against this stays valid.
This more harmful kind of spoofing can be done if the sender uses the 'l' attribute in the signature to restrict which parts of the body are covered by the signature. This feature is usually used to protect the validity of the signature even if mail servers or filters on the way add their own footers at the end of the body, e.g. unsubscribe information in mailing lists or something like "this mail was scanned by product XYZ" which some antivirus products like to add.
Usually the value of 'l' as set by the sending server covers the whole body. It thus guarantees that no changes are made to the original body but allows additions after the body. But I've also stumbled over a misconfigured system at a large German company where all DKIM signatures cover only the first 10 bytes of the body, no matter how long the body actually was. Such a misconfiguration makes attacks even easier but is not required in most cases.
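The effect of 'l' on the body hash can be reproduced with a few lines of Python. This is a simplified sketch: it assumes the body is already canonicalized, uses SHA-256 as in a=rsa-sha256, and the function name body_hash and the sample bodies are made up for the illustration.

```python
import base64
import hashlib

def body_hash(canon_body: bytes, length=None) -> str:
    """Compute a 'bh' value over the first `length` bytes of the body.

    With length=None the whole body is hashed (no 'l' tag).
    """
    covered = canon_body if length is None else canon_body[:length]
    return base64.b64encode(hashlib.sha256(covered).digest()).decode("ascii")

original = b"The real DHL Shipment Digest ...\r\n"
tampered = original + b"--BAD\r\nContent-type: text/plain\r\n\r\nfaked\r\n--BAD--\r\n"

l = len(original)  # the signer set 'l' to the full original body length
print(body_hash(original, l) == body_hash(tampered, l))  # True: appended data is ignored
print(body_hash(original) == body_hash(tampered))        # False: without 'l' the change shows
```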
As an example we take an actual mail sent from DHL.com at the beginning of 2016. The DKIM signature still validates successfully in September 2017 since DHL neither added an expiration to the signature nor changed the RSA key used for signing. The original mail as seen in Gmail webmail looks like this:
Looking at the source code of the mail below, one can see that some fields are covered by the signature but are not protected by oversigning against adding another field. Other important fields are not even covered by the signature. And the body hash covers only a specific part of the mail, so that anything added to the original body will not invalidate the signature.
Specifically this means that we can add another Date, To and Message-Id on top of the mail, change the existing Content-Type and add arbitrary data to the body without invalidating the signature. In the listing, the attacker's changes are the first of each duplicated Date, To, Message-ID and Content-Type line and the final part delimited by --BAD; everything else is the original mail:
    DKIM-Signature: v=1; l=1850; d=dhl.com; s=20140901; h=date:from:to:message-id:subject:mime-version; b=...; bh=...
    Date: Thu, 24 Sep 2017 19:08:23 +0800 (MYT)
    Date: Thu, 14 Jan 2016 19:08:23 +0800 (MYT)
    From: DHL Customer Support <email@example.com>
    To: firstname.lastname@example.org
    To: auftrag@original-company-not-shown
    Message-ID: <email@example.com>
    Message-ID: <1453648784.9145749.1452769703900.JavaMail...dhl.com>
    Subject: DHL Shipment Digest
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary=BAD
    Content-Type: multipart/mixed; boundary=----=_Part_9145747_2082645767.1452769703900

    ------=_Part_9145747_2082645767.1452769703900
    Content-type: text/plain

    The real DHL Shipment Digest ...
    ------=_Part_9145747_2082645767.1452769703900
    --BAD
    Content-type: text/plain

    This is a faked mail with valid DKIM signature from DHL.
    --BAD--
The magic in replacing the shown body lies in redefining the Content-Type with a different MIME boundary. Anything before the first occurrence of this boundary will be treated as MIME preamble and ignored by any MIME-compatible mail client (essentially all of today's clients). This means the resulting mail will show the body added by the attacker instead of the original body:
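The preamble behaviour can be observed with the parser from Python's standard email package. The sketch below uses a shortened, made-up original boundary (----=_Part_1) and, to keep it simple, a single Content-Type header carrying the attacker's boundary; it only illustrates how one MIME parser treats the original part, not how any particular mail client renders it.

```python
import email

raw = (
    "From: DHL Customer Support <email@example.com>\n"
    "Subject: DHL Shipment Digest\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=BAD\n"
    "\n"
    "------=_Part_1\n"                      # original part, old boundary
    "Content-type: text/plain\n"
    "\n"
    "The real DHL Shipment Digest ...\n"
    "------=_Part_1\n"
    "--BAD\n"                               # data appended by the attacker
    "Content-type: text/plain\n"
    "\n"
    "This is a faked mail with valid DKIM signature from DHL.\n"
    "--BAD--\n"
)

msg = email.message_from_string(raw)
parts = msg.get_payload()                   # list of sub-messages for multipart

print(len(parts))                           # only the attacker's part is a MIME part
print(parts[0].get_payload())               # the faked text
print("real DHL" in msg.preamble)           # the original content ended up in the preamble
```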
The DKIM signature is still valid, as is shown in the "signed-by" information. Moreover, if we look at the source of the mail, Gmail provides a nice summary which includes the attacker-set Date and Message-Id and also shows that DKIM passes successfully. And since the DKIM signature matches the domain dhl.com of the displayed sender, DMARC also passes, even though the mail was not sent through DHL's mail server:
dhl.com is not the only domain using 'l' inside the DKIM signature and thus affected by this problem. In the past I've also seen such mails from cisco.com, deutschepost.de, dpdhl.com and others.
Interestingly, the authors of the DKIM standard were already kind of aware of the problems with the 'l' attribute. From 8.2. Misuse of Body Length Limits ("l=" Tag):
Use of the "l=" tag might allow display of fraudulent content without appropriate warning to end users. ... An example of such an attack includes altering the MIME structure, ...
To avoid this attack, Signers should be extremely wary of using this tag, and Assessors might wish to ignore signatures that use the tag.
Given the known potential for misuse and the coy recommendation to ignore mails using this feature it makes you wonder why this feature was included in the standard in the first place.
The previous chapters have shown how existing mails can be used to create spoofed mails without invalidating the DKIM signature. This undermines the trustworthiness of DKIM, i.e. one cannot be sure that the mail was not spoofed even though the DKIM signature is valid.
But there is also a problem in the other direction: due to the peculiarities of the SMTP protocol it can happen that a DKIM signature becomes invalid even though the content of the mail was not changed. This means that the mail looks spoofed although it is not spoofed, thus undermining trust in DKIM further.
Traditionally mails are restricted to ASCII only (i.e. 7 bit clean) and a line length of 1000 characters. The MIME standard defines the Content-Transfer-Encodings base64 and quoted-printable, which allow long lines, non-ASCII characters and also binary data to be represented within the restrictions of the original mail delivery. But these encodings can be inefficient and it would be much nicer if the client could ignore the historic restrictions and transfer the mail using the full 8 bit.
This was made possible by the 8BITMIME extension. If a mail server supports this extension the client can ignore the ASCII-only restriction, although not the restriction of a limited line length. But since mail delivery is not end-to-end but hop-by-hop, it can happen that the first mail server (MTA) in the path supports 8BITMIME and accepts such a mail, while another MTA in the path does not support 8BITMIME. In this case the sending MTA needs to convert the mail to ASCII-only, i.e. to fit within the historic restrictions. Unfortunately, this conversion breaks any existing DKIM signatures:
This problem is not new. In fact the DKIM standard itself mentions this problem in section 5.3 and shows how to deal with it:
Some messages, particularly those using 8-bit characters, are subject to modification during transit, notably conversion to 7-bit form. Such conversions will break DKIM signatures. In order to minimize the chances of such breakage, Signers SHOULD convert the message to a suitable MIME content-transfer encoding such as quoted-printable or base64 as described in [RFC2045] before signing. Such conversion is outside the scope of DKIM; the actual message SHOULD be converted to 7-bit MIME by an MUA or MSA prior to presentation to the DKIM algorithm.
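Why the conversion breaks the signature can be sketched with Python's standard quopri module. The example is simplified: the sample body and the helper name body_hash are made up, canonicalization is skipped, and quopri.encodestring only stands in for whatever re-encoding a relaying MTA actually performs.

```python
import base64
import hashlib
import quopri

def body_hash(body: bytes) -> str:
    """SHA-256 body hash, base64 encoded as in the 'bh' tag."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

# Body as signed by a sender relying on 8BITMIME.
body_8bit = "Grüße aus Köln\n".encode("utf-8")

# Body after a relay re-encoded it for a hop without 8BITMIME support.
body_7bit = quopri.encodestring(body_8bit)

print(body_hash(body_8bit) == body_hash(body_7bit))  # False: 'bh' no longer matches
print(quopri.decodestring(body_7bit) == body_8bit)   # the decoded text is unchanged
```

The recipient sees the same text after decoding, but the bytes covered by the signature differ. A signer that applies the 7-bit encoding before signing avoids this, since later hops have no reason to re-encode the message.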
Still, several major senders seem not to be fully aware of the issue and are thus affected by this conversion problem. For example, I have mails from Paypal and Booking.com affected by this problem, although most of their mails seem to be fine. And there are major mail providers which don't support 8BITMIME and are thus affected by this problem as recipients. This includes for example 1&1 (i.e. kundenserver.de, Web.de, GMX,...) and AOL. But providers of security services around mail like Mimecast or Spamfence are also affected, and these might use an invalid DKIM signature as an indicator of spoofing and classify the message accordingly.
While the DKIM standard tries to shift most of the work in fixing such problems to the recipient, history shows that this does not work. Instead both sides should do their best: The sender should make sure that the mail cannot be changed without breaking the signature in the first place. And the recipient should check if the signature is good enough so that the mail is definitely not spoofed.
On the sender side this means first making sure that the mail conforms to the historic restrictions mails have, i.e. all-ASCII and a line length of at most 1000 characters. If the mail does not conform yet it needs to be converted before any DKIM signature gets added.
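A pre-signing check along these lines could look like the following sketch. The function name transport_problems is hypothetical, and the limit of 998 bytes per line is an assumption that reads the 1000 character limit as including the terminating CRLF.

```python
def transport_problems(raw: bytes, max_line: int = 998):
    """List (line number, problem) pairs that force a conversion in transit.

    raw: the complete message with CRLF line endings.
    max_line: assumed maximum line length, not counting the CRLF.
    """
    problems = []
    for number, line in enumerate(raw.split(b"\r\n"), start=1):
        if len(line) > max_line:
            problems.append((number, "line too long"))
        if any(byte > 127 for byte in line):
            problems.append((number, "non-ASCII byte"))
    return problems

clean = b"Subject: hi\r\n\r\nplain text\r\n"
dirty = b"Subject: hi\r\n\r\n" + "Grüße".encode("utf-8") + b"\r\n"

print(transport_problems(clean))  # []
print(transport_problems(dirty))  # [(3, 'non-ASCII byte')]
```

A message for which the list is not empty should be re-encoded (for example as quoted-printable or base64) before the DKIM signature is computed.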
The signature itself needs to include all mail headers which might affect the display of the message. Each of these should be oversigned to protect against an attacker adding extra headers. The headers which obviously need to be signed are any headers directly displayed to the user, i.e. Subject, From, To, Date and Sender. Additionally any headers affecting the display of the message should be included, i.e. Content-Type, Content-Transfer-Encoding, Content-Disposition and Mime-Version. And there are also headers which affect the future message flow or how this message is displayed in the context of others, i.e. Reply-To, In-Reply-To and References. It might also be useful to add the length of the body with the 'l' attribute, as long as all headers which might affect the display of the message are included in the signature and oversigned.
On the recipient side it should be checked that each relevant header is actually included in the signature. Any headers which are not included in the signature should be treated with the utmost care and should preferably not be relied on when displaying the message. Given that this is not possible in many cases, one should at least signal to the user that the DKIM signature does not include critical headers and that the message thus might be spoofed even if the signature looks valid. Also, if the 'l' attribute is set, only the part covered by the limited body hash should be shown to the end user, or the part outside the hash should be explicitly displayed as untrusted.
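Such a recipient-side check could be sketched as follows. The function name audit_signature and the field list DISPLAY_RELEVANT are assumptions for this illustration, taken from the headers named in this chapter; the check is meant to run in addition to the normal cryptographic verification, not to replace it.

```python
from collections import Counter

# Fields which affect what the user sees (assumed list for this sketch).
DISPLAY_RELEVANT = (
    "from", "to", "subject", "date", "sender",
    "reply-to", "in-reply-to", "references", "mime-version",
    "content-type", "content-transfer-encoding", "content-disposition",
)

def audit_signature(h_tag, present_fields, l_tag=None, body_length=None):
    """Return warnings about parts of a mail a valid signature does not cover."""
    signed = Counter(name.strip().lower() for name in h_tag.split(":"))
    present = Counter(name.lower() for name in present_fields)
    warnings = []
    for name in DISPLAY_RELEVANT:
        uncovered = present[name] - signed[name]
        if uncovered > 0:
            warnings.append(f"{name}: {uncovered} instance(s) not covered by the signature")
    if l_tag is not None and body_length is not None and body_length > l_tag:
        warnings.append(f"body: {body_length - l_tag} byte(s) outside the signed length")
    return warnings

# Header fields of the modified DHL mail; the body length of 2000 is made up.
fields = ["Date", "Date", "From", "To", "To", "Message-ID", "Message-ID",
          "Subject", "MIME-Version", "Content-Type", "Content-Type"]
for warning in audit_signature("date:from:to:message-id:subject:mime-version",
                               fields, l_tag=1850, body_length=2000):
    print(warning)
```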
Of course, it would be best if all senders signed their messages using S/MIME or PGP and all clients checked this end-to-end signature. But this will probably remain a dream for the next years, and thus we need to make the current workarounds like SPF, DKIM and DMARC more reliable.
DKIM tries to address sender spoofing by having the sending MTA sign the mail. While the idea is sound in theory, the standard is overly flexible. It only issues vague recommendations and then relies on the specific implementation and configuration to provide the necessary security and resilience. Given the lack of clear requirements and secure defaults, it is no surprise that DKIM as used in practice fails to provide the expected trust in many cases.