Sendmail Return-path, php, and the importance of the variables you don't think of

{% raw %}
I got called in for an interesting issue on one of the servers I co-maintain today.  Sites were messing up left right and center, in all sorts of bizarre ways.  CSS wouldn't be delivered, or half of it would be, or some of the content would disappear... and different on every page load!  It only took me a moment to confirm - the server was out of space on the root device.

This was readily visible in the logs - any mysql transaction you tried would return "Got error 28 from storage engine", which typically means the drive is out of space.  And a quick df -ht confirmed:


root@production:/# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  9.4G  160K 100% /
none                  7.3G  128K  7.3G   1% /dev
none                  7.6G     0  7.6G   0% /dev/shm
none                  7.6G   76K  7.6G   1% /var/run
none                  7.6G     0  7.6G   0% /var/lock
none                  7.6G     0  7.6G   0% /lib/init/rw
/dev/sdb              414G  5.9G  387G   2% /mnt
/dev/sdf               50G  8.8G   42G  18% /ebs

Yep, /dev/sda1 was at 100% .  So I ran through the usual suspects, and tracked it down to /var/log pretty quickly:


root@production:/var# du -h |grep G
7.2G ./log
7.8G .

Sweet noodly goodness, 7 gigs of logs?  

root@production:/var/log# ls -lh |grep G
total 6.6G
-rw-r----- 1 syslog    adm  2.2G 2011-05-20 18:51 mail.info
-rw-r----- 1 syslog    adm  2.2G 2011-05-20 18:51 mail.log

Specifically, the mail logs were taking up all the space.  The first order of business was to get them off the root device, so I moved them to /mnt , the freebie extra-ethereal EBS volume that Amazon gives you with every S3 instance.  Then I could restart mysql/apache2, and at least the sites would be back up and normal. So what was going on in these logs?




root@production:/var/log# tail /mnt/mail.log

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGv30H001061: to=<www-data@production.swearing.com>, delay=1+01:54:53, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGuuWt000738: to=<www-data@production.swearing.com>, delay=1+01:55:00, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGvCZe001480: to=<www-data@production.swearing.com>, delay=1+01:54:44, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGuf9L032603: to=<www-data@production.swearing.com>, delay=1+01:55:15, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGv5qh001300: to=<www-data@production.swearing.com>, delay=1+01:54:51, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGvPKQ002038: to=<www-data@production.swearing.com>, delay=1+01:54:31, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGvHfI001678: to=<www-data@production.swearing.com>, delay=1+01:54:39, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGvMxj001908: to=<www-data@production.swearing.com>, delay=1+01:54:34, xdelay=00:00:00, mailer=esmtp, pri=13980000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

May 20 18:51:56 ip-10-118-81-22 sm-mta[21749]: p4JGvHgB001693: to=<www-data@production.swearing.com>, delay=1+01:54:3May 20 18:53:11 ip-10-118-81-22 sm-mta[21749]: p4FJ5XYp028180: to=<www-data@production.swearing.com>, delay=4+23:40:11, xdelay=00:00:00, mailer=esmtp, pri=64650000, relay=production.swearing.com., dsn=4.0.0, stat=Deferred: Connection timed out with production.swearing.com.

OK, clearly something is looping delivery failures.  This server hosts several websites, without their accompanying email, so it's configured to force an MX lookup before attempting to send anything anywhere.  But it doesn't receive mail, so it would have just timed out waiting for itself to respond. 

This is a problem, but that's a condition that should never happen.  So what on earth would make it try to  deliver mail to production.swearing.com?  I headed over to the mail queue to see.  The output of mailq was enormous, and I had to Ctrl+C it partway through.  Every record was "Deferred" due to a timeout on production.swearing.com .   OK, so clearly there's a lot in the queue.  So I had a look at some of the messages.


root@production:/var/spool/mqueue# cat dfp4I0uHUF025487 

This is a MIME-encapsulated message

--p4I0uHUF025487.1305680177/ip-10-118-81-22.ec2.internal

The original message was received at Wed, 18 May 2011 00:56:14 GMT
from localhost [127.0.0.1]

   ----- The following addresses had permanent fatal errors -----
<asf@gmail.com>
    (reason: 550-5.1.1 The email account that you tried to reach does not exist. Please try)

   ----- Transcript of session follows -----
... while talking to gmail-smtp-in.l.google.com.:
>>> DATA
<<< 550-5.1.1 The email account that you tried to reach does not exist. Please try
<<< 550-5.1.1 double-checking the recipient's email address for typos or
<<< 550-5.1.1 unnecessary spaces. Learn more at                             
<<< 550 5.1.1 http://mail.google.com/support/bin/answer.py?answer=6596 dz10si728275vdc.78
550 5.1.1 <asf@gmail.com>... User unknown
<<< 503 5.5.1 RCPT first. dz10si728275vdc.78

--p4I0uHUF025487.1305680177/ip-10-118-81-22.ec2.internal
Content-Type: message/delivery-status

Reporting-MTA: dns; ip-10-118-81-22.ec2.internal
Received-From-MTA: DNS; localhost
Arrival-Date: Wed, 18 May 2011 00:56:14 GMT

Final-Recipient: RFC822; asf@gmail.com
Action: failed
Status: 5.1.1
Remote-MTA: DNS; gmail-smtp-in.l.google.com
Diagnostic-Code: SMTP; 550-5.1.1 The email account that you tried to reach does not exist. Please try
Last-Attempt-Date: Wed, 18 May 2011 00:56:17 GMT

--p4I0uHUF025487.1305680177/ip-10-118-81-22.ec2.internal
Content-Type: text/rfc822-headers

Return-Path: <www-data@production.swearing.com>
Received: from production.swearing.com (localhost [127.0.0.1])
by ip-10-118-81-22.ec2.internal (8.14.3/8.14.3/Debian-9.1ubuntu1) with ESMTP id p4I0uEUF025482
for <asf@gmail.com>; Wed, 18 May 2011 00:56:14 GMT
Received: (from www-data@localhost)
by production.swearing.com (8.14.3/8.14.3/Submit) id p4I0uEcp025480;
Wed, 18 May 2011 00:56:14 GMT
Date: Wed, 18 May 2011 00:56:14 GMT
Message-Id: <201105180056.p4I0uEcp025480@production.swearing.com>
To: asf@gmail.com
Subject: Swearing at Computers :: new comment for your post.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed; delsp=yes
Content-Transfer-Encoding: 8Bit
X-Mailer: Drupal
Errors-To: info@swearing.com
Sender: info@swearing.com
From: info@swearing.com

--p4I0uHUF025487.1305680177/ip-10-118-81-22.ec2.internal--


They were all in this vein.  So for some reason, the swearing.com site was trying to send out a comment notification to a non-existent email address, which was getting rejected (rightly so).  A little inspection in Drupal bore this out: asf@gmail.com, and many other addresses from these queued messages, were being subscribed to comment notifications.  There is a comment form somewhere on swearing.com that allows people to sign up for notifications without going through a CAPTCHA.  I took this up with the owner of the site immediately.

But there's still another layer of problem here.  Why were delivery failures not being sent to the from: address?  A little research led me to the appropriate RFP.  The Return-path header is added by the MTA at the last possible moment, specifically as an address for delivery errors.  It's also used as an indicator when the message is being sent from this host or just relayed.  And by default, sendmail uses the return-path of user@hostname.  

The solution is to have a real email address set up somewhere for this kind of error.  You can modify your php.ini (or apache config) to set the default return-path pretty easily.  Simply add this to your php.ini :

sendmail_path "/usr/sbin/sendmail -t -i -f servers@trellon.com"
That's it! Now hopefully I got all the spammy addresses out of that contact form... or servers@swearing.com is gonna have a lot of mail waiting on Monday!
{% endraw %}
comments powered by Disqus