updown.io – Web site monitoring, easy and cheap

2024-01-02 03:27:13

The funny rules of SpamAssassin in 2023 (deep dive)

This investigation was surprising to me, so I thought it could be interesting to share my findings. I hope you will enjoy it.

Some of my users regularly reported that the updown confirmation email was classified as spam. This email is used to confirm a new email address and is provided by Devise. We're talking about this one:

confirmation email screenshot

Doesn't look too spammy so far. Yet some mail servers running SpamAssassin were indeed reporting a score above 5 on its "Spam-Score", 5 being SpamAssassin's default threshold for considering an email as spam. If we have access to the raw email with headers, this is something we can often see easily (real example provided by one user):

So I started investigating why SpamAssassin was applying these rules to this email, and oh boy, I wasn't ready for what I found 😅

I first tried reproducing the problem locally. I installed SpamAssassin and ran some tests on the exact same email from that user (example instructions used on Ubuntu 22.04):

> sudo apt install spamassassin

> spamassassin -V
SpamAssassin version 3.4.6
  running on Perl version 5.34.0

> spamassassin -t < confirmation-instructions.eml
# ...
Content analysis details:   (0.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-1.0 RCVD_IN_MSPIKE_H5      RBL: Excellent reputation (+5)
                            [104.245.209.212 listed in wl.mailspike.net]
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 0.7 HTML_IMAGE_ONLY_28     BODY: HTML: images with 2400-2800 bytes of
                            words
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
 1.0 URI_PHISH              Phishing using web form

Disappointingly, the result was very different and the score very low. We could still see the same offending rules (HTML_IMAGE_ONLY_28 and URI_PHISH), but with lower scores.
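To see where that 0.6 comes from, the report table can be checked mechanically. Below is a minimal sketch that parses the rule rows of a report laid out like the one above and sums their points. The row regexp and the helper name `parse_report` are assumptions made for illustration; they are not part of SpamAssassin.

```ruby
# Sketch: parse the rule table of a SpamAssassin report and check that the
# per-rule points add up to the total announced in the header line.
# Assumes the layout shown above: rule rows start with a score and a rule
# name, while wrapped description lines are indented and carry no score.

ROW = /\A\s*(-?\d+\.\d+)\s+([A-Z0-9_]+)\s/

def parse_report(report)
  total = report[/\((-?\d+\.\d+) points,/, 1]&.to_f
  scores = {}
  in_table = false
  report.each_line do |line|
    if line.start_with?("----")
      in_table = true # the dashed separator precedes the rule rows
      next
    end
    match = ROW.match(line)
    scores[match[2]] = match[1].to_f if in_table && match
  end
  [total, scores]
end

SAMPLE_REPORT = <<~REPORT
  Content analysis details:   (0.6 points, 5.0 required)

   pts rule name              description
  ---- ---------------------- --------------------------------------------------
  -1.0 RCVD_IN_MSPIKE_H5      RBL: Excellent reputation (+5)
                              [104.245.209.212 listed in wl.mailspike.net]
  -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
   0.7 HTML_IMAGE_ONLY_28     BODY: HTML: images with 2400-2800 bytes of
                              words
   0.0 HTML_MESSAGE           BODY: HTML included in message
  -0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
  -0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                              author's domain
   0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                              valid
  -0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
   1.0 URI_PHISH              Phishing using web form
REPORT

total, scores = parse_report(SAMPLE_REPORT)
puts "reported #{total}, summed #{scores.values.sum.round(1)}"
```

Rounding the sum to one decimal matches how the report prints scores, and it avoids floating-point noise in the comparison.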

I also tried the -Lt options. Here -L means "local tests only" (no calls to remote servers, online blacklists, etc.) and -t runs in test mode, which prints the report. In that case there are fewer tests, as expected, but the scores of the remaining ones increase:

> spamassassin -Lt < confirmation-instructions.eml
# ...
Content analysis details:   (3.8 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 HTML_MESSAGE           BODY: HTML included in message
 2.8 HTML_IMAGE_ONLY_28     BODY: HTML: images with 2400-2800 bytes of
                            words
 1.0 URI_PHISH              Phishing using web form

This is likely meant to compensate for having less signal to work with. The few signals still available have to be amplified so the score can reach the spam threshold of 5 sooner, I guess.
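SpamAssassin does support this explicitly. A rule's `score` line may list four values, and the one used depends on the active "score set". Set 0 means no network tests and no Bayes, set 1 network tests without Bayes, set 2 Bayes without network tests, and set 3 both. The sketch below models that selection. Only the two HTML_IMAGE_ONLY_28 values observed above are filled in, under the assumption that Bayes was inactive in both runs. The table and helper names are hypothetical.

```ruby
# Sketch: how SpamAssassin's four score sets are selected.
#   set 0: no network, no Bayes    set 1: network, no Bayes
#   set 2: Bayes, no network       set 3: network and Bayes
# Only the two values observed in this post are filled in (assuming Bayes
# was inactive in both runs); the other entries are unknown and left nil.

OBSERVED_SCORES = {
  "HTML_IMAGE_ONLY_28" => [2.8, 0.7, nil, nil],
}.freeze

def score_set_index(network:, bayes:)
  (network ? 1 : 0) + (bayes ? 2 : 0)
end

def score_for(rule, network:, bayes:)
  OBSERVED_SCORES.fetch(rule)[score_set_index(network: network, bayes: bayes)]
end

local_only = score_for("HTML_IMAGE_ONLY_28", network: false, bayes: false)
with_net   = score_for("HTML_IMAGE_ONLY_28", network: true, bayes: false)
puts "local-only #{local_only} vs network #{with_net} (x#{(local_only / with_net).round(2)})"
```

This explains why the same rule shows 2.8 in one run and 0.7 in the other: the rule fires identically both times, and only the score set changes.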

If you have (or can set up) a local DNS resolver, I would recommend enabling network rules for more reliable results. If you use spampd, this is configured with LOCALONLY=0 in /etc/default/spampd.

So even though the scores were lower, I knew they could be multiplied for various reasons, including configuration. I'd better see if I could stop the email from being flagged as HTML_IMAGE_ONLY_28 and URI_PHISH at all, to eliminate the problem.

I first wrote a quick way to test these emails' spam scores in my specs. It uses spamd, the daemon version of SpamAssassin, and spamc, its command-line client. This lets me iterate and test changes quickly. It also guards against regressions from future changes to my emails or future versions of SpamAssassin:

require "open3"

def spam_check email
  # Using spamc/spamd (daemon) if available, much faster
  cmd = "spamc --full --connect-retries=1"
  # Using spamassassin (standalone cmd), slower but supports the local-only option
  # cmd = "spamassassin -Lt"
  # Inject a Received header to trigger more rules like __VIA_ML (return-path contains "bounces@")
  stdin = "Received: by mta212a-ord.mtasv.net id h6qj0s27tk4a for <#{email.to.first}>; #{email.date.rfc2822} (envelope-from <pm_bounces@bounce.updown.io>)\n" + email.to_s
  out, err, status = Open3.capture3(cmd, stdin_data: stdin)
  if out == "0/0\n"
    skip "spamd is not running: `sudo systemctl start spamassassin.service`"
  elsif status.success?
    # minor processing to get a stable rule order and remove the width limit
    headers, rules = out.chomp.split("--\n")
    rules.gsub!(/\n\s{5,}/m, " ")
    return headers + "\n" + rules.lines.sort.join
  else
    # raising a String gives a RuntimeError; exitstatus is the process exit code
    raise "Command `#{cmd}` exited with status #{status.exitstatus}: #{err}"
  end
rescue Errno::ENOENT => e
  skip "SpamAssassin not installed: #{e.to_s}"
end

require "rails_helper"

describe UserMailer do # Devise inherited mailer
  let(:user) { create :user }
  let(:email) { ActionMailer::Base.deliveries.last }

  describe '#confirmation_instructions' do
    subject { user }

    it "passes spam check" do
      subject
      expect(spam_check(email)).to include(<<~REPORT)
        Content analysis details:   (0.0 points, 5.0 required)

         pts rule name              description
        ---- ---------------------- ------------------------------------------------
         0.0 HTML_IMAGE_ONLY_32     BODY: HTML: images with 2800-3200 bytes of words
         0.0 HTML_MESSAGE           BODY: HTML included in message
        -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
      REPORT
    end
  end
end

Now let's take a look at these two rules. Clear definitions are sometimes hard to find, but fortunately SpamAssassin is open source, so where there's a will there's a way.

HTML_IMAGE_ONLY_24

This one is the simplest and most self-explanatory. It checks whether the email contains an image (it does: the updown.io logo) and whether the content is between 2000 and 2400 bytes. So basically, a short email with an image is considered more likely to be spam, because spam emails often hide text in images to avoid filters. Only two options here:
1. Remove the image
2. Increase the content length

I chose the latter to keep a consistent look, and also because of the second rule. In the end I only increased the length a bit, so it now matches the HTML_IMAGE_ONLY_32 rule instead. This rule scores 2.2 in local-only testing but (surprisingly) 0 when network tests are enabled. If we followed the same logic as HTML_IMAGE_ONLY_24, it should have been 2.2/4 ≃ 0.55.
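The bucket logic described above can be sketched in a few lines. This is only a model of the behaviour described in this post: the HTML contains an image and its text length falls into a byte range. Just the three ranges mentioned here are covered. Treating the lower bound as exclusive and the upper bound as inclusive is an assumption, not something taken from SpamAssassin's source. `image_only_rule` is a hypothetical helper.

```ruby
# Sketch of the HTML_IMAGE_ONLY_* checks as described in this post: the HTML
# contains an image and its text length falls into a byte range. Only the
# three ranges mentioned here are modelled; treating the lower bound as
# exclusive and the upper bound as inclusive is an assumption.

IMAGE_ONLY_BUCKETS = {
  "HTML_IMAGE_ONLY_24" => [2000, 2400],
  "HTML_IMAGE_ONLY_28" => [2400, 2800],
  "HTML_IMAGE_ONLY_32" => [2800, 3200],
}.freeze

def image_only_rule(text_bytes, has_image:)
  return nil unless has_image

  rule, _bounds = IMAGE_ONLY_BUCKETS.find { |_, (low, high)| text_bytes > low && text_bytes <= high }
  rule
end

puts image_only_rule(2600, has_image: true).inspect   # "HTML_IMAGE_ONLY_28"
puts image_only_rule(3000, has_image: true).inspect   # "HTML_IMAGE_ONLY_32"
puts image_only_rule(3000, has_image: false).inspect  # nil
```

Seen this way, lengthening the text never removes the image-only penalty while the logo is there. It only moves the email into the next bucket, which is exactly what happened here.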

Eliminating this rule would require much more text bloat or cheating (invisible text, etc.), and it matches more of my emails. So for the moment I decided to leave it like that and wait for the next problem. 2.2 isn't enough on its own to trip the spam threshold (5), and hopefully SpamAssassin will improve this part before I need to hack around it.

URI_PHISH

Now for the most interesting part. After some searching online, I first found this plugin, which seems to check URLs against a blacklist. But it provides the URI_PHISHING rule (not exactly the same name), and I didn't install any plugin, so this wasn't the one.

I then found this very interesting report from 2021. It concerns a similar confirmation email that received a "false positive" URI_PHISH classification, and the official answer was:

It isn't based on "phishing URLs" or the specific link, it's based on having body text that looks like account phishing and having a URL. The body text that looks suspiciously like phishing is, unsurprisingly, "confirm your account".


As Loren said, this isn't a FP, as the total score for the message didn't exceed the spam threshold. This is a single-rule hit on spammy-looking content without other signals to support it. That happens.

It isn't a bug that a given rule will hit some ham. The only suggestion I can offer is that you reword your message to make it look less like phishing.

So let's skip over how sad it is that anti-spam filters now have to block simple confirmation emails, just because scammers successfully abuse people with them…

That piqued my curiosity: what exactly are they looking for in the email? How can I make sure the change I make won't be matched by another rule, now or in the future? (Yes, we unfortunately have to think like scammers now to get our regular emails accepted…)

So by searching the code for URI_PHISH, I ended up in this huge rules file, which does contain this (extract slightly simplified):

meta        __URI_PHISH    __HAS_ANY_URI && !__URI_GOOGLE_DOC && !__URI_GOOG_STO_HTML && (__EMAIL_PHISH || __ACCT_PHISH)
meta        URI_PHISH      __URI_PHISH && !ALL_TRUSTED && !__UNSUB_LINK && !__TAG_EXISTS_CENTER && !__HAS_SENDER && !__CAN_HELP && !__VIA_ML && !__UPPERCASE_URI && !__HAS_CC && !__NUMBERS_IN_SUBJ && !__PCT_FOR_YOU && !__MOZILLA_MSGID && !__FB_COST && !__hk_bigmoney && !__REMOTE_IMAGE && !__HELO_HIGHPROFILE && !__RCD_RDNS_SMTP_MESSY && !__BUGGED_IMG && !__FB_TOUR && !__RCVD_DOTGOV_EXT 
describe    URI_PHISH            Phishing using web form
score       URI_PHISH            4.00   # limit
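Read as boolean logic, these two meta rules are easier to follow. Below is a minimal sketch that evaluates the simplified extract above over a list of sub-rule names assumed to have hit. The helper names are hypothetical. Real SpamAssassin computes each sub-rule itself, and this only mirrors the quoted extract.

```ruby
# Sketch: the simplified URI_PHISH meta rules quoted above, written as plain
# boolean logic over the names of the sub-rules that hit. It only mirrors the
# extract in this post; SpamAssassin evaluates each sub-rule itself.

def uri_phish_base?(hits)
  hits.include?("__HAS_ANY_URI") &&
    !hits.include?("__URI_GOOGLE_DOC") &&
    !hits.include?("__URI_GOOG_STO_HTML") &&
    (hits.include?("__EMAIL_PHISH") || hits.include?("__ACCT_PHISH"))
end

# Every "!"-prefixed term of URI_PHISH: any one of them hitting vetoes the rule.
URI_PHISH_VETOES = %w[
  ALL_TRUSTED __UNSUB_LINK __TAG_EXISTS_CENTER __HAS_SENDER __CAN_HELP __VIA_ML
  __UPPERCASE_URI __HAS_CC __NUMBERS_IN_SUBJ __PCT_FOR_YOU __MOZILLA_MSGID
  __FB_COST __hk_bigmoney __REMOTE_IMAGE __HELO_HIGHPROFILE
  __RCD_RDNS_SMTP_MESSY __BUGGED_IMG __FB_TOUR __RCVD_DOTGOV_EXT
].freeze

def uri_phish?(hits)
  uri_phish_base?(hits) && URI_PHISH_VETOES.none? { |rule| hits.include?(rule) }
end

confirmation = %w[__HAS_ANY_URI __ACCT_PHISH]
puts uri_phish?(confirmation)                  # true
puts uri_phish?(confirmation + ["__FB_COST"])  # false: the word "cost" vetoes it
```

The structure makes the asymmetry visible. Two positive conditions make the rule fire, and any one of nineteen unrelated exclusions turns it off.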

OK, so we now have an entry point, which of course contains MANY other rules (some of which contain yet other rules). I checked ALL of them for you ^^ and here are my most interesting findings:

First, the positive rules, which need to be true:

  • __HAS_ANY_URI → simple regexp on /^\w+:\/\//
  • __EMAIL_PHISH || __ACCT_PHISH → these are the sub-rules where the main "phishing" heuristics happen
    • __WEBMAIL_ACCT, __MAILBOX_FULL, __MAILBOX_FULL_SE, __CLEAN_MAILBOX, __VALIDATE_MAILBOX, __VALIDATE_MBOX_SE, __UPGR_MAILBOX, __LOCK_MAILBOX, __SYSADMIN, __ATTN_MAIL_USER, __MAIL_ACCT_ACCESS1, __MAIL_ACCT_ACCESS2, __ACCESS_REVOKE, __PASSWORD_UPGRADE, __PENDING_MESSAGES, __RELEASE_MESSAGES, __PASSWORD_EXP_CLUMSY → these are all regexps for typical email scams (mailbox full, click here to regain access to your account, etc.). Nothing matched in my email.
    • __PDS_FROM_NAME_TO_DOMAIN ⚠️ this one is interesting: it triggers if the From name is equal to the To domain (for example if the email is From "example.com" To "adrien@example.com"). → This is because many scams use that to make the email look like it comes from your "domain administrator". It wasn't the case for me here, but make sure you don't do that.
    • __VERIFY_ACCOUNT → ✅ this is the one matching our email, so I had to change the wording to avoid it. The regexp is: /(?:confirm|update?|verif(?:y|ied)) (?:your|the) (?:(?:account|current|billing|personal|online)? ?(?:data?|info|account|identity|access|information|login)|"?[^@\s]+@\S+"? (?:account|mail ?box)|confirm verification|verify k?now|Ihre Angaben .berpr.ft und best.tigt)/i
    • __FAILED_LOGINS, __ACCOUNT_REACTIV, __SECURITY_DEPT, __ACCOUNT_ERROR, __ACCOUNT_DISRUPT, __ACCOUNT_UPGRADE, __ACCOUNT_SECURE, __SUSPICION_LOGIN, __ACCESS_SUSPENDED, __ACCESS_RESTORE, __ACCESS_REVOKE → another set of regexps for classic fear-based account scams. I made sure my "account locked" email doesn't match any of them.
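To check a new wording before sending it, the __VERIFY_ACCOUNT pattern can be ported to a Ruby regexp and tested against candidate sentences. The sketch below does that. It also approximates __PDS_FROM_NAME_TO_DOMAIN from the description above (From display name equal to the To domain). That second check is a hypothetical helper, not the real rule's implementation.

```ruby
# Sketch: the __VERIFY_ACCOUNT body pattern ported to a Ruby regexp, plus an
# approximation of __PDS_FROM_NAME_TO_DOMAIN as described in this post
# (the real rule's implementation is not reproduced here).

VERIFY_ACCOUNT = /(?:confirm|update?|verif(?:y|ied)) (?:your|the) (?:(?:account|current|billing|personal|online)? ?(?:data?|info|account|identity|access|information|login)|"?[^@\s]+@\S+"? (?:account|mail ?box)|confirm verification|verify k?now|Ihre Angaben .berpr.ft und best.tigt)/i

def verify_account?(body)
  VERIFY_ACCOUNT.match?(body)
end

# Hypothetical helper: true when the From display name equals the To domain,
# e.g. From "example.com" sent to adrien@example.com.
def from_name_is_to_domain?(from_name, to_address)
  to_domain = to_address.split("@", 2).last.to_s
  !from_name.to_s.strip.empty? && from_name.strip.casecmp?(to_domain)
end

puts verify_account?("Please confirm your account by clicking the link below.")  # true
puts verify_account?("You can confirm your email through the link below.")       # false
puts from_name_is_to_domain?("example.com", "adrien@example.com")                 # true
```

Running a spec over every email template with a check like this catches reworded text that slips back into a matching phrase.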

Now let's look at all the negative rules here (starting with a !). They are meant to exclude content: if one of them is true, the URI_PHISH rule will NOT apply.

  • !__URI_GOOGLE_DOC and !__URI_GOOG_STO_HTML → regexps on docs.google.com and storage.googleapis.com. These have their own specific rules, so they are excluded here.
  • !ALL_TRUSTED → this is for when you configure some internal email servers as "trusted", not applicable here
  • !__UNSUB_LINK → ⚠️ Also interesting: this one tries to match unsubscribe links with /\b(?:(?:un)?subscri(?:ber?|ptions?)|abuses?|opt(?:ing)?.?out)\b/i. It's good to know that simply having an unsubscribe link can prevent URI_PHISH. Unfortunately, you can't really "unsubscribe" people from an account confirmation email, since it isn't a mailing list or onboarding email. Otherwise this would have been a good way to improve both the spam score and the user experience.
  • !__VIA_ML → this rule checks whether the envelope-from/return-path contains "bounces@" to declare the email part of a "Mailing List". With Postmark this is my case, and unfortunately it can't be customized (only the domain can: pm_bounces@bounce.updown.io). So I guess you should avoid using "bounces@" in your return-path addresses for transactional emails if you can…
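The two exclusions above can also be checked locally. Below is a sketch with the __UNSUB_LINK pattern ported to a Ruby regexp. __VIA_ML is reduced to the behaviour described here, a return path containing "bounces@". The case-insensitive comparison and the helper names are assumptions, and the real __VIA_ML rule may check more than this.

```ruby
# Sketch: the __UNSUB_LINK pattern ported to a Ruby regexp, and __VIA_ML
# reduced to the behaviour described in this post. The real __VIA_ML rule
# may check more; the case-insensitive comparison is an assumption.

UNSUB_LINK = /\b(?:(?:un)?subscri(?:ber?|ptions?)|abuses?|opt(?:ing)?.?out)\b/i

def unsub_link?(body)
  UNSUB_LINK.match?(body)
end

def via_ml?(return_path)
  return_path.to_s.downcase.include?("bounces@")
end

puts unsub_link?("Unsubscribe from these emails")  # true
puts unsub_link?("Confirm your email address")     # false
puts via_ml?("pm_bounces@bounce.updown.io")        # true
```

Note that Postmark's `pm_bounces@` prefix already contains "bounces@", which is why the return path alone is enough to trigger this check.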

And now let's look at my favorites, the totally WTF rules 😱:

  • !__TAG_EXISTS_CENTER → this rule just checks for the presence of a <center> tag. So if you add one, your email magically isn't URI_PHISH anymore… WAIT, WHAT? Indeed, if your email is centered the old way, then it's not phishing (tested locally).
  • !__HAS_SENDER → if you add ANY Sender header, the URI_PHISH rule is skipped… The Sender header exists for services sending emails on behalf of other users, and it helps with authentication checks. But anybody can put anything in there, so there's no reason to consider an email "less phishing" because it contains this header (tested locally).
  • !__CAN_HELP → even simpler: it skips the rule if the email contains "can help"… (tested locally)
  • !__UPPERCASE_URI → fairly self-explanatory
  • !__HAS_CC → what? why?
  • !__NUMBERS_IN_SUBJ → OK, so three or more consecutive digits in the subject line also help… /\d{3}/
  • !__FB_COST → this one checks for the word… "cost". Yep, just that. Put it in an email and suddenly it's not phishing… (tested locally)
  • !__FB_TOUR → similarly, this one checks for the word "tour"…
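How cheap these exclusions are becomes obvious when they are written down. The sketch below approximates them from the descriptions above. Apart from /\d{3}/, which is quoted in this post, the patterns are assumptions written for illustration and not copied from the SpamAssassin rules file. `uri_phish_vetoes` is a hypothetical helper.

```ruby
# Sketch: rough approximations of the "WTF" exclusions described above.
# Apart from /\d{3}/ (quoted in this post), these patterns are assumptions
# written for illustration, not copied from the SpamAssassin rules file.

def uri_phish_vetoes(subject:, body:, headers: {})
  vetoes = []
  vetoes << "__TAG_EXISTS_CENTER" if body.match?(/<center\b/i)
  vetoes << "__HAS_SENDER" if headers.key?("Sender")
  vetoes << "__CAN_HELP" if body.match?(/can help/i)
  vetoes << "__HAS_CC" if headers.key?("Cc")
  vetoes << "__NUMBERS_IN_SUBJ" if subject.match?(/\d{3}/)
  vetoes << "__FB_COST" if body.match?(/\bcost\b/i)
  vetoes << "__FB_TOUR" if body.match?(/\btour\b/i)
  vetoes
end

plain = { subject: "Confirm your account", body: "Click the link below." }
p uri_phish_vetoes(**plain)                                             # []
p uri_phish_vetoes(**plain, body: "This won't cost you anything.")      # ["__FB_COST"]
p uri_phish_vetoes(**plain, headers: { "Sender" => "noreply@updown.io" }) # ["__HAS_SENDER"]
```

Any non-empty result would, per the meta rule extract, stop URI_PHISH from firing. That is the easy-to-fool behaviour described next.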

It's possible that some of these rules are only here to replace URI_PHISH with another, more specific rule (as we saw with Google Doc URLs). Still, in their current state they are quite easy to exploit. In my local testing, using these words to trigger these rules didn't cause any other spam rules to appear…

Which means that, in the end, we have a spam filter that is very easy to fool, yet easily tripped by honest emails…

What I changed in the end

  1. I tried changing the return-path to avoid "bounces@", but unfortunately couldn't do it with Postmark.
  2. I didn't want to ~use~ exploit any of the silly hacks like "cost" or "<center>".
  3. I changed the wording of the email to make it longer and avoid the common word combinations matched by the regexp (see screenshot below for the new version).
  4. I also added a Sender header (only for some emails, with the same value as From) to please the rules. This one doesn't look too hackish, but I still don't feel great about it 🙃.

New email

new email screenshot

Created on December 04, 2023
