The RFC822 validation monster

November 4th, 2012

I recently saw the RFC822 email address validation regex again, raised in a Hacker News post. It’s enormous, far too long to make any sense of. Here are the first 3 lines (of 82):

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(

Validating email addresses has always been a pain, and there is no perfect solution. Even if an address passes the above validation, there’s no guarantee that it actually exists.

Fortunately, the discussion included some good advice on the sanest way to deal with addresses. To summarise:

  • The only way to know if an email is valid is to send one.
  • There is a recommended regex in the HTML5 spec.
  • A really simple approach is to check for a character on either side of an @ before sending an email.

To get an idea of the kinds of addresses which lead to such a monstrous regex, check out this list.
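
As a rough illustration of the two lighter checks suggested above, here is a minimal Python sketch. The function names are my own, and the pattern is the one I believe the HTML standard currently gives for email inputs, so treat it as an assumption and check it against the spec before relying on it. Neither check proves an address exists; that still takes sending an email.

```python
import re

# Assumed to match the pattern the HTML standard gives for type=email inputs;
# verify against the spec. The ^ and $ anchors are dropped because fullmatch()
# already requires the whole string to match.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9]"
    r"(?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def looks_like_email(address: str) -> bool:
    """Simple check: at least one character on either side of an @."""
    # Split on the last @ so quoted local parts containing @ still pass.
    local, at, domain = address.rpartition("@")
    return bool(at) and bool(local) and bool(domain)


def matches_html5_pattern(address: str) -> bool:
    """Stricter check against the (assumed) HTML spec pattern."""
    return HTML5_EMAIL.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["user@example.com", "user@", "@example.com", "no-at-sign"]:
        print(candidate, looks_like_email(candidate), matches_html5_pattern(candidate))
```

The simple check is deliberately permissive: it only filters out obvious typos and leaves the real verification to the confirmation email.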

Credit to Jabbles, 3ds 37 and meaty for raising the above points.

Photo credit to Markus Spiske