Multiple implementations of email address validation.
MSDN, Reference Source, Phil Haack and JStedfast implementations of email address validation.
using EmailAddressLibrary;
EmailAddressValidator.Msdn("someone@example.com"); // True
EmailAddressValidator.ReferenceSource("someone@example.com"); // True
EmailAddressValidator.Haacked("someone@example.com"); // True
EmailAddressValidator.JStedfast("someone@example.com"); // True
It's not perfect (and probably no email address validation code will ever be).
False negatives:
False positives:
Here's what I did with the Reference Source implementation:
- Changed the MailAddressParser class to public
- Changed MailAddressParser(string data) to public
- Changed ParseMultipleAddresses(string data), ParseAddress(string data) and ParseAddress(string data, bool expectMultipleAddresses, ref int index) to void
- SR: strings taken from corefx/src/System.Net.Http/src/Resources/Strings.resx
- Swapped SR.GetString for string.Format
No soup package for you
Validating an email address... should be easy, right? C'mon, we see those everywhere.
I bet there's a great NuGet package out there waiting for me.
(searches NuGet)
Okay, no package. That was a letdown.
Let's make one; how hard could it be? After all, an address is just "local part + @ + domain part". And I'm going to man up and go straight to the RFC, no Wikipedia or Stack Overflow; that's kid stuff.
(reads RFC)
Shit. An email with a space is valid if you precede the space with a backslash. Same for an "at" sign. Now you're telling me that I can surround part of the local part with quotes, and then there's no need for a backslash.
"dear programmer, good luck h@h@"@somedomain.com
is valid.
I'm out.
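One practical consequence of those quoting rules: the local part can itself contain "@" characters, so splitting an address at the first "@" breaks on the example above. Here's a minimal C# sketch of splitting at the last "@" instead. AddressSplitter is a hypothetical helper (not part of EmailAddressLibrary), and it assumes the domain part never contains an "@", which holds for ordinary host names. It only separates the two parts; it doesn't validate either of them.

```csharp
using System;

// Hypothetical helper, not part of EmailAddressLibrary.
public static class AddressSplitter
{
    // Splits at the LAST '@': a quoted ("...") or backslash-escaped local part
    // may contain '@', but we assume the domain part never does.
    // Returns false when the input is empty or either part would be empty.
    public static bool TrySplit(string address, out string localPart, out string domain)
    {
        localPart = null;
        domain = null;
        if (string.IsNullOrEmpty(address))
            return false;

        int at = address.LastIndexOf('@');
        if (at <= 0 || at == address.Length - 1)
            return false; // no '@', empty local part, or empty domain

        localPart = address.Substring(0, at);
        domain = address.Substring(at + 1);
        return true;
    }
}
```

Called on the quoted example, it returns the quoted string (quotes included) as the local part and somedomain.com as the domain, whereas splitting on every "@" would produce four pieces.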
And it looks like I'm not alone: 1 2 3. Ah, and by the way, here's one regex I found:
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
Searching around for an algorithm, I found this MSDN snippet. MSDN is Microsoft; these guys probably know what they're doing.
(reads code)
Yuck, look at that.
The thing isn't even thread safe: there's a field named invalid that is read and written on every call (state is bad, mmkay?).
But hey, nobody's perfect. Since it's just for this little utility of mine (and not a production system), let's fix the thread safety, create the package and call it a day.
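One way to do that fix, sketched in C#: keep the flag per call instead of per instance. This assumes, as in the MSDN snippet, that the domain is converted with IdnMapping inside a Regex.Replace callback; here the flag is a local variable captured by the lambda, so concurrent calls can't overwrite each other's result. The class name ThreadSafeEmailCheck and the final pattern are placeholders of mine, and the pattern is deliberately simplistic, not the MSDN regex.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical name; sketches the thread-safety fix, not the full MSDN validator.
public static class ThreadSafeEmailCheck
{
    // Placeholder pattern (NOT the MSDN one): something@something.something
    private const string Pattern = @"^[^@\s]+@[^@\s]+\.[^@\s]+$";

    public static bool IsValidEmail(string input)
    {
        if (string.IsNullOrEmpty(input))
            return false;

        // Per-call state: each invocation gets its own flag, so there is
        // no shared field for concurrent callers to overwrite.
        bool invalid = false;
        var idn = new IdnMapping();

        try
        {
            // Convert the domain (text after the first '@') to its ASCII form,
            // recording a failed conversion in the local flag.
            input = Regex.Replace(input, @"(@)(.+)$", match =>
            {
                try
                {
                    return match.Groups[1].Value + idn.GetAscii(match.Groups[2].Value);
                }
                catch (ArgumentException)
                {
                    invalid = true;
                    return match.Value;
                }
            }, RegexOptions.None, TimeSpan.FromMilliseconds(200));
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }

        if (invalid)
            return false;

        try
        {
            return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase,
                                 TimeSpan.FromMilliseconds(250));
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }
}
```

Because nothing outside the method is mutated, the method can be static and called from many threads at once.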
Guess what? Now I have to use the thing in a production system™. And, to my surprise, there's a first issue (!). Of course, the surprise is not that there's an issue with the code (not at all), it's that there's someone else besides me using this little thing.
That MSDN code is bad and I'm not in the mood for writing such a beast of a parser. What now?
Wait a second. MailAddress does this already (it throws when the address is invalid). I'd even forgotten that I had grumbled about it on my blog before.
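For reference, leaning on MailAddress directly looks roughly like this. It's a minimal C# sketch (the class name MailAddressCheck is hypothetical) that treats the FormatException thrown by the constructor as "invalid". It also compares the parsed Address with the input, because MailAddress accepts display-name forms such as "Someone <someone@example.com>", which you probably don't want to count as a bare address.

```csharp
using System;
using System.Net.Mail;

// Hypothetical helper built on System.Net.Mail.MailAddress.
public static class MailAddressCheck
{
    public static bool IsValid(string input)
    {
        // The constructor throws ArgumentNullException / ArgumentException
        // for null or empty input, so rule those out up front.
        if (string.IsNullOrWhiteSpace(input))
            return false;

        try
        {
            var parsed = new MailAddress(input);
            // Reject display-name forms like "Name <user@example.com>" by
            // requiring the parsed address to be the whole input.
            return parsed.Address == input;
        }
        catch (FormatException)
        {
            // Invalid input is reported through an exception, which is
            // relatively costly when validating high volumes of addresses.
            return false;
        }
    }
}
```

This works, but every invalid address still costs a thrown exception.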
Ah, open source is a beautiful thing. There's an internal class called MailAddressParser just for that. Let's borrow it.
(reads MailAddressParser.cs)
Sadly, it interfaces with outside code through exceptions (a FormatException, to be more precise). That's bad from the performance side, since exception handling is expensive (which might be an issue with high-volume input), and also bad from the philosophical side: finding format errors is the whole point of the class, so there's nothing exceptional about them. But that's just me being picky.
So there's a second issue. Yep, validating email addresses is a losing battle.
So in this (last) version I've included several implementations for you to choose from: the first one (MSDN), the second one (Reference Source), and also Phil Haack's and JStedfast's implementations.