Address Verification Technologies
To check whether an e-mail address exists, we must perform the same two phases that a mail server performs to deliver a message to a recipient (see the previous section). First, we need to find the address of the server that receives messages for the recipient. Then we have to connect to that mail server and ask whether it can accept a message for the user with that particular address.
Unfortunately, this method detects no more than about two thirds of invalid addresses. The problem is that some mail servers accept all messages for their mail domains and, if a mailbox doesn't exist, only afterwards notify the sender by e-mail that the message is undeliverable.
Current statistics show that, of the detectable two thirds of dead addresses, about 30% can be detected in the first phase and 70% in the second phase; that is, roughly 20% and 47% of all dead addresses respectively. On average, the second phase takes 10 times longer and involves 5 times more network traffic than the first phase. In fact, two-phase checking requires as much time and traffic as sending a small message to the address being checked.
Consider both phases in more detail. In the first phase, the checking software analyses the e-mail address syntax, identifies the mail domain, and queries a DNS server for the address of the mail server for that domain. Communication with the DNS server uses the UDP protocol, which is faster than TCP because it does not establish a connection between the hosts. Normally, a DNS query takes no more than 1 to 2 seconds. During that time, one packet with the query is sent (about 60 bytes including the packet header) and one packet with the answer is received (its size doesn't exceed 512 bytes; normally it is no more than 200 to 300 bytes). In this phase, all addresses with wrong syntax and all addresses with non-existent domains are screened out.
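The first phase can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code of any particular product: the function names mail_domain, build_mx_query and send_query are hypothetical, the syntax pattern is deliberately much simpler than the full e-mail address grammar, and parsing of the DNS answer is left out.

```python
import re
import socket
import struct

# Deliberately simplified pattern (an assumption for this sketch);
# the real address grammar is considerably broader.
_ADDRESS_RE = re.compile(r"^[^@\s]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")


def mail_domain(address):
    """Return the lower-cased mail domain, or None if the syntax check fails."""
    match = _ADDRESS_RE.match(address)
    return match.group(1).lower() if match else None


def build_mx_query(domain, query_id=0x1234):
    """Build the DNS payload of an MX query for the given domain."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.split(".")
    ) + b"\x00"
    # QTYPE 15 = MX, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 15, 1)


def send_query(payload, dns_server, timeout=2.0):
    """Send one UDP packet to the DNS server and return the raw answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (dns_server, 53))
        reply, _addr = sock.recvfrom(512)  # a classic UDP answer fits in 512 bytes
        return reply
```

For the domain example.com the query payload built this way is 29 bytes long; with 28 bytes of IPv4 and UDP headers added, the packet comes to 57 bytes, which agrees with the figure of about 60 bytes given above. The address of the DNS server passed to send_query is supplied by the caller.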
In the second phase, a connection is established with the mail server using the SMTP protocol (based on TCP). TCP is connection-oriented, so the servers involved first exchange service packets to establish the connection. Once the connection is established, the servers exchange greetings (see the first three lines in the log below); then the sender's address is submitted, and the receiving server confirms its readiness to receive a message from that address; after that, the message recipient's address is submitted:
< 220 ns.watson.ibm.com ESMTP Sendmail AIX4.3/8.9.3/8.9.0; Thu, 22 Aug 2002 20:44:07
> HELO cisco.my.net
< 250 ns.watson.ibm.com Hello cisco.my.net [188.8.131.52], pleased to meet you
> MAIL FROM:<email@example.com>
< 250 <firstname.lastname@example.org>... Sender is valid.
> RCPT TO:<email@example.com>
< 550 <firstname.lastname@example.org>... User unknown
> RSET
< 250 Resetting the state.
> QUIT
In this instance, the receiving server answered that the user with the address email@example.com was unknown to it and refused to receive the message. After that, the servers exchanged commands to terminate the connection.
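The same dialogue can be reproduced with Python's standard smtplib module. The sketch below is a minimal illustration, not the implementation of any particular product; the function names check_recipient and verify are hypothetical, and the mail server host is assumed to have been found already in the first phase.

```python
import smtplib


def check_recipient(conn, helo_name, sender, recipient):
    """Replay HELO, MAIL FROM, RCPT TO and RSET on an open connection.

    Returns the numeric reply code of the RCPT TO command.
    """
    conn.helo(helo_name)
    conn.mail(sender)
    code, _reply = conn.rcpt(recipient)
    conn.rset()  # reset the session state; no message is ever sent
    return code


def verify(mail_host, helo_name, sender, recipient, timeout=30):
    """Connect to the mail server on port 25 and check one recipient."""
    conn = smtplib.SMTP(mail_host, 25, timeout=timeout)
    try:
        return check_recipient(conn, helo_name, sender, recipient)
    finally:
        conn.quit()
```

A reply code of 550, as in the log above, means that the server refuses the recipient. A code of 250 only means that the server accepts the address at this stage; as noted earlier, some servers accept every address and report a missing mailbox later by e-mail.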
While checking the address, the servers sent each other 10 messages with a total size of about 500 bytes; but to carry those messages they had to exchange over 20 packets, so the total traffic was about 2 KB. Notably, most of the time was spent waiting for a reply from the other server.
We are pleased to offer you two software products designed to check e-mail addresses for existence: Advanced Maillist Verify (AMV), which does the two-phase checking, and High Speed Verifier (HSV), which only performs the first phase.
AMV is helpful when you need to thoroughly check relatively small mailing lists (containing no more than 50,000 to 100,000 addresses). Advanced Maillist Verify can also check addresses in databases and in the address books of popular applications. It has COM/ActiveX interfaces for integration into various software systems and CGI/ISAPI modules for simpler integration into web servers. However, the technical principles underlying the AMV interface solutions don't allow using it for longer lists.
High Speed Verifier is offered as a solution for quick removal of garbage from lists with millions of addresses. For purely technical reasons, it runs 10 to 15 times faster than AMV on relatively small lists, and on lists containing millions of addresses the difference in speed may reach a factor of thousands. HSV gets faster on longer lists because it stores the results of all queries to DNS servers in a RAM cache, so the longer the list, the higher the rate of cache hits.
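The effect of such a cache can be illustrated with a small Python sketch. This is not HSV's actual implementation, which is not described here; the class name CachingResolver and the lookup callable it wraps are hypothetical and stand for whatever function performs the real DNS query.

```python
class CachingResolver:
    """Keep the result of every DNS lookup in memory, keyed by domain."""

    def __init__(self, lookup):
        self._lookup = lookup  # hypothetical callable: domain -> query result
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def resolve(self, domain):
        key = domain.lower()
        if key in self._cache:
            self.hits += 1  # answered from RAM, no network traffic
        else:
            self.misses += 1  # first time this domain is seen: query DNS
            self._cache[key] = self._lookup(key)
        return self._cache[key]
```

Since many addresses in a large list share the same mail domain, only the first address of each domain costs a network query; the counters hits and misses make it possible to observe how the hit rate grows with the length of the list.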