
Technical Description of HSV

This section describes the algorithms HSV uses. Understanding them will help you get more out of the product.

Based on the check results, HSV sorts all addresses into four groups: existing addresses, addresses whose check failed (unchecked), non-existing addresses, and addresses with invalid syntax. Note that addresses with invalid syntax and non-existing addresses cannot be saved to separate files at the end of verification; both are always written to the same file, the file of non-existing addresses.
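The grouping can be pictured as a mapping from check result to output file. This is a minimal sketch, not HSV's code: the names CheckResult, OUTPUT_FILE and group_by_file and the file names are hypothetical. It only encodes the rule stated above, namely that invalid-syntax and non-existing addresses always land in the same file.

```python
from enum import Enum


class CheckResult(Enum):
    EXISTING = "existing"
    UNCHECKED = "unchecked"        # the check itself failed (e.g. DNS timeouts)
    NON_EXISTING = "non-existing"
    BAD_SYNTAX = "bad syntax"


# Hypothetical file names. Bad-syntax addresses deliberately share the
# non-existing file, as described in the text.
OUTPUT_FILE = {
    CheckResult.EXISTING: "existing.txt",
    CheckResult.UNCHECKED: "unchecked.txt",
    CheckResult.NON_EXISTING: "non_existing.txt",
    CheckResult.BAD_SYNTAX: "non_existing.txt",
}


def group_by_file(results: dict[str, CheckResult]) -> dict[str, list[str]]:
    """Collect addresses per output file, preserving input order."""
    files: dict[str, list[str]] = {}
    for address, result in results.items():
        files.setdefault(OUTPUT_FILE[result], []).append(address)
    return files
```

Because four results map onto three files, there is no way to separate the two "bad" groups after the run, which is exactly the limitation noted above.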

To check an e-mail address, HSV performs the following operations. First, it analyses the address syntax and identifies the mail domain. It then extracts the first-level domain from the domain name (e.g., .com for mail.com) and checks it against the list of first-level zones (the file zones.txt in the program folder). If the syntax check fails, or the first-level domain is not in the list, the address is deemed invalid.
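A minimal sketch of this first stage follows. It assumes that zones.txt holds one zone per line (written as "com" or ".com") in UTF-8, which the text does not specify. The syntax rules are a simplified stand-in invented for illustration, not HSV's actual rules, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Set

# One DNS label: letters, digits and inner hyphens, at most 63 characters.
LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def zones_from_text(text: str) -> Set[str]:
    """Parse a zone list, assumed to hold one zone per line ("com" or ".com")."""
    return {line.strip().lstrip(".").lower() for line in text.splitlines() if line.strip()}


def load_zones(path: str = "zones.txt") -> Set[str]:
    # File format and encoding are assumptions.
    return zones_from_text(Path(path).read_text(encoding="utf-8"))


def mail_domain_if_valid(address: str, zones: Set[str]) -> Optional[str]:
    """Return the lower-cased mail domain, or None if the address is invalid."""
    address = address.strip()
    if address.count("@") != 1:
        return None
    local, domain = address.split("@")
    domain = domain.lower()
    labels = domain.split(".")
    # Simplified syntax rules (an assumption, much looser than RFC 5322).
    if not local or any(ch.isspace() for ch in local) or len(labels) < 2:
        return None
    if not all(LABEL.match(label) for label in labels):
        return None
    # First-level zone check, e.g. "com" for "mail.com".
    if labels[-1] not in zones:
        return None
    return domain
```

For example, with zones {"com", "net"}, "user@mail.com" yields "mail.com", while "user@mail.xyz" is rejected because its first-level zone is not listed.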

After that, HSV queries a DNS server for the mail servers of the domain (its MX records). If the DNS server returns a list of servers that receive mail for the domain, the address is deemed valid. If the domain is not found in DNS, or no mail servers receive mail for it, the address is deemed invalid. If the DNS server returns no reply at all, the address is considered unchecked. (As you know from the first section, queries are normally executed recursively, so it is quite possible that the DNS servers responsible for the queried domain are physically unreachable.)
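The decision rule of this stage can be written as a small function. The sketch below is illustrative only. DnsOutcome and classify are hypothetical names, and the actual DNS lookup is abstracted away, so it only shows how the three possible outcomes described above map to valid, invalid and unchecked.

```python
from enum import Enum, auto
from typing import Sequence


class DnsOutcome(Enum):
    ANSWERED = auto()          # the server replied (possibly with no mail servers)
    DOMAIN_NOT_FOUND = auto()  # the domain does not exist in DNS
    NO_REPLY = auto()          # no reply at all, e.g. servers unreachable


def classify(outcome: DnsOutcome, mail_servers: Sequence[str]) -> str:
    """Map a DNS lookup outcome to a check verdict, following the text."""
    if outcome is DnsOutcome.NO_REPLY:
        return "unchecked"     # nothing is known about the domain
    if outcome is DnsOutcome.ANSWERED and mail_servers:
        return "valid"         # at least one server receives mail for it
    return "invalid"           # unknown domain, or no mail servers
```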

HSV stores domain check results in an internal cache, so when another address from the same domain appears in a mailing list, HSV takes the result straight from the cache instead of querying a DNS server again. The cache size is limited only by the machine's RAM. Each cached domain result takes 40 bytes, so storing the results for one million different domains requires about 40 MB. Cache lookup time is practically independent of the cache size.
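A hash-table cache gives the behaviour described above: one DNS lookup per distinct domain and lookup time that does not grow with the cache. This is a sketch under stated assumptions. DomainCache and cache_memory_bytes are hypothetical names, and the lookup callable stands in for the DNS stage. Unlike HSV, it is not thread-safe. A Python dict also uses far more than 40 bytes per entry; the 40-byte figure is HSV's own and only appears in the memory estimate.

```python
from typing import Callable, Dict


class DomainCache:
    """Per-domain result cache: at most one DNS lookup per distinct domain."""

    def __init__(self, lookup: Callable[[str], str]) -> None:
        self._lookup = lookup              # stand-in for the DNS query stage
        self._results: Dict[str, str] = {}
        self.dns_queries = 0

    def result_for(self, domain: str) -> str:
        key = domain.lower()               # domain names are case-insensitive
        if key not in self._results:       # average O(1), independent of size
            self.dns_queries += 1
            self._results[key] = self._lookup(key)
        return self._results[key]


def cache_memory_bytes(domains: int, bytes_per_domain: int = 40) -> int:
    """HSV's figure: 40 bytes per cached domain result."""
    return domains * bytes_per_domain
```

With this estimate, one million domains need 40,000,000 bytes, which matches the roughly 40 MB quoted above.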

Checking a list of one million addresses from a single domain takes HSV 2 to 4 minutes on a modern machine. Consequently, a list of one million addresses spread over 50,000 unique domains takes at most 2 to 4 minutes longer to check than a list of 50,000 addresses that each belong to a different domain. In practice the difference may be only a few seconds, because the program is multithreaded.
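The arithmetic behind this bound can be made explicit. In both lists the 50,000 domains need the same DNS work, so the only extra cost is the additional cache hits. This rough serial model assumes that all of the quoted 2 to 4 minutes per million same-domain addresses is per-address overhead, and it ignores multithreading, which would shrink the result further. The names are hypothetical.

```python
# Derived from the text: 1,000,000 same-domain addresses take 2 to 4 minutes,
# so almost every address is a cache hit.
HIT_SECONDS_LOW = 2 * 60 / 1_000_000   # about 120 microseconds per cached address
HIT_SECONDS_HIGH = 4 * 60 / 1_000_000  # about 240 microseconds per cached address


def extra_time_bounds(addresses_big: int, addresses_small: int) -> tuple[float, float]:
    """Extra seconds for the bigger list when both lists share the same
    unique domains (identical DNS work), assuming serial processing."""
    extra_hits = addresses_big - addresses_small
    return extra_hits * HIT_SECONDS_LOW, extra_hits * HIT_SECONDS_HIGH


low, high = extra_time_bounds(1_000_000, 50_000)
print(f"{low:.0f} to {high:.0f} seconds")  # 114 to 228 seconds
```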

That is why HSV works faster on longer lists. It may spend a couple of hours checking the first million addresses of a list, but only about ten minutes on each subsequent million from the same list, because most of their domains are already in the cache.
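The speed-up comes from the cache. To see it, count how many new domains, and therefore real DNS lookups, each successive chunk of a list needs. The function below is a hypothetical illustration rather than part of HSV. It treats every domain seen in an earlier chunk as a cache hit.

```python
from typing import Iterable, List, Set


def new_domains_per_chunk(domains: Iterable[str], chunk_size: int) -> List[int]:
    """For each consecutive chunk, count domains not seen in any earlier
    address: the DNS lookups that chunk needs. Everything else hits the cache."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    seen: Set[str] = set()
    counts: List[int] = []
    new_in_chunk = 0
    size_of_chunk = 0
    for domain in domains:
        key = domain.lower()
        if key not in seen:
            seen.add(key)
            new_in_chunk += 1
        size_of_chunk += 1
        if size_of_chunk == chunk_size:
            counts.append(new_in_chunk)
            new_in_chunk = 0
            size_of_chunk = 0
    if size_of_chunk:              # trailing partial chunk
        counts.append(new_in_chunk)
    return counts
```

On a real list, the first chunk pays for most of the DNS lookups, and later chunks mostly reuse cached results, which is why they finish much faster.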

HSV's speed depends heavily on the quality of its DNS server list. When checking an address, HSV sends a query to a DNS server and, if there is no reply within 30 seconds, sends the query again. If there is still no reply after three attempts, the check stops with the "Timeout" error. If a check fails, HSV makes up to five attempts to check the address, choosing the DNS server for each new attempt at random from the list. If all five attempts fail, the address is marked as unchecked.
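The retry scheme is two nested loops. Below is a minimal sketch of it, assuming that all three queries of one attempt go to the same server and that only timeouts count as failures. send_query is a hypothetical callable that returns a reply, or None if nothing arrives within the timeout. It is not an HSV or library API.

```python
import random
from typing import Callable, Optional, Sequence

QUERY_TIMEOUT_S = 30      # wait per query, from the text
QUERIES_PER_ATTEMPT = 3   # unanswered queries before the "Timeout" error
CHECK_ATTEMPTS = 5        # failed attempts before the address is "unchecked"


def check_with_retries(domain: str,
                       servers: Sequence[str],
                       send_query: Callable[[str, str, float], Optional[str]],
                       rng: random.Random) -> str:
    for _attempt in range(CHECK_ATTEMPTS):
        server = rng.choice(servers)       # a random server for each attempt
        for _query in range(QUERIES_PER_ATTEMPT):
            reply = send_query(server, domain, QUERY_TIMEOUT_S)
            if reply is not None:
                return reply               # the check completed
        # Three unanswered queries: this attempt ends with "Timeout".
    return "unchecked"


# Worst case for one address: 5 attempts x 3 queries x 30 s.
WORST_CASE_SECONDS = CHECK_ATTEMPTS * QUERIES_PER_ATTEMPT * QUERY_TIMEOUT_S
```

The worst case works out to 450 seconds (7.5 minutes) per address. This is why slow or dead servers in the list hurt overall speed so much.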

The more DNS servers HSV's list contains, the less likely it is that the failure of one or two of them will affect the program's speed.

HSV is multithreaded: you can set the number of threads from 1 to 600, and each thread checks one address at a time. With 600 threads, 600 domains are checked simultaneously, and HSV sends up to 15,000 queries per minute to DNS servers, with peak traffic of up to 700 kbps. Sending that much load to a single DNS server looks very much like a hacker attack aimed at disrupting its operation, so the DNS server software may block you until a system administrator has investigated. The server may also accept only a limited number of queries per second from your address and ignore the remaining packets, to keep resources free for other users. In that case HSV slows down sharply, because addresses have to be checked several times after earlier attempts end in a timeout.

So, if your network connection can sustain more than 50 threads, your DNS server list should contain at least one server for every 10 threads. Then you can be confident that the servers will not fail due to overload.
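The rule of one DNS server per 10 threads reduces to simple arithmetic. The sketch below uses the figures from the text: 15,000 queries per minute at 600 threads and 10 threads per server. It additionally assumes that the query rate scales linearly with the thread count, which the text does not state. The function names are hypothetical.

```python
import math

MAX_QUERIES_PER_MINUTE = 15_000   # at 600 threads, from the text
MAX_THREADS = 600
THREADS_PER_SERVER = 10           # the recommendation above


def dns_servers_needed(threads: int) -> int:
    """At least one DNS server for every 10 threads (rounded up)."""
    return math.ceil(threads / THREADS_PER_SERVER)


def queries_per_server_per_minute(threads: int) -> float:
    # Assumes the total query rate is proportional to the thread count.
    total = MAX_QUERIES_PER_MINUTE * threads / MAX_THREADS
    return total / dns_servers_needed(threads)
```

At the maximum of 600 threads this gives 60 servers, each receiving about 250 queries per minute (roughly 4 per second), instead of one server receiving all 15,000.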

Multithreaded applications behave differently under different versions of Windows. Windows XP handles 600 threads easily, with only a slight increase in processor load. Older systems such as Windows 98 and Windows NT4 are sensitive to large numbers of threads, and on them a hundred threads are enough to load the processor heavily. For the best performance, we recommend running HSV on Windows XP.

 
(c) EMMA Labs, 2017 | No Spam Policy