Membership is FREE, giving all registered users unlimited access to every DNForum feature, resource, and tool! Optional membership upgrades unlock exclusive benefits like profile signatures with links, banner placements, appearances in the weekly newsletter, and much more - customized to your membership level!

Need Good tool for Zone queries

Status
Not open for further replies.

kengreenwood

Platinum Lifetime Member
Legacy Exclusive Member
Joined
May 11, 2006
Messages
332
Reaction score
3
OK - I have the 180+ million record .com table in my database and I'm having a hell of a time putting an index on the domain field due to it being a long text string. I'll put that on hold for the time being - but I did run an example query against the table... as Adam suggested "Florida".... it took about a minute to return the results and here they are:

If anyone is still in disbelief about the fact that all of the good domain names have been gone for years, take a look at this file:

http://www.oneworldmedia.com/florida.xls (4 meg file)

There are 49,801 .com domains registered that begin with the string "Florida". That's quite a lot of domains for a single state. And if people are wasting their money on domains like:

FLORIDACERAMICTILECONTRACTORSMULTIFAMILYCONDOHOTELSCHOOLS.COM
FLORIDAKEYSADVENTUREFISHINGBOATINGKAYAKHOUSEBOATVACATIONS.COM
FLORIDAKEYSKEYLARGOFISHINGSNORKELINGSIGHTSEEINGBOATCHARTER.COM
FLORIDA-CONDOS-CONDOMINIUMS-REAL-ESTATE-FORT-LAUDERDALE-SOUTH.COM

(all over 57 characters) then the next time someone asks you "Hey, can you check if FloridaRealEstate.com is taken", you can be assured that it is (and those of us who know better will always chuckle at questions like that...)
 
Last edited:

A D

Level 14
Legacy Exclusive Member
Joined
Feb 20, 2003
Messages
15,040
Reaction score
1,188
Excellent info,

This was just a test keyword, after I filtered what i was looking for out, I was left with a few thousand.

Much easier to work with.

Thanks,

-=DCG=-
 

Theo

Account Terminated
Joined
Feb 28, 2004
Messages
30,306
Reaction score
2,216
Ken, that's hilarious :D

On the subject of indexing the database, unless you plan to run thousands of queries against it, it's not really worth the pain of doing it.

Power to the *nix :D
 

kengreenwood

Platinum Lifetime Member
Legacy Exclusive Member
Joined
May 11, 2006
Messages
332
Reaction score
3
Ken, that's hilarious :D

On the subject of indexing the database, unless you plan to run thousands of queries against it, it's not really worth the pain of doing it.

Power to the *nix :D

Acro - yeah, you're right. I was hoping that I could build something that I could put up on a web page that a user could type in a value and it would dynamically find matches...pairing down as they typed more and more... but that is really an unrealistic goal with a 180+ million record table that can't be easily indexed. This script is really a more down and dirty way for an individual domainer to get a dataset of domains within a reasonable amount of time (couple minutes at most). Here's how the script works:

1. You first need to have authority to the top level zone files. The first step ftp's the zipped up zone file down to my unix server. Takes about 1 minute to download a 1.5 gigabyte zip file.
2. Second step unzips the file... takes about 2 minutes and the result is a 6.5 gigabyte file with around 185 million rows in it.
3. Third step strips off some unnecessary rows at the beginning of the file... takes about 2 minutes.
4. Fourth step - and this takes the longest... need to convert the CRLF line terminators to LF so that the file can be imported into a MySQL database. This is taking like 15 to 20 minutes... I'm researching to see if there is a better and faster way of doing this but it's working for now.
5. Lastly, I load the file into a MySQL table with 3 fields - Domain, Type, and Nameserver. The type is always NS.

The whole thing takes less than 30 minutes and can be cron'd to run every night while you're sleeping ....zzzzzzzzzz

The main requirements:

- Unix host with MySQL and at least 15 gig free disk space. Need that much because the file get swapped a couple times and you need double the size of the original file.

That's it... Right now, I'm just using PhpMyAdmin to run the queries against it... example: select * from comzone where domain like 'FLORIDA%'

That query took a little over a minute to run against the 180+ million unindexed table and returned the 49k rows.

One more thing: Keep in mind that the zone files do NOT contain domains in the following status:

* Registry Hold
* Registrar Hold
* PendingDelete
* RedemptionPeriod

... and domains that have no name server assigned to them are not included either.

Removed steps 3 and 4 - no longer necessary.

Entire process takes 15 minutes.
 
Last edited:

kengreenwood

Platinum Lifetime Member
Legacy Exclusive Member
Joined
May 11, 2006
Messages
332
Reaction score
3
Check this out:

http://www.zfbot.com/

I built a front end application that will allow you to search the entire .com zone file (around 80 million domains - 185 records on the zone file... multiple names servers per domain). I suggest you type at least 3 characters in to the input box or the results will come back sloooooowwwly. Anything 3 and up comes back pretty quick.

You can also dump the results to an excel file - just make sure you filter down enough so that the results will fit in an excel tab (65000 rows approx... unless you are on latest version of Excel which supports 1 million rows per tab).

The grid on the right shows the popular domain beginnings - counts and percent to total ... but you can sort those columns if you want. Some interesting stuff...

Let me know what you think... Right now, the most difficult part is setting the data up so that it can be queried quickly considering the magnitude of the data it's got to crunch through. Instead of indexing one enormous table, I load the data into one table and then split it off into several hundred other tables and the search picks the correct table based on what you type.

Enjoy!
 
Last edited:

Theo

Account Terminated
Joined
Feb 28, 2004
Messages
30,306
Reaction score
2,216
I have to say, ingenious (the splitting process). So you're using Adobe flex? That's the first domain-related application I've seen. Great stuff. Maybe expand on it to query for a string, not just beginning with.
 

kengreenwood

Platinum Lifetime Member
Legacy Exclusive Member
Joined
May 11, 2006
Messages
332
Reaction score
3
Acro - the string within a string is on the list... stay tuned. It's much more difficult though!

Also - the total number of domains went down to around 80 million - I wasn't doing a distinct select ... now I am. The group of domains starting with 'th' really caused me some troubles... there are around 3.5 million rows in the zone file that start with "the" .... getting the distinct values from that many rows takes a long time.

I have to say, ingenious (the splitting process). So you're using Adobe flex? That's the first domain-related application I've seen. Great stuff. Maybe expand on it to query for a string, not just beginning with.

And regarding domain related flex apps - I've built several:

http://www.domainerport.com (use [email protected] as user and password as password)
http://www.domainerport.com/showcase (use admin or domainer as user and admin or domainer as password)

and this latest one.... http://www.zfbot.com
 
Last edited:

jmcc

Level 4
Legacy Platinum Member
Joined
Oct 6, 2006
Messages
155
Reaction score
74
49884 distinct domains as of 01/March/2009. 0.39 seconds query time. (MySQL db on a single 3.0GHz P4 box). Database queries can be deceptively fast when checking domains beginning with or ending with particular strings. Using grep is probably the easiest if it is only necessary to look for specific strings in domain names. I just checked the ^th domains and it came back with 1690109 distinct domains with a query time of 1 min 15.44 sec. A non-distinct query took 2.45 seconds. The DISTINCT function is what is eating the time. You could probably do the same with grep, sort and uniq on *nix on the zonefile. However this was all done on domain/hoster tables. A distinct domain name only table would be faster (1.86 seconds).

Regards...jmcc
 
Last edited:

nascar59

Level 3
Legacy Platinum Member
Joined
Oct 24, 2008
Messages
89
Reaction score
0
all this info is great Iam building some sites and my questions and options were posted here
 
Status
Not open for further replies.

Who has viewed this thread (Total: 1) View details

Who has watched this thread (Total: 7) View details

The Rule #1

Do not insult any other member. Be polite and do business. Thank you!

Members Online

Premium Members

Our Mods' Businesses

*the exceptional businesses of our esteemed moderators

Top Bottom