Anyone know of a reverse parsing tool on the web?

Status
Not open for further replies.

JuniperPark

Level 9
Legacy Exclusive Member
Joined
Aug 3, 2003
Messages
2,909
Reaction score
91
Is there a tool on the web that does a good job of reverse parsing strings into phrases?

Example #1 (easy):
FIREDEPARTMENT => FIRE DEPARTMENT

Example #2 (harder):
NEWYORKFIREDEPARTMENT => NEW YORK FIRE DEPARTMENT

I would be willing to pay for something like this if necessary.

Thanks!
 

Salient

DNF Member
Legacy Exclusive Member
Joined
Aug 23, 2004
Messages
549
Reaction score
0
Nice one MC. Seems to work as advertised.


I was curious if someone would respond to JP's thread.
 

JuniperPark

Level 9
Legacy Exclusive Member
Joined
Aug 3, 2003
Messages
2,909
Reaction score
91
I wasn't able to get it to work for more than a couple dozen domains. In fact, most of the tools on that site seem either to simply not work or to have very limited function. Sometimes they show code debug responses, and sometimes they give no response at all. But this is along the lines of what I was looking for.
 

Ubiquitous

Since 1997
Legacy Platinum Member
Joined
May 5, 2006
Messages
398
Reaction score
0
I actually know of a few + wrote something similar not too long ago. Are you doing one word at a time or manually putting the phrases in? If you let me know a few more details I can guide you in the right direction ... just let me know.

Regards ~
 

Dale Hubbard

Formerly 'aZooZa'
Legacy Exclusive Member
Joined
Jan 24, 2003
Messages
5,578
Reaction score
91
To do this effectively and in bulk you need a massive amount of recursive code. And you need a pretty huge dictionary.

Consider this:

NEWYORKFIREDEPARTMENT

NEW
YORK
FIRE
RED
FIRED
PAR
ART
PART
MEN

See the problem?
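
The overlap is easy to reproduce. Here is a minimal Python sketch, assuming a toy dictionary (the `WORDS` set and the `embedded_words` helper are made up for illustration, and a real run would need a far larger word list). It scans every substring and reports each one that is a dictionary word:

```python
# Toy dictionary (an assumption for illustration only).
WORDS = {"new", "york", "fire", "fired", "red", "department",
         "par", "part", "art", "men"}

def embedded_words(text, words):
    """Return (start, word) for every dictionary word found anywhere in text."""
    text = text.lower()
    hits = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            if text[start:end] in words:
                hits.append((start, text[start:end]))
    return hits

hits = embedded_words("NEWYORKFIREDEPARTMENT", WORDS)
for start, word in hits:
    print(start, word.upper())
```

FIRE, FIRED and RED all overlap the same stretch of letters, as do PAR, PART and ART, so finding words is not enough: a splitter also has to choose words that tile the whole string with no gaps.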
 

JuniperPark

Level 9
Legacy Exclusive Member
Joined
Aug 3, 2003
Messages
2,909
Reaction score
91
Ideally I would like a SQL Server stored procedure... failing that, VBScript code, and failing that a bulk checker somewhat like the one suggested above, except that it works :)

Thanks!

I definitely understand the complexity; in fact, I've written smaller versions of these myself. That's why I'm looking outside to see if there is something better. I already have 'large' and 'small' dictionaries on my SQL server for high speed lookups.
 

Dale Hubbard

Formerly 'aZooZa'
Legacy Exclusive Member
Joined
Jan 24, 2003
Messages
5,578
Reaction score
91
It's an old problem, and unfortunately one in which computers are slower than humans.

It would be quicker to eyeball the list.
 

JuniperPark

Level 9
Legacy Exclusive Member
Joined
Aug 3, 2003
Messages
2,909
Reaction score
91
Nah... it can be done. Step 1 would be to try all possible combinations of spacing (well, up to 4 spaces) where every word is in the dictionary. Where there is more than one possible result, hit Google for a popularity check to determine the best word split. This should take no more than 0.1 seconds per domain on a modern server; then run the Google side for as long as that takes.
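
Step 1 as described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `WORDS` is a toy dictionary, `brute_force_splits` is a hypothetical helper name, and the Google popularity check is left out entirely. It tries every placement of up to `max_spaces` spaces and keeps the splits where every piece is a dictionary word:

```python
from itertools import combinations

# Toy dictionary, assumed purely for illustration.
WORDS = {"new", "york", "fire", "fired", "red", "department",
         "par", "part", "art", "men"}

def brute_force_splits(text, words, max_spaces=4):
    """Try every placement of up to max_spaces spaces; keep all-dictionary splits."""
    text = text.lower()
    results = []
    for spaces in range(max_spaces + 1):
        # Each combination is a sorted tuple of cut positions inside the string.
        for cuts in combinations(range(1, len(text)), spaces):
            bounds = (0, *cuts, len(text))
            parts = [text[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(part in words for part in parts):
                results.append(" ".join(parts))
    return results

print(brute_force_splits("FIREDEPARTMENT", WORDS))
print(brute_force_splits("NEWYORKFIREDEPARTMENT", WORDS))
print(brute_force_splits("NEWYORKFIREDEPARTMENT", WORDS, max_spaces=2))
```

With this toy dictionary the first two calls each return a single split, and the third returns an empty list because the 21-letter name needs three spaces. For a string of that length, four spaces means a few thousand candidate placements, so the cap keeps the brute force cheap, at the price of missing any name that needs more words than the cap allows.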
 

Dale Hubbard

Formerly 'aZooZa'
Legacy Exclusive Member
Joined
Jan 24, 2003
Messages
5,578
Reaction score
91
Well, you obviously seem to know best. You cannot arbitrarily decide on 4-space steps. You don't want to be hitting Google until your parsing is done. So how would your method work on a domain such as CREDITCARDSFORALL.COM? I'm keen to know...
 

Ubiquitous

Since 1997
Legacy Platinum Member
Joined
May 5, 2006
Messages
398
Reaction score
0
For manual input, dictionary.com has the best splitting algorithm ... take a look at the handiwork it did on the more difficult example you listed. Give that a try and see if something like that (although in bulk) would ultimately work.

One question I had for you - are the names always all uppercase?

Regards ~
 

Ubiquitous

Since 1997
Legacy Platinum Member
Joined
May 5, 2006
Messages
398
Reaction score
0
Do a logical AND &HDF on the list and they'll all be upper case. Or use a Linux awk line which includes a 'toupper' command. Dictionary.com is excellent, but as per my example it can't do this: http://www.reference.com/search?q=CREDITCARDSFORALL

That is definitely a tough one ... I'm thinking we'll need to borrow some supercomputing power to take care of CREDITCARDSFORALL.com - In the meantime I think this might take the prize for the most comprehensive tool for those pesky dot coms!

BTW ~ I would jump on Flaccid Lard Resort.com while it's still available ... :yes:
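
On the "logical AND &HDF" trick quoted above: `&HDF` is hex DF, and ANDing a character code with it clears the one bit that separates ASCII lowercase from uppercase. A small Python sketch follows (`upper_by_mask` is a hypothetical helper name). It assumes plain ASCII input and applies the mask to letters only, since domain names also contain digits, hyphens and dots:

```python
def upper_by_mask(text):
    """Uppercase ASCII letters by clearing bit 5 (AND 0xDF); leave other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(ord(ch) & 0xDF))
        else:
            out.append(ch)
    return "".join(out)

print(upper_by_mask("credit-cards4all.com"))

# Masking blindly would corrupt non-letters: '-' is 0x2D, and 0x2D AND 0xDF is 0x0D,
# which is a carriage return.
print(hex(ord("-") & 0xDF))
```

For anything beyond a quick pass over a letters-only list, the built-in upper-casing function of whatever language is at hand is the safer choice.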
 

Dale Hubbard

Formerly 'aZooZa'
Legacy Exclusive Member
Joined
Jan 24, 2003
Messages
5,578
Reaction score
91
Anagrams - useful but of no direct consequence to this debate I fear :D

Anagrams are far easier to produce - you have a pool of letters to strike out against a dictionary - not so difficult - but amusing nevertheless ;)
 

DomainingCom

DNF Regular
Legacy Exclusive Member
Joined
Dec 7, 2005
Messages
833
Reaction score
10
We have such technology (check DomainScore.com). PM us to explain what you intend to do with this; maybe we can partner.
 

JuniperPark

Level 9
Legacy Exclusive Member
Joined
Aug 3, 2003
Messages
2,909
Reaction score
91
OK, spent a little time on this stuff this morning. Here's the MS SQL for the basics; it just needs to be made recursive for the 3- and 4-word splits. Actually, I think I can make it work for unlimited word counts. This runs in < 1 second per name on my older, stressed server running 13,000 websites.


-- Try every two-word split of @str and look both halves up in the dictionary.
declare @str varchar(100)
set @str = 'firedepartment'
--set @str = 'CREDITCARDSFORALL'
set @str = lower(@str)

declare @pos1 int, @pos1_stop int, @word1 varchar(100), @word2 varchar(100)
declare @word1_dict int, @word2_dict int

set @pos1 = 0

-- Leave at least two characters for the second word.
set @pos1_stop = len(@str) - 2

-- Looping on the position (instead of testing @pos1 = @pos1_stop at the end)
-- also terminates for strings shorter than three characters.
while @pos1 < @pos1_stop
begin

    set @word1_dict = 0
    set @word2_dict = 0

    set @pos1 = @pos1 + 1
    set @word1 = substring(@str, 1, @pos1)
    set @word2 = substring(@str, @pos1 + 1, 100)

    if @pos1 = 1
    begin
        -- The only one-letter first word accepted is 'a'.
        if @word1 = 'a'
            set @word1_dict = 1
    end
    else
    begin
        select @word1_dict = count(*) from dictionary.dbo.dictionary where word = @word1
    end

    -- Only look up the second half when the first half is a word.
    if @word1_dict > 0
        select @word2_dict = count(*) from dictionary.dbo.dictionary where word = @word2
    -- TODO: recursive check of remainder of string for 2 more words, then 3 more words

    -- if @word1_dict > 0 and @word2_dict > 0
    -- TODO: Insert into DB for Google pop check if more than one result

    -- Test-only check of results
    select @pos1, @word1, @word1_dict, @word2, @word2_dict

end
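
The "recursive check of remainder" TODO can be sketched outside SQL. The Python below is one possible shape for it, not the stored procedure itself. `WORDS` is a toy stand-in for the dictionary table and `all_splits` is a hypothetical name. The idea is the same as the loop above: take a first word that is in the dictionary, then recurse on whatever is left, with no cap on the number of words. Results are cached per starting offset so the same remainder is never re-parsed.

```python
from functools import lru_cache

# Toy stand-in for the dictionary table (an assumption for illustration).
WORDS = {"credit", "card", "cards", "for", "fora", "all", "a",
         "fire", "fired", "department"}

def all_splits(text, words):
    """Return every way to split text into dictionary words, with no word-count cap."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def splits_from(start):
        # Base case: the whole string has been consumed.
        if start == len(text):
            return ((),)
        found = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in words:
                # The first word is valid, so recurse on the remainder.
                for rest in splits_from(end):
                    found.append((word,) + rest)
        return tuple(found)

    return [" ".join(parts) for parts in splits_from(0)]

print(all_splits("firedepartment", WORDS))
print(all_splits("CREDITCARDSFORALL", WORDS))
```

With this toy dictionary each call prints exactly one split, because dead ends such as CARD followed by "SFORALL" or FORA followed by "LL" are discarded. A real dictionary full of short words will return many candidates for a name like CREDITCARDSFORALL, and that is where the popularity check comes in.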
 
