Not to rain on your parade, but VCs, banks... whatever kind of investment you're looking for, they won't look at you if you're under 18.
A business plan is a must-have too.
Read a book called 'Boo Hoo'... see my blog for a review.
The search engine market is way too competitive. Why should I use yours over Google?
And the best advice I can give you: if you take investment, you no longer own your idea outright. You really need to assess this: do you actually need funding? Think about it, then wait a week, and think about it again.
I'll reply to the quotes above later, as I'm playing volleyball now

(need exercise every day)
I'm 16. =/ So I'm guessing that's not beneficial?
I'll try to draw up a business plan.
Basically the funds will go into servers, company registration, and development.
Nothing else.
And as for uniqueness, the first step is having an amazing template; some people on DNForum have seen it.
Hopefully it loads fast. But I love it - it's amazing

even better than Live.com's template (Live > Bing)
Well, at first I wasn't going to reply about VCs at all.
I was thinking about doing all this with my own funds, but then it was mostly the server problem that killed me.
Like if you want to create a successful Facebook- or YouTube-style site, you can start off with just one server, then add more as the visitors come.
With a search engine, though, you have to start with a lot of servers or you will never index the whole web (I figure it would take me a year, and that's a long time).
Fortune magazine has had some articles in the last year or so on VC funds, who the main players are, etc. Plus if you search "venture capital" along with 'twitter', 'digg', 'facebook', etc, you will find more.
The last thing I read was about Marc Andreessen, the guy who developed the Netscape browser; he's in a recent Fortune article with a list of things he is helping to fund right now. There are names of other VCs in the article. You'd want someone with cash, but also someone with connections who can give you good advice.
Side note: In the article it mentions how Andreessen is dealing with people now who are too young to remember the Netscape browser, lol, it was only what, 10 years ago?!
Here is the article, a good read as well:
http://money.cnn.com/2009/07/02/tec...fund.fortune/index.htm?postversion=2009070605
and good luck!
---
I think it's been longer than 10 years, as I don't remember Netscape either.
But I did try it out a few years ago (the new one), with the little fireworks. I did read up on its history, though: how it was once the best browser (I was really surprised) and how it got killed when Microsoft started shipping IE with every computer.
*reading the article at the moment thanks*
This is actually a very complicated, very difficult process.
Every month, THOUSANDS of entrepreneurs virtually beg venture capitalists to invest in their startup.
The only things that will grab their attention are:
a) A solid, very marketable product
b) A great team with a great track record
c) Clever marketing
d) Proven performance
You might not necessarily need a working prototype, but your pitch has to be perfect.
Visit YCombinator.com - they provide seed money to early-stage startups (usually around $20k)
Keep an eye on blogs like Mashable.com and TechCrunch.com
Getting venture capital firms to invest in your startup is one of the hardest parts of the business. Your market (search) is monopolized and very saturated with the big players. The costs of entering the search engine market are just too high for new players. Not to burst your bubble, but I doubt you'll be able to index even 1/100000th of the web with one single dedicated server; you'll need a server farm the size of half a city block.
Also, the technology has to be very, very good, revolutionary even.
Think of all the engineers that Microsoft has - even with them, it hasn't been able to make a search engine that performs better than Google (okay, Bing is awesome, but still... it took them a decade)
First off, thanks for the link.
As for b), I think that means proving to the VC how successful you've been, or your potential. I guess I could prove that: I had in fact started up something like Microsoft, a software company. Well, not really a company; I didn't register anything. I don't know why I get these big ideas. I had actually been fairly successful: a 35k Alexa rank and PageRank 5 (back in the days when that was hard). I was only 11 then. The site ended up closed because, well, when you're 11 you don't start a site with nothing. The person who sponsored me decided to list my site on Sedo even though I BUILT IT; the domain was never in my control, since I was sponsored. Foolish. I'm not sure, but I think he sold it for thousands, since my software was on Download.com, Tucows, etc.
Yeah, I've heard it's really hard to get startup money, but I assure you search is one of the most profitable fields. Also, I can probably index a lot of sites with a single dedi, just not the whole web. By my calculations I could probably do it in about a year, which is of course still a long time. And a server farm isn't really needed: I'm actually going to rent a block of servers rather than buy them, which will reduce the cost a lot.
Did Bing really take them a decade, though? Not really. Microsoft wasn't focused on the search niche; after a while they realized Google was earning tons of money and wanted into the field. It actually took something like one year (correct me if I'm wrong) to have Bing up and going. Microsoft wanted a piece of that share.
Google makes billions each year, so even a small fraction of the pie means Microsoft could earn billions.
That sounds like a nice system

It's always best to code with flexibility in mind.
Yep. Something I think was a good feature of it.
Hehe, no worries. Yeah, I'm not too sure how best to solve it, although it's not a common practice anymore (it was a black-hat SEO method about 5 years ago), so TBH I wouldn't worry about it now. Although I guess a basic CSS check would be possible; not a massive biggie.
Yeah, I really don't see anyone doing it anymore. Anyhow, I could probably just hand out penalties if anyone did. I have a blacklist system on my crawler.

Anyone who does that gets blacklisted.
That might be it. I'm either on a 10 Mbps dedicated line or a 100 Mbps shared line (capped at 3 TB) - either way it's fairly quick, but it's still the 'weak link' :yes:
Yeah. I really can't afford the thousands of dollars for a gigabit dedicated uplink. I don't really know how much it affects things anyhow.
What I'd rather have is 10 low-end dedicated servers running crawlers at the same time (like how Google has 50 running at once with super servers, lmfao), all joined into one database. I just need to try it and see if the results can be merged. I'm going to test it by running crawlers from different locations on the server I have right now, to see if it would work across 10 different low-end dedis. That way the 2,000-3,000 links per hour could be x10, or possibly x100 when I have more money, since low-end servers are way cheaper. That would be fast, and could hopefully index the whole web in a few months instead of years.
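A minimal sketch of how that could look in PHP, assuming each low-end box runs the same worker and writes into one shared MySQL database (the host, credentials, and `pages` table are made-up names for illustration):

PHP:
// Sketch only: every crawl server runs this and inserts into ONE
// shared database. A UNIQUE KEY on `url` means two workers hitting
// the same page just update the row instead of duplicating it.
$db = new mysqli('db.example.com', 'crawler', 'secret', 'searchengine');

$stmt = $db->prepare(
    "INSERT INTO pages (url, title, body, crawled_at)
     VALUES (?, ?, ?, NOW())
     ON DUPLICATE KEY UPDATE title = VALUES(title),
                             body  = VALUES(body),
                             crawled_at = NOW()"
);

function save_page($stmt, $url, $title, $body) {
    $stmt->bind_param('sss', $url, $title, $body);
    $stmt->execute();
}

For what it's worth, at 2,000-3,000 links an hour per server, ten of them works out to roughly 480,000-720,000 pages a day, which is where the "months instead of years" math comes from.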
Ahh, I see. It basically tries to do it all in one go (ish), hence driving up the server load? Hmm - I guess the only solution is to have it check the server load and temporarily halt execution until the load goes back below (say) 0.5
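That check is straightforward on Linux, where PHP exposes the load average directly (a sketch; the 0.5 threshold and 30-second sleep are arbitrary choices):

PHP:
// Pause the crawler whenever the 1-minute load average is too high,
// and resume once it drops back under the threshold.
function wait_for_low_load($threshold = 0.5) {
    while (true) {
        $load = sys_getloadavg();   // [1 min, 5 min, 15 min] averages
        if ($load[0] < $threshold) {
            return;                 // safe to keep crawling
        }
        sleep(30);                  // back off, then re-check
    }
}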
I saw another crawler a friend had (he also tried to start a search engine). After 3,000 or 4,000 links the crawler stops for 5 minutes, then starts again. I need to add something like that to my crawler. Mine just keeps going and going, and if it hits an error that's a real problem, because the crawler stops. If I start it again, it just creates a new sitemap file (my crawler creates sitemaps so it's faster to index and reindex), even though it isn't done indexing the whole site. So when the crawler errors out, I only get the links it had so far: if it dies 24k pages into a site, I'm stuck with 24k unless I delete the site, the temp files, the sitemap, and the media files. The upside is that only maybe 10% of sites (I'm throwing a number out there) have more pages than that. Almost every little site I crawl (ones I see on DP now and then, just to test) is done in 1-10 minutes with no errors. Big sites like dmoz.org are the ones that kill it.
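A rough sketch of that pause-and-checkpoint idea, so a crash can resume where it left off instead of producing a half-finished sitemap (the checkpoint file, the queue structure, and crawl_one() are all hypothetical):

PHP:
// Pause every 3000 links and write the crawl state to disk, so a
// crash resumes from the checkpoint instead of starting over.
$count = 0;
while ($url = array_shift($queue)) {
    crawl_one($url);          // hypothetical: fetch + index one page
    $done[] = $url;

    if (++$count % 3000 == 0) {
        file_put_contents('checkpoint.json', json_encode(
            array('queue' => $queue, 'done' => $done)
        ));
        sleep(300);           // rest for 5 minutes, like the other crawler
    }
}

// On startup, resume if a checkpoint exists:
// $state = json_decode(file_get_contents('checkpoint.json'), true);
// $queue = $state['queue']; $done = $state['done'];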
Remember that Google won't index a site all in one go. In fact, I heard of someone who purchased a site with 20k pages of unique content. They submitted their sitemap to Google and it took Google 12-18 months to index all the pages (well, 99% of them)
Maybe I should do that.
I'm not sure how Google would naturally figure out which are the most important pages and index them first, but that must be what it does.
No, I can do that. I can set the depth to which the crawler indexes a site. For example, I can set it to only index one level: if I index the homepage, it only indexes the links on the homepage and then stops, rather than following links from the subpages until the whole site is done. I believe that's how Google did it. The problem is I don't have a way to fully index a site after a partial index, short of going through the trouble of deleting the site, the sitemap, and the media files first.
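For what it's worth, a depth-limited crawl is usually just a queue of (url, depth) pairs. A minimal sketch; index_page() and fetch_links() are hypothetical stand-ins for the real crawler functions:

PHP:
// Depth 1 = index the homepage plus the pages it links to, then stop.
function crawl($startUrl, $maxDepth) {
    $queue   = array(array($startUrl, 0));
    $visited = array();

    while ($item = array_shift($queue)) {
        list($url, $depth) = $item;
        if (isset($visited[$url])) continue;   // skip already-seen pages
        $visited[$url] = true;

        index_page($url);                      // hypothetical indexer
        if ($depth >= $maxDepth) continue;     // reached the limit

        foreach (fetch_links($url) as $link) {
            $queue[] = array($link, $depth + 1);
        }
    }
}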
Yeah the template is class

Nah, I haven't seen the new mascot - it sounds really nice.
Here's the new template with the mascot at the bottom. The drop-down box is the keyword suggestion, like on Google.
(Mods can remove this if i'm not allowed to link)
http://img.brivy.com/images/m5ryd7no5liogn8s069.jpg (that's the mascot at the bottom - a nice clean look)
Results page:
http://img.brivy.com/images/4vh2r1syhorhrs1vhgn.jpg
I've got more results pages (just go back to the main image hosting site, then press Public Gallery to see them). Most of them are first drafts, like the coming-soon page; the final drafts will be done soon.
I think I explained it a bit badly (although I completely agree with the points you made)
Basically the tool (I know I'm self-linking, I hope that's okay mods!) is:
http://www.cogah.com/index.php/WebsiteSize/
Say I enter in:
http://www.dnforum.com/f31/venture-capital-thread-381363.html
It'll scan the HTML, fetch the data, and return a table with the exact size of all files which make up the page.
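In outline, that works something like this (a simplified sketch of the idea, not the exact code): parse the HTML, collect the src/href URLs, then issue a HEAD request per file to read its Content-Length.

PHP:
// Simplified sketch, not the production code: fetch a page, collect
// its resource URLs, and read each file's size via a HEAD request.
$html = file_get_contents('http://www.example.com/');

$doc = new DOMDocument();
@$doc->loadHTML($html);   // @ silences warnings on sloppy HTML

$files = array();
foreach (array('img' => 'src', 'script' => 'src', 'link' => 'href') as $tag => $attr) {
    foreach ($doc->getElementsByTagName($tag) as $el) {
        $files[] = $el->getAttribute($attr);
    }
}

foreach (array_unique($files) as $file) {
    $ch = curl_init($file);                  // note: relative URLs would
    curl_setopt($ch, CURLOPT_NOBODY, true);  // need resolving first
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    echo "$file: $size bytes\n";
}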
I tested a site I own, and I think your tool is pretty good. One recommendation: make a separate tool with the same code, but have it also measure the time it takes to load the site. Then divide the size by the time (or the other way around) and you can calculate how fast the server is in KB per second. There are tools out there like that, and it really helps, since it tells you not just what to make smaller, but how fast the server actually is - how fast from their server to yours, that is.
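cURL already tracks both numbers needed for that, so the suggested speed tool could be as simple as this sketch (example.com stands in for whatever site is being tested):

PHP:
// Time the download and divide size by seconds to get KB/s.
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);

$bytes   = curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD);  // bytes received
$seconds = curl_getinfo($ch, CURLINFO_TOTAL_TIME);     // total fetch time
curl_close($ch);

if ($seconds > 0) {
    printf("%.1f KB/s (%d bytes in %.2f s)\n",
           ($bytes / 1024) / $seconds, $bytes, $seconds);
}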
It's not brilliant yet (it doesn't support frames or Flash files yet, and doesn't always download JS files correctly, etc.), but the basic idea is okay IMO.
Ah, my crawler downloads media like Flash and video too, though sometimes there are problems with it. It also downloads music.
I quite agree, though, that if it did crawl multiple pages, it could potentially download the same file over and over again. I do try to run array_unique() over the array of files/URLs I need to download (to check their sizes), although this isn't perfect.
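One small tweak that might help: array_unique() compares raw strings, so the same file linked two slightly different ways slips through. Normalizing first closes that gap (a sketch; normalize_url() is a hypothetical helper, not part of the tool):

PHP:
// Hypothetical helper: normalize URLs before de-duplicating, since
// array_unique() alone treats "HTTP://Site.com/x" and
// "http://site.com/x" as different strings.
function normalize_url($url) {
    $parts = parse_url(trim($url));
    $host  = isset($parts['host']) ? strtolower($parts['host']) : '';
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    return $host . rtrim($path, '/');
}

$files = array_unique(array_map('normalize_url', $files));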
Well, it's pretty good at the moment. I recommended above ^ how the tool could be modified (or a new one made with the same features, but even more useful).
To answer your first bit, it downloads the HTML in raw form. An extract of the code is:
PHP:
// Start the cURL stuff
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the page instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 5);         // give up after 5 seconds
curl_setopt($ch, CURLOPT_URL, $url);          // $url is the page being measured

$content  = curl_exec($ch);                   // the raw HTML
$pageSize = strlen($content);                 // size of the HTML in bytes
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);                              // free the handle
So the $content var will simply contain the HTML you'd get if you went to the page and clicked "View Source".
(The $httpcode is used just in case a 404 error etc. is returned; then I'd output an error message.)
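Roughly like this, simplified:

PHP:
// Bail out with a message if the fetch failed or the server
// returned an error status instead of a page.
if ($content === false || $httpcode >= 400) {
    die("Error: HTTP $httpcode returned for $url");
}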
Ah, I see. Does the HTML stay on your server, or is it deleted after the results are shown?
Sounds like a nice system

Yes having a cache/index system is the best way around an issue like this.
Yes, it's much, much faster. And it's way better, since if one person searches 'cars', everyone in the world who searches 'cars' after him gets a quicker result. It will be fast.
But I was also thinking about a cache like Google's cache, how they store copies of webpages on their site. I was thinking that if there's a script (or I build one) that stores a cache of the site - basically storing the whole webpage - it might be faster than crawling it, since I could send the crawler to my own copy, which would crawl faster. Then again, it's somehow a bad idea. But I will need a cache system, as I think it's a great thing, and I already store thumbnails of the images, so I might as well download the whole site together.
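For the "second person searching gets it instantly" part, here is a minimal sketch of a query cache, assuming a hypothetical `query_cache` MySQL table and with run_slow_search() standing in for the real lookup:

PHP:
// First search for a term pays the slow lookup; everyone after
// gets the stored results until the cache entry is a day old.
function cached_search($db, $term) {
    $stmt = $db->prepare(
        "SELECT results FROM query_cache
         WHERE term = ? AND cached_at > NOW() - INTERVAL 1 DAY"
    );
    $stmt->bind_param('s', $term);
    $stmt->execute();
    $stmt->bind_result($results);

    if ($stmt->fetch()) {
        $stmt->close();
        return unserialize($results);         // cache hit - instant
    }
    $stmt->close();

    $results = run_slow_search($db, $term);   // hypothetical full search
    $ins = $db->prepare(
        "REPLACE INTO query_cache (term, results, cached_at)
         VALUES (?, ?, NOW())"
    );
    $data = serialize($results);
    $ins->bind_param('ss', $term, $data);
    $ins->execute();
    return $results;
}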
By the way, I came across the following site earlier:
http://ask.metafilter.com/65244/How-Does-a-Google-Query-Work
Some of the comments further down have some nice information about how this may (sort of) work on a large scale.
Still quite difficult to get one's head around though
Reading it right now. I'm really not surprised: they have thousands of servers with super hardware in them. The thing is, they've been around since 1998, once hosted on Stanford's servers. I mean, if it were 1996, my search engine would kill theirs, since they didn't get image search till 2002 (archive.org).
I've already got image search. But they've got a 13-year head start on me. I was 3 when Google existed. LOL
Hmm yeah, I see what you mean. I guess one way around it is to have a sort of 'array' of results (i.e. sites) for each keyword. Then the 'array' (well, database) can be sorted based on the quality of the site, but this wouldn't be easy.
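That "array per keyword" is essentially an inverted index. A minimal sketch, with a hypothetical keyword_index table and quality-score column (both made up for illustration):

PHP:
// Hypothetical table mapping each keyword to (url, score) pairs:
//   CREATE TABLE keyword_index (
//       keyword VARCHAR(64),
//       url     VARCHAR(255),
//       score   FLOAT,            -- quality of the page for this word
//       PRIMARY KEY (keyword, url)
//   );

function top_results($db, $keyword, $limit = 10) {
    $stmt = $db->prepare(
        "SELECT url, score FROM keyword_index
         WHERE keyword = ? ORDER BY score DESC LIMIT ?"
    );
    $stmt->bind_param('si', $keyword, $limit);
    $stmt->execute();
    $stmt->bind_result($url, $score);

    $results = array();
    while ($stmt->fetch()) {
        $results[] = array('url' => $url, 'score' => $score);
    }
    $stmt->close();
    return $results;
}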
Yeah, definitely. But once I have most sites crawled already, it won't be a problem. Every time I reindex a site, if the cache gets cleared, only an unlucky few will get the 3- or 4-second search slowness; everyone searching the same keyword after them will never see it.
Seems fine to me
Yep

I'm a master at quoting now.
A lot of great ideas often come from one or two guys working in their garage. Google came along while Microsoft and Yahoo already existed. YouTube came along with G, M and Y already established. Twitter started small and grew. I agree that there are probably thousands of people pitching ideas out there, but don't let that stop you if you think you have something good.
If your issue is servers, you may just need to find a partner with access to lots of server capacity.
I know it's not a unique idea. But as agreed above, neither were Google, Gmail, etc.
Good examples of sites/software that weren't unique ideas but were still successful (format: new vs. old):
IE vs. Netscape (read about it on the internet)
Mozilla Firefox vs. IE
Gmail vs. Yahoo Mail, Hotmail
YouTube vs. Google Video, Yahoo Video, etc.
Twitter vs. Facebook, Myspace, etc.
The problem with search engines, as agreed above, is the servers.
Like for example, since Microsoft opened up bing.com, they've built two massive datacenters.
Before I had money to buy my own dedicated server, I did contact a few hosting companies to ask. An old friend of mine who runs a successful hosting company in the UK, who has known me since I was 12 (and knew I built the decent software site mentioned above), sponsored me with a dedicated server. The rest of the big companies - google any big dedi company - don't even bother. Ask for sponsorship at any big dedicated host and you will find rejection one after another.
I don't know if it's the recession or not, but most companies don't do that. Oh yes, I did get a server from the UK friend I know, but I ended up buying a better one.
I might hit him up for the offer, asking him for ten dedis. I doubt it would work.
I don't know; I'll keep focusing on the development of the site rather than on servers, as I really want to perfect the script.
I'll look at the sites you've given me so far. I'm also preparing for the SAT next month, and lifeguarding.
Thanks for all the suggestions so far. I really do appreciate it.