Hi all, sorry I haven't been around for a while; I've been busy. I have created a spider to index pages (it will eventually be the basis for my accessible search engine). What I would like to know is which language you think would be best for running this script. I wrote it in PHP, which takes AGES to spider one site. It has been running for 45 minutes and has currently indexed only 121 pages from Smashing Magazine. That is only due to the way that site is written (it indexed 2000 actual Google pages in less than 10 minutes), but it is still too slow. Any ideas much appreciated.
Well, I want to start by asking how your spider is set up and how the code is arranged. I don't want the actual code itself, not if we can avoid it. I would like to know how the spider works, in terms of a general idea of how it searches a web page, and how the code is arranged: more or less how many lines of working code and how many ifs you have. I assume it's pure PHP and probably uses an SQL database to house the data, in which case I want to know whether you're maintaining a persistent (continuous) connection to the database or connecting every time you need to insert data into it. I've never really worked with spiders, so maybe someone else could be a little more helpful, but these are the main areas I can think of as far as general PHP bottlenecks go.
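To make the database connection question concrete, here is a rough sketch of the two patterns. It is written in Python with the built-in sqlite3 module purely for illustration, not as the spider's actual code. The `pages` table, the column names and the function names are all made up for the example. The same idea should carry over to PHP, for instance with one PDO connection and one prepared statement reused for the whole crawl.

```python
import sqlite3

# Hypothetical schema for illustration only; the real spider's tables are unknown.
SCHEMA = "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
INSERT = "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)"


def store_reconnecting(db_path, pages):
    """Slow pattern: open, commit and close a connection for every page."""
    for url, title in pages:
        conn = sqlite3.connect(db_path)
        conn.execute(SCHEMA)
        conn.execute(INSERT, (url, title))
        conn.commit()
        conn.close()


def store_reusing(db_path, pages):
    """Faster pattern: one connection and one transaction for the whole batch."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)
        with conn:  # commits once on success, rolls back on error
            conn.executemany(INSERT, pages)
    finally:
        conn.close()


def count_pages(db_path):
    """Return how many rows the pages table currently holds."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    finally:
        conn.close()
```

Both functions store the same rows. The difference is only in how often the connection is opened and committed, which is usually where the time goes when a script inserts one page at a time. If the spider already reuses its connection, the bottleneck is more likely in fetching or parsing the pages.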