22 Commits

Author SHA1 Message Date
orbiter
e27aeb7fdc patch for bad crawl filter at crawl start
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4086 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-09 19:21:41 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
Search profiling showed that a large share of time is spent computing URL hashes. The computation performs an intranet check, which requires a DNS lookup. As a result, each URL hash computation took 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution is to attach the URL hash to the URL data structure, so that the hash value can be filled in after the URL is retrieved from the database. Redesigning the url/urlhash management required a major redesign of many parts of the software. Some parts that had already been slated for abandonment were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
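
For illustration, the idea described in daf0f74361 (carrying the hash inside the URL object so that it is computed at most once, or simply filled in from the database) might look roughly like the Java sketch below. This is not YaCy's actual code: the class name `HashedUrl`, its methods, the `computations` counter and the MD5/Base64 stand-in hash are all assumptions made for the example. The real computation includes an intranet check with a DNS lookup, which is what made it slow.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical URL wrapper that carries its own hash (not YaCy's actual class). */
final class HashedUrl {
    private final String url;
    private String hash;          // null until computed or supplied
    static int computations = 0;  // counts expensive computations, for illustration only

    /** Fresh URL: the hash is computed lazily on first access. */
    HashedUrl(String url) {
        this.url = url;
    }

    /** URL loaded from the database together with its stored hash. */
    HashedUrl(String url, String storedHash) {
        this.url = url;
        this.hash = storedHash;
    }

    String url() {
        return url;
    }

    /** Returns the cached hash, computing it at most once per object. */
    String hash() {
        if (hash == null) {
            hash = computeHash(url);
        }
        return hash;
    }

    /** Stand-in for the expensive computation (assumed: MD5 + Base64, 12 characters). */
    private static String computeHash(String url) {
        computations++;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(digest).substring(0, 12);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a URL read back from the index is constructed with its stored hash and never triggers the expensive path, while a newly discovered URL pays the cost only on the first call to `hash()`.
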
orbiter
34858be5ef added option to simple crawl start: complete domain crawl
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-02 19:55:14 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
3cacb3bc95 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=168#p861
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3981 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 14:11:34 +00:00
orbiter
a45216b479 fix to prevent malformed news messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3960 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-13 09:41:55 +00:00
orbiter
3b46f0460f moved crawl profile table from watch crawler to profile editor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3824 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 23:23:25 +00:00
orbiter
139c59ebbd - fixed dht selection problem: the seed tables used the wrong ordering
- cleaned some code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 17:59:36 +00:00
low012
a50256aba2 *) removed surplus replacements of HTML
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-08 00:16:17 +00:00
theli
6f46245a51 *) Bookmarks: Ajax icon is displayed while loading title
*) First version of a sitemap parser added
   - currently only autodetection of sitemap files is supported
*) DB-Import restructured
   - pause/resume should work again now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 09:52:04 +00:00
orbiter
dd44a1394f disabled automatic performance setting change
- during crawl start
- each indexing cycle
- for delay values
- for short memory cycles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 15:39:27 +00:00
orbiter
e192f616a2 collection of small bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 14:28:57 +00:00
(no author)
4f4d3d71dd *) Faster appearance of ConfigBasic by bypassing UPNP-scan in case of existing external connects
*) Marked two deprecated source-points
*) Added possibility to dump words from indexing to file. Should not affect performance in the current form.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3592 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-24 16:33:31 +00:00
karlchenofhell
c5c3ecc67e - fixed display of last entered value at IndexCreate_p plus minor usability/HTML adjustments
- removed double XML-escaping from CacheAdmin_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-23 18:53:21 +00:00
theli
589cbd8cbf *) replacing all yacy-news-category strings with corresponding constants
Note: please use these constants from now on

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-21 11:09:15 +00:00
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was split into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry also replaces the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which url is handed over to other peers
- redesigned handling of plasmaCrawlEntry handover, because there is no longer any need to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache that prevented entries from being removed
- fixed some bugs in online interface and adapted monitor output to new entry objects
- adapted yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
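
The structure described in 861f41e67e (one entry type used at every crawl stage, a container that adds tracking information, and one index per stack type) can be sketched roughly as below. This is a simplified illustration, not the actual plasmaCrawlEntry or ZURL code: the names `CrawlEntry`, `TrackedEntry`, `CrawlStacks`, the `StackType` values and all fields are assumptions, and in-memory queues stand in for the on-disk indexes.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stack types; the commit only speaks of "all crawl stack types". */
enum StackType { CORE, LIMIT, REMOTE }

/** Simplified stand-in for plasmaCrawlEntry: one entry type for every stage. */
final class CrawlEntry {
    final String url;
    final String name;           // anchor tag name
    final long appearedMillis;   // millisecond precision, as the commit describes

    CrawlEntry(String url, String name, long appearedMillis) {
        this.url = url;
        this.name = name;
        this.appearedMillis = appearedMillis;
    }
}

/** Simplified stand-in for a ZURL object: a crawl entry plus tracking information. */
final class TrackedEntry {
    final CrawlEntry entry;
    final String note;           // assumed tracking field, e.g. a failure reason or peer
    final long trackedMillis;

    TrackedEntry(CrawlEntry entry, String note, long trackedMillis) {
        this.entry = entry;
        this.note = note;
        this.trackedMillis = trackedMillis;
    }
}

/** One separate queue per stack type instead of a single shared index. */
final class CrawlStacks {
    private final Map<StackType, Queue<CrawlEntry>> stacks = new EnumMap<>(StackType.class);

    CrawlStacks() {
        for (StackType type : StackType.values()) {
            stacks.put(type, new ArrayDeque<>());
        }
    }

    void push(StackType type, CrawlEntry entry) {
        stacks.get(type).add(entry);
    }

    /** Returns the next entry of the given stack, or null if that stack is empty. */
    CrawlEntry pop(StackType type) {
        return stacks.get(type).poll();
    }

    int size(StackType type) {
        return stacks.get(type).size();
    }
}
```

Because the same `CrawlEntry` object travels from the stack into the tracking container, no conversion between entry types is needed on handover, which is the simplification the commit message points to.
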
orbiter
a5d668c0c6 added speed-buttons for easy performance setting
appears in crawl start and on indexing monitor page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 16:24:28 +00:00
low012
ce360ef43e *) no more HTML in plasmaCrawlProfile.java
*) <br> will not be displayed in items in Auto Filter Content on WatchCrawler_p.html anymore
*) removed unnecessary replaceHTML()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 21:09:28 +00:00
karlchenofhell
bf7a69197d - fix for possible NPE in queues_p
- WatchCrawler_p:
  - display crawler traffic
  - pause/resume local- and global crawler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-22 22:26:11 +00:00
karlchenofhell
18c841b3c0 - fix for http://www.yacy-forum.de/viewtopic.php?t=3269 [don't put 2 template-expressions back-to-back => bug?]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-22 13:37:48 +00:00
orbiter
61798f0ae6 added option to distinguish between text crawl and media crawl
- for each crawl start, there is now a flag for text and media
- the localCrawl flag is superfluous
- added new crawl profiles
- if an image search is done, only media links are crawled for the snippets


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-19 03:10:46 +00:00
orbiter
6866bcd0e0 added missing file for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-19 00:29:45 +00:00