Commit Graph

10 Commits

Author SHA1 Message Date
theli
a0ddf2ec11 *) AbstractCrawlWorker.java: delete already downloaded data on crawling error
*) plasmaSwitchboard.java: log unexpected errors while parsing/indexing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2552 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 04:50:12 +00:00
theli
fded1f4a5d *) better handling of maximum file size limit in crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 08:26:39 +00:00
theli
63893003be *) Adding settings page for the crawler which allows to specify a file size limit and the timeout to use.
*) adding first version of maximum filesize check for the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-09 15:06:49 +00:00
theli
043edfa4d8 *) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
*) ftp/CrawlWorker.java better errorhandling for ftp crawler
*) plasmaCrawlEURL.java: some errorcodes added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 04:12:52 +00:00
theli
dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542 2006-09-06 14:31:17 +00:00
theli
393a7d10be *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:03:54 +00:00
theli
ab5a9bee66 *) adding some copyright headers
*) next step of restructuring for new crawlers
   - adding first testversion of ftp crawler class
   -- does not create a htCache entry yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 14:38:29 +00:00
theli
fce9e7741b *) next step of restructuring for new crawlers
- renaming of http specific crawler settings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:56:47 +00:00
theli
09b106eb04 *) next step of restructuring for new crawlers
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads 
   - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
   - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
     to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:00:18 +00:00
theli
eb9b138986 *) next step of restructuring for new crawlers
- conversion of the crawler pool into a keyed object pool
   - crawlers are now loaded based on the url protocol (of course works only for http now)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:52:55 +00:00