Commit Graph

834 Commits

Author SHA1 Message Date
orbiter
03835c2ee8 enhanced search result computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2527 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 20:26:44 +00:00
orbiter
ac3419b65f better debugging for indexOutOfBoundException bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2525 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 19:41:52 +00:00
orbiter
a8bc768206 enhancements to ranking evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 16:04:50 +00:00
theli
33898ae7e9 *) ResourceInfoFactory.java: Bugfix for classNotFoundException
See: http://www.yacy-forum.de/viewtopic.php?t=2797

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2521 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 13:43:23 +00:00
theli
406e170e25 *) more verbose error message
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 11:48:49 +00:00
theli
b298474e22 *) Bugfix needed because of changed plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2518 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 11:33:27 +00:00
orbiter
96c6e4e322 - enhancements to detailed search page
- enhancements to search ranking computation process
- removed bugs in postranking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:26:06 +00:00
orbiter
9340dbb501 fixed all possible problems with nullpointer exception for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
theli
a5ed86105b *) bugfix for handling of ResourceInfo object in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 15:50:45 +00:00
hermens
ff4362b02d some more fixes for new plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 14:32:46 +00:00
hermens
7aeadbe7cc another NullPointerException in http.ResourceInfo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 13:13:19 +00:00
orbiter
141f9e5bb4 fix for new plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 12:27:32 +00:00
hermens
087f7511f8 prevent NullPointerException in http.ResourceInfo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 11:46:22 +00:00
orbiter
a2525072f2 bugfix for kelondroRow - property generation
this bug affected ranking parameters :-(

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 10:55:34 +00:00
theli
b44514242a *) crawler/ftp/CrawlWorker.java: better errorhandling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:22:35 +00:00
theli
7d7f30139c *) crawler/ftp/CrawlWorker.java: delete old cache file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:08:35 +00:00
theli
4ae0f122f8 *) ResourceInfo.java: License header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:01:07 +00:00
theli
043edfa4d8 *) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
*) ftp/CrawlWorker.java better errorhandling for ftp crawler
*) plasmaCrawlEURL.java: some errorcodes added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 04:12:52 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
orbiter
8a0e35618b enhancements to search result preparation
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timout limits

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 17:51:28 +00:00
theli
5c1bb53d2a Missing description for last commit
*) next step of restructuring for new crawlers
   > HTCaching should now work protocol independent
   -- introduction of new ResourceInfo objects containing protocolspecific metadata
      of a resource. 
   -- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX, 
      shallStoreCacheForXXX in a protocol dependent manner   
   > Indexing should also work protocol independent now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 14:35:45 +00:00
theli
dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542 2006-09-06 14:31:17 +00:00
theli
4825bfaaf3 *) Bugfix for PrintWriter Problem
See: http://www.yacy-forum.de/viewtopic.php?t=2792

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 18:55:45 +00:00
theli
7930839594 *) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
*) CrawlWorker.java: using new dirhtml function of ftpc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 05:17:57 +00:00
theli
7a35b8e237 *) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:36:19 +00:00
theli
ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:29:45 +00:00
theli
3870d615e3 *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:06:58 +00:00
theli
393a7d10be *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:03:54 +00:00
theli
ab5a9bee66 *) adding some copyright headers
*) next step of restructuring for new crawlers
   - adding first testversion of ftp crawler class
   -- does not create a htCache entry yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 14:38:29 +00:00
theli
5847492537 *) next step of restructuring for new crawlers
- IndexCreate_p.java: correcting problems with ftp urls
   - URL.java does not cutout the userinfo anymore 
    (needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
   - plasmaCrawlLoader.java: 
   -- hack to re enable https urls
   -- adding function getSupportedProtocols

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 13:17:11 +00:00
theli
fce9e7741b *) next step of restructuring for new crawlers
- renaming of http specific crawler settings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:56:47 +00:00
theli
e3f0136606 *) next step of restructuring for new crawlers
- adding function isSupportedProcotol to plasmaCrawlLoader.java
   - disabling robots.txt check for protocols other than http(s)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:46:17 +00:00
theli
9ded4e8d5a *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:26:53 +00:00
theli
1c8300fcec *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:23:57 +00:00
theli
4e2a950ac9 *) next step of restructuring for new crawlers
- avoid using the http crawler class directly. Using the interface class instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:24:24 +00:00
theli
09b106eb04 *) next step of restructuring for new crawlers
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads 
   - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
   - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
     to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:00:18 +00:00
theli
eb9b138986 *) next step of restructuring for new crawlers
- conversion of the crawler pool into a keyed object pool
   - crawlers are now loaded based on the url protocol (of course works only for http now)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:52:55 +00:00
theli
1395aae742 *) starting restructuring which is needed to add crawlers for additional protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:09:20 +00:00
theli
b4acbdaa97 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?p=25234

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 05:17:37 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
theli
959b779aba *) avoid performance loss if log level is greater than 'fine'
See: http://www.yacy-forum.de/viewtopic.php?p=25180

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 08:42:46 +00:00
orbiter
18b6876860 new cache flush configuration settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 22:31:21 +00:00
hermens
f0278b4092 Bugfix for / by zero when the AssortmentCluster is empty
See: http://www.yacy-forum.de/viewtopic.php?t=2746



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 20:23:04 +00:00
orbiter
14e0bb0dcf allow more references per word for new db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 12:06:23 +00:00
orbiter
985dcbde7f changed some parameters that may cause better memory usage and more indexing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 23:39:52 +00:00
orbiter
b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 22:21:22 +00:00
orbiter
c26da4893b turned back NURL usage of kelondroTree, kelondroFlexTable has still problems with deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 10:03:38 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
hermens
0b73f2b132 Repair DNS prefetch during cacheScan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 01:31:08 +00:00
orbiter
27a159b401 * documentation update
* removed doc from release
* release information in doc/News.html
* release 0.46

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-23 11:36:09 +00:00