237 Commits

Author SHA1 Message Date
orbiter
1af0e3bd84 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 00:56:56 +00:00
orbiter
5605887571 refactoring of search processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-05 23:57:25 +00:00
orbiter
e76fe1c078 - replaced unicode characters in copyright holder name ('Brausse')
- more logging for bootstrap seedlist loading
- larger DHT chunks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-31 10:00:17 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain.
A description of how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
f5a4efb76e fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=192&hilit=&p=1034#p1034
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-20 08:06:21 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
b6d9cca67e - fixed problem with yacyVersion and own version generation
- within this context: generalized date format handling
- extended Update interface:
 * a version lookup can be triggered manually
 * a complete lookup + download + re-boot process can be triggered with one click

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 23:47:21 +00:00
orbiter
f40566f9bb separate YaCy networks:
- added server-side network unit identification
- added server-side network access authorization
- enhanced client-side network authentication essentials generation
- implemented first peer-peer salted-magic authentication method

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-04 23:48:52 +00:00
orbiter
9bbd39b67c - removed unfinished auto-updater from roland and martin
- added new download-option for releases on the status page
still missing:
- thomas-style restart for linux/mac
- untar/gunzip on shell basis
(comes next)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-28 14:52:26 +00:00
orbiter
069562a14d fixed problem with re-crawl; replaced error file-db with ram-db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 23:47:08 +00:00
orbiter
07b4e5066b bugfix in messages
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 08:08:35 +00:00
orbiter
f04add6cb4 limitation of remote search result number
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 13:47:58 +00:00
orbiter
4f5496062c protection against too large seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-12 22:08:33 +00:00
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and its license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
orbiter
d6480dc670 fix for long transfer pauses
see http://www.yacy-forum.de/viewtopic.php?p=35243#35243

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 21:43:20 +00:00
orbiter
06b6e35484 fix for a null pointer exception if clusters are not defined
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3632 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 12:26:29 +00:00
orbiter
d4428947af fix for http://www.yacy-forum.de/viewtopic.php?p=34962#34962
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 23:21:13 +00:00
orbiter
81844e85b2 - fixed more cluster routing problems
- fixed a problem in remote search when balancer caused shift process to wait too long

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 00:39:53 +00:00
orbiter
e48189c710 enhanced cluster routing
- cluster definitions can now contain an addition for local ip addresses
- cluster-cluster communication uses the local ip address instead of the global address, if one is given

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3624 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 22:05:34 +00:00
karlchenofhell
97d4ab2053 - handle null from iterator in IndexCreateWWWLocalQueue_p.java
- fixed ETA to reach next peer in Network.java
- added some <label>s and fixed minor XHTML errors in ConfigNetwork.html
- try to avoid returning null in servlets as it is unexpected and causes a NPE in the file handler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 21:45:01 +00:00
orbiter
b33cef421e better routing for public clusters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 00:08:38 +00:00
orbiter
f8de19fb2f robinson cluster: added client-side protocol implementation
- the network configuration page shows a new option: robinson clusters
- when a global search is made, all robinson peers are excluded, but:
- robinson peers/clusters that provide peer tags are included in global search
  when search words match such tags. Therefore, robinson peers/clusters
  support the global yacy network with their indexes, without doing DHT-exchange


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 09:51:51 +00:00
orbiter
657585fe0d network functions for robinson peers: server-side protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-24 15:11:12 +00:00
orbiter
2e052eb816 fixed a bug in remote search with remote search tracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-12 13:49:50 +00:00
orbiter
b79b4082e2 completed search exclusion:
- exclusion on index-level (not only from search snippets)
- exclusion hand-over at remote search protocol

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-10 12:27:03 +00:00
orbiter
40c14a4f0e - better implementation of search query properties
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 10:14:48 +00:00
orbiter
2cb16824e3 removed support for old database structures.
The new collection index will be more generalized to support other indexes
i.e. YBR block-rank computation. A clean-up of the many conditions to support
the old database was necessary.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-21 15:35:35 +00:00
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was split into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry replaces also the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which url is handed over to other peers
- redesigned handling of plasmaCrawlEntry - handover, because there is no need any more to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache which prevented entries from being removed
- fixed some bugs in online interface and adapted monitor output to new entry objects
- adapted yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
orbiter
dc0c06e43d PLEASE MAKE A BACK-UP OF YOUR COMPLETE DATA DIRECTORY BEFORE USING THIS
redesign for better IO performance
enhanced database seek-time by avoiding write operations at distant
positions of a database file. until now, a USEDC counter was written
at the head-section of a kelondroRecords database file (which is the
basic data structure of all kelondro database files) to store the
actual number of records that are contained in the database. Now, this
value is computed from the database file size. This is either done
only once at start-time, or continuously when run with asserts enabled.
The counter is then updated only in RAM, and written at close of the
file. If the close fails, the correct number can be computed from the
file size, and if this is not equal to the stored number it is strong
evidence that YaCy was not shut down properly.
To preserve consistency, the complete storage-routine had to be re-written.
Another change enhances read of nodes in some cases, where the data-tail
can be read together with the data-head. This saves another IO lookup during
each DB node fetch.
Includes also many small bugfixes.
IF ANYTHING GOES WRONG, ALL YOUR DATA IS LOST: PLEASE MAKE A BACK-UP

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 08:35:51 +00:00
karlchenofhell
c016fcb10f - added streaming-support to CrawlURLFetchStack_p servlet
- bugfix for NPE in list.java
- use more constants

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-19 12:47:46 +00:00
karlchenofhell
d114a0136e - crawl profile: don't add null-values
- added some settings and statistics for url-fetcher 'server'-mode
- added own stack for fetchable URLs
- added possibility to fill stack via shift from peer's queues, via POST (addurls=$count and url$num=$url) or via file-upload
- added "htroot" to classpath of linux start-script

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-17 19:16:53 +00:00
karlchenofhell
e6ddf135bb - enabled fetching new crawls via /yacy/list.html?list=queueUrls for testing purposes
- sent URLs are taken off the limit-stack (of the global crawl trigger) (may be moved somewhere else in future versions)
- added option to set the requested chunk-size

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-14 14:50:55 +00:00
orbiter
30d79d69a6 fix for wrong display of search statistics
see http://www.yacy-forum.de/viewtopic.php?p=31242#31242

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-08 10:42:35 +00:00
orbiter
c464157a6e replaced some toString()
see http://www.yacy-forum.de/viewtopic.php?p=31151#31151

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:26:56 +00:00
orbiter
b2f4087400 redesign of last-seen field inside seed:
the field now contains a time in UTC (instead of relative to the local UTC offset)
this fixes a bug in peer selection, where an iteration over all seeds
ordered by lastseen did not work correctly.
Problems may occur because the new meaning of this field may mix with
the different meaning of that field in older peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 23:54:27 +00:00
orbiter
e00e850a98 removed constants (no connection with yacySeed.dna identifier)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 14:52:54 +00:00
orbiter
c2d6edf21d integrated number of remote targets as 'partitions' into remote search protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-01 13:27:23 +00:00
orbiter
4f6eed5623 QPM increment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 16:21:20 +00:00
orbiter
f3f99b19c6 extended search statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 01:45:29 +00:00
orbiter
c0851ee943 refactoring: moved and renamed de.anomic.data.searchResults to plasma package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 00:38:03 +00:00
orbiter
76fab83395 fixed bugs in search statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3240 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 00:26:16 +00:00
karlchenofhell
fdb45378fb - don't spam log because of some old URLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3227 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 05:31:27 +00:00
allo
0c81bd39d4 XSS-safe put as default.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 14:07:54 +00:00
orbiter
52c6461e6b some bugfix for statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-15 16:03:00 +00:00
(no author)
fe72b772cf added a monitor page for search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3206 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-15 01:50:57 +00:00
auron_x
9699b094e8 *) fixed hello reporting yourip=UNRESOLVED_PATTERN
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-14 16:57:24 +00:00
orbiter
0a050bc043 enhanced ranking
- redesign of data storage in plasmaSearchRankingProfile
- profiles are extended by new ranking parameters
- new RWI ranking parameters are considered during ranking
- appearance attributes (e.g. emphasised text) are now considered
- faster ranking
- some attributes that had been checked during post-ranking can now be
  checked during pre-ranking phase
- removed old ranking parameter on index.html page (will be replaced by profiles in the future)
- ranking can now consider appearances of media content
- snippet-loading for media types now works correctly (fetches only from the wanted media)
- ranking-profiles can be handed over to remote peers and are applied there as well
- re-search of same query with different domain now also re-triggers remote search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-20 15:44:29 +00:00
orbiter
1377c53aa3 extraction of media links from search results
these links are mixed into the snippets for testing purposes
(a final version will handle this differently)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-11 01:31:23 +00:00
orbiter
bf0d820659 - added correct flagging of word properties
- added self-healing to database in case that wrong free-pointers exist
- added presentation of media links in snippets (does not yet work correctly)
- code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-08 02:14:56 +00:00
orbiter
109ed0a0bb - cleaned up code; removed methods to write the old data structures
- added an assortment importer. the old database structures can
  be imported with
  java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
  to the collection database. The call is
  java -classpath classes yacy -migratewords
  (as it was)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 02:47:51 +00:00