Commit Graph

94 Commits

Author SHA1 Message Date
orbiter
7ef80c1026 more debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 13:52:46 +00:00
orbiter
75b198bc02 - updated references to indexContainer
- more bugfixes and debugging for indexAbstract processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 11:13:27 +00:00
orbiter
4f9e42d5ed more changes towards better join-search
- fixed problems with index-abstract generation
- added analysis output for index abstract receive

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 00:42:42 +00:00
orbiter
82a6054275 - fixed bug with new indexAbstract generation
- added partial evaluation of indexAbstracts during remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 10:39:25 +00:00
orbiter
96c6e4e322 - enhancements to detailed search page
- enhancements to search ranking computation process
- removed bugs in postranking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:26:06 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
   - now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
   See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
ebc2233092 * implemented (finished) class indexRowSetContainer
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 23:20:03 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* a substantial speed-up for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
671fd9a5c9 work towards new indexing database structure
(no effect on current functionality yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-04 14:47:27 +00:00
hermens
d4645062bc Correct usage of vhost in wget/wput requests:
- yacyClient: don't use own .yacyh domain in requests, instead use .yacyh domain of target peer for everything but ranking distribution
- natLib: use full hostname instead of just SLD.TLD



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-21 14:25:27 +00:00
theli
4ca0857c0c *) Index transfer now considers the pause time sent by busy peers during
   index transfer / index distribution
   See: http://www.yacy-forum.de/viewtopic.php?p=22647#22491

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-14 09:40:42 +00:00
orbiter
5041d330ce refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 11:44:50 +00:00
orbiter
7b3b12888c refactoring: integrated indexContainer abstraction layer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 01:09:31 +00:00
orbiter
a930be4ba3 refactoring of index management:
generalized the index entry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:19:20 +00:00
orbiter
82b2bc6932 patch for index-transfer DoS problem
see http://www.yacy-forum.de/viewtopic.php?p=21627#21627
note that this function will make the index-transfer functionality void

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-18 22:24:51 +00:00
orbiter
a474669338 start with refactoring of index management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-16 16:11:55 +00:00
orbiter
55c5b41bd0 modified kelondroDyn to work better with new object caches
(removed own single object cache)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 13:57:31 +00:00
orbiter
26e3216bcc update to profile fetch behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2076 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 09:37:18 +00:00
orbiter
fd7c17e624 added virtual host support:
all yacy-to-yacy communication now sends the <peer-hexhash>.yacyh
virtual domain inside the HTTP 'Host' header field.
This should enable running a yacy peer on a virtual host.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-09 13:11:00 +00:00
orbiter
60e5aff9fc some enhancements to the remote crawl trigger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 11:53:15 +00:00
orbiter
dbe96e6541 added hand-over of search filter and prefer ranking to yacy protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 10:15:00 +00:00
orbiter
bd283b8443 fixed bugs:
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 16:28:28 +00:00
orbiter
a469874e3f added and fixed time-out behaviour during search
see also: http://www.yacy-forum.de/viewtopic.php?p=19823#19823

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-02 20:40:07 +00:00
orbiter
63f39ac7b5 added 3 new crawling steering options:
- re-crawl by age of page (enter in minutes)
- auto-domain-filter
- maximum number of pages per domain
NOT YET TESTED!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 16:05:16 +00:00
orbiter
1f4412a146 adapted isListed to the new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
3286b1f498 re-organisation of lurl-creation and -stacking
this was necessary to prevent useless writes to the database
when the url appears in the blacklist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
allo
3b7e66ab48 staticIP should now work
(with resolved Conflict)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 12:31:57 +00:00
orbiter
eaffcfefe2 * added more ranking attributes (without function; this will be added later)
* added ranking coefficient transmission to remote peer (without evaluation on server side, will be added later)
* changed ranking coefficients slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-26 11:30:37 +00:00
theli
468ca5b0e6 *) Bugfix for url.toString problem in yacyClient crawlOrder
Thanks to Stephan for the advice

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-22 07:18:27 +00:00
theli
651bce8e2f *) adding missing function to transmit url chunks for crawl-order jobs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1680 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-17 11:04:35 +00:00
hydrox
a627162f13 *) fixed logging level for Debugmsg
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1585 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-09 09:10:49 +00:00
hermens
5f5eee1ae9 *) replace System.out.println with log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-05 01:17:08 +00:00
orbiter
eab1805bca refactoring: plasmaSearchProfile -> plasmaSearchTimingProfile
This was done to distinguish this profile from the
(to-be-implemented) plasmaSeachOrderProfile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1538 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:11:31 +00:00
orbiter
fa90c3ca7a - removed some usage of indexEntity
- changed index collection process: indexes are no longer first flushed to indexEntity,
  but are now collected directly from ram cache and assortments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 12:42:06 +00:00
orbiter
03c65742ba changes towards the new index storage scheme:
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 00:42:38 +00:00
orbiter
b946e28e61 some ranking enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-27 02:48:27 +00:00
hermens
66c889138e *) Bugfix: Principals are reported back as 'principal', so IWasAccessed should also be true
*) make it easier to include legacy peers switching between timezones +0100 and +0200



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-25 01:50:24 +00:00
hermens
75b268f16d *) use majority voting for peer type decision
*) reduce the number of peer pings sent out
see: http://www.yacy-forum.de/viewtopic.php?t=1748



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-22 23:14:37 +00:00
orbiter
f14d49fae9 enhancements, bugfixes and additions to word index attribute storage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-22 00:07:00 +00:00
orbiter
f4ffa9aee5 - implemented more attributes to index entries
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 15:14:21 +00:00
orbiter
9544c47684 added some UTF-8 handling.
hope this will help somehow; for sure not THE solution to our UTF-8 problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
orbiter
9086261476 refactoring of base64 encoding:
the kelondro database needs specific information about the order of
base64-encoded keys. Since no other package depends on base64
(only the httpd uses base64 for encryption, but does not need to encode these strings)
it is good to move base64 encoding to the new ordering classes in kelondro.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 00:39:00 +00:00
orbiter
3d8a5ae652 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 14:24:13 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
theli
6f8d7d3bcd *) Adding first version of YaCy bookmarklet
   - this can be used to easily crawl a webpage which is currently opened in the browser
   - to get the bookmarklet javascript simply call http://localhost:8000/QuickCrawlLink_p.html
     and drag and drop the link shown to your browser's Toolbar/Link-Bar.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-08 12:14:51 +00:00