Commit Graph

145 Commits

Author SHA1 Message Date
theli
5e0b6f8f83 *) sorting peer name list on Blacklist_p.html
*) restructuring of sharedBlacklist_p.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 13:29:50 +00:00
theli
6c8366aea1 *) Bugfix for blacklist import function
   - wrong property name
   - list was accidentally imported into a new blacklist file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 09:20:43 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
   - now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
66f1eb07d9 *) Bugfix for IllegalArgumentException in transferURL
See: http://www.yacy-forum.de/viewtopic.php?p=24560

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 10:54:19 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
f43c90fa98 fixed handling of null referer in crawlOrder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 21:46:34 +00:00
orbiter
abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
This method was implemented in de.anomic.net.URL


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
orbiter
ec5149ff3b fix for busyCacheFlush detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 22:28:09 +00:00
orbiter
f58283def2 better control of index flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 22:07:17 +00:00
orbiter
80b6c90d54 enhancements to prevent blocking during dht transfer receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:49:39 +00:00
hermens
d56f06401e - Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
- Small logging updates



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:42:00 +00:00
theli
c7b6389ca1 *) renaming indexDistribution.dhtReceiptLimitEnabled property to indexDistribution.transferRWIReceiptLimitEnabled
so that the default value is taken over by all peers


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:01:01 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM space,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
hydrox
8ba8e2b7d9 *) added cache for blacklisted URL hashes received via DHT. DHT does not request URLs listed in this cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-28 08:51:34 +00:00
hermens
53cbcc6d6e Implement emergency brake in index receive when the limit of the ramCache is exceeded by more than cacheLimit
See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-27 11:14:30 +00:00
theli
b20496e42b *) make DHT DoS check configurable (requested by KoH)
   - check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
   - upper bound can be configured via indexDistribution.dhtReceiptLimit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-21 19:28:42 +00:00
hermens
38a1410361 Don't test a remote peer's seed during hello.respond as its IP might not be proper, especially while still virgin
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-07 23:59:45 +00:00
orbiter
5041d330ce refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 11:44:50 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
a930be4ba3 refactoring of index management:
generalized the index entry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:19:20 +00:00
orbiter
7dd57a3828 added a busy-time estimation at DHT/RWI-Receive
to be done: usage of this value on client-side

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 14:52:00 +00:00
theli
fcec40fcc6 *) don't accept messages without subject or payload
See: http://www.yacy-forum.de/viewtopic.php?p=21656

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2115 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 11:57:17 +00:00
orbiter
82b2bc6932 patch for index-transfer DoS problem
see http://www.yacy-forum.de/viewtopic.php?p=21627#21627
note that this function will make the index-transfer functionality void

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-18 22:24:51 +00:00
orbiter
a474669338 start with refactoring of index management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-16 16:11:55 +00:00
allo
799c04091d Bugfix for Spam-Bug (Header manipulation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-04 16:41:30 +00:00
orbiter
dbe96e6541 added hand-over of search filter and prefer ranking to yacy protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 10:15:00 +00:00
orbiter
00a5d435e2 - fixed some bugs with domain filter
- added new ranking filter "prefermask": urls that match the filter are ranked better


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-13 23:19:36 +00:00
orbiter
bd283b8443 fixed bugs:
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 16:28:28 +00:00
orbiter
0a4c2e89ed remote crawl orders are now only accepted if sum over all
queues is less than 100 (the indexing queue was not measured before)
see also: http://www.yacy-forum.de/viewtopic.php?p=19374#19374

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 14:23:24 +00:00
orbiter
1f4412a146 adapted isListed to the new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
3286b1f498 re-organisation of lurl-creation and -stacking
this was necessary to prevent useless writes to the database
when the url appears in the blacklist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
hermens
289da326e5 *) Bugfix: remove blacklisted URL from loadedURL, when received via DHT transfer
see: http://www.yacy-forum.de/viewtopic.php?p=18976#18976



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-16 23:58:44 +00:00
rramthun
9f979d4fa5 Domain-lists gzip-compressible and sendable via cr-send/receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-13 20:12:31 +00:00
orbiter
f188611fc6 apply blacklist on rwis during dht receive
very experimental!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-09 10:46:02 +00:00
theli
5ee0125046 *) adding possibility to configure the server port for seed uploading via scp.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 16:34:05 +00:00
allo
7afa5c1b8e staticIP fix
tried to solve http://www.yacy-forum.de/viewtopic.php?p=18663#18663
D 2006/03/08 07:08:20 YACY yacyClient.publishMySeed mySeed error - not proper: IP is not proper: -UNRESOLVED_PATTERN-


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 12:23:26 +00:00
theli
f108048a2c *) Bugfix for NullPointerException in hello.java
*) Correcting for loop in hello.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 06:40:38 +00:00
orbiter
bae3783d38 added a snippet marking
(search words are now bold in snippets)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
allo
f73d51f94b reverted last change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 19:20:35 +00:00
allo
8997b83806 store the staticIP(dyndns) in seed, not the real IP
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 17:33:05 +00:00
allo
7c5f8f997a some more staticIP fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 12:20:19 +00:00
orbiter
d31a4e0b4f some small enhancements with cache flushing parameters and data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-25 16:10:31 +00:00
hermens
3208fe14ed *) log exceptions in crawlOrder.java to the logfile instead of stdout
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-22 01:04:38 +00:00
orbiter
7eb10675b3 re-organization of index management
this was done to be prepared for new storage algorithms


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 00:12:07 +00:00
theli
d0f76fc9bc *) setting logging level for thread pools to info
*) new layout for bookmark list 
   (Allo: please take a look if it's acceptable for you)
*) crawlReceipt.java: displaying peer name in logging message
*) Network.html: adding button for manual peer ping

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-09 08:29:07 +00:00
orbiter
fb7411d7bb re-structuring of ranking application:
concentration of all ranking attributes in the
plasmaSearchRankingProfile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-05 01:47:51 +00:00
orbiter
d98418390b - introduced rankingProfile Class
- selection of ranking and timing profiles for each search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:51:00 +00:00