Commit Graph

210 Commits

Author SHA1 Message Date
orbiter
6e2907135a bugfixes for remote search server part
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 22:19:34 +00:00
orbiter
cf9884e22b first attempt to implement a secondary search
this is a set of search processes that shall enrich search results
with specialized requests to realize a combination of search results
from different peers.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2571 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 17:13:28 +00:00
orbiter
75b198bc02 - updated references to indexContainer
- more bugfixes and debugging for indexAbstract processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 11:13:27 +00:00
orbiter
4f9e42d5ed more changes towards better join-search
- fixed problems with index-abstract generation
- added analysis output for index abstract receive

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 00:42:42 +00:00
orbiter
82a6054275 - fixed bug with new indexAbstract generation
- added partly evaluation of indexAbstracts during remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 10:39:25 +00:00
orbiter
74d1dea30b changes towards better join-search
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-10 22:36:47 +00:00
orbiter
c543028dd4 fixed double/missing null check for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 11:54:28 +00:00
orbiter
96c6e4e322 - enhancements to detailed search page
- enhancements to search ranking computation process
- removed bugs in postranking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:26:06 +00:00
orbiter
9340dbb501 fixed all possible problems with nullpointer exception for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
hermens
ff4362b02d some more fixes for new plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 14:32:46 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
orbiter
8a0e35618b enhancements to search result preparation
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timout limits

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 17:51:28 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
orbiter
18b6876860 new cache flush configuration settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 22:31:21 +00:00
orbiter
6ad471ef96 * applied many compiler warning recommendations
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 19:49:31 +00:00
theli
5e0b6f8f83 *) sorting peer name list on Blacklist_p.html
*) restructuring of sharedBlacklist_p.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 13:29:50 +00:00
theli
6c8366aea1 *) Bugfix for blacklist import function
- wrong property name
   - list was accidentally imported into a new blacklist file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 09:20:43 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
- now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
66f1eb07d9 *) Bugfix for IllegalArgumentException in transferURL
See: http://www.yacy-forum.de/viewtopic.php?p=24560

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 10:54:19 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
f43c90fa98 fixed handling of null referer in crawlOrder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 21:46:34 +00:00
orbiter
abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
This method was implemented in de.anomic.net.URL


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
orbiter
ec5149ff3b fix for busyCacheFlush detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 22:28:09 +00:00
orbiter
f58283def2 better control of index flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 22:07:17 +00:00
orbiter
80b6c90d54 enhancements to prevent blocking during dht transfer receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:49:39 +00:00
hermens
d56f06401e - Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
- Small logging updates



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:42:00 +00:00
theli
c7b6389ca1 *) renaming indexDistribution.dhtReceiptLimitEnabled property to indexDistribution.transferRWIReceiptLimitEnabled
so that the default value is taken over by all peers


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:01:01 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is, that index storage uses much less RAM space,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitely
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused that some elements had been iterated twice
* very strong speed enhancement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
hydrox
8ba8e2b7d9 *) added cache for blacklists urlhashs recieved by DHT. DHT does not request URLs listed in this cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-28 08:51:34 +00:00
hermens
53cbcc6d6e Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit
See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-27 11:14:30 +00:00
theli
b20496e42b *) make DHT DoS check configurable (requested by KoH)
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
   - upper bound can be configured via indexDistribution.dhtReceiptLimit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-21 19:28:42 +00:00
hermens
38a1410361 Don't test a remote peer's seed during hello.respond as its IP might not be proper, especially while still virgin
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-07 23:59:45 +00:00
orbiter
5041d330ce refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-28 11:44:50 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
a930be4ba3 refactoring of index management:
generalized the index entry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:19:20 +00:00
orbiter
7dd57a3828 added a busy-time estimation at DHT/RWI-Receive
to be done: usage of this value on client-side

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 14:52:00 +00:00
theli
fcec40fcc6 *) don't accept messages without subject or payload
See: http://www.yacy-forum.de/viewtopic.php?p=21656

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2115 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 11:57:17 +00:00
orbiter
82b2bc6932 patch for index-transfer DoS problem
see http://www.yacy-forum.de/viewtopic.php?p=21627#21627
note that this function will make the index-transfer functionality void

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-18 22:24:51 +00:00
orbiter
a474669338 start with refactoring of index management
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-16 16:11:55 +00:00
allo
799c04091d Bugfix for Spam-Bug (Header manipulation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-04 16:41:30 +00:00
orbiter
dbe96e6541 added hand-over of search filter and prefer ranking to yacy protocol
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2029 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-20 10:15:00 +00:00
orbiter
00a5d435e2 - fixed some bugs with domain filter
- added new ranking filter "prefermask": urls that match the filter are ranked better


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2022 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-13 23:19:36 +00:00
orbiter
bd283b8443 fixed bugs:
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 16:28:28 +00:00
orbiter
0a4c2e89ed remote crawl orders are now only accepted if sum over all
queues is less than 100 (the indexing queue was not measured before)
see also: http://www.yacy-forum.de/viewtopic.php?p=19374#19374

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-22 14:23:24 +00:00
orbiter
1f4412a146 adopted isListed to discussed new behavior as discussed (url, getFile)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-20 22:31:59 +00:00
orbiter
3286b1f498 re-organisation of lurl-creation and -stacking
this was necessary to prevent useless write to the database
in case of blacklist appearance of the url

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-17 10:16:07 +00:00
hermens
289da326e5 *) Bugfix: remove blacklisted URL from loadedURL, when received via DHT transfer
see: http://www.yacy-forum.de/viewtopic.php?p=18976#18976



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-16 23:58:44 +00:00
rramthun
9f979d4fa5 Domain-lists gzip-compressable and sendable via cr-send/receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-13 20:12:31 +00:00
orbiter
f188611fc6 apply blacklist on rwis during dht receive
very experimental!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-09 10:46:02 +00:00
theli
5ee0125046 *) adding possibility to configure the server port for seed uploading via scp.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 16:34:05 +00:00
allo
7afa5c1b8e staticIP fix
tried to solve http://www.yacy-forum.de/viewtopic.php?p=18663#18663
D 2006/03/08 07:08:20 YACY yacyClient.publishMySeed mySeed error - not proper: IP is not proper: -UNRESOLVED_PATTERN-


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 12:23:26 +00:00
theli
f108048a2c *) Bugfix for NullpointerException in hello.java
*) Correcting for loop in hello.java   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 06:40:38 +00:00
orbiter
bae3783d38 added a snippet marking
(search words are now bold in snippets)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1823 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-05 01:11:06 +00:00
allo
f73d51f94b reverted last change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 19:20:35 +00:00
allo
8997b83806 store the staticIP(dyndns) in seed, not the real IP
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 17:33:05 +00:00
allo
7c5f8f997a some more staticIP fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 12:20:19 +00:00
orbiter
d31a4e0b4f some small enhancements with cache flushing parameters and data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1767 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-25 16:10:31 +00:00
hermens
3208fe14ed *) log exceptions in crawlOrder.java to the logfile instead of stdout
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-22 01:04:38 +00:00
orbiter
7eb10675b3 re-organization of index management
this was done to be prepared for new storage algorithms


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 00:12:07 +00:00
theli
d0f76fc9bc *) setting logging level for thread pools to info
*) new layout for bookmark list 
   (Allo: please take a look if it's acceptable for you)
*) crawlReceipt.java: displaying peer name in logging message
*) Network.html: adding button for manual peer ping

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-09 08:29:07 +00:00
orbiter
fb7411d7bb re-structuring of ranking application:
concentration of all ranking attributes in the
plasmaSearchRankingProfile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1541 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-05 01:47:51 +00:00
orbiter
d98418390b - introduced rankingProfile Class
- selection of ranking and timing profiles for each search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 23:51:00 +00:00
allo
1f3eaf9f8e use DATA/HTDOCS for notifier.gif. Works even if htroot is readonly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-03 21:21:42 +00:00
orbiter
fa90c3ca7a - removed some usage of indexEntity
- changed index collection process: indexes are not first flushed to indexEntity,
  but now collected directly from ram cache and assortments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 12:42:06 +00:00
orbiter
03c65742ba changes towards the new index storage scheme:
- replaced usage of temporary IndexEntity by EntryContainer
- added more attributes to word index
- added exact-string search (using quotes in query)
- disabled writing into WORDS during search; EntryContainers are used instead


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-30 00:42:38 +00:00
rramthun
5942f6334c Some language fixes.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 20:32:25 +00:00
orbiter
f4ffa9aee5 - implemented more attributes to index entries
- implemented hand-over of new word index attributes during remote search
- implemented word-distance computation during search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-20 15:14:21 +00:00
borg-0300
c5b6154136 added CRDistOn = true/false
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-18 02:18:23 +00:00
borg-0300
8d8a40c2d9 added properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-18 00:03:28 +00:00
orbiter
cfd1e5e376 more security for index transfer protocol:
- allow only specific file names
- log IP number of accessing peer in case of attack attempts

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 22:19:18 +00:00
orbiter
423ce9bf59 quickfix for http://www.yacy-forum.de/viewtopic.php?p=15336#15336
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 21:50:40 +00:00
allo
5eba6c66c6 thelis fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-17 19:01:20 +00:00
rramthun
c59027e520 Translated status_p.inc a bit further, but it didn't work.
See http://www.yacy-forum.de/viewtopic.php?p=15180#15180

Added my seed to superseed.txt as I am now proud owner of a PC which runs YaCy most of the day.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-14 22:01:21 +00:00
orbiter
9544c47684 added some UTF-8 handling.
hope this will help somehow.. for shure not THE solution to our UTF-8 problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
orbiter
9086261476 refactoring of base64 encoding:
the kelondro database needs specific information about the order of
base64-encoded keys. Since no other package depends on base64
(only the httpd uses base64 for encryption, but does not need to encode these strings)
it is good to move base64 encoding to the new ordering classes in kelondro.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 00:39:00 +00:00
orbiter
b3dca06bb1 added location column to network pages.
The location is computed from the userAgent string of connecting peers.
Therefore this information is not available right after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-22 01:01:46 +00:00
orbiter
bb79fb5d91 - changed handling of error cases retrieving urls from database
(no more NULL values are returned, instead, an IOException is thrown)
- removed ugly damagedURLS implementation from plasmaCrawlLURL.java
  (this inserted a static value into the Object which is not really a good style)
- re-coded damagedURLS collection in yacy.java by catching an exception and evaluating the exception message
to do:
- the urldbcleanup feature must be re-tested


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-11 00:25:02 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
8f1f2daa5e implemented interactive link deletion of search results.
next steps: attach voting and restrict to administrator
to see the deletion button, move the mouse pointer to the left of a search result

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1172 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 16:15:21 +00:00
orbiter
7920e1547d code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:13:13 +00:00
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
orbiter
b9cc9029e3 added ybr selection for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 16:10:24 +00:00
theli
89fab9f200 *) Correcting Problems with lURLEntries containing null URLs.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 21:37:24 +00:00
theli
23dc904e0e *) Correcting Problems with lURLEntries containing null URLs.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 15:43:21 +00:00
theli
0610ff4fe9 *) small changes to crawlReceipt.java
- we do not know if the URL was stored in the noticeURL-DB with the old or new hash.
     therefore we now try to remove the URL from the noticeURL-DB using both hash values

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 08:40:49 +00:00
orbiter
e9d6defce6 qquickfix for http://www.yacy-forum.de/viewtopic.php?p=12638#12638
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-14 09:48:39 +00:00
orbiter
f763923e0a added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 08:02:46 +00:00
orbiter
d2731418bf added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:33:02 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
borg-0300
440e6ed747 see http://www.yacy-forum.de/viewtopic.php?t=1416
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1025 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 23:49:50 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
- solving problems with unkown certificates by implementing a dummy trust Manager
   - adding https support to robots-parser 
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
theli
f871408729 *) sharedBlacklist_p.java
- Setting Pragma: no-cache
   - increasing timeout to 12 sec.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 08:32:43 +00:00
theli
8194fde340 *) trying to continue transferRWI processing even if this error occures:
|> Caused by: de.anomic.kelondro.kelondroException: kelondroTree.searchproc: nullpointernull in db '.../urlHash.db' 
   - if URL existence can not be determined, we request it from the remote peer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@997 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-28 07:47:30 +00:00
orbiter
4dcbc26ef1 introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 17:50:27 +00:00
theli
7256bea45f *) Bugfix for nameLookup parameter handling
*) Bugfix for Received xx Words [xxxxxxx .. null] Bug



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 05:38:04 +00:00
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by there url hash
   Mainly for debugging purpose at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00