Commit Graph

214 Commits

Author SHA1 Message Date
orbiter
25192e0d36 added a deletion button to indexControlRWIs that deletes the complete web index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-24 12:30:50 +00:00
orbiter
4229cd275c fixed several details about network switching, default password, random password and localhost authentification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-20 09:29:01 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
fbb712c669 refactoring:
moved importer classes to crawler and plasma package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4770 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 13:44:38 +00:00
orbiter
1689030ee8 refactoring: moved all crawler classes into their own package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-06 00:32:41 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in a indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network was moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods, using static access to yacySeedDB had to be rewritten. A special problem had been all the port forwarding methods which had been tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding had been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect causes that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switcing are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
orbiter
32b5b057b9 - modified, simplified old kelondroHTCache object; I believe it should be replaced by something completely new
- removed tree data type in kelondroHTCache
- added new class kelondroHeap; may be the core for a storage object that will once replace the many-files strategy of kelondroHTCache
- removed compatibility mode in indexRAMRI


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-29 22:31:05 +00:00
orbiter
ff755fb858 small corrections and enhancements after search timing profiling
search should be a little bit faster now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 13:31:55 +00:00
orbiter
512f48e7d6 - removed unused methods
- fixed xss attack on peer list in CrawlStartSimple

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 03:33:07 +00:00
orbiter
202a3adb3e refactoring of HttpClient Writer processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 22:47:05 +00:00
orbiter
c3342e1178 - removed class with only one static method
- removed connection method with too long time-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 23:35:20 +00:00
danielr
7a35126e91 http timeouts von alten httpc wieder gesetzt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 11:02:14 +00:00
orbiter
2c1c3bb6eb - some refactoring (sorry Daniel, hab in deinem Code rumgewütet)
- fixed broken downloads (flush was missing)
- different problem handling when download is corrupted
- different default values in yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:36:33 +00:00
orbiter
696b8ee3f5 fix for http://forum.yacy-websuche.de/viewtopic.php?p=6806#p6806
- removed all InputStream.available() because this does not work for files > 2GB
- iterator terminate when a IOException occurs
- added handling of non-executing index.add methods to enhance assert usage
- added index for file indexes > 2GB, to be used in new indexHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 11:55:59 +00:00
orbiter
225f9fd429 various fixes
- shutdown behavior (killing of client sessions)
- EcoFS reading better
- another synchronization in balancer.size()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 13:12:58 +00:00
orbiter
a9cf6cf2f4 generalization of index container-heap class.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 20:31:16 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
783a4c9edb strong speed enhancements for the index cache dump and restore:
storage and loading is 30 times faster! a cache of 100000 RWIs needed 180 seconds
to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and
3 seconds to restore. The cache size has decreased now by 30% (95 MB instead of 150 MB).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-02 13:18:23 +00:00
orbiter
20dadba426 - added a deadlock prevention function in cache flushing
- removed unused methods in collection index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 17:51:51 +00:00
orbiter
3ce3a4a3a1 added stub for new index container heap data structure (purpose: index folding)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 22:58:42 +00:00
lulabad
c4c0d54b22 * added regex extended blacklistengine
* removed my own engines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 08:50:09 +00:00
lulabad
9fb5d661f2 added my Blacklistengines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4608 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-28 08:18:21 +00:00
orbiter
9b0e20fb06 next refactoring step in document indexing to prepare concurrency environment for document parsing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4604 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 19:51:05 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
fa1090113d - next try to fix the networking problem:
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 00:16:04 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
danielr
fbe335db73 consistent use of de.anomic.server.serverMemory to get information about memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4522 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-02 15:42:50 +00:00
orbiter
1dce2f1079 more multithreading support:
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
  the very basic kelondroRowCollection which supports sorting with a second thread
  in case that a double-core processing CPU is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 15:16:47 +00:00
orbiter
727feb4358 - fixed some bugs in ranking computation
- introduced generalized method to organize ranked results (2 new classes)
- added a post-ranking after snippet-fetch (before: only listed) using the new ranking data structures
- fixed some missing data fields in RWI ranking attributes and correct hand-over between data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 10:06:57 +00:00
orbiter
f4c73d8c68 - fixed highslide usage
- some enhancement to index management, better types

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 14:13:35 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiplr calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
efd5807a7c - some renaming of variables to support DC
- initial 120mb RAM for fresh peers
- release 0.57

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4445 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 22:58:40 +00:00
orbiter
a1e9e6e2e6 fix for search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:23:04 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
acf771d5e1 - fixed bug with too much RAM in crawler queue
- fixed dir bug
- better calculation of TF for join
- better waiting-on-result logic

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4424 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-31 23:40:47 +00:00
orbiter
a8a5df4a51 - more dublin core naming of page metadata
- better presentation of result counters in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 21:58:30 +00:00
orbiter
fa3b8f0ae1 fixed bug in remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 00:15:43 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
orbiter
974fea7933 added term-frequency ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4413 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 23:41:39 +00:00
orbiter
1a296af6ff more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 20:08:32 +00:00
orbiter
da8c850a25 disabled IO path optimization (seems to block other methods)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-26 00:21:37 +00:00
orbiter
85dc62c16f refactoring: more dublin core - compliant naming
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 19:03:47 +00:00
orbiter
f4e9ff6ce9 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 00:40:19 +00:00
borg-0300
53367d941a more information (BASE64)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:24:24 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
016fc594af more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 09:58:56 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00