Commit Graph

350 Commits

Author SHA1 Message Date
orbiter
2fcd18a972 - fixed bad behaviour of search event worker processes
- fixed export of url lists in xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-23 01:08:16 +00:00
orbiter
6f1308da2f - some enhancements to IndexControlURLs (shows more links, connects referrer to another query)
- some refactoring to search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-17 01:53:02 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- Previously, absolute paths were expanded incorrectly, e.g. fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be moved to a location outside of YaCy's root dir without having to use symlinks (probably good for third-party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the application's root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reads caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
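
The path handling described in the commit above can be sketched roughly as follows. This is only an illustration: the class ConfigPaths and the method resolve are hypothetical names, and the actual abstractServerSwitch.getConfigPath(setting, default) may behave differently in detail.

```java
import java.io.File;

// Hypothetical helper, not the YaCy implementation: illustrates resolving a
// configured path either as absolute or relative to the application root.
final class ConfigPaths {
    private ConfigPaths() {}

    /**
     * Returns the configured value as a File. Absolute values are used as-is,
     * relative values are placed under rootPath. A missing or empty value
     * falls back to defaultValue.
     */
    static File resolve(File rootPath, String configured, String defaultValue) {
        String value = (configured == null || configured.trim().isEmpty())
                ? defaultValue
                : configured.trim();
        File f = new File(value);
        return f.isAbsolute() ? f : new File(rootPath, value);
    }
}
```

With this sketch, a setting such as fooPath=/a/b/c stays /a/b/c on a Unix-like system, while a relative value such as DATA/WORK ends up below the root directory.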
borg-0300
e8d32d9f62 other loglevel
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4195 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-02 16:06:54 +00:00
borg-0300
a5d28785b1 less OOM (works for me)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4194 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-02 14:55:46 +00:00
hermens
18144043e6 Correct UTC Offset at beginning/end of daylight savings time
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4185 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-30 19:20:02 +00:00
orbiter
a31b9097a4 preparations for mass remote crawls:
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
  crawls can contain unchecked urls. Each peer must check the robots.txt itself so that it cannot be misused
  as a crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls

After removal of robots.txt checking from stacker threads, the multi-threading of this process is pointless.
Multithreading has been removed. The thread pools for the crawl threads have also been removed, since
creation of these threads is not resource-consuming; for a detailed explanation see svn 4106

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 01:43:20 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) methods have been removed; this is now done with simple put(...) (see above).
- putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'). This was done by put(...) before and so was called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_; report anything overlooked at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
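
The division of labour between put, putNum and putHTML described in the commit above might look like the following sketch. TemplateValues is a hypothetical stand-in for serverObjects; the method names follow the commit message, but the bodies, the constructor and the get method are assumptions made for illustration.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for serverObjects; bodies are assumptions.
final class TemplateValues {
    private final Map<String, String> values = new HashMap<>();
    private final NumberFormat numberFormat;

    TemplateValues(Locale locale) {
        this.numberFormat = NumberFormat.getIntegerInstance(locale);
    }

    // Stores the value unchanged.
    void put(String key, String value) { values.put(key, value); }

    // Stores a number as a plain String, without locale formatting.
    void put(String key, long value) { values.put(key, Long.toString(value)); }

    // Stores a number formatted for the configured locale.
    void putNum(String key, long value) { values.put(key, numberFormat.format(value)); }

    // Stores the value with < > & " replaced by entities.
    void putHTML(String key, String value) { values.put(key, escape(value)); }

    String get(String key) { return values.get(key); }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

The point of the split is that escaping and formatting happen only where a page asks for them, instead of on every stored value.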
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
orbiter
711641f167 extended client connection clean-up:
there are now two time-outs, one for the complete connection time, and one for idle time
connections that are idle for more than 2 minutes are closed, and connections that have been open for more than one hour are also closed
if the total number of connections exceeds 64, the connections beyond 64 that have the most idle time are also closed

During normal operation of peers these forced closings should never appear,
but the existence of the idle connection check ensures the availability of the peer and the usability of the host.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:06:12 +00:00
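
One possible reading of the clean-up rules in the commit above, as a sketch. The class and field names are hypothetical, and the order of the steps (time-outs first, then trimming the most idle survivors down to 64) is an assumption, not taken from the YaCy source.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the clean-up policy; not the YaCy implementation.
final class ConnectionCleanup {
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;   // 2 minutes idle
    static final long MAX_AGE_MS = 60 * 60 * 1000L;   // 1 hour total lifetime
    static final int MAX_CONNECTIONS = 64;

    static final class Conn {
        final String id;
        final long openedAt;      // time the connection was opened (ms)
        final long lastActiveAt;  // time of the last activity (ms)

        Conn(String id, long openedAt, long lastActiveAt) {
            this.id = id;
            this.openedAt = openedAt;
            this.lastActiveAt = lastActiveAt;
        }
    }

    /** Returns the connections that should be closed at time now. */
    static List<Conn> selectForClosing(List<Conn> open, long now) {
        List<Conn> toClose = new ArrayList<>();
        List<Conn> survivors = new ArrayList<>();
        for (Conn c : open) {
            boolean idleTooLong = now - c.lastActiveAt > MAX_IDLE_MS;
            boolean aliveTooLong = now - c.openedAt > MAX_AGE_MS;
            if (idleTooLong || aliveTooLong) toClose.add(c); else survivors.add(c);
        }
        // Most idle first: the smallest lastActiveAt has been idle longest.
        survivors.sort(Comparator.comparingLong((Conn c) -> c.lastActiveAt));
        int excess = survivors.size() - MAX_CONNECTIONS;
        for (int i = 0; i < excess; i++) toClose.add(survivors.get(i));
        return toClose;
    }
}
```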
orbiter
b19bb6e5b1 - reverted svn 4132; this did not solve the problem and removed the emergency method, which caused production failure for sure within some hours
- removed and added some debugging lines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 14:34:05 +00:00
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced old first hash computation by new method that tries to find a gap in the current DHT
to do this, it is necessary that the network bootstrapping is done before the own hash is computed
this made further redesigns in peer initialization order necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
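
The commit above does not say how the gap is found. A minimal sketch of one way to do it, assuming peer positions are modelled as numbers on a ring [0, 1): sort the known positions and place the new peer in the middle of the largest gap. DhtGap and its method are hypothetical, and the real position calculation in YaCy may differ.

```java
import java.util.Arrays;

// Hypothetical sketch; positions are modelled as doubles on the ring [0, 1).
final class DhtGap {
    private DhtGap() {}

    /** Returns the midpoint of the largest gap between known peer positions. */
    static double midpointOfLargestGap(double[] peerPositions) {
        if (peerPositions.length == 0) return 0.5; // arbitrary choice for an empty network
        double[] p = peerPositions.clone();
        Arrays.sort(p);
        // Start with the wrap-around gap from the last position back to the first.
        double bestStart = p[p.length - 1];
        double bestGap = p[0] + 1.0 - p[p.length - 1];
        for (int i = 0; i + 1 < p.length; i++) {
            double gap = p[i + 1] - p[i];
            if (gap > bestGap) {
                bestGap = gap;
                bestStart = p[i];
            }
        }
        double mid = bestStart + bestGap / 2.0;
        return mid >= 1.0 ? mid - 1.0 : mid; // wrap back onto the ring
    }
}
```

This also shows why bootstrapping has to come first: without the positions of other peers there is no gap to look for.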
orbiter
2f1ff048ba some fixes to socket connection time-out
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 23:45:05 +00:00
orbiter
1488769e1f cleanup of unmaintained and outdated performance methods:
removed object pools in httpc. Object pooling is not recommended,
if the creation of the object is not time-intensive. Object pools are only useful,
if there is much computation necessary to create some basic data that is stored
in the object pool and can be re-used. This does not apply to object pools in YaCy.
Object pooling of client sessions would make sense if they allowed re-use of
living connections to other yacy clients. But every connection is closed after usage
of an object in the client pool, therefore the YaCy server client objects do not
hold hardware/network-allocated entities.
See:
http://www.javaperformancetuning.com/news/qotm033.shtml
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling
http://docs.sun.com/source/816-7159-10/pt_chap5.html
http://www.microjava.com/articles/techtalk/recylcle2


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 20:49:52 +00:00
orbiter
3cb9cdc9be try to fix connection problem, possible cause for wrong junior status and non-passive passive peers:
the YaCy client treats disconnections during data transmissions as errors and discards all data transmitted so far
this did not happen until I removed a delay time at the end of the daemon session which prevented this case.
To fix this problem, disconnections during transmissions are not treated as errors now, which means that end-of-transmissions
with sudden disconnections are not a cause for peer disconnections any more. To be nice to non-updated peers, the sleep time
at the end of server sessions is also re-enabled.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 17:31:29 +00:00
fuchsi
ae4b9308ef Fix problems with some web servers which couldn't handle the way yacy was sending requests. Thx to celle for the patch.
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=320

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 09:15:28 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since it had been decided to give up some parts, they were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
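
The idea of attaching the hash to the URL object can be sketched as follows. HashedUrl is a hypothetical class and the hasher is passed in as a function purely for illustration; the joined URL class in YaCy is not shown in this log and may be structured differently.

```java
import java.util.function.Function;

// Hypothetical sketch: a URL that carries its hash, so the expensive
// computation (with its DNS lookup) runs at most once per object,
// or not at all when the hash is already known from the database.
final class HashedUrl {
    private final String url;
    private String hash; // null until known

    HashedUrl(String url) {
        this(url, null);
    }

    // Used when the URL is loaded from the database together with its hash.
    HashedUrl(String url, String knownHash) {
        this.url = url;
        this.hash = knownHash;
    }

    String url() {
        return url;
    }

    // Computes the hash only if it has not been filled in yet.
    String hash(Function<String, String> expensiveHasher) {
        if (hash == null) hash = expensiveHasher.apply(url);
        return hash;
    }
}
```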
orbiter
a34d9b8609 * added a search history cache that maintains search results for 10 minutes
it is necessary for the new search process that will do automatic re-searches
a positive effect is that when a re-search is done it can be monitored how many
results have been contributed from other peers. The message for this contribution
was moved from the end of the result page to the top.
* enhanced re-search time when a global search was done and the local index
already has a great number of results for this word
* re-organised presearch computation; must be further enhanced

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 23:12:59 +00:00
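
A cache that keeps search results for 10 minutes could be sketched like this. SearchHistoryCache and its methods are hypothetical; the time is passed in explicitly to keep the example deterministic, which is an illustration choice and not something the commit describes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a time-limited result cache; not the YaCy class.
final class SearchHistoryCache<V> {
    static final long TTL_MS = 10 * 60 * 1000L; // 10 minutes

    private static final class Entry<T> {
        final T value;
        final long storedAt;

        Entry(T value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry<V>> entries = new LinkedHashMap<>();

    synchronized void put(String query, V results, long now) {
        entries.put(query, new Entry<>(results, now));
    }

    // Returns the cached results, or null if absent or older than TTL_MS.
    synchronized V get(String query, long now) {
        Entry<V> e = entries.get(query);
        if (e == null) return null;
        if (now - e.storedAt > TTL_MS) {
            entries.remove(query);
            return null;
        }
        return e.value;
    }

    // Drops all expired entries; could be called periodically.
    synchronized void cleanup(long now) {
        Iterator<Entry<V>> it = entries.values().iterator();
        while (it.hasNext()) {
            if (now - it.next().storedAt > TTL_MS) it.remove();
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```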
orbiter
bb426565f0 added new yacy protocol for mass url-pull for better remote crawling distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-22 00:59:05 +00:00
low012
54004e929b *) Better Bourne-Shell (OpenSolaris) compatibility, update and restart really work now. As the Bourne-Shell is the grandfather of most modern shells, it should also work with Linux (tested with Mandriva, works) and OSX (Please test!).
*) Fixed a typo.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 21:52:52 +00:00
orbiter
344911bfaa shorter minimum delay values for intranet crawl targets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 23:18:12 +00:00
orbiter
b5346141b3 made the plasmaHTCache static (there is only one internet, so we need only one cache)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 21:31:31 +00:00
orbiter
757703a938 synchronization of access tracker to avoid java-internal loop in TreeMap during shutdown
see http://forum.yacy-websuche.de/viewtopic.php?p=1178#p1178

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-31 10:42:11 +00:00
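
java.util.TreeMap is not thread-safe, and unsynchronized concurrent modification can leave its internal tree in a state where a traversal never terminates. A minimal sketch of guarding such a map, assuming all access goes through synchronized methods; AccessLog and its methods are hypothetical and not the actual tracker code.

```java
import java.util.TreeMap;

// Hypothetical sketch: every access to the TreeMap holds the same lock.
final class AccessLog {
    private final TreeMap<Long, String> accesses = new TreeMap<>();

    synchronized void track(long time, String host) {
        accesses.put(time, host);
    }

    // Counts entries with a timestamp >= since.
    synchronized int countSince(long since) {
        return accesses.tailMap(since, true).size();
    }

    // Removes entries with a timestamp < cutoff.
    synchronized void dropOlderThan(long cutoff) {
        accesses.headMap(cutoff, false).clear();
    }
}
```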
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain
A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
511dcbb172 fixed encoding bug made in SVN 3993
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3998 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-23 00:50:57 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
b6d9cca67e - fixed problem with yacyVersion and own version generation
- within this context: generalized date format handling
- extended Update interface:
 * a version lookup can be triggered manually
 * a complete lookup + download + re-boot process can be triggered with one click

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 23:47:21 +00:00
orbiter
5444b07674 fixed bug with decompression of index abstracts
this fixes a problem that occurred when searching for several words

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 12:39:16 +00:00
orbiter
924ae39170 replaced old map loading method with new implementation which is more robust against change of line termination methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 11:45:41 +00:00
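
The commit above does not show the new loader. One way to make key=value loading independent of the line terminator is to split on all three common forms, as in this sketch; MapLoader is a hypothetical name, and the comment and separator conventions are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parses key=value lines regardless of whether the
// text uses \n, \r\n or \r as line terminator.
final class MapLoader {
    private MapLoader() {}

    static Map<String, String> load(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String rawLine : text.split("\\r\\n|\\r|\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // no key present
            map.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return map;
    }
}
```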
orbiter
36a37f758b fix for oom exception during release download
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=101&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-03 22:55:47 +00:00
orbiter
21fabe259b another fix to the restart function; now tested under linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-01 22:43:08 +00:00
orbiter
28baecd41b another fix for the concurrentModificationException in AccessTracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-30 23:01:57 +00:00
orbiter
19786b73b6 next try for a better restart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 21:36:18 +00:00
orbiter
c5c268c43e tried to fix restart button
** could someone please test this on their linux platform **
** and give feedback on whether the restart works? **

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 12:46:08 +00:00
orbiter
e03fcf4627 SSI fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=29
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 10:45:13 +00:00
orbiter
9bbd39b67c - removed unfinished auto-updater from roland and martin
- added new download-option for releases on the status page
still missing:
- thomas-style restart for linux/mac
- untar/gunzip on shell basis
(comes next)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-28 14:52:26 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
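
Recognising the include directive named in the commit above can be sketched with a regular expression. SsiSketch, its pattern and the loader function are hypothetical; the actual parser streams its output with chunked transfer, which this example leaves out.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replaces <!--# include virtual="..." --> directives
// with the content returned by a loader function.
final class SsiSketch {
    private static final Pattern INCLUDE =
            Pattern.compile("<!--#\\s*include\\s+virtual=\"([^\"]*)\"\\s*-->");

    private SsiSketch() {}

    static String expand(String page, Function<String, String> loader) {
        Matcher m = INCLUDE.matcher(page);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the included text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(loader.apply(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```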
orbiter
0e57a8062b added network definition for different YaCy networks
(needs much more work)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-22 14:29:14 +00:00
michitux
25529290ca - 2 small changes in documentation
- hopefully fixed logging of GCs (in order to avoid things like "performed necessary GC, freed 18014398509481565 KB (requested/available/average: 4096 / 1631 / 2957 KB)") with the help of KoH


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-17 19:32:38 +00:00
orbiter
6518bb6c08 changed release strategy:
we will provide two different releases in the future, one standard release and one 'pro'-release.
the 'pro'-release contains all additional parsers AND has different default performance values.
The pro-version therefore differs from the previous 'all'-version by these default values.
The pro-configuration is automatically chosen if the libx-folder exists. Once a version is initialized, its configuration stays the same, independently of an existing libx folder.
The ant targets have been changed. There are now 3 different targets: two to create standard and pro-releases, and one to upgrade:
- dist: creates a standard release (only, no libx target any more)
- distPro: creates a pro-release (includes the libx)
- distExt: creates a libx-release which includes the libx-folder only. It may be used to upgrade from standard to pro
Furthermore, the naming of 'dev'-releases has been removed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-16 14:11:52 +00:00
orbiter
069562a14d fixed problem with re-crawl; replaced error file-db with ram-db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 23:47:08 +00:00
orbiter
c7a614830a several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3899 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 17:45:49 +00:00
(no author)
2784820ee3 *) moving sleep to a better place
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 16:53:22 +00:00
theli
7a1b811d18 *) bugfix for SocketException:
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 15:58:10 +00:00
orbiter
2b937abef1 slightly different behavior in shutdown sequence for http server threads:
- first close streams
- pause (the pause that was previously made in httpdFileHandler)
- close sockets

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 11:58:20 +00:00
karlchenofhell
e1d809d5f1 - more detailed logging of MEMORY messages
- forced GCs don't contribute to heuristics anymore

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 15:03:56 +00:00
orbiter
0b10ef64ba better server access tracking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 13:05:51 +00:00
orbiter
66ec8b63c1 added a httpd access tracker:
- all requests to the own httpd can now be listed in the access tracker menu
- the search statistics have been renamed to access tracker and extended by this tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 14:05:20 +00:00
karlchenofhell
8bff810d19 - fixed logging output of serverMemory.request()
- don't start up if DATA/yacy.running exists as this is usually a sign of an already started yacy-instance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 12:45:03 +00:00
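
The start-up guard described in the commit above can be sketched with a marker file. RunningMarker and tryStart are hypothetical names; only the file name yacy.running comes from the commit, and removing the marker on JVM exit is an assumption about how it would be cleaned up.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: refuses to start if the marker file already exists.
final class RunningMarker {
    private RunningMarker() {}

    /** Returns true if the marker was created, false if it already existed. */
    static boolean tryStart(File dataDir) {
        File marker = new File(dataDir, "yacy.running");
        try {
            // createNewFile checks and creates in one atomic step.
            if (!marker.createNewFile()) return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        marker.deleteOnExit(); // assumption: marker is removed on normal JVM exit
        return true;
    }
}
```

A marker left behind by a crash would block the next start in this sketch, which is why the commit calls the file only "usually" a sign of a running instance.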
karlchenofhell
f05ca43780 - the wiki-parser works for remote wiki-code now, not displaying links anymore as if they were local (ViewProfile comment)
- fixed wrong link to CrawlStart on Status-page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 11:35:48 +00:00