Commit Graph

338 Commits

Author SHA1 Message Date
orbiter
2f1ff048ba some fixes to socket connection time-out
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 23:45:05 +00:00
orbiter
1488769e1f cleanup of unmaintained and outdated performance methods:
removed object pools in httpc. Object pooling is not recommended,
if the creation of the object is not time-intensive. Object pools are only useful,
if there is much computation necessary to create some basic data that is stored
in the object pool and can be re-used. This does not apply to object pools in YaCy.
Object pooling of client sessions would make sense if they would allow re-use of
living connections to other yacy clients. But every connection is closed after usage
of an object in the client pool, therefore the YaCy server client objects are not such
that hold hardware/network-allocated entities.
See:
http://www.javaperformancetuning.com/news/qotm033.shtml
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling
http://docs.sun.com/source/816-7159-10/pt_chap5.html
http://www.microjava.com/articles/techtalk/recylcle2


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 20:49:52 +00:00
orbiter
3cb9cdc9be try to fix connection problem, possible cause for wrong junior status and non-passive passive peers:
the YaCy client treats disconnections during data transmissions as error and discards all data transmitted so far
this did not happen so far until I removed a delay time at the end of the daemon session which prevented this case.
To fix this problem, disconnections during transmissions are not treated as error now, which means that end-of-transmissions
with sudden disconnections are not a cause for peer diconnections any more. To be nice to non-updated peers, the sleep time
at the end of server sessions is also re-enabled.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 17:31:29 +00:00
fuchsi
ae4b9308ef Fix problems with some web servers which couldn't handle the way yacy was sending requests. Thx to celle for the patch.
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=320

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-10 09:15:28 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
a34d9b8609 * added a search history cache that maintains search results for 10 minutes
it is necessary for the new search process that will do automatic re-searches
a positive effect is, that when a re-search is done it can be monitored how many
results had been contributed from other peers. The message for this contribution
was moved from the end of the result page to the top.
* enhanced re-search time when a global search was done an the local index has
already a great number of results for this word
* re-organised presearch computation; must be further enhanced

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-24 23:12:59 +00:00
orbiter
bb426565f0 added new yacy protocol for mass url-pull for better remote crawling distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-22 00:59:05 +00:00
low012
54004e929b *) Better Bourne-Shell (OpenSolaris) compatibility, update and restart really work now. As the Bourne-Shell is the grandfather of most modern shells, it should also work with Linux (tested with Mandriva, works) and OSX (Please test!).
*) Fixed a typo.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 21:52:52 +00:00
orbiter
344911bfaa shorter minimum delay values for intranet crawl targets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 23:18:12 +00:00
orbiter
b5346141b3 made the plasmaHTCache static (there is only one internet, so we need only one cache)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 21:31:31 +00:00
orbiter
757703a938 synchronization of access tracker to avoid java-internal loop in TreeMap during shutdown
see http://forum.yacy-websuche.de/viewtopic.php?p=1178#p1178

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-31 10:42:11 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain
A description how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
511dcbb172 fixed encoding bug made in SVN 3993
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3998 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-23 00:50:57 +00:00
orbiter
40b0547611 - documentaton changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
b6d9cca67e - fixed problem with yacyVersion and own version generation
- within this context: generalized date format handling
- extended Update interface:
 * a version lookup can be triggered manually
 * a complete lookup + download + re-boot process can be triggered with one click

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 23:47:21 +00:00
orbiter
5444b07674 fixed bug with decompression of index abstracts
this fixes a problem that occurred when searching for several words

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 12:39:16 +00:00
orbiter
924ae39170 replaced old map loading method with new implementation which is more robust against change of line termination methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 11:45:41 +00:00
orbiter
36a37f758b fix for oom exception during release download
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=101&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-03 22:55:47 +00:00
orbiter
21fabe259b another fix to the restart function; now tested under linux
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-01 22:43:08 +00:00
orbiter
28baecd41b another fix for the concurrentModificationException in AccessTracker
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-30 23:01:57 +00:00
orbiter
19786b73b6 next try for a better restart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 21:36:18 +00:00
orbiter
c5c268c43e tried to fix restart button
** kann das mal jemand auf seiner linux-platform testen **
** und feed-back geben ob der restart funktionier ? **

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 12:46:08 +00:00
orbiter
e03fcf4627 SSI fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=29
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 10:45:13 +00:00
orbiter
9bbd39b67c - removed unfinished auto-updater from roland and martin
- added new download-option for releases on the status page
still mising:
- thomas-style restart for linux/mac
- untar/gunzip on shell basis
(comes next)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-28 14:52:26 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
orbiter
0e57a8062b added network definition for different YaCy networks
(needs much more work)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-22 14:29:14 +00:00
michitux
25529290ca - 2 small changes in documentation
- hopefully fixed logging of GCs (in order to avoid things like "performed necessary GC, freed 18014398509481565 KB (requested/available/average: 4096 / 1631 / 2957 KB)") with the help of KoH


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-17 19:32:38 +00:00
orbiter
6518bb6c08 changed release strategy:
we will provide two different releases in the future, one standard release and one 'pro'-release.
the 'pro'-release contains all additional parsers AND has different default performance values.
The pro-version differs therefore from the previous 'all'-version by this default values.
The pro-configuration is automatically choosen if the libx-folder exists. If a version is once initialized, its configuration stays independently from an existing libx folder.
The ant targets had been changed. There are now 3 different targets to create standard and pro-releases, and one target to upgrade:
- dist: creates a standard release (only, no libx target any more)
- distPro: creates a pro-release (includes the libx)
- distExt: creates a libx-release which includes the libx-folder only. It may be used to upgrade from standard to pro
Furthermore, the naming of 'dev'-releases had been removed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-16 14:11:52 +00:00
orbiter
069562a14d fixed problem with re-crawl; replaced error file-db with ram-db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 23:47:08 +00:00
orbiter
c7a614830a several bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3899 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-15 17:45:49 +00:00
(no author)
2784820ee3 *) moving sleep to a better place
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 16:53:22 +00:00
theli
7a1b811d18 *) bugfix for SocketException:
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 15:58:10 +00:00
orbiter
2b937abef1 slighlty different behavior in shutdown sequence for http server threads:
- first close streams
- make pause (that one that was made in httpdFileHandler)
- close sockets

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 11:58:20 +00:00
karlchenofhell
e1d809d5f1 - more detailed logging of MEMORY messages
- forced GCs don't contribute to heuristics anymore

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 15:03:56 +00:00
orbiter
0b10ef64ba better server access tracking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 13:05:51 +00:00
orbiter
66ec8b63c1 added a httpd access tracker:
- all requests to the own httdp can now be listed in the access tracker menu
- the search statistics had been renamed to access tracker and extended by this tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 14:05:20 +00:00
karlchenofhell
8bff810d19 - fixed logging output of serverMemory.request()
- don't start up if DATA/yacy.running exists as this is usually a sign of an already started yacy-instance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 12:45:03 +00:00
karlchenofhell
f05ca43780 - the wiki-parser works for remote wiki-code now, not displaying links anymore as if they were local (ViewProfile comment)
- fixed wrong link to CrawlStart on Status-page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 11:35:48 +00:00
karlchenofhell
30c3d909b1 - fixed charset problem in ConfigProfil_p.html (use accept-charset="UTF-8" in forms)
- fixed wrong XML output if no peers are known in Network.xml
- simplified parsing of table properties in wikiCode and ZTableToken
- reimplemented GC heuristics. They are needed to constantly ensure that an amount of free memory is available which is higher than Java's max. limit for performing a Full GC (please use serverMemory.request(long, boolean) rather than serverMemory.available(long, boolean) to provide data for averaging over the last GCs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-05 11:37:19 +00:00
theli
e1a5babff1 *) Logging GUI handler: line-size is now set to max-size if max-size was exceeded
See: http://www.yacy-forum.de/viewtopic.php?p=36355

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-02 21:23:32 +00:00
(no author)
94cc9f05f5 *) Improvements for restart via update wrapper
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-02 15:25:13 +00:00
orbiter
33ad0c8246 added a web structure computation and logging:
- all web page parsing operations will now increase a web structure file
- the file is computed in memory and dumped at shutdown-time to PLASMASB/webStructure.map in readable form (not a database)
- the file can be used externally to analyse the link structure of the crawled pages
- the web structure can also be retrieved using a xml-interface at http://localhost:8080/xml/webstructure.xml
- the short-term purpose is the computation of a link-graph image (before linuxtag!)
- a long-term purpose could be a decentralized computation of the citation rank



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 08:13:48 +00:00
karlchenofhell
baa9402b97 - wiki-parser is now configurable via the config setting wikiParser.class which holds the class-name for the parser to use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 16:19:25 +00:00
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and it's license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
karlchenofhell
0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
- added 7zip parser
- added 'text/sgml' to realtime parseable mimetypes (sometimes returned by the mime type parser)
- added new cached output stream class, very suitable for parsers because of limited memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3740 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 23:13:44 +00:00
theli
b30e64daab *) passing homepath to serverLog.configureLogging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 13:04:26 +00:00
orbiter
b3f97b5c38 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3735 6c8d7289-2bf4-0310-a012-ef5d649a1542 2007-05-16 17:45:39 +00:00
karlchenofhell
086239da36 - added servlet: remote crawler queue overview
- added servlet: crawl profile editor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 10:11:25 +00:00
orbiter
2fa8b50e54 reverting svn 3691+3692
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 19:31:40 +00:00
orbiter
22a0e9f117 more timeout-control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 14:53:17 +00:00