Commit Graph

307 Commits

Author SHA1 Message Date
theli
7a1b811d18 *) bugfix for SocketException:
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 15:58:10 +00:00
orbiter
2b937abef1 slightly different behavior in shutdown sequence for http server threads:
- first close streams
- make pause (that one that was made in httpdFileHandler)
- close sockets

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-14 11:58:20 +00:00
karlchenofhell
e1d809d5f1 - more detailed logging of MEMORY messages
- forced GCs don't contribute to heuristics anymore

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3881 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 15:03:56 +00:00
orbiter
0b10ef64ba better server access tracking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-13 13:05:51 +00:00
orbiter
66ec8b63c1 added a httpd access tracker:
- all requests to the own httpd can now be listed in the access tracker menu
- the search statistics have been renamed to access tracker and extended by this tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 14:05:20 +00:00
karlchenofhell
8bff810d19 - fixed logging output of serverMemory.request()
- don't start up if DATA/yacy.running exists as this is usually a sign of an already started yacy-instance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3831 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-08 12:45:03 +00:00
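
The DATA/yacy.running check in commit 8bff810d19 can be illustrated with a minimal Java sketch. It is not the code from that commit: the class and method names are hypothetical, and the only detail taken from the log is the marker file name yacy.running inside the data directory.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the YaCy code base.
final class RunningMarker {

    // Name of the marker file, as mentioned in the commit message.
    static final String MARKER_NAME = "yacy.running";

    // Tries to create the marker file in dataDir.
    // File.createNewFile() checks for existence and creates the file atomically,
    // so it returns false when another instance has already left a marker.
    static boolean tryAcquire(File dataDir) throws IOException {
        return new File(dataDir, MARKER_NAME).createNewFile();
    }

    // Removes the marker on clean shutdown; returns true if a file was deleted.
    static boolean release(File dataDir) {
        return new File(dataDir, MARKER_NAME).delete();
    }
}
```

A start-up routine built on this sketch would refuse to continue when tryAcquire returns false. A marker left behind by a crash would then need manual removal, which is a limitation of this simple approach.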
karlchenofhell
f05ca43780 - the wiki-parser works for remote wiki-code now, not displaying links anymore as if they were local (ViewProfile comment)
- fixed wrong link to CrawlStart on Status-page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3816 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-07 11:35:48 +00:00
karlchenofhell
30c3d909b1 - fixed charset problem in ConfigProfil_p.html (use accept-charset="UTF-8" in forms)
- fixed wrong XML output if no peers are known in Network.xml
- simplified parsing of table properties in wikiCode and ZTableToken
- reimplemented GC heuristics. They are needed to constantly ensure that an amount of free memory is available which is higher than Java's max. limit for performing a Full GC (please use serverMemory.request(long, boolean) rather than serverMemory.available(long, boolean) to provide data for averaging over the last GCs)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-05 11:37:19 +00:00
theli
e1a5babff1 *) Logging GUI handler: line-size is now set to max-size if max-size was exceeded
See: http://www.yacy-forum.de/viewtopic.php?p=36355

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-02 21:23:32 +00:00
(no author)
94cc9f05f5 *) Improvements for restart via update wrapper
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-02 15:25:13 +00:00
orbiter
33ad0c8246 added a web structure computation and logging:
- all web page parsing operations will now increase a web structure file
- the file is computed in memory and dumped at shutdown-time to PLASMASB/webStructure.map in readable form (not a database)
- the file can be used externally to analyse the link structure of the crawled pages
- the web structure can also be retrieved using a xml-interface at http://localhost:8080/xml/webstructure.xml
- the short-term purpose is the computation of a link-graph image (before linuxtag!)
- a long-term purpose could be a decentralized computation of the citation rank



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-22 08:13:48 +00:00
karlchenofhell
baa9402b97 - wiki-parser is now configurable via the config setting wikiParser.class which holds the class-name for the parser to use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3742 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 16:19:25 +00:00
karlchenofhell
601fc7d1c5 - added source to J7Zip-modifed.jar and its license (changelog is still to come)
- moved HTML-*replace-methods from wikiCode to de.anomic.data.htmlTools
- prepared use of different wiki parsers as suggested here: http://www.yacy-forum.de/viewtopic.php?p=34444#34444

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-20 13:29:12 +00:00
karlchenofhell
0a64047081 - plasmaParserDocument can process subdocuments now (other archive-parsers may want to use this method)
- added 7zip parser
- added 'text/sgml' to realtime parseable mimetypes (sometimes returned by the mime type parser)
- added new cached output stream class, very suitable for parsers because of limited memory

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3740 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 23:13:44 +00:00
theli
b30e64daab *) passing homepath to serverLog.configureLogging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3738 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-18 13:04:26 +00:00
orbiter
b3f97b5c38 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 17:45:39 +00:00
karlchenofhell
086239da36 - added servlet: remote crawler queue overview
- added servlet: crawl profile editor

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 10:11:25 +00:00
orbiter
2fa8b50e54 reverting svn 3691+3692
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3696 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 19:31:40 +00:00
orbiter
22a0e9f117 more timeout-control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 14:53:17 +00:00
orbiter
24db55a541 added timeout for httpd-sockets during read
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 14:30:01 +00:00
orbiter
7f56c8d4aa fixed some seed selection details
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-07 22:22:35 +00:00
theli
0b5fc3c28c *) moving date functions to serverDate class
*) Sitemap-parser
   - logging added
   - parsing of modDate added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-06 12:36:49 +00:00
rramthun
d6811ac243 *) Moving tar.jar from libx to lib
*) Enhanced interface

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3649 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-04 19:46:23 +00:00
theli
469583ea80 *) new interface class. should be implemented by the updater to allow communication between the updater and yacy
(not yet functional)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-04 14:22:10 +00:00
theli
7c902996b5 *) changes required for the uploaderWrapper
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-28 16:04:37 +00:00
orbiter
595ee10468 fixed database inconsistency bugs
inserted many debug lines
added a huge number of asserts
extended database test methods


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-19 13:37:02 +00:00
orbiter
7a7a1c7c29 fight against problems with remove-methods and synchronization
- some bugs may have been fixed with wrong removal operations
- removed temporary storage of remove-positions and replaced by direct deletions
- changed synchronization
- added many asserts
- modified dbtest to also test remove during threaded stresstest

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-17 15:15:47 +00:00
(no author)
6186185775 *) Moved some comments to javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-14 10:11:37 +00:00
orbiter
fcdf000fbc bugfix for http://www.yacy-forum.de/viewtopic.php?p=33838#33838
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 22:08:40 +00:00
orbiter
ba2c307ab3 optimized memory allocation in kelondroRow.Entry
such an entry can now be instantiated without allocation of a new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 12:10:12 +00:00
orbiter
a5d668c0c6 added speed-buttons for easy performance setting
appears in crawl start and on indexing monitor page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 16:24:28 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of the newly split index, a huge number of classes needed to be adapted for new iterator classes
- added new iterator classes that support cloneable iterators
- adapted all iterator classes to implement cloneable iterators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
5d5e6ebfcc fix for http://www.yacy-forum.de/viewtopic.php?p=32631#32631
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 08:54:07 +00:00
orbiter
51e12049fa third generation of R/W head path optimization
- data from collection arrays are read in order
- merged data is written in order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-28 11:13:23 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- added deletion from loadedURLs of URLs to be blacklisted in IndexControl_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
dc0c06e43d PLEASE MAKE A BACK-UP OF YOUR COMPLETE DATA DIRECTORY BEFORE USING THIS
redesign for better IO performance
enhanced database seek-time by avoiding write operations at distant
positions of a database file. until now, a USEDC counter was written
at the head-section of a kelondroRecords database file (which is the
basic data structure of all kelondro database files) to store the
actual number of records that are contained in the database. Now, this
value is computed from the database file size. This is either done
only once at start-time, or continuously when run in asserts enabled.
The counter is then updated only in RAM, and written at close of the
file. If the close fails, the correct number can be computed from the
file size, and if this is not equal to the stored number it is strong
evidence that YaCy was not shut down properly.
To preserve consistency, the complete storage-routine had to be re-written.
Another change enhances read of nodes in some cases, where the data-tail
can be read together with the data-head. This saves another IO lookup during
each DB node fetch.
Includes also many small bugfixes.
IF ANYTHING GOES WRONG, ALL YOUR DATA IS LOST: PLEASE MAKE A BACK-UP

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 08:35:51 +00:00
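
The counter change described in commit dc0c06e43d can be illustrated with a minimal Java sketch. It assumes a fixed-size header followed by fixed-size records. The class name, method names, and parameters are hypothetical and are not taken from kelondroRecords, whose real file layout is not given in this log.

```java
// Hypothetical illustration, not part of the YaCy code base.
final class RecordCount {

    // Derives the number of records from the file size instead of reading
    // a stored counter. Assumes: fileSize = headerSize + n * recordSize.
    static long recordsFromFileSize(long fileSize, long headerSize, long recordSize) {
        if (recordSize <= 0) {
            throw new IllegalArgumentException("recordSize must be positive");
        }
        if (fileSize < headerSize) {
            throw new IllegalArgumentException("file is shorter than its header");
        }
        long payload = fileSize - headerSize;
        if (payload % recordSize != 0) {
            throw new IllegalStateException("file ends with a partial record");
        }
        return payload / recordSize;
    }

    // Returns true when the counter stored at close time disagrees with the
    // count derived from the file size, which hints at an unclean shutdown.
    static boolean looksUncleanlyClosed(long storedCount, long fileSize,
                                        long headerSize, long recordSize) {
        return storedCount != recordsFromFileSize(fileSize, headerSize, recordSize);
    }
}
```

Because the count follows from the file size, the counter only needs to live in RAM while the file is open, which avoids a write to the distant head section on every change.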
karlchenofhell
c016fcb10f - added streaming-support to CrawlURLFetchStack_p servlet
- bugfix for NPE in list.java
- use more constants

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-19 12:47:46 +00:00
orbiter
1f1f398bfa enhanced speed of RAM cache flush by factor 20 (twenty times faster)
- the speed was doubled by avoiding read access during the dump
- the speed was dramatically increased at least by factor 10
   by using a temporary ram-file where the structures are flushed to
   before it is dumped then as a whole byte-chunk to the file system.
The speed enhancements also affect some other parts of the database.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-08 23:21:46 +00:00
orbiter
7673f0869b minor enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:01:03 +00:00
orbiter
fcc11391a8 some redesign attempts because sorting of lastseen does not work correctly
not finished yet
target: better selection of peer-ping targets, which should enhance stabilization of the net

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 13:12:31 +00:00
orbiter
306c50ac40 QPM (queries per minute) statistic stub
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 15:39:11 +00:00
orbiter
7598e1243e removed unused variables/imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3306 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 09:28:47 +00:00
allo
98cb777e18 abstract wikiCode in putWiki
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3293 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 15:09:58 +00:00
karlchenofhell
15f0334cd3 - fixed IllegalThreadStateException in LogParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-21 14:45:52 +00:00
hydrox
814a09a0ed *) reversed r3250 and parts of r3252 (nanotime() is a Java 1.5 function)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3253 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 11:10:57 +00:00
hydrox
f7623f5d24 *) added missing measuring points for Parser-Runtime
*) changed precision of Parser-Runtime from ms to ns

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3250 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 09:25:04 +00:00
karlchenofhell
5d540b219e - LogalizerHandler skips interfaces again
- added LogParser stats to LogStatistics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 17:01:20 +00:00
allo
e1fb3550ab fix for profile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 14:52:51 +00:00
hydrox
6faf9b70b7 *) LogParserPLASMA now counts its total runtime.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 13:35:33 +00:00
hydrox
e5f854bc37 *) added LogalizerHandler-settings to yacy.logging.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 13:25:11 +00:00