577 Commits

Author SHA1 Message Date
orbiter
66c0a8e849 more PMD recommendations
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 22:18:38 +00:00
orbiter
909a4f91c7 added a logging output for crawl starts that shows the URL that can be used to start the crawl again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6566 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-11 18:10:39 +00:00
orbiter
dd459281c8 applied code changes that are recommended by PMD
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 23:09:48 +00:00
orbiter
a3b8b7b5c5 some redesign of the main menu structure:
- moved all index generation servlets to their own main menu item, including proxy indexing
- removed external index import because this operation is not recommended any more. Joining an index can simply be done by moving the index files from one peer to the other peer; they will be merged automatically
- fix to prevent endless loops when disconnecting http sessions
- fix to prevent application of bad blacklist entries that can cause a 'Dangling meta character' exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-01-10 00:10:43 +00:00
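The 'Dangling meta character' message mentioned above is what `java.util.regex` reports when a pattern starts with a quantifier such as `*`. The commit does not show how bad entries are rejected; the following is only a minimal sketch of one way to pre-check an entry, and the class and method names (`BlacklistCheck`, `isUsablePattern`) are hypothetical, not YaCy code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not taken from the YaCy sources.
final class BlacklistCheck {

    /** Returns true if the entry compiles as a regular expression. */
    static boolean isUsablePattern(String entry) {
        try {
            Pattern.compile(entry);
            return true;
        } catch (PatternSyntaxException e) {
            // e.g. "*.example.org" fails with "Dangling meta character '*'"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsablePattern(".*\\.example\\.org"));
        System.out.println(isUsablePattern("*.example.org"));
    }
}
```

Checking an entry once when it is added keeps the exception out of the matching path.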
orbiter
8281e29963 - more configuration for profiling graph (number of events)
- more logging for a shutdown: print reason and accessing IP into log


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6520 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-08 14:25:51 +00:00
orbiter
e34e63a039 preset of proper HashMap dimensions: should prevent re-hashing and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 14:01:19 +00:00
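As an illustration of the pre-sizing idea in the commit above: a `HashMap` re-hashes once its entry count exceeds capacity times load factor, so choosing the initial capacity from the expected entry count avoids that. This is a sketch assuming the default load factor of 0.75; `PresizedMaps`, `capacityFor` and `newMap` are hypothetical names, not YaCy code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, assuming the default HashMap load factor of 0.75.
final class PresizedMaps {

    /** Initial capacity that lets `expected` entries fit without a resize. */
    static int capacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    /** Creates a map dimensioned for the expected number of entries. */
    static <K, V> Map<K, V> newMap(int expected) {
        return new HashMap<>(capacityFor(expected));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = newMap(1000);
        for (int i = 0; i < 1000; i++) {
            counts.put("key" + i, i);
        }
        System.out.println(counts.size());
    }
}
```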
orbiter
4a5100789f replaced _all_ size() == 0 with isEmpty() and all size() > 0 with !isEmpty(). The isEmpty() method is much faster in some cases, especially when used to access badly balanced hashtables where a size() operation becomes a large iteration.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-02 00:37:59 +00:00
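The replacement described above can be sketched as follows. The example uses `ConcurrentLinkedQueue` from the JDK, whose `size()` is documented as not being a constant-time operation, as a stand-in for the collections the commit refers to; `EmptyCheck` and `hasWork` are hypothetical names.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical example, not taken from the YaCy sources.
final class EmptyCheck {

    /** Before: queue.size() > 0, which may walk the whole collection. */
    static boolean hasWork(Collection<?> queue) {
        return !queue.isEmpty();
    }

    public static void main(String[] args) {
        Collection<String> queue = new ConcurrentLinkedQueue<>();
        System.out.println(hasWork(queue));
        queue.add("http://example.org/");
        System.out.println(hasWork(queue));
    }
}
```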
orbiter
f4946eaf27 - better thread dump
- suppressed one server exception

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-01 22:53:36 +00:00
orbiter
9743b70d1c disabled keep-alive of server, not really needed for speed but a cause for much trouble and memory occupancy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6508 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-12-01 19:14:16 +00:00
orbiter
23aef43786 - better synchronization in SortStack
- better ThreadGroup organization
- fewer worker threads for media search (64 was too much...)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-20 14:35:33 +00:00
orbiter
013f337d3f - avoid unnecessary host name lookups for localhost
- avoid unnecessary reverse domain name lookups for remote access

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-16 23:00:54 +00:00
orbiter
18b21eaffe small fixes to search default values and server logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-06 19:13:35 +00:00
orbiter
4431b9767e added about 450 replacements for printStackTrace() methods to pipe such traces into the log at DATA/LOG/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 20:28:37 +00:00
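The kind of replacement described above might look like the sketch below, which uses `java.util.logging` directly. Which logging wrapper YaCy actually calls is not shown in the commit, so the logger name and the `parseOrZero` method are assumptions made for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example; the logger name is an assumption.
final class ExceptionLogging {
    private static final Logger LOG = Logger.getLogger("example.ExceptionLogging");

    /** Parses a number and logs the failure instead of printing the trace. */
    static int parseOrZero(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // before: e.printStackTrace();
            // Passing the exception lets the configured handlers write the trace.
            LOG.log(Level.WARNING, "cannot parse '" + s + "'", e);
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrZero("42"));
        System.out.println(parseOrZero("not a number"));
    }
}
```

With a file handler configured for the logger, the stack trace ends up in the log file rather than on standard error.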
orbiter
e3025ee691 - new icon for OAI-PMH loading action
- added many stack trace outputs for exceptions in crawl profile handler to find the 'missing profile handle' bug
- caught one more timeout exception in httpd file loader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 16:40:15 +00:00
orbiter
f0b8db93f0 - more abstraction of serverCore thread access
- no more keep-alive when number of connections exceeds 1/2 of the allowed number of connections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-05 14:54:43 +00:00
orbiter
2889b9426e missing code for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 12:03:19 +00:00
orbiter
b6a8887ff5 better handling of running sessions without explicit hashtable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:59:15 +00:00
orbiter
1dc7ea986a added a dynamic keep-alive time-out for http server sessions:
if there are many concurrent server sessions, the timeout is decreased.
This should avoid a situation where the clean-up thread is too
late to stop running http sessions that should be terminated
before the maximum number of server sessions is reached.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:01:09 +00:00
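A dynamic keep-alive timeout of the kind described above could be computed as in this sketch. The commit does not state the formula; the linear scaling used here, and the names `KeepAlivePolicy` and `timeoutMillis`, are assumptions made for illustration.

```java
// Hypothetical policy; the linear scaling is an assumption, not YaCy's formula.
final class KeepAlivePolicy {

    /**
     * Scales the base timeout by the share of free session slots,
     * so a busy server releases idle sessions sooner.
     */
    static long timeoutMillis(int activeSessions, int maxSessions, long baseTimeoutMillis) {
        if (maxSessions <= 0 || activeSessions >= maxSessions) {
            return 0L; // no free slots: do not keep connections alive
        }
        int free = maxSessions - Math.max(0, activeSessions);
        return baseTimeoutMillis * free / maxSessions;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(0, 100, 10_000L));
        System.out.println(timeoutMillis(90, 100, 10_000L));
    }
}
```

The point of scaling by load is that idle sessions expire on their own before the session limit is reached, instead of depending on a clean-up thread to catch them in time.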
orbiter
77c99e500f added more control over memory allocation
should avoid some of the OOMs

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-27 15:25:48 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
low012
8829ec5f18 *) made sure that   is replaced with a space and not just deleted in CharacterCoding.java
*) added annotations and made minor changes to serverObjects.java
*) set subversion properties for several files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-13 20:57:56 +00:00
orbiter
e7f18ba24b refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:24:42 +00:00
orbiter
ce8dc575ca refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:12:19 +00:00
orbiter
bea3b99aff moved table and util classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:14:19 +00:00
orbiter
f677d534b1 start of a really extensive refactoring which will produce a hierarchical package structure with the domain yacy.net as package root
- moved here the logging classes as part of the new net.yacy.kelondro package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 23:13:30 +00:00
orbiter
ea473e32b8 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6390 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-09 22:27:50 +00:00
orbiter
09de5da74a once again a performance hack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-08 18:26:54 +00:00
orbiter
6aa474f529 - better logging for web cache access and fail reasons
- better Exception handling for web cache access
- distinction between access of web cache for proxy and crawler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-01 13:08:19 +00:00
orbiter
58a00205d5 re-activated the emergency close when too many server connections exist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 14:29:43 +00:00
orbiter
c57d2070e6 more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 13:25:08 +00:00
orbiter
a995b95367 tried a fix for the httpd access bug (too many unclosed sessions)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-30 13:18:02 +00:00
orbiter
23ab6fbca4 - navigation appear at correct position when opengeodb-results are also presented after a search
- show an about box if about.headline and about.body is set

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-20 22:10:45 +00:00
orbiter
68465c37af added a convenience class to add files into a YaCy index
to make this possible, the yacyURL must be able to process file:// urls, which has also been implemented
testing of the new class resulted in some bugfixes in other classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-09-14 21:17:42 +00:00
orbiter
1a9cfd8718 some performance hacks (CPU only, not IO)
this will cause better computation speed for single- and multi-core;
there are enhancements that will speed up old and slow machines as well
as multi-core CPUs. Indexing of surrogates has been sped up
from 4000 PPM to over 20000 PPM on a simple dual core office computer.
Since the enhancements are mostly in core routines, the hack should also
speed up search performance.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-28 13:28:11 +00:00
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning of the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue also becomes superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). As a result, all servlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that the indexing queue mostly appeared empty, so there was no use for it any more. Because the queue has been removed, the display in the servlets also had to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed their own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the properties had to be moved to the queue entry object which is part of the new indexing queue within the blocking queue, the Response Object. That object now also has the properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
499723891d removed all non-http daemons; they had not been used and may be a potential security risk.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6185 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 22:24:34 +00:00
orbiter
dafffd0153 refactoring of parsers and document processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-08 21:48:08 +00:00
orbiter
aac89bf8ca trying to avoid "exceeding limit" message of server
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 15:47:42 +00:00
orbiter
409538e17a code cleanup and code simplification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 22:20:55 +00:00
orbiter
1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:25:46 +00:00
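For context on the commit above: Java compilers before Java 11 generated hidden synthetic accessor methods whenever a nested class touched a private member of its enclosing class. Widening the member to package-private lets the nested class access it directly. The sketch below shows the pattern; `Counter` and `Incrementer` are hypothetical names, not YaCy classes.

```java
// Hypothetical example of the visibility change, not taken from the YaCy sources.
final class Counter {
    // Package-private instead of private: the inner class reads and writes
    // the field directly, so older compilers need no synthetic accessor.
    int hits;

    final class Incrementer implements Runnable {
        @Override
        public void run() {
            hits++;
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        counter.new Incrementer().run();
        System.out.println(counter.hits);
    }
}
```

The trade-off is weaker encapsulation within the package in exchange for one less method call on hot paths.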
orbiter
d1083a6913 maybe we have fewer problems with open connections to the server if we don't do BF forced sleeps (just a test)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-28 10:35:12 +00:00
f1ori
7eb3bff5b3 * workaround for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2220&hilit=#p16128
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6143 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-26 14:05:39 +00:00
orbiter
bdda140c02 fix for json output (no doublequotes any more, doublequote quoting did not work)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-20 23:33:29 +00:00
orbiter
fd31a3616a - more logging in server process
- fix for bad ascii in comment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-16 15:10:59 +00:00
orbiter
ce1adf9955 serialized all logging using concurrency:
high-performance search query situations, as seen in the yacy-metager integration, showed a deadlock situation caused by synchronization effects inside sun.java code. It appears that the logger is not completely safe against deadlocks in concurrent calls. One possible solution would be an outside synchronization with 'synchronized' statements, but that would add further blocking to all high-efficiency methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off log entries with a concurrent log writer. This also disconnects IO operations from logging calls, since writing a log to a file can itself cause IO operations. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods to realize non-blocking logging.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 21:19:54 +00:00
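The non-blocking hand-over described above can be sketched with a queue and a single writer thread. This is not the code from the commit: the class name `AsyncLog`, the list used as a stand-in for the log file, and the stop marker are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a queue-based logger, not the YaCy implementation.
final class AsyncLog {
    // Unique marker object; compared by identity to signal shutdown.
    private static final String STOP = new String("<stop>");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink; // stand-in for the log file
    private final Thread writer;

    AsyncLog(List<String> sink) {
        this.sink = sink;
        this.writer = new Thread(this::drain, "log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Hands the line over; the unbounded queue never blocks the caller. */
    void log(String line) {
        queue.offer(line);
    }

    /** Writes out everything queued so far, then stops the writer thread. */
    void close() {
        queue.offer(STOP);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Only this thread touches the sink, so callers never wait on IO.
    private void drain() {
        try {
            for (String line = queue.take(); line != STOP; line = queue.take()) {
                sink.add(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        AsyncLog log = new AsyncLog(lines);
        log.log("search started");
        log.log("search finished");
        log.close();
        System.out.println(lines);
    }
}
```

Because callers only enqueue, they hold no lock shared with the writer, which is what removes the deadlock risk the commit message describes. An unbounded queue trades that safety for memory if the writer falls behind.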
orbiter
0fc1168554 - reduced time-out for socket-connection communication from 20 seconds to 5 seconds. This is a test to find out if the time-out was a cause for problems in metager environments
- turned a fine log entry in case of rejected connections on the server socket into a warning. (look for 'exceeding limit')


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-11 10:20:31 +00:00
orbiter
1c77db670f re-designed response format for navigation:
- changed json and rss response templates


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6019 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 10:54:49 +00:00
orbiter
e735d3a69f fix for http://forum.yacy-websuche.de/viewtopic.php?p=15175#p15175
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-26 15:03:50 +00:00