Commit Graph

5974 Commits

Author SHA1 Message Date
orbiter
6354b5e447 removed possible deadlock, see
http://forum.yacy-websuche.de/viewtopic.php?p=17017#p17017

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 12:04:14 +00:00
orbiter
5cc17ccf8a a better caching with less overhead and more appropriate
synchronisation use in more than 10 different data objects

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6250 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 11:55:32 +00:00
orbiter
2e01bd955d wrong display of hints / hints wrong / incomplete
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 11:32:41 +00:00
orbiter
39ae96450b draw more peers in network picture
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-07 08:36:15 +00:00
orbiter
92edd24e70 fixed problem with switching of networks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-30 15:49:23 +00:00
orbiter
0575f12838 fix for deadlock
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6246 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-30 09:08:44 +00:00
orbiter
fbfdaf063d - patch to avoid an IndexOutOfBoundsException when a b64-encoded key appears not to be well-formed. In that case the key is still accepted but rated higher than other regular keys to create a virtual ordering between well-formed and ill-formed keys
- check routine at the beginning of the import of table keys that checks that all imported keys are well-formed. All records that have an ill-formed key are deleted. This is a hack and is not tested since I don't have bad data here to test with. If the effect is seen in the wild, please report in the forum.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-29 19:43:11 +00:00
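The "virtual ordering" described in the commit above can be illustrated with a small comparator. This is only a sketch under stated assumptions, not the code from that revision: the class name `KeyOrderSketch`, the method `wellFormed`, and the 64-character `ALPHABET` are all hypothetical, and the alphabet YaCy actually uses may differ. The idea shown is that an ill-formed key is not rejected and does not raise an exception; it simply sorts after every well-formed key.

```java
import java.util.Comparator;

class KeyOrderSketch {
    // Assumed 64-character alphabet, for illustration only.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // A key counts as well-formed here if every character is in the alphabet.
    static boolean wellFormed(String key) {
        for (int i = 0; i < key.length(); i++) {
            if (ALPHABET.indexOf(key.charAt(i)) < 0) return false;
        }
        return true;
    }

    // Well-formed keys are ordered by alphabet position; ill-formed keys are
    // still accepted but rank above all well-formed keys.
    static final Comparator<String> ORDER = (a, b) -> {
        boolean wa = wellFormed(a);
        boolean wb = wellFormed(b);
        if (wa != wb) return wa ? -1 : 1;   // ill-formed sorts last
        if (!wa) return a.compareTo(b);     // both ill-formed: plain string order
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int d = ALPHABET.indexOf(a.charAt(i)) - ALPHABET.indexOf(b.charAt(i));
            if (d != 0) return d;
        }
        return Integer.compare(a.length(), b.length());
    };
}
```

Sorting a mixed array with `java.util.Arrays.sort(keys, KeyOrderSketch.ORDER)` would then place any ill-formed keys at the end instead of failing on them.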
orbiter
65b1d51e70 added xml version of windows office test files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-27 12:45:15 +00:00
orbiter
c0e17de2fb - fixes for some problems with the new crawling/caching strategies
- speed enhancements for the cache-only cache policy by using special no-delay rules in the balancer
- fixed some deadlock- and 100% CPU problems in the balancer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-25 21:38:57 +00:00
orbiter
634a01a9a4 replaced wget-requests with caching requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 14:52:27 +00:00
orbiter
c6c97f23ad - added cache usage properties to crawl start
- added special rule to balancer to omit forced delays if cache is used exclusively
- extended the htCache size by default to 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-24 11:54:04 +00:00
orbiter
c4ae2cd03f fixed bug that caused deletion of crawl profiles at every application startup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6240 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-23 22:09:02 +00:00
orbiter
161d2fd2ef redesign of access to the HTCache (now http.client.Cache):
- better control over the cache by using combined request-header and content access methods
- refactoring of many classes to comply with this new access method
- make sure that the cache is always written if something was loaded
- some redesign of the process by which http response results are fed into the new indexing queue
- introduction of a cache read policy:
 * never use the cache
 * use the cache if the entry exists
 * use the cache if the proxy freshness rule confirms it
 * use only the cache and never go online
- added configuration options for the crawl profiles to use the new cache policies. There is not yet an input during crawl start to set the policy but this will be added in another step.
- set the default policies for the existing crawl profiles. If you want them to appear in your default profiles you must delete the crawl profiles database; otherwise the policy is 'proxy freshness rule'
- enhanced some cache access methods in such a way that unnecessary retrievals are omitted (e.g. for size computation). That should reduce some IO but also a lot of CPU computation, because sizes were computed after decompressing the content after retrieving it from disk.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6239 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-23 21:31:51 +00:00
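The four cache read policies listed in the commit above can be read as a small decision table. The following is a minimal sketch of that table, assuming that a cache miss falls back to the network for every policy except the cache-only one. The names `CacheReadPolicySketch`, `CachePolicy`, `Source`, and `chooseSource` are hypothetical and are not taken from the YaCy source.

```java
class CacheReadPolicySketch {
    // One constant per policy named in the commit message (names are assumed).
    enum CachePolicy { NO_CACHE, IF_EXIST, IF_FRESH, CACHE_ONLY }

    // Where a document would be loaded from; NONE means it is not loaded at all.
    enum Source { CACHE, NETWORK, NONE }

    // inCache: an entry for the URL exists in the cache.
    // fresh:   the proxy freshness rule accepts the cached entry.
    static Source chooseSource(CachePolicy policy, boolean inCache, boolean fresh) {
        switch (policy) {
            case NO_CACHE:
                return Source.NETWORK;                          // never use the cache
            case IF_EXIST:
                return inCache ? Source.CACHE : Source.NETWORK; // any cached entry wins
            case IF_FRESH:
                return (inCache && fresh) ? Source.CACHE : Source.NETWORK;
            case CACHE_ONLY:
                return inCache ? Source.CACHE : Source.NONE;    // never go online
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

Under this reading, only `CACHE_ONLY` can yield no document at all, which is consistent with the later commits that let the balancer skip forced delays when the cache is used exclusively.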
lulabad
da43164dd6 fix for UNRESOLVED_PATTERN see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2300
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-23 06:02:36 +00:00
daburna
d7c9c765bb changes by Thomas Süß
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6237 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-22 17:24:04 +00:00
f1ori
ba2e6de538 fix empty version string again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-21 19:56:40 +00:00
daburna
53081ee6da changes by Thomas Süß
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-21 12:22:05 +00:00
orbiter
51534df0cb fix for possible synchronization problem
see also: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2292&hilit=&p=16787#p16787

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-20 08:21:17 +00:00
orbiter
4da9042e8a code simplification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6233 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 21:59:29 +00:00
orbiter
1d8d51075c refactoring:
- removed the plasma package. The name of that package came from a very early pre-version of YaCy, even before YaCy was named AnomicHTTPProxy. The Proxy project introduced search for cache contents using class files that had been developed during the plasma project. Information from 2002 about plasma can be found here:
http://web.archive.org/web/20020802110827/http://anomic.de/AnomicPlasma/index.html
We still have one class that comes mostly unchanged from the plasma project, the Condenser class. But this is now part of the document package and all other classes in the plasma package can be assigned to other packages.
- cleaned up the http package: better structure of that class and clean isolation of server and client classes. The old HTCache becomes part of the client sub-package of http.
- because the plasmaSwitchboard is now part of the search package all servlets had to be touched to declare a different package source.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6232 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-19 20:37:44 +00:00
f1ori
67da20647f * add new odf parser based on sax-xml-parser
* remove odf_utils-jar
* test metadata in ParserTest


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-18 15:04:34 +00:00
lotus
de4f0a006f removed superfluous windows target
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6230 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-18 14:46:42 +00:00
f1ori
06557485f5 * added parser unittest!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 22:03:34 +00:00
f1ori
69dfd03985 reactivate unittests
* fix old tests
* add buildtarget "ant test"


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 20:58:21 +00:00
f1ori
6d0e6d591b * oops, fix compiler error :(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6227 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 20:02:56 +00:00
f1ori
3e5beb1654 * fix for empty version in seedlist
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6226 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 19:16:26 +00:00
orbiter
5bb8074150 removed the indexing queue. This queue was superfluous since the introduction of the blocking queues last year, where documents are parsed, analysed and stored in the index with concurrency.
- The indexing queue was a historic data structure that was introduced at the very beginning of the project as a part of the switchboard organisation object structure. Without the indexing queue the switchboard queue also becomes superfluous. It has been removed as well.
- Removing the switchboard queue requires that all servlets are called without an opaque generic ('<?>'). As a result, all servlets had to be modified.
- Many servlets displayed the indexing queue or the size of that queue. In the past months the indexer was so fast that the indexing queue mostly appeared empty, so there was no use for it any more. Because the queue has been removed, the display in the servlets also had to be removed.
- The surrogate work task had been a part of the indexing queue control structure. Without the indexing queue the surrogates needed their own task management. That has been integrated here.
- Because the indexing queue had a special queue entry object and properties attached to this object, the properties had to be moved to the queue entry object which is part of the new indexing queue within the blocking queue, the Response Object. That object now also has the properties of the removed indexing queue entry object.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 13:59:21 +00:00
orbiter
597393db3b changed default visibility of classes/objects in upnp lib
(eclipse tells me that this would improve performance,
 however, this removes compiler warnings)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6224 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-16 12:19:40 +00:00
orbiter
eea4c17ef2 removed rpm parser
- no-one used that thing
- loading huge rpm files may be a cause of crashes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-16 11:06:49 +00:00
orbiter
b332dfad67 - inserted request object into response object, which now carries it instead of generating new objects
- fixed a problem with the crawler introduced in SVN 6216

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 23:08:35 +00:00
orbiter
ca72ed7526 -removed superfluous crawl cache
-refactoring of crawler classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 21:07:46 +00:00
orbiter
8103ccec4c removed compiler warnings in imported classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 20:44:23 +00:00
lotus
52e371b8f7 suppress warnings for upnplib code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 16:22:56 +00:00
lotus
477807e0e6 * updated jxpath to latest v1.3
* added upnplib as source
	without packages:
	jmx
	remote
	samples

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 16:13:24 +00:00
orbiter
049fb23a8d removed unused/unsupported ant targets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 14:16:25 +00:00
orbiter
13c63f4082 a set of small fixes to crawling behaviour
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 14:15:51 +00:00
orbiter
a564df3984 update to mime types in parsers and httpd.mime
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6215 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-15 14:10:29 +00:00
orbiter
43c8defd79 enhanced parser with more extension + mime attributes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 13:32:53 +00:00
orbiter
aee35bff6f replaced StringBuffer with StringBuilder in tar lib
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 13:31:57 +00:00
orbiter
49bbb9bd45 replaced tar library with integrated apache ant tar lib
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 11:31:40 +00:00
orbiter
f987fc6b4a added tar classes from apache ant tools
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 11:25:40 +00:00
orbiter
f2d4b6d7fa added tar classes from apache ant tools
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 11:25:05 +00:00
orbiter
b2263bc720 enhanced document type recognition
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6209 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 11:01:05 +00:00
lotus
aa38eb5a20 * maxfilesize -1 for infinite filesize
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6208 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 08:39:39 +00:00
orbiter
7d493cf8cc moved parser configuration into a separate servlet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-14 06:57:13 +00:00
lotus
9cfe89c8fc * process content-length as soon as it is received
* corrected indentation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6206 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 19:55:13 +00:00
orbiter
5240d22773 removed unused library jsmooth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6205 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 18:16:03 +00:00
orbiter
3d26161dd1 removed unused libraries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 14:47:09 +00:00
orbiter
50cf80056f removed jmimemagic library
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 10:58:37 +00:00
orbiter
e3c7f61145 removed unused libraries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6202 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-13 10:21:22 +00:00