525 Commits

Author SHA1 Message Date
allo
ada06b0674 bugfix for Networkimage from Hydrox
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-26 10:08:37 +00:00
orbiter
1aa4ba8b62 added post-search filtering of redundant urls (longer than existing cited)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-25 09:06:00 +00:00
orbiter
8d827cdb30 tried to fix problems with order of network list by last-seen (which could also improve the network picture)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@980 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-24 14:07:43 +00:00
orbiter
097009d910 experimental visualization of DHT access during global search (temporary)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-24 00:34:15 +00:00
orbiter
4dcbc26ef1 introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 17:50:27 +00:00
theli
6c48c3ce39 *) Bugfix for ArithmeticException during IndexTransfer
See: http://www.yacy-forum.de/viewtopic.php?t=1362

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 16:07:44 +00:00
theli
525c8dcbd4 *) Adding Traffic Statistic for Crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 12:35:48 +00:00
theli
9a5ab62928 *) Adding yacy specific X-YACY-Index-Control header which can be used by clients
   to prevent yacy from indexing the response that belongs to a request where
   X-YACY-Index-Control is set to "no-index"

*) Bugfix for Seed-List download via Remote Proxy.
   Now the pragma and cache-control http headers of the request are properly set to "no-cache" 
   See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639

*) Bugfix for http-Proxy
   yacy ignored the "no-cache" pragma and cache-control http headers that were sent in requests.
   Now, these request headers are evaluated properly

TODO: Missing evaluation of "no-store" request headers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 10:35:05 +00:00
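The header handling described in the commit above can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical and are not taken from the YaCy source, and the exact matching rules (case-insensitive names, substring match on "no-cache") are assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the actual YaCy proxy code.
final class IndexControlSketch {

    // Copies the headers into a map with case-insensitive keys,
    // because http header names are case-insensitive.
    private static Map<String, String> caseInsensitive(Map<String, String> headers) {
        Map<String, String> copy = new TreeMap<String, String>(String.CASE_INSENSITIVE_ORDER);
        copy.putAll(headers);
        return copy;
    }

    /** False if the request carries X-YACY-Index-Control: no-index. */
    static boolean mayIndex(Map<String, String> requestHeaders) {
        String value = caseInsensitive(requestHeaders).get("X-YACY-Index-Control");
        return value == null || !value.trim().equalsIgnoreCase("no-index");
    }

    /** True if Pragma or Cache-Control of the request asks for "no-cache". */
    static boolean bypassCache(Map<String, String> requestHeaders) {
        Map<String, String> headers = caseInsensitive(requestHeaders);
        return mentionsNoCache(headers.get("Pragma"))
            || mentionsNoCache(headers.get("Cache-Control"));
    }

    private static boolean mentionsNoCache(String value) {
        return value != null && value.toLowerCase(Locale.ROOT).contains("no-cache");
    }
}
```

A proxy following this sketch would call bypassCache before serving a cached copy and mayIndex before handing the response to the indexer. The "no-store" case from the TODO is deliberately left out.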
theli
02d9af1a70 *) Restructuring and extending of Remote Proxy Support
   - remote proxy configuration can now be "really" changed on the fly and takes effect immediately
   - adding possibility to disable remote proxy usage for yacy->yacy communication
   - adding possibility to disable remote proxy usage for ssl
   - restructuring proxy configuration so that it is stored in a single place now

*) Adding possibility to import a foreign word DB (or even more of them in parallel) 
   at runtime into the peers DB
   - this can be done by calling IndexImport_p.html 
   - ATTENTION: please note that at the moment this thread must be aborted via gui
     before a normal server shutdown is done.
   - TODO: integrating IndexImport Thread into normal server shutdown
   - TODO: Adding possibility to import crawl-queues, etc. from foreign peers
   - TODO: removing old import function from yacy.java and calling the new routines instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-22 13:28:04 +00:00
borg-0300
58b670201d now, changed HTCacheSize needs no restart
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 17:59:54 +00:00
theli
40777556c5 *) Connection Tracking
   - adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its url hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
rramthun
a98bafb939 Changes to german language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 20:36:45 +00:00
theli
95abdeb685 *) Bugfix for nextElement function of URL Enumerator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 08:51:02 +00:00
orbiter
6260942590 changed search process: received indexes are now buffered and written to wordIndex after search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 13:57:15 +00:00
borg-0300
7ee03acce0 new function cutUrlText added to shorten the URLs on IndexMonitor.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 12:05:39 +00:00
orbiter
bc56a88cc8 further refactoring of search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 00:05:30 +00:00
orbiter
d29dfb0a12 refactoring of search / preparation for better search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 12:28:49 +00:00
theli
0ae166c522 *) Small changes to Index Transfer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 10:19:01 +00:00
theli
461374e175 *) Restricting amount of files that yacy is allowed to open during index transfer/distribution
   This option is configurable via config file and is set to 800 by default
   See: http://www.yacy-forum.de/viewtopic.php?p=11137#11137

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 09:38:40 +00:00
theli
c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
*) Displaying statistic for incoming connections on status page
*) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy
   See: http://www.yacy-forum.de/viewtopic.php?p=6826
*) Bugfix for Referer Bug
   See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098
*) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 08:17:43 +00:00
orbiter
b80b2fbdcc crawling peers now produce waves in network graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-11 13:06:07 +00:00
orbiter
10d3627c90 changed word cache flush scheduling and removed possible locks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@910 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-11 07:06:33 +00:00
orbiter
839db8869c added high/low priority for index adding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@899 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 09:28:28 +00:00
theli
1688be8590 *) plasmaSwitchboard.java
   adding more verbose logging output for db initialization
*) httpdFileHandler.java
   adding cache for servlet response methods


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 09:13:17 +00:00
orbiter
e9eb5e4b56 refactoring of index-entity join methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 00:45:18 +00:00
orbiter
258fd9eb8e adding missing file for websearch refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 00:33:25 +00:00
orbiter
77ae30063d refactoring of websearch process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 00:32:15 +00:00
orbiter
579b22d8ff small update to network drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 23:11:17 +00:00
orbiter
2b5829c3da small fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 19:29:25 +00:00
orbiter
4c7918f5b5 added shutdown to crawl stacker (moved from 882)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@889 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 16:40:44 +00:00
orbiter
2851658c2a re-integrated Martin's last change to crawl stacker from svn 882 that I had deleted accidentally
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 16:11:41 +00:00
orbiter
c83594528c integrated crawl stacker into thread control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 15:59:09 +00:00
theli
959eefbc4f *) Robots.txt parser/ppt
   cutting off comments at the line end
*) Adding Threadpool for stackCrawl Thread to speedup robots.txt download
   and double url checks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 04:43:07 +00:00
allo
f65c939a60 userDB Auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-07 13:49:07 +00:00
orbiter
1a5d98cd6d better imagePainter example and fix for typo http://www.yacy-forum.de/viewtopic.php?p=10920#10920
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@868 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-06 11:51:35 +00:00
orbiter
f6cf3967de fix for compile-bug in svn 583 (Martin, please check whether this is right: fifo or filo-stack?)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 12:21:30 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission is done in parallel now to improve index 
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because function calls to vector or hashtable functions
   take 3 times longer than calls to functions of arraylists or hashmaps.
   TODO: we should take a look at the class serverObject which inherits from hashmap
         Do we really need a synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
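The robots.txt re-validation described in the commit above (a cached copy older than 7 days is re-requested conditionally) can be sketched roughly as follows. The class and method names are hypothetical and not taken from the YaCy source. The commit only says that the ETag header is used to detect changes; sending it back as If-None-Match, and using the load time of the cached copy as the If-Modified-Since date, are assumptions of this sketch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical sketch; not the actual YaCy robots parser.
final class RobotsRefreshSketch {

    // 7 days, the threshold named in the commit message.
    static final long MAX_AGE_MILLIS = 7L * 24 * 60 * 60 * 1000;

    /** True if the cached robots.txt was loaded more than 7 days ago. */
    static boolean needsRevalidation(long loadedAtMillis, long nowMillis) {
        return nowMillis - loadedAtMillis > MAX_AGE_MILLIS;
    }

    /** Builds conditional request headers for re-validating a cached copy. */
    static Map<String, String> conditionalHeaders(long loadedAtMillis, String etag) {
        // HTTP dates are always expressed in GMT.
        SimpleDateFormat httpDate =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        httpDate.setTimeZone(TimeZone.getTimeZone("GMT"));

        Map<String, String> headers = new LinkedHashMap<String, String>();
        headers.put("If-Modified-Since", httpDate.format(new Date(loadedAtMillis)));
        if (etag != null) {
            // Assumption: the stored ETag is sent back as If-None-Match.
            headers.put("If-None-Match", etag);
        }
        return headers;
    }
}
```

If the server answers such a request with 304 Not Modified, the cached rules can be kept and only their load time needs to be refreshed.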
orbiter
6d5d0ac801 bugfix for startup problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 00:52:55 +00:00
orbiter
0c3a20d44f more + changed log for better understanding of outOfMemory bug and others
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-04 00:28:59 +00:00
theli
0fd9aa6c6e *) Bugfix: supportedFileExt Function didn't detect the file extension correctly because of missing conversion to lower case
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@837 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 10:48:41 +00:00
theli
8a33c9b309 *) Bugfix: supportedFileExt Function didn't detect the file extension correctly if there was a dot
in one of the parent directories of the file.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 10:21:13 +00:00
theli
28c5687ff9 *) Bugfix for "download of non supported file content" via crawler
See: http://www.yacy-forum.de/viewtopic.php?p=10724#10724

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@835 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 08:45:39 +00:00
theli
2b3f964037 *) Bugfix: supportedFileExt Function didn't chop http parameters before trying to detect the file extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 08:42:55 +00:00
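The three supportedFileExt fixes above (conversion to lower case, dots in parent directories, and chopping of http parameters) can be illustrated with a small sketch. The class and method names are hypothetical and this is not the actual YaCy implementation; the input is assumed to be the path part of a URL.

```java
import java.util.Locale;

// Hypothetical helper; not the actual YaCy supportedFileExt function.
final class FileExtSketch {

    /** Returns the lower-cased extension of the last path segment, or "" if there is none. */
    static String extensionOf(String urlPath) {
        String path = urlPath;
        // Chop http parameters and fragments before looking for the extension.
        int cut = path.indexOf('?');
        if (cut >= 0) path = path.substring(0, cut);
        cut = path.indexOf('#');
        if (cut >= 0) path = path.substring(0, cut);
        // Only the last path segment counts, so dots in parent directories are ignored.
        String file = path.substring(path.lastIndexOf('/') + 1);
        int dot = file.lastIndexOf('.');
        if (dot < 0 || dot == file.length() - 1) return "";
        // Lower-case the result so that "HTML" and "html" compare equal.
        return file.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

A caller would then compare the returned extension against the set of extensions supported by the installed parsers.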
allo
ff1d3d0680 Init of userDB
Pagelayout of User_p.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-30 13:48:26 +00:00
orbiter
9c4306e41e fixed problem with htcache path
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-29 00:24:09 +00:00
orbiter
1669eaaa1a fixed svn 805
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@807 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 14:47:57 +00:00
borg-0300
ca82d690a9 changed in SVN 805 one line too much
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 13:58:42 +00:00
borg-0300
4bb1f849a0 Bugfix for http://www.yacy-forum.de/viewtopic.php?t=1233
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@805 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 13:49:57 +00:00
orbiter
2c7b490e30 memory-logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 00:52:54 +00:00
orbiter
7fc822a59b changed handling of time-zones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-27 16:28:55 +00:00
theli
9b7f37fc37 *) Minor changes
   - more debugging output: storageTime for indexed document is logged now
   - saving memory in plasmaParserDocument.java, plasmaWordIndexEntryContainer.java (not a big deal)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-27 07:10:24 +00:00
theli
b5a8992d29 *) Setting some object fields to final
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-26 09:39:54 +00:00
theli
023be89586 *) Bugfix for "Robots.txt wird immer wieder geladen" ("robots.txt is loaded again and again")
See: http://www.yacy-forum.de/viewtopic.php?p=10241#10233

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-26 08:05:59 +00:00
theli
35c6c5ead7 *) Bugfix for "Blacklist und Crawlen" ("Blacklist and crawling") Bug.
   Crawling continues even if URL is listed in Blacklist
   See: http://www.yacy-forum.de/viewtopic.php?p=10279#10279
   - missing return statement added. Thanks to allo for the
     code review.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@793 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-26 06:51:11 +00:00
orbiter
9e2fc7e5fe load balancing of crawl target domains
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-25 01:09:21 +00:00
orbiter
3fcc95a82c integrated crawl-profiles db in memory-performance monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@788 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-24 00:33:27 +00:00
theli
fe6a6abc0b *) Adding robots.txt db to Performance Settings for Memory menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-23 01:31:29 +00:00
orbiter
3274ae725e increased cache size of robots database; however, this should be integrated into new memory control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-23 00:37:31 +00:00
orbiter
c6d2f50375 changed order of robots and double-check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-23 00:18:08 +00:00
orbiter
68d5ff2ef1 added stringbuffer in condenser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@782 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 23:43:45 +00:00
orbiter
495bc8bec6 removed cache-control from low and medium priority caches which reduces memory use and computation overhead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 20:01:26 +00:00
orbiter
18d9e1a256 fix for http://www.yacy-forum.de/viewtopic.php?p=10026#10026
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@768 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-21 21:56:39 +00:00
orbiter
07f30931ec various configuration options in memory performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-21 14:21:45 +00:00
theli
b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
   make many methods for httpHeader/Requestline parsing
   reusable for new icap implementation
*) adding chunked input stream support
   - needed by new icap implementation
   - needed by future httpc HTTP/1.1 support 
*) httpd.java
   - moving all connection property constants to class httpHeader
   - moving readHeader function to class httpHeader
   - moving parseQuery function to class httpHeader
   - moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
   - adding new function to parse the http response line
   - adding new function to convert http headers to a string that
     can be sent to the client
   - adding a function that generates a proper url using all parsed
     connection properties
*) ICAP Support
   - yacy now supports handling of icap response modification requests
   - this feature can be used by other icap enabled proxies to contact 
     yacy as icap server, and to handover the downloaded content to yacy.logging
     for indexing
   - functionality was successfully tested with squid 2.5Stable 10 + icap patch
   - further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
   - htcache entries that are still needed for indexing are now properly registered 
     as in use after system restart
   - extended logging: log message now shows parsing and indexing time for each sb. entry
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 21:49:47 +00:00
borg-0300
6d1de8abfd finals; cleaned;
Properties;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 15:43:31 +00:00
orbiter
14bc880fa4 fixed bug with crashed profile database
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 11:20:29 +00:00
orbiter
71a31f0902 integrated and extended new memory performance menu; found and fixed bug in DHT caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@752 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 10:54:20 +00:00
orbiter
fb52a82008 added new performance page for memory settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@751 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 10:10:34 +00:00
orbiter
cddd9aaa33 fixed SERIOUS bug with kelondroStack; affected all stack processing since 729
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-15 22:17:51 +00:00
orbiter
416c126815 fix for a profile = null problem and new monitor in crawl queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@730 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-15 21:39:37 +00:00
orbiter
2148c0cf49 replaced kelondro storage core; much less objects in kelondro cache now; less IO from DB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@724 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-14 10:10:49 +00:00
theli
beefddf0e8 *) Adding option which allows doing an Index-Transfer without deletion of index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@722 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-14 07:14:24 +00:00
rramthun
4036ee812a Updated german language file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@721 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-13 16:29:59 +00:00
theli
40925f4fb7 *) Improving complete index transfer performance by automatically increasing size of transferred word chunk
   for fast connections (much like normal dht behavior)
   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@719 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-13 10:29:04 +00:00
theli
91ab4d044b *) Adding automatic retry functionality to complete index transfer function
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@718 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-13 08:32:24 +00:00
theli
a62677f761 *) Adding additional logging output for complete index transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-13 06:44:38 +00:00
theli
b991d2e7dd *) Additional logging message for complete index transfer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@712 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 12:02:45 +00:00
theli
3c00c5f6c7 *) Complete Index Transfer
See: http://www.yacy-forum.de/viewtopic.php?p=9622

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@711 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 11:39:32 +00:00
theli
2cb084d426 *) Complete Index Transfer
See: http://www.yacy-forum.de/viewtopic.php?p=9622

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@707 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 10:37:16 +00:00
theli
d1de71e9f6 *) Suppress stacktrace on proxy error for "No route to host Exception"
See: http://www.yacy-forum.de/viewtopic.php?t=1153

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 20:21:38 +00:00
theli
56160cbd01 *) Bugfix for "YaCy verzählt sich ..." ("YaCy miscounts ...") Bug.
See: http://www.yacy-forum.de/viewtopic.php?p=9559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 05:26:01 +00:00
orbiter
43b42854a0 fix for null-entries and http://www.yacy-forum.de/viewtopic.php?p=8649
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@699 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-11 03:54:52 +00:00
theli
3587407039 *) Fixing problems of list operation if index and queue size are both 0.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@687 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 22:27:48 +00:00
theli
51b48a10e8 *) Suppress stacktrace on proxy error for "ValidatorException: No trusted certificate found"
See: http://www.yacy-forum.de/viewtopic.php?t=1110

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@686 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 20:37:01 +00:00
theli
7fe8784231 *) URLs pointing to a server having a private ip address will not be indexed anymore
See: http://www.yacy-forum.de/viewtopic.php?p=9408

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 21:38:03 +00:00
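A check like the one described in the commit above could look roughly like this. The commit does not say how YaCy decides that an address is private, so the class name, the method name, and the choice of address classes are assumptions of this sketch. It relies only on standard java.net.InetAddress methods and expects an IP literal, so no DNS lookup takes place.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch; not the actual YaCy check.
final class PrivateAddressSketch {

    /** True if the given IP literal is not reachable from the public internet. */
    static boolean isPrivate(String ipLiteral) {
        final InetAddress address;
        try {
            // For an IP literal, getByName only validates the format.
            address = InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + ipLiteral, e);
        }
        return address.isSiteLocalAddress()   // 10/8, 172.16/12, 192.168/16
            || address.isLoopbackAddress()    // 127/8, ::1
            || address.isLinkLocalAddress()   // 169.254/16, fe80::/10
            || address.isAnyLocalAddress();   // 0.0.0.0, ::
    }
}
```

A crawler or proxy following this sketch would skip indexing whenever isPrivate returns true for the resolved server address.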
theli
0aafb83edc *) Bugfix for robots.txt isDisallowed Check.
Setting path to "/" if it is null or empty.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 13:18:34 +00:00
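The fix above (treating a null or empty path as "/") can be sketched as follows. The names are hypothetical and the simple prefix matching is an assumption; the actual YaCy isDisallowed check may differ.

```java
import java.util.List;

// Hypothetical sketch; not the actual YaCy robots.txt check.
final class RobotsPathSketch {

    /** A null or empty path refers to the root of the host. */
    static String normalizePath(String path) {
        return (path == null || path.length() == 0) ? "/" : path;
    }

    /** True if the normalized path starts with one of the Disallow prefixes. */
    static boolean isDisallowed(List<String> disallowPrefixes, String path) {
        String normalized = normalizePath(path);
        for (String prefix : disallowPrefixes) {
            // An empty Disallow line allows everything, so it is skipped.
            if (prefix.length() > 0 && normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Without the normalization, a URL such as http://example.org (no path) would be compared as an empty string and would never match a "Disallow: /" rule.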
borg-0300
8260128ee9 changed getFreeSize();
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 11:22:41 +00:00
theli
f8ad65eae1 *) First trial implementation of robots.txt support
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 11:17:21 +00:00
borg-0300
0a57fbcde5 Added new HashSet filesInUse;
Added new Function getFreeSize();

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 09:37:00 +00:00
borg-0300
8cd6a52dd0 Convention
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 07:26:19 +00:00
borg-0300
c0e3d18bbf *) remove import java.lang
*) Added Super()
*) replaced startsWith()
*) cleaned


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 16:58:12 +00:00
borg-0300
b1cd1fa917 cleaned
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 14:56:19 +00:00
borg-0300
da9c6857fb *) corrected a misunderstanding, no BUG ;)
*) finals and other

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-06 14:17:53 +00:00
borg-0300
fbac053c03 small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 11:23:48 +00:00
theli
578f36ae18 *) Speedup of indexer. Proxy files will not be enqueued by the cachemanager
into the sb-queue anymore if the mimeType or fileExtension is not supported
   by the installed parsers.
- Advantage: Avoiding unnecessary enqueueing and dequeueing from queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 11:17:37 +00:00
theli
1219ef99f0 *) Bugfix for NullpointerException in yacyDebugMode Init
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:51:15 +00:00
theli
6c722706b7 *) Moving yacyDebugMode initialization to switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:34:34 +00:00
theli
4e07828807 *) httpdProxyHandler.java
- harmonizing proxy exception handling
- adding malformed URL + blacklist check for http head method
- adding malformed URL check to http post method
- chunked encoding is now not used anymore for http post if clients
  are http/0.9 or http/1.0 clients (same behaviour as already implemented for get)
- now an exception will be thrown on internal httpc errors to force an error output
  to the client or a connection close. This should help to fix the "binary data in browser window" bug

*) plasmaSwitchboard.java
- fixing the following Bug
  E 2005/09/03 18:02:42 PLASMA Could not index URL http://mis04.de/FAIL/snot.php: null
  java.lang.NullPointerException
	at de.anomic.plasma.plasmaSwitchboard.processResourceStack(plasmaSwitchboard.java:1000)
	at de.anomic.plasma.plasmaSwitchboard.deQueue(plasmaSwitchboard.java:625)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:585)
	at de.anomic.server.serverInstantThread.job(serverInstantThread.java:95)
	at de.anomic.server.serverAbstractThread.run(serverAbstractThread.java:243)
  This bug could occur if the cached responseHeader is null
- getting the mimeType now from the parsed document instead of the responseHeader because the 
  mimeType could have been changed during content parsing (e.g. because of the mimetypeParser)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@656 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:10:00 +00:00
borg-0300
81cb8feb15 back to 649 :/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@651 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-04 22:03:44 +00:00
borg-0300
5194511e8e *) attempt to find bug
See: http://www.yacy-forum.de/viewtopic.php?t=1121

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@650 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-04 19:08:51 +00:00
theli
6991b9e2b9 *) Suppress stacktrace on crawler error for "Connection reset"
See: http://www.yacy-forum.de/viewtopic.php?p=9071

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@645 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-03 15:17:19 +00:00
theli
a47f9238fe *) Blacklist is now also used by the crawler
See: http://www.yacy-forum.de/viewtopic.php?t=1069

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@642 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 12:09:45 +00:00
theli
dc0a2d4c11 *) Bugfix for Loader Queue:
Job count was not displayed correctly
*) IndexingQueue:
- now it's possible to delete single entries from the queue
- now it's possible to clear the whole queue
  See: http://www.yacy-forum.de/viewtopic.php?t=995

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@641 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 11:40:40 +00:00
theli
732a107160 *) Bugfix for "-UNRESOLVED_PATTERN-" Bug on IndexCreateWWWLocalQueue_p.html and "urlEntry.url() == null" Bug
- Logging message for "urlEntry.url() == null" is now displayed as info
   - IndexCreateWWWLocalQueue_p.html now detects null entries while looping through the list and removes them automatically
   See: 
   - http://www.yacy-forum.de/viewtopic.php?t=532#8781
   - http://www.yacy-forum.de/viewtopic.php?t=639
   - http://www.yacy-forum.de/viewtopic.php?t=1071
   - http://www.yacy-forum.de/viewtopic.php?t=338
   - http://www.yacy-forum.de/viewtopic.php?t=980

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 09:33:05 +00:00
theli
33aaffbfc6 *) Displaying content size of each entry in indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@639 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 08:22:11 +00:00
borg-0300
7626823519 BUGFIX for last 'commit'
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 23:43:27 +00:00
borg-0300
971756e8dd the delete size is smaller
See: http://www.yacy-forum.de/viewtopic.php?t=1084

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 23:35:00 +00:00
theli
0471019606 *) IndexCreateIndexingQueue_p.html now also shows indexing jobs that are currently in process
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@633 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 22:05:20 +00:00
borg-0300
cc493ef8c1 Added change from Hermes
See: http://www.yacy-forum.de/viewtopic.php?t=1050

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 11:18:41 +00:00
theli
bead8a32aa *) IndexCreate_p.java:
Crawler StartURLs will now also be added to the errorURL-DB if an error occurs on this url
*) kelondroStack.java, plasmaSwitchboardQueue.java
   Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java 
   instead of an iterator to display the indexing-list. 
   Advantages: avoid concurrent modifications of the list while displaying it. 
               Speedup because now we have to access only one sync function instead of multiple ones 
               (one for each entry)
*) IndexCreateIndexingQueue_p.java
   Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
   If a servlet returns the special value "LOCATION" the httpdFileHandler does a Redirection of 
   the Browser to the URL specified by the servlet. This can e.g. be used when a http get request is
   used instead of a post request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
   Now it's possible to delete single entries of the local crawler queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 07:52:46 +00:00
theli
48aaf703cc *) Adding additional logging output to detect crawling problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@625 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 06:55:21 +00:00
theli
59b8a98c7e *) Bugfix for suppressing of stacktrace in log on crawler error "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8840

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 06:31:30 +00:00
borg-0300
c1d7527929 better cache cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-31 13:07:08 +00:00
theli
2e6df95786 *) adding toString method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-31 10:43:03 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
theli
f19c09b227 *) Suppress stacktrace on crawler error for "MalformedURLException"
See: http://www.yacy-forum.de/viewtopic.php?p=8733#8733

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 20:25:07 +00:00
theli
cc1df08069 *) Adding missing synchronized blocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@608 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 14:57:32 +00:00
borg-0300
bf14e6def5 *) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing'
- paths are now absolute
*) move path check from plasmaHTCache to plasmaSwitchboard
   - only one path check when starting
*) other small changes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@606 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 12:50:30 +00:00
theli
9b818b1ce3 *) Pausing Crawlers if there is not enough space on disk
See: http://www.yacy-forum.de/viewtopic.php?p=8648

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 09:43:27 +00:00
theli
b33094e925 *) Trying to solve "Too many open files bug"
*) Temp.Bugfix for "Bug in Index Restore"
   See: http://www.yacy-forum.de/viewtopic.php?p=8647#8647
   Orbiter: Please take a look



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 09:07:42 +00:00
theli
34790acf02 *) Bugfix for suppressing of stacktrace in log on crawler error "unknown host"
See: http://www.yacy-forum.de/viewtopic.php?p=8615#8615

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 06:24:23 +00:00
theli
af7b8f75bd *) Making proxyAccessLogging configurable via yacy.logging file
- logging can be disabled now
   - logging directory / filelimit / rotation count can be configured now
   See: http://www.yacy-forum.de/viewtopic.php?t=965&postdays=0&postorder=asc&start=30#8280

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:31:58 +00:00
theli
2a081c9ee5 *) Adding additional logging message for "NURL.entry() == null" Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8446

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-28 05:39:26 +00:00
theli
cb1f11c96b *) Suppress stacktrace on crawler error for "Unknown Host"
See: http://www.yacy-forum.de/viewtopic.php?p=8431

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@590 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-28 05:08:26 +00:00
theli
e338a13de3 *) Suppress stacktrace on crawler error for "Read timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8433

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 18:48:02 +00:00
theli
2e43e744de *) Suppress stacktrace on crawler error for "connect timed out"
See: http://www.yacy-forum.de/viewtopic.php?p=8420 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 04:53:25 +00:00
theli
36cbe04e3e *) Bugfix for Crawler Redirection Bug
See: http://www.yacy-forum.de/viewtopic.php?p=8422

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-27 04:36:13 +00:00
theli
b70de495a0 *) Remembering Crawler-isPaused setting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-25 09:51:24 +00:00
theli
e569a84dc0 *) Using the same configuration settings for all indexing threads on server Startup
See: http://www.yacy-forum.de/viewtopic.php?p=8349

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-24 09:34:04 +00:00
theli
17be77a468 *) Bugfix for "Crawler data will not be removed from htcache if content parsing failed"
See: http://www.yacy-forum.de/viewtopic.php?t=965&highlight=ramdisk
*) Making ACCEPT_LANGUAGE configurable for crawler
   See: http://www.yacy-forum.de/viewtopic.php?p=8327

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-24 07:47:42 +00:00
theli
5f55dff297 *) Bugfix for "Binäre Nullen auf der page: Index Creation: Indexing Queue" (binary zeros on the page Index Creation: Indexing Queue)
See: http://www.yacy-forum.de/viewtopic.php?p=6877#6877

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@577 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-23 08:37:42 +00:00
allo
eb6365c069 local Bootstrapping bug.
use yacyDebugMode=true to allow local bootstrapping


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@572 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-22 12:13:19 +00:00
theli
330eae7cf3 *) Normalizing CrawlerStartURL now before crawling is started
*) CrawlWorker also does a URL normalization now before following the redirection URL
*) CrawlWorker removes redirection URL correctly from noticeURL stack now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@571 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-21 22:52:46 +00:00
theli
ab894d26bc *) Bugfix for "plasmaSwitchboard.deQueue: null" Bug (hopefully)
See: http://www.yacy-forum.de/viewtopic.php?p=8135#8135

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-21 22:48:37 +00:00
theli
eaf9f26cc3 *) Bugfix for NULL PROFILE HANDLE 'null' Bug:
See: http://www.yacy-forum.de/viewtopic.php?p=7855#7855

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@569 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-21 22:16:19 +00:00
rramthun
4cb382decb Adding changes by borg-0300 from http://www.yacy-forum.de/viewtopic.php?t=997
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@565 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-20 17:05:01 +00:00
theli
ec4c70d722 *) If there are at most 10 entries left while doing an index transfer, these entries will also be appended
to the index list
   |> D 2005/08/18 10:00:02 PLASMA Selected partial index (33 from 37 URLs, 0 not bound) for word fSuQM0xAJK1G
   See: http://www.yacy-forum.de/viewtopic.php?t=970

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-18 10:04:45 +00:00
theli
d4a045d7b1 *) Trying to solve "de.anomic.plasma.plasmaSwitchboard.deQueue': null" Bug
See: http://www.yacy-forum.de/viewtopic.php?p=7791

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-18 06:40:26 +00:00
theli
ea9a992f05 *) Before the crawler retries to download a URL it checks if the server is already doing a shutdown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@554 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-17 11:36:48 +00:00
theli
ea26b84eed *) Bugfix for http://www.yacy-forum.de/viewtopic.php?t=954
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@553 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-17 10:12:52 +00:00
theli
0c8a48e2cb *) converting php Session ID to lower case in function isCGI
See: http://www.yacy-forum.de/viewtopic.php?p=7671#7671

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@552 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-17 05:50:18 +00:00
orbiter
e616395c3b latest changes and cut for 0.40
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 15:15:19 +00:00
orbiter
c47bb1182d bugfix for assortment initialization error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 11:43:14 +00:00
theli
4654eae4e2 *) adding php Session ID to argument in function isCGI
See: http://www.yacy-forum.de/viewtopic.php?p=7671#7671

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@546 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 11:33:31 +00:00
orbiter
25f632dbd9 more DHT bugfixes and better logging of DHT effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 00:31:15 +00:00
orbiter
5cb00889d9 enhancements to dht selection, search and search presentation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-15 01:12:25 +00:00
orbiter
ba0a486328 moved printStackTrace() to logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 23:35:18 +00:00
orbiter
3094045d34 fix for http://www.yacy-forum.de/viewtopic.php?p=7454#7454
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 16:16:11 +00:00
orbiter
cd10370992 several bugfixes and dht selection / logging improvement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 00:57:30 +00:00
orbiter
3610fe6b3a see http://www.yacy-forum.de/viewtopic.php?p=7410#7410
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@530 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-13 22:04:18 +00:00
orbiter
c8a7a85ce2 fix for http://www.yacy-forum.de/viewtopic.php?p=7384#7384
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@529 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-13 21:41:22 +00:00
orbiter
6594541ef5 fix for http://www.yacy-forum.de/viewtopic.php?p=7361#7361
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-12 22:53:49 +00:00
orbiter
7db543a9fa fixes for several dht misbehaviours
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-12 22:14:24 +00:00
orbiter
5716f8521d bug fixes for word ordering and dht index selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@521 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-12 14:06:47 +00:00
orbiter
f5259f29e8 word cache behaviour fix and other fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-11 23:33:19 +00:00
orbiter
2c234e1b82 better log output for search result
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-10 09:52:50 +00:00
theli
89c9faa89e *) More graceful logging output in crawler
See: http://www.yacy-forum.de/viewtopic.php?t=894

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-10 06:15:47 +00:00
orbiter
248c24b60a intermission-feature usage in case of local and remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-09 20:43:37 +00:00
theli
b32e7c516c git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-09 09:07:19 +00:00
theli
86305f051d *) Trying to solve "java.net.BindException: Address already in use: JVM_Bind" Problem
by retrying Socket bind
   See: http://www.yacy-forum.de/viewtopic.php?p=6935

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-06 14:38:58 +00:00
theli
865b9490a2 *) Making DHT Transfer while Crawling configurable
See: http://www.yacy-forum.de/viewtopic.php?p=6904

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-06 11:40:12 +00:00
theli
1d83d7e4d7 *) httpdFileHandler.java:
no stacktrace will be printed into log file for "Connection timed out" Errors now
   See: http://www.yacy-forum.de/viewtopic.php?p=6381

*) plasmaCrawlWorker.java:
   If a "Read timed out" error occurs while crawling a site, the failed crawl will be
   retried.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-04 11:05:04 +00:00
orbiter
2d8557cb10 minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 02:02:39 +00:00
orbiter
91163db52e fix for more time-related problems in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@486 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 00:52:32 +00:00
orbiter
fb6f238d70 fix for expires-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 00:28:12 +00:00
rramthun
eacff63eda Typos...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 16:09:19 +00:00
orbiter
40da910f41 bugfixes and automatic news-cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 16:03:35 +00:00
theli
228b04b499 *) Bugfix for "wrong seed-upload timestamp" problem
http://www.yacy-forum.de/viewtopic.php?t=817

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 15:36:10 +00:00
theli
470839a16a *) Crawler/Session pool settings will now be stored properly into configfile
Bugfix for:
- http://www.yacy-forum.de/viewtopic.php?t=502
- http://www.yacy-forum.de/viewtopic.php?t=778

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 12:20:03 +00:00
orbiter
4377e119f3 bugfix for http://www.yacy-forum.de/viewtopic.php?p=6620#6620
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 10:41:59 +00:00
orbiter
e84a177c49 many bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 02:18:01 +00:00
orbiter
7e3e9ba0de fix for http://www.yacy-forum.de/viewtopic.php?p=6563#6563
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@472 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-01 20:36:27 +00:00
orbiter
1022fbeb65 many YaCyNews fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-31 01:54:46 +00:00
orbiter
13abd8b6e7 added news-creation at crawl start
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-30 11:57:19 +00:00
rramthun
f555b9d5f2 Translation, spelling...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-30 11:11:32 +00:00
orbiter
cdbbfd50fb fixed bad remote crawl behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-29 06:50:36 +00:00
orbiter
36707586c7 filtering of jsessionid
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@447 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-27 23:20:17 +00:00
rramthun
6f2f54a312 Translation, spelling...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@444 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-27 20:29:35 +00:00
orbiter
81e564edb8 faster crawl profile list cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-27 14:16:47 +00:00
orbiter
ad90f0ad13 activated RWI distribution to DHT for senior peers (default redundancy 3), necessary now for network growth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-27 12:51:00 +00:00
orbiter
b9d18d40cb configuration of proxy idle time in performance menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-26 15:17:29 +00:00
orbiter
3470a72d48 fixed div by zero, set default delays, fixed release number format and display
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@435 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-26 11:47:50 +00:00
orbiter
be1f324fca performance setting for remote indexing configuration and latest changes for 0.39
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@424 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-22 13:56:19 +00:00
orbiter
c64970fa47 re-implemented proxy-busy-check and fixed some other things
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@421 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-21 11:17:04 +00:00
orbiter
b73557ed2d better assortment monitoring and enhanced profile menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-20 13:03:41 +00:00
orbiter
1f36bf4dae enhanced assortment capacity; added extended WORDS migration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-20 00:39:06 +00:00
rramthun
0f11399d16 Some corrections...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-19 16:07:13 +00:00
orbiter
9f505af7aa preparations for bulk remote crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-19 00:26:31 +00:00
orbiter
9c72b4cdec replaced index dump stack by a dump array and limited url number in assortment ram (prevents too much RAM occupation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-18 13:32:44 +00:00
orbiter
51962d55bf added 'PPM', page-per-minute statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-18 00:44:51 +00:00
orbiter
159f795f65 bugfix (null pointer exception in assortments)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-17 22:25:50 +00:00
orbiter
1d2155675b changed assortment memory cache flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-17 21:22:18 +00:00
orbiter
19dbed7cc8 code clean-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 15:09:35 +00:00
orbiter
00f63ea00d fail-safe patch for pattern matching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 00:31:35 +00:00
orbiter
0a6be961ea added pattern organization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 00:21:46 +00:00
orbiter
40036ba69c fixed dht transmission; added url-blacklist blocking also for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 00:07:09 +00:00
orbiter
311e627363 blocking of blacklisted urls in indexReceive and small changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-11 15:36:10 +00:00
orbiter
2f0d7ea8d3 removed htcache stati (superfluous now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-09 00:33:34 +00:00
orbiter
277048501e bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@395 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-08 16:24:07 +00:00
orbiter
8b89c46afe fixed problem with cache write
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@394 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-08 16:06:35 +00:00
orbiter
455ae9f55f fixed htcache-store problem and due-time for remote crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-08 15:17:50 +00:00
theli
55d10b864c *) further improvements in shutdown behaviour
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-07 22:03:52 +00:00
orbiter
419f8fb398 fixed bugs/missing code regarding new crawl stack
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-07 01:38:49 +00:00
orbiter
112c5d3332 the new file-based indexing queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-06 14:50:01 +00:00
orbiter
858cd94299 replaced indexing ram-queue by file-based stack-queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-06 14:48:41 +00:00
theli
57c30f1d78 *) bugfix for usage of httpc without gzip content encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:25:25 +00:00
theli
0e2c33ee55 *) Network.html/Network.java:
- Adding function to manually force peer ping to remote yacy peer
  See: Network.html?page=4
- for debugging purpose only!

*) serverAbstractThread.java:
- Adding possibility to notify a server thread via a synchronization object
- this is needed e.g. by the port forwarding feature to send a notification
  to the peerPing thread to redo peer-ping with the new ip/port Settings_p.html

*) Port Forwarding Feature (it should work now)
- adding a serverThread which is responsible to detect broken port forwarding 
  connections and to do reconnect if needed
- serverCore.java: moving port forwarding initialization into a separate function
- adding possibility to configure the ssh port 
- moving configuration section on the gui into a separate fieldset
- hello.java: only trying to do a second connect to the clientIp address during
  peer handshake if either remote port forwarding is not enabled locally or
  the clientIP is not equal to any local ip

*) httpdFileHandler.java:
- printout a more verbose error message

*) httpc.java
- allowing to deactivate content encoding from outside

*) plasmaCrawlWorker.java
- the crawler worker now tries to refetch the content of a website without
  gzip content encoding if a gzip error occurred



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:09:48 +00:00
orbiter
5159a090b0 fixed parser bug with lowercase force (appeared in: http://spellbound.sourceforge.net/)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-03 23:33:25 +00:00
orbiter
7f7cbc5019 fixed bug with snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@365 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-03 13:37:20 +00:00
orbiter
eb74fa0c82 fixed a bug with snippet-length
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-01 23:35:36 +00:00
orbiter
86f2aa8478 fixed seed-load date bug (evaluating server date for age computation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-30 23:19:08 +00:00
orbiter
664bceced5 removed debug-lines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-30 18:56:01 +00:00
orbiter
75ebdbc852 enhanced snippet-generation (case where snippet is too long)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-30 18:54:00 +00:00
orbiter
8a4f297324 fixed/enhanced snippet error-handling; suppression of results where no snippet exists
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-30 00:01:53 +00:00
orbiter
712fe9ef18 bugfixed utf-8 decoding and parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-29 22:55:37 +00:00
theli
eee6322aaf *) Adding redirection support to plasmaCrawlWorker.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 08:07:41 +00:00
theli
cd279907c0 *) Adding redirection support to plasmaCrawlWorker.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 08:01:26 +00:00
theli
6697d5e52e *) correcting function mediaExtContains
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 06:44:31 +00:00
orbiter
3addf58046 enhanced snippet-loading with threads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-24 07:41:07 +00:00
orbiter
56d28a16f0 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 14:40:39 +00:00
orbiter
d6c85228a6 enhanced snippet computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@319 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 12:12:12 +00:00
theli
fafda068f9 *) allowing crawler to process resources with statuscode 203
- this is needed if yacy is behind a second proxy 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 10:00:31 +00:00
theli
aae9a433a6 *) correcting usage of supportedFileExt-List
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 07:43:59 +00:00
orbiter
1e7f062350 many bugfixes, memory leak fixes, performance enhancements; new kelondroHashtable; activated snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 02:07:45 +00:00
orbiter
68dc2b0c6b added kelondroArray, the basis for upcoming kelondroHash and some bug fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-21 01:17:25 +00:00
orbiter
a19541e563 code-enhancements after analysis with AppPerfect
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-20 16:36:31 +00:00
orbiter
85075269a6 extended fail-safe memory-management. prevents too much allocation, too often GC and should help for the 100%CPU-bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-20 00:46:23 +00:00
orbiter
e3c92818db avoiding OutOfMemoryError routines
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-19 13:37:17 +00:00
orbiter
3e8ee5a46d enhanced caching in kelondroRecords and added better synchronization/finalizer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-19 05:27:42 +00:00
theli
db3ed75728 *) closing stream correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@293 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-17 07:58:02 +00:00
orbiter
5d06ded005 enhanced html parser speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-17 01:26:51 +00:00
orbiter
5a490aa065 fixed html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 21:49:56 +00:00
orbiter
a25b5b4986 fixed possible memory leak in htmlScraper: be aware that now links can get lost; further work necessary
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 18:31:28 +00:00
theli
9e47ba5ad6 *) adding missing calls for function close() to avoid "too many open file" bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@282 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 08:34:52 +00:00
theli
9a98988c3c *) Bugfix for SSL/NIO Bug
See: http://www.yacy-forum.de/viewtopic.php?t=516
   - removing NIO from server/serverCore.java because of massive problems
     with socket close issues
*) Adding support for remote port forwarding via ssh
   @Orbiter: Please take a look into
   - hello.java
   - server/serverCore.java.publicIP()
   - yacy/yacyClient.java.publishMySeed(...)
*) Making startup loading of additional content parsers more failsafe


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 07:28:07 +00:00
orbiter
a1ffc27041 preparations for image/movie/music indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 00:31:13 +00:00
orbiter
a5b40923b6 added word migration to assortments (start with 'java -classpath classes yacy -migratewords')
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-15 01:22:07 +00:00
theli
890e3f4d4a *) adding missing calls for function close() to avoid "too many open file" bug
*) bugfix in plasma/plasmaParser.java:
   - parsers with missing dependencies were not ignored correctly
*) passing a logger instance to the parsers modules which can be used 
   for logging purposes by the parsers (not done yet)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-13 13:49:17 +00:00
theli
6dd3ec0dc4 *) Adding debug="true" debuglevel="lines,vars,source" to ant build files
See: http://www.yacy-forum.de/viewtopic.php?p=4099


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@270 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-12 05:22:06 +00:00
orbiter
4f9c30ef49 using mime-type instead of file extension for doctype
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-10 12:34:18 +00:00
theli
ee9e110366 *) removing old logging configuration properties from yacy.init
*) serverLog.java logging functions now also accept exceptions as
   additional parameters.
   The stack trace of these exceptions will then be appended to the
   logging message and can e.g. be viewed on the gui logging page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-10 09:19:24 +00:00
theli
c1a4e0dc28 *) changing reference to logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:44:55 +00:00
theli
d0083f845f *) changing reference to logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:39:09 +00:00
theli
1b5ae054f8 *) changing reference to logger
*) parser will not be returned into pool if the parser was deactivated
   via gui

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@250 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:38:00 +00:00
theli
68f30811fa *) changing reference to logger
*) bugfix in function getCachePath

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:36:39 +00:00
theli
fbbea813c5 *) changing references to logger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 10:34:20 +00:00
orbiter
4574fa4ce7 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@224 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 15:28:29 +00:00
theli
83b41ef2f7 *) Adding timeouts for shutdown
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 13:44:25 +00:00
theli
ef6851798b *) changing thread priority while parsing a pdf file to avoid 100% CPU usage.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 13:23:35 +00:00