
65 Commits

Author SHA1 Message Date
allo
ae6a4650bc re-enabling debugMode (60-second timeout for *all* http connections)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1165 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 11:14:11 +00:00
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activated transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
theli
fb766413d1 *) Changes to httpc DNS caching
   - Bugfix: the old DNS cache did not handle case-insensitive hostnames correctly.
   - adding the possibility to define domain name patterns for hostnames that should not be cached by the httpc DNS cache,
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the source code
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
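The DNS-cache changes in fb766413d1 (case-insensitive hostnames, patterns that exclude hosts such as dyndns names from caching) can be sketched as follows. This is a minimal illustration, not the httpc code: the class name, the comma-separated regex format assumed for httpc.nameCacheNoCachingPatterns, and the pluggable resolver are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of a hostname -> address cache with "no caching" patterns.
// Assumption: the pattern property is a comma-separated list of regular expressions.
class DnsCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final List<Pattern> noCachingPatterns = new ArrayList<>();
    private final Function<String, String> resolver; // hostname -> IP string

    DnsCacheSketch(String patternProperty, Function<String, String> resolver) {
        for (String p : patternProperty.split(",")) {
            if (!p.trim().isEmpty()) {
                noCachingPatterns.add(Pattern.compile(p.trim(), Pattern.CASE_INSENSITIVE));
            }
        }
        this.resolver = resolver;
    }

    // Hosts matching any pattern (e.g. dynamic DNS names) are never cached.
    boolean isCacheable(String host) {
        for (Pattern p : noCachingPatterns) {
            if (p.matcher(host).matches()) return false;
        }
        return true;
    }

    String resolve(String host) {
        // Hostnames are case-insensitive, so the cache key is normalized to lower case.
        String key = host.toLowerCase(Locale.ROOT);
        if (!isCacheable(key)) return resolver.apply(key);
        return cache.computeIfAbsent(key, resolver);
    }
}
```

In real use the resolver would wrap InetAddress.getByName(host).getHostAddress(); keeping it pluggable lets the sketch run without network access.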
orbiter
c86d801b0f removed dyndns domains from dns caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-06 22:12:08 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
   - solving problems with unknown certificates by implementing a dummy trust manager
   - adding https support to the robots parser
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing the urlNormalForm function of plasmaParser to the htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
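The "dummy trust manager" mentioned in b8ceb1ffde presumably accepts any server certificate. A minimal sketch of such a trust-all manager with the standard javax.net.ssl API follows; the class and method names are hypothetical. It disables certificate validation entirely, so connections using it are open to man-in-the-middle attacks.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical sketch. WARNING: trusts every certificate chain without checking it.
class TrustAllSketch {
    static final X509TrustManager DUMMY_TRUST_MANAGER = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    // Builds a TLS context whose only trust manager is the dummy one above.
    static SSLContext trustAllContext() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, new TrustManager[] { DUMMY_TRUST_MANAGER }, new SecureRandom());
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One way to apply it is HttpsURLConnection.setDefaultSSLSocketFactory(TrustAllSketch.trustAllContext().getSocketFactory()); whether YaCy's crawler wired it up this way is not stated in the log.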
hydrox
56b9f34411 *) removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
theli
ec3af327f7 *) Bugfix for Proxy-Authentication against remote proxy
See: http://www.yacy-forum.de/viewtopic.php?p=11804#11804

*) Adding first version of a db test for MySQL
   NOTES:
   - db user + db + db table must be created before starting the test
   - db table must be empty. Entries cannot be updated at the moment
   - db connection properties must currently be changed in the source code
   TODOs:
   - accepting connection properties via command line
   - implementing update + remove + read operations
   - 'maybe' adding code to create db + table if they don't exist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@991 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-27 11:28:37 +00:00
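Commit ec3af327f7 does not describe the proxy-authentication fix itself. For reference, authenticating against a remote proxy usually means sending a Proxy-Authorization header; the sketch below assumes the Basic scheme (base64 of user:password), and the helper name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of a Basic "Proxy-Authorization" request header.
class ProxyAuthSketch {
    static String proxyAuthorization(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

The result would be sent as "Proxy-Authorization: <value>" on every request forwarded to the remote proxy.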
theli
525c8dcbd4 *) Adding traffic statistics for the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@972 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 12:35:48 +00:00
theli
9a5ab62928 *) Adding a YaCy-specific X-YACY-Index-Control header which clients can use to
   prevent YaCy from indexing the response to a request in which
   X-YACY-Index-Control is set to "no-index"

*) Bugfix for Seed-List download via Remote Proxy.
   Now the pragma and cache-control http headers of the request are properly set to "no-cache" 
   See: http://www.yacy-forum.de/viewtopic.php?p=11639#11639

*) Bugfix for http-Proxy
   YaCy ignored "no-cache" Pragma and Cache-Control http headers that were sent in requests.
   Now these request headers are evaluated properly

TODO: Missing evaluation of "no-store" request headers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 10:35:05 +00:00
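The header handling described in 9a5ab62928 amounts to looking for a directive in comma-separated request header values. The sketch assumes case-insensitive header names and directive matching; the class and method names are hypothetical, and only the header names and the "no-cache"/"no-index" values come from the log.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of request-header evaluation for caching and indexing decisions.
class CacheControlSketch {
    // HTTP header names are case-insensitive, hence the CASE_INSENSITIVE_ORDER map.
    static Map<String, String> headers() {
        return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    }

    // True if the request asks not to be served from the cache.
    static boolean requestsNoCache(Map<String, String> requestHeaders) {
        return containsDirective(requestHeaders.get("Pragma"), "no-cache")
            || containsDirective(requestHeaders.get("Cache-Control"), "no-cache");
    }

    // True if the client opted out of indexing via X-YACY-Index-Control.
    static boolean forbidsIndexing(Map<String, String> requestHeaders) {
        return containsDirective(requestHeaders.get("X-YACY-Index-Control"), "no-index");
    }

    // Looks for a directive in a comma-separated value, ignoring case and "=argument" parts.
    private static boolean containsDirective(String value, String directive) {
        if (value == null) return false;
        for (String part : value.split(",")) {
            String d = part.trim();
            int eq = d.indexOf('=');
            if (eq >= 0) d = d.substring(0, eq).trim();
            if (d.equalsIgnoreCase(directive)) return true;
        }
        return false;
    }
}
```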
theli
02d9af1a70 *) Restructuring and extending remote proxy support
   - remote proxy configuration can now "really" be changed on the fly and takes effect immediately
   - adding the possibility to disable remote proxy usage for yacy->yacy communication
   - adding the possibility to disable remote proxy usage for ssl
   - restructuring proxy configuration so that it is now stored in a single place

*) Adding the possibility to import a foreign word DB (or even several of them in parallel)
   into the peer's DB at runtime
   - this can be done by calling IndexImport_p.html
   - ATTENTION: please note that at the moment this thread must be aborted via the gui
     before a normal server shutdown is done.
   - TODO: integrating IndexImport Thread into normal server shutdown
   - TODO: Adding the possibility to import crawl-queues, etc. from foreign peers
   - TODO: removing old import function from yacy.java and calling the new routines instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-22 13:28:04 +00:00
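The remote-proxy restructuring in 02d9af1a70 (one storage place, changes effective immediately, optional bypass for yacy->yacy and ssl traffic) could be modelled roughly like this. The sketch assumes an immutable settings object that is swapped atomically; every class and field name is hypothetical.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable settings object; field names are illustrative only.
final class RemoteProxySettings {
    final boolean enabled, useForYacy, useForSsl;
    final String host;
    final int port;

    RemoteProxySettings(boolean enabled, String host, int port, boolean useForYacy, boolean useForSsl) {
        this.enabled = enabled;
        this.host = host;
        this.port = port;
        this.useForYacy = useForYacy;
        this.useForSsl = useForSsl;
    }
}

class RemoteProxyConfigSketch {
    // The single place holding the current settings; replacing the reference makes
    // a change visible to every connection opened afterwards.
    private static final AtomicReference<RemoteProxySettings> CURRENT =
        new AtomicReference<>(new RemoteProxySettings(false, null, 0, false, false));

    static void update(RemoteProxySettings settings) { CURRENT.set(settings); }

    static Proxy proxyFor(boolean yacyToYacy, boolean ssl) {
        RemoteProxySettings s = CURRENT.get(); // one snapshot per decision
        if (!s.enabled || (yacyToYacy && !s.useForYacy) || (ssl && !s.useForSsl)) {
            return Proxy.NO_PROXY;
        }
        // createUnresolved avoids a DNS lookup when the settings are applied.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(s.host, s.port));
    }
}
```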
theli
6e3201b74d *) Bugfix in httpc.java
   - the request header was not passed properly to the underlying post function
   - the bug does not seem to have caused any side effects so far

*) Bugfix for manual peer ping functionality

*) Bugfix for the UnresolvedPattern problem if an exception occurred in a servlet.
   See: http://www.yacy-forum.de/viewtopic.php?t=1353

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-20 09:55:12 +00:00
theli
959eefbc4f *) Robots.txt parser/ppt
   cutting off comments at the end of a line
*) Adding a thread pool for the stackCrawl thread to speed up robots.txt downloads
   and double URL checks

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@882 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 04:43:07 +00:00
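Cutting off robots.txt comments at the line end (959eefbc4f) can be illustrated with a small helper. The name is hypothetical, and the sketch assumes that "#" always starts a comment, as in the usual robots.txt conventions.

```java
// Hypothetical helper: removes a trailing "# ..." comment from one robots.txt line.
class RobotsLineSketch {
    static String stripComment(String line) {
        int hash = line.indexOf('#');
        // Keep everything before the comment marker and drop surrounding whitespace.
        return (hash >= 0 ? line.substring(0, hash) : line).trim();
    }
}
```

A line consisting only of a comment becomes empty and can then be skipped by the parser.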
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission are now done in parallel to improve index
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together

*) general performance improvements
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with StringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing Vectors with ArrayLists wherever possible
   - replacing Hashtables with HashMaps wherever possible
   This was done because calls to the (synchronized) methods of Vector or Hashtable
   take 3 times longer than calls to the methods of ArrayList or HashMap.
   TODO: we should take a look at the class serverObject, which inherits from HashMap.
         Do we really need synchronization for this class?
   TODO: replace ArrayLists with LinkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days, the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file has not changed. Additionally, the ETag header is used to detect changes.

*) Crawler: better handling of unsupported MIME types and file extensions

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaSwitchboard.java

*) function minimizeUrlDB added to yacy.java
   this function checks the current urlHashDB for unused URLs
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
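The if-modified-since behaviour of the robots parser described in a2fa75e688 can be sketched as below. The 7-day threshold comes from the log; the helper names, and sending the stored ETag via If-None-Match, are assumptions (the log only says the ETag header is used to detect changes).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of conditional re-downloads for a cached robots.txt.
class RobotsRevalidationSketch {
    static final Duration MAX_AGE = Duration.ofDays(7); // threshold from the commit message

    // HTTP date format, e.g. "Wed, 05 Oct 2005 10:45:33 GMT".
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // True if a copy downloaded at fetchedAt is older than MAX_AGE at time now.
    static boolean needsRecheck(Instant fetchedAt, Instant now) {
        return Duration.between(fetchedAt, now).compareTo(MAX_AGE) > 0;
    }

    // Headers for the conditional request; a "304 Not Modified" reply means the copy is still valid.
    static Map<String, String> conditionalHeaders(Instant lastModified, String etag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (lastModified != null) headers.put("If-Modified-Since", HTTP_DATE.format(lastModified));
        if (etag != null) headers.put("If-None-Match", etag);
        return headers;
    }
}
```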
theli
90f02ea455 *) removing metainfo from serverargs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@780 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 23:05:34 +00:00
theli
1dc94e7753 *) Adding support for gzip content-encoding of http post requests
   used to transferRWIs and transferURLs.
   See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020

*) adding yacyVersion.java containing constants defining yacy versions
   that support a given feature.
   Needed to determine if a remote peer is able to decode gzip 
   content-encoded http post bodies properly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 10:30:55 +00:00
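For the gzip-encoded POST bodies in 1dc94e7753, the sender compresses the body and sets "Content-Encoding: gzip" only when the remote peer is known to decode it. In this sketch the version constant and its value are invented placeholders for what yacyVersion.java provides, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of gzip-compressed POST bodies for peers that support them.
class GzipPostSketch {
    // Placeholder: minimum peer version able to decode gzip request bodies (value invented).
    static final double MIN_VERSION_GZIP_POST = 0.4;

    static boolean peerAcceptsGzip(double peerVersion) {
        return peerVersion >= MIN_VERSION_GZIP_POST;
    }

    // Compresses a POST body; the request must then carry "Content-Encoding: gzip".
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIPOutputStream (via try-with-resources) writes the gzip trailer.
        return buffer.toByteArray();
    }
}
```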
theli
b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
   make many methods for httpHeader/Requestline parsing
   reusable for the new icap implementation
*) adding chunked input stream support
   - needed by new icap implementation
   - needed by future httpc HTTP/1.1 support 
*) httpd.java
   - moving all connection property constants to class httpHeader
   - moving readHeader function to class httpHeader
   - moving parseQuery function to class httpHeader
   - moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
   - adding new function to parse the http response line
   - adding new function to convert http headers to a string that
     can be sent to the client
   - adding a function that generates a proper url using all parsed
     connection properties
*) ICAP Support
   - yacy now supports handling of icap response modification requests
   - this feature can be used by other icap enabled proxies to contact
     yacy as icap server, and to hand over the downloaded content to yacy
     for indexing
   - functionality was successfully tested with squid 2.5Stable 10 + icap patch
   - further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
   - htcache entries that are still needed for indexing are now properly registered 
     as in use after system restart
   - extended logging: log message now shows parsing and indexing time for each sb. entry
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 21:49:47 +00:00
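The chunked input stream support in b990dc1ad1 decodes HTTP/1.1 chunked transfer coding: a hex chunk size, the chunk data, and so on until a zero-size chunk, optionally followed by trailer headers. A minimal decoder sketch (hypothetical class name, whole body buffered in memory) might look like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a decoder for "Transfer-Encoding: chunked" bodies.
class ChunkedDecoderSketch {
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine.isEmpty()) throw new IOException("missing chunk size");
            int semi = sizeLine.indexOf(';');             // ignore chunk extensions
            if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) break;                         // last-chunk marker
            byte[] chunk = in.readNBytes(size);
            if (chunk.length < size) throw new IOException("truncated chunk");
            out.write(chunk);
            readLine(in);                                 // CRLF that ends the chunk data
        }
        while (!readLine(in).isEmpty()) {
            // skip trailer header lines up to the terminating empty line
        }
        return out.toByteArray();
    }

    // Reads one line terminated by LF (a preceding CR is dropped), decoded as ISO-8859-1.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }
}
```

A streaming InputStream subclass, as the commit suggests, would return chunk bytes on demand instead of buffering them, but the parsing steps stay the same.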
theli
9444852896 *) Correcting problems when the port number was set to -1, e.g. because
   URL.getPort() returns -1 if the URL contains no explicit port

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 09:54:11 +00:00
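The port fix in 9444852896 concerns java.net.URL.getPort(), which returns -1 when the URL has no explicit port. A minimal sketch of the usual workaround, falling back to getDefaultPort(); the helper name is hypothetical:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper: never hand -1 to socket code as a port number.
class EffectivePortSketch {
    static int effectivePort(URL url) {
        int port = url.getPort();
        // getDefaultPort() is the protocol's default, e.g. 80 for http and 443 for https.
        return port == -1 ? url.getDefaultPort() : port;
    }

    static int effectivePort(String spec) {
        try {
            return effectivePort(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }
}
```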
theli
6c722706b7 *) Moving yacyDebugMode initialization to switchboard
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@660 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 10:34:34 +00:00
theli
a7256e8f4e *) Adding X-Forwarded-For Header
   See: http://www.yacy-forum.de/viewtopic.php?t=1118&highlight=xforwardedfor
*) httpc.java: Bugfix for incorrect http response status code parsing
   In some situations the status text was chopped
*) Adding a lot of file headers containing YaCy copyright and license
*) httpd.java: Adding an additional debugging http header that should help to detect
   the "binary data in browser window" bug.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 08:01:54 +00:00
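Two changes in a7256e8f4e can be illustrated briefly: keeping a reason phrase with spaces intact when parsing the response status line, and appending the client address to X-Forwarded-For. The sketch assumes the original bug came from splitting on every space, which the log does not state; the helper names are hypothetical, and the plain map used here is case-sensitive, unlike real HTTP header handling.

```java
import java.util.Map;

// Hypothetical helpers for status-line parsing and X-Forwarded-For handling.
class StatusLineSketch {
    // Splits "HTTP/1.1 404 Not Found" into at most three parts so that the reason
    // phrase keeps its spaces (a plain split(" ") would leave only "Not").
    static String[] parseStatusLine(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) throw new IllegalArgumentException("bad status line: " + line);
        String reason = parts.length == 3 ? parts[2] : "";
        return new String[] { parts[0], parts[1], reason };
    }

    // Appends the client address, keeping addresses added by earlier proxies.
    static void addForwardedFor(Map<String, String> headers, String clientIp) {
        headers.merge("X-Forwarded-For", clientIp, (old, ip) -> old + ", " + ip);
    }
}
```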
theli
a20814291f *) Bugfix for "Race condition between httpc and switchboard"
See: http://www.yacy-forum.de/viewtopic.php?p=9036

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@644 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-03 13:40:32 +00:00
theli
286853fd39 *) Bugfix for "YaCy hangs on shutdown" bug
See: http://www.yacy-forum.de/viewtopic.php?p=8997

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@643 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-03 07:40:51 +00:00
allo
022c1ab179 performance fix for yacyDebugMode and useYacyReferer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@638 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 08:21:33 +00:00
allo
286442fbc5 do not use YaCy sites as Referer if useYacyReferer = false
http://www.yacy-forum.de/viewtopic.php?p=8896#8896


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@637 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 06:26:38 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API log levels
   - please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
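The renamed logger methods in 4fd5b95b1f follow java.util.logging levels. A hypothetical wrapper showing the intended mapping (logFine to Level.FINE, logSevere to Level.SEVERE) could look like this; it is not YaCy's actual Logger class.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: method names mirror java.util.logging level names.
class LogSketch {
    private final Logger logger;

    LogSketch(String name) { this.logger = Logger.getLogger(name); }

    void logSevere(String msg) { logger.log(Level.SEVERE, msg); } // replaces logFailure/logError
    void logFine(String msg) { logger.log(Level.FINE, msg); }     // replaces logDebug

    Logger logger() { return logger; }
}
```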
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API log levels
   - please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
allo
60074b4301 more DebugMode (60-second timeout)
needed for YaCy with Tor.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-22 14:02:34 +00:00
theli
7d8af6b41a *) Bugfix for heise newsletter problem
See: http://www.yacy-forum.de/viewtopic.php?p=7836#7836

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@559 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-19 08:23:12 +00:00
theli
4335bfe822 *) Using timeout also to establish a connection
See: http://www.yacy-forum.de/viewtopic.php?t=979&highlight=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-19 06:47:34 +00:00
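Using the timeout for connection establishment as well (4335bfe822) corresponds to Socket.connect(address, timeout) in the standard Java API. A minimal sketch, with a hypothetical helper name, applying one timeout to both connecting and reading:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: one timeout for connect and for subsequent reads.
class TimeoutSocketSketch {
    static Socket open(String host, int port, int timeoutMillis) throws IOException {
        Socket s = new Socket(); // unconnected, so connect() can take a timeout
        try {
            s.connect(new InetSocketAddress(host, port), timeoutMillis);
            s.setSoTimeout(timeoutMillis); // blocking reads fail after this long
            return s;
        } catch (IOException e) {
            s.close();
            throw e;
        }
    }
}
```

Passing 60000 would match the 60-second timeout mentioned for debugMode.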
orbiter
ba0a486328 moved printStackTrace() to logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 23:35:18 +00:00
jerri
fa154e6ce5 Added some more javadoc to httpc.java. Moved the inner class response to the
end of the class definition, as this makes the outer class easier to read.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-10 21:50:17 +00:00
theli
b32e7c516c
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-09 09:07:19 +00:00
jerri
09193023fe Began with some documentation for the httpc class. The code of the httpc class
looks very disordered: inner classes and methods are mixed together. Maybe the code
should be cleaned up a little bit?


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-07 16:27:38 +00:00
jerri
7792e5ae9b Added a build target to the ant configuration to create the
yacy javadoc documentation in doc/api. Just run "ant create-doc" and point your
favourite browser to doc/api/index.html. As most of the classes are not
documented yet, this mainly gives an overview of all classes.
Hopefully this helps stimulate the writing of in-source
javadoc documentation.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-07 15:16:33 +00:00
orbiter
2d8557cb10 minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 02:02:39 +00:00
rramthun
eacff63eda Typos...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 16:09:19 +00:00
orbiter
40da910f41 bugfixes and automatic news-cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 16:03:35 +00:00
orbiter
2f0d7ea8d3 removed htcache states (superfluous now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-09 00:33:34 +00:00
theli
0b95c9c434 *) Bugfix for Thread.getID() usage + PeerPing-Shutdown Deadlock
See:
   - http://www.yacy-forum.de/viewtopic.php?p=4937
   - http://www.yacy-forum.de/viewtopic.php?p=4939

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@390 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-07 21:00:11 +00:00
theli
13eeaa08f3 *) httpc.java:
   - Now it's possible to interrupt pending httpc actions on server shutdown
   - this is possible because of a newly introduced registration mechanism for
     open sockets
*) yacyCore.java
   - blocking peerPing threads can now be interrupted on server shutdown
*) serverCore.java
   - restructuring shutdown code 
*) error.html
   - port number is now set correctly if port forwarding was enabled


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-07 13:58:54 +00:00
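The socket registration mechanism in 13eeaa08f3 relies on the fact that closing a socket makes any thread blocked in I/O on it fail with a SocketException. A sketch of such a registry follows; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of open sockets that can all be closed on shutdown.
class SocketRegistrySketch {
    private static final Set<Socket> OPEN = ConcurrentHashMap.newKeySet();

    static void register(Socket s) { OPEN.add(s); }
    static void unregister(Socket s) { OPEN.remove(s); }

    // Closing each socket unblocks threads waiting in read() or write() on it.
    static void closeAll() {
        for (Socket s : OPEN) { // the concurrent set tolerates removal during iteration
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do during shutdown
            }
            OPEN.remove(s);
        }
    }
}
```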
orbiter
858cd94299 replaced indexing ram-queue with file-based stack-queue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-06 14:48:41 +00:00
theli
57c30f1d78 *) bugfix for usage of httpc without gzip content encoding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:25:25 +00:00
theli
0e2c33ee55 *) Network.html/Network.java:
- Adding function to manually force peer ping to remote yacy peer
  See: Network.html?page=4
- for debugging purpose only!

*) serverAbstractThread.java:
- Adding the possibility to notify a server thread via a synchronization object
- this is needed e.g. by the port forwarding feature to send a notification
  to the peerPing thread to redo the peer ping with the new ip/port settings (Settings_p.html)

*) Port Forwarding Feature (it should work now)
- adding a serverThread which is responsible for detecting broken port forwarding
  connections and reconnecting if needed
- serverCore.java: moving port forwarding initialization into a separate function
- adding the possibility to configure the ssh port
- moving configuration section on the gui into a separate fieldset
- hello.java: only trying to do a second connect to the clientIp address during
  peer handshake if either remote port forwarding is not enabled locally or
  the clientIP is not equal to any local ip

*) httpdFileHandler.java:
- print out a more verbose error message

*) httpc.java
- allowing to deactivate content encoding from outside

*) plasmaCrawlWorker.java
- the crawler worker now tries to refetch the content of a website without
  gzip content encoding if a gzip error occurred

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:09:48 +00:00
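The notification mechanism in 0e2c33ee55 (waking a server thread via a synchronization object so that, e.g., the peerPing thread pings again after the port-forwarding settings change) can be sketched with wait/notifyAll. The flag keeps a notification from being lost if it arrives before the thread starts waiting; all names are hypothetical.

```java
// Hypothetical sketch: a thread idles for a while but can be woken early.
class NotifiableSketch {
    private final Object lock = new Object();
    private boolean notified = false; // remembers a notification sent before waiting

    // Called by another thread, e.g. after the port-forwarding settings changed.
    void notifyThread() {
        synchronized (lock) {
            notified = true;
            lock.notifyAll();
        }
    }

    // Waits up to timeoutMillis; returns true if woken by notifyThread(), false on timeout.
    boolean awaitNotification(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!notified) { // loop guards against spurious wake-ups
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) return false;
                lock.wait(remainingMillis);
            }
            notified = false; // consume the notification
            return true;
        }
    }
}
```

A server thread would call awaitNotification(idleTime) between jobs, so a notification simply shortens its idle phase.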
theli
d53b2393e5 *) autoconfig.java: ip address was not reported correctly when port-forwarding is on
*) hello.java: reportedip may be empty at peer startup
*) httpc.java: adding method to determine if the connection was already closed or is broken
*) httpdProxyHandler.java: trying to do better error handling
*) server/serverCore.java
- setting myseed ip-address and port correctly if port-forwarding is on
- doing a more failsafe close and adding some debugging output
*) yacyClient.java: adding some logging statements to allow better detection of the
   "degraded to senior" bug
*) yacyCore.java: restructuring publishMySeed
   (@Orbiter: please take a look)
- to avoid busy waiting
- to allow a graceful exit on server shutdown
- new seed count was not calculated correctly in the previous version
*) yacySeedDB.java: host ip and port were not initialized correctly if port-forwarding
   was activated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 11:00:26 +00:00
orbiter
3be98f194d tried to find the socket bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@300 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-18 01:48:11 +00:00
orbiter
5d06ded005 enhanced html parser speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-17 01:26:51 +00:00
theli
06b0db2cac *) adding toString method to
   - httpc
   - response
*) simplifying gzip encoding
*) remembering http version of contacted server
   (needed for later support of keep-alive by httpc)
*) moving function shallTransportZipped to httpd.java
   because this function is used multiple times

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 09:56:41 +00:00
theli
0e1d9e9722 *) shrinking the httpc line buffer when httpc is returned to the pool. This is done to free memory
*) Making Seed-Upload configuration more verbose.
*) Some Changes in SOAP Search API (not finished yet).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-23 10:10:51 +00:00
theli
4dd387aae9 *) moving constants (see last commit) to proper httpHeader class
*) migrating fileHandler + proxyHandler to use constants instead of hardcoded values

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@114 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-13 09:14:12 +00:00