Commit Graph

56 Commits

Author SHA1 Message Date
borg-0300
a1777788a5 small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-07 15:04:03 +00:00
borg-0300
64acb46a91 cleaned, finals, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 13:16:53 +00:00
borg-0300
52168fab9b cleaned, finals, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 13:14:18 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   Various checks, such as the blacklist check and the robots.txt disallow check, are now
   done by a separate thread to unburden the indexer thread(s).
   TODO: maybe we have to introduce a thread pool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission is done in parallel now to improve index 
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because calls to vector or hashtable functions
   take 3 times longer than calls to functions of arraylists or hashmaps.
   TODO: we should take a look at the class serverObject, which inherits from hashmap.
         Do we really need synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
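The robots.txt revalidation described in the commit above can be sketched roughly as follows. This is a minimal illustration in modern Java, not the YaCy implementation: the class name `RobotsRevalidation`, both method names, and the use of `If-None-Match` to carry a stored ETag are assumptions made for the example. Only the 7-day threshold and the use of if-modified-since come from the commit message.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of YaCy: decides when a cached robots.txt
// should be revalidated and which conditional request headers to send.
class RobotsRevalidation {
    // Threshold taken from the commit message: 7 days.
    static final Duration MAX_AGE = Duration.ofDays(7);

    // HTTP date format with a two-digit day, always expressed in GMT.
    static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
            .withZone(ZoneOffset.UTC);

    // True if the cached copy is older than MAX_AGE.
    static boolean needsRevalidation(Instant downloadedAt, Instant now) {
        return Duration.between(downloadedAt, now).compareTo(MAX_AGE) > 0;
    }

    // Builds the conditional headers; either argument may be null if unknown.
    static Map<String, String> conditionalHeaders(Instant lastModified, String etag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (lastModified != null) {
            headers.put("If-Modified-Since", HTTP_DATE.format(lastModified));
        }
        if (etag != null) {
            // Assumption: the stored ETag is sent back as If-None-Match.
            headers.put("If-None-Match", etag);
        }
        return headers;
    }

    public static void main(String[] args) {
        Instant cachedAt = Instant.parse("2005-09-27T00:00:00Z");
        Instant now = Instant.parse("2005-10-05T10:45:33Z");
        if (needsRevalidation(cachedAt, now)) {
            System.out.println(conditionalHeaders(cachedAt, "\"abc\""));
        }
    }
}
```

A server that answers such a request with status 304 signals that the cached robots.txt is still valid, so the body does not need to be downloaded again.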
borg-0300
a9c466ef21 cleaned, finals, StringBuffer, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-04 17:51:32 +00:00
orbiter
0c3a20d44f more and changed logging for better understanding of the outOfMemory bug and others
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@846 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-04 00:28:59 +00:00
orbiter
7fc822a59b changed handling of time-zones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-27 16:28:55 +00:00
(no author)
1aa79f5bb5 cleaned;
Properties;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-24 11:58:17 +00:00
theli
c42a543bc3 *) Adding peername to logmessage when receiving URLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@781 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 23:40:03 +00:00
theli
1dc94e7753 *) Adding support for gzip content-encoding of http post requests
used to transferRWIs and transferURLs.
   See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020

*) adding yacyVersion.java containing constants defining yacy versions
   that support a given feature.
   Needed to determine if a remote peer is able to decode gzip 
   content-encoded http post bodies properly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 10:30:55 +00:00
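The idea behind the commit above, sending a gzip-encoded POST body only to peers that can decode it, might look like the following sketch. It is an illustration under stated assumptions, not YaCy code: the class `GzipPostBody`, its methods, and the boolean `peerSupportsGzip` flag (standing in for the version check that yacyVersion.java enables) are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of YaCy.
class GzipPostBody {
    // Compresses a request body with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompresses a gzip body, as the receiving peer would.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Assumption: the caller has already compared the remote peer's version
    // against the minimum version that understands gzip POST bodies.
    static byte[] chooseBody(byte[] raw, boolean peerSupportsGzip) {
        return peerSupportsGzip ? gzip(raw) : raw;
    }
}
```

When the compressed variant is chosen, the sender would also set the `Content-Encoding: gzip` request header so the receiver knows to decompress the body.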
orbiter
96a5b6e8fb removed yacy peer types from serverSwitch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 23:15:33 +00:00
borg-0300
11e175630b StringBuffers, finals;
cleaned;
Properties;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-18 15:23:01 +00:00
theli
a2fec3bb1c *) Bugfix for " java.lang.NullPointerException at hello.respond(hello.java:167)"
See: http://www.yacy-forum.de/viewtopic.php?p=9471

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 20:07:57 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
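The renaming in the commit above lines the logger's method names up with the level names of the Java Logging API (`java.util.logging.Level.FINE` and `Level.SEVERE`). A minimal sketch of such a wrapper is shown below; the class `LevelNamedLog` and its `demo` method are hypothetical and only illustrate the naming scheme, they are not the YaCy logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical wrapper, not part of YaCy: method names mirror the
// java.util.logging level they delegate to.
class LevelNamedLog {
    private final Logger logger;

    LevelNamedLog(Logger logger) {
        this.logger = logger;
    }

    // Replaces the older logDebug name.
    void logFine(String message) {
        logger.log(Level.FINE, message);
    }

    // Replaces the older logFailure and logError names.
    void logSevere(String message) {
        logger.log(Level.SEVERE, message);
    }

    // Logs one message per method and returns the levels a handler received.
    static List<Level> demo() {
        Logger logger = Logger.getAnonymousLogger();
        logger.setUseParentHandlers(false); // keep the demo off the console
        logger.setLevel(Level.ALL);         // let FINE records through
        final List<Level> seen = new ArrayList<>();
        logger.addHandler(new Handler() {
            @Override public void publish(LogRecord record) { seen.add(record.getLevel()); }
            @Override public void flush() { }
            @Override public void close() { }
        });
        LevelNamedLog log = new LevelNamedLog(logger);
        log.logFine("debug detail");
        log.logSevere("something failed");
        return seen;
    }
}
```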
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
theli
a812fb86cc *) Port Forwarding Feature did not detect broken connections properly.
Therefore a test request was added to the isConnected function to detect broken connections
   and to keep open connections alive


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:39:10 +00:00
orbiter
c47bb1182d bugfix for assortment initialization error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 11:43:14 +00:00
orbiter
25f632dbd9 more DHT bugfixes and better logging of DHT effects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-16 00:31:15 +00:00
orbiter
cd10370992 several bugfixes and dht selection / logging improvement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 00:57:30 +00:00
theli
865b9490a2 *) Making DHT Transfer while Crawling configurable
See: http://www.yacy-forum.de/viewtopic.php?p=6904

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-06 11:40:12 +00:00
theli
0610e83468 *) Bugfix. recipient peer was accidentally displayed as source peer of a url transmission.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-05 21:37:59 +00:00
orbiter
bb3e897baf more minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 13:43:55 +00:00
orbiter
2d8557cb10 minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-03 02:02:39 +00:00
rramthun
eacff63eda Typos...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 16:09:19 +00:00
orbiter
083c8ddc69 new alert symbols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-02 00:16:19 +00:00
orbiter
e24dbde217 better logging for WRONG seed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-31 11:11:29 +00:00
rramthun
b99205e445 Translation, spelling...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-28 13:30:11 +00:00
orbiter
a2cf76ea7c bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@413 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-20 01:36:28 +00:00
rramthun
0f11399d16 Some corrections...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-19 16:07:13 +00:00
orbiter
9f505af7aa preparations for bulk remote crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-19 00:26:31 +00:00
orbiter
19dbed7cc8 code clean-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@401 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 15:09:35 +00:00
orbiter
40036ba69c fixed dht transmission; added url-blacklist blocking also for remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-12 00:07:09 +00:00
orbiter
311e627363 blocking of blacklisted urls in indexReceive and small changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-11 15:36:10 +00:00
theli
0e2c33ee55 *) Network.html/Network.java:
- Adding function to manually force peer ping to remote yacy peer
  See: Network.html?page=4
- for debugging purpose only!

*) serverAbstractThread.java:
- Adding possibility to notify a server thread via a synchronization object
- this is needed e.g. by the port forwarding feature to send a notification
  to the peerPing thread to redo peer-ping with the new ip/port Settings_p.html

*) Port Forwarding Feature (it should work now)
- adding a serverThread which is responsible to detect broken port forwarding 
  connections and to do reconnect if needed
- serverCore.java: moving port forwarding initialization into a separate function
- adding possibility to configure the ssh port
- moving configuration section on the gui into a separate fieldset
- hello.java: only trying to do a second connect to the clientIp address during
  peer handshake if either remote port forwarding is not enabled locally or
  the clientIP is not equal to any local ip

*) httpdFileHandler.java:
- print out a more verbose error message

*) httpc.java
- allowing to deactivate content encoding from outside


 

*) plasmaCrawlWorker.java
- the crawler worker now tries to refetch the content of a website without
  gzip content encoding if a gzip error occurred



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:09:48 +00:00
theli
d50ad1521a *) correcting logging statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-29 11:08:47 +00:00
theli
9d8c66fb5e *) adding possibility to forward received yacy-messages (htroot/yacy/message.java)
via a command-line email program (e.g. sendmail) to a configured email address
   - the configuration dialog is reachable via Settings_p.html#messageForwarding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-29 09:50:48 +00:00
theli
08e4334c1d *) Status.java: showing amount of time since last upload of seed-file
*) hello.java: adding additional output for principal-downgrade bug
*) httpd.java, httpdFileHandler.java, httpdProxyHandler.java: improved error handling
*) yacyCore.java: trying to fix principal-downgrade bug
*) yacySeed.java: adding some constants

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 11:27:31 +00:00
theli
d53b2393e5 *) autoconfig.java: ip address was not reported correctly when port-forwarding is on
*) hello.java: reportedip may be empty at peer startup
*) httpc.java: adding method to determine if the connection was already closed or is broken
*) httpdProxyHandler.java: trying to do better error handling
*) server/serverCore.java
- setting myseed ip-address and port correctly if port-forwarding is on
- doing a more failsafe close and adding some debugging output
*) yacyClient.java: adding some logging statements to allow a better detection of 
   "degraded to senior"-bug
*) yacyCore.java: restructuring publishMySeed
   (@Orbiter: please take a look)
- to avoid busy waiting
- to allow a graceful shutdown on server shutdown
- new seed count was not calculated correctly in the previous version
*) yacySeedDB.java: host ip and port was not initialized correctly if port-forwarding
   was activated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 11:00:26 +00:00
orbiter
5a490aa065 fixed html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 21:49:56 +00:00
orbiter
38747857c2 correction of correction to port-forwarding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 11:49:54 +00:00
orbiter
dbda6e1e85 corrections to port forwarding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 11:40:36 +00:00
theli
d8cb3324a9 *) property "mytime" was not set correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 11:22:20 +00:00
theli
9a98988c3c *) Bugfix for SSL/NIO Bug
See: http://www.yacy-forum.de/viewtopic.php?t=516
   - removing NIO from server/serverCore.java because of massive problems
     with socket close issues
*) Adding support for remote port forwarding via ssh
   @Orbiter: Please take a look into
   - hello.java
   - server/serverCore.java.publicIP()
   - yacy/yacyClient.java.publishMySeed(...)
*) Making startup loading of additional content parsers more failsafe


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 07:28:07 +00:00
orbiter
a1ffc27041 preparations for image/movie/music indexing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 00:31:13 +00:00
theli
a2e5018427 *) adding missing calls for function close() to avoid "too many open files" bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-13 09:56:41 +00:00
orbiter
878ff0ae7b corrections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-09 16:39:25 +00:00
theli
52cf732fad *) correcting "seed-ftp-upload/Nothing changed" bug:
See: http://www.yacy-forum.de/viewtopic.php?p=3986#3986

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-08 09:54:52 +00:00
orbiter
e89ded9e41 bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-31 22:12:43 +00:00
orbiter
e26ac60c3e modified assortment data structures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@148 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-22 13:27:54 +00:00
theli
361f05978d Multiple updates regarding the yacy seedUpload facility,
optional content parsers, thread pool configuration ...

Please help me test whether everything works correctly.

*) Migration of yacy seedUpload functionality
See: http://www.yacy-forum.de/viewtopic.php?t=256
- new uploaders can now be easily introduced because of a new modular uploader system
- default uploaders are: none, file, ftp
- adding optional uploader for scp
- each uploader provides its own configuration file that will be 
  included into the settings page using the new template include feature
- Each uploader can define its libx dependencies. If not all needed libs are
  available, the uploader is deactivated automatically.

*) Migration of optional parsers
See: http://www.yacy-forum.de/viewtopic.php?t=198
- Parsers can now also define their libx dependencies
- adding parser for bzip compressed content
- adding parser for gzip compressed content
- adding parser for zip files
- adding parser for tar files
- adding parser to detect the mime-type of a file
  this is needed by the bzip/gzip Parser.java
- adding parser for rtf files
- removing extra configuration file yacy.parser
  the list of enabled parsers is now stored in the main config file

*) Adding configuration option in the performance dialog to configure
See: http://www.yacy-forum.de/viewtopic.php?t=267
- maxActive / maxIdle / minIdle values for httpd-session-threadpool
- maxActive / maxIdle / minIdle values for crawler-threadpool

*) Changing Crawling Filter behaviour
See: http://www.yacy-forum.de/viewtopic.php?p=2631

*) Replacing some hardcoded strings with the proper constants of the httpHeader class

*) Adding new libs to libx directory. These libs are
- needed by new content parsers
- needed by new optional seed uploader
- needed by SOAP API (which will be committed later)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-05-17 08:25:04 +00:00