yacy_search_server/htroot
theli a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   Various checks, like the blacklist check or the robots.txt disallow check, are now
   done by a separate thread to unburden the indexer thread(s).
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads.
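   A minimal Java sketch of the idea, assuming one worker thread fed by a queue. The indexer only enqueues URLs, and the worker applies the checks before putting accepted URLs on the crawl stack. The class name `CrawlStacker`, the host blacklist set, and the disallowed path prefixes are hypothetical stand-ins for YaCy's real blacklist and robots.txt handling.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one worker thread checks queued crawl URLs so that
// the indexer thread only has to enqueue them.
final class CrawlStacker {
    // Sentinel object, compared by identity, that tells the worker to stop.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
    // Written only by the worker; read by close() after join().
    private final List<String> crawlStack = new ArrayList<>();
    private final Set<String> blacklistedHosts;    // stand-in for the blacklist
    private final Set<String> disallowedPrefixes;  // stand-in for robots.txt rules
    private final Thread worker;

    CrawlStacker(Set<String> blacklistedHosts, Set<String> disallowedPrefixes) {
        this.blacklistedHosts = blacklistedHosts;
        this.disallowedPrefixes = disallowedPrefixes;
        this.worker = new Thread(this::drain, "stackCrawl");
        this.worker.start();
    }

    // Called by the indexer thread; returns immediately.
    void enqueue(String url) {
        incoming.add(url);
    }

    // Stops the worker after all URLs queued so far are checked and
    // returns the accepted ones in queue order.
    List<String> close() {
        incoming.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(crawlStack);
    }

    private void drain() {
        try {
            for (String url = incoming.take(); url != STOP; url = incoming.take()) {
                if (accept(url)) crawlStack.add(url);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The checks that no longer run on the indexer thread.
    private boolean accept(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost();
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (blacklistedHosts.contains(host)) return false;
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

   If the TODO above proves necessary, a pool from `java.util.concurrent.ExecutorService` could replace the single worker thread.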

*) improved index transfer
   Index selection and transmission are now done in parallel to improve index
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.
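   One way to overlap selection and transmission is a producer/consumer pipeline. This Java sketch assumes the words to transfer are already available as a list and that transmission is a callback. In YaCy, both steps involve database reads and HTTP requests to the target peer, which are not modeled here. The bounded queue of capacity 1 lets the next chunk be selected while the current one is being sent. `PipelinedIndexTransfer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch: a background thread selects the next chunk while the
// calling thread transmits the current one.
final class PipelinedIndexTransfer {
    // Sentinel chunk marking the end of the selection, compared by identity.
    private static final List<String> END = new ArrayList<>();

    static void transfer(List<String> words, int chunkSize, Consumer<List<String>> transmit) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        // Capacity 1: at most one selected chunk waits while another is being sent.
        BlockingQueue<List<String>> ready = new ArrayBlockingQueue<>(1);
        Thread selector = new Thread(() -> {
            try {
                for (int i = 0; i < words.size(); i += chunkSize) {
                    int end = Math.min(i + chunkSize, words.size());
                    ready.put(new ArrayList<>(words.subList(i, end)));  // copy the chunk
                }
                ready.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "indexSelection");
        selector.start();
        try {
            for (List<String> chunk = ready.take(); chunk != END; chunk = ready.take()) {
                transmit.accept(chunk);  // e.g. an HTTP POST to the target peer
            }
            selector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // No-op if the selector already finished; unblocks it if transmit failed.
            selector.interrupt();
        }
    }
}
```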

*) gzip-encoded POST requests
   It is now configurable whether a gzip-encoded POST request should be sent on
   index transfer/distribution.
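   This is a sketch of how a gzip-encoded POST body can be produced with the JDK's `java.util.zip` classes. The `useGzip` flag stands in for the configuration setting, whose real property name is not given here. The sketch also assumes that the receiving peer can decode request bodies sent with `Content-Encoding: gzip`. `GzipPost` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipPost {
    // Compresses a request body; the gzip trailer is written when the stream closes.
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // What the receiving side would do with a Content-Encoding: gzip body.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the body as a POST; useGzip stands in for the configuration switch.
    static int post(URL url, byte[] body, boolean useGzip) throws IOException {
        byte[] payload = useGzip ? gzip(body) : body;
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        if (useGzip) conn.setRequestProperty("Content-Encoding", "gzip");
        conn.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        return conn.getResponseCode();
    }
}
```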

*) storage peer (very experimental and not optimized yet)
   It is now possible to send the result of the YaCy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the YaCy config file.
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer,
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together.
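   The fallback rule above (send to the storage peer, store locally if that fails) could look like this. `RemoteIndexPeer`, `IndexStorage`, and the word-to-URL-hash map are hypothetical simplifications, not YaCy's real index classes. Only the property name `storagePeerHash` comes from the text. As in the current code, the transfer here is still synchronous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transfer interface: returns true if the remote peer accepted the words.
interface RemoteIndexPeer {
    boolean store(String peerHash, Map<String, List<String>> wordToUrlHashes);
}

// Hypothetical sketch of "send to the storage peer, else store locally".
final class IndexStorage {
    private final String storagePeerHash;  // value of "storagePeerHash"; null or empty = local only
    private final RemoteIndexPeer remote;
    final Map<String, List<String>> localIndex = new HashMap<>();

    IndexStorage(String storagePeerHash, RemoteIndexPeer remote) {
        this.storagePeerHash = storagePeerHash;
        this.remote = remote;
    }

    // Returns true if the words went to the storage peer, false if they were stored locally.
    boolean store(Map<String, List<String>> words) {
        if (storagePeerHash != null && !storagePeerHash.isEmpty()) {
            try {
                if (remote.store(storagePeerHash, words)) return true;
            } catch (RuntimeException e) {
                // A failed transfer is handled like a refused one: fall through to local storage.
            }
        }
        for (Map.Entry<String, List<String>> e : words.entrySet()) {
            localIndex.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return false;
    }
}
```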


*) general performance improvements
   - better memory cleanup after HTTP request processing has finished
   - replacing some string concatenations with StringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing Vectors with ArrayLists wherever possible
   - replacing Hashtables with HashMaps wherever possible
   This was done because calls to the (synchronized) methods of Vector or Hashtable
   take 3 times longer than calls to methods of ArrayList or HashMap.
   TODO: we should take a look at the class serverObject, which inherits from HashMap.
         Do we really need synchronization for this class?
   TODO: replace ArrayLists with LinkedLists if random access to the list elements is not needed
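   For illustration, here is what the replaced idioms look like in current Java. The commit used `StringBuffer`; `StringBuilder` (Java 5 and later) is its unsynchronized counterpart. The class and method names are made up for the example, and the snippet does not measure the 3x figure above, which is the author's observation. Where a map really is shared between threads, wrapping it with `Collections.synchronizedMap` keeps the locking explicit. That is one possible answer to the `serverObject` TODO.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class showing the unsynchronized replacements.
final class UnsynchronizedCollections {
    // HashMap instead of Hashtable: no lock is taken on every call.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    // ArrayList instead of Vector, StringBuilder instead of '+' in a loop.
    static String joinDistinctSorted(List<String> words, String sep) {
        List<String> distinct = new ArrayList<>(countWords(words).keySet());
        Collections.sort(distinct);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < distinct.size(); i++) {
            if (i > 0) sb.append(sep);
            sb.append(distinct.get(i));
        }
        return sb.toString();
    }

    // If a map really must be shared between threads, make the locking explicit.
    static <K, V> Map<K, V> sharedMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }
}
```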

*) Robots parser now supports If-Modified-Since downloads
   If the downloaded robots.txt file is older than 7 days, the robots parser tries to
   download robots.txt again with the If-Modified-Since header to avoid unnecessary downloads
   if the file has not changed. Additionally, the ETag header is used to detect changes.
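   This is a sketch of the conditional re-download using `HttpURLConnection`. The 7-day threshold comes from the text. Sending the stored ETag back as `If-None-Match` is the standard HTTP way to use an ETag. However, the commit does not say exactly how the robots parser applies it, so treat that part as an assumption. `RobotsRevalidation` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;

final class RobotsRevalidation {
    static final long MAX_AGE_MILLIS = TimeUnit.DAYS.toMillis(7);

    // True if the cached robots.txt is older than 7 days and should be revalidated.
    static boolean isStale(long downloadedAtMillis, long nowMillis) {
        return nowMillis - downloadedAtMillis > MAX_AGE_MILLIS;
    }

    // Returns the new robots.txt bytes, or null if the server answered 304 Not Modified.
    static byte[] refetch(URL robotsUrl, long lastModifiedMillis, String etag) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) robotsUrl.openConnection();
        conn.setIfModifiedSince(lastModifiedMillis);  // sends the If-Modified-Since header
        if (etag != null) conn.setRequestProperty("If-None-Match", etag);  // assumption, see above
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            conn.disconnect();
            return null;  // keep the cached copy
        }
        try (InputStream in = conn.getInputStream()) {
            return in.readAllBytes();
        }
    }
}
```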

*) Crawler: better handling of unsupported MIME types and file extensions
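   This minimal sketch implements a two-stage filter. It runs a cheap file-extension check on the URL before downloading, then a MIME-type check once the `Content-Type` header is known. The supported and unsupported lists below are hypothetical examples, not the crawler's actual configuration. `CrawlFilter` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Set;

final class CrawlFilter {
    // Hypothetical lists; the crawler's real supported types are not given here.
    private static final Set<String> SUPPORTED_MIME =
            Set.of("text/html", "text/plain", "application/xhtml+xml");
    private static final Set<String> UNSUPPORTED_EXT = Set.of("zip", "exe", "iso", "mp3", "avi");

    // Before download: reject by the extension of the last path segment.
    static boolean extensionAllowed(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash) return true;  // no extension in the last segment
        return !UNSUPPORTED_EXT.contains(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // After the response headers arrive: ignore parameters such as "; charset=...".
    static boolean mimeAllowed(String contentType) {
        if (contentType == null) return false;
        int semi = contentType.indexOf(';');
        String base = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        return SUPPORTED_MIME.contains(base);
    }
}
```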

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaSwitchboard.java
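   The usual fix for an entity that is not closed on every path is to close it in a `finally` block or, in current Java, with try-with-resources. `WordIndexEntityStub` is a hypothetical stand-in for `plasmaWordIndexEntity`, and `QueryExample` is a hypothetical caller. The real class's methods are not shown here.

```java
import java.util.List;

// Hypothetical stand-in for plasmaWordIndexEntity: holds an open resource
// and must be closed after use.
final class WordIndexEntityStub implements AutoCloseable {
    static int openCount = 0;  // demo only: entities not yet closed (not thread-safe)

    private final List<String> urlHashes;
    private boolean closed = false;

    WordIndexEntityStub(List<String> urlHashes) {
        this.urlHashes = urlHashes;
        openCount++;
    }

    int size() {
        if (closed) throw new IllegalStateException("entity already closed");
        return urlHashes.size();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openCount--;
        }
    }
}

final class QueryExample {
    // try-with-resources closes the entity on every exit path,
    // including the exception thrown when failDuringProcessing is set.
    static int countResults(List<String> urlHashes, boolean failDuringProcessing) {
        try (WordIndexEntityStub entity = new WordIndexEntityStub(urlHashes)) {
            if (failDuringProcessing) throw new IllegalStateException("simulated failure");
            return entity.size();
        }
    }
}
```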

*) function minimizeUrlDB added to yacy.java
   This function tests the current urlHashDB for unused URLs.
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!
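   Conceptually, finding unused URLs is a set difference: the URL hashes stored in the URL database minus those referenced from the word index. This sketch models both stores as in-memory collections, which is a hypothetical simplification. `UrlDbMinimizer` is a hypothetical name. Note that the sketch has to visit every word entry to collect the referenced hashes.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class UrlDbMinimizer {
    // Returns the URL hashes stored in the URL DB that no word in the index references.
    static Set<String> unusedUrlHashes(Set<String> urlHashDB,
                                       Map<String, ? extends Collection<String>> wordIndex) {
        Set<String> referenced = new HashSet<>();
        for (Collection<String> refs : wordIndex.values()) referenced.addAll(refs);
        Set<String> unused = new TreeSet<>(urlHashDB);  // sorted for stable output
        unused.removeAll(referenced);
        return unused;
    }
}
```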

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
env passive red to yellow changed 2005-10-01 14:24:59 +00:00
htdocsdefault removed yacy peer types from serverSwitch 2005-09-20 23:15:33 +00:00
proxymsg *) Printout date and system name on proxy error page 2005-08-23 11:32:36 +00:00
yacy *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
autoconfig.java *) Extending proxy autoconfig to avoid problems with multiple local network cards 2005-09-30 07:47:28 +00:00
autoconfig.pac *) Extending proxy autoconfig to avoid problems with multiple local network cards 2005-09-30 07:47:28 +00:00
Blacklist_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Blacklist_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
CacheAdmin_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
CacheAdmin_p.java fixed problem with htcache path 2005-09-29 00:24:09 +00:00
CacheResource_p.html initial load with yacy 0.36 2005-04-07 19:19:42 +00:00
CacheResource_p.java *) proxyCache, proxyCacheSize can be changed under 'Proxy Indexing' 2005-08-30 12:50:30 +00:00
Config_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Config_p.java replace created with contributed 2005-09-12 22:20:37 +00:00
CookieMonitorIncoming_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
CookieMonitorIncoming_p.java Fixed spelling mistakes 2005-05-12 17:50:45 +00:00
CookieMonitorOutgoing_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
CookieMonitorOutgoing_p.java Fixed spelling mistakes 2005-05-12 17:50:45 +00:00
EditProfile_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
EditProfile_p.java added news-creation at crawl start 2005-07-30 11:57:19 +00:00
favicon.ico many bug-fixes 2005-04-30 01:22:46 +00:00
Help.html Corrected regex-tutorial... 2005-08-13 10:21:22 +00:00
imagetest.java implemented image servlets. The imagetest will stay there only for a limited time. Now images can be generated on the fly from servlets 2005-10-05 08:40:20 +00:00
index.html bugfix for assortment initialization error 2005-08-16 11:43:14 +00:00
index.java enhancements to dht selection, search and search presentation 2005-08-15 01:12:25 +00:00
index.rss activated RWI distribution to DHT for senior peers (default redundancy 3), necessary now for network growth 2005-07-27 12:51:00 +00:00
index.soap *) Adding missing Template for SOAP API 2005-08-12 06:19:59 +00:00
index.xsl *) adding xsl stylesheet that can be used by browsers to format the rss search result in a user friendly format 2005-06-09 09:36:44 +00:00
IndexControl_p.html *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
IndexControl_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
IndexCreate_p.html *) Bugfix for "Error with request: GET http://localpeer:80/IndexDelete_p.ht" 2005-09-07 13:27:38 +00:00
IndexCreate_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
IndexCreateIndexingQueue_p.html *) Bugfix of Delete-Indexqueue-Entry, Clear Indexing Queue functionality 2005-09-06 08:03:04 +00:00
IndexCreateIndexingQueue_p.java *) Trying to get rid of the possibility of exploits in IndexCreate* through HTML and JavaScript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it. 2005-09-19 20:36:29 +00:00
IndexCreateLoaderQueue_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
IndexCreateLoaderQueue_p.java *) Trying to get rid of the possibility of exploits in IndexCreate* through HTML and JavaScript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it. 2005-09-19 20:36:29 +00:00
IndexCreateWWWGlobalQueue_p.html fix for a profile = null problem and new monitor in crawl queue 2005-09-15 21:39:37 +00:00
IndexCreateWWWGlobalQueue_p.java *) Trying to get rid of the possibility of exploits in IndexCreate* through HTML and JavaScript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it. 2005-09-19 20:36:29 +00:00
IndexCreateWWWLocalQueue_p.html fix for a profile = null problem and new monitor in crawl queue 2005-09-15 21:39:37 +00:00
IndexCreateWWWLocalQueue_p.java *) Trying to get rid of the possibility of exploits in IndexCreate* through HTML and JavaScript in peernames, URLs, <title>-tags etc. (see http://www.yacy-forum.de/viewtopic.php?t=1181) I hope I got them all and did not overdo it. 2005-09-19 20:36:29 +00:00
IndexMonitor.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
IndexMonitor.java configuration of number of output lines in IndexMonitor 2005-07-12 15:55:25 +00:00
IndexShare_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
IndexShare_p.java minor changes 2005-08-03 02:02:39 +00:00
IndexTransfer_p.html *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
IndexTransfer_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Lab.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Language_p.html Additions to the language file 2005-08-14 20:16:52 +00:00
Language_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Log_p.html Log is now skinnable 2005-04-09 10:10:32 +00:00
Log_p.java *) Changes needed because of logging migration 2005-06-09 09:34:44 +00:00
Messages_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Messages_p.java using Wikicode instead of bbCode 2005-08-07 17:56:44 +00:00
MessageSend_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
MessageSend_p.java just a typo... 2005-06-21 00:53:04 +00:00
Network.csv Your-Peer Stats added 2005-05-28 11:07:08 +00:00
Network.html more + changed log for better understanding of outOfMemory bug and others 2005-10-04 00:28:59 +00:00
Network.java more + changed log for better understanding of outOfMemory bug and others 2005-10-04 00:28:59 +00:00
Network.xml added words; see http://www.yacy-forum.de/viewtopic.php?p=7349#7349 2005-08-12 22:35:21 +00:00
News.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
News.java *) Tiny little job of cleaning up. (What you don't keep in your head, you have to make up for with your fingers...) 2005-09-19 20:42:53 +00:00
PerformanceMemory_p.html typo 2005-09-24 00:40:52 +00:00
PerformanceMemory_p.java integrated crawl-profiles db in memory-performance monitor 2005-09-24 00:33:27 +00:00
PerformanceQueues_p.html kbytes instead of bytes in performance settings; new default values 2005-09-28 18:53:41 +00:00
PerformanceQueues_p.java kbytes instead of bytes in performance settings; new default values 2005-09-28 18:53:41 +00:00
profile.html initial load with yacy 0.36 2005-04-07 19:19:42 +00:00
ProxyIndexingMonitor_p.html Improved german translation 2005-09-03 16:20:31 +00:00
ProxyIndexingMonitor_p.java replaced o.compareTo() with o.equals() 2005-09-04 00:01:09 +00:00
Settings_p.html *) Small changes that make entering values much easier for people who use the TAB key to navigate through the page or who use text browsers like lynx. 2005-09-29 20:17:10 +00:00
Settings_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
SettingsAck_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
SettingsAck_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
sharedBlacklist_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
sharedBlacklist_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
simple_search.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
simple_search.java a really simple Interface 2005-05-23 18:44:12 +00:00
Skins_p.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Skins_p.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Statistics.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Statistics.java Fixed spelling mistakes 2005-05-12 17:50:45 +00:00
Status_p.inc *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Status.html *) Splitting Status Page into Private and Public Information 2005-08-23 08:27:37 +00:00
Status.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Steering.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Steering.java performance setting for remote indexing configuration and latest changes for 0.39 2005-07-22 13:56:19 +00:00
User_p.html HTML for last Commit. 2005-10-03 12:23:20 +00:00
User_p.java delete Function 2005-10-03 12:17:12 +00:00
ViewLog_p.html Small changes for 0.40 2005-08-16 09:37:37 +00:00
ViewProfile.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
ViewProfile.java *) Asynchronous queuing of crawl job URLs (stackCrawl) 2005-10-05 10:45:33 +00:00
Wiki.html display of peer name in headline; see http://www.yacy-forum.de/viewtopic.php?p=7466#7466 2005-08-14 15:45:48 +00:00
Wiki.java added a Preview button to yacyWiki 2005-08-16 01:19:29 +00:00