122 Commits

Author SHA1 Message Date
theli
5bf70e6e14 *) Bugfix for serverClassLoader.java
- Classloading didn't work properly if there were multiple classes with the same name
   - This could occur because the yacy servlets have no package name defined and therefore
     are all in the same (default) package.

*) Bugfix for Duplicated Class Error
   See: http://www.yacy-forum.de/viewtopic.php?t=1341

  

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-28 10:15:25 +00:00
orbiter
85282b1d98 enhanced YBR recognition and search result heuristics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 01:40:02 +00:00
orbiter
0e25020f51 added first generation and usage of YBR index-files. Enhanced overall ranking of search results.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-22 15:17:05 +00:00
orbiter
0ec54d9c5f enhanced CR-file handling and added first RCI-evaluation tests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-20 18:55:35 +00:00
orbiter
88e3234393 fine-tuning of rci-generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-18 02:00:25 +00:00
orbiter
24dc0e0760 implemented cr-file processing and further transmission steps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-17 01:59:01 +00:00
theli
8e308cf50e *) Possibility to change the server port on-the-fly.
- Now it's possible to change the server port without the need to restart the whole server.
   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-15 15:03:15 +00:00
theli
fd58d5f8e6 *) Adding possibility to specify the interface / IP-Address where YaCy should bind to.
- e.g. Port = 192.168.0.1:8080
          Port = #eth0:8080
          Port = 8080

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-13 17:03:52 +00:00
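
The three `Port` forms listed in the commit above could be told apart roughly as follows. This is only a sketch, not YaCy's code: the class `BindSpec` and its fields are hypothetical names, and reading `#eth0` as "the network interface named eth0" is an assumption based on the example values. Resolving the interface name to an address is left out.

```java
/** Hypothetical value type for a parsed "Port" setting; not part of YaCy. */
final class BindSpec {
    final String address;       // literal IP or host name, or null if none was given
    final String interfaceName; // name after '#', or null if none was given
    final int port;

    BindSpec(String address, String interfaceName, int port) {
        this.address = address;
        this.interfaceName = interfaceName;
        this.port = port;
    }

    /** Accepts "8080", "192.168.0.1:8080" or "#eth0:8080". */
    static BindSpec parse(String value) {
        String v = value.trim();
        int colon = v.lastIndexOf(':');
        if (colon < 0) {
            // plain port: bind to all addresses
            return new BindSpec(null, null, parsePort(v));
        }
        String target = v.substring(0, colon).trim();
        int port = parsePort(v.substring(colon + 1));
        if (target.isEmpty()) {
            return new BindSpec(null, null, port);
        }
        if (target.startsWith("#")) {
            // assumption: '#' marks an interface name rather than an address
            return new BindSpec(null, target.substring(1), port);
        }
        return new BindSpec(target, null, port);
    }

    private static int parsePort(String s) {
        int port = Integer.parseInt(s.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }
}
```

Splitting at the last colon keeps the sketch short; it would not handle IPv6 literals, which the commit message does not mention.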
orbiter
6e81f2580d try to fix bug with storage of settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-11 08:41:13 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
orbiter
d2731418bf added creation of global ranking files and changed url normal form usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 12:33:02 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
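
A minimal sketch of the two DNS-cache changes described in the commit above: keys are lower-cased so that lookups are case insensitive, and hosts matching a no-cache pattern are resolved every time. The class `NameCache` is a hypothetical name, and treating the value of `httpc.nameCacheNoCachingPatterns` as a comma-separated list of regular expressions is an assumption; the commit message does not state the format.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

/** Hypothetical DNS cache; illustrates the idea only. */
final class NameCache {
    private final Map<String, InetAddress> cache = new ConcurrentHashMap<>();
    private final List<Pattern> noCachePatterns = new ArrayList<>();

    /** @param patterns assumed format: comma-separated regular expressions */
    NameCache(String patterns) {
        for (String p : patterns.split(",")) {
            String t = p.trim();
            if (!t.isEmpty()) {
                noCachePatterns.add(Pattern.compile(t, Pattern.CASE_INSENSITIVE));
            }
        }
    }

    /** Host names are case insensitive, so use one canonical key. */
    static String normalize(String host) {
        return host.trim().toLowerCase(Locale.ROOT);
    }

    /** False for hosts such as dynamic-DNS names that must not be cached. */
    boolean isCacheable(String host) {
        String key = normalize(host);
        for (Pattern p : noCachePatterns) {
            if (p.matcher(key).matches()) return false;
        }
        return true;
    }

    boolean isCached(String host) {
        return cache.containsKey(normalize(host));
    }

    InetAddress resolve(String host) throws UnknownHostException {
        String key = normalize(host);
        InetAddress cached = cache.get(key);
        if (cached != null) return cached;
        InetAddress resolved = InetAddress.getByName(key);
        if (isCacheable(key)) cache.put(key, resolved);
        return resolved;
    }
}
```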
hydrox
56b9f34411 *)removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
orbiter
5f68b6886b introduced new url-hashes for better ranking computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 00:54:55 +00:00
orbiter
4d1e56e4d9 fixed intermission-bug (removed 'break for intermission' of httpd-thread)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-31 10:46:13 +00:00
theli
723e056c48 *) Bugfix for ClassCastException during SessionPool.close
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-28 07:25:22 +00:00
theli
9a2afe88d4 *) Deactivating unlimited timeout for persistent connections because this
could cause problems with clients which do not shut down persistent
   connections properly.
   - Setting timeout for idle persistent connections to 30 minutes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-25 10:21:00 +00:00
orbiter
4dcbc26ef1 introduction of search profiles; very experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-23 17:50:27 +00:00
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its url hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
borg-0300
e642a5d8b7 more constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-17 15:46:12 +00:00
orbiter
d29dfb0a12 refactoring of search / preparation for better search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 12:28:49 +00:00
theli
c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
*) Displaying statistic for incoming connections on status page
*) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy
   See: http://www.yacy-forum.de/viewtopic.php?p=6826
*) Bugfix for Referer Bug
   See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098
*) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 08:17:43 +00:00
orbiter
6a72f06c40 resizable network picture + greater on click
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@900 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 10:08:28 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission is done in parallel now to improve index 
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because function calls to vector or hashtable functions
   take 3 times longer than calls to functions of arraylists or hashmaps.
   TODO: we should take a look at the class serverObject, which inherits from hashmap
         Do we really need synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
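
The robots.txt revalidation rule from the commit above (revalidate once the stored copy is older than 7 days, using if-modified-since and the ETag) can be sketched as plain decision logic. The class `RobotsRefresh` and its method names are hypothetical. Sending the stored ETag as `If-None-Match` is standard HTTP behaviour, but it is an assumption that YaCy uses the ETag this way, since the commit only says the header is "used to detect changes".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; shows the decision logic only, no network access. */
final class RobotsRefresh {
    /** 7 days, as stated in the commit message. */
    static final long MAX_AGE_MILLIS = 7L * 24 * 60 * 60 * 1000;

    /** True if the stored robots.txt is older than 7 days. */
    static boolean needsRevalidation(long loadedAtMillis, long nowMillis) {
        return nowMillis - loadedAtMillis > MAX_AGE_MILLIS;
    }

    /**
     * Builds the conditional request headers from the stored metadata.
     * Either argument may be null if the server never sent that value.
     */
    static Map<String, String> conditionalHeaders(String lastModifiedHttpDate, String etag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (lastModifiedHttpDate != null) {
            headers.put("If-Modified-Since", lastModifiedHttpDate);
        }
        if (etag != null) {
            // assumption: the ETag is sent back as If-None-Match
            headers.put("If-None-Match", etag);
        }
        return headers;
    }

    /** 304 Not Modified means the stored copy can be kept. */
    static boolean canKeepStoredCopy(int httpStatus) {
        return httpStatus == 304;
    }
}
```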
orbiter
3dd7e90cdd kbytes instead of bytes in performance settings; new default values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 18:53:41 +00:00
orbiter
2c7b490e30 memory-logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-28 00:52:54 +00:00
orbiter
7fc822a59b changed handling of time-zones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-27 16:28:55 +00:00
orbiter
495bc8bec6 removed cache-control from low and medium priority caches which reduces memory use and computation overhead
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 20:01:26 +00:00
orbiter
07f30931ec various configuration options in memory performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@763 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-21 14:21:45 +00:00
orbiter
2f732e32a2 enhancements to memory menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@762 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-21 12:21:01 +00:00
orbiter
96a5b6e8fb removed yacy peer types from serverSwitch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 23:15:33 +00:00
theli
b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
   make many methods for httpHeader/Requestline parsing
   reusable for new icap implementation
*) adding chunked input stream support
   - needed by new icap implementation
   - needed by future httpc HTTP/1.1 support 
*) httpd.java
   - moving all connection property constants to class httpHeader
   - moving readHeader function to class httpHeader
   - moving parseQuery function to class httpHeader
   - moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
   - adding new function to parse the http response line
   - adding new function to convert http headers to a string that
     can be sent to the client
   - adding a function that generates a proper url using all parsed
     connection properties
*) ICAP Support
   - yacy now supports handling of icap response modification requests
   - this feature can be used by other icap enabled proxies to contact 
     yacy as icap server, and to handover the downloaded content to yacy.logging
     for indexing
   - functionality was successfully tested with squid 2.5Stable 10 + icap patch
   - further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
   - htcache entries that are still needed for indexing are now properly registered 
     as in use after system restart
   - extended logging: log message now shows parsing and indexing time for each sb. entry
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 21:49:47 +00:00
borg-0300
42cd2cea65 added final constants, so that other classes can reach them;
cleaned;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@741 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-18 09:45:20 +00:00
theli
394b4440d2 *) adding isLoggable function to serverLog class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@689 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-09 11:03:06 +00:00
borg-0300
718950c5da small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-07 15:20:12 +00:00
theli
a7256e8f4e *) Adding X-Forwarded-For Header
See: http://www.yacy-forum.de/viewtopic.php?t=1118&highlight=xforwardedfor
*) httpc.java: Bugfix for incorrect http response statuscode parsing
   In some situations the statustext was chopped
*) Adding a lot of fileheaders containing YaCy copyright and license
*) httpd.java: Adding additional debugging http header that should help to detect
   the "binary data in browser window" bug.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@653 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-05 08:01:54 +00:00
theli
f9eb550fbc *) Bugfix for NullpointerException in serverAbstractSwitch.setConfig
See: http://www.yacy-forum.de/viewtopic.php?t=692#5575

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-02 03:34:02 +00:00
theli
7a7254713d *) Moving Logging directory per default to DATA/LOG
See: http://www.yacy-forum.de/viewtopic.php?t=940#7656

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 08:56:35 +00:00
theli
bead8a32aa *) IndexCreate_p.java:
Crawler StartURLs are now also added to the errorURL-DB if an error occurs on this url
*) kelondroStack.java, plasmaSwitchboardQueue.java
   Adding method which returns a list of all entries in the queue. This list is used by IndexCreate_p.java 
   instead of an iterator to display the indexing-list. 
   Advantages: avoid concurrent modifications of the list while displaying it. 
               Speedup because now we have to access only one sync function instead of multiple ones 
               (one for each entry)
*) IndexCreateIndexingQueue_p.java
   Using new list() function of plasmaSwitchboardQueue
*) httpdFileHandler.java
   If a servlet returns the special value "LOCATION" the httpdFileHandler redirects
   the browser to the URL specified by the servlet. This can e.g. be used when a http get request is
   used instead of a post request, but a refresh should not be allowed.
*) IndexCreateWWWLocalQueue_p.html
   Now it's possible to delete single entries of the local crawler queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@626 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-01 07:52:46 +00:00
theli
60e77dcc60 *) Adding additional loglevel finer + finest according to Thread http://www.yacy-forum.de/viewtopic.php?p=8750#8750
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@618 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 22:37:48 +00:00
borg-0300
fa54b5f38d cleanup spaces
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 22:24:38 +00:00
theli
4fd5b95b1f *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logSevere instead of logFailure and logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@615 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:32:59 +00:00
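
The renaming in the commit above lines the logger methods up with the levels of the Java Logging API (`java.util.logging`). A minimal sketch of such a wrapper follows. The class `LogFacade` is a hypothetical name, and only `logFine` and `logSevere` are taken from the commit message; the signatures are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper; method names follow the commit message, signatures are assumed. */
final class LogFacade {
    private final Logger logger;

    LogFacade(String name) {
        this.logger = Logger.getLogger(name);
    }

    void setLevel(Level level) {
        logger.setLevel(level);
    }

    /** Replaces the former logFailure / logError. */
    void logSevere(String message) {
        logger.log(Level.SEVERE, message);
    }

    /** Replaces the former logDebug. */
    void logFine(String message) {
        logger.log(Level.FINE, message);
    }

    /** Lets callers skip building expensive messages that would be discarded. */
    boolean isLoggable(Level level) {
        return logger.isLoggable(level);
    }
}
```

A caller would typically guard costly debug output with `if (log.isLoggable(Level.FINE)) log.logFine(...)`.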
theli
6adf8a4bde *) Renaming Logger function names to reflect the proper Java Logging API Loglevels
- please use logFine instead of logDebug
   - please use logFailure instead of logError
   See: http://www.yacy-forum.de/viewtopic.php?p=8726#8726

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@614 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-30 21:10:39 +00:00
theli
a812fb86cc *) Port Forwarding Feature did not detect broken connections properly.
Therefore a test-request was added to the isConnected function to detect broken connections
   and to keep open connections alive


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:39:10 +00:00
theli
af7b8f75bd *) Making proxyAccessLogging configurable via yacy.logging file
- logging can be disabled now
   - logging directory / filelimit / rotation count can be configured now
   See: http://www.yacy-forum.de/viewtopic.php?t=965&postdays=0&postorder=asc&start=30#8280

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:31:58 +00:00
allo
66ebce1109 use staticIP more often
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@592 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-28 16:55:52 +00:00
theli
4a2f06053f *) Bugfix for "Gehäuselautsprecher" (PC speaker) Bug
If de.anomic.server.logging.ConsoleOutErrHandler.ignoreCtrlChr is set to true, all control chars except
   space, tab and newline are replaced with spaces
   See: http://www.yacy-forum.de/viewtopic.php?p=5528

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-23 10:48:24 +00:00
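
The replacement rule from the commit above can be sketched in a few lines. This is not the `ConsoleOutErrHandler` code; the class `ControlChars` and its method are hypothetical names. Keeping carriage returns alongside newlines is an assumption, since the commit only names space, tab and newline.

```java
/** Hypothetical helper; mirrors the rule described in the commit message. */
final class ControlChars {
    /** Replaces every control character except tab, newline and carriage return with a space. */
    static String replaceControlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean keep = c == '\t' || c == '\n' || c == '\r' // assumption: CR counts as newline
                    || !Character.isISOControl(c);              // ordinary characters, including space
            out.append(keep ? c : ' ');
        }
        return out.toString();
    }
}
```

With this rule a BEL character (0x07), which makes some consoles beep, is written as a space instead.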
theli
dc7c5237fb *) Bugfix for "Gehäuselautsprecher" (PC speaker) Bug
If de.anomic.server.logging.ConsoleOutErrHandler.ignoreCtrlChr is set to true, all control chars except
   space, tab and newline are replaced with spaces
   See: http://www.yacy-forum.de/viewtopic.php?p=5528

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@578 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-23 10:45:44 +00:00
orbiter
ba0a486328 moved printStackTrace() to logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-14 23:35:18 +00:00
orbiter
248c24b60a intermission-feature usage in case of local and remote search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-09 20:43:37 +00:00