Commit Graph

91 Commits

Author SHA1 Message Date
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
theli
ed2cb040d1 *) Bugfix for http connection header validation
- Connection header was not handled correctly if it contains
     multiple values, e.g. Connection: TE, close 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-20 05:22:55 +00:00
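
The multi-value case from the commit above (Connection: TE, close) can be sketched as follows. This is only an illustration in modern Java, not the code from the commit; the class ConnectionHeader and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not taken from the YaCy sources.
final class ConnectionHeader {

    /** Splits a Connection header value into trimmed, lower-cased tokens. */
    static List<String> tokens(String headerValue) {
        List<String> result = new ArrayList<>();
        if (headerValue == null) {
            return result;
        }
        for (String part : headerValue.split(",")) {
            String token = part.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    /** True if any of the comma-separated tokens is "close". */
    static boolean wantsClose(String headerValue) {
        return tokens(headerValue).contains("close");
    }
}
```

Comparing the whole header value against "close" would miss the value "TE, close"; checking each token separately avoids that.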
allo
d7a3fdb18b no white pages, when clicking cancel on the password-dialog
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-12 12:12:21 +00:00
theli
b4ab183518 *) Bugfix for NullPointerException if the seed's IP could not be resolved
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2099 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-15 10:50:10 +00:00
allo
9938c252dd better Errorhandling for proxyAccounts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-11 13:12:35 +00:00
theli
727aac4768 *) Bugfix for Transparent-Proxy-Support <-> Port Forwarding problem
See: http://www.yacy-forum.de/viewtopic.php?p=20358

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-25 05:29:20 +00:00
rramthun
42b0b10a95 -Adding Windows Media to types which are not sent compressed
-Renaming writeandzip to writeandgzip to avoid confusion about type of compression
-Adding new startup message to windows script
-The usual language "enhancements" ;-)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-23 20:12:23 +00:00
theli
c7ececbfb2 *) httpd.mime: adding jar mimetype
*) httpd.java: charset is only appended to mimetype for text mimetypes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1839 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-07 15:58:50 +00:00
allo
3b4a99ff6a fix for java 1.4.x
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-17 17:55:13 +00:00
theli
9b941fb773 *) bugfix for usage of yacy with extended port binding (e.g. #eth0:8080, 192.168.0.1:8080, etc.)
- port was reported incorrectly to other peers


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-17 10:53:20 +00:00
allo
2d4e1325cf UTF-8 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 21:33:41 +00:00
hermens
c8f5adea4d - don't send Message Body on HEAD requests, even in the case of an error
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 11:45:32 +00:00
theli
a7248fbb0a *) bugfix for http/0.9 responses
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 11:07:17 +00:00
theli
a354bc2ec1 *) Bugfix for content length check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:54:47 +00:00
hermens
e974d0cb99 Improve compliance to rfc
*) There is no status line in HTTP/0.9
*) Answers to HEAD requests should return the same headers as a GET request



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:27:21 +00:00
theli
62ffb5ece0 *) httpdFileHandler.java: adding real streaming support for large files
   - avoid reading the whole file into memory
   - support of chunked transfer-encoding for http/1.1 clients
   - support of gzip content-encoding for suitable clients
   See: http://www.yacy-forum.de/viewtopic.php?p=17058#17058
*) MessageSend_p.html: better highlighting of peer response/status messages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1646 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-15 12:31:52 +00:00
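
A minimal sketch of the streaming idea from the commit above: the file is copied through a fixed-size buffer and each buffer is written as one HTTP/1.1 chunk (size in hex, CRLF, data, CRLF), followed by a zero-length chunk. The class ChunkedCopy and its method are hypothetical names and not the httpdFileHandler code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not taken from the YaCy sources.
final class ChunkedCopy {

    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Copies in to out using chunked transfer-encoding, holding at most bufferSize bytes in memory. */
    static void copyChunked(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            if (n == 0) {
                continue; // a zero-length chunk would end the body early
            }
            // chunk header: payload length in hexadecimal
            out.write(Integer.toHexString(n).getBytes(StandardCharsets.US_ASCII));
            out.write(CRLF);
            out.write(buffer, 0, n);
            out.write(CRLF);
        }
        // last chunk (size 0) and the empty trailer section
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

Because the body length does not have to be known in advance, no Content-Length header is needed and the file never has to be held in memory as a whole.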
theli
eeba8b055e *) guessing, testing and suggesting alternative hostnames on "unknown host" error
See: http://www.yacy-forum.de/viewtopic.php?t=1879

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 09:55:09 +00:00
theli
44996afd79 *) Bugfix for handling of http/0.9 clients.
- nothing was sent as response

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1610 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-11 15:21:04 +00:00
(no author)
001513cc1f Now custom httpHeader can be created
and filled with cookies and so on.

This header can then be set in serverObjects

Check CookieTest.html and CookieTest.java for details.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-13 22:50:04 +00:00
(no author)
55f3232219 Patch for the Cookie management.
Version 0.1

Start Yacy, go to localhost:8080/CookieTest.html
Play around with cookies
Look into CookieTest.java to see how it works

This behavior will be changed 
such that httpHeader will be responsible for the cookies in the future



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-13 21:29:04 +00:00
(no author)
1d3249e787 handle UTF-8 correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-12 21:14:39 +00:00
orbiter
9544c47684 added some UTF-8 handling.
hope this will help somehow.. for sure not THE solution to our UTF-8 problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-10 16:48:59 +00:00
orbiter
fed92d364b introduced USAGE object for counter synchronization in kelondroRecords
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-10 02:11:24 +00:00
hermens
35cf6712b2 *) fixes for httpd
- don't send Body on HEAD requests
  - don't send a Last-modified: date that is later than Date:
  - Use Cache-control instead of Pragma with HTTP/1.1
  - don't send header with HTTP/0.9



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-09 17:35:45 +00:00
hermens
ec1202edbe *) Fixes for httpd
- Fix for local timezone in http header
    See: http://www.yacy-forum.de/viewtopic.php?t=836
  - Allow static content to be cached by browser
    See: http://www.yacy-forum.de/viewtopic.php?t=1311


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 13:26:27 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
76618442e0 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 21:21:14 +00:00
theli
1c3750de57 *) Bugfix for code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:15:05 +00:00
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
orbiter
a04930f025 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-04 23:51:28 +00:00
theli
7e670894d9 *) Suppressing stackTraces in proxyError message for "connect timed out" errors
See: http://www.yacy-forum.de/viewtopic.php?t=1504
*) Increasing default http client timeout

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-25 00:40:35 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case insensitive hostnames correctly. 
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
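
The two cache changes from the commit above can be sketched like this: host names are lower-cased before they are used as cache keys, and hosts matching a no-caching pattern are resolved every time. This is an illustration only. The class NameCache is a hypothetical name, the injected resolver stands in for the real lookup, and treating the patterns as regular expressions is an assumption (the commit does not say which syntax httpc.nameCacheNoCachingPatterns uses).

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from the YaCy sources.
final class NameCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final List<Pattern> noCachePatterns; // assumed to be regular expressions
    private final Function<String, String> resolver; // stand-in for the real DNS lookup

    NameCache(List<Pattern> noCachePatterns, Function<String, String> resolver) {
        this.noCachePatterns = noCachePatterns;
        this.resolver = resolver;
    }

    /** Resolves a host name, caching the result unless a no-caching pattern matches. */
    String resolve(String host) {
        // host names are case-insensitive, so normalize before using them as a key
        String key = host.toLowerCase(Locale.ROOT);
        for (Pattern p : noCachePatterns) {
            if (p.matcher(key).matches()) {
                return resolver.apply(key); // e.g. dynamic DNS names: always look up again
            }
        }
        return cache.computeIfAbsent(key, resolver);
    }

    /** Number of cached entries. */
    int size() {
        return cache.size();
    }
}
```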
hydrox
cb69047b91 *)cleanup access static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 17:56:26 +00:00
hydrox
56b9f34411 *)removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its URL hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
allo
43a127ff3a allow httpsTunnels to other Ports than 443. (if secureHttps=false)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@940 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 12:51:56 +00:00
allo
4320425a17 ipAuth (this does not work yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 10:53:50 +00:00
allo
b88a9584f8 New Errorpage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 07:47:57 +00:00
theli
b177a80bb7 *) Bugfix for sendRespondError StackOverFlowException problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@927 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 07:29:14 +00:00
theli
c8a35a0130 *) Adding new connection tracking page (currently only for incoming connections)
*) Displaying statistic for incoming connections on status page
*) Bugfix for Loop-Access Bug when trying to access the yacy page while yacy is configured as proxy
   See: http://www.yacy-forum.de/viewtopic.php?p=6826
*) Bugfix for Referer Bug
   See: http://www.yacy-forum.de/viewtopic.php?p=11098#11098
*) Adding reverse Name lookup for yacy-domain names (used by the connection tracking page)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 08:17:43 +00:00
allo
f1ff33177d reset Timelimits on Daychange
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@904 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 13:06:03 +00:00
theli
a9e25c26e1 *) adding new sendRespondError method to httpd which accepts a template include file
for individual error messages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 10:33:09 +00:00
allo
5605cc8018 TimeLimits
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 10:21:25 +00:00
allo
f65c939a60 userDB Auth
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@874 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-07 13:49:07 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   Various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission is done in parallel now to improve index 
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because function calls to vector or hashtable functions
   take 3 times longer than calls to functions of arraylists or hashmaps.
   TODO: we should take a look at the class serverObject which is inherited from hashmap
         Do we really need synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
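
The robots.txt refresh rule from the commit above can be sketched as follows. This is only an illustration: the class RobotsRefresh and its methods are hypothetical names, and sending the remembered ETag back as If-None-Match is an assumption, since the commit only says that the ETag header is used to detect changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not taken from the YaCy sources.
final class RobotsRefresh {

    /** 7 days, the age limit named in the commit message. */
    static final long MAX_AGE_MS = 7L * 24 * 60 * 60 * 1000;

    /** True if the cached robots.txt is older than 7 days and should be checked again. */
    static boolean isStale(long loadedAtMs, long nowMs) {
        return nowMs - loadedAtMs > MAX_AGE_MS;
    }

    /**
     * Builds the request headers for re-validating a cached copy.
     * lastModified and etag are the values remembered from the previous
     * response; either may be null if the server did not send it.
     */
    static Map<String, String> conditionalHeaders(String lastModified, String etag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        if (etag != null) {
            headers.put("If-None-Match", etag); // assumption: ETag is re-sent as a validator
        }
        return headers;
    }
}
```

If the server answers such a request with 304 Not Modified, the cached copy can be kept and no body has to be downloaded.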
allo
cd77078aa0 old Version restored before Release
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@842 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 18:10:05 +00:00
allo
a4b747fe97 ProxyAccounts based on userDB
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-03 14:26:08 +00:00
theli
d388292f24 *) adding function for user accounting which is called after each http request
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@827 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-30 16:02:58 +00:00
theli
595e0c7e56 *) Bugfix for ProxyErrormsg: Wrong base URL
See: http://www.yacy-forum.de/viewtopic.php?p=9905#9905

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-30 06:15:22 +00:00
theli
5f95a1cf62 *) Bugfix for ProxyErrormsg: Wrong http host header
See: http://www.yacy-forum.de/viewtopic.php?p=9905#9905

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@795 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-26 08:10:40 +00:00