81 Commits

Author SHA1 Message Date
orbiter
34341a868e code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 00:39:16 +00:00
hermens
e974d0cb99 Improve compliance to rfc
*) There is no status line in HTTP/0.9
*) Answers to HEAD requests should return the same headers as a GET request



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 10:27:21 +00:00
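
The two rules named in commit e974d0cb99 can be illustrated with a minimal sketch. The class `ResponseSketch` and its `build` method are hypothetical names invented for this illustration and are not taken from the YaCy httpd sources; the status is fixed to 200 to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical illustration only; not YaCy's httpd code. */
final class ResponseSketch {
    private ResponseSketch() {}

    /**
     * Builds the raw bytes of a "200 OK" response.
     * - HTTP/0.9: no status line and no headers, only the body.
     * - HEAD: same status line and headers as GET, but no body.
     */
    static byte[] build(String httpVersion, String method,
                        Map<String, String> headers, byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!"HTTP/0.9".equals(httpVersion)) {
            StringBuilder head = new StringBuilder();
            head.append(httpVersion).append(" 200 OK\r\n");
            for (Map.Entry<String, String> e : headers.entrySet()) {
                head.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
            }
            head.append("\r\n"); // empty line ends the header section
            byte[] h = head.toString().getBytes(StandardCharsets.US_ASCII);
            out.write(h, 0, h.length);
        }
        if (!"HEAD".equals(method)) {
            out.write(body, 0, body.length);
        }
        return out.toByteArray();
    }
}
```

Because the header section is produced by the same code path for GET and HEAD, the two responses cannot drift apart; only the body is skipped.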
theli
556d242be0 *) Limited support of content-range requests
- a simple continue download request should work now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1663 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 09:23:27 +00:00
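
A "simple continue download request", as mentioned in commit 556d242be0, typically arrives as an open-ended range such as `Range: bytes=1000-`. The following sketch shows one way such a header could be interpreted. `RangeHeader` and `resumeOffset` are hypothetical names, and the restriction to the open-ended form is an assumption made to match the "limited support" wording, not a description of the actual implementation.

```java
import java.util.OptionalLong;

/** Hypothetical helper; not taken from YaCy's httpd sources. */
final class RangeHeader {
    private RangeHeader() {}

    /**
     * Returns the start offset of an open-ended "bytes=N-" range,
     * or empty if the header is absent, malformed, beyond the end of
     * the file, or uses a form this sketch does not handle.
     */
    static OptionalLong resumeOffset(String headerValue, long fileLength) {
        if (headerValue == null) return OptionalLong.empty();
        String v = headerValue.trim();
        if (!v.startsWith("bytes=") || !v.endsWith("-")) return OptionalLong.empty();
        String digits = v.substring("bytes=".length(), v.length() - 1);
        if (digits.isEmpty()) return OptionalLong.empty();
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') return OptionalLong.empty();
        }
        try {
            long start = Long.parseLong(digits);
            return start < fileLength ? OptionalLong.of(start) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // too many digits for a long
        }
    }
}
```

A server using this would skip `start` bytes of the file and answer with status 206 and a matching `Content-Range` header; when the result is empty it can fall back to sending the whole file.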
theli
040624e361 *) better support for http head requests of servlets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1648 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-15 12:51:24 +00:00
theli
62ffb5ece0 *) httpdFileHandler.java: adding real streaming support for large files
   - avoid reading the whole file into memory
   - support of chunked transfer-encoding for http/1.1 clients
   - support of gzip content-encoding for suitable clients
   See: http://www.yacy-forum.de/viewtopic.php?p=17058#17058
*) MessageSend_p.html: better highlighting of peer response/status messages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1646 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-15 12:31:52 +00:00
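
The streaming approach described in commit 62ffb5ece0 (fixed-size buffer, chunked transfer-encoding) can be sketched as follows. `ChunkedWriter` and its methods are hypothetical names for this illustration; the framing itself (hexadecimal length, CRLF, data, CRLF, terminated by a zero-length chunk) is the standard HTTP/1.1 chunked coding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of chunked streaming; not YaCy's code. */
final class ChunkedWriter {
    private static final byte[] CRLF = {'\r', '\n'};

    private ChunkedWriter() {}

    /** Writes one chunk: hex length, CRLF, data, CRLF. */
    static void writeChunk(OutputStream out, byte[] buf, int off, int len) throws IOException {
        if (len <= 0) return; // a zero-length chunk would end the body
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(buf, off, len);
        out.write(CRLF);
    }

    /** Writes the terminating zero-length chunk (no trailers). */
    static void finish(OutputStream out) throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
    }

    /** Streams the input in pieces of at most bufSize bytes, so the
     *  whole file is never held in memory at once. */
    static void copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            writeChunk(out, buf, 0, n);
        }
        finish(out);
    }
}
```

Chunking is what makes streaming possible here: the total length does not have to be known before the first byte is sent, so no `Content-Length` header is needed.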
theli
2a88232cee *) Bugfix for httpd security bug
- authentication was only required for html files.
   See: http://www.yacy-forum.de/viewtopic.php?p=16510

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1563 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-07 06:54:23 +00:00
theli
ebc5b1eafb *) adding a servlet that can be used to generate a Firefox search-plugin for yacy
- You can access this servlet via YaCySearchPluginFF.html
   - The generated search plugin has the name YaCySearchPluginFF.src

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-06 14:41:59 +00:00
allo
7bd61ab0e5 Locales will now be in DATA/HTDOCS. So it works with readonly htroot.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1527 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-04 10:50:22 +00:00
(no author)
001513cc1f A custom httpHeader can now be created
and filled with cookies and other fields.

This header can then be set in serverObjects.

Check CookieTest.html and CookieTest.java for details.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-13 22:50:04 +00:00
(no author)
55f3232219 Patch for the Cookie management.
Version 0.1

Start Yacy, go to localhost:8080/CookieTest.html
Play around with cookies
Look into CookieTest.java to see how it works

This behavior will be changed 
such that httpHeader will be responsible for the cookies in the future



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-13 21:29:04 +00:00
allo
0f1212feb9 userDB.hasAdminrights to check adminRights.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-22 14:05:05 +00:00
hermens
35cf6712b2 *) fixes for httpd
- don't send Body on HEAD requests
  - don't send a Last-modified: date that is later than Date:
  - Use Cache-control instead of Pragma with HTTP/1.1
  - don't send header with HTTP/0.9



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-09 17:35:45 +00:00
hermens
ec1202edbe *) Fixes for httpd
- Fix for local timezone in http header
    See: http://www.yacy-forum.de/viewtopic.php?t=836
  - Allow static content to be cached by browser
    See: http://www.yacy-forum.de/viewtopic.php?t=1311


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-07 13:26:27 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
orbiter
76618442e0 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 21:21:14 +00:00
theli
bb1f73ec15 *) Bugfix for code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1164 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 09:48:11 +00:00
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
theli
bdf30117c1 *) Redesign of parser configuration
- restructuring of mimeTypes based on the parsers
   - displaying parser usage count
   - displaying human-readable parser names
   - displaying parser version information

*) httpdFileHandler.java
   - adding the possibility to support "streaming" servlets,
     which are special servlets that can communicate with
     the client autonomously via the connection streams
   - the name of these new servlet types must end with the 
     file extension .stream
   - this feature will be needed by the yacy ScreenSaver
     class to fetch statistic data from the peer without the
     need to reconnect to the server all the time

*) Adding human readable names and version information for
   all supported parsers

*) plasmaParser.java
   - adding new structure to store parser statistic data

*) Adding openDocument parser
   - can be used to parse odt files

*) jmimemagic
   - adding rules to detect openDocument formats properly

*) serverLog.java
   - adding functions that can be used to query if a given
     logging level is enabled or not.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-29 07:27:58 +00:00
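
The serverLog.java item in commit bdf30117c1 adds functions to query whether a logging level is enabled. The usual reason for such a query is to avoid building an expensive log message that would be discarded anyway. The sketch below shows that pattern with the standard `java.util.logging` API; `GuardedLog` and `logIfEnabled` are hypothetical names and do not reflect the actual serverLog methods.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of a level guard; not YaCy's serverLog. */
final class GuardedLog {
    private GuardedLog() {}

    /**
     * Builds and logs the message only if the level is enabled.
     * Returns true if the message was passed to the logger.
     */
    static boolean logIfEnabled(Logger log, Level level, Supplier<String> message) {
        if (!log.isLoggable(level)) {
            return false; // message.get() is never called, so no string is built
        }
        log.log(level, message.get());
        return true;
    }
}
```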
allo
52a0237bf2 using Filetemplates for #[metas]# and other static includes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1116 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-21 08:33:54 +00:00
orbiter
79818a320f introduced citation-rank transmission protocol and activate transport for anonymisation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-10 23:48:20 +00:00
theli
b8ceb1ffde *) Adding better https support for crawler
   - solving problems with unknown certificates by implementing a dummy trust Manager
   - adding https support to robots-parser 
   - Seed File can now be downloaded from https resources
   - adapting plasmaHTCache.java to support https URLs properly

*) URL Normalization
   - sub URLs are now normalized properly during indexing
   - pointing urlNormalForm function of plasmaParser to htmlFilterContentScraper function
   - normalizing URLs which were received by a crawlOrder request

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-03 15:28:37 +00:00
hydrox
cb69047b91 *)cleanup access static methods and fields
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 17:56:26 +00:00
hydrox
56b9f34411 *)removed unused imports
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-02 16:30:45 +00:00
orbiter
b058ecf0bc refactoring of image-generation; added experimental PNG encoder (not active now)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1008 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-31 02:43:55 +00:00
orbiter
097009d910 experimental visualization of DHT access during global search (temporary)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-24 00:34:15 +00:00
allo
117a424d00 bugfix for sharing png/gif files in WWW/SHARE
http://www.yacy-forum.de/viewtopic.php?p=11565


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-20 14:58:15 +00:00
theli
6e3201b74d *) Bugfix in httpc.java
- Requestheader was not passed to the underlying post function properly
   - Bug seems not to have caused any side effects so far

*) Bugfix for manual peer ping functionality

*) Bugfix for UnresolvedPattern Problem if an Exception occurred in a servlet.
   See: http://www.yacy-forum.de/viewtopic.php?t=1353

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@963 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-20 09:55:12 +00:00
allo
f97c303ebd rights for Admin and Proxy.
Adminrights are OR (old auth or new auth).
Proxyrights are AND (you need Proxyrights and the time limit must not have been reached).


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@960 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 12:20:08 +00:00
allo
97de600a68 another bugfix for share/www.
Now you can use share/ and not only share/dir.html


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 11:42:15 +00:00
allo
2dfd6bf36a fix for networkimage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@956 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 10:25:56 +00:00
allo
ec10220d57 Fix for last Commit: .class Files in htroot, not in the dir of the localized HTML-Files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 07:17:49 +00:00
allo
4db2080188 Bugfix for www and share.
http://www.yacy-forum.de/viewtopic.php?p=11486


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-19 06:52:43 +00:00
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its url hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abortion of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
allo
6430fa520e bugfix for broken HTDOCS
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@938 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-14 11:50:11 +00:00
theli
219acc1e8f *) Bugfix for wrong http version in response to http/1.0 requests
See: http://www.yacy-forum.de/viewtopic.php?t=1312

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-13 06:30:13 +00:00
allo
0f2f783e46 no no-cache for mediaExts
see http://www.yacy-forum.de/viewtopic.php?p=11210#11210


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 20:45:14 +00:00
allo
7ca60f97bf localization Support for Includes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-12 12:44:05 +00:00
orbiter
b45ffecd39 log to fix http://www.yacy-forum.de/viewtopic.php?p=11111#11111
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-11 07:46:14 +00:00
theli
1688be8590 *) plasmaSwitchboard.java
adding more verbose logging output for db initialization
*) httpdFileHandler.java
   adding cache for servlet response methods


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@897 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 09:13:17 +00:00
theli
e3a586d7bd *) Using serverByteBuffer instead of ByteArrayOutputStream
to speedup httpdFileHandler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-10 07:15:57 +00:00
orbiter
16a49c1c9d fix for graphics generation bug, see http://www.yacy-forum.de/viewtopic.php?p=10987#10987
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@886 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-09 14:46:33 +00:00
orbiter
5153ec0f3e update to image painter
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-07 01:25:39 +00:00
orbiter
1b2db0b52a fix for file-share access; damaged by me some commits ago :-(
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@870 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-06 22:30:13 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
   various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission is done in parallel now to improve index 
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in 
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer 
   instead of storing the indexed words locally. 
   This can be done by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread. 
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because function calls to vector or hashtable functions
   take 3 times longer than calls to functions of arraylists or hashmaps.
   TODO: we should take a look at the class serverObject which is inherited from hashmap
         Do we really need synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
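
The robots.txt revalidation described in commit a2fa75e688 (re-download with if-modified-since once the cached copy is older than 7 days, plus ETag-based change detection) could look roughly like the sketch below. `RobotsRevalidation` and its methods are hypothetical names. Sending the stored ETag back in an `If-None-Match` header is an assumption about how the ETag is used, since the commit message does not say.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration; not YaCy's robots parser. */
final class RobotsRevalidation {
    /** Age limit from the commit message. */
    static final Duration MAX_AGE = Duration.ofDays(7);

    /** HTTP-date format, always expressed in GMT. */
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    private RobotsRevalidation() {}

    /** True if the cached copy was loaded more than 7 days before now. */
    static boolean isStale(Instant loadedAt, Instant now) {
        return Duration.between(loadedAt, now).compareTo(MAX_AGE) > 0;
    }

    /** Request headers for a conditional re-download of a stale copy. */
    static Map<String, String> conditionalHeaders(Instant lastModified, String eTag) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (lastModified != null) {
            headers.put("If-Modified-Since", HTTP_DATE.format(lastModified));
        }
        if (eTag != null) {
            headers.put("If-None-Match", eTag); // assumption: ETag is echoed back
        }
        return headers;
    }
}
```

If the server answers such a request with status 304, the cached robots.txt is still valid and only its load time needs to be refreshed.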
orbiter
01db66dc69 implemented image-servlets. the imagetest will stay there only for a limited time. Now images can be generated on-the-fly from servlets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 08:40:20 +00:00
theli
1dc94e7753 *) Adding support for gzip content-encoding of http post requests
used to transferRWIs and transferURLs.
   See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020

*) adding yacyVersion.java containing constants defining yacy versions
   that support a given feature.
   Needed to determine if a remote peer is able to decode gzip 
   content-encoded http post bodies properly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 10:30:55 +00:00
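
Commit 1dc94e7753 combines two ideas: compressing the post body with gzip, and checking the remote peer's version first so that older peers still receive an uncompressed body. A minimal sketch using the standard `java.util.zip` classes follows. `GzipPostBody`, `peerAcceptsGzip` and the version numbers used with it are hypothetical; the real constants live in yacyVersion.java and are not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical illustration; not YaCy's httpc or yacyVersion code. */
final class GzipPostBody {
    private GzipPostBody() {}

    /** Compress only for peers at or above the first gzip-capable version. */
    static boolean peerAcceptsGzip(double peerVersion, double minGzipVersion) {
        return peerVersion >= minGzipVersion;
    }

    /** Sender side: gzip the post body (to be sent with Content-Encoding: gzip). */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // stream is closed, so the gzip trailer is written
    }

    /** Receiver side: decode a gzip content-encoded post body. */
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```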
orbiter
e17df64b54 removed IS_ADMIN feature. This was covered by plasmaSwitchboard.adminAuthenticated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-21 09:22:01 +00:00
theli
b990dc1ad1 *) Replacing jsch 0.1.19 lib with newer version 0.1.21
*) Replacing PDFBox 0.7.1 lib with newer version 0.7.2
*) Refactoring of classes httpd/httpc/httpHeaders to
   make many methods for httpHeader/Requestline parsing
   reusable for new icap implementation
*) adding chunked input stream support
   - needed by new icap implementation
   - needed by future httpc HTTP/1.1 support 
*) httpd.java
   - moving all connection property constants to class httpHeader
   - moving readHeader function to class httpHeader
   - moving parseQuery function to class httpHeader
   - moving handleTransparentProxy function to class httpHeader
*) httpHeader.java
   - adding new function to parse the http response line
   - adding new function to convert http headers to a string that
     can be sent to the client
   - adding a function that generates a proper url using all parsed
     connection properties
*) ICAP Support
   - yacy now supports handling of icap response modification requests
   - this feature can be used by other icap enabled proxies to contact 
     yacy as icap server, and to hand over the downloaded content to yacy
     for indexing
   - functionality was successfully tested with squid 2.5Stable 10 + icap patch
   - further icap services e.g. URL filtering based on yacy's blacklists are possible
*) plasmaSwitchboard.java
   - htcache entries that are still needed for indexing are now properly registered 
     as in use after system restart
   - extended logging: log message now shows parsing and indexing time for each sb. entry
    

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 21:49:47 +00:00
theli
f783061414 *) Changing redirection code from 307 to 302
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@710 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 11:38:46 +00:00
theli
a6a8af0f04 *) httpdFileHandler templateCache can now be disabled
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@708 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-12 10:47:27 +00:00