333 Commits

Author SHA1 Message Date
theli
a7e11ada50 *) suppressing stacktrace for "server has closed connection"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2779 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-15 09:18:51 +00:00
orbiter
c8f3a7d363 added snippet-url re-indexing
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 23:07:10 +00:00
allo
226f2c5b2c first version of the Servlet Debugger
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2717 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-08 14:25:54 +00:00
theli
ce7ee74316 *) better error handling in filehandler (try-catch block now starts before argument parsing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2704 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 14:21:46 +00:00
theli
f17ce28b6d *) plasmaHTCache:
   - method loadResourceContent marked as deprecated.
     Please do not use this function, to avoid OutOfMemory Exceptions
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from a stream, the content length must now be passed to the parser function.
     The parsers need this length value to decide whether the parsed resource content is too large
     to hold in memory and must be stored in a file
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
orbiter
5a40ea7866 refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:59:20 +00:00
orbiter
310f1c41cd added option to see ranking scores in surftipps
and some cleanups

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2684 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 23:28:03 +00:00
theli
cd5f349666 *) Better handling of large files during parsing
Extracted text of files that are larger than 5MB is stored in a temp file instead of keeping it in memory
*) plasmaParserDocument.java: getText now returns an InputStream instead of a byte array
*) plasmaParserDocument.java: new function getTextBytes returns the parsed content as byte array
   Attention: the caller of this function has to ensure that enough memory is available to do this 
   to avoid OutOfMemory Exceptions
*) httpd.java: better error handling if the SOAP handler is not installed
*) pdfParser.java: 
   - better handling of documents with exotic charsets
   - better handling of large documents
   - better error logging of encrypted documents
*) rtfParser.java: Bugfix for UTF-8 support
*) tarParser.java: better handling of large documents
*) zipParser.java: better handling of large documents
*) plasmaCrawlEURL.java: new errorcode for encrypted documents
*) plasmaParserDocument.java: the extracted text can now be passed
   to this object as byte array or temp file   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2679 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-30 09:31:53 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
theli
c665f6cddb *) handling of quotes in charset string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-28 06:29:15 +00:00
theli
009a33170b *) Content-Location header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-26 04:32:01 +00:00
theli
1aa07a52cd *) Bugfix for UnsupportedEncodingException if the media type contains multiple parameters
See: http://www.yacy-forum.de/viewtopic.php?p=25832#25826

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-24 15:50:51 +00:00
orbiter
ec031eb993 first version of surftipps
see http://localhost:8080/index.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 20:14:21 +00:00
theli
5afb0cbce8 *) setting default charset (for unknown documents) to iso-8859-1
*)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 11:39:06 +00:00
theli
97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
   - serverFileUtils.java:
   -- adding methods to copy from stream to writer and readers to writers
   -- moving httpc writeX methods into serverFileUtils class
   - serverCharBuffer.java: removing inheritance from Writer class
   - replacing htmlFilterOutputStream by htmlFilterWriter class which handles
     content as char stream
   - htmlFilterContentTransformer.java: deactivating getText mode 
    (still needs to be migrated to use char streams instead of byte streams)
   - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
   - changes in Scraper and Transformer classes to operate on chars instead of bytes
   - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 10:12:11 +00:00
theli
fc594e8eda *) adding httpContentLengthInputStream.java class to allow reading of http response bodies
until EOF even if a persistent connection is used
*) httpdByteCountInputStream.java: adding skip method
*) httpHeader.java: adding getCharacterEncoding function

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2616 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 10:00:28 +00:00
theli
2a06ce5538 *) next bugfix for UTF-8
   - Sending UTF-8 messages to other peers did not work
   - httpd.java: minor corrections for UTF-8

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 15:47:56 +00:00
theli
bdc51591ae *) UTF-8 Bug solved (hopefully)
See: http://www.yacy-forum.de/viewtopic.php?p=25522

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2569 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 14:48:58 +00:00
theli
ef751b9d33 *) removing all string operations from the template engine
- engine should fully operate on bytes now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2567 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 13:56:10 +00:00
theli
fded1f4a5d *) better handling of maximum file size limit in crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 08:26:39 +00:00
theli
63893003be *) Adding settings page for the crawler that allows specifying a file size limit and the timeout to use.
*) adding first version of maximum filesize check for the crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2534 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-09 15:06:49 +00:00
orbiter
9340dbb501 fixed all possible problems with NullPointerExceptions for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
theli
a5ed86105b *) bugfix for handling of ResourceInfo object in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 15:50:45 +00:00
hydrox
59a5511dbb *) added missing static Strings as requested by theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2505 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 07:20:28 +00:00
theli
6578564c9a *) Ignore more hop by hop http headers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:38:35 +00:00
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 14:31:17 +00:00
theli
ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:29:45 +00:00
theli
3870d615e3 *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:06:58 +00:00
theli
393a7d10be *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:03:54 +00:00
theli
1c8300fcec *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:23:57 +00:00
orbiter
d78b824e85 fixed problem with default path after first start-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-22 13:35:51 +00:00
orbiter
6ad471ef96 * applied many compiler warning recommendations
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 19:49:31 +00:00
allo
cf1186597b utf fix from theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 15:26:04 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
   - now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
allo
a52f36787f better template debugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-10 14:02:03 +00:00
allo
3480d36417 added some debug code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-09 16:57:36 +00:00
orbiter
d468d665c9 some changes that may help to prevent deadlocks that cause an OutOfMemoryError
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:19:01 +00:00
theli
6e676224d0 *) adding support for UPnP
   A new port forwarding method for UPnP was added.
   If this method is enabled, yacy automatically detects a UPnP-capable
   internet gateway and configures the gateway's port forwarding
   settings properly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:26:45 +00:00
orbiter
97fa6788a1 added gettext support:
automatic replacement of string appearances in html files by
gettext quotes.
see also: http://www.yacy-forum.de/viewtopic.php?p=23901#23901

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 22:35:36 +00:00
allo
67c486a023 some example code showing how supertemplates can be used.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 07:08:15 +00:00
allo
7b0e2521bb Support for a supertemplate, which can do everything a normal template can do.
It's a layer under the servlets; this means #[page]# will be replaced by the servlet code, and the rest can be set by you.
(TODO: if we use this for layout, we need to read "TITLE" from the servlet's tp, to set it outside of the servlet.)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 15:51:19 +00:00
allo
8795875800 directory listing for all empty directories.
updating dir.java is no longer a problem, because it is only needed in htroot/htdocsdefault.
migration to delete old dir.* files in the fileshare

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-17 15:49:42 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
theli
b594ee9a5a *) Adding the option to configure whether the http proxy should send the
   X-forwarded-for header (requested by TeeSee)
   See: http://www.yacy-forum.de/viewtopic.php?t=2577

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2257 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-29 16:01:03 +00:00
allo
6866bc2758 be quiet!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-24 17:40:55 +00:00
theli
ed2cb040d1 *) Bugfix for http connection header validation
   - Connection header was not handled correctly if it contained
     multiple values, e.g. Connection: TE, close

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-20 05:22:55 +00:00
allo
0621106ef3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 12:15:26 +00:00
orbiter
12af69dd86 cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 11:49:31 +00:00
allo
67a8c74be3 Fix for dynamic login with static password.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-18 08:04:51 +00:00