120 Commits

Author SHA1 Message Date
orbiter
6d759ad0a7 - new bot address
- removed unused skins

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-29 11:46:42 +00:00
orbiter
b5346141b3 made the plasmaHTCache static (there is only one internet, so we need only one cache)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-15 21:31:31 +00:00
orbiter
61f93cbf14 some code-cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-11 00:42:04 +00:00
orbiter
24e25e1141 enhanced SSI server-side support:
- SSIs may now refer to servlets, not only files
- calling a servlet, the servlet/SSI engine is called recursively
- SSIs now also work for clients that do not support chunked encoding
This will support the new search page functionality, to show search results
dynamically without using javascript. To test this method, a test page has been added:
http://localhost:8080/ssitest.html
It dynamically calls 3 servlets, which produce some delays during their execution.
Please verify that you can see the result step-by-step in your browser.
To implement this feature, some refactoring has taken place; most of the affected code
has been made static and will execute faster.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4037 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-09 21:58:38 +00:00
orbiter
57a5b6fa71 some generalization of remote proxy configuration and setting handling in httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-02 00:42:37 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain.
A description of how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
26f05d1fd0 avoid division by zero if search is done for no words
this case is relevant if the bluewords (yacy.blue) are used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 22:10:12 +00:00
theli
91c2a042a7 *) bugfix for wrong proxy traffic accounting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:52:48 +00:00
orbiter
a1fb8358b2 let's make a well-formed http link so that other crawlers have no problem following this link :-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:35:54 +00:00
orbiter
4edb70f68b added yacybot info-page from Roland
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3462 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:26:31 +00:00
orbiter
c464157a6e replaced some toString()
see http://www.yacy-forum.de/viewtopic.php?p=31151#31151

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:26:56 +00:00
orbiter
47ab83a7c0 added flag for YaCyHop - proxy access for all paths that start with /yacy/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 00:09:51 +00:00
theli
a7e11ada50 *) suppressing stacktrace for "server has closed connection"
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2779 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-15 09:18:51 +00:00
orbiter
c8f3a7d363 added snippet-url re-indexing
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 23:07:10 +00:00
theli
f17ce28b6d *) plasmaHTCache:
- method loadResourceContent marked as deprecated.
     Please do not use this function: it can cause OutOfMemory exceptions
     when loading large files
   - new function getResourceContentStream to get an inputstream of a cache file
   - new function getResourceContentLength to get the size of a cached file
*) httpc.java:
   - Bugfix: resource content was loaded into memory even if this was not requested
*) Crawler:
   - new option to hold loaded resource content in memory
   - adding option to use the worker class without the worker pool 
     (needed by the snippet fetcher)
*) plasmaSnippetCache
   - snippet loader does not use a crawl-worker from pool but uses
     a newly created instance to avoid blocking by normal crawling
     activity.
   - now operates on streams instead of byte arrays to avoid OutOfMemory 
     Exceptions when operating on large files 
   - snippet loader now forces the crawl-worker to keep the loaded
     resource in memory to avoid IO 
*) plasmaCondenser: adding new function getWords that can directly operate on input streams
*) Parsers
   - keep resource in memory whenever possible (to avoid IO)
   - when parsing from a stream, the content length must now be passed to the parser function.
     The parsers need this length value to decide whether the parsed resource content is too large
     to hold in memory and must be stored to a file
   - AbstractParser.java: new function to pass the contentLength of a resource to the parsers
   


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-03 11:05:48 +00:00
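The size-based decision described in this commit (keep small resources in memory, spool large ones to a file) can be sketched as follows. This is only an illustration under stated assumptions: `ResourceBuffer`, `load` and the `maxInMemory` parameter are hypothetical names and do not come from the YaCy code base.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of YaCy: holds a resource either in memory or on disk.
public class ResourceBuffer {
    public final byte[] inMemory; // non-null when the content was kept in memory
    public final File onDisk;     // non-null when the content was spooled to a temp file

    private ResourceBuffer(byte[] inMemory, File onDisk) {
        this.inMemory = inMemory;
        this.onDisk = onDisk;
    }

    // contentLength is the size announced for the resource; a negative value means "unknown".
    public static ResourceBuffer load(InputStream in, long contentLength, long maxInMemory)
            throws IOException {
        if (contentLength >= 0 && contentLength <= maxInMemory) {
            // Small enough: read everything into a byte array and avoid disk IO.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(in, out);
            return new ResourceBuffer(out.toByteArray(), null);
        }
        // Too large or unknown size: stream to a temporary file in fixed-size chunks.
        File tmp = File.createTempFile("resource", ".tmp");
        try (OutputStream out = new FileOutputStream(tmp)) {
            copy(in, out);
        }
        return new ResourceBuffer(null, tmp);
    }

    // Copies the stream chunk by chunk so memory use stays bounded by the chunk size.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }
}
```

The caller is expected to delete the temporary file once it has been processed.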
theli
5afb0cbce8 *) setting default charset (for unknown documents) to iso-8859-1

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 11:39:06 +00:00
theli
97d2a08ef1 *) restructuring needed to support parsing of documents using various charsets
- serverFileUtils.java: 
   -- adding methods to copy from stream to writer and readers to writers
   -- moving httpc writeX methods into serverFileUtils class
   - serverCharBuffer.java: removing inheritance from Writer class
   - replacing htmlFilterOutputStream by htmlFilterWriter class which handles
     content as char stream
   - htmlFilterContentTransformer.java: deactivating getText mode 
    (still needs to be migrated to use char streams instead of byte streams)
   - changes in several classes to use htmlFilterWriter instead of htmlFilterOutputStream
   - changes in Scraper and Transformer classes to operate on chars instead of bytes
   - httpdProxyHandler.java: bugfix. clientTimeout setting was missing in config file

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2617 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-18 10:12:11 +00:00
orbiter
9340dbb501 fixed all possible problems with nullpointer exception for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
theli
a5ed86105b *) bugfix for handling of ResourceInfo object in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 15:50:45 +00:00
theli
6578564c9a *) Ignore more hop by hop http headers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:38:35 +00:00
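For context, a proxy is expected to drop hop-by-hop headers before forwarding a message. The sketch below shows one way to do that, using the eight names listed in RFC 2616, section 13.5.1. `HopByHopFilter` and `strip` are hypothetical names, not YaCy identifiers, and headers additionally named in the `Connection` header are not handled here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of YaCy: removes hop-by-hop headers from a header map.
public class HopByHopFilter {
    // Hop-by-hop header names as listed in RFC 2616, section 13.5.1 (lower-cased).
    private static final Set<String> HOP_BY_HOP = new HashSet<>(Arrays.asList(
            "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
            "te", "trailers", "transfer-encoding", "upgrade"));

    // Returns a copy of the headers without hop-by-hop entries; names compare case-insensitively.
    public static Map<String, String> strip(Map<String, String> headers) {
        Map<String, String> forwarded = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!HOP_BY_HOP.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                forwarded.put(e.getKey(), e.getValue());
            }
        }
        return forwarded;
    }
}
```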
theli
dae763d8e3
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 14:31:17 +00:00
theli
ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:29:45 +00:00
theli
3870d615e3 *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:06:58 +00:00
theli
393a7d10be *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:03:54 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
- now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
theli
b594ee9a5a *) Adding possibility to configure whether the http proxy should send the
X-forwarded-for header (requested by TeeSee)
   See: http://www.yacy-forum.de/viewtopic.php?t=2577

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2257 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-29 16:01:03 +00:00
rramthun
5625937d1c Language improvements
One very minor HTML fix

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-06 16:30:32 +00:00
orbiter
90d569d70f refactoring of index management:
url storage is part of index management; moved plasmaURL to indexURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-19 23:50:55 +00:00
orbiter
015d044c25 tried to fix some problems with latest changes to httpc
very experimental!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 16:01:14 +00:00
orbiter
55c5b41bd0 modified kelondroDyn to work better with new object caches
(removed own single object cache)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2077 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-05-10 13:57:31 +00:00
theli
76ea16a6cb *) Removing Keep-Alive header (is also a hopByHop header)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2034 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-22 15:00:35 +00:00
borg-0300
77f3237de3 adapted for isListed()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-21 20:55:59 +00:00
theli
759800f543 *) Bugfix for storeHTCache problem
- content was not indexed if storeHTCache was off
   See: http://www.yacy-forum.de/viewtopic.php?p=18269
   See: http://www.yacy-forum.de/viewtopic.php?t=1882
   See: http://www.yacy-forum.de/viewtopic.php?t=241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1800 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 08:30:08 +00:00
orbiter
ce5274c194 yacybot user agent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1786 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 19:08:58 +00:00
orbiter
34341a868e code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-19 00:39:16 +00:00
rramthun
15ed57f9b7 Updated German language, by VT100, NN, rramthun
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1690 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-17 21:23:45 +00:00
theli
8fcb25f9f9 *) Setting via header according to rfc
- can be disabled via settings dialog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 09:20:57 +00:00
theli
eeba8b055e *) guessing, testing and suggesting alternative hostnames on "unknown host" error
See: http://www.yacy-forum.de/viewtopic.php?t=1879

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1636 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-14 09:55:09 +00:00
allo
4e4bd4662d redirectors fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-01-04 17:40:18 +00:00
orbiter
37f88b4017 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 23:51:29 +00:00
theli
44fa94ac52 *) Modifications for dbImport functionality
- dbImporter threads are now shut down by the switchboard on server shutdown
   - adding possibility to pause an importer thread via GUI
   - Bugfix for abort function
     See: http://www.yacy-forum.de/viewtopic.php?p=13363#13363

*) Modification of content parser configuration
   - now it's possible to configure which parsers should be enabled for the proxy,
     crawler, icap, etc. separately

*) htmlFilterContentScraper.java
   - adding regular expression to normalize URLs containing /../ and /./ parts

*) httpc.java
   - adding functionality to unzip gzipped content
   - requested by roland: should be used later to allow gzipped seed lists

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1170 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-06 10:41:19 +00:00
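The regex-based normalization of `/../` and `/./` parts mentioned in this commit could look roughly like the sketch below. The patterns and the `UrlPathNormalizer` class are assumptions made for illustration, not the expressions used in htmlFilterContentScraper.java, and the method is meant for the path component only, not for a query string.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCy: collapses "/./" and "/segment/../" in a URL path.
public class UrlPathNormalizer {
    // Matches "/./", which refers to the current directory and can be reduced to "/".
    private static final Pattern CURRENT_DIR = Pattern.compile("/\\./");
    // Matches "/segment/../" where the segment is not itself "..".
    private static final Pattern PARENT_DIR = Pattern.compile("/(?!\\.\\./)[^/]+/\\.\\./");

    public static String normalize(String path) {
        String previous;
        do {
            previous = path;
            path = CURRENT_DIR.matcher(path).replaceAll("/");
            // Resolve one parent reference per pass; the loop repeats until nothing changes.
            path = PARENT_DIR.matcher(path).replaceFirst("/");
        } while (!path.equals(previous));
        return path;
    }
}
```

For example, `UrlPathNormalizer.normalize("/a/b/../c/./d")` returns `/a/c/d`. A leading `/../` that has no parent segment is left unchanged by this sketch.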
orbiter
1d6a6d1f85 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-05 00:17:12 +00:00
theli
7e670894d9 *) Suppressing stackTraces in proxyError message for "connect timed out" errors
See: http://www.yacy-forum.de/viewtopic.php?t=1504
*) Increasing default http client timeout

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1129 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-25 00:40:35 +00:00
allo
d8afe60e07 Bugfix for last Bugfix ;-).
host/port were set to originaladdress instead of the correct values for the new Url.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 14:05:25 +00:00
orbiter
1b656f6b31 correction of bug from svn 1123
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 12:07:07 +00:00
allo
24d15eb0e8 moving the redirector code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 07:52:36 +00:00
allo
787c368696 synchronized redirectors and using the port.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-23 07:37:15 +00:00