Commit Graph

174 Commits

Author SHA1 Message Date
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
f1ori
d0543a7c39 * fix the debug ant-target
* fix yacy-subdomain handling (http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1556)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-27 22:16:56 +00:00
orbiter
6941bf42b1 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-20 14:07:09 +00:00
orbiter
1778fb420d - added some performance tweaks to the new BLOB buffer
- removed the now superfluous HT storage thread
- reduced the number of file decompressions by deferring compression to a later point


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-19 18:10:42 +00:00
orbiter
826ca79735 refactoring and new architecture to store the files of the web cache:
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures have been migrated to generate byte[] objects, which are written with the new database methods

this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 21:24:09 +00:00
orbiter
0cd0fee546 fixed bug with wrong proxy result enqueueing. See:
http://forum.yacy-websuche.de/viewtopic.php?p=8130#p8130
- removed the online status property. This influenced the proxy behavior and created some complexity that was not needed, because the online status was never used for what it was created for (offline browsing)
- checked all proxy identification procedures during crawling and enhanced transparency and error checking
- fixed a proxy identification routine that caused the wrong selection of the proxy result queue

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-16 21:56:23 +00:00
danielr
d60b2b198d proxy fixed 'not modified' http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1419
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 11:06:22 +00:00
f1ori
bd0318ba81 * YaCy only supports gzip-encoding, so remove any other encoding from request
* fixes http://www.yacy-forum.org/viewtopic.php?f=2&t=163


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-09 14:04:52 +00:00
danielr
cf29ca19d4 possible fix for POST character encoding http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5121 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-07 13:10:46 +00:00
orbiter
05dbba4bab added logging conditions to all fine and finest log line calls
this prevents the overhead of generating log lines that are then never printed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:30:21 +00:00
danielr
e503158527 Proxy: fix for never ending loading after POST
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-27 20:46:34 +00:00
danielr
1a1d57e449 Proxy: added binary passthrough for POST
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5089 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-27 08:07:18 +00:00
danielr
9ff4fc11da partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-26 16:34:24 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
danielr
4d937f6b21 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1396
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-22 23:46:32 +00:00
orbiter
7989335ed6 Preparations to replace the HTCache with a new storage data structure:
- refactoring of the HTCache (separation of cache entry)
- added new storage class for BLOBs. (not used yet, this is half-way to a new structure)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-19 14:10:40 +00:00
danielr
be28af50f5 - fixed "yacy2yacy no proxy"-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-17 10:16:32 +00:00
danielr
8422ee5ec4 - fixed UnsupportedEncoding (in proxy) using defaultCharset if no characterEncoding can be determined
- serverFileUtils.copy* now use Charset instead of String
- added some warnings for ignored exceptions


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5043 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-09 12:00:31 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- slightly improved performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
f1ori
b0724e5ec0 * add config option to disable cookie monitoring (disabled by default)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5028 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-30 21:19:06 +00:00
danielr
c049d80fbd fixed login problem with yacy as proxy (POST and Cookies)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-19 15:10:00 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in an indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods using static access to yacySeedDB had to be rewritten. A special problem were all the port forwarding methods, which were tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding has been deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. A new effect is that every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switching are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
82bf9ac1c8 - added Collage servlet from datengrab and modified it:
* all images are queued
* private/public is respected
* inserted into switchboard
* added collageQueue class that stores all the queued images

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4683 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 13:24:21 +00:00
danielr
959f448e5f - disabled redirects in proxy (so client sees real path)
- added connection stats (only connections currently in use)
- remove "old" connections (closed or idle for some time)
- synchronized shared parts of proxyHandler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4682 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 11:39:48 +00:00
orbiter
8fe39ebd74 -fixed file transmission with POST. The only usage was in ranking transmission, therefore:
-fixed ranking transmission

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 08:12:51 +00:00
orbiter
202a3adb3e refactoring of HttpClient Writer processes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4678 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 22:47:05 +00:00
orbiter
444dce7e81 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 15:28:58 +00:00
orbiter
e356625b22 - refacotring of stream copy handling to support time-consuming operations
- made usage of BufferedStreams explizit to distinct different copy method in serverFileUtils (byte-by-byte and using an own buffer)
- introduced another timeout setting (java internal property)
- more restrictions to clients accessing a single host (a security setting to prevent DoS by mistake)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 09:53:07 +00:00
danielr
f01c50cf8d Proxy logging error (first step to resolution!?)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4673 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 06:56:06 +00:00
orbiter
c3342e1178 - removed class with only one static method
- removed connection method with too long time-out

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4672 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 23:35:20 +00:00
orbiter
f97971b63b fixed NPE problems doing a shutdown from command-line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 22:59:17 +00:00
orbiter
2c1c3bb6eb - some refactoring (sorry Daniel, hab in deinem Code rumgewütet)
- fixed broken downloads (flush was missing)
- different problem handling when download is corrupted
- different default values in yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4669 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:36:33 +00:00
danielr
d96e2badc7 - fixed POST in proxy
- prepared http connection tracking
- refactoring (mainly moving StreamTools to serverFileUtils)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:17:40 +00:00
danielr
94d3d3a86f fixed Proxy (for GET, POST still does not work!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 09:34:20 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
4a80902081 - added ViewProfile as rdf in foaf syntax
- added link to rdf and vCard version on html page
- can be seen on http://localhost:8080/ViewProfile.html?hash=localhash
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 18:21:08 +00:00
low012
ae6d07bdb8 *) "Did you mean:" will only be displayed if the list of suggested URLs is not empty.
*) Removed <hr /> to make the "404 Unknown Host" error page look like the other 404 error pages.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-01 23:03:02 +00:00
fuchsi
21b8d1b918 small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-14 19:17:54 +00:00
orbiter
af10f729df fixed image search and favicon loading
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 01:34:29 +00:00
orbiter
a31b9097a4 preparations for mass remote crawls:
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
  crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused
  as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls

After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. The thread pools for the crawl threads have also been removed, since
creation of these threads is not resource-consuming; for a detailed explanation see svn 4106

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 01:43:20 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
orbiter
b19bb6e5b1 - reverted svn 4132; this did not solve the problem and removed the emergency mehtod which caused production failure for shure within some hours
- removed and added some debugging lines

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 14:34:05 +00:00