
53 Commits

Author SHA1 Message Date
fuchsi
21b8d1b918 small cosmetic change for static fields in serverCore (special protocol ASCII entities) to improve readability
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-14 19:17:54 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are converted (but not formatted) to an equivalent String representation.
- putASIS(...) has been removed; this is now done with a simple put(...) (see above).
- putNum(...) can be used for number values that should be stored in a formatted way, either according to the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into the corresponding HTML entities ('<' => '&lt;'). Previously put(...) did this, so escaping ran far too often, although it is needed only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: use put(...) for almost everything, and use putXY(...) if you need a special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_; report overlooked cases at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods, as dates are used on many pages and their formatting code is duplicated; maybe also some centralized handling for formatting memory values.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
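The put/putHTML split described in this commit can be illustrated with a small, self-contained Java sketch. It is not the real serverObjects class: the class name, the method signatures, and the assumption that the non-XML mode additionally turns non-ASCII characters into numeric character references are all hypothetical. The commit itself only states that the "forXML" mode replaces < > & ".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the put/putHTML split; not the real serverObjects.
class ServerObjectsSketch {
    private final Map<String, String> map = new HashMap<>();

    /** Stores the value unchanged; numbers become their plain String form. */
    String put(String key, Object value) {
        return map.put(key, String.valueOf(value));
    }

    /** Stores an escaped copy of the value, only where markup output needs it. */
    String putHTML(String key, String value, boolean forXML) {
        return map.put(key, escape(value, forXML));
    }

    String get(String key) {
        return map.get(key);
    }

    /**
     * forXML: only & < > " are replaced.
     * Otherwise (assumption): non-ASCII code points also become numeric references.
     */
    static String escape(String s, boolean forXML) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (!forXML && cp > 127) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

Keeping escaping out of the default put(...) means values are escaped only where a template actually emits them into markup, which matches the commit's reasoning that escaping was applied far more often than needed.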
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced the old first-hash computation with a new method that tries to find a gap in the current DHT.
To do this, the network bootstrapping must be done before the own hash is computed;
this made further redesigns in the peer initialization order necessary.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
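One way to "find a gap in the current DHT" is to pick the midpoint of the largest free interval between existing peer positions on the ring. The sketch below assumes positions are already normalized to [0, 1); how YaCy actually maps hashes to positions, as well as the class and method names, are assumptions for illustration. It also shows why bootstrapping has to come first: without the other peers' positions there is no gap to find.

```java
import java.util.Arrays;

// Sketch: pick a new DHT position in the middle of the largest gap on a ring.
// Assumption: existing peer positions are already normalized to [0, 1).
class DhtGapSketch {

    static double middleOfLargestGap(double[] positions) {
        if (positions.length == 0) {
            return 0.5; // no known peers (no bootstrap yet): nothing to avoid
        }
        double[] p = positions.clone();
        Arrays.sort(p);
        // start with the wrap-around gap from the last position back to the first
        double gapStart = p[p.length - 1];
        double gapLength = 1.0 - p[p.length - 1] + p[0];
        for (int i = 1; i < p.length; i++) {
            double length = p[i] - p[i - 1];
            if (length > gapLength) {
                gapLength = length;
                gapStart = p[i - 1];
            }
        }
        double middle = gapStart + gapLength / 2.0;
        return middle >= 1.0 ? middle - 1.0 : middle; // wrap back onto the ring
    }
}
```

Placing a new peer in the middle of the widest gap tends to spread responsibility for hash ranges more evenly than a random position would.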
orbiter
76e4c2d69e fix for peer-ping in case that remote peer does not respond with valid values
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4091 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-11 15:27:01 +00:00
orbiter
bb426565f0 added new yacy protocol for mass url-pull for better remote crawling distribution
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-22 00:59:05 +00:00
orbiter
9ca46a8c69 indexing of local (intranet) urls enabled
To do this, one must create a separate YaCy network that has a local URL domain
A description of how to do this is here: http://www.yacy-websuche.de/wiki/index.php/De:Netzdefinition

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-24 00:46:17 +00:00
orbiter
f5a4efb76e fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=192&hilit=&p=1034#p1034
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-20 08:06:21 +00:00
orbiter
b6d9cca67e - fixed problem with yacyVersion and own version generation
- within this context: generalized date format handling
- extended Update interface:
 * a version lookup can be triggered manually
 * a complete lookup + download + re-boot process can be triggered with one click

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-16 23:47:21 +00:00
orbiter
f40566f9bb separate YaCy networks:
- added server-side network unit identification
- added server-side network access authorization
- enhanced client-side generation of network authentication essentials
- implemented first peer-to-peer salted-magic authentication method

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3953 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-04 23:48:52 +00:00
orbiter
4f5496062c protection against too large seeds
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3877 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-12 22:08:33 +00:00
orbiter
06b6e35484 fix for a null pointer exception if clusters are not defined
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3632 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-02 12:26:29 +00:00
orbiter
81844e85b2 - fixed more cluster routing problems
- fixed a problem in remote search when balancer caused shift process to wait too long

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 00:39:53 +00:00
karlchenofhell
97d4ab2053 - handle null from iterator in IndexCreateWWWLocalQueue_p.java
- fixed ETA to reach next peer in Network.java
- added some <label>s and fixed minor XHTML errors in ConfigNetwork.html
- try to avoid returning null in servlets as it is unexpected and causes a NPE in the file handler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3623 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 21:45:01 +00:00
orbiter
b33cef421e better routing for public clusters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3620 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-29 00:08:38 +00:00
orbiter
657585fe0d network functions for robinson peers: server-side protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-24 15:11:12 +00:00
orbiter
b2f4087400 redesign of last-seen field inside seed:
the field now contains a time in UTC (offset 0) instead of a time relative to the local UTC offset.
This fixes a bug in peer selection where an iteration over all seeds
ordered by lastseen did not work correctly.
Problems may occur because older peers still use the previous
meaning of this field.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 23:54:27 +00:00
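The point of storing last-seen in UTC is that every peer writes the same string for the same instant, so sorting seeds by that string matches sorting them by time. A minimal Java sketch, assuming a compact yyyyMMddHHmmss layout (the real seed field format may differ, and the class name is hypothetical):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch: encode last-seen as UTC so every peer writes the same string for the same instant.
// Assumption: a compact "yyyyMMddHHmmss" layout; the real seed field format may differ.
class LastSeenUtcSketch {

    // A new instance per call, because SimpleDateFormat is not thread-safe.
    private static SimpleDateFormat utcFormat() {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    static String encode(long millis) {
        return utcFormat().format(new Date(millis));
    }

    static long decode(String lastSeen) {
        try {
            return utcFormat().parse(lastSeen).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad last-seen value: " + lastSeen, e);
        }
    }
}
```

Because the fields run from most to least significant with fixed widths, comparing two encoded strings lexicographically gives the same order as comparing the timestamps.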
orbiter
e00e850a98 removed constants (no connection with yacySeed.dna identifier)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 14:52:54 +00:00
allo
0c81bd39d4 XSS-safe put as default.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 14:07:54 +00:00
auron_x
9699b094e8 *) fixed hello reporting yourip=UNRESOLVED_PATTERN
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-14 16:57:24 +00:00
orbiter
114a76a86e - added flag to urlhash that shows that domain is a local domain
- enhanced local domain detection
- bugfixing for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM
- bugfixes for details during search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-06 02:05:39 +00:00
theli
52466067d8 *) Bugfix for ArrayIndexOutOfBoundsExceptions which occur because SimpleDateFormat is not thread-safe
See: http://www.yacy-forum.de/viewtopic.php?t=2995

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 08:33:53 +00:00
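SimpleDateFormat keeps mutable state while it parses and formats, so a single static instance shared between threads can produce corrupted output or exceptions like the ones fixed here. The sketch below gives each thread its own instance through ThreadLocal; the pattern and class name are illustrative, and ThreadLocal.withInitial requires Java 8, so code of that era would have used a synchronized block or an anonymous ThreadLocal subclass instead.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Sketch: one SimpleDateFormat per thread, so no instance is ever shared between threads.
class ThreadSafeDateSketch {

    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    static String format(long millis) {
        return FORMAT.get().format(new Date(millis));
    }
}
```

On current Java versions, java.time.format.DateTimeFormatter is immutable and thread-safe, which removes the problem altogether.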
hermens
38a1410361 Don't test a remote peer's seed during hello.respond as its IP might not be proper, especially while still virgin
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-07 23:59:45 +00:00
orbiter
bd283b8443 fixed bugs:
- null pointer exception during startup of a robinson-configured peer
- wrong time calculation of default value of re-crawl option

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2005 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-04-06 16:28:28 +00:00
allo
7afa5c1b8e staticIP fix
tried to solve http://www.yacy-forum.de/viewtopic.php?p=18663#18663
D 2006/03/08 07:08:20 YACY yacyClient.publishMySeed mySeed error - not proper: IP is not proper: -UNRESOLVED_PATTERN-


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1859 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 12:23:26 +00:00
theli
f108048a2c *) Bugfix for NullPointerException in hello.java
*) Corrected a for loop in hello.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1854 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-08 06:40:38 +00:00
allo
f73d51f94b reverted last change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1810 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 19:20:35 +00:00
allo
8997b83806 store the staticIP(dyndns) in seed, not the real IP
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1809 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-03-03 17:33:05 +00:00
allo
7c5f8f997a some more staticIP fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1784 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-28 12:20:19 +00:00
orbiter
b3dca06bb1 added location column to network pages.
The location is computed from the userAgent string of connecting peers.
Therefore this information is not available right after start-up.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-12-22 01:01:46 +00:00
theli
fb766413d1 *) Changes on httpc dns caching
- Bugfix: old dns cache did not handle case-insensitive hostnames correctly.
   - adding a possibility to set domain name patterns defining hostnames that should not be cached by the httpc dns cache
     e.g. borg-300.dyndns.org
     This can be done by setting the new httpc.nameCacheNoCachingPatterns property
   - using httpc.dnsResolve wherever possible within the sourcecode
     [httpd.java,plasmaCrawlStacker.java]

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-11-07 10:57:54 +00:00
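A name cache with both fixes from this commit can be sketched as follows. The resolver is passed in as a function, and the no-caching patterns are treated as comma-separated regular expressions; both are assumptions, as the commit does not specify the format of httpc.nameCacheNoCachingPatterns. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Sketch of a host-name cache; names and the pattern format are assumptions.
class DnsCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final List<Pattern> noCaching = new ArrayList<>();
    private final Function<String, String> resolver; // e.g. a wrapper around InetAddress lookups
    private int lookups = 0;                          // real resolver calls, for illustration

    DnsCacheSketch(Function<String, String> resolver, String noCachingPatterns) {
        this.resolver = resolver;
        for (String p : noCachingPatterns.split(",")) {
            if (!p.trim().isEmpty()) {
                noCaching.add(Pattern.compile(p.trim()));
            }
        }
    }

    synchronized String resolve(String host) {
        // DNS names are case-insensitive, so normalize before using them as keys
        String key = host.toLowerCase(Locale.ROOT);
        String ip = cache.get(key);
        if (ip != null) {
            return ip;
        }
        lookups++;
        ip = resolver.apply(key);
        if (ip != null && !isExcluded(key)) {
            cache.put(key, ip); // hosts matching a no-caching pattern are skipped
        }
        return ip;
    }

    synchronized int lookups() {
        return lookups;
    }

    private boolean isExcluded(String key) {
        for (Pattern p : noCaching) {
            if (p.matcher(key).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the pattern .*\.dyndns\.org, a host such as borg-300.dyndns.org is resolved on every request, so a changed dynamic IP is picked up immediately, while Example.COM and example.com share one cache entry.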
theli
40777556c5 *) Connection Tracking
- adding automatic refresh
   - accepts new parameter nameLookup which can be used to deactivate 
     yacy-peer name lookup (because we have problems with this on large seed-dbs)

*) ViewFile
   New page that can be used to view 
   - original content 
   - plain text content 
   - parsed content
   - parsed sentences 
   of a webpage specified by its URL hash
   Mainly for debugging purposes at the moment

*) Robots.txt 
   Bugfix for if-modified-since usage
   TODO: synchronization of downloads to avoid loading the same robots-file 
   multiple times in parallel by different threads

*) Shutdown
   Better abort handling of transferRWI and transferURL sessions on server shutdown

*) Status Page
   Adding icon to start/stop crawling via status page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-18 07:45:27 +00:00
borg-0300
e642a5d8b7 more constants
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@947 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-17 15:46:12 +00:00
borg-0300
a1777788a5 small change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@879 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-07 15:04:03 +00:00
borg-0300
64acb46a91 cleaned, finals, Properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 13:16:53 +00:00
theli
a2fa75e688 *) Asynchronous queuing of crawl job URLs (stackCrawl)
various checks like the blacklist check or the robots.txt disallow check are now
   done by a separate thread to unburden the indexer thread(s)
   TODO: maybe we have to introduce a threadpool here if it turns out that this single
         thread is a bottleneck because of the time-consuming robots.txt downloads

*) improved index transfer
   The index selection and transmission are now done in parallel to improve index
   transfer performance.
   TODO: maybe we could speed up performance by using multiple transmission threads in
         parallel instead of only a single one.

*) gzip encoded post requests
   it is now configurable whether a gzip encoded post request should be sent on
   index transfer/distribution

*) storage Peer (very experimental and not optimized yet)
   Now it's possible to send the result of the yacy indexer thread to a remote peer
   instead of storing the indexed words locally.
   This can be enabled by setting the property "storagePeerHash" in the yacy config file
   - Please note that if the index transfer fails, the index is stored locally.
   - TODO: currently this index transfer is done by the indexer thread.
     To speed up the indexer
     a) this transmission should be done in parallel and
     b) multiple chunks should be bundled and transferred together


*) general performance improvements  
   - better memory cleanup after http request processing has finished
   - replacing some string concatenations with stringBuffers
   - replacing BufferedInputStreams with serverByteBuffer
   - replacing vectors with arraylists wherever possible
   - replacing hashtables with hashmaps wherever possible
   This was done because calls to Vector or Hashtable methods
   take 3 times longer than calls to the methods of ArrayList or HashMap,
   as Vector and Hashtable synchronize their methods.
   TODO: we should take a look at the class serverObject, which inherits from HashMap:
         do we really need synchronization for this class?
   TODO: replace arraylists with linkedLists if random access to the list elements is not needed

*) Robots Parser supports if-modified-since downloads now
   If the downloaded robots.txt file is older than 7 days the robots parser tries to
   download the robots.txt with the if-modified-since header to avoid unnecessary downloads
   if the file was not changed. Additionally the ETag header is used to detect changes.

*) Crawler: better handling of unsupported mimeTypes + FileExtension

*) Bugfix: plasmaWordIndexEntity was not closed correctly in 
   - query.java
   - plasmaswitchboard.java

*) function minimizeUrlDB added to yacy.java 
   this function tests the current urlHashDB for unused urls
   ATTENTION: please don't use this function at the moment because
              it causes the wordIndexDB to flush all words into the
              word directory!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-10-05 10:45:33 +00:00
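The robots.txt revalidation described above (refetch with If-Modified-Since once the cached copy is older than 7 days) can be sketched with java.net.HttpURLConnection. Sending the stored ETag back in an If-None-Match header is one common way to use it and is an assumption here, as are the class and method names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.TimeUnit;

// Sketch of conditional robots.txt revalidation. Names are hypothetical; sending the
// stored ETag via If-None-Match is an assumption about how the ETag is used.
class RobotsRevalidationSketch {
    static final long MAX_AGE_MILLIS = TimeUnit.DAYS.toMillis(7);

    /** True if the cached copy is older than 7 days and should be revalidated. */
    static boolean isStale(long fetchedAtMillis, long nowMillis) {
        return nowMillis - fetchedAtMillis > MAX_AGE_MILLIS;
    }

    /**
     * Prepares (but does not send) a conditional GET; after connecting, a
     * 304 Not Modified response means the cached rules are still valid.
     */
    static HttpURLConnection conditionalGet(String robotsUrl, long lastModifiedMillis, String etag) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(robotsUrl).toURL().openConnection();
            conn.setRequestMethod("GET");
            if (lastModifiedMillis > 0) {
                conn.setIfModifiedSince(lastModifiedMillis);
            }
            if (etag != null) {
                conn.setRequestProperty("If-None-Match", etag);
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would then compare conn.getResponseCode() with HttpURLConnection.HTTP_NOT_MODIFIED and keep the cached rules on a match, parsing the body only when the file has actually changed.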
orbiter
7fc822a59b changed handling of time-zones
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-27 16:28:55 +00:00
theli
1dc94e7753 *) Adding support for gzip content-encoding of http post requests
used to transferRWIs and transferURLs.
   See: http://www.yacy-forum.de/viewtopic.php?t=1167#10020

*) adding yacyVersion.java containing constants defining yacy versions
   that support a given feature.
   Needed to determine if a remote peer is able to decode gzip 
   content-encoded http post bodies properly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@772 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-22 10:30:55 +00:00
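A gzip-encoded POST body is the original form data compressed with GZIPOutputStream and sent with a Content-Encoding: gzip header, which is why the sender must first check that the remote peer can decode it. The sketch below shows the round trip and a version gate; the method names are hypothetical, and the minimum version is passed in as a parameter rather than guessed, since the actual constants in yacyVersion.java are not given here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: gzip a POST body for a peer that can decode it. Names are hypothetical.
class GzipPostSketch {

    /** Version gate; minVersion stands in for a feature constant like those in yacyVersion.java. */
    static boolean remoteSupportsGzip(double remoteVersion, double minVersion) {
        return remoteVersion >= minVersion;
    }

    /** Compresses the body; send it with the header "Content-Encoding: gzip". */
    static byte[] gzip(byte[] body) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // gz is closed here, so the gzip trailer is written
    }

    /** What the receiving peer does when it sees "Content-Encoding: gzip". */
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```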
orbiter
96a5b6e8fb removed yacy peer types from serverSwitch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-20 23:15:33 +00:00
borg-0300
11e175630b StringBuffers, finals;
cleaned;
Properties;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-18 15:23:01 +00:00
theli
a2fec3bb1c *) Bugfix for " java.lang.NullPointerException at hello.respond(hello.java:167)"
See: http://www.yacy-forum.de/viewtopic.php?p=9471

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@685 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-09-08 20:07:57 +00:00
theli
a812fb86cc *) Port Forwarding Feature does not detect broken connection properly.
Therefore a test request was added to the isConnected function to detect broken connections
   and to keep open connections alive


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@596 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-08-29 11:39:10 +00:00
orbiter
e24dbde217 better logging for WRONG seed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-31 11:11:29 +00:00
theli
0e2c33ee55 *) Network.html/Network.java:
- Adding function to manually force peer ping to remote yacy peer
  See:Network.html?page=4
- for debugging purposes only!

*) serverAbstractThread.java:
- Adding the possibility to notify a server thread via a synchronization object
- this is needed e.g. by the port forwarding feature to send a notification
  to the peerPing thread to redo the peer-ping with the new ip/port settings from Settings_p.html

*) Port Forwarding Feature (it should work now)
- adding a serverThread which is responsible to detect broken port forwarding 
  connections and to do reconnect if needed
- serverCore.java: moving port forwarding initialization into a separate function
- adding the possibility to configure the ssh port
- moving configuration section on the gui into a separate fieldset
- hello.java: only trying to do a second connect to the clientIp address during
  peer handshake if either remote port forwarding is not enabled locally or
  the clientIP is not equal to any local ip

*) httpdFileHandler.java:
- print out a more verbose error message

*) httpc.java
- allowing content encoding to be deactivated from outside

*) plasmaCrawlWorker.java
- the crawler worker now tries to refetch the content of a website without
  gzip content encoding if a gzip error occurred

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-07-04 11:09:48 +00:00
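The notification mechanism added to serverAbstractThread can be sketched as a thread that normally waits out its idle period between jobs but can be woken early through a shared lock object, for example when the port forwarding settings change and a peer ping should be repeated immediately. This is a hypothetical illustration, not the YaCy class; the notified flag makes sure a notification sent while the job is still running is not lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a periodic server thread that can be woken early.
class NotifiableThreadSketch extends Thread {
    private final Object syncObject = new Object();
    private final long idleMillis;
    private final Runnable job;
    private boolean notified = false;            // guarded by syncObject
    private volatile boolean running = true;
    private final AtomicInteger runCount = new AtomicInteger();

    NotifiableThreadSketch(long idleMillis, Runnable job) {
        this.idleMillis = idleMillis;
        this.job = job;
        setDaemon(true);
    }

    /** Wakes the thread so the job runs again without waiting for the idle period. */
    void notifyThread() {
        synchronized (syncObject) {
            notified = true;
            syncObject.notifyAll();
        }
    }

    void terminate() {
        running = false;
        notifyThread();
    }

    int runs() {
        return runCount.get();
    }

    @Override
    public void run() {
        while (running) {
            job.run();
            runCount.incrementAndGet();
            synchronized (syncObject) {
                long deadline = System.currentTimeMillis() + idleMillis;
                try {
                    // loop guards against spurious wake-ups
                    while (!notified && running) {
                        long remaining = deadline - System.currentTimeMillis();
                        if (remaining <= 0) {
                            break;
                        }
                        syncObject.wait(remaining);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                notified = false;
            }
        }
    }
}
```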
theli
d50ad1521a *) correcting logging statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-29 11:08:47 +00:00
theli
08e4334c1d *) Status.java: showing amount of time since last upload of seed-file
*) hello.java: adding additional output for principal-downgrade bug
*) httpd.java, httpdFileHandler.java, httpdProxyHandler.java: improved error handling
*) yacyCore.java: trying to fix principal-downgrade bug
*) yacySeed.java: adding some constants

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-28 11:27:31 +00:00
theli
d53b2393e5 *) autoconfig.java: ip address was not reported correctly when port-forwarding is on
*) hello.java: reportedip may be empty at peer startup
*) httpc.java: adding method to determine if the connection was already closed or is broken
*) httpdProxyHandler.java: trying to improve error handling
*) server/serverCore.java
- setting myseed ip-address and port correctly if port-forwarding is on
- doing a more failsafe close and adding some debugging output
*) yacyClient.java: adding some logging statements to allow a better detection of the
   "degraded to senior" bug
*) yacyCore.java: restructuring publishMySeed
   (@Orbiter: please take a look)
- to avoid busy waiting
- to allow a graceful shutdown on server shutdown
- new seed count was not calculated correctly in the previous version
*) yacySeedDB.java: host ip and port were not initialized correctly if port-forwarding
   was activated

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-23 11:00:26 +00:00
orbiter
5a490aa065 fixed html parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 21:49:56 +00:00
orbiter
38747857c2 correction of correction to port-forwarding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 11:49:54 +00:00
orbiter
dbda6e1e85 corrections to port forwarding
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 11:40:36 +00:00
theli
9a98988c3c *) Bugfix for SSL/NIO Bug
See: http://www.yacy-forum.de/viewtopic.php?t=516
   - removing NIO from server/serverCore.java because of massive problems
     with socket close issues
*) Adding support for remote port forwarding via ssh
   @Orbiter: Please take a look into
   - hello.java
   - server/serverCore.java.publicIP()
   - yacy/yacyClient.java.publishMySeed(...)
*) Making startup loading of additional content parsers more failsafe


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@281 6c8d7289-2bf4-0310-a012-ef5d649a1542
2005-06-16 07:28:07 +00:00