Commit Graph

1722 Commits

Author SHA1 Message Date
orbiter
0eb60cfe6f better handling of seed properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 09:40:42 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster than before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- previously, absolute paths were expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be placed in a location outside of YaCy's root dir without having to use symlinks (probably good for third-party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the application's root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinPath in the code) + a few other cases of broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
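
The path handling described in this commit can be illustrated with a minimal sketch. It is not the code of abstractServerSwitch.getConfigPath; the class name ConfigPathSketch, the method resolveConfigPath and its parameters are hypothetical and only show the idea of keeping absolute paths as they are and resolving everything else against the application root.

```java
import java.io.File;

public class ConfigPathSketch {

    /**
     * Hypothetical helper (not the YaCy implementation): returns the configured
     * path unchanged if it is absolute, otherwise resolves it against rootPath.
     * Falls back to defaultValue when nothing is configured.
     */
    public static File resolveConfigPath(File rootPath, String configured, String defaultValue) {
        String value = (configured == null || configured.isEmpty()) ? defaultValue : configured;
        File candidate = new File(value);
        // An absolute setting must not be prefixed with the application root.
        return candidate.isAbsolute() ? candidate : new File(rootPath, value);
    }

    public static void main(String[] args) {
        File root = new File("/path/to/yacy/root");
        // Relative setting: resolved below the application root.
        System.out.println(resolveConfigPath(root, "DATA/WORK", "DATA/WORK"));
        // Absolute setting (on Unix-like systems): kept as configured.
        System.out.println(resolveConfigPath(root, "/a/b/c", "DATA/WORK"));
    }
}
```

Whether a string such as /a/b/c counts as absolute depends on the platform, which is why the sketch delegates that decision to File.isAbsolute().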
orbiter
794d296129 project link update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 20:40:15 +00:00
orbiter
ccbfb15b6b enhancement to crawl stacker enqueue order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 00:57:32 +00:00
orbiter
93905e5c7b fix for show-more bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4191 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 00:55:39 +00:00
orbiter
55c87b3b12 changed behavior of crawl stacker
- final flush only when tabletype = RAM
- prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100
- maximum number of entries in stacker is configurable in yacy.init (stacker.slots)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 11:32:40 +00:00
orbiter
87b297b4d2 update of link to english forum
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 10:50:27 +00:00
orbiter
a31b9097a4 preparations for mass remote crawls:
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
  crawls can contain unchecked urls. Each peer must check the robots to prevent that it is misused
  as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls

After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. The thread pools for the crawl threads have also been removed, since
creation of these threads is not resource-consuming; for a detailed explanation see svn 4106

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 01:43:20 +00:00
fuchsi
a718858e8b seed.CCOUNT is interpreted as a double value not int
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 23:25:48 +00:00
orbiter
d85821a88c fix for SVN 4178
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 22:00:34 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) has been removed; this is now done with a simple put(...) (see above).
- putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into the corresponding HTML entities ('<' => '&lt;'). This was previously done by put(...) and so was called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_, report anything that was overlooked at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
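
The division of labour between put(...), putNum(...) and putHTML(...) described in this commit might look roughly like the sketch below. It is an illustration only, not the serverObjects class: the class ServerObjectsSketch, its constructor taking a Locale, and the get(...) accessor are assumptions, and the escaping covers just the four characters the commit lists for the "forXML" mode.

```java
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ServerObjectsSketch {

    private final Map<String, String> values = new HashMap<>();
    private final Locale locale; // assumption: locale is fixed per instance

    public ServerObjectsSketch(Locale locale) {
        this.locale = locale;
    }

    /** Stores the value as it is, without any escaping. */
    public void put(String key, String value) {
        values.put(key, value);
    }

    /** Stores a number as a plain String, transformed but not formatted. */
    public void put(String key, long value) {
        values.put(key, Long.toString(value));
    }

    /** Stores a number formatted for the configured locale (grouping separators etc.). */
    public void putNum(String key, long value) {
        values.put(key, NumberFormat.getIntegerInstance(locale).format(value));
    }

    /** Stores the value with < > & " replaced by entities. */
    public void putHTML(String key, String value) {
        values.put(key, escape(value));
    }

    static String escape(String s) {
        // '&' has to be replaced first, otherwise the other entities get escaped again.
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public String get(String key) {
        return values.get(key);
    }
}
```

Keeping the escaping in a dedicated method means that plain put(...) calls, which are the common case, do not pay for a transformation they do not need.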
orbiter
9d539ec621 added option to display the network name as page greeting instead of the page greeting string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4174 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 08:01:44 +00:00
fuchsi
35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 21:47:07 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
fuchsi
3352474dd8 Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 13:29:11 +00:00
fuchsi
06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 02:12:31 +00:00
low012
b54fcd732b *) fixed exceptions that occurred when non-integer values were entered where integers were expected
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4160 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-12 19:09:20 +00:00
low012
52c68875bd *) removed (hopefully only) surplus double encodings (http://forum.yacy-websuche.de/viewtopic.php?t=368)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-12 15:27:23 +00:00
fuchsi
e255888095 Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-10 15:19:07 +00:00
fuchsi
ce0bb1dc8a Increase defaults for the DHT Receive Limits to prevent "busy" states.
see 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-10 10:07:16 +00:00
low012
fdb0b861f8 *) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer
*) added network QPH
*) banner is cached for 1 second to avoid DOS
*) still no logo


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-09 21:47:37 +00:00
low012
507ecd8afa *) added banner that can be displayed like this: http://localhost:8080/Banner.png
possible arguments: textcolor, bgcolor, bordercolor
   example: http://localhost:8080/Banner.png?textcolor=ffffff&bgcolor=121212&bordercolor=ffffff
   take care: YaCy uses CMY color model!
*) there are still some known bugs, but I can't continue coding right now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-07 21:59:36 +00:00
fuchsi
ebfd1e0b42 remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-05 18:52:15 +00:00
fuchsi
ed20531e68 don't encode in channel element as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4144 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 12:12:27 +00:00
fuchsi
c5a8585ac6 fix more encoding problems in yacysearch.rss.
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encoded in a wrong way.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 09:21:03 +00:00
low012
e2f3268c13 *) removed double encoding (http://forum.yacy-websuche.de/viewtopic.php?t=368)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 20:13:32 +00:00
orbiter
711641f167 extended client connection clean-up:
there are now two time-outs, one for the complete connection time, and one for an idle time
connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed
if the total number of connections exceeds 64, the surplus connections with the longest idle time are also closed

During normal operation of peers these forced closings should never appear,
but the existence of the idle connection check ensures the availability of the peer and the usability of the host.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:06:12 +00:00
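
The three clean-up rules from this commit message (idle time-out, total connection time-out, and a cap of 64 connections) can be sketched as follows. This is an illustration under assumptions, not the httpc code: the class ConnectionCleanupSketch, the Conn record-like class and the method selectForClosing are hypothetical names, and time stamps are plain milliseconds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ConnectionCleanupSketch {

    static final long MAX_IDLE_MS = 2 * 60 * 1000L;   // idle for more than 2 minutes
    static final long MAX_ALIVE_MS = 60 * 60 * 1000L; // alive for more than one hour
    static final int MAX_CONNECTIONS = 64;            // cap on open connections

    /** Hypothetical connection descriptor; times are in milliseconds. */
    public static final class Conn {
        final String id;
        final long openedAt;
        final long lastActiveAt;

        public Conn(String id, long openedAt, long lastActiveAt) {
            this.id = id;
            this.openedAt = openedAt;
            this.lastActiveAt = lastActiveAt;
        }
    }

    /** Returns the connections that should be closed at time 'now'. */
    public static List<Conn> selectForClosing(List<Conn> conns, long now) {
        List<Conn> toClose = new ArrayList<>();
        List<Conn> survivors = new ArrayList<>();
        for (Conn c : conns) {
            boolean idleTooLong = now - c.lastActiveAt > MAX_IDLE_MS;
            boolean aliveTooLong = now - c.openedAt > MAX_ALIVE_MS;
            if (idleTooLong || aliveTooLong) {
                toClose.add(c);
            } else {
                survivors.add(c);
            }
        }
        // Longest idle time first, i.e. oldest lastActiveAt first.
        survivors.sort(Comparator.comparingLong((Conn c) -> c.lastActiveAt));
        int surplus = survivors.size() - MAX_CONNECTIONS;
        for (int i = 0; i < surplus; i++) {
            toClose.add(survivors.get(i));
        }
        return toClose;
    }
}
```

The cap is applied after the two time-outs, so connections that are already due for closing do not count against the limit of 64.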
fuchsi
03c5b4ad68 more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org
- RFC-822 date time had to include the time instead of date only
- <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1
- <link> elements are mandatory for <channel> and <item>

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 04:00:52 +00:00
fuchsi
e3c6236eef fixed the last opensearch/rss issue. The GUID-Tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink) in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but it _must_ be specified as such with an additional isPermaLink="false" attribute in this case.
see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 00:46:30 +00:00
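
The guid rule from this commit message can be sketched as below. The class RssGuidSketch and its methods are hypothetical and not taken from the YaCy sources; the URL check is a deliberately simple assumption (an absolute URI with a host), and a real feed generator may need a stricter test.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RssGuidSketch {

    /** Minimal XML text escaping for element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Assumption for this sketch: an ID is a URL if it parses as an absolute URI with a host. */
    static boolean looksLikeUrl(String id) {
        try {
            URI uri = new URI(id);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds a guid element; non-URL IDs are marked with isPermaLink="false". */
    public static String guidElement(String id) {
        if (looksLikeUrl(id)) {
            return "<guid>" + escapeXml(id) + "</guid>";
        }
        return "<guid isPermaLink=\"false\">" + escapeXml(id) + "</guid>";
    }
}
```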
orbiter
dea7bee049 - increased minimum time before an active connection is interrupted from 1 minute to 10 minutes
- added sorting by connection time in client connection table of connectionTimeComparatorInstance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 23:56:04 +00:00
orbiter
f8e69ce4dc removed progress bar in Network list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 22:50:47 +00:00
orbiter
b183bf6f42 - fixed opensearch bugs
- added 'full domain' button to expert crawl start
- removed non-working 'only one domain' button, the regex allowed crawling of other domains

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 21:43:05 +00:00
fuchsi
7404f2c35c Fix some of the issues with the RSS search interface, see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=392
Note: the new DateFormatter822 in the plasmaSwitchboard is just a copy of the DateFormatter that always uses the US locale to allow formatting of a locale independent date String.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4124 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 21:28:29 +00:00
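
Why the formatter is pinned to the US locale can be shown with a short sketch. It is not the DateFormatter822 from plasmaSwitchboard; the class Rfc822DateSketch and the method format are hypothetical, and the pattern and GMT time zone are assumptions chosen to produce an RFC-822 style date with English day and month names regardless of the default locale of the JVM.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class Rfc822DateSketch {

    /** Formats a date in RFC-822 style, independent of the JVM's default locale. */
    public static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat formatter =
                new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        // Even with a German default locale the day and month names stay English.
        Locale.setDefault(Locale.GERMANY);
        System.out.println(format(new Date(0L)));
    }
}
```

Without the explicit Locale.US, a peer running with a German language setting would emit day and month abbreviations that feed readers are unlikely to parse.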
orbiter
98abe0804d another enhancement to crawl starts with link files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4123 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 20:30:42 +00:00
fuchsi
ed2ca8fc4c Add search type to top word suggestion searches.
Closes: http://forum.yacy-websuche.de/viewtopic.php?f=6&t=391

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4122 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 19:49:50 +00:00
orbiter
1b42152a76 fixed and enhanced some details in crawl start with file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4120 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 00:49:38 +00:00
orbiter
16e101f135 - fix for bad xml tag in Network.xml
- switched on automatic deletion of passive peers in pro versions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4119 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 22:45:44 +00:00
orbiter
01e0669264 re-designed some parts of DHT position calculation (effect is the same as before)
and replaced old first hash computation by new method that tries to find a gap in the current dht
to do this, it is necessary that the network bootstrapping is done before the own hash is computed
this made further redesigns in peer initialization order necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-01 12:30:23 +00:00
orbiter
842308ea97 - redesigned crawl start menu, integrated monitoring pages
- removed web structure picture from indexing menu and grouped it together with htcache monitor
- added a database for terminated crawls, when a crawl is finished it is automatically moved to the new database
- extended crawl profile edit servlet, shows now also terminated crawls
- option that was used to delete profiles is now redesigned to a function that moves the current crawl to the terminated crawls and removes all urls from the current queues!
- fixed here and there problems with indexing queues
- enhanced indexing speed by changing cache flush sizes.
- changed behaviour of crawl result servlet: the list of crawled urls is shown if there is one, otherwise the overview window is shown

attention: the new profile databases are not compatible with the old one. current crawls will be lost! the web index is not touched.
next steps: the database of terminated crawls can be used to start a new crawl from them. This is useful if one wants to re-crawl specific pages and wants to use an old crawl profile.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4113 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-28 01:21:31 +00:00
orbiter
341f7cb327 steps to enhance remote search performance:
- added a file size limitation, that disallows parsing of large documents during (offline-) remote search
- added profiling information to search result computation, visible at search access tracker. this info shows used time for URL fetch and snippet computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-26 10:11:50 +00:00
orbiter
2f1ff048ba some fixes to socket connection time-out
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4111 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 23:45:05 +00:00
orbiter
3c74014004 automatic deletion of dead client connections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 22:46:11 +00:00
orbiter
11b4f80bde - fixed non-closing client connections
- added client connection tracker in connections servlet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-25 21:36:08 +00:00
orbiter
1488769e1f cleanup of unmaintained and outdated performance methods:
removed object pools in httpc. Object pooling is not recommended,
if the creation of the object is not time-intensive. Object pools are only useful,
if there is much computation necessary to create some basic data that is stored
in the object pool and can be re-used. This does not apply to object pools in YaCy.
Object pooling of client sessions would make sense if they would allow re-use of
living connections to other yacy clients. But every connection is closed after usage
of an object in the client pool, therefore the YaCy server client objects are not such
that hold hardware/network-allocated entities.
See:
http://www.javaperformancetuning.com/news/qotm033.shtml
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#gc_pooling
http://docs.sun.com/source/816-7159-10/pt_chap5.html
http://www.microjava.com/articles/techtalk/recylcle2


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-23 20:49:52 +00:00
fuchsi
00dab81077 simpler solution to last commit + works with and without navigation column on the left
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-20 01:52:10 +00:00
fuchsi
eb16a99e94 avoid floating of long page titles around the favicon in search results
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-19 22:08:56 +00:00
fuchsi
9524b9c16a second try of rev 4100 :). Tested in Iceweasel/Firefox 2.0.6, Konqueror 3.5.7, Opera 9.23 (all linux) and IE6-SP1 (wine)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-17 19:39:15 +00:00
fuchsi
6b8faaadb6 undo last commit for further evaluation, a progressbar element is used on other pages as well...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-17 03:36:35 +00:00
fuchsi
1880bba420 A few changes to the progress bar and search result statistics layout influenced by the discussion in <http://forum.yacy-websuche.de/viewtopic.php?f=5&t=268> with the idea of saving vertical space. Please check in every available browser and comment whether it's better than before. ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-16 14:30:53 +00:00