1739 Commits

Author SHA1 Message Date
orbiter
2fcd18a972 - fixed bad behaviour of search event worker processes
- fixed export of url lists in xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-23 01:08:16 +00:00
orbiter
445c0b5333 added domain list extraction and html export format
to URL administration menu http://localhost:8080/IndexControlURLs_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 20:47:06 +00:00
orbiter
bf6952abe7 - added url export to http://localhost:8080/IndexControlURLs_p.html
- removed command-line option to export urls

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4226 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-22 16:52:44 +00:00
orbiter
edba2b7bcc fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=543
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4224 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-21 23:26:51 +00:00
orbiter
c48b73cda2 redesign of ranking data structure
- the index administration now uses the same code base for url selection and collection
  as the search interface. The index administration is therefore a good test environment for
  ranking order control
- removed old postsorting-algorithms, will be replaced with new one
- fixed many bugs that occurred during ranking; in particular, the constraint filtering method
  removed too many links
- fixed media search flags; had been attached to too many urls. The effect should be a better
  pre-sorting before media load within snippet fetch

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4223 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-21 23:14:57 +00:00
orbiter
6f1308da2f - some enhancements to IndexControlURLs (shows more links, connects referrer to another query)
- some refactoring to search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4222 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-17 01:53:02 +00:00
orbiter
bf9a9e4e5e fix for NPE in IndexControlRWIs_p.java
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4221 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 16:37:45 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
bd5673efbe added cleaning of search event before opening the index administration
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4219 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-15 12:49:13 +00:00
orbiter
55da871211 preparations for better ranking: better debugging of index properties
to do this, the index administration interface was extended.
It is now possible to select parts of an index.
See properties shown in interface after a word search for details.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4218 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-15 03:03:18 +00:00
low012
383dc815d2 *) fix for commit 4212
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-14 19:14:53 +00:00
orbiter
3491531cea - fixed 'appears in url' flag in index generation
- extended index administration page, shows some properties to the web links now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4216 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-14 01:15:28 +00:00
daburna
19176e12e2 -corrected typo made in 4213
-updated translation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4215 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-13 10:13:04 +00:00
orbiter
ca4ca79eba removed wrong hints to installation page.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4213 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 23:45:52 +00:00
low012
a01c42575d *) 404 error pages will be displayed with correct CSS and favicon now (http://forum.yacy-websuche.de/viewtopic.php?t=482)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4212 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 20:18:47 +00:00
orbiter
bc2368e907 fix for problem with remote crawl referrers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4210 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 16:32:50 +00:00
orbiter
2e91b724ad fix for yacysearch/rss-feed bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4203 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-11 00:42:31 +00:00
orbiter
0eb60cfe6f better handling of seed properties
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4199 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 09:40:42 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster than before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
fuchsi
425e4ead66 Allow absolute paths in configuration settings.
- previously, absolute paths were expanded incorrectly, e.g.: fooPath=/a/b/c would become /path/to/yacy/root/a/b/c. Now nearly all dynamically generated data with a configurable path can be placed in a location outside of yacy's root dir without having to use symlinks (probably good for third party distribution packaging).
- abstractServerSwitch.getConfigPath(setting, default) returns a File instance, either with an absolute path or relative to the application's root path.

- exceptions (hardcoded): 
  DATA/LOG/yacy.logging
  DATA/SETTINGS/httpProxy.conf
  DATA/SETTINGS/user.db
TODO: all of these are the global configuration files and they should probably be put into _one_ command line configurable settings path, so it would be possible to package them in /etc/ for example.

- add missing workPath to yacy.init (it was used in code, but there was no default in the file)
- fix broken skinPath (was skinsPath in yacy.init but skinsPath in the code) + a few other broken config reading caused by typos.
- replaced path setting names and their default values with the related static fields in plasmaSwitchboard where not already done/existing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4196 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-04 10:36:25 +00:00
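
The path handling described in commit 425e4ead66 can be illustrated roughly as follows. This is a minimal sketch, not the actual abstractServerSwitch code: the class name ConfigPathSketch, the method resolve and the default value DATA/WORK are hypothetical and chosen only for illustration.

```java
import java.io.File;

// Hypothetical helper; not taken from the YaCy source tree.
final class ConfigPathSketch {

    // Returns the configured path unchanged if it is absolute,
    // otherwise resolves it against the application root path.
    // Falls back to defaultValue when no value is configured.
    static File resolve(File rootPath, String configured, String defaultValue) {
        String value = (configured == null || configured.isEmpty()) ? defaultValue : configured;
        File candidate = new File(value);
        return candidate.isAbsolute() ? candidate : new File(rootPath, value);
    }

    public static void main(String[] args) {
        File root = new File("/path/to/yacy/root");
        // An absolute setting is kept as it is (on Unix-like systems).
        System.out.println(resolve(root, "/a/b/c", "DATA/WORK"));
        // A missing setting falls back to the default, relative to the root.
        System.out.println(resolve(root, null, "DATA/WORK"));
    }
}
```

The sketch relies on File.isAbsolute(), so what counts as an absolute path depends on the platform.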
orbiter
794d296129 project link update
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4193 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 20:40:15 +00:00
orbiter
ccbfb15b6b enhancement to crawl stacker enqueue order
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 00:57:32 +00:00
orbiter
93905e5c7b fix for show-more bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4191 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-01 00:55:39 +00:00
orbiter
55c87b3b12 changed behavior of crawl stacker
- final flush only when tabletype = RAM
- prestacker (dns prefetch) only if tabletype = RAM and busytime <= 100
- maximum number of entries in stacker is configurable in yacy.init (stacker.slots)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4186 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-31 11:32:40 +00:00
orbiter
87b297b4d2 update of link to english forum
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4182 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 10:50:27 +00:00
orbiter
a31b9097a4 preparations for mass remote crawls:
two main changes must be implemented to enable mass remote crawls:
- shift control of robots.txt to crawl queue (away from stacker). This is necessary since remote
  crawls can contain unchecked urls. Each peer must check robots.txt itself so that it cannot be misused
  as crawl agent for unwanted file retrieval
- implement new index files that control double-check of remotely crawled urls

After removal of robots.txt checking from stacker threads, the multi-threading of this process is void.
Multithreading has been removed. The thread pools for the crawl threads have also been removed, since
creation of these threads is not resource-consuming; for a detailed explanation see svn 4106

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-29 01:43:20 +00:00
fuchsi
a718858e8b seed.CCOUNT is interpreted as a double value not int
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4180 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 23:25:48 +00:00
orbiter
d85821a88c fix for SVN 4178
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 22:00:34 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'). This was previously done by put(...) and was therefore called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_; report anything overlooked at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
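
The "forXML" escaping mode mentioned in commit 0e1738899f replaces only the four characters < > & ". A minimal sketch of such a replacement is shown below; the class name EscapeSketch and the method escapeForXml are hypothetical and do not reproduce the real serverObjects implementation.

```java
// Hypothetical helper; the real serverObjects code may differ.
final class EscapeSketch {

    // Replaces only the four characters named in the commit message: < > & "
    static String escapeForXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeForXml("a<b & \"c\">"));
    }
}
```

Escaping in a single pass over the input avoids double-escaping the '&' that earlier replacements introduce, which is the kind of problem chained string replacements tend to cause.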
orbiter
9d539ec621 added option to display the network name as page greeting instead of the page greeting string
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4174 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 08:01:44 +00:00
fuchsi
35303f9504 add real size values (KBytes) of the DHT-In/Out-RAM-Caches to the PerformanceQueues page. A lot of users seem to tweak this value and it might help in finding the best size in relation to the peer's memory resources.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 21:47:07 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unnecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
fuchsi
3352474dd8 Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 13:29:11 +00:00
fuchsi
06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 02:12:31 +00:00
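
Locale-dependent number formatting of the kind described for yFormatter in commit 06e6a1ff62 can be sketched with the standard java.text.NumberFormat class. The class NumberFormatSketch and its methods are hypothetical; how yFormatter derives the Locale from the language setting is not shown in the commit message and is left out here.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical helper; it only illustrates locale-dependent separators.
final class NumberFormatSketch {

    // Formats an integer value with the grouping separator of the given locale.
    static String formatInteger(long value, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(value);
    }

    // Formats a decimal value with the decimal and grouping separators of the given locale.
    static String formatDecimal(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatInteger(1234567L, Locale.US));      // 1,234,567
        System.out.println(formatInteger(1234567L, Locale.GERMANY)); // 1.234.567
        System.out.println(formatDecimal(1234.5, Locale.GERMANY));   // 1.234,5
    }
}
```

The German output shows why a consumer that parses '.' as a decimal separator can break when grouping separators are emitted, which is the compatibility concern noted for Network.xml.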
low012
b54fcd732b *) fixed exceptions that occurred when non-integer values were entered where integers were expected
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4160 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-12 19:09:20 +00:00
low012
52c68875bd *) removed (hopefully only) surplus double encodings (http://forum.yacy-websuche.de/viewtopic.php?t=368)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-12 15:27:23 +00:00
fuchsi
e255888095 Add headless AWT, nice level and memory parameters to the init script. It should work like the startYACY.sh now.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-10 15:19:07 +00:00
fuchsi
ce0bb1dc8a Increase defaults for the DHT Receive Limits to prevent "busy" states.
see 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-10 10:07:16 +00:00
low012
fdb0b861f8 *) fixed wrong calculation of network words, network links, network PPM if peer is senior or principal peer
*) added network QPH
*) banner is cached for 1 second to avoid DOS
*) still no logo


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-09 21:47:37 +00:00
low012
507ecd8afa *) added banner that can be displayed like this: http://localhost:8080/Banner.png
possible arguments: textcolor, bgcolor, bordercolor
   example: http://localhost:8000/Banner.png?textcolor=ffffff&bgcolor=121212&bordercolor=ffffff
   take care: YaCy uses CMY color model!
*) there are still some known bugs, but I can't continue coding right now


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4149 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-07 21:59:36 +00:00
fuchsi
ebfd1e0b42 remove left over '>' in description and replace ' ' by '+' in rss search where URL-encoded parameters are required.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4147 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-05 18:52:15 +00:00
fuchsi
ed20531e68 don't encode in channel element as well
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4144 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 12:12:27 +00:00
fuchsi
c5a8585ac6 fix more encoding problems in yacysearch.rss.
- URL encoding for search terms where required
- removed "ugly" CDATA escaping
- UTF-8 encoding for the XML
- no HTML style escaping for XML/RSS element values
Note: some unicode characters might still be encoded in a wrong way.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4140 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-04 09:21:03 +00:00
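
URL encoding of search terms with UTF-8, as listed in commit c5a8585ac6, can be sketched with the standard java.net.URLEncoder. The class QueryEncodeSketch is hypothetical and is not the code used in yacysearch.rss.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for encoding a search term as a URL query value.
final class QueryEncodeSketch {

    // Encodes a term as application/x-www-form-urlencoded using UTF-8.
    // Spaces become '+', reserved characters become %XX escapes.
    static String encode(String term) {
        try {
            return URLEncoder.encode(term, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("yacy search")); // yacy+search
        System.out.println(encode("a&b"));         // a%26b
    }
}
```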
low012
e2f3268c13 *) removed double encoding (http://forum.yacy-websuche.de/viewtopic.php?t=368)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 20:13:32 +00:00
orbiter
711641f167 extended client connection clean-up:
there are now two time-outs, one for the complete connection time, and one for an idle time
connections that are idle for more than 2 minutes are closed, and connections that have been alive for more than one hour are also closed
if the total number of connections exceeds 64, the connections beyond 64 that have the longest idle time are also closed

During normal operation of peers these forced closings should never appear,
but the existence of the idle connection check ensures the availability of the peer and the usability of the host.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 15:06:12 +00:00
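
The clean-up rules from commit 711641f167 can be sketched as a selection function over a list of connections. Everything named here (ConnectionCleanupSketch, Conn, selectForClosing) is hypothetical. The sketch also assumes that the limit of 64 is applied to the connections that survive the two time-outs, which the commit message does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the clean-up rules; not the actual YaCy client code.
final class ConnectionCleanupSketch {
    static final long MAX_IDLE_MS = 2 * 60 * 1000L;   // 2 minutes idle time-out
    static final long MAX_ALIVE_MS = 60 * 60 * 1000L; // 1 hour total connection time-out
    static final int MAX_CONNECTIONS = 64;

    static final class Conn {
        final String id;
        final long aliveMs; // time since the connection was opened
        final long idleMs;  // time since the connection was last used

        Conn(String id, long aliveMs, long idleMs) {
            this.id = id;
            this.aliveMs = aliveMs;
            this.idleMs = idleMs;
        }
    }

    // Returns the connections that should be closed.
    static List<Conn> selectForClosing(List<Conn> all) {
        List<Conn> close = new ArrayList<>();
        List<Conn> keep = new ArrayList<>();
        for (Conn c : all) {
            if (c.idleMs > MAX_IDLE_MS || c.aliveMs > MAX_ALIVE_MS) {
                close.add(c);
            } else {
                keep.add(c);
            }
        }
        // Assumption: the 64-connection limit applies to what is left after the time-outs.
        if (keep.size() > MAX_CONNECTIONS) {
            keep.sort(Comparator.comparingLong((Conn c) -> c.idleMs).reversed());
            close.addAll(keep.subList(0, keep.size() - MAX_CONNECTIONS));
        }
        return close;
    }
}
```

Sorting the survivors by idle time in descending order means the surplus connections that get closed are the ones least likely to be in active use.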
fuchsi
03c5b4ad68 more fixes to the yacysearch.rss, it's now 100% valid according to http://feedvalidator.org
- RFC-822 date time had to include the time instead of date only
- <opensearch:link> doesn't exist -> <atom:link>, see http://www.opensearch.org/Specifications/OpenSearch/1.1
- <link> elements are mandatory for <channel> and <item>

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 04:00:52 +00:00
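
The RFC-822 date-time requirement from commit 03c5b4ad68 (date plus time, not date only) can be sketched with java.text.SimpleDateFormat. The class Rfc822DateSketch is hypothetical, and the choice of GMT as output time zone is an assumption made for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper producing an RFC-822 style date-time as used in RSS 2.0.
final class Rfc822DateSketch {

    static String format(Date date) {
        // Locale.US keeps day and month names in English regardless of the system locale.
        SimpleDateFormat fmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("GMT")); // assumption: feed dates are emitted in GMT
        return fmt.format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // Thu, 01 Jan 1970 00:00:00 +0000
    }
}
```

A new SimpleDateFormat is created per call because instances of that class are not thread-safe.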
fuchsi
e3c6236eef fixed the last opensearch/rss issue. The GUID tag in RSS is supposed to contain a unique ID. By default, the ID is supposed to be a permanent link to the feed element (the permalink), in which case its content _must_ match the syntax of a URL. The guid _can_ contain a non-URL ID, but it _must_ be specified as such with an additional isPermaLink="false" attribute in this case.
see http://www.rssboard.org/rss-2-0#ltguidgtSubelementOfLtitemgt

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4130 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 00:46:30 +00:00
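
The guid rule explained in commit e3c6236eef can be sketched as follows. The class GuidSketch and its URL check are hypothetical: the check accepts any absolute URI with a host, which is a simplification, and the sketch assumes the ID is already safe to embed in XML.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper that renders an RSS 2.0 <guid> element.
final class GuidSketch {

    // Simplified check: an absolute URI with a host counts as a URL.
    static boolean looksLikeUrl(String id) {
        try {
            URI uri = new URI(id);
            return uri.isAbsolute() && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // A URL is emitted as a plain guid (a permalink by default);
    // any other ID is marked with isPermaLink="false".
    // Assumption: id contains no characters that need XML escaping.
    static String guidElement(String id) {
        return looksLikeUrl(id)
                ? "<guid>" + id + "</guid>"
                : "<guid isPermaLink=\"false\">" + id + "</guid>";
    }

    public static void main(String[] args) {
        System.out.println(guidElement("http://example.org/item/1"));
        System.out.println(guidElement("AbCdEf123456"));
    }
}
```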
orbiter
dea7bee049 - increased minimum time before an active connection is interrupted from 1 minute to 10 minutes
- added sorting by connection time in client connection table of connectionTimeComparatorInstance

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 23:56:04 +00:00
orbiter
f8e69ce4dc removed progress bar in Network list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4127 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 22:50:47 +00:00
orbiter
b183bf6f42 - fixed opensearch bugs
- added 'full domain' button to expert crawl start
- removed non-working 'only one domain' button; the regex allowed crawling of other domains

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-02 21:43:05 +00:00