<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>YaCy: Help</title>
#%env/templates/metas.template%#
</head>
<body id="Help">
#%env/templates/header.template%#
<h2>Help</h2>
<p>
This is a distributed web crawler and also a caching HTTP proxy. You are using the <em>online interface</em> of the application. You can use this interface to configure your personal settings, proxy settings, access control and crawling properties. You can also use it to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most importantly, you can use the search page to search either your own index or the <em>global</em> index.
</p>
<p>
For more detailed information, visit the <a href="http://www.yacy.net/">YaCy homepage</a>.
</p>
<h3>Local and Global Search: Options and Functions</h3>
<p>
The proxy provides a search interface that accesses your local index, which is created from the web pages that passed through the proxy. The search can also be applied globally, by searching other peers. You can use the following options to enhance your search results:
</p>
<dl class="optionsAndFunctions">
<dt>Search Word List</dt>
<dd>
You can search for several words simultaneously. Words must be separated by a single space.
The words are treated conjunctively: every word must occur in a result, not just any one of them.
If you do a global search (see below), you may get different results each time you search.
</dd>
<dt>Maximum Number of Results</dt>
<dd>
You can select the maximum number of links to return. We do not yet support multiple result pages that would let you browse virtually every matching link.
Instead, we encourage you to refine the search result by submitting more search words.
</dd>
<dt>Result Order Options</dt>
<dd>
The search engine provides an experimental 'Quality' ranking. Unlike other well-known search engines, it can also
order results by date. If you change the order to 'Date-Quality', the most recently updated pages from the search results are listed first.
For pages that have the same date, the secondary order, 'Quality', is applied.
</dd>
<dt>Resource Domain</dt>
<dd>
This search engine is built to search the web pages that pass through the proxy. The search index is also distributed to other peers,
so you can search globally as well. This function is currently only rudimentary, but can be chosen for test cases. Future releases will
automatically distribute index information <em>before</em> a search happens, forming a high-performance distributed hash table that enables a very fast global search.
</dd>
<dt>Maximum Search Time</dt>
<dd>
Searching the local index is extremely fast: it takes only milliseconds, even for a large number (millions) of pages. Searching the
global index takes more time, because the remote peers that hold the best search results must be found first. This is especially the case while the
distributed index is in test mode. The longer the search time, the more stable the search results become (repeated global searches produce
more similar results).
</dd>
</dl>
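<p>
As a rough illustration of the conjunctive word matching and the 'Date-Quality' order described above, the following Python sketch filters and sorts a few made-up hits. The sample records, the field names and the functions <code>conjunctive_hits</code> and <code>date_quality_order</code> are hypothetical, and the rule that a higher quality value ranks first is an assumption. This is not YaCy's actual implementation.
</p>

```python
from datetime import date

# Made-up sample hits; the field names are assumptions for this sketch.
HITS = [
    {"url": "http://example.org/a", "words": {"yacy", "peer", "search"},
     "date": date(2005, 5, 1), "quality": 3},
    {"url": "http://example.org/b", "words": {"yacy", "search"},
     "date": date(2005, 5, 9), "quality": 1},
    {"url": "http://example.org/c", "words": {"yacy", "proxy"},
     "date": date(2005, 5, 9), "quality": 7},
    {"url": "http://example.org/d", "words": {"search", "yacy", "index"},
     "date": date(2005, 5, 9), "quality": 5},
]


def conjunctive_hits(query, hits):
    """Keep only hits that contain every word of the space-separated query."""
    wanted = set(query.split())
    return [hit for hit in hits if wanted.issubset(hit["words"])]


def date_quality_order(hits):
    """Sort newest first; hits with the same date are ordered by quality."""
    return sorted(hits, key=lambda hit: (hit["date"], hit["quality"]), reverse=True)


results = date_quality_order(conjunctive_hits("yacy search", HITS))
for hit in results:
    print(hit["date"], hit["quality"], hit["url"])
```

<p>
In this sketch the hit that lacks the word "search" is dropped, and the two remaining hits that share the newest date are ordered by their quality value.
</p>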
<h4>Accesskeys</h4>
<p>
You may want to use accesskeys to navigate through the YaCy web interface:
</p>
<ul>
<li>Windows and Internet Explorer: Alt + Accesskey + Enter</li>
<li>Windows and Mozilla/Firefox/Netscape: Alt + Accesskey</li>
<li>Windows and Opera: Shift + Esc + Accesskey</li>
<li>Macintosh and Internet Explorer: Ctrl + Accesskey + Enter</li>
<li>Macintosh and Safari: Ctrl + Accesskey</li>
<li>Macintosh and Mozilla/Firefox/Netscape: Ctrl + Accesskey</li>
<li>Macintosh and Opera: Shift + Esc + Accesskey</li>
<li>Linux Mandrake and Galeon/Mozilla: Alt + Accesskey</li>
<li>All OS and Amaya: Ctrl + Accesskey</li>
</ul>
<dl class="accesskeys">
<dt>s</dt>
<dd>Search Page</dd>
<dt>n</dt>
<dd>News</dd>
<dt>w</dt>
<dd>Network</dd>
<dt>t</dt>
<dd>Status</dd>
</dl>
<h4>Regular Expressions</h4>
<p>YaCy uses regular expressions for some functions, for example in the blacklist.</p>
<p>Several regular expression standards exist; YaCy uses the syntax used by Perl 5.</p>
<p>Here is a short overview of the constructs, which should suffice for most cases:</p>
<dl class="regexp">
<dt>.</dt>
<dd>arbitrary character</dd>
<dt>x</dt>
<dd>character x</dd>
<dt>[^x]</dt>
<dd>any character except x</dd>
<dt>x*</dt>
<dd>0 or more times x</dd>
<dt>x?</dt>
<dd>0 or 1 time x</dd>
<dt>x+</dt>
<dd>1 or more times x</dd>
<dt>xy</dt>
<dd>concatenation of x and y</dd>
<dt>x|y</dt>
<dd>x or y</dd>
<dt>(foo|bar)</dt>
<dd>String "foo" or string "bar"</dd>
<dt>[abc]</dt>
<dd>a or b or c (same as a|b|c)</dd>
<dt>[a-c]</dt>
<dd>a or b or c (same as above)</dd>
<dt>x{n}</dt>
<dd>exactly n appearances of x</dd>
<dt>x{n,}</dt>
<dd>at least n appearances of x</dd>
<dt>x{n,m}</dt>
<dd>at least n and at most m appearances of x</dd>
<dt>( )</dt>
<dd>Grouping; changes the priority of operators</dd>
<dt>\</dt>
<dd>Escape character, used to escape special characters (for example "[" or "*"), so that they lose their special meaning</dd>
</dl>
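<p>
The constructs in the table can be tried out with any Perl-style regex engine. The sketch below uses Python's <code>re</code> module as a stand-in, on the assumption that it treats these basic constructs the same way as Perl 5. The helper name <code>whole_match</code> and the sample strings are made up for illustration.
</p>

```python
import re


def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected result) triples for the constructs listed above.
CHECKS = [
    (r".", "q", True),            # any single character
    (r"[^x]", "x", False),        # anything except x
    (r"x*", "", True),            # zero occurrences are allowed
    (r"x?", "xx", False),         # at most one x
    (r"x+", "xxx", True),         # one or more x
    (r"(foo|bar)", "bar", True),  # either alternative
    (r"[a-c]", "b", True),        # character range
    (r"x{2}", "xxx", False),      # exactly two x
    (r"x{2,}", "xxx", True),      # at least two x
    (r"x{2,3}", "xxxx", False),   # between two and three x
    (r"\[\*", "[*", True),        # escaped special characters
]

for pattern, text, expected in CHECKS:
    assert whole_match(pattern, text) == expected, (pattern, text)
    print(repr(pattern), repr(text), expected)
```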
<p>
Regular expressions follow an operator priority (highest first): unary operators (*, +, ?, {}), concatenation, binary operators (|). This can be overridden with parentheses.
</p>
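<p>
The effect of grouping can be checked with a few small patterns. The following sketch again uses Python's <code>re</code> module as a stand-in for the Perl 5 syntax; the helper <code>whole_match</code> and the test strings are hypothetical.
</p>

```python
import re


def whole_match(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Concatenation binds tighter than "|": ab|cd means (ab)|(cd).
print(whole_match(r"ab|cd", "ab"))      # True
print(whole_match(r"ab|cd", "abd"))     # False

# Parentheses change the grouping: a(b|c)d means a, then b or c, then d.
print(whole_match(r"a(b|c)d", "abd"))   # True
print(whole_match(r"a(b|c)d", "ab"))    # False

# A quantifier applies to the element directly before it ...
print(whole_match(r"ab*", "abbb"))      # True
print(whole_match(r"ab*", "abab"))      # False

# ... unless parentheses make the whole group that element.
print(whole_match(r"(ab)*", "abab"))    # True
```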
<p><strong>Example:</strong></p>
<code>
.*heise.de/.*/[0-9]+
</code>
<p>
This matches "heise.de/" preceded by any string, for example "http://www.", and followed by any string, then a slash and a number. The dot in "heise.de" is not escaped with "\" here: an unescaped dot represents any character, and therefore also matches the "." itself.
</p>
<p>
A possible URL which would match this regexp is: http://www.heise.de/newsticker/meldung/59421
</p>
<p>
A URL which would not match is: http://www.heise.de/tp/r4/artikel/20/20701/1.html
</p>
<p>
It ends with ".html", which the regular expression does not cover.
</p>
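<p>
The example can be reproduced with a short script. This is a minimal sketch using Python's <code>re</code> module as a stand-in for the Perl 5 syntax, and it assumes that the expression must cover the whole URL, as the non-matching example implies. The helper <code>url_matches</code>, the escaped variant <code>STRICT</code> and the "heise-de" test URL are additions for illustration only.
</p>

```python
import re

# The example pattern from this page, with the dot left unescaped as in the text.
PATTERN = re.compile(r".*heise.de/.*/[0-9]+")

# A stricter variant (an addition for comparison) that escapes the dot.
STRICT = re.compile(r".*heise\.de/.*/[0-9]+")


def url_matches(regex, url):
    """Return True if the regex matches the whole URL."""
    return regex.fullmatch(url) is not None


# Matches: ends with a slash followed by digits.
print(url_matches(PATTERN, "http://www.heise.de/newsticker/meldung/59421"))

# Does not match: the trailing ".html" is not covered by the pattern.
print(url_matches(PATTERN, "http://www.heise.de/tp/r4/artikel/20/20701/1.html"))

# The unescaped dot accepts any character, so "heise-de" matches as well ...
print(url_matches(PATTERN, "http://www.heise-de/newsticker/meldung/59421"))

# ... while the escaped variant only accepts a literal dot.
print(url_matches(STRICT, "http://www.heise-de/newsticker/meldung/59421"))
```

<p>
If only the literal host name should match, escaping the dot as in the stricter variant avoids such accidental matches.
</p>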
#%env/templates/footer.template%#
</body>
</html>