yacy_search_server/htroot/Crawler_p.html
Michael Peter Christen a13e5153ac - added the possibility to have not one but a list of crawl start urls
- the list of urls is entered in the expert crawl start in a textfield;
the one-line input field was replaced with a text box
- start urls can also be given in one single line where the urls are
separated by a '|'-character
- as a consequence, the crawl profile can no longer carry a single start url for
identification, because there may be several. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: the url field
was removed from all these servlets
- to work consistently with several start urls and the other crawl
starts which computed crawl start url lists from sitelists or sitemaps,
the crawl start servlet was restructured completely
- new rules for must-match patterns were created so that site crawl
starts also work with several start urls at once
2012-09-14 12:25:46 +02:00
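The commit message describes two input forms for the start URL list: one URL per line in the text box, or several URLs on one line separated by '|'. It also mentions must-match rules that let a site crawl cover several start URLs. The Java sketch below (YaCy itself is a Java application) shows one way both steps could work. It is not YaCy's code: the class `CrawlStartUrls`, its methods, and the shape of the generated pattern are hypothetical, and the real servlet's parsing and must-match rules may differ.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not part of YaCy.
class CrawlStartUrls {

    // Splits the text box contents into start URLs. Line breaks (one URL per
    // line) and the '|' character are both treated as separators; blank
    // entries and surrounding whitespace are dropped.
    static List<String> parseStartUrls(String input) {
        List<String> urls = new ArrayList<>();
        for (String part : input.split("[\\r\\n|]+")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    // Builds one must-match regex that accepts any http(s) URL on one of the
    // start URLs' hosts, optionally with a port. Hosts are de-duplicated and
    // quoted so that '.' is matched literally. (Assumed rule, not YaCy's.)
    static String siteMustMatch(List<String> startUrls) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String url : startUrls) {
            String host = URI.create(url).getHost();
            if (host != null) {
                hosts.add(Pattern.quote(host.toLowerCase(Locale.ROOT)));
            }
        }
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no start url with a host");
        }
        return "https?://(" + String.join("|", hosts) + ")(:[0-9]+)?(/.*)?";
    }
}
```

With this sketch, the input `http://a.org/ | http://b.org/x` followed by a newline and `http://c.org/` yields three start URLs, and the resulting pattern accepts `http://a.org/page.html` but rejects `http://a.org.evil.com/`, because the whole URL must match and only a port or a path may follow the host. Building a single pattern from all hosts keeps the profile down to one must-match filter, even though it no longer stores a single start URL.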

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>YaCy '#[clientname]#': Crawler</title>
#%env/templates/metas.template%#
<script type="text/javascript" src="/js/ajax.js"></script>
<script type="text/javascript" src="/js/xml.js"></script>
<script type="text/javascript" src="/js/html.js"></script>
<script type="text/javascript" src="/js/rss2.js"></script>
<script type="text/javascript" src="/js/query.js"></script>
<script type="text/javascript" src="/js/Crawler.js"></script>
</head>
<body id="Crawler" onload="initCrawler();">
#%env/templates/header.template%#
#%env/templates/submenuCrawlMonitor.template%#
<h2>Crawler</h2>
<noscript><p>(Please enable JavaScript to automatically update this page!)</p></noscript>
<fieldset style="width:270px;height:140px;float:left;">
<legend>Queues</legend>
<table border="0" cellpadding="2" cellspacing="1" class="watchCrawler">
<tbody>
<tr class="TableHeader">
<th width="110">Queue</th>
<th>Size</th>
<th width="50">Pause/Resume</th>
</tr>
<tr class="TableCellLight">
<td align="left">Local Crawler</td>
<td align="right"><span id="localcrawlerqueuesize">#[localCrawlSize]#</span></td>
<td>
<a href="" id="localcrawlerstateA">
<img src="" alt="" style="width:12px; height:12px;" id="localcrawlerstateIMG" />
</a>
</td>
</tr>
<tr class="TableCellLight">
<td align="left">Limit Crawler</td>
<td align="right"><span id="limitcrawlerqueuesize">#[limitCrawlSize]#</span></td>
<td>
<a href="" title="" id="limitcrawlerstateA">
<img src="" alt="" style="width:12px; height:12px;" id="limitcrawlerstateIMG" />
</a>
</td>
</tr>
<tr class="TableCellLight">
<td align="left">Remote Crawler</td>
<td align="right"><span id="remotecrawlerqueuesize">#[remoteCrawlSize]#</span></td>
<td>
<a href="" title="" id="remotecrawlerstateA">
<img src="" alt="" style="width:12px; height:12px;" id="remotecrawlerstateIMG" />
</a>
</td>
</tr>
<tr class="TableCellLight">
<td align="left">No-Load Crawler</td>
<td align="right"><span id="noloadcrawlerqueuesize">#[noloadCrawlSize]#</span></td>
<td>
<a href="" title="" id="noloadcrawlerstateA">
<img src="" alt="" style="width:12px; height:12px;" id="noloadcrawlerstateIMG" />
</a>
</td>
</tr>
<tr class="TableCellLight">
<td align="left">Loader (<span id="loaderqueuemax">#[loaderMax]#</span>)</td>
<td align="right"><span id="loaderqueuesize">#[loaderSize]#</span></td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
</fieldset>
<fieldset style="width:140px;height:140px;float:left;">
<legend>Index Size</legend>
<table border="0" cellpadding="2" cellspacing="1" class="watchCrawler">
<tbody>
<tr class="TableHeader">
<th>Database</th>
<th>Entries</th>
</tr>
<tr class="TableCellLight">
<td align="left">Pages (URLs)</td>
<td align="right"><span id="urldbsize">#[urlpublictextSize]#</span></td>
</tr>
<tr class="TableCellLight">
<td align="left">RWIs (Words)</td>
<td align="right"><span id="rwidbsize">#[rwipublictextSize]#</span></td>
</tr>
</tbody>
</table>
</fieldset>
<fieldset style="width:520px;height:140px;float:left;">
<legend>Progress</legend>
<form action="Crawler_p.html" method="post" enctype="multipart/form-data" accept-charset="UTF-8">
<table border="0" cellpadding="2" cellspacing="1" class="watchCrawler">
<tbody>
<tr class="TableHeader">
<th width="90">Indicator</th>
<th colspan="2">Level</th>
</tr>
<tr class="TableCellLight">
<td align="left">Speed</td>
<td align="left" colspan="2">
<input #(crawlingSpeedMinChecked)#::class="TableCellDark"#(/crawlingSpeedMinChecked)# type="submit" name="crawlingPerformance" value="minimum" />
<input #(crawlingSpeedCustChecked)#::class="TableCellDark"#(/crawlingSpeedCustChecked)# name="customPPM" type="text" size="5" maxlength="5" value="#[customPPMdefault]#" />PPM <input type="submit" name="crawlingPerformance" value="custom" />
<input #(crawlingSpeedMaxChecked)#::class="TableCellDark"#(/crawlingSpeedMaxChecked)# type="submit" name="crawlingPerformance" value="maximum" />
</td>
</tr>
<tr class="TableCellLight">
<td align="left">PPM (Pages Per Minute)</td>
<td align="left" width="20"><span id="ppmNum">&nbsp;&nbsp;&nbsp;</span></td>
<td align="left" width="400"><span id="ppmSpan">&nbsp;&nbsp;&nbsp;</span></td>
</tr>
<tr class="TableCellLight">
<td align="left">Traffic (Crawler)</td>
<td align="left"><span id="trafficCrawler">&nbsp;&nbsp;&nbsp;</span> MB</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
</form>
</fieldset>
<p class="watchCrawler" style="clear:both;">
#(info)#
<!-- 0 -->
::
<!-- 1 -->
Error with profile management. Please stop YaCy, delete the file DATA/PLASMADB/crawlProfiles0.db
and restart. ::
<!-- 2 -->
Error: #[errmsg]# ::
<!-- 3 -->
Application not yet initialized. Please wait a few seconds and repeat
the request. ::
<!-- 4 -->
<strong>ERROR: Crawl filter "#[newcrawlingfilter]#" does not match the
crawl root "#[crawlingStart]#".</strong> Please try again with a different
filter. ::
<!-- 5 -->
Crawling of "#[crawlingURL]#" failed. Reason: #[reasonString]#<br />
::
<!-- 6 -->
Error with URL input "#[crawlingStart]#": #[error]# ::
<!-- 7 -->
Error with file input "#[crawlingStart]#": #[error]# ::
<!-- 8 -->
Crawling of "#[crawlingURL]#" started. <strong>Please wait a few seconds;
the first results may take a moment to appear here.</strong>
If you crawl any unwanted pages, you can delete them <a href="IndexCreateWWWLocalQueue_p.html">here</a>.<br />
#(/info)#
</p>
<!-- crawl queues -->
<p>See the <a href="/api/latency_p.xml">access timing</a>.</p>
<!-- crawl profile list -->
#(crawlProfilesShow)#::
<fieldset>
<legend>Running Crawls</legend>
<table border="0" cellpadding="2" cellspacing="1" summary="A list of crawl profiles and their current settings.">
<colgroup>
<col width="16" />
<col width="140"/>
</colgroup>
<tr class="TableHeader">
<td><strong>Name</strong></td>
<td><strong>Status</strong></td>
</tr>
#{list}#
<tr class="TableCell#(dark)#Light::Dark#(/dark)#">
<td>#[name]#</td>
<td>#(terminateButton)#::
<div style="text-decoration:blink;float:left;">Running</div>
<form style="float:left;" action="Crawler_p.html" method="get" enctype="multipart/form-data" accept-charset="UTF-8"><div>
<input type="hidden" name="handle" value="#[handle]#" />
<input type="submit" name="terminate" value="Terminate" />
</div></form>
#(/terminateButton)#
</td>
</tr>
#{/list}#
</table>
<h3>Crawled Pages</h3>
<p id="crawllist"></p>
</fieldset>
#(/crawlProfilesShow)#
#%env/templates/footer.template%#
</body>
</html>