yacy_search_server/source/net/yacy/crawler
Michael Peter Christen 535f1ebe3b added a new way of content browsing in search results:
- date navigation

The date is taken from the CONTENT of the documents / web pages, NOT
from a date submitted as metadata (e.g. in the HTTP header or the
HTML head). This makes it possible to search for documents dated in the
future, e.g. documents that contain descriptions of future
events.

The date is written to an index field which is now enabled by default.
All documents are scanned for contained date mentions.
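
As an illustration of scanning content for date mentions, here is a minimal
sketch. It is not the detection code from this commit: the class and method
names are hypothetical, and it only assumes dates written as yyyy-MM-dd,
whereas a real scanner would need to recognize many more formats.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of YaCy: finds yyyy-MM-dd dates in plain text.
final class ContentDateSketch {

    // Assumption: only ISO-style dates are recognized in this sketch.
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // Returns the distinct valid dates mentioned in the text, in order of appearance.
    static Set<LocalDate> datesIn(String text) {
        Set<LocalDate> dates = new LinkedHashSet<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            try {
                dates.add(LocalDate.parse(m.group()));
            } catch (DateTimeParseException e) {
                // Matches the pattern but is not a real calendar day
                // (e.g. 2015-13-40): skip it.
            }
        }
        return dates;
    }
}
```

Because the dates come from the text itself, a page announcing an event next
year yields a date in the future, independent of when the page was crawled.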
To visualize the dates for a specific search result, a histogram
showing the number of documents for each day is displayed. To render
these histograms the morris.js library is used. Morris.js also requires
raphael.js, which is now also integrated in YaCy.
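
The data behind such a histogram is a count of documents per day. The sketch
below shows one way to compute it; the class and method names are
hypothetical, and it assumes that a document mentioning several days is
counted once for each of them, which the commit message does not specify.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of YaCy: builds per-day document counts.
final class DateHistogramSketch {

    // Each set holds the distinct dates found in one document.
    // The TreeMap keeps the days in chronological order for rendering.
    static Map<LocalDate, Integer> countPerDay(List<Set<LocalDate>> datesPerDocument) {
        Map<LocalDate, Integer> histogram = new TreeMap<>();
        for (Set<LocalDate> documentDates : datesPerDocument) {
            for (LocalDate day : documentDates) {
                histogram.merge(day, 1, Integer::sum);
            }
        }
        return histogram;
    }
}
```

The resulting map can be serialized day by day into whatever data format the
chart library expects.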

The histogram is now also displayed in the index browser by default.

To select a specific range from a search result, the following modifiers
have been introduced:
from:<date>
to:<date>
These modifiers can be used separately (i.e. only 'from' or only 'to')
to describe an open interval or combined to have a closed interval. Both
dates are inclusive. To select a single specific date only, use the
'on:' modifier.
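
The interval semantics of the from: and to: modifiers can be sketched as
follows. This is an illustration only, not YaCy's query parser: the class
name is hypothetical and the yyyy-MM-dd date syntax is an assumption, since
the accepted date format is not stated here.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of YaCy: inclusive date range from query modifiers.
final class DateRangeSketch {

    final LocalDate from; // null means the interval is open at the start
    final LocalDate to;   // null means the interval is open at the end

    DateRangeSketch(LocalDate from, LocalDate to) {
        this.from = from;
        this.to = to;
    }

    // Reads from:<date> and to:<date> tokens; all other tokens are ignored.
    // Assumption: dates are written as yyyy-MM-dd.
    static DateRangeSketch parse(String query) {
        LocalDate from = null;
        LocalDate to = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.startsWith("from:")) {
                from = LocalDate.parse(token.substring("from:".length()));
            } else if (token.startsWith("to:")) {
                to = LocalDate.parse(token.substring("to:".length()));
            }
        }
        return new DateRangeSketch(from, to);
    }

    // Both bounds are inclusive; a missing bound does not restrict the range.
    boolean contains(LocalDate day) {
        return (from == null || !day.isBefore(from))
            && (to == null || !day.isAfter(to));
    }
}
```

For example, parsing "yacy from:2015-03-01 to:2015-03-07" gives a range that
contains both 2015-03-01 and 2015-03-07, while "yacy from:2015-03-01" alone
matches every day from that date onwards.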

The histogram shows blue and green lines; the green lines denote weekend
days (Saturday and Sunday).
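
The color rule is simple enough to state in a few lines. The sketch below
uses a hypothetical class name and returns the plain color names from the
description; the actual color values used for rendering are not given here.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

// Hypothetical helper, not part of YaCy: picks the line color for a day.
final class BarColorSketch {

    // Weekend days (Saturday, Sunday) are green, all other days are blue.
    static String colorFor(LocalDate day) {
        DayOfWeek d = day.getDayOfWeek();
        boolean weekend = d == DayOfWeek.SATURDAY || d == DayOfWeek.SUNDAY;
        return weekend ? "green" : "blue";
    }
}
```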

Clicking on a bar in the histogram has the following effect:
1st click: add a from:<date> modifier for the date of the bar
2nd click: add a to:<date> modifier for the date of the bar
3rd click: remove the from and to modifiers and set an on:<date> modifier for the bar
When the on:<date> modifier is used, the histogram shows an unlimited
time period. This makes it possible to click again (4th click) which is
then interpreted as a 1st click again (sets a from modifier).
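
The click cycle can be read as a small state machine over the modifiers
already present in the query. The sketch below is an illustration under
assumptions, not the code behind the histogram: the class name is
hypothetical, dates are passed as plain strings, and the handling of a query
that has a to: modifier but no from: modifier (treated like a first click,
keeping the to: modifier) is a guess, since that case is not described.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of YaCy: computes the query after a bar click.
final class HistogramClickSketch {

    static String click(String query, String date) {
        List<String> terms = new ArrayList<>();
        String from = null;
        String to = null;
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            } else if (token.startsWith("from:")) {
                from = token;
            } else if (token.startsWith("to:")) {
                to = token;
            } else if (token.startsWith("on:")) {
                // Dropped: a click after on: starts a new cycle.
            } else {
                terms.add(token);
            }
        }
        if (from == null) {
            // 1st click (or 4th click after an on: modifier): set the start date.
            terms.add("from:" + date);
            if (to != null) {
                terms.add(to); // assumption: keep an existing end date
            }
        } else if (to == null) {
            // 2nd click: keep the start date and set the end date.
            terms.add(from);
            terms.add("to:" + date);
        } else {
            // 3rd click: replace the interval with a single day.
            terms.add("on:" + date);
        }
        return String.join(" ", terms);
    }
}
```

Starting from the query "yacy", four successive clicks on the bars for
2015-03-02, 2015-03-07, 2015-03-05 and 2015-03-01 produce
"yacy from:2015-03-02", "yacy from:2015-03-02 to:2015-03-07",
"yacy on:2015-03-05" and "yacy from:2015-03-01".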

The display feature is NOT switched on by default; to switch it on use
the /ConfigSearchPage_p.html servlet.
2015-03-02 04:30:10 +01:00
data added a new way of content browsing in search results: 2015-03-02 04:30:10 +01:00
retrieval added a html field scraper which reads text from html entities of a 2015-01-30 13:20:56 +01:00
robots more ipv6 bugfixes 2014-10-08 15:21:49 +02:00
Balancer.java - added a new Crawler Balancer: HostBalancer and HostQueues: 2014-04-16 21:34:28 +02:00
CrawlStacker.java ViewFile servlet: update index if newer, 2014-12-05 01:13:37 +01:00
CrawlSwitchboard.java added a html field scraper which reads text from html entities of a 2015-01-30 13:20:56 +01:00
HarvestProcess.java fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
HostBalancer.java reduce number of calls to queue.size() because that may be a bottleneck 2014-11-23 20:09:32 +01:00
HostQueue.java added a new way of content browsing in search results: 2015-03-02 04:30:10 +01:00
LegacyBalancer.java special strategy for balancer: do not remove targets with zero wait time 2014-04-18 06:50:07 +02:00