yacy_search_server/source/net/yacy/cora
Michael Peter Christen 66b5a56976 Added and integrated new date detection class which can identify date
notions within the full text of a document. This class also attempts to
identify dates that are abbreviated, lack a year, or are described by
the name of a special day, like 'Halloween'. If a date has no year
given, the current year and following years are considered.
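
The handling of dates without a year can be illustrated with a minimal
sketch. This is not the class added by this commit: the class name
YearlessDateSketch, the SPECIAL_DAYS table, and the yearsAhead parameter
are hypothetical, and java.time is used only for brevity.

```java
import java.time.LocalDate;
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the date detection class from this commit.
final class YearlessDateSketch {

    // Assumed lookup table for named days; only one entry is given here.
    private static final Map<String, MonthDay> SPECIAL_DAYS = new HashMap<>();
    static {
        SPECIAL_DAYS.put("halloween", MonthDay.of(10, 31));
    }

    // Expands a month/day without a year into one candidate date per year,
    // starting at firstYear and covering yearsAhead following years.
    static List<LocalDate> candidates(MonthDay monthDay, int firstYear, int yearsAhead) {
        List<LocalDate> result = new ArrayList<>();
        for (int year = firstYear; year <= firstYear + yearsAhead; year++) {
            // Skip years in which the day does not exist (Feb 29 in non-leap years).
            if (monthDay.isValidYear(year)) {
                result.add(monthDay.atYear(year));
            }
        }
        return result;
    }

    // Resolves a named day such as "Halloween"; unknown names yield an empty list.
    static List<LocalDate> candidatesForName(String name, int firstYear, int yearsAhead) {
        MonthDay monthDay = SPECIAL_DAYS.get(name.toLowerCase(Locale.ROOT));
        if (monthDay == null) {
            return Collections.emptyList();
        }
        return candidates(monthDay, firstYear, yearsAhead);
    }
}
```

Under these assumptions, a single mention of 'Halloween' yields one
candidate date per considered year, which is one reason a document can
end up with many dates.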

This process can therefore assign a large set of dates to a document,
either because the document contains several dates or because a date is
ambiguous. Four new Solr fields are used to store the parsing result:

dates_in_content_sxt:
if date expressions are found in the content, the dates are listed
here in order of appearance

dates_in_content_count_i:
the number of entries in dates_in_content_sxt

date_in_content_min_dt:
if dates_in_content_sxt is filled, this contains the earliest date from
the list of available dates

date_in_content_max_dt:
if dates_in_content_sxt is filled, this contains the latest date from
the list of available dates, which may also lie in the future
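
How the four field values relate to each other can be sketched as
follows. This is an assumption-laden illustration, not the indexing code
of this commit: the class DateFieldsSketch is hypothetical, a plain Map
stands in for a Solr document, LocalDate stands in for the stored date
type, and writing a count of 0 for documents without dates is a guess.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; a Map stands in for the Solr document.
final class DateFieldsSketch {

    // Derives the four field values from the dates in order of appearance.
    static Map<String, Object> fields(List<LocalDate> datesInOrder) {
        Map<String, Object> doc = new LinkedHashMap<>();
        // Assumption: the count is always written, even when it is 0.
        doc.put("dates_in_content_count_i", datesInOrder.size());
        if (!datesInOrder.isEmpty()) {
            // The list keeps the order in which the dates appear in the text.
            doc.put("dates_in_content_sxt", new ArrayList<>(datesInOrder));
            // Earliest and latest date; the latest one may lie in the future.
            doc.put("date_in_content_min_dt", Collections.min(datesInOrder));
            doc.put("date_in_content_max_dt", Collections.max(datesInOrder));
        }
        return doc;
    }
}
```

With min and max stored per document, a range query over the two fields
is enough to select documents that mention a date inside a given period.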

These fields are deactivated by default because evaluating the regular
expressions that detect dates is still too CPU-intensive. Future
enhancements may allow them to be switched on by default.

The purpose of these fields is the creation of calendar-like search
facets, to be implemented next.
2014-12-14 13:40:45 +01:00
date     | Added and integrated new date detection class which can identify date | 2014-12-14 13:40:45 +01:00
document | added toString() methods to feed classes which makes it possible to | 2014-12-06 00:18:14 +01:00
federate | enable sku as anchor in html response writer | 2014-12-14 04:02:13 +01:00
geo      | - the webgraph shall store all links which appear on a web page and not | 2013-09-15 00:30:23 +02:00
language | added option to enrich vocabularies with synonyms from synonym database | 2014-11-19 18:12:43 +01:00
lod      | enhanced tagging preparation speed which reduces initialization time for | 2014-12-13 09:54:41 +01:00
order    | fixed and enhanced Base64 (en)coder (again) | 2014-06-20 13:54:18 +02:00
plugin   | added phonetic classes | 2011-12-14 17:33:18 +01:00
protocol | remove redundant null check in ResponseHeader.lastModified | 2014-12-09 00:58:08 +01:00
sorting  | better handling of ranking parameters and new default values for date | 2014-05-22 03:01:07 +02:00
storage  | fixed generics warnings for generic array instantiation that appeared | 2014-05-20 21:50:16 +02:00
util     | fixes on wkhtmltopdf | 2014-12-14 04:03:20 +01:00