yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00


Michael Peter Christen 25573bd5ab added a crawl filter based on <div> tag class names. When a crawl is started, a new field for excluding content from scraping is available. Content to exclude is identified by the class name of its div tags: all text inside a div tag whose class matches one of the configured class names is not indexed, while the rest of the page is indexed.		2017-12-09 22:29:35 +01:00
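The commit message describes the div class filter only in prose. What follows is a minimal Java sketch of the idea, assuming a simple regex-based tag scanner. The class `DivClassFilter` and its `extractText` method are hypothetical names chosen for illustration and are not YaCy's actual scraper code. The sketch keeps open `<div>` elements on a stack and drops any text nested inside a div whose class list contains a configured name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not YaCy's implementation): removes text nested inside
 * <div> elements whose class attribute contains one of the ignored class names.
 */
public class DivClassFilter {

    // Opening or closing tag: group 1 = "/" for a closing tag, group 2 = tag name, group 3 = attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");
    // Value of a class attribute, double- or single-quoted.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("class\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    public static String extractText(String html, Set<String> ignoredClasses) {
        StringBuilder text = new StringBuilder();
        // One entry per open <div>: true if this div or any enclosing div is ignored.
        Deque<Boolean> divStack = new ArrayDeque<>();
        int pos = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // Text between the previous tag and this one.
            appendIfVisible(text, html.substring(pos, m.start()), divStack);
            pos = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (!name.equals("div")) continue;
            if (m.group(1).isEmpty()) {
                // Opening div: ignored if its parent is ignored or its own class matches.
                boolean parentIgnored = !divStack.isEmpty() && divStack.peek();
                divStack.push(parentIgnored || hasIgnoredClass(m.group(3), ignoredClasses));
            } else if (!divStack.isEmpty()) {
                divStack.pop(); // closing div
            }
        }
        appendIfVisible(text, html.substring(pos), divStack);
        return text.toString().trim().replaceAll("\\s+", " ");
    }

    // True if any whitespace-separated class token is in the ignored set.
    private static boolean hasIgnoredClass(String attrs, Set<String> ignoredClasses) {
        Matcher c = CLASS_ATTR.matcher(attrs);
        if (!c.find()) return false;
        for (String cls : c.group(1).trim().split("\\s+")) {
            if (ignoredClasses.contains(cls)) return true;
        }
        return false;
    }

    // Appends a text chunk only when the innermost open div is not ignored.
    private static void appendIfVisible(StringBuilder out, String chunk, Deque<Boolean> divStack) {
        boolean ignored = !divStack.isEmpty() && divStack.peek();
        if (!ignored && !chunk.isBlank()) out.append(chunk).append(' ');
    }

    public static void main(String[] args) {
        String html = "<html><body><div class=\"content\">Keep this</div>"
                + "<div class=\"nav menu\">Skip <div>nested skip</div></div>"
                + "<p>Keep that</p></body></html>";
        System.out.println(extractText(html, Set.of("menu")));
    }
}
```

Running `main` prints `Keep this Keep that`. A stack is used rather than a single flag because an excluded div may contain further nested divs, and the exclusion must last until the matching closing tag. The sketch does not handle HTML comments, scripts, entities, or malformed markup.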
content	Extend DCEntry.getLanguage to convert to ISO 639-1 codes for more languages	2017-03-05 02:26:10 +01:00
parser	added a crawl filter based on <div> tag class names	2017-12-09 22:29:35 +01:00
DateDetectionTest.java	Remove old hard-coded holiday dates from DateDetection class.	2017-11-07 19:02:09 +01:00
DocumentTest.java	Adjust mergeDocuments to keep youngest last-modified date of document	2017-05-09 22:52:54 +02:00
ParserTest.java	Improved parsing support for OOXML spreadsheets (.xlsx)	2017-08-21 09:38:20 +02:00
TextParserTest.java	Restore initial locale at the end of a JUnit test case that modifies it.	2017-11-20 18:50:49 +01:00
TokenizerTest.java	optimize condenser language detection a little.	2016-10-06 19:03:52 +02:00
WordTokenizerTest.java	reactivate sentence counter in WordTokenizer for phrasepos ranking,	2016-09-07 02:16:16 +02:00