yacy_search_server/test/java/net/yacy/document
Michael Peter Christen 25573bd5ab added a crawl filter based on <div> tag class names
When a crawl is started, a new field is available to exclude content from scraping.
The content to exclude is identified by the class name of div tags.
All text contained in a div tag whose class name matches the configured class name(s)
is not indexed, while the rest of the page is indexed.
2017-12-09 22:29:35 +01:00
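The filtering described in this commit message can be sketched as follows. This is a minimal, self-contained illustration, not YaCy's implementation: the class `DivClassFilter`, the method `extractText`, and the regex-based tag scanning are hypothetical. The sketch assumes well-formed HTML and skips all text inside any `div` whose `class` attribute lists one of the configured names, including text in nested elements.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of YaCy. Assumes well-formed HTML
// and uses simple regular expressions instead of a real HTML parser.
public class DivClassFilter {
    // Matches an opening or closing tag: group 1 = "/" for closing tags,
    // group 2 = tag name, group 3 = raw attribute text.
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)([^>]*)>");
    // Extracts the value of a class="..." or class='...' attribute.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("\\bclass\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the text of html, skipping every div whose class list
     *  contains one of ignoreClasses (including all nested content). */
    public static String extractText(String html, Set<String> ignoreClasses) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> openDivs = new ArrayDeque<>(); // true = this div is excluded
        int excludedOpen = 0; // number of currently open excluded divs
        int pos = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // Keep the text before this tag only if no excluded div is open;
            // a space stands in for the tag itself.
            if (excludedOpen == 0) {
                out.append(html, pos, m.start()).append(' ');
            }
            pos = m.end();
            if (!m.group(2).equalsIgnoreCase("div")) {
                continue; // only div tags affect filtering
            }
            if (m.group(1).isEmpty()) {
                boolean excluded = hasIgnoredClass(m.group(3), ignoreClasses);
                openDivs.push(excluded);
                if (excluded) excludedOpen++;
            } else if (!openDivs.isEmpty() && openDivs.pop()) {
                excludedOpen--; // an excluded div was closed
            }
        }
        if (excludedOpen == 0) {
            out.append(html, pos, html.length()); // trailing text after the last tag
        }
        // Collapse the spaces inserted in place of tags.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // True if the tag's class attribute contains any of the ignored class names.
    private static boolean hasIgnoredClass(String attrs, Set<String> ignoreClasses) {
        Matcher c = CLASS_ATTR.matcher(attrs);
        if (!c.find()) return false;
        for (String cls : c.group(1).trim().split("\\s+")) {
            if (ignoreClasses.contains(cls)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<p>Keep this.</p>"
                + "<div class=\"ads banner\">Drop <div>nested</div> this.</div>"
                + "<div class=\"content\">Keep too.</div>";
        System.out.println(extractText(html, Set.of("ads"))); // prints "Keep this. Keep too."
    }
}
```

The stack of open `div` tags is the design choice that matters here: it lets a closing `</div>` be matched to the `div` it actually closes, so an excluded block stays excluded even when it contains nested, non-excluded `div`s.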
content Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages 2017-03-05 02:26:10 +01:00
parser added a crawl filter based on <div> tag class names 2017-12-09 22:29:35 +01:00
DateDetectionTest.java Remove old hard-coded holiday dates from DateDetection class. 2017-11-07 19:02:09 +01:00
DocumentTest.java Adjust mergeDocuments to keep youngest last-modified date of document 2017-05-09 22:52:54 +02:00
ParserTest.java Improved parsing support for OOXML spreadsheets (.xlsx) 2017-08-21 09:38:20 +02:00
TextParserTest.java Restore initial locale at the end of a JUnit test case that modifies it. 2017-11-20 18:50:49 +01:00
TokenizerTest.java optimize condenser language detection a little. 2016-10-06 19:03:52 +02:00
WordTokenizerTest.java reactivate sentence counter in WordTokenizer for phrasepos ranking, 2016-09-07 02:16:16 +02:00