mirror of
https://github.com/yacy/yacy_search_server.git
synced 2024-09-19 00:01:41 +02:00
25573bd5ab
When a crawl is started, a new field is available to exclude content from scraping. The content to exclude is identified by the class names of div tags. All text inside a div tag whose class matches one of the configured names is left out of the index, while the rest of the page is indexed. |
||
---|---|---|
.. | ||
content | ||
parser | ||
DateDetectionTest.java | ||
DocumentTest.java | ||
ParserTest.java | ||
TextParserTest.java | ||
TokenizerTest.java | ||
WordTokenizerTest.java |
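
The div-class exclusion described in the commit message can be illustrated with a minimal sketch. This is not YaCy's scraper code: the class `DivExclusionSketch`, the method `visibleText`, and the sample class names are hypothetical, and a simple regex tag scan stands in for a real HTML parser. It assumes well-formed, double- or single-quoted `class` attributes and properly nested div tags.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the YaCy code base.
class DivExclusionSketch {

    // Any opening or closing tag: group 1 = "/" for closing tags,
    // group 2 = tag name, group 3 = raw attribute text.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>");

    // A quoted class attribute inside the raw attribute text.
    private static final Pattern CLASS_ATTR =
            Pattern.compile("(?:^|\\s)class\\s*=\\s*[\"']([^\"']*)[\"']");

    /** Returns the page text with all tags removed and ignored divs skipped. */
    static String visibleText(String html, Set<String> ignoredClasses) {
        StringBuilder out = new StringBuilder();
        Deque<Boolean> divStack = new ArrayDeque<>(); // true = div is excluded
        int excludedDepth = 0; // number of currently open excluded divs
        int pos = 0;           // end of the previously seen tag
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            if (excludedDepth == 0) {
                // Keep the text before this tag; the space separates blocks.
                out.append(html, pos, m.start()).append(' ');
            }
            pos = m.end();
            if (!m.group(2).equalsIgnoreCase("div") || m.group(3).endsWith("/")) {
                continue; // only non-self-closing div tags change the state
            }
            if (m.group(1).isEmpty()) {
                boolean excluded = hasIgnoredClass(m.group(3), ignoredClasses);
                divStack.push(excluded);
                if (excluded) excludedDepth++;
            } else if (!divStack.isEmpty() && divStack.pop()) {
                excludedDepth--;
            }
        }
        if (excludedDepth == 0) {
            out.append(html, pos, html.length());
        }
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    // True if any class listed in the attribute text is in the ignore set.
    private static boolean hasIgnoredClass(String attrs, Set<String> ignored) {
        Matcher m = CLASS_ATTR.matcher(attrs);
        if (!m.find()) return false;
        for (String name : m.group(1).trim().split("\\s+")) {
            if (ignored.contains(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Intro</p>"
                + "<div class=\"nav menu\">Skip me <div>nested</div></div>"
                + "<div class=\"main\">Keep me</div></body></html>";
        Set<String> ignored = new HashSet<>(Arrays.asList("nav"));
        System.out.println(visibleText(html, ignored)); // prints "Intro Keep me"
    }
}
```

Tracking the nesting with a stack means that a plain div inside an excluded div stays excluded until the outer div closes, which matches the behaviour the commit message describes for "all text contained in such a div tag".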