yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
luccioman	5a14d34a7d	Refactoring : documented and extracted autotagging processing functions.	2018-02-02 10:27:36 +01:00
reger	b017e97421	optimize condenser language detection a little. langdetect probabilities take letter case into account, add words from description and anchors etc. as is. + add it to javadoc	2016-10-06 19:03:52 +02:00
reger	ae3717d087	adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! ) + remove unused sentenceword map (we use only the count) + upd test case for sentence count	2016-10-06 03:41:07 +02:00
reger	474f0476c6	adjust Tokenizer sentence count on trailing text after last recognized sentence + upd test case for rwi multi-word-query (leaving results known to fail untested)	2016-10-05 05:52:37 +02:00
reger	96467c5467	remove not needed counter in Tokenizer (completing last changes) including a small change, word posintext counting. We remember/store 1st posintext. Previously following words got a handle (posintext) excluding found. Now it just counts and assigns true posintext as handle (posintext)	2016-09-10 18:23:09 +02:00
reger	272cdd496a	reactivate sentence counter in WordTokenizer for phrasepos ranking, by counting punctuation (delivered as 1 char word) again.	2016-09-07 02:16:16 +02:00
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	2016-09-06 00:05:59 +02:00
Michael Peter Christen	90f75c8c3d	added enrichment of synonyms and vocabularies for imported documents during surrogate reading: those attributes from the dump are removed during the import process and replaced by new detected attributes according to the setting of the YaCy peer. This may cause that all such attributes are removed if the importing peer has no synonyms and/or no vocabularies defined.	2015-07-02 00:23:50 +02:00
Michael Peter Christen	7829480b82	refactoring: separated condenser and tokenizer	2015-07-01 18:28:18 +02:00

9 Commits