yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
reger	7a64bebb86	init Recrawl job chunk size to max crawl loader during job start, to use some system preferences and allow injection of recrawl urls before queue is empty During recrawl the balancer hangs on the very last urls often on hosts with huge delay time, by allowing injection earlier progress is more balanced. Max number of injected crawl urls by recrawl job is 2 * max loader.	2015-10-16 03:05:39 +02:00
reger	fb75fea446	use recrawljob w/o sort results by date This is a workaround for existing index (not fully reindexed) since intro of schema with docvalues to prevent solr exception causing recrawljob to fail with org.apache.solr.core.SolrCore java.lang.IllegalStateException: unexpected docvalues type NONE for field 'load_date_dt' (expected=NUMERIC). Use UninvertingReader or index with docvalues.	2015-10-04 05:43:40 +02:00
reger	98ab655917	on reindex delete index document with invalid url if discovered	2015-09-12 23:06:13 +02:00
Michael Peter Christen	dbbad23e12	removed warnings	2015-08-03 05:37:34 +02:00
reger	72f6a0b0b2	enhance recrawl job - allow to modify the query to select documents to process (after job has started) - allow to include failed urls (httpstatus <> 200)	2015-06-06 18:45:39 +02:00
reger	cd7c0e0aae	detail optimization of RecrawlThread	2015-05-17 00:13:00 +02:00
reger	ace71a8877	Initial (experimental) implementation of index update/re-crawl job added to IndexReIndexMonitor_p.html Selects existing documents from index and feeds it to the crawler. currently only the field fresh_date_dt is used determine documents for recrawl (fresh_date_dt:[* TO NOW-1DAY] Documents are added in small chunks (200) to the crawler, only if no other crawl is running.	2015-05-16 01:23:08 +02:00

7 Commits