yacy_search_server

Mirror of https://github.com/yacy/yacy_search_server.git, synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
Michael Peter Christen	0716a24737	added more / all new crawl profile fields into crawl profile editor	2012-10-31 15:13:05 +01:00
Michael Peter Christen	4a14122ba7	if a crawl profile has a collection assigned, use the collection to show a name in the web interface. This should prevent overly long names from making the interface unusable.	2012-10-31 14:08:33 +01:00
Michael Peter Christen	0fe8be7981	enhanced data structures for balancer and latency computation, which should produce a slightly better prognosis of forced waiting times.	2012-10-30 17:30:24 +01:00
Michael Peter Christen	ac9540dfb6	removed options for stopwords which are not used	2012-10-30 12:36:36 +01:00
Michael Peter Christen	b2ffd49817	less latency	2012-10-30 12:26:32 +01:00
Michael Peter Christen	0833937c1c	better balancing and due-time computation, also for no-delay intranet hosts	2012-10-30 11:28:49 +01:00
Michael Peter Christen	c326aa8f67	disabled writing new entries to crawl stacks to prevent a domain with many documents from blocking the refresh of the crawl queue	2012-10-29 22:26:52 +01:00
Michael Peter Christen	c25d7bcb80	- added concurrency for robots.txt loading - changed data model for domain counter	2012-10-29 21:08:45 +01:00
Michael Peter Christen	a87811bc38	more auto-commit calls when a search interface is opened, but not when a search is performed there, to prevent blocking at search time.	2012-10-29 11:27:13 +01:00
Michael Peter Christen	2d9e577ad0	replaced the custom robots.txt loader by the standard http loader	2012-10-28 22:48:11 +01:00
Michael Peter Christen	a33e2742cb	- removed unnecessary synchronized and deadlock in crawler - removed problem with monitoring object on Balancer.wait - added missing user agent settings	2012-10-28 19:56:02 +01:00
orbiter	8952153ecf	update to Balancer algorithm: - create a load list from the current list of known hosts - do not create this list for each Balancer.pop access - create the list from those hosts which have a zero-waiting time - select 1/3 from that list which have the most urls waiting - get hosts from the waiting list in random order - fixes for some delta-time computations - always load all urls from hosts which have never been loaded before	2012-10-28 13:24:49 +01:00
Michael Peter Christen	85ca07b90e	when a new crawl is started, an equal crawl, if still running, is terminated and the corresponding crawl profile is deleted (this also clears the crawl queue entries for that crawl profile)	2012-10-25 10:20:55 +02:00
Michael Peter Christen	ae6feb5610	showing the web structure graph as animation in the crawl monitor	2012-10-23 02:50:26 +02:00
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The goal is to make a Solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this goal, the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	2012-10-18 14:29:11 +02:00
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which were then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	2012-10-17 17:45:41 +02:00
Michael Peter Christen	43f3345c90	- removed dependencies from URIMetadataRow and made direct access to URIMetadataNode which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	2012-10-16 18:11:57 +02:00
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	2012-10-15 13:17:13 +02:00
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside the MultiProtocolURI during normalform computation, because that should always be done, and also be done during initialization of the MultiProtocolURI object. The new normalform method takes only one argument, which should be 'true' unless you know exactly what you are doing.	2012-10-10 11:46:22 +02:00
Michael Peter Christen	53789555b9	fix for crawl start filter	2012-10-10 10:40:32 +02:00
Michael Peter Christen	a06930662c	replaced some more .getBytes() with UTF8/ASCII.getBytes()	2012-10-09 12:14:28 +02:00
Michael Peter Christen	4b5e0c1500	added a URL rewriter which can be used to remove session ids from URLs	2012-10-09 11:24:48 +02:00
Michael Peter Christen	76d218fbef	fixes to crawl profiles	2012-10-08 10:50:40 +02:00
Michael Peter Christen	1533bfd63b	refactoring	2012-09-25 21:20:03 +02:00
Michael Peter Christen	872f83ebe0	refactoring	2012-09-25 21:04:58 +02:00
Michael Peter Christen	8219a445f3	refactoring	2012-09-21 16:46:57 +02:00
Michael Peter Christen	f879a344e7	fix for no depth limit default value	2012-09-21 16:05:17 +02:00
Michael Peter Christen	00c1c777fa	refactoring	2012-09-21 15:48:16 +02:00

28 Commits
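Commit 8952153ecf describes a host-selection procedure for the Balancer: build a load list once from hosts with zero waiting time, keep the third with the most queued URLs, and hand them out in random order. The sketch below illustrates that procedure under stated assumptions. It is not YaCy's Java code: `Host`, `build_load_list`, `LoadListBalancer` and `pop_host` are hypothetical names, and the "always load hosts never loaded before" rule and the delta-time fixes are not modeled.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical per-host queue state; not YaCy's actual data model."""
    name: str
    queued_urls: int  # number of URLs waiting for this host
    wait_ms: int      # remaining forced waiting time; 0 means "may be loaded now"


def build_load_list(hosts, rng):
    """Build a load list from the hosts that currently have zero waiting time.

    Keeps the third of those hosts with the most queued URLs (at least one
    host when any is ready) and returns their names in random order.
    """
    ready = [h for h in hosts if h.wait_ms == 0 and h.queued_urls > 0]
    ready.sort(key=lambda h: h.queued_urls, reverse=True)
    keep = max(1, len(ready) // 3) if ready else 0
    names = [h.name for h in ready[:keep]]
    rng.shuffle(names)
    return names


class LoadListBalancer:
    """Reuses one load list across pops and rebuilds it only once it is exhausted."""

    def __init__(self, hosts, seed=None):
        self.hosts = hosts
        self.rng = random.Random(seed)
        self.load_list = deque()

    def pop_host(self):
        """Return the next host to load from, or None if no host is ready."""
        if not self.load_list:
            self.load_list.extend(build_load_list(self.hosts, self.rng))
        return self.load_list.popleft() if self.load_list else None


if __name__ == "__main__":
    hosts = [
        Host("a.example", 90, 0), Host("b.example", 50, 0),
        Host("c.example", 10, 0), Host("d.example", 5, 0),
        Host("e.example", 200, 1500),  # largest queue, but still has to wait
        Host("f.example", 70, 0), Host("g.example", 30, 0),
    ]
    balancer = LoadListBalancer(hosts, seed=42)
    print([balancer.pop_host() for _ in range(4)])
```

With six ready hosts, the top third is `a.example` and `f.example`; `e.example` is skipped despite having the largest queue because it still has to wait. Caching the list means the sort runs once per refill rather than on every pop. The sketch leaves host state unchanged after a pop, whereas a real crawler would update each host's waiting time after loading from it.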
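Commit 4b5e0c1500 adds a URL rewriter for removing session ids from URLs. A minimal sketch of that idea follows, assuming session ids appear either as `;name=value` path parameters (servlet style) or as query parameters. The function `strip_session_ids` and the parameter list `SESSION_PARAMS` are hypothetical illustrations, not YaCy's rewriter or its rule set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of session-id parameter names; a real rewriter would make this configurable.
SESSION_PARAMS = {"jsessionid", "phpsessid", "sid", "sessionid"}


def strip_session_ids(url):
    """Remove session-id path parameters (';jsessionid=...') and query parameters."""
    parts = urlsplit(url)

    # urlsplit keeps ';'-parameters inside the path, so clean each segment separately.
    segments = []
    for segment in parts.path.split("/"):
        name, *params = segment.split(";")
        kept = [p for p in params if p.split("=", 1)[0].lower() not in SESSION_PARAMS]
        segments.append(";".join([name, *kept]))
    path = "/".join(segments)

    # Drop session-id query parameters; keep all others in their original order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), parts.fragment))


# prints http://example.com/shop/item?id=7
print(strip_session_ids("http://example.com/shop/item;jsessionid=ABC123?id=7&PHPSESSID=xyz"))
```

Because the query is decoded and re-encoded, the output is also lightly normalized (for example, an encoded space in a value comes back as `+`), which fits the goal of producing one canonical URL per document.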