Commit Graph

498 Commits

Author SHA1 Message Date
Michael Peter Christen
461a0ce052 removed warnings 2012-06-05 20:03:43 +02:00
Michael Peter Christen
9b4c699526 enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added a double-buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
2012-05-31 22:39:53 +02:00
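A minimal sketch of how such a /radius/<lat>/<lon>/<radius> modifier could be split out of a query string; the class and pattern below are illustrative assumptions, not the actual YaCy servlet code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// hypothetical helper: extracts the /radius/<lat>/<lon>/<radius> modifier from a query
public class RadiusModifier {
    private static final Pattern RADIUS =
        Pattern.compile("/radius/(-?\\d+(?:\\.\\d+)?)/(-?\\d+(?:\\.\\d+)?)/(\\d+(?:\\.\\d+)?)");

    public final double lat, lon, radius;

    private RadiusModifier(final double lat, final double lon, final double radius) {
        this.lat = lat; this.lon = lon; this.radius = radius;
    }

    // returns null if the query does not contain the modifier
    public static RadiusModifier parse(final String query) {
        final Matcher m = RADIUS.matcher(query);
        if (!m.find()) return null;
        return new RadiusModifier(Double.parseDouble(m.group(1)),
                                  Double.parseDouble(m.group(2)),
                                  Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        RadiusModifier r = RadiusModifier.parse("hotel /radius/52.52/13.40/10");
        System.out.println(r.lat + " " + r.lon + " " + r.radius); // 52.52 13.4 10.0
    }
}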
Michael Peter Christen
16b21f7a5b Added more steering in Crawler_p.html interface 2012-05-23 18:00:37 +02:00
Michael Peter Christen
acc19e190d hack against 100% cpu during crawl delete 2012-05-23 15:45:07 +02:00
Michael Peter Christen
c15fcde1c8 add-on to latest commit 2012-05-21 17:52:30 +02:00
Michael Peter Christen
cf47d94888 performance hack to parse numbers inside of substrings without actually
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should reduce CPU load during RWI
transmission.
2012-05-21 13:40:46 +02:00
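The technique, sketched as a hypothetical helper (the actual method name and location in the YaCy sources may differ): parse the digits directly from a character range instead of allocating a substring first:

// hypothetical example of the technique: parse an int from s[from..to)
// without allocating an intermediate String via substring()
public final class SubstringParse {
    public static int parseInt(final String s, final int from, final int to) {
        int result = 0;
        boolean negative = false;
        int i = from;
        if (s.charAt(i) == '-') { negative = true; i++; }
        for (; i < to; i++) {
            final char c = s.charAt(i);
            if (c < '0' || c > '9') throw new NumberFormatException(s.substring(from, to));
            result = result * 10 + (c - '0');
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        System.out.println(parseInt("id=-4711;", 3, 8)); // -4711
    }
}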
Michael Peter Christen
7e0ddbd275 added a "fromCache" flag in Response object to omit one cache.has()
check during snippet generation. This should cause less blocking
2012-05-21 03:03:47 +02:00
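Roughly the idea, with assumed field and constructor names rather than the real Response class:

// assumed sketch: the loader records whether the document came from the cache,
// so the snippet generator can skip an extra cache.has(url) lookup
class Response {
    final byte[] content;
    final boolean fromCache;

    Response(final byte[] content, final boolean fromCache) {
        this.content = content;
        this.fromCache = fromCache;
    }
}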
Michael Peter Christen
f294f2e295 bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' towards the DNS server

also included: less processes in snippet fetch to reduce load during
search on small computers
2012-05-19 01:06:33 +02:00
Michael Peter Christen
e7e381d110 added configuration to switch off redirection following in crawler 2012-05-15 12:25:46 +02:00
Michael Peter Christen
70505107ca enhanced crawler/balancer: better estimation of the remaining waiting time 2012-05-15 12:24:54 +02:00
Michael Peter Christen
f150bc218b fixed bug in solr error document 2012-05-14 14:56:21 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb Now using synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way

Conflicts:
	source/de/anomic/http/server/ChunkedInputStream.java
	source/de/anomic/http/server/ChunkedOutputStream.java
	source/de/anomic/http/server/ContentLengthInputStream.java
	source/net/yacy/cora/protocol/Domains.java
	source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
	source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
	source/net/yacy/document/content/dao/PhpBB3Dao.java
	source/net/yacy/document/parser/html/AbstractTransformer.java
	source/net/yacy/kelondro/blob/BEncodedHeap.java
	source/net/yacy/kelondro/blob/HeapReader.java
	source/net/yacy/kelondro/index/RAMIndexCluster.java
	source/net/yacy/kelondro/io/ByteCountInputStream.java
	source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
	source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
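The pattern applied here, sketched on a generic stream rather than one of the listed classes: a synchronized, idempotent close() so that concurrent callers cannot interleave with the shutdown sequence:

import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// generic illustration of the pattern: synchronized, idempotent close()
class GuardedStream implements Closeable {
    private InputStream in;

    GuardedStream(final InputStream in) { this.in = in; }

    @Override
    public synchronized void close() throws IOException {
        if (this.in == null) return; // already closed
        try {
            this.in.close();
        } finally {
            this.in = null;
        }
    }
}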
Michael Peter Christen
ba6aaabc51 refactoring + parser bugfixes 2012-05-04 17:28:27 +02:00
Michael Peter Christen
659178942f - Redesigned crawler and parser to accept embedded links from the NOLOAD
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
2012-04-24 16:07:03 +02:00
Michael Peter Christen
f5efdb21fd refactoring 2012-04-24 12:54:41 +02:00
Michael Peter Christen
f8cd57c92f new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
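An illustrative sketch of such a document-type filter; the enum and extension table below are assumptions, the real logic lives in YaCy's Classification code in the cora package:

import java.util.Locale;

// hypothetical illustration of filtering search results by document type
enum DocType {
    TEXT, IMAGE, VIDEO, APPS;

    static DocType fromExtension(final String url) {
        final String ext = url.substring(url.lastIndexOf('.') + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "png": case "jpg": case "gif": return IMAGE;
            case "avi": case "mp4": return VIDEO;
            case "apk": case "exe": return APPS;
            default: return TEXT;
        }
    }
}

class ResultFilter {
    static boolean accept(final String url, final DocType wanted) {
        return DocType.fromExtension(url) == wanted;
    }

    public static void main(String[] args) {
        System.out.println(accept("http://example.com/a.jpg", DocType.IMAGE)); // true
        System.out.println(accept("http://example.com/a.jpg", DocType.TEXT));  // false
    }
}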
Michael Peter Christen
a1a5b015d8 refactoring: moved document Classification to cora package 2012-04-21 21:31:13 +02:00
Michael Peter Christen
a5d7da68a0 refactoring: removed dependency from switchboard in Balancer/CrawlQueues 2012-04-21 13:47:48 +02:00
Michael Peter Christen
33d1062c79 refactoring: the cache belongs to the crawler 2012-04-21 13:34:07 +02:00
Michael Christen
22f05c83ff fixed default must-match filter for full domain crawls - the old filter
was too restrictive and did not allow intranet crawls
2012-03-28 21:50:00 +02:00
Michael Peter Christen
0cc0290978 bugfix for a must-not-match pattern check. This bug did not make the
check semantically wrong, but a trick that should have prevented an IP lookup
when the filter was not used did not take effect. The bugfix gives
crawling a huge speed boost for noload URLs!
2012-02-27 00:52:44 +01:00
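The trick in question, sketched with assumed names (the placeholder constant and method below are not the actual crawler API): resolve the host's IP only if a real must-not-match filter for IPs is configured:

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Pattern;

// hypothetical sketch of the short-circuit: if no must-not-match filter is set,
// the (expensive) DNS lookup of the host IP is skipped entirely
class IpFilterCheck {
    static final String MATCH_NEVER = ""; // assumed placeholder for "no filter configured"

    static boolean deniedByIpFilter(final String host, final String mustNotMatchIPs) throws UnknownHostException {
        if (mustNotMatchIPs == null || mustNotMatchIPs.equals(MATCH_NEVER)) {
            return false; // no filter: do not even resolve the IP
        }
        final String ip = InetAddress.getByName(host).getHostAddress(); // DNS lookup only when needed
        return Pattern.matches(mustNotMatchIPs, ip);
    }
}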
Michael Peter Christen
2fc8ecee36 ConcurrentLinkedQueue has a VERY long return time on the .size() method.
See
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentLinkedQueue.html

and the following test program:

import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueLengthTimeTest {

    // add c elements, calling size() before every add, and return the elapsed time
    public static long countTest(Queue<Integer> q, int c) {
        long t = System.currentTimeMillis();
        for (int i = 0; i < c; i++) {
            q.add(q.size());
        }
        return System.currentTimeMillis() - t;
    }

    public static void main(String[] args) {
        int c = 1;
        for (int i = 0; i < 100; i++) {
            Runtime.getRuntime().gc();
            long t1 = countTest(new ArrayBlockingQueue<Integer>(c), c);
            Runtime.getRuntime().gc();
            long t2 = countTest(new LinkedBlockingQueue<Integer>(), c);
            Runtime.getRuntime().gc();
            long t3 = countTest(new ConcurrentLinkedQueue<Integer>(), c);

            System.out.println("count = " + c + ": ArrayBlockingQueue = " + t1
                    + ", LinkedBlockingQueue = " + t2 + ", ConcurrentLinkedQueue = " + t3);
            c = c * 2;
        }
    }
}
2012-02-27 00:42:32 +01:00
Michael Peter Christen
c6c61be3f0 fix for http://bugs.yacy.net/view.php?id=148 2012-02-24 00:38:57 +01:00
Michael Peter Christen
0d148c3353 more logging in resource observer 2012-02-23 01:20:42 +01:00
Michael Peter Christen
2fa037ae1d enhanced crawler 2012-02-23 01:20:24 +01:00
Lotus
ee89cf5ae5 fix must match filter for full domain crawl
allow:
http://www.example.com
http://www.example.com/
http://www.example.com/abc.html?xyz=q
block:
http://www.example.com.cn
http://www.example.com.cn/dsf
2012-02-07 16:13:13 +01:00
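A hedged example of a must-match pattern with the behaviour described above; the exact regex generated by the crawler may differ, but anchoring the host on a path separator or end of string gives the listed allow/block results:

import java.util.regex.Pattern;

// illustrative test of a full-domain must-match filter
public class DomainFilterTest {
    public static void main(String[] args) {
        final Pattern mustMatch = Pattern.compile("https?://www\\.example\\.com(/.*)?");
        final String[] urls = {
            "http://www.example.com",                // allowed
            "http://www.example.com/",               // allowed
            "http://www.example.com/abc.html?xyz=q", // allowed
            "http://www.example.com.cn",             // blocked
            "http://www.example.com.cn/dsf"          // blocked
        };
        for (final String u : urls) {
            System.out.println(u + " -> " + mustMatch.matcher(u).matches());
        }
    }
}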
Michael Peter Christen
9ad1d8dde2 complete redesign of crawl queue monitoring: do not look at a
ready-prepared crawl list but at the stacks of the domains that are
stored for balanced crawling. This also affects the balancer, since it
no longer needs to prepare the pre-selected crawl list for monitoring. As
an effect:
- it is no longer possible to see the exact order of the next to-be-crawled
links, since that depends on the actual state of the balancer stack the
next time another url is requested for loading
- the balancer works better since the next url can be selected according
to the current situation and not according to a pre-selected order.
2012-02-02 21:33:42 +01:00
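A rough sketch of the idea with a hypothetical data structure (not the actual Balancer code): keep one URL stack per host and pick the next URL only at request time, from a host whose politeness delay has expired:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// hypothetical illustration: per-host URL stacks; the next URL is chosen
// lazily at fetch time instead of from a pre-computed crawl list
class HostStacks {
    private static final long DELAY_MS = 500; // assumed politeness delay
    private final Map<String, Deque<String>> stacks = new HashMap<String, Deque<String>>();
    private final Map<String, Long> nextAllowed = new HashMap<String, Long>();

    public synchronized void push(final String host, final String url) {
        Deque<String> stack = this.stacks.get(host);
        if (stack == null) {
            stack = new ArrayDeque<String>();
            this.stacks.put(host, stack);
        }
        stack.push(url);
    }

    // returns null if no host may be crawled at the moment
    public synchronized String pop() {
        final long now = System.currentTimeMillis();
        for (final Map.Entry<String, Deque<String>> e : this.stacks.entrySet()) {
            if (e.getValue().isEmpty()) continue;
            final Long allowed = this.nextAllowed.get(e.getKey());
            if (allowed != null && allowed.longValue() > now) continue;
            this.nextAllowed.put(e.getKey(), now + DELAY_MS);
            return e.getValue().pop();
        }
        return null;
    }
}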
Michael Peter Christen
1f4f60654a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/document/parser/pdfParser.java
2012-01-24 20:42:30 +01:00
Michael Peter Christen
2ee8cbeb2c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/search/Switchboard.java
2012-01-05 18:37:46 +01:00
Michael Peter Christen
992dbdf4bb added noload statistic to servlets 2012-01-05 18:33:05 +01:00
Michael Christen
c21966bb43 fix 2012-01-04 23:02:12 +01:00
Michael Christen
13b05f9c08 fix 2012-01-04 23:01:04 +01:00
Michael Christen
e5d878c59e Merge branch 'master' of ssh://gitorious.org/yacy/rc1
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 22:08:17 +01:00
Michael Christen
ec26b2bea4 Merge commit 'fa08ed5ae5d72bddc3cc6a662b23103579e86109' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:32:42 +01:00
Michael Christen
216a287a85 Merge commit '6d4e08ed06c5cd28c45981b2ebe31c7f7ec6fd83' into quix0r
Conflicts:
	source/de/anomic/crawler/CrawlQueues.java
2012-01-04 20:16:37 +01:00
stbrumm
d18095dc48 Patch for Issue 0000102
and fixes to Patch (private peer status is a property of a peer, not a
status)
2012-01-03 17:49:37 +01:00
Roland 'Quix0r' Haeder
901f37d608 Also this ... :( #2 2011-12-29 00:36:56 +01:00
Roland 'Quix0r' Haeder
a985717ed2 Also this ... :( 2011-12-29 00:35:51 +01:00
Roland 'Quix0r' Haeder
5f490de554 Fix for ported fix from my old days ... 2011-12-29 00:34:46 +01:00
Roland 'Quix0r' Haeder
fa08ed5ae5 Fixed a lot of CHMOD rights (no need for the execute flag on *.java/*.html) and introduced a local/remote crawl size ratio based check 2011-12-29 00:33:16 +01:00
Michael Christen
9e5894c784 Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
c04bfaa51b refactoring 2011-12-16 23:59:29 +01:00
Michael Christen
6e66c9d7f1 fix for http://bugs.yacy.net/view.php?id=87 2011-12-05 23:46:42 +01:00
Michael Christen
e7e429705a - less automatic indexing after a search (needs to reset the default
crawl profiles)
- fix for concurrency problem in storage of serverSwitch Properties
- markup update
2011-12-05 16:22:11 +01:00
orbiter
11729061f2 added an option in the bookmark import process to put everything into the crawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8134 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-12-03 00:27:01 +00:00
orbiter
8895d8c1cd removed unnecessary log entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-27 16:54:48 +00:00
orbiter
5a55397f99 some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
e4a82ddd8b produce a bookmark entry from every crawl start. these bookmarks are always private.
these bookmarks will be used to get a source reference for the search in case of intranet or portal searches.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-21 23:10:29 +00:00
orbiter
aa322bc6d0 fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 15:36:30 +00:00
orbiter
97d1347adb also added a default Accept field to robots.txt downloads
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 15:33:55 +00:00