yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
Michael Peter Christen	85b1922244	activated image type navigation for image search	2013-09-03 13:34:01 +02:00
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-09-03 12:22:57 +02:00
Michael Peter Christen	ab1201fdfd	fixed wrong facet count	2013-09-03 12:22:29 +02:00
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	2013-09-03 11:14:23 +02:00
Michael Peter Christen	69f85265e1	added an option to put image links to the crawl queue and handle these like normal documents. Using this option (by default on at this moment; this might change soon) it is possible to get the exif data into the search index to be used in image search.	2013-09-03 11:13:45 +02:00
Michael Peter Christen	e8e558a9b7	fix for content domain classification in URIMetadataNode	2013-09-03 10:49:09 +02:00
Michael Peter Christen	a8c5bfcf58	avoid creating unnecessary objects	2013-09-03 09:48:05 +02:00
Michael Peter Christen	5a0de1b77d	moving image description text to image text field	2013-09-03 09:47:27 +02:00
Michael Peter Christen	dc179bd61f	fix for catchall query goal for image search	2013-09-03 07:55:21 +02:00
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	2013-09-02 23:09:43 +02:00
Michael Peter Christen	169ef8963d	one more fix for image search	2013-09-02 20:02:26 +02:00
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	2013-09-02 18:55:38 +02:00
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	2013-09-02 04:19:53 +02:00
orbiter	f106345eef	link strings should not be tokenized	2013-09-01 14:35:36 +02:00
orbiter	deadeb406e	image alt tag strings should be tokenized	2013-09-01 13:48:10 +02:00
reger	d0e78082d1	return field names in index instead of in schema for SolrServerConnector.getFields	2013-08-31 06:25:12 +02:00
Michael Peter Christen	1a3e42eca4	index migration to lucene 4.4	2013-08-26 12:49:39 +02:00
Michael Peter Christen	a88a62f7aa	added a feature to set a collection for a crawl result based on a regular expression on the url: the collection attribute for a crawl start may now be either a token or a list of tokens, separated by ',', where a token is either a string or a pair <string,pattern>; the string is separated from the pattern by a ':' and is assigned to the document as collection only if the pattern matches the url.	2013-08-25 00:13:48 +02:00
Michael Peter Christen	3c5abedabf	NPE during shutdown fix	2013-08-24 23:36:50 +02:00
Michael Peter Christen	e4cbe9232d	fixed a crawler bug where a double-occurring url was not re-crawled because the double-check error was written to the error-db and never deleted. Now the error-db is cleared on every start and these double-messages are not written to the error-db any more.	2013-08-22 15:56:09 +02:00
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	2013-08-22 14:23:47 +02:00
Michael Peter Christen	0f3d8890db	removed an assert which causes a shortcut call circuit	2013-08-22 10:12:25 +02:00
Michael Peter Christen	6d5fefe060	added missing files :(	2013-08-20 16:31:34 +02:00
Michael Peter Christen	554c0351dd	fix for http://bugs.yacy.net/view.php?id=286	2013-08-20 16:10:26 +02:00
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	2013-08-20 15:46:04 +02:00
Michael Peter Christen	1c62fa7698	fix for bad snippets in gsa api	2013-08-18 10:37:25 +02:00
Michael Peter Christen	697613170d	less logging for postprocessing (this was a debugging logging with high CPU load)	2013-08-17 09:25:32 +02:00
reger	b4016ff324	- remove possible double initialization of rdfa parser - use ordered list to use preferred parser for mime/extension first (relates to html, rdfa, argument parser) - harmonize xhtml extension config for the 3 html base parsers	2013-08-14 21:12:10 +02:00
reger	f0575bd44b	FieldReIndex: omit active vocabulary fields from reindex detection	2013-08-14 00:00:30 +02:00
reger	a5019bc470	make Vocabulary Navigator tags a hard result entry filter by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query) TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.	2013-08-13 03:07:25 +02:00
reger	a67a4b7d86	improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org)	2013-08-12 21:20:23 +02:00
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	2013-08-11 04:51:29 +02:00
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	2013-08-08 14:38:26 +02:00
reger	b355dd52c6	Index Administration - Field Re-Indexing: exclude internal Solr _version_ field from obsolete field check	2013-08-08 00:55:21 +02:00
sixcooler	8a96140f92	fix / workaround for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4750 + Seed.hash should be final	2013-08-01 16:40:58 +02:00
Michael Peter Christen	2857499467	fix to collection schema; bug appeared for _txt fields with empty String as content	2013-07-31 13:32:05 +02:00
Michael Peter Christen	dbfa865700	added a stub of a class for crawler redesign	2013-07-31 13:16:32 +02:00
Michael Peter Christen	76afcccaaf	fix for default boolean post values: the default value MUST NOT be TRUE, because it's normal that a boolean value is missing in the post argument if a checkbox is not selected. Added also some style enhancements to IndexFederated, removed the Solr attachment manual and replaced it with a link to the wiki which explains this in more detail.	2013-07-31 10:49:26 +02:00
orbiter	252c525709	fixed feed api servlet and enhanced RSSReader class	2013-07-31 06:18:30 +02:00
orbiter	d38c3c14d8	fix for CGI test	2013-07-31 05:43:58 +02:00
Michael Peter Christen	31902f54df	fix for NPE which happens within solr code at MultiMapSolrParams.java, line 52 in case that the array arr.length == 0	2013-07-30 14:32:59 +02:00
Michael Peter Christen	f13df9dbb6	migration to solr 4.4.0	2013-07-30 14:01:16 +02:00
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-30 12:49:14 +02:00
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	2013-07-30 12:48:57 +02:00
sixcooler	7d53ac86a3	fix for Blacklist (-Administration)	2013-07-29 19:09:28 +02:00
reger	f2d99053ed	Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception) (occurred during testing while working on q=store:[* TO *])	2013-07-29 01:32:02 +02:00
reger	92d3f71b16	htmlParser: closes input stream -> changed it to leave it open for a reset (used by AugmentParser - even if this is practically not used), note: stream.close is done by caller (Textparser.parseSource) - removed unnecessary reset in AugmentParser - added stream.mark in tdfatripleimpl. to make stream.reset work here	2013-07-28 03:41:09 +02:00
orbiter	87cfeaa4f3	fix for npe	2013-07-27 15:20:09 +02:00
orbiter	268a36aaff	emergency fix for crawler: this will otherwise cause loss of complete crawl queue if latency of remote system is too low	2013-07-27 11:59:07 +02:00
orbiter	d05e0c5368	wait a bit longer before doing the first peer ping	2013-07-27 11:00:35 +02:00
orbiter	b8f57f7703	don't be noisy when doing background tasks that may be allowed to fail	2013-07-27 10:51:58 +02:00
Roland Haeder	0343f0668c	Fix for NPE: E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'net.yacy.search.Switchboard.cleanupJob': null; target exception: null java.lang.NullPointerException at net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116) at net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897) at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165) Conflicts: source/net/yacy/search/schema/CollectionConfiguration.java	2013-07-27 10:19:46 +02:00
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	2013-07-27 10:13:57 +02:00
Roland Haeder	7263bb82fb	Fix for NPE on shutdown: java.lang.NullPointerException at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732) at net.yacy.search.Switchboard.access00(Switchboard.java:207) at net.yacy.search.Switchboard.run(Switchboard.java:3049)	2013-07-27 09:55:43 +02:00
Roland Haeder	13433d41a1	Log this exception better Conflicts: source/net/yacy/kelondro/blob/Tables.java	2013-07-27 09:54:51 +02:00
orbiter	080d80c9de	do not write an empty failreason in case that there is no fail. Because of the lazy instantiation rule this value was not actually written, but if lazy instantiation is switched on, then this causes that all crawl starts delete all crawl-start-hosts completely because this looks for filled error reasons.	2013-07-26 17:53:28 +02:00
Michael Peter Christen	4c242f9af9	always use a default value for boolean options to have transparency for the outcome if the attribute is missing in servlets	2013-07-25 12:17:29 +02:00
Michael Peter Christen	61e015268b	fix in forced deletion: forced commit needed	2013-07-25 09:53:19 +02:00
Michael Peter Christen	83e2921b39	new test case for http://bugs.yacy.net/view.php?id=141	2013-07-25 09:31:48 +02:00
Michael Peter Christen	304aacb2cc	fix for http://bugs.yacy.net/view.php?id=267	2013-07-25 09:26:24 +02:00
Michael Peter Christen	c3b2301b2f	fix for http://bugs.yacy.net/view.php?id=268	2013-07-25 09:21:37 +02:00
reger	aa1a1f1d2c	- small adjustment to make sure genericParser is tried last -- for some documents genericParser grabs document instead of specific available parser due to unordered pick of 1st to try parser (like .ps .rdf files and other) - remove redundant file extension registration	2013-07-23 20:24:13 +02:00
orbiter	3e901dcb06	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-23 19:33:07 +02:00
orbiter	f50b596e0b	do not run dht distribution if system load is over 2.5	2013-07-23 19:32:32 +02:00
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	2013-07-23 18:03:33 +02:00
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	2013-07-23 16:46:44 +02:00
sixcooler	af740f3058	changed optimization to a segment-size of index-size/5.000.000 + one if not idle + one (and force) if postprocessing	2013-07-23 14:21:12 +02:00
Michael Peter Christen	336f86394c	replaced StringBuffer with StringBuilder	2013-07-23 12:21:27 +02:00
Michael Peter Christen	aeac2fb763	replaced more containsKey() -> get() usages by a simple get(), followed by a test for NULL. This should increase the application speed and reduces the lookup time for the affected methods by 50%	2013-07-23 12:16:51 +02:00
orbiter	5364c4dcc9	delayed first peer-ping to send the first ping out after the http got up; if the ping comes before the http is up, it cannot be recognized as senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266	2013-07-22 18:21:37 +02:00
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	2013-07-22 17:45:12 +02:00
orbiter	c124037f19	removed forced non-soft commits to prevent index fragmentation	2013-07-22 17:28:20 +02:00
Michael Peter Christen	31483c47e1	fixed problem with remote luke requests	2013-07-22 15:55:20 +02:00
Michael Peter Christen	c15aa758dc	removed failreason_t removal patch because that causes too much confusion using an external solr. to clean up the index after a schema change, use the index cleaner function from the online servlet	2013-07-22 14:17:38 +02:00
reger	2b7a38640a	extend content type detection on file extension for .tif .tiff .htm	2013-07-21 22:57:21 +02:00
Michael Peter Christen	ac1aad5064	added a getSegmentCount method and use it to disable optimize if wanted current segment count is below optimization level	2013-07-18 14:31:42 +02:00
Michael Peter Christen	36035e0a0a	- used reger's LukeRequest to generalize the index info in SolrServerConnector - used the LukeRequest in SolrServerConnector to replace the index size method by a getNumDocs request to a LukeRequest result	2013-07-18 13:26:07 +02:00
Michael Peter Christen	39fceb5ccf	fix for NPE & bug #264	2013-07-18 12:37:32 +02:00
Michael Peter Christen	735a66eff3	enhancements to crawler	2013-07-18 12:29:04 +02:00
Roland Haeder	be0ff6018f	Removed trailing spaces + some more final	2013-07-17 18:44:24 +02:00
Roland Haeder	aaedc0405d	Fixes that avoid catching bad exceptions (some): - Rewrote usage of HashMap/Map to concurrent versions (to avoid a CME=ConcurrentModificationException) - Rewrote ConnectionInfo (as an example) to use a synchronized iterator instead of synchronizing an already synced HashSet (see Collections call) - This avoids catching CMEs again - Commented out noisy ConcurrentLog.logException() call Conflicts: source/net/yacy/repository/LoaderDispatcher.java	2013-07-17 18:37:34 +02:00
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	2013-07-17 18:31:30 +02:00
Felix Ableitner	03044589dd	Fixed (?i) appearing in entries, fixed multiple equal lines in file.	2013-07-17 16:42:10 +02:00
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	2013-07-17 15:20:56 +02:00
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-17 12:42:06 +02:00
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	2013-07-17 12:41:54 +02:00
orbiter	d0dc86cf3d	logging of deadlocks (if any) during cleanup process	2013-07-17 12:38:58 +02:00
Michael Peter Christen	c6a6f159e8	fix for crawl stack domain counter	2013-07-16 18:18:55 +02:00
Michael Peter Christen	93d1bac140	do a more frequent optimization, reduces IO after optimization	2013-07-16 17:16:48 +02:00
orbiter	b71d13a014	added load and deadlock detector in Memory util	2013-07-16 10:49:20 +02:00
orbiter	290e24564b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-14 17:41:32 +02:00
orbiter	5533fc8e01	fix for bug 260	2013-07-14 17:40:28 +02:00
Michael Peter Christen	b79471ee67	grr	2013-07-14 10:15:47 +02:00
Michael Peter Christen	a79f288ac1	automatically running optimize on solr if user/search is idle for some time	2013-07-14 10:02:08 +02:00
orbiter	a9c8046c87	do a light optimization at the end of a crawl postprocessing	2013-07-13 19:09:46 +02:00
orbiter	a548354c71	replaced type of solr schema object sku of text_en_splitting_tight by string	2013-07-13 18:54:09 +02:00
orbiter	2f1ec8d4a2	npe fix	2013-07-13 11:10:05 +02:00
Michael Peter Christen	bcc623a843	refactoring of load_delay: this is a matter of client identification	2013-07-12 16:24:56 +02:00
orbiter	0d0b3a30f5	activate api actions after postprocessing of crawls	2013-07-12 16:05:48 +02:00
orbiter	3978c5ca5d	fix for http://bugs.yacy.net/view.php?id=255	2013-07-12 14:38:30 +02:00
orbiter	2be456e7fb	added a postprocessing field into api/status_p.xml to show if the postprocessing task is running at that time (status: busy) or not (status:idle)	2013-07-12 14:29:22 +02:00
orbiter	dac88561ae	minimum access time has a tight connection to ClientIdentification, therefore it is defined there.	2013-07-11 17:04:24 +02:00
Michael Peter Christen	9a29ab469e	another patch to prevent CLOSE_WAIT status on solr connections	2013-07-11 12:53:39 +02:00
Michael Peter Christen	5091d627bc	fixed parsing of peer flags	2013-07-11 12:53:16 +02:00
Michael Peter Christen	87e9052081	added Connection:close to all http requests in our http client to prevent CLOSE_WAIT states (as seen in lsof)	2013-07-11 11:54:11 +02:00
Michael Peter Christen	5c6946dd5f	replaced usage of log4j by ConcurrentLog where possible	2013-07-09 14:42:39 +02:00
Michael Peter Christen	5878c1d599	- refactoring of log to ConcurrentLog: jdk-based logger tend to block at java.util.logging.Logger.log(Logger.java:476) in concurrent environments. This makes logging a main performance issue. To overcome this problem, this is an add-on to jdk logging to put log entries on a concurrent message queue and log the messages one by one using a separate process. - FTPClient uses the concurrent logging instead of the log4j logger	2013-07-09 14:28:25 +02:00
orbiter	f4f6551c66	better handling of time-out at solrj in case that a commit is done in a fail-over case during add	2013-07-09 11:01:37 +02:00
Michael Peter Christen	07261fe274	Merge remote-tracking branch 'nutomics/blacklist_structure'	2013-07-08 23:32:15 +02:00
Michael Peter Christen	dea71851d2	- better concurrency for network scanner - network scanner can now start from the list of all hosts in the search index	2013-07-08 16:29:30 +02:00
Michael Peter Christen	a34e137e27	fix for citation index generation in case that entry.referrerhash() is null. This is especially the case if ftp sites are crawled	2013-07-08 16:26:11 +02:00
Michael Peter Christen	a2c8116a8f	accept (but ignore) a '+' sign in front of search words	2013-07-08 16:20:40 +02:00
orbiter	9f0cc9b401	enhanced network scanner - textarea input field can now be used to paste in a large list of hosts - /31er subnet is possible (only one host) - auto-detect subdomains for ftp and www subdomains	2013-07-08 13:17:09 +02:00
sixcooler	308d73f855	do not use remote proxy if not switched on - regardless of the proto	2013-07-04 19:16:13 +02:00
sixcooler	69906b1d2e	Revert "do not use remote proxy if not switched on - regardless of the proto" This reverts commit `20f452d228`.	2013-07-04 19:13:51 +02:00
sixcooler	20f452d228	do not use remote proxy if not switched on - regardless of the proto	2013-07-04 19:12:50 +02:00
sixcooler	9551720d5c	re-enable saved setting for proxy-crawl-profile	2013-07-04 19:10:57 +02:00
sixcooler	d5d8936f9d	For indexes that are changing rapidly in NRT situations, fcs (stands for Field Cache per Segment) may be a better choice than the default fc. (saves memory) see: http://wiki.apache.org/solr/SimpleFacetParameters#facet.method	2013-07-04 19:08:53 +02:00
Felix Ableitner	44f8fcf62e	Changed class structure of Blacklist.	2013-07-04 18:37:57 +02:00
Michael Peter Christen	57ffdfad4c	added a crawl option to obey html-meta-robots-noindex. This is on by default.	2013-07-03 14:50:06 +02:00
Michael Peter Christen	5a5d411ec0	new robots_i attribute fields	2013-07-02 14:29:13 +02:00
Michael Peter Christen	fa08bd9d5a	hack to prevent long waiting times in crawler	2013-07-01 13:24:52 +02:00
Michael Peter Christen	f1c5338210	preparation for greedy crawl profiles and refactoring	2013-07-01 13:10:09 +02:00
Michael Peter Christen	e6f361f474	adding the canonical tag to crawl queues	2013-07-01 13:09:41 +02:00
reger	a6bf44212e	bugfix: location (lat/lon) meta data retrieval (Double.NaN check)	2013-06-30 03:50:07 +02:00
Michael Peter Christen	203921006a	redesign of citation index storage	2013-06-30 02:11:46 +02:00
reger	83763ee4a4	jpeg parser: extract GPS location from meta data	2013-06-29 00:35:43 +02:00
Michael Peter Christen	32aa1d4569	removed unused option for queries	2013-06-28 15:32:36 +02:00
Michael Peter Christen	9d291764d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-06-28 15:03:25 +02:00
sixcooler	e5abccdfe4	added optimize-option	2013-06-28 14:51:37 +02:00
Michael Peter Christen	64140f35cd	fix for solr requests if no query part is given (prevent npe)	2013-06-28 13:16:25 +02:00
Michael Peter Christen	8caaf6203a	fixed false multiple-generation of remote facet search which caused high cpu usage on remote side.	2013-06-28 12:39:36 +02:00
Michael Peter Christen	823ae4d6a7	added url_protocol_s to error documents	2013-06-26 16:51:36 +02:00
Michael Peter Christen	660a196989	refactoring	2013-06-26 09:27:22 +02:00
Michael Peter Christen	c4538d8d91	added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib	2013-06-26 09:26:34 +02:00
reger	3760e2616b	bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments	2013-06-25 23:24:02 +02:00
Michael Peter Christen	9a6fcdf597	npe fix	2013-06-25 16:36:16 +02:00
Michael Peter Christen	16d1d744fa	added url_file_name_s in default collection schema for the file name without the file extension. This part of the file path is removed from the multi-field url_paths_sxt, which now no longer has the file name as the last part of the path list. The same applies to the new fields source_file_name_s and target_file_name_s in the webgraph schema.	2013-06-25 16:27:20 +02:00
reger	8d1c4c423d	make imageparser fileextension detection case insensitive (extensions are often upper case)	2013-06-23 00:39:15 +02:00
Michael Peter Christen	f9d859f5dc	now writing image alt texts and (camelcase-)parsed urls into a text search field for a better image retrieval	2013-06-18 16:51:56 +02:00
Michael Peter Christen	e441a9d4c8	to avoid confusion, the gsa api is available at /search? and /searchresult?	2013-06-18 16:22:06 +02:00
orbiter	8792e6c6e9	stub for better image indexing	2013-06-18 13:28:30 +02:00
orbiter	97f2ac9091	added hint to gsa response writer that the result comes from a yacy peer	2013-06-17 13:29:03 +02:00
Michael Peter Christen	14186e815e	npe fix	2013-06-13 22:42:21 +02:00
Michael Peter Christen	bdf306e0a7	increased time-out for loading of seed-lists	2013-06-13 22:32:06 +02:00
Michael Peter Christen	374d2e2a52	removed warning message during crawling	2013-06-13 13:03:56 +02:00
Michael Peter Christen	570511f3c8	removed fields references_internal_id_sxt and references_internal_url_sxt because they had been shown to be superfluous. The citation of referrer in the host browser is possible without them. Therefore now the host browser does not only show internal, but also external referrer to each link.	2013-06-13 13:01:28 +02:00
Michael Peter Christen	fd1776a3b0	added a new 'Citations' function: each search result item can now be explored for citations within other documents. A click on the 'Citations' link shows an analysis with all text lines in the document each with a complete list of documents which contain the same line. A second section shows the linking documents in ascending order of number of citations from the original document. Because documents from different hosts are most interesting here, they are listed at the top of the page as possible 'copypasta' source.	2013-06-12 15:02:49 +02:00
Michael Peter Christen	fc3ff92c69	npe fix	2013-06-12 13:23:58 +02:00
Michael Peter Christen	1762911f57	added synchronizations and timeouts in solr api; missing synchronizations in index modification methods cause deadlocks inside solr.	2013-06-12 02:13:18 +02:00
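Commit a88a62f7aa describes a small configuration format for the crawl-start collection attribute: a comma-separated list of tokens, where each token is either a plain collection name or a name:pattern pair that applies only when the pattern matches the URL. The following is a minimal, hypothetical sketch of that rule, not YaCy's actual code. The class name CollectionAssignment and the method collectionsFor are invented for illustration, and it assumes the pattern is a Java regular expression that must match the whole URL (Matcher.matches()). The token is split at the first ':' so the pattern itself may contain colons. Because the list is split naively on ',', a pattern that contains a comma (for example a {1,3} quantifier) would be broken apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not YaCy's implementation) of the collection attribute
 * format from commit a88a62f7aa: tokens separated by ',', each token either a
 * plain name or "name:pattern".
 */
public class CollectionAssignment {

    /** Returns the collection names that apply to the given URL. */
    public static List<String> collectionsFor(String attribute, String url) {
        List<String> result = new ArrayList<>();
        // naive split: a ',' inside a regex pattern is not supported here
        for (String token : attribute.split(",")) {
            token = token.trim();
            if (token.isEmpty()) continue;
            int colon = token.indexOf(':');
            if (colon < 0) {
                // plain token: the collection is always assigned
                result.add(token);
            } else {
                String name = token.substring(0, colon);
                String regex = token.substring(colon + 1);
                // assumption: the pattern has to match the whole URL
                if (Pattern.compile(regex).matcher(url).matches()) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String attr = "user,docs:.*example\\.org/docs/.*";
        System.out.println(collectionsFor(attr, "http://example.org/docs/index.html")); // [user, docs]
        System.out.println(collectionsFor(attr, "http://example.org/blog/"));           // [user]
    }
}
```

Plain tokens are always attached, while the name:pattern pair only contributes its name for URLs below /docs/.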


2145 Commits