19 Commits

Author SHA1 Message Date
Michael Peter Christen
710a0efa1b generalized time period computations 2015-03-02 12:55:31 +01:00
Michael Peter Christen
70f03f7c8e do not cache search requests to Solr if the result is used for
doublechecking. If a double-check comes from cached results the
doublecheck fails.
2014-11-20 18:45:27 +01:00
Michael Peter Christen
6d3d4c4ea6 changed the concurrent enumeration of query results in such a way that
it is now possible to get the results in two steps:
- first retrieve all IDs as given for a query
- then retrieve each document individually

This was necessary for very large result sets where a query may run for
hours and is possibly terminated by a solr-internal timeout. This occurs
regularly during postprocessing and therefore this commit may fix
unwanted postprocessing terminations.
2014-09-17 13:58:55 +02:00
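The two-step retrieval described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Index` interface, `MapIndex`, and `documents` are hypothetical names invented for this sketch and are not the actual YaCy or Solr API. The idea is that the ID list comes from one short query, and each document is then fetched with its own short request, so no single request has to stay open for the whole enumeration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoStepRetrieval {

    /** Hypothetical minimal index interface; not the actual YaCy or Solr API. */
    interface Index {
        List<String> idsForQuery(String query); // step 1: IDs only
        String getById(String id);              // step 2: one request per document
    }

    /** Toy in-memory index (id -> text); a document matches if its text contains the query. */
    static class MapIndex implements Index {
        private final Map<String, String> docs = new LinkedHashMap<>();

        void put(String id, String text) { docs.put(id, text); }

        @Override
        public List<String> idsForQuery(String query) {
            List<String> ids = new ArrayList<>();
            for (Map.Entry<String, String> e : docs.entrySet()) {
                if (e.getValue().contains(query)) ids.add(e.getKey());
            }
            return ids;
        }

        @Override
        public String getById(String id) { return docs.get(id); }
    }

    /**
     * Collects all matching IDs first, then loads each document lazily as the
     * caller iterates. A document deleted between the two steps would come
     * back as null here; a real implementation would have to handle that.
     */
    static Iterator<String> documents(final Index index, String query) {
        final Iterator<String> ids = index.idsForQuery(query).iterator();
        return new Iterator<String>() {
            @Override public boolean hasNext() { return ids.hasNext(); }
            @Override public String next() { return index.getById(ids.next()); }
        };
    }

    public static void main(String[] args) {
        MapIndex index = new MapIndex();
        index.put("a", "webgraph edge one");
        index.put("b", "unrelated");
        index.put("c", "webgraph edge two");
        Iterator<String> it = documents(index, "webgraph");
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

The trade-off in this sketch is more round trips in exchange for short individual requests, which is presumably what makes the approach robust against a server-side timeout on very large result sets.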
Michael Peter Christen
87f8118108 added option to delete documents from the webgraph 2014-07-16 16:04:19 +02:00
Michael Peter Christen
a2f800cd8f fix for bad String conversion 2014-06-04 12:07:07 +02:00
reger
f87ac716f3 improve IndexDeletion by query
transparently adding text_t as a pseudo default search field if no field name (no ':') is included.
addressing bug report http://mantis.tokeek.de/view.php?id=274
2014-05-12 00:12:05 +02:00
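A minimal sketch of the default-field behaviour described in the commit above, assuming the check is simply "does the query contain a colon". The class and method names are hypothetical and the actual YaCy implementation may differ; only the field name text_t is taken from the commit message.

```java
public class DefaultFieldQuery {

    /** Field name taken from the commit message. */
    static final String DEFAULT_FIELD = "text_t";

    /**
     * Prefixes the query with the default field when it names no field itself.
     * Assumption: any ':' in the query means the user already gave a field name.
     * A multi-word query would need further handling (for example grouping),
     * which this sketch does not attempt.
     */
    static String withDefaultField(String query) {
        String q = query.trim();
        if (q.isEmpty() || q.indexOf(':') >= 0) {
            return q;
        }
        return DEFAULT_FIELD + ":" + q;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultField("yacy"));
        System.out.println(withDefaultField("host_s:example.org"));
    }
}
```

Here a bare term such as `yacy` becomes `text_t:yacy`, while a query that already names a field is passed through unchanged.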
Michael Peter Christen
bd886054cb new structure and enhancements for link graph computation:
- added order option to solr queries to be able to retrieve document
lists in specific order, here: link length
- added HyperlinkEdge class which manages the link structure
- integrated the HyperlinkEdge class into clickdepth computation
- extended the linkstructure.json servlet to also show the clickdepth
and other statistical information
2014-04-09 12:45:04 +02:00
Michael Peter Christen
51800007c4 - added concurrency to postprocessing of webgraph document
- bundled separate webgraph postprocessing steps into one
2014-03-06 01:43:48 +01:00
Michael Peter Christen
a632b0d2a4 added a forced commit to index deletion to enable synchronized index
updates
2014-02-27 12:50:40 +01:00
Michael Peter Christen
0f6b72f24b do not use luke requests for remote solr servers if the result is
different from normal requests. This happens if the remote solr is
actually a solrCloud; in such cases the luke request returns only the
result of the single solr peer, not the whole cloud.
also done: some refactoring.
2014-02-26 14:30:48 +01:00
Michael Peter Christen
a9ed28c0b5 no commit if no action is requested 2014-01-17 14:54:44 +01:00
Michael Peter Christen
5e31bad711 - the webgraph shall store all links which appear on a web page and not
all unique links! This made it necessary, that a large portion of the
parser and link processing classes must be adopted to carry a different
type of link collection which carry a property attribute which are
attached to web anchors.
- introduction of a new URL class, AnchorURL
- the other url classes, DigestURI and MultiProtocolURI had been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
2013-09-15 00:30:23 +02:00
orbiter
6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 2013-07-23 16:46:44 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Michael Peter Christen
fdcd4e6a6f fixes to index deletion: quoting of host name (a '-' may be part of the
url) and disabling the engage button when changing the url field at
'Delete by URL matching'
2013-06-07 08:52:07 +02:00
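The host name quoting mentioned in the commit above might look roughly like the following sketch. The class and method names are hypothetical, and the field name host_s used in the example is an assumption rather than something the commit states. The point is that the host is wrapped in double quotes so that characters such as '-' are treated as part of a literal value.

```java
public class HostQuery {

    /**
     * Builds a field query with the host name as a quoted phrase.
     * Backslashes and double quotes inside the value are escaped so the
     * quoting cannot be broken by the input.
     */
    static String hostQuery(String field, String host) {
        String escaped = host.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // "host_s" is an assumed field name used only for illustration.
        System.out.println(hostQuery("host_s", "my-host.example.org"));
    }
}
```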
Michael Peter Christen
e26bdd4a52 fixes to deletion methods (removed unnecessary concurrency and added
removal of crawl queue entries)
2013-05-08 13:26:25 +02:00
Michael Peter Christen
d7fd346917 - added regular-expression based deletions
- on-demand collection-list generation for collection-based deletions
instead of a default collection-list presentation (this makes calling
the interface much faster since the computation of collections lists for
large indexes may take some seconds)
2013-05-04 01:14:10 +02:00
Michael Peter Christen
1b102d98d8 - added index deletion to index administration submenu
- added index deletion processes to the process scheduler/recorder
2013-04-30 02:11:28 +02:00
Michael Peter Christen
0e2ee00fea added an index deletion servlet and some style changes for the
'dangerous' engage-button
2013-04-29 19:30:53 +02:00