1739 Commits

Author SHA1 Message Date
orbiter
0f7ea7ad9f - enhanced solr.add procedure for mass adds
- removed unused solr access classes
- made snippet generation for documents from YaCy RWI/DHT concurrent (as
it was before the search process redesign)
- reduced the number of remote results in settings file because the
processing of such mass document adds is too CPU-intensive (in Solr)
2013-03-01 15:27:17 +01:00
Michael Peter Christen
f327ffedb4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-28 15:55:13 +01:00
orbiter
9c09fd7d0b better/fewer requests to local solr; the request is made in chunks of
exactly the size needed to present the current search result page. As a
result, the next solr requests are made automatically when switching to
the next pages.
2013-02-28 14:04:08 +01:00
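The chunked request described in 9c09fd7d0b can be illustrated with a minimal sketch. It assumes zero-based page numbers and maps one result page onto Solr's standard `start` and `rows` query parameters; the class `PageWindow` and its method are hypothetical names for illustration, not YaCy code.

```java
/** Hypothetical helper: the Solr paging window needed for exactly one result page. */
final class PageWindow {
    final int start; // value for Solr's "start" parameter (offset of the first hit)
    final int rows;  // value for Solr's "rows" parameter (number of hits to fetch)

    private PageWindow(int start, int rows) {
        this.start = start;
        this.rows = rows;
    }

    /**
     * Computes the window for a zero-based page index (an assumption of this sketch).
     * Only as many rows as one page shows are requested, so moving to the next
     * page naturally triggers the next request.
     */
    static PageWindow forPage(int pageIndex, int itemsPerPage) {
        if (pageIndex < 0 || itemsPerPage <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and itemsPerPage > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return new PageWindow(Math.multiplyExact(pageIndex, itemsPerPage), itemsPerPage);
    }
}
```

With 10 items per page, the fourth page (index 3) would ask Solr for `start=30` and `rows=10`.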
Michael Peter Christen
840fa22135 disabled clickdepth computation during crawling since that is repeated
during clean-up phase.
2013-02-28 02:25:39 +01:00
orbiter
d74472f562 corrected result counter 2013-02-27 22:40:23 +01:00
orbiter
2555542f7a removed the dns prefetch because that was not so useful 2013-02-27 20:58:34 +01:00
Michael Peter Christen
d957739441 removed size request 2013-02-26 17:53:44 +01:00
Michael Peter Christen
c95a84103a complete redesign of search process:
- removed 'worker' processes
- no internal time-out behaviour: methods either are successful or
return null
- waiting is only done on top-level
- removed snippet-production; this is replaced by solr snippets
- removed statistics based on solr size queries (they took VERY
long); the statistics (like suggestions or tag cloud) are now again
based on the old but very fast RWI index. In portal or intranet mode the
RWI index is usually switched off; if you would like to have statistics
again then you must switch on the RWI index again in this mode.
- fixed many bugs regarding the correct page counter
2013-02-26 17:16:31 +01:00
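The rule from c95a84103a that methods either succeed or return null, with waiting done only at the top level, might look roughly like the following sketch. All names here (`TopLevelWait`, `fetchOrNull`, `searchWithDeadline`) are hypothetical and only illustrate the pattern; they are not taken from the YaCy sources.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TopLevelWait {

    /** Lower-level call: no internal time-out, it either succeeds or returns null. */
    static String fetchOrNull(Callable<String> source) {
        try {
            return source.call();
        } catch (Exception e) {
            return null; // failure is reported as null, never as a time-out
        }
    }

    /** Top-level call: the only place in this sketch where a deadline is enforced. */
    static String searchWithDeadline(Callable<String> source, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> fetchOrNull(source);
            Future<String> pending = pool.submit(task);
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return null;
        } finally {
            pool.shutdownNow(); // interrupts a task that is still running
        }
    }
}
```

Keeping the deadline in one place means the lower-level methods stay simple, and a caller only has to handle a single "no result" value.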
Michael Peter Christen
35fa718b77 testing to use solr for portalsearch caused some bugfixing but no full
success: try to comment out the solr search request in
yacy-portalsearch.js
2013-02-25 14:31:50 +01:00
Michael Peter Christen
008288719c fix for schema export to also consider automatically generated
coordinate fields
2013-02-25 01:13:03 +01:00
Michael Peter Christen
089dee1770 - generalized SchemaConfiguration into super-class Configuration and
adapted other classes which used the configuration-only access for that
class
- removed many warnings
- adjusted logging
2013-02-25 00:09:41 +01:00
Michael Peter Christen
c16de49f64 fix for webgraph delete query 2013-02-24 18:17:58 +01:00
Michael Peter Christen
56d5946a59 - added flags in IndexFederated_p.html to switch on or off the webgraph
index (new solr core webgraph) .. this is now off by default
- completely redesigned this servlet
- added description how to attach a remote solr
- adjusted naming of servlet and menus
- moved 'lazy initialization' attribute from IndexSchema to
IndexFederated (this is a general option) back again.
2013-02-24 18:09:34 +01:00
Michael Peter Christen
14cceb6b17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	htroot/IndexFederated_p.html
	source/net/yacy/cora/federate/solr/YaCySchema.java
	source/net/yacy/peers/Protocol.java
	source/net/yacy/search/Switchboard.java
	source/net/yacy/search/index/Segment.java

also moved portalsearch-dev to yacy-portalsearch to be able to fix
problems with new attachment to solr of the search widget
2013-02-23 08:48:33 +01:00
Michael Peter Christen
58e1e6fa2b fixes to schema 2013-02-23 08:14:10 +01:00
reger
f291d60c5f on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index 2013-02-22 22:17:45 +01:00
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
The default schema uses only some of them and the resulting search index
now has the following properties:
- the webgraph will have about 40 times as many entries as the default
index
- the complete index size will increase and may be about double the
current size
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index will allow some old parts of YaCy to be removed,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times the size of the document index!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get full access to the new index, the API to solr now has two
access points: one with attribute core=collection1 for the default
search index and one with core=webgraph for the new webgraph search
index. This is also available for p2p operation but client access is not
yet implemented.
2013-02-22 15:45:15 +01:00
Michael Peter Christen
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
structure, but is not filled yet. To make a second core possible,
multi-core functionality had to be implemented in the deep-embedded
solr:
- migrated the solr_40 directory content to a subdirectory
'collection1'; the previously used default core is now called
collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in
/IndexSchema_p.html
- added instance handling as addition to solr connections: all solr
connectors are now instances of a solr 'instance' object; this required
a complete re-design of the solr embedding
- also migrated caching and sharding on top of the new instance handling
- migrated the search apis to now handle access to a specific core,
the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores;
for the yacy remote search the default core is now called 'solr'; using
the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be
used after this migration!
- redesign of solr instance handling in all methods which access the
instances: they cannot hold copies of these instances any more; they must
retrieve the actual connection object every time they want to write to
it (this also solves some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema', the old solr.keys.list is
replaced by solr.collection.schema
2013-02-21 13:23:55 +01:00
Michael Peter Christen
33bc255e85 prevent crawl starts with very large url lists from causing a time-out
in the user front-end
2013-02-15 01:58:28 +01:00
Michael Peter Christen
b6de1f42dc Full redesign of solr connection architecture. This was done to support
multiple solr cores instead of just one. Therefore it is now necessary
to distinguish between solr server connections (called an 'Instance')
and a connection to a single solr core. One Instance may now have
multiple connector classes assigned to it, each connecting to a single
core.
To support multiple cores it is also necessary to distinguish between
the connection configuration and the configuration of the index schema.
We will have multiple schema configurations in the future, one for
each solr core. As a result, the IndexFederated servlet had to be
split into two parts; the new servlet for the schema editor is now the
IndexSchema servlet.
2013-02-15 01:38:10 +01:00
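The split described in b6de1f42dc between a server connection ('Instance') and per-core connectors can be pictured with a small sketch. The classes `CoreConnector` and `SolrInstanceSketch` are hypothetical stand-ins and do not reproduce the real YaCy classes; the core names `collection1` and `webgraph` in the usage note come from later entries in this log.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical connection to exactly one Solr core. */
final class CoreConnector {
    final String coreName;

    CoreConnector(String coreName) {
        this.coreName = coreName;
    }
}

/** Hypothetical server connection ('Instance') that owns one connector per core. */
final class SolrInstanceSketch {
    private final String defaultCore;
    private final Map<String, CoreConnector> connectors = new ConcurrentHashMap<>();

    SolrInstanceSketch(String defaultCore) {
        this.defaultCore = defaultCore;
    }

    /** Returns the connector for a core, creating it on first access. */
    CoreConnector connector(String coreName) {
        return connectors.computeIfAbsent(coreName, CoreConnector::new);
    }

    /** Convenience access to the connector of the default core. */
    CoreConnector defaultConnector() {
        return connector(defaultCore);
    }

    /** Number of cores that currently have a connector. */
    int coreCount() {
        return connectors.size();
    }
}
```

A caller would create one `SolrInstanceSketch("collection1")` and ask it for `connector("webgraph")` whenever the second core is needed, instead of keeping its own copy of a connection.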
Michael Peter Christen
4111606654 removed the commitWithin attribute because that is not the right way
for us to update the index. It may also be superfluous with
the solr 4.0 softcommit.
2013-02-13 02:29:47 +01:00
Michael Peter Christen
c20fa3640d fix to unbalanced tag and license for null objects 2013-02-13 01:23:05 +01:00
Michael Peter Christen
3a6097966d added jsonp option to yjson result writer 2013-02-13 01:11:57 +01:00
Michael Peter Christen
de58043205 Added image license generation for solr image search results when
results are generated within yjson result writer. This makes it possible
to view images in yacyinteractive from solr.
2013-02-13 00:33:53 +01:00
Michael Peter Christen
d3508fa8ff fixed json search, quotes, auto-facets, urls etc. for
yacyinteractive.html
2013-02-13 00:01:38 +01:00
Michael Peter Christen
1db23e9eac Moved methods from SolrServerConnector to AbstractSolrConnector with the
result that most of these methods become superfluous in other classes.
This is a generalization step towards multi-indexes in Solr.
2013-02-12 22:03:10 +01:00
Michael Peter Christen
16d90859b7 reverted put-semantics back to as-usual in serverObjects and introduced
an add-method to put in several objects for the same key
2013-02-12 11:52:33 +01:00
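The difference between the put and add semantics mentioned in 16d90859b7 can be sketched as follows. This is a simplified illustration with a hypothetical class `MultiValueParams`; the real serverObjects class and the Solr class it builds on are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parameter container that can hold several values per key. */
final class MultiValueParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    /** put: usual map semantics, the new value replaces everything stored for the key. */
    void put(String key, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(key, single);
    }

    /** add: appends a further value for the same key and keeps the existing ones. */
    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Returns the first value for the key, or null if the key is unknown. */
    String get(String key) {
        List<String> stored = values.get(key);
        return stored == null ? null : stored.get(0);
    }

    /** Returns all values for the key as a read-only list (empty if the key is unknown). */
    List<String> getAll(String key) {
        List<String> stored = values.get(key);
        return stored == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(stored);
    }
}
```

Several facet requests could then be stored with repeated `add("facet.field", ...)` calls, while existing callers of `put` keep the single-value behaviour they expect.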
Michael Peter Christen
0d888ff69e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-12 03:42:58 +01:00
Michael Peter Christen
c34af7fe94 extended JSON Response Writer and Opensearch Response Writer for the
Solr search interface in such a way that it is possible to use this
interface for the yacyinteractive search. This search interface is now
much faster using the Solr search directly. For the Solr interface it
was necessary to create a translation from the YaCy search modifiers to
the Solr facet selection. This was added in such a way that it becomes
generic for the normal YaCy search and as an on-top evaluation for Solr
queries.
2013-02-12 03:42:46 +01:00
reger
c37d718f16 make sure yacy.running is deleted if not running (catch exception)
- to prevent the following log if YaCy was previously not shut down properly

E ... STARTUP WARNING: the file C:\src\git\yacy-rc1\DATA\yacy.running exists, this usually means that a YaCy instance is still running
E ... STARTUP FATAL ERROR: java.util.concurrent.TimeoutException
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException
	at net.yacy.cora.protocol.TimeoutRequest.call(TimeoutRequest.java:91)
	at net.yacy.cora.protocol.TimeoutRequest.ping(TimeoutRequest.java:112)
	at net.yacy.yacy.startup(yacy.java:200)
	at net.yacy.yacy.main(yacy.java:638)
Caused by: java.util.concurrent.TimeoutException

- adjust Netbeans path (to solr4.1.jars)
2013-02-11 22:53:19 +01:00
Michael Peter Christen
762b687e47 extended the serverObjects to be able to hold multiple values for a
single key. This is done using the solr class MultiMapSolrParams. That
class is needed in the OpensearchResultWriter to get multiple facet
requests.
2013-02-11 22:12:15 +01:00
Michael Peter Christen
d70d99fab5 added more metadata fields and facets to OpensearchResponseWriter.
This should make it possible to replace the original and enriched yacy
opensearch result with a solr output in opensearch format.
2013-02-11 22:10:14 +01:00
Michael Peter Christen
6a4878940b fix in html parser and bookmark generation 2013-02-11 13:28:08 +01:00
Michael Peter Christen
dee8b24d3c better error handling for bookmarks 2013-02-09 06:55:57 +01:00
Michael Peter Christen
e1da39245a when searching the network, do not search on robinson peers with the old
DHT search interface. Now use the solr interface.
2013-02-08 18:30:08 +01:00
Michael Peter Christen
6f6ddaf7e7 A robinson peer does not need to write RWI data if such peers are only
searched using the solr interface. Searching public robinsons will be
done with solr only in the future.
2013-02-08 17:58:54 +01:00
Michael Peter Christen
ab4f74c82c fix for xml blacklist import 2013-02-08 15:12:10 +01:00
Michael Peter Christen
7806680ab8 fixed a problem with re-feeding of already indexed documents with
coordinates attached.
2013-02-08 12:45:54 +01:00
Michael Peter Christen
cb38e860cf After the observation that Windows users simply forget that they started
YaCy while YaCy is still running, and that they additionally expect
another doubleclick on the YaCy icon to simply open the search window
(again), I decided to add a function that complies with this
expectation: simply open the browser pop-up page again if the user starts
YaCy while YaCy is still running.
2013-02-07 23:39:00 +01:00
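The start-up behaviour from cb38e860cf can be summarised as a small decision function. This sketch assumes that a running instance is detected through a lock file such as the yacy.running file named elsewhere in this log, plus a check whether the instance responds; the class `StartupDecision`, its enum and the stale-lock branch are hypothetical and only illustrate the idea.

```java
/** Hypothetical decision logic for a second start attempt. */
final class StartupDecision {

    enum Action {
        START_NORMALLY,              // no lock file: nothing is running
        OPEN_BROWSER_ONLY,           // an instance is running: just show the search page again
        REMOVE_STALE_LOCK_AND_START  // lock file left over from an unclean shutdown (assumption)
    }

    /**
     * @param lockFileExists          whether the lock file is present
     * @param runningInstanceResponds whether the supposedly running instance answered a check
     */
    static Action decide(boolean lockFileExists, boolean runningInstanceResponds) {
        if (!lockFileExists) {
            return Action.START_NORMALLY;
        }
        return runningInstanceResponds
                ? Action.OPEN_BROWSER_ONLY
                : Action.REMOVE_STALE_LOCK_AND_START;
    }
}
```

Separating the decision from the actual file and network checks keeps it easy to test without starting anything.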
Marc Nause
27894d2c1a Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2013-02-05 21:09:41 +01:00
Marc Nause
75f9568472 *) only install files from the RELEASE directory
*) minor changes
2013-02-05 21:02:32 +01:00
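Restricting installs to the RELEASE directory, as in 75f9568472, usually comes down to checking that a requested file name cannot escape that directory. The following sketch shows one common way to do this with `java.nio.file.Path`; the class `ReleasePathGuard`, the method name and the directory used in the usage note are assumptions for illustration, not the actual YaCy code.

```java
import java.nio.file.Path;

/** Hypothetical guard that accepts only files located inside the release directory. */
final class ReleasePathGuard {

    /**
     * Returns true only if requestedName, resolved against releaseDir, stays inside
     * releaseDir. Names such as "../../test.txt" or absolute paths are rejected.
     */
    static boolean isInsideReleaseDir(Path releaseDir, String requestedName) {
        Path base = releaseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check
        Path candidate = base.resolve(requestedName).normalize();
        return candidate.startsWith(base) && !candidate.equals(base);
    }
}
```

For a release directory such as `DATA/RELEASE` (an assumed location), a plain file name passes the check, while a name containing `../` segments that climb out of the directory does not.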
Michael Peter Christen
eb80405a16 added a disable function in RemoteCrawl_p servlet which prevents setting
of remote crawl if peer is not a senior or principal peer
2013-02-05 12:47:20 +01:00
Michael Peter Christen
19c46e4acf catch more exceptions 2013-02-04 21:24:39 +01:00
Michael Peter Christen
7de502f43d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-04 20:02:35 +01:00
Marc Nause
3bc5ee6e3d *) added protection against CSRF in update download page
(http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release
does not work anymore)
2013-02-04 19:57:28 +01:00
Michael Peter Christen
4f270d89e2 another NPE 2013-02-04 18:04:52 +01:00
Michael Peter Christen
921091c3a6 use thread-safe http connection manager for authenticated remote solr
connections
2013-02-04 17:48:04 +01:00
Michael Peter Christen
e8f7b85b98 fixes to internal RWI usage if RWI is switched off (NPE etc) 2013-02-04 17:11:02 +01:00
Michael Peter Christen
3834829b37 bugfixes and more logging for solr connector 2013-02-04 16:42:10 +01:00
Michael Peter Christen
80fe3d7860 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
Conflicts:
	source/net/yacy/cora/federate/solr/connector/EmbeddedSolrConnector.java
2013-02-04 10:57:54 +01:00