Commit Graph

267 Commits

Author SHA1 Message Date
Michael Peter Christen
316b5fe116 - added a solr type definition verifier
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes
2012-08-28 16:58:06 +02:00
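The host string parsing described in this commit can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the Domains class: the `CC_SLDS` set is a tiny hypothetical sample, and the function name `split_host` is an assumption made for this example.

```python
# Hypothetical, tiny sample of country-code second-level domains.
# The actual list added to the Domains class is not reproduced here.
CC_SLDS = {"co.uk", "ac.uk", "com.au", "co.jp"}


def split_host(host):
    """Split a host name into (subdomain, domain, suffix).

    The suffix is either a known country-code second-level domain
    such as "co.uk" or the plain top-level domain.
    """
    labels = host.lower().rstrip(".").split(".")
    # Use a two-label suffix only if it is in the ccSLD list and
    # at least one label remains for the domain itself.
    if len(labels) >= 3 and ".".join(labels[-2:]) in CC_SLDS:
        suffix_len = 2
    else:
        suffix_len = 1
    suffix = ".".join(labels[-suffix_len:])
    rest = labels[:-suffix_len]
    domain = rest[-1] if rest else ""
    subdomain = ".".join(rest[:-1])
    return subdomain, domain, suffix
```

With this sketch, `split_host("www.example.co.uk")` yields the subdomain `www`, the domain `example` and the suffix `co.uk`.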
Michael Peter Christen
4521d63c92 added boosts to solr search queries 2012-08-27 15:25:25 +02:00
Michael Peter Christen
e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
- added a new geolocation field to index schema, the old values are
migrated if possible
2012-08-27 14:41:33 +02:00
orbiter
29171e2f6c fixed generation of ontologies from index enumerations 2012-08-24 14:13:42 +02:00
orbiter
01a63ef595 redesign of YaCySchema and SolrDoc handling 2012-08-23 09:51:45 +02:00
orbiter
479bfca571 refactoring 2012-08-23 09:30:11 +02:00
Michael Peter Christen
48a82bc705 log queries anonymously from gsa+solr requests 2012-08-22 23:50:40 +02:00
Michael Peter Christen
ab6ec4ec52 added snippet computation to solr/rss and gsa result writer 2012-08-22 17:37:34 +02:00
Michael Peter Christen
4716546ef5 - reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case that peer does not
answer (this may be normal because it is allowed to switch this off)
2012-08-22 16:30:33 +02:00
Michael Peter Christen
653645c1cf corrected solr query syntax 2012-08-22 00:48:03 +02:00
orbiter
716ea0cfe2 sorted the solr schema into mandatory and optional fields; reduced
the number of fields used to reduce solr index size
2012-08-21 23:52:56 +02:00
orbiter
9b8c8c0f47 fix from gaston in
http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909
2012-08-21 21:03:26 +02:00
Michael Peter Christen
a049761e0c fixed double-check 2012-08-20 14:16:37 +02:00
Michael Peter Christen
f42a57cd7d gsa format update 2012-08-20 12:50:51 +02:00
Michael Peter Christen
ff3eaa21b0 added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is done to several outside peers is done to
the local solr index
- the search process works now also without any 'old' RWI data using
solr
2012-08-20 12:16:11 +02:00
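The idea of running the solr query concurrently with the rwi search and feeding both into one result structure might look like the sketch below. It is written in Python for brevity and is not YaCy's implementation: `search_rwi`, `search_solr` and the fixed sample results are hypothetical stand-ins, and merging by maximum score is an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def search_rwi(query):
    # Hypothetical stand-in for the standard rwi search.
    return [("http://example.org/a", 0.4), ("http://example.org/b", 0.9)]


def search_solr(query):
    # Hypothetical stand-in for the solr query.
    return [("http://example.org/b", 0.7), ("http://example.org/c", 0.5)]


def combined_search(query):
    """Run both searches in parallel and merge them into one ranking."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(search_rwi, query),
            pool.submit(search_solr, query),
        ]
        merged = {}
        for future in futures:
            for url, score in future.result():
                # Keep the best score when both sources return a URL.
                merged[url] = max(score, merged.get(url, 0.0))
    # Highest score first.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

Because both sources write into the same dictionary, the merge still works when one of them returns nothing, which matches the note that search also works without any old RWI data.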
Michael Peter Christen
a06123aec6 more abstraction and less parameter overhead for remote search 2012-08-20 01:29:15 +02:00
Michael Peter Christen
f00733186b code simplifications 2012-08-19 13:17:03 +02:00
Michael Peter Christen
db0d438709 fix for http://bugs.yacy.net/view.php?id=206 2012-08-19 08:43:56 +02:00
orbiter
404b0aab09 refactoring in remote search and stub for remote node peer selection 2012-08-18 23:59:25 +02:00
orbiter
d7ea45f698 - get nice text_t values from metadata conversions that are stored into
solr as fulltext search index.
- added slow migration from old metadata to solr index entries: each
entry from the old metadata is removed from that data structure and
written into solr.
2012-08-18 19:36:21 +02:00
orbiter
99ef57f103 reduced sleep times 2012-08-18 17:48:20 +02:00
orbiter
780f8974e7 added remaining iteration methods for solr in fulltext class 2012-08-18 15:39:14 +02:00
orbiter
ee01c12e56 fixes for putDocument and putMetadata 2012-08-18 13:05:27 +02:00
orbiter
cc47a0876e reverted bf55f69176
to have a fall-back option in case that memory problems as reported in
http://forum.yacy-websuche.de/viewtopic.php?p=26901#p26901
for full-solr installation are too severe and we have to work with a
'small memory footprint' peer system.
2012-08-18 10:28:40 +02:00
Michael Peter Christen
0904afe8fb added concurrent iterator methods to the solr connectors 2012-08-17 18:22:56 +02:00
Michael Peter Christen
0cab06c47c refactoring 2012-08-17 15:52:33 +02:00
Michael Peter Christen
bf55f69176 removed write methods to old metadata file type; all metadata now goes
to solr
2012-08-17 15:46:26 +02:00
Michael Peter Christen
40c0856489 refactoring 2012-08-17 15:33:02 +02:00
Michael Peter Christen
06a78eecb7 code simplification 2012-08-17 14:43:32 +02:00
Michael Peter Christen
9bece5ac5f enhanced snippet fetch - removed a bug that caused documents to be
parsed even if a solr text was available
2012-08-17 14:22:07 +02:00
Michael Peter Christen
18f989dfb1 - refactoring (load -> getMetadata)
- added getDocument to retrieve Solr documents which shall replace
getMetadata
2012-08-17 01:34:38 +02:00
Michael Peter Christen
395b78a0d8 using the solr search index to concurrently search within solr and the
rwis during local search requests.
2012-08-17 01:21:56 +02:00
Michael Peter Christen
6197caf698 added clear-text search words in query params 2012-08-16 23:05:37 +02:00
Michael Peter Christen
23226676c6 FOR THE BRAVE.. this is a forced migration to solr which is now ready
for production as a replacement of the metadata-db.
This intermediate release 1.041 will switch on the previously optional
solr index and the old metadata-db will still work as it did before.
Solr+metadata are accessed in mixed mode, no migration is done yet.
If this does not cause a catastrophe by the end of the weekend, we will
do a YaCy 1.1 main release containing this as default.
2012-08-16 18:17:47 +02:00
Michael Peter Christen
d988ba50cf added a very rudimentary, incomplete, non-verified GSA response writer
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10
2012-08-14 12:40:26 +02:00
Michael Peter Christen
aab0b680c3 - added xslt support for solr result formats.
try e.g.
http://localhost:8090/solr/select?q=*:*&start=0&rows=10&wt=xslt&tr=json.xsl
- added servlet-side mime-type configuration for streamed servlets. this
is used for the result formatters in solr result formats
2012-08-14 11:12:50 +02:00
Michael Peter Christen
e5ef840f40 - renamed DoubleSolrConnector to MirrorSolrConnector and added a
hit/miss/document cache to the MirrorSolrConnector.
- more abstraction to SolrDocument in Connector interface
- bugfixes in Solr field reader
2012-08-13 13:32:32 +02:00
Michael Peter Christen
b51df6c7e8 - added coordinate storage in solr schema
- fixed shutdown process
- fixed some solr-to-metadata reading
- added a large number of metadata attributes in ViewFile.html
2012-08-13 10:40:04 +02:00
Michael Peter Christen
da851c6071 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-11 01:21:18 +02:00
Michael Peter Christen
bd4f03bc85 removed unused class 2012-08-11 01:05:40 +02:00
orbiter
39f8eb60c3 tried to prevent calls to bad-hack getSize() method and reduced overhead
of that method a bit.
2012-08-10 18:10:25 +02:00
orbiter
e816b88b55 changed behaviour of metadata storage: in case that any solr is
attached, the metadata is written to solr instead of the metadata-db,
even if the metadata-db is enabled. This prevents metadata from being
written to two storage systems at the same time. It is also the next
step to migrate the current metadata-db to solr.
2012-08-10 15:39:10 +02:00
orbiter
2571e0d47a removed unused classes 2012-08-10 14:47:44 +02:00
Michael Peter Christen
f9c0e6e950 - Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.
2012-08-10 13:26:51 +02:00
Michael Peter Christen
136fcb1ad9 refactoring 2012-08-10 06:47:13 +02:00
Michael Peter Christen
a12f693ec9 added two response writers for the embedded solr interface:
an rss/opensearch writer and an enhanced solr xml writer.
The enhanced solr writer has less configuration overhead than the
original writer and should be slightly faster. The rss/opensearch writer
is at this time slightly incomplete compared with the already existing
rss search result from YaCy and also snippets are missing at this time.
To test the new interface, open for example:
http://localhost:8090/solr/select?wt=rss&q=olympia
The wt codes for the new result writers are:
wt=rss for opensearch
wt=exml for the enhanced solr xml writer.
Additionally, the SRU search parameters have been added to the solr
interface which can now also be used for a normal solr/xml search.
2012-08-09 18:06:48 +02:00
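A test request for either writer can be assembled as in this small Python sketch. The base URL and the two `wt` codes come from the commit message above; the helper name `select_url` and its parameters are assumptions made for this example.

```python
from urllib.parse import urlencode

# Base URL as given in the commit message; adjust host and port as needed.
BASE = "http://localhost:8090/solr/select"

# The two writer codes named in the commit message.
WRITERS = {"rss", "exml"}


def select_url(query, writer="rss", base=BASE):
    """Build a solr select URL for one of the new response writers."""
    if writer not in WRITERS:
        raise ValueError("unknown writer: " + writer)
    return base + "?" + urlencode({"wt": writer, "q": query})
```

Calling `select_url("olympia")` reproduces the example URL from the commit message, and `select_url("olympia", writer="exml")` selects the enhanced xml writer.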
Michael Peter Christen
bca4a16603 replaced the multivalue generic string field name suffix _ss by _txt
because _ss is not part of the standard solr example schema.
2012-08-06 17:58:09 +02:00
orbiter
67edfd991c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-05 15:49:48 +02:00
orbiter
d9173ba7ed added more solr fields to integrate values from URIMetadataRow. All
writes to the Metadata-DB are now also made to solr. This includes
metadata transfer during search and rwi transfer.

The new/added solr fields are:

## time when resource was loaded
load_date_dt

## date until resource shall be considered as fresh
fresh_date_dt

## id of the host, a 6-byte hash that is part of the document id
host_id_s

## ids of referrer to this document
referrer_id_ss

## the md5 of the raw source
md5_s

## the name of the publisher of the document
publisher_t

## the language used in the document; starts with primary language
language_ss

## an external ranking value
ranking_i

## the size of the raw source
size_i

## number of links to audio resources
audiolinkscount_i

## number of links to video resources
videolinkscount_i

## number of links to application resources
applinkscount_i
2012-08-05 15:49:27 +02:00
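The field names above follow a suffix convention that encodes the value type. The Python sketch below checks a document against that convention. The suffix-to-type mapping is an assumption inferred from the field descriptions and from common solr dynamic-field naming, and `check_document` is a hypothetical helper, not part of YaCy.

```python
from datetime import datetime, timezone

# Assumed meaning of the suffixes used by the fields listed above.
SUFFIX_TYPES = {
    "_dt": datetime,  # date
    "_ss": list,      # multi-valued string
    "_s": str,        # single string
    "_t": str,        # tokenized text
    "_i": int,        # integer
}


def check_document(doc):
    """Return the names of fields whose value does not fit the suffix."""
    bad = []
    for name, value in doc.items():
        for suffix, expected in SUFFIX_TYPES.items():
            if name.endswith(suffix):
                if not isinstance(value, expected):
                    bad.append(name)
                break
        else:
            # No known suffix matched this field name.
            bad.append(name)
    return bad


example = {
    "load_date_dt": datetime(2012, 8, 5, tzinfo=timezone.utc),
    "host_id_s": "abc123",
    "referrer_id_ss": ["xyz789"],
    "publisher_t": "example publisher",
    "size_i": 2048,
}
```

For the `example` document, `check_document` returns an empty list; a string value in `size_i` would be reported as a mismatch.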
Michael Peter Christen
3ce04cecf3 bad hack to prevent a bug appearing in solr 2012-07-31 23:49:07 +02:00