Commit Graph

58 Commits

Author SHA1 Message Date
Michael Peter Christen
88cd17ea57 migrated solr from 8.9.0 to 8.11.2; activated also migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon. 2023-09-01 18:24:52 +02:00
Michael Peter Christen
1c0f50985c fixed documentation and some details of handling of keywords 2023-04-04 12:41:12 +02:00
sgaebel
1cdc55a425 lets SOLR merge bigger segments (up to 50GB)
+ some setting to reduce caches
2021-10-31 11:33:42 +01:00
Michael Peter Christen
8b4394a6c5 fixes for solr 8.8.1 migration
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did actually not work!
- index will be crated in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adoptions for lucene 8.8.1
2021-03-08 13:39:27 +01:00
luccioman
5a3d5cb92c Upgraded Solr config files with the ones provided by Solr release
Fixes #292
2019-04-16 10:25:48 +02:00
luccioman
4196101379 Enable soft autocommit in default Solr config
Since upgrade from Solr 5.5 to Solr 6.6 (commit 6fe7359), hard
autocommits were still enabled to regularly persist the Solr index to
the file system, but new index entries were no more automatically made
available for use by the application (soft autocommit).
Therefore, YaCy features such as index statistics, that do not perform
an explicit commit (as recommended by Solr documentation) were no more
accurate.
Soft autocommit is now restored as a default, with a time period
expected to be sufficient for accuracy while adding only a reasonable
system load overhead.

Fixes issue #251
2018-11-19 08:49:13 +01:00
reger
41616de0b8 Add SolrConfig ClassicIndexSchemaFactory to prevent Solr startup warning.
This overrides Solr default to use managed schema. As we don't use
programatic schema changes this directs Solr to use schema.xml, eliminating
the warning.
2017-07-23 03:55:56 +02:00
reger
9220ccbec7 remove reference to velocityresponsewriter in solrconfig.xml
it is not longer part of solr-core api
http://lucene.apache.org/solr/6_6_0/index.html
2017-06-16 00:12:09 +02:00
reger
4be4bfbba6 remove sample path setting in solrconfig.xml not valid in Yacy
resulting in startup stop exception after fresh swithch to 1.921
2017-06-15 21:02:18 +02:00
luccioman
f6e8d71718 Prevent high CPU load at startup, caused by the Solr suggester build.
Reported by Collision on mantis 758 (
http://mantis.tokeek.de/view.php?id=758 ).
Introduced by the new YaCy Solr configuration for Solr 6.6.0 (see commit
6fe735945d), including now Suggester
configuration.
2017-06-15 14:13:46 +02:00
Michael Peter Christen
6fe735945d migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8
Also: now Version 1.921
2017-06-09 12:25:23 +02:00
reger
35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
there should be no need for reindex by the update
2016-07-23 18:36:43 +02:00
luc
55a4d15775 Added a note on deprecated default search field and operator. 2015-12-14 23:55:12 +01:00
sixcooler
f5a9948860 do not store subfield *_coordinate 2015-11-10 20:32:42 +01:00
sixcooler
fca353e5eb set startuptype of most solr handlers to lazy 2015-11-10 20:32:05 +01:00
reger
c720b4c249 remove override of dynamicField coordinate_p in solr schema
(coordinate_p is not a mandatory field as such doesn't need to be declared as schema.field)
2015-10-24 22:44:28 +02:00
reger
5e45f1a460 enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field 2015-09-01 21:47:25 +02:00
sixcooler
87e4abe393 fight the fieldcache by usind DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in an huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DovValues where it is possible.
For this I used the Api-Scheme as new basis für the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these setting will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
reger
00d2062813 Rem depreciated AdminHandlers in solrconfig.xml
avoid warning log
W  org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/"  class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
2015-07-01 00:58:23 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes had been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
had been updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend to re-index everything.
Please use the index export before you do the update to a large
surrogate xml file. After the update, start with an empty index and then
initialize this with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
36e9cdb376 testing switching off cold searchers; maybe this brings performance
enhancements when using large facets
2015-04-07 13:14:41 +02:00
sixcooler
5594c43d2e bump to Solr-/Lucene-4.10.3 2015-01-04 18:47:47 +01:00
sixcooler
725b206fb4 update to solr-/lucene-4.10.2 2014-11-07 18:51:31 +01:00
Michael Peter Christen
09dcdb9b19 update to solr 4.9.0 2014-07-01 16:39:00 +02:00
Michael Peter Christen
d4157184ec migration to Solr 4.8.1
This includes also an update to zookeeper 3.4.6 and a new library that
Solr initializes by default: org.restlet from
http://restlet.com/download/current#release=stable&edition=jse&distribution=zip
which is included in version 2.2.1 from may 6th 2014
2014-05-21 11:48:08 +02:00
Michael Peter Christen
ebd44a7080 replaced solr 4.6.1 with solr 4.7.1 and added index migration to
lucene_47
2014-04-06 10:45:03 +02:00
Michael Peter Christen
ee92d748b5 test using compound file format, see UseCompoundFile in
https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig
This appears to be necessary as many times a
java.io.FileNotFoundException: (Too many open files) appears.
See also: https://issues.apache.org/jira/browse/SOLR-4 and desperate
users at
http://stackoverflow.com/questions/3828343/too-many-open-file-exception-while-indexin-using-solr
We cannot force users to do a "ulimit -n 1000000", so this action seems
to be required.
2014-04-06 00:35:35 +02:00
orbiter
f77afa9d1d add index on _val fields, this affects especially title length
an index on fields make search facets on that field possible
2014-03-04 11:24:04 +01:00
Michael Peter Christen
2f16770681 migrated to solr 4.6.0 2013-12-19 21:51:05 +01:00
Michael Peter Christen
a5c1249ee2 reverted autowarming setting in solrconfig 2013-11-09 01:43:44 +01:00
Michael Peter Christen
81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- an Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes them completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
2013-11-07 10:01:44 +01:00
Michael Peter Christen
21aa6a0321 migration to Solr 4.5.0 2013-10-07 17:09:40 +02:00
Michael Peter Christen
1a3e42eca4 index migration to lucene 4.4 2013-08-26 12:49:39 +02:00
sixcooler
1bc6003057 rise autoCommit maxTime to 3 Minutes to reduce IO
lower mergeFactor again (5) for less segments
2013-08-06 03:58:53 +02:00
orbiter
1b43e02b86 Merge branch 'master' of git://gitorious.org/~quix0r/yacy/quix0rs-yacy-rc1 2013-07-13 18:54:18 +02:00
orbiter
a548354c71 replaced type of solr schema object sku of text_en_splitting_tight by
string
2013-07-13 18:54:09 +02:00
Roland Haeder
ebbb3bc5c1 Fixed CHMOD on many files + added missing loggers (e.g. jena) and made some noisy loggers quiet 2013-07-13 13:12:36 +02:00
Michael Peter Christen
7754a1263b switching back to the merge factor 10; the solr default. 2013-06-12 11:29:35 +02:00
Michael Peter Christen
959ccc4675 increased the solr merge factor because 4 was too much IO load for
frequent index receiving and re-indexing after clickdepth/cr
calculation.
2013-06-11 16:51:40 +02:00
reger
8a7fcb391d enable use of solrcore.properties for property substitution of solrconfig.xml
- move setting of system property solr.directoryFactory=solr.MMapDirectoryFactory to solrcore.properties
- add check of os.arch for 64bit system, if it fails use default/solrcore.x86.properties (if exists) as solrcore.properties
 
reason: on 32bit MMapDirectoryFactory may fail with.....
Caused by: java.io.IOException: Map failed
	at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
	at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
2013-06-01 05:43:08 +02:00
Michael Peter Christen
eb9d0ba5b1 ranking and boost function update, small bugfixes, better default search
field for solr
2013-05-30 16:30:35 +02:00
Michael Peter Christen
a8dc4346e8 default configuration of MMapDirectoryFactory for solr, increased lock
timeout, less documents from remote searches (too many results had
easily blocked a peer)
2013-05-30 12:31:28 +02:00
Michael Peter Christen
0c1a018bbd removed 'later' tactic because it used too much RAM, reduced number of
soft commits, reduced caching size of search events, ensured that solr
results are processed before connection is closed to keep that stuff not
too long in RAM
2013-05-29 18:27:27 +02:00
Michael Peter Christen
9bd2aee180 migrated to solr 4.3.0 2013-05-09 02:17:53 +02:00
Michael Peter Christen
addba047e2 changes in ranking computation
- an existing ranking servlet for solr was extended. It is now possible
to set boost values for fields, boost functions and boost queries.
- The ranking can have different instances, but currently only the first
one is used
- added an abstraction layer for fields which can be used for search and
those fields can be edited in the solr ranking configruation
- the ranking value from solr within the field score is used to combine
remote search requests, which all are created using the same locally
defined boost values
- reduced the number of fields which are used for search (makes it
faster)
- replaced some text fields by string fields (makes indexing faster)
- removed classes which had no use
- made a large number of experiments for a better ranking and created a
temporary setting which prefers hits inside titles
- adjusted also the RWI-based ranking computation to 'prefer title'
- made special cases like for portal search where no post-processing and
post-ranking is wanted: this keeps the original ranking order as done by
Solr
- fixed many bugs with old settings for ranking
2013-03-13 14:47:00 +01:00
Michael Peter Christen
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
structure, but is not filled yet. To have the opportunity of a second
core, multi-core functionality had to be implemented to the
deep-embedded solr:
- migrated the solr_40 directory content to a subdirectory
'collection1'; the previously used default core is now called
collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in
/IndexSchema_p.html
- added instance handling as addition to solr connections: all solr
connectors are now instances of an solr 'instance' object; this required
a complete re-design of the solr embedding
- migrated also caching and sharding ontop of new instance handling
- migrated the search apis to handle now the access to a specific core,
the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores;
for the yacy remote search the default core is now called 'solr'; using
the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be
used after this migration!
- redesign of solr instance handling in all methods which access the
instances: they cannot hold copies of these instances any more; the must
retrieve the actuall connection object every time they want to write to
it (this solves also some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema', the old solr.keys.list is
replaced by solr.collection.schema
2013-02-21 13:23:55 +01:00
Michael Peter Christen
8651ec35fe turned author_s into the multi-valued field author_sxt 2013-01-24 18:24:31 +01:00
Michael Peter Christen
9b5bdae1b4 Reverted setting of MMapDirectoryFactory from solrconfig; see
http://forum.yacy-websuche.de/viewtopic.php?p=27509#p27509
Instead, in the start script is checked if the host is a 64 host and
-Dsolr.directoryFactory=solr.MMapDirectoryFactory is set as java option

Reverted the ramBufferSizeMB setting (this was not enabled anyway)
because that may be too much memory for small peers and embedded
systems.

Activated the mergeFactor 4; this was commented out by mistake
2013-01-21 17:55:28 +01:00
orbiter
eb68a30947 solr performance settings
the target of these performance settings is the reduction of IO in
general and during search in particual.
- reduced mergeFactor to 4. This will increase the IO during indexing,
but will reduce IO during search. It will also greatly reduce the number
of open files which should make it possible to have overall larger
indexes until the number of open files in an OS is reached.
- increased ramBufferSizeMB to 256mb. This will reduce the number of
commits. This change may compensate the reduction of the mergeFactor.
- disabled updateLog. This is a real-time search feature which is
available in YaCy anyway because a commit is forced if index.html is
called. The updateLog feature causes a lot of IO during indexing and
search and produced a lot of files in SEGMENTS/solr_40/data/tlog
2013-01-19 11:21:33 +01:00
Michael Peter Christen
f53703df62 using MMapDirectoryFactory as solution for ClosedChannelException given
in https://issues.apache.org/jira/browse/SOLR-2247
2013-01-16 14:35:37 +01:00