Commit Graph

210 Commits

Author SHA1 Message Date
Michael Peter Christen
76202f068e extended abstraction of local and remote solr index using one front-end
for index administration and querying.
2012-07-24 17:23:29 +02:00
Michael Peter Christen
826967513b changed options in IndexFederated_p to switch on/off parts of the index
individually. The settings are experimental and the values of the
settings will be overwritten when an index migration from urldb to solr
starts.
2012-07-23 16:28:39 +02:00
orbiter
69e743d9e3 - more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
05a3ffd03a patches to ensure that solr connectors are active only if they have a
solr object assigned and vice versa
2012-07-20 11:47:50 +02:00
orbiter
5a3c829872 embedded solr is only initiated if it is activated with
IndexFederated_p.html
2012-07-20 11:40:33 +02:00
Michael Peter Christen
97b7bcf2a6 added a solr search index
- by default, an (empty) solr storage instance is created at
SEGMENTS/solr_36
- the index is written if in /IndexFederated_p.html the flag "embedded
solr search index" is switched on
- a standard solr query interface is available now with a new servlet at
http://127.0.0.1:8090/solr/select

To test this, do the following:
- switch to webportal mode
- switch on the feature as described
- do a crawl. this fills the solr index. The normal YaCy search will NOT
work now!
- do a solr query, like:
http://127.0.0.1:8090/solr/select?q=*:*
http://127.0.0.1:8090/solr/select?q=text_t:Help
play with different search fields as you can see in
/IndexFederated_p.html
You can use the standard solr query attributes as described in
http://wiki.apache.org/solr/SearchHandler
2012-07-19 11:34:05 +02:00
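The test queries in the commit above can also be built programmatically. The following is a minimal sketch, assuming the servlet is reachable at the address given in the commit message; the class and method names are hypothetical, and `rows` is the standard Solr parameter for limiting the number of results. The query string is URL-encoded, so the colon in `text_t:Help` becomes `%3A`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of YaCy: builds URLs for the select servlet.
class SolrSelectUrl {
    // Address taken from the commit message above.
    static final String BASE = "http://127.0.0.1:8090/solr/select";

    // Builds a select URL for the given Solr query string and result limit.
    static String queryUrl(String query, int rows) {
        // URLEncoder.encode(String, Charset) requires Java 10 or newer.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE + "?q=" + encoded + "&rows=" + rows;
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("*:*", 10));         // match all documents
        System.out.println(queryUrl("text_t:Help", 10)); // search the text_t field
    }
}
```

The resulting URLs can be opened in a browser or fetched with any HTTP client while the embedded solr index is switched on.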
orbiter
c00a3cf74d less usage of generic logger to avoid logger generation overhead 2012-07-12 19:54:54 +02:00
orbiter
e76159040b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-12 11:14:04 +02:00
orbiter
bbfa497a3c replaced more size() > 0 by !isEmpty() 2012-07-12 11:12:21 +02:00
Michael Peter Christen
58e7d1952f reduction of logging to prevent too much IO caused by logging 2012-07-12 02:08:11 +02:00
orbiter
0cbda0b2b8 - replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
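The kind of replacement described in the commit above might look like the following sketch. `WordBag` is a hypothetical container, used only to illustrate adding an `isEmpty()` method to a class that previously exposed just `size()`; it is not a class from the YaCy code base.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows the before/after form of the emptiness checks.
class IsEmptyExample {
    // Hypothetical container that gains its own isEmpty() method.
    static class WordBag {
        private final List<String> words = new ArrayList<>();
        void add(String word) { words.add(word); }
        int size() { return words.size(); }
        boolean isEmpty() { return words.isEmpty(); }
    }

    // Before: emptiness is expressed through length()/size() comparisons.
    static boolean nothingToIndexOld(String text, WordBag bag) {
        return text.length() == 0 || bag.size() == 0;
    }

    // After: the same check written with isEmpty().
    static boolean nothingToIndex(String text, WordBag bag) {
        return text.isEmpty() || bag.isEmpty();
    }

    public static void main(String[] args) {
        WordBag bag = new WordBag();
        System.out.println(nothingToIndex("some text", bag)); // bag is still empty
        bag.add("solr");
        System.out.println(nothingToIndex("some text", bag)); // both are non-empty now
    }
}
```

Both variants return the same result; the `isEmpty()` form states the intent directly and does not require the container to compute its size.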
orbiter
c7afa8bc48 using SwitchboardConstants for solr attributes 2012-07-10 12:01:20 +02:00
orbiter
c6d8950651 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-07-09 14:33:11 +02:00
orbiter
62202e2d71 refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
d09d9f2364 filter old peers from bootstrap (now stronger: 60 minutes instead of
240).
2012-07-08 21:25:22 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
Michael Peter Christen
0301aba1e9 removed unused method parameters 2012-07-05 10:23:07 +02:00
Michael Peter Christen
241dd8410a removed snippet pattern filter - it was not used 2012-07-05 09:21:27 +02:00
Michael Peter Christen
d3964253ae - added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd cleaned unnecessary nested code 2012-07-05 08:44:39 +02:00
orbiter
fc0f9543fe More SentenceReader cleanup 2012-07-05 00:20:58 +02:00
orbiter
d4291ac1f3 more tolerance when creating solr document 2012-07-04 21:15:38 +02:00
orbiter
78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used an overly complex input stream, generated from sources and UTF8
byte-conversions. The BufferedReader added significant overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls had
been applied until a buffered Reader from an input stream was possible.
These superfluous conversions have been removed.
The best source for the SentenceReader is a String. Therefore the
production of Strings is now forced inside the Document class.
2012-07-04 21:15:10 +02:00
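The idea of reading sentences directly from a String, without a stream or byte conversion in between, can be sketched as follows. This is not the actual SentenceReader; the class name and the simple rule of splitting at `.`, `!` and `?` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter that works on a String directly, so no
// getBytes()/InputStream/BufferedReader round trip is needed.
class StringSentenceSplitter {
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            // Assumed sentence terminators; real text needs more careful rules.
            if (c == '.' || c == '!' || c == '?') {
                flush(current, result);
            }
        }
        flush(current, result); // trailing text without a terminator
        return result;
    }

    // Adds the trimmed buffer content as a sentence (if any) and clears the buffer.
    private static void flush(StringBuilder current, List<String> result) {
        String sentence = current.toString().trim();
        if (!sentence.isEmpty()) result.add(sentence);
        current.setLength(0);
    }

    public static void main(String[] args) {
        for (String s : sentences("Hello world. How are you? Fine")) {
            System.out.println(s);
        }
    }
}
```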
Michael Peter Christen
613b45f604 - better data structures in secondary search
- fixed a big memory leak in secondary search
2012-07-03 07:12:20 +02:00
Michael Peter Christen
de903a53a0 parser refactoring & hacks 2012-07-03 06:06:38 +02:00
Michael Peter Christen
8a82609360 - smaller caches to save memory
- close cloneable iterators to free memory
2012-07-02 15:40:40 +02:00
Michael Peter Christen
7249d9c9de bugfix for concurrent seed loader 2012-07-02 14:37:57 +02:00
Michael Peter Christen
c72d3b12cd concurrently initialize the seed list during p2p network bootstrap 2012-07-02 14:27:37 +02:00
Michael Peter Christen
1825f165b8 better integration of blacklist according to use case 2012-07-02 13:57:29 +02:00
Michael Peter Christen
c18fa9fa75 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-07-02 12:20:57 +02:00
Michael Peter Christen
ce8d4b87d9 fixes for new eclipse 'Juno' warning 'Resource leak'. 2012-07-02 10:27:46 +02:00
Michael Peter Christen
0c345d1559 giving threads names so it's easier to see what's happening during
debugging and within a thread dump
2012-07-02 09:51:43 +02:00
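Naming a thread as described in the commit above uses the standard `Thread(Runnable, String)` constructor. A minimal sketch follows; the thread name `seedListLoader` and the helper method are hypothetical.

```java
// Illustrative only: starts a task on a thread with an explicit name.
class NamedThreadExample {
    // The name shows up in debuggers and in thread dumps instead of
    // a generic "Thread-N" label.
    static Thread startNamed(String name, Runnable task) {
        Thread thread = new Thread(task, name);
        thread.start();
        return thread;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = startNamed("seedListLoader",
                () -> System.out.println("running in " + Thread.currentThread().getName()));
        t.join(); // wait for the task to finish
    }
}
```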
reger
067728bccc add search result heuristic. adding a crawl job with depth-1 for every displayed search result (crawling every external linked page of displayed search result pages) 2012-07-01 00:12:20 +02:00
Michael Peter Christen
03280fb161 removed segments-concept and the Segments class:
the segments had been there to create a tenant-infrastructure but were
never used since that was all much too complex. There will be a
replacement using a solr navigation using a segment field in the search
index.
2012-06-28 14:27:29 +02:00
Michael Peter Christen
508a81b86c added solr field 'refresh_s' which stores the refresh url contained in
the meta-refresh html header field.
2012-06-28 13:27:45 +02:00
Michael Peter Christen
9116013c64 - allow lazy initialization of solr value (if using 'lazy', then no
0-values and no empty strings are written). This may save a lot of
memory (in RAM and on disk) if excessive 0-values or empty strings
appear.
- do not allow default boolean values for checkboxes because that does
not make sense: browsers may omit the checkbox attribute name if the box
is not checked. A default value 'true' would not comply with the
semantics of the browser's response.
- add a checkbox in IndexFederated_p for the lazy initialization of solr
fields.
2012-06-27 12:17:58 +02:00
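The lazy behaviour described in the commit above (no 0-values and no empty strings are written) can be sketched as follows. The class `LazyFieldWriter` and the field names `title` and `wordcount_i` are hypothetical; only `refresh_s` appears in this log. This illustrates the rule and is not the YaCy implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical document builder that skips "empty" values in lazy mode.
class LazyFieldWriter {
    private final boolean lazy;
    private final Map<String, Object> document = new LinkedHashMap<>();

    LazyFieldWriter(boolean lazy) {
        this.lazy = lazy;
    }

    // In lazy mode, null and empty strings are not written at all.
    void add(String field, String value) {
        if (lazy && (value == null || value.isEmpty())) return;
        document.put(field, value);
    }

    // In lazy mode, 0-values are not written at all.
    void add(String field, int value) {
        if (lazy && value == 0) return;
        document.put(field, value);
    }

    Map<String, Object> document() {
        return document;
    }

    public static void main(String[] args) {
        LazyFieldWriter writer = new LazyFieldWriter(true);
        writer.add("title", "YaCy");
        writer.add("refresh_s", "");  // skipped: empty string
        writer.add("wordcount_i", 0); // skipped: 0-value
        System.out.println(writer.document().keySet());
    }
}
```

With `lazy` set to `false`, all three fields would be stored, including the empty string and the zero.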
Michael Peter Christen
0294a53459 - add canonical field only if requested by solr schema
- remove canonical url from in/outbound urls if present
2012-06-26 14:51:57 +02:00
Michael Peter Christen
3fd4a01286 added option to record urls that are forwarded to the solr index 2012-06-26 13:54:48 +02:00
Michael Peter Christen
96aeb127e3 generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
8dd469b9dd added option to configure the autocommit delay time of solr on-the-fly 2012-06-25 14:59:46 +02:00
Michael Peter Christen
b9dfca4b0a - fixed IndexFederated Servlet / an embedded Solr can now be selected
- added code stub for an embedded Solr but generation of Solr store is
still commented out (it works but is not yet ready for usage)
2012-06-25 11:34:38 +02:00
Michael Peter Christen
fad3b14813 added jetty libraries, needed for future use as web server and as
application server for the solr search interface
2012-06-22 15:31:17 +02:00
Michael Peter Christen
a38b0a2c46 extended embedded solr tests to ensure that it will be usable within a
jetty instance
2012-06-22 11:40:02 +02:00
Michael Peter Christen
b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods 2012-06-22 11:39:17 +02:00
Michael Peter Christen
a5eb91fa60 refactoring 2012-06-22 00:49:32 +02:00
Michael Peter Christen
1be0025a9c - added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (all) files needed for an embedded solr
2012-06-22 00:36:49 +02:00
Michael Peter Christen
e12bb254b4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-06-21 14:55:50 +02:00
Michael Peter Christen
3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00