Commit Graph

762 Commits

Author SHA1 Message Date
orbiter
2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
as path for solr index dumps (instead of the SEGMENTS path). This will
make a maintenance of index backups easier. It will also provide a tool
to migrate from an freeworld index to a webportal index.
2014-01-07 17:53:49 +01:00
orbiter
3cb6c7861f fixed shutdown authenticaton problem 2014-01-06 01:48:54 +01:00
Michael Peter Christen
2939b47986 removed non-working realm setting in http client (auth for localhost was
added in previous commit)
2014-01-05 15:04:18 +01:00
Michael Peter Christen
9bd71fdbb4 made the access tracker class static because it shall be used by the
jetty auth module
2014-01-05 05:04:28 +01:00
Michael Peter Christen
7d6fc79eb8 refactoring (usage of constant names for attributes of authentication
check)
2014-01-05 04:23:44 +01:00
Michael Peter Christen
b9d36e45e0 removed the &amp explicit encoding of ampersand character since this is
double-translated within the template replacement process.
2014-01-05 03:40:10 +01:00
reger
e9081c0f17 moved startup execAPIActions call after Jetty startup
execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, 
but it's more robust to place the call after http is assigned to switchboard/serverSwitch.
2014-01-01 10:28:49 +01:00
orbiter
dcf46ce8f6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-12-31 15:20:49 +01:00
orbiter
343d2ef49a new data type for access tracker (unfinished) 2013-12-31 15:20:34 +01:00
reger
dd8ea0cdd6 fix "add to blacklist" button style in IndexControlRWIs_p
- added default filename filter to select field (as only addition to *.black list is permanent)

- modified Blacklist_p header/legend to show all active blacklists 
  (to support understanding that all configured lists are active)
- removed obsolete code in Blacklist_p servlet
2013-12-30 20:03:59 +01:00
reger
abbf487023 fix QueryGoal Image query (missing space)
see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..
2013-12-29 20:14:10 +01:00
reger
26e9d7e066 fix NPE in IndexControlRWIs_p.html
- metatags my be null
Caused by: java.lang.NullPointerException
	at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445)
	at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400)
	at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345)
	at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334)
	at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290)
	at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176)
	at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641)
	at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)
2013-12-29 08:05:37 +01:00
reger
7f9b9315fe Merge origin/master 2013-12-29 02:05:07 +01:00
reger
8eaabb9600 remove dependency from old serverCore.java
- remaining getPortNr not needed 
  (as current release allows only to set plain integer as port,
   see ConfigBasic)
2013-12-29 02:00:44 +01:00
orbiter
3961b643a3 write solr searches to search log 2013-12-29 01:25:44 +01:00
orbiter
15882beb19 fix for strange NPE
java.lang.NullPointerException
        at
net.yacy.search.Switchboard.updateMySeed(Switchboard.java:3667)
        at net.yacy.peers.Network.peerPing(Network.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at
net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107)
        at
net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)
2013-12-29 00:40:31 +01:00
orbiter
f3ac923a7e ftp client shall be able to open non-anonymous ftp servers if login
details are given
2013-12-28 22:42:02 +01:00
Michael Peter Christen
ee17bd0b69 added option to attach remote solr servers in read-only mode 2013-12-27 02:55:21 +01:00
Michael Peter Christen
25f9c35033 add patch which shall prevent that naive search mistakes like usage of
regular expressions cause no results. Usage of '*' followed by a dot or
any expression will now cause that this expression is used as a filetype
search.
2013-12-27 00:34:55 +01:00
reger
71cac1a278 added SSL/HTTPS connector to support SSL/https connection on port 8443
!!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used
   - added ping test for above to migration 

- as of now port for https is hardcoded to default 8443
- if not urgend required I'd leave it this way (it's standard) to use different ports for http and https 

- post https port on ConfigBasic.html (if active)
2013-12-25 05:20:13 +01:00
Michael Peter Christen
82c0525e71 wrong logger fix 2013-12-23 10:52:02 +01:00
Michael Peter Christen
25250405f1 solr servlet preparation for join with jetty branch 2013-12-20 00:45:58 +01:00
Michael Peter Christen
2f16770681 migrated to solr 4.6.0 2013-12-19 21:51:05 +01:00
orbiter
937273d4e3 added parsing of metadata to surrogate reading:
a dublin core record inside of surrogate input files may now contain
tokens within the namespace 'md' (short for: metadata). The token names
must be valid withing the namespace of the solr field names. All
md-tokens inside of surrogate files then overwrite values within solr
documents before they are written to the solr index. This makes it
possible to assign collection names to each surrogate entry and also
ranking information can be added. Please see the example file.
2013-12-17 14:02:27 +01:00
Michael Peter Christen
2702d9e56b - added a SolrQueryResponse2SolrDocumentList method which is able to
work around the unfolding process in Solr's BinaryResponseWriter.
This was a huge performance bottleneck in the embedded solr connector
and the problem is actually on Solr side, but we have now a workaround.
- This made it possible to abstract a high-performance index access
method which is implemented as method getDocumentListByParams. That
method is also implemented in the SolrServerConnector and provides a
very efficient access to a solr index if the index is embedded.
- a popular use of the document list retrieval is a result count which
can now also make use of the new method, via getDocumentCountByParams.
- enhanced the Error cache which now does not store error documents
within the ram cache if the document is also written to solr. When
documents are retrieved from the cache, they are partly read from the
ram cache and if not existent there, from the Solr index.
2013-12-13 15:56:29 +01:00
Michael Peter Christen
552ef9f18e fix for bad ErrorCache.exists test (bug from latest commit) 2013-12-12 10:38:32 +01:00
Michael Peter Christen
09412ea3a4 counting search requests in solr interface 2013-12-12 03:37:19 +01:00
Michael Peter Christen
303f5694ba avoid usage of existsByQuery. If a document can be loaded by the ID
before testing other fields from the existsByQuery request, then a
document cache fills and queries after that one can be avoided.
2013-12-12 03:36:30 +01:00
Michael Peter Christen
78eac85161 better calibration of caches and queue maximum sizes 2013-12-04 23:15:10 +01:00
Michael Peter Christen
c8af19bd37 removed unnecessary check which causes a NPE when searching with empty
search string
2013-12-04 17:58:36 +01:00
Michael Peter Christen
e3c2f09de9 - reduce computation in case that specific postprocessing fields are not
selected
- de-select citation rank computation
2013-12-04 17:48:12 +01:00
Michael Peter Christen
cfa08024c7 removed optimization bevore postprocessing because that may cause a
time-out which will cause that postprocessing fails.
2013-12-04 16:04:29 +01:00
Michael Peter Christen
6f3a923691 fixed urlmask which was not able to combine several constraints 2013-12-04 13:48:01 +01:00
Michael Peter Christen
a125904a1c fixed a NPE in surrogat processing 2013-12-04 01:56:38 +01:00
Michael Peter Christen
0db8e34625 enhanced webgraph processing 2013-12-04 01:54:45 +01:00
Michael Peter Christen
a16534cb0a tried to fix timeout and connection-lost problems when using an outside
solr.
2013-11-28 01:31:53 +01:00
Michael Peter Christen
c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over
to second solr core in case of unrecoverable OOM.
2013-11-28 01:10:25 +01:00
Michael Peter Christen
9932c441c8 fixed a problem with Date fields parsing Solr results if a remote Solr
is attached.
2013-11-28 00:54:53 +01:00
Michael Peter Christen
ae55d69ef6 include/exclude size NPE fix (recently added) 2013-11-26 11:47:04 +01:00
Michael Peter Christen
2c39b65409 fixes for searches containing stopwords. The fix was done using a
reconstruction of the search word set access method to protect that
words are deleted from the sets from the outside of the QueryGoal class.
2013-11-26 02:24:47 +01:00
orbiter
037cd0a57c using the BinaryResponseWriter which is supported within the YaCy solr
servlet since YaCy 1.63. This is much more performant for the client
than using the XMLResponseWriter because parsing of XML data is very CPU
intensive. Older YaCy peers are still requested using the
XMLResponseWriter but the majority of YaCy peers already respond with
the binary writer. This makes remote searches much faster and less CPU
intensive.
2013-11-25 21:31:40 +01:00
orbiter
61409788eb less word hash computations (removing some overhead because of MD5
calcs) using the clear word in a normalized form.
2013-11-25 15:20:54 +01:00
reger
f23471c471 add check to prevent index entries containing url_file_ext_s with ";jsession=xyz"
note: check could be implemented in MultiProtocolURL (but at this time didn't oversee possible implication)
2013-11-25 00:14:53 +01:00
orbiter
3e552550d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-11-18 22:48:00 +01:00
orbiter
c2d720cdaf purge a lucene cache - possible memory leak fix 2013-11-18 22:47:35 +01:00
reger
e4f49fb175 for searchresults with empty title use filename as title
- to not store a title in index which isn't extracted from source 
  the title is empty check only added to ResultEntry class
2013-11-18 19:41:31 +01:00
orbiter
da33ee0d77 extended also timeout fr webgraph postprocessing 2013-11-16 18:30:06 +01:00
orbiter
74f9e40747 extended timeout during postprocessing of 30 minutes. 2013-11-16 18:29:08 +01:00
orbiter
19a051bec8 more monitoring for postprocessing and enhanced layout in Crawler
monitor page
2013-11-16 18:23:14 +01:00
Michael Peter Christen
9cf9727685 fix for wrong counter 2013-11-16 11:33:35 +01:00