2589 Commits

Author SHA1 Message Date
Michael Peter Christen
453bfd0f17 removed unused variables and warnings 2014-03-19 09:29:01 +01:00
Michael Peter Christen
05655d98df Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-03-17 11:57:01 +01:00
reger
9f02d2c47b fix: remove link to triplestore in Vocabulary_p (triplestore no longer exists)
- should be investigated in more detail to look for additional implications
Remove "yacyaction" from proxyservlet as it was only needed for removed interaction routines.
2014-03-16 22:11:19 +01:00
reger
81a846ec33 fix: set YaCy CONNECTION_PROP_HOST Header in ProxyServlet to host incl. port 2014-03-16 20:51:32 +01:00
reger
251be9ecfa remove unused ProxySettings ref. from loader
clean unused whois test code
2014-03-16 05:19:01 +01:00
reger
82dc815af9 cleanup: remove unrelated and unused code 2014-03-16 00:15:12 +01:00
Michael Peter Christen
85a427ec54 support for multiple sitemaps in robots.txt 2014-03-14 13:33:23 +01:00
reger
a373fb717d remove more unused from legacy server.http
- triggerOnlineAction not used
- useTemplateCache not used
2014-03-14 03:12:04 +01:00
reger
749d020aeb remove redundant url string manipulation in HTTPDProxyHandler
(still used by ProxyServlet)
2014-03-14 02:24:12 +01:00
reger
612294cf84 use servletPath in ProxyServlet instead of fixed name
to allow servlet-mapping via web.xml
2014-03-13 02:46:05 +01:00
reger
1d01672bd3 fix DCEntry.getIdentifier
on successful url parameter
2014-03-12 23:35:57 +01:00
Michael Peter Christen
b08375da33 fix for bad/missing values of size_i 2014-03-11 09:51:04 +01:00
reger
6306d28a6a OAI import get multivalued keywords (dc:subject) 2014-03-09 03:15:35 +01:00
reger
0a8c8102de allow YaCy to start w/o ssl if JKS init fails 2014-03-07 20:06:14 +01:00
sixcooler
0b2101c59c Speed up the ProxyHandler:
simplified cache storing and made it concurrent in order to free the
client connection asap
let other processes wait on proxy access as before
2014-03-07 17:47:09 +01:00
reger
516f8c2489 fix: to allow unix scripts (bin/*.sh) to always submit http admin apicalls
using auth via config hash (legacy requirement)
2014-03-07 00:16:57 +01:00
Michael Peter Christen
ea3aa30593 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-03-06 03:33:33 +01:00
reger
dd5bf0b71b cleanup old reference to HTTPDemon.setAlternativeResolver
optimize .yacyh check in AbstractRemoteHandler
2014-03-06 03:08:04 +01:00
Michael Peter Christen
51800007c4 - added concurrency to postprocessing of webgraph document
- bundled separate webgraph postprocessing steps into one
2014-03-06 01:43:48 +01:00
Michael Peter Christen
5f4a6892c1 enhanced RowSet re-sort limit for small sets 2014-03-05 23:28:19 +01:00
reger
351c2be68d fix: make sure adminAccount changes made via ConfigAccounts_p are effective immediately
force to remove current credentials from knownuser cache
2014-03-05 02:59:27 +01:00
reger
5c9dcc269d improve OAI-PMH import identifier recognition
- find best fitting identifier (url) by checking all given dc:identifier in record (many entries provide several identifiers)
  as identifier is currently a multivalued field use "getParams" in preference to splitting the 1st string by ";"
- add resolve DOI:... identifier via http://dx.doi.org/
2014-03-04 03:08:37 +01:00
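The identifier selection described in 5c9dcc269d could look roughly like the sketch below. It is not YaCy's `DCEntry` code: the class and method names are hypothetical, and the rule "first http(s) URL wins, otherwise rewrite the first DOI" is an assumed reading of "best fitting". Only the resolver prefix `http://dx.doi.org/` comes from the commit message.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not YaCy's DCEntry implementation.
class IdentifierPicker {

    // Resolver prefix as named in the commit message.
    static final String DOI_RESOLVER = "http://dx.doi.org/";

    // Returns the first http(s) URL among the dc:identifier values;
    // otherwise the first "DOI:..." value rewritten as a resolver URL;
    // otherwise null.
    static String bestIdentifier(List<String> identifiers) {
        String doiUrl = null;
        for (String raw : identifiers) {
            String id = raw.trim();
            String lower = id.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                return id; // a real URL wins immediately
            }
            if (doiUrl == null && lower.startsWith("doi:")) {
                // strip the 4-character "doi:" prefix and prepend the resolver
                doiUrl = DOI_RESOLVER + id.substring(4).trim();
            }
        }
        return doiUrl;
    }
}
```

Passing the values as a list mirrors the commit's point that the multivalued field should be read as such rather than split from one ";"-joined string.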
Michael Peter Christen
0e7d249a69 fixed another shutdown problem (only occurs if webgraph core is enabled) 2014-03-04 01:36:38 +01:00
Michael Peter Christen
e485fbd0ce - let crawl loader jobs die after 10 seconds without new jobs
- corrected shutdown order to prevent a deadlock during shutdown
2014-03-04 00:33:13 +01:00
Michael Peter Christen
bcd9dd9e1d enhanced concurrent loading by using a fixed set of concurrent loader
processes instead of throwaway processes. The control mechanism now
reports a 'queue full' message to the busy loop less often, so the loop
no longer performs a long busy wait; instead all requests are queued
and new loader processes are started if necessary up to a given limit
(as set before)
2014-03-03 22:13:40 +01:00
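The queueing scheme of bcd9dd9e1d can be illustrated with a small sketch. This is not YaCy's loader code: `LoaderPool`, its parameters, and the idle timeout are hypothetical, and the worker count is tracked with a plain `AtomicInteger` as one possible approach.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not YaCy's loader implementation.
class LoaderPool {
    private final BlockingQueue<Runnable> jobs = new LinkedBlockingQueue<>();
    private final AtomicInteger workers = new AtomicInteger();
    private final int maxWorkers;  // assumed upper limit of loader threads
    private final long idleMillis; // assumed idle time before a worker exits

    LoaderPool(int maxWorkers, long idleMillis) {
        this.maxWorkers = maxWorkers;
        this.idleMillis = idleMillis;
    }

    // Never rejects a request: the job is queued, and a worker is
    // added only while the limit has not been reached.
    void submit(Runnable job) {
        jobs.add(job);
        if (reserveSlot()) startWorker();
    }

    private boolean reserveSlot() {
        while (true) {
            int n = workers.get();
            if (n >= maxWorkers) return false;
            if (workers.compareAndSet(n, n + 1)) return true;
        }
    }

    private void startWorker() {
        Thread t = new Thread(this::workLoop, "loader-worker");
        t.setDaemon(true);
        t.start();
    }

    private void workLoop() {
        try {
            while (true) {
                Runnable job = jobs.poll(idleMillis, TimeUnit.MILLISECONDS);
                if (job != null) {
                    try { job.run(); } catch (RuntimeException e) { /* keep the worker alive */ }
                    continue;
                }
                workers.decrementAndGet(); // idle: give the slot back
                // A job may have arrived right after the timed-out poll.
                if (jobs.isEmpty() || !reserveSlot()) return;
            }
        } catch (InterruptedException e) {
            workers.decrementAndGet();
            Thread.currentThread().interrupt();
        }
    }
}
```

The sketch starts a worker on every `submit` until the limit is reached, which only approximates "started if necessary". Callers never see a 'queue full' condition, so no busy waiting is needed on their side.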
orbiter
051328271c bugfix-bugfix 2014-03-02 21:13:38 +01:00
orbiter
eedcbcd906 bugfix to proxy handler: recognize the own yacyh-host 2014-03-02 12:10:19 +01:00
orbiter
d68e5ad0c4 NPE fix for Thread name (just committed yesterday, sorry) 2014-03-02 11:20:48 +01:00
reger
6878c90f99 fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378)
requiring a following ":" for the fc and fd prefixes and made the pattern match case insensitive
- add some more ipv6 test cases to MultiProtocolURLTest.java
2014-03-02 06:13:21 +01:00
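To illustrate why the trailing ":" matters in 6878c90f99, here is a minimal sketch. The regular expression is an assumption and not the actual `INTRANET_PATTERNS` entry; it relies only on the fact that IPv6 unique local addresses (fc00::/7) start with a four-digit hex group `fcxx` or `fdxx` followed by a colon.

```java
import java.util.regex.Pattern;

// Illustrative only: not the pattern used in YaCy.
class UlaPattern {

    // Optional "[" (URL literal form), then fc/fd plus two hex digits,
    // then the mandatory ":" that keeps host names like "fdroid.org" out.
    static final Pattern ULA =
            Pattern.compile("^\\[?f[cd][0-9a-f]{2}:.*", Pattern.CASE_INSENSITIVE);

    static boolean isUniqueLocal(String host) {
        return ULA.matcher(host).matches();
    }
}
```

Without the colon requirement, a prefix test for "fc" or "fd" would also classify ordinary host names beginning with those letters as local addresses.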
reger
a2e5ea2026 status panel link to set max mem
+url proxy same error text as in transparent
2014-03-01 00:56:45 +01:00
Michael Peter Christen
6ed9c0164e attaching names to all Threads to get a better view in profiling tools
like VisualVM
2014-02-28 15:02:01 +01:00
Michael Peter Christen
fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
instead of TreeMaps)
- enhanced memory footprint of database indexes (by introduction of
optimize calls)
- optimize calls shrink the amount of used memory for index sets if they
are not changed afterwards any more
2014-02-28 14:01:09 +01:00
reger
ba49ff81ed little more verbose proxy 403 error message 2014-02-28 03:14:07 +01:00
Michael Peter Christen
d325cb8912 fixes and enhancements for postprocessing 2014-02-28 02:51:14 +01:00
Michael Peter Christen
7c1b968378 another fix for the shutdown exceptions 2014-02-28 01:53:32 +01:00
orbiter
133d41386c (again) full redesign of ConcurrentUpdateSolrConnector to remove
out-of-order transactions regarding add and delete operations. Now all
operations (add and delete) are executed concurrently in-order.
2014-02-28 00:19:30 +01:00
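One way to get add and delete operations that run off the caller's thread yet stay in submission order, as 133d41386c describes, is a single writer thread fed by one queue. The sketch below is not the `ConcurrentUpdateSolrConnector` code: the class is hypothetical and a `ConcurrentHashMap` stands in for the Solr index.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the map is a stand-in for the real index.
class InOrderUpdater {
    private final Map<String, String> index = new ConcurrentHashMap<>();

    // One writer thread: operations run asynchronously to the caller,
    // but strictly in the order they were submitted.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    void add(String id, String doc) {
        writer.execute(() -> index.put(id, doc));
    }

    void delete(String id) {
        writer.execute(() -> index.remove(id));
    }

    boolean contains(String id) {
        return index.containsKey(id);
    }

    // Stops accepting work and waits for queued operations to finish.
    // Returns true if the queue drained within the timeout.
    boolean close(long timeoutMillis) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because adds and deletes share one queue, a delete submitted after an add of the same id can never overtake it, which is the out-of-order case the commit removes.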
Michael Peter Christen
a632b0d2a4 added a forced commit to index deletion to enable synchronized index
updates
2014-02-27 12:50:40 +01:00
Michael Peter Christen
1d069c5861 make sure that postprocessed documents are overwritten 2014-02-27 12:27:15 +01:00
Michael Peter Christen
0d2342575e Merge branch 'master' of ssh://gitorious.org/yacy/rc1 2014-02-27 01:29:52 +01:00
Michael Peter Christen
3cc5c0ffdd a concurrency enhancement which was not used because tests showed worse
indexing speed. I leave the code there since it may be useful in
SolrCloud environments.
2014-02-27 01:27:06 +01:00
Michael Peter Christen
e644981697 added one more postprocessing low memory check 2014-02-27 00:34:13 +01:00
reger
5e645f4449 Merge origin/master 2014-02-27 00:24:30 +01:00
reger
3b89176b9f use config value htroot in Jetty init (was hardcoded)
- move htroot exist check from old httpdfilehandler to startup, remove from filehandler and legacy proxyhandler
- use SwitchboardConstant.htroot where appropriate
2014-02-27 00:23:34 +01:00
Michael Peter Christen
e1bf65c892 added short memory protection during postprocessing 2014-02-26 23:02:56 +01:00
Michael Peter Christen
90b47e83e6 fixed shutdown error when closing solr connectors 2014-02-26 22:47:16 +01:00
Michael Peter Christen
7640834b37 removed double concurrency to put Solr documents into the index. The
writings to the solr index are also buffered in
ConcurrentUpdateSolrConnector
2014-02-26 22:21:00 +01:00
Michael Peter Christen
0f6b72f24b do not use luke requests for remote solr servers if the result is
different from normal requests. This happens if the remote solr is
actually a solrCloud; in such cases the luke request returns only the
result of the single solr peer, not the whole cloud.
also done: some refactoring.
2014-02-26 14:30:48 +01:00
Michael Peter Christen
c57026e242 recover from OOM 2014-02-25 15:23:45 +01:00
Michael Peter Christen
907db8b7a6 fix for bad query shortcut hack 2014-02-25 15:19:04 +01:00
Michael Peter Christen
a2b66fe2eb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-25 14:37:39 +01:00
Michael Peter Christen
9f6be762a6 - better logging for postprocessing
- fixed collection bug in postprocessing
2014-02-25 14:37:30 +01:00
orbiter
da5d4128bf prevent npe 2014-02-25 03:26:20 +01:00
orbiter
a878c7982c prevent npe 2014-02-25 03:19:41 +01:00
orbiter
e4eb87d924 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-25 02:16:37 +01:00
orbiter
ced1a96f9c fixed error cache 2014-02-25 02:16:22 +01:00
reger
3ba81bd08a Merge origin/master 2014-02-25 00:24:10 +01:00
reger
4d896383db fix: use timeout = proxy.ClientTimeout in ProxyHandler
(was 10sec fix) see http://bugs.yacy.net/view.php?id=236
2014-02-25 00:23:06 +01:00
orbiter
cfb647db6e - introduced a miss cache in ConcurrentUpdateSolrConnector
- better usage of cache
- bugfix for postprocessing
2014-02-24 23:42:50 +01:00
orbiter
a87d8e4a8e changed caching of ConcurrentUpdateSolrConnector: it now also caches the
url along with the load date. While this takes much more memory, it
eliminates database lookups for getURL() requests, which happen equally
often. This speeds up remote solr configurations.
2014-02-24 22:59:58 +01:00
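The trade-off in a87d8e4a8e (more memory per entry, fewer lookups) can be sketched with a cache that stores the url next to the load date. Everything here is assumed for illustration: the class names, the LRU eviction, and the capacity bound are not taken from YaCy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a cache holding url and load date per document id.
class LoadCache {

    static final class Loaded {
        final String url;
        final long loadDate;
        Loaded(String url, long loadDate) {
            this.url = url;
            this.loadDate = loadDate;
        }
    }

    private final Map<String, Loaded> entries;

    LoadCache(final int capacity) {
        // access-ordered LinkedHashMap evicts the least recently used entry
        this.entries = new LinkedHashMap<String, Loaded>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Loaded> eldest) {
                return size() > capacity;
            }
        };
    }

    synchronized void put(String id, String url, long loadDate) {
        entries.put(id, new Loaded(url, loadDate));
    }

    // Answers getURL() from memory; null means "not cached".
    synchronized String getURL(String id) {
        Loaded e = entries.get(id);
        return e == null ? null : e.url;
    }

    // Returns -1 when the id is not cached.
    synchronized long getLoadDate(String id) {
        Loaded e = entries.get(id);
        return e == null ? -1L : e.loadDate;
    }
}
```

A cache hit now serves both the load-date check and the url request, so a remote index is only consulted when the id is not cached.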
orbiter
f6e441dd77 refactoring 2014-02-24 21:01:56 +01:00
orbiter
76c53faeb2 removed unused code (HostStat) 2014-02-24 20:51:43 +01:00
orbiter
d3a88eaecb introducing ConcurrentUpdateSolrServer for remote solr servers.
Scaling of write buffers and update queue size is made according to
assigned memory.
2014-02-24 20:26:02 +01:00
reger
809e976578 remove unused java imports from yacy.java 2014-02-24 05:19:40 +01:00
reger
a9b06f8719 add a -config command line parameter e.g. -config "port=9090" "port.ssl=8043"
- useful for remote installation to set any config file property
- multiple parameters can be set at once; on Windows enclose each parameter in double quotes
- special handling: "adminAccount=adminuser:adminpwd" sets the admin username and the md5-encoded admin password

- adjusted windows startbatch to allow command line parameter handling
- remove not needed classpath calculation from startYACY_debug.bat
2014-02-24 05:16:31 +01:00
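The key=value handling behind `-config "port=9090" "port.ssl=8043"` can be pictured with the following sketch. It is not the code in yacy.java: the class name is hypothetical, the rule that another `-option` ends the list is an assumption, and the special `adminAccount` handling is left out because the exact encoding is not described here.

```java
import java.util.Properties;

// Hypothetical sketch of parsing "-config key=value ..." arguments.
class ConfigArgs {

    // Collects every key=value token that follows "-config".
    static Properties parse(String[] args) {
        Properties props = new Properties();
        boolean inConfig = false;
        for (String arg : args) {
            if (arg.equals("-config")) {
                inConfig = true;
                continue;
            }
            if (arg.startsWith("-")) {
                inConfig = false; // assumed: another option ends the list
                continue;
            }
            if (!inConfig) continue;
            int eq = arg.indexOf('=');
            if (eq <= 0) continue; // no key present, ignore the token
            props.setProperty(arg.substring(0, eq).trim(),
                              arg.substring(eq + 1).trim());
        }
        return props;
    }
}
```

Splitting at the first "=" keeps values that themselves contain "=" intact.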
reger
0923b09216 fix: allow 4 character admin user name
(was min 5 char)
2014-02-24 00:01:11 +01:00
Michael Peter Christen
254a7ac66c fixed cleaning of index 2014-02-22 01:35:01 +01:00
Michael Peter Christen
28a7b42e6b removed warning "sun.misc.BASE64Encoder is internal proprietary API and
may be removed in a future release"
2014-02-22 00:52:49 +01:00
Michael Peter Christen
046f5a03cb one more SolrIndexSearcher bugfix 2014-02-21 23:48:56 +01:00
sixcooler
78c01b3eff fix for 'AlreadyClosedException: this IndexReader is closed' 2014-02-21 17:28:32 +01:00
Michael Peter Christen
1b5e3d523a better control over close-state of remote solr connections 2014-02-20 00:39:19 +01:00
Michael Peter Christen
1a364572a5 fix for
"org.apache.solr.core.SolrCore Too many close [count:-1] on
org.apache.solr.core.SolrCore@51af7c57"
-error
2014-02-20 00:03:35 +01:00
Michael Peter Christen
69391e5d9e changed strategy to test existence of documents in Solr: using the
update time. The reason for that is a better caching for the crawler
double-check, which needs the update time for crawler steering.
2014-02-19 04:03:45 +01:00
Michael Peter Christen
790f103f32 delete fail-docs during postprocessing to prevent them from appearing
again and staying in postprocessing forever.
2014-02-18 01:38:56 +01:00
Michael Peter Christen
ff656ce860 explicit call to optimize to add an expungeDeleted flag 2014-02-12 01:01:23 +01:00
Michael Peter Christen
9eb668e951 enhanced the resource observer
The resource observer is now able to recognize free disk space AND
available space for YaCy. The amount of space assigned to YaCy is
defined by new settings in the configuration file.
Furthermore, there is now a cleanup process which deletes files if
autodelete is activated. Autodelete is now ON BY DEFAULT if the disk
space is low, which means that YaCy starts to delete documents when
the disk is full!
2014-02-12 01:00:44 +01:00
Michael Peter Christen
fbee98c06f fixed shortcut self-reference bug 2014-02-11 22:14:46 +01:00
Michael Peter Christen
e7a29a2851 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-11 22:03:46 +01:00
Michael Peter Christen
bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
database and not needed any more
2014-02-11 22:01:25 +01:00
orbiter
14764632b5 clear solr caches in case that an exception occurs. The reason behind
this hack is the occurrence of Exceptions like:
W 2014/02/11 18:51:33 ConcurrentLog GC overhead limit exceeded
java.io.IOException: GC overhead limit exceeded
        at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:334)
        at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getDocumentById(MirrorSolrConnector.java:173)
        at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getDocumentById(ConcurrentUpdateSolrConnector.java:415)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:331)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:317)
        at net.yacy.search.query.SearchEvent.pullOneRWI(SearchEvent.java:1024)
        at net.yacy.search.query.SearchEvent.pullOneFilteredFromRWI(SearchEvent.java:1047)
        at net.yacy.search.query.SearchEvent$3.run(SearchEvent.java:1263)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3077)
        at java.lang.StringCoding.decode(StringCoding.java:196)
        at java.lang.String.<init>(String.java:491)
        at java.lang.String.<init>(String.java:547)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
        at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:657)
        at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.SolrQueryResponse2SolrDocumentList(EmbeddedSolrConnector.java:230)
        at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getDocumentListByParams(EmbeddedSolrConnector.java:320)
        at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:330)
        ... 7 more
        
This problem was analysed with the Eclipse Memory Analyser after a heap
dump, where the following problem was reported as the main Problem
Suspect:

One instance of "org.apache.solr.util.ConcurrentLRUCache" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x42e940a0" occupies 902.898.256
(61,80%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system
class loader>".

This memory is part of the result cache of Solr. Flushing this cache
appears the most appropriate solution to that problem.
2014-02-11 20:56:40 +01:00
Michael Peter Christen
bc28247089 Added methods in resource observer to calculate the available and the
occupied disc space. These values are also shown on the status page.
The disc space calculation shall be used for a disk-limitation of the
search index.
2014-02-11 03:20:03 +01:00
Michael Peter Christen
0dda979801 adapted network image drawing to increased number of peers 2014-02-11 00:53:10 +01:00
Michael Peter Christen
ca8b100f96 run the cleanup process even when load is high, do postprocessing even
if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB
RAM available). The memory consumption of postprocessing is what causes
systems to block: they run into a chain of frequent GCs which almost
locks the peer. With enough memory, postprocessing is fast and does not
harm the system.
Because the required 0.5 GB of RAM is never available in the default
setting, postprocessing will not run unless the peer is reconfigured
to use more memory.
2014-02-10 12:59:30 +01:00
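The gate described in ca8b100f96 combines a load limit with a memory requirement. A minimal sketch follows, with hypothetical names and with 0.5 GB assumed to mean 512 MiB. The "available" figure is computed from the standard `Runtime` counters, which may differ from how YaCy's MemoryControl measures it.

```java
// Hypothetical sketch of the postprocessing gate, not YaCy's code.
class PostprocessingGuard {

    // 0.5 GB from the commit message, assumed here to mean 512 MiB.
    static final long REQUIRED_BYTES = 512L * 1024 * 1024;

    // Load limit from the commit message: run only while load < 2.
    static final double MAX_LOAD = 2.0;

    // Memory the JVM could still hand out: the unused part of the current
    // heap plus the room the heap may still grow into.
    static long availableBytes(Runtime rt) {
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    static boolean mayRun(double load, long availableBytes) {
        return load < MAX_LOAD && availableBytes >= REQUIRED_BYTES;
    }
}
```

A caller would evaluate `PostprocessingGuard.mayRun(load, PostprocessingGuard.availableBytes(Runtime.getRuntime()))` before each postprocessing run. With a small default heap the memory condition fails, which matches the commit's remark that postprocessing stays off until the peer is given more memory.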
Michael Peter Christen
195e5868d3 catch solr close exceptions 2014-02-09 15:04:46 +01:00
Michael Peter Christen
751c128544 extra sleep for remote searches enhances search results because there is
more time for more remote peers to contribute on the first result page
2014-02-09 14:57:17 +01:00
Michael Peter Christen
0cabcbbe83 more efficient wordcount 2014-02-09 14:45:12 +01:00
Michael Peter Christen
3d474a843e added memory protection for postprocessing 2014-02-09 12:36:56 +01:00
Michael Peter Christen
412d55523c enhanced memory protection and OOM exception handling in Solr connector 2014-02-09 12:36:14 +01:00
Michael Peter Christen
d9858e1b8a removed warnings and superfluous logging 2014-02-09 12:26:58 +01:00
Michael Peter Christen
acc8d7faa7 fixed setting of shortMemoryStatus in MemoryControl 2014-02-09 12:25:55 +01:00
Michael Peter Christen
94245ce0a8 fixed "Size in KBytes" calculation in PerformanceQueues_p.html,
see http://bugs.yacy.net/view.php?id=362
2014-02-07 17:19:08 +01:00
Michael Peter Christen
726e8c3ad5 removed unused classes and servlets 2014-02-07 01:47:10 +01:00
Michael Peter Christen
6e59ca4ebf removed jena library and all code that depended on jena. When jena was
introduced, it was also used for search facets. The generic search
facets are now deduced from generic solr fields which makes jena as tool
for facet semantics superfluous.
2014-02-07 01:20:06 +01:00
Michael Peter Christen
9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table 2014-02-07 00:22:31 +01:00
Michael Peter Christen
e8bdf16ea7 added statistic information for solr resources in PerformanceMemory 2014-02-07 00:02:19 +01:00
Michael Peter Christen
931541d198 re-inserted default value re-set button to performance queues and
patched missing values for recent new queues
2014-02-06 22:39:19 +01:00
Michael Peter Christen
456e52e0d5 enhanced strategy to clear solr caches
- redesigned the instance mirror class (which was a mess)
- added final method to close a searcher (which otherwise keeps a cache)
- changed cache clear method which iterates over resources and calls
clear on all caches in the searcher resources
2014-02-06 19:13:29 +01:00
reger
bd1685c94a fix not needed getFileExtension().toLower (double)
add missing .getFileExtension
2014-02-05 03:45:02 +01:00
orbiter
a11f072504 enhanced didyoumean 2014-02-04 00:18:11 +01:00
Michael Peter Christen
c0e6a65ec3 enhanced didyoumean 2014-02-03 18:49:03 +01:00
Michael Peter Christen
6d2dab7b21 fixed 'resource leak' warning 2014-02-03 13:38:26 +01:00