Commit Graph

9850 Commits

Author SHA1 Message Date
orbiter
f425b2c61c re-try to fetch url after a soft commit 2013-07-27 10:56:02 +02:00
orbiter
b8f57f7703 don't be noisy when doing background tasks that may be allowed to fail 2013-07-27 10:51:58 +02:00
orbiter
bf0ad04e1b apply load limitation also to dht-in 2013-07-27 10:42:38 +02:00
Roland Haeder
0343f0668c Fix for NPE:
E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'net.yacy.search.Switchboard.cleanupJob': null; target exception: null
java.lang.NullPointerException
        at net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116)
        at net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897)
        at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107)
        at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)

Conflicts:
	source/net/yacy/search/schema/CollectionConfiguration.java
2013-07-27 10:19:46 +02:00
Roland Haeder
b58ca8622d Some cleanups:
- added SKINS_PATH_DEFAULT, analogous to LISTS_PATH_DEFAULT
- Added 'final' keyword to a string
2013-07-27 10:13:57 +02:00
Roland Haeder
e2ee412160 Use SwitchboardConstants.LISTS_PATH_DEFAULT instead of 'DATA/LISTS'
Conflicts:
	htroot/api/blacklists_p.java
2013-07-27 10:12:58 +02:00
Roland Haeder
ae19401af0 Removed another duplicate occurrence of Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:59:09 +02:00
Roland Haeder
59225487ea Fix for blacklist export, also applied the filename filter here 2013-07-27 09:58:56 +02:00
Roland Haeder
952fc0e7bd Removed superfluous check for files ending in '.black', as the previous commit already excluded all other files (e.g. .ser dumps); added logging in catch-all block 2013-07-27 09:58:38 +02:00
Roland Haeder
060fec1577 Reuse Blacklist.BLACKLIST_FILENAME_FILTER 2013-07-27 09:57:50 +02:00
Roland Haeder
29049c71f5 Possible fix for ticket http://bugs.yacy.net/view.php?id=270: the filter that includes only *.black files must be applied 2013-07-27 09:57:07 +02:00
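The *.black filename filter mentioned in the blacklist entries above can be sketched as follows. This is a hypothetical stand-in, assuming the filter simply accepts file names ending in ".black"; the actual Blacklist.BLACKLIST_FILENAME_FILTER is not shown in this log.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical stand-in for Blacklist.BLACKLIST_FILENAME_FILTER (assumption: suffix match only).
// Accepting only "*.black" means serialized dumps such as "*.ser" are skipped
// when blacklist files are listed or exported.
class BlackFilterSketch {
    static final FilenameFilter BLACK_FILTER = new FilenameFilter() {
        @Override
        public boolean accept(final File dir, final String name) {
            return name.endsWith(".black");
        }
    };

    public static void main(final String[] args) {
        final File dir = new File(args.length > 0 ? args[0] : ".");
        final String[] names = dir.list(BLACK_FILTER); // null if dir is not a readable directory
        if (names != null) {
            for (final String name : names) {
                System.out.println(name);
            }
        }
    }
}
```

Defining the filter once as a shared constant is what lets the listing and the export code reuse the same rule instead of repeating the suffix check.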
Roland Haeder
7263bb82fb Fix for NPE on shutdown:
java.lang.NullPointerException
        at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732)
        at net.yacy.search.Switchboard.access00(Switchboard.java:207)
        at net.yacy.search.Switchboard.run(Switchboard.java:3049)
2013-07-27 09:55:43 +02:00
Roland Haeder
13433d41a1 Log this exception better
Conflicts:
	source/net/yacy/kelondro/blob/Tables.java
2013-07-27 09:54:51 +02:00
orbiter
080d80c9de do not write an empty failreason when there is no failure. Because
of the lazy instantiation rule this value was not actually written, but
if lazy instantiation is switched on, this causes every crawl start to
delete all crawl-start-hosts completely, because the deletion looks for
filled error reasons.
2013-07-26 17:53:28 +02:00
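A minimal sketch of the "no empty failreason" rule, assuming a document represented as a plain Map. The field name failreason_t is borrowed from another entry in this log; the Map-based document and the helper name putFailReason are hypothetical, not YaCy's actual schema code.

```java
import java.util.Map;

// Sketch only: a Map stands in for the index document, putFailReason is a hypothetical helper.
class FailReasonSketch {
    static void putFailReason(final Map<String, Object> doc, final String failReason) {
        // Only store a reason when there actually is one; an empty string would
        // still count as a "filled" field for code that looks for fail reasons.
        if (failReason != null && !failReason.isEmpty()) {
            doc.put("failreason_t", failReason);
        }
    }
}
```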
Michael Peter Christen
4c242f9af9 always use a default value for boolean options so that the outcome is
transparent when the attribute is missing in servlets
2013-07-25 12:17:29 +02:00
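Reading a boolean servlet option with an explicit default might look like the sketch below. The Map of request parameters and the helper getBoolean are hypothetical stand-ins. Treating "on" (the value HTML checkboxes send) and "1" as true is also an assumption of this sketch.

```java
import java.util.Map;

// Hypothetical helper: every boolean lookup names its default, so a missing
// attribute always yields a predictable, visible outcome.
class BooleanOptionSketch {
    static boolean getBoolean(final Map<String, String> post, final String key, final boolean dflt) {
        final String value = post.get(key);
        if (value == null) {
            return dflt; // attribute missing: fall back to the stated default
        }
        // assumption: accept common HTML form encodings of "true"
        return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("on") || value.equals("1");
    }
}
```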
Michael Peter Christen
61e015268b fix in forced deletion: forced commit needed 2013-07-25 09:53:19 +02:00
Michael Peter Christen
83e2921b39 new test case for http://bugs.yacy.net/view.php?id=141 2013-07-25 09:31:48 +02:00
Michael Peter Christen
304aacb2cc fix for http://bugs.yacy.net/view.php?id=267 2013-07-25 09:26:24 +02:00
Michael Peter Christen
c3b2301b2f fix for http://bugs.yacy.net/view.php?id=268 2013-07-25 09:21:37 +02:00
reger
aa1a1f1d2c - small adjustment to make sure genericParser is tried last
-- for some documents (like .ps, .rdf and other files), genericParser grabbed the document instead of the available specific parser, because the first parser to try was picked in no particular order
- remove redundant file extension registration
2013-07-23 20:24:13 +02:00
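Making sure a catch-all parser is tried last can be sketched with a stable sort. In this sketch parsers are represented by name only; the name genericParser comes from the log entry, while the list-based representation is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: order candidate parsers so the generic fallback comes last.
class ParserOrderSketch {
    static List<String> orderCandidates(final List<String> candidates) {
        final List<String> ordered = new ArrayList<>(candidates);
        // Boolean keys sort false before true, and List.sort is stable, so the
        // specific parsers keep their relative order and genericParser moves to the end.
        ordered.sort(Comparator.comparing((String name) -> name.equals("genericParser")));
        return ordered;
    }
}
```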
orbiter
3e901dcb06 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-23 19:33:07 +02:00
orbiter
f50b596e0b do not run dht distribution if system load is over 2.5 2013-07-23 19:32:32 +02:00
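A load gate like the one described can be sketched with the JDK's OperatingSystemMXBean.getSystemLoadAverage(), which returns a negative value when the platform cannot report it. Treating that case as "not overloaded", and the class and method names, are assumptions of this sketch.

```java
import java.lang.management.ManagementFactory;

// Sketch: skip a DHT distribution cycle while the 1-minute load average exceeds 2.5.
class LoadGateSketch {
    static final double MAX_LOAD_FOR_DHT = 2.5;

    static boolean mayDistribute(final double loadAverage) {
        // negative = load average unavailable; assumption: do not block distribution then
        return loadAverage < 0 || loadAverage <= MAX_LOAD_FOR_DHT;
    }

    static boolean mayDistributeNow() {
        return mayDistribute(ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
    }
}
```

Keeping the threshold check separate from the MXBean call makes the rule testable without depending on the current machine's load.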
orbiter
9c681cc00d added segment sizes, postprocessing status and cpu load to crawler
monitor
2013-07-23 19:10:11 +02:00
orbiter
86b514cf46 added load info to status_p.xml 2013-07-23 18:20:07 +02:00
orbiter
056b42f5aa - added information about segment count to status_p.xml
- also moved this information from the old index structure, which is
still in use for the RWI/DHT index, to that front-end
2013-07-23 18:03:33 +02:00
orbiter
6fb2811e68 fixes for problems with remote solr and non-activated webgraph index 2013-07-23 16:46:44 +02:00
sixcooler
af740f3058 changed optimization to a segment-size of index-size/5,000,000
+ one if not idle
+ one (and force) if postprocessing
2013-07-23 14:21:12 +02:00
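One reading of the rule above is a target maximum segment count for the optimize call: one segment per 5,000,000 documents, plus one if not idle, plus one if postprocessing (which also forces the optimization). That interpretation, the class name and the lower bound of 1 are assumptions of this sketch; the lower bound is added because Lucene's forceMerge requires at least one segment.

```java
// Sketch of a maximum-segment-count computation under the interpretation stated above.
class OptimizeTargetSketch {
    static int maxSegments(final long numDocs, final boolean idle, final boolean postprocessing) {
        int target = (int) (numDocs / 5_000_000L); // one segment per 5,000,000 documents
        if (!idle) target++;                       // + one if not idle
        if (postprocessing) target++;              // + one if postprocessing
        return Math.max(1, target);                // assumption: never ask for zero segments
    }

    static boolean forceOptimize(final boolean postprocessing) {
        return postprocessing; // "(and force) if postprocessing"
    }
}
```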
Michael Peter Christen
336f86394c replaced StringBuffer with StringBuilder 2013-07-23 12:21:27 +02:00
Michael Peter Christen
aeac2fb763 replaced more containsKey()-then-get() usages with a single get(), followed
by a test for null. This should increase the application speed and
reduces the lookup time for the affected methods by 50% (one hash lookup instead of two)
2013-07-23 12:16:51 +02:00
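The pattern change can be illustrated with a before/after sketch (method names are illustrative). One caveat: the single-get() form is only equivalent when the map never stores null values. ConcurrentHashMap guarantees that; for other maps it is an assumption.

```java
import java.util.Map;

// Before/after sketch of replacing containsKey() + get() with a single get().
class LookupSketch {
    // before: two hash lookups when the key is present
    static int lengthTwoLookups(final Map<String, String> map, final String key) {
        if (!map.containsKey(key)) return -1;
        return map.get(key).length();
    }

    // after: one hash lookup, then a null test
    static int lengthOneLookup(final Map<String, String> map, final String key) {
        final String value = map.get(key);
        if (value == null) return -1;
        return value.length();
    }
}
```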
orbiter
5364c4dcc9 delayed the first peer ping so that it is sent only after the HTTP server
is up; if the ping arrives before HTTP is up, the peer cannot be recognized as a
senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266
2013-07-22 18:21:37 +02:00
orbiter
e24016e30a added the property federated.service.solr.indexing.timeout to yacy.init
to provide a configurable time-out for solr; see also:
http://bugs.yacy.net/view.php?id=254
2013-07-22 17:45:12 +02:00
orbiter
c124037f19 removed forced non-soft commits to prevent index fragmentation 2013-07-22 17:28:20 +02:00
Michael Peter Christen
31483c47e1 fixed problem with remote luke requests 2013-07-22 15:55:20 +02:00
Michael Peter Christen
c15aa758dc removed the failreason_t removal patch because it causes too much
confusion when using an external Solr. To clean up the index after a schema
change, use the index cleaner function from the online servlet
2013-07-22 14:17:38 +02:00
reger
2b7a38640a extend content type detection on file extension for .tif .tiff .htm 2013-07-21 22:57:21 +02:00
Michael Peter Christen
ac1aad5064 added a getSegmentCount method and used it to disable optimize if the
current segment count is already below the wanted optimization level
2013-07-18 14:31:42 +02:00
Michael Peter Christen
36035e0a0a - used reger's LukeRequest to generalize the index info in
SolrServerConnector
- used the LukeRequest in SolrServerConnector to replace the index size
method with a getNumDocs call on the LukeRequest result
2013-07-18 13:26:07 +02:00
Michael Peter Christen
39fceb5ccf fix for NPE & bug #264 2013-07-18 12:37:32 +02:00
Michael Peter Christen
735a66eff3 enhancements to crawler 2013-07-18 12:29:04 +02:00
orbiter
232100301c removed double-occurring value assignments 2013-07-17 19:09:25 +02:00
Roland Haeder
be0ff6018f Removed trailing spaces + some more final 2013-07-17 18:44:24 +02:00
Roland Haeder
aaedc0405d Fixes, and avoid catching bad exceptions (some):
- Rewrote usage of HashMap/Map to concurrent versions (to avoid a
ConcurrentModificationException, CME)
- Rewrote ConnectionInfo (as an example) to use a synchronized iterator
instead of synchronizing an already synchronized HashSet (see Collections call)
- This avoids catching CMEs again
- Commented out noisy ConcurrentLog.logException() call

Conflicts:
	source/net/yacy/repository/LoaderDispatcher.java
2013-07-17 18:37:34 +02:00
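The two concurrency patterns named in this entry can be sketched as follows; the class and field names are hypothetical, and the exact change made to ConnectionInfo is not shown in this log. The sketch follows the documented JDK behaviour. ConcurrentHashMap iterators are weakly consistent and never throw ConcurrentModificationException. A set returned by Collections.synchronizedSet makes single calls thread-safe, but iteration must still hold the set's own lock.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names; illustrates the documented JDK patterns only.
class ConcurrencySketch {
    // Safe to update from one thread while another iterates over it.
    static final Map<String, Integer> counters = new ConcurrentHashMap<>();

    // Single calls (add, remove, contains) are synchronized by the wrapper.
    static final Set<String> connections = Collections.synchronizedSet(new HashSet<String>());

    static void increment(final String key) {
        counters.merge(key, 1, Integer::sum); // atomic read-modify-write
    }

    static int countMatching(final String prefix) {
        int n = 0;
        // Iteration spans many calls, so lock the wrapper for the whole traversal.
        synchronized (connections) {
            final Iterator<String> it = connections.iterator();
            while (it.hasNext()) {
                if (it.next().startsWith(prefix)) n++;
            }
        }
        return n;
    }
}
```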
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Roland Haeder
98e10f95e2 Added some cora package loggers 2013-07-17 18:28:10 +02:00
Roland Haeder
553f83a14e Recommended cleanup (please, one day, execute this cleanup) 2013-07-17 18:26:50 +02:00
Felix Ableitner
03044589dd Fixed (?i) appearing in entries; fixed duplicate lines in file. 2013-07-17 16:42:10 +02:00
Felix Ableitner
376f9cd9d0 Merge branch 'master' of git://gitorious.org/yacy/rc1 into blacklist_structure 2013-07-17 15:58:09 +02:00
Michael Peter Christen
89c0aa0e74 added collection_sxt to error documents 2013-07-17 15:20:56 +02:00
Michael Peter Christen
0df5195cb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-07-17 12:42:06 +02:00
Michael Peter Christen
1fd006cc56 fixes using the embedded connector 2013-07-17 12:41:54 +02:00