Commit Graph

7431 Commits

Author SHA1 Message Date
Michael Peter Christen
ad35d9294f added a 'stats' table which records some peer statistics twice every
hour. The table can be shown with
http://localhost:8090/Tables_p.html?table=stats

The entries have the following meaning: 
aM: activeLastMonth
aW: activeLastWeek
aD: activeLastDay
aH: activeLastHour
cC: countConnected (Active Senior)
cD: countDisconnected (Passive Senior)
cP: countPotential (Junior)
cR: count of the RWI entries
cI: size of the index (number of documents)

The entry keys are abbreviated to reduce the space in the table as the
name is written again for every row.

This is the beginning of a 'yacystats' micro-alternative as a built-in
function in YaCy. Graphics may follow after some time if enough test
data is available.
2014-09-17 12:54:50 +02:00
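
Note on the abbreviated keys in ad35d9294f: a minimal Python sketch of how the short keys could be expanded back to their long names when reading the table. Only the key-to-name mapping is taken from the commit message; the function name `expand_stats_row` and the sample numbers are hypothetical and not part of YaCy.

```python
# Mapping of abbreviated 'stats' table keys to their long names,
# copied from the commit message. Everything else is illustrative.
STATS_KEYS = {
    "aM": "activeLastMonth",
    "aW": "activeLastWeek",
    "aD": "activeLastDay",
    "aH": "activeLastHour",
    "cC": "countConnected (Active Senior)",
    "cD": "countDisconnected (Passive Senior)",
    "cP": "countPotential (Junior)",
    "cR": "count of the RWI entries",
    "cI": "size of the index (number of documents)",
}


def expand_stats_row(row):
    """Return a copy of `row` with abbreviated keys replaced by long names.

    Keys that are not in STATS_KEYS are kept unchanged.
    """
    return {STATS_KEYS.get(key, key): value for key, value in row.items()}


if __name__ == "__main__":
    sample = {"aH": 12, "cC": 340, "cI": 1500000}  # made-up numbers
    for name, value in expand_stats_row(sample).items():
        print(name, "=", value)
```

Keeping the short keys in storage and expanding them only for display matches the stated reason for abbreviating: the key name is written again for every row.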
reger
8284ea751a catch TimeoutException during ping and do not delete yacy.conf during prereadconfigfile
found a situation after crash (reboot) with existing running semaphore but YaCy not running.
Ping generated exception which finally deleted the conf file (during pre-read procedure)
- change to ping (catch exception solved it)
- additionally removed delete yacy.conf file (if needed we need to make a backup)
2014-09-16 23:14:13 +02:00
reger
ffa7c7116f better fix for NPE in image search
replace 8931e14514
2014-09-16 16:43:17 +02:00
Michael Peter Christen
759e7d9538 fix for http://forum.yacy-websuche.de/viewtopic.php?p=30720#p30720 2014-09-16 14:53:30 +02:00
Michael Peter Christen
bf18a39d0e replaced warning with info 2014-09-16 14:41:04 +02:00
Michael Peter Christen
f1032fb8fe more enhancements to image search in case that a restriction to a single
domain is done
2014-09-16 13:41:01 +02:00
Michael Peter Christen
475125f9d7 hack to get more results when doing a remote site search 2014-09-16 00:13:26 +02:00
Michael Peter Christen
81f9b34da7 increased ability to search for all images on a single server within
the p2p remote search
2014-09-15 20:33:22 +02:00
Michael Peter Christen
2c26013c50 better contentdom abstraction 2014-09-15 14:00:41 +02:00
Michael Peter Christen
6a8fb8190b changed default value for maximum number of connections to 50 2014-09-15 13:50:40 +02:00
Michael Peter Christen
ca8b2bf099 removed www and welcome servlet, these had been demo servlets and are
not needed any more
2014-09-15 12:48:58 +02:00
reger
03a7a29db3 limit OAI import urn resolver try for Deutsche National Library
The resolver service of National Library uses name space nbn, limit use of nbn-resolving.de accordingly to urn:nbn:
- add resolver for rfc's
2014-09-14 01:38:27 +02:00
Michael Peter Christen
0838326a76 changed error message, see http://mantis.tokeek.de/view.php?id=439 2014-09-13 17:02:26 +02:00
reger
b5e0f70197 - remove repositoryPath post from ConfigBasic (obsolete)
- remove static snippetComputationTime from ResultEntry (not used)
2014-09-13 03:21:52 +02:00
reger
8931e14514 fix NPE in image search 2014-09-13 00:27:39 +02:00
Michael Peter Christen
1735dbc9d9 enhanced image search: bugfixes and performance enhancements 2014-09-12 16:37:01 +02:00
Michael Peter Christen
ebd0be2cea fixes and speed updates for search process 2014-09-10 14:24:03 +02:00
Michael Peter Christen
7611bf79bd Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
Conflicts:
	locales/ru.lng
2014-09-10 13:24:49 +02:00
Michael Peter Christen
524bedc00a fixed text in startup tray icon and added shutdown icon during shutdown 2014-09-10 13:19:08 +02:00
Michael Peter Christen
4709d8417c npe fix for non-tray users 2014-09-08 10:26:28 +02:00
orbiter
5b5635e187 replaced font for boot tray icon with image and added some more images
for further tray icon displays
2014-09-08 00:21:29 +02:00
orbiter
aa6cdc4ab5 speed-up of start process if remote DNS waits for timeout 2014-09-07 12:28:19 +02:00
orbiter
40b3977c21 added an animation of the tray icon during the boot phase of YaCy.
Additionally, there is a tooltip and a new headline at the tray menu
which states the current booting status.
2014-09-07 12:04:35 +02:00
Michael Peter Christen
ec6082c872 very bad language detection hack fix hack 2014-09-05 23:29:09 +02:00
Michael Peter Christen
39615de3f9 adding the buffer size is not wrong but may cause confusing information
when the buffer is cleaned after a buffer flush which is not then
available in Solr since that is waiting for a commit. In such cases the
counter would run backwards which is prevented by ignoring the buffer
size.
2014-09-05 14:57:40 +02:00
Michael Peter Christen
395edec6f1 changed strategy to count the number of documents: get the max of
solr+buffer and the hit cache. This shall help during first crawls to
see a running document counter even if there was no commit meanwhile to
solr. To support that strategy, the hit cache must be written earlier.
2014-09-05 14:50:22 +02:00
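
Note on the counting strategy in 395edec6f1: a rough Python sketch of "the max of solr+buffer and the hit cache". The function and parameter names are hypothetical; the actual YaCy code is Java and may differ.

```python
def estimate_document_count(solr_count, buffer_size, hit_cache_size):
    """Estimate the number of indexed documents.

    solr_count     -- documents already committed to Solr
    buffer_size    -- documents waiting in the write buffer
    hit_cache_size -- entries in the hit cache, which is written before
                      the commit and can therefore run ahead of Solr
    """
    return max(solr_count + buffer_size, hit_cache_size)


if __name__ == "__main__":
    # During a first crawl nothing is committed yet, but the hit cache
    # already knows about 25 documents, so the counter still moves.
    print(estimate_document_count(0, 0, 25))
```

Passing `buffer_size=0` corresponds to the follow-up commit 39615de3f9, which ignores the buffer size so that the counter does not run backwards after a buffer flush.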
Michael Peter Christen
e87dc08c0d set the correct fail time in error docs 2014-09-05 14:46:11 +02:00
Michael Peter Christen
cfb20bc0ce removing the [] for ipv6 addresses may be a bad idea.. 2014-09-04 18:17:38 +02:00
orbiter
b6d57f06eb enhanced the apk parser (up to being production-ready).
The parser is not yet activated and will be after the next release step.
2014-09-04 09:41:42 +02:00
Michael Peter Christen
a7dd89c4de changed method to write the citation index: do not catch up references
during document parsing; instead use the same references that would also
be written into the webgraph. That should cause that the webgraph and
the citation index express the exact same semantic.
2014-09-02 13:22:12 +02:00
Michael Peter Christen
57ce7eeff3 fixed localhost authorization and replaced the adminRealm with an info
string which is visible in the browser. That makes it possible that the
browser instructs the user how to change a forgotten admin password
(during runtime).
2014-09-02 13:15:19 +02:00
orbiter
f318d7c285 enhanced date-ordered ranking 2014-09-01 13:01:30 +02:00
reger
a6891ff7f8 fix Querygoal.parse exception on +/-null-term
covers http://mantis.tokeek.de/view.php?id=452
2014-09-01 00:16:26 +02:00
reger
c7335318eb remove unused legacy procedure from httpserver
(deleted  generateSocketAddress(port) )
2014-08-31 00:33:05 +02:00
Michael Peter Christen
eab0d3e1a9 bugfix for wrong lock display, see
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5321&p=30484#p30484
2014-08-28 12:50:45 +02:00
orbiter
49d4f95faf bugfix to latest commit 2014-08-27 00:16:50 +02:00
orbiter
68211f8244 enable Crawler_p servlet if a rss feed or a wiki dump import was
submitted.
2014-08-27 00:15:31 +02:00
orbiter
a65df4ce7e do not push noindex errors into log if in intranet mode. noindex
attributes are attached to artificial constructed index.html files which
list directories. Such files are naturally rejected by the crawler and
should not appear in the error log because these files are part of the
construction of file crawlers and confuse users if they see them in the
error log.
2014-08-27 00:10:51 +02:00
orbiter
688c6d8954 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-08-27 00:04:36 +02:00
orbiter
4ae7aead28 addon to latest fix 2014-08-27 00:03:49 +02:00
Marc Nause
2af56fa37d Improved UPnP. (still not perfect)
*) set HTTPS port if enabled
*) improved data structures (may not be final)
*) moved UPnP to own package
2014-08-26 22:47:13 +02:00
orbiter
b3ebd38079 removed the HTDOCS repository concept because the concept to host files
on the YaCy http server is obsolete; YaCy can index file:// and smb://
paths
2014-08-26 19:02:53 +02:00
reger
1fdcc2d67b change seedfile upload ip check to allow intranet ip in intranet mode
- this allows to setup a principal peer in intranet environment
2014-08-25 01:25:22 +02:00
reger
e31b0e6d67 - update javadoc Seed.getIP
- default mySeed.ip to hostip in SeedDB.initMySeed() if Intranetmode
this allows to become senior status in intranet hosted search network with few peers,
otherwise peer would stay junior because of default init with loopback ip as public (dna) ip.
2014-08-24 21:13:36 +02:00
reger
350c6b8250 in IntranetMode allow intranet hosted seedlist with Network_Domain "any"
- so far intranet seedlist hosts are always denied but need to be allowed in intranet mode
2014-08-24 05:20:06 +02:00
orbiter
d68438c3d9 make sure that the postprocessing background thread never dies by any
exception
2014-08-23 10:35:38 +02:00
orbiter
b4f2a1db6e added an unlock icon for all protected pages that are unlocked because
the administrator is logged in.
2014-08-19 19:58:31 +02:00
reger
ea6c9e9b07 reduce mem buffer overhead for gap files during r/w
(they are typically small compared to idx allowing to use smaller buffersize -> set to 16k records)
2014-08-18 00:03:24 +02:00
reger
e88537522d allow single quote " ' " in query
see http://mantis.tokeek.de/view.php?id=379
-add QueryGoal test case for this
2014-08-16 14:29:52 +02:00
orbiter
487021fb0a snippet computation update 2014-08-15 01:17:11 +02:00
orbiter
1c2f1f233a Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-08-14 20:58:05 +02:00
reger
5a4995ded3 fill solr rss writer dc:subject tag with keyword content 2014-08-14 03:06:41 +02:00
orbiter
927aaa95a6 concurrency bugfix 2014-08-13 00:59:11 +02:00
orbiter
c9e593cf78 removed warnings 2014-08-11 23:53:12 +02:00
reger
7584352e7b use more predefined Solr query parameter constants
- use CommonParams and DisMaxParams constants
- fix typo in get sort parameter
- getDocumentCountByParams redundant implementation and risk of not optimized call (row parameter unspecified) -> as only used from getCountByQuery removed from interface
2014-08-10 22:33:10 +02:00
reger
f9db5dd6c5 reduce doublecontent check document (prevent out of memory)
see http://mantis.tokeek.de/view.php?id=437

test result (concurrency=7)
2000 docs = OOM always
1000 docs = OOM always
100 docs = OOM never

chosen -> 200 docs (OOM not encountered during test with 1GB mem setting)
2014-08-10 03:18:15 +02:00
reger
e9eae45b55 simplify rssreader and improve atom feed link extraction
- type detection (rss/atom) 
    - init type parameter overwritten during parse, parameter obsolete
    - detection by endtag changed to simpler first-tag evaluation
- channel image not used, removed related extra parser handling
    - remove unused code (set/getImage) in rssfeed
- atom link extraction to account for possible multiple link tags
   - spec limits link to one with rel="alternate" or one without rel attribute
     not accounting for the following type & hreflang exception yet:

   o  atom:entry elements MUST NOT contain more than one atom:link
      element with a rel attribute value of "alternate" that has the
      same combination of type and hreflang attribute values.
2014-08-10 01:29:16 +02:00
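
Note on the atom link rule in e9eae45b55: a small Python sketch of picking one link per entry when several atom:link tags are present, preferring a link with rel="alternate" or with no rel attribute. The function name `pick_entry_link` and the sample entry are hypothetical; like the commit, the sketch ignores the type/hreflang exception quoted above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Made-up sample entry with two link tags.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="http://example.org/feed/1.atom"/>
  <link rel="alternate" href="http://example.org/post/1"/>
</entry>"""


def pick_entry_link(entry):
    """Return the href of the first atom:link that has rel="alternate"
    or no rel attribute at all; return None if there is no such link."""
    for link in entry.findall(ATOM_NS + "link"):
        rel = link.get("rel")
        if rel is None or rel == "alternate":
            return link.get("href")
    return None


if __name__ == "__main__":
    entry = ET.fromstring(SAMPLE_ENTRY)
    print(pick_entry_link(entry))
```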
reger
a8508417d1 catch NPE during crawl (OAI import)
- condenseDocument mime=null (allowed)
- collectionconfiguration responseheader = null (allowed)
2014-08-08 00:02:59 +02:00
reger
3dde94422f center searchevent lines on network graph
(PerformanceSearch_p.html)
2014-08-06 23:04:42 +02:00
Michael Peter Christen
3860711aef fix for possible interruption of concurrent queries 2014-08-06 12:55:18 +02:00
Michael Peter Christen
6344718f8b reducing the concurrent query stack size and reduced concurrency of
postprocessing to avoid OOM situations
2014-08-06 12:36:59 +02:00
Michael Peter Christen
eca9380e3d bugfix for crawler double-check: if an url is redirected, the
redirect-target was not double-checked. This is now done by replacing
the redirect-URL on the crawl queue again (where it is double-checked)
2014-08-06 12:35:12 +02:00
Michael Peter Christen
9ac0c93f17 fix for subpath crawl filter 2014-08-06 01:33:24 +02:00
Michael Peter Christen
66106bdaf0 fix for crawler attribute maxdompages 2014-08-05 21:32:25 +02:00
Michael Peter Christen
49d91b94c3 npe fix in crawler 2014-08-05 21:31:59 +02:00
Michael Peter Christen
b7183a7321 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-05 09:54:18 +02:00
reger
ea2e627662 fix ConfigAccounts del user with uppercase letter in name
(usernames are case sensitive, userdb.delete used toLower)
2014-08-05 01:27:27 +02:00
Michael Peter Christen
c465b791af typo 2014-08-04 16:13:39 +02:00
Michael Peter Christen
191ec8c82a added concurrency to postprocess rewrite process 2014-08-04 15:28:58 +02:00
Michael Peter Christen
a1e8bdd5e9 log ppm instead of docs/second 2014-08-04 14:44:42 +02:00
Michael Peter Christen
cc0ded7abd set process type of web graph according to fields as defined in the
schema
2014-08-04 14:44:20 +02:00
Michael Peter Christen
12fb9d7cd1 log postprocessing constraints in case that postprocessing is not
performed
2014-08-04 14:19:37 +02:00
Michael Peter Christen
3c23b89823 less logging 2014-08-04 13:37:34 +02:00
Michael Peter Christen
a0c53174c5 better solr query logging to detect unnecessary sort requests for more
performance profiling
2014-08-04 13:00:45 +02:00
Michael Peter Christen
338f574bdc no sorting if http/www unique fields are not demanded (makes query
faster) and some code restructuring
2014-08-04 12:59:38 +02:00
Michael Peter Christen
1609763be5 toString fix 2014-08-04 12:58:39 +02:00
Michael Peter Christen
b983e68254 more retries, less sleep 2014-08-04 08:29:35 +02:00
Michael Peter Christen
1503ba7794 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-04 08:24:31 +02:00
reger
8f77719091 fix "Ljava.lang.String" in crawl queue anchor name
(e.g. IndexCreateQueues_p.html?stack=LOCAL with images in queue)
2014-08-04 02:38:58 +02:00
Michael Peter Christen
0ceeceb35e more logic on Solr queries; usage of the query terms in postprocessing,
saving one query for double document detection now per document
2014-08-04 02:35:38 +02:00
orbiter
38864ae004 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-08-03 22:44:49 +02:00
orbiter
4099296b45 added new classes which shall reduce call overhead to Solr (stub) 2014-08-03 22:44:22 +02:00
reger
d0c02e1de7 adjust rss lat/lon to double
(common format across other classes)
2014-08-03 20:09:23 +02:00
orbiter
3491ab4c38 removed unused images from webgraph edge computation 2014-08-01 13:21:16 +02:00
orbiter
2371d6b8db target linktexts must be string to enable search facets on these fields 2014-08-01 13:20:25 +02:00
Michael Peter Christen
001e05bb80 do not store failure of loading of robots.txt into the index as a fail
document
2014-08-01 12:15:14 +02:00
Michael Peter Christen
05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-01 12:04:25 +02:00
Michael Peter Christen
98f45c9032 fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
orbiter
22ce4fb4dd better error handling for remote solr queries and exists-checks 2014-08-01 11:00:10 +02:00
Marc Nause
9df14fc126 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-29 21:26:43 +02:00
Marc Nause
477be17c51 Replaced old UPNP library with Weupnp. UPNP should
work now, at least it does on my network. UPNP code in YaCy can still
be improved though (see TODO comment: make port on gateway configurable
or find free one).

*) removed old code
*) added new lib
*) changed code to work with new lib
2014-07-29 21:26:27 +02:00
orbiter
738989aab7 reverted commit f94c91315b because the
webgraph has not enough performance for that
2014-07-29 18:49:42 +02:00
orbiter
e9163e7e10 fix for malformed hostpath names in crawl balancer 2014-07-29 11:18:45 +02:00
Michael Peter Christen
c115f3869c enhanced snippet computation and test method in ViewFile 2014-07-28 15:42:57 +02:00
reger
6c10b59f3e move bootstrap peers test systems to its test class
var assignment not needed  elsewhere.
2014-07-27 04:13:07 +02:00
orbiter
1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
formulated as edismax query but this was not set as query attribute. The
defType=edismax property needs a qf-field, so this was added as well. Do
not remove that field again! This fixes also a problem with title-unique
computation.
2014-07-25 18:53:13 +02:00
Michael Peter Christen
f94c91315b if the webgraph is used, then use it also for reference computation to
avoid contradictions with references_i in the collection index.
2014-07-24 15:35:53 +02:00
Michael Peter Christen
6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
orbiter
4b06adb751 fix for file urls 2014-07-23 17:54:31 +02:00
orbiter
08409ec680 no idea why the words max was an ordered one. This change increases speed
during document processing a bit
2014-07-23 17:54:16 +02:00
reger
e5854a5cdb fix localhost link to opensearchdescription.xml 2014-07-22 21:57:38 +02:00
Michael Peter Christen
b44626e55b fixed target_alt_t in webgraph 2014-07-22 18:24:10 +02:00
Michael Peter Christen
504327b15c fix for condition for writing the webgraph 2014-07-22 00:59:08 +02:00
Michael Peter Christen
542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
filled with the date, when the url is recognized as to be outdated. That
field was partly misinterpreted and the time interval was filled in. In
case that all the urls which are in the index shall be treated as
outdated, the field is filled now with Long.MAX_VALUE because then all
crawl dates are before that date and therefore outdated.
2014-07-22 00:23:17 +02:00
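
Note on crawlingIfOlder in 542c20a597: a Python sketch of the described semantics, where the field holds a date and not a time interval. All names are hypothetical, and converting an interval to a date as "now minus interval" is an assumption about how the field is filled; `JAVA_LONG_MAX` only mirrors the value of Java's Long.MAX_VALUE.

```python
JAVA_LONG_MAX = 2**63 - 1  # numeric value of Java's Long.MAX_VALUE


def threshold_from_interval(now_ms, max_age_ms):
    """Turn a maximum age (interval) into the date to store in the field.

    Assumption: a document loaded before this date counts as outdated.
    """
    return now_ms - max_age_ms


def is_outdated(last_crawl_ms, crawling_if_older_ms):
    """A url is outdated if its crawl date lies before the stored date."""
    return last_crawl_ms < crawling_if_older_ms


if __name__ == "__main__":
    now = 1_000_000
    threshold = threshold_from_interval(now, 1_000)
    print(is_outdated(998_000, threshold))   # crawled before the threshold
    print(is_outdated(999_500, threshold))   # crawled after the threshold
    print(is_outdated(now, JAVA_LONG_MAX))   # every crawl date is before MAX
```

Storing the interval itself in the field would compare a crawl date against a small number, so almost nothing would ever be treated as outdated; that is the misinterpretation the commit describes.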
Michael Peter Christen
4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
confusion with Node data which is also called metadata)
2014-07-21 23:54:23 +02:00
reger
c95ba52cf0 improve logexception info
- log a message or class name instead of msgtxt "null"
2014-07-21 22:13:34 +02:00
orbiter
e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
toString(). This fixes also the update link bug.
2014-07-21 15:58:29 +02:00
reger
47f201a6b8 Add Solr default query fields (&qf) to select servlet
according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query).
This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration.
Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields
and does not rely on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
2014-07-21 00:47:14 +02:00
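
Note on the default query fields in 47f201a6b8: a Python sketch of adding a `qf` parameter from boost fields only when the request names neither `df` nor `qf`. The function name, the field `title` and the boost values are hypothetical; the `field^boost` list separated by spaces is the usual Solr `qf` syntax, which is evaluated by the dismax/edismax query parsers.

```python
def with_default_query_fields(params, boost_fields):
    """Return a copy of `params` with a default `qf` added.

    params       -- request parameters as a dict, e.g. {"q": "word"}
    boost_fields -- field name -> boost, e.g. taken from a ranking profile

    If the caller already set `qf` or `df`, the parameters are returned
    unchanged so that queries on specific fields remain possible.
    """
    result = dict(params)
    if "qf" in result or "df" in result:
        return result
    result["qf"] = " ".join(
        f"{field}^{boost}" for field, boost in boost_fields.items()
    )
    return result


if __name__ == "__main__":
    boosts = {"title": 5.0, "text_t": 1.0}  # made-up boost values
    print(with_default_query_fields({"q": "word"}, boosts))
    print(with_default_query_fields({"q": "word", "df": "text_t"}, boosts))
```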
reger
f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
on faulty &profileNr=  query parameter
2014-07-21 00:04:54 +02:00
reger
5f5fb4ecdc remove unused static (RSS)search from protocol 2014-07-20 02:49:49 +02:00
reger
7c1706d83a use CRLF in generated bat command scripts for windows
- for easier viewing with standard viewers
2014-07-20 00:06:22 +02:00
reger
a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
- with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid)
- this allows to use opensearch heuristic on individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends an osd request on all global searches)
- the index.html searchoption text adjusted to be displayed only if option configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
Michael Peter Christen
2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
2014-07-18 12:43:01 +02:00
Michael Peter Christen
bf1b6b93e7 do not write CR values to webgraph if no CR values are computed 2014-07-16 18:13:29 +02:00
Michael Peter Christen
e039e78210 small bugfixes 2014-07-16 16:04:38 +02:00
Michael Peter Christen
32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-16 14:58:27 +02:00
Michael Peter Christen
d07cdd8c3b added SolrCloud access mode and configuration 2014-07-16 14:57:51 +02:00
Michael Peter Christen
8514bffc22 enhanced postprocessing status report 2014-07-16 14:57:25 +02:00
reger
b24572f304 fix GSA filter query assignment
- use more parameter constants
2014-07-13 00:11:17 +02:00
Michael Peter Christen
b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
with metadata retrieval from connectors directly. This should cause
better usage of the cache. Automatically increase the metadata cache if
more memory is available.
2014-07-11 19:52:25 +02:00
Michael Peter Christen
62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
double-calling of solr
2014-07-11 18:36:04 +02:00
Michael Peter Christen
dd5cdfe212 reverted filter query hack, it did not work 2014-07-11 18:15:35 +02:00
Michael Peter Christen
b5d78ba156 reduced number of solr queries during crawling 2014-07-11 18:05:11 +02:00
Michael Peter Christen
5326970d6c enhanced solr queries for single document extraction 2014-07-11 18:04:55 +02:00
Michael Peter Christen
525575bd97 added debugging of filter queries in thread dump thread names 2014-07-11 17:34:41 +02:00
Michael Peter Christen
f319ef268f testing filter queries instead of queries to retrieve documents by id 2014-07-11 17:09:46 +02:00
Michael Peter Christen
fd87fa1613 removed more unnecessary exist-checks in ErrorCache 2014-07-11 16:48:08 +02:00
Michael Peter Christen
f2b476e08b don't do a double check to solr for failed documents if they are not
written to solr
2014-07-11 16:26:52 +02:00
Michael Peter Christen
06ab72d1af enhanced crawler host round-robin strategy 2014-07-11 16:01:42 +02:00
orbiter
dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-07-11 04:04:34 +02:00
orbiter
51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
'cloud' was a bad idea. Changed also the accumulation process for peer
targets so that every dht chunk is not assigned the set of redundant
targets but they are assigned to redundant targets individually. This
enhances the granularity of the target accumulation and should enhance
the efficiency of the process. Finally the dht protocol client was
enriched with the ability to remove the 'accept remote index' flag from
peers or remove peers completely if they do not answer at all.
2014-07-11 04:04:09 +02:00
Michael Peter Christen
a694b6a8fc another fix for unique field computation 2014-07-10 17:25:33 +02:00
Michael Peter Christen
fb3dd56b02 fix for processing of noindex flag in http header 2014-07-10 17:13:35 +02:00
Michael Peter Christen
b0d941626f fixed bugs in canonical, robots and title/description unique calculation 2014-07-10 15:40:38 +02:00
reger
d9472d043a cleanup older unused classes 2014-07-10 02:20:01 +02:00
reger
665e12f88e move startup time from old serverCore to switchboard (most used here)
to make servercore eventually obsolete.
2014-07-10 02:17:56 +02:00
reger
336425912a remove unused localSearchThread from SearchEvent 2014-07-10 02:14:03 +02:00
reger
32bd2a61c1 add local ip to AbstractRemoteHandler local hostname cache 2014-07-10 02:09:26 +02:00
Michael Peter Christen
f3a6b6e21e fix for bad URL decoding 2014-07-10 01:59:29 +02:00
Michael Peter Christen
1092e798a5 fixed double content postprocessing 2014-07-07 19:15:11 +02:00
Michael Peter Christen
aee5b108e5 added linkScraperParser, a parser which ignores the text like the
generic parser but extracts links like the htmlParser. This should be
used for ASCII documents without known text format annotation like
source code files or json documents. Probably also good for xml files
without known schema.
2014-07-07 13:37:17 +02:00
reger
2b8cc5832c fix seek error for 0 file size records file
by add extra check for file size = 0 in cleanlast()
- (http://mantis.tokeek.de/view.php?id=411)
2014-07-06 20:49:01 +02:00
reger
2ba394333f fix Crawler HostQueue release of stackfile
- close stackfile inputstream at end of ChunkIterator
This should solve startup delay while unfinished crawl jobs exist (maybe also too many open file situation)
2014-07-06 16:04:30 +02:00
reger
40133ba2d0 fix NPE in Condenser,
discovered by calling IndexControlRWI, "Word Deletion" with "for every resolvable and deleted URL reference"
2014-07-06 13:24:36 +02:00
orbiter
59160984cc timeline performance update 2014-07-03 13:06:29 +02:00
orbiter
54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-07-02 23:23:34 +02:00
Michael Peter Christen
841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-02 14:35:02 +02:00
Michael Peter Christen
e09218129c remove check for local solr. This check was made during a time when Solr
was optional and another alternative metadata store was available. Since
that store is now removed, Solr is always available (internally or
externally)
2014-07-02 14:34:48 +02:00
orbiter
2073e69034 fix for long periods in timeline 2014-07-02 11:29:50 +02:00
reger
1f94df29e7 fix NPE in solr rss where snippet contains only the title text
and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping
(still open item description may be double as dc: tag and rss.description tag)
2014-07-01 23:24:26 +02:00