11315 Commits

Author SHA1 Message Date
reger
7328c2883b fix typo in .ini description
http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:38:53 +02:00
reger
94819f0797 set .ini default boost fields to same as assigned by button "reset to default"
(in RankingSolr_p)
- fix typo http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:17:41 +02:00
reger
b4b937a046 update to pdfbox 1.8.6 2014-07-25 23:55:10 +02:00
orbiter
1027f3d04a fix for the usage of ready-prepared solr queries, some queries are
formulated as edismax query but this was not set as query attribute. The
defType=edismax property needs a qf-field, so this was added as well. Do
not remove that field again! This fixes also a problem with title-unique
computation.
2014-07-25 18:53:13 +02:00
Michael Peter Christen
f94c91315b if the webgraph is used, then use it also for reference computation to
avoid contradictions with references_i in the collection index.
2014-07-24 15:35:53 +02:00
Michael Peter Christen
6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
Michael Peter Christen
c63e93df46 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-24 00:04:56 +02:00
Michael Peter Christen
1bf605b6d1 toString() fix 2014-07-24 00:04:46 +02:00
orbiter
4b06adb751 fix for file urls 2014-07-23 17:54:31 +02:00
orbiter
08409ec680 no idea why the words max was an ordered one. This change increases speed
during document processing a bit
2014-07-23 17:54:16 +02:00
reger
dd311ddac9 Merge origin/master 2014-07-22 22:01:01 +02:00
reger
e5854a5cdb fix localhost link to opensearchdescription.xml 2014-07-22 21:57:38 +02:00
reger
29d1945c16 fix double &query parameter (index.html)
?query=word&query=
2014-07-22 21:54:46 +02:00
Marc Nause
172d7e68da Updated commandline reconfiguration tool.
*) fixed "set HTTP port" (root cause was sloppy implementation of method
which gets values from config file)
*) added "set HTTPS port"
2014-07-22 21:52:53 +02:00
Michael Peter Christen
b44626e55b fixed target_alt_t in webgraph 2014-07-22 18:24:10 +02:00
Michael Peter Christen
504327b15c fix for condition for writing the webgraph 2014-07-22 00:59:08 +02:00
Michael Peter Christen
542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
filled with the date, when the url is recognized as to be outdated. That
field was partly misinterpreted and the time interval was filled in. In
case that all the urls which are in the index shall be treated as
outdated, the field is filled now with Long.MAX_VALUE because then all
crawl dates are before that date and therefore outdated.
2014-07-22 00:23:17 +02:00
Michael Peter Christen
4eec1a7452 refactoring (change Metadata name of load time data structure to avoid
confusion with Node data which is also called metadata)
2014-07-21 23:54:23 +02:00
reger
c95ba52cf0 improve logexception info
- log a message or class name instead of msgtxt "null"
2014-07-21 22:13:34 +02:00
reger
7f0e757bb5 fix bookmark.rss
- channel end tag position
- link with html entity
2014-07-21 19:26:12 +02:00
orbiter
e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
toString(). This fixes also the update link bug.
2014-07-21 15:58:29 +02:00
reger
697b9743e7 Add link to RemoteCrawl_p
suggestion http://mantis.tokeek.de/view.php?id=277
2014-07-21 02:00:05 +02:00
reger
47f201a6b8 Add Solr default query fields (&qf) to select servlet
according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query).
This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration.
Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields
and does not rely on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
2014-07-21 00:47:14 +02:00
reger
f96cfdc84d prevent array out of bound exception on getRankingProfile(x)
on faulty &profileNr=  query parameter
2014-07-21 00:04:54 +02:00
Michael Peter Christen
970368359b Merge branch 'master' of ssh://gitorious.org/yacy/rc1 2014-07-20 22:35:40 +02:00
Michael Peter Christen
c4608469bf Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 2014-07-20 22:35:19 +02:00
reger
8004cfc961 fix input boostfield factor of 0.0 in RankingSolr
- input was accepted and stored but not editable (added check factor >0.0 during edit)
- make use of some more predefined solr constants
2014-07-20 12:28:59 +02:00
reger
5f5fb4ecdc remove unused static (RSS)search from protocol 2014-07-20 02:49:49 +02:00
reger
7c1706d83a use CRLF in generated bat command scripts for windows
- for easier viewing with standard viewers
2014-07-20 00:06:22 +02:00
reger
a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
- with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid)
- this allows the use of the opensearch heuristic on individual search requests (in contrast to configuration HEURISTIC_OPENSEARCH=true, which sends an osd request on all global searches)
- the index.html searchoption text adjusted to be displayed only if option configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
Michael Peter Christen
2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
2014-07-18 12:43:01 +02:00
Michael Peter Christen
bf1b6b93e7 do not write CR values to webgraph if no CR values are computed 2014-07-16 18:13:29 +02:00
Michael Peter Christen
e039e78210 small bugfixes 2014-07-16 16:04:38 +02:00
Michael Peter Christen
87f8118108 added option to delete documents from the webgraph 2014-07-16 16:04:19 +02:00
Michael Peter Christen
32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-16 14:58:27 +02:00
Michael Peter Christen
d07cdd8c3b added SolrCloud access mode and configuration 2014-07-16 14:57:51 +02:00
Michael Peter Christen
8514bffc22 enhanced postprocessing status report 2014-07-16 14:57:25 +02:00
malykhin.dmitry
53ecd54b45 Update Russian translation 2014-07-13 02:50:16 +04:00
reger
f99f3d5cf2 fix button (clear list) text color in CrawlResults 2014-07-13 00:48:50 +02:00
reger
b24572f304 fix GSA filter query assignment
- use more parameter constants
2014-07-13 00:11:17 +02:00
Michael Peter Christen
b5fc2b63ea removed exist() retrieval functions from error cache and replaced them
with metadata retrieval from connectors directly. This should cause
better usage of the cache. Automatically increase the metadata cache if
more memory is available.
2014-07-11 19:52:25 +02:00
Michael Peter Christen
62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
double-calling of solr
2014-07-11 18:36:04 +02:00
Michael Peter Christen
dd5cdfe212 reverted filter query hack, it did not work 2014-07-11 18:15:35 +02:00
Michael Peter Christen
b5d78ba156 reduced number of solr queries during crawling 2014-07-11 18:05:11 +02:00
Michael Peter Christen
5326970d6c enhanced solr queries for single document extraction 2014-07-11 18:04:55 +02:00
Michael Peter Christen
525575bd97 added debugging of filter queries in thread dump thread names 2014-07-11 17:34:41 +02:00
Michael Peter Christen
f319ef268f testing filter queries instead of queries to retrieve documents by id 2014-07-11 17:09:46 +02:00
Michael Peter Christen
fd87fa1613 removed more unnecessary exist-checks in ErrorCache 2014-07-11 16:48:08 +02:00
Michael Peter Christen
f2b476e08b don't do a double check to solr for failed documents if they are not
written to solr
2014-07-11 16:26:52 +02:00
Michael Peter Christen
06ab72d1af enhanced crawler host round-robin strategy 2014-07-11 16:01:42 +02:00