reger
7328c2883b
fix typo in .init description
http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:38:53 +02:00
reger
94819f0797
set .ini default boost fields to the same values as assigned by the "reset to default" button
(in RankingSolr_p)
- fix typo http://mantis.tokeek.de/view.php?id=430
2014-07-26 00:17:41 +02:00
reger
b4b937a046
update to pdfbox 1.8.6
2014-07-25 23:55:10 +02:00
orbiter
1027f3d04a
fix for the usage of ready-prepared solr queries: some queries are
formulated as edismax queries, but this was not set as a query attribute. The
defType=edismax property needs a qf field, so this was added as well. Do
not remove that field again! This also fixes a problem with the title-unique
computation.
2014-07-25 18:53:13 +02:00
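The dependency described above can be sketched as follows, assuming request parameters are held in a plain Map. `EdismaxParams`, `withEdismax` and the example field list are hypothetical illustrations, not YaCy's actual code; the point is only that setting `defType=edismax` always goes together with a `qf` value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not YaCy code: marks a prepared query as edismax and
// makes sure a qf (query fields) parameter travels with it.
final class EdismaxParams {

    static Map<String, String> withEdismax(Map<String, String> params, String defaultQf) {
        Map<String, String> result = new LinkedHashMap<>(params);
        result.put("defType", "edismax");
        // keep a qf supplied by the caller, otherwise add the default field list
        result.putIfAbsent("qf", defaultQf);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> query = new LinkedHashMap<>();
        query.put("q", "yacy");
        System.out.println(withEdismax(query, "title^5.0 text_t"));
    }
}
```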
Michael Peter Christen
f94c91315b
if the webgraph is used, then use it also for reference computation to
avoid contradictions with references_i in the collection index.
2014-07-24 15:35:53 +02:00
Michael Peter Christen
6e1dc444c3
added a snippet test function in ViewFile: you can now search for a
specific word in the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
Michael Peter Christen
c63e93df46
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-07-24 00:04:56 +02:00
Michael Peter Christen
1bf605b6d1
toString() fix
2014-07-24 00:04:46 +02:00
orbiter
4b06adb751
fix for file urls
2014-07-23 17:54:31 +02:00
orbiter
08409ec680
no idea why the words max was an ordered one. This change increases speed
during document processing a bit
2014-07-23 17:54:16 +02:00
reger
dd311ddac9
Merge origin/master
2014-07-22 22:01:01 +02:00
reger
e5854a5cdb
fix localhost link to opensearchdescription.xml
2014-07-22 21:57:38 +02:00
reger
29d1945c16
fix double &query parameter (index.html)
?query=word&query=
2014-07-22 21:54:46 +02:00
Marc Nause
172d7e68da
Updated commandline reconfiguration tool.
*) fixed "set HTTP port" (root cause was a sloppy implementation of the method
which reads values from the config file)
*) added "set HTTPS port"
2014-07-22 21:52:53 +02:00
Michael Peter Christen
b44626e55b
fixed target_alt_t in webgraph
2014-07-22 18:24:10 +02:00
Michael Peter Christen
504327b15c
fix for the condition under which the webgraph is written
2014-07-22 00:59:08 +02:00
Michael Peter Christen
542c20a597
changed handling of the crawl profile field crawlingIfOlder: it should be
filled with the date before which a url is treated as outdated. The
field was partly misinterpreted and filled with the time interval instead.
If all urls in the index shall be treated as outdated, the field is now
filled with Long.MAX_VALUE, because then all crawl dates lie before that
date and are therefore outdated.
2014-07-22 00:23:17 +02:00
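A minimal sketch of the semantics described above, assuming crawl dates and the threshold are epoch milliseconds. `RecrawlThreshold`, `thresholdFromInterval` and `isOutdated` are hypothetical names; the sketch only shows that the field stores a point in time rather than an interval, and that Long.MAX_VALUE marks every indexed url as outdated.

```java
// Hypothetical illustration of the crawlingIfOlder semantics, not YaCy code:
// the field holds a date (epoch millis); a document crawled before it is outdated.
final class RecrawlThreshold {

    // Convert a "re-crawl if older than" interval into the date that is stored.
    static long thresholdFromInterval(long nowMillis, long intervalMillis) {
        return nowMillis - intervalMillis;
    }

    static boolean isOutdated(long crawlDateMillis, long crawlingIfOlder) {
        return crawlDateMillis < crawlingIfOlder;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDay = 24L * 60 * 60 * 1000;
        long threshold = thresholdFromInterval(now, oneDay);
        // a document crawled two days ago is older than the one-day threshold
        System.out.println(isOutdated(now - 2 * oneDay, threshold));
        // Long.MAX_VALUE: every crawl date lies before it, so everything is outdated
        System.out.println(isOutdated(now, Long.MAX_VALUE));
    }
}
```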
Michael Peter Christen
4eec1a7452
refactoring (changed the Metadata name of the load-time data structure to
avoid confusion with Node data, which is also called metadata)
2014-07-21 23:54:23 +02:00
reger
c95ba52cf0
improve logexception info
- log a message or the class name instead of msgtxt "null"
2014-07-21 22:13:34 +02:00
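The fallback can be sketched like this; `LogText.describe` is a hypothetical helper, not YaCy's logging API. It uses the exception message when one is present and the exception class name otherwise, so the log never shows a bare "null".

```java
// Hypothetical helper, not YaCy code: pick a useful log text for an exception
// whose getMessage() may be null (e.g. exceptions created without a message).
final class LogText {

    static String describe(Throwable t) {
        String msg = t.getMessage();
        return (msg != null) ? msg : t.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new IllegalStateException("index closed")));
        System.out.println(describe(new UnsupportedOperationException()));
    }
}
```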
reger
7f0e757bb5
fix bookmark.rss
- channel end tag position
- link with html entity
2014-07-21 19:26:12 +02:00
orbiter
e441831a24
reverted the toString() change in AnchorURL to prevent mistaken use of
toString(). This also fixes the update link bug.
2014-07-21 15:58:29 +02:00
reger
697b9743e7
Add link to RemoteCrawl_p
suggestion http://mantis.tokeek.de/view.php?id=277
2014-07-21 02:00:05 +02:00
reger
47f201a6b8
Add Solr default query fields (&qf) to select servlet
according to the ranking-profile boost fields defined by the peer (if df/qf is not specified in the query).
This allows for pretty simple queries (q=word) without the need to know about the specific index configuration.
It makes sure all relevant fields (as determined by the index owner) are searched, while still maintaining the option to query specific fields,
and it does not rely on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
2014-07-21 00:47:14 +02:00
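A sketch of the fallback, assuming the boost fields are available as a field-to-factor map. `DefaultQueryFields` and `applyDefaultQf` are hypothetical and the field names are only examples; qf is added only when the request specifies neither df nor qf, so explicit field queries are left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not YaCy code: derive a Solr qf parameter
// ("field^boost field^boost ...") from boost fields when df and qf are absent.
final class DefaultQueryFields {

    static String toQf(Map<String, Float> boosts) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            qf.add(e.getKey() + "^" + e.getValue());
        }
        return qf.toString();
    }

    static Map<String, String> applyDefaultQf(Map<String, String> params, Map<String, Float> boosts) {
        Map<String, String> result = new LinkedHashMap<>(params);
        // only fill in qf if the caller did not choose fields explicitly
        if (!result.containsKey("df") && !result.containsKey("qf") && !boosts.isEmpty()) {
            result.put("qf", toQf(boosts));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 5.0f);
        boosts.put("text_t", 1.0f);
        boosts.put("author", 1.0f);
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "word");
        System.out.println(applyDefaultQf(params, boosts));
    }
}
```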
reger
f96cfdc84d
prevent an array index out of bounds exception in getRankingProfile(x)
caused by a faulty &profileNr= query parameter
2014-07-21 00:04:54 +02:00
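The guard can be sketched as below. `RankingProfiles.select`, the String[] of profiles and the fallback to profile 0 are assumptions for illustration; the commit does not say which profile YaCy falls back to.

```java
// Hypothetical guard, not YaCy code: turns a profileNr request parameter into a
// safe array index. Falling back to profile 0 is an assumption for illustration.
final class RankingProfiles {

    // profileCount is assumed to be at least 1
    static int select(String profileNrParam, int profileCount) {
        if (profileNrParam == null) return 0;
        int nr;
        try {
            nr = Integer.parseInt(profileNrParam.trim());
        } catch (NumberFormatException e) {
            return 0; // not a number at all
        }
        return (nr >= 0 && nr < profileCount) ? nr : 0; // out of range
    }

    public static void main(String[] args) {
        String[] profiles = {"default", "news", "images"};
        for (String param : new String[] {"1", "7", "-2", "abc", null}) {
            System.out.println(param + " -> " + profiles[select(param, profiles.length)]);
        }
    }
}
```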
Michael Peter Christen
970368359b
Merge branch 'master' of ssh://gitorious.org/yacy/rc1
2014-07-20 22:35:40 +02:00
Michael Peter Christen
c4608469bf
Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1
2014-07-20 22:35:19 +02:00
reger
8004cfc961
fix input boostfield factor of 0.0 in RankingSolr
- input was accepted and stored but was not editable (added check factor > 0.0 during edit)
- make use of some more predefined solr constants
2014-07-20 12:28:59 +02:00
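A sketch of the validation, assuming the factor arrives as a form string; `BoostFactor.parseOrNull` is hypothetical. Factors that are not strictly positive are rejected, mirroring the "> 0.0" check named above, so a value that could not be edited later is never stored.

```java
// Hypothetical input check, not YaCy code: only strictly positive, finite
// boost factors are accepted; anything else yields null (= not stored).
final class BoostFactor {

    static Float parseOrNull(String input) {
        if (input == null) return null;
        try {
            float f = Float.parseFloat(input.trim());
            // rejects 0.0, negative values, NaN and infinity
            return (f > 0.0f && !Float.isInfinite(f)) ? Float.valueOf(f) : null;
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String in : new String[] {"2.5", "0.0", "-1", "abc"}) {
            System.out.println(in + " -> " + parseOrNull(in));
        }
    }
}
```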
reger
5f5fb4ecdc
remove unused static (RSS)search from protocol
2014-07-20 02:49:49 +02:00
reger
7c1706d83a
use CRLF in generated bat command scripts for Windows
- for easier viewing with standard viewers
2014-07-20 00:06:22 +02:00
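Writing the script with explicit CRLF separators, independent of the platform's line separator, could look like this; `BatScript`, the temp file name and the commands are hypothetical examples rather than the scripts YaCy generates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not YaCy code: write a Windows .bat file with CRLF line
// endings even when the generating JVM runs on Linux or macOS.
final class BatScript {

    static String render(List<String> commands) {
        StringBuilder sb = new StringBuilder();
        for (String c : commands) {
            sb.append(c).append("\r\n"); // always CRLF, never System.lineSeparator()
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("example", ".bat");
        Files.write(out, render(List.of("@echo off", "echo hello")).getBytes(StandardCharsets.US_ASCII));
        System.out.println("written: " + out);
    }
}
```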
reger
a2cb366b25
Combine /heuristic search modifier with opensearch configured targets
- with the search modifier /heuristic, a request is sent to all configured opensearch target systems (the old /heuristic/blekko modifier is no longer valid)
- this allows using the opensearch heuristic on an individual search request (in contrast to the configuration HEURISTIC_OPENSEARCH=true, which sends an osd request on all global searches)
- the index.html search option text is adjusted to be displayed only if the option is configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
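Detecting and stripping the modifier can be sketched as follows. `HeuristicModifier` is hypothetical and only shows the token handling, not how YaCy dispatches the opensearch requests; because the modifier must be a whole token, the old `/heuristic/blekko` form is not recognized.

```java
// Hypothetical parser, not YaCy code: detects the /heuristic search modifier
// as a separate token in the query string and returns the query without it.
final class HeuristicModifier {

    static final String MODIFIER = "/heuristic";

    static boolean isRequested(String query) {
        for (String token : query.trim().split("\\s+")) {
            if (token.equals(MODIFIER)) return true;
        }
        return false;
    }

    static String stripModifier(String query) {
        StringBuilder sb = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (token.equals(MODIFIER)) continue;
            if (sb.length() > 0) sb.append(' ');
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = "solar power /heuristic";
        System.out.println(isRequested(q) + " " + stripModifier(q));
    }
}
```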
Michael Peter Christen
2de159719b
added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL object type, which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls; just use toNormalform(false) instead.
2014-07-18 12:43:01 +02:00
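The crawl-time filter can be sketched like this, assuming links arrive as (url, rel) pairs. `NofollowFilter` and `Link` are hypothetical types, not YaCy's AnchorURL; since rel holds a space-separated list of tokens, the check looks for a `nofollow` token case-insensitively instead of comparing the whole attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not YaCy code: drop links marked rel="nofollow"
// when the crawl profile says to obey nofollow.
final class NofollowFilter {

    static final class Link {
        final String url;
        final String rel; // may be null when the <a> tag has no rel attribute
        Link(String url, String rel) { this.url = url; this.rel = rel; }
    }

    static boolean isNofollow(String rel) {
        if (rel == null) return false;
        for (String token : rel.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals("nofollow")) return true;
        }
        return false;
    }

    static List<String> urlsToCrawl(List<Link> links, boolean obeyNofollow) {
        List<String> result = new ArrayList<>();
        for (Link l : links) {
            if (obeyNofollow && isNofollow(l.rel)) continue;
            result.add(l.url);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        links.add(new Link("http://a.example/", null));
        links.add(new Link("http://b.example/", "external nofollow"));
        System.out.println(urlsToCrawl(links, true));
    }
}
```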
Michael Peter Christen
bf1b6b93e7
do not write CR values to webgraph if no CR values are computed
2014-07-16 18:13:29 +02:00
Michael Peter Christen
e039e78210
small bugfixes
2014-07-16 16:04:38 +02:00
Michael Peter Christen
87f8118108
added option to delete documents from the webgraph
2014-07-16 16:04:19 +02:00
Michael Peter Christen
32a2ff925c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2014-07-16 14:58:27 +02:00
Michael Peter Christen
d07cdd8c3b
added SolrCloud access mode and configuration
2014-07-16 14:57:51 +02:00
Michael Peter Christen
8514bffc22
enhanced postprocessing status report
2014-07-16 14:57:25 +02:00
malykhin.dmitry
53ecd54b45
Update Russian translation
2014-07-13 02:50:16 +04:00
reger
f99f3d5cf2
fix button (clear list) text color in CrawlResults
2014-07-13 00:48:50 +02:00
reger
b24572f304
fix GSA filter query assignment
- use more parameter constants
2014-07-13 00:11:17 +02:00
Michael Peter Christen
b5fc2b63ea
removed exist() retrieval functions from the error cache and replaced them
with direct metadata retrieval from the connectors. This should make
better use of the cache. The metadata cache is now increased automatically
if more memory is available.
2014-07-11 19:52:25 +02:00
Michael Peter Christen
62c72360ee
cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
double-calling of solr
2014-07-11 18:36:04 +02:00
Michael Peter Christen
dd5cdfe212
reverted filter query hack, it did not work
2014-07-11 18:15:35 +02:00
Michael Peter Christen
b5d78ba156
reduced number of solr queries during crawling
2014-07-11 18:05:11 +02:00
Michael Peter Christen
5326970d6c
enhanced solr queries for single document extraction
2014-07-11 18:04:55 +02:00
Michael Peter Christen
525575bd97
added debugging of filter queries in thread dump thread names
2014-07-11 17:34:41 +02:00
Michael Peter Christen
f319ef268f
testing filter queries instead of queries to retrieve documents by id
2014-07-11 17:09:46 +02:00
Michael Peter Christen
fd87fa1613
removed more unnecessary exist-checks in ErrorCache
2014-07-11 16:48:08 +02:00
Michael Peter Christen
f2b476e08b
don't double-check solr for failed documents if they are not
written to solr
2014-07-11 16:26:52 +02:00
Michael Peter Christen
06ab72d1af
enhanced crawler host round-robin strategy
2014-07-11 16:01:42 +02:00
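The commit does not describe the enhanced strategy, so the following is only a generic host round-robin for illustration: urls are queued per host and handed out one host at a time, so no single host dominates the crawl. `HostRoundRobin` and its methods are hypothetical and are not YaCy's balancer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Generic round-robin over per-host queues (hypothetical, not YaCy code).
// Invariant: every host in hostOrder has a non-empty queue.
final class HostRoundRobin {

    private final Map<String, Deque<String>> queues = new HashMap<>();
    private final Deque<String> hostOrder = new ArrayDeque<>();

    void add(String host, String url) {
        Deque<String> q = queues.get(host);
        if (q == null) {
            q = new ArrayDeque<>();
            queues.put(host, q);
            hostOrder.addLast(host); // new host joins the end of the rotation
        }
        q.addLast(url);
    }

    // Returns the next url, rotating through hosts; null if nothing is queued.
    String next() {
        String host = hostOrder.pollFirst();
        if (host == null) return null;
        Deque<String> q = queues.get(host);
        String url = q.pollFirst();
        if (q.isEmpty()) {
            queues.remove(host); // host drops out until new urls arrive
        } else {
            hostOrder.addLast(host);
        }
        return url;
    }

    public static void main(String[] args) {
        HostRoundRobin rr = new HostRoundRobin();
        rr.add("a.example", "http://a.example/1");
        rr.add("a.example", "http://a.example/2");
        rr.add("b.example", "http://b.example/1");
        for (String url = rr.next(); url != null; url = rr.next()) {
            System.out.println(url);
        }
    }
}
```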