Commit Graph

4912 Commits

Author SHA1 Message Date
reger
0a2f4a0e2f eliminate lat/lon type conversion in osm
(define as double)
2014-08-10 22:35:25 +02:00
Michael Peter Christen
01bbb20666 increased default logging line count to max 2014-08-06 12:40:35 +02:00
Michael Peter Christen
9bc3e457dd fix for termination of all crawls 2014-08-05 22:23:52 +02:00
Michael Peter Christen
8d650ca225 added hint to port forwarding videos 2014-08-05 21:31:28 +02:00
reger
3963bca3b6 catch IndexControlRWIs_p error if RWI not connected 2014-08-04 00:03:42 +02:00
orbiter
2371d6b8db target linktexts must be string to enable search facets on these fields 2014-08-01 13:20:25 +02:00
Michael Peter Christen
05d58e4df0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-08-01 12:04:25 +02:00
Michael Peter Christen
98f45c9032 fix for image alt attachment to AnchorURLs in html parser. 2014-08-01 12:04:15 +02:00
orbiter
22ce4fb4dd better error handling for remote solr queries and exists-checks 2014-08-01 11:00:10 +02:00
orbiter
161a11070c yacystats is gone :( 2014-07-29 11:12:01 +02:00
Michael Peter Christen
c115f3869c enhanced snippet computation and test method in ViewFile 2014-07-28 15:42:57 +02:00
Michael Peter Christen
6e1dc444c3 added a snippet test function in ViewFile: you can now search for a
specific word on the document; the servlet returns the snippet in the
same way as it would be shown in a search result.
2014-07-24 14:59:37 +02:00
reger
29d1945c16 fix double &query parameter (index.html)
?query=word&query=
2014-07-22 21:54:46 +02:00
Michael Peter Christen
542c20a597 changed handling of crawl profile field crawlingIfOlder: this should be
filled with the date before which a url is considered outdated. That
field was partly misinterpreted and the time interval was filled in. In
case that all the urls which are in the index shall be treated as
outdated, the field is filled now with Long.MAX_VALUE because then all
crawl dates are before that date and therefore outdated.
2014-07-22 00:23:17 +02:00
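
The date-versus-interval distinction in the commit above can be sketched as follows. This is a minimal illustration, not YaCy's actual code: the class `RecrawlCheck` and its method names are hypothetical, and times are assumed to be epoch milliseconds.

```java
/** Hypothetical sketch of the crawlingIfOlder semantics; not taken from the YaCy sources. */
final class RecrawlCheck {

    /** Cutoff that makes every stored crawl date count as outdated. */
    static final long ALWAYS_OUTDATED = Long.MAX_VALUE;

    /**
     * crawlingIfOlder is an absolute cutoff date (assumed epoch millis),
     * not a time interval: a document is outdated if it was crawled before it.
     */
    static boolean isOutdated(long lastCrawlMillis, long crawlingIfOlder) {
        return lastCrawlMillis < crawlingIfOlder;
    }

    /** Converts a "re-crawl if older than this age" interval into the absolute cutoff date. */
    static long cutoffFromAge(long nowMillis, long maxAgeMillis) {
        return nowMillis - maxAgeMillis;
    }
}
```

Storing the interval itself (for example one week in milliseconds) in the field would place the cutoff in early 1970, so no document would ever be considered outdated; that appears to be the misinterpretation the commit describes.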
reger
7f0e757bb5 fix bookmark.rss
- channel end tag position
- link with html entity
2014-07-21 19:26:12 +02:00
orbiter
e441831a24 reverted toString() change in AnchorURL to prevent mistakenly used
toString(). This fixes also the update link bug.
2014-07-21 15:58:29 +02:00
reger
697b9743e7 Add link to RemoteCrawl_p
suggestion http://mantis.tokeek.de/view.php?id=277
2014-07-21 02:00:05 +02:00
reger
47f201a6b8 Add Solr default query fields (&qf) to select servlet
according to the ranking profiles boost fields defined by the peer (if df/qf is not specified in query).
This allows for pretty simple queries ( q=word) without the need to know about the specific index configuration.
Making sure all relevant fields (as determined by the index owner) are searched, still maintaining the option to query specific fields
and does not rely on the duplication of text to text_t.
- add author to reset-default boost fields (support results for author nav)
2014-07-21 00:47:14 +02:00
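
The fallback described in the commit above might look roughly like this. It is a sketch under assumptions: `DefaultQueryFields` and its methods are hypothetical names, the boost map stands in for the peer's ranking profile, and the field name `title` in the usage note is only an example. The `field^boost` syntax separated by spaces is the standard form of Solr's `qf` parameter, which is read by the DisMax-family query parsers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: derive a default Solr qf value from boost fields. */
final class DefaultQueryFields {

    /** Renders boosts as "field^boost field^boost ...", skipping non-positive factors. */
    static String toQf(Map<String, Float> boosts) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            if (e.getValue() > 0.0f) {
                qf.add(e.getKey() + "^" + e.getValue());
            }
        }
        return qf.toString();
    }

    /** Adds qf only when the caller specified neither df nor qf. */
    static Map<String, String> applyDefaults(Map<String, String> params,
                                             Map<String, Float> boosts) {
        Map<String, String> out = new LinkedHashMap<>(params);
        if (!out.containsKey("df") && !out.containsKey("qf")) {
            out.put("qf", toQf(boosts));
        }
        return out;
    }
}
```

With boosts `title=5.0` and `author=2.0`, a request carrying only `q=word` would gain `qf=title^5.0 author^2.0`, while a request that already names `df` or `qf` is passed through unchanged.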
reger
8004cfc961 fix input boostfield factor of 0.0 in RankingSolr
- input was accepted and stored but not editable (added check factor >0.0 during edit)
- make use of some more predefined solr constants
2014-07-20 12:28:59 +02:00
reger
a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
- with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid)
- this allows to use opensearch heuristic on individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true which sends an osd request on all global searches)
- the index.html searchoption text adjusted to be displayed only if option configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
Michael Peter Christen
2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
2014-07-18 12:43:01 +02:00
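
The per-crawl 'obey nofollow' decision from the commit above can be illustrated with a small sketch. The class `NofollowFilter` and its methods are hypothetical and not part of YaCy; the only fact relied on is that the HTML `rel` attribute holds a space-separated list of case-insensitive tokens.

```java
/** Hypothetical sketch of an 'obey nofollow' check for a single anchor. */
final class NofollowFilter {

    /** True if the rel attribute value contains the token "nofollow". */
    static boolean isNofollow(String rel) {
        if (rel == null) {
            return false;
        }
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("nofollow")) {
                return true;
            }
        }
        return false;
    }

    /** A link is skipped only when the crawl obeys nofollow and the anchor carries it. */
    static boolean shouldFollow(String rel, boolean obeyNofollow) {
        return !(obeyNofollow && isNofollow(rel));
    }
}
```

Because the flag is an argument rather than global state, two crawls over the same page can reach different decisions, which matches the "for each crawl" wording of the commit message.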
Michael Peter Christen
87f8118108 added option to delete documents from the webgraph 2014-07-16 16:04:19 +02:00
Michael Peter Christen
32a2ff925c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-16 14:58:27 +02:00
Michael Peter Christen
d07cdd8c3b added SolrCloud access mode and configuration 2014-07-16 14:57:51 +02:00
Michael Peter Christen
8514bffc22 enhanced postprocessing status report 2014-07-16 14:57:25 +02:00
reger
f99f3d5cf2 fix button (clear list) text color in CrawlResults 2014-07-13 00:48:50 +02:00
Michael Peter Christen
b5fc2b63ea removed exist() retrieval functions from error cache and replaced it
with metadata retrieval from connectors directly. This should cause
better usage of the cache. Automatically increase the metadata cache if
more memory is available.
2014-07-11 19:52:25 +02:00
Michael Peter Christen
62c72360ee cleanup of checkAcceptanceInitially in CrawlStacker, should avoid
double-calling of solr
2014-07-11 18:36:04 +02:00
orbiter
dab9a0786a Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-07-11 04:04:34 +02:00
orbiter
51bf5c85b0 Renamed the transmission cloud to buffer in dispatcher since the name
'cloud' was a bad idea. Changed also the accumulation process for peer
targets so that every dht chunk is not assigned the set of redundant
targets but they are assigned to redundant targets individually. This
enhances the granularity of the target accumulation and should enhance
the efficiency of the process. Finally the dht protocol client was
enriched with the ability to remove the 'accept remote index' flag from
peers or remove peers completely if they do not answer at all.
2014-07-11 04:04:09 +02:00
reger
7057e0b3e2 catch input file not found in Mediawiki import 2014-07-10 23:58:47 +02:00
Michael Peter Christen
f384fd624b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-07 11:11:50 +02:00
reger
ba5a59a28d make search result also avail. as atom feed via /yacysearch.atom
- fix logo in rss feed
2014-07-03 22:01:13 +02:00
orbiter
59160984cc timeline performance update 2014-07-03 13:06:29 +02:00
orbiter
54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-07-02 23:23:34 +02:00
Michael Peter Christen
15b2fad6a2 reverted latest change for reindexing because that works actually only
for internal Solr indexes. This is mainly caused by the fact that an
external Solr may also be a SolrCloud, which does not support LukeRequests,
which are needed to request the old Schema.
2014-07-02 14:56:34 +02:00
Michael Peter Christen
841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-02 14:35:02 +02:00
Michael Peter Christen
e09218129c remove check for local solr. This check was made during a time when Solr
was optional and another alternative metadata store was available. Since
that store is now removed, Solr is always available (internally or
externally)
2014-07-02 14:34:48 +02:00
orbiter
2073e69034 fix for long periods in timeline 2014-07-02 11:29:50 +02:00
reger
1f94df29e7 fix NPE in solr rss where snippet contains only the title text
and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping
(still open item description may be double as dc: tag and rss.description tag)
2014-07-01 23:24:26 +02:00
Michael Peter Christen
8c52f0651b refactoring of AccessTracker events & timeline fix 2014-07-01 16:06:01 +02:00
Michael Peter Christen
1b279d7a7e fixed external link 2014-06-27 15:12:53 +02:00
Michael Peter Christen
74206a10c7 refactoring 2014-06-27 14:40:36 +02:00
Michael Peter Christen
36e623d8bf enhanced metadata enrichment for media file type search:
- Web servers may now deliver YaCy-specific http header fields with a
title and keywords. The new http header fields are:
X-YaCy-Media-Title - to be used for media (image, audio, video) titles
X-YaCy-Media-Keywords - to be used for media (image, audio, video)
keywords
- both fields are written to document fields title and keywords and are
searched also during image search.
- to make the usage of arbitrary http header fields (including these new
fields) possible in the /api/push_p.json servlet, a new POST argument is
also introduced to push http header fields. The new POST attribute is
named "responseHeader-X" (where X is the counter). It is allowed to use
this attribute as multi-attribute several times, each can be filled with
a http header line.
- see /api/push_p.html for examples
2014-06-26 13:02:35 +02:00
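
As an illustration of the "responseHeader-X" multi-attribute described above, the following sketch builds the header-related part of a push request body. Everything beyond the attribute name and the two header names is an assumption: the class `PushHeaderFields` is hypothetical, the counter value is chosen by the caller, and URL-encoded form data is assumed for simplicity (the servlet may expect a different encoding; /api/push_p.html is the authoritative reference). `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: form fields carrying media http header lines for one pushed document. */
final class PushHeaderFields {

    /** Repeats the same "responseHeader-<counter>" key, one http header line per entry. */
    static List<String[]> mediaHeaderFields(int counter, String title, String keywords) {
        String key = "responseHeader-" + counter;
        List<String[]> fields = new ArrayList<>();
        fields.add(new String[] {key, "X-YaCy-Media-Title: " + title});
        fields.add(new String[] {key, "X-YaCy-Media-Keywords: " + keywords});
        return fields;
    }

    /** Encodes key/value pairs as application/x-www-form-urlencoded (assumed encoding). */
    static String formEncode(List<String[]> fields) {
        StringBuilder body = new StringBuilder();
        for (String[] field : fields) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field[0], StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field[1], StandardCharsets.UTF_8));
        }
        return body.toString();
    }
}
```

A list of pairs is used instead of a map because the commit states that the same attribute may appear several times, which a map keyed by attribute name could not represent.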
reger
a88ea14e09 harmonize use of style for "delete" button
- apply the mostly used btn-danger class
2014-06-22 23:33:59 +02:00
Michael Peter Christen
8fd72b5e8b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-06-20 13:57:06 +02:00
Michael Peter Christen
81d0f01a6f added 'synchronous' and 'commit' flags in push api 2014-06-20 13:56:55 +02:00
reger
5043eff33a move page navigation below results (image search)
force page navigation to be displayed below results in image search for any number of displayed images instead of being displayed to the right of the last image.
2014-06-20 01:02:43 +02:00
Marc Nause
f443cfa32d Improvements and bugfixes for recording actions of blacklist API. 2014-06-17 22:54:47 +02:00
Michael Peter Christen
0ba6b98d5b fix for broken json 2014-06-17 11:36:20 +02:00