10943 Commits

Author SHA1 Message Date
orbiter
59160984cc timeline performance update 2014-07-03 13:06:29 +02:00
orbiter
54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-07-02 23:23:34 +02:00
Michael Peter Christen
841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-02 14:35:02 +02:00
Michael Peter Christen
e09218129c remove check for local solr. This check was made during a time when Solr
was optional and another alternative metadata store was available. Since
that store is now removed, Solr is always available (internally or
externally)
2014-07-02 14:34:48 +02:00
orbiter
2073e69034 fix for long periods in timeline 2014-07-02 11:29:50 +02:00
reger
1f94df29e7 fix NPE in solr rss where snippet contains only the title text
and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping
(still open: item description may be duplicated as dc: tag and rss.description tag)
2014-07-01 23:24:26 +02:00
Michael Peter Christen
09dcdb9b19 update to solr 4.9.0 2014-07-01 16:39:00 +02:00
Michael Peter Christen
282b53db42 update of commons-io and slf4j-api (as preparation for Solr 4.9.0) 2014-07-01 16:18:12 +02:00
Michael Peter Christen
1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-07-01 16:06:12 +02:00
Michael Peter Christen
8c52f0651b refactoring of AccessTracker events & timeline fix 2014-07-01 16:06:01 +02:00
reger
431a5f9c4e added test case for TextSnippet,
removed obsolete/unused parameter and reference to MediaSnippet
2014-06-30 05:36:48 +02:00
Michael Peter Christen
5b94a257ce no timeout for large reference collections 2014-06-29 22:26:22 +02:00
Michael Peter Christen
f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-06-29 22:25:08 +02:00
reger
cb2c17d236 extract author and keywords in .doc and .ppt parser 2014-06-29 02:54:09 +02:00
reger
a5707cd2eb enable proper Author navigator
- author facet is based on omitted author_sxt field
- adjust to make author nav available when the author field exists, but keep using author_sxt to construct the facet (why!?)
- add check for querymodifier author in searchevent
2014-06-27 23:05:06 +02:00
Michael Peter Christen
1b279d7a7e fixed external link 2014-06-27 15:12:53 +02:00
Michael Peter Christen
74206a10c7 refactoring 2014-06-27 14:40:36 +02:00
orbiter
fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-06-27 10:15:37 +02:00
orbiter
4a66af716d added apkParser stub (work in progress) 2014-06-27 10:15:01 +02:00
orbiter
c59da9fe7a added access tracker log reader stub 2014-06-27 10:14:36 +02:00
reger
2d67f29244 adjust mergeDocument after parsing to
- preserve charset and languages
- fix merge of author
2014-06-26 22:16:15 +02:00
Michael Peter Christen
0d29b972cc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-06-26 13:02:56 +02:00
Michael Peter Christen
36e623d8bf enhanced metadata enrichment for media file type search:
- Web servers may now deliver YaCy-specific http header fields with a
title and keywords. The new http header fields are:
X-YaCy-Media-Title - to be used for media (image, audio, video) titles
X-YaCy-Media-Keywords - to be used for media (image, audio, video)
keywords
- both fields are written to the document fields title and keywords and
are also searched during image search.
- to make the usage of arbitrary http header fields (including these new
fields) possible in the /api/push_p.json servlet, a new POST argument is
also introduced to push http header fields. The new POST attribute is
named "responseHeader-X" (where X is the counter). This attribute may be
used several times as a multi-attribute; each occurrence can be filled
with an http header line.
- see /api/push_p.html for examples
2014-06-26 13:02:35 +02:00
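Note on 36e623d8bf: a minimal sketch of how a client might assemble the "responseHeader-X" POST fields described above. This is one reading of the commit message, not YaCy's own code. The function name, the "Name: value" line format, and the choice of 0 as the counter value are assumptions made for illustration; /api/push_p.html remains the reference for the real argument layout.

```python
from urllib.parse import urlencode


def response_header_fields(counter, headers):
    """Return one (field name, header line) pair per header.

    Assumption: every header line for the same counter value reuses the
    same field name, as suggested by "multi-attribute" in the commit text.
    """
    field_name = f"responseHeader-{counter}"
    return [(field_name, f"{key}: {value}") for key, value in headers.items()]


# Hypothetical example values; only the header names come from the commit.
fields = response_header_fields(0, {
    "X-YaCy-Media-Title": "Sunset over the bay",
    "X-YaCy-Media-Keywords": "sunset bay photo",
})

# urlencode accepts a list of pairs, so the repeated field name is preserved.
encoded = urlencode(fields)
```

Keeping the fields as a list of pairs rather than a dict matters here, because a dict would collapse the repeated field name into a single entry.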
Michael Peter Christen
49886fab08 enhanced debugging 2014-06-26 12:57:01 +02:00
Michael Peter Christen
b893c42a0f bugfix for image search 2014-06-26 12:56:33 +02:00
Michael Peter Christen
c7995d3e2a increased fixed limit for http POST request sizes to 100MB 2014-06-26 11:58:07 +02:00
reger
7847a93558 fix AbstractParser.singleList not adding null strings
- prevents null titles in oo... parser  (as detected by ParserTest)
- correct ParserTest dc_description check (dc_description allowed to return 0 length array)
2014-06-26 02:56:45 +02:00
Michael Peter Christen
8acae852a0 write <em>-tagged texts also into the bold_txt field 2014-06-25 11:51:11 +02:00
reger
a88ea14e09 harmonize use of style for "delete" button
- apply the mostly used btn-danger class
2014-06-22 23:33:59 +02:00
sixcooler
66c784c552 bump to httpclient-4.3.4 2014-06-22 16:24:45 +02:00
reger
b9f6acee23 update to Jetty 9.2.1 2014-06-22 00:21:47 +02:00
reger
90c4576361 add a link to recrawl index entry to metadata html page
- to allow manually renewing the index content for this url (e.g. in case it is a remote search result with metadata only)
- simply use a QuickCrawlLink_p javascript snippet (minimalistic 1st solution)
2014-06-21 04:21:29 +02:00
Michael Peter Christen
8fd72b5e8b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-06-20 13:57:06 +02:00
Michael Peter Christen
81d0f01a6f added 'synchronous' and 'commit' flags in push api 2014-06-20 13:56:55 +02:00
Michael Peter Christen
2626c8f6db using concurrency to do base64 encoding in file POST commands 2014-06-20 13:55:15 +02:00
Michael Peter Christen
e132689818 fixed and enhanced Base64 (en)coder (again) 2014-06-20 13:54:18 +02:00
Michael Peter Christen
2415e3db43 enhanced ASCII byte[] -> String conversion 2014-06-20 13:53:22 +02:00
reger
5043eff33a move page navigation below results (image search)
force page navigation to be displayed below the results in image search for any number of displayed images, instead of to the right of the last image.
2014-06-20 01:02:43 +02:00
Michael Peter Christen
4751ed974f enhanced base64 encoding 2014-06-19 12:11:02 +02:00
Michael Peter Christen
e949071160 removed superfluous date method 2014-06-19 12:10:42 +02:00
Michael Peter Christen
501d55cd35 removed superfluous assert 2014-06-19 12:10:12 +02:00
Marc Nause
f443cfa32d Improvements and bugfixes for recording actions of blacklist API. 2014-06-17 22:54:47 +02:00
Michael Peter Christen
0ba6b98d5b fix for broken json 2014-06-17 11:36:20 +02:00
orbiter
4177c9cf05 fix for crawl start check 2014-06-15 22:50:04 +02:00
orbiter
515e63c274 ignore the api javadoc directory in git commits 2014-06-15 12:41:14 +02:00
orbiter
0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-06-15 12:38:52 +02:00
orbiter
9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html.
Filter queries are not actually related to ranking, but user requests
have pointed out that specific boost queries to move results to the end
of the result list are not sufficient. Such boost filters may be better
executed as an actual filter, and therefore such a filter can now be
statically applied to every search request. A typical use could be the
expression "http_unique_b:true AND www_unique_b:true" which uses the
recently introduced fields http_unique_b and www_unique_b which are true
only for one of the alternatives with/without http(s) and with/without
prefix 'www.' in host names.
2014-06-15 12:38:30 +02:00
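Note on 9d5d86cd03: a small sketch of how the example filter expression could be composed and attached to a search request. The commit does not say how the servlet hands the filter to Solr; `fq` is the standard Solr filter-query parameter and is used here as an assumption. The helper name and the query term are hypothetical.

```python
from urllib.parse import urlencode


def build_filter_query(flags):
    """Join boolean field conditions into a single AND expression."""
    return " AND ".join(
        f"{field}:{str(value).lower()}" for field, value in flags.items()
    )


# Field names are taken from the commit message above.
fq = build_filter_query({"http_unique_b": True, "www_unique_b": True})

# Assumption: the filter travels as the Solr "fq" parameter next to "q".
params = urlencode({"q": "yacy", "fq": fq})
```

A filter of this kind removes non-matching documents from the result set entirely, whereas a boost query only changes their position in the ranking.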
Michael Peter Christen
d2151857f1 Added collection navigation:
The collection field (which can be filled e.g. in Crawl Start) can be used to
add categories to YaCy index entries. The usage of that field was
restricted to solr searches and post argument filters as implemented in
commit f7571386a3.
This commit extends collections to a full navigation option in the
standard YaCy search interface. The field is not active by default but
can be activated easily in the /ConfigSearchPage_p.html servlet (just
check the 'Collection' facet field). Collections can now be used for (at
least) two purposes:
- to provide search tenants (through post argument collection)
- to provide self-made category navigation
Search requests may now have (independently of whether the collection
facet is switched on or off) a "collection:<collection-name>" modifier
attached; furthermore, collection names may use disjunctions using the
'|' pipe symbol. For example, this is a valid search request:
www collection:user|proxy
2014-06-15 12:11:23 +02:00
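Note on d2151857f1: a sketch of how the "collection:" modifier with '|' disjunctions could be separated from the plain search terms. This is not YaCy's query parser; the function name and the whitespace-based tokenization are assumptions made for illustration.

```python
def split_collection_modifier(query):
    """Split a search request into plain terms and collection names."""
    prefix = "collection:"
    terms, collections = [], []
    for token in query.split():
        if token.startswith(prefix):
            # '|' separates alternative collection names (a disjunction).
            names = token[len(prefix):].split("|")
            collections.extend(name for name in names if name)
        else:
            terms.append(token)
    return terms, collections


# The example request from the commit message above.
terms, collections = split_collection_modifier("www collection:user|proxy")
```

With the example request this yields the single term "www" and the two collection names "user" and "proxy", either of which may match.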
Michael Peter Christen
74c249288a added a push api to make it possible to upload files directly without
crawling to the YaCy indexer. Files are uploaded using POST multipart
requests; multiple file uploads are possible as well. Each file has its
file date and mime type attached; the mime type is used to select the
right parser for the submitted data. A url is also submitted, which is
assigned to the document.
The CrawlSwitchboard has a new option for default Crawl Profiles which
are assigned dynamically from the new push interface.
2014-06-12 18:10:07 +02:00
Michael Peter Christen
f13c8aa7dd re-implementation of file push option in the context of POST http
requests. The internal representation of post arguments is String and
therefore not appropriate for byte[] objects as submitted by file pushes.
Therefore all pushed files are encoded to base64 _after_ uploading with
an http form (you do not need to do that encoding yourself) to hand over
the byte[] as a string in the post argument.
Servlets which read such files must decode the base64 data to get the
original byte[] array.
This is considered a temporary solution for file uploads; a proper
implementation would need to treat all attributes as Objects that are
either String or byte[] instances. This would be a
major code change and is not done at this time. The feature was
submitted to enable the feature pushed with the next commit.
2014-06-12 18:06:22 +02:00
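Note on f13c8aa7dd: a sketch of the round trip described above, in which uploaded bytes are carried through a string-only argument map as base64 text and decoded again by the reader. YaCy uses its own Base64 coder; the standard alphabet from Python's `base64` module and the function names here are assumptions for illustration only.

```python
import base64


def encode_upload(data: bytes) -> str:
    """Encode raw file bytes so they fit into a string-valued argument."""
    return base64.b64encode(data).decode("ascii")


def decode_upload(text: str) -> bytes:
    """Recover the original bytes on the reading side."""
    return base64.b64decode(text)


# Bytes that would not survive a naive bytes-to-text conversion.
payload = bytes([0, 255, 16, 128])
as_text = encode_upload(payload)
restored = decode_upload(as_text)
```

The encoded form is about a third larger than the raw data, which is one cost of keeping all post arguments as strings.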