Commit Graph

5279 Commits

Author SHA1 Message Date
reger
a2dcf64039 fix IndexImportMediawiki_p servlet's refresh header
add url parameter to make sure no parameters are included in refresh url 
which could cause unwanted restart of import job

see http://mantis.tokeek.de/view.php?id=591 comments
2015-10-25 05:41:25 +01:00
Michael Peter Christen
ac034db8bc Merge branch 'master' of https://github.com/luccioman/yacy_search_server
# Conflicts:
#	htroot/js/highslide/highslide.js
#	source/net/yacy/document/ImageParser.java
2015-10-24 11:22:35 +08:00
luc
a156fd65d0 Patch to manage render or load errors is still needed after highlight.js
version upgrade.
Updated patch for better behavior consistency between browsers.
2015-10-22 00:36:34 +02:00
luc
37e28e0dd3 - Keep aspect ratio of images rendered directly by browser such as gif
and svg.
- Corrected quadratic rendering of landscape images with height smaller
than maxHeight
2015-10-21 02:49:51 +02:00
reger
571609c208 upd javascript img viewer to highslide 4.1.13 2015-10-21 02:14:04 +02:00
luc
e2d00585e2 Display full size preview using ViewImage Servlet. 2015-10-20 01:17:37 +02:00
luc
74b0283d57 Added image preview error management. 2015-10-20 01:15:02 +02:00
luc
d6522fa4a2 Integrated haraldk/TwelveMonkeys library to first add TIF image format
support.
2015-10-15 10:06:51 +02:00
luc
62e07a26a0 Refactoring : split into sub-functions to make understanding and
performance measurement easier.
2015-10-14 10:15:00 +02:00
reger
c9937973e3 unescape MultiProtocolURL getAttributes() return values.
use getAttributes() to get query parameters as clear text (w/o url encoding)
use getSearchpartMap() to get in internal format (url encoded)

fix for http://mantis.tokeek.de/view.php?id=606
2015-10-13 02:43:18 +02:00
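
The difference described in c9937973e3 between clear-text and url-encoded query parameters can be illustrated with a minimal sketch. This is not the MultiProtocolURL code: the class `QueryParamSketch` and its two methods are hypothetical stand-ins for the roles the commit message gives to getSearchpartMap() (encoded) and getAttributes() (decoded), and only the values are decoded here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of YaCy.
class QueryParamSketch {

    /** Splits a raw query string into key/value pairs; values stay url-encoded. */
    static Map<String, String> encodedParams(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (query == null || query.isEmpty()) return params;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(key, value);
        }
        return params;
    }

    /** Same pairs, with each value decoded to clear text (UTF-8). */
    static Map<String, String> decodedParams(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> e : encodedParams(query).entrySet()) {
            try {
                params.put(e.getKey(), URLDecoder.decode(e.getValue(), "UTF-8"));
            } catch (UnsupportedEncodingException ex) {
                // UTF-8 is a mandatory charset on every JVM, so this cannot happen
                throw new IllegalStateException(ex);
            }
        }
        return params;
    }
}
```

For the query string `q=a%20b&x=1`, the encoded map holds `a%20b` under `q`, while the decoded map holds `a b`.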
reger
10b0eb106f fix link target on iframe list in CrawlProfileEditor 2015-10-11 06:06:40 +02:00
reger
5744342fec handle image preview for url w empty file extension
fix of commit 688f7b2a5c
2015-10-06 04:13:04 +02:00
reger
43c27aa550 upd to solr/lucene 5.3.1 2015-10-03 23:20:33 +02:00
reger
688f7b2a5c allow/display svg images in image results previews
svg is not supported by awt but by most browsers. Image content is delivered as received (without size adjustment)
2015-10-02 01:48:48 +02:00
Michael Peter Christen
225200194a every time a crawl is started, the user expects a different search
result behaviour. This requires that the search cache is flushed for
each crawl start. TODO: this should also be done if a crawl is
terminated.
2015-10-01 13:18:44 +02:00
reger
b92d81b073 remove double caching of inputstream in ViewImage 2015-09-27 03:24:28 +02:00
Michael Peter Christen
3c31bf845f fix for latest merge 2015-09-24 13:53:54 +02:00
luc
5578886f6f Merge branch 'master' of https://github.com/luccioman/yacy_search_server.git 2015-09-23 21:04:20 +02:00
reger
2951c9fc40 remove unused check for known fileextension in searchtrailer
(check is done on add to filetype-nav)
2015-09-22 03:52:15 +02:00
reger
733d725dec limit css scrolling to result/content window x
from pull request #10
2015-09-15 02:11:30 +02:00
Burkhard
4c38083a11 Merge pull request #10 from Raegdan/raegdan-css-layout-fix
Fixed CSS scrolling
2015-09-15 02:09:17 +02:00
luccioman
a7179138ce Returned again to main repository location : does anyone want to
consider mantis 597 ?  (http://mantis.tokeek.de/view.php?id=597)
2015-09-11 17:23:59 +02:00
luccioman
199b2ce52d Translator refactoring : to simplify locale files writing, process keys
as simple string and no more as regular expressions.
Updated all locale files to adapt to refactored Translator : removed
useless escaped characters and did minor corrections.
Performed minor syntax corrections on some html source files.
Added an util to translate all html source files with all locales
without launching full YaCy application.
Corrected main arguments parsing on other translation utils.
2015-09-11 17:20:11 +02:00
luccioman
4dd9c0d5d9 Merge from main repository 2015-09-08 08:54:48 +02:00
Michael Peter Christen
0a37d8af89 in case that a site crawl is started for urls with file:// path, the
host filter does not work because there is no host given in such urls.
In that case, patch the filter to be a sub-path filter.
2015-09-05 14:07:23 +02:00
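
The fallback described in 0a37d8af89 can be sketched as follows. This is an assumption-laden illustration, not the crawler's actual code: the class `SiteCrawlFilterSketch`, the method `mustMatchFilter` and the exact regex shapes are hypothetical. It only shows the idea that a url without a host gets a sub-path filter instead of a host filter.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of YaCy.
class SiteCrawlFilterSketch {

    /** Builds a must-match regex for a site crawl starting at startUrl. */
    static String mustMatchFilter(String startUrl) {
        URI uri = URI.create(startUrl);
        String host = uri.getHost();
        if (host != null && !host.isEmpty()) {
            // normal case: restrict the crawl to the host of the start url
            return "[a-z]+://" + Pattern.quote(host) + "(/.*)?";
        }
        // file:// urls carry no host: restrict to the directory of the start url
        String directory = startUrl.substring(0, startUrl.lastIndexOf('/') + 1);
        return Pattern.quote(directory) + ".*";
    }
}
```

With the start url `file:///home/user/docs/index.html`, the resulting filter accepts `file:///home/user/docs/sub/a.txt` and rejects `file:///home/user/other/a.txt`.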
luccioman
9df249296a Return to main repository version 2015-09-04 13:52:03 +02:00
luccioman
c1d937a90c Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-04 09:57:49 +02:00
reger
7c1da173e0 fix missing license in image search
see http://mantis.tokeek.de/view.php?id=522
2015-09-03 23:36:57 +02:00
luccioman
918ef72bbe Corrected br markup 2015-09-03 08:59:17 +02:00
luccioman
f88bb2277e Corrected bookmark link title 2015-09-03 08:58:14 +02:00
luccioman
802ea66d19 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-03 08:04:38 +02:00
reger
5297e80cda fix missing onclick in ConfigPortal
to enable checkbox
2015-09-03 00:59:14 +02:00
luccioman
70e483ecc6 Merge branch 'master' of ssh://git@github.com/yacy/yacy_search_server 2015-09-01 08:57:32 +02:00
sixcooler
87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in a huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DocValues where it is possible.
For this I used the Api-Scheme as new basis for the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these settings will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
luccioman
67799ce867 Updated translation of index.html, yacysearch.html and
simpleheader.template, corrected some special characters not written as
HTML entities.
2015-08-26 14:40:39 +02:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a pre-defined bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every category. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
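
The directory rules listed in df3314ac1a (one directory per context below DATA/CLASSIFICATION, one text file per category, one of them named 'negative.txt') can be checked with a small sketch. The class `ClassificationLayoutSketch` and its methods are hypothetical and are not the classifier's loading code. They only encode the two stated requirements: a 'negative' category plus at least one custom category.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of YaCy.
class ClassificationLayoutSketch {

    /** Derives category names from file names by stripping the ".txt" suffix. */
    static List<String> categories(List<String> fileNames) {
        List<String> names = new ArrayList<String>();
        for (String n : fileNames) {
            if (n.endsWith(".txt")) names.add(n.substring(0, n.length() - 4));
        }
        return names;
    }

    /** A context is usable if it has 'negative' and at least one custom category. */
    static boolean isUsableContext(List<String> fileNames) {
        List<String> c = categories(fileNames);
        return c.contains("negative") && c.size() >= 2;
    }

    /** Convenience wrapper for a context directory such as DATA/CLASSIFICATION/<context>. */
    static boolean isUsableContext(File contextDir) {
        List<String> fileNames = new ArrayList<String>();
        String[] entries = contextDir.list(); // null if the directory does not exist
        if (entries != null) {
            for (String e : entries) fileNames.add(e);
        }
        return isUsableContext(fileNames);
    }
}
```

A context directory containing `negative.txt` and `sports.txt` passes the check. One containing only `negative.txt`, or lacking it, does not.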
Michael Peter Christen
dbbad23e12 removed warnings 2015-08-03 05:37:34 +02:00
reger
9e4043731d add missing ; in base.css 2015-08-02 21:36:44 +02:00
Michael Peter Christen
de8cfbe1d7 added export option to export the fulltext of the search index text only 2015-07-30 03:21:40 +02:00
Kirill Fomchenko
ab22a32c09 Fixed CSS scrolling
When the sidebar on search page becomes scrollable, the scrollbar shrinks the sidebar and makes the search results weirdly scrollable on X axis by several pixels. Now the sidebar always has a scrollbar, and results are never X-scrollable.
2015-07-21 08:21:10 +03:00
Michael Peter Christen
785781253e added jsonp to suggest servlet 2015-07-16 23:42:41 +02:00
reger
821262a179 add CommonPattern for multiple spaces
to eliminate empty split words on following spaces
2015-07-04 22:49:01 +02:00
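
The effect addressed by 821262a179 can be reproduced with plain `java.util.regex`. In this sketch the class and constant names are hypothetical and do not reflect the actual CommonPattern field. It only shows why splitting on a single space yields empty words when spaces follow each other, and how a pattern for a run of spaces avoids that.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not part of YaCy.
class MultiSpaceSplitSketch {

    // splits at every single space: consecutive spaces produce empty words
    static final Pattern SINGLE_SPACE = Pattern.compile(" ");

    // splits at each run of one or more spaces: no empty words between tokens
    static final Pattern SPACES = Pattern.compile(" +");

    static String[] splitSingle(String s) {
        return SINGLE_SPACE.split(s);
    }

    static String[] splitRuns(String s) {
        return SPACES.split(s);
    }
}
```

For the input `yacy  search engine` (two spaces after `yacy`), `splitSingle` returns four elements including one empty string, while `splitRuns` returns the three words. A leading space still produces a leading empty element with either pattern, so input may need trimming first.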
Michael Peter Christen
f901e7d3cf fix for non-authorized view of IndexBrowser: show only the number of
non-failure documents
2015-06-30 11:12:36 +02:00
Michael Peter Christen
3c4c69adea fix for
- bad regex computation for crawl start from file (limitation on domain
did not work)
- servlet error when starting crawl from a large list of urls
2015-06-29 02:02:01 +02:00
Michael Peter Christen
1fec7fb3c1 suppress access to solr when doing search suggestions in case that the
index has more than two million documents. This protects the index from
being flooded with search requests that cannot be resolved before the
real search query has to be computed.
2015-06-24 13:02:12 +02:00
Michael Peter Christen
886fca2260 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-24 01:59:46 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes had been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
had been updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend to re-index everything.
Before you do the update, please use the index export to write the index
to a large surrogate xml file. After the update, start with an empty
index and then initialize this with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
6c2e6f1f37 remove redundant code 2015-06-23 23:41:43 +02:00
Michael Peter Christen
9c12555be5 added link to Snapshots in search results if the snapshot exists and
option is set in ConfigSearchPage_p
(this is a stub: we also need a visualization of pdf files!)
2015-06-07 20:37:37 +02:00
reger
72f6a0b0b2 enhance recrawl job
- allow to modify the query to select documents to  process (after job has started)
- allow to include failed urls (httpstatus <> 200)
2015-06-06 18:45:39 +02:00