
13055 Commits

Author SHA1 Message Date
luccioman
e5858bc8c8 Fixed a NullPointerException case possible on Index Export
As reported by Palulukas in YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=18&t=5944&sid=dcef5b899ab4aa9b40e3a3d158c13aed#p33454)
the Index Export operation can fail, notably when the Solr index
contains one or more documents with an empty (although required)
"load_date_dt" field.

This fixes the export failure when this situation occurs, but more
should be done to harden verification of the minimum required fields.
2017-02-17 11:09:30 +01:00
reger
7e53860fc7 fix NPE in HTMLResponseWriter on missing document title 2017-02-16 02:36:24 +01:00
reger
5e8879beb7 Reduce self generated content for text_t (visible text index field)
to avoid repeating the tokenized URL as description,
continuation of 7e09bff4a1
1409cabe8b
Add some javadoc, and drop the unneeded removal of omitted fields in postprocessing.
2017-02-16 01:43:14 +01:00
reger
6ec6ab55ba removed faroo news from default opensearch config
As @luccioman pointed out, it is only usable with a free API key
http://www.faroo.com/hp/api/api.html
http://blog.faroo.com/2013/06/30/faroo-introduces-an-api-key/
2017-02-15 23:26:54 +01:00
luccioman
6e89d125f2 Added robots.txt support for heuristics federated search.
As noticed by @reger24, abusive use of OpenSearch systems should be
prevented, especially when HTML results are parsed and reused.
The robots.txt file is now checked before requesting an external
OpenSearch system, to respect host exclusions and any crawl-delay value.
The check is also performed when trying to add a new OpenSearch URL
template through the /ConfigHeuristics_p.html admin page.
2017-02-15 15:04:40 +01:00
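The commit does not show YaCy's robots.txt code. As a rough illustration of the check it describes, here is a minimal, hypothetical Java sketch that reads only the `User-agent: *` group of a robots.txt body, answers whether a path is excluded, and reports the declared Crawl-delay. The class and method names are assumptions; real robots.txt matching (wildcards, Allow rules, agent-specific groups, merged User-agent lines) is more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.OptionalInt;

/** Hypothetical, simplified robots.txt reader: "User-agent: *" group only, prefix matching only. */
public final class SimpleRobotsTxt {
    private final List<String> disallowedPrefixes = new ArrayList<>();
    private Integer crawlDelaySeconds = null;

    public SimpleRobotsTxt(String body) {
        // Simplification: the last User-agent line decides the current group
        boolean inWildcardGroup = false;
        for (String rawLine : body.split("\r?\n")) {
            // Strip comments and surrounding whitespace
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "user-agent":
                    inWildcardGroup = value.equals("*");
                    break;
                case "disallow":
                    // An empty Disallow value means "nothing is excluded"
                    if (inWildcardGroup && !value.isEmpty()) {
                        disallowedPrefixes.add(value);
                    }
                    break;
                case "crawl-delay":
                    if (inWildcardGroup) {
                        try {
                            crawlDelaySeconds = Integer.parseInt(value);
                        } catch (NumberFormatException ignored) {
                            // Non-integer delays are ignored in this sketch
                        }
                    }
                    break;
                default:
                    break;
            }
        }
    }

    /** True if no Disallow prefix of the wildcard group matches the path. */
    public boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    /** Crawl-delay in seconds declared for the wildcard group, if any. */
    public OptionalInt crawlDelay() {
        return crawlDelaySeconds == null ? OptionalInt.empty() : OptionalInt.of(crawlDelaySeconds);
    }
}
```

A caller would fetch the target host's robots.txt once, skip the OpenSearch request when `isAllowed` returns false, and otherwise wait at least `crawlDelay()` seconds between requests to that host.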
reger
7e6e14a406 adjust translation to renamed configparser_p.html 2017-02-14 02:30:26 +01:00
reger
a011a97de9 make ConfigParser a protected page, for consistent behavior of locked
menu items.
2017-02-14 02:04:42 +01:00
reger
f85aaa7c76 update opensearch conf - remove suche.sueddeutsche.de
apparently they have withdrawn from the OpenSearch initiative.
2017-02-14 00:31:32 +01:00
luccioman
bf16de29c1 Added support for HTML OpenSearch results.
Many OpenSearch systems do not provide results as standard RSS/Atom
feeds but only as HTML.

This modification adds some support for custom OpenSearch HTML results
through the use of mapping files (as already done for federated Solr
search) relying on CSS-like selectors to retrieve information from HTML
content.

An example mapping file is provided to map results from the
www.npmjs.com OpenSearch URL.
2017-02-13 19:11:17 +01:00
reger
a79194a102 upd to Jetty-9.2.21.v20170120 2017-02-11 19:53:27 +01:00
luccioman
4306f4d9a3 Upgraded Apache Ant to 1.10.1 in the Docker alpine flavor image
For a more reliable Docker image build, also switched to the ant archive
repository to fetch the needed binary as other repositories only provide
the latest versions.
2017-02-10 09:40:42 +01:00
luccioman
54405577aa Replaced absolute redirection locations by relative ones when possible.
This makes integration of YaCy behind a reverse proxy subfolder easier.
2017-02-09 16:42:21 +01:00
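Why relative redirection locations help behind a reverse proxy subfolder can be shown with `java.net.URI.resolve`, which applies the standard reference-resolution rules a browser uses for a Location header. The host `proxy.example.org`, the `/yacy/` subfolder and the page names below are illustrative assumptions, not YaCy configuration.

```java
import java.net.URI;

/** Shows why a relative Location header survives a reverse-proxy subfolder while a root-absolute one does not. */
public final class RedirectResolution {

    /** Resolves a Location header value against the URL the browser actually requested. */
    public static String resolve(String requestedUrl, String location) {
        return URI.create(requestedUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        // The browser reaches YaCy through a proxy that maps /yacy/ to the peer (hypothetical setup)
        String requested = "https://proxy.example.org/yacy/ConfigBasic.html";
        // Root-absolute location: the /yacy/ prefix is lost
        System.out.println(resolve(requested, "/Status.html"));
        // Relative location: stays inside the proxied subfolder
        System.out.println(resolve(requested, "Status.html"));
    }
}
```

The first call yields `https://proxy.example.org/Status.html`, outside the proxied subfolder, while the second yields `https://proxy.example.org/yacy/Status.html`, so the peer never needs to know the public prefix.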
luccioman
1857651988 Added a new Debug/Analysis advanced settings subsection.
As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
 - a new setting to control remote Solr response encoding
 - some existing debug settings which could not be set through the admin
user interface
2017-02-09 11:05:06 +01:00
luccioman
526f2d6a8b Fixed NPE case occurring when local solr index is disabled in search. 2017-02-09 10:59:41 +01:00
luccioman
def55ec166 Improved termination of timed out remote solr requests to peers.
On timeout, closing remote Solr requests is cleaner than simply calling
Thread.interrupt(), which is ineffective in most cases. Closing does not
ask for a commit on the remote Solr, but releases HTTP connection
resources and is more likely to end those threads, which could otherwise
wait indefinitely.

Other related improvements included:
 - no more marking remote peer as not available when remote search is
interrupted before timeout by the cleanup job.
 - added a short fine log level trace of failing remote solr requests
2017-02-06 12:41:24 +01:00
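The JDK behaviour behind this change can be reproduced without Solr: a platform thread blocked in a classic `java.net.Socket` read is not woken by `Thread.interrupt()`, but closing the socket makes the read fail with a `SocketException`. The sketch below uses plain loopback sockets as a stand-in for the HTTP connections of the remote Solr client; the class name is hypothetical and nothing here is YaCy code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Illustrates that closing a socket unblocks a reading thread while Thread.interrupt() does not. */
public final class CloseVersusInterrupt {

    /** Returns {aliveAfterInterrupt, aliveAfterClose} for a thread blocked on a socket read. */
    public static boolean[] run() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) { // the "remote peer" never writes anything
            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks until data, EOF or local close
                } catch (IOException expectedOnClose) {
                    // a SocketException is thrown once the socket is closed locally
                }
            }, "remote-request-reader");
            reader.setDaemon(true);
            reader.start();
            Thread.sleep(200); // give the reader time to block in read()

            reader.interrupt(); // classic blocking socket reads ignore this on platform threads
            reader.join(500);
            boolean aliveAfterInterrupt = reader.isAlive();

            client.close(); // releases the connection and wakes up the blocked read
            reader.join(2000);
            boolean aliveAfterClose = reader.isAlive();
            return new boolean[] {aliveAfterInterrupt, aliveAfterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reader thread", e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("alive after interrupt(): " + result[0] + ", alive after close(): " + result[1]);
    }
}
```

The timeouts are demonstration values; the point is only the contrast between the two ways of stopping the reader.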
luccioman
94af489f14 Removed deprecated "localMissCount" prop from yacysearchlatestinfo.json.
This property was deprecated four years ago by commit
d74472f562. For any active search event
id, it was then always filled with "-UNRESOLVED_PATTERN-".
2017-02-03 10:32:31 +01:00
luccioman
08de58b6d3 Named a previously unnamed Thread for easier monitoring 2017-02-03 09:55:08 +01:00
luccioman
9a5a124bf2 Distinguished solr connectors thread names for easier monitoring. 2017-02-03 09:54:29 +01:00
luccioman
f6ad927a14 Refactored the DHT-Trigger section in Performance_p.html page.
This makes it easier to understand and more accurately reflects the
current memory strategy implementations, which may set the "proper"
state for reasons other than DHT reception.
2017-02-01 18:44:42 +01:00
luccioman
85d8173b1e Updated French translation for the /Performance_p.html page.
Also updated the master xliff file with missing recent changes.
2017-01-31 16:33:17 +01:00
luccioman
b51fd9467c Fixed unresolved pattern on directory entries in HostBrowser.xml api.
As described in mantis 725 (http://mantis.tokeek.de/view.php?id=725) the
HostBrowser.xml api directory entries had an incorrect count attribute
value.
This was because the HostBrowser html page and its backing template
servlet evolved, but the modifications were not carried over to the xml api.
2017-01-31 09:20:19 +01:00
reger
f6b08443f0 adjust column layout in Settings_Proxy.inc 2017-01-30 22:44:28 +01:00
luccioman
21ab41d8d6 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2017-01-28 10:20:25 +01:00
luccioman
95b63f5126 Added a CSS class for infobox block.
This will prevent mistakenly hiding a div element not designed to be an
infobox but having a ".info" parent (after having previously added the
possibility for a div - and not only a span element - to be an infobox).
2017-01-28 10:19:39 +01:00
reger
8ce8e23e7d Update language file de & master, remove obsolete "Augmented Browsing" 2017-01-28 01:13:57 +01:00
reger
1f497ccad5 Add consistency check for related index fields upon load and save of
index schema.
To assemble the original link url for out-/inboundlinks, icons and pictures
both *_protocol_sxt and *_urlstub_sxt are needed (due to the data-reduced
storage method used). Auto-enable *_protocol_sxt if *_urlstub_sxt is enabled,
so that the original link url can be assembled correctly.
2017-01-28 00:36:03 +01:00
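The consistency rule can be sketched as a small function over the set of enabled schema field names. This is a minimal, hypothetical illustration assuming the enabled fields are available as plain strings; the class and method names are not YaCy's API, and the data-reduced encoding of the protocol values themselves is not reproduced.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical schema check: every enabled *_urlstub_sxt field gets its *_protocol_sxt companion. */
public final class ProtocolFieldCheck {
    private static final String STUB_SUFFIX = "_urlstub_sxt";
    private static final String PROTOCOL_SUFFIX = "_protocol_sxt";

    /** Returns a copy of the enabled field names with any missing protocol fields added. */
    public static Set<String> withRequiredProtocolFields(Set<String> enabledFields) {
        Set<String> result = new TreeSet<>(enabledFields);
        for (String field : enabledFields) {
            if (field.endsWith(STUB_SUFFIX)) {
                String prefix = field.substring(0, field.length() - STUB_SUFFIX.length());
                // Without the protocol values the full link URL cannot be rebuilt from the stub
                result.add(prefix + PROTOCOL_SUFFIX);
            }
        }
        return result;
    }
}
```

Running the check both when the schema is loaded and when it is saved, as the commit describes, means a configuration edited by hand is repaired the next time it is read.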
luccioman
68afe900d0 Added user-friendly controls over disk usage configuration settings.
As mentioned in issue #103, control settings over YaCy disk usage
already existed but lacked a user-friendly way to set them.

I added it to the Performance_p.html administration page with a little
refactoring of the "Resource Observer" fieldset for improved
accessibility and compliance with HTML standards.
Also added the possibility to enable/disable the autoregulation function
from this page.
2017-01-27 15:47:15 +01:00
reger
95d2a28599 adjust the Field-Reindex Thread to verify and update the document id
in case the hash (ID) doesn't match the document url (sku field).
2017-01-26 23:49:15 +01:00
Michael Christen
e6e4ccaa00 Merge pull request #98 from Velociraptor85/patch-2
LSB Tag
2017-01-26 06:37:29 +01:00
Michael Christen
a7fd47b3aa Merge pull request #105 from ivar/patch-1
Update README.md - removes deprecated URL
2017-01-26 06:29:42 +01:00
Ivar Vasara
cfd21aaa10 Update README.md - removes deprecated URL 2017-01-25 20:36:48 -08:00
luccioman
d0182e4797 Improved Index Browser accessibility with semantically richer html tags.
Made use of ol, li, thead, th, tbody, h1 and h2 html tags.
Added aria-label attributes to provide alternative textual information
previously only conveyed by color cue.

Tested behavior with NVDA 2016.4 screen reader.
2017-01-26 01:13:32 +01:00
luccioman
fc01b69eca Fixed local image search pagination regression.
As reported by @tglman on issue #90, when searching images on the local
index only, pages next to the first were always empty. This was a
regression from commit c25e48e969.
2017-01-25 09:54:39 +01:00
luccioman
54ffd925dc Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2017-01-24 17:14:49 +01:00
luccioman
4c65321aae Updated master xliff file with missing entries for HostBrowser.html.
Also translated lang="en" html attribute to lang="[targetLang]" on
locale files having translated entries for HostBrowser.html
2017-01-24 17:14:14 +01:00
Michael Peter Christen
02d0b3172c Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2017-01-24 15:56:37 +01:00
Michael Peter Christen
d4f45cf05e added dc.date.modified and dc.date.created to date parser 2017-01-24 15:56:29 +01:00
luccioman
254060bda1 Index Browser : fixed display of "Count colors" for authorized users. 2017-01-24 11:49:15 +01:00
luccioman
96b7ddcef3 Updated French translation of HostBrowser.html 2017-01-24 11:38:56 +01:00
luccioman
c82c8351dd Fixed Index Browser page HTML validation errors and switched to HTML5.
Also removed uses of deprecated HTML attributes.

Validation performed with Nu Html Checker 17.1.0.

Cross-browser tested with:
 - Debian Jessie : Firefox ESR 45.6.0
 - MS Windows 10 : Firefox 50.1.0, Chrome 55.0.2883.87, MS Edge
2017-01-24 09:40:43 +01:00
reger
f9180fabc4 ensure that the RWI Index.Segment IODispatcher does not block on shutdown
while waiting for a semaphore permit.
See description at http://mantis.tokeek.de/view.php?id=723
2017-01-24 01:51:28 +01:00
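One common way to avoid an unbounded wait on a semaphore during shutdown is to poll with `Semaphore.tryAcquire(timeout, unit)` and re-check a shutdown flag between attempts. The sketch below shows that general pattern; it is an assumption for illustration, not the actual IODispatcher change, and the class and method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical dispatcher that stops waiting for a permit once shutdown has been requested. */
public final class ShutdownAwareDispatcher {
    private final Semaphore permits;
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    public ShutdownAwareDispatcher(int maxConcurrentJobs) {
        this.permits = new Semaphore(maxConcurrentJobs);
    }

    /**
     * Waits for a permit in short slices so that a shutdown request is noticed.
     * Returns true if a permit was obtained, false if shutdown started first or the thread was interrupted.
     */
    public boolean acquireForJob() {
        try {
            while (!shuttingDown.get()) {
                if (permits.tryAcquire(100, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
        }
        return false;
    }

    /** Gives a permit back after a job finishes. */
    public void release() {
        permits.release();
    }

    /** Makes pending and future acquireForJob() calls return false instead of blocking. */
    public void shutdown() {
        shuttingDown.set(true);
    }
}
```

The 100 ms slice bounds how long a waiting thread can take to notice shutdown; a plain `acquire()` would instead block until some other thread released a permit, which may never happen during shutdown.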
luccioman
826e5bbadd Documented /HostBrowser.html related configuration settings 2017-01-23 16:05:51 +01:00
luccioman
9adba36754 Fixed "-UNRESOLVED_PATTERN-" admin parameter in "load & index" links. 2017-01-23 14:54:37 +01:00
luccioman
4e2bc644cb Display Index Browser links requiring auth only when authenticated.
In the /HostBrowser.html page "only hosts with urls pending in the
crawler", "only with load errors" and "Administration Options" all
require administration credentials. But they were displayed even to
unauthenticated users, and clicking them did nothing but return an empty
/HostBrowser.html page.
2017-01-23 14:49:02 +01:00
reger
e61ee180a7 Group all proxy settings on System Administration by adding settings of
UrlProxyAccss page (moved from deleted AugmentedBrowsing_p), adjust
submenu (remove Augmented Browsing) and translation files.
2017-01-22 23:58:46 +01:00
luccioman
39e081ef38 Fixed display of crawler pending URLs counts in HostBrowser.html page.
As described in mantis 722 (http://mantis.tokeek.de/view.php?id=722)

Also updated some Javadoc.
2017-01-22 12:31:14 +01:00
luccioman
870a5eae26 Removed temporary test main method committed by mistake. 2017-01-22 12:19:43 +01:00
reger
df80c57842 add ukr and pol to the DCEntry.getLanguage ISO 639-2 3-char language code
conversion to deliver the uk and pl 2-char codes,
and use if/else to return on match
2017-01-22 00:01:18 +01:00
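The conversion described here, from ISO 639-2 three-letter codes to ISO 639-1 two-letter codes with an early return on match, can be sketched as a table lookup. The helper below is hypothetical and covers only a small illustrative subset; it is not the DCEntry.getLanguage implementation.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper mapping ISO 639-2 (3-letter) codes to ISO 639-1 (2-letter) codes. */
public final class LanguageCodes {
    // Small illustrative subset, including both bibliographic (ger, fre) and terminology (deu, fra) forms
    private static final Map<String, String> ISO639_2_TO_1 = Map.of(
            "eng", "en",
            "deu", "de",
            "ger", "de",
            "fra", "fr",
            "fre", "fr",
            "ukr", "uk",
            "pol", "pl");

    /** Returns the 2-letter code, or null when the code is unknown to this table. */
    public static String toTwoLetter(String code) {
        if (code == null) {
            return null;
        }
        String normalized = code.trim().toLowerCase(Locale.ROOT);
        if (normalized.length() == 2) {
            return normalized; // already an ISO 639-1 code
        }
        return ISO639_2_TO_1.get(normalized);
    }
}
```

A map keeps each additional language to a single entry, whereas an if/else chain like the one the commit mentions makes the early return on match explicit; both give the same result for the listed codes.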
reger
8d790ab783 delete outdated and unmaintained Netbeans project
NetBeans has good built-in Maven support, which is a supported and
maintained build environment, making special additional NB settings obsolete.
2017-01-21 01:53:43 +01:00
reger
85cd19962f fix the missing deletion of solr-5.5.2.jar from the previous commit 2017-01-21 00:35:05 +01:00