Commit Graph

13108 Commits

Author SHA1 Message Date
luccioman
6f49ece22f Fixed redirected URLs processing as crawl start point.
See mantis 699 (http://mantis.tokeek.de/view.php?id=699) for details.
2016-10-20 12:12:26 +02:00
reger
68217465fe div by zero in word distance calculation
(again, description in http://mantis.tokeek.de/view.php?id=698)
as root cause was not seen, added just workaround reducing in favour over a 
try catch (for easier followup).
2016-10-19 22:55:36 +02:00
luccioman
7263d17436 Removed mentions of deprecated LURL-db.
Thanks to LA_FORGE for asking about it on the YaCy forum (
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5895 )
2016-10-19 14:56:25 +02:00
luccioman
c3c4a52408 Added more examples in Blacklist JUnit test. 2016-10-19 13:14:20 +02:00
reger
8b74a6bf57 fix min/max calculation of WordReferenceVars.distance()
Issue was the calculation in AbstractReference with positions.clear() call,
this made distance result always 0 (distance needs min 2 positions) and created concurrency issues.
+ unit test of changes
2016-10-17 23:58:28 +02:00
luccioman
da362628fb Added fine log level for too long blacklist matching processing. 2016-10-17 22:32:19 +02:00
reger
aaae7c6462 adjust ConcurrentScoreMap internal value map to interface and use parameter
Long -> Integer (saves some bytes)
2016-10-16 06:31:48 +02:00
reger
31d2a5645e remove obsolete query variable
leftover from 8fb370d9f8 (diff-1d4259005ebfddc11083387857a86175)
harmonize ranking shift parameter to 0xFF
correct addresult weight parameter to long
2016-10-15 19:29:19 +02:00
luccioman
93ea366778 Updated license header file name 2016-10-15 11:34:50 +02:00
luccioman
4c0be4d5d4 Fixed maven compilation error
Removed unit test yacysearchitemTest from default maven Junit tests
path, as yacysearchitem class is not in maven build classpath.
2016-10-15 11:34:23 +02:00
reger
ba77e8f8ec upd to Jetty 9.2.19 2016-10-15 05:23:18 +02:00
luccioman
a588ed7628 Applied image headers customization to the new ViewFavicon servlet. 2016-10-14 14:05:38 +02:00
luccioman
d16e57b41e Merge pull request #39 from luccioman/master
Favicon retrieval and image preview enhancements.
More details on mantis 629 (http://mantis.tokeek.de/view.php?id=629)
2016-10-14 12:00:39 +02:00
luccioman
7717a3d43d Fixed license headers on files created to improve favicon management. 2016-10-14 11:55:49 +02:00
luccioman
6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
	htroot/yacysearchitem.java
	source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
	source/net/yacy/search/schema/CollectionConfiguration.java
	source/net/yacy/server/serverObjects.java
2016-10-14 11:29:55 +02:00
luccioman
7136b1ad60 HTML validation : fixed URL encoding of Pictures link. 2016-10-14 09:58:14 +02:00
reger
407563b9f0 add lock symbol to messages UI Trans menu item 2016-10-14 02:36:35 +02:00
reger
685d8e86bf Avoid frequent data type casting (float/long) for rwi score
refactor to using long in URIMetadataNode too (and related call parameters)
As remote rwi scores are not used (since v1.83), skip reading float-score,
but keep in toString() for communication with older versions.
2016-10-14 01:17:34 +02:00
luccioman
3ccd89e274 Fixed MultiProtocolURL.resolveBackpath to handle remaining '..' segments 2016-10-13 16:18:24 +02:00
luccioman
f1f4459f88 Added some unit tests for Blacklist.isListed() 2016-10-13 15:39:47 +02:00
luccioman
4b699c469a Blacklist refactoring : extracted a function for easier unit testing 2016-10-13 15:33:31 +02:00
luccioman
54cfcc3f56 CrawlCheck_p.html : also display info about disallowed URLs. 2016-10-12 11:26:59 +02:00
luccioman
8b341e9818 Robots : properly handle URLs including non ASCII characters
This fixes GitHub issue 80 (
https://github.com/yacy/yacy_search_server/issues/80 ) reported by
Lord-Protector.
2016-10-12 11:25:36 +02:00
luccioman
75bb77f0cb Refactoring : extracted a method to handle authorized action links. 2016-10-12 09:31:42 +02:00
luccioman
c996b04741 HTML validation : fixed URL encoding of search results action links. 2016-10-12 09:16:47 +02:00
luccioman
2b81703828 Refactored search result action links construction.
These are long URLs with common parts : it is valuable to build the
common parts only one time.
2016-10-12 08:45:32 +02:00
reger
e68b00678e prevent negative score on URIMetadataNode - in the special case where no
solr score is supplied.
+ assert before use & test case
2016-10-11 19:54:50 +02:00
luccioman
242707f9b4 Fixed loadFromCache with strategy IFFRESH.
This fixes mantis 695 ( http://mantis.tokeek.de/view.php?id=695 ) :
crawl start with 'Link-List of URL' option on websites using cookies.
2016-10-10 01:10:35 +02:00
reger
c778219768 remove module for swfparser from maven parent pom
no longer required for the build
see a4465c97d6
2016-10-07 23:49:03 +02:00
luccioman
094aed8664 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2016-10-07 11:06:34 +02:00
luccioman
c7402a2f89 Removed invalid empty form action.
A form action URL must not be empty (see
https://www.w3.org/TR/html/sec-forms.html#element-attrdef-form-action ).
No action attribute has the same effect (relaunching the same GET
action) but is valid HTML.
2016-10-07 10:57:31 +02:00
luccioman
37df2e19fd Removed xmlns attribute which no longer makes sense in HTML5 pages. 2016-10-07 10:46:20 +02:00
luccioman
94924e288f Added some accessibility improvements to the main interface.
Tested with NVDA screen reader.
2016-10-07 10:44:45 +02:00
luccioman
dd86f7c44e Fixed HTML validation errors and grouped radios options in fieldsets 2016-10-07 10:43:06 +02:00
luccioman
fc0c72c84b Switched to the short HTML Doctype
These pages were already no longer XHTML 1.0 because they made use of the HTML5
syntax and elements.
Applied current (2016) HTML standard recommended Doctype declaration
(see https://www.w3.org/TR/html/syntax.html#the-doctype ).
2016-10-07 10:42:23 +02:00
reger
7c81160f45 correct blacklist export as text url to blacklists_p.txt
was using servlet for network access and missing network.unit.name
fix for http://mantis.tokeek.de/view.php?id=694
+ prevent unresolved_pattern in yacy/list servlet
2016-10-07 03:03:41 +02:00
reger
b752bcfecb adjust date in text detection to ignore some program version strings
like "3.1.2.0102" see http://mantis.tokeek.de/view.php?id=650
+ expand test case
2016-10-06 23:37:12 +02:00
reger
b017e97421 optimize condenser language detection a little.
langdetect probabilities take letter case into account, add words from
description and anchors etc. as is.
+ add it to javadoc
2016-10-06 19:03:52 +02:00
reger
ae3717d087 adjust Tokenizer sentence count to ignore repeated punctuation (like !!!! )
+ remove unused sentenceword map (we use only the count)
+ upd test case for sentence count
2016-10-06 03:41:07 +02:00
luccioman
b5eb7a9217 Removed unnecessary crawlingDomFilterDepth hidden field.
It had incorrect "-UNRESOLVED_PATTERN-" value (see second part of
mantis 691 http://mantis.tokeek.de/view.php?id=691 )

Note : crawlingDomFilterDepth is apparently unused in current (2016)
YaCy code-base. It was also unnecessary because crawlingDomFilterCheck
hidden field is set to "off".
2016-10-05 13:48:22 +02:00
luccioman
f6d7c6ee1f Fixed Recorded action URLs beginning displayed in /Table_API_p.html
Removed scheme, host and port from URL to avoid dealing with http/https,
external host and port retrieving issues.

What's more, this is consistent with how URL are displayed in
/Tables_p.html?table=api&count=100&reverse=on&search= or
Tables_p.xml?table=api&count=100&search=

This fixes mantis 691 first part
(http://mantis.tokeek.de/view.php?id=691)
2016-10-05 12:20:37 +02:00
reger
474f0476c6 adjust Tokenizer sentence count on trailing text after last recognized sentence
+ upd test case for rwi multi-word-query  (leaving results known to fail untested)
2016-10-05 05:52:37 +02:00
luccioman
34658ddb9b Merge pull request #76 from luccioman/crawler
Crawl monitoring : refresh running crawls table
2016-10-04 05:06:18 +02:00
luccioman
0065c9b9ea Crawl monitoring : refresh running crawls table
Fix mantis 690 ( http://mantis.tokeek.de/view.php?id=690 ). 
Tested on :
- MS Windows 10 : Edge, Firefox 49, Chrome 53
- Debian Jessie : Firefox ESR 45
2016-10-04 03:56:03 +02:00
luccioman
e1e632ad84 Switched to the short HTML Doctype
This page was already no longer XHTML 1.0 as it makes use of the HTML5
<progress> element.
Applied current HTML standard recommended Doctype declaration (see
https://www.w3.org/TR/html/syntax.html#the-doctype ).
2016-10-04 03:56:02 +02:00
luccioman
4d8611e5e7 Tables accessibility : added missing <thead> sections. 2016-10-04 03:56:02 +02:00
luccioman
9fb3142317 Restricted variables scope to function handleStatus() in Crawler.js
Missing 'var' in declaration was unnecessarily giving global scope to
these variables.
2016-10-04 03:56:02 +02:00
reger
3861ac9293 upd maven dependency-check plugin to reflect changes of https://nvd.nist.gov
+ upd unknown ant script with current lib/jsch version
2016-10-04 03:05:26 +02:00
reger
681a61dafb adjust rwi index result word position handling used for rwi ranking
- correct WordReferenceVars.toRowEntry posintext parameter
to set expected min posintext (the difference is on multi-word queries,
while positions are ordered by search word order).
- modified posofphrase/posinphrase join operation
 - to set min posofphrase
 - and keep posinphrase if not same posofphrase (was set to 0, no differentiation during ranking)
+ fix compiler msg (missing type declaration)
2016-10-04 01:42:18 +02:00
reger
14f7577231 add support for older Word versions (Word6/Word95) to docParser 2016-10-03 01:52:51 +02:00