Commit Graph

13471 Commits

Author SHA1 Message Date
reger
c94bc82f6a upd to commons-compress-1.15 2017-12-16 00:49:48 +01:00
luccioman
c6e1befbca Restored peer URL host name stripping that was removed in the previous commit.
It is still useful for peers with IPv6 addresses.
2017-12-15 17:03:35 +01:00
luccioman
17e004599d Started implementing optional https preference for protocol operations
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Lets users choose to prefer https, when available on remote peers, to
perform YaCy protocol operations, notably including hello and transferRWI.

Not yet implemented for all YaCy protocol operations.
2017-12-15 11:28:46 +01:00
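The https preference described in this commit can be sketched as a simple scheme choice. This is a minimal Java sketch, not YaCy's implementation: the class `ProtocolSchemeChooser`, the servlet path `/yacy/hello.html` and the port values are illustrative assumptions; only the setting name `network.unit.protocol.https.preferred` comes from the commit message.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not YaCy's code): chooses the URL scheme for a YaCy
 * protocol request from the network.unit.protocol.https.preferred setting
 * and from whether the remote peer is known to accept SSL connections.
 */
final class ProtocolSchemeChooser {

    private ProtocolSchemeChooser() {
    }

    /** Returns "https" only when it is preferred AND available on the peer, else "http". */
    static String chooseScheme(boolean httpsPreferred, boolean peerSupportsHttps) {
        return (httpsPreferred && peerSupportsHttps) ? "https" : "http";
    }

    /** Builds the URL of a protocol servlet; path and ports are example values. */
    static String protocolUrl(boolean httpsPreferred, boolean peerSupportsHttps,
            String host, int httpPort, int httpsPort, String path) {
        Objects.requireNonNull(host, "host");
        String scheme = chooseScheme(httpsPreferred, peerSupportsHttps);
        int port = scheme.equals("https") ? httpsPort : httpPort;
        return scheme + "://" + host + ":" + port + path;
    }

    public static void main(String[] args) {
        // Setting left at its default (false): plain http, even if the peer has SSL.
        System.out.println(protocolUrl(false, true, "peer.example", 8090, 8443, "/yacy/hello.html"));
        // Setting enabled and SSL available on the peer: https.
        System.out.println(protocolUrl(true, true, "peer.example", 8090, 8443, "/yacy/hello.html"));
    }
}
```

Falling back to http whenever https is not both preferred and available keeps the default behaviour unchanged; a fuller implementation might also retry over http when an https connection fails.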
luccioman
2bc61f5657
Merge pull request #149 from Scre13/bugfix_default_settings
Fixed loading default thread load setting in Performance Settings of Queues and Processes.
2017-12-13 07:38:04 +01:00
ScRe13
bb3d3fe074 fixed loading of default settings; load was populated with the wrong value 2017-12-12 23:25:56 +01:00
reger
20bba135fe Show either the hide or the show public surftip button depending on the current config status,
so that only the button switching to the other status is displayed (the button of the current status is hidden)
2017-12-10 01:25:20 +01:00
Michael Peter Christen
b907819cb4 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2017-12-09 22:29:54 +01:00
Michael Peter Christen
25573bd5ab added a crawl filter based on <div> tag class names
When starting a crawl, a new field is available to exclude content from
scraping. Content is identified by the class names of div tags: all text
contained in a div tag whose class name matches one of the configured class
names is not indexed, while the rest of the page is indexed.
2017-12-09 22:29:35 +01:00
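A minimal sketch of how such a div class filter could decide what to index, assuming a scraper that reports each opening and closing `<div>` tag. `DivClassFilter` and its methods are hypothetical names, not YaCy's API. Class attributes are split on whitespace because HTML allows several class names per element, and a depth counter keeps text in nested divs excluded as well.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: tracks whether the scraper is currently inside a
 * <div> whose class attribute contains one of the excluded class names.
 */
final class DivClassFilter {

    private final Set<String> excludedClassNames;
    /** One entry per currently open <div>: true when that div matched. */
    private final Deque<Boolean> openDivs = new ArrayDeque<>();
    /** Number of currently open divs that matched an excluded class name. */
    private int excludedDepth = 0;

    DivClassFilter(Collection<String> excludedClassNames) {
        this.excludedClassNames = new HashSet<>(excludedClassNames);
    }

    /** The class attribute is a whitespace-separated list of class names. */
    boolean matches(String classAttribute) {
        if (classAttribute == null || classAttribute.trim().isEmpty()) {
            return false;
        }
        return Arrays.stream(classAttribute.trim().split("\\s+"))
                .anyMatch(excludedClassNames::contains);
    }

    /** Called for each opening <div> tag with its class attribute (may be null). */
    void openDiv(String classAttribute) {
        boolean matched = matches(classAttribute);
        openDivs.push(matched);
        if (matched) {
            excludedDepth++;
        }
    }

    /** Called for each closing </div> tag; unbalanced closes are ignored. */
    void closeDiv() {
        if (!openDivs.isEmpty() && openDivs.pop()) {
            excludedDepth--;
        }
    }

    /** Text is indexed only when no enclosing div matched. */
    boolean shouldIndexText() {
        return excludedDepth == 0;
    }
}
```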
luccioman
640fed2a9c Removed Java 1.8 version checking, which is no longer necessary (fixes issue #147)
Java 1.8 is now a prerequisite anyway to run from the latest sources.
2017-12-08 15:26:46 +01:00
luccioman
d95b288f19 Removed use of deprecated Jetty IPAccessHandler for client filtering.
Upgraded to InetAccessHandler.
Added InetPathAccessHandler, an extension of InetAccessHandler, to keep
the path pattern capability previously available in IPAccessHandler but
lost in InetAccessHandler.

Filtering on IPv6 addresses is now supported.

Support for deprecated pattern formats such as "192.168." and
"192.168.1.1/path" has been removed, but automated migration at startup
should convert any such patterns present in serverClient.
2017-12-08 15:12:08 +01:00
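Two ideas from this commit, converting a legacy prefix such as "192.168." to CIDR notation and matching IPv4 as well as IPv6 addresses against an address block, can be sketched with the JDK alone. The helper names below are hypothetical; YaCy's actual migration code and Jetty's InetAccessHandler pattern syntax may differ, and path patterns are not covered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helpers illustrating legacy pattern migration and CIDR matching. */
final class ClientPatternSketch {

    private ClientPatternSketch() {
    }

    /**
     * Converts a legacy IPv4 prefix pattern such as "192.168." to CIDR notation
     * ("192.168.0.0/16"). Patterns not ending with '.' are returned unchanged.
     */
    static String legacyPrefixToCidr(String pattern) {
        if (!pattern.endsWith(".")) {
            return pattern;
        }
        // limit -1 keeps empty parts, so "192.." is rejected instead of silently accepted
        String[] octets = pattern.substring(0, pattern.length() - 1).split("\\.", -1);
        if (octets.length > 3) {
            throw new IllegalArgumentException("Not a legacy prefix pattern: " + pattern);
        }
        StringBuilder cidr = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            if (i > 0) {
                cidr.append('.');
            }
            // parseInt throws NumberFormatException (an IllegalArgumentException) on empty parts
            int value = i < octets.length ? Integer.parseInt(octets[i]) : 0;
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Invalid octet in: " + pattern);
            }
            cidr.append(value);
        }
        return cidr.append('/').append(octets.length * 8).toString();
    }

    /** True when the literal IPv4 or IPv6 address lies inside the CIDR block. */
    static boolean cidrContains(String cidr, String address) {
        int slash = cidr.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("Missing prefix length: " + cidr);
        }
        int prefixLength = Integer.parseInt(cidr.substring(slash + 1));
        byte[] network = toBytes(cidr.substring(0, slash));
        if (prefixLength < 0 || prefixLength > network.length * 8) {
            throw new IllegalArgumentException("Invalid prefix length: " + cidr);
        }
        byte[] candidate = toBytes(address);
        if (network.length != candidate.length) {
            return false; // IPv4 block vs IPv6 address (or the reverse): no match
        }
        for (int bit = 0; bit < prefixLength; bit++) {
            int mask = 0x80 >>> (bit % 8);
            if ((network[bit / 8] & mask) != (candidate[bit / 8] & mask)) {
                return false;
            }
        }
        return true;
    }

    /** Callers must pass IP literals: getByName performs no DNS lookup for literals. */
    private static byte[] toBytes(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not an IP literal: " + literal, e);
        }
    }
}
```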
reger
cc7a93e6b6 remove deprecated jetty continuation class from urlproxyservlet
(it was a long-time carry-over, although async requests were not supported)
2017-12-08 01:01:07 +01:00
Michael Peter Christen
607b39b427 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
	htroot/yacysearchitem.java
2017-12-07 15:25:41 +01:00
Michael Peter Christen
4355de0f3c (more!) evaluation of XRealIP from nginx reverse proxy 2017-12-07 15:16:11 +01:00
reger
e5b4799838 upd to Jetty-9.4.8.v20171121 2017-12-07 00:24:33 +01:00
luccioman
f9cba827c0 Made "tld:" modifier case-insensitive and IDN compliant.
This allows typing internationalized top-level domains with non-ASCII
characters in the tld: modifier.
2017-12-04 19:13:16 +01:00
luccioman
a4494d6e01 Improved support for internationalized domain names on "site:" modifier
Allows typing internationalized domain names, including non-ASCII
characters, directly in the search field.
Search is done using the ASCII Compatible Encoding (ACE) representation.
2017-12-04 18:23:26 +01:00
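A sketch of the normalization these two commits describe, using the JDK's `java.net.IDN` to obtain the ASCII Compatible Encoding. The class and method names are hypothetical; only the idea of lower-casing with a locale-neutral rule and searching on the ACE form comes from the commit messages.

```java
import java.net.IDN;
import java.util.Locale;

/** Hypothetical sketch of "site:" and "tld:" modifier normalization. */
final class IdnModifierSketch {

    private IdnModifierSketch() {
    }

    /** Lower-cases explicitly (locale-neutral) so ASCII labels are normalized too, then converts to ACE. */
    static String normalizeSite(String site) {
        return IDN.toASCII(site.trim().toLowerCase(Locale.ROOT));
    }

    /** Same normalization for a top-level domain, accepting ".de" as well as "de". */
    static String normalizeTld(String tld) {
        String value = tld.trim();
        if (value.startsWith(".")) {
            value = value.substring(1);
        }
        return IDN.toASCII(value.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // "Bücher.DE" written with a Unicode escape to avoid source-encoding issues
        System.out.println(normalizeSite("B\u00fccher.DE")); // xn--bcher-kva.de
        // ".РФ" (Cyrillic capital ER, EF), the Russian country-code TLD
        System.out.println(normalizeTld(".\u0420\u0424"));   // xn--p1ai
    }
}
```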
luccioman
d07006bac4 Do locale-independent case conversion on "filetype:" query modifier. 2017-12-04 14:11:29 +01:00
luccioman
8fbf25d1ed Made "site:" query modifier case insensitive. 2017-12-04 14:08:34 +01:00
luccioman
867388e05b Refactored 'site:' query modifier parsing into a dedicated function. 2017-12-04 13:58:15 +01:00
luccioman
c5c3cc1274 Use HTTP POST method for resetting memory monitoring state.
Fixes issue #145

Also added a textual hint on the button, and display it only when relevant,
that is, when the memory state is 'exhausted'.
2017-12-04 08:48:37 +01:00
reger
0704b1d644 upd to httpcore-4.4.8 2017-12-04 01:12:50 +01:00
luccioman
bfe753acea
Merge pull request #144 from him2him2/_fic_HTTPS
Update HTTP -> HTTPS in README.md
2017-12-02 08:45:42 +01:00
luccioman
c9d80b5b77 Prefer fine URL match over approximate URL mask regex on final filtering
Also prevent adding a redundant and CPU costly Solr url mask filter
query when possible
2017-12-01 11:52:52 +01:00
luccioman
0a120787e3 Improved accuracy of URL search filters: protocol, TLD, host, file extension 2017-12-01 11:19:31 +01:00
luccioman
d1c7dfd852 Fixed URL parsing with fragment and empty path 2017-12-01 09:48:42 +01:00
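To illustrate the edge case, a hypothetical splitter (not the actual MultiProtocolURL code) must end the authority at the first '/', '?' or '#'; otherwise a URL such as `http://example.org#top` yields the host `example.org#top`. This sketch ignores user info and ports inside the authority, and normalizing an empty path to "/" is an assumption about the desired behaviour.

```java
import java.util.Locale;

/** Hypothetical sketch: splits an absolute URL, handling a fragment after an empty path. */
final class UrlPartsSketch {

    final String scheme;
    final String authority;
    final String path;
    final String fragment; // null when the URL has no fragment

    private UrlPartsSketch(String scheme, String authority, String path, String fragment) {
        this.scheme = scheme;
        this.authority = authority;
        this.path = path;
        this.fragment = fragment;
    }

    static UrlPartsSketch parse(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd <= 0) {
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        String scheme = url.substring(0, schemeEnd).toLowerCase(Locale.ROOT);
        int authorityStart = schemeEnd + 3;
        // The authority ends at the first '/', '?' or '#', not only at '/'.
        int authorityEnd = url.length();
        for (char delimiter : new char[] {'/', '?', '#'}) {
            int index = url.indexOf(delimiter, authorityStart);
            if (index >= 0 && index < authorityEnd) {
                authorityEnd = index;
            }
        }
        String authority = url.substring(authorityStart, authorityEnd);
        String rest = url.substring(authorityEnd);
        String fragment = null;
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1);
            rest = rest.substring(0, hash);
        }
        int query = rest.indexOf('?');
        String path = query >= 0 ? rest.substring(0, query) : rest;
        if (path.isEmpty()) {
            path = "/"; // empty path normalized to the root path
        }
        return new UrlPartsSketch(scheme, authority, path, fragment);
    }
}
```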
luccioman
e07ef1b610 Apply tld query modifier on the mandatory Solr host_s field.
The filter is thus much more likely to be effective than when applied
to the optional host_dnc_s field.
2017-12-01 08:46:46 +01:00
luccioman
478e92deff Fixed url mask filter generated when protocol modifier is not null 2017-11-30 20:21:45 +01:00
luccioman
29de4a65d7 Refactored URL mask filter building from query modifiers
for better readability and easier unit testing.
2017-11-30 09:20:32 +01:00
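A sketch of building a URL mask regular expression from protocol, host, tld and file extension modifiers. `UrlMaskSketch.build` is hypothetical and its exact regex is an assumption, not the expression YaCy generates; it only shows how each modifier constrains a distinct part of the URL, which also makes the builder easy to unit test.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical sketch: builds a URL mask regex from query modifiers. */
final class UrlMaskSketch {

    private UrlMaskSketch() {
    }

    /** Each argument may be null when the corresponding modifier is absent; host wins over tld. */
    static String build(String protocol, String host, String tld, String fileExt) {
        StringBuilder mask = new StringBuilder();
        // Scheme: the exact protocol, or any scheme
        mask.append(protocol == null ? "[a-z][a-z0-9+.-]*"
                : Pattern.quote(protocol.toLowerCase(Locale.ROOT)));
        mask.append("://");
        // Host: the exact host, any host ending with the TLD, or any host
        if (host != null) {
            mask.append(Pattern.quote(host.toLowerCase(Locale.ROOT)));
        } else if (tld != null) {
            mask.append("[^/?#]*\\.").append(Pattern.quote(tld.toLowerCase(Locale.ROOT)));
        } else {
            mask.append("[^/?#]*");
        }
        mask.append("(:[0-9]+)?"); // optional port
        // Path: must end with ".ext" before any query or fragment, or be unconstrained
        if (fileExt != null) {
            mask.append("/[^?#]*\\.").append(Pattern.quote(fileExt.toLowerCase(Locale.ROOT)))
                    .append("([?#].*)?");
        } else {
            mask.append("([/?#].*)?");
        }
        return mask.toString();
    }

    /** Expects URLs with a lower-case scheme and host. */
    static boolean matches(String mask, String url) {
        return Pattern.matches(mask, url);
    }
}
```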
reger
a1879115dc upd to Jsoup-1.11.2 2017-11-26 22:01:42 +01:00
reger
d5a75537e4 remove redundant setting of timeout for remoteinstance
and replace deprecated updatesolrclient instantiation with the recommended builder
2017-11-26 02:53:51 +01:00
luccioman
f01aac31fd Made it possible to use https for remote search on peers with SSL enabled.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches.
The new configuration setting 'remotesearch.https.preferred' can be edited
manually in the yacy.conf file or in the Advanced Properties page
(/ConfigProperties_p.html).
It should be enabled by default in the future for improved privacy.
Https could later also be used for other peer communications.
2017-11-24 14:10:41 +01:00
Ronald Eddy Jr
97dff48abf Update HTTP -> HTTPS in README.md
URLs were updated to use HTTPS protocol in README.md.
2017-11-23 00:54:36 -08:00
luccioman
01dca12d05 Upgraded apache POI dependency from 3.16 to 3.17 2017-11-22 09:07:36 +01:00
luccioman
e2f6427a63 Added a basic JUnit test for the Visio parser (vsdParser) 2017-11-22 09:06:16 +01:00
luccioman
1e9cdaabd4 Do locale neutral case conversion of HTML charset name.
Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
2017-11-20 18:52:45 +01:00
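The Turkish locale pitfall mentioned in these commits is easy to reproduce with the JDK. The sketch below contrasts `String.toLowerCase(Locale)` with a Turkish locale and with `Locale.ROOT`; `normalizeCharsetName` is a hypothetical helper name, not YaCy's API.

```java
import java.util.Locale;

/** Sketch: locale-sensitive vs locale-neutral case conversion of a charset name. */
final class LocaleNeutralCaseSketch {

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    private LocaleNeutralCaseSketch() {
    }

    /** Locale-neutral normalization, e.g. "ISO-8859-9" becomes "iso-8859-9". */
    static String normalizeCharsetName(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String charset = "ISO-8859-9";
        // Turkish rules map 'I' to dotless 'ı' (U+0131): yields "ıso-8859-9"
        System.out.println(charset.toLowerCase(TURKISH));
        // Locale.ROOT yields "iso-8859-9" whatever the default locale is
        System.out.println(normalizeCharsetName(charset));
    }
}
```

The reverse direction is affected too: with Turkish rules, "windows-1254" upper-cases to "WİNDOWS-1254" with a dotted capital İ (U+0130).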
luccioman
d41ad7af6f Restore initial locale at the end of a JUnit test case which modifies it. 2017-11-20 18:50:49 +01:00
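A minimal, framework-free sketch of the save-and-restore pattern, so that a test changing the default locale cannot affect tests run after it. `DefaultLocaleGuard` is a hypothetical name; in a JUnit 4 test the restore would typically go in an `@After` method instead.

```java
import java.util.Locale;

/** Hypothetical helper: runs code under a temporary default locale, then restores the initial one. */
final class DefaultLocaleGuard {

    private DefaultLocaleGuard() {
    }

    static void withDefaultLocale(Locale locale, Runnable body) {
        Locale initial = Locale.getDefault();
        Locale initialDisplay = Locale.getDefault(Locale.Category.DISPLAY);
        Locale initialFormat = Locale.getDefault(Locale.Category.FORMAT);
        try {
            Locale.setDefault(locale);
            body.run();
        } finally {
            // setDefault(Locale) also overwrites the category defaults, so restore them afterwards
            Locale.setDefault(initial);
            Locale.setDefault(Locale.Category.DISPLAY, initialDisplay);
            Locale.setDefault(Locale.Category.FORMAT, initialFormat);
        }
    }
}
```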
luccioman
7206f1ed71 Do locale neutral case conversions on domain names.
Required to properly run on systems with default locale set to Turkish
language, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
2017-11-20 18:47:46 +01:00
luccioman
398c66f06c Do locale neutral case conversions in MultiProtocolURL
For any relevant URL parts : host name, URL scheme, session ids or
technical parts (see https://url.spec.whatwg.org/#url-writing and
https://tools.ietf.org/html/rfc3986 for current standard references).

The remaining locale-sensitive conversion, used for detection of URL word
components in urlComps(), makes sense, but using the detected language would
be preferable to using the default system locale.
2017-11-20 15:23:33 +01:00
luccioman
9531b83598 Do locale neutral case conversions in Classification
Required for people using Turkish language as their default system
locale, as with this locale the 'i' character has different upper and
lower case flavors than with other locales.
2017-11-20 09:48:46 +01:00
luccioman
bab5f0485f Added signing key to developer releases location. 2017-11-17 11:09:55 +01:00
luccioman
d22fc0d0a2 Updated lists of known sponsored and country-code TLDs.
Using current IANA reference list at
https://www.iana.org/domains/root/db .

As with the previous update of the known generic TLDs list, the URL
hashes generated for these domains stay the same, but URL hash computation
is faster for URLs on these domains.
2017-11-16 09:50:55 +01:00
luccioman
ac209cac2e Updated the list of known generic top-level domains.
Using current IANA reference list at
https://www.iana.org/domains/root/db

The generated URL hashes on these domains stay the same, but performance
is greatly improved, because URL hash computation requires a DNS resolve
request when the TLD part of the host name is unknown.

Mean hash computation time measured on 1541 sample URLs (one per
TLD) on a computer with a DSL connection: about 230 ms before the change,
only 20 ms after.
2017-11-14 09:42:09 +01:00
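The performance gain comes from a fast in-memory path for known TLDs. A sketch under stated assumptions: `TldRegionSketch`, the region labels, and the `slowResolver` function standing in for the DNS-based computation are all hypothetical; the counter only makes the avoided slow lookups visible.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: known TLDs are classified in memory, unknown ones need a slow lookup. */
final class TldRegionSketch {

    private final Map<String, String> regionByKnownTld;
    private final Function<String, String> slowResolver; // stands in for a DNS-based fallback
    private int slowLookups = 0;

    TldRegionSketch(Map<String, String> regionByKnownTld, Function<String, String> slowResolver) {
        this.regionByKnownTld = new HashMap<>(regionByKnownTld);
        this.slowResolver = slowResolver;
    }

    String regionOf(String host) {
        String normalized = host.toLowerCase(Locale.ROOT);
        int lastDot = normalized.lastIndexOf('.');
        String tld = lastDot >= 0 ? normalized.substring(lastDot + 1) : normalized;
        String region = regionByKnownTld.get(tld);
        if (region != null) {
            return region; // fast path: in-memory lookup, no network access
        }
        slowLookups++;
        return slowResolver.apply(normalized); // slow path, e.g. a DNS resolve
    }

    int slowLookups() {
        return slowLookups;
    }
}
```

With the figures in the commit message, going from about 230 ms to 20 ms per hash is roughly an 11.5-fold speedup (230 / 20 = 11.5).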
luccioman
938d8a9731 Added some JavaDoc 2017-11-14 09:24:13 +01:00
luccioman
c32ac9c4c7 Updated log path in informative message of stop script.
As highlighted by @Lew-Rockwell-Fan in issue #140, the two log paths
mentioned by the stopYACY.sh script were inconsistent.
2017-11-14 09:17:43 +01:00
luccioman
8f07df5f85 Upgraded com.twelvemonkeys.imageio dependencies from 3.3.1 to 3.3.2 2017-11-09 09:30:20 +01:00
luccioman
fcd57e2d0f Improved some JUnit tests isolation and resources release
The modified tests were successful when run manually from an IDE such
as Eclipse, but occasionally failed when run with Maven as part of the
overall test suite.
2017-11-08 09:33:30 +01:00
luccioman
e0eda84c24 Remove old hard-coded holiday dates from DateDetection class.
Replaced them with rules relative to the current year, as was already done
for some of the supported dates.
2017-11-07 19:02:09 +01:00
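Rules relative to the requested year can be expressed with `java.time`. A sketch with three kinds of rule: a fixed date, a weekday rule, and dates derived from Easter Sunday computed with the anonymous Gregorian (Meeus/Jones/Butcher) algorithm. Which holidays the date detection class actually supports is not stated in this log; the class name and the rules below are hypothetical examples.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.TemporalAdjusters;

/** Hypothetical sketch: holiday dates computed for any year instead of hard-coded. */
final class HolidayRulesSketch {

    private HolidayRulesSketch() {
    }

    /** Fixed-date rule. */
    static LocalDate christmas(int year) {
        return LocalDate.of(year, Month.DECEMBER, 25);
    }

    /** Weekday rule, e.g. the fourth Thursday of November. */
    static LocalDate fourthThursdayOfNovember(int year) {
        return LocalDate.of(year, Month.NOVEMBER, 1)
                .with(TemporalAdjusters.dayOfWeekInMonth(4, DayOfWeek.THURSDAY));
    }

    /** Gregorian Easter Sunday (anonymous Gregorian algorithm). */
    static LocalDate easterSunday(int year) {
        int a = year % 19;
        int b = year / 100;
        int c = year % 100;
        int d = b / 4;
        int e = b % 4;
        int f = (b + 8) / 25;
        int g = (b - f + 1) / 3;
        int h = (19 * a + b - d - g + 15) % 30;
        int i = c / 4;
        int k = c % 4;
        int l = (32 + 2 * e + 2 * i - h - k) % 7;
        int m = (a + 11 * h + 22 * l) / 451;
        int month = (h + l - 7 * m + 114) / 31;
        int day = ((h + l - 7 * m + 114) % 31) + 1;
        return LocalDate.of(year, month, day);
    }

    /** Easter-relative rule. */
    static LocalDate goodFriday(int year) {
        return easterSunday(year).minusDays(2);
    }

    public static void main(String[] args) {
        int currentYear = LocalDate.now().getYear();
        System.out.println("Easter Sunday " + currentYear + ": " + easterSunday(currentYear));
    }
}
```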
luccioman
f61260c4c7 Upgraded icu4j dependency from 59_1 to 60.1 2017-11-06 09:37:44 +01:00
luccioman
73977ec0fe Added an HTML parser charset detection unit test 2017-11-06 09:14:03 +01:00
reger
d14c47d4d3 upd to pdfbox-2.0.8.jar 2017-11-05 00:52:14 +01:00