Commit Graph

418 Commits

Author SHA1 Message Date
luccioman
e6907fdab3 Added optional search parameter/setting to control content domain filter
This allows choosing, in the configuration or per search request, whether
to extend results beyond the strict content domain filter (image,
video, audio or application).

Related graphical controls to be added to user interface.
2017-12-23 18:56:17 +01:00
luccioman
09c4ee56a7 Added optional https support for remote crawl and profile operations 2017-12-21 18:41:32 +01:00
luccioman
5db1c9155a Do locale independent case conversion on hosts, schemes, and file exts.
Required for proper operation when the default system locale is Turkish,
as dotless and dotted i characters have specific case conversion rules
in this language.
2017-12-19 13:52:05 +01:00
luccioman
1c4803e40a Enable optional https support for /yacy/transferURL API calls.
Also updated some Javadoc and consistently use Switchboard instance as a
constructor parameter where relevant.
2017-12-19 12:30:49 +01:00
luccioman
c6e1befbca Restored peer URL host name stripping removed from previous commit.
Still useful for peers with IPv6 addresses.
2017-12-15 17:03:35 +01:00
luccioman
17e004599d Started implementing optional https preference for protocol operations
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Allows preferring https, when available on remote peers, to
perform YaCy protocol operations, notably hello or transferRWI.

Not yet implemented for every YaCy protocol operation.
2017-12-15 11:28:46 +01:00
luccioman
f01aac31fd Made it possible to use https for remote search on peers with SSL enabled.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches.
New configuration setting 'remotesearch.https.preferred' is manually
editable in yacy.conf file or in Advanced Properties page
(/ConfigProperties_p.html).
Should be enabled as default in the future for improved privacy.
Https could also eventually be used for other peer communications.
2017-11-24 14:10:41 +01:00
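A minimal sketch of how a setting such as 'remotesearch.https.preferred' could drive the scheme choice. Only the setting name and its http default come from the commit message; the class name, the chooseScheme helper and the peerHasSsl flag are hypothetical and are not taken from the YaCy code base.

```java
import java.util.Properties;

final class RemoteSearchScheme {

    /** Setting name quoted in the commit message. */
    static final String HTTPS_PREFERRED = "remotesearch.https.preferred";

    /**
     * Hypothetical helper: use https only when the setting is enabled
     * and the remote peer is known to have SSL enabled.
     */
    static String chooseScheme(final Properties conf, final boolean peerHasSsl) {
        final boolean preferHttps =
                Boolean.parseBoolean(conf.getProperty(HTTPS_PREFERRED, "false"));
        return (preferHttps && peerHasSsl) ? "https" : "http";
    }

    public static void main(final String[] args) {
        final Properties conf = new Properties();
        System.out.println(chooseScheme(conf, true));   // setting absent: http
        conf.setProperty(HTTPS_PREFERRED, "true");
        System.out.println(chooseScheme(conf, true));   // https
        System.out.println(chooseScheme(conf, false));  // peer without SSL: http
    }
}
```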
luccioman
1de86cf1bf Fixed JPEG snapshot resizing when running on OpenJDK.
Resizing JPEG snapshot images through /api/snapshot.jpg failed when
running on OpenJDK, but rendered successfully with an Oracle JDK.
Details in mantis 772 ( http://mantis.tokeek.de/view.php?id=772 ).

Removing any alpha component (useless in snapshot images) from the
rendered resized image solves the issue.
2017-10-19 09:27:52 +02:00
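The alpha removal mentioned in 1de86cf1bf can be sketched as follows, assuming the resized image is available as a java.awt.image.BufferedImage. The class and method names are illustrative and this is not the code from the commit: the image is simply redrawn onto an opaque TYPE_INT_RGB canvas before JPEG encoding.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class OpaqueSnapshot {

    /** Returns a copy of the source image without any alpha component. */
    static BufferedImage dropAlpha(final BufferedImage source) {
        final BufferedImage opaque = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        final Graphics2D g = opaque.createGraphics();
        try {
            // Fill with white first so transparent pixels do not turn black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, opaque.getWidth(), opaque.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return opaque;
    }

    public static void main(final String[] args) {
        final BufferedImage withAlpha = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        final BufferedImage opaque = dropAlpha(withAlpha);
        System.out.println(opaque.getColorModel().hasAlpha()); // false
    }
}
```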
luccioman
fe75f326d8 Fixed ProfilingGraph calculation integer overflows and added test class.
Complementary to fix proposed in PR #128 by @otteresk.
2017-10-16 09:18:12 +02:00
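The kind of integer overflow addressed by fe75f326d8 can be illustrated with a generic scaling calculation. This is an assumed example, not the actual ProfilingGraph code: the method names and values are made up to show how an int product wraps around and how widening to long avoids it.

```java
final class ScaleOverflow {

    /** Naive int arithmetic: value * height may exceed Integer.MAX_VALUE and wrap. */
    static int scaleNaive(final int value, final int height, final int max) {
        return value * height / max;
    }

    /** Widening to long before multiplying keeps the intermediate product exact. */
    static int scaleSafe(final int value, final int height, final int max) {
        return (int) ((long) value * height / max);
    }

    public static void main(final String[] args) {
        final int value = 50_000_000;
        final int height = 100;
        final int max = 100_000_000;
        System.out.println(scaleSafe(value, height, max));  // 50
        System.out.println(scaleNaive(value, height, max)); // wrong result caused by wrap-around
    }
}
```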
luccioman
5d3ceb31b7 Improved search navigators counters accuracy and consistency.
- added some missing increments from RWI results
- decrement relevant navigator counts when solr or RWI results are
evicted because of duplicate detection or constraints checked belatedly
- do not compute facets when unnecessary to avoid unwanted CPU load
- do not increment from facets when already done
- do not rely on facets for requests to remote solr peers, as most of the
time only a limited part of their total results is fetched (thus also
preventing unnecessary load on remote peers)
- use a concurrency friendly score map for the dates navigators to
prevent unwanted ConcurrentModificationExceptions

This improves the situation for the most obvious inconsistencies in
search navigator counts, but more has to be done for true accuracy
(notably when query modifier constraints are applied belatedly - after
the solr or RWI retrieval request - such as the content domain
constraint)
2017-09-06 16:58:40 +02:00
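The "concurrency friendly score map" item of 5d3ceb31b7 can be illustrated with plain JDK classes. The sketch below is an assumption about the general technique, not the class used in YaCy: counters live in a ConcurrentHashMap, so increments and decrements from several result-feeding threads are atomic and iteration does not throw ConcurrentModificationException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class DateCounts {

    private final Map<String, Integer> scores = new ConcurrentHashMap<>();

    /** Atomically adds one to the counter of the given key. */
    void inc(final String key) {
        scores.merge(key, 1, Integer::sum);
    }

    /** Atomically subtracts one; the entry is removed when it reaches zero. */
    void dec(final String key) {
        scores.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
    }

    int get(final String key) {
        return scores.getOrDefault(key, 0);
    }

    public static void main(final String[] args) throws InterruptedException {
        final DateCounts counts = new DateCounts();
        final Runnable feeder = () -> {
            for (int i = 0; i < 1000; i++) {
                counts.inc("2017-09-06");
            }
        };
        final Thread a = new Thread(feeder, "feeder-a");
        final Thread b = new Thread(feeder, "feeder-b");
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counts.get("2017-09-06")); // 2000
    }
}
```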
luccioman
4eba88f2ff Removed some unnecessary uses of java.lang.reflect api.
This improves code browsing and readability, making search by references
or call hierarchy IDE features more accurate.
2017-08-24 18:47:18 +02:00
reger
4979439e87 Skip public post of jre version.
Added to determine switch to java8 (596b5dfa59)
2017-08-06 23:41:53 +02:00
reger
d1b23afed6 Remove obsolete Protocol parameter ttl (time to live)
not interpreted in target yacy/query.html
also Protocol.querySeed() not used and parameter not interpreted in 
target servlet yacy/query.html
2017-08-01 00:59:53 +02:00
reger
15d78b1064 Replace deprecated getIP with getIPs in Protocol transferURL() and
getProfile().
Remember used ip for error handling and departInterface
2017-07-31 01:55:01 +02:00
reger
ed36b47bec Replace one more deprecated peerDeparture in Protocol.transferIndex()
by moving/using interfaceDeparture() in transferRWI()
2017-07-30 23:02:15 +02:00
luccioman
dcc56318bb Made remote search max system load limits configurable from UI.
As reported by davide on YaCy forums (
http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6004 ), when the
system is under high load, it can be difficult to understand why remote
search results are not fetched without carefully reading the YaCy
configuration file.
2017-06-30 11:30:54 +02:00
luccioman
8da3174867 Ensure lower case conversion consistency with any default locale.
Especially for Turkish-speaking users using "tr" as their system default
locale: strings for technical stuff (URLs, tag names, constants...)
must not be lower cased with the default locale, as 'I' doesn't become
'i' like in other locales such as "en", but becomes 'ı'.
2017-06-27 06:42:33 +02:00
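The behaviour described in 8da3174867 can be reproduced with standard JDK calls. The class and helper names below are illustrative; the sketch only shows that lower-casing with an explicit Locale.ROOT gives the same result whatever the system default locale is.

```java
import java.util.Locale;

final class LocaleSafeCase {

    /** Lower-cases technical strings (schemes, hosts, tag names) independently of the default locale. */
    static String lowerTechnical(final String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final Locale turkish = Locale.forLanguageTag("tr");
        // With Turkish rules 'I' becomes the dotless i (U+0131), so "FILE" does not become "file".
        System.out.println("file".equals("FILE".toLowerCase(turkish))); // false
        System.out.println("file".equals(lowerTechnical("FILE")));      // true
    }
}
```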
luccioman
8399275142 Properly close file output streams even in exception scenarios. 2017-06-08 07:19:16 +02:00
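One common way to guarantee what 8399275142 describes is a try-with-resources block. The sketch below is a generic illustration with made-up names, not the code changed by the commit: the stream is closed whether the write succeeds or throws.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

final class SafeWrite {

    /** Writes the data to the target file; the stream is closed even when write() throws. */
    static void write(final File target, final byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            out.write(data);
        }
    }

    public static void main(final String[] args) throws IOException {
        final File tmp = File.createTempFile("safewrite", ".bin");
        try {
            write(tmp, new byte[] {1, 2, 3});
            System.out.println(Files.size(tmp.toPath())); // 3
        } finally {
            Files.deleteIfExists(tmp.toPath());
        }
    }
}
```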
Michael Peter Christen
7de7879f13 added a cache to prevent too many seed enumerations 2017-05-18 00:28:00 +02:00
luccioman
0f0f42b509 Added some JavaDoc 2017-05-11 08:33:19 +02:00
reger
a2afb4bae0 add switchboardconstants for server ports config keys 2017-03-18 20:02:26 +01:00
luccioman
1857651988 Added a new Debug/Analysis advanced settings subsection.
As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
 - a new setting to control remote Solr responses encoding
 - some existing debug settings which could not be set through the admin
user interface
2017-02-09 11:05:06 +01:00
luccioman
def55ec166 Improved termination of timed out remote solr requests to peers.
On timeout, closing remote Solr requests is more appropriate than simply
using Thread.interrupt(), which is not effective in most cases. Closing
does not request a commit on the remote solr, but releases http connection
resources and is more likely to end threads that could otherwise wait
indefinitely.

Other related improvements included:
 - no longer marking the remote peer as not available when remote search is
interrupted before timeout by the cleanup job.
 - added a short fine log level trace of failing remote solr requests
2017-02-06 12:41:24 +01:00
luccioman
e048e74072 Added an optional parameter to webstructure.xml api.
This new "documentStructure" parameter can be set to false to only get
hosts accumulated references on a resource and thus prevent scraping the
specified URL and getting citations references.

Also set WebStructureGraph constants as final and updated the Javadoc
with example api call URLs.
2017-01-19 12:30:44 +01:00
luccioman
5c8958bcea Updated Javadoc and Junit tests for the WebStructureGraph class. 2017-01-17 17:01:56 +01:00
luccioman
d9766ca981 Fixed WatchWebStructure_p.html render to include https URLs.
As described in mantis 721 (http://mantis.tokeek.de/view.php?id=721),
WatchWebStructure_p.html failed to include https and other protocols and
ports than default http in its structure view.
2017-01-16 18:41:58 +01:00
luccioman
ed3dd5e31a Fixed webstructure.xml API used with a domain name 'about' parameter.
As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720),
when requesting this API with a domain name instead of a complete URL
only HTTP references on default port were listed.
2017-01-16 16:41:06 +01:00
luccioman
0da1e6ba16 Factored code re-implementing DigestURL.hosthash() method.
This ensures a consistent implementation of the url host hash generation
and makes its usages easier to find in source code.

Also added a unit test for this function.
2017-01-16 10:18:42 +01:00
luccioman
86adfef30f Added automated unit tests and perfs test for WebStructureGraph class.
Fixed references count when multiple links target the same domain name
in one document.
2017-01-13 16:10:59 +01:00
luccioman
9cea7cbb10 Detailed some Javadoc related to /api/webstructure.xml usage. 2017-01-12 17:52:47 +01:00
reger
c702eb6786 del dead menu link to /repository
(directory not created in current distribution -> old)
2016-12-17 02:38:52 +01:00
luccioman
1ba705c23d Use loaderDispatcher instead of HTTPClient to download releases.
The default redirection strategy when using directly HTTPClient is
incorrect when redirection is cross host (the original Host header is
still sent when requesting the redirected location).

YaCy LoaderDispatcher handles redirections properly, thus release
archive files using redirected URLs (such as the URLs on a GitHub
Release page) are successfully downloaded.
2016-12-16 20:38:54 +01:00
luccioman
467650c042 Hardened system update checks.
When a downloaded archive release is corrupted, empty, or can not be
opened for any reason, the update script must not be launched because it
erases the existing lib/*.jar libraries.
2016-12-16 11:03:09 +01:00
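A check of the kind described in 467650c042 could look like the sketch below. It assumes the release archive is gzip-compressed and uses hypothetical names; the actual verification in YaCy may differ. The file is rejected when it is missing, empty, or cannot be decompressed to the end.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class ReleaseArchiveCheck {

    /** Returns true only when the file exists, is not empty and decompresses without error. */
    static boolean isReadableGzip(final File archive) {
        if (!archive.isFile() || archive.length() == 0) {
            return false;
        }
        final byte[] buffer = new byte[8192];
        try (InputStream in = new GZIPInputStream(new FileInputStream(archive))) {
            while (in.read(buffer) != -1) {
                // Read through the whole stream so truncated archives are detected.
            }
            return true;
        } catch (final IOException e) {
            return false;
        }
    }

    public static void main(final String[] args) throws IOException {
        final File empty = File.createTempFile("release", ".tar.gz");
        try {
            // An empty download must not trigger the update script.
            System.out.println(isReadableGzip(empty)); // false
        } finally {
            empty.delete();
        }
    }
}
```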
luccioman
00e81fcc15 Check HTTP status when downloading a release, and report any error. 2016-12-15 15:30:36 +01:00
luccioman
3092a8ced5 Fixed thread name consistency for improved monitoring.
Some tasks were modifying the current thread name without restoring it
once finished, as is done elsewhere.
2016-11-23 17:59:52 +01:00
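The pattern behind 3092a8ced5 can be sketched with a small helper. The names are hypothetical and the commit itself may be structured differently: the current thread is renamed for the duration of a task and the previous name is restored in a finally block, even when the task fails.

```java
final class NamedTask {

    /** Runs the task under a temporary thread name and always restores the previous one. */
    static void runNamed(final String taskName, final Runnable task) {
        final Thread current = Thread.currentThread();
        final String previousName = current.getName();
        current.setName(taskName);
        try {
            task.run();
        } finally {
            current.setName(previousName);
        }
    }

    public static void main(final String[] args) {
        final String before = Thread.currentThread().getName();
        runNamed("demo-task", () -> System.out.println(Thread.currentThread().getName()));
        System.out.println(before.equals(Thread.currentThread().getName())); // true
    }
}
```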
reger
3c7220bc7b Refactor rwi reference word position and word distance calculation
used for rwi ranking.
Main changes:
- introduce a posintext() to access the stored value. This also reduces mem alloc of position array for WordReferenceRow (index access)
- use the positions() array for joined references on multi-word queries if needed (otherwise allow positions() to be null)
- adjust assignments and the min() max() and distance() calculation accordingly
2016-10-23 19:40:02 +02:00
luccioman
f0639d810c Customized name for Threads still using the default "Thread-n" pattern.
This makes threads monitoring easier to read.
2016-10-22 17:17:21 +02:00
reger
3861ac9293 upd maven dependency-check plugin to reflect changes of https://nvd.nist.gov
+ upd unknown ant script with current lib/jsch version
2016-10-04 03:05:26 +02:00
reger
585d2a6441 test case: for NewsPool to check the id modifier (for unique id)
and observe the distribution order .. hands on.
+ add test/DATA to .gitignore
2016-09-20 01:55:56 +02:00
reger
e990297d2e avoid NPE on hello message with missing "yourip" key
http://mantis.tokeek.de/view.php?id=684
2016-09-15 23:26:25 +02:00
reger
e51ab8c7aa hack to generate a unique message-id for messages created in the same second
by optionally adding a 1 second offset counter to the current time (which is
used as the unique id part)
2016-09-15 02:59:32 +02:00
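One possible reading of the approach in e51ab8c7aa is sketched below. This is an assumption for illustration, with hypothetical names, and not the code from the commit: when the current second has already been used as an id, the generator shifts the value forward by one second so that two messages created in the same second still get distinct ids.

```java
final class SecondIdGenerator {

    private long lastIssued = -1;

    /**
     * Returns an id based on the given time in seconds. If that second (or a later one)
     * was already issued, the id is offset to the next free second.
     */
    synchronized long nextId(final long nowSeconds) {
        final long candidate = Math.max(nowSeconds, lastIssued + 1);
        lastIssued = candidate;
        return candidate;
    }

    public static void main(final String[] args) {
        final SecondIdGenerator generator = new SecondIdGenerator();
        final long now = System.currentTimeMillis() / 1000L;
        final long first = generator.nextId(now);
        final long second = generator.nextId(now); // same second, offset by one
        System.out.println(second - first); // 1
    }
}
```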
Michael Peter Christen
5e165a8150 removed unused imports 2016-09-06 18:46:24 +02:00
reger
ebf818ad95 log an error on aborted news publish (due to duplicate news.id)
+ change printed err msg to log entry in PeerAction.processPeerArrival
2016-09-04 06:42:48 +02:00
reger
4386e84b55 correct NewsPool retention calculation
(was still clearing everything after one day)
2016-08-31 02:24:30 +02:00
reger
9462a32244 Added news service for easy, community driven UI translation support.
New or modified translation (via /Translator_p.html) can be shared/distributed
via the YaCy internal news service. Remote peers can see and vote on the
translation via the new http://localhost:8090/TransNews_p.html servlet.
A positive vote will add the received translation to the local translation
list and post a voting message to the news service.
(at this time no processing of received votes is implemented)

+ fixed the msg service retention time check (NewsPool.automaticProcessP)
2016-08-29 02:15:06 +02:00
Michael Peter Christen
7466d390b2 small refactoring + do not accept too old peers during bootstrap 2016-07-04 11:02:15 +02:00
reger
41c36ffd75 exclude rejected results from result count
(by using the resultcontainer.size instead of input docList.size)
skip waiting for write-search-result-to-local-index
  (by removing the Thread.join - which will bring a small performance increase)
2016-06-26 06:46:26 +02:00
reger
f23d8ab47b fix 2 more servlet RuntimeExceptions in intranet mode thrown due to seed.getIP()
returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3)
+remove unused (const ∅) seed.IPTYPE
2016-05-29 20:35:57 +02:00
reger
6384b7d82e fix NPE in Load_MediawikiWiki servlet in intranet mode
- in intranet mode getip returns null causing an NPE
  - adjust starturl (which was set to http://localip/repository) which is never the start url for the Mediawiki
+ correct javadoc for seed.getIP()
2016-05-27 03:10:25 +02:00
Michael Peter Christen
596b5dfa59 add the JRE version in the seed. Purpose: identify if it is possible to
migrate to new JRE version
2016-05-24 23:11:59 +02:00