Commit Graph

4080 Commits

Author SHA1 Message Date
luccioman
f01aac31fd Made it possible to use HTTPS for remote search on peers with SSL enabled.
Default is still HTTP to prevent any regressions, but a new setting is
available to choose HTTPS as the preferred protocol to perform remote
searches.
The new configuration setting 'remotesearch.https.preferred' can be edited
manually in the yacy.conf file or on the Advanced Properties page
(/ConfigProperties_p.html).
It should be enabled by default in the future for improved privacy.
HTTPS could also eventually be used for other peer communications.
2017-11-24 14:10:41 +01:00
luccioman
e2f6427a63 Added a basic JUnit test for the Visio parser (vsdParser) 2017-11-22 09:06:16 +01:00
luccioman
1e9cdaabd4 Do locale-neutral case conversion of HTML charset names.
Required to run properly on systems whose default locale is set to the
Turkish language, as in this locale the 'i' character has different upper
and lower case forms (dotted 'İ' and dotless 'ı') than in other locales.
2017-11-20 18:52:45 +01:00
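A minimal sketch of the Turkish-locale pitfall behind these commits, using only the JDK's String.toUpperCase(Locale). The LocaleNeutralCase class and its normalizeCharsetName helper are hypothetical illustrations, not YaCy's actual code.

```java
import java.util.Locale;

class LocaleNeutralCase {

    /** Hypothetical helper: normalizes a charset name whatever the default locale is. */
    static String normalizeCharsetName(String name) {
        // Locale.ROOT selects locale-neutral case mapping, so 'i' always maps to 'I'
        return name.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        String name = "iso-8859-1";
        // Turkish rules map 'i' to the dotted capital I (U+0130), not to 'I'
        System.out.println(name.toUpperCase(turkish));
        // Locale-neutral rules give the expected "ISO-8859-1"
        System.out.println(normalizeCharsetName(name));
    }
}
```

The no-argument toUpperCase() and toLowerCase() use Locale.getDefault(), which is why they misbehave on a Turkish system; passing Locale.ROOT makes the result independent of the system configuration.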
luccioman
7206f1ed71 Do locale-neutral case conversions on domain names.
Required to run properly on systems whose default locale is set to the
Turkish language, as in this locale the 'i' character has different upper
and lower case forms than in other locales.
2017-11-20 18:47:46 +01:00
luccioman
398c66f06c Do locale-neutral case conversions in MultiProtocolURL
For all relevant URL parts: host name, URL scheme, session IDs and
technical parts (see https://url.spec.whatwg.org/#url-writing and
https://tools.ietf.org/html/rfc3986 for current standard references).

The remaining locale-sensitive conversion, used to detect URL word
components in urlComps(), makes sense, but using the detected language
would be preferable to using the default system locale.
2017-11-20 15:23:33 +01:00
luccioman
9531b83598 Do locale-neutral case conversions in Classification
Required for people using the Turkish language as their default system
locale, as in this locale the 'i' character has different upper and
lower case forms than in other locales.
2017-11-20 09:48:46 +01:00
luccioman
d22fc0d0a2 Updated lists of known sponsored and country-code TLDs.
Using the current IANA reference list at
https://www.iana.org/domains/root/db

As with the previous update of the known generic TLDs list, the URL
hashes generated for these domains stay the same, but URL hash
computation is faster for URLs on these domains.
2017-11-16 09:50:55 +01:00
luccioman
ac209cac2e Updated the list of known generic top-level domains.
Using the current IANA reference list at
https://www.iana.org/domains/root/db

The generated URL hashes on these domains stay the same, but performance
is greatly improved, as a DNS resolution request is required during URL
hash computation whenever the TLD part of the host name is unknown.

Mean hash computation time, measured on 1541 sample URLs (one per TLD)
on a computer with a DSL connection: about 230 ms before the change,
only 20 ms after.
2017-11-14 09:42:09 +01:00
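A minimal sketch of why an up-to-date known TLD list avoids slow lookups, assuming (as the message states) that a DNS resolution is only needed when the TLD is unknown. KnownTldCheck, requiresDnsLookup and the tiny TLD set are hypothetical; the real list is derived from the IANA root zone database.

```java
import java.util.Locale;
import java.util.Set;

class KnownTldCheck {

    // Hypothetical, tiny excerpt of a known TLD list
    private static final Set<String> KNOWN_TLDS = Set.of("com", "org", "net", "de", "fr", "xyz");

    /**
     * Returns true when the host's TLD is not in the known list,
     * i.e. when a (slow) DNS resolution would be needed to classify the host.
     */
    static boolean requiresDnsLookup(String host) {
        int lastDot = host.lastIndexOf('.');
        String tld = (lastDot >= 0 ? host.substring(lastDot + 1) : host).toLowerCase(Locale.ROOT);
        return !KNOWN_TLDS.contains(tld);
    }

    public static void main(String[] args) {
        System.out.println(requiresDnsLookup("www.example.org")); // false
        System.out.println(requiresDnsLookup("yacy.unknowntld")); // true
    }
}
```

A set lookup costs far less than a network round trip, which is consistent with the drop from about 230 ms to 20 ms reported above once the list covers every TLD.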
luccioman
938d8a9731 Added some JavaDoc 2017-11-14 09:24:13 +01:00
luccioman
e0eda84c24 Removed old hard-coded holiday dates from the DateDetection class.
They are replaced with rules relative to the current year, as was already
done for some of the supported dates.
2017-11-07 19:02:09 +01:00
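The idea of deriving holidays from the current year instead of hard-coding dates can be sketched with java.time. The HolidayRules class and its two example holidays are illustrative assumptions, not the rules DateDetection actually implements.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.time.temporal.TemporalAdjusters;

class HolidayRules {

    /** Fixed-date rule: same month and day every year. */
    static LocalDate christmas(int year) {
        return LocalDate.of(year, Month.DECEMBER, 25);
    }

    /** Floating rule: the fourth Thursday of November (US Thanksgiving). */
    static LocalDate thanksgiving(int year) {
        return LocalDate.of(year, Month.NOVEMBER, 1)
                .with(TemporalAdjusters.dayOfWeekInMonth(4, DayOfWeek.THURSDAY));
    }

    public static void main(String[] args) {
        // Rules are evaluated against the current year, so nothing expires
        int currentYear = LocalDate.now().getYear();
        System.out.println(christmas(currentYear));
        System.out.println(thanksgiving(currentYear));
    }
}
```

For example, thanksgiving(2017) yields 2017-11-23 and thanksgiving(2018) yields 2018-11-22, without any date list to maintain.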
luccioman
cb10daba92 Renamed Chinese & Greek lng files using ISO 639-1 codes.
They were previously named with their ISO 3166-1 country codes: as a
result, setting the language to "Browser" in ConfigBasic.html did not
work properly when the browser's preferred language was Chinese or Greek,
as their respective language codes are "zh" and "el" (not "cn" and "gr",
which are their country codes).
2017-11-04 11:06:05 +01:00
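A minimal sketch of the mismatch, using the JDK's Locale.forLanguageTag: a browser tag such as "zh-CN" carries both a language code and a country code, and only the language code matches an ISO 639-1 file name. The LngFileName class and the "<code>.lng" naming scheme are assumptions for illustration.

```java
import java.util.Locale;

class LngFileName {

    /**
     * Maps a browser language tag (e.g. from an Accept-Language header)
     * to a hypothetical translation file name based on the ISO 639-1 code.
     */
    static String lngFileFor(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        // getLanguage() returns the language code ("zh", "el"),
        // getCountry() returns the region code ("CN", "GR")
        return locale.getLanguage() + ".lng";
    }

    public static void main(String[] args) {
        System.out.println(lngFileFor("zh-CN")); // zh.lng
        System.out.println(lngFileFor("el-GR")); // el.lng
        System.out.println(Locale.forLanguageTag("el-GR").getCountry()); // GR
    }
}
```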
luccioman
46f37e38dc Customized Threads with generic name for easier monitoring. 2017-10-31 08:53:17 +01:00
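The commit does not show how the names are set; a common JDK approach is a ThreadFactory that gives each thread a descriptive, numbered name, so it can be recognized in a thread dump or monitoring tool. NamedThreadFactory and the "snapshot-loader" prefix are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class NamedThreadFactory implements ThreadFactory {

    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // "prefix-N" replaces the default "pool-N-thread-M" name
        return new Thread(task, prefix + "-" + counter.getAndIncrement());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2, new NamedThreadFactory("snapshot-loader"));
        executor.submit(() -> System.out.println(Thread.currentThread().getName()));
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```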
luccioman
046be566e1 Fixed a typo in a license header. 2017-10-30 07:38:47 +01:00
Apply55gx
3c905a2a5c fix typo 2017-10-27 14:00:30 +02:00
luccioman
8e732d437c Enable HTTP Digest authentication for non-admin users.
Also ensure authentication is not lost by Digest timeout when navigating
between index.html and the search results page.

This way, running searches with extended features on a remote peer or a
password-protected peer works with a regular user (with "Extended
search" rights).
When authenticating on the search page as a user without "Extended
search" rights, the user appears as authenticated, but only has the usual
access to the public search features.
2017-10-26 07:51:18 +02:00
luccioman
d8eaf621cc Fixed blacklist returned location URL on empty parameters 2017-10-24 09:30:21 +02:00
luccioman
af198b990b Added an optional login link/status to the public search top nav bar.
This provides a more convenient way (without the need to go to the admin
section) to log in when searching on your remote or password-protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavaScript resorting.

It can be disabled on the ConfigSearchPage_p.html page.
2017-10-21 10:57:36 +02:00
luccioman
1de86cf1bf Fixed JPEG snapshot resizing when running on OpenJDK.
Resizing JPEG snapshot images through /api/snapshot.jpg failed when
running on OpenJDK, but rendered successfully with an Oracle JDK.
Details in Mantis issue 772 ( http://mantis.tokeek.de/view.php?id=772 ).

Removing any alpha component (useless in snapshot images) from the
rendered resized image solves the issue.
2017-10-19 09:27:52 +02:00
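A minimal sketch of removing the alpha component before JPEG encoding, assuming the resized snapshot is held as a java.awt.image.BufferedImage. Drawing onto an opaque TYPE_INT_RGB image is one common way to do it; the commit does not say exactly which technique it uses, and the AlphaRemoval class is hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class AlphaRemoval {

    /** Returns an opaque RGB copy of the image, painting transparent pixels white. */
    static BufferedImage removeAlpha(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // drawImage with a background color fills transparent pixels with that color
            g.drawImage(source, 0, 0, Color.WHITE, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Encodes the image as JPEG after dropping its alpha channel. */
    static byte[] encodeJpeg(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no suitable writer is found
        if (!ImageIO.write(removeAlpha(image), "jpg", out)) {
            throw new IOException("No suitable JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedImage argb = new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB);
        System.out.println(encodeJpeg(argb).length + " bytes");
    }
}
```

Since JPEG has no transparency, dropping the alpha channel loses nothing useful for snapshots while avoiding the JDK-dependent behavior of the JPEG writer on images with alpha.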
luccioman
a17a418e78 Fixed NullPointerException cases in snapshot image parsing. 2017-10-18 08:31:18 +02:00
luccioman
285f0d6a39 Consistently encode snapshot images in the format requested through the API.
Previously, calling /api/snapshot.png rendered JPEG-encoded images.
2017-10-18 07:53:07 +02:00
luccioman
34ca73d61b Fixed a NullPointerException case on image encoding errors. 2017-10-16 19:47:18 +02:00
luccioman
7c319c841e Fixed pdf2image conversion with ImageMagick on PDFs with transparency
The target image format (JPEG) doesn't support transparency, so
Html2ImageTest produced unusable black images when run on a Linux
machine with the imagemagick package installed.
2017-10-16 19:45:17 +02:00
luccioman
6e497241f7 Properly close resources (even on error) in the OS and ThreadDump classes.
Also updated some JavaDoc and the main() function usage message in the
same classes.
2017-10-16 17:04:22 +02:00
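The commit does not show the changed code; the standard Java idiom for closing resources even when an error occurs is try-with-resources. A minimal sketch with a hypothetical readLines helper (for example, for reading command output or a dump file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ResourceClosing {

    /** Reads all lines; the reader is closed even if reading throws. */
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources calls close() automatically, on success or on error
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readLines(new StringReader("a\nb"))); // [a, b]
    }
}
```

Compared with an explicit finally block, this form also records any exception thrown by close() as a suppressed exception instead of masking the original error.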
luccioman
fe75f326d8 Fixed ProfilingGraph calculation integer overflows and added a test class.
Complementary to the fix proposed in PR #128 by @otteresk.
2017-10-16 09:18:12 +02:00
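A minimal sketch of this class of bug, with hypothetical method names (not ProfilingGraph's actual code): scaling a counter to pixel coordinates in pure int arithmetic overflows once value * chartSize exceeds Integer.MAX_VALUE, while widening to long before multiplying keeps the result correct.

```java
class ChartScaling {

    /** Maps a value in [0, maxValue] to a pixel offset in [0, chartSize], using long arithmetic. */
    static int scale(int value, int maxValue, int chartSize) {
        if (maxValue <= 0) {
            return 0;
        }
        // Casting one operand to long makes the multiplication 64-bit
        return (int) ((long) value * chartSize / maxValue);
    }

    /** Overflow-prone int-only variant, for comparison. */
    static int scaleWithIntOverflow(int value, int maxValue, int chartSize) {
        return value * chartSize / maxValue;
    }

    public static void main(String[] args) {
        // 3,000,000 * 1,000 = 3e9, which exceeds Integer.MAX_VALUE (about 2.1e9)
        System.out.println(scaleWithIntOverflow(3_000_000, 6_000_000, 1_000)); // -215
        System.out.println(scale(3_000_000, 6_000_000, 1_000)); // 500
    }
}
```

The int product wraps to -1294967296, so the "point" lands far outside the chart; the long version returns the expected midpoint.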
luccioman
5d1ef8fdfc Merge branch 'master' of https://github.com/otteresk/yacy_search_server 2017-10-16 09:01:34 +02:00
luccioman
8303e15419 Reduced the number of search navigator refresh requests in JS resort mode
The SearchEvent listens to changes on each of its navigators, and the
information about their overall state is sent with each fetched search
item (as a "data-nav-generation" attribute). The browser can then
regularly fetch a fresh version of yacysearchtrailer.html only when
necessary (when that nav-generation value changes).
2017-10-12 07:16:19 +02:00
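The generation mechanism described above can be sketched as a simple counter. In YaCy the refresh decision is made by JavaScript in the browser; it is written in Java here only to keep one language, and the NavigatorGeneration class and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class NavigatorGeneration {

    // Incremented each time any navigator of the search event changes
    private final AtomicLong generation = new AtomicLong();

    /** Called by a (hypothetical) navigator change listener. */
    void onNavigatorChanged() {
        generation.incrementAndGet();
    }

    /** Value sent along with each fetched result, e.g. as data-nav-generation. */
    long currentGeneration() {
        return generation.get();
    }

    /** Client-side decision: refetch the navigators only if the generation moved on. */
    static boolean needsRefresh(long lastRenderedGeneration, long receivedGeneration) {
        return receivedGeneration != lastRenderedGeneration;
    }

    public static void main(String[] args) {
        NavigatorGeneration event = new NavigatorGeneration();
        long rendered = event.currentGeneration();
        System.out.println(needsRefresh(rendered, event.currentGeneration())); // false
        event.onNavigatorChanged(); // e.g. a new result updated a navigator count
        System.out.println(needsRefresh(rendered, event.currentGeneration())); // true
    }
}
```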
luccioman
dbff7b14fc Add a configurable limit to tags initially displayed in search results
When the limit is reached, a button allows expanding/collapsing the
remaining tags.

When this feature is activated without a limit on the number of
displayed tags, search results with a very large number of keywords can
make the results page almost unusable (very long vertical scrollbar).
2017-10-09 14:13:46 +02:00
Andreas
0c4db9eef0 Merge pull request #3 from yacy/master
Fork update
2017-10-07 12:29:55 +02:00
reger
c31d94664a Update deprecated SolrInputDocument.addField() calls that use a boost value.
Remove unused SchemaConfiguration.getDate (as it is designed to return
only past dates, which might be unexpected for a general configuration schema).
2017-10-06 20:32:28 +02:00
luccioman
7e271f9cf5 Updated Travis config: install Ghostscript, required for Html2Image 2017-10-05 13:09:11 +02:00
luccioman
32c9dfa768 Added partial bzip2 stream parsing support and a bzipParser JUnit test 2017-10-04 18:33:09 +02:00
luccioman
dd9cb06d25 Fixed RWI distance calculation on multi words search queries.
Distance was lost when storing/retrieving references to the intermediate
result container.

Now all JUnit tests are again successfully passing!
2017-10-04 08:41:43 +02:00
luccioman
6b11bf3a12 Fixed a NullPointerException case on 'Browser' language selection
It occurred when English was the only active language, making the
ConfigBasic.html page unusable until the locale.language setting was
manually modified.
2017-10-02 09:36:13 +02:00
reger
ae1c675c85 Fix array index out of bounds in YJsonResponseWriter and OpensearchResponseWriter
when recreating image URLs.
Set the parameter of indexList2protocolList to the required number of images (image_stubs).
Example situation: image_stub (size=15) but images_protocol (size=12).
2017-10-02 02:51:10 +02:00
otter
73d1d577fd prevent integer overflow in chartDot for nodes with a big index 2017-09-30 11:58:49 +02:00
otter
4e2ccdfcac prevent integer overflow in chartLine 2017-09-30 00:48:54 +02:00
luccioman
27ab733685 Ensure private search features are not lost on Digest auth timeout
This is a fix for Mantis issue 766 ( http://mantis.tokeek.de/view.php?id=766 )

Since the upgrade to Digest authentication, access to protected search
features was indeed disabled once the Digest nonce timed out.

After the Digest auth timeout, the browser no longer sent authentication
information, and as the search results page is not private, protected
features were simply hidden without asking the browser again for
authentication.

Adding a supplementary parameter when accessing the search results as an
authenticated user fixes this.
2017-09-29 19:18:12 +02:00
reger
ba60f65040 Convert the filetype: query modifier parameter to lower case
to prevent mismatches on mixed-case user input.
Internally, file extensions are always compared in lower case.
2017-09-29 00:26:30 +02:00
luccioman
57a33aefb0 Removed unnecessary max count initialization on empty search navigators. 2017-09-25 15:21:17 +02:00
luccioman
ef8aea7f8d Made the maximum number of dates navigator elements user configurable.
Also used instance properties on QueryParams objects, rather than
mutable class (static) properties.
2017-09-25 09:19:08 +02:00
luccioman
9e86d183b8 Disable manual search results resorting when resorting is done with JS
Also added a constant for the js resorting setting key.
2017-09-13 07:58:05 +02:00
luccioman
66cb9c4ff9 Added Solr filter queries for audio, video and application domains
Inspired by the existing one used for image search, and consistent with
the post-filtering on content domain applied in SearchEvent.addNodes().

These filters are quite simplistic, but at least audio, video and
application searches now return results. Previously, when filtering on
these content domains, many results pages (and often even the first
page) were empty while the total results count suggested that results
should be available. This was because filtering on domain was only
applied AFTER requesting Solr indexes.
2017-09-08 11:16:37 +02:00
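Applying the constraint inside the Solr request (as an fq filter query) rather than after it keeps page contents and total counts consistent. A minimal sketch of building such a filter query string; the ContentDomainFilter class, the url_file_ext_s field name and the extension list are assumptions, not necessarily what the commit uses.

```java
import java.util.List;
import java.util.StringJoiner;

class ContentDomainFilter {

    /**
     * Builds a Solr filter query such as url_file_ext_s:(mp3 OR ogg OR wav).
     * Values are assumed to be plain tokens without Solr special characters.
     */
    static String filterQuery(String field, List<String> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("at least one value is required");
        }
        StringJoiner joiner = new StringJoiner(" OR ", field + ":(", ")");
        for (String value : values) {
            joiner.add(value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical field name and extension list for the audio domain
        System.out.println(filterQuery("url_file_ext_s", List.of("mp3", "ogg", "wav")));
    }
}
```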
luccioman
5d3ceb31b7 Improved search navigators counters accuracy and consistency.
- added some missing increments from RWI results
- decrement relevant navigator counts when Solr or RWI results are
evicted because of duplicate detection or belatedly checked constraints
- do not compute facets when unnecessary, to avoid unwanted CPU load
- do not increment from facets when already done
- do not rely on facets for remote Solr peer requests, as most of the
time only a limited part of their total results is fetched (thus also
preventing unnecessary load on remote peers)
- use a concurrency-friendly score map for the dates navigators to
prevent unwanted ConcurrentModificationExceptions

This improves the situation for the most obvious inconsistencies in
search navigator counts, but more has to be done for true accuracy
(notably when query modifier constraints, such as the content domain
constraint, are applied belatedly, after the Solr or RWI retrieval
request)
2017-09-06 16:58:40 +02:00
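The commit does not name the map implementation; a minimal sketch of a concurrency-friendly counter map with the JDK's ConcurrentHashMap, whose merge and computeIfPresent are atomic and whose iterators are weakly consistent (they never throw ConcurrentModificationException). The ConcurrentScoreMap class is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentScoreMap {

    private final ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();

    /** Atomically increments the count of a navigator entry. */
    void inc(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    /** Atomically decrements, removing the entry when it drops to zero. */
    void dec(String key) {
        // Returning null from the remapping function removes the mapping
        counts.computeIfPresent(key, (k, v) -> v > 1 ? v - 1 : null);
    }

    int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentScoreMap dates = new ConcurrentScoreMap();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) dates.inc("2017"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) dates.inc("2017"); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(dates.get("2017")); // 2000
    }
}
```

The decrement path matches the commit's need to lower counts when results are evicted, without a separate lock around read-modify-write sequences.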
luccioman
8e4f31bdc7 Updated internal ISO 639-1 language codes with the latest standards.
Includes 54 language code additions, some name modifications, and a few
codes marked as deprecated.
2017-09-02 09:53:38 +02:00
luccioman
a28428047a Fixed count of filtered results from local Solr.
It was inadequately modified in my previous related commits (making
next-page buttons unavailable in Search portal mode), as
SearchEvent.local_solr_available did not count the total filtered
results but only the ones within the currently fetched result page(s).
2017-08-31 11:24:59 +02:00
Michael Peter Christen
2f71005a93 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2017-08-30 23:51:44 +02:00
Michael Peter Christen
2314f8e358 Try to fix the problem
whose error description is available at
http://forum.yacy-websuche.de/viewtopic.php?f=5&t=6023&p=33889&sid=37bc7aa029422be571b9266cdef43c52#p33889
2017-08-30 23:50:14 +02:00
luccioman
3c9df6e0ce Use local Solr filtered results in total search results count.
This modification actually has a low impact, as any query modifiers
are already applied when requesting the local Solr index.
It mainly affects duplicates detected with results from remote peers.

Also updated Javadocs for clarification.
2017-08-30 12:23:45 +02:00
luccioman
a1a0515312 Added a button to manually refresh sorting of p2p search results.
As a server-side oriented alternative to the JavaScript realtime
resorting feature proposed in PR #104.
The goal is the same as in that PR: making it possible to compensate for
the network latency of fetching results from various peers, and to obtain
a consistently ranked result set once possible.
2017-08-28 19:03:51 +02:00
luccioman
4eba88f2ff Removed some unnecessary uses of the java.lang.reflect API.
This improves code browsing and readability, making search by references
or call hierarchy IDE features more accurate.
2017-08-24 18:47:18 +02:00