Commit Graph

8743 Commits

Author SHA1 Message Date
Michael Peter Christen
36e616271b do better documentation on how to set a default password 2020-12-01 02:18:08 +01:00
Michael Peter Christen
df2bf9ef28 try to fix maven build error 2020-11-29 14:24:33 +01:00
Michael Peter Christen
264bab6700 trying to fight the UI unavailability
this patch addresses a possible issue with too many open connections to
remote peers
2020-11-29 14:15:34 +01:00
Michael Peter Christen
7947baeb49 removed all remaining deprecation warnings 2020-11-23 00:03:18 +01:00
Michael Peter Christen
c0f6d6e11d removed one deprecation warning for jetty library initializing ssl
server port
2020-11-22 23:27:58 +01:00
Michael Peter Christen
133440a7a6 some debug lines 2020-11-22 23:12:04 +01:00
sgaebel
3431f91db9 removes unused 'unused' tokens 2020-08-04 20:09:34 +02:00
sgaebel
fc03c4b4fe removes some warnings and unused objects 2020-08-03 20:44:31 +02:00
sgaebel
4a495df63a removes some deprecation-warnings 2020-07-31 17:28:06 +02:00
sgaebel
dd9d4b1188 replace org.junit.Assert.assertThat by
org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid
deprecation-warning
2020-07-28 19:09:26 +02:00
sgaebel
df9ea0a42a removes some warnings: unused imports, params 2020-07-27 22:20:49 +02:00
sgaebel
9bc2297161 fixes deleting during recrawl 2020-07-22 22:15:00 +02:00
sgaebel
80785b785e adds deleting during recrawl 2020-07-09 19:32:16 +02:00
Michael Peter Christen
e0ad8ca9da replaced json library from JSON.org with libandroid-json-java
This fixes https://github.com/yacy/yacy_search_server/issues/347
2020-04-24 11:45:25 +02:00
Michael Peter Christen
ea8df27e95 modified org.json.* library to fit into the YaCy environment
as drop-in replacement.
Also made some fixes and enhancements to the library.
2020-04-24 11:42:06 +02:00
Michael Peter Christen
60dc1241a3 added org.json.* library
from https://android.googlesource.com/platform/libcore/+/refs/heads/master/json/src/main/java/org/json
as a preparation step for
https://github.com/yacy/yacy_search_server/issues/347
2020-04-24 10:28:43 +02:00
Michael Peter Christen
053e54a2c7 grant CORS for json files 2019-11-05 11:50:56 +01:00
Michael Christen
cfa27d2fd5 fixed links 2019-10-20 20:20:50 +02:00
Michael Christen
cb20aa7e54 removed donation message in search result column 2019-10-17 01:35:44 +02:00
Michael Christen
25227676ae removed some warnings 2019-09-28 02:07:08 +02:00
luccioman
6b45cd5799 New optional crawl filter on the URL a doc must match to crawl its links
For finer control over which parsed documents can trigger an addition of
their links to the crawl stack, complementary to the existing crawl
depth parameter.
2019-05-01 08:54:19 +02:00
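The filter described in this commit can be pictured with a minimal sketch. It is not YaCy's implementation: the class name, method name, and example pattern are hypothetical, and the sketch only assumes that the filter is a regular expression applied to the URL of the parsed document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: only documents whose own URL matches a filter contribute links to the crawl stack. */
final class LinkSourceFilterSketch {

    /** Returns the links to stack, or an empty list when the source document URL does not match the filter. */
    static List<String> linksToStack(String documentUrl, List<String> parsedLinks, Pattern mustMatch) {
        if (!mustMatch.matcher(documentUrl).matches()) {
            return new ArrayList<>(); // the links of this document are not stacked
        }
        return new ArrayList<>(parsedLinks);
    }

    public static void main(String[] args) {
        Pattern filter = Pattern.compile("https?://example\\.org/docs/.*"); // assumed example pattern
        List<String> links = Arrays.asList("https://example.org/a", "https://example.net/b");
        System.out.println(linksToStack("https://example.org/docs/index.html", links, filter));
        System.out.println(linksToStack("https://example.org/blog/post.html", links, filter));
    }
}
```

Such a filter acts independently of crawl depth: a document within the depth limit still contributes no links when its URL does not match.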
luccioman
d16bc99835 Added "Show Metadata" links to the ViewFile.html links mode
To conveniently follow parsed links in the file viewer
2019-04-18 15:31:38 +02:00
luccioman
a5771b1f14 Made SNI extension user configurable without the need for server restart
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is more properly handled by major web
servers' TLS implementations, following the RFC 6066 standard.

Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
2019-04-14 15:41:13 +02:00
luccioman
e90405b6f0 Support parsing audio URLs without file extension
Also added a JUnit test for the audio tag parser
2019-04-09 11:40:21 +02:00
luccioman
a8316c79da Allow JS resorting of search results by unauthenticated users
Access rate limitations to this search mode by unauthenticated users are
set low by default to prevent unwanted server overload but can be
customized through the SearchAccessRate_p.html configuration page

Fixes #291
2019-04-03 14:21:53 +02:00
luccioman
0ab2b49c31 Made /yacysearch access rate limitations user configurable
With a new admin page at /SearchAccessRate_p.html in menu Network Access
> Local Search > Access Rate Limitations
2019-04-02 17:42:50 +02:00
luccioman
5b7e41202a Added Solr GSA writer support for responses from remote instances 2019-03-27 18:23:41 +01:00
luccioman
4d8a948455 Properly close PDF snapshots loaded with pdfbox library 2019-03-22 09:50:30 +01:00
luccioman
74e6d6e984 Added Solr GrepHTML writer support for responses from remote instances 2019-03-20 18:24:16 +01:00
luccioman
5e6501974d Added Solr snapshots writer support for responses from remote instances 2019-03-19 11:25:44 +01:00
luccioman
384c37102c Improve accuracy of total results count on last pages in Stealth mode
Previously, when mixing results from local RWI and local Solr (Stealth
mode), total local Solr count could be ignored on last result pages,
when the page offset was higher than local Solr count but lower than
total RWI count.
2019-03-04 10:05:47 +01:00
luccioman
5e9a08355a Improved logging for federated search
- Do not use spaces in logger identifier name so the log level can be
configured in yacy.logging
- Hold the logger instance so the logging system does not have to look
it up by name for each appended log message
2019-02-02 09:59:24 +01:00
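Both points of this commit can be illustrated with a small sketch, assuming a java.util.logging style setup. The class name and the logger name are hypothetical and are not taken from the YaCy sources.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of a logger that is named without spaces and looked up only once. */
final class FederatedSearchLoggingSketch {

    // Assumed name: without spaces, a "<name>.level = ..." line in a logging properties file can address it.
    static final String LOGGER_NAME = "FEDERATED_SEARCH";

    // Looked up once and held, instead of calling Logger.getLogger(name) for every message.
    private static final Logger LOG = Logger.getLogger(LOGGER_NAME);

    static Logger logger() {
        return LOG;
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);
        LOG.info("remote Solr query started");
    }
}
```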
luccioman
9782a98a9c Added the possibility to customize facets sort type and direction
Previously search navigators/facets elements were sorted only by counts.
Now from the ConfigSearchPage_p.html admin page, sort direction
(ascending/descending) and type (on counts or labels) can be customized
independently for each navigator.
2019-01-24 18:43:06 +01:00
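As an illustration of sorting a navigator by counts or by labels in either direction, here is a minimal sketch. The enum and method names are hypothetical, and the facet is modeled as a simple label-to-count map, which may differ from YaCy's internal representation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: order facet elements by count or by label, ascending or descending. */
final class FacetSortSketch {

    enum SortType { COUNT, LABEL }
    enum SortDirection { ASCENDING, DESCENDING }

    static List<String> sortedLabels(Map<String, Integer> counts, SortType type, SortDirection direction) {
        // Pick the sort key first, then flip the comparator if descending order is requested.
        Comparator<Map.Entry<String, Integer>> order = (type == SortType.COUNT)
                ? Map.Entry.<String, Integer>comparingByValue()
                : Map.Entry.<String, Integer>comparingByKey();
        if (direction == SortDirection.DESCENDING) {
            order = order.reversed();
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(order);
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            labels.add(entry.getKey());
        }
        return labels;
    }

    public static void main(String[] args) {
        Map<String, Integer> fileTypes = new LinkedHashMap<>();
        fileTypes.put("pdf", 3);
        fileTypes.put("html", 10);
        fileTypes.put("doc", 7);
        System.out.println(sortedLabels(fileTypes, SortType.COUNT, SortDirection.DESCENDING));
        System.out.println(sortedLabels(fileTypes, SortType.LABEL, SortDirection.ASCENDING));
    }
}
```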
sgaebel
c2398fd890 remove warnings: 'Statement unnecessarily nested within else clause' 2019-01-10 20:02:57 +01:00
sgaebel
811d40a6c4 taking care of closing inputstreams, HTTPClient 2019-01-04 18:58:49 +01:00
sgaebel
8d2e7262d9 Recrawl:
- set the chunksize to 100 to meet the max of the embedded solr
- re-enable sorting (the case where we switched it off should be gone)
- enable recrawling on remote-solr
2019-01-04 18:46:59 +01:00
sgaebel
8f58c1dcfa extend the SolrServlet to be usable as remote solr (incl. update)
this feature needs to be enabled by uncommenting the url-pattern
2019-01-04 18:27:44 +01:00
luccioman
7223a2fdb1 Removed usage of now deprecated Jetty function 2018-12-22 14:42:22 +01:00
luccioman
440d9f2fa0 Exclude peers with empty or disabled RWI from remote RWI search 2018-12-20 14:53:01 +01:00
luccioman
08ea0b0397 Added a configurable timeout to wkhtmltopdf calls for pdf snapshots
Necessary to prevent blocking the indexing workflow when some
wkhtmltopdf renderings fail without terminating
2018-12-11 22:31:31 +01:00
luccioman
3fb449b3b6 Properly resolve relative URLs against document URL in html base tags
Fixes issue #256
2018-12-06 20:18:00 +01:00
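The resolution rule named in this commit can be sketched with java.net.URI. This is only an illustration with hypothetical class and method names, not the parser code changed by the commit: a relative base href is first resolved against the document URL, and links are then resolved against the result.

```java
import java.net.URI;

/** Hypothetical sketch: resolve links when the document declares a (possibly relative) base href. */
final class BaseHrefSketch {

    static URI resolveLink(URI documentUrl, String baseHref, String link) {
        // A relative base href is itself resolved against the document URL first.
        URI base = (baseHref == null) ? documentUrl : documentUrl.resolve(baseHref);
        return base.resolve(link);
    }

    public static void main(String[] args) {
        URI document = URI.create("http://example.org/dir/page.html");
        System.out.println(resolveLink(document, "sub/", "img.png"));
        System.out.println(resolveLink(document, "http://cdn.example.net/assets/", "a.css"));
        System.out.println(resolveLink(document, null, "img.png"));
    }
}
```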
luccioman
73a6e45524 Extended detection of external tools used for Snapshots generation
This enables detecting wkhtmltopdf and ImageMagick convert executables
when they are on the system Path, in addition to common installation paths.
2018-12-06 09:53:08 +01:00
luccioman
7dc1f60619 Fixed detection of absolute data folder path on MS Windows 2018-11-18 10:08:20 +01:00
luccioman
595e144797 Trace a message when the server does not finish properly on process kill 2018-11-15 17:32:22 +01:00
luccioman
9daeea823b Fixed concurrency issue on cache used for circles rendering
Without synchronization lock, concurrent rendering of images including
circles could lead to glitches as reported in issue #248
2018-11-10 22:00:49 +01:00
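The kind of fix described here, guarding a shared cache with a synchronization lock, can be sketched as follows. The cache layout, class name, and the offsets computation are assumptions made for illustration; the actual rendering code in YaCy may be organized differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a shared cache of precomputed circle data guarded by a lock. */
final class CircleCacheSketch {

    private static final Map<Integer, int[]> CACHE = new HashMap<>();
    private static final Object LOCK = new Object();

    /** Returns, for each row y in 0..radius, the x offset of the circle edge. */
    static int[] circleOffsets(int radius) {
        // Lookup and fill happen under one lock, so concurrent renderers never
        // observe a partially updated map and share one array per radius.
        synchronized (LOCK) {
            int[] offsets = CACHE.get(radius);
            if (offsets == null) {
                offsets = computeOffsets(radius);
                CACHE.put(radius, offsets);
            }
            return offsets;
        }
    }

    private static int[] computeOffsets(int radius) {
        int[] offsets = new int[radius + 1];
        for (int y = 0; y <= radius; y++) {
            offsets[y] = (int) Math.round(Math.sqrt((double) radius * radius - (double) y * y));
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(circleOffsets(5)));
    }
}
```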
Michael Peter Christen
c347e7d3f8 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2018-11-08 14:42:52 +01:00
Michael Peter Christen
848e9304d9 evil bots may crawl harder 2018-11-08 14:42:40 +01:00
luccioman
a997133260 Fixed gzip decompression regression on index transfer APIs
Processing of gzip encoded incoming requests (on /yacy/transferRWI.html
and /yacy/transferURL.html) no longer worked since the upgrade to Jetty
9.4.12 (see commit 51f4be1).

To prevent any conflicting behavior with Jetty internals, now use the
GzipHandler provided by Jetty to decompress incoming gzip encoded
requests rather than the previously used custom GZIPRequestWrapper.

Fixes issue #249
2018-11-07 14:52:42 +01:00
luccioman
e85f231bdf Fixed termination of Host browser and link structure Solr query threads
Under some conditions (especially when reaching timeout), concurrent Solr
query tasks used by the /HostBrowser.html and /api/linkstructure.json
never terminated, thus leaking resources, as reported by @Vort in issue
#246
2018-11-06 10:10:09 +01:00
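A general way to make sure a timed-out query task does not keep running is to cancel its Future and shut down its executor. The sketch below shows that pattern with hypothetical names; it is not the code from this commit, and the way YaCy schedules its Solr query tasks may differ.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch: run a query with a timeout and never leave its worker thread behind. */
final class QueryTimeoutSketch {

    static <T> T callWithTimeout(Callable<T> query, long timeoutMs, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(query);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still running task
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the query itself failed
        } finally {
            executor.shutdownNow(); // release the worker thread in every case
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(() -> "ok", 1000L, "timeout"));
        System.out.println(callWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 50L, "timeout"));
    }
}
```

Interruption only helps when the task reacts to it, for example by blocking in an interruptible call or by checking the interrupt flag in its loop.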
luccioman
fcf6b16db4 Added new crawler attribute for finer control over Media Type detection
New "Media Type detection" section in the advanced crawl start page
allows choosing between:
- not loading URLs with an unknown or unsupported file extension, without
checking the actual Media Type (relying on the Content-Type header for now).
This was the old default behavior: faster, but not really accurate.
- always cross-checking the URL file extension against the actual Media Type.
This allows properly parsing URLs that end with an apparently odd file
extension but actually have a supported Media Type such as
text/html.

Sample URLs with misleading file extensions added as documentation in
the crawl start page.

fixes issue #244
2018-10-25 10:42:12 +02:00
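The two modes described in this commit can be contrasted with a simplified sketch. The class, enum, and the small sets of supported extensions and media types are hypothetical, and the fallback to the extension when no Content-Type header is present is an assumption of this sketch, not something the commit message states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: decide whether to parse a URL from its extension alone or from its media type. */
final class MediaTypeCheckSketch {

    enum Mode { EXTENSION_ONLY, CROSS_CHECK }

    // Assumed sample values, far smaller than any real parser registry.
    private static final Set<String> SUPPORTED_EXTENSIONS =
            new HashSet<>(Arrays.asList("html", "htm", "txt", "pdf"));
    private static final Set<String> SUPPORTED_MEDIA_TYPES =
            new HashSet<>(Arrays.asList("text/html", "text/plain", "application/pdf"));

    /** Returns the lower-case file extension of a URL path, or "" when there is none. */
    static String extension(String urlPath) {
        int slash = urlPath.lastIndexOf('/');
        int dot = urlPath.lastIndexOf('.');
        return dot > slash ? urlPath.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }

    /** Strips parameters such as "; charset=UTF-8" from a Content-Type header value. */
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0 ? contentTypeHeader.substring(0, semicolon) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    static boolean shouldParse(String urlPath, String contentTypeHeader, Mode mode) {
        boolean extensionSupported = SUPPORTED_EXTENSIONS.contains(extension(urlPath));
        if (mode == Mode.EXTENSION_ONLY) {
            return extensionSupported; // old behavior: the header is never consulted
        }
        String type = mediaType(contentTypeHeader);
        // Assumption: without a usable header, fall back to the extension.
        return type.isEmpty() ? extensionSupported : SUPPORTED_MEDIA_TYPES.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(shouldParse("/page.exe", "text/html; charset=UTF-8", Mode.EXTENSION_ONLY));
        System.out.println(shouldParse("/page.exe", "text/html; charset=UTF-8", Mode.CROSS_CHECK));
    }
}
```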