Commit Graph

6073 Commits

Author SHA1 Message Date
lifeofguenter
870319e769 Fix typo + remove dead seeds 2021-12-27 14:12:17 +01:00
ZeroCool940711
7e765b8483 Improved the Image search page to have bigger thumbnails, use a bigger area for results and a smaller left sidebar. 2021-12-26 23:41:04 -07:00
Michael Peter Christen
6fe905bb82 feature https://github.com/yacy/yacy_search_server/issues/434 2021-12-26 23:33:31 +01:00
Michael Peter Christen
9c38b1254e proper deletion of loadtime index 2021-12-22 01:22:46 +01:00
Michael Peter Christen
bd3f2483a1 replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to check the freshness of
the index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90 replaced check for load time method
Instead of loading the Solr document, an index holding only the last loading
time was created. This means Solr no longer has to fetch from its index
while the index is being built. Excessive re-loading of documents while
indexing has been shown to produce deadlocks, so this should now be
prevented.
2021-12-20 03:47:56 +01:00
sgaebel
cdf901270c always use HTTPClient by 'try with resources' pattern to free up
resources
2021-10-31 23:06:23 +01:00
Michael Christen
f4834e8e31 link fix 2021-10-29 11:10:23 +02:00
Michael Peter Christen
3c86b7b780 attempt to make a Mac Release using gradle
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh

The build is then inside build/mac/YaCy.app

Right now this works, but the build does not have the correct release
number inside.

The goal is to make this work for Windows releases as well and to embed the JRE
entirely.
2021-10-25 18:37:39 +02:00
Michael Peter Christen
63ad8ce6b2 removed ymarks
had not been used for a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
ef5a71a592 enhanced crawl start response time
for very very large crawl start lists
2021-09-16 21:01:01 +02:00
Michael Peter Christen
1bab4ffe20 calculating the correct size of an export.
This can be seen as a fix for
https://github.com/yacy/yacy_search_server/issues/343
however, the export was not flawed; it only gives the impression that
something is wrong. The export size must be smaller than the index
size because the index also contains error documents.
Now an information line is presented that shows e.g.:
"The local index currently contains 181,319 documents, only 106,887
exportable with status code 200 - the remaining are error documents."
2021-09-16 01:05:09 +02:00
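The information line quoted above can be reproduced with a small sketch. This is only an illustration in Python, not YaCy's code: the function name `export_summary` and the idea of passing a plain list of HTTP status codes are assumptions made for the example.

```python
def export_summary(status_codes):
    """Build the information line from a list of per-document HTTP status codes.

    Hypothetical helper: YaCy counts documents in its Solr index instead.
    """
    total = len(status_codes)
    # Only documents that were loaded successfully (status 200) are exportable.
    exportable = sum(1 for code in status_codes if code == 200)
    return (
        f"The local index currently contains {total:,} documents, "
        f"only {exportable:,} exportable with status code 200 - "
        "the remaining are error documents."
    )
```

The export is smaller than the index whenever at least one indexed document carries a status code other than 200.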
admin
9b7668fa58 reduced memory footprint during indexing/crawling 2021-08-24 12:24:52 +02:00
Ian Smirlis
53518a91ab In case of reload404, load only failed documents 2021-08-19 20:49:59 +03:00
Michael Peter Christen
e6a87e0426 enhanced crawler
a main problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted by Yandex and Bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore the construction of new queue
lists was redesigned and it was ensured that the loader is always given
a large list of different hosts for host-balancing.
2021-08-17 15:23:21 +02:00
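The crawl-delay limit described above amounts to clamping the robots.txt value. The following Python sketch only illustrates that idea; the names `MAX_CRAWL_DELAY_SECONDS` and `effective_crawl_delay` are hypothetical, and treating negative values as zero is an assumption that the commit message does not state.

```python
MAX_CRAWL_DELAY_SECONDS = 10  # the limit named in the commit message


def effective_crawl_delay(robots_delay_seconds: float) -> float:
    """Obey the robots.txt crawl-delay, but never wait longer than the cap."""
    # Assumption: a negative value is treated as "no delay".
    return min(max(robots_delay_seconds, 0.0), MAX_CRAWL_DELAY_SECONDS)
```

With such a cap, a host announcing a crawl-delay of one hour slows the crawl by at most 10 seconds per request instead of blocking it.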
Michael Peter Christen
9182b3dfca enhanced default value 2021-08-05 09:18:05 +02:00
Michael Peter Christen
3959d43a5c fixed doku link 2021-08-03 16:57:24 +02:00
Michael Peter Christen
15b7461bc7 removed Xms java memory startup parameter
We will use the default value from now on.
This is much better for resource economy and fits better into a
container/docker/kubernetes strategy.
Furthermore, a small memory footprint is essential for the usage on
small devices like RaspberryPi.
2021-07-19 20:04:11 +02:00
Michael Peter Christen
4377bd2b70 fix for wrong crawlName construction 2021-06-30 18:03:54 +02:00
Michael Peter Christen
e81b770f79 enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approx. 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
dbd211a1ad removed/replaced reflection in memory tool 2021-04-22 20:24:13 +02:00
Michael Peter Christen
1cdb21592b added hazelcast and some modifications to align legacy YaCy with
YaCyGrid
2021-04-15 20:39:22 +02:00
sgaebel
7fecd859e5 fixes showing metadata from Searchresult, by removing defType=edismax
also removes defType=edismax from IndexBrowser, but still does not show
dates
2021-03-21 00:06:26 +01:00
sgaebel
f16cd154f7 removes unused imports and variables 2021-03-20 15:14:09 +01:00
sgaebel
c69c462a15 replaces an expensive getLoadTimeURL() by exists()
refactors urlExists to getHarvestProcess as that is what it does
2021-03-20 15:01:31 +01:00
sgaebel
26223dc25a replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
2021-03-20 01:06:02 +01:00
Michael Peter Christen
b46513f4a1 added stub of rc3assembly style
a little bit late but whatever
2021-02-09 20:30:10 +01:00
Michael Peter Christen
3da7628117 use environment variables to overwrite configuration variables
for example, you can do:
export YACY_PORT=8092 && ./startYACY.sh
Just prepend "YACY_" to the uppercase version of the configuration variable name and
replace all "." with "_".
2021-02-09 20:26:49 +01:00
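Going by the YACY_PORT example in the commit message above, the name mapping can be sketched as follows. This is a Python illustration only, not the YaCy implementation; `env_name_for` and `apply_env_overrides` are hypothetical names, and the configuration is assumed to be a flat dictionary of strings.

```python
def env_name_for(config_key: str) -> str:
    """Map a configuration key such as "port" to "YACY_PORT"."""
    # Uppercase the key, replace every "." with "_", and add the prefix.
    return "YACY_" + config_key.upper().replace(".", "_")


def apply_env_overrides(config: dict, environ: dict) -> dict:
    """Return a copy of config where matching environment variables win."""
    merged = dict(config)
    for key in config:
        name = env_name_for(key)
        if name in environ:
            merged[key] = environ[name]
    return merged
```

For example, `apply_env_overrides({"port": "8090"}, {"YACY_PORT": "8092"})` yields a configuration whose `port` is `"8092"`; in a real program the second argument would typically be `os.environ`.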
Michael Peter Christen
13a2e6dc6e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2021-01-25 11:49:32 +01:00
Michael Peter Christen
0ae8ccf657 Make it possible to set an empty password disabling the authentication
protocol completely
If you now set an empty password, the http server will not ask you to
authenticate. This is required for environments where we attach an outside
authentication service like keycloak or similar using authentication
in an ingress proxy.
This change is part of the approach to run YaCy inside of a kubernetes
cluster where we do not want individual authentication of peers and want
to apply an ingress authentication.
2021-01-25 11:49:21 +01:00
Michael Peter Christen
96592a10cf added option to set yacy configuration values using environment
variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name, so that we do not
flood peer name statistics with auto-deployment-generated names.
2021-01-24 22:50:37 +01:00
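The "yacy." prefix rule from the commit message above can be illustrated with a short Python sketch. It is not YaCy's code: `overrides_from_properties` is a hypothetical helper, and the properties are assumed to arrive as a plain dictionary (for a JVM they would come from `-D` call parameters such as `-Dyacy.network.unit.agent=anonymous`).

```python
PREFIX = "yacy."  # prefix named in the commit message


def overrides_from_properties(properties: dict) -> dict:
    """Collect configuration overrides from properties carrying the prefix.

    The part after the prefix is taken as the configuration attribute name.
    """
    return {
        name[len(PREFIX):]: value
        for name, value in properties.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }
```

Properties without the prefix are ignored, so unrelated settings such as `java.version` never leak into the configuration.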
Michael Peter Christen
198826c362 added network scanner process to discover all YaCy peers in the intranet
this will be used to wire YaCy peers in a kubernetes cluster
2021-01-23 15:14:49 +01:00
Michael Peter Christen
d9602e8325 Implemented a new syntax in the template engine to simplify json APIs
Added also an example for one of the existing APIs. The problem is the
comma separator between objects which must not be there for the last
entry in a sequence. The new syntax adds the separator symbol
automatically.
2021-01-18 00:01:08 +01:00
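The separator problem described above is independent of the template syntax, which is not shown here. As a rough Python illustration (the function `render_sequence` is hypothetical and says nothing about YaCy's actual template engine), the separator is emitted between entries but never after the last one:

```python
import json


def render_sequence(items, render_item, separator=","):
    """Render items one by one, adding the separator only between entries."""
    out = []
    last = len(items) - 1
    for index, item in enumerate(items):
        out.append(render_item(item))
        if index != last:
            # No separator after the final entry, otherwise the JSON is invalid.
            out.append(separator)
    return "".join(out)


body = render_sequence([{"name": "a"}, {"name": "b"}], json.dumps)
document = "[" + body + "]"  # parses as a JSON array of two objects
```

Letting the engine place the separator means an API template no longer needs special handling for its final element.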
Michael Peter Christen
5a7f12a9c1 allow network scans for non-standard http/https ports 2021-01-11 00:28:24 +01:00
Michael Peter Christen
022fb15670 fix for https://github.com/yacy/yacy_search_server/issues/385 2021-01-06 22:12:17 +01:00
Michael Peter Christen
17672fcbb4 adding hint how to shrink the disk size after an index deletion.
implements https://github.com/yacy/yacy_search_server/issues/360
2021-01-06 22:02:00 +01:00
Michael Peter Christen
907f121d0c do not overwrite PW with random PW 2020-12-29 20:18:25 +01:00
Michael Peter Christen
256fa3d985 new limitation documentation
just replaced two by four
2020-12-22 16:33:12 +01:00
Michael Peter Christen
7997836506 fixed lock image 2020-12-20 23:18:50 +01:00
Michael Peter Christen
d0abb0cedb enabling all crawl profiles in all network modes
also: increased default internet crawl speed to
4 urls/s/host
2020-12-19 01:00:51 +01:00
Michael Peter Christen
a9befbba5f Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2020-12-14 01:26:34 +01:00
Michael Peter Christen
fed8bd6325 automatically refresh css cache when switching skin
and setting of default skin to current skin in selector
2020-12-14 01:26:26 +01:00
Michael Peter Christen
9a5694261a design update
more space
2020-12-12 14:17:45 +01:00
Michael Peter Christen
4ec55289a8 using a lock symbol which also looks good in dark designs 2020-12-12 03:02:40 +01:00
Michael Peter Christen
43a9f4f574 updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
2020-12-12 02:06:43 +01:00
Michael Peter Christen
c0d9a3e9a7 turned HostBrowser into a admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
2020-12-11 00:50:52 +01:00
Michael Peter Christen
d359d521a1 fixed warc importer
The importer tried to import gzipped files as plain warc.
It will now check the file extension and decompress the file automatically
on-the-fly.
2020-12-10 11:19:25 +01:00
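The extension check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the YaCy importer: the helper name `open_warc` is made up, and `.gz` is assumed to be the extension that triggers decompression.

```python
import gzip


def open_warc(path: str):
    """Open a WARC file for reading bytes, decompressing on the fly if gzipped."""
    # Assumption: compressed archives are recognised by a ".gz" extension.
    if path.lower().endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Both branches return a binary file object, so the code that parses the WARC records does not need to know whether the input was compressed.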
Michael Peter Christen
cef5fde343 adding message to UI to make port change transparent 2020-12-02 18:05:38 +01:00
Michael Peter Christen
22841ffbf1 creating a threaddump during every cleanup process
to be able to find out what a peer did (not) last time before a crash
2020-12-01 03:00:24 +01:00
Michael Peter Christen
d7b2d82faa showing MB instead of KB in PerformanceMemory 2020-11-22 23:02:49 +01:00
sgaebel
3431f91db9 removes unused 'unused' tokens 2020-08-04 20:09:34 +02:00
sgaebel
dd9d4b1188 replace org.junit.Assert.assertThat by
org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid
deprecation-warning
2020-07-28 19:09:26 +02:00
sgaebel
df9ea0a42a removes some warnings: unused imports, params 2020-07-27 22:20:49 +02:00
sgaebel
80785b785e adds deleting during recrawl 2020-07-09 19:32:16 +02:00
Michael Peter Christen
e0ad8ca9da replaced json library from JSON.org with libandroid-json-java
This fixes https://github.com/yacy/yacy_search_server/issues/347
2020-04-24 11:45:25 +02:00
Michael Peter Christen
6d7dc01670 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2020-04-22 13:14:43 +02:00
Michael Peter Christen
0a7bda2a21 removed JSON-evil license line
These classes had been my own creative work.
Just the copyright line had appeared, possibly due to a bad
copy-paste activity, unaware that the line is a non-free addition.
2020-04-22 13:14:26 +02:00
Michael Christen
57484eb1cc xss protection 2020-02-18 14:40:50 +01:00
Michael Peter Christen
37827b6788 removed doubles from getpageinfo 2020-01-16 21:09:42 +01:00
Michael Peter Christen
f03e16d3df enhanced crawl start url check experience
urls are now urlencoded and a check is also performed
in case that an url is copied into the url field using
copy-paste
2020-01-16 20:59:02 +01:00
Michael Christen
41f9b8517f Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2020-01-15 15:24:44 +01:00
Michael Christen
4ccd1ea3c0 new servlet path "p2p"
with a test class.
Call the class with
http://localhost:8090/p2p/seeds.json
2020-01-15 15:24:36 +01:00
Michael Peter Christen
f7c97fd99e scanner crawl starts wants non-parseable files 2019-12-29 01:21:39 +01:00
Michael Peter Christen
a20b61f5c0 fix for bad json 2019-11-06 17:28:11 +01:00
Michael Peter Christen
d62a8ec542 masking connects 2019-11-05 14:44:01 +01:00
Michael Peter Christen
5eb0033aef typo 2019-11-05 11:36:23 +01:00
Michael Peter Christen
2c0742fc43 added json version of peer list 2019-11-05 11:29:07 +01:00
Michael Christen
cfa27d2fd5 fixed links 2019-10-20 20:20:50 +02:00
Michael Peter Christen
0bddf2d895 switched url and snippet position 2019-09-28 23:16:23 +02:00
Michael Peter Christen
2999f4b985 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2019-09-28 22:11:22 +02:00
Michael Peter Christen
449780f762 enhanced search result design 2019-09-28 22:11:11 +02:00
Michael Christen
cdc7adedc2 added sponsor link 2019-09-28 21:27:22 +02:00
Michael Christen
f2d45ebb87 design updates + added link to new forum 2019-09-28 02:06:50 +02:00
Michael Peter Christen
789670bd8c design changes - more space 2019-09-26 23:44:04 +02:00
Michael Christen
3a46b07603 fixed many links to old forum, now https://searchlab.eu 2019-06-15 11:43:27 +02:00
luccioman
6b45cd5799 New optional crawl filter on the URL a doc must match to crawl its links
For finer control over which parsed documents can trigger an addition of
their links to the crawl stack, complementary to the existing crawl
depth parameter.
2019-05-01 08:54:19 +02:00
luccioman
d16bc99835 Added "Show Metadata" links to the ViewFile.html links mode
To conveniently follow parsed links in the file viewer
2019-04-18 15:31:38 +02:00
luccioman
8c068a9c99 Better HTML text semantics for technical descriptions 2019-04-18 14:23:00 +02:00
luccioman
a5771b1f14 Made SNI extension user configurable without the need for server restart
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is more properly handled by major web
servers TLS implementations, following the RFC 6066 standard.

Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
2019-04-14 15:41:13 +02:00
luccioman
42c8a251c8 Render a relevant message and status on blocked search requests
When an unauthenticated (or insufficiently privileged) client is blocked,
either because it is blacklisted or because of an excessive request rate,
render an error message and a relevant HTTP status for API requests,
instead of an empty response that appears broken.
2019-04-05 11:06:09 +02:00
luccioman
a8316c79da Allow JS resorting of search results by unauthenticated users
Access rate limitations to this search mode by unauthenticated users are
set low by default to prevent unwanted server overload but can be
customized through the SearchAccessRate_p.html configuration page

Fixes #291
2019-04-03 14:21:53 +02:00
luccioman
0ab2b49c31 Made /yacysearch access rate limitations user configurable
With a new admin page at /SearchAccessRate_p.html in menu Network Access
> Local Search > Access Rate Limitations
2019-04-02 17:42:50 +02:00
luccioman
630fa0015a P2P/Privacy switch buttons support with JavaScript disabled 2019-03-15 17:46:23 +01:00
luccioman
74fd2f30fa Support for search result switch buttons with JavaScript disabled 2019-03-09 10:21:48 +01:00
luccioman
ebc583cdb2 Properly render the href attribute of the active page button 2019-03-09 08:28:39 +01:00
luccioman
093ea9586c Properly fill current page number to new server side pagination template
When current page is automatically reset to zero because of a new search
request.
2019-03-05 08:18:18 +01:00
luccioman
6e9d5f60ad Server side initial pagination links rendering
For better support of the search page usage with JavaScript disabled.
Also reduces the number of initial refreshes of the pagination links.

When JavaScript is enabled, pagination links are still regularly
refreshed until all the search feeds are terminated on server side.
2019-02-28 22:56:49 +01:00
luccioman
4b9cc4746d Upgraded Bootstrap dependency from v3.3.7 to v3.4.1
Non-regression tests were run on the following platforms :
Linux Debian Stretch :
 - Firefox 60.5.1esr
 - Chromium 72.0.3626.96

Windows 10 :
 - Firefox 65.0.1
 - Chrome 72.0.3626.109
 - Edge 25.10586.672.0
 - IE 11.1540.10586.0

Mac OS  :
 - Safari 11.0
2019-02-21 10:12:39 +01:00
luccioman
c617ea58a0 Render additional embedded audios from links on extended audio search 2019-02-08 12:23:20 +01:00
luccioman
69f1971052 Added basic controls to play all audio results.
Not displayed when JavaScript is disabled.
2019-01-30 18:43:13 +01:00
luccioman
9782a98a9c Added the possibility to customize facets sort type and direction
Previously search navigators/facets elements were sorted only by counts.
Now from the ConfigSearchPage_p.html admin page, sort direction
(ascending/descending) and type (on counts or labels) can be customized
independently for each navigator.
2019-01-24 18:43:06 +01:00
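The two independent settings described above, sort type and sort direction, can be illustrated with a short Python sketch. It is not taken from YaCy: `sort_facet` is a hypothetical function, and a facet is assumed to be a dictionary mapping labels to counts.

```python
def sort_facet(entries: dict, by: str = "count", descending: bool = True):
    """Sort facet entries (label -> count) by count or by label."""
    if by == "count":
        key = lambda pair: pair[1]
    elif by == "label":
        key = lambda pair: pair[0]
    else:
        raise ValueError("by must be 'count' or 'label'")
    return sorted(entries.items(), key=key, reverse=descending)


filetypes = {"pdf": 10, "html": 3, "txt": 1}
by_count = sort_facet(filetypes)                             # highest count first
by_label = sort_facet(filetypes, by="label", descending=False)  # alphabetical
```

The default arguments mirror the previous behaviour mentioned in the commit message, where elements were sorted by counts only; descending order is an assumption.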
sgaebel
c2398fd890 remove warnings: 'Statement unnecessarily nested within else clause' 2019-01-10 20:02:57 +01:00
sgaebel
8d2e7262d9 Recrawl:
- set the chunksize to 100 to meet the max of the embedded solr
- re-enable sorting (the case where we switched it off should be gone)
- enable recrawling on remote-solr
2019-01-04 18:46:59 +01:00
luccioman
60b520fb13 Cleaned up Spanish translation after merge of PR #238
* Fixed some indentation
* Removed untranslated entries
2018-12-20 15:02:07 +01:00
luccioman
cd72515188 Merge pull request #238 from ivanhercaz/esLang
[WIP] Spanish translation
2018-12-20 14:57:14 +01:00
luccioman
2f75e2d9c8 Fixed a case of NullPointerException on disconnected RWI data structure 2018-12-17 14:12:21 +01:00
luccioman
e85f231bdf Fixed termination of Host browser and link structure Solr query threads
On some conditions (especially when reaching timeout), concurrent Solr
query tasks used by the /HostBrowser.html and /api/linkstructure.json
never terminated, thus leaking resources, as reported by @Vort in issue
#246
2018-11-06 10:10:09 +01:00
luccioman
260ac11c65 Limit length of initially visible text in link structure graph nodes
To slightly improve the readability of graphs with a large number of nodes.
2018-10-31 07:43:42 +01:00
luccioman
5a8d9abd8a Upgraded d3js dependency from 3.4.4 to 5.7.0 2018-10-28 10:07:46 +01:00
luccioman
9f8e1994a4 Added missing CSS width units to some HostBrowser.html styling 2018-10-26 09:11:23 +02:00