Commit Graph

464 Commits

Author SHA1 Message Date
luccioman
2f75e2d9c8 Fixed a case of NullPointerException on disconnected RWI data structure 2018-12-17 14:12:21 +01:00
luccioman
88d0ed676c Render http status instead of null responses on snapshot api errors 2018-10-19 10:12:10 +02:00
luccioman
746e0e788d Render a relevant HTTP status code on snapshot image rendering error
Instead of a null response body which is not very helpful.
2018-10-14 10:30:30 +02:00
luccioman
79bd9f623a Updated YaCy home page embedded links from http to https scheme 2018-05-22 17:46:12 +02:00
luccioman
addd18c993 Removed some remaining uses of deprecated Seed.getIP() 2018-04-26 09:39:30 +02:00
luccioman
0a058ba6af Keep https in result message URL when push_p API is requested over https 2018-04-24 08:05:17 +02:00
luccioman
dbf4c1cd76 Improved blacklist entries editing operations :
- Fixes issue #160 : handle properly syntax exceptions with a user
friendly message
- Fixes loss of information on multiple blacklist entries editions
- Fixes loss of entries when moving entries from one list to another
2018-02-13 18:24:26 +01:00
luccioman
5db1c9155a Do locale independant case conversion on hosts, schemes, and file exts.
Required for proper operation when the default system locale is Turkish,
as dottless and dotted i characters have specific case conversion rules
in this language.
2017-12-19 13:52:05 +01:00
luccioman
1de86cf1bf Fixed JPEG snapshot resizing when running on OpenJDK.
Resizing JPEG snapshot images through /api/snapshot.jpg failed when
running on OpenJDK, but rendered successfully with a Oracle JDK.
Details in mantis 772 ( http://mantis.tokeek.de/view.php?id=772 ).

Removing any alpha component (useless in snapshot images) from the
rendered resized image solves the issue.
2017-10-19 09:27:52 +02:00
luccioman
a17a418e78 Fixed NullPointerException cases on snapshot images parsing. 2017-10-18 08:31:18 +02:00
luccioman
285f0d6a39 Consistently encode snapshot image with format requested on the API.
Previously, calling /api/snapshot.png rendered JPEG encoded images.
2017-10-18 07:53:07 +02:00
luccioman
4eba88f2ff Removed some unnecessary uses of java.lang.reflect api.
This improves code browsing and readability, making search by references
or call hierarchy IDE features more accurate.
2017-08-24 18:47:18 +02:00
luccioman
3f0446f14b Ensure proper synchronous robots entry retrieval on first check.
Previously, when checking for the first time the robots.txt policy on a
unknown host (not cached in the robots table), result was always empty
in the /getpageinfo_p.xml api and in the /CrawlCheck_p.html page. Next
calls returned however the correct information.
2017-08-16 09:30:33 +02:00
reger
a21789d4e7 Fix unresolved pattern in api/share.html by init some display var's 2017-07-08 22:46:15 +02:00
luccioman
bf55f1d6e5 Started support of partial parsing on large streamed resources.
Thus enable getpageinfo_p API to return something in a reasonable amount
of time on resources over MegaBytes size range.
Support added first with the generic XML parser, for other formats
regular crawler limits apply as usual.
2017-07-08 09:04:03 +02:00
luccioman
8da3174867 Ensure lower case conversion consistency with any default locale.
Especially for Turkish speaking users using "tr" as their system default
locale : strings for technical stuff (URLs, tag names, constants...)
must not be lower cased with the default locale, as 'I' doesn't becomes
'i' like in other locales such as "en", but becomes 'ı'.
2017-06-27 06:42:33 +02:00
luccioman
0f80c978d6 Limit the number of initially previewed links in crawl start pages.
This prevent rendering a big and inconvenient scrollbar on resources
containing many links.
If really needed, preview of all links is still available with a "Show
all links" button.

Doesn't affect the number of links used once the crawl is effectively
started, as the list is then loaded again server-side.
2017-06-17 09:33:14 +02:00
luccioman
cbccf97361 Added JavaDoc to the getpageinfo_p API servlet. 2017-05-30 17:38:16 +02:00
luccioman
bd88fd303e Deprecated duplicated and internally unused getpageinfo servlet.
Redirections set for the transition of any eventual external uses:
 - /api/getpageinfo.xml to /api/getpageinfo_p.xml
 - /api/getpageinfo.json to /api/getpageinfo_p.json
2017-05-30 09:29:28 +02:00
reger
a2afb4bae0 add switchboardconstants for server ports config keys 2017-03-18 20:02:26 +01:00
reger
334c70c37a correct fromDate init value on missing param in api/timeline_p servlet
revert test modification from last commit in AccessTracker.main
2017-02-20 00:14:14 +01:00
luccioman
e048e74072 Added an optional parameter to webstructure.xml api.
This new "documentStructure" parameter can be set to false to only get
hosts accumulated references on a resource and thus prevent scraping the
specified URL and getting citations references.

Also set WebStructureGraph constants as final and updated the Javadoc
with example api call URLs.
2017-01-19 12:30:44 +01:00
luccioman
17b7c92009 Made sure webstructure.xml API produces valid XML.
Host names should not contain XML special characters such as quotation
mark, but at this stage the WebGraph may have mistakenly recorded a host
name with such characters. What's more the DigestURL constructor does
not prevent this.
By the way using serverObjects.putXML to encode host names we ensure
here the rendered XML is well formed and can be parsed by external tools
even if an structure entry is incorrect.
2017-01-17 15:59:55 +01:00
luccioman
ed3dd5e31a Fixed webstructure.xml API used with a domain name 'about' parameter.
As described in mantis 720 (http://mantis.tokeek.de/view.php?id=720),
when requesting this API with a domain name instead of a complete URL
only HTTP references on default port were listed.
2017-01-16 16:41:06 +01:00
luccioman
f793d97e56 Factored common code with DigestURL.hosthash() 2017-01-13 16:05:46 +01:00
luccioman
9cea7cbb10 Detailed some Javadoc related to /api/webstructure.xml usage. 2017-01-12 17:52:47 +01:00
reger
c50e23c495 reduce creation of empty legacy RequestHeader() in situation where null
is acceptable (less for garbage collection).
2016-12-18 02:38:43 +01:00
reger
f45945cada increase use of header const for custom "EXT" header 2016-11-13 01:39:14 +01:00
luccioman
812abfc868 Converted one more set of URLs to pure relative ones.
Easier YaCy peer configuration behind a reverse proxy subfolder : no
need for the reverse proxy to rewrite HTML links or URLs in css files.

Tested on Debian Jessie with an apache2 reverse proxy.

See related mantis issues http://mantis.tokeek.de/view.php?id=106 and
http://mantis.tokeek.de/view.php?id=701
2016-11-12 15:54:35 +01:00
luccioman
74fec066f4 Converted more URLs to pure relative ones.
Easier YaCy peer configuration behind a reverse proxy subfolder : no
need for the reverse proxy to rewrite HTML links or URLs in css files.

Tested on Debian Jessie with an apache2 reverse proxy.

See related mantis issues http://mantis.tokeek.de/view.php?id=106 and
http://mantis.tokeek.de/view.php?id=701
2016-11-12 10:51:54 +01:00
luccioman
734340c128 Fixed errors for Search portal mode or when peer is not reachable.
Same case as reported on issue #87.
2016-11-04 14:31:22 +01:00
luccioman
6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
	htroot/yacysearchitem.java
	source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
	source/net/yacy/search/schema/CollectionConfiguration.java
	source/net/yacy/server/serverObjects.java
2016-10-14 11:29:55 +02:00
reger
7c81160f45 correct blacklist export as text url to blacklists_p.txt
was using servlet for network access and missing network.unit.name
fix for http://mantis.tokeek.de/view.php?id=694
+ prevent unresoved_pattern in yacy/list servlet
2016-10-07 03:03:41 +02:00
reger
91ab8a526a add error msg to api/share.html
and skip display of url on nothing uploaded
2016-08-17 03:07:26 +02:00
luccioman
6e96c7341a Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/Load_MediawikiWiki.java
	htroot/Load_PHPBB3.java
	htroot/ViewImage.java
2016-07-03 18:59:00 +02:00
reger
4e0892962a fix NPE in citation servlet on empty text field 2016-05-14 03:51:13 +02:00
reger
d9adc2c255 load handler for Transparent Proxy on startup only if feature is activated
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
2016-03-25 05:26:48 +01:00
Michael Peter Christen
b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
within an experimental new sharing model
2016-03-11 18:12:13 +01:00
Michael Peter Christen
f12a900f3e harmonization of http post of files for one and several files - this had
been differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
supports ASCII POST pushes..
2016-03-11 08:59:33 +01:00
luc
8682dfbd5e Updated getpageinfo outputs to return page icons list. 2016-02-10 09:02:21 +01:00
luc
3cc5619d93 Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
2016-02-02 09:57:54 +01:00
luc
571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
charset names.
2016-01-05 23:37:05 +01:00
luc
55a4d15775 Added a note on deprecated default search field and operator. 2015-12-14 23:55:12 +01:00
reger
52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
- by direct using Set vs. List
- remove not neede String[] getter
2015-11-13 01:48:28 +01:00
reger
a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt() 2015-10-31 23:09:03 +01:00
sixcooler
87e4abe393 fight the fieldcache by usind DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in an huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DovValues where it is possible.
For this I used the Api-Scheme as new basis für the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these setting will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
Michael Peter Christen
b43811d38c added surrogate import process for exported solr dumps.
Just throw your solr dump file into DATA/SURROGATES/in/ and it will be
imported!
2015-05-30 13:19:59 +02:00
reger
3e742d1e34 Init remote crawler on demand
If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of queue and ideling thread.
Deploy of the remoteCrawlJob deferred on activation of the option.
2015-05-23 02:06:39 +02:00
reger
609c52e987 refactor getBookmark
to consistenly check existance by != null (w/o throwing exception on not found)
2015-05-11 00:37:04 +02:00
reger
8a5b8f8789 on bookmaring of search result, remember orig. query in separate bookmark property
(instead of using the description field)
- adjust display and autosearch
- don't overwrite existing bookmark but combine info
2015-05-03 02:31:50 +02:00