3593 Commits

Author SHA1 Message Date
reger
708bcbb042 one more replacement to use cached hosthash vs. calculated 2016-07-07 02:50:57 +02:00
reger
22db449f2a to prevent the crawler from concurrently accessing and altering the same crawl queue
after restart, put the hosthash in the queue's filename (which is used as the primary
key for the crawl queue). Hint: the initial hosthash from the url and the hosthash recalculated
from just hostname:port are not the same.
fixes http://mantis.tokeek.de/view.php?id=668 (partially)
2016-07-05 23:22:35 +02:00
Orbiter
50c5ddf1a1 Merge pull request #56 from luccioman/LibreJS
LibreJS compliance : YaCy JavaScript license information
2016-07-04 21:07:11 +02:00
Michael Peter Christen
7466d390b2 small refactoring + do not accept too old peers during bootstrap 2016-07-04 11:02:15 +02:00
reger
8d58a48029 remove wrong log line in CrawlSwitchboard
+ don't allow CrawlSwitchboard to exit application
making network param unused
2016-07-02 20:33:23 +02:00
reger
5aaa057c65 ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist reading.
equalizes behavior with getListString()
improves: case where a blacklist file contained an undesired empty line, not
fixed by blacklist-cleaner.
2016-06-28 23:44:28 +02:00
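The empty-line rule from 5aaa057c65 can be sketched as follows. This is a hypothetical illustration, not YaCy's FileUtils code: the class name `ListLines`, the `BufferedReader` parameter, and the choice to also treat whitespace-only lines as empty are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ListLines {

    /**
     * Hypothetical sketch of the getListArray() behavior described above:
     * every line becomes one list entry, except lines that are empty
     * (assumption: whitespace-only lines count as empty too).
     */
    public static List<String> getListArray(BufferedReader reader) throws IOException {
        List<String> entries = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // ignore empty input lines instead of creating an empty entry
            }
            entries.add(line);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String blacklist = "example.com/.*\n\n   \nspam.example/.*\n";
        List<String> entries = getListArray(new BufferedReader(new StringReader(blacklist)));
        System.out.println(entries); // prints [example.com/.*, spam.example/.*]
    }
}
```

Skipping blank lines at read time means a stray empty line in a blacklist file can no longer turn into an empty (match-everything or invalid) entry, regardless of whether the blacklist cleaner runs.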
reger
41c36ffd75 exclude rejected results from result count
(by using the resultcontainer.size instead of input docList.size)
skip waiting for write-search-result-to-local-index
  (by removing the Thread.join - which will bring a small performance increase)
2016-06-26 06:46:26 +02:00
reger
d4da4805a8 internal wiki code, require header line to start with markup
(to allow something like  "one=two"  as text)
+ incl. test case
2016-06-25 02:46:44 +02:00
reger
e952e355a2 have the Translator servlet apply an added translation ad hoc by translating a single file
+ fix NPE in Translator caused by translations read by TranslatorXliff,
  which allows null content for untranslated keys
2016-06-14 22:14:46 +02:00
reger
b119ff65be clean out unused Switchboard variables
(counter indexedPages, const xstackCrawlSlots)
2016-06-14 01:50:32 +02:00
reger
223071337b Translator to take word boundaries into account when identifying the text portion to
be translated, to avoid translating partial terms/words
(e.g. key="TEST" in sourcetext="this is a myTESTcase for it").
Add check of word boundary before and after sourcetext (incl. taking care
of the current practice of keys being delimited by > <)
+ add test case
2016-06-10 01:14:19 +02:00
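The word-boundary check from 223071337b can be sketched like this. It is a hypothetical illustration, not the Translator implementation: the class and method names are invented, and treating letters and digits as "word" characters is an assumption. An edge of the key that is itself a delimiter (such as the `>` and `<` in `>Search<`) needs no boundary check.

```java
public class BoundaryTranslator {

    // Assumption: letters and digits count as "word" characters.
    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /**
     * Hypothetical sketch: replace occurrences of sourceText by translation only where
     * the match is not glued to surrounding word characters.
     */
    public static String translate(String content, String sourceText, String translation) {
        if (sourceText.isEmpty()) {
            return content; // nothing to look for; also avoids an endless indexOf("") loop
        }
        StringBuilder out = new StringBuilder(content.length());
        int from = 0;
        int idx;
        while ((idx = content.indexOf(sourceText, from)) >= 0) {
            int end = idx + sourceText.length();
            boolean startOk = idx == 0
                    || !isWordChar(sourceText.charAt(0))
                    || !isWordChar(content.charAt(idx - 1));
            boolean endOk = end == content.length()
                    || !isWordChar(sourceText.charAt(sourceText.length() - 1))
                    || !isWordChar(content.charAt(end));
            out.append(content, from, idx);
            // translate whole words only; keep partial-word matches untouched
            out.append(startOk && endOk ? translation : sourceText);
            from = end;
        }
        out.append(content, from, content.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("this is a myTESTcase for it", "TEST", "PROBE")); // unchanged
        System.out.println(translate("run the TEST now", "TEST", "PROBE"));           // run the PROBE now
        System.out.println(translate("<b>Search</b>", ">Search<", ">Suche<"));        // <b>Suche</b>
    }
}
```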
luccioman
009657791e Merge remote-tracking branch 'origin/master' into LibreJS 2016-06-09 14:44:51 +02:00
luccioman
a73c9327a5 JavaScript License fixes for LibreJS compatibility 2016-06-08 23:16:10 +02:00
reger
0c40401d28 fix MessageBoard test for null data 2016-06-07 23:34:42 +02:00
reger
5b22c63030 Adjust TranslatorXliff to load the default first and merge downloaded or modified local translations.
process 1. load default from locales/*.* 
        2. load and merge (overwrite) from DATA/LOCALE/*.* (can be a partial translation, as it is merged)
- include all entries from DATA/LOCAL to be edited in Translator servlet
  and save just modifications (instead of full list) to DATA/LOCALE

This should make it easy to share modifications.
2016-06-05 23:01:45 +02:00
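The load order in 5b22c63030 (defaults first, then local entries overwrite them, and only modifications are saved) can be sketched with plain maps. This is a simplified, hypothetical model: it flattens a translation into one `sourceText -> translation` map, while the real TranslatorXliff data layout is not shown in this log; `TranslationMerge`, `merge` and `modifications` are invented names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class TranslationMerge {

    /**
     * Hypothetical sketch of the two-step load: start from the default translation
     * (locales/) and let local entries (DATA/LOCALE) overwrite it. A local file may be
     * partial, because keys it does not contain keep their default value.
     */
    public static Map<String, String> merge(Map<String, String> defaults, Map<String, String> local) {
        Map<String, String> merged = new LinkedHashMap<>(defaults); // step 1: defaults
        merged.putAll(local);                                      // step 2: local overwrites
        return merged;
    }

    /** Only entries that differ from the default are kept for saving back to DATA/LOCALE. */
    public static Map<String, String> modifications(Map<String, String> defaults, Map<String, String> edited) {
        Map<String, String> diff = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : edited.entrySet()) {
            // Objects.equals tolerates null values (untranslated keys)
            if (!Objects.equals(defaults.get(e.getKey()), e.getValue())) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("Search", "Suche");
        defaults.put("Help", null); // untranslated in the default file
        Map<String, String> local = new LinkedHashMap<>();
        local.put("Help", "Hilfe");
        Map<String, String> merged = merge(defaults, local);
        System.out.println(merged);                          // {Search=Suche, Help=Hilfe}
        System.out.println(modifications(defaults, merged)); // {Help=Hilfe}
    }
}
```

Saving only the difference keeps the local file small, which is what makes it practical to share as a fragment.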
reger
a2e0f00456 optimize Translator
- translateFilesRecursive: load translation once (reduce io), return true on complete success
  - remove resulting unused translateFiles() variant
- translate: use StringBuilder parameter (skip toString conversion)
- remove not needed static declaration
- upd some javadoc
2016-06-05 03:57:08 +02:00
reger
a6ba1faa80 introduce Translator_p.html, a translation edit servlet for YaCy's UI text translation
This is the first rudimentary approach to support the translation utilities.
It currently allows editing untranslated text and saving it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (fewer statics) to leverage class overrides and support garbage collection for this one-time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
  this includes storing manually downloaded translation files in DATA as well 
  (to keep default untouched)
+ on the first call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this master file is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of complete files) are to be shared with the maintainer using XLIFF features.
2016-06-03 01:46:30 +02:00
reger
b3c9041f79 remove publicIPv4HostNames and publicIPv6HostNames (redundant with localHostNames, and unused)
to free unused resources
2016-06-02 01:42:15 +02:00
reger
bd8f7c11f5 Use transparent addToCrawler in AutoSearch instead of addToIndex
This would likely also be an advantage for RSS import/schedule, as the
following bug reports suggest
http://mantis.tokeek.de/view.php?id=569
http://mantis.tokeek.de/view.php?id=655
2016-06-01 01:14:22 +02:00
reger
f23d8ab47b fix 2 more servlet RuntimeExceptions in intranet mode, thrown because seed.getIP()
returns null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3)
+remove unused (const &empty;) seed.IPTYPE
2016-05-29 20:35:57 +02:00
reger
bb0076c3dd fix: ensure the InputStream in TranslatorXliff is closed after reading the xlf file
by using a try-with-resources block
2016-05-29 01:25:47 +02:00
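The try-with-resources pattern named in bb0076c3dd guarantees that a stream is closed on both normal return and exception. The helper below is a hypothetical illustration of the pattern, not TranslatorXliff's code; `CloseAfterRead` and `readAll` are invented names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class CloseAfterRead {

    /**
     * Hypothetical helper: read a whole (e.g. xlf) stream as text. The try-with-resources
     * block closes the reader, and with it the wrapped InputStream, on normal return and
     * when an exception is thrown.
     */
    public static String readAll(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        final boolean[] closed = {false};
        InputStream in = new ByteArrayInputStream("<xliff/>".getBytes(StandardCharsets.UTF_8)) {
            @Override
            public void close() {
                closed[0] = true; // record that the stream was closed
            }
        };
        System.out.print(readAll(in));              // prints <xliff/>
        System.out.println("closed: " + closed[0]); // prints closed: true
    }
}
```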
reger
6384b7d82e fix NPE in Load_MediawikiWiki servlet in intranet mode
- in intranet mode getIP returns null, causing an NPE
  - adjust starturl (which was set to http://localip/repository, never the start url for the Mediawiki)
+ correct javadoc for seed.getIP()
2016-05-27 03:10:25 +02:00
Michael Peter Christen
596b5dfa59 add the JRE version to the seed. Purpose: identify whether it is possible to
migrate to a new JRE version
2016-05-24 23:11:59 +02:00
reger
4cc38e979d add InputStream close after reading input file (Vocabulary_p servlet) 2016-05-24 00:26:28 +02:00
reger
6bf9c55584 adjust Solr select servlet to the latest bugfix for boostquery (bq param):
split the query into multiple parameters on line separators in the input query,
e.g. split "crawldepth_i_0^10.0 \n crawldepth_i:1^5.0"
but leave "url_file_ext_s:jpg OR url_file_ext_s:png" unsplit
2016-05-22 22:43:56 +02:00
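The splitting rule in 6bf9c55584 (one boost query per input line, while a single-line `a OR b` stays whole) can be sketched as below. This is a hypothetical illustration, not the servlet's code: `BoostQuerySplit` and `splitBoostQuery` are invented names, and trimming and dropping blank lines are assumptions. Each resulting element would then be passed to Solr as a separate `bq` value (for example via SolrJ's `ModifiableSolrParams.add("bq", ...)`), which this sketch does not include.

```java
import java.util.ArrayList;
import java.util.List;

public class BoostQuerySplit {

    /**
     * Hypothetical sketch: each non-blank line of the raw bq input becomes its own
     * boost query; anything on a single line (including "a OR b") stays one query.
     */
    public static List<String> splitBoostQuery(String rawBq) {
        List<String> queries = new ArrayList<>();
        for (String line : rawBq.split("\\R")) { // \R matches \n, \r\n, \r and other line breaks
            String q = line.trim();
            if (!q.isEmpty()) {
                queries.add(q);
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println(splitBoostQuery("crawldepth_i:0^10.0 \n crawldepth_i:1^5.0"));
        // [crawldepth_i:0^10.0, crawldepth_i:1^5.0]
        System.out.println(splitBoostQuery("url_file_ext_s:jpg OR url_file_ext_s:png"));
        // [url_file_ext_s:jpg OR url_file_ext_s:png]
    }
}
```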
Burkhard
9a18e2297b Merge pull request #51 from JeremyRand/multiple-boost-query
Fix multiple boost queries
2016-05-22 22:24:04 +02:00
reger
f0d7b93372 use and activate charset autodetection for Vocabulary input from file
+ revert mistake of empty cn.lng
2016-05-22 05:38:26 +02:00
JeremyRand
433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.) 2016-05-20 20:17:51 -05:00
JeremyRand
58824dfa6c Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx. 2016-05-20 20:17:51 -05:00
reger
9e94989237 upd to PDFBox 2.0.1 2016-05-20 23:12:16 +02:00
reger
d0a571bed2 delete cytag trail for own index.html (saves resources; not used by default) 2016-05-19 01:59:00 +02:00
reger
de46879637 fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut) 2016-05-17 02:07:49 +02:00
reger
24b0fa2a38 extend snapshot Html2Image.pdf2image to use the PDFBox image export capability
if no external tool is installed (and on Windows).
The resulting JPGs are not always perfect (if graphics are included) but IMHO sufficient.
2016-05-16 02:13:33 +02:00
reger
eb2a00b1d8 fix NPE on missing crawldepth_i 2016-05-15 01:26:38 +02:00
reger
efb9f1a8b7 save resource for unused blacklistFiles map 2016-05-12 00:13:57 +02:00
reger
5f113be760 cleanup connectPeer & yacyVersion.latestRelease usage
obsolete since 527b3decde
2016-05-06 21:05:15 +02:00
reger
7097dcbdbd clean up hack for partial Solr update on multivalued date fields
(the issue has been fixed in Solr: http://issues.apache.org/jira/browse/SOLR-8050)
2016-05-06 02:47:04 +02:00
reger
f10ea3c155 clean-out unused SwitchboardConstants 2016-05-05 00:55:22 +02:00
reger
ef24593347 delete obsolete SEARCHRESULT busythread constants
not used since 29.05.2013 18:27:27 (0c1a018bbd)
2016-05-04 01:30:10 +02:00
reger
125b5e26a5 apply bugfix for ChartPlotter from Pullreq 42
https://github.com/yacy/yacy_search_server/pull/42
thanks to otteresk (https://github.com/otteresk)
2016-05-03 03:06:06 +02:00
reger
06ce9ae711 prevent "unchecked conversion" compiler message
+ include "translate" property in xlf "trans-unit" export
2016-05-01 02:22:05 +02:00
reger
b4a576dbdf exclude unused protocol param "duetime"
(receiver interprets param "time" only)
2016-04-25 01:57:33 +02:00
reger
3bd6ae8d8b keep addon/Notepad++ keyword marker on lng export
(length of remarks divider line)
+ harmonize status_p.inc lng text
2016-04-21 00:51:08 +02:00
reger
16837d60c7 fix version in locale version file
(it's compared to full version)
2016-04-17 22:54:28 +02:00
reger
0fb01e429e fix migration, account for ssl port in config (for auto-disable https) 2016-04-17 04:42:05 +02:00
reger
7be1c7a05a fix logger name 2016-04-17 03:20:14 +02:00
reger
1d940e5a94 upd commons-compress 1.11 2016-04-16 23:31:03 +02:00
reger
7789c32c82 delete crawl queue on init exception
(happens occasionally on path name violation and will never get resolved)
2016-04-16 00:22:48 +02:00
reger
f781b9dd47 revert call condition for migration.installSkins
(a bug introduced in fb8ae14b21; see comment on that commit)
2016-04-14 22:14:32 +02:00
reger
3adb670f44 remove never used Domains.myHostNames set 2016-04-14 02:54:41 +02:00