luccioman
46b8836548
Copy image resources contained in donation iframe.
...
Handle possible image loading errors.
2016-08-17 15:19:15 +02:00
reger
4c7a77662a
eliminate dependency on file-extension in storeDocument and use supported mime-type instead
...
to also support handling of urls w/o corresponding file-extension.
For this, refactor use of document.getParserObject() to always return a Parser (for clean logic)
and define/move the scraperObject as local var of AbstractParser.
Adjust related calls to getParserObject (where actually a scraperObject is wanted).
Additionally skip appending url token to parsed text for dht metadata entries
(by default returned as result by rwi index).
2016-08-14 03:53:16 +02:00
reger
ebde21079a
refactor xlsParser to include Excel file attribute (like author) in parser result doc.
...
Similar to ppt and doc parser, completing a TODO in xlsParser.
2016-08-13 23:46:36 +02:00
luccioman
744c9a2615
Opensearch desc : handle https protocol url with default port (443)
...
This completes modifications made for mantis 669
(http://mantis.tokeek.de/view.php?id=669 )
2016-08-12 12:18:26 +02:00
luccioman
b9c28893ee
Merged master to 'heroku' branch.
2016-08-10 11:03:01 +02:00
Michael Peter Christen
103a8348b3
fix for NPE and small performance enhancement
2016-08-10 06:48:08 +02:00
reger
2910fe35c1
add missing scheduler calc of next exec_date (call of calculateAPIScheduler)
...
- after last_exec_date is altered, next_exec_date should be recalculated
- makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete
Slightly modify next_exec calc. on missed event to now+schedule_time (instead of a fixed 10 min)
2016-08-09 03:03:04 +02:00
reger
70d47ae38a
keep scheduler selection by repeat entry from 07311020d4
...
to allow exec schedule on actual exec event.
Iterate on exec date (of advantage after interruption/shutdown) to schedule
older or missed events first.
2016-08-08 02:19:48 +02:00
reger
7c3f932e5d
revert: the commit causes double count recording by scheduler / servlet under normal operation (no shutdown)
2016-08-08 01:57:31 +02:00
reger
07311020d4
postpone apicall exec date init until actual call
...
fix for http://mantis.tokeek.de/view.php?id=677
The difference shows when a large number of RSS feeds is scheduled and loading
is not finished before shutdown of YaCy. The change makes sure that RSS feeds not yet
loaded will be loaded by the scheduler on next startup.
2016-08-07 05:08:55 +02:00
reger
5e335b32da
fix Blacklist.contains() matching path pattern to string
...
similar to 5e9e871192
+ add proof testcase
2016-08-04 01:12:49 +02:00
reger
5e9e871192
fix Blacklist.remove by using pattern.toString to find pattern to remove,
...
the String path parameter never equaled a Pattern.
+ delete unused removeAll, as it does not persist changes after restart
2016-08-03 02:13:26 +02:00
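Illustration for the Blacklist.remove fix above: a minimal sketch of why a lookup with a String cannot find a Pattern entry, and how comparing against pattern.toString() can. The class name, the helper removeByPatternString and the Set<Pattern> container are hypothetical and are not taken from YaCy's Blacklist code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class BlacklistRemoveSketch {

    // Hypothetical helper: remove the entry whose pattern source text equals path.
    static boolean removeByPatternString(Set<Pattern> entries, String path) {
        return entries.removeIf(p -> p.toString().equals(path));
    }

    public static void main(String[] args) {
        Set<Pattern> entries = new HashSet<>();
        entries.add(Pattern.compile("ads/.*"));

        // A String is never equal to a Pattern, so this removes nothing.
        boolean naive = entries.remove((Object) "ads/.*");

        // Comparing the pattern's source text finds the entry.
        boolean fixed = removeByPatternString(entries, "ads/.*");

        System.out.println("naive=" + naive + " fixed=" + fixed);
    }
}
```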
reger
1843ea7e69
on Blacklist.add pattern to source file also update internal entry maps
...
as in Blacklist.add(blacklistType) to make entry effective w/o restart
fix for http://mantis.tokeek.de/view.php?id=676
2016-08-02 02:41:03 +02:00
reger
bf6ce33da3
Correct use of _htDocsPath config in YaCyDefaultServlet to use servlet config variable
...
+ add some javadoc and remove a not useful static declaration
2016-07-31 23:16:24 +02:00
luccioman
480027ec98
Merge remote-tracking branch 'origin/master' into heroku_experiments
2016-07-28 02:29:40 +02:00
reger
fcad2d0744
add uses of config constant INDEX_RECEIVE_ALLOW
2016-07-27 02:16:20 +02:00
reger
226f81cfcf
declare poison pill url MultiProtocolURL() as protected to make sure not
...
used from outside.
After double checking use of poison url revert path init from commit
f8632ad292
2016-07-23 20:03:13 +02:00
reger
f8632ad292
prevent string index out of bounds MultiProtocolURL.getPaths
...
as path may be an empty string
+ init path to "" also in init for poison url (to guarantee success for
all existing uses of path w/o check for null)
2016-07-23 19:18:23 +02:00
reger
35a7d57260
update lucenematchversion to current (5.2.0 -> 5.5.0)
...
there should be no need for reindex by the update
2016-07-23 18:36:43 +02:00
reger
9b07bbf955
deprecate newurl(), not used and already replaced
...
instead of making it handle all the supported protocols
2016-07-21 02:14:35 +02:00
luccioman
47d486298f
Merged changes from master.
2016-07-20 00:37:31 +02:00
reger
774b3906a9
fix GenericFormatter.parse ("time","timeoffset")
...
change: UTC offset internally expected in minutes
2016-07-19 02:57:41 +02:00
reger
27163af0e1
improve detection of referenced links by taking http and https link protocol
...
into account
+ correct query start detection of commit f89d4eb51d
2016-07-17 23:42:25 +02:00
reger
f89d4eb51d
fix MultiProtocolURL init (assign of host) for urls with '/' in query part
...
+ add to test case
2016-07-17 04:17:01 +02:00
reger
87fcfc6d78
Adjusted hash computation and toNormalform for file:// protocol to deliver
...
same hash same file on Windows filesystem path with forward- and backslash in path.
Background see http://mantis.tokeek.de/view.php?id=671
+Test case
2016-07-16 01:59:09 +02:00
luccioman
d6bf90803f
Merged from main master branch.
2016-07-12 09:05:31 +02:00
luccioman
9b9c112263
Handle local port configuration by system property more properly
...
And prefixed property with "net.yacy" to avoid ambiguity.
2016-07-12 01:53:01 +02:00
reger
3811184abd
fix GSA servlet clientIP retrieval
2016-07-09 23:39:43 +02:00
reger
7ab41d4ff1
use directories original lastmodified date in file- & smbloader in response
2016-07-09 19:55:47 +02:00
reger
708bcbb042
one more replacement to use cached hosthash vs. calculated
2016-07-07 02:50:57 +02:00
luccioman
b57a06d88e
Let Heroku decide which http port to use
2016-07-06 22:14:40 +02:00
reger
22db449f2a
to prevent crawler to concurrently access and alter same crawl queue
...
after restart, put hosthash in queue's filename (which is used as primary
key for crawl queue). Hint: initial hosthash from url and recalculated hosthash
from just hostname:port are not the same.
fixes http://mantis.tokeek.de/view.php?id=668 (partially)
2016-07-05 23:22:35 +02:00
Orbiter
50c5ddf1a1
Merge pull request #56 from luccioman/LibreJS
...
LibreJS compliance : YaCy JavaScript license information
2016-07-04 21:07:11 +02:00
Michael Peter Christen
7466d390b2
small refactoring + do not accept too old peers during bootstrap
2016-07-04 11:02:15 +02:00
reger
8d58a48029
remove wrong log line in CrawlSwitchboard
...
+ don't allow CrawlSwitchboard to exit application
making network param unused
2016-07-02 20:33:23 +02:00
reger
5aaa057c65
ignore empty input lines in FileUtils.getListArray() to poka-yoke blacklist read.
...
equalizes behavior with getListString()
improves: case where blacklist file contained an undesired empty line, not
fixed by blacklist-cleaner.
2016-06-28 23:44:28 +02:00
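Illustration for the getListArray() change above: a minimal sketch of dropping empty lines while reading a list file's content. The class and method names are hypothetical. The real FileUtils.getListArray() reads from a file and may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipEmptyLinesSketch {

    // Hypothetical helper: split content into lines and skip blank ones.
    static List<String> nonEmptyLines(String content) {
        List<String> result = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The blank line between the two entries is not returned.
        System.out.println(nonEmptyLines("a.example/.*\n\nb.example/.*\n"));
    }
}
```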
reger
41c36ffd75
exclude rejected results from result count
...
(by using the resultcontainer.size instead of input docList.size)
skip waiting for write-search-result-to-local-index
(by removing the Thread.join - which will bring a small performance increase)
2016-06-26 06:46:26 +02:00
reger
d4da4805a8
internal wiki code, require header line to start with markup
...
(to allow something like "one=two" as text)
+ incl. test case
2016-06-25 02:46:44 +02:00
reger
e952e355a2
have Translator servlet adhoc apply added translation by translating a single file
...
+ fix NPE in Translator, coming from translation read by TranslatorXliff
which allows null content for not translated keys
2016-06-14 22:14:46 +02:00
reger
b119ff65be
clean out not used Switchboard variables
...
counter indexedPages, const xstackCrawlSlots
2016-06-14 01:50:32 +02:00
reger
223071337b
Translator to take care of word boundaries when identifying the text portion to
...
be translated. This avoids translation of partial terms/words, e.g.
key="TEST" matching inside sourcetext="this is a myTESTcase for it".
Add check of word boundary before and after sourcetext (incl. taking care
of the current practice of delimiting the key by > <)
+ add test case
2016-06-10 01:14:19 +02:00
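Illustration for the word boundary check above: a minimal sketch that replaces a key only where it is not directly surrounded by letters or digits, so that > and < still count as delimiters. The class name, the method translateWholeWords and the regular expression are assumptions for illustration and are not the Translator's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundarySketch {

    // Hypothetical helper: replace key only when it stands as a whole word.
    static String translateWholeWords(String source, String key, String translation) {
        Pattern p = Pattern.compile(
                "(?<![\\p{L}\\p{N}])" + Pattern.quote(key) + "(?![\\p{L}\\p{N}])");
        return p.matcher(source).replaceAll(Matcher.quoteReplacement(translation));
    }

    public static void main(String[] args) {
        // The key embedded in "myTESTcase" is left alone.
        System.out.println(translateWholeWords("this is a myTESTcase for it", "TEST", "PROBE"));
        // The key delimited by > and < is replaced.
        System.out.println(translateWholeWords("<b>TEST</b>", "TEST", "PROBE"));
    }
}
```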
luccioman
009657791e
Merge remote-tracking branch 'origin/master' into LibreJS
2016-06-09 14:44:51 +02:00
luccioman
a73c9327a5
JavaScript License fixes for LibreJS compatibility
2016-06-08 23:16:10 +02:00
reger
0c40401d28
fix MessageBoard test for null data
2016-06-07 23:34:42 +02:00
reger
5b22c63030
Adjust TranslatorXliff to load default 1st and merge downloaded or modified local translation.
...
process 1. load default from locales/*.*
2. load and merge(overwrite) from DATA/LOCALE/*.* (can be partial translation as it is merged)
- include all entries from DATA/LOCALE to be edited in Translator servlet
and save just modifications (instead of full list) to DATA/LOCALE
This shall make it easy to share modifications.
2016-06-05 23:01:45 +02:00
reger
a2e0f00456
optimize Translator
...
- translateFilesRecursive: load translation once (reduce io), return true on complete success
- remove resulting unused translateFiles() variant
- translate: use StringBuilder parameter (skip toString conversion)
- remove not needed static declaration
- upd some javadoc
2016-06-05 03:57:08 +02:00
reger
a6ba1faa80
introduce a translation edit servlet Translator_p.html for YaCy's UI text translation
...
This is the 1st rudimentary approach to support the translation utilities.
It allows currently to edit untranslated text and save it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
this includes storing manually downloaded translation files in DATA as well
(to keep default untouched)
+ on 1st call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this masterfile is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of complete file) to be shared with maintainer using xliff features.
2016-06-03 01:46:30 +02:00
reger
b3c9041f79
remove publicIPv4HostNames and publicIPv6HostNames, redundant with localHostNames (and unused)
...
to free unused resources
2016-06-02 01:42:15 +02:00
reger
bd8f7c11f5
Use transparent addToCrawler in AutoSearch instead of addToIndex
...
This would likely also be of advantage for RSS import/schedule as
following bug-reports suggest
http://mantis.tokeek.de/view.php?id=569
http://mantis.tokeek.de/view.php?id=655
2016-06-01 01:14:22 +02:00
reger
f23d8ab47b
fix 2 more servlet RuntimeException in intranet mode thrown due to seed.getIP()
...
returning null in intranet mode (in servlets: ConfigSearchBox, Load_PHPBB3)
+remove unused (const ∅) seed.IPTYPE
2016-05-29 20:35:57 +02:00
reger
bb0076c3dd
fix: assure close inputstream in TranslatorXliff after reading xlf file
...
by using try-with-resources block
2016-05-29 01:25:47 +02:00
reger
6384b7d82e
fix NPE in Load_MediawikiWiki servlet in intranet mode
...
- in intranet mode getIP returns null, causing an NPE
- adjust starturl (which was set to http://localip/repository ) which is never the start url for the Mediawiki
+ correct javadoc for seed.getIP()
2016-05-27 03:10:25 +02:00
Michael Peter Christen
596b5dfa59
add the JRE version in the seed. Purpose: identify if it is possible to
...
migrate to new JRE version
2016-05-24 23:11:59 +02:00
reger
4cc38e979d
add InputStream close after reading input file (Vocabulary_p servlet)
2016-05-24 00:26:28 +02:00
reger
6bf9c55584
adjust Solr select servlet to latest bugfix for boostquery (bq param)
...
to split query into multiple parameter on line separator in input query.
e.g. split "crawldepth_i:0^10.0 \n crawldepth_i:1^5.0"
but allow "url_file_ext_s:jpg OR url_file_ext_s:png" to remain unsplit
2016-05-22 22:43:56 +02:00
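Illustration for the bq handling above: a minimal sketch that splits a boost query input on line separators into multiple parameter values and leaves a single-line "OR" expression intact. The class and method names are hypothetical. How the servlet hands the resulting values to Solr is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class BoostQuerySplitSketch {

    // Hypothetical helper: one bq value per non-empty input line.
    static String[] splitBoostQueries(String bq) {
        List<String> parts = new ArrayList<>();
        for (String line : bq.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Two lines become two separate bq values.
        for (String q : splitBoostQueries("crawldepth_i:0^10.0 \n crawldepth_i:1^5.0")) {
            System.out.println(q);
        }
        // A single-line OR expression stays one value.
        for (String q : splitBoostQueries("url_file_ext_s:jpg OR url_file_ext_s:png")) {
            System.out.println(q);
        }
    }
}
```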
Burkhard
9a18e2297b
Merge pull request #51 from JeremyRand/multiple-boost-query
...
Fix multiple boost queries
2016-05-22 22:24:04 +02:00
reger
f0d7b93372
make use and activate autodetect charset in Vocabulary input from file
...
+ revert mistake of empty cn.lng
2016-05-22 05:38:26 +02:00
JeremyRand
433217b33e
Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)
2016-05-20 20:17:51 -05:00
JeremyRand
58824dfa6c
Refactor escaping in config file read/write code. Now it uses Apache Commons StringUtils instead of RegEx.
2016-05-20 20:17:51 -05:00
reger
9e94989237
upd to PDFBox 2.0.1
2016-05-20 23:12:16 +02:00
reger
d0a571bed2
del cytag trail for own index.html (save resource not used by default)
2016-05-19 01:59:00 +02:00
reger
de46879637
fix SeedDB.get(byte[]) hash string compare (for returning own seed shortcut)
2016-05-17 02:07:49 +02:00
reger
24b0fa2a38
extend snapshot Html2Image.pdf2image to use PDFBox image export capability
...
if no external tool installed (and for Win)
Resulting jpg are not always perfect (if graphic included) but imho sufficient.
2016-05-16 02:13:33 +02:00
reger
eb2a00b1d8
fix NPE on missing crawldepth_i
2016-05-15 01:26:38 +02:00
reger
efb9f1a8b7
save resource for unused blacklistFiles map
2016-05-12 00:13:57 +02:00
reger
5f113be760
cleanup connectPeer & yacyVersion.latestRelease usage
...
obsolete since
527b3decde
2016-05-06 21:05:15 +02:00
reger
7097dcbdbd
cleanup hack for partial Solr update on multivalued datefields
...
has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050
2016-05-06 02:47:04 +02:00
reger
f10ea3c155
clean-out unused SwitchboardConstants
2016-05-05 00:55:22 +02:00
reger
ef24593347
delete obsolete SEARCHRESULT busythread constants
...
not used since 29.05.2013 18:27:27
0c1a018bbd
2016-05-04 01:30:10 +02:00
reger
125b5e26a5
apply bugfix for ChartPlotter from Pullreq 42
...
https://github.com/yacy/yacy_search_server/pull/42
thanks to otteresk (https://github.com/otteresk )
2016-05-03 03:06:06 +02:00
reger
06ce9ae711
prevent "unchecked conversion" compiler message
...
+ include "translate" property in xlf "trans-unit" export
2016-05-01 02:22:05 +02:00
reger
b4a576dbdf
exclude unused protocol param "duetime"
...
(receiver interprets param "time" only)
2016-04-25 01:57:33 +02:00
reger
3bd6ae8d8b
keep addon/Notepad++ keyword marker on lng export
...
(length of remarks divider line)
+ harmonize status_p.inc lng text
2016-04-21 00:51:08 +02:00
reger
16837d60c7
fix version in locale version file
...
(it's compared to full version)
2016-04-17 22:54:28 +02:00
reger
0fb01e429e
fix migration, account for ssl port in config (for auto-disable https)
2016-04-17 04:42:05 +02:00
reger
7be1c7a05a
fix logger name
2016-04-17 03:20:14 +02:00
reger
1d940e5a94
upd commons-compress 1.11
2016-04-16 23:31:03 +02:00
reger
7789c32c82
delete crawl queue on init exception
...
(happens occasionally on path name violation and will never get resolved)
2016-04-16 00:22:48 +02:00
reger
f781b9dd47
revert call condition f. migration.installSkins
...
(a bug introduced in fb8ae14b21,
see comment on that commit)
2016-04-14 22:14:32 +02:00
reger
3adb670f44
remove never used Domains.myHostNames set
2016-04-14 02:54:41 +02:00
reger
6ecc180299
fix rwi doubledom return best (highest) ranking
2016-04-12 03:55:43 +02:00
reger
2343e3f1cd
keep and update existing xlf translation master instead of create new
...
in utility CreateTranslationMasters
+ small fixes in lng's
2016-04-09 23:25:05 +02:00
reger
a1935f485f
Added utility class CreateTranslationMasters to create a language independent
...
translation master as source to harmonize individual translation files
Included a main to create masters in YaCy and xliff format for testing
+ restrict TranslatorXliff to use only entries with State=translated
P.S. used https://open-language-tools.java.net/editor/about-xliff-editor.html to
experiment with xlf output (haven't a Pootle avail.)
2016-04-05 01:57:32 +02:00
reger
acaf51b296
keep ConfigLanguage_p as 1st entry in exported translation file
...
+ rem untranslated text & some typo fixes in several translations
(considering to create a translation master file to harmonize entries)
2016-04-04 02:56:19 +02:00
reger
61c5b6b403
fix empty drop down list in ConfigLanguage after wrong/empty download
...
+ add xliff translated attribute
+ append japanese lng name
2016-03-31 01:51:25 +02:00
reger
4eddabee42
translate Network History screen -> de
...
+ remove leftover debug line
2016-03-30 01:09:13 +02:00
reger
90c79014ae
remove unused translator routine which also doesn't handle rel path input
...
+ correct some language file match issues
2016-03-29 21:31:02 +02:00
reger
902e79e261
Introduce a TranslatorXliff which can read/write xliff from/to internal translation map.
...
This eases suggested initiatives from http://mantis.tokeek.de/view.php?id=649
Allows longer term also to store translation maps for the htroot files
in standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing xliff file with internal custom prop file.
(currently the introduced class is not used in core code)
2016-03-28 23:26:30 +02:00
reger
d9adc2c255
load handler for Transparent Proxy on startup only if feature is activated
...
to save the resources and keep handler chain small if the feature is not used.
+add a warning message on settingsack_p page to restart on first activation
2016-03-25 05:26:48 +01:00
reger
ec24a0c85a
add test case for optimized toTokens()
2016-03-24 19:26:38 +01:00
reger
cada24f918
adjust utility ListNonTranslatedFiles for path compare on windows
...
(backslash replace)
2016-03-20 23:46:39 +01:00
reger
fb8ae14b21
make migration version safe
2016-03-20 03:34:28 +01:00
reger
258cd41577
reduce logging (EmbeddedSolrConnector.query)
...
mainly to reduce the frequent metadata checks like
> EmbeddedSolrConnector.query QUERY: q={!cache=false raw f=id}xXxXxX&rows=1&start=0&fl=id,load_date_dt
(p.s. direct servlet queries logged via AccessTracker.addToDump)
2016-03-14 22:32:06 +01:00
reger
6783ef5540
move example code SearchClient out of yacycore package
...
to example directory
2016-03-14 02:22:06 +01:00
Michael Peter Christen
b89465d952
0N - basic dump upload servlet infrastructure, to share index dumps
...
within an experimental new sharing model
2016-03-11 18:12:13 +01:00
Michael Peter Christen
f12a900f3e
harmonization of http post of files for one and several files - this had
...
been handled differently - and wrongly for several files. also: base64-encoding
for gzipped push files because our data structures currently only
support ASCII POST pushes.
2016-03-11 08:59:33 +01:00
Michael Peter Christen
849ab671a9
0n: modified the p2p bootstrapping process - rules had been too tight and
...
did not support the re-start of a network with just one principal peer.
2016-03-11 08:54:42 +01:00
reger
764f5100f0
fix delete of temp file after odt & ooxml parser
...
Close zipfile after parsing
2016-03-04 23:05:55 +01:00
reger
379e9b330d
use supplied url port to get robots.txt in crawlers hostqueue
2016-03-02 00:12:34 +01:00
reger
58a959403d
fix mixed logfactory in UrlProxyServlet,
...
The class doesn't use functions of its declared ancestor; change it to extend HttpServlet
2016-02-27 03:44:43 +01:00