Commit Graph

12111 Commits

Author SHA1 Message Date
luc
2a67d2ba6f Corrected error management for unsupported image formats, parsing
errors, and unavailable resources: avoid logging too many exceptions, as
these errors occur easily when searching images.
2015-12-01 01:06:01 +01:00
reger
9cfa847c94 upd maven pom (add langdetect) 2015-11-30 18:57:16 +01:00
Michael Peter Christen
5309b99e98 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-11-30 17:01:39 +01:00
Michael Peter Christen
d6e9834040 Merge branch 'master' of
https://github.com/Scarfmonster/yacy_search_server

# Conflicts:
#	.classpath
#	build.xml
2015-11-30 16:54:54 +01:00
Michael Peter Christen
c030b9bc5c Merge pull request #27 from Stepanov-Sergey/master
added Russian synonyms
2015-11-30 13:45:25 +01:00
Michael Peter Christen
cff152f26e Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-11-30 13:44:36 +01:00
Michael Peter Christen
7e785dac8e urlproxyheader must be in the default package because all classes in the
htroot path must be in the default package
2015-11-30 13:35:41 +01:00
Michael Peter Christen
d82d311995 Merge branch 'master' of https://github.com/luccioman/yacy_search_server
# Conflicts:
#	.classpath
2015-11-30 13:34:10 +01:00
Michael Peter Christen
d3ab43e743 fixed classpath 2015-11-30 13:19:49 +01:00
Michael Peter Christen
06994b5853 Merge pull request #23 from linkerlin/patch-1
Create .travis.yml
2015-11-30 13:18:03 +01:00
Sergey Stepanov
de0f3c6ff1 added Russian synonyms 2015-11-30 11:37:47 +03:00
reger
b5371ea8c1 read/init crawl queue in a thread
to speed-up YaCy start on large existing crawler queues
2015-11-29 05:19:39 +01:00
reger
f05b34fc35 upd to slf4j-1.7.13 2015-11-29 01:24:46 +01:00
reger
1160b13172 remove unused md5 from ViewFile servlet params 2015-11-28 23:09:15 +01:00
reger
e163ea88f6 fix vsdParser (Visio) parser return statement
(final block un-necessary throw)
2015-11-28 02:43:38 +01:00
reger
b2c8bc0ae6 remove md5_s from default index fields
it is not assigned a value / not used
Due to above also excluded from transfer protocol.
2015-11-27 02:41:02 +01:00
luc
e40ae0943b - No max dimensions specified : render raw image data when source and
target image format are the same.
- Corrected scaling condition.
2015-11-26 09:30:43 +01:00
luc
4c36b7bd14 Merge branch 'master' of https://github.com/yacy/yacy_search_server
Conflicts:
	.classpath
2015-11-26 09:28:34 +01:00
reger
90686a75a2 fix flux factor (additional crawl delay by access count) calculation 2015-11-25 01:34:41 +01:00
reger
d79fa7fbeb upd to Jetty v9.2.14.v20151106 2015-11-24 21:35:58 +01:00
luc
4af27289e5 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-23 09:01:25 +01:00
reger
297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
In rare cases the hostname may not be a valid filesystem directory name
and the directory can't be created (e.g. when it contains a '*' char). Throwing
a MalformedURLException prevents the crawl queue from looping on this invalid entry.
2015-11-22 21:26:18 +01:00
luc
755efac17d Use same max file size when loading all resource bytes or opening stream
content
2015-11-20 19:35:39 +01:00
luc
5eafce5577 Rendering performance improvement: use EncodedImage constructor with
BufferedImage parameter to avoid re-rendering BufferedImage.
2015-11-20 15:02:58 +01:00
luc
bc6c79fc12 Corrected scaling function for non RGB images. 2015-11-20 14:35:36 +01:00
luc
042b0e9658 Corrected IcedTea version. See http://mantis.tokeek.de/view.php?id=615 2015-11-20 10:15:54 +01:00
luc
1565559df8 Refactoring : extracted write InputStream method. 2015-11-20 09:42:24 +01:00
luc
f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
imageio-bmp-3.2 library.

 - better BMP format flavours support
 - handle PNG encoded icons
 - handle transparency
 
Added some javadoc url references to .classpath
2015-11-20 09:38:16 +01:00
luc
b6ba941d33 Eclipse project configuration: added JavaScript nature and validation 2015-11-20 09:32:30 +01:00
luc
7f27683831 Fixed compilation error. 2015-11-20 09:29:02 +01:00
luc
07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-20 08:15:24 +01:00
reger
97cc03ef6a start using a template for urlproxy header
It is included as an iframe (/proxmsg/urlproxyheader.html)
to allow full servlet functionality and the flexibility to display some
index/meta data in the future.
2015-11-20 01:49:56 +01:00
reger
d08e421809 fix link to logo (yacysearch.xsl) 2015-11-19 21:08:00 +01:00
luc
f01d49c37a Process large or local file images dealing directly with content
InputStream.
2015-11-18 10:15:38 +01:00
luc
3c4c77099d If available, check content length before downloading. Also check
that content length does not exceed Integer.MAX_VALUE.
2015-11-18 10:11:38 +01:00
luc
5bbb2e1730 Ensure resource is closed when reading a full file InputStream 2015-11-18 10:08:06 +01:00
luc
6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-18 08:49:31 +01:00
reger
0d3c5b223e have psParser cleanup temp file 2015-11-17 23:45:29 +01:00
reger
7d0d19cb8e avoid File.deleteOnExit() on temp files
The JVM registers each file in a list, even if it has already been deleted, and
never cleans up the list during runtime.
This accumulates to a considerable amount of memory during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir 
and the jvm tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally let pdfParser use this tmp subdir too.
2015-11-17 22:27:07 +01:00
luc
bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-17 08:30:32 +01:00
reger
02e4489a23 set tmpfile.deleteOnExit by default,
to make sure files are removed on shutdown.
2015-11-16 21:37:45 +01:00
reger
2985baaa01 Exclude repetitive protocol part in tokenized url
used as description if none is avail. from parser.
2015-11-16 01:06:20 +01:00
reger
ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
remove obsolete partial init of wordreference from urimetadata
2015-11-15 06:06:37 +01:00
reger
7bf03856d1 add link to quick select blacklist
from title list
2015-11-15 00:39:38 +01:00
reger
440ce6d198 add German translation to re-crawl job 2015-11-15 00:34:22 +01:00
reger
5362a80f1c upd to httpcore 4.4.4 2015-11-14 21:16:31 +01:00
reger
e90593450c upd to TwelveMonkeys ImageIO 3.2 2015-11-14 01:46:25 +01:00
reger
b4dbff6a6a fix yacysearch.json "totalResults"
The element "totalResults" is included twice (at the beginning and the end);
only the element written after performing the search holds a number > 0.
see http://mantis.tokeek.de/view.php?id=608
2015-11-13 20:10:47 +01:00
reger
52a9040ae6 Sort out duplicate keywords (dc_subject) early in parsed documents
- by using a Set directly instead of a List
- remove unneeded String[] getter
2015-11-13 01:48:28 +01:00
luc
49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-12 08:21:56 +01:00