Commit Graph

12140 Commits

Author SHA1 Message Date
reger
297fdb60d3 throw exception if crawler hostqueue can't create hostpath directory.
In rare cases the hostname is not a valid filesystem directory name
(e.g. it contains a '*' char), so the directory can't be created. Prevent the crawl queue
from looping on this invalid entry by throwing a MalformedURLException.
2015-11-22 21:26:18 +01:00
luc
755efac17d Use same max file size when loading all resource bytes or opening stream
content
2015-11-20 19:35:39 +01:00
luc
5eafce5577 Rendering performance improvement : use EncodedImage constructor with
BufferedImage parameter to avoid re-rendering BufferedImage.
2015-11-20 15:02:58 +01:00
luc
bc6c79fc12 Corrected scaling function for non RGB images. 2015-11-20 14:35:36 +01:00
luc
042b0e9658 Corrected IcedTea version. See http://mantis.tokeek.de/view.php?id=615 2015-11-20 10:15:54 +01:00
luc
1565559df8 Refactoring : extracted write InputStream method. 2015-11-20 09:42:24 +01:00
luc
f0478bb14d BMP and ICO image formats support : integrated /haraldk/TwelveMonkeys
imageio-bmp-3.2 library.

 - better BMP format flavours support
 - handle PNG encoded icons
 - handle transparency
 
Added some javadoc url references to .classpath
2015-11-20 09:38:16 +01:00
luc
b6ba941d33 Eclipse project configuration : added JavaScript nature and validation 2015-11-20 09:32:30 +01:00
luc
7f27683831 Fixed compilation error. 2015-11-20 09:29:02 +01:00
luc
07437986e7 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-20 08:15:24 +01:00
reger
97cc03ef6a start using a template for urlproxy header
It is included as an iframe (/proxmsg/urlproxyheader.html)
to allow full servlet functionality and flexibility to display some
index/meta data in the future.
2015-11-20 01:49:56 +01:00
reger
d08e421809 fix link to logo (yacysearch.xsl) 2015-11-19 21:08:00 +01:00
luc
f01d49c37a Process large or local file images dealing directly with content
InputStream.
2015-11-18 10:15:38 +01:00
luc
3c4c77099d If available, check content length before downloading. Also check that
content length is not over Integer.MAX_VALUE.
2015-11-18 10:11:38 +01:00
luc
5bbb2e1730 Ensure resource is closed when reading a full file InputStream 2015-11-18 10:08:06 +01:00
luc
6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-18 08:49:31 +01:00
reger
0d3c5b223e have psParser cleanup temp file 2015-11-17 23:45:29 +01:00
reger
7d0d19cb8e avoid File.deleteOnExit() on temp files
The JVM registers each file in a list, regardless of whether it has already been deleted, and never
cleans up the list during runtime.
This accumulates to a considerable amount of memory during large crawls and/or
long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir 
and the jvm tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally let pdfParser use this tmp subdir too.
2015-11-17 22:27:07 +01:00
luc
bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-17 08:30:32 +01:00
reger
02e4489a23 set tmpfile.deleteOnExit by default,
to make sure files are removed on shutdown.
2015-11-16 21:37:45 +01:00
reger
2985baaa01 Exclude repetitive protocol part in tokenized url
used as description if none is available from the parser.
2015-11-16 01:06:20 +01:00
reger
ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
remove obsolete partial init of wordreference from urimetadata
2015-11-15 06:06:37 +01:00
reger
7bf03856d1 add link to quick select blacklist
from title list
2015-11-15 00:39:38 +01:00
reger
440ce6d198 add German translation to re-crawl job 2015-11-15 00:34:22 +01:00
reger
5362a80f1c upd to httpcore 4.4.4 2015-11-14 21:16:31 +01:00
reger
e90593450c upd to TwelveMonkeys ImageIO 3.2 2015-11-14 01:46:25 +01:00
reger
b4dbff6a6a fix yacysearch.json "totalResults"
the element "totalResults" is included twice (at beginning & end);
only the element written after performing the search holds a number > 0
see http://mantis.tokeek.de/view.php?id=608
2015-11-13 20:10:47 +01:00
reger
52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
- by directly using Set instead of List
- remove the unneeded String[] getter
2015-11-13 01:48:28 +01:00
luc
49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-12 08:21:56 +01:00
luc
0de6988604 Added links to more image test suites. 2015-11-12 08:21:37 +01:00
reger
47d70732f6 improve locale translator
- skip empty line
- more robust file section detection (space independent)
2015-11-11 00:57:51 +01:00
sixcooler
646afe9183 do not store subfield *_coordinate + make all num-fields docvalues 2015-11-10 20:45:33 +01:00
sixcooler
194df613de do not use 'location' as defaultfacetfield, since we removed it from the
defaults.
2015-11-10 20:43:58 +01:00
sixcooler
d3b9349b6f simplification / speedup of GenerationMemoryStrategy 2015-11-10 20:39:46 +01:00
sixcooler
f5a9948860 do not store subfield *_coordinate 2015-11-10 20:32:42 +01:00
sixcooler
fca353e5eb set startuptype of most solr handlers to lazy 2015-11-10 20:32:05 +01:00
sixcooler
4a905ec134 fix to not let the AccessTracker-Log grow too much, but have enough data
to monitor.
(+gitignore-correction)
2015-11-10 20:27:17 +01:00
sixcooler
209f502f09 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-10 20:23:03 +01:00
reger
20e18d79f8 harmonize document title for archive parsers 2015-11-10 01:29:13 +01:00
sixcooler
d481653202 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-09 20:42:44 +01:00
Linker Lin
658d9e74d2 Create .travis.yml 2015-11-09 15:18:32 +08:00
luc
f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-09 08:13:12 +01:00
reger
112ae013f4 update bzip and bzip parser process,
to return one document for the file with combined parser results of the
contained file, registered with the supplied url and mime of the archive.
2015-11-07 19:13:18 +01:00
reger
e76a90837b update zip and tar parser process,
to return one document for the file with combined parser results of the
contained files.
2015-11-06 23:58:55 +01:00
sixcooler
bc610e5382 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-06 23:28:39 +01:00
luc
0e8b3d9a90 Refactoring : default favicon and image processing errors.
- moved default favicon processing from ViewImage to
yacysearchitem.html : when previewing ico image search results we don't
want a default favicon to be displayed
- throw an IOException ending in a HTTP 500 error when image processing
fails, rather than returning a null result : behavior is more consistent
across browsers (for example Chrome and Firefox), especially with the new
default favicon display system
2015-11-05 09:45:19 +01:00
luc
4e673ffc9a Ensure closing of InputStream even when an exception occurs. 2015-11-05 09:40:24 +01:00
luc
10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-05 08:26:52 +01:00
reger
8532565c7d optimize order of parsers to try
- start with a parser matching the remote supplied mime
2015-11-04 21:52:02 +01:00
reger
681889ae64 use current tar library for untar files
- remove old source copy
2015-11-04 02:57:00 +01:00