Commit Graph

12073 Commits

Author SHA1 Message Date
luc
6291a57300 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-18 08:49:31 +01:00
reger
0d3c5b223e have psParser cleanup temp file 2015-11-17 23:45:29 +01:00
reger
7d0d19cb8e avoid File.deleteOnExit() on temp files
The JVM registers each file in a list, even if the file has already been
deleted, and never cleans up that list at runtime.
This accumulates to a considerable amount of memory during large crawls
and/or long uptime.
To tackle this, all temp files are now created in a subdir of java.io.tmpdir
and the JVM tmpdir property is set to this subdir, which is deleted by
code on shutdown.
Additionally, let pdfParser use this tmp subdir too.
2015-11-17 22:27:07 +01:00
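A minimal sketch of the approach described in the commit above, not the actual YaCy code: temp files go into a private subdirectory that is removed explicitly instead of being registered with File.deleteOnExit(). The class name, method names and the prefix argument are hypothetical. Because a JVM may read java.io.tmpdir only once, the sketch also passes the directory explicitly when creating files.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempSubdirSketch {

    /** Creates a private subdir below java.io.tmpdir and points the property at it. */
    static Path createTmpSubdir(String prefix) throws IOException {
        Path base = Paths.get(System.getProperty("java.io.tmpdir"));
        Path sub = Files.createTempDirectory(base, prefix);
        System.setProperty("java.io.tmpdir", sub.toString());
        return sub;
    }

    /** Creates a temp file inside the given subdir, without calling deleteOnExit(). */
    static File newTempFile(Path subdir, String prefix, String suffix) throws IOException {
        return File.createTempFile(prefix, suffix, subdir.toFile());
    }

    /** Deletes the subdir and everything in it; intended to be called on shutdown. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder())
                .map(Path::toFile)
                .forEach(File::delete);
        }
    }
}
```

With this layout a single recursive delete at shutdown replaces the per-file bookkeeping that deleteOnExit() keeps for the lifetime of the JVM.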
luc
bfe51001e3 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-17 08:30:32 +01:00
reger
02e4489a23 set tmpfile.deleteOnExit by default,
to make sure files are removed on shutdown.
2015-11-16 21:37:45 +01:00
reger
2985baaa01 Exclude repetitive protocol part in tokenized url
used as description if none is available from the parser.
2015-11-16 01:06:20 +01:00
reger
ca3d26a401 harmonize wordsintitle & CollectionSchema.title_words_val calculation,
remove obsolete partial init of wordreference from urimetadata
2015-11-15 06:06:37 +01:00
reger
7bf03856d1 add link to quick select blacklist
from title list
2015-11-15 00:39:38 +01:00
reger
440ce6d198 add German translation to re-crawl job 2015-11-15 00:34:22 +01:00
reger
5362a80f1c upd to httpcore 4.4.4 2015-11-14 21:16:31 +01:00
reger
e90593450c upd to TwelveMonkeys ImageIO 3.2 2015-11-14 01:46:25 +01:00
reger
b4dbff6a6a fix yacysearch.json "totalResults"
the "totalResults" element was included twice (at the beginning and at the end);
only the element written after performing the search holds a number > 0
see http://mantis.tokeek.de/view.php?id=608
2015-11-13 20:10:47 +01:00
reger
52a9040ae6 Sort out double keywords (dc_subject) early in parsed documents
- by directly using a Set instead of a List
- remove unneeded String[] getter
2015-11-13 01:48:28 +01:00
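The Set-instead-of-List idea from the commit above can be sketched as follows. This is an illustration only: the class and method names are hypothetical and the trimming rule is an assumption, not taken from the YaCy sources.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class KeywordSetSketch {

    /** Collects keywords into a Set so duplicates are dropped as they are added. */
    static Set<String> uniqueKeywords(List<String> raw) {
        // LinkedHashSet keeps first-seen order while rejecting repeated entries.
        Set<String> out = new LinkedHashSet<>();
        for (String keyword : raw) {
            if (keyword == null) {
                continue;
            }
            String trimmed = keyword.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }
}
```

Collecting into a Set at parse time means later stages never see duplicate keywords, so no separate de-duplication pass is needed.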
luc
49331dc523 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-12 08:21:56 +01:00
luc
0de6988604 Added links to more image test suites. 2015-11-12 08:21:37 +01:00
reger
47d70732f6 improve locale translator
- skip empty line
- more robust file section detection (space independent)
2015-11-11 00:57:51 +01:00
sixcooler
646afe9183 do not store subfield *_coordinate + make all num-fields being docvalues 2015-11-10 20:45:33 +01:00
sixcooler
194df613de not using 'location' as defaultfacetfield - since we removed it being
default.
2015-11-10 20:43:58 +01:00
sixcooler
d3b9349b6f simplification / speedup of GenerationMemoryStrategy 2015-11-10 20:39:46 +01:00
sixcooler
f5a9948860 do not store subfield *_coordinate 2015-11-10 20:32:42 +01:00
sixcooler
fca353e5eb set startuptype of most solr handlers to lazy 2015-11-10 20:32:05 +01:00
sixcooler
4a905ec134 fix to not let the AccessTracker-Log grow too much, but have enough data
to monitor.
(+gitignore-correction)
2015-11-10 20:27:17 +01:00
sixcooler
209f502f09 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-10 20:23:03 +01:00
reger
20e18d79f8 harmonize document title for archive parsers 2015-11-10 01:29:13 +01:00
sixcooler
d481653202 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-09 20:42:44 +01:00
luc
f11b5e8309 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-09 08:13:12 +01:00
reger
112ae013f4 update bzip and bzip parser process,
to return one document for the file with combined parser results of the
contained file, and register it with the supplied url and mime of the archive.
2015-11-07 19:13:18 +01:00
reger
e76a90837b update zip and tar parser process,
to return one document for the file with combined parser results of the
contained files.
2015-11-06 23:58:55 +01:00
sixcooler
bc610e5382 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-06 23:28:39 +01:00
luc
0e8b3d9a90 Refactoring : default favicon and image processing errors.
- moved default favicon processing from ViewImage to
yacysearchitem.html : when previewing ico image search results we don't
want a default favicon to be displayed
 - throw an IOException ending in a HTTP 500 error when image processing
fails, rather than returning a null result : behavior is more consistent
across browsers (for example Chrome and Firefox), especially with new
default favicon display system
2015-11-05 09:45:19 +01:00
luc
4e673ffc9a Ensure closing of InputStream even when an exception occurs. 2015-11-05 09:40:24 +01:00
luc
10696b53f7 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-05 08:26:52 +01:00
reger
8532565c7d optimize order of parsers to try
- start with a parser matching the remote supplied mime
2015-11-04 21:52:02 +01:00
reger
681889ae64 use current tar library for untar files
- remove old source copy
2015-11-04 02:57:00 +01:00
reger
5d71fc70e3 fix tarParser early exit on looping content
- adjust check of data available according to doc 
- return null on no recognized content (to not exit TextParser next parser try)
- use commons.compress directly
2015-11-03 22:14:14 +01:00
luc
bcc2e7cb5b Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-03 09:29:57 +01:00
reger
2fcf6f104c fix bzipParser recognition
- Bzip2Inputstream checks magic byte itself to identify bz2 (leave it in input)
- try to supply fitting mime for parsing bz2 content
2015-11-03 03:35:01 +01:00
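As a rough illustration of the "leave it in input" point in the commit above: a bzip2 stream starts with the bytes 'B', 'Z', 'h', and a caller can peek at them with mark/reset so the decompressor still sees the complete header. The class and method names below are hypothetical, and the sketch uses only the JDK rather than the parser's actual library calls.

```java
import java.io.IOException;
import java.io.InputStream;

public class Bzip2MagicSketch {

    /** Peeks at the first three bytes without consuming them from the stream. */
    static boolean looksLikeBzip2(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream ended before three bytes were available
            }
            n += r;
        }
        in.reset(); // hand the untouched stream on to the decompressor
        return n == 3 && head[0] == 'B' && head[1] == 'Z' && head[2] == 'h';
    }
}
```

A plain InputStream can be wrapped in a BufferedInputStream to obtain mark/reset support before calling the check.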
luc
745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-02 08:10:11 +01:00
reger
a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt() 2015-10-31 23:09:03 +01:00
reger
02afba730e fix detection of https port changed after set in System Admin 2015-10-31 22:53:59 +01:00
reger
11f3666660 increase use of predefined CATCHALL_QUERY string 2015-10-31 19:44:31 +01:00
reger
a58ee49307 Optimize internal imagequery focus on using content_type to select images
(in favor of url file extension)
2015-10-31 19:18:46 +01:00
sixcooler
b61f91f0d4 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-10-30 18:47:42 +01:00
luc
e90e1f165d Avoid returning an empty image when target encoding is not supported or
when an error occurred while encoding.
2015-10-30 16:20:28 +01:00
luc
fc3294382e Updated javadocs for warning on target encoding format potential errors. 2015-10-30 16:19:05 +01:00
luc
aa70ff4ff6 Corrected images alpha channel rendering 2015-10-30 05:18:16 +01:00
luc
2895ab552a Made ViewImagePerfTest extend ViewImageTest to ease automated image
render tests
2015-10-30 04:19:56 +01:00
luc
4a03cf06e1 Corrected encoding extension arg parsing 2015-10-29 02:24:17 +01:00
reger
81f53fc83a upd readme.mediawiki min java version 1.7 2015-10-26 22:19:20 +01:00
reger
d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
- allow lat/long 0.xxx
- south / west assignment
include test class
2015-10-26 21:19:35 +01:00
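The two points in the commit above (coordinates of the form 0.xxx, and south / west assignment) can be illustrated with a small sketch. It assumes the sign is taken from the hemisphere letter rather than from the degree value, since a degree value of 0 carries no sign of its own. The class name, method name and parameter layout are hypothetical and not taken from the MediaWiki importer.

```java
public class GeoCoordSketch {

    /**
     * Converts degrees/minutes/seconds plus a hemisphere letter (N, S, E, W)
     * into a signed decimal coordinate. South and west yield negative values.
     */
    static double toDecimal(int degrees, int minutes, double seconds, char hemisphere) {
        double value = Math.abs(degrees) + minutes / 60.0 + seconds / 3600.0;
        char h = Character.toUpperCase(hemisphere);
        // The sign comes from the hemisphere, so 0 degrees 30 minutes S is still negative.
        return (h == 'S' || h == 'W') ? -value : value;
    }
}
```

Under these assumptions, a position at 0 degrees 30 minutes south converts to -0.5 instead of being rejected or losing its sign.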