Commit Graph

11925 Commits

Author SHA1 Message Date
reger
3d53da8236 refactor ResultEntry to be based on MetadataNode/SolrDocument
to share/reuse common access routines
2015-05-25 21:28:48 +02:00
reger
d882991bc5 Implement sharing of ioDispatcher for term & citation index
as proposed in ioDispatcher description
2015-05-25 19:46:26 +02:00
reger
17e820cfd7 use doctype() in ViewFile to choose display routines
in preference of getfileExtension()
2015-05-25 00:08:38 +02:00
reger
370ba9da71 On imageSearch prefer mime to sort out non-image documents
Generalize the hack to prevent urls with just an img extension being returned

improving http://mantis.tokeek.de/view.php?id=528
2015-05-24 21:48:58 +02:00
reger
cd31633369 improve MultiprotocolURL.getFileExtension()
prevent string OOB while querypart contains a dot (return just "")
see log snippet in http://mantis.tokeek.de/view.php?id=533
2015-05-24 19:38:04 +02:00
reger
c60ccdfbcf Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump,
skip concurrent emergency merge
dealing with/see http://mantis.tokeek.de/view.php?id=566
2015-05-24 18:03:27 +02:00
reger
8a9622c31c fix string OoB on getImagelinks with long alttext
in description calculation
2015-05-24 01:59:40 +02:00
reger
aa83931765 Convert content charset for display via CacheResource_p
Cached resource charset encoding might not fit to internal handling (using utf-8),
convert resource to utf-8
see http://mantis.tokeek.de/view.php?id=576
2015-05-23 20:31:37 +02:00
reger
3e742d1e34 Init remote crawler on demand
If remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of queue and idling thread.
Deployment of the remoteCrawlJob is deferred until activation of the option.
2015-05-23 02:06:39 +02:00
Michael Peter Christen
dbf9e3503d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-22 11:39:00 +02:00
Michael Peter Christen
8b1a30be50 removed a -UNRESOLVED_PATTERN- 2015-05-22 11:22:36 +02:00
Michael Peter Christen
9938c81378 fix for division by zero 2015-05-22 11:15:53 +02:00
reger
13f013f64a Limit extra sleep of BusyThread on LowMemCycle 2015-05-17 06:21:12 +02:00
reger
cd7c0e0aae detail optimization of RecrawlThread 2015-05-17 00:13:00 +02:00
reger
ace71a8877 Initial (experimental) implementation of index update/re-crawl job
added to IndexReIndexMonitor_p.html
Selects existing documents from index and feeds them to the crawler.
Currently only the field fresh_date_dt is used to determine documents for recrawl (fresh_date_dt:[* TO NOW-1DAY]).
Documents are added in small chunks (200) to the crawler, only if no other crawl is running.
2015-05-16 01:23:08 +02:00
reger
141cd80456 correct log msg text 2015-05-16 00:01:54 +02:00
reger
f3ce99bfb8 fix extract of inboundlinks_protocol_sxt
url counter may be > 999
2015-05-14 00:03:09 +02:00
reger
2bc9cb5828 fix early return in addToCrawler
check / handle all supplied urls after error url
2015-05-13 21:58:43 +02:00
Michael Peter Christen
f5f88272e4 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-12 12:06:42 +02:00
Michael Peter Christen
5c67c4d460 fix for latest commit, see
f810915717 (commitcomment-11145880)
2015-05-12 12:06:21 +02:00
reger
c37dda8849 fix NPE on MultiProtocolURL on url with parameter value and '='
in getAttribute
- added test case for it
2015-05-12 01:09:10 +02:00
Michael Peter Christen
f810915717 added crawl start from a clone with very, very large url: they are now
encoded as post submit form inside a javascript creation function.
2015-05-11 16:30:41 +02:00
Michael Peter Christen
51de86c992 disabled debug thread dumps 2015-05-11 14:46:09 +02:00
Michael Peter Christen
d524a9d77c Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-11 14:42:40 +02:00
Michael Peter Christen
0710648c31 enable api calls with very long urls 2015-05-11 14:42:21 +02:00
reger
31346e873b upd library reference of missing jsch-0.1.21 in seeduploadscp.xml
upd to jsch-0.1.52.jar
2015-05-11 01:35:12 +02:00
reger
609c52e987 refactor getBookmark
to consistently check existence by != null (w/o throwing exception on not found)
2015-05-11 00:37:04 +02:00
reger
1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
which is used to differentiate meta from full data
- make sure check for dht is not dependent on number of collection entries
2015-05-10 18:52:33 +02:00
reger
5f4d35437e add bookmark.query to edit form 2015-05-10 15:30:21 +02:00
reger
f134aa7f7f persist bookmark timestamp
on setTimeStamp()
2015-05-10 15:29:23 +02:00
reger
752eec6697 fix NPE in addToIndex when used outside searchEvent 2015-05-10 05:18:23 +02:00
reger
a6daddbeaa upd to commons-io-2.4.jar 2015-05-10 03:00:05 +02:00
reger
89124335c4 update bookmark autosearch description
- add german translation
2015-05-10 02:29:08 +02:00
Michael Peter Christen
fbf85a1561 added temporary debug output in http client 2015-05-08 15:31:01 +02:00
Michael Peter Christen
ff29b0e503 added option to re-index exported xml snapshot dumps to
HTCACHE/snapshots by just placing them in the SURROGATES/in path
2015-05-08 15:30:26 +02:00
Michael Peter Christen
6f4fe4b175 revert of 8a7c68e4c7
keeping surrogates after processing is essential for some users. If the
space they are taking is too high, please set up an automatic deletion
process (like a cronjob).
2015-05-08 14:01:30 +02:00
Michael Peter Christen
213401a446 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-08 13:48:29 +02:00
Michael Peter Christen
97930a6aad added must-not-match filter to snapshot generation.
also: fixed some bugs
2015-05-08 13:46:27 +02:00
Michael Peter Christen
9d8f426890 adding a try-catch to link graph processing to prevent that a single
malformed url interrupts the storage process
2015-05-08 10:38:33 +02:00
reger
b47267b79c precaution against NPE on createorgetBookmark on search result 2015-05-07 03:25:19 +02:00
Michael Peter Christen
75879e051b Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-03 03:03:45 +02:00
reger
8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
(instead of using the description field)
- adjust display and autosearch
- don't overwrite existing bookmark but combine info
2015-05-03 02:31:50 +02:00
reger
7224209486 break out of NormalizeDistributor loop on timeout 2015-05-02 02:36:18 +02:00
reger
cf1fc7f700 harmonize filesearch input box layout 2015-05-01 19:24:14 +02:00
reger
4d73e9de06 upd to metadata-extractor-2.8.1 2015-04-30 00:01:11 +02:00
Michael Peter Christen
e334a06370 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-04-29 10:51:09 +02:00
reger
0904a041a6 upd to poi-3.11.jar 2015-04-29 01:53:04 +02:00
reger
47e61f8325 fix typo in image filter query
(extra bracket)
2015-04-28 03:12:14 +02:00
reger
4b4ab6799f fix String out of range in Collection Nav
see http://mantis.tokeek.de/view.php?id=573
2015-04-27 22:38:40 +02:00
reger
572cfe8fd4 improve character encoding for urlproxy servlet
for non-utf-8 pages
2015-04-26 17:42:39 +02:00