Commit Graph

11792 Commits

Author SHA1 Message Date
Michael Peter Christen
34de1e8cbc gzip compression will now perform more efficiently and with a
better compression level
2015-06-01 01:24:33 +02:00
Michael Peter Christen
98be59ce9c full solr xml exports will now be automatically compressed during
export. That makes it possible to export a solr xml dump even if disc
space is low.
2015-05-30 19:02:54 +02:00
Michael Peter Christen
a1a8edfc0a wrap HeapReader close() in a catch Throwable block to prevent
an exception during close from blocking the whole shutdown process
2015-05-30 17:54:02 +02:00
Michael Peter Christen
b43811d38c added surrogate import process for exported solr dumps.
Just throw your solr dump file into DATA/SURROGATES/in/ and it will be
imported!
2015-05-30 13:19:59 +02:00
Michael Peter Christen
b77537294d prevent disc usage when showing tray animation 2015-05-30 06:57:15 +02:00
Michael Peter Christen
eec78e1b0c added intensity option to graphics 2015-05-30 06:31:08 +02:00
Michael Peter Christen
a5007f345e re-licensing some of my old visualization classes under LGPL 2.1 2015-05-30 06:12:08 +02:00
Michael Peter Christen
c99a665593 adding a 3-pixel font generator made some time ago.. 2015-05-30 06:01:52 +02:00
Michael Peter Christen
c7576d6028 added a full solr export to the IndexControlURLs_p.html servlet. The
export function is also now the default export option. The export file
format for a full solr export is very similar to a solr search result
xml, only the <lst name="responseHeader"> tag is missing.

The exported xml has a special line termination feature: all documents
will be exported into a single line without any CR in between. That
means that every document is completely inside a single line. While this
is not readable at all for humans, it is very useful for linux line
processing scripts, like grep. Using grep it will be easy to select
single documents which match a given pattern.

Such dumps shall be importable with the DATA/SURROGATES/in import
function, but that import is not yet adapted to the new file format.
2015-05-29 15:05:52 +02:00
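Note on c7576d6028: the one-document-per-line layout described above can be processed line by line, much like grep does. The following is a minimal Python sketch, not part of YaCy. It assumes that each exported document line begins with a `<doc` tag; the function name `select_documents` and the sample data (including the field name used in it) are hypothetical.

```python
import re
from typing import Iterable, Iterator


def select_documents(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield every document line that matches the given regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        # Assumption: a document occupies exactly one line that starts with "<doc".
        if line.lstrip().startswith("<doc") and regex.search(line):
            yield line.rstrip("\n")


# Hypothetical sample dump; a real export will contain many more fields.
sample_dump = [
    '<?xml version="1.0" encoding="UTF-8"?>\n',
    "<response>\n",
    "<result>\n",
    '<doc><str name="sku">http://example.org/a.html</str></doc>\n',
    '<doc><str name="sku">http://example.net/b.pdf</str></doc>\n',
    "</result>\n",
    "</response>\n",
]
matches = list(select_documents(sample_dump, r"example\.org"))
```

For a compressed dump, the lines could be read with `gzip.open(path, "rt", encoding="utf-8")` from the Python standard library and passed to the same function, where `path` is whatever file the export produced.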
Michael Peter Christen
47682bf467 fix for unresolved pattern 2015-05-28 17:43:52 +02:00
Michael Peter Christen
197f7449e5 All entities of crawl profiles are now editable in the crawl profile
editor.
2015-05-28 16:07:40 +02:00
reger
1d8e1e4bac - Image search expand box: adjust javascript hs padtominsize parameter to make sure the expand box doesn't shrink on small images
- ensure ImageResult.imagetext has a value for the link text (use filename if no alt text given)
2015-05-27 02:31:13 +02:00
reger
8b35656007 remove hard throw exception in makeResultEntry
remove unused "share." peername.yacy url rewrite
2015-05-26 23:57:06 +02:00
reger
af57fbefad use available mime (instead of null) on imageresult from metadatanode 2015-05-26 23:54:04 +02:00
reger
dd7782bac0 revert deletion of BinSearch
(accident)
2015-05-26 04:26:26 +02:00
reger
000dde9511 Eliminate duplication of values for search ResultEntry
by instantiation from URIMetadataNode, by eliminating differentiation of ResultEntry/URIMetadataNode.
- moved remaining ResultEntry functionality to URIMetadataNode
   - for 1:1 functionality added a function makeResultEntry()
- removed ResultEntry
- refactored related code

Main difference is that after makeResultEntry the text_t content is removed and alternative title/url strings for display are calculated.


Main difference left is, that
2015-05-26 04:15:00 +02:00
reger
29c4aa3991 fix compiler notification of missing serialID
from last commit
2015-05-25 21:51:32 +02:00
reger
3d53da8236 refactor ResultEntry to be based on MetadataNode/SolrDocument
to share/reuse common access routines
2015-05-25 21:28:48 +02:00
reger
d882991bc5 Implement sharing of ioDispatcher for term & citation index
as proposed in ioDispatcher description
2015-05-25 19:46:26 +02:00
reger
17e820cfd7 use doctype() in ViewFile to choose display routines
in preference of getfileExtension()
2015-05-25 00:08:38 +02:00
reger
370ba9da71 On imageSearch prefer mime to sort out non-image documents
Generalize the hack to prevent urls with just an img extension being returned

improving http://mantis.tokeek.de/view.php?id=528
2015-05-24 21:48:58 +02:00
reger
cd31633369 improve MultiprotocolURL.getFileExtension()
prevent string OOB when the query part contains a dot (return just "")
see log snippet in http://mantis.tokeek.de/view.php?id=533
2015-05-24 19:38:04 +02:00
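Note on cd31633369: one way to keep a dot in the query part from affecting the extension is to look only at the last segment of the URL path. The sketch below illustrates that idea in Python; it is not the MultiProtocolURL implementation, and the function name `file_extension` is hypothetical.

```python
from urllib.parse import urlsplit


def file_extension(url: str) -> str:
    """Return the lower-cased extension of the last path segment, or ""."""
    # urlsplit separates the query part, so dots after "?" are never inspected.
    path = urlsplit(url).path
    filename = path.rsplit("/", 1)[-1]
    dot = filename.rfind(".")
    # No dot, or a trailing dot, means there is no usable extension.
    if dot < 0 or dot == len(filename) - 1:
        return ""
    return filename[dot + 1:].lower()
```

With this approach a URL such as `http://example.com/a/b?file=x.png` yields an empty string, because the only dot belongs to the query part.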
reger
c60ccdfbcf Increase IODispatcher dumpQueue size to 2 to reduce risk of concurrent emergency dump,
skip concurrent emergency merge
dealing with/see http://mantis.tokeek.de/view.php?id=566
2015-05-24 18:03:27 +02:00
reger
8a9622c31c fix string OoB on getImagelinks with long alttext
in description calculation
2015-05-24 01:59:40 +02:00
reger
aa83931765 Convert content charset for display via CacheResource_p
Cached resource charset encoding might not match the internal handling (which uses utf-8),
so convert the resource to utf-8
see http://mantis.tokeek.de/view.php?id=576
2015-05-23 20:31:37 +02:00
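Note on aa83931765: the conversion described above amounts to decoding the cached bytes with their declared charset and re-encoding them as UTF-8. The following Python sketch shows that step under stated assumptions; it is not the CacheResource_p code. The name `to_utf8` and the fallback to UTF-8 for a missing or unknown charset are choices made for this illustration.

```python
from typing import Optional


def to_utf8(raw: bytes, declared_charset: Optional[str]) -> bytes:
    """Decode raw bytes using the declared charset and re-encode them as UTF-8."""
    charset = declared_charset or "utf-8"  # assumption: default when none is given
    try:
        # Undecodable bytes are replaced instead of raising an error.
        text = raw.decode(charset, errors="replace")
    except LookupError:
        # The charset name is unknown to Python: fall back to UTF-8.
        text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")
```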
reger
3e742d1e34 Init remote crawler on demand
If the remote crawl option is not activated, skip init of remoteCrawlJob to save the resources of the queue and the idling thread.
Deployment of the remoteCrawlJob is deferred until the option is activated.
2015-05-23 02:06:39 +02:00
Michael Peter Christen
dbf9e3503d Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-22 11:39:00 +02:00
Michael Peter Christen
8b1a30be50 removed a -UNRESOLVED_PATTERN- 2015-05-22 11:22:36 +02:00
Michael Peter Christen
9938c81378 fix for division by zero 2015-05-22 11:15:53 +02:00
reger
13f013f64a Limit extra sleep of BusyThread on LowMemCycle 2015-05-17 06:21:12 +02:00
reger
cd7c0e0aae detail optimization of RecrawlThread 2015-05-17 00:13:00 +02:00
reger
ace71a8877 Initial (experimental) implementation of index update/re-crawl job
added to IndexReIndexMonitor_p.html
Selects existing documents from index and feeds it to the crawler.
currently only the field fresh_date_dt is used to determine documents for recrawl (fresh_date_dt:[* TO NOW-1DAY])
Documents are added in small chunks (200) to the crawler, only if no other crawl is running.
2015-05-16 01:23:08 +02:00
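Note on ace71a8877: the chunked feeding described above can be sketched as follows. This is a simplified Python illustration, not the actual recrawl job. The callables `crawler_is_idle` and `enqueue` are hypothetical stand-ins for the crawler; the query string and the chunk size of 200 are taken from the commit message.

```python
from itertools import islice
from typing import Callable, Iterable, List

RECRAWL_QUERY = "fresh_date_dt:[* TO NOW-1DAY]"  # selection query from the commit message
CHUNK_SIZE = 200  # chunk size from the commit message


def feed_recrawl(
    urls: Iterable[str],
    crawler_is_idle: Callable[[], bool],
    enqueue: Callable[[List[str]], None],
) -> int:
    """Hand URLs to the crawler in chunks while no other crawl is running.

    Returns the number of URLs that were handed over.
    """
    iterator = iter(urls)
    fed = 0
    # Check for idleness before taking a chunk, so no chunk is dropped.
    while crawler_is_idle():
        chunk = list(islice(iterator, CHUNK_SIZE))
        if not chunk:
            break  # all selected documents have been fed
        enqueue(chunk)
        fed += len(chunk)
    return fed
```

In this sketch the URLs would come from a search with `RECRAWL_QUERY`; feeding stops as soon as `crawler_is_idle()` reports a running crawl, and a later call can continue with a fresh selection.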
reger
141cd80456 correct log msg text 2015-05-16 00:01:54 +02:00
reger
f3ce99bfb8 fix extract of inboundlinks_protocol_sxt
url counter may be > 999
2015-05-14 00:03:09 +02:00
reger
2bc9cb5828 fix early return in addToCrawler
check / handle all supplied urls after error url
2015-05-13 21:58:43 +02:00
Michael Peter Christen
f5f88272e4 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-12 12:06:42 +02:00
Michael Peter Christen
5c67c4d460 fix for latest commit, see
f810915717 (commitcomment-11145880)
2015-05-12 12:06:21 +02:00
reger
c37dda8849 fix NPE on MultiProtocolURL on url with parameter value and '='
in getAttribute
- added test case for it
2015-05-12 01:09:10 +02:00
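Note on c37dda8849: the commit message does not say which input triggered the NPE, so the following Python sketch only illustrates one defensive way to read query attributes when a value contains '=' or a parameter has no value at all. It is not the MultiProtocolURL code, and the function name `parse_attributes` is hypothetical.

```python
from typing import Dict


def parse_attributes(query: str) -> Dict[str, str]:
    """Split a query string into a name-to-value mapping."""
    attributes: Dict[str, str] = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        # partition splits at the first "=" only, so a value may contain "=";
        # a parameter without "=" gets an empty string instead of a missing value.
        name, _, value = pair.partition("=")
        attributes[name] = value
    return attributes
```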
Michael Peter Christen
f810915717 added crawl start from a clone with very, very large url: they are now
encoded as post submit form inside a javascript creation function.
2015-05-11 16:30:41 +02:00
Michael Peter Christen
51de86c992 disabled debug thread dumps 2015-05-11 14:46:09 +02:00
Michael Peter Christen
d524a9d77c Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-05-11 14:42:40 +02:00
Michael Peter Christen
0710648c31 enable api calls with very long urls 2015-05-11 14:42:21 +02:00
reger
31346e873b upd library reference of missing jsch-0.1.21 in seeduploadscp.xml
upd to jsch-0.1.52.jar
2015-05-11 01:35:12 +02:00
reger
609c52e987 refactor getBookmark
to consistently check existence by != null (w/o throwing exception on not found)
2015-05-11 00:37:04 +02:00
reger
1481a8ab56 add opensearch rss results to dht collection (due to text = snippet)
which is used to differentiate meta from full data
- make sure check for dht is not dependent on number of collection entries
2015-05-10 18:52:33 +02:00
reger
5f4d35437e add bookmark.query to edit form 2015-05-10 15:30:21 +02:00
reger
f134aa7f7f persist bookmark timestamp
on setTimeStamp()
2015-05-10 15:29:23 +02:00
reger
752eec6697 fix NPE in addToIndex when used outside searchEvent 2015-05-10 05:18:23 +02:00
reger
a6daddbeaa upd to commons-io-2.4.jar 2015-05-10 03:00:05 +02:00
reger
89124335c4 update bookmark autosearch description
- add German translation
2015-05-10 02:29:08 +02:00