11829 Commits

Author SHA1 Message Date
Michael Peter Christen
0e87a99ab8 more fixes for special windows paths 2015-07-10 17:34:29 +02:00
Michael Peter Christen
e5b6424eed patch for bad windows file paths 2015-07-10 17:14:14 +02:00
Michael Peter Christen
0aa6fcf259 remove old vocabularies and synonyms before adding new 2015-07-10 16:47:19 +02:00
Michael Peter Christen
289018b559 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-07-08 17:37:03 +02:00
Michael Peter Christen
7b412e8c07 added msg (text emails) format; should be handled by html parser. 2015-07-08 17:36:37 +02:00
reger
f91298d3b6 fix one implicit Integer/Long type conversion
-> causes Java 1.8 compile error
2015-07-08 03:02:10 +02:00
reger
821262a179 add CommonPattern for multiple spaces
to eliminate empty split words on following spaces
2015-07-04 22:49:01 +02:00
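As an illustration of the commit above (821262a179), here is a minimal sketch of why a pattern for runs of spaces avoids empty split words. It uses only `java.util.regex`. The class, constant and method names are hypothetical and do not reproduce YaCy's actual CommonPattern code.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; this is not the actual YaCy CommonPattern class.
final class SpaceSplitSketch {
    // Matches exactly one space: consecutive spaces yield empty tokens between them.
    static final Pattern SINGLE_SPACE = Pattern.compile(" ");
    // Matches a run of one or more spaces, so consecutive spaces act as one separator.
    static final Pattern SPACE_RUN = Pattern.compile(" +");

    static String[] splitOnSingleSpace(String text) {
        return SINGLE_SPACE.split(text);
    }

    static String[] splitOnSpaceRuns(String text) {
        return SPACE_RUN.split(text);
    }

    public static void main(String[] args) {
        String text = "yacy  search   server";
        // Splitting on single spaces keeps the empty words between adjacent spaces.
        System.out.println(splitOnSingleSpace(text).length); // 6
        // Splitting on runs of spaces returns only the real words.
        System.out.println(splitOnSpaceRuns(text).length);   // 3
    }
}
```

Note that a leading run of spaces would still produce one empty first token with either pattern, so callers may want to trim the input first.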
Michael Peter Christen
90f75c8c3d added enrichment of synonyms and vocabularies for imported documents
during surrogate reading: those attributes from the dump are removed
during the import process and replaced by new detected attributes
according to the setting of the YaCy peer.
This may cause all such attributes to be removed if the importing
peer has no synonyms and/or no vocabularies defined.
2015-07-02 00:23:50 +02:00
Michael Peter Christen
7829480b82 refactoring: separated condenser and tokenizer 2015-07-01 18:28:18 +02:00
reger
00d2062813 Remove deprecated AdminHandlers in solrconfig.xml
avoid warning log
W  org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/"  class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
2015-07-01 00:58:23 +02:00
Michael Peter Christen
f901e7d3cf fix for non-authorized view of IndexBrowser: show only the number of
non-failure documents
2015-06-30 11:12:36 +02:00
Michael Peter Christen
593de05922 enhanced surrogate import process speed (dramatically!) 2015-06-29 12:28:34 +02:00
Michael Peter Christen
3c4c69adea fix for
- bad regex computation for crawl start from file (limitation on domain
did not work)
- servlet error when starting crawl from a large list of urls
2015-06-29 02:02:01 +02:00
Michael Peter Christen
1fec7fb3c1 suppress access to solr when doing search suggestions if the
index has more than two million documents. This protects the index from
being flooded with search requests that cannot be resolved before the
real search query has to be computed.
2015-06-24 13:02:12 +02:00
Michael Peter Christen
886fca2260 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-24 01:59:46 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes were renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
was updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend re-indexing everything.
Before you do the update, please use the index export to write a large
surrogate xml file. After the update, start with an empty index and then
initialize it with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
6c2e6f1f37 remove redundant code 2015-06-23 23:41:43 +02:00
sixcooler
e427efbe54 Next try at a fix for upload-connection staying in blocked state.
This was caused by reading via GZIP from a close-wait connection and caused
high cpu- and system-loads.
Instead of implementing handling of the RedListener, I now found that a
time-limited 'get' really solves this problem.
2015-06-14 22:56:26 +02:00
reger
0fab445b19 Resourceobserver log warning - deleting releases files - only on actual deletes
instead of entering routine
2015-06-10 02:35:37 +02:00
sixcooler
ef6a64b2a4 Fix for upload-connection staying in blocked state.
This was caused by reading via GZIP from a close-wait connection and caused
high cpu- and system-loads.
Solved by implementing handling of the RedListener.
2015-06-09 21:26:10 +02:00
reger
c973f94936 add log entry on release file delete by ResourceObserver 2015-06-08 03:17:12 +02:00
reger
121972752c implement deleteOldDownloads in ResourceObserver on low diskspace
- direct assign sb.observer (skip redundant InitThread)
2015-06-08 02:52:13 +02:00
Michael Peter Christen
0d5ac6e527 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-07 22:25:26 +02:00
Michael Peter Christen
9c12555be5 added link to Snapshots in search results if the snapshot exists and
option is set in ConfigSearchPage_p
(this is a stub: we also need a visualization of pdf files!)
2015-06-07 20:37:37 +02:00
sixcooler
480e4a6a5c Update to Jetty-9.2.11 - a bugfix-release that did not solve my
problems, but does not harm anything
2015-06-07 20:09:27 +02:00
reger
72f6a0b0b2 enhance recrawl job
- allow to modify the query to select documents to process (after job has started)
- allow to include failed urls (httpstatus <> 200)
2015-06-06 18:45:39 +02:00
Michael Peter Christen
e0a23c56c7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-05 08:32:55 +02:00
Michael Peter Christen
fb9e1dd3f5 servlet for latest commit 2015-06-05 07:22:35 +02:00
reger
5183ad718d upd to poi-3.12.jar 2015-06-05 03:36:57 +02:00
reger
7478338a40 remove augmented parsing activation from frontend
experimental implementation not used and based on error prone experimental rdfaparser
2015-06-05 00:51:00 +02:00
reger
11aa2edfe1 remove RDFa parser activation from frontend
reason: experimental implementation of RDFa parser not executed (limited to special urls) but may cause error on normal html parsing due to an inputstream.reset
2015-06-05 00:15:16 +02:00
Michael Peter Christen
ff11ac89f7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2015-06-04 23:04:04 +02:00
Michael Peter Christen
5e2d23b7a0 removed the new index export method from the IndexControlURLs_p.html
servlet and moved it to a new /IndexExport_p.html servlet. This servlet
is now more prominent linked in the main menu under Production -> Index
Export/Import
2015-06-04 23:03:46 +02:00
reger
64a7b0b140 Merge origin/master 2015-06-04 22:44:46 +02:00
reger
49b79987c9 remove obsolete searchfl work table
was used to register urls with not complete words in snippet but is never accessed
2015-06-04 22:44:01 +02:00
sixcooler
4533f392b0 correct the dark themes to show also a dark navbar on searchresults 2015-06-04 22:15:38 +02:00
Michael Peter Christen
d0aff91f23 fix for index import 2015-06-01 01:56:09 +02:00
Michael Peter Christen
34de1e8cbc gzip compression now performs more efficiently and with a better
compression level
2015-06-01 01:24:33 +02:00
Michael Peter Christen
98be59ce9c full solr xml exports will now be automatically compressed during
export. That makes it possible to export a solr xml dump even if disc
space is low.
2015-05-30 19:02:54 +02:00
Michael Peter Christen
a1a8edfc0a wrap HeapReader close() in a catch Throwable block to prevent
an exception during close from blocking the whole shutdown process
2015-05-30 17:54:02 +02:00
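The idea behind the commit above (a1a8edfc0a) can be sketched as follows: each `close()` call is guarded so that one failure does not stop the remaining shutdown steps. This is only an illustration under stated assumptions. The class and method names are hypothetical, and the sketch uses generic `java.io.Closeable` resources rather than YaCy's own reader class.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not taken from the YaCy code base.
final class SafeCloseSketch {
    // Closes every resource in order and returns how many close() calls failed.
    static int closeAll(List<? extends Closeable> resources) {
        int failures = 0;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (Throwable t) {
                // Deliberately broad: a failing close() must not abort
                // the shutdown of the resources that follow.
                failures++;
                System.err.println("close failed: " + t);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Closeable ok = () -> { };
        Closeable broken = () -> {
            throw new IllegalStateException("simulated failure during close");
        };
        // The broken resource comes first; the second one is still closed.
        int failures = closeAll(Arrays.asList(broken, ok));
        System.out.println("failed closes: " + failures);
    }
}
```

Catching `Throwable` is normally discouraged because it also swallows errors such as `OutOfMemoryError`. During shutdown the trade-off is different, since completing the remaining cleanup matters more than propagating the failure.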
Michael Peter Christen
b43811d38c added surrogate import process for exported solr dumps.
Just throw your solr dump file into DATA/SURROGATES/in/ and it will be
imported!
2015-05-30 13:19:59 +02:00
Michael Peter Christen
b77537294d prevent disc usage when showing tray animation 2015-05-30 06:57:15 +02:00
Michael Peter Christen
eec78e1b0c added intensity option to graphics 2015-05-30 06:31:08 +02:00
Michael Peter Christen
a5007f345e re-licensing some of my old visualization classes under LGPL 2.1 2015-05-30 06:12:08 +02:00
Michael Peter Christen
c99a665593 adding a 3-pixel font generator made some time ago.. 2015-05-30 06:01:52 +02:00
Michael Peter Christen
c7576d6028 added a full solr export to the IndexControlURLs_p.html servlet. The
export function is also now the default export option. The export file
format for a full solr export is very similar to a solr search result
xml, only the <lst name="responseHeader"> tag is missing.

The exported xml has a special line termination feature: all documents
will be exported into a single line without any CR in between. That
means that every document is completely inside a single line. While this
is not readable at all for humans, it is very useful for linux line
processing scripts, like grep. Using grep it will be easy to select
single documents that match a given pattern.

Such dumps shall be importable with the DATA/SURROGATES/in import
function, but that import is not yet adapted to the new file format.
2015-05-29 15:05:52 +02:00
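Because the commit above (c7576d6028) describes an export with exactly one document per line, such a dump can be filtered line by line without an XML parser, in the same way grep would. The following is a minimal sketch of that idea. The class and method names are hypothetical, and the sample lines, including the `url` field name, are invented for illustration and are not the real export schema.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration; not taken from the YaCy code base.
final class DumpLineFilterSketch {
    // Returns every line (one exported document each) that contains a match for the pattern.
    static List<String> matchingLines(BufferedReader reader, Pattern pattern) {
        return reader.lines()
                .filter(line -> pattern.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample data: two documents, each completely inside a single line.
        String dump =
                "<doc><str name=\"url\">http://example.org/a</str></doc>\n"
              + "<doc><str name=\"url\">http://example.net/b</str></doc>\n";
        BufferedReader reader = new BufferedReader(new StringReader(dump));
        List<String> hits = matchingLines(reader, Pattern.compile("example\\.net"));
        // Only the second document matches the pattern.
        hits.forEach(System.out::println);
    }
}
```

For a real dump file, the `StringReader` would be replaced by a reader over the file, wrapped in a `java.util.zip.GZIPInputStream` if the export is compressed.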
Michael Peter Christen
47682bf467 fix for unresolved pattern 2015-05-28 17:43:52 +02:00
Michael Peter Christen
197f7449e5 All entities of crawl profiles are now editable in the crawl profile
editor.
2015-05-28 16:07:40 +02:00
reger
1d8e1e4bac - Image search expand box: adjust javascript hs padtominsize parameter to make sure the expand box doesn't shrink on small images
- ensure ImageResult.imagetext has a value for the link text (use filename if no alt text given)
2015-05-27 02:31:13 +02:00
reger
8b35656007 remove hard throw exception in makeResultEntry
remove not used "share." peername.yacy url rewrite
2015-05-26 23:57:06 +02:00