Commit Graph

301 Commits

Author SHA1 Message Date
Michael Peter Christen
872f83ebe0 refactoring 2012-09-25 21:04:58 +02:00
Michael Peter Christen
fb9460f0a8 using the search filter to drill down search to file types.
A search like "mp3 filetype:mp3" may now surprise you.
2012-09-25 17:52:33 +02:00
Michael Peter Christen
15ea053c3a - added xml output in IndexControlURLs to get the storage page of index
dump commands
- adjusted the apicall.sh script to get the downloaded text as output to
stdout which is necessary to parse the content out of it
- added indexdump.sh script which creates a solr dump and prints out the
storage path for the index dump
- added synchronization to the Fulltext class to prevent data from being
stored to a non-existing solr index while this index is disabled during
the storage of the dump
2012-09-25 00:19:52 +02:00
Michael Peter Christen
1b474139dd used the new zip writer/reader to add a solr dump process: the whole
solr index can be written to a zip dump and also restored during runtime
2012-09-24 17:05:28 +02:00
Michael Peter Christen
e57bf2ca39 simplified DHT classes 2012-09-24 01:04:39 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
563d584420 removed more dependencies in cora from kelondro 2012-09-21 11:02:36 +02:00
orbiter
63762d8f89 removed kelondro dependencies from cora 2012-09-20 19:38:22 +02:00
orbiter
60b1e23f05 added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. The default patterns are set so that all
loaded pages are also indexed (as before), but when doing an expert crawl
start the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
the depth limit is applied), but the user can select paths which will
be loaded completely even if the crawl depth is reached.
2012-09-16 21:27:55 +02:00
Michael Peter Christen
6ec02deec6 added new crawl attributes in crawl profile (not active yet) 2012-09-14 16:49:29 +02:00
Michael Peter Christen
0504b01bdc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-14 00:48:17 +02:00
orbiter
9413f77b65 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-13 23:54:26 +02:00
orbiter
a55e77a115 added twitter search heuristic 2012-09-13 23:53:53 +02:00
Michael Peter Christen
e54ac38095 - some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
62add1d564 added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
2012-09-11 22:46:39 +02:00
Michael Peter Christen
9db032664e activate two solr fields which will be used by administration interface
(later)
2012-09-11 20:15:54 +02:00
Michael Peter Christen
4634f0e626 fix for images_withalt 2012-09-10 12:30:03 +02:00
Michael Peter Christen
4d29f59a27 removed warnings 2012-09-10 07:15:52 +02:00
Michael Peter Christen
10b911eed4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-07 22:07:02 +02:00
Michael Peter Christen
be67c70a47 added Solr fields:
inboundlinks_text_chars_val
inboundlinks_text_words_val
inboundlinks_alttag_txt
outboundlinks_text_chars_val
outboundlinks_text_words_val
outboundlinks_alttag_txt
2012-09-07 22:06:51 +02:00
orbiter
d73fff0e0e added solr field images_withalt_i 2012-09-07 21:33:45 +02:00
sixcooler
e78fe3f477 also do a clearcache on the solr-connector-caches 2012-09-06 22:07:07 +02:00
Michael Peter Christen
d8425e6809 added collections to crawl monitor 2012-09-04 14:47:53 +02:00
Michael Peter Christen
ee23fc7a32 added h1..h6 counter fields 2012-09-04 14:11:11 +02:00
Michael Peter Christen
b2b516cc3e added a collection attribute to crawls and searches:
- a solr field collection_sxt can be used to store a set of crawl tags
- when this field is activated, a crawl tag can be assigned when crawls
are started
- the content of the collection field can be a comma-separated list of
tags; all of them are assigned to the documents when they are indexed as
a result of such a crawl start
- a search result can be drilled down to a specific collection; this is
currently only available in the solr interface and also in the gsa
interface using the 'site' option
- this adds a mandatory field for gsa queries (the google api demands
that field all the time)
2012-09-03 15:26:08 +02:00
Michael Peter Christen
f75b3f8a47 added more patches to work without RWI data structure 2012-08-31 14:35:56 +02:00
Michael Peter Christen
31d4d38804 - extended the solr interface by a references-by-word-count method
- reduced danger that a non-existing RWI database causes NPEs
- added Solr queries to did-you-mean: this allows our did-you-mean
algorithm to work with Solr only, without RWIs
2012-08-31 13:03:00 +02:00
Michael Peter Christen
528d6763fa - added new solr fields:
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
2012-08-31 10:30:43 +02:00
Michael Peter Christen
2ddc33646a added new fields for solr:
url_paths_sxt
url_parameter_i
url_parameter_key_sxt
url_parameter_value_sxt
url_chars_i
2012-08-29 16:11:23 +02:00
Michael Peter Christen
75d5e3475d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-29 10:13:51 +02:00
cominch
dc468dad01 add content control features for custom filter lists 2012-08-29 09:04:28 +02:00
Michael Peter Christen
316b5fe116 - added a solr type definition verifier
- fixed type definition found by the verifier
- added multivalue-string fields for solr with extension 'sxt'
- added multivalue-integer fields for solr with extension 'val'
- renamed some solr attributes from txt to sxt
- changed solr query line to an explicit AND/OR structure
- added a country code second level domain list to Domains class; with
parser
- added a host string parser to get domain class name, country-code
second-level domain and subdomain out of it
- removed old coordinate attributes
2012-08-28 16:58:06 +02:00
orbiter
a3d5959981 Merge commit '65d49df865f60511d22d86fb15c33a082176e7ab' 2012-08-27 16:56:22 +02:00
Michael Peter Christen
4521d63c92 added boosts to solr search queries 2012-08-27 15:25:25 +02:00
Michael Peter Christen
e8acd542b5 - added faceted drill-down for host and geolocation to solr queries
- added a new geolocation field to index schema, the old values are
migrated if possible
2012-08-27 14:41:33 +02:00
reger
65d49df865 security fix: clear automatic password only if adminAccountForLocalhost=false to prevent remote access to protected pages after restart.
if adminAccountForLocalhost=true leave automatic password unchanged so access from local host is granted but remote access is prevented from the 1st second.
2012-08-26 22:28:14 +02:00
orbiter
29171e2f6c fixed generation of ontologies from index enumerations 2012-08-24 14:13:42 +02:00
orbiter
01a63ef595 redesign of YaCySchema and SolrDoc handling 2012-08-23 09:51:45 +02:00
orbiter
479bfca571 refactoring 2012-08-23 09:30:11 +02:00
Michael Peter Christen
48a82bc705 log queries anonymously from gsa+solr requests 2012-08-22 23:50:40 +02:00
Michael Peter Christen
ab6ec4ec52 added snippet computation to solr/rss and gsa result writer 2012-08-22 17:37:34 +02:00
Michael Peter Christen
4716546ef5 - reduced memory usage in index transmission using a transformation of
Node to Row objects
- removed peerDeparture in solr remote search in case the peer does not
answer (this may be normal because it is allowed to switch this off)
2012-08-22 16:30:33 +02:00
Michael Peter Christen
653645c1cf corrected solr query syntax 2012-08-22 00:48:03 +02:00
orbiter
716ea0cfe2 sorted the solr schema into mandatory and optional fields; reduced
the number of used fields to reduce solr index size
2012-08-21 23:52:56 +02:00
orbiter
9b8c8c0f47 fix from gaston in
http://forum.yacy-websuche.de/viewtopic.php?p=26909#p26909
2012-08-21 21:03:26 +02:00
Michael Peter Christen
a049761e0c fixed double-check 2012-08-20 14:16:37 +02:00
Michael Peter Christen
f42a57cd7d gsa format update 2012-08-20 12:50:51 +02:00
Michael Peter Christen
ff3eaa21b0 added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is done to several outside peers is done to
the local solr index
- the search process works now also without any 'old' RWI data using
solr
2012-08-20 12:16:11 +02:00
Michael Peter Christen
a06123aec6 more abstraction and less parameter overhead for remote search 2012-08-20 01:29:15 +02:00