Michael Peter Christen
23f68f2a69
force usage of default faceting mechanisms for search
2012-09-26 18:48:59 +02:00
Michael Peter Christen
24d2ee3c52
- better date ranking
- more protection against NPE and time travel effects
2012-09-26 18:36:32 +02:00
Michael Peter Christen
ca313e404f
- if a "/date" modifier is used, the solr remote query applies an
ordering by date (ascending)
- also added some 'anti-timetravel' protection (check if date is in the
future within any metadata date field)
2012-09-26 16:56:33 +02:00
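The 'anti-timetravel' check described above could look roughly like the following. This is a minimal sketch, not the code from the commit: the class name TimeTravelGuard and the method anyInFuture are hypothetical, and it assumes the metadata date fields are available as java.util.Date values.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class TimeTravelGuard {

    /**
     * Returns true if any non-null date in the list lies after 'now'.
     * Hypothetical helper; the real check in the commit may differ.
     */
    public static boolean anyInFuture(List<Date> metadataDates, Date now) {
        for (Date d : metadataDates) {
            if (d != null && d.after(now)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Date now = new Date();
        Date tomorrow = new Date(now.getTime() + 24L * 60 * 60 * 1000);
        // A document claiming a date one day ahead would be flagged.
        System.out.println(anyInFuture(Arrays.asList(now, tomorrow), now));
    }
}
```

Passing 'now' as a parameter instead of reading the clock inside the method keeps the check easy to test.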
Michael Peter Christen
a4214694df
We assert that no other metadata storage than solr is used now.
Therefore a property like solrConnected() must be true all the time.
Removal of this method causes removal of all write operations to the old
metadata index.
2012-09-26 16:05:11 +02:00
Michael Peter Christen
0cec7e761a
enhanced snippet extractor to find snippets also inside of tokens of a
url
2012-09-26 15:33:37 +02:00
sixcooler
6c50d016ed
pdf- and zipParser should not use forced Memory-Limits
2012-09-26 14:03:51 +02:00
Michael Peter Christen
562183932b
- removed ip_s from default profile since that needs a DNS lookup to
create a document entry. This makes remote search much slower.
- removed synchronization of add method if ip_s is activated to prevent
that a user configuration causes bad behavior. The disadvantage of that
is that an index dump can cause data loss if an indexing is running
during index dump
- caught more exceptions and more NPE
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
2012-09-26 13:38:04 +02:00
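The rows=0 remark in the last item above can be illustrated with a plain query URL: Solr still reports the total hit count (numFound) in the response header of the result, but returns no documents. The sketch below only builds such a URL; the class and method names are hypothetical, and the base URL is the local select handler that appears elsewhere in this log.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CountQuerySketch {

    /**
     * Builds a select URL that asks for zero rows, so only the total
     * count is transferred. Hypothetical helper, not YaCy code.
     */
    public static String countUrl(String selectUrl, String query) {
        try {
            return selectUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&rows=0";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countUrl("http://localhost:8090/solr/select", "*:*"));
    }
}
```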
Michael Peter Christen
24f4ca4d85
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-26 12:01:34 +02:00
apfelmaennchen
116f429e35
fix for java.lang.RuntimeException: TableColumnIndex not available...
2012-09-26 09:56:16 +02:00
Michael Peter Christen
5ac61591f3
better abstraction for solr query params
2012-09-25 23:59:30 +02:00
Michael Peter Christen
c913b2ba77
- fix for NPEs during remote solr configuration
- fixed remote solr setting switch
- added more logging
2012-09-25 23:59:09 +02:00
Michael Peter Christen
1533bfd63b
refactoring
2012-09-25 21:20:03 +02:00
Michael Peter Christen
e49359cc95
removed tenant query attribute since it is not used any more and is
replaced by the site-operator in the GSA interface. This operator can
also be simulated in the Solr interface using the collections_sxt field.
2012-09-25 21:09:06 +02:00
Michael Peter Christen
872f83ebe0
refactoring
2012-09-25 21:04:58 +02:00
Michael Peter Christen
fb9460f0a8
using the search filter to drill down search to file types.
A search like "mp3 filetype:mp3" will now maybe surprise you.
2012-09-25 17:52:33 +02:00
Michael Peter Christen
15ea053c3a
- added xml output in IndexControlURLs to get the storage page of index
dump commands
- adjusted the apicall.sh script to get the downloaded text as output to
stdout which is necessary to parse the content out of it
- added indexdump.sh script which creates a solr dump and prints out the
storage path for the index dump
- added synchronization to the Fulltext class to prevent that data is
stored to a non-existing solr index while this index is disabled during
the storage of the dump
2012-09-25 00:19:52 +02:00
Michael Peter Christen
1b474139dd
used the new zip writer/reader to add a solr dump process: the whole
solr index can be written to a zip dump and also restored during runtime
2012-09-24 17:05:28 +02:00
Michael Peter Christen
4a3e684f8c
added a directory-to-zip writer and zip-to-directory reader
2012-09-24 17:04:37 +02:00
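A directory-to-zip writer and zip-to-directory reader of the kind this entry mentions can be sketched with java.util.zip and java.nio.file. This is an illustration under assumptions, not the classes added in the commit: ZipDumpSketch, its method names and the file name index/segment.txt are all hypothetical. Empty directories are not preserved by this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipDumpSketch {

    /** Writes every regular file below sourceDir into zipFile, using relative entry names. */
    public static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                // Zip entry names always use forward slashes.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }

    /** Restores all entries of zipFile below targetDir, rejecting entries that would escape it. */
    public static void unzipToDirectory(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Small self-check: writes text into a temp directory, zips, unzips, and reads it back. */
    public static String roundTrip(String text) {
        try {
            Path source = Files.createTempDirectory("dump-src");
            Path restored = Files.createTempDirectory("dump-dst");
            Path zip = Files.createTempFile("dump", ".zip");
            Files.createDirectories(source.resolve("index"));
            Files.write(source.resolve("index/segment.txt"), text.getBytes(StandardCharsets.UTF_8));
            zipDirectory(source, zip);
            unzipToDirectory(zip, restored);
            byte[] bytes = Files.readAllBytes(restored.resolve("index/segment.txt"));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The path check in unzipToDirectory guards against archive entries such as ../x that would otherwise be written outside the target directory.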
Michael Peter Christen
d9ebf4a40f
a bit more logging
2012-09-24 15:01:44 +02:00
Michael Peter Christen
5683162bd3
simplifications in DHT Distribution class and more documentation
2012-09-24 12:01:09 +02:00
Michael Peter Christen
e57bf2ca39
simplified DHT classes
2012-09-24 01:04:39 +02:00
orbiter
a053b356ee
added new classes to renovate the YaCy protocol based on simple data
structures in cora:
- added the Peer object, which is a fresh version of Seed
- added the Peers object, which is a fresh version of Network
- added the Network api access class to retrieve a list of peers based
on the Network.xml servlet in all YaCy peers.
2012-09-22 11:10:11 +02:00
Michael Peter Christen
8219a445f3
refactoring
2012-09-21 16:46:57 +02:00
Michael Peter Christen
f879a344e7
fix for no depth limit default value
2012-09-21 16:05:17 +02:00
Michael Peter Christen
00c1c777fa
refactoring
2012-09-21 15:48:16 +02:00
orbiter
563d584420
removed more dependencies in cora from kelondro
2012-09-21 11:02:36 +02:00
orbiter
aa65282259
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-21 10:27:30 +02:00
orbiter
63762d8f89
removed kelondro dependencies from cora
2012-09-20 19:38:22 +02:00
orbiter
6e0f4557f8
added ftp to getName
2012-09-20 18:29:04 +02:00
cominch
23204d2245
change parameter to support the smw extension for list import
2012-09-20 15:02:57 +02:00
Michael Peter Christen
c235d5c0f1
fixed size parsing in RSS message parser (for YaCy size parameter)
2012-09-19 06:36:07 +02:00
Michael Peter Christen
5bc8f34150
fix for success query counter
2012-09-18 11:06:36 +02:00
orbiter
60b1e23f05
added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. Default patterns are in such a way that all
loaded pages are also indexed (as before) but when doing an expert crawl
start, then the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
the depth limit applies) but the user can select paths which will
be loaded completely even if a crawl depth is reached.
2012-09-16 21:27:55 +02:00
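How the three patterns described above might interact can be sketched as follows. This is an assumption-laden illustration, not the crawler's actual code: the class CrawlFilterSketch, its method names, and the concrete default regexes (".*" for match-all, "(?!)" for never-match) are hypothetical; only the pattern names come from the entry.

```java
import java.util.regex.Pattern;

public class CrawlFilterSketch {

    private final Pattern indexUrlMustMatch;
    private final Pattern indexUrlMustNotMatch;
    private final Pattern crawlerNoDepthLimitMatch;
    private final int maxDepth;

    public CrawlFilterSketch(String mustMatch, String mustNotMatch, String noDepthLimit, int maxDepth) {
        this.indexUrlMustMatch = Pattern.compile(mustMatch);
        this.indexUrlMustNotMatch = Pattern.compile(mustNotMatch);
        this.crawlerNoDepthLimitMatch = Pattern.compile(noDepthLimit);
        this.maxDepth = maxDepth;
    }

    /** A loaded page is indexed if it matches the must-match and not the must-not-match pattern. */
    public boolean shouldIndex(String url) {
        return indexUrlMustMatch.matcher(url).matches()
            && !indexUrlMustNotMatch.matcher(url).matches();
    }

    /** A url is loaded if it is within the depth limit or exempted by the no-depth-limit pattern. */
    public boolean shouldLoad(String url, int depth) {
        return depth <= maxDepth || crawlerNoDepthLimitMatch.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Assumed defaults: index everything, never lift the depth limit.
        CrawlFilterSketch defaults = new CrawlFilterSketch(".*", "(?!)", "(?!)", 2);
        System.out.println(defaults.shouldIndex("http://example.org/page.html"));
        System.out.println(defaults.shouldLoad("http://example.org/page.html", 3));
    }
}
```

With the assumed defaults every loaded page is indexed and the depth limit always applies, which matches the "as before" behavior the entry describes.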
orbiter
4987921d3d
fixed the size() method which counted also failed pages (which are also
inside the solr index)
2012-09-16 21:22:56 +02:00
Michael Peter Christen
6ec02deec6
added new crawl attributes in crawl profile (not active yet)
2012-09-14 16:49:29 +02:00
Michael Peter Christen
975bc95ddf
added default facet fields for json response format (stub)
2012-09-14 12:09:20 +02:00
Michael Peter Christen
0504b01bdc
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-14 00:48:17 +02:00
orbiter
9413f77b65
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-13 23:54:26 +02:00
orbiter
a55e77a115
added twitter search heuristic
2012-09-13 23:53:53 +02:00
Michael Peter Christen
e54ac38095
- some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
62add1d564
added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
2012-09-11 22:46:39 +02:00
Michael Peter Christen
e072632a54
no complaints about memory if the database is empty
2012-09-11 22:28:10 +02:00
Michael Peter Christen
b846f585fa
fixed a bug with size_i field usage
2012-09-11 20:24:27 +02:00
Michael Peter Christen
9db032664e
activate two solr fields which will be used by administration interface
(later)
2012-09-11 20:15:54 +02:00
orbiter
fcd5c7eec3
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-11 09:16:38 +02:00
orbiter
6171143b4a
added facet stub in JsonResponseWriter
2012-09-11 09:15:47 +02:00
Michael Peter Christen
e84ffdb4f3
enhanced solr writers
2012-09-11 03:02:02 +02:00
Michael Peter Christen
5df553c152
- added a json writer for solr (yes there was one using xslt but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
2012-09-10 14:30:44 +02:00
Michael Peter Christen
4634f0e626
fix for images_withalt
2012-09-10 12:30:03 +02:00
Michael Peter Christen
e65cecc419
- updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
2012-09-10 10:12:38 +02:00