Commit Graph

490 Commits

Author SHA1 Message Date
Michael Peter Christen
e45a3235e0 default cache size was much too high; decreased solr cache size 2012-10-11 12:03:48 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
7e3e45fd04 added Open Graph Metadata default fields, see http://ogp.me/ns# 2012-10-09 17:28:48 +02:00
Michael Peter Christen
c3e5f667a7 added schema.org breadcrumb counter to parser and solr schema 2012-10-09 13:02:43 +02:00
Michael Peter Christen
877042a6b5 fix for portal mode 2012-10-08 14:54:06 +02:00
Michael Peter Christen
584663ae8c - redesign of solr query construction
- fix for solr boosts and location search
- fix for number of search results in local search
2012-10-07 07:46:55 +02:00
Michael Peter Christen
a8167e6e5b clean-up: removed unused methods in kelondro 2012-10-06 03:34:52 +02:00
Michael Peter Christen
31485a963d refactoring 2012-10-02 21:57:50 +02:00
Michael Peter Christen
f8a3ab2d82 added the usage of synonyms to the GSA search interface 2012-10-02 14:29:45 +02:00
Michael Peter Christen
3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field
synonyms_sxt
2012-10-02 11:13:06 +02:00
Michael Peter Christen
41ab2a2279 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:24:03 +02:00
orbiter
c8b1a693dc oops, added missing class for last commit 2012-10-02 10:23:10 +02:00
Michael Peter Christen
3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-10-02 10:14:09 +02:00
orbiter
3190347814 added a synonyms_t field to solr and a process to read synonym files.
This can be used to add another stemming to solr using stemming files
that are expressed as synonyms for grammatical alternatives. The
synonym/stemming files must have the following form:
- each line is a comma-separated list of synonyms
- the list of synonyms may be enclosed with {} (like the GSA synonyms
file)
- the file may contain comments which are lines starting with a '#'
The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and
are activated by default whenever a synonym file is in place.
Then, for each word that is found in a document all synonyms are added
to a long text field which is stored into synonyms_t. Processes using
the synonyms must query with that field as optional matcher.
2012-10-02 00:02:50 +02:00
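The file format described in the commit above can be illustrated with a small parser. This is only a sketch in Python (YaCy itself is written in Java). The function names, the lowercasing of words and the whitespace tokenization are assumptions made for illustration; they are not taken from the YaCy source.

```python
def parse_synonym_lines(lines):
    """Map each word to the set of its synonyms.

    Format, as described in the commit message:
    - each line is a comma-separated list of synonyms
    - the list may be enclosed in {}
    - lines starting with '#' are comments
    """
    synonyms = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]  # drop the optional enclosing braces
        # Lowercasing is an assumption of this sketch.
        group = [w.strip().lower() for w in line.split(",") if w.strip()]
        for word in group:
            others = (w for w in group if w != word)
            synonyms.setdefault(word, set()).update(others)
    return synonyms


def synonyms_for_text(text, synonyms):
    """Collect the synonyms of every word found in a document text."""
    found = set()
    for word in text.lower().split():
        found.update(synonyms.get(word, ()))
    return sorted(found)


sample = ["# vehicles", "{car, automobile, auto}", "house,home"]
table = parse_synonym_lines(sample)
```

The result of synonyms_for_text corresponds to the values that would be stored in the synonym field of a document.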
Michael Peter Christen
411d0e839b added an underline text field to solr to record all underlined texts 2012-10-01 14:16:49 +02:00
Michael Peter Christen
f45f7fc12e added new Host Browser to main menu:
this new search interface is something completely new for search, but
completely common on desktops: browse a web space like one would browse
a file system in a file browser. The file listing is created using the
search index and a faceted restriction to specific domains.
2012-09-28 22:45:16 +02:00
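The idea of browsing a web space like a file system can be sketched as follows. This Python example is not the Host Browser implementation. It assumes the URLs of one host are already available as a plain list, whereas the commit describes obtaining them from the search index with a faceted restriction. The function name list_directory is hypothetical.

```python
from urllib.parse import urlparse


def list_directory(urls, host, path="/"):
    """Return (subdirectories, files) directly below `path` on `host`."""
    if not path.endswith("/"):
        path += "/"
    dirs, files = set(), set()
    for url in urls:
        parts = urlparse(url)
        if parts.hostname != host or not parts.path.startswith(path):
            continue  # other host or outside the requested directory
        rest = parts.path[len(path):]
        if not rest:
            continue  # the directory itself
        head, sep, _tail = rest.partition("/")
        # A remaining '/' means `head` is a subdirectory, otherwise a file.
        (dirs if sep else files).add(head)
    return sorted(dirs), sorted(files)
```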
Michael Peter Christen
8556a3d521 extended solr connector with a method to retrieve a single facet. 2012-09-28 13:50:13 +02:00
Michael Peter Christen
816cb6ce93 another fix for the debian installer: the installer fails because some
classes had unresolved dependencies. This fix removes the dependencies.
2012-09-28 09:00:40 +02:00
Michael Peter Christen
ca313e404f - if a "/date" modifier is used, the solr remote query applies an
ordering by date (ascending)
- also added some 'anti-timetravel' protection (check if date is in the
future within any metadata date field)
2012-09-26 16:56:33 +02:00
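The two points of the commit above, ascending date ordering and a check for dates in the future, might look roughly like this. It is a sketch in Python under stated assumptions: the field name "last_modified" and both function names are hypothetical, and the commit does not say what happens to a document once a future date is detected, so the sketch only detects it.

```python
from datetime import datetime, timezone


def has_future_date(dates, now=None):
    """True if any metadata date lies in the future ('time travel')."""
    now = now or datetime.now(timezone.utc)
    return any(d > now for d in dates if d is not None)


def sort_by_date(docs, key="last_modified"):
    """Order documents by date, ascending, as the commit describes."""
    return sorted(docs, key=lambda doc: doc[key])
```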
Michael Peter Christen
562183932b - removed ip_s from default profile since that needs a DNS lookup to
create a document entry. The lookup makes remote search much slower.
- removed synchronization of add method if ip_s is activated to prevent
a user configuration from causing bad behavior. The disadvantage is
that an index dump can cause data loss if indexing is running
during the index dump
- caught more exceptions and more NPEs
- better abstraction in MirrorSolrConnector
- slight performance enhancement when only the index count is requested
(rows=0 is sufficient to get a total count)
2012-09-26 13:38:04 +02:00
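The rows=0 remark in the last point can be illustrated with a plain Solr select request: with rows=0 no documents are returned, but the response still carries the total hit count in numFound. The sketch below is in Python and only builds the URL and reads a JSON response. The base URL passed in and the helper names are assumptions, not part of YaCy.

```python
import json
from urllib.parse import urlencode


def count_query_url(base_url, query="*:*"):
    """Build a Solr select URL that asks for the total count only."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(response_text):
    """Extract the total hit count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]
```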
Michael Peter Christen
c913b2ba77 - fix for NPEs during remote solr configuration
- fixed remote solr setting switch
- added more logging
2012-09-25 23:59:09 +02:00
Michael Peter Christen
1533bfd63b refactoring 2012-09-25 21:20:03 +02:00
Michael Peter Christen
e49359cc95 removed tenant query attribute since it is not used any more and is
replaced by the site-operator in the GSA interface. This operator can
also be simulated in the Solr interface using the collections_sxt field.
2012-09-25 21:09:06 +02:00
Michael Peter Christen
872f83ebe0 refactoring 2012-09-25 21:04:58 +02:00
Michael Peter Christen
1b474139dd used the new zip writer/reader to add a solr dump process: the whole
solr index can be written to a zip dump and also restored during runtime
2012-09-24 17:05:28 +02:00
Michael Peter Christen
4a3e684f8c added a directory-to-zip writer and zip-to-directory reader 2012-09-24 17:04:37 +02:00
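A directory-to-zip writer and zip-to-directory reader of the kind named in the two commits above can be sketched with Python's standard zipfile module. This is an illustration of the technique, not the YaCy classes; the function names are hypothetical, and the sketch does not store empty directories.

```python
import os
import zipfile


def directory_to_zip(src_dir, zip_path):
    """Write all files below src_dir into a zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the dump is relocatable.
                zf.write(full, os.path.relpath(full, src_dir))


def zip_to_directory(zip_path, dest_dir):
    """Restore a zip archive into dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
```

The zip file should be written outside of src_dir, otherwise the archive could end up containing itself.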
Michael Peter Christen
5683162bd3 simplifications in DHT Distribution class and more documentation 2012-09-24 12:01:09 +02:00
Michael Peter Christen
e57bf2ca39 simplified DHT classes 2012-09-24 01:04:39 +02:00
orbiter
a053b356ee added new classes to renovate the YaCy protocol based on simple data
structures in cora:
- added the Peer object, which is a fresh version of Seed
- added the Peers object, which is a fresh version of Network
- added the Network api access class to retrieve a list of peers based
on the Network.xml servlet in all YaCy peers.
2012-09-22 11:10:11 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
563d584420 removed more dependencies in cora from kelondro 2012-09-21 11:02:36 +02:00
orbiter
63762d8f89 removed kelondro dependencies from cora 2012-09-20 19:38:22 +02:00
orbiter
6e0f4557f8 added ftp to getName 2012-09-20 18:29:04 +02:00
Michael Peter Christen
c235d5c0f1 fixed size parsing in RSS message parser (for YaCy size parameter) 2012-09-19 06:36:07 +02:00
Michael Peter Christen
5bc8f34150 fix for success query counter 2012-09-18 11:06:36 +02:00
orbiter
4987921d3d fixed the size() method, which also counted failed pages (which are also
inside the solr index)
2012-09-16 21:22:56 +02:00
Michael Peter Christen
975bc95ddf added default facet fields for json response format (stub) 2012-09-14 12:09:20 +02:00
Michael Peter Christen
e54ac38095 - some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
62add1d564 added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
2012-09-11 22:46:39 +02:00
Michael Peter Christen
b846f585fa fixed a bug with size_i field usage 2012-09-11 20:24:27 +02:00
Michael Peter Christen
9db032664e activate two solr fields which will be used by administration interface
(later)
2012-09-11 20:15:54 +02:00
orbiter
fcd5c7eec3 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-11 09:16:38 +02:00
orbiter
6171143b4a added facet stub in JsonResponseWriter 2012-09-11 09:15:47 +02:00
Michael Peter Christen
e84ffdb4f3 enhanced solr writers 2012-09-11 03:02:02 +02:00
Michael Peter Christen
5df553c152 - added a json writer for solr (yes there was one using xslt but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
2012-09-10 14:30:44 +02:00
Michael Peter Christen
8c099d2106 Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/api/ymarks/import_ymark.java
	source/de/anomic/data/ymark/YMarkEntry.java
	source/de/anomic/data/ymark/YMarkTables.java
2012-09-10 07:05:20 +02:00
apfelmaennchen
d31a632951 - added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
2012-09-09 09:53:58 +02:00
sixcooler
9ee2e09983 statistics for solr-cache 2012-09-06 22:02:29 +02:00
Michael Peter Christen
b2b516cc3e added a collection attribute to crawls and searches:
- a solr field collection_sxt can be used to store a set of crawl tags
- when this field is activated, a crawl tag can be assigned when crawls
are started
- the content of the collection field can be comma-separated, all of
them are assigned to the documents when they are indexed as result of
such a crawl start
- a search result can be drilled down to a specific collection; this is
currently only available in the solr interface and also in the GSA
interface using the 'site' option
- this adds a mandatory field for GSA queries (the Google API demands
that field all the time)
2012-09-03 15:26:08 +02:00
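The handling of the comma-separated collection value and the drill-down to one collection can be sketched like this. The example is in Python and only illustrates the idea: the helper names are hypothetical, and the filter string uses ordinary Solr field:"value" syntax on the collection_sxt field named in the commit.

```python
def parse_collections(value):
    """Split a comma-separated crawl tag string into individual tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def collection_filter(name):
    """Build a Solr filter that restricts results to one collection."""
    # Escape backslashes and quotes so the name is safe inside a phrase.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'collection_sxt:"{escaped}"'
```

Every tag returned by parse_collections would be assigned to each document indexed by the crawl, which is why the field is multi-valued.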