Commit Graph

4218 Commits

Author SHA1 Message Date
Michael Peter Christen
788288eb9e added the generation of 50 (!!) new solr fields in the core 'webgraph'.
The default schema uses only some of them and the resulting search index
now has the following properties:
- the webgraph will have about 40 times as many entries as the default
index
- the complete index size will increase and may be about double the
current size
As testing showed, not much indexing performance is lost. The default
index will be smaller (moved fields out of it); thus searching
can be faster.
The new index makes it possible to remove some old parts of YaCy,
i.e. specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times the size of the document index!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To get full access to the new index, the solr API now has two
access points: one with attribute core=collection1 for the default
search index and one with core=webgraph for the new webgraph search index.
This is also available for p2p operation but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
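
The two access points named in the commit above can be illustrated with a minimal sketch. Only the attributes core=collection1 and core=webgraph come from the commit message; the `/solr/select` path, the host and port, and the helper name `solr_select_url` are assumptions made for illustration and may not match the real servlet layout.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and path are assumptions for illustration.
BASE = "http://localhost:8090/solr/select"


def solr_select_url(query, core="collection1", rows=10):
    """Build a select URL that names the target core via the 'core' attribute."""
    params = {"q": query, "rows": rows, "core": core}
    return BASE + "?" + urlencode(params)


# Default search index versus the new webgraph index.
default_url = solr_select_url("yacy")
webgraph_url = solr_select_url("yacy", core="webgraph")
```

Defaulting `core` to collection1 mirrors the message's description of collection1 as the default search index, so callers only name a core when they want the webgraph.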
Michael Peter Christen
89ede0fe84 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-21 13:24:10 +01:00
Michael Peter Christen
91a0401d59 introduced a second core named 'webgraph'. This core will hold the link
structure, but is not filled yet. To have the opportunity of a second
core, multi-core functionality had to be implemented in the
deeply embedded solr:
- migrated the solr_40 directory content to a subdirectory
'collection1'; the previously used default core is now called
collection1
- added solr_40/webgraph subdirectory as second core
- added a servlet configuration for the second core 'webgraph' in
/IndexSchema_p.html
- added instance handling as an addition to solr connections: all solr
connectors are now instances of a solr 'instance' object; this required
a complete re-design of the solr embedding
- also migrated caching and sharding on top of the new instance handling
- migrated the search apis to now handle access to a specific core,
the default core named 'collection1'
- migrated the remote solr search interface to access shards of cores;
for the yacy remote search the default core is now called 'solr'; using
the peer address as solr address
- migrated the solr backup and restore process: old backups cannot be
used after this migration!
- redesign of solr instance handling in all methods which access the
instances: they cannot hold copies of these instances any more; they must
retrieve the actual connection object every time they want to write to
it (this also solves some bugs when switching the index/network)
- added another schema 'solr.webgraph.schema', the old solr.keys.list is
replaced by solr.collection.schema
2013-02-21 13:23:55 +01:00
orbiter
594ed63f2a fixed interactive search which caused an error if pubDate is not present
in a search result
2013-02-16 20:33:27 +01:00
Michael Peter Christen
98a4a4aa97 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-15 01:38:23 +01:00
Michael Peter Christen
b6de1f42dc Full redesign of solr connection architecture. This was done to support
multiple solr cores instead of just one. Therefore it is now necessary
to distinguish between solr server connections (called an 'Instance')
and a connection to a single solr core. One Instance may now have
multiple connector classes assigned to it, each connecting to a single
core.
To support multiple cores it is also necessary to distinguish between
the connection configuration and the configuration of the index schema.
We will have multiple schema configurations in the future, one for
each solr core. As a result, the IndexFederated servlet had to be
split into two parts; the new servlet for the schema editor is now
the IndexSchema servlet.
2013-02-15 01:38:10 +01:00
Marc Nause
efb6cf7d21 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2013-02-13 19:31:12 +01:00
Marc Nause
ce5b7afab2 *) removed Skype online indicator (was not working anymore)
*) updated ICQ URLs
2013-02-13 19:29:40 +01:00
Michael Peter Christen
4111606654 removed the commitWithin attribute because that is not the right
way to update the index for us. It may also be superfluous with
the solr 4.0 softcommit.
2013-02-13 02:29:47 +01:00
Michael Peter Christen
c20fa3640d fix to unbalanced tag and license for null objects 2013-02-13 01:23:05 +01:00
Michael Peter Christen
3a6097966d added jsonp option to yjson result writer 2013-02-13 01:11:57 +01:00
Michael Peter Christen
de58043205 Added image license generation for solr image search results when
results are generated within yjson result writer. This makes it possible
to view images in yacyinteractive from solr.
2013-02-13 00:33:53 +01:00
Michael Peter Christen
d3508fa8ff fixed json search, quotes, auto-facets, urls etc. for
yacyinteractive.html
2013-02-13 00:01:38 +01:00
Michael Peter Christen
02fa31b5bf better filesearch layout 2013-02-12 12:21:29 +01:00
Michael Peter Christen
e55ec3071d reduced number of facets in yacyinteractive (only filetype necessary) 2013-02-12 12:00:54 +01:00
Michael Peter Christen
16d90859b7 reverted put-semantics back to as-usual in serverObjects and introduced
an add-method to put in several objects for the same key
2013-02-12 11:52:33 +01:00
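
The put/add split described in the commit above can be sketched as follows. This is not the serverObjects implementation; `MultiValueParams` and its method names are hypothetical and only illustrate the semantics the message describes: put replaces as in a usual map, while add collects several objects under the same key.

```python
class MultiValueParams:
    """Hypothetical stand-in for a key/value store that allows multi-value keys."""

    def __init__(self):
        self._values = {}

    def put(self, key, value):
        # Usual map semantics: a put replaces whatever was stored before.
        self._values[key] = [value]

    def add(self, key, value):
        # Add semantics: keep earlier values and append the new one.
        self._values.setdefault(key, []).append(value)

    def get(self, key, default=None):
        # Single-value access returns the first stored value.
        values = self._values.get(key)
        return values[0] if values else default

    def get_all(self, key):
        # Multi-value access returns a copy of all stored values.
        return list(self._values.get(key, []))


params = MultiValueParams()
params.put("query", "yacy")
params.put("query", "solr")          # replaces "yacy"
params.add("facet.field", "host")    # hypothetical facet names
params.add("facet.field", "filetype")
```

Keeping put as a plain replace means existing callers behave as before, and only code that needs repeated keys (such as multiple facet requests) has to call add.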
Michael Peter Christen
c34af7fe94 extended JSON Response Writer and Opensearch Response Writer for the
Solr search interface in such a way that it is possible to use this
interface for the yacyinteractive search. This search interface is now
much faster using the Solr search directly. For the Solr interface it
was necessary to create a translation from the YaCy search modifiers to
the Solr facet selection. This was added in such a way that it becomes
generic for the normal YaCy search and as an on-top evaluation for Solr
queries.
2013-02-12 03:42:46 +01:00
Michael Peter Christen
762b687e47 extended the serverObjects to be able to hold multiple values for a
single key. This is done using the solr class MultiMapSolrParams. That
class is needed in the OpensearchResultWriter to get multiple facet
requests.
2013-02-11 22:12:15 +01:00
Michael Peter Christen
d70d99fab5 added more metadata fields and facets to OpensearchResponseWriter.
This should make it possible to replace the original and enriched yacy
opensearch result with a solr output in opensearch format.
2013-02-11 22:10:14 +01:00
Michael Peter Christen
51e7ab4f70 moved bookmarks back to a more prominent location (even if this does not
fit the 'Search Interfaces' headline)
2013-02-09 06:57:20 +01:00
Michael Peter Christen
dee8b24d3c better error handling for bookmarks 2013-02-09 06:55:57 +01:00
Marc Nause
27894d2c1a Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2013-02-05 21:09:41 +01:00
Marc Nause
75f9568472 *) only install files from the RELEASE directory
*) minor changes
2013-02-05 21:02:32 +01:00
Michael Peter Christen
eb80405a16 added a disable function in RemoteCrawl_p servlet which prevents setting
of remote crawl if peer is not a senior or principal peer
2013-02-05 12:47:20 +01:00
Michael Peter Christen
1e3d8cc235 show a link for the host in the host browser; see 2013-02-04 21:24:57 +01:00
Michael Peter Christen
7de502f43d Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-02-04 20:02:35 +01:00
Marc Nause
3bc5ee6e3d *) added protection against CSRF in update download page
(http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release
does not work anymore)
2013-02-04 19:57:28 +01:00
Michael Peter Christen
3834829b37 bugfixes and more logging for solr connector 2013-02-04 16:42:10 +01:00
Michael Peter Christen
d1cb4cbc84 enhanced network scanner, is faster and more flexible now
- start more processes
- remove superfluous host name resolution
- better/more flexible subnet ip range calculation
- preferring ipv4 gives more usable ip pre-settings in the servlet
- extended servlet with a new subnet /20 option
- redesign of scanner start process in servlet (generalization)
2013-02-02 09:51:43 +01:00
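
For orientation, the size of a /20 subnet range such as the one mentioned in the commit above can be computed with Python's standard `ipaddress` module. This is only a sketch of the arithmetic, not the scanner's own range calculation; the function name `subnet_range` and the example address are assumptions.

```python
import ipaddress


def subnet_range(ip, prefix=20):
    """Return the network containing ip and the number of addresses in it."""
    # strict=False allows a host address; the host bits are masked off.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return network, network.num_addresses


net, count = subnet_range("192.168.1.5")
first_host = net[1]    # first address after the network address
last_host = net[-2]    # last address before the broadcast address
```

A /20 leaves 12 host bits, so the range covers 4096 addresses, sixteen times the size of a /24.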
Michael Peter Christen
7dfcc92b71 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-01-31 13:15:42 +01:00
Michael Peter Christen
0b6566a389 optimizations when starting large crawl requests with many start urls in
one request:
- allow larger match-fields in html interface
- delete all host hashes at once from zurl
- when deleting by host, do not count size of deleted entries since that
was the reason it took so long
2013-01-31 13:15:28 +01:00
orbiter
a2160054d7 ability to create vocabularies also without any objectspace: this
iterates over all urls in the index to create terms
2013-01-30 19:33:48 +01:00
Michael Peter Christen
be27567b53 allow more links when starting a crawl by file 2013-01-28 17:50:23 +01:00
reger
3777b338c7 bugfix: location url for migrate urldb button onclick 2013-01-27 06:13:49 +01:00
reger
8447814a31 correct header menu in migrateurldb_p.html
- update NetBeans project path
2013-01-26 23:43:09 +01:00
Michael Peter Christen
99185d7048 one more fix for author_sxt 2013-01-26 03:59:39 +01:00
Michael Peter Christen
b6ae6262f6 - add the copyField author_sxt only if author exists
- set the solr default search field according to existing fields
2013-01-26 03:34:46 +01:00
Michael Peter Christen
088373b4ea catch exception if solr connection change fails 2013-01-25 16:06:58 +01:00
Michael Peter Christen
e23a596c1d added a copyField for author_sxt for automated schema generation 2013-01-24 18:25:28 +01:00
Michael Peter Christen
f1a4feda3e security fix for suggest (don't let users ask for too much) 2013-01-24 17:57:28 +01:00
Michael Peter Christen
244b157299 fix for external solr schema definition 2013-01-24 16:34:15 +01:00
Michael Peter Christen
0fe7b6fd3b migrated the index export methods from the old metadata to solr. Now
exports are done using solr queries. removed superfluous methods and
servlets.
2013-01-24 12:39:19 +01:00
Michael Peter Christen
8eebeea533 fix for search result link in ViewFile 2013-01-24 01:50:59 +01:00
Michael Peter Christen
31e854bef6 Merge remote-tracking branch 'copro/master' 2013-01-23 14:41:17 +01:00
Michael Peter Christen
4735bd47f4 - changed solr commit call and added an optimize option. Since Solr
4.0.0 there is a new softcommit feature which implements a
near-real-time (NRT) search option. The softcommit does not do IO and
does not cause performance issues.
YaCy now has an extension in its solr connectors to use the softcommit
feature. The softcommit call now replaces all places where a hard commit
was used. Furthermore, the commit strategy when doing a search from
the web interface was changed (a commit is done every time before a search
is done).

The softcommit feature was implemented because it was needed for the
following changes (customer demands), which are also included in this
git commit:

- added a feature to identify all documents which have unique titles
and/or unique descriptions. These unique flags are disabled by default.
- added also a feature to set a flag when the url from a canonical tag
is equal to the document url. This is also disabled by default.

To support the new softcommit strategy, the commitWithinMs option was
set to -1 to disable automatic commit based on document insert times. If
documents are inserted continuously then a commit would also happen
continuously whenever the commitWithinMs time is reached. This would
conflict with the regular autocommit of 10 minutes and the new
softcommit strategy.
2013-01-23 14:40:58 +01:00
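
As a rough illustration of the difference between the commit variants discussed above, the sketch below builds request URLs for a Solr update handler using the `commit`, `softCommit` and `optimize` request parameters. It does not show how the YaCy connectors issue commits; the base URL and the helper name `commit_url` are assumptions.

```python
from urllib.parse import urlencode

# Hypothetical address of a Solr core; an assumption for illustration.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def commit_url(base, soft=True, optimize=False):
    """Build an update URL that requests a soft commit or a hard commit."""
    # A soft commit makes documents searchable without the cost of a hard commit.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    if optimize:
        params["optimize"] = "true"
    return base + "/update?" + urlencode(params)


soft_url = commit_url(SOLR_CORE)
hard_optimize_url = commit_url(SOLR_CORE, soft=False, optimize=True)
```

Requesting the soft commit explicitly, for example right before a search, fits the strategy in the message of not relying on a commitWithin timer.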
Copro
0025983993 Fix typo embedd -> embed 2013-01-23 04:11:55 +01:00
Copro
3ea8380959 Adding Vimeo tag to wiki commands to embed Vimeo video with id 2013-01-23 04:00:15 +01:00
Copro
ee9d7fd93d Added feature to embed Youtube videos to wiki commands for usage in
Wiki, Blog or other servlets
2013-01-23 02:43:58 +01:00
Michael Peter Christen
9ccdd21d76 Merge remote-tracking branch 'aleksejs/fixtrans'
Conflicts:
	locales/ru.lng
	
Tried to merge this but I had to do this 'blind'.
Sorry if I deleted something that was right.
2013-01-22 11:54:38 +01:00
Michael Peter Christen
aa067da86b set the 'all' option as the last option in the list because the 'all' option
currently also selects lists which cannot be exported correctly in xml
2013-01-17 01:04:50 +01:00