yacy_search_server/source/net/yacy/document/parser
Michael Peter Christen 788288eb9e added the generation of 50 (!!) new solr field in the core 'webgraph'.
The default schema uses only some of them, and the resulting search index
now has the following properties:
- the webgraph will have about 40 times as many entries as the default
index
- the complete index size will increase and may be about double its
current size
As testing showed, not much indexing performance is lost. The default
index will be smaller (fields were moved out of it); thus searching
can be faster.
The new index allows some old parts of YaCy to be removed,
i.e. the specialized webgraph data and the noload crawler. The new index
will make it possible to:
- search within link texts of linked but not indexed documents (about 20
times the size of the document index!!)
- get a very detailed link graph
- enhance ranking using a complete link graph

To give full access to the new index, the API to solr now has two
access points: one with attribute core=collection1 for the default
search index and one with core=webgraph for the new webgraph search index.
This is also available for p2p operation, but client access is not yet
implemented.
2013-02-22 15:45:15 +01:00
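The two access points described in the commit message differ only in the value of the core attribute. As a minimal sketch of how a client might select between them, the Java snippet below builds one query URL per core. The host, port and /solr/select path, the class name SolrCoreQuery and the helper queryUrl are assumptions made for illustration and are not taken from this commit; only the attribute values core=collection1 and core=webgraph come from the message above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of YaCy: builds a query URL for one solr core.
class SolrCoreQuery {

    // Assumed endpoint of a locally running peer; adjust to the real installation.
    static final String BASE = "http://localhost:8090/solr/select";

    // Returns BASE with the core attribute and a URL-encoded query string appended.
    static String queryUrl(String core, String query) {
        return BASE
            + "?core=" + URLEncoder.encode(core, StandardCharsets.UTF_8)
            + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Default search index.
        System.out.println(queryUrl("collection1", "*:*"));
        // New webgraph search index.
        System.out.println(queryUrl("webgraph", "*:*"));
    }
}
```

Keeping the core name as a parameter rather than hard-coding two methods means the same client code can address either index; whether a given field exists in a core depends on the schema in use.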
..
augment - added field options to all solr queries. This can be used to restrict 2012-11-19 17:24:34 +01:00
html added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
images added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
rdfa - added new solr fields: 2012-08-31 10:30:43 +02:00
xml Adding heuristic to get search results from configured systems which support opensearch specification 2012-12-29 08:24:48 +01:00
audioTagParser.java Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based 2012-10-05 18:54:26 +02:00
bzipParser.java - moved triple store to net.yacy.cora.lod (should be generalized there 2012-06-11 16:48:53 +02:00
csvParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
docParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
dwgParser.java - replaced all length() == 0 and size() == 0 with isEmpty() 2012-07-10 22:59:03 +02:00
genericParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
gzipParser.java - moved triple store to net.yacy.cora.lod (should be generalized there 2012-06-11 16:48:53 +02:00
htmlParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
mmParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
odtParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
ooxmlParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
pdfParser.java update to pdf parser 2012-12-27 04:16:31 +01:00
pptParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
psParser.java reduced logging overhead (a bit) 2012-07-12 19:23:40 +02:00
rdfParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
rssParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
rtfParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
sevenzipParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
sidAudioParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
sitemapParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
swfParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
tarParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
torrentParser.java added a synonyms_t field to solr and a process to read synonym files. 2012-10-02 00:02:50 +02:00
vcfParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00
vsdParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
xlsParser.java - added new solr fields: 2012-08-31 10:30:43 +02:00
zipParser.java added the generation of 50 (!!) new solr field in the core 'webgraph'. 2013-02-22 15:45:15 +01:00