Commit Graph

13 Commits

Author SHA1 Message Date
Michael Peter Christen
f5ca5cea44 - added field options to all solr queries. This can be used to restrict
the actual data which is fetched from solr.
- used the new field options to reduce generic options like getting the
load date or the count of search results. should increase overall speed
- used the new field options to reduce overhead in the host browser
during aquisition of links.
- used the field options to make checking of links in crawler faster
- if the crawler is paused, the crawl queue is not cleaned
2012-11-19 17:24:34 +01:00
reger
722a447b0d - optimize code of augmented parsing to enhence document tags
- commented out augmentedparser.analyse (not function implemented yet)
- adjust init of document title list to always use same list type
2012-10-26 18:50:45 +02:00
reger
87aab9aa7c - fix: with augmented parsing = on; missing metadata in index (like title) due to overwriting metadata by adding multiple result docs from augmentparser with same url
- fix Document.addsubdocuments: sections might be initialized as Arrays.toList which does not provide the used .addAll methode
   see e.g. http://kamleshkr.wordpress.com/2010/02/17/inside-java-arrays-aslistt-a/
2012-10-22 22:48:35 +02:00
Michael Peter Christen
5f0ab25382 removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
Michael Peter Christen
528d6763fa - added new solr fields:
title_count_i, title_chars_val, title_words_val
description_count_i, description_chars_val, description_words_val
- added many asserts to ensure data type correctness from YaCy to Solr
and vice versa
- made many fixes according to new findings from these asserts (!)
2012-08-31 10:30:43 +02:00
Michael Peter Christen
b0c408788b made class methods static where possible 2012-07-05 12:38:41 +02:00
Michael Peter Christen
7c1ba99755 removed more unused method parameters 2012-07-05 10:44:30 +02:00
orbiter
78fc3cf8f8 refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used a too complex input stream; generated from sources and UTF8
byte-conversions. The BufferedReader applied a strong overhead.
- to feed data into the SentenceReader, multiple toString/getBytes had
been applied until a buffered Reader from an input stream was possible.
These superfluous conversions had been removed.
- the best source for the Sentence Reader is a String. Therefore the
production of Strings had been forced inside the Document class.
2012-07-04 21:15:10 +02:00
Michael Peter Christen
50c576599b allow multiple parser options instead of printing an error 2012-06-12 01:42:58 +02:00
Michael Peter Christen
5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
later
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
2012-06-11 16:48:53 +02:00
cominch
5f8ba7f4f2 small changes
Conflicts:
	source/net/yacy/document/parser/augment/AugmentParser.java
	source/net/yacy/interaction/Interaction.java
2012-06-10 13:02:00 +02:00
cominch
bcbd8eee33 Add several parsers, for RDFa and rdf files.
Conflicts:
	source/net/yacy/document/TextParser.java
2012-06-10 10:42:33 +02:00