Michael Peter Christen
ccc3760a47
Refactoring and redesign of data architecture to make URIMetadataRow
superfluous. The target is to make a Solr document the core of YaCy
documents, which would allow many conversions to be removed. On the
way to this target, the equivalence of URIMetadataRow and URIMetadataNode
had to be removed to expose the usage of the old URIMetadataRow data
structure.
This refactoring already removes unnecessary conversions and should
lower memory usage during indexing.
2012-10-18 14:29:11 +02:00
Michael Peter Christen
e5b3c172ff
removed hack which translated Solr documents to virtual RWI entries
which were then mixed with remote RWIs. Now these Solr documents are
fed into the result set as they appear during local and remote
search. That makes the search much faster.
2012-10-17 17:45:41 +02:00
Michael Peter Christen
5d16c23a1f
specified more URIMetadata as URIMetadataNode
2012-10-16 18:26:21 +02:00
Michael Peter Christen
21fe8339b4
- enhanced generation of url objects
- enhanced computation of link structure graphics
- enhanced collection of data for link structures
2012-10-15 13:17:13 +02:00
Michael Peter Christen
5f0ab25382
removed the option to prevent removal of & parts inside of the
MultiProtocolURI during normalform computation because that should
always be done and also be done during initialization of the
MultiProtocolURI Object. The new normalform method takes only one
argument which should be 'true' unless you know exactly what you are
doing.
2012-10-10 11:46:22 +02:00
Michael Peter Christen
a06930662c
replaced some more .getBytes() with UTF8/ASCII.getBytes()
2012-10-09 12:14:28 +02:00
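For context on the commit above: a bare `String.getBytes()` encodes with the platform default charset, so the resulting bytes can differ between machines. The sketch below uses the JDK's `java.nio.charset.StandardCharsets` as a stand-in; YaCy's own UTF8/ASCII helper classes are not shown in this log, and the class name `ExplicitCharsetDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the YaCy code base.
final class ExplicitCharsetDemo {

    /** Encodes with an explicit charset, independent of the platform default. */
    static byte[] utf8Bytes(final String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes as US-ASCII; characters outside ASCII are replaced with '?'. */
    static byte[] asciiBytes(final String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args) {
        final String word = "M\u00fcller"; // contains one non-ASCII character
        System.out.println(utf8Bytes(word).length);  // two bytes for the umlaut
        System.out.println(asciiBytes(word).length); // one replacement byte
    }
}
```

Passing the charset explicitly keeps hashes and index keys derived from these bytes stable across platforms.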
Michael Peter Christen
0cec7e761a
enhanced snippet extractor to find snippets also inside of tokens of an
url
2012-09-26 15:33:37 +02:00
Michael Peter Christen
1533bfd63b
refactoring
2012-09-25 21:20:03 +02:00
Michael Peter Christen
8219a445f3
refactoring
2012-09-21 16:46:57 +02:00
Michael Peter Christen
00c1c777fa
refactoring
2012-09-21 15:48:16 +02:00
Michael Peter Christen
e54ac38095
- some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
0cab06c47c
refactoring
2012-08-17 15:52:33 +02:00
Michael Peter Christen
9bece5ac5f
enhanced snippet fetch - removed a bug that caused documents to be
parsed even if a solr text was available
2012-08-17 14:22:07 +02:00
Michael Peter Christen
f9c0e6e950
- Implemented and integrated the URIMetadataNode object which is a
metadata representation from the solr index. This shall replace metadata
from the built-in database in the future.
- added the Solr-driven metadata into the search index of YaCy which
makes it now possible to run YaCy without the old metadata index. This
is a major step forward to a full migration to Solr.
2012-08-10 13:26:51 +02:00
Michael Peter Christen
24d9db1613
snippet retrieval loading processes may use a smaller minimum load time
value than crawling processes. This speeds up the search result
preparation dramatically.
2012-07-30 10:38:23 +02:00
Michael Peter Christen
1687737771
Abstraction of HandleMap and HandleSet
2012-07-27 12:13:53 +02:00
orbiter
69e743d9e3
- more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
Michael Peter Christen
7c1ba99755
removed more unused method parameters
2012-07-05 10:44:30 +02:00
Michael Peter Christen
0301aba1e9
removed unused method parameters
2012-07-05 10:23:07 +02:00
Michael Peter Christen
ea10766bfd
cleaned unnecessary nested code
2012-07-05 08:44:39 +02:00
orbiter
fc0f9543fe
More SentenceReader cleanup
2012-07-05 00:20:58 +02:00
orbiter
78fc3cf8f8
refactoring and new usage of SentenceReader: this class appeared as one
of the major CPU users during snippet verification. The class was not
efficient for two reasons:
- it used an overly complex input stream, generated from sources and UTF8
byte conversions. The BufferedReader added significant overhead.
- to feed data into the SentenceReader, multiple toString/getBytes calls
were applied until a buffered Reader from an input stream was possible.
These superfluous conversions have been removed.
- the best source for the SentenceReader is a String. Therefore the
production of Strings is now forced inside the Document class.
2012-07-04 21:15:10 +02:00
Michael Peter Christen
de903a53a0
parser refactoring & hacks
2012-07-03 06:06:38 +02:00
Michael Peter Christen
1825f165b8
better integration of blacklist according to use case
2012-07-02 13:57:29 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1
Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143
Conflicts:
htroot/Supporter.java
htroot/yacy/crawlReceipt.java
htroot/yacy/transferRWI.java
htroot/yacy/transferURL.java
source/de/anomic/crawler/CrawlStacker.java
source/de/anomic/data/ListManager.java
source/net/yacy/peers/Protocol.java
source/net/yacy/repository/Blacklist.java
source/net/yacy/repository/LoaderDispatcher.java
source/net/yacy/search/Switchboard.java
source/net/yacy/search/index/MetadataRepository.java
source/net/yacy/search/index/Segment.java
source/net/yacy/search/query/RWIProcess.java
source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Michael Peter Christen
00f2df1120
a variety of possible memory leak fixes
2012-06-06 18:23:18 +02:00
Michael Peter Christen
9b4c699526
enhanced location search:
- search requests are now made using a map boundary
- search results are only computed for the map boundary
- the number of results is adapted to the results in the visible range
- added double buffering for the search result markers
- added a search query option for the search results:
/radius/<lat>/<lon>/<radius>
2012-05-31 22:39:53 +02:00
Michael Peter Christen
10da7335ea
performance hack: use a hash cache for all hashes that are computed from a
byte array. If this hash is used in a HashMap (which is very often the
case) then this hack eliminates a lot of re-computations of the same
hash.
2012-05-30 16:59:13 +02:00
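The idea in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the YaCy implementation: the class name `CachedHashKey` is hypothetical, and the lazy "0 means not computed" convention is borrowed from `java.lang.String`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable byte[] key that computes its hash code only once. */
final class CachedHashKey {
    private final byte[] bytes;
    private int hash; // 0 means "not computed yet"

    CachedHashKey(final byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so the cached hash cannot go stale
    }

    @Override
    public int hashCode() {
        int h = this.hash;
        if (h == 0) {
            h = Arrays.hashCode(this.bytes);
            this.hash = h; // a genuine hash of 0 is simply recomputed on each call
        }
        return h;
    }

    @Override
    public boolean equals(final Object other) {
        if (this == other) return true;
        if (!(other instanceof CachedHashKey)) return false;
        return Arrays.equals(this.bytes, ((CachedHashKey) other).bytes);
    }

    public static void main(final String[] args) {
        final Map<CachedHashKey, String> index = new HashMap<>();
        index.put(new CachedHashKey(new byte[] {1, 2, 3}), "entry");
        // An equal key built from the same bytes finds the stored value.
        System.out.println(index.get(new CachedHashKey(new byte[] {1, 2, 3})));
    }
}
```

The saving shows up when the same key object is used for many map operations: every lookup calls `hashCode()`, and only the first call walks the byte array.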
Michael Peter Christen
c15fcde1c8
add-on to latest commit
2012-05-21 17:52:30 +02:00
Michael Peter Christen
cf47d94888
performance hack to parse numbers inside of substrings without actually
generating a substring. This avoids the allocation of a String object
each time a substring is parsed. Should reduce CPU load during RWI
transmission.
2012-05-21 13:40:46 +02:00
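A minimal sketch of the technique described in the commit above, assuming decimal input with an optional sign. The class and method names are hypothetical and do not come from the YaCy sources.

```java
// Hypothetical helper; illustrates range-based parsing without String.substring().
final class SubstringNumbers {

    /**
     * Parses a decimal long from s[start, end) without creating a substring.
     * Throws NumberFormatException for malformed input and
     * ArithmeticException if the value does not fit into a long.
     */
    static long parseLong(final CharSequence s, final int start, final int end) {
        if (start < 0 || end > s.length() || start >= end) {
            throw new NumberFormatException("empty or invalid range");
        }
        int pos = start;
        final boolean negative = s.charAt(pos) == '-';
        if (negative || s.charAt(pos) == '+') {
            pos++;
            if (pos == end) throw new NumberFormatException("sign without digits");
        }
        long result = 0; // accumulated as a negative value so Long.MIN_VALUE fits
        for (; pos < end; pos++) {
            final int digit = s.charAt(pos) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + pos);
            }
            result = Math.subtractExact(Math.multiplyExact(result, 10L), (long) digit);
        }
        return negative ? result : Math.negateExact(result);
    }

    public static void main(final String[] args) {
        final String line = "count=12345&delta=-42";
        System.out.println(parseLong(line, 6, 11));  // the digits after "count="
        System.out.println(parseLong(line, 18, 21)); // the signed value after "delta="
    }
}
```

On Java 9 and later, `Long.parseLong(CharSequence, int, int, int)` offers the same range-based parsing in the JDK itself; that method did not exist when this commit was written.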
Michael Peter Christen
7e0ddbd275
added a "fromCache" flag in Response object to omit one cache.has()
check during snippet generation. This should cause less blocking.
2012-05-21 03:03:47 +02:00
Michael Peter Christen
76157dc2c3
bugfix for http://bugs.yacy.net/view.php?id=173
2012-05-21 00:18:00 +02:00
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
2012-04-24 12:55:58 +02:00
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
2012-04-22 00:04:36 +02:00
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
2012-04-21 21:31:13 +02:00
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
2012-04-21 13:34:07 +02:00
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method. (ranking coefficient fixed, needs to be made configurable)
2012-04-13 06:47:33 +02:00
Michael Peter Christen
ef78f22ee1
performance hack
2012-01-25 12:48:48 +01:00
Roland 'Quix0r' Haeder
a3083d13bf
Blacklist checks are now always turned on; in media searches (e.g. image search), images matching blacklist entries are no longer shown to the user
2011-12-28 20:09:17 +01:00
Michael Christen
9e5894c784
Removed handling of components objects for URIMetadataRows.
This is a preparation to replace these rows with nodes from the node
store.
2011-12-17 01:27:08 +01:00
Michael Christen
044f83feed
added some pauses into the search process which shall produce
better-ranked search results. Without these pauses the result page will
only contain links from the peer that answers first, which is not a good
average picture of all the peers that provided results
2011-12-06 15:28:48 +01:00
orbiter
5a55397f99
some last-minute performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-25 11:23:52 +00:00
orbiter
4ad9fc2bff
new snippet strategy for search hits in metadata: show beginning of text instead of hit position
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7999 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-13 00:34:52 +00:00
orbiter
a7df70221e
refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-10-04 09:06:24 +00:00
orbiter
d2ea250d99
refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00