orbiter
67edfd991c
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-08-05 15:49:48 +02:00
orbiter
d9173ba7ed
added more solr fields to integrate values from URIMetadataRow. All
writings to the Metadata-DB are now also done to solr. This includes
metadata transfer during search and rwi transfer.
The new/added solr fields are:
## time when resource was loaded
load_date_dt
## date until resource shall be considered as fresh
fresh_date_dt
## id of the host, a 6-byte hash that is part of the document id
host_id_s
## ids of referrer to this document
referrer_id_ss
## the md5 of the raw source
md5_s
## the name of the publisher of the document
publisher_t
## the language used in the document; starts with primary language
language_ss
## an external ranking value
ranking_i
## the size of the raw source
size_i
## number of links to audio resources
audiolinkscount_i
## number of links to video resources
videolinkscount_i
## number of links to application resources
applinkscount_i
2012-08-05 15:49:27 +02:00
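As a rough illustration of the field list in the commit above, the sketch below fills a plain map with the twelve new field names. The name suffixes look like the usual Solr dynamic-field convention (_dt date, _s string, _ss multi-valued string, _t text, _i integer), but that reading is an assumption, as are the class name SolrFieldSketch and every value shown. It does not use the Solr client API or YaCy's own classes.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a plain map keyed by the field names from the commit
// message. All values are made-up placeholders, not real YaCy data.
public class SolrFieldSketch {

    public static Map<String, Object> exampleDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("load_date_dt", new Date(0L));             // time when resource was loaded
        doc.put("fresh_date_dt", new Date(86_400_000L));   // considered fresh until this date
        doc.put("host_id_s", "abcdef");                    // 6-character placeholder host id
        doc.put("referrer_id_ss", Arrays.asList("ref-1", "ref-2"));
        doc.put("md5_s", "d41d8cd98f00b204e9800998ecf8427e"); // md5 of the raw source
        doc.put("publisher_t", "Example Publisher");
        doc.put("language_ss", Arrays.asList("en", "de")); // primary language first
        doc.put("ranking_i", 0);                           // external ranking value
        doc.put("size_i", 1024);                           // size of the raw source
        doc.put("audiolinkscount_i", 0);
        doc.put("videolinkscount_i", 0);
        doc.put("applinkscount_i", 0);
        return doc;
    }

    public static void main(String[] args) {
        exampleDocument().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```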
Michael Peter Christen
24d9db1613
snippet retrieval loading processes may use a smaller minimum load time
value than crawling processes. This speeds up the search result
preparation dramatically.
2012-07-30 10:38:23 +02:00
Michael Peter Christen
1687737771
Abstraction of HandleMap and HandleSet
2012-07-27 12:13:53 +02:00
Michael Peter Christen
6f1ddb2519
Moved solr index-add method to the same method where the YaCy index is
written. Also done some code-cleanup.
2012-07-25 01:53:47 +02:00
Michael Peter Christen
1f41d9c6f5
bugfix for a NPE
2012-07-24 17:29:32 +02:00
Michael Peter Christen
d3f243e2e1
fixed node type calculation for principal peers
2012-07-23 23:40:50 +02:00
orbiter
69e743d9e3
- more abstraction for the RWI index as preparation for solr integration
- added options in search index to switch parts of the index on or off
2012-07-22 13:18:45 +02:00
orbiter
c00a3cf74d
less usage of generic logger to avoid logger generation overhead
2012-07-12 19:54:54 +02:00
orbiter
0cbda0b2b8
- replaced all length() == 0 and size() == 0 with isEmpty()
- replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be
done automatically
- implemented some isEmpty() methods
2012-07-10 22:59:03 +02:00
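The kind of replacement described in the commit above can be sketched as follows. The class IsEmptySketch and the container Tokens are hypothetical names made up for this example; they are not taken from the YaCy sources.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the length()/size() -> isEmpty() replacement.
public class IsEmptySketch {

    // before: s != null && s.length() > 0
    public static boolean hasContent(String s) {
        return s != null && !s.isEmpty();
    }

    // before: list != null && list.size() > 0
    public static boolean hasContent(List<?> list) {
        return list != null && !list.isEmpty();
    }

    // A home-grown container gains its own isEmpty() so that callers
    // no longer need to write size() == 0.
    public static class Tokens {
        private final List<String> items = new ArrayList<>();

        public void add(String token) {
            items.add(token);
        }

        public int size() {
            return items.size();
        }

        public boolean isEmpty() {
            return items.isEmpty();
        }
    }
}
```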
orbiter
62202e2d71
refactoring of query attribute variable names for better consistency
with (next) stored query words
2012-07-09 11:14:50 +02:00
Michael Peter Christen
b0c408788b
made class methods static where possible
2012-07-05 12:38:41 +02:00
Michael Peter Christen
0301aba1e9
removed unused method parameters
2012-07-05 10:23:07 +02:00
Michael Peter Christen
241dd8410a
removed snippet pattern filter - it was not used
2012-07-05 09:21:27 +02:00
Michael Peter Christen
d3964253ae
- added @SuppressWarnings to unused servlet method parameters
- removed unnecessary casts
- removed unnecessary throw statements
2012-07-05 09:14:04 +02:00
Michael Peter Christen
ea10766bfd
cleaned unnecessary nested code
2012-07-05 08:44:39 +02:00
Michael Peter Christen
1481037820
replaced non-generic array with collection
2012-07-05 01:02:51 +02:00
Michael Peter Christen
613b45f604
- better data structures in secondary search
- fixed a big memory leak in secondary search
2012-07-03 07:12:20 +02:00
Michael Peter Christen
1825f165b8
better integration of blacklist according to use case
2012-07-02 13:57:29 +02:00
Michael Peter Christen
96aeb127e3
generalized localhost naming.
this is also a preparation for a better IPv6 implementation.
2012-06-26 00:08:25 +02:00
Michael Peter Christen
77f795756c
fixing redirects and status codes: storing of status code in
ResponseHeader to make it available for late evaluations, like storage
in solr.
2012-06-25 18:17:31 +02:00
Michael Peter Christen
b9d42fd9c8
using com.google.common.io.Files instead of homebrew methods
2012-06-22 11:39:17 +02:00
Michael Peter Christen
3f55dc7c1e
- added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00
Michael Peter Christen
82a682b31d
fixed problem with seed when switching network
2012-06-19 07:44:44 +02:00
Michael Peter Christen
8e97ada7c9
IPv6 bugfix
2012-06-18 00:33:32 +02:00
Roland 'Quix0r' Haeder
edaa09b9b1
Rewrote all String blacklist types to enum 'BlacklistType', closes bug
#143
Conflicts:
htroot/Supporter.java
htroot/yacy/crawlReceipt.java
htroot/yacy/transferRWI.java
htroot/yacy/transferURL.java
source/de/anomic/crawler/CrawlStacker.java
source/de/anomic/data/ListManager.java
source/net/yacy/peers/Protocol.java
source/net/yacy/repository/Blacklist.java
source/net/yacy/repository/LoaderDispatcher.java
source/net/yacy/search/Switchboard.java
source/net/yacy/search/index/MetadataRepository.java
source/net/yacy/search/index/Segment.java
source/net/yacy/search/query/RWIProcess.java
source/net/yacy/search/snippet/MediaSnippet.java
2012-06-11 00:17:30 +02:00
Michael Peter Christen
3b992e6b00
using utf8 String compression in Webstructure database
2012-06-09 11:00:33 +02:00
Michael Peter Christen
a1fe65b115
performance hacks
2012-06-05 12:06:26 +02:00
Michael Peter Christen
e0d8643226
- performance hacks
- added log warnings in case that search processes run into time-out
situations
- better concurrency for Integer formatter (used a non-synchronized
formatter before)
- bugfix for search termination (a poison pill was missing)
- added timeout parameters for search (again) -> target is, that they
are never reached.
2012-06-04 15:37:39 +02:00
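The "poison pill" mentioned in the commit above is a general pattern for ending a queue consumer: the producer enqueues a sentinel as its last element, and the consumer stops when it takes that sentinel. If the sentinel is never sent, the consumer blocks forever on the queue. The sketch below shows the pattern with java.util.concurrent only; the class PoisonPillSketch and its method are assumptions for illustration and do not reproduce YaCy's search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the poison-pill pattern.
public class PoisonPillSketch {

    // Sentinel object, compared by identity so that no real item can match it.
    private static final String POISON = new String("POISON");

    public static List<String> runOnce(List<String> items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> consumed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();   // blocks until an element arrives
                    if (item == POISON) break;    // sentinel reached: terminate
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String item : items) queue.put(item);
            queue.put(POISON);                    // without this the consumer never ends
            consumer.join(5000);                  // bounded wait as a safety net
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }
}
```

The bounded join mirrors the idea in the same commit of having time-outs that are meant never to be reached.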
Michael Peter Christen
7c1feefb28
introduced a default 10 second time-out in rwi normalization time
during search process to prevent endless deadlocks after a very long
running search
2012-05-30 16:26:05 +02:00
Michael Peter Christen
65d37e6a20
only ASCII needed in seed bitflags
2012-05-30 15:42:28 +02:00
Michael Peter Christen
0f82fb3628
using double instead of float for a better release ordering
2012-05-30 15:28:20 +02:00
Michael Peter Christen
71c3163f3d
- fixes to node identification
- added link to node in network list
- added marking of portal search node peers
2012-05-29 01:38:54 +02:00
Michael Peter Christen
ad222be7f8
added node state icon in network list
2012-05-25 17:29:54 +02:00
Michael Peter Christen
3c2bec681f
added a root node flag: identifies peers with short ping time
2012-05-25 15:33:02 +02:00
Michael Peter Christen
f294f2e295
bugfix to http://bugs.yacy.net/view.php?id=181
tried to make a bit less 'noise' to dns server
also included: less processes in snippet fetch to reduce load during
search on small computers
2012-05-19 01:06:33 +02:00
Michael Peter Christen
89142d1e8d
removed (not all) warnings
2012-05-16 13:42:32 +02:00
Michael Peter Christen
15db703808
added missing serialization to remove all warnings
2012-05-15 13:13:07 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb
Now using synchronization in all close() methods to make sure all objects
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
Marc Nause
a691023d04
*) better formatting for network QPM
*) refactoring
2012-05-08 20:07:34 +02:00
Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-05-04 17:29:16 +02:00
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
2012-05-04 17:28:27 +02:00
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
2012-04-29 21:21:49 +02:00
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
2012-04-24 00:05:01 +02:00
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can retrieve now ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
2012-04-22 00:04:36 +02:00
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
2012-04-21 13:34:07 +02:00
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
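One way to perform the check described in the commit above is to look at the first two bytes of the downloaded file: a gzip stream starts with 0x1f 0x8b, so a file named .tar.gz that lacks this signature has presumably been decompressed in transit. Whether YaCy detects it this way is not stated in the commit message; the class ReleaseFileSketch and its methods are assumptions made for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: rename x.tar.gz to x.tar if the content is not gzip.
public class ReleaseFileSketch {

    // True if the file starts with the gzip magic bytes 0x1f 0x8b.
    public static boolean isGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            int b1 = in.read();
            int b2 = in.read();
            return b1 == 0x1f && b2 == 0x8b;
        }
    }

    // Returns the path under which the file is stored after the check.
    public static Path fixName(Path file) throws IOException {
        String name = file.getFileName().toString();
        if (name.endsWith(".tar.gz") && !isGzip(file)) {
            // drop the trailing ".gz" because the content is already plain tar
            Path target = file.resolveSibling(name.substring(0, name.length() - 3));
            return Files.move(file, target);
        }
        return file;
    }
}
```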
Michael Peter Christen
8c06925984
animation of the web structure picture
2012-02-25 15:42:29 +01:00
Michael Peter Christen
c639248c23
protection against strange answers from remote peers during search
2012-02-25 14:07:02 +01:00