Michael Peter Christen
15db703808
added missing serialization to remove all warnings
2012-05-15 13:13:07 +02:00
Michael Peter Christen
1795a7325b
made HandleSet serializable
2012-05-15 12:55:15 +02:00
Michael Peter Christen
e7e381d110
added configuration to switch off redirection following in crawler
2012-05-15 12:25:46 +02:00
Michael Peter Christen
2717c1b749
fixed bug in solr interface
2012-05-15 12:25:14 +02:00
Michael Peter Christen
f150bc218b
fixed bug in solr error document
2012-05-14 14:56:21 +02:00
Michael Peter Christen
cb54c1737b
solrj connector bugfix
2012-05-14 11:56:04 +02:00
Roland 'Quix0r' Haeder
a093ccf5eb
Synchronization is now used in all close() methods to make sure all objects
are 'closed' in an ordered way
Conflicts:
source/de/anomic/http/server/ChunkedInputStream.java
source/de/anomic/http/server/ChunkedOutputStream.java
source/de/anomic/http/server/ContentLengthInputStream.java
source/net/yacy/cora/protocol/Domains.java
source/net/yacy/cora/services/federated/solr/SolrShardingConnector.java
source/net/yacy/cora/services/federated/solr/SolrSingleConnector.java
source/net/yacy/document/content/dao/PhpBB3Dao.java
source/net/yacy/document/parser/html/AbstractTransformer.java
source/net/yacy/kelondro/blob/BEncodedHeap.java
source/net/yacy/kelondro/blob/HeapReader.java
source/net/yacy/kelondro/index/RAMIndexCluster.java
source/net/yacy/kelondro/io/ByteCountInputStream.java
source/net/yacy/kelondro/logging/ConsoleOutErrHandler.java
source/net/yacy/kelondro/table/SQLTable.java
2012-05-14 07:41:55 +02:00
Michael Peter Christen
49cab2b85f
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-05-13 09:51:06 +02:00
Michael Peter Christen
0d58fea210
made multiple connector default
2012-05-12 10:39:01 +02:00
Michael Peter Christen
7740c02c56
- enhanced the solr connector
- added new multiple connector (to replace singleConnector)
2012-05-12 10:32:42 +02:00
Michael Peter Christen
0cf3d36eae
more tolerance in case of corrupted file
2012-05-11 20:46:50 +02:00
Michael Peter Christen
acc6db28ff
added missing classes for solr interface
2012-05-09 23:43:12 +02:00
Michael Peter Christen
adeb33bb36
better abstraction for solr objects
2012-05-09 17:21:19 +02:00
Michael Peter Christen
8864141872
more abstraction in solr connection classes
2012-05-09 17:00:56 +02:00
Michael Peter Christen
c00efc2717
made the solr connection more generic
2012-05-09 16:46:45 +02:00
Michael Peter Christen
ea2bd43b28
patch for broken configurations
2012-05-09 12:29:07 +02:00
Michael Peter Christen
e5ca7f22b1
enhancement in circle drawing
2012-05-09 12:28:43 +02:00
Michael Peter Christen
34f4225d7e
fewer 'wellformed' calls without asserts
2012-05-08 23:24:39 +02:00
Marc Nause
a691023d04
*) better formatting for network QPM
*) refactoring
2012-05-08 20:07:34 +02:00
Michael Peter Christen
77f8e9fb9b
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-05-04 17:29:16 +02:00
Michael Peter Christen
ba6aaabc51
refactoring + parser bugfixes
2012-05-04 17:28:27 +02:00
Michael Peter Christen
2a0434efa4
Merge commit 'c1f6b4fb5226d3d2f8b2bec9e361f6b3476e03ff'
2012-04-29 21:21:49 +02:00
Michael Peter Christen
942896fe46
removed methods not supported by new solrj connector for httpclient 4
Error was:
java.lang.UnsupportedOperationException: Client was created outside of HttpSolrServer
at org.apache.solr.client.solrj.impl.HttpSolrServer.setDefaultMaxConnectionsPerHost(HttpSolrServer.java:614)
at net.yacy.cora.services.federated.solr.SolrSingleConnector.<init>(SolrSingleConnector.java:128)
at net.yacy.cora.services.federated.solr.SolrShardingConnector.<init>(SolrShardingConnector.java:55)
at net.yacy.search.Switchboard.<init>(Switchboard.java:657)
at net.yacy.yacy.startup(yacy.java:222)
at net.yacy.yacy.main(yacy.java:1018)
2012-04-27 18:26:36 +02:00
Michael Peter Christen
22e1f68c0b
solrj user authentication patch
2012-04-27 17:53:45 +02:00
Michael Peter Christen
09484955dc
added new entry class for embed tags
2012-04-27 17:48:51 +02:00
Michael Peter Christen
62f2554a01
- fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
2012-04-27 17:46:08 +02:00
Michael Peter Christen
a6d60fc21f
concurrency enhancement in ConfigurationSet
2012-04-27 17:20:18 +02:00
Michael Peter Christen
453010bd68
- solved problems with backpath normalization
- redesigned in/outbound link handover
- removed iframe links from inbound/outbound in solr scheme
2012-04-27 16:48:51 +02:00
Michael Peter Christen
5f5ed33ed8
patch for media search (audio, video apps)
2012-04-27 14:18:02 +02:00
Michael Peter Christen
7860c1df80
fix needed for new solrj library
2012-04-27 14:13:59 +02:00
Michael Peter Christen
0e13022147
- enhanced solr field documentation
- added xml api button to IndexFederated_p - the solr schema.xml file
can be generated by YaCy
2012-04-26 15:25:07 +02:00
Michael Peter Christen
19efbf1b0f
- apply directDocByURL to NOLOAD Queue
- choose pushing to NOLOAD as default for site crawl
2012-04-26 00:23:18 +02:00
Michael Peter Christen
659178942f
- Redesigned crawler and parser to accept embedded links from the NOLOAD
queue and not from virtual documents generated by the parser.
- The parser now generates nice description texts for NOLOAD entries
which shall make it possible to find media content using the search
index and not using the media prefetch algorithm during search (which
was costly)
- Removed the media-search prefetch process from image search
2012-04-24 16:07:03 +02:00
Michael Peter Christen
a3badd3205
changed search process for images: no more media snippet load process,
show only links from index which had been on the text search page
before. This creates a superfast search process for images!
2012-04-24 12:55:58 +02:00
reger
c1f6b4fb52
lookupByIP: prevent comparing of port parameter if called with port -1 (=unknown)
2012-04-24 00:05:01 +02:00
Michael Peter Christen
f8cd57c92f
new indexing strategy: ALL links that appear anywhere are indexed, not
only links where the content can be parsed. All non-parseable links are
placed into the noload queue. The search process must therefore be able
to filter out non-text search results.
- This fixes the problem that image search results appeared in the text
search.
- The interactive search can now retrieve ALL types of links
- The p2p interface is now extended to retrieve only certain types of
links (text, image, video, apps)
- The search process has an extension to filter the right document type
according to the search query
2012-04-22 02:05:17 +02:00
Michael Peter Christen
14f67f217c
refactoring of ContentDomain: now subclass of Classification
2012-04-22 00:04:36 +02:00
Michael Peter Christen
8a08c96a82
removed dependency from logging
2012-04-21 21:32:31 +02:00
Michael Peter Christen
a1a5b015d8
refactoring: moved document Classification to cora package
2012-04-21 21:31:13 +02:00
Michael Peter Christen
33d1062c79
refactoring: the cache belongs to the crawler
2012-04-21 13:34:07 +02:00
Michael Peter Christen
4d5da75814
fix for parser problem if an <a> tag is 'within' html tags with unclosed
tags. That prevented the <a> tags from being recognized. This is a fix
for http://forum.yacy-websuche.de/viewtopic.php?p=25516#p25516
2012-04-18 10:30:04 +02:00
Michael Peter Christen
91a86f0b06
fixes to network graph testing
2012-04-17 11:46:14 +02:00
Michael Peter Christen
7b5b9baee0
added citation rank to ranking profile
2012-04-16 23:43:50 +02:00
Michael Peter Christen
046f3a7e8d
check if httpc has decompressed the release file and rename the file
from .tar.gz to .tar if that happened
2012-04-16 09:50:55 +02:00
Michael Christen
02e4dedff2
fix to url citation collection
2012-04-13 11:52:59 +02:00
Michael Christen
e32055aa15
added stub classes for
- a new database for url reference data ('seen links')
- a new database extending the references to the full url metadata
attributes set which shall replace the old metadata database once it is
finished
- migration helper classes to use old and new metadata databases
simultaneously
2012-04-13 07:09:15 +02:00
Michael Christen
ac5d124ee0
experimental implementation of a citation ranking as post-ranking
method. (ranking coefficient is fixed and needs to be made configurable)
2012-04-13 06:47:33 +02:00
Michael Christen
8fc86fe397
added storage of full anchor link structure:
the links between all pages are now stored. The same index structure as
used for the word index is used to make a reverse link index.
The new file(s) in SEGMENT/default/citation.index.*.blob store the
citation index. This will be used to create much more detailed link
structures for the YaCy apis and to create a better ranking. A ranking
using the citation.index should provide better results especially for
portal indexes and intranets.
2012-03-29 17:20:14 +02:00
Lotus
0b3f39136e
allow custom ppm lower than minimum button on /Crawler_p.html
fixes http://bugs.yacy.net/view.php?id=166
2012-03-17 20:43:19 +01:00
Michael Peter Christen
532c7cf827
added physics experiment to the graph plotter. not active by default
2012-02-28 13:18:46 +01:00