Commit Graph

146 Commits

Author SHA1 Message Date
sof
5cb244b79b Merge remote branch 'origin/master' 2012-10-05 18:54:39 +02:00
apfelmaennchen
88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based
on the jaudiotagger library. The parser is disabled by default because it
needs to store temporary files for non-file:// protocols, which some users
may not want. For a local MP3 collection it works well, loading Artist,
Title, Album etc. from the audio files' metadata.
2012-10-05 18:54:26 +02:00
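The temporary-file requirement in this commit can be illustrated with a small stdlib sketch. It assumes that the tag library needs a local file, so anything that is not a file:// resource is first spooled to a temporary file. `AudioTagSource`, `needsTempFile` and `localCopy` are hypothetical names, not YaCy's or jaudiotagger's API. The suffix is preserved on the assumption that the reader picks the audio format from the file extension.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not YaCy code): provides a local file for a tag reader.
final class AudioTagSource {

    /** True when the resource is not already a local file and must be copied first. */
    static boolean needsTempFile(URI location) {
        return !"file".equalsIgnoreCase(location.getScheme());
    }

    /**
     * Returns the original path for file:// URIs; otherwise copies the stream
     * into a new temporary file. The caller must delete the temporary file.
     */
    static Path localCopy(URI location, InputStream content, String suffix) throws IOException {
        if (!needsTempFile(location)) {
            return Paths.get(location);
        }
        // Keep the suffix (e.g. ".mp3"), assuming the reader infers the format from it.
        Path tmp = Files.createTempFile("yacy-audio-", suffix);
        Files.copy(content, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }
}
```

Spooling every remote file to disk is the cost that presumably led to the parser being off by default.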
sixcooler
9aa21506be bump to httpcore-4.2.2 (maintenance release) 2012-10-03 02:15:02 +02:00
Michael Peter Christen
d0015df61c added lucene memory library which is now necessary as solr has to
process more complex queries
2012-09-28 13:48:51 +02:00
Michael Peter Christen
e65cecc419 - updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
2012-09-10 10:12:38 +02:00
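The faceted-search URL in this commit message uses standard Solr parameters (`q`, `start`, `rows`, `facet`, `facet.field`). Below is a minimal stdlib sketch that builds the same request with proper URL encoding. `FacetQueryUrl` is a hypothetical helper, and the host and port are taken from the example URL above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a Solr select URL requesting facet counts on one field.
final class FacetQueryUrl {

    static String build(String base, String query, int start, int rows, String facetField) {
        // LinkedHashMap keeps the parameter order stable and readable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        params.put("facet", "true");
        params.put("facet.field", facetField);

        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return base + "?" + qs;
    }

    // URLEncoder leaves '*' as-is and turns ':' into %3A, which Solr decodes again.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Calling `FacetQueryUrl.build("http://localhost:8090/solr/select", "*:*", 0, 3, "host_s")` produces the commit's example URL, with the colon percent-encoded.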
Michael Peter Christen
ff3eaa21b0 added remote search to solr on YaCy peers!
- when doing a remote search, node peers are selected for solr queries
- the solr query is done concurrently to the standard YaCy rwi search
- the solr search result is fed into the same data structure that
prepares the rwi search result
- the same remote search that is sent to several outside peers is also run against
the local solr index
- the search process works now also without any 'old' RWI data using
solr
2012-08-20 12:16:11 +02:00
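The commit runs the solr query concurrently with the RWI search and feeds both results into one structure. The following stdlib sketch shows that pattern under simplifying assumptions: each search is modeled as a `Function` from query to URLs, and a `LinkedBlockingQueue` stands in for YaCy's shared result structure. `DualSearch` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: two index queries run concurrently and share one result sink.
final class DualSearch {

    /** One result entry, tagged with the index that produced it. */
    static final class Hit {
        private final String url;
        private final String source;

        Hit(String url, String source) {
            this.url = url;
            this.source = source;
        }

        String url() { return url; }
        String source() { return source; }
    }

    static List<Hit> search(String query,
                            Function<String, List<String>> rwiSearch,
                            Function<String, List<String>> solrSearch) throws Exception {
        // Thread-safe sink written by both searches.
        BlockingQueue<Hit> sink = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> rwi = pool.submit(() -> rwiSearch.apply(query).forEach(u -> sink.add(new Hit(u, "rwi"))));
            Future<?> solr = pool.submit(() -> solrSearch.apply(query).forEach(u -> sink.add(new Hit(u, "solr"))));
            // Wait for both; get() rethrows a failure from either search.
            rwi.get();
            solr.get();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(sink);
    }
}
```

Because both producers write into the same sink, an empty RWI result still yields solr hits. This matches the commit's note that search now works without any old RWI data.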
Michael Peter Christen
d39463a85c added deleteByQuery to solr connectors 2012-08-17 17:05:46 +02:00
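solrj exposes delete-by-query as a client call. On the wire, Solr's XML update format expresses it as a `<delete><query>…</query></delete>` message. The stdlib sketch below only builds that message body and escapes the query for XML. It does not reproduce YaCy's connector, and `DeleteByQueryXml` is a hypothetical name.

```java
// Hypothetical helper: builds a Solr XML delete-by-query update message.
final class DeleteByQueryXml {

    /** Escapes the characters that are not allowed verbatim in XML text. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Body to POST to a Solr update handler, e.g. for query "host_s:example.org". */
    static String body(String query) {
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }
}
```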
Michael Peter Christen
2ccf1dba71 upgrade to solr 3.6.1 2012-08-17 15:11:21 +02:00
Michael Peter Christen
ea49a8aa8c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-08-14 12:40:44 +02:00
Michael Peter Christen
d988ba50cf added a very rudimentary, incomplete, non-verified GSA response writer
for solr. Try this:
http://localhost:8090/gsa/searchresult?q=pdf&site=col1&num=10
2012-08-14 12:40:26 +02:00
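The GSA example URL passes `q`, `site` and `num`. A response writer like this one first has to read those parameters and translate them into a Solr request. The stdlib sketch below parses the query string. The mapping (`num` to `rows`, `start` to `start`, `*:*` when `q` is missing) is an assumption made here for illustration, not the writer's actual behavior, and `GsaRequest` is a hypothetical name.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: reads GSA-style request parameters and maps them to Solr ones.
final class GsaRequest {

    /** Splits and URL-decodes the query string of the request URI. */
    static Map<String, String> parameters(URI uri) {
        Map<String, String> out = new HashMap<>();
        String raw = uri.getRawQuery();
        if (raw == null || raw.isEmpty()) return out;
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    /** Assumed mapping for illustration only; "site" (a GSA collection) is left unmapped. */
    static Map<String, String> toSolrParams(Map<String, String> gsa) {
        Map<String, String> solr = new LinkedHashMap<>();
        solr.put("q", gsa.getOrDefault("q", "*:*"));
        if (gsa.containsKey("start")) solr.put("start", gsa.get("start"));
        if (gsa.containsKey("num")) solr.put("rows", gsa.get("num"));
        return solr;
    }
}
```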
cominch
e74d66e28c augmented browsing: remove htmlparser library 2012-08-14 10:09:46 +02:00
cominch
e2119f4e76 augmented browsing: replace htmlparser by jsoup, which is more stable
and reliable
2012-08-14 10:06:12 +02:00
Michael Peter Christen
bf4968d748 source change in classpath 2012-07-20 09:04:02 +02:00
sixcooler
a99ef68422 bump to httpclient-4.2.1 2012-07-09 18:58:33 +02:00
Michael Peter Christen
7b53be141f upgraded to pdfbox 1.7.0
changes are listed in http://www.apache.org/dist/pdfbox/1.7.0/RELEASE-NOTES.txt
and include many bugfixes, some of them performance-related
2012-06-22 16:49:58 +02:00
Michael Peter Christen
fad3b14813 added jetty libraries, needed for future use as web server and as
application server for the solr search interface
2012-06-22 15:31:17 +02:00
Michael Peter Christen
b9d42fd9c8 using com.google.common.io.Files instead of homebrew methods 2012-06-22 11:39:17 +02:00
Michael Peter Christen
1be0025a9c - added test for EmbeddedSolrConnector
- added needed libraries for this test
this includes most (possibly all) files needed for an embedded solr
2012-06-22 00:36:49 +02:00
Michael Peter Christen
90b82ce994 using guava for host resolution (non-blocking for ips) and time-out 2012-06-21 16:04:48 +02:00
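This commit uses guava to resolve host names with a time-out while avoiding a blocking lookup for IP addresses. The sketch below shows the same idea with `java.util.concurrent` in place of guava: literal IPs are parsed directly, and name lookups run on a daemon thread and are abandoned after the timeout. `TimedResolver` is a hypothetical name, and the IP-literal check is deliberately rough.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical sketch (stdlib instead of guava): host resolution with a time-out.
final class TimedResolver {
    // Rough literal check: dotted IPv4, or anything containing ':' treated as IPv6.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // Daemon threads so a hanging lookup never keeps the JVM alive.
    private static final ExecutorService LOOKUPS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-lookup");
        t.setDaemon(true);
        return t;
    });

    /** Returns the address, or null if the host is unknown or the lookup times out. */
    static InetAddress resolve(String host, long timeoutMillis) {
        try {
            if (IPV4.matcher(host).matches() || host.indexOf(':') >= 0) {
                // Well-formed IP literals are parsed locally without a DNS query.
                return InetAddress.getByName(host);
            }
            Future<InetAddress> lookup = LOOKUPS.submit(() -> InetAddress.getByName(host));
            try {
                return lookup.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                lookup.cancel(true); // the lookup may keep running; we only stop waiting
                return null;
            }
        } catch (UnknownHostException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```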
Michael Peter Christen
3f55dc7c1e - added solr core and libraries that solr needs (lucene is missing, will
follow later)
- added embedded solr connector which can connect to solr
programmatically (without using a server in between)
2012-06-21 14:55:38 +02:00
Michael Peter Christen
5fc6524ca8 - moved triple store to net.yacy.cora.lod (should be generalized there
later)
- added abstract add, delete, get methods in the triplestore
- added generation of triples after auto-annotation
- migrated all MultiProtocolURI objects to DigestURI in the parser since
the url hash is needed as subject value in the triples in the triple
store
2012-06-11 16:48:53 +02:00
cominch
5d20cd324a Add Triplestore and RDF query interface
Conflicts:
	build.xml
	defaults/yacy.init
	source/net/yacy/interaction/AugmentHtmlStream.java
2012-06-10 10:35:59 +02:00
cominch
b21048892b augmentedParser: added features and integrated an external html parser to
modify existing web pages

Conflicts:
	addon/YaCy.app/Contents/Info.plist
	build.xml
2012-06-10 10:23:35 +02:00
sixcooler
56087c1f23 bump httpclient, httpcore and httpmime to 4.2 2012-05-30 14:46:21 +02:00
Michael Peter Christen
4d3cc02168 replaced the old bzip2 library with the better-documented commons-compress
package from http://commons.apache.org/compress/
2012-05-28 23:53:48 +02:00
Michael Peter Christen
1795a7325b made HandleSet serializable 2012-05-15 12:55:15 +02:00
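Making a class serializable allows it to be written to and restored from a byte stream with the JDK's object streams. The sketch below uses a simplified, hypothetical `HandleSetSketch` (a set of fixed-length byte keys, which is an assumption about what a handle set holds) and round-trips it through `ObjectOutputStream` and `ObjectInputStream`. It is not YaCy's HandleSet.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a handle set: fixed-length byte keys, serializable.
final class HandleSetSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int keyLength;
    private final List<byte[]> handles = new ArrayList<>(); // ArrayList and byte[] are serializable

    HandleSetSketch(int keyLength) {
        this.keyLength = keyLength;
    }

    /** Adds a copy of the key; returns false if it was already present. */
    boolean put(byte[] key) {
        if (key.length != keyLength) throw new IllegalArgumentException("wrong key length");
        if (has(key)) return false;
        handles.add(key.clone());
        return true;
    }

    boolean has(byte[] key) {
        for (byte[] h : handles) if (Arrays.equals(h, key)) return true;
        return false;
    }

    int size() {
        return handles.size();
    }

    static byte[] toBytes(HandleSetSketch set) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(set);
        }
        return bytes.toByteArray();
    }

    static HandleSetSketch fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HandleSetSketch) in.readObject();
        }
    }
}
```

The explicit `serialVersionUID` keeps previously written streams readable after compatible changes to the class.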
Michael Peter Christen
62f2554a01 - fixed build problems (deprecated methods using httpclient 3.1)
- removed httpclient 3.1 lib which was used by solrj (solrj now uses
httpclient 4)
2012-04-27 17:46:08 +02:00
Michael Peter Christen
248299d10f updated solrj lib 2012-04-27 11:22:34 +02:00
Michael Peter Christen
f838997126 updated commons io from 2.0.1 to 2.1 2012-02-24 01:35:01 +01:00
Michael Peter Christen
eeb57ae824 updated http client libraries 2012-02-24 01:06:30 +01:00
Michael Peter Christen
ef5192f8c9 using the generic document parser for crawl starts instead of the html
parser. This makes it possible that every type of document can be a
crawl start point, not only text documents or html documents. Tested
this with a pdf document.
2012-01-23 17:27:29 +01:00
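The change from a fixed HTML parser to a generic document parser amounts to choosing the parser by MIME type, so that a PDF can also serve as a crawl start point. The sketch below shows that dispatch in stdlib Java. `ParserDispatch` is hypothetical, and each "parser" is reduced to a function from raw bytes to extracted text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: select a document parser by MIME type instead of always using HTML.
final class ParserDispatch {
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    void register(String mime, Function<byte[], String> parser) {
        parsers.put(mime.toLowerCase(Locale.ROOT), parser);
    }

    /** Strips parameters such as "; charset=UTF-8" and delegates to the matching parser. */
    String parse(String mime, byte[] content) {
        String key = mime.toLowerCase(Locale.ROOT);
        int semi = key.indexOf(';');
        if (semi >= 0) key = key.substring(0, semi).trim();
        Function<byte[], String> parser = parsers.get(key);
        if (parser == null) throw new IllegalArgumentException("no parser for " + key);
        return parser.apply(content);
    }
}
```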
Michael Peter Christen
a30b028cc0 updated libraries 2012-01-18 01:21:41 +01:00
Michael Christen
e69afae87e class path for servlets in eclipse 2011-12-05 12:51:13 +01:00
Al Sutton
8993cac4d8 Initial performance improvements 2011-11-30 11:15:54 +00:00
orbiter
5a7cec59f3 moved ynetSearch to get all files out of htroot/api/util/
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@8042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-11-16 00:21:56 +00:00
orbiter
65ab067491 migration to solrj 3.4.0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-14 20:08:59 +00:00
sixcooler
52b477cf6f bump to httpclient-4.1.2, httpcore-4.1.3 - bugfix release
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-08-12 17:42:32 +00:00
sixcooler
48560a44a9 bump to httpcore-4.1.2: a bugfix release
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7853 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-25 00:48:29 +00:00
orbiter
c0d9474b31 update to eclipse class path environment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-06 14:29:17 +00:00
orbiter
528b59e078 replaced the xerces.jar library that was originally added in 2005 with SVN revision 126 to the libx directory and moved to lib in SVN revision 5781
the replacement is taken from http://xerces.apache.org, has version 2.11.0, was contained in the file Xerces-J-bin.2.11.0.tar.gz
and consists of two files named xercesImpl.jar and xml-apis.jar
The original purpose of that library was to support:
- content parsers
- optional seed uploader
- SOAP API (which will be committed later)
Since the SOAP API no longer exists, the library now supports only the content parsers and an optional seed uploader

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-07-02 22:33:35 +00:00
orbiter
77fe69395d added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7774 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-05 20:04:41 +00:00
sixcooler
efcd21e0ed new httpclient, httpcore (bugfix release)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7769 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-06-02 21:34:50 +00:00
orbiter
761b1c71dc added latest pdfbox
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-05-30 14:56:36 +00:00
sixcooler
0abd99621c correct slip of click in classpath from last commit - I wonder there are 7658'is around
apfelmaennchen, please don't take this amiss

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-16 03:08:25 +00:00
apfelmaennchen
a0e4960a4d YMark:
- first attempt at a firefox json bookmark importer
- added JSON library json-simple-1.1.jar

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7658 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-15 20:58:58 +00:00
orbiter
19fd13d3bc Added federated index storage to solr.
YaCy now supports storing to remote solr indexes.
More federated storage (and search) methods may follow.

The remote index scheme is the same as produced by the SolrCell; see
http://wiki.apache.org/solr/ExtractingRequestHandler
Because this default scheme is used, the default example scheme can be used as the solr configuration.
This is also the same scheme that solr uses if documents are imported with apache tika.

Federated solr storage is switched off by default.

To use this, do the following:
- set federated.service.solr.indexing.enabled = true
- download solr from http://www.apache.org/dyn/closer.cgi/lucene/solr/
- extract the solr (3.1) package, 'cd example' and start solr with 'java -jar start.jar'
- start yacy and then start a crawler. The crawler will fill both the YaCy index and the solr index.
- to check what's in solr after indexing, open http://localhost:8983/solr/admin/

So far it is not possible for YaCy to search in that solr index.
The storage functionality is nevertheless available for two reasons:
1) to compare the functionality of Solr and YaCy and to compare the search speed
2) to use YaCy as a search appliance for people who need a crawler or other source harvesting methods
   that YaCy provides (like Dublin Core reading, Wikimedia dump reading, an RSS feed reader etc.) but still
   want to use solr instead of YaCy.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-04-14 20:05:04 +00:00
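The first setup step toggles `federated.service.solr.indexing.enabled`, and the feature stays off when the key is absent. The sketch below reads that flag, assuming for illustration that the `key = value` lines of yacy.init can be parsed with `java.util.Properties`. `FederatedSolrSwitch` is a hypothetical name. The sketch does not cover how YaCy actually loads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: reads the federated solr switch from key = value configuration text.
final class FederatedSolrSwitch {
    static final String KEY = "federated.service.solr.indexing.enabled";

    /** Missing key means "false", matching the off-by-default behavior described above. */
    static boolean enabled(String config) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(config)); // accepts "key = value" with spaces around '='
        return Boolean.parseBoolean(p.getProperty(KEY, "false").trim());
    }
}
```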
sixcooler
9199b9e3c6 also putting jcifs-1.3.15 into classpath
(let me build YaCy again :-)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-13 22:44:50 +00:00
sixcooler
45dcfa3460 update to httpclient-4.1
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-08 21:46:24 +00:00
orbiter
ca738ac924 - added a tag cloud to search results (using the topics)
- some refactoring of score classes
- added default package for new classes add_ymark and delete_ymark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-10-15 22:01:39 +00:00
sixcooler
f4357dff03 bump to httpclient-4.0.3 which fixes a number of bugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-27 13:24:40 +00:00