Commit Graph

8886 Commits

Author SHA1 Message Date
Michael Peter Christen
1b474139dd used the new zip writer/reader to add a solr dump process: the whole
solr index can be written to a zip dump and also restored during runtime
2012-09-24 17:05:28 +02:00
Michael Peter Christen
4a3e684f8c added a directory-to-zip writer and zip-to-directory reader 2012-09-24 17:04:37 +02:00
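The directory-to-zip writer and zip-to-directory reader named in this commit can be illustrated with a minimal sketch. This is not the YaCy implementation (which is Java); it is a Python illustration using the standard `zipfile` module, and the function names `directory_to_zip`, `zip_to_directory` and `round_trip_demo` are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def directory_to_zip(source_dir: Path, zip_path: Path) -> None:
    """Write every file below source_dir into zip_path, using relative paths."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store entries relative to source_dir so the dump is relocatable.
                archive.write(path, path.relative_to(source_dir).as_posix())


def zip_to_directory(zip_path: Path, target_dir: Path) -> None:
    """Restore the archive content below target_dir (created if missing)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)


def round_trip_demo() -> dict:
    """Dump a small directory tree to a zip file and restore it elsewhere."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "index"
        (source / "segments").mkdir(parents=True)
        (source / "segments" / "data.txt").write_text("hello", encoding="utf-8")
        directory_to_zip(source, root / "dump.zip")
        restored = root / "restored"
        zip_to_directory(root / "dump.zip", restored)
        # Map each restored relative path to its content.
        return {
            p.relative_to(restored).as_posix(): p.read_text(encoding="utf-8")
            for p in restored.rglob("*")
            if p.is_file()
        }
```

Note that this sketch only stores files, so empty directories are not preserved in the dump.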
Michael Peter Christen
d9ebf4a40f a bit more logging 2012-09-24 15:01:44 +02:00
Michael Peter Christen
5683162bd3 simplifications in DHT Distribution class and more documentation 2012-09-24 12:01:09 +02:00
Michael Peter Christen
e57bf2ca39 simplified DHT classes 2012-09-24 01:04:39 +02:00
orbiter
a053b356ee added new classes to renovate the YaCy protocol based on simple data
structures in cora:
- added the Peer object, which is a fresh version of Seed
- added the Peers object, which is a fresh version of Network
- added the Network api access class to retrieve a list of peers based
on the Network.xml servlet in all YaCy peers.
2012-09-22 11:10:11 +02:00
orbiter
14897d4bfc fixed a mistake in the wt-option which caused the yacy json format
to overlap the solr built-in json format
2012-09-21 21:38:50 +02:00
Michael Peter Christen
8219a445f3 refactoring 2012-09-21 16:46:57 +02:00
Michael Peter Christen
f879a344e7 fix for no depth limit default value 2012-09-21 16:05:17 +02:00
Michael Peter Christen
fa7f6f0be8 added HostBrowser servlet (stub) 2012-09-21 15:48:40 +02:00
Michael Peter Christen
00c1c777fa refactoring 2012-09-21 15:48:16 +02:00
orbiter
563d584420 removed more dependencies in cora from kelondro 2012-09-21 11:02:36 +02:00
orbiter
aa65282259 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-21 10:27:30 +02:00
orbiter
63762d8f89 removed kelondro dependencies from cora 2012-09-20 19:38:22 +02:00
orbiter
39564fddbd more ignore 2012-09-20 18:45:51 +02:00
orbiter
6e0f4557f8 added ftp to getName 2012-09-20 18:29:04 +02:00
cominch
23204d2245 change parameter to support the smw extension for list import 2012-09-20 15:02:57 +02:00
Michael Peter Christen
c235d5c0f1 fixed size parsing in RSS message parser (for YaCy size parameter) 2012-09-19 06:36:07 +02:00
orbiter
089a03114e full memory usage for debian and when changing the size: debian seems to
dislike the big difference between xmx and xms (I have crashes here
which stop if both values are the same)
2012-09-18 22:31:01 +02:00
Michael Peter Christen
5bc8f34150 fix for success query counter 2012-09-18 11:06:36 +02:00
orbiter
60b1e23f05 added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. Default patterns are in such a way that all
loaded pages are also indexed (as before) but when doing an expert crawl
start, then the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (so
the depth limit applies), but the user can select paths which will
be loaded completely even if the crawl depth is reached.
2012-09-16 21:27:55 +02:00
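How the three patterns described in this commit could interact is sketched below. This is an illustration in Python, not YaCy's Java code: the function names, the use of full-string matching, and the concrete default patterns (`.*` for match-all, `(?!)` for never-match) are assumptions made for the example.

```python
import re

MATCH_ALL = ".*"      # assumed default: every loaded page is indexed
MATCH_NEVER = "(?!)"  # a pattern that can never match (assumed default)


def should_index(url, must_match=MATCH_ALL, must_not_match=MATCH_NEVER):
    """Index a loaded page only if it matches must_match and not must_not_match."""
    return (re.fullmatch(must_match, url) is not None
            and re.fullmatch(must_not_match, url) is None)


def within_depth(url, depth, max_depth, no_depth_limit_match=MATCH_NEVER):
    """Apply the depth limit unless the url matches the no-depth-limit pattern."""
    if re.fullmatch(no_depth_limit_match, url) is not None:
        return True  # matching paths are loaded completely
    return depth <= max_depth
```

With the defaults, `should_index` accepts every URL and `within_depth` reduces to the plain depth check, which corresponds to the "as before" behaviour the message describes.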
orbiter
4987921d3d fixed the size() method which also counted failed pages (which are also
inside the solr index)
2012-09-16 21:22:56 +02:00
Michael Peter Christen
6ec02deec6 added new crawl attributes in crawl profile (not active yet) 2012-09-14 16:49:29 +02:00
Michael Peter Christen
a13e5153ac - added the possibility to have not one but a list of crawl start urls
- the list of urls is entered in the expert crawl start in a textfield;
the one-line input field was replaced with a text box
- start urls can also be given in one single line where the urls are
separated by a '|'-character
- as an effect, the crawl profile cannot carry a single start url for
identification because it is possible to have more. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: removed the
url field from all these servlets
- to work consistently with several start urls and the other crawl
starts which computed crawl start url lists from sitelists or sitemaps,
the crawl start servlet was restructured completely
- new rules for must-match patterns were created to make it possible
that site crawl starts also work with several crawl starts at once
2012-09-14 12:25:46 +02:00
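The input format described in this commit (several start URLs, either one per line or separated by a '|' character on a single line) could be parsed roughly as follows. This is a hypothetical Python sketch, not the crawl start servlet itself; the function name `parse_start_urls` and the choice to drop blank entries and duplicates are assumptions.

```python
def parse_start_urls(text):
    """Split text-box input into a list of start urls.

    Accepts one url per line as well as several urls on one line
    separated by '|'. Blank entries and duplicates are dropped (assumption).
    """
    urls = []
    for line in text.splitlines():
        for candidate in line.split("|"):
            candidate = candidate.strip()
            if candidate and candidate not in urls:
                urls.append(candidate)
    return urls
```

For example, `parse_start_urls("http://a.example/|http://b.example/")` and the same two URLs entered on separate lines both yield a two-element list.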
Michael Peter Christen
975bc95ddf added default facet fields for json response format (stub) 2012-09-14 12:09:20 +02:00
Michael Peter Christen
2f218df55d added missing license headers 2012-09-14 12:06:06 +02:00
Michael Peter Christen
a30653a864 added a regular expression test servlet which is linked within the
parser/crawler error page whenever a problem with a regular expression
occurs.
This makes it easy to correct and enhance the must-match and
must-not-match patterns just by trying out which pattern could be
correct.
2012-09-14 12:04:54 +02:00
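The idea of trying out a pattern against a URL can be sketched as a small helper. This is an assumption-laden Python illustration, not the servlet from this commit: the name `check_pattern` and the returned status strings are hypothetical, and Python's `re` syntax differs in details from the Java regular expressions that YaCy uses.

```python
import re


def check_pattern(pattern, url):
    """Return (status, detail) for a pattern tried against a url.

    status is "invalid" if the pattern does not compile, otherwise
    "match" or "no match" for a full-string match against the url.
    """
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Report the syntax problem instead of failing the whole request.
        return ("invalid", str(exc))
    if compiled.fullmatch(url) is not None:
        return ("match", "")
    return ("no match", "")
```

Reporting a syntax error as a value rather than an exception lets a caller show the problem next to the input field and let the user correct the pattern.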
Michael Peter Christen
0504b01bdc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-14 00:48:17 +02:00
orbiter
9413f77b65 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-13 23:54:26 +02:00
orbiter
a55e77a115 added twitter search heuristic 2012-09-13 23:53:53 +02:00
Michael Peter Christen
e54ac38095 - some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
62add1d564 added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
2012-09-11 22:46:39 +02:00
Michael Peter Christen
e072632a54 no complaints about memory if the database is empty 2012-09-11 22:28:10 +02:00
Michael Peter Christen
b846f585fa fixed a bug with size_i field usage 2012-09-11 20:24:27 +02:00
Michael Peter Christen
9db032664e activate two solr fields which will be used by administration interface
(later)
2012-09-11 20:15:54 +02:00
orbiter
fcd5c7eec3 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-11 09:16:38 +02:00
orbiter
6171143b4a added facet stub in JsonResponseWriter 2012-09-11 09:15:47 +02:00
Michael Peter Christen
e6330f648a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-09-11 03:02:47 +02:00
Michael Peter Christen
e84ffdb4f3 enhanced solr writers 2012-09-11 03:02:02 +02:00
Michael Peter Christen
9644c186a4 added search functionality to ViewFile.html servlet 2012-09-11 02:03:14 +02:00
Marc Nause
03f3a8b647 *) fix for http://www.yacy-forum.org/viewtopic.php?f=2&t=759 2012-09-10 20:22:26 +02:00
Michael Peter Christen
b69ed96f0b - added collections to yacydoc
- changed yacydoc.htm to yacydoc.json
- added query logging in solr and gsa search result
2012-09-10 15:20:55 +02:00
Michael Peter Christen
5df553c152 - added a json writer for solr (yes there was one using xslt but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
2012-09-10 14:30:44 +02:00
Michael Peter Christen
4634f0e626 fix for images_withalt 2012-09-10 12:30:03 +02:00
Michael Peter Christen
e65cecc419 - updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
2012-09-10 10:12:38 +02:00
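The faceted query URL quoted in this commit can be assembled from its parameters. The sketch below uses Python's standard `urllib.parse.urlencode`; the function name `facet_query_url` and its defaults are hypothetical and simply mirror the parameters in the example URL above.

```python
from urllib.parse import urlencode


def facet_query_url(base, query="*:*", start=0, rows=3, facet_field="host_s"):
    """Build a Solr select URL that requests facet counts for one field."""
    params = [
        ("q", query),
        ("start", start),
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    # Keep '*' and ':' unescaped so the match-all query stays readable.
    return base + "?" + urlencode(params, safe="*:")
```

Calling `facet_query_url("http://localhost:8090/solr/select")` reproduces the URL given in the commit message.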
Michael Peter Christen
1754fbb6d9 Merge remote-tracking branch 'reger/master' 2012-09-10 08:10:53 +02:00
Michael Peter Christen
4d29f59a27 removed warnings 2012-09-10 07:15:52 +02:00
Michael Peter Christen
8c099d2106 Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/api/ymarks/import_ymark.java
	source/de/anomic/data/ymark/YMarkEntry.java
	source/de/anomic/data/ymark/YMarkTables.java
2012-09-10 07:05:20 +02:00
apfelmaennchen
59bd478ed1 Added more sophisticated RDF output for YMarks, including the folder
structure (b:Topic) and support for multiple tags (dc:subject) and
folders (b:hasTopic) via rdf:Bag container.
2012-09-09 22:56:24 +02:00
apfelmaennchen
d31a632951 - added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
2012-09-09 09:53:58 +02:00