orbiter
563d584420
removed more kelondro dependencies from cora
2012-09-21 11:02:36 +02:00
orbiter
aa65282259
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-21 10:27:30 +02:00
orbiter
63762d8f89
removed kelondro dependencies from cora
2012-09-20 19:38:22 +02:00
orbiter
39564fddbd
more ignore
2012-09-20 18:45:51 +02:00
orbiter
6e0f4557f8
added ftp to getName
2012-09-20 18:29:04 +02:00
cominch
23204d2245
change parameter to support the smw extension for list import
2012-09-20 15:02:57 +02:00
Michael Peter Christen
c235d5c0f1
fixed size parsing in RSS message parser (for YaCy size parameter)
2012-09-19 06:36:07 +02:00
orbiter
089a03114e
full memory usage for debian and when changing the size: debian seems to
dislike the big difference between xmx and xms (I have crashes here
which stop if both values are the same)
2012-09-18 22:31:01 +02:00
Michael Peter Christen
5bc8f34150
fix for success query counter
2012-09-18 11:06:36 +02:00
orbiter
60b1e23f05
added new crawl options:
- indexUrlMustMatch and indexUrlMustNotMatch which can be used to select
loaded pages for indexing. The default patterns are set so that all
loaded pages are also indexed (as before), but when doing an expert crawl
start, the user may select only specific urls to be indexed.
- crawlerNoDepthLimitMatch is a new pattern that can be used to remove
the crawl depth limitation. This filter is a never-match by default (which
causes the crawl depth to be applied), but the user can select paths which
will be loaded completely even if the crawl depth is reached.
2012-09-16 21:27:55 +02:00
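The three patterns from commit 60b1e23f05 can be read as two independent decisions: whether a loaded page is indexed, and whether a url is loaded beyond the depth limit. The sketch below is hypothetical: only the pattern names come from the log, while the class, the method names, the use of full-string `matches()` and the "depth <= maxDepth" rule are assumptions for illustration. The empty string serves as a never-match pattern because, with `matches()`, it only matches an empty url.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the crawl filters named in commit 60b1e23f05.
 * Only the pattern names come from the log; class, methods and depth
 * semantics are illustrative assumptions, not YaCy's implementation.
 */
public final class CrawlFilterSketch {
    // ".*" matches every url, so by default every loaded page is indexed.
    public static final String MATCH_ALL = ".*";
    // With matches(), the empty pattern only matches the empty string,
    // so it never matches a real url: a "never-match" pattern.
    public static final String MATCH_NEVER = "";

    private final Pattern indexUrlMustMatch;
    private final Pattern indexUrlMustNotMatch;
    private final Pattern crawlerNoDepthLimitMatch;
    private final int maxDepth;

    public CrawlFilterSketch(String indexUrlMustMatch, String indexUrlMustNotMatch,
                             String crawlerNoDepthLimitMatch, int maxDepth) {
        this.indexUrlMustMatch = Pattern.compile(indexUrlMustMatch);
        this.indexUrlMustNotMatch = Pattern.compile(indexUrlMustNotMatch);
        this.crawlerNoDepthLimitMatch = Pattern.compile(crawlerNoDepthLimitMatch);
        this.maxDepth = maxDepth;
    }

    /** A loaded page is indexed only if it matches must-match and not must-not-match. */
    public boolean shouldIndex(String url) {
        return indexUrlMustMatch.matcher(url).matches()
                && !indexUrlMustNotMatch.matcher(url).matches();
    }

    /** A url is loaded within the depth limit, or beyond it if it matches the no-depth-limit pattern. */
    public boolean shouldLoad(String url, int depth) {
        return depth <= maxDepth || crawlerNoDepthLimitMatch.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Defaults: index everything that was loaded, always respect the depth limit.
        CrawlFilterSketch defaults = new CrawlFilterSketch(MATCH_ALL, MATCH_NEVER, MATCH_NEVER, 2);
        System.out.println(defaults.shouldIndex("http://example.org/a.html"));   // true
        System.out.println(defaults.shouldLoad("http://example.org/x.html", 5)); // false

        // Expert crawl: index only PDFs, but load everything below /docs/ regardless of depth.
        CrawlFilterSketch expert = new CrawlFilterSketch(".*\\.pdf", MATCH_NEVER,
                "http://example\\.org/docs/.*", 2);
        System.out.println(expert.shouldIndex("http://example.org/a.html"));            // false
        System.out.println(expert.shouldLoad("http://example.org/docs/deep/p.pdf", 5)); // true
    }
}
```

Keeping the indexing decision separate from the loading decision means a page can still be loaded (and its links followed) even when it is not indexed.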
orbiter
4987921d3d
fixed the size() method, which also counted failed pages (which are also
inside the solr index)
2012-09-16 21:22:56 +02:00
Michael Peter Christen
6ec02deec6
added new crawl attributes in crawl profile (not active yet)
2012-09-14 16:49:29 +02:00
Michael Peter Christen
a13e5153ac
- added the possibility to have not one but a list of crawl start urls
- the list of urls is entered in the expert crawl start in a textfield;
the one-line input field was replaced with a text box
- start urls can also be given in one single line where the urls are
separated by a '|'-character
- as a consequence, the crawl profile can no longer carry a single start url
for identification, because there may be several. Therefore the url was
removed from the crawl profile
- this affects all servlets which display a crawl profile: the url field
was removed from all these servlets
- to work consistently with several start urls and the other crawl
starts which computed crawl start url lists from sitelists or sitemaps,
the crawl start servlet was restructured completely
- new rules for must-match patterns were created so that site crawl starts
also work with several start urls at once
2012-09-14 12:25:46 +02:00
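Commit a13e5153ac accepts several start urls, either one per line in the text box or on one line separated by '|', and needs a must-match rule for site crawls that covers all of them. The sketch below shows one way to do both. It is hypothetical: the class and method names, the de-duplication, the "optional www." rule and the exact pattern shape are assumptions, not the servlet's actual code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch; names and the pattern shape are assumptions, not YaCy code. */
public final class StartUrlListSketch {

    /** Accepts one url per line (text box) or a single line with '|' separators. */
    public static List<String> parseStartUrls(String input) {
        Set<String> urls = new LinkedHashSet<>(); // keeps input order, drops duplicates
        for (String part : input.split("[|\\r\\n]+")) {
            String url = part.trim();
            if (!url.isEmpty()) urls.add(url);
        }
        return new ArrayList<>(urls);
    }

    /** Builds one must-match pattern that accepts urls on the host of any start url. */
    public static String siteMustMatch(List<String> startUrls) {
        List<String> alternatives = new ArrayList<>();
        for (String url : startUrls) {
            String host = URI.create(url).getHost();
            if (host == null) continue; // e.g. a url without an authority part
            host = host.toLowerCase();
            if (host.startsWith("www.")) host = host.substring(4);
            alternatives.add(Pattern.quote(host)); // dots in host names are literal
        }
        // Optional "www.", one of the hosts, optional port, then end or a path.
        return "https?://(?:www\\.)?(?:" + String.join("|", alternatives) + ")(?::\\d+)?(?:/.*)?";
    }

    public static void main(String[] args) {
        List<String> urls = parseStartUrls("http://a.org/ | http://www.b.org/x\nhttp://a.org/");
        System.out.println(urls); // [http://a.org/, http://www.b.org/x]
        String mustMatch = siteMustMatch(urls);
        System.out.println("http://b.org/y".matches(mustMatch)); // true
        System.out.println("http://c.org/".matches(mustMatch));  // false
    }
}
```

Combining the hosts into a single alternation keeps one pattern per crawl profile, which fits a profile that no longer stores an individual start url.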
Michael Peter Christen
975bc95ddf
added default facet fields for json response format (stub)
2012-09-14 12:09:20 +02:00
Michael Peter Christen
2f218df55d
added missing license headers
2012-09-14 12:06:06 +02:00
Michael Peter Christen
a30653a864
added a regular expression test servlet which is linked within the
parser/crawler error page whenever a problem with a regular expression
occurs.
This makes it easy to correct and enhance the must-match and
must-not-match patterns just by trying out which pattern could be
correct.
2012-09-14 12:04:54 +02:00
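The regular expression test servlet from commit a30653a864 lets a user try must-match and must-not-match patterns before using them. The core of such a check can be sketched in a few lines of Java. This is a hypothetical illustration, not the servlet's code: the class name, the result strings and the choice of full-string `matches()` are assumptions.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch of what a regex test page could compute; not the servlet's code. */
public final class RegexTestSketch {

    /** Tries a must-match style pattern against a url and reports the outcome. */
    public static String test(String regex, String url) {
        try {
            // Assumption: the whole url must match, not just a substring of it.
            return Pattern.compile(regex).matcher(url).matches() ? "match" : "no match";
        } catch (PatternSyntaxException e) {
            // getDescription() gives the reason without the multi-line caret display.
            return "syntax error: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(test(".*\\.pdf", "http://example.org/a.pdf"));  // match
        System.out.println(test(".*\\.pdf", "http://example.org/a.html")); // no match
        // An unbalanced parenthesis is reported as a syntax error instead of throwing.
        System.out.println(test(".*\\.(pdf", "http://example.org/a.pdf"));
    }
}
```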
Michael Peter Christen
0504b01bdc
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-14 00:48:17 +02:00
orbiter
9413f77b65
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-13 23:54:26 +02:00
orbiter
a55e77a115
added twitter search heuristic
2012-09-13 23:53:53 +02:00
Michael Peter Christen
e54ac38095
- some corrections in usage of getFile() and getFileName()
- added more attributes in json response writer according to yacy
servlet
2012-09-11 23:28:21 +02:00
Michael Peter Christen
62add1d564
added the protocol and the file name extension to the solr fields since
these fields are probably facets in file search
2012-09-11 22:46:39 +02:00
Michael Peter Christen
e072632a54
no complaints about memory if the database is empty
2012-09-11 22:28:10 +02:00
Michael Peter Christen
b846f585fa
fixed a bug with size_i field usage
2012-09-11 20:24:27 +02:00
Michael Peter Christen
9db032664e
activate two solr fields which will be used by the administration interface
(later)
2012-09-11 20:15:54 +02:00
orbiter
fcd5c7eec3
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-11 09:16:38 +02:00
orbiter
6171143b4a
added facet stub in JsonResponseWriter
2012-09-11 09:15:47 +02:00
Michael Peter Christen
e6330f648a
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-11 03:02:47 +02:00
Michael Peter Christen
e84ffdb4f3
enhanced solr writers
2012-09-11 03:02:02 +02:00
Michael Peter Christen
9644c186a4
added search functionality to ViewFile.html servlet
2012-09-11 02:03:14 +02:00
Marc Nause
03f3a8b647
*) fix for http://www.yacy-forum.org/viewtopic.php?f=2&t=759
2012-09-10 20:22:26 +02:00
Michael Peter Christen
b69ed96f0b
- added collections to yacydoc
...
- changed yacydoc.htm to yacydoc.json
- added query logging in solr and gsa search result
2012-09-10 15:20:55 +02:00
Michael Peter Christen
5df553c152
- added a json writer for solr (yes, there was already one using xslt, but this
one writes the same way as yacysearch.json)
- using the new json solr result to change the ajax search in
IndexControlURLs to the new solr search
2012-09-10 14:30:44 +02:00
Michael Peter Christen
4634f0e626
fix for images_withalt
2012-09-10 12:30:03 +02:00
Michael Peter Christen
e65cecc419
- updated lucene libraries to 3.6.1
- added lucene-grouping which enables faceted search; try this:
http://localhost:8090/solr/select?q=*:*&start=0&rows=3&facet=true&facet.field=host_s
2012-09-10 10:12:38 +02:00
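The facet query in commit e65cecc419 combines a match-all query (`q=*:*`), paging (`start`, `rows`) and a facet on `host_s`. A small, hypothetical Java helper can assemble the same request url with proper parameter encoding. The helper name and signature are assumptions; the parameter names and the example url come from the log. `URLEncoder` turns the ':' into `%3A`, which the server decodes back, so the result is equivalent to the url shown above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the facet query from commit e65cecc419; the helper itself is hypothetical. */
public final class FacetQuerySketch {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported by the JVM
        }
    }

    /** q: Solr query, start/rows: paging, facetField: field whose values are counted. */
    public static String facetQueryUrl(String base, String q, int start, int rows, String facetField) {
        return base + "/solr/select?q=" + enc(q)
                + "&start=" + start
                + "&rows=" + rows
                + "&facet=true&facet.field=" + enc(facetField);
    }

    public static void main(String[] args) {
        System.out.println(facetQueryUrl("http://localhost:8090", "*:*", 0, 3, "host_s"));
        // http://localhost:8090/solr/select?q=*%3A*&start=0&rows=3&facet=true&facet.field=host_s
    }
}
```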
Michael Peter Christen
1754fbb6d9
Merge remote-tracking branch 'reger/master'
2012-09-10 08:10:53 +02:00
Michael Peter Christen
4d29f59a27
removed warnings
2012-09-10 07:15:52 +02:00
Michael Peter Christen
8c099d2106
Merge remote-tracking branch 'origin/master'
Conflicts:
htroot/api/ymarks/import_ymark.java
source/de/anomic/data/ymark/YMarkEntry.java
source/de/anomic/data/ymark/YMarkTables.java
2012-09-10 07:05:20 +02:00
apfelmaennchen
59bd478ed1
Added more sophisticated RDF output for YMarks, including the folder
structure (b:Topic) and support for multiple tags (dc:subject) and
folders (b:hasTopic) via rdf:Bag container.
2012-09-09 22:56:24 +02:00
apfelmaennchen
d31a632951
- added dmoz RDF dump importer
- added indexing to Tables columns to support larger bookmark
collections
- added RDF output (HTTP) for public bookmarks at /YMarks.rdf
- YMarkRDF also provides a Jena RDF Model as "internal" API
- various other changes/fixes for YMarks (mainly backend)
2012-09-09 09:53:58 +02:00
reger
40d8086bf7
keep input order of translation entries within one file section.
On translation conflicts (translation of words contained in another sentence), this allows putting the shorter key at the end of the translation list.
2012-09-09 06:15:25 +02:00
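Commit 40d8086bf7 keeps translation entries in input order so that a shorter key, which also occurs inside a longer sentence, can be placed at the end. The effect of the order can be shown with a `LinkedHashMap`, which iterates in insertion order. This is a hypothetical sketch using plain literal replacement; the class name, the example strings and the replacement strategy are assumptions, not YaCy's translator code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of order-sensitive translation; not YaCy's translator code. */
public final class TranslationOrderSketch {

    /** Applies literal replacements in the map's insertion order. */
    public static String translate(String text, LinkedHashMap<String, String> entries) {
        String result = text;
        for (Map.Entry<String, String> e : entries.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shorter key first: "Search" is replaced inside the sentence, so the
        // longer entry "Search Page" no longer matches.
        LinkedHashMap<String, String> shortFirst = new LinkedHashMap<>();
        shortFirst.put("Search", "Suche");
        shortFirst.put("Search Page", "Suchseite");
        System.out.println(translate("Search Page", shortFirst)); // Suche Page

        // Shorter key at the end: the full sentence is translated first.
        LinkedHashMap<String, String> shortLast = new LinkedHashMap<>();
        shortLast.put("Search Page", "Suchseite");
        shortLast.put("Search", "Suche");
        System.out.println(translate("Search Page", shortLast)); // Suchseite
    }
}
```

A `HashMap` would make the outcome depend on hash order, which is why preserving the input order matters here.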
Michael Peter Christen
10b911eed4
Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git
2012-09-07 22:07:02 +02:00
Michael Peter Christen
be67c70a47
added Solr fields:
inboundlinks_text_chars_val
inboundlinks_text_words_val
inboundlinks_alttag_txt
outboundlinks_text_chars_val
outboundlinks_text_words_val
outboundlinks_alttag_txt
2012-09-07 22:06:51 +02:00
orbiter
d73fff0e0e
added solr field images_withalt_i
2012-09-07 21:33:45 +02:00
orbiter
66ac4076c2
added disjunction '|' option to site parameter in GSA API
2012-09-06 22:35:55 +02:00
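Commit 66ac4076c2 lets the GSA API `site` parameter contain several sites separated by '|'. One plausible translation into a Solr filter is an OR over the host field. This is a hypothetical sketch: the field name `host_s` is borrowed from the facet example in commit e65cecc419, and the mapping from `site` to that field, the quoting and the class name are all assumptions rather than the API's actual behavior.

```java
/**
 * Hypothetical sketch: maps "site=a.org|b.org" to a Solr filter on host_s.
 * The field choice and the quoting are assumptions, not the GSA API's code.
 */
public final class SiteDisjunctionSketch {

    /** Turns "a.org|b.org" into a Solr filter query that accepts either host. */
    public static String siteFilter(String siteParam) {
        StringBuilder sb = new StringBuilder();
        for (String part : siteParam.split("\\|")) {
            String host = part.trim();
            if (host.isEmpty()) continue;
            if (sb.length() > 0) sb.append(" OR ");
            // Quote the value so it is treated as one term; escape embedded quotes.
            sb.append("host_s:\"").append(host.replace("\"", "\\\"")).append('"');
        }
        return sb.length() == 0 ? "" : "(" + sb + ")";
    }

    public static void main(String[] args) {
        System.out.println(siteFilter("a.org|b.org")); // (host_s:"a.org" OR host_s:"b.org")
    }
}
```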
sixcooler
a975bcffcb
clear fulltext-cache and stop crawling if running out of memory
2012-09-06 22:10:03 +02:00
sixcooler
e78fe3f477
also do a clearcache on the solr-connector-caches
2012-09-06 22:07:07 +02:00
sixcooler
9ee2e09983
statistics for solr-cache
2012-09-06 22:02:29 +02:00
Michael Peter Christen
d8425e6809
added collections to crawl monitor
2012-09-04 14:47:53 +02:00
Michael Peter Christen
ee23fc7a32
added h1..h6 counter fields
2012-09-04 14:11:11 +02:00
Michael Peter Christen
4b36a2c3b4
small style changes
2012-09-04 11:23:41 +02:00