9936 Commits

Author SHA1 Message Date
orbiter
b743e6d79f - prevent crawl filters from having empty (never-match) content
- rewrite the description of the options "Restrict to start domain(s)"
and "Restrict to sub-path(s)" to explain that the restriction
applies to all links in the link list of the option "From Link-List of
URL" if this option is selected
- allow "Restrict to sub-path(s)" if "From Link-List of URL" is
selected. This is supported in the crawl start.
2013-10-18 14:14:13 +02:00
orbiter
20bbde8665 fix for mustmatch regex computation: the result was semantically
correct, but may have contained duplicate expressions within the disjunction
of domain restrictions. This fix removes the redundant restrictions and
makes the regex shorter.
2013-10-18 13:55:37 +02:00
orbiter
f597fdb602 make it easier to filter properties (case insensitive) 2013-10-17 18:36:35 +02:00
Michael Peter Christen
c833d02cf5 fixed webgraph postprocessing (did nothing and kept repeating this...) 2013-10-16 11:49:04 +02:00
Michael Peter Christen
74d0256e93 enhanced postprocessing: fixed bugs, enable proper postprocessing also
without the harvestingkey, remove crawl profiles after postprocessing,
speed-up for clickdepth computation.
2013-10-16 11:27:06 +02:00
Michael Peter Christen
299f51cb7f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-10-16 04:26:19 +02:00
reger
e7a596afda Merge branch 'master' of git://gitorious.org/yacy/rc1.git 2013-10-16 02:28:13 +02:00
reger
37d24f3318 make use of declared static string ACTION_LOCATION 2013-10-16 02:25:39 +02:00
Michael Peter Christen
7b69c438f7 more methods for the table class 2013-10-15 16:46:59 +02:00
Michael Peter Christen
820b896146 Replaced the iframe loading from yacy.net for donations with the
loading of this iframe from the local host. To make this more flexible,
this iframe is loaded once after startup from yacy.net.
2013-10-15 16:46:06 +02:00
sixcooler
dfb73c9519 bump to httpclient-4.3.1 - a bugfix release 2013-10-14 23:32:24 +02:00
reger
0d4efabaa8 fix YaCy version string in proxy headers
(config parameter vString no longer used)
2013-10-13 17:56:53 +02:00
sixcooler
d9a02ed277 NPE fix for my last commit 2013-10-11 00:44:04 +02:00
sixcooler
61f627eb85 fix for ssl-connections from proxy-usage staying in close-wait-state
+ some extra 'close' in HttpClient
2013-10-10 20:57:37 +02:00
Michael Peter Christen
91fa99e9bb added new icon/image for latest commit 2013-10-09 22:07:59 +02:00
Michael Peter Christen
9fac9249bc - replaced 'edit' link with a clone symbol in Table_API_p since that is
what it does: it clones the crawl, it does not change the crawl.
- moved the appearance of this clone link to the type column since this
makes it visible also if the URL column is not visible.
2013-10-09 22:07:32 +02:00
Michael Peter Christen
0f6db6ad5b Merge remote-tracking branch 'jensbees/crawlexpert-post' 2013-10-09 21:32:27 +02:00
bhoerdzn
3fcf7a94c5 rolling back wrong merge 2013-10-09 21:06:11 +02:00
Jens Bertram
3252c1ec39 Merge upstream/master into crawlexpert-post 2013-10-09 20:49:14 +02:00
Michael Peter Christen
d328cc4a83 fix for didyoumean, also added more Asian alphabets 2013-10-09 16:17:50 +02:00
Michael Peter Christen
90c8577840 enhanced ranking; patches to replace old ranking 2013-10-09 15:10:03 +02:00
Jens Bertram
9f6b98d374 Merge master into crawlexpert-post 2013-10-09 14:39:20 +02:00
bhoerdzn
6e33be4ce6 reverting local changes to project.xml 2013-10-09 14:23:06 +02:00
bhoerdzn
a3824dfbaa check URL on initial load, if set 2013-10-09 13:52:44 +02:00
bhoerdzn
52f49d475b add a hidden field for "crawlingstart" since jQuery omits the submit button value 2013-10-09 13:38:20 +02:00
bhoerdzn
b0c0ec2dec link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler" 2013-10-09 12:55:42 +02:00
bhoerdzn
d64d45361c use integer types for boolean values 2013-10-09 12:42:04 +02:00
bhoerdzn
eda123d6fd remove debugging code intercepting post requests 2013-10-09 11:51:07 +02:00
bhoerdzn
5057f27bbd fix typo in parsing "cachePolicy" parameter 2013-10-09 11:41:15 +02:00
bhoerdzn
98f5c9018d Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load. 2013-10-09 11:32:17 +02:00
bhoerdzn
a6a62986d4 correct state handling for country code restriction 2013-10-09 10:42:35 +02:00
bhoerdzn
4066b85155 correctly set initial state for load filters 2013-10-09 10:36:08 +02:00
bhoerdzn
8c91c3e7cd set form boolean values to 0 & 1 instead of false & true 2013-10-09 10:05:51 +02:00
bhoerdzn
c27fabc88e fixed wrong parameter check 2013-10-09 10:00:16 +02:00
bhoerdzn
2214bf5396 Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation. 2013-10-09 09:48:00 +02:00
Michael Peter Christen
1b61bd40ed - Added new solr field url_file_name_tokens_t which stores the file name
tokens. This can be used to enhance the ranking.
- Added also a rating_i field as basis for later usage.
- enhanced the tokenization process.
2013-10-08 23:48:13 +02:00
orbiter
6efa7532d2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-10-08 19:04:57 +02:00
orbiter
5f5a97bafc added the anchor text within web pages to the searchable entities of a
web page. This can be of benefit for the ranking if these fields are
used for boosts.
2013-10-08 18:41:07 +02:00
orbiter
705b3338ee list more fields available for search and for ranking boosts 2013-10-08 18:15:35 +02:00
sixcooler
d536092fe4 fix false filling of NAME_CACHE_MISS-DNS-Cache in case of a timeout,
e.g. caused by massive requests when crawling from a file
2013-10-08 18:02:42 +02:00
bhoerdzn
405878182f Use list template for all other option lists. Fixed some template expressions. 2013-10-08 15:04:31 +02:00
bhoerdzn
8e74098cd4 Use list template for "reloadIfOlderNumber". 2013-10-08 13:26:09 +02:00
bhoerdzn
52bad7b908 Dynamic toggling of form fields, based on passed-in and selected values. This will also cut down the post string by disabling unneeded fields. 2013-10-08 13:24:27 +02:00
Michael Peter Christen
78e7aadb26 removed unused initialization method 2013-10-07 23:51:28 +02:00
Michael Peter Christen
e56aa4fe93 fixed search navigation 2013-10-07 23:51:08 +02:00
Michael Peter Christen
4fbc4740df removed warnings 2013-10-07 23:41:50 +02:00
Lotus
202a9fbdad adding synonyms from German OpenThesaurus ready for use in YaCy 2013-10-07 22:02:42 +02:00
Michael Peter Christen
21aa6a0321 migration to Solr 4.5.0 2013-10-07 17:09:40 +02:00
bhoerdzn
45cf553bc3 try to guess default crawling mode, if none set 2013-10-07 13:13:22 +02:00
bhoerdzn
b4f0c822f2 assign strings before checking contents 2013-10-07 13:01:39 +02:00