9637 Commits

Author SHA1 Message Date
Felix Ableitner
fd90fcc4e0 Fixes #196. 2013-07-02 20:45:41 +02:00
Michael Peter Christen
5a5d411ec0 new robots_i attribute fields 2013-07-02 14:29:13 +02:00
Michael Peter Christen
fa08bd9d5a hack to prevent long waiting times in crawler 2013-07-01 13:24:52 +02:00
Michael Peter Christen
f1c5338210 preparation for greedy crawl profiles and refactoring 2013-07-01 13:10:09 +02:00
Michael Peter Christen
e6f361f474 adding the canonical tag to crawl queues 2013-07-01 13:09:41 +02:00
orbiter
40c5ee47c1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-30 12:07:25 +02:00
orbiter
ae23a0badb updated copyright message; included LGPL for 'cora' and a warranty
warning.
2013-06-30 11:30:39 +02:00
reger
a6bf44212e bugfix: location (lat/lon) meta data retrieval (Double.NaN check) 2013-06-30 03:50:07 +02:00
Michael Peter Christen
203921006a redesign of citation index storage 2013-06-30 02:11:46 +02:00
orbiter
7c6ccc426c set crawlingQ to true by default because most webpages are dynamic and
crawlingQ should only be switched off in case of crawler traps
2013-06-29 20:28:14 +02:00
Lotus
5de4267a9d windows installer: update to latest jre 2013-06-29 18:54:30 +02:00
reger
83763ee4a4 jpeg parser: extract GPS location from meta data 2013-06-29 00:35:43 +02:00
Michael Peter Christen
e92b9275ce Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:33:29 +02:00
Michael Peter Christen
56cdcfa2fa fixed greedy learning mode - global is not a search attribute in
searchitems
2013-06-28 15:33:19 +02:00
Michael Peter Christen
32aa1d4569 removed unused option for queries 2013-06-28 15:32:36 +02:00
Michael Peter Christen
0c5bed7e2c added configuration option for greedy learning function to ConfigPortal
servlet
2013-06-28 15:31:36 +02:00
sixcooler
5d1f619f07 possible helpful closing of solr-requests 2013-06-28 15:19:50 +02:00
Michael Peter Christen
9d291764d1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-06-28 15:03:25 +02:00
sixcooler
e5abccdfe4 added optimize-option 2013-06-28 14:51:37 +02:00
Michael Peter Christen
8ea6ddf636 removed attributes from ConfigPortal.html which are redundant to
ConfigSearchPage_p.html
2013-06-28 14:17:14 +02:00
Michael Peter Christen
64140f35cd fix for solr requests if no query part is given (prevent npe) 2013-06-28 13:16:25 +02:00
Michael Peter Christen
8caaf6203a fixed false multiple-generation of remote facet search which
caused high cpu usage on remote side.
2013-06-28 12:39:36 +02:00
Michael Peter Christen
23fb458963 - fix to gsa searchresult answer in case that no query part is given
- fix to gsa default number of results (is 'num')
2013-06-28 12:22:33 +02:00
Michael Peter Christen
823ae4d6a7 added url_protocol_s to error documents 2013-06-26 16:51:36 +02:00
Michael Peter Christen
660a196989 refactoring 2013-06-26 09:27:22 +02:00
Michael Peter Christen
c4538d8d91 added metadata-extractor-2.6.2.jar to eclipse classpath, removed old lib 2013-06-26 09:26:34 +02:00
reger
3760e2616b bump up lib/metadata-extractor-2.6.2.jar (used for image parser) with needed code adjustments 2013-06-25 23:24:02 +02:00
Michael Peter Christen
9a6fcdf597 npe fix 2013-06-25 16:36:16 +02:00
Michael Peter Christen
54024958ac added url_file_name_s in query for live-search of urls 2013-06-25 16:36:05 +02:00
Michael Peter Christen
16d1d744fa added url_file_name_s in default collection schema for the file name
without the file extension. This part of the file path is removed from
the multi-field url_paths_sxt, which no longer has the file name as the
last part of the path list.

The same applies to the new fields source_file_name_s and
target_file_name_s in the webgraph schema.
2013-06-25 16:27:20 +02:00
reger
8d1c4c423d make imageparser fileextension detection case insensitive (extensions are often upper case) 2013-06-23 00:39:15 +02:00
Michael Peter Christen
f542cf7d9c fix for daterange: the to-date is inclusive 2013-06-21 15:47:12 +02:00
Michael Peter Christen
f9d859f5dc now writing image alt texts and (camelcase-)parsed urls into a text
search field for a better image retrieval
2013-06-18 16:51:56 +02:00
Michael Peter Christen
c36720d45f added daterange option to gsa api 2013-06-18 16:25:00 +02:00
Michael Peter Christen
e441a9d4c8 to avoid confusion, the gsa api is available at /search? and
/searchresult?
2013-06-18 16:22:06 +02:00
orbiter
8792e6c6e9 stub for better image indexing 2013-06-18 13:28:30 +02:00
orbiter
97f2ac9091 added hint to gsa response writer that the result comes from a yacy peer 2013-06-17 13:29:03 +02:00
orbiter
d62464f129 start of next development cycle with small version number 0.01 (as in
the past)
2013-06-17 13:28:28 +02:00
Michael Peter Christen
363e955a0c Release 1.5 2013-06-13 23:50:00 +02:00
Michael Peter Christen
14186e815e npe fix 2013-06-13 22:42:21 +02:00
Michael Peter Christen
4e3007f4a0 typo 2013-06-13 22:40:46 +02:00
Michael Peter Christen
bdf306e0a7 increased time-out for loading of seed-lists 2013-06-13 22:32:06 +02:00
Michael Peter Christen
2cb6b6bc21 added target="_blank" to shutdown links 2013-06-13 22:31:39 +02:00
orbiter
c8e94ad7c7 fix for citation search in case that the citation is very fresh 2013-06-13 18:27:57 +02:00
orbiter
57dcf68665 added a feed-back message inside the shutdown page 2013-06-13 14:44:47 +02:00
Michael Peter Christen
0600d510e1 show the citation report also in ViewFile 2013-06-13 13:22:43 +02:00
Michael Peter Christen
1a92b61d69 fixed usage of ViewFile which needs a commit before showing latest crawl
result pages.
2013-06-13 13:08:24 +02:00
Michael Peter Christen
374d2e2a52 removed warning message during crawling 2013-06-13 13:03:56 +02:00
Michael Peter Christen
570511f3c8 removed fields references_internal_id_sxt and
references_internal_url_sxt because they had been shown to be
superfluous. The citation of referrer in the host browser is possible
without them. The host browser therefore now shows not only internal
but also external referrers for each link.
2013-06-13 13:01:28 +02:00
Michael Peter Christen
fd1776a3b0 added a new 'Citations' function: each search result item can now be
explored for citations within other documents. A click on the
'Citations' link shows an analysis with all text lines in the document
each with a complete list of documents which contain the same line. A
second section shows the linking documents in ascending order of number
of citations from the original document. Because documents from
different hosts are most interesting here, they are listed at the top of
the page as possible 'copypasta' source.
2013-06-12 15:02:49 +02:00