9081 Commits

Author SHA1 Message Date
Michael Peter Christen
2e7219f9fd removed highlighting of search results within collections in GSA
interface
2012-11-09 16:25:24 +01:00
Michael Peter Christen
074dfd297b added icons and a selection for hosts with urls pending for crawler or
with errors
2012-11-09 16:24:56 +01:00
Michael Peter Christen
f07e5fb553 release 1.2 2012-11-07 23:14:45 +01:00
Michael Peter Christen
4c4e0eece2 added new submenu 'Target Analysis' with three servlets which are useful
to analyse the target servers: robots.txt table, mass target analysis
and a regex tester
2012-11-07 21:26:01 +01:00
Michael Peter Christen
61995d508e do the commit anyway before calling a search interface 2012-11-07 17:27:50 +01:00
Michael Peter Christen
842faf96a2 fixed media search 2012-11-07 17:27:13 +01:00
Michael Peter Christen
86ec199126 using a better file name 2012-11-07 16:39:49 +01:00
Michael Peter Christen
93001586a0 removed warnings, removed too-fast pausing of crawls 2012-11-07 15:37:14 +01:00
Michael Peter Christen
8041742e48 added matching of path to query pattern 2012-11-07 15:06:13 +01:00
Michael Peter Christen
8b1c9cba3d fixed a problem with non-terminating crawls 2012-11-07 15:05:44 +01:00
Michael Peter Christen
61a1d32356 fix to ftp client 2012-11-07 14:58:28 +01:00
Michael Peter Christen
5105256927 update to search result logging (this was a remaining issue from the
solr 4.0.0 migration)
2012-11-07 14:15:27 +01:00
Michael Peter Christen
570e42c4e3 fix for filetype navigator 2012-11-07 13:53:29 +01:00
Michael Peter Christen
71ed8e5e07 bugfixes for crawler 2012-11-07 12:52:19 +01:00
Michael Peter Christen
29fbbb49dc better colors for host browser and corrected document count 2012-11-07 12:23:21 +01:00
Michael Peter Christen
12c0db20e5 fixed npe for surrogate import 2012-11-07 02:46:51 +01:00
Michael Peter Christen
6244b084cd fixed wrong order of result count values 2012-11-07 02:29:33 +01:00
Michael Peter Christen
631b08e7e2 update to HostBrowser 2012-11-07 02:17:24 +01:00
Michael Peter Christen
51f420e4f5 removed location search because it is only working in special cases 2012-11-07 02:04:41 +01:00
Michael Peter Christen
52df6ee369 more logging 2012-11-07 02:04:08 +01:00
Michael Peter Christen
158732af37 automatically delete entries from the crawl profile list if crawl is
terminated.
2012-11-07 02:03:44 +01:00
Michael Peter Christen
15d1460b40 added information about the reason for pausing crawls 2012-11-06 15:21:56 +01:00
Michael Peter Christen
2371ef031c added solr faceted search support to YaCy search results
added solr highlighting / YaCy snippets to YaCy search results
- facets are now much more complete
- facets are computed and searched much faster
- snippet computation is done by solr if solr knows the snippet
2012-11-06 14:32:08 +01:00
Michael Peter Christen
b30a7162fa added more thread-renaming for search processes 2012-11-06 12:31:23 +01:00
Michael Peter Christen
900445d8e9 set the thread name during solr queries to the solr query to get better
debugging options
2012-11-06 11:48:04 +01:00
Michael Peter Christen
d481abd087 added the visualization of error-urls to host browser
- only visible for admins
- a faceted search generates a huge list for all hosts in the host list
- the faceted search algorithms had to be modified for that
- within the browsing of the directory path, the error cause is written
to the url which is presented as error-url
- the errors are also accumulated for directory sums
2012-11-06 00:29:37 +01:00
Michael Peter Christen
a15819fbec fix for some interface problems 2012-11-05 22:14:52 +01:00
Michael Peter Christen
791e1dcfdf when a new crawl is started, delete all entries about error-urls for
crawl-start domains
2012-11-05 22:14:27 +01:00
Michael Peter Christen
c6a6f4c4e6 added a hack which makes the HostBrowser more performant when the given
host has a lot of urls. If the number of urls is > 1000 and the root
path is selected, then the list of documents is restricted to those
which have no subpath. However, this can cause a problem if no documents
exist on the root path but only on paths below that root path.
2012-11-05 18:57:21 +01:00
Michael Peter Christen
619bf7e875 fixed filetype modifier for media types in text search 2012-11-05 18:08:00 +01:00
Michael Peter Christen
97f82994a6 automatically pause the crawler if there is a problem with solr 2012-11-05 16:34:42 +01:00
Michael Peter Christen
64ac2b7b7d new submenu template 2012-11-05 15:36:42 +01:00
Michael Peter Christen
5e77801aac update to web interface structure 2012-11-05 15:23:03 +01:00
Michael Peter Christen
8fb370d9f8 renovated the way search results are counted. should be correct now... 2012-11-05 03:19:28 +01:00
Michael Peter Christen
7bec253bb0 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2012-11-04 09:21:58 +01:00
Michael Peter Christen
d88eb657fd Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 2012-11-04 09:21:21 +01:00
orbiter
354ef8000d - added 'deleteold' option to crawler which causes documents that are
selected by a crawl filter (host or subpath) to be deleted
- site crawl uses this option by default now
- made option to deleteDomain() concurrency
2012-11-04 02:58:26 +01:00
Michael Peter Christen
19d1f474ce host browser now also shows the number of pending files per subdirectory +
bugfixes
2012-11-02 14:40:02 +01:00
Michael Peter Christen
75dd706e1b update to HostBrowser:
- time-out after 3 seconds to speed up display (may be incomplete)
- showing also all links from the balancer queue in the host list (after
the '/') and in the result browser view with tag 'loading'
2012-11-02 13:57:43 +01:00
Michael Peter Christen
e2c4c3c7d3 migration to solr 4.0.0 2012-11-02 12:29:48 +01:00
Michael Peter Christen
b764de424a code cleanup 2012-11-02 10:28:32 +01:00
Michael Peter Christen
69aa39d664 update to libraries required by solr 4.0.0 2012-11-02 10:27:44 +01:00
Michael Peter Christen
9330ad4838 - fixed the delete option in host browser
- added a delete method which can be used to delete a full subpath in
solr.
2012-11-02 01:22:31 +01:00
Michael Peter Christen
a63179f3f9 added the MIME attribute for the R tag in GSA search result writer 2012-11-02 00:14:29 +01:00
Michael Peter Christen
40df2fd193 added the host browser as link to search results. that means you can
select a browsing position after a search is done on the search results.
2012-11-01 21:38:05 +01:00
Michael Peter Christen
1168d09de8 more refactoring - integrated the code of SnippetProcess into
SearchEvent
2012-11-01 17:40:06 +01:00
Michael Peter Christen
6629e37685 tried to clean up the search process mess 2012-11-01 17:16:43 +01:00
Michael Peter Christen
c5f67a5d6d fixed a problem with local search from solr results: now all results
from solr are shown (again)
2012-11-01 10:22:22 +01:00
sixcooler
02957d5982 missing license-files
(sorry, I didn't commit these files by mistake)
2012-10-31 23:47:08 +01:00
Michael Peter Christen
16216c2344 added missing libraries 2012-10-31 23:29:47 +01:00