Commit Graph

5152 Commits

Author SHA1 Message Date
reger
ba276d3e64 add description_txt to default query fields,
Dublin Core Metadata field extracted by most parsers.
2015-02-22 05:42:04 +01:00
reger
ad1596f9ac upd lucene api doc link 2015-02-16 01:20:12 +01:00
reger
1196ff01c8 revert: formatting fix also eats up highlighting
need another solution for snippets with unwanted html code
2015-02-14 02:43:05 +01:00
reger
61f42a7928 fix formatting issue in search result display
if description contains html code
noticed e.g. for id=NmNdJ9uApLaQ  http://hswong3i.net/blog/hswong3i/virtualmin-drupal-7-x-ubuntu-12-04-howto
2015-02-13 00:20:33 +01:00
Michael Peter Christen
6578ff3ddb enhanced suggest function 2015-02-09 18:45:07 +01:00
reger
ab98f69592 fix: searchoption hint for heuristic 2015-02-08 00:15:30 +01:00
Michael Peter Christen
974d58b01f IPv6 Fix for push interface 2015-02-04 15:03:34 +01:00
Michael Peter Christen
fe50e5aef6 fix for failed selection of terms in faceted search with vocabularies 2015-02-04 11:55:27 +01:00
Michael Peter Christen
1309619a71 remove remote indexing option in crawl start if not in p2p mode 2015-02-04 11:37:07 +01:00
Michael Peter Christen
6324db1213 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2015-02-04 11:27:31 +01:00
reger
5cb05c3013 adjust table column width to not line wrap crawler traffic line 2015-02-04 03:51:34 +01:00
Michael Peter Christen
606d00c8f2 cloning a crawl now accepts the class name of vocabulary scrapers 2015-02-04 01:50:35 +01:00
reger
11b21308c0 fix: malformed filename in image search
fix for http://mantis.tokeek.de/view.php?id=533
2015-02-01 05:35:09 +01:00
reger
9e1ec5fec4 refactor: just some more usages of constant for term ":[* TO *]" 2015-02-01 04:26:33 +01:00
Michael Peter Christen
b5ac29c9a5 added a html field scraper which reads text from html entities of a
given css class and extends a given vocabulary with a term consisting
of the text content of the html class tag. Additionally, the term is
included into the semantic facet of the document. This allows the
creation of faceted search to documents without the pre-creation of
vocabularies; instead, the vocabulary is created on-the-fly, possibly
for use in other crawls. If any of the term scraping for a specific
vocabulary is successful on a document, this vocabulary is excluded for
auto-annotation on the page.

To use this feature, do the following:
- create a vocabulary on /Vocabulary_p.html (if not existent)
- in /CrawlStartExpert.html you will now see the vocabularies as column
in a table. The second column provides text fields where you can name
the class of html entities where the literal of the corresponding
vocabulary shall be scraped out
- when doing a search, you will see the content of the scraped fields in
a navigation facet for the given vocabulary
2015-01-30 13:20:56 +01:00
Michael Peter Christen
68c605d637 replace with CommonPattern.SPACE for split 2015-01-29 02:28:03 +01:00
Michael Peter Christen
1f5047b15f using precompiled pattern CommonPattern.SEMICOLON for splits 2015-01-29 02:19:41 +01:00
Michael Peter Christen
a8a2b7a803 persistency for vocabulary facet switch 2015-01-29 02:16:42 +01:00
Michael Peter Christen
efbc9a3561 introducing a new getConfig method which parses comma-separated lists
from setting fields; refactoring for all places where such lists are
parsed
2015-01-29 01:53:36 +01:00
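The list-parsing helper described in this commit could look roughly like the sketch below. The class name `ConfigListSketch`, the method name `getConfigList`, its parameters, and the use of a plain `Map` as the settings store are all assumptions for illustration; the actual YaCy method signature is not shown in this log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual YaCy implementation.
final class ConfigListSketch {

    // Compiled once and reused for every split.
    private static final Pattern COMMA = Pattern.compile(",");

    private ConfigListSketch() {}

    /**
     * Reads the setting stored under key and returns its comma-separated
     * entries, trimmed, with empty entries dropped. If the key is missing,
     * the entries of dflt are returned instead.
     */
    static List<String> getConfigList(Map<String, String> settings, String key, String dflt) {
        String raw = settings.getOrDefault(key, dflt);
        List<String> result = new ArrayList<>();
        if (raw == null) return result;
        for (String part : COMMA.split(raw)) {
            String entry = part.trim();
            if (!entry.isEmpty()) result.add(entry);
        }
        return result;
    }
}
```

Centralizing the split, trim, and empty-entry handling in one method is presumably what allows the call sites mentioned in the commit to drop their own parsing code.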
Michael Peter Christen
69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
split(",") was used
2015-01-29 01:46:22 +01:00
Michael Peter Christen
5a060c9f26 refactoring of reindexSolr (just replaced constant string) 2015-01-29 00:33:07 +01:00
Michael Peter Christen
3d717b749a fix for urlmaskfilter 2015-01-28 13:40:41 +01:00
Michael Peter Christen
2636582435 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2015-01-28 10:32:17 +01:00
reger
0260d3d800 Allow to hide linkstructure graphic in crawl monitor
using/setting the config param DECORATION_GRAFICS_LINKSTRUCTURE
2015-01-28 03:59:01 +01:00
Michael Peter Christen
bee5ee7cce removed some warnings 2015-01-27 17:00:20 +01:00
Michael Peter Christen
6390454652 fix for vocabulary on/off setting 2015-01-27 16:24:27 +01:00
Michael Peter Christen
29f6e9db7a write java version to status page 2015-01-23 17:57:54 +01:00
Michael Peter Christen
7db2888336 fixed font size and print page generation in pdf snapshots 2015-01-20 17:14:14 +01:00
reger
24f68a4eb7 refactor opensearch heuristic
introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors,
which provide the query() functionality, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector.
The manager now enforces a min 15s delay between calls to external systems.
Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation.

default heuristicopensearch.conf: 
- openbdb.com removed - seems to no longer deliver results
- config via solrconnector to datacite.org added (large technical library archive)
2015-01-19 03:30:35 +01:00
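The minimum delay between calls to external systems mentioned in this commit can be illustrated with a small gate object. This is a sketch under assumptions: the class `MinDelayGate`, its method `tryAcquire`, and the injectable clock are hypothetical names and do not reflect how FederateSearchManager is actually written.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; not the actual FederateSearchManager code.
final class MinDelayGate {

    private final long minDelayMillis;
    private final LongSupplier clock;   // supplies the current time in milliseconds
    private long lastCall;
    private boolean calledBefore = false;

    MinDelayGate(long minDelayMillis, LongSupplier clock) {
        this.minDelayMillis = minDelayMillis;
        this.clock = clock;
    }

    /**
     * Returns true and records the call time if at least minDelayMillis
     * have passed since the last permitted call; otherwise returns false
     * and leaves the recorded time unchanged.
     */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (calledBefore && now - lastCall < minDelayMillis) return false;
        lastCall = now;
        calledBefore = true;
        return true;
    }
}
```

A caller could create the gate with `new MinDelayGate(15_000, System::currentTimeMillis)` and skip the external query whenever `tryAcquire()` returns false. Passing the clock in as a parameter keeps the gate testable without real waiting.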
Michael Peter Christen
3b51636ecb fix for mediawiki import 2015-01-12 00:35:47 +01:00
Michael Peter Christen
8cafdb989a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2015-01-09 11:00:02 +01:00
reger
4214f250d0 Add option for extended search (Autosearch) to Bookmark.html asking all connected peers for the searchterm added as description to the bookmark created by the bookmark icon.
Intended for searches/research projects with insufficient results from local and DHT selected remote target peers.

Function: the process checks newly created bookmarks for a description starting with "query=..." and uses it to ask every peer for 20 search results, which are added to the local index in a background job.
A link to start/stop the process was added to /Bookmarks.html
2015-01-09 02:06:30 +01:00
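The check for bookmark descriptions starting with "query=" could be sketched as follows. The class `AutosearchSketch` and the method `extractQuery` are hypothetical names chosen for illustration; how the real background job reads bookmarks and contacts peers is not covered here.

```java
import java.util.Optional;

// Hypothetical sketch; not the actual YaCy autosearch code.
final class AutosearchSketch {

    private static final String PREFIX = "query=";

    private AutosearchSketch() {}

    /**
     * Returns the search term of a bookmark description that starts with
     * "query=", or an empty Optional if the description is null, has a
     * different prefix, or contains no term after the prefix.
     */
    static Optional<String> extractQuery(String description) {
        if (description == null || !description.startsWith(PREFIX)) {
            return Optional.empty();
        }
        String query = description.substring(PREFIX.length()).trim();
        return query.isEmpty() ? Optional.empty() : Optional.of(query);
    }
}
```

Under this sketch, a description of `query=yacy search` yields the term `yacy search`, while an ordinary description is ignored by the autosearch step.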
reger
bb37cb32e4 Add title import for bookmark icon
if avail in index
2015-01-09 01:33:45 +01:00
reger
8e751d754a - add javadoc to busythread with hint about the init parameter usage
- remove obsolete 10_httpd config parameter
2015-01-09 01:31:57 +01:00
Michael Peter Christen
0871e43fcc better scale 2015-01-06 14:22:43 +01:00
Michael Peter Christen
35c24608cc fix for division by zero (rare cases) 2015-01-06 14:21:20 +01:00
reger
4eb89d7f15 revert clickservlet
(default was indeed a mistake)
2015-01-05 09:10:20 +01:00
reger
ebe5faeb01 added url to bookmark icon link
url is needed anyway, saves index lookup and works w/o committed url.
Removed unused order parameter
2015-01-05 06:55:53 +01:00
reger
d44d8996d0 Added a “don't store remote search results” option
This is intended for peers who want to participate in the P2P network but don't wish to load/fill-up their index with metadata of every received search result. 
The DHT transfer is not affected by this option (and will work as usual, so that a peer disabling the new store to index switch still receives and holds the metadata according to DHT rules).
Downside for the local peer is that search speed will not improve if search terms are only avail. remote or by quick hits in local index.

To be able to improve the local index a Click-Servlet option was added additionally.
If switched on, all search result links point to this servlet, which forwards the user's browser (by html header) to the desired page and feeds the page to the fulltext-index.
The servlet accepts a parameter defining the action to perform (see defaults/web.xml, index, crawl, crawllinks)

The option check-boxes are placed in ConfigPortal.html
2015-01-04 11:10:45 +01:00
reger
d729386787 fix NPE in viewimage
Caused by: java.lang.NullPointerException
	at net.yacy.peers.graphics.EncodedImage.<init>(EncodedImage.java:73)
	at ViewImage.respond(ViewImage.java:156)
2015-01-04 09:12:30 +01:00
reger
4ff018c9e4 fix ConfigPortal jumps to iframe focus
add focus parameter to yacysearch.html too
2015-01-04 06:57:13 +01:00
Michael Peter Christen
5b810f6d70 Merge branch 'master' of gitorious.org:yacy/whitrs-rc1 2015-01-02 00:57:37 +01:00
Ryszard Goń
3cdbd5f5c6 Fix for progress table background not resizing
when the post-processing started/ended.
2015-01-02 00:11:32 +01:00
reger
0dfeee154a adjustments for Bookmark icon to act on BookmarkDB,
it acts on YMarks but YMark interface seems not maintained,
for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB.
The YMark related code is (for now) left untouched so both tables are updated.
2015-01-01 02:41:20 +01:00
Michael Peter Christen
513e9259f5 Merge branch 'master' of git@gitorious.org:yacy/rc1.git 2014-12-30 02:36:17 +01:00
reger
e177d69387 remove obsolete config footer option (ConfigPortal user.login)
no footer or footer-option in use

remove unused yacy.init item allowUnlimitedReceiveIndexFrom
2014-12-29 03:50:00 +01:00
Michael Peter Christen
5d4167f977 reactivated clear stacks code for termination of all crawls because this
did not work without that part of the code
2014-12-28 15:52:43 +01:00
Michael Peter Christen
ecb6a59e9e do not translate gif images into png images for thumbnails. Instead,
stream the original to the search result thumb viewer. This has two
reasons:
- animated gifs cause 100% cpu and deadlocks in the jvm gif parser; a
known bug which is obviously not yet fixed
- animated gifs now appear in the search result also as animation
2014-12-28 14:53:55 +01:00
Michael Peter Christen
d9603039ff automatically set the Q flag for smb/ftp start urls (split pdf support) 2014-12-28 14:36:43 +01:00
Michael Peter Christen
8600ea01dd automatically switch on query option in case intranet protocols (smb/ftp)
are used. This supports the new split-pdf option.
2014-12-28 14:27:42 +01:00