Commit Graph

552 Commits

Author SHA1 Message Date
Michael Peter Christen
fed26f33a8 enhanced timezone management for indexed data:
to support the new time parser and search functions in YaCy, a
high-precision detection of the date and the time of day is necessary.
That requires that the time zone of the document content and the time
zone of the user doing a search are detected. The time zone of the search
request is detected automatically from the browser's time zone offset,
which is delivered with the search request automatically and invisibly to
the user. The time zone for the content of web pages cannot be detected
automatically and must be an attribute of crawl starts. The advanced
crawl start now provides an input field to set the time zone in minutes
as an offset number. All parsers must get a time zone offset passed, so
this required a change of the parser java api. Many other changes
were made to correct the previous handling of dates in YaCy, which
wrongly added a correction based on the time zone of the server. Now no
correction is added and all dates in YaCy are in the UTC/GMT time zone, a
normalized time zone for all peers.
2015-04-15 13:17:23 +02:00
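A minimal sketch of the normalization described in the commit above: a local date and time found in a document is combined with a time zone offset in minutes and converted to UTC. The class and method names are hypothetical and not taken from the YaCy code base. The sign convention (positive means east of UTC) is an assumption, since the commit does not state it; note that JavaScript's `Date.getTimezoneOffset()` reports minutes west of UTC, so a browser-supplied value might need its sign flipped.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration; not YaCy's actual parser API.
final class TimeZoneSketch {

    // Interprets a local wall-clock time using an offset in minutes
    // (assumed: positive = east of UTC) and returns the UTC instant.
    static Instant toUtc(LocalDateTime local, int offsetMinutes) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return local.toInstant(offset);
    }

    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2015, 4, 15, 13, 17, 23);
        // 13:17:23 at +02:00 corresponds to 11:17:23 UTC
        System.out.println(toUtc(local, 120));
    }
}
```

Storing only the UTC instant means the server's own time zone never enters the calculation, which is the property the commit message asks for.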
Michael Peter Christen
6578ff3ddb enhanced suggest function 2015-02-09 18:45:07 +01:00
Michael Peter Christen
efbc9a3561 introducing a new getConfig method which parses comma-separated lists
from setting fields; refactoring for all places where such lists are
parsed
2015-01-29 01:53:36 +01:00
Michael Peter Christen
69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
split(",") was used
2015-01-29 01:46:22 +01:00
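The two commits above describe parsing comma-separated setting values with a pattern that is compiled once. The following is a rough sketch of that idea; the class name `ConfigListSketch` and the method `parseList` are hypothetical, and the trimming and skipping of empty entries are assumptions about the desired behavior, not a description of YaCy's getConfig method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual YaCy implementation.
final class ConfigListSketch {

    // Compiled once and shared by all callers
    private static final Pattern COMMA = Pattern.compile(",");

    // Splits a setting value at commas, trims each entry and drops empty ones.
    static List<String> parseList(String value) {
        List<String> items = new ArrayList<>();
        if (value == null) return items;
        for (String part : COMMA.split(value)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) items.add(trimmed);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parseList("a, b,,c "));
    }
}
```

Keeping the split logic in one place means every caller treats whitespace and empty entries the same way.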
Michael Peter Christen
3d717b749a fix for urlmaskfilter 2015-01-28 13:40:41 +01:00
reger
24f68a4eb7 refactor opensearch heuristic
introduce FederateSearchManager, which handles search heuristics for external systems via specific FederateSearchConnectors.
These provide the query() functionality, the translation to the YaCy schema (.toYaCySchema()) and the search() routine to deliver results to searchevents, which is generally implemented in the Abstract connector.
The manager now enforces a minimum delay of 15 s between calls to external systems.
Besides the OpensearchConnector, a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation.

default heuristicopensearch.conf: 
- openbdb.com removed - seems to no longer deliver results
- config via solrconnector to datacite.org added (large technical library archive)
2015-01-19 03:30:35 +01:00
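The minimum delay between calls to external systems mentioned in the commit above could be enforced roughly as follows. This is only a sketch: the class `CallDelaySketch` is hypothetical, and tracking the delay per target system (rather than globally) is an assumption, since the commit message does not say which one the manager uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the FederateSearchManager code.
final class CallDelaySketch {

    static final long MIN_DELAY_MS = 15_000L;

    // Time of the last accepted call per target system (assumed granularity)
    private final Map<String, Long> lastCall = new HashMap<>();

    // Returns true and records the call if the target was not called
    // within the last 15 seconds; returns false otherwise.
    synchronized boolean tryAcquire(String target, long nowMs) {
        Long last = lastCall.get(target);
        if (last != null && nowMs - last < MIN_DELAY_MS) return false;
        lastCall.put(target, nowMs);
        return true;
    }

    public static void main(String[] args) {
        CallDelaySketch limiter = new CallDelaySketch();
        long now = System.currentTimeMillis();
        System.out.println(limiter.tryAcquire("example-target", now));         // first call is accepted
        System.out.println(limiter.tryAcquire("example-target", now + 5_000)); // too early, rejected
    }
}
```

Passing the current time as a parameter keeps the check easy to test without waiting for real time to pass.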
reger
bb37cb32e4 Add title import for bookmark icon
if available in index
2015-01-09 01:33:45 +01:00
reger
ebe5faeb01 added url to bookmark icon link
the url is needed anyway; this saves an index lookup and works without a committed url.
Removed unused order parameter
2015-01-05 06:55:53 +01:00
reger
4ff018c9e4 fix ConfigPortal jumps to iframe focus
add focus parameter to yacysearch.html too
2015-01-04 06:57:13 +01:00
reger
0dfeee154a adjustments for Bookmark icon to act on BookmarkDB.
It currently acts on YMarks, but the YMark interface does not seem to be maintained.
For future features (e.g. query memory) BookmarkDB is the likely choice to expand, so besides the crawlstart bookmark, the result bookmark icon now also adds to BookmarkDB.
The YMark related code is (for now) left untouched so both tables are updated.
2015-01-01 02:41:20 +01:00
Michael Peter Christen
9e588944fa prevent NPE during initialization of very large vocabularies 2014-12-21 19:02:36 +01:00
Michael Peter Christen
d3e71ed070 fixes for searches when initialization of large autotagging libraries
have not been finished
2014-12-19 17:38:58 +01:00
reger
c475be2937 fix (enable) error msg on empty query 2014-11-28 22:44:33 +01:00
Michael Peter Christen
487a733c99 fix for catchall handling in search 2014-11-12 22:48:33 +01:00
Marc Nause
1e6e69bc40 Finished implementation of UPNP:
*) will try other ports if YaCy standard ports are not available
*) distinguish between internal and external port (not sure if this
works 100%)

Still to add: property in config to enter own external port (in case of
manually configured NAT)
2014-10-07 13:10:06 +02:00
Michael Peter Christen
e4ccca9497 fix for xss bugs found by CTF365 2014-10-01 12:22:55 +02:00
reger
7c1707872b search result showPicture update search parameter
used parameter &cat=image is obsolete and returns no results
- remove &cat=image and &cat=href references
- remove &tenant= references (unused)
Use contentdom=image and inurl: parameter to make the showPicture link display something (opens in a new window because the inurl modifier changes the original query)
2014-09-30 22:22:13 +02:00
Michael Peter Christen
81f9b34da7 increased ability to search for all images on a single server within
the p2p remote search
2014-09-15 20:33:22 +02:00
Michael Peter Christen
9b92685771 automatically add a wild card if only a search on a single domain is
done. This makes it possible to search all documents on a single domain
even if no search word is given. This is in particular interesting when
searching for all images on a single domain.
2014-09-15 13:38:53 +02:00
reger
1d5d0b82a6 - skip html template specific servlet post variables (show_xxx) for feeds,
- add <updated> (in required format) to atom feed
2014-09-12 02:10:18 +02:00
reger
9962b9e548 use configured search items per page if not specified in post
- remove verify=cacheonly from admin screen search box to use the configured values
  (otherwise the definition overrides the configured behavior and is used for subsequent searches, which might give unexpected or confusingly different results compared to using /yacysearch)
2014-09-10 00:52:37 +02:00
reger
ec5b1d9e33 let NETWORK_WHITELIST take precedence over NETWORK_BLACKLIST
this makes it easier to configure exceptions (for private networks),
like   blacklist= .*
        whitelist= 10\..*,127\..* .....     allows only listed ip pattern
2014-08-26 01:02:38 +02:00
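The precedence rule from the commit above can be illustrated with a small sketch. The class `NetworkFilterSketch` and its methods are hypothetical; treating each list as comma-separated regular expressions that must match the whole address is an assumption based on the example patterns in the commit message.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of whitelist-before-blacklist evaluation.
final class NetworkFilterSketch {

    // True if the address fully matches at least one pattern of the list.
    static boolean matchesAny(String ip, String patternList) {
        for (String p : patternList.split(",")) {
            String pattern = p.trim();
            if (!pattern.isEmpty() && Pattern.matches(pattern, ip)) return true;
        }
        return false;
    }

    // The whitelist is checked first, so it overrides a blacklist match.
    static boolean isAllowed(String ip, String whitelist, String blacklist) {
        if (matchesAny(ip, whitelist)) return true;
        return !matchesAny(ip, blacklist);
    }

    public static void main(String[] args) {
        String blacklist = ".*";
        String whitelist = "10\\..*,127\\..*";
        System.out.println(isAllowed("10.0.0.5", whitelist, blacklist));    // true
        System.out.println(isAllowed("192.168.1.1", whitelist, blacklist)); // false
    }
}
```

With a catch-all blacklist, only addresses that match a whitelist pattern get through, which is the private-network setup the commit describes.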
orbiter
22ce4fb4dd better error handling for remote solr queries and exists-checks 2014-08-01 11:00:10 +02:00
reger
a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
- with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid)
- this allows using the opensearch heuristic on an individual search request (in contrast to configuration HEURISTIC_OPENSEARCH=true, which sends an osd request on all global searches)
- the index.html searchoption text adjusted to be displayed only if option configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
reger
ba5a59a28d make search result also available as atom feed via /yacysearch.atom
- fix logo in rss feed
2014-07-03 22:01:13 +02:00
Michael Peter Christen
d2151857f1 Added collection navigation:
The collection field (can be filled e.g. in Crawl Start) can be used to
add categories to YaCy index entries. The usage of that field was
restricted to solr searches and post argument filters as implemented in
commit f7571386a3.
This commit extends collections to a full navigation option in the
standard YaCy search interface. The field is not active by default but
can be activated easily in the /ConfigSearchPage_p.html servlet (just
check the 'Collection' facet field). Collections can now be used for (at
least) two purposes:
- to provide search tenants (through post argument collection)
- to provide self-made category navigation
Search requests may now have (independently from switched on or off
collection facet) a "collection:<collection-name>" modifier attached;
furthermore collection names may use disjunctions using the '|' pipe
symbol. For example, this is a valid search request:
www collection:user|proxy
2014-06-15 12:11:23 +02:00
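To make the modifier syntax from the commit above concrete, here is a rough sketch that separates the search terms from the collection names in a request such as `www collection:user|proxy`. The class `CollectionModifierSketch` and its methods are hypothetical and do not reflect how YaCy's query parser is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "collection:a|b" modifier syntax.
final class CollectionModifierSketch {

    static final String PREFIX = "collection:";

    // Returns the collection names given in the query, split at '|'.
    static List<String> collections(String query) {
        List<String> result = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (!token.startsWith(PREFIX)) continue;
            for (String name : token.substring(PREFIX.length()).split("\\|")) {
                if (!name.isEmpty()) result.add(name);
            }
        }
        return result;
    }

    // Returns the query without the collection modifier tokens.
    static String searchTerms(String query) {
        List<String> terms = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (!token.isEmpty() && !token.startsWith(PREFIX)) terms.add(token);
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        String query = "www collection:user|proxy";
        System.out.println(searchTerms(query)); // www
        System.out.println(collections(query)); // [user, proxy]
    }
}
```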
reger
c798a9d1bb fix unresolved pattern in yacysearch.rss title
and rss xml error due to html & encoding in url entries
2014-06-07 03:01:26 +02:00
Michael Peter Christen
fda591695c fixed visibility of custom icon 2014-03-28 17:25:39 +01:00
Michael Peter Christen
cbdfef7ce1 changed protocol facet to show also all other counts if one facet is
selected
2014-03-27 13:29:14 +01:00
Michael Peter Christen
e3cb0ffe16 - added text/image/audio/video/app search option to new navigation bar
- changed colors of privacy selector
2014-03-23 12:29:46 +01:00
Michael Peter Christen
721178dc84 misc style bugfixes 2014-03-22 07:02:26 +01:00
Michael Peter Christen
d1091e79f8 - added stealth button to navigation menu
- more fixes to progress bar
2014-03-21 18:01:26 +01:00
Michael Peter Christen
f0f22e68bb fix for page navigation bar 2014-03-21 16:17:56 +01:00
Michael Peter Christen
617dd9c97b - added new input field in index.html
- changed progress bar in yacysearch.html
- moved pagination navigation to page bottom
- moved search term input field to headline
2014-03-21 02:42:09 +01:00
orbiter
3c8d6e1eee added adminAccount switch to ConfigAccounts_p servlet to switch on
protection of all pages; some refactoring as well
2014-03-20 22:11:49 +01:00
Michael Peter Christen
ed7ad2ef0a replaced old navbar with bootstrap pagination 2014-03-20 02:10:27 +01:00
Michael Peter Christen
92655c7fd9 - added bootstrap css framework
- adapted all YaCy administration pages to new framework
- created new search page layout (working, but still work in progress)
- old skin files are still fully applicable! (and looking good)
- target is a new style based on bootstrap examples, see /test.html
- icons in YaCy may be replaced by glyphicons (to be done)
2014-03-18 13:42:31 +01:00
orbiter
f6e441dd77 refactoring 2014-02-24 21:01:56 +01:00
Michael Peter Christen
cb2c25d930 in case that the crawler is running and the search user is the peer
admin, we expect that the user wants to check recently crawled documents;
to ensure that recent crawl results are inside the search results, we do
a soft commit here.
2014-02-11 22:02:10 +01:00
reger
f307d65dcf prepare for a language navigator
works fine to restrict language for local solrSearches.
More work needs to be done to make rwi/remote searches respect the modifier.language restriction.
2014-01-24 03:11:25 +01:00
reger
97e84439fb adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString
- since the specific Twitter & Blekko heuristics are no longer available or are redundant with OpenSearchHeuristic,
adjusted ConfigHeuristic to use OpensearchHeuristic settings only.
For this the default OSD search target list is made available (copied) by default and the other configs are removed.

- the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object,
but in most (all) cases just the query term is expected, so clarified and renamed it to QueryGoal.getQueryString which returns
just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers)

- started to adjust internal html href references from absolute to relative (currently it is mixed).
For future development we should prefer relative href targets (less trouble with context aware servlets)
2014-01-20 00:58:17 +01:00
Michael Peter Christen
f8ce7040ab remote search peer selection schema change:
- all non-dht targets (previously separated into 'robinson' for dht-like
queries and 'node' for solr queries) are now 'extra' peers, which are
queried using solr
- these extra-peers are now selected using a ranking on last-seen,
peer-tag-matches, node-peer flags, peer age, and link count. The ranking
is done using a weight and a random factor.
- the number of extra peers is 50% of the dht peers
- the dht peers now exclude too young peers to prevent bad results
during strong growth of the network
- the number of dht peers (and therefore extra-peers) is reduced when
the memory of the peer is low and/or some documents still appear in the
indexing-queue. This shall prevent a peer from deadlocks when p2p
queries are made in a fast sequence on weak hardware.
2014-01-16 17:27:14 +01:00
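The selection scheme in the commit above (a ranking built from a weight and a random factor, with the number of extra peers set to 50% of the dht peers) might look roughly like the sketch below. Everything here is hypothetical: the class and method names, the idea that the weight is a single precomputed number per peer, and the way the random factor is added to it are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not YaCy's peer selection code.
final class ExtraPeerSketch {

    static final class Peer {
        final String name;
        final double weight; // assumed to be precomputed from last-seen, tag matches, age, ...
        Peer(String name, double weight) { this.name = name; this.weight = weight; }
    }

    // The number of extra peers is 50% of the dht peers.
    static int extraPeerCount(int dhtPeerCount) {
        return dhtPeerCount / 2;
    }

    // Ranks candidates by weight plus a random component and keeps the best ones.
    static List<String> selectExtraPeers(List<Peer> candidates, int dhtPeerCount,
                                         double randomFactor, Random rnd) {
        int n = candidates.size();
        double[] score = new double[n];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // each score is computed once so that sorting sees stable values
            score[i] = candidates.get(i).weight + randomFactor * rnd.nextDouble();
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(score[b], score[a])); // highest score first
        List<String> selected = new ArrayList<>();
        int limit = Math.min(extraPeerCount(dhtPeerCount), n);
        for (int i = 0; i < limit; i++) selected.add(candidates.get(order.get(i)).name);
        return selected;
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(new Peer("a", 1.0));
        peers.add(new Peer("b", 3.0));
        peers.add(new Peer("c", 2.0));
        // with a random factor of 0 the ranking depends on the weight only
        System.out.println(selectExtraPeers(peers, 4, 0.0, new Random()));
    }
}
```

A non-zero random factor lets lower-weighted peers be picked occasionally, so the same few peers are not queried on every search.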
reger
cabe0943cd fix opensearch resultcount in yacysearch.rss
see merge request https://gitorious.org/yacy/rc1/merge_requests/24
use result count in searchtrailer.xml which is on p2p search more accurate (timing)
2014-01-04 17:14:10 +01:00
reger
8eaabb9600 remove dependency on old serverCore.java
- remaining getPortNr not needed 
  (as current release allows only to set plain integer as port,
   see ConfigBasic)
2013-12-29 02:00:44 +01:00
orbiter
d4942ad5e0 startRecord fix; this is not according to SRU definition because this
states that the first record has number 0; but +1 is not consistent with
other places where the number is used.
2013-12-28 23:34:43 +01:00
Michael Peter Christen
25f9c35033 added a patch to prevent naive search mistakes, such as the use of
regular expressions, from producing no results. A '*' followed by a dot or
any expression now causes that expression to be used as a filetype
search.
2013-12-27 00:34:55 +01:00
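A rough sketch of the rewrite described in the commit above: a search token that starts with '*' is turned into a filetype restriction. The class `WildcardFiletypeSketch` is hypothetical, and expressing the result with a `filetype:` prefix is an assumption made for illustration; the commit message does not show the internal representation.

```java
// Hypothetical illustration; not the actual YaCy query parser.
final class WildcardFiletypeSketch {

    // Turns "*.pdf" or "*pdf" into "filetype:pdf"; other tokens are unchanged.
    static String rewrite(String token) {
        if (token.length() > 1 && token.charAt(0) == '*') {
            String ext = token.substring(1);
            if (ext.startsWith(".")) ext = ext.substring(1);
            if (!ext.isEmpty()) return "filetype:" + ext;
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("*.pdf")); // filetype:pdf
        System.out.println(rewrite("yacy"));  // yacy
    }
}
```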
Michael Peter Christen
2c39b65409 fixes for searches containing stopwords. The fix restructures the
access methods for the search word sets to prevent words from being
deleted from the sets from outside the QueryGoal class.
2013-11-26 02:24:47 +01:00
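One common way to get the protection described in the commit above is to hand out only read-only views of an internal set. The sketch below shows that general technique; the class `QueryWordsSketch` and its accessor are hypothetical, and it is not known whether QueryGoal uses read-only views, copies or another approach.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration of protecting an internal word set.
final class QueryWordsSketch {

    private final Set<String> includeWords = new LinkedHashSet<>();

    QueryWordsSketch(Iterable<String> words) {
        for (String w : words) includeWords.add(w);
    }

    // Callers get a read-only view; attempts to modify it throw
    // UnsupportedOperationException, so the internal set stays intact.
    Set<String> getIncludeWords() {
        return Collections.unmodifiableSet(includeWords);
    }

    public static void main(String[] args) {
        QueryWordsSketch q = new QueryWordsSketch(Arrays.asList("the", "search"));
        try {
            q.getIncludeWords().remove("the"); // e.g. an external stopword filter
        } catch (UnsupportedOperationException e) {
            System.out.println("removal from outside rejected");
        }
        System.out.println(q.getIncludeWords().size()); // 2
    }
}
```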
orbiter
61409788eb less word hash computations (removing some overhead because of MD5
calcs) using the clear word in a normalized form.
2013-11-25 15:20:54 +01:00
Michael Peter Christen
087df05e24 added option to Config_Network_p.html to enable remote search while
DHT-Receive is switched off.
2013-11-13 13:38:01 +01:00
Michael Peter Christen
81bb50118e found and fixed a huge memory leak in solr caching (inside Solr). The
not-flushed Solr cache is now handled in this way:
- it is smaller by default
- a Solr-internal process is started to flush the cache periodically
(this does NOT clean the cache, just removes old objects)
- a Solr-external process (the standard YaCy cleanup-process) now has
direct access to the solr internal cache and flushes it completely.
The time frame for such a flush is defined by the cleanup-process
frequency, by default 10 minutes.
2013-11-07 10:01:44 +01:00