Commit Graph

564 Commits

Author SHA1 Message Date
reger
8c9684cc45 optimize surftip data load,
double load (index, loader) not necessary, getMetadata already sufficient
+ lng file adjustments
2016-05-08 05:27:19 +02:00
reger
4765e374e6 altered calc. of search result items per page to display
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
  - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
  - otherwise requests are limited to 100 results per page ( = search option, index.html)
      (this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)

see http://mantis.tokeek.de/view.php?id=627
2016-01-13 01:30:49 +01:00
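The limit rules described in commit 4765e374e6 can be sketched roughly as follows. This is a minimal illustration, not YaCy's actual code: the class ItemsPerPage, the method resolve() and its parameters are hypothetical names. Only the limits 100 and 1000 and the "configured default is used as is" rule come from the commit message.

```java
// Hypothetical sketch of the items-per-page rules; not taken from the YaCy sources.
final class ItemsPerPage {
    static final int LIMIT_WITH_SNIPPETS = 100;     // same as the search option in index.html
    static final int LIMIT_WITHOUT_SNIPPETS = 1000; // older, more generous grant

    /**
     * @param requested         value from the request, or null if none was given
     * @param configuredDefault value configured in ConfigPortal_p.html
     * @param onlineSnippets    true if snippets are calculated online
     */
    static int resolve(Integer requested, int configuredDefault, boolean onlineSnippets) {
        if (requested == null) {
            return configuredDefault; // used as is, no hardcoded limit
        }
        int limit = onlineSnippets ? LIMIT_WITH_SNIPPETS : LIMIT_WITHOUT_SNIPPETS;
        return Math.min(requested, limit);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, 250, true)); // configured default
        System.out.println(resolve(500, 10, true));   // capped request
        System.out.println(resolve(500, 10, false));  // within the older grant
    }
}
```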
reger
abd8ecb503 remove contentdom-dependent override of search result items per page
initially introduced e4570bffaf (diff-ae6c130fc11088c830b00ed9256ab56b)
(as one part of unexpected difference in actual vs requested results, partial bugfix for http://mantis.tokeek.de/view.php?id=627 )
2016-01-12 01:04:10 +01:00
reger
e8256bb3b1 remove blekko from opensearch config (not available)
see https://blekko.com/
http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633
2016-01-04 04:49:10 +01:00
reger
28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
2015-12-13 20:01:49 +01:00
reger
020630efd8 remove unused network scanner parameter from queryparameter
Search event is not using networkscanner 
(removed filterscannerfail param always init to false)
2015-12-13 02:50:08 +01:00
reger
a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt() 2015-10-31 23:09:03 +01:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a pre-defined bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every category. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
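The directory layout required by commit df3314ac1a can be checked with a small loader like the one below. It is only a sketch under assumptions: the class ContextLayout and the method load() are hypothetical, UTF-8 encoding is assumed, and the classifier itself is not shown. The rules it enforces (one text file per category, one document per line, a mandatory negative.txt plus at least one custom category) are taken from the commit message.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical loader for one context directory, e.g. DATA/CLASSIFICATION/<context>/.
final class ContextLayout {
    /** Maps each category (file name without ".txt") to its documents, one per line. */
    static Map<String, List<String>> load(Path contextDir) throws IOException {
        Map<String, List<String>> categories = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(contextDir, "*.txt")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String category = name.substring(0, name.length() - ".txt".length());
                categories.put(category, Files.readAllLines(file, StandardCharsets.UTF_8));
            }
        }
        if (!categories.containsKey("negative")) {
            throw new IllegalStateException("missing negative.txt in " + contextDir);
        }
        if (categories.size() < 2) {
            throw new IllegalStateException("need at least one custom category in " + contextDir);
        }
        return categories;
    }
}
```

A context directory with the files negative.txt and sports.txt would yield the two categories "negative" and "sports".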
Michael Peter Christen
dbbad23e12 removed warnings 2015-08-03 05:37:34 +02:00
Michael Peter Christen
1fec7fb3c1 suppress access to solr when doing search suggestions in case that the
index has more than two million documents. This protects the index from
being flooded with search requests that cannot be resolved before the
real search query has to be computed.
2015-06-24 13:02:12 +02:00
reger
b47267b79c precaution against NPE on createorgetBookmark on search result 2015-05-07 03:25:19 +02:00
reger
8a5b8f8789 on bookmarking of search result, remember orig. query in separate bookmark property
(instead of using the description field)
- adjust display and autosearch
- don't overwrite existing bookmark but combine info
2015-05-03 02:31:50 +02:00
Michael Peter Christen
fed26f33a8 enhanced timezone management for indexed data:
to support the new time parser and search functions in YaCy a high
precision detection of date and time on the day is necessary. That
requires that the time zone of the document content and the time zone of
the user doing a search are detected. The time zone of the search
request is determined automatically from the browser's time zone offset, which
is delivered with the search request automatically and invisibly to the
user. The time zone for the content of web pages cannot be detected
automatically and must be an attribute of crawl starts. The advanced
crawl start now provides an input field to set the time zone in minutes
as an offset number. All parsers must get a time zone offset passed, so
this required a change of the parser Java API. Many other changes
were made to correct the wrong handling of dates in YaCy, which
was to add a correction based on the time zone of the server. Now no
correction is added and all dates in YaCy are UTC/GMT time zone, a
normalized time zone for all peers.
2015-04-15 13:17:23 +02:00
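The normalization described in commit fed26f33a8 (a local date plus a time zone offset in minutes becomes a UTC date) can be illustrated with java.time. This is a sketch only: the class TimezoneNormalizer and the method toUtc() are hypothetical, and the sign convention (offset = local time minus UTC, so +120 means UTC+2) is an assumption that the commit message does not specify.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helper; the sign convention of offsetMinutes is an assumption.
final class TimezoneNormalizer {
    /**
     * @param localTime     date and time as written in the document or entered by the user
     * @param offsetMinutes local time minus UTC, in minutes (e.g. +120 for UTC+2)
     * @return the same moment as a UTC instant
     */
    static Instant toUtc(LocalDateTime localTime, int offsetMinutes) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(offsetMinutes * 60);
        return localTime.toInstant(offset);
    }

    public static void main(String[] args) {
        // 13:17:23 local time at UTC+2 corresponds to 11:17:23 UTC.
        System.out.println(toUtc(LocalDateTime.of(2015, 4, 15, 13, 17, 23), 120));
    }
}
```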
Michael Peter Christen
6578ff3ddb enhanced suggest function 2015-02-09 18:45:07 +01:00
Michael Peter Christen
efbc9a3561 introducing a new getConfig method which parses comma-separated lists
from setting fields; refactoring for all places where such lists are
parsed
2015-01-29 01:53:36 +01:00
Michael Peter Christen
69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
split(",") was used
2015-01-29 01:46:22 +01:00
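The idea behind the two commits above (a precompiled comma pattern, and one shared routine for parsing comma-separated setting values) might look like the sketch below. The class CommaLists and the method parse() are hypothetical names, and trimming entries and skipping empty ones are assumptions. CommonPattern.COMMA is only mirrored here by a local constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual getConfig implementation.
final class CommaLists {
    // Compiled once and reused, in the spirit of CommonPattern.COMMA.
    static final Pattern COMMA = Pattern.compile(",");

    /** Splits a setting value into trimmed, non-empty entries. */
    static List<String> parse(String value) {
        List<String> result = new ArrayList<>();
        if (value == null) {
            return result;
        }
        for (String part : COMMA.split(value)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("a, b,,c "));
    }
}
```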
Michael Peter Christen
3d717b749a fix for urlmaskfilter 2015-01-28 13:40:41 +01:00
reger
24f68a4eb7 refactor opensearch heuristic
introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors,
which provide the query() functionality, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector.
The manager now enforces a min 15s delay between calls to external systems.
Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses an additional config file for fieldname translation.

default heuristicopensearch.conf: 
- openbdb.com removed - no longer seems to deliver results
- config via solrconnector to  datacite.org added (large technical library archive)
2015-01-19 03:30:35 +01:00
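The minimum delay between calls to external systems mentioned in commit 24f68a4eb7 could be enforced with a small gate like this one. It is a sketch under assumptions, not the FederateSearchManager code: the class MinDelayGate is hypothetical, the delay is tracked per target (the commit message does not say whether the delay is global or per target), and the current time is passed in to keep the example testable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rate gate; per-target tracking is an assumption.
final class MinDelayGate {
    private final long minDelayMillis;
    private final Map<String, Long> lastCallMillis = new HashMap<>();

    MinDelayGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /** Returns true and records the call if the target may be queried at nowMillis. */
    synchronized boolean tryAcquire(String target, long nowMillis) {
        Long last = lastCallMillis.get(target);
        if (last != null && nowMillis - last < minDelayMillis) {
            return false; // too early, skip this external query
        }
        lastCallMillis.put(target, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        MinDelayGate gate = new MinDelayGate(15_000L);
        long now = System.currentTimeMillis();
        System.out.println(gate.tryAcquire("example-target", now));          // first call passes
        System.out.println(gate.tryAcquire("example-target", now + 1_000L)); // rejected, only 1s later
    }
}
```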
reger
bb37cb32e4 Add title import for bookmark icon
if avail in index
2015-01-09 01:33:45 +01:00
reger
ebe5faeb01 added url to bookmark icon link
url is anyway needed, saves index lookup and works w/o committed url.
Removed unused order parameter
2015-01-05 06:55:53 +01:00
reger
4ff018c9e4 fix ConfigPortal jumps to iframe focus
add focus parameter to yacysearch.html too
2015-01-04 06:57:13 +01:00
reger
0dfeee154a adjustments for Bookmark icon to act on BookmarkDB,
it acts on YMarks but YMark interface seems not maintained,
for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB.
The YMark related code is (for now) left untouched so both tables are updated.
2015-01-01 02:41:20 +01:00
Michael Peter Christen
9e588944fa prevent NPE during initialization of very large vocabularies 2014-12-21 19:02:36 +01:00
Michael Peter Christen
d3e71ed070 fixes for searches when initialization of large autotagging libraries
have not been finished
2014-12-19 17:38:58 +01:00
reger
c475be2937 fix (enable) error msg on empty query 2014-11-28 22:44:33 +01:00
Michael Peter Christen
487a733c99 fix for catchall handling in search 2014-11-12 22:48:33 +01:00
Marc Nause
1e6e69bc40 Finished implementation of UPNP:
*) will try other ports if YaCy standard ports are not available
*) distinguish between internal and external port (not sure if this
works 100%)

Still to add: property in config to enter own external port (in case of
manually configured NAT)
2014-10-07 13:10:06 +02:00
Michael Peter Christen
e4ccca9497 fix for xss bugs found by CTF365 2014-10-01 12:22:55 +02:00
reger
7c1707872b search result showPicture update search parameter
used parameter &cat=image is obsolete and returns no results
- remove &cat=image and &cat=href references
- remove &tenant= references (unused)
Use contentdom=image and inurl: parameter to make showPicture link display something (open in new window because of used inurl modifier changes original query)
2014-09-30 22:22:13 +02:00
Michael Peter Christen
81f9b34da7 increased ability to search for all images on a single server within
the p2p remote search
2014-09-15 20:33:22 +02:00
Michael Peter Christen
9b92685771 automatically add a wild card if only a search on a single domain is
done. This makes it possible to search all documents on a single domain
even if no search word is given. This is in particular interesting when
searching for all images on a single domain.
2014-09-15 13:38:53 +02:00
reger
1d5d0b82a6 - skip html template specific servlet post variables (show_xxx) for feeds,
- add <updated> (in required format) to atom feed
2014-09-12 02:10:18 +02:00
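Atom expects the <updated> element to carry an RFC 3339 timestamp. A minimal sketch of producing such a value in Java is shown below. The class AtomDates and the method updatedElement() are hypothetical, and dropping fractional seconds is a choice made for this example, not something the commit states.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for an Atom <updated> element.
final class AtomDates {
    /** Formats a timestamp (milliseconds since the epoch) as an Atom <updated> element in UTC. */
    static String updatedElement(long epochMillis) {
        // Instant.toString() yields an ISO-8601 instant such as 2014-09-12T00:10:18Z.
        Instant instant = Instant.ofEpochMilli(epochMillis).truncatedTo(ChronoUnit.SECONDS);
        return "<updated>" + instant + "</updated>";
    }

    public static void main(String[] args) {
        System.out.println(updatedElement(System.currentTimeMillis()));
    }
}
```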
reger
9962b9e548 use configured search items per page if not specified in post
- remove verify=cacheonly from admin screen search box to use the configured values
  (otherwise the definition overwrites the configured behavior and is used for following searches, which might give unexpected/confusing different results compared to using /yacysearch )
2014-09-10 00:52:37 +02:00
reger
ec5b1d9e33 let NETWORK_WHITELIST take precedence over NETWORK_BLACKLIST
this makes it easier to config exception (for private networks),
like   blacklist= .*
        whitelist= 10\..*,127\..* .....     allows only listed ip pattern
2014-08-26 01:02:38 +02:00
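The precedence rule from commit ec5b1d9e33 can be sketched as follows. The class NetworkFilter and its methods are hypothetical, the patterns are assumed to be comma-separated regular expressions as in the example above, and allowing addresses that match neither list is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter: a whitelist match wins over any blacklist match.
final class NetworkFilter {
    private final List<Pattern> whitelist;
    private final List<Pattern> blacklist;

    NetworkFilter(String whitelistCsv, String blacklistCsv) {
        this.whitelist = compile(whitelistCsv);
        this.blacklist = compile(blacklistCsv);
    }

    // Note: a plain comma split breaks patterns that themselves contain commas.
    private static List<Pattern> compile(String csv) {
        List<Pattern> patterns = new ArrayList<>();
        for (String entry : csv.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    boolean isAllowed(String ip) {
        for (Pattern p : whitelist) {
            if (p.matcher(ip).matches()) return true;  // whitelist takes precedence
        }
        for (Pattern p : blacklist) {
            if (p.matcher(ip).matches()) return false;
        }
        return true; // assumption: unlisted addresses are allowed
    }

    public static void main(String[] args) {
        NetworkFilter filter = new NetworkFilter("10\\..*,127\\..*", ".*");
        System.out.println(filter.isAllowed("10.0.0.5")); // whitelisted
        System.out.println(filter.isAllowed("8.8.8.8"));  // caught by blacklist .*
    }
}
```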
orbiter
22ce4fb4dd better error handling for remote solr queries and exists-checks 2014-08-01 11:00:10 +02:00
reger
a2cb366b25 Combine /heuristic search modifier with opensearch configured targets
- with search modifier /heuristic a request is sent to all configured opensearch target systems (old /heuristic/blekko modifier no longer valid)
- this allows use of the opensearch heuristic on individual search requests (in contrast to configuration HEURISTIC_OPENSEARCH=true, which sends an osd request on all global searches)
- the index.html searchoption text adjusted to be displayed only if option configured
- add Archive-It to predefined systems
2014-07-20 00:00:43 +02:00
reger
ba5a59a28d make search result also avail. as atom feed via /yacysearch.atom
- fix logo in rss feed
2014-07-03 22:01:13 +02:00
Michael Peter Christen
d2151857f1 Added collection navigation:
The collection field (can be filled e.g. in Crawl Start) can be used to
add categories to YaCy index entries. The usage of that field was
restricted to solr searches and post argument filters as implemented in
commit f7571386a3.
This commit extends collections to a full navigation option in the
standard YaCy search interface. The field is not active by default but
can be activated easily in the /ConfigSearchPage_p.html servlet (just
check the 'Collection' facet field). Collections can now be used for (at
least) two purposes:
- to provide search tenants (through post argument collection)
- to provide self-made category navigation
Search requests may now have (independently from switched on or off
collection facet) a "collection:<collection-name>" modifier attached;
furthermore collection names may use disjunctions using the '|' pipe
symbol. For example, this is a valid search request:
www collection:user|proxy
2014-06-15 12:11:23 +02:00
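How a "collection:<collection-name>" modifier with '|' disjunctions might be extracted from a query string is sketched below. This is not the YaCy query parser: the class CollectionModifier and its methods are hypothetical, and the handling of whitespace around the modifier is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the collection modifier in a search request.
final class CollectionModifier {
    private static final Pattern MODIFIER = Pattern.compile("(?:^|\\s)collection:(\\S+)");

    /** Returns the collection names of the first modifier, or an empty list if there is none. */
    static List<String> collections(String query) {
        Matcher m = MODIFIER.matcher(query);
        if (!m.find()) {
            return new ArrayList<>();
        }
        return Arrays.asList(m.group(1).split("\\|")); // '|' separates alternatives
    }

    /** Returns the query with every collection modifier removed. */
    static String withoutModifier(String query) {
        return MODIFIER.matcher(query).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String query = "www collection:user|proxy";
        System.out.println(collections(query));
        System.out.println(withoutModifier(query));
    }
}
```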
reger
c798a9d1bb fix unresolved pattern in yacysearch.rss title
and rss xml error due to html & encoding in url entries
2014-06-07 03:01:26 +02:00
Michael Peter Christen
fda591695c fixed visibility of custom icon 2014-03-28 17:25:39 +01:00
Michael Peter Christen
cbdfef7ce1 changed protocol facet to show also all other counts if one facet is
selected
2014-03-27 13:29:14 +01:00
Michael Peter Christen
e3cb0ffe16 - added text/image/audio/video/app search option to new navigation bar
- changed colors of privacy selector
2014-03-23 12:29:46 +01:00
Michael Peter Christen
721178dc84 misc style bugfixes 2014-03-22 07:02:26 +01:00
Michael Peter Christen
d1091e79f8 - added stealth button to navigation menu
- more fixes to progress bar
2014-03-21 18:01:26 +01:00
Michael Peter Christen
f0f22e68bb fix for page navigation bar 2014-03-21 16:17:56 +01:00
Michael Peter Christen
617dd9c97b - added new input field in index.html
- changed progress bar in yacysearch.html
- moved pagination navigation to page bottom
- moved search term input field to headline
2014-03-21 02:42:09 +01:00
orbiter
3c8d6e1eee added adminAccount switch to ConfigAccounts_p servlet to switch on
protection of all pages; some refactoring as well
2014-03-20 22:11:49 +01:00
Michael Peter Christen
ed7ad2ef0a replaced old navbar with bootstrap pagination 2014-03-20 02:10:27 +01:00
Michael Peter Christen
92655c7fd9 - added bootstrap css framework
- adapted all YaCy administration pages to new framework
- created new search page layout (working, but still work in progress)
- old skin files are fully applicable! (and looking good)
- target is a new style based on bootstrap examples, see /test.html
- icons in YaCy may be replaced by glyphicons (to be done)
2014-03-18 13:42:31 +01:00
orbiter
f6e441dd77 refactoring 2014-02-24 21:01:56 +01:00