Commit Graph

601 Commits

Author SHA1 Message Date
luccioman
6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
	htroot/yacysearchitem.java
	source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
	source/net/yacy/search/schema/CollectionConfiguration.java
	source/net/yacy/server/serverObjects.java
2016-10-14 11:29:55 +02:00
luccioman
b3b75b0498 Accessibility : add a customizable alternative text to YaCy log
Applied W3C recommendations :
https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image
and
https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems
2016-09-22 16:08:33 +02:00
reger
35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
there should be no need for reindex by the update
2016-07-23 18:36:43 +02:00
Marc Nause
1f7013a1e3 removed unused properties in default config (CGI capabilities of YaCy's
HTTPd have been removed many moons ago)
2016-07-21 21:36:00 +02:00
luccioman
893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2016-07-04 21:24:40 +02:00
Michael Peter Christen
634e48309b another peer list update 2016-07-04 11:02:36 +02:00
Michael Peter Christen
16420e5507 added another principal peer 2016-07-03 22:50:50 +02:00
luccioman
6e96c7341a Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/Load_MediawikiWiki.java
	htroot/Load_PHPBB3.java
	htroot/ViewImage.java
2016-07-03 18:59:00 +02:00
JeremyRand
433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.) 2016-05-20 20:17:51 -05:00
reger
ef24593347 delete obsolete SEARCHRESULT busythread constants
not used since 29.05.2013 18:27:27
0c1a018bbd
2016-05-04 01:30:10 +02:00
reger
8410536f75 keep svnRevision in .init for convert of .conf until release >1.83 2016-03-20 18:12:55 +01:00
reger
726ebee65a include Version config string in yacy.init (replacing svnRevision) 2016-03-20 03:42:33 +01:00
Michael Peter Christen
f4591b1b51 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2016-03-11 18:12:38 +01:00
Michael Peter Christen
1ce38fdaed 0n - added experimental zeronet network which supports intranet peers
(still needs work)
2016-03-11 08:55:51 +01:00
Michael Peter Christen
d05ffa1c51 update to seed list 2016-03-11 07:20:38 +01:00
reger
16724c1283 remove unused proxyCookieWhiteList from yacy.init 2016-03-11 01:14:54 +01:00
luc
3cc5619d93 Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
2016-02-02 09:57:54 +01:00
Michael Peter Christen
5d635879f8 Merge pull request #40 from Scarfmonster/autocrawl
Automatic crawling
2016-01-14 22:19:55 +01:00
Ryszard Goń
7d6e0d8470 Add missing settings to autocrawl settings page 2016-01-14 03:27:33 +01:00
Ryszard Goń
a98c395023 Add the Autocrawl thread 2016-01-14 00:50:23 +01:00
reger
4765e374e6 altered clac. of search result items per page to display
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
  - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
  - otherwise requests are limited to 100 results per page ( = search option, index.html)
      (this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)

see http://mantis.tokeek.de/view.php?id=627
2016-01-13 01:30:49 +01:00
Ryszard Goń
1728cd30c6 Create autocrawl profiles 2016-01-12 16:28:34 +01:00
reger
e8256bb3b1 remove blekko from opensearch config (not available)
see https://blekko.com/
http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633
2016-01-04 04:49:10 +01:00
reger
a5faf73afa remove obsolete yacy.init entries interaction.*
(related to removed triplestore)
2015-12-29 15:41:19 +01:00
sixcooler
dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master' 2015-12-28 23:20:42 +01:00
reger
e84d94f8ca fix mime table for ms office / open office documents
(causing wrong parser detect in intranet mode)
2015-12-22 17:48:24 +01:00
reger
15e46b2bad exclude in/outboundlinksnofollowcount_i from default schema fields
(not used in any function)
2015-12-19 21:25:08 +01:00
luc
8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
local index. See mantis #626.
2015-12-16 02:20:03 +01:00
luc
55a4d15775 Added a note on deprecated default search field and operator. 2015-12-14 23:55:12 +01:00
reger
b2c8bc0ae6 remove md5_s from default index fields
it is not assigned a value / not used
Due to above also excluded from transfer protocol.
2015-11-27 02:41:02 +01:00
sixcooler
f5a9948860 do not store subfield *_coordinate 2015-11-10 20:32:42 +01:00
sixcooler
fca353e5eb set startuptype of most solr handlers to lazy 2015-11-10 20:32:05 +01:00
reger
c720b4c249 remove override of dynamicField coordinate_p in solr schema
(coordinate_p is not a mandatory field as such doesn't need to be declared as schema.field)
2015-10-24 22:44:28 +02:00
reger
f0b5bc93a3 remove obsolete yacy.init entry "secureHttps"
not used anywhere
2015-10-19 03:47:28 +02:00
reger
5e45f1a460 enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field 2015-09-01 21:47:25 +02:00
sixcooler
87e4abe393 fight the fieldcache by usind DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in an huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DovValues where it is possible.
For this I used the Api-Scheme as new basis für the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these setting will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
reger
250f6457f0 remove exired domain titan.deep-one.in from bootstrap.seedlist 2015-08-26 23:58:08 +02:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a pre-definied bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every categroy. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
Michael Peter Christen
e1cd9c0dba added another default network / commented out 2015-07-09 16:25:11 +02:00
reger
00d2062813 Rem depreciated AdminHandlers in solrconfig.xml
avoid warning log
W  org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/"  class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
2015-07-01 00:58:23 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes had been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
had been updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend to re-index everything.
Please use the index export before you do the update to a large
surrogate xml file. After the update, start with an empty index and then
initialize this with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
9c12555be5 added link to Snapshots in search results if the snapshot exists and
option is set in ConfigSearchPage_p
(this is a stub: we also need a visualization of pdf files!)
2015-06-07 20:37:37 +02:00
reger
6bc8a9b11e make Quality of Service Servlet available to prioritize requests from local host
This assigns priorities to incoming requests. Higher priority numbers are served before lower.
(disabled by default in defaults/web.xml, 
uncomment or copy entry to DATA/Settings/web.xml)
2015-04-26 04:29:32 +02:00
Michael Peter Christen
b060ba900d added parsing of contentprop attribute in html tags for
content='startDate' and content='endDate'. The value of these field is
now written to new solr fields startDates_dts and endDates_dts.
2015-04-13 16:20:00 +02:00
Michael Peter Christen
4cb4f67f38 added parsing of dd, dt and article html fields. The parsed result is
written to special solr fields which are deactivated by default.
2015-04-12 22:02:45 +02:00
Michael Peter Christen
36e9cdb376 testing switching off cold searchers; maybe this brings performance
enhancements when using large facets
2015-04-07 13:14:41 +02:00
Michael Peter Christen
535f1ebe3b added a new way of content browsing in search results:
- date navigation

The date is taken from the CONTENT of the documents / web pages, NOT
from a date submitted in the context of metadata (i.e. http header or
html head form). This makes it possible to search for documents in the
future, i.e. when documents contain event descriptions for future
events.

The date is written to an index field which is now enabled by default.
All documents are scanned for contained date mentions.
To visualize the dates for a specific search results, a histogram
showing the number of documents for each day is displayed. To render
these histograms the morris.js library is used. Morris.js requires also
raphael.js which is now also integrated in YaCy.

The histogram is now also displayed in the index browser by default.

To select a specific range from a search result, the following modifiers
had been introduced:
from:<date>
to:<date>
These modifiers can be used separately (i.e. only 'from' or only 'to')
to describe an open interval or combined to have a closed interval. Both
dates are inclusive. To select a specific single date only, use the
'to:' - modifier.

The histogram shows blue and green lines; the green lines denot weekend
days (saturday and sunday).

Clicking on bars in the histogram has the following reaction:
1st click: add a from:<date> modifier for the date of the bar
2nd click: add a to:<date> modifier for the date of the bar
3rd click: remove from and date modifier and set a on:<date> for the bar
When the on:<date> modifier is used, the histogram shows an unlimited
time period. This makes it possible to click again (4th click) which is
then interpreted as a 1st click again (sets a from modifier).

The display feature is NOT switched on by default; to switch it on use
the /ConfigSearchPage_p.html servlet.
2015-03-02 04:30:10 +01:00
reger
ba276d3e64 add description_txt to default query fields,
Dublin Core Metadata field extracted by most parsers.
2015-02-22 05:42:04 +01:00
reger
fe6f5a395d fix Umlaut handling in blekko heuristic search term
http://mantis.tokeek.de/view.php?id=169
observation: blekko seams to block xxxbot agents (=0 results)
2015-02-08 23:40:33 +01:00
Michael Peter Christen
97ba5ddbb7 configuration option for maxload limit for remote search 2015-02-04 01:12:25 +01:00