13532 Commits

Author SHA1 Message Date
luccioman
0cdee4e26a Fixed loss of "meanCount" search param when using facets or page buttons
Then on new search queries, no suggestions at all could be displayed.
2018-02-08 08:07:30 +01:00
luccioman
117a859879 Do not clear all search modifiers when unselecting one modifier.
Previously, when clicking a selected facet in the search results page to
unselect it, any other selected modifiers/facets were also
removed.
2018-02-07 15:54:46 +01:00
luccioman
33593c22e9 Fixed loss of other modifiers on keywords/tags search navigation links 2018-02-06 17:17:13 +01:00
luccioman
a9dc0874c0 Remove old query terms from search results suggestions links.
Especially when old terms were misspelled, suggestion links then
returned empty results most of the time.
2018-02-06 15:14:14 +01:00
luccioman
c71b545235 Enable results suggestions (Did you Mean) even when RWI is not enabled.
RWI is no longer necessary for suggestions processing since commit
c40ba51ca6.
Revealed by a question about spell check from ouahpiti on YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6084 ).
2018-02-06 12:33:44 +01:00
luccioman
9412881230 Added basic support for autotagging microdata annotated item types.
With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably with, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
2018-02-06 10:25:38 +01:00
luccioman
5a14d34a7d Refactoring: documented and extracted autotagging processing functions. 2018-02-02 10:27:36 +01:00
luccioman
58b9834729 Added HTML microdata typed items parsing capability.
This adds the possibility for the HTML parser to gather typed items URLs
annotated in HTML tags with itemscope and itemtype attributes (see
microdata specification https://www.w3.org/TR/microdata/ ), notably
Types from the schema.org vocabulary, but also Types/Classes from any
other vocabulary, such as the common ones listed in the RDFa core
context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).
2018-02-02 09:31:40 +01:00
luccioman
80fb1026d0 Create recrawl requests with the relevant crawl profile.
The default recrawl profile was previously used for the crawl
stacker acceptance check, but request entries were actually still created
with the "snippetGlobalText" profile.
2018-01-30 21:00:18 +01:00
luccioman
539925a275 Added a utility to generate/update XLIFF master file from lng files. 2018-01-29 18:34:47 +01:00
luccioman
41a6b052d9 Updated master and French translation for the IndexReIndexMonitor_p page 2018-01-29 16:51:00 +01:00
luccioman
fa6d030b0b Moved dbtest to the test source folder. 2018-01-29 14:03:01 +01:00
luccioman
6cd3847d0a Fixed NullPointerException case on Table init with relative file path.
Can occur for example when running dbtest with a relative test table file
name (without explicit parent folder).
2018-01-29 14:00:43 +01:00
luccioman
28883d8a71 Shutdown daemon threads at the end of dbtest 2018-01-29 13:56:37 +01:00
luccioman
929e0d6eae Replaced improper ByteBuffer.equals() implementation by Arrays.equals()
Also renamed ByteBuffer.equals() to startsWith() as this matches the
actual semantics of the function implementation.
2018-01-29 13:38:25 +01:00
luccioman
098ee63911 Added a manual performance test for the HostBalancer.
Following the report in mantis 776
(http://mantis.tokeek.de/view.php?id=776).

Running the performance test with different control parameters suggests
that YaCy's RowHandleMap, used in the balancer depthCache, is ultimately
more efficient than, for example, the ConcurrentHashMap from JDK 8.
2018-01-28 12:41:56 +01:00
luccioman
fefe2d1b6e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2018-01-28 12:30:43 +01:00
reger
5aa4fb1144 upd to metadata-extractor-2.11.0.jar 2018-01-27 18:32:45 +01:00
luccioman
46b5249c20 Removed time condition on HostBalancer initialization in JUnit test.
Its initialization in main application usage remains asynchronous.
2018-01-26 17:15:27 +01:00
luccioman
8b572b7337 Commit Solr index before simulating or starting recrawl job.
This ensures up-to-date simulation query results, and recrawl
processing.
2018-01-26 10:31:13 +01:00
luccioman
5b943c07ab Merge pull request #155 from JeremyRand/readme-typo-fixes
Fix some typos in the README.
2018-01-26 09:50:40 +01:00
JeremyRand
dea856c854 Fix some typos in the README. 2018-01-26 04:34:31 +00:00
luccioman
733cacdbb8 Revised the RDFaParser main launcher for minimal proper operation.
This parser is still not enabled in the main text parsers list. More
would have to be done to make it functional.
2018-01-25 07:57:56 +01:00
luccioman
7baa99f26f Fixed stored URL in web cache when redirection(s) occur.
Associate cached content with the last redirection location, instead of
the first URL of a redirection chain:
 - for proper base URL processing in parsers (fixes mantis 636 -
http://mantis.tokeek.de/view.php?id=636)
 - to prevent duplicated content in Solr index when recrawling a
redirected URL
2018-01-20 18:56:40 +01:00
luccioman
5e2812c060 Automatically refresh running recrawl report when JavaScript is enabled.
For users who would prefer to keep JavaScript disabled, a manual Refresh
button is still available.
2018-01-19 11:58:52 +01:00
luccioman
19903a984f Merge pull request #154 from tangdou1/master
Update Chinese translation
2018-01-19 10:18:35 +01:00
tangdou1
49d103ad16 Merge pull request #1 from tangdou1/tangdou1-patch-1
Update zh.lng
2018-01-16 17:16:14 +08:00
tangdou1
dd4f93f049 Update zh.lng
Translate some untranslated words to Chinese.
2018-01-16 17:11:07 +08:00
tangdou1
e585b4f597 Update zh.lng 2018-01-16 15:35:54 +08:00
luccioman
0fce264ba4 Set reindex page to html5 and removed presentational only html tables. 2018-01-15 18:32:34 +01:00
luccioman
83df922afc Removed unused duplicated HTML id on header hidden field 2018-01-15 17:16:54 +01:00
luccioman
9ddf92d143 Removed unnecessary reflection usage for workflow tasks.
This improves code readability and maintainability (call hierarchies are
easier to read) and possibly performance.
2018-01-15 10:05:49 +01:00
luccioman
897d3d30cc Added new recrawl job profile to the list of default crawl profiles 2018-01-15 08:30:37 +01:00
luccioman
9624516bf8 Refresh recrawl job profile threshold date like other default profiles 2018-01-15 08:06:28 +01:00
luccioman
b712a0671e Added a specific default crawl profile for the recrawl job.
- with only a light constraint on the load date of known indexed documents, as it
can already be controlled by the selection query, and the goal of the
job is precisely to recrawl selected documents now
- using the iffresh cache strategy
2018-01-13 15:46:04 +01:00
luccioman
adf3fa493d Added comments about crawl profiles recrawl cycles 2018-01-13 12:13:04 +01:00
luccioman
3638e16c2e More comprehensive log on rejected recrawls caused by date constraint 2018-01-13 12:07:56 +01:00
luccioman
d47afe6fab Use a constant for crawler reject reason prefix with specific processing 2018-01-13 10:45:00 +01:00
luccioman
4e03335625 Added more details to the recrawl job report 2018-01-12 11:47:13 +01:00
luccioman
d95d393a0d Add a query link to local Solr to browse selected recrawl candidates 2018-01-12 10:48:54 +01:00
luccioman
59f7763af6 Display recrawl job report also when job is actively running 2018-01-11 09:53:27 +01:00
luccioman
6425963cee Fixed internal tables exact value match iterator 2018-01-10 18:38:42 +01:00
luccioman
0c9e0b3566 Record recrawl calls to make them schedulable 2018-01-10 17:05:53 +01:00
luccioman
433e241e4f Added a report info box about the last terminated recrawl job, if any
For easier monitoring of recrawls.
2018-01-09 22:33:15 +01:00
luccioman
b2af25b14f Added a stop condition to the Recrawl busy thread 2018-01-09 10:22:26 +01:00
luccioman
421728d25a Made it possible to customize the selection query before launching a recrawl 2018-01-08 21:20:46 +01:00
luccioman
fab6e54fec Enforced controls (HTTP method, token) on ReIndex and ReCrawl operations 2018-01-07 15:25:16 +01:00
luccioman
36e9b1c5b3 Fixed occasional time-dependent failures of the SegmentTest test case
As highlighted by the latest automated Travis builds.
2018-01-02 10:21:07 +01:00
luccioman
8a4ea1c11e Added UI switch to control content domain constraint per search request 2018-01-02 08:13:14 +01:00
luccioman
36a45b3905 Added UI setting for strictness of content-type checking on media search 2017-12-29 11:32:42 +01:00