luccioman
fb3032c530
Added the possibility to filter crawled documents on their Media Type (MIME type)
2018-03-23 10:28:19 +01:00
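A minimal Java sketch of what such a filter can look like. It assumes a pair of must-match / must-not-match regular expressions applied to the media type once its parameters (such as charset) are stripped. The class name and constructor parameters are hypothetical and are not the YaCy crawl profile API.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical crawl filter on document media types (not YaCy's actual API). */
class MediaTypeFilterSketch {
    // Both patterns are applied to the whole media type (Matcher.matches semantics)
    private final Pattern mustMatch;
    private final Pattern mustNotMatch;

    MediaTypeFilterSketch(String mustMatchRegex, String mustNotMatchRegex) {
        this.mustMatch = Pattern.compile(mustMatchRegex);
        this.mustNotMatch = Pattern.compile(mustNotMatchRegex);
    }

    /** Reduces a Content-Type value such as "text/html; charset=UTF-8" to "text/html". */
    static String mediaType(String contentType) {
        int semicolon = contentType.indexOf(';');
        String type = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        // Media types are case-insensitive, so normalize before matching
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Accepts a document when its media type matches mustMatch and not mustNotMatch. */
    boolean accepts(String contentType) {
        String type = mediaType(contentType);
        return mustMatch.matcher(type).matches() && !mustNotMatch.matcher(type).matches();
    }

    public static void main(String[] args) {
        MediaTypeFilterSketch filter = new MediaTypeFilterSketch("text/.*|application/pdf", "text/css");
        System.out.println(filter.accepts("text/html; charset=UTF-8")); // true
        System.out.println(filter.accepts("text/css"));                 // false
        System.out.println(filter.accepts("image/png"));                // false
    }
}
```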
luccioman
e45afedee4
Added support for enclosures (media links) to the RSS loader
2018-03-21 08:22:29 +01:00
luccioman
aaefd5219c
Reduce log verbosity of RSS loader on feed items with no link
2018-03-20 10:09:17 +01:00
luccioman
cf62b571bd
Added RSS reader support for the enclosure feed item sub-element.
...
The enclosure element (see
http://www.rssboard.org/rss-specification#ltenclosuregtSubelementOfLtitemgt
) is commonly found, for example, in podcast feeds.
2018-03-20 07:38:29 +01:00
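A minimal sketch of reading the enclosure attributes with the JDK DOM parser. The RSS 2.0 specification defines three required attributes on enclosure: url, length (in bytes) and type (MIME type). The class names and the sample feed are hypothetical; this is not the YaCy RSS reader code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: reads the first RSS 2.0 item enclosure with the JDK DOM parser. */
class EnclosureSketch {

    /** Holds the three attributes the RSS 2.0 specification requires on enclosure. */
    static final class Enclosure {
        final String url;
        final long length; // size in bytes, -1 when the attribute is missing
        final String type; // MIME type, e.g. audio/mpeg

        Enclosure(String url, long length, String type) {
            this.url = url;
            this.length = length;
            this.type = type;
        }
    }

    static final String SAMPLE = "<rss version=\"2.0\"><channel><title>Podcast</title>"
            + "<item><title>Episode 1</title>"
            + "<enclosure url=\"http://www.example.com/ep1.mp3\" length=\"24986239\" type=\"audio/mpeg\"/>"
            + "</item></channel></rss>";

    /** Returns the enclosure of the first item that has one, or null when no item has any. */
    static Enclosure firstEnclosure(String rssXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparsable RSS document", e);
        }
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            NodeList enclosures = ((Element) items.item(i)).getElementsByTagName("enclosure");
            if (enclosures.getLength() > 0) {
                Element enclosure = (Element) enclosures.item(0);
                // getAttribute returns "" when the attribute is absent
                String length = enclosure.getAttribute("length").trim();
                return new Enclosure(enclosure.getAttribute("url"),
                        length.isEmpty() ? -1L : Long.parseLong(length),
                        enclosure.getAttribute("type"));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Enclosure enclosure = firstEnclosure(SAMPLE);
        System.out.println(enclosure.url + " " + enclosure.length + " " + enclosure.type);
    }
}
```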
luccioman
e5f5de0fc7
Added some JavaDoc to the RSSMessage class.
2018-03-19 11:15:31 +01:00
luccioman
0d7625ecfb
Handle Solr field restriction and aliasing in the YaCy HTML and exml writers
...
This makes it easier, for example, to read the full metadata of the local
Solr index in HTML, optionally restricted to some fields of interest.
See the Solr documentation about the 'fl' (Field List) parameter at
https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefl_FieldList_Parameter
2018-03-16 11:35:42 +01:00
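As an illustration, a sketch building a Solr select URL that restricts the returned fields and renames one of them with the fl alias syntax (alias:field). The endpoint path, port, the wt=html value and the sku/title field names are assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: builds a Solr select URL with an fl field list. */
class SolrFieldListSketch {

    /** Builds "base?k1=v1&k2=v2" with URL-encoded keys and values, in insertion order. */
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> param : params.entrySet()) {
            query.add(encode(param.getKey()) + "=" + encode(param.getValue()));
        }
        return query.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        // Return only the sku and title fields, exposing sku under the alias "url"
        params.put("fl", "url:sku,title");
        params.put("wt", "html"); // assumed writer name for the HTML output
        System.out.println(buildUrl("http://localhost:8090/solr/select", params));
    }
}
```

Running main prints http://localhost:8090/solr/select?q=*%3A*&fl=url%3Asku%2Ctitle&wt=html, since URLEncoder keeps '*' but encodes ':' and ','.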
luccioman
3da2739bbd
Parse and index more common audio metadata text tag fields.
2018-03-15 09:59:57 +01:00
luccioman
846aba00fa
Added parsing of URLs possibly present in audio metadata tags
2018-03-13 23:08:52 +01:00
Michael Peter Christen
187075b878
added nav filter
2018-03-10 15:46:53 +01:00
luccioman
bcbd0ae1a4
Enabled partial parsing of audio resources.
2018-03-01 20:50:44 +01:00
luccioman
fda0189613
Updated audio file extensions with the ones recently added to the audioTagParser
2018-02-28 13:46:40 +01:00
luccioman
978e2be95b
Give other parsers a chance on audioTagParser error
...
As done in all other parsers, possibly falling back in the end to the
genericParser, which creates a minimal index entry.
2018-02-28 12:27:17 +01:00
luccioman
9e5846a26e
Small fix on svg parser error message
2018-02-28 12:23:52 +01:00
luccioman
11611dbdcf
Reuse existing File copy function to handle audio parser tmp files
2018-02-28 11:58:32 +01:00
luccioman
f77f8f40f9
Factored audio parser tag processing
2018-02-28 08:19:13 +01:00
luccioman
9a7a353d0e
Removed some unnecessary intermediate list creation on array copy.
2018-02-28 07:49:40 +01:00
luccioman
fb6457f5bc
Fixed an NPE case on audio resources parsed with a null tag
2018-02-28 07:31:32 +01:00
luccioman
c3ff50c17a
Updated the list of audio file formats supported by the audioTagParser
...
Follows the upgrade of the Jaudiotagger dependency to version 2.2.5.
2018-02-27 18:04:12 +01:00
luccioman
1b90479a76
Added missing vocabulary navigator increment on results from RWI
2018-02-23 11:36:03 +01:00
luccioman
46c9da6428
Allow creation of vocabularies from remote CSV file URLs.
2018-02-21 08:41:13 +01:00
luccioman
17c7a85f18
Make StreamResponse usable in Java try-with-resources statements
2018-02-21 08:38:35 +01:00
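A sketch of the pattern, using a hypothetical SketchStreamResponse class rather than YaCy's actual StreamResponse: implementing AutoCloseable lets callers use try-with-resources, so the wrapped content stream is closed on every exit path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a response wrapping a content stream (not YaCy's class). */
class SketchStreamResponse implements AutoCloseable {
    private final InputStream contentStream;
    private boolean closed;

    SketchStreamResponse(InputStream contentStream) {
        this.contentStream = contentStream;
    }

    InputStream getContentStream() {
        return contentStream;
    }

    boolean isClosed() {
        return closed;
    }

    /** Narrows AutoCloseable.close() so callers do not have to catch a checked Exception. */
    @Override
    public void close() {
        closed = true;
        try {
            contentStream.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the first byte; try-with-resources closes the response even if read() fails. */
    static int readFirstByte(SketchStreamResponse response) {
        try (SketchStreamResponse r = response) {
            return r.getContentStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SketchStreamResponse response = new SketchStreamResponse(new ByteArrayInputStream(new byte[] {42}));
        System.out.println(readFirstByte(response) + " " + response.isClosed()); // 42 true
    }
}
```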
luccioman
b67742336e
Provide user interface messages on vocabulary creation read/write errors
2018-02-19 11:48:40 +01:00
luccioman
3e8dd90211
Use https rather than http in links and queries to openstreetmap.org
2018-02-15 19:14:07 +01:00
luccioman
3a973dbb23
Removed unused import
2018-02-14 09:27:17 +01:00
luccioman
e9527cd0e5
Reuse the same Pattern instance when matching multiple key/values
2018-02-14 07:14:25 +01:00
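A minimal sketch of the idea: compile the pattern once and create a lightweight Matcher per input. Pattern instances are immutable and safe to share between threads, unlike Matcher instances. The key=value format and the class name are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one shared Pattern, one Matcher per input line. */
class PatternReuseSketch {
    // Compiled once instead of on every match
    private static final Pattern KEY_VALUE = Pattern.compile("([^=]+)=(.*)");

    /** Parses "key=value" lines, ignoring lines without '='. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            // Matcher is cheap to create but not thread-safe, so it stays local
            Matcher matcher = KEY_VALUE.matcher(line);
            if (matcher.matches()) {
                result.put(matcher.group(1).trim(), matcher.group(2).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("a=1", "b = 2", "junk"))); // {a=1, b=2}
    }
}
```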
luccioman
dbf4c1cd76
Improved blacklist entry editing operations:
...
- Fixes issue #160: properly handle syntax exceptions with a
user-friendly message
- Fixes loss of information when editing multiple blacklist entries
- Fixes loss of entries when moving entries from one list to another
2018-02-13 18:24:26 +01:00
reger
87077b8fb6
Adjust and move the Language Navigator to be a member of the navigator plugin list.
2018-02-12 00:16:34 +01:00
luccioman
eb20589e29
Fixed issue #158: completed support for ignoring div elements by CSS class in crawls
2018-02-10 11:56:28 +01:00
luccioman
0cdee4e26a
Fixed loss of "meanCount" search param when using facets or page buttons
...
As a consequence, no suggestions at all could be displayed on new search queries.
2018-02-08 08:07:30 +01:00
luccioman
117a859879
Do not clear all search modifiers when unselecting one modifier.
...
Previously, when clicking a selected facet in the search results page to
unselect it, all other possibly selected modifiers/facets were also
removed.
2018-02-07 15:54:46 +01:00
luccioman
33593c22e9
Fixed loss of other modifiers on keywords/tags search navigation links
2018-02-06 17:17:13 +01:00
luccioman
a9dc0874c0
Remove old query terms from search results suggestions links.
...
Especially when old terms were misspelled, suggestion links then
returned empty results most of the time.
2018-02-06 15:14:14 +01:00
luccioman
9412881230
Added basic support for autotagging microdata annotated item types.
...
With the appropriate vocabulary settings in the Vocabulary_p.html page,
this can produce vocabulary search facets displaying item types
referenced in HTML documents by microdata annotations.
Tested notably, but not exclusively, with vocabulary classes/types
defined by Schema.org and Dublin Core.
2018-02-06 10:25:38 +01:00
luccioman
5a14d34a7d
Refactoring: documented and extracted autotagging processing functions.
2018-02-02 10:27:36 +01:00
luccioman
58b9834729
Added HTML microdata typed items parsing capability.
...
This adds the possibility for the HTML parser to gather the type URLs of
items annotated in HTML tags with itemscope and itemtype attributes (see
the microdata specification https://www.w3.org/TR/microdata/ ), notably
types from the schema.org vocabulary, but also types/classes from any
other vocabulary, such as the common ones listed in the RDFa core
context ( https://www.w3.org/2011/rdfa-context/rdfa-1.1.html ).
2018-02-02 09:31:40 +01:00
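A sketch of the gathering step, assuming well-formed XHTML input parsed with the JDK DOM parser; real-world HTML would need a lenient HTML parser instead. Per the microdata specification, itemtype is only allowed on elements that also carry itemscope, and holds space-separated absolute URLs. The class name and sample markup are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: collects microdata item type URLs from well-formed XHTML. */
class MicrodataSketch {
    static final String SAMPLE = "<div itemscope=\"\" itemtype=\"https://schema.org/Person\">"
            + "<span itemprop=\"name\">Ada</span>"
            + "<div itemscope=\"\" itemtype=\"http://purl.org/dc/terms/Agent https://schema.org/Thing\"></div>"
            + "<p itemtype=\"https://schema.org/Ignored\">no itemscope here</p>"
            + "</div>";

    static List<String> itemTypes(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
        List<String> types = new ArrayList<>();
        // "*" selects every element, in document order
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            // itemtype only counts on an element that creates an item (itemscope)
            if (element.hasAttribute("itemscope") && element.hasAttribute("itemtype")) {
                // itemtype is a set of space-separated absolute URLs
                for (String type : element.getAttribute("itemtype").trim().split("\\s+")) {
                    if (!type.isEmpty()) {
                        types.add(type);
                    }
                }
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // [https://schema.org/Person, http://purl.org/dc/terms/Agent, https://schema.org/Thing]
        System.out.println(itemTypes(SAMPLE));
    }
}
```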
luccioman
80fb1026d0
Create recrawl requests with the relevant crawl profile.
...
The default recrawl profile was previously used for the crawl stacker
acceptance check, but request entries were actually still created with
the "snippetGlobalText" profile.
2018-01-30 21:00:18 +01:00
luccioman
539925a275
Added a utility to generate/update the XLIFF master file from lng files.
2018-01-29 18:34:47 +01:00
luccioman
fa6d030b0b
Moved dbtest to the test source folder.
2018-01-29 14:03:01 +01:00
luccioman
6cd3847d0a
Fixed NullPointerException case on Table init with relative file path.
...
Can occur for example when running dbtest with a relative test table
file name (without an explicit parent folder).
2018-01-29 14:00:43 +01:00
luccioman
28883d8a71
Shutdown daemon threads at the end of dbtest
2018-01-29 13:56:37 +01:00
luccioman
929e0d6eae
Replaced improper ByteBuffer.equals() implementation by Arrays.equals()
...
Also renamed ByteBuffer.equals() to startsWith(), as this name matches
the actual semantics of its implementation.
2018-01-29 13:38:25 +01:00
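A sketch contrasting the two semantics, assuming (as the rename suggests) that the former ByteBuffer.equals() only compared leading bytes. A prefix check answers a different question than Arrays.equals(), which also requires equal lengths. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: prefix comparison versus full array equality. */
class PrefixVsEqualsSketch {

    /** True when the first prefix.length bytes of buffer are equal to prefix. */
    static boolean startsWith(byte[] buffer, byte[] prefix) {
        if (prefix.length > buffer.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] content = "abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] head = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(startsWith(content, head));    // true
        System.out.println(Arrays.equals(content, head)); // false: lengths differ
    }
}
```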
luccioman
46b5249c20
Removed time condition on HostBalancer initialization in JUnit test.
...
Its initialization remains asynchronous in normal application usage.
2018-01-26 17:15:27 +01:00
luccioman
8b572b7337
Commit Solr index before simulating or starting recrawl job.
...
This ensures up-to-date simulation query results and recrawl
processing.
2018-01-26 10:31:13 +01:00
luccioman
733cacdbb8
Revised the RDFaParser main launcher for minimal proper operation.
...
This parser is still not enabled in the main list of text parsers. More
work would be needed to make it functional.
2018-01-25 07:57:56 +01:00
luccioman
7baa99f26f
Fixed stored URL in web cache when redirections occur.
...
Associate cached content with the last redirection location, instead of
the first URL of a redirection chain:
- for proper base URL processing in parsers (fixes mantis 636 -
http://mantis.tokeek.de/view.php?id=636 )
- to prevent duplicated content in the Solr index when recrawling a
redirected URL
2018-01-20 18:56:40 +01:00
luccioman
9ddf92d143
Removed unnecessary reflection usage for workflow tasks.
...
This improves code readability and maintainability (call hierarchies
are easier to read) and possibly performance.
2018-01-15 10:05:49 +01:00
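A simplified sketch of the difference, with hypothetical names (not YaCy's workflow API): a task looked up by method name through reflection versus the same task passed as a typed java.util.function.Function. The direct form is checked at compile time and shows up in IDE call hierarchies.

```java
import java.lang.reflect.Method;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch: reflective task invocation versus a typed function reference. */
class WorkflowTaskSketch {

    /** Example task: takes one value and returns one value. */
    static String normalize(String input) {
        return input.trim().toLowerCase(Locale.ROOT);
    }

    /** Before: the task is resolved by name at run time, invisible to the compiler. */
    static Object runReflective(String methodName, String input) {
        try {
            Method method = WorkflowTaskSketch.class.getDeclaredMethod(methodName, String.class);
            return method.invoke(null, input); // null receiver: the task is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot invoke task " + methodName, e);
        }
    }

    /** After: the task is a typed function, checked at compile time. */
    static <I, O> O runDirect(Function<I, O> task, I input) {
        return task.apply(input);
    }

    public static void main(String[] args) {
        System.out.println(runReflective("normalize", "  YaCy "));                // yacy
        System.out.println(runDirect(WorkflowTaskSketch::normalize, "  YaCy ")); // yacy
    }
}
```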
luccioman
897d3d30cc
Added new recrawl job profile to the list of default crawl profiles
2018-01-15 08:30:37 +01:00
luccioman
9624516bf8
Refresh recrawl job profile threshold date like other default profiles
2018-01-15 08:06:28 +01:00
luccioman
b712a0671e
Added a specific default crawl profile for the recrawl job.
...
- with only a light constraint on the load date of already indexed
documents, as it can already be controlled by the selection query, and
the goal of the job is precisely to recrawl the selected documents now
- using the iffresh cache strategy
2018-01-13 15:46:04 +01:00
luccioman
adf3fa493d
Added comments about crawl profiles recrawl cycles
2018-01-13 12:13:04 +01:00