30 Commits

Author SHA1 Message Date
reger
474f0476c6 adjust Tokenizer sentence count for trailing text after the last recognized sentence
+ update test case for rwi multi-word query (leaving results known to fail untested)
2016-10-05 05:52:37 +02:00
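A minimal Java sketch of the counting rule described in this commit, assuming tokens arrive as strings and punctuation is delivered as a one-character word (as commit 272cdd496a describes). The class and method names are hypothetical and do not reproduce YaCy's Tokenizer API.

```java
import java.util.Arrays;
import java.util.List;

public class SentenceCountSketch {

    // Hypothetical rule: a single-character '.', '!' or '?' token ends a sentence.
    static boolean isSentenceEnd(String token) {
        return token.length() == 1 && ".!?".indexOf(token.charAt(0)) >= 0;
    }

    // Counts sentences; text after the last recognized sentence end
    // is counted as one additional sentence.
    static int countSentences(List<String> tokens) {
        int sentences = 0;
        boolean pendingText = false; // words seen since the last sentence end
        for (String token : tokens) {
            if (isSentenceEnd(token)) {
                if (pendingText) {
                    sentences++;
                    pendingText = false;
                }
            } else {
                pendingText = true;
            }
        }
        if (pendingText) {
            sentences++; // trailing text without closing punctuation
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("YaCy", "is", "a", "search", "engine", ".", "It", "is", "distributed");
        System.out.println(countSentences(tokens)); // prints 2
    }
}
```

Without the final check, the trailing "It is distributed" would not be counted, which is the kind of off-by-one this commit adjusts.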
reger
1a79c64495 generalize DateDetection with holiday date rules readily available in icu
to make sure current dates are recognized (was fixed to 2014 - 2016)
+ adjust holiday date parser from pattern.match to pattern.find to deal with leading and trailing text
+ moved relative date recognition (morgen, tomorrow) to parseline (used by query parser only), as it was not working and problematic for indexing
+ add test case for parseline (used by query parser)
2016-10-02 03:19:12 +02:00
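"pattern.match to pattern.find" presumably refers to java.util.regex.Matcher.matches() versus Matcher.find(). The sketch below shows the difference with a hypothetical holiday pattern; the real holiday rules come from ICU and are not reproduced here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVsMatchesSketch {

    // Hypothetical holiday pattern, for illustration only.
    static final Pattern HOLIDAY = Pattern.compile("(?i)christmas");

    // matches() succeeds only if the whole input matches the pattern.
    static boolean wholeInputMatches(String text) {
        return HOLIDAY.matcher(text).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input,
    // so leading and trailing text no longer prevent recognition.
    static boolean occursInInput(String text) {
        Matcher m = HOLIDAY.matcher(text);
        return m.find();
    }

    public static void main(String[] args) {
        String query = "events at Christmas 2016";
        System.out.println(wholeInputMatches(query)); // prints false
        System.out.println(occursInInput(query));     // prints true
    }
}
```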
reger
32a2e3a22a have RSSFeed.getChannel return an empty message on a missing channel element,
since a) a channel is required and b) this prevents an NPE in the rss servlets
+ add test
2016-09-30 21:46:57 +02:00
luccioman
4585a60d7e Made use of the constant corresponding to the hard-coded value.
2016-09-30 17:12:29 +02:00
luccioman
1bb0b135ac Avoid duplication of various MS Windows file URLs flavors
Fix for mantis 692 (http://mantis.tokeek.de/view.php?id=692)
2016-09-27 07:53:08 +02:00
reger
6f8c3ccea4 improve url hash computation for file paths with mixed java & windows
file.separator to compute equal hashes (by normalizing the path for the computation)
+ expand test case to check mixed java / windows file url notation
like e.g. file:///c:/test/file.html vs. file:///c:\test/file.html
- relates partially to http://mantis.tokeek.de/view.php?id=692
2016-09-25 22:08:12 +02:00
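A minimal sketch of the idea, assuming the normalization is only a backslash-to-slash replacement applied before hashing. MD5 plus Base64 stands in for YaCy's real url hash, which is not reproduced here; the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class FileUrlHashSketch {

    // Normalizes a file URL for hashing only: backslashes become forward slashes.
    static String normalizeForHash(String fileUrl) {
        return fileUrl.replace('\\', '/');
    }

    // Stand-in hash (MD5, Base64url) over the normalized form, so both
    // separator notations of the same path yield the same value.
    static String hash(String fileUrl) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalizeForHash(fileUrl).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to provide MD5
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String a = "file:///c:/test/file.html";
        String b = "file:///c:\\test/file.html";
        System.out.println(hash(a).equals(hash(b))); // prints true
    }
}
```

Normalizing only for the hash computation leaves the stored URL string untouched.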
reger
330768c8a2 fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686
The embedded core holds a lock on the index and must be closed. An earlier commit
comment states that the core should be closed together with the solr instance instead of on close
of the connector.
Adjusted InstanceMirror.close() to take care of closing the embedded
instance to release the lock.
In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr).
Now this disconnect is part of InstanceMirror.close().
2016-09-22 00:16:22 +02:00
reger
11786457b7 add test case for EmbeddedSolrConnector close()
for issue http://mantis.tokeek.de/view.php?id=686
(without solving the issue here)
2016-09-21 21:08:21 +02:00
reger
585d2a6441 test case for NewsPool to check the id modifier (for unique ids)
and observe the distribution order (hands-on).
+ add test/DATA to .gitignore
2016-09-20 01:55:56 +02:00
reger
ff6589fc0f test case: simulating multi word query for local rwi index
The purpose of the test case is to allow a controlled analysis of the rwi ranking for
multi-word searches (with focus on posintext and word-distance ranking)
2016-09-18 00:59:27 +02:00
reger
7f63fc50f3 prepare an IndexSegment test case for RWI index testing
+ prevent NPE in Segment.clear() on missing embedded solr instance.
2016-09-11 23:25:44 +02:00
reger
272cdd496a reactivate sentence counter in WordTokenizer for phrasepos ranking
by counting punctuation (delivered as a 1-char word) again.
2016-09-07 02:16:16 +02:00
Michael Peter Christen
5e165a8150 removed unused imports
2016-09-06 18:46:24 +02:00
reger
e310ec5f70 fix posInText ranking calculation to score 0 on no position info
+ fix Word posInText calc in Tokenizer to start with 1
+ test case
2016-09-06 00:05:59 +02:00
reger
39dd244693 fix ConcurrentScoreMap.set() calculation of totalCount()
+ test case
2016-09-04 22:18:07 +02:00
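A minimal sketch of the kind of fix described, assuming totalCount() is the sum of all scores: when set() overwrites an existing score, the total must change by the difference to the previous score, not by the new score. This is a hypothetical class, not YaCy's ConcurrentScoreMap.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ScoreMapSketch<E> {

    private final ConcurrentHashMap<E, Long> map = new ConcurrentHashMap<>();
    private final AtomicLong total = new AtomicLong(0);

    // put() returns the previous score atomically, so the total is adjusted
    // by (new - old); concurrent readers may briefly see the total lag behind.
    public void set(E element, long score) {
        Long previous = map.put(element, score);
        long old = previous == null ? 0L : previous;
        total.addAndGet(score - old);
    }

    // Increments an element's score by one and keeps the total in step.
    public void inc(E element) {
        map.merge(element, 1L, Long::sum);
        total.incrementAndGet();
    }

    public long totalCount() {
        return total.get();
    }

    public static void main(String[] args) {
        ScoreMapSketch<String> m = new ScoreMapSketch<>();
        m.set("a", 5);
        m.set("a", 3); // overwrite: total goes from 5 to 3, not to 8
        m.set("b", 2);
        System.out.println(m.totalCount()); // prints 5
    }
}
```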
reger
ebde21079a refactor xlsParser to include Excel file attributes (like author) in the parser result doc.
Similar to ppt and doc parser, completing a TODO in xlsParser.
2016-08-13 23:46:36 +02:00
reger
5e335b32da fix Blacklist.contains() matching path pattern to string
similar to 5e9e871192
+ add proof testcase
2016-08-04 01:12:49 +02:00
reger
f89d4eb51d fix MultiProtocolURL init (assign of host) for urls with '/' in query part
+ add to test case
2016-07-17 04:17:01 +02:00
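A minimal sketch of host extraction that is robust against a '/' in the query part, assuming the authority ends at the first '/', '?' or '#' after "://" (as in RFC 3986). This is a hypothetical helper, not the MultiProtocolURL code.

```java
public class HostParseSketch {

    // Extracts the host from "scheme://host[:port][/path][?query][#fragment]".
    // Searching only for '/' would, for "http://example.com?p=/a", stop inside
    // the query and yield "example.com?p=" as host.
    static String host(String url) {
        int start = url.indexOf("://");
        if (start < 0) return null;
        start += 3;
        int end = url.length();
        for (char c : new char[] {'/', '?', '#'}) {
            int i = url.indexOf(c, start);
            if (i >= 0 && i < end) end = i;
        }
        String authority = url.substring(start, end);
        int at = authority.lastIndexOf('@');     // strip user info
        if (at >= 0) authority = authority.substring(at + 1);
        int colon = authority.lastIndexOf(':');  // strip port (IPv6 literals not handled)
        if (colon >= 0) authority = authority.substring(0, colon);
        return authority;
    }

    public static void main(String[] args) {
        System.out.println(host("http://example.com?path=/a/b")); // prints example.com
    }
}
```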
reger
87fcfc6d78 Adjusted hash computation and toNormalform for the file:// protocol to deliver
the same hash for the same file on Windows filesystem paths with forward- and backslashes in the path.
Background see http://mantis.tokeek.de/view.php?id=671
+Test case
2016-07-16 01:59:09 +02:00
reger
7b226afc33 fix HostQueueTest - changed open parameter
2016-07-06 23:52:02 +02:00
reger
fcc29c36f0 test case for HostBalancer issue in intranet mode
with file:// protocol: 2 hostqueues accessing the same cache file concurrently
http://mantis.tokeek.de/view.php?id=668
The reason seems to be a different hosthash key of the hostqueues on reopen.
The internal queue key and the external representation (directory name, currently hostname.port) must be adjusted to fix it (not done yet).
2016-07-04 02:44:58 +02:00
reger
a476d06aec wiki header code test string: add "closing" tag
2016-06-25 02:59:44 +02:00
reger
d4da4805a8 internal wiki code, require header line to start with markup
(to allow something like  "one=two"  as text)
+ incl. test case
2016-06-25 02:46:44 +02:00
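A minimal sketch of the rule "a header line must start with markup", assuming headers are written as =Title=, ==Title== and so on, with matching opening and closing runs of '='. The regex is hypothetical and not the wiki code's actual parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiHeaderSketch {

    // Line must begin with '=' and end with the same run of '='.
    private static final Pattern HEADER = Pattern.compile("^(=+)\\s*(.+?)\\s*\\1\\s*$");

    // Returns the header level (number of '='), or 0 for ordinary text.
    static int headerLevel(String line) {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? m.group(1).length() : 0;
    }

    public static void main(String[] args) {
        System.out.println(headerLevel("one=two"));   // prints 0, stays text
        System.out.println(headerLevel("==Title==")); // prints 2
    }
}
```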
reger
223071337b Translator to take word boundaries into account when identifying the text portion to
be translated, to avoid translating partial terms/words, e.g. key="TEST" in sourcetext="this is a myTESTcase for it".
Add check of word boundary before and after sourcetext (incl. taking care
of the current practice for keys to be delimited by > <)
+ add test case
2016-06-10 01:14:19 +02:00
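A minimal sketch of the boundary check, assuming a boundary is the start or end of the text or any character that is not a letter or digit (which covers the > < delimiters). The method names are hypothetical, not the Translator API.

```java
public class WordBoundarySketch {

    // True at the text edges or next to a non-letter/digit character.
    static boolean isBoundary(String text, int index) {
        if (index < 0 || index >= text.length()) return true;
        return !Character.isLetterOrDigit(text.charAt(index));
    }

    // Replaces only whole-word occurrences of source with target.
    static String translate(String text, String source, String target) {
        if (source.isEmpty()) return text; // avoid an endless loop on empty keys
        StringBuilder out = new StringBuilder();
        int from = 0;
        int pos;
        while ((pos = text.indexOf(source, from)) >= 0) {
            int end = pos + source.length();
            out.append(text, from, pos);
            if (isBoundary(text, pos - 1) && isBoundary(text, end)) {
                out.append(target);
            } else {
                out.append(source); // part of a longer word: keep as-is
            }
            from = end;
        }
        out.append(text.substring(from));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("this is a myTESTcase for it", "TEST", "Probe")); // unchanged
        System.out.println(translate(">TEST<", "TEST", "Probe"));                      // prints >Probe<
    }
}
```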
reger
a6ba1faa80 introduce a translation edit servlet Translator_p.html for YaCy's UI text translation
This is the 1st rudimentary approach to support the translation utilities.
It currently allows editing untranslated text and saving it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (fewer statics) to leverage class overrides and support garbage collection for this one-time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
  this includes storing manually downloaded translation files in DATA as well
  (to keep the default untouched)
+ on the 1st call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this master file is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: local modifications (possibly as translation fragments instead of a complete file) are to be shared with the maintainer using XLIFF features.
2016-06-03 01:46:30 +02:00
reger
b74cddc49c upd to Jetty v9.2.16.v20160414
- exclude unused mime4j
- remove unused yacy-cora build
2016-05-16 20:34:19 +02:00
reger
24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
if no external tool is installed (and on Windows)
The resulting jpgs are not always perfect (if graphics are included) but imho sufficient.
2016-05-16 02:13:33 +02:00
reger
902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to the internal translation map.
This supports the initiatives suggested in http://mantis.tokeek.de/view.php?id=649
In the longer term it also allows storing translation maps for the htroot files
in the standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing an xliff file with the internal custom prop file.
(currently the introduced class is not used in core code)
2016-03-28 23:26:30 +02:00
reger
ec24a0c85a add test case for optimized toTokens()
2016-03-24 19:26:38 +01:00
reger
84c970eaec move test classes to test/java (subdirectory as in the Maven standard layout)
because ViewImage*Test.java breaks the test run
2016-01-16 19:22:27 +01:00