yacy_search_server/test/java/net/yacy/document
luccioman e41d046a9d Improved parsing support for OOXML spreadsheets (.xlsx)
As reported by edycop in mantis 765 (
http://mantis.tokeek.de/view.php?id=765 ), parsing of xlsx files was
quite incomplete.
Now properly support the "Shared String Table" entry in Office Open XML
spreadsheets, and also detect embedded URLs.

Integrating the Apache poi-ooxml library could be an option for finer
support of OOXML formats, but their SAX-style parsing example (
http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api ) suggests
that a custom SAX handler is still an efficient choice for lightweight
processing with a low memory footprint.
2017-08-21 09:38:20 +02:00
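
The commit message above mentions reading the "Shared String Table" with a custom SAX handler. The following is a minimal sketch of that general idea, not the code from this commit: the class name `SharedStringsSketch`, the handler, and the `parseSharedStrings` helper are hypothetical names chosen for illustration. It assumes the table has the usual shape of a `sharedStrings.xml` part, with each `<si>` entry holding its text in one or more `<t>` elements, and it uses only the JDK's built-in SAX API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical illustration only; not YaCy's actual parser class. */
final class SharedStringsSketch {

    /** Collects one string per <si> entry, concatenating all of its <t> runs. */
    static final class SharedStringsHandler extends DefaultHandler {
        final List<String> strings = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();
        private boolean inText = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("si".equals(qName)) {
                current.setLength(0); // a new shared string entry starts
            } else if ("t".equals(qName)) {
                inText = true; // character data now belongs to the entry
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("t".equals(qName)) {
                inText = false;
            } else if ("si".equals(qName)) {
                strings.add(current.toString()); // list index = shared string index
            }
        }
    }

    /** Parses the XML of a shared string table and returns its entries in order. */
    static List<String> parseSharedStrings(String xml) throws Exception {
        // The default factory is not namespace-aware, so unprefixed
        // element names arrive in qName as "si" and "t".
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SharedStringsHandler handler = new SharedStringsHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.strings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sst><si><t>Hello</t></si>"
                + "<si><r><t>Wor</t></r><r><t>ld</t></r></si></sst>";
        System.out.println(parseSharedStrings(xml)); // prints [Hello, World]
    }
}
```

Because the handler keeps only the current entry and the resulting list in memory, it never builds a full document tree, which is the low-memory property the commit message refers to. A real parser would additionally need to resolve cell references into this list and handle the embedded URLs, which this sketch leaves out.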
..
content Extend DCEntry.getLanguage convert to ISO639-1 codes for more languages 2017-03-05 02:26:10 +01:00
parser Improved parsing support for OOXML spreadsheets (.xlsx) 2017-08-21 09:38:20 +02:00
DateDetectionTest.java migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8 2017-06-09 12:25:23 +02:00
DocumentTest.java Adjust mergeDocuments to keep youngest last-modified date of document 2017-05-09 22:52:54 +02:00
ParserTest.java Improved parsing support for OOXML spreadsheets (.xlsx) 2017-08-21 09:38:20 +02:00
TextParserTest.java Made mime type and extension normalization locale independent. 2017-06-26 17:33:56 +02:00
TokenizerTest.java optimize condenser language detection a little. 2016-10-06 19:03:52 +02:00
WordTokenizerTest.java reactivate sentence counter in WordTokenizer for phrasepos ranking, 2016-09-07 02:16:16 +02:00