yacy_search_server/test/java/net/yacy
luccioman e41d046a9d Improved parsing support for OOXML spreadsheets (.xlsx)
As reported by edycop in mantis 765 (
http://mantis.tokeek.de/view.php?id=765 ), parsing of xlsx files was
quite incomplete.
Now properly support the "Shared String Table" entry in Office Open XML
spreadsheets, and also detect embedded URLs.

Integrating the Apache poi-ooxml library could be an option for finer
support of OOXML formats, but their SAX style parsing example (
http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api ) tends to
show that a custom SAX handler is still an efficient choice for
lightweight processing with a low memory footprint.
2017-08-21 09:38:20 +02:00
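For illustration only, the "Shared String Table" handling mentioned in the commit message could be sketched with a plain SAX handler as below. This is not the YaCy parser code: the class name SharedStringsSketch, the handler SharedStringsHandler and the method parseSharedStrings are hypothetical, and the sketch assumes the sharedStrings.xml part has already been extracted from the .xlsx archive and uses unprefixed element names. Phonetic runs and embedded URL detection are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: reads the entries of an OOXML shared string table with SAX. */
class SharedStringsSketch {

    /** Collects the text of each <si> entry, concatenating all nested <t> runs. */
    static class SharedStringsHandler extends DefaultHandler {
        final List<String> strings = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();
        private boolean inText = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("si".equals(qName)) {
                current.setLength(0); // a new shared string entry starts
            } else if ("t".equals(qName)) {
                inText = true; // character data now belongs to the entry
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("t".equals(qName)) {
                inText = false;
            } else if ("si".equals(qName)) {
                strings.add(current.toString()); // list index = shared string index
            }
        }
    }

    /** Parses a sharedStrings.xml stream and returns its entries in index order. */
    static List<String> parseSharedStrings(InputStream in) throws Exception {
        // The default factory is not namespace aware, so qName holds the raw tag name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SharedStringsHandler handler = new SharedStringsHandler();
        parser.parse(in, handler);
        return handler.strings;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
                + "<si><t>Hello</t></si>"
                + "<si><r><t>Ya</t></r><r><t>Cy</t></r></si>"
                + "</sst>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseSharedStrings(in)); // prints [Hello, YaCy]
    }
}
```

A worksheet cell marked with t="s" stores an index into this list rather than the text itself, which is why a parser that skips the shared string table yields incomplete content. Only the current entry is buffered while parsing, in line with the low memory footprint argument above.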
..
cora Add junit test for AbstractOperations.addOperand() 2017-08-14 02:16:43 +02:00
crawler Added HT Cache basic statistics (hit rate) 2017-06-15 09:50:02 +02:00
data Extended WikiCode template inclusion syntax support. 2017-04-27 09:50:04 +02:00
document Improved parsing support for OOXML spreadsheets (.xlsx) 2017-08-21 09:38:20 +02:00
http/servlets Adjust DefaultServlet test case to recent change, 2017-02-26 02:39:52 +01:00
kelondro Fixed read/copy on input streams reading sometimes less than expected. 2017-07-11 09:00:27 +02:00
peers Updated Javadoc and Junit tests for the WebStructureGraph class. 2017-01-17 17:01:56 +01:00
repository Improved new blacklist entries URL scheme detection. 2017-05-04 16:36:45 +02:00
search Refactor rwi reference word position and word distance calculation 2016-10-23 19:40:02 +02:00
server HTML validation : fixed URL encoding of Pictures link. 2016-10-14 09:58:14 +02:00
utils/translation Fixed 2 failing JUnit tests. 2017-01-09 17:59:01 +01:00