Commit Graph

4 Commits

Author SHA1 Message Date
reger
0b6db04e40 fix contentscraper img height/width parsing
prevent NumberFormatException on the common "100px" property value

- include in test case
2014-04-28 04:59:47 +02:00
reger
86f6975edc exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt values in the index contain e.g. <span>text</span> or further tags;
remove all tags from the text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
2014-04-23 00:55:16 +02:00
reger
71649bf22d add test case for htmlParser.parse - getCharset
(which fails)
2014-04-01 02:55:22 +02:00
reger
c8d437b69a clean up test sources
rename to current package names and move to default location
2014-02-27 22:48:17 +01:00
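
For illustration, the failure addressed in 0b6db04e40 can be sketched as follows. Calling Integer.parseInt directly on an attribute value such as "100px" throws a NumberFormatException; one way to avoid that is to read only the leading digits. This is a minimal sketch under that assumption, not the project's actual code: the class name DimensionParser, the method parseDimension, and the fallback parameter are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the repository.
public class DimensionParser {

    // Matches optional leading whitespace followed by one or more digits.
    private static final Pattern LEADING_DIGITS = Pattern.compile("^\\s*(\\d+)");

    /**
     * Returns the leading integer of an attribute value such as "100" or "100px".
     * Returns the fallback when the value is null, has no leading digits,
     * or the digits do not fit into an int.
     */
    public static int parseDimension(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        Matcher m = LEADING_DIGITS.matcher(value);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Integer.parseInt(m.group(1));
        } catch (NumberFormatException e) {
            // Digit run too long for an int.
            return fallback;
        }
    }
}
```

With this sketch, parseDimension("100px", -1) yields 100, while a value without leading digits such as "auto" yields the fallback instead of raising an exception.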