Commit Graph

11 Commits

Author SHA1 Message Date
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
65b1d51e70 added xml version of windows office test files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-27 12:45:15 +00:00
f1ori
67da20647f * add new odf parser based on sax-xml-parser
* remove odf_utils-jar
* test metadata in ParserTest


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-18 15:04:34 +00:00
orbiter
d553e4ff39 added visio test files and mime types
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6165 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 15:17:39 +00:00
lotus
bb570716e6 added more testfiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 09:00:24 +00:00
orbiter
84185baa81 added more test files for windows from lulabad
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:17:30 +00:00
orbiter
3246358485 mistake -> rename
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:10:52 +00:00
orbiter
55ec57d27f added linux umlaute test files from low012
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:02:19 +00:00
orbiter
e9262b3890 re-named old test files
added more mac test files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:41:48 +00:00
orbiter
ff2a54da68 added more umlaute test files: mac
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:33:48 +00:00
orbiter
204220ecd5 added test files for UTF-8 / umlaute testing:
These 3 files contain the same text in different HTML encodings. We use these documents to test whether the parser and indexer create the same set of word hashes for all three texts.

To use these files, run an indexing crawl on them. To make the files reachable under the localhost path, do the following:

cd <yacy-home>
rmdir DATA/HTDOCS/repository
ln -s "$PWD/test/parsertest" DATA/HTDOCS/repository

You have then linked the test directory as the repository directory, which you can reach in yacy once you switch to intranet indexing mode. The next step is to start yacy, then:
- switch to intranet use case
- go to the crawl start page
- the repository directory should be the default path as crawl start
- start the crawl
- search for any word that appears in the demo texts
- search not only for words with umlaute but also for words without umlaute, to ensure that you find _all_ three documents
- see how yacy presents the snippet with the text containing umlaute
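
The linking steps above could be wrapped in a small shell function, sketched here under a few assumptions. The function name `link_parsertest` and its single argument (the yacy home directory) are hypothetical and not part of yacy. Because the target of a relative symlink is resolved relative to the directory that contains the link, the sketch resolves the test directory to an absolute path before linking.

```shell
#!/bin/sh
# Hypothetical helper (not part of yacy): expose test/parsertest as the
# repository directory so it can be crawled in intranet mode.
# Usage: link_parsertest /path/to/yacy-home
link_parsertest() {
    yacy_home=$1
    src="$yacy_home/test/parsertest"
    dst="$yacy_home/DATA/HTDOCS/repository"

    # Refuse to continue if the test files are missing.
    [ -d "$src" ] || { echo "missing directory: $src" >&2; return 1; }

    # Resolve to an absolute path so the link target does not depend on
    # the directory the link is created in.
    src_abs=$(cd "$src" && pwd) || return 1

    if [ -L "$dst" ]; then
        rm "$dst" || return 1      # replace a link left by an earlier run
    elif [ -d "$dst" ]; then
        rmdir "$dst" || return 1   # succeeds only if the directory is empty
    fi

    ln -s "$src_abs" "$dst"
}
```

Using `rmdir` rather than `rm -r` keeps the behaviour of the original commands: a repository directory that already holds files is left alone and the function returns a non-zero status.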

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5293 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 11:07:14 +00:00