Commit Graph

290 Commits

Author SHA1 Message Date
reger
223071337b Translator now respects word boundaries when identifying the text portion to
be translated. This avoids translating partial terms/words, e.g. key="TEST" in
sourcetext="this is a myTESTcase for it".
Add a word boundary check before and after sourcetext (incl. taking care
of the current practice of delimiting keys with > <)
+ add test case
2016-06-10 01:14:19 +02:00
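The word boundary check described in 223071337b can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class name WordBoundaryDemo and the methods isWholeWordMatch and translate are hypothetical, and the rule "only check a boundary on a side where the key itself begins or ends with a letter or digit" is an assumption made so that keys delimited by > < still match.

```java
final class WordBoundaryDemo {

    /** True if the key found at pos is not embedded in a longer word (hypothetical helper). */
    static boolean isWholeWordMatch(String source, String key, int pos) {
        int end = pos + key.length();
        boolean keyStartsWithWordChar = Character.isLetterOrDigit(key.charAt(0));
        boolean keyEndsWithWordChar = Character.isLetterOrDigit(key.charAt(key.length() - 1));
        // A side only needs a boundary if the key has a word character on that side,
        // so keys like ">TEST<" are accepted regardless of their neighbours.
        boolean leftOk = !keyStartsWithWordChar || pos == 0
                || !Character.isLetterOrDigit(source.charAt(pos - 1));
        boolean rightOk = !keyEndsWithWordChar || end == source.length()
                || !Character.isLetterOrDigit(source.charAt(end));
        return leftOk && rightOk;
    }

    /** Replaces every whole-word occurrence of key with translation. */
    static String translate(String source, String key, String translation) {
        if (key.isEmpty()) return source;
        StringBuilder out = new StringBuilder();
        int from = 0;
        int pos;
        while ((pos = source.indexOf(key, from)) >= 0) {
            if (isWholeWordMatch(source, key, pos)) {
                out.append(source, from, pos).append(translation);
                from = pos + key.length();
            } else {
                // Partial match inside a word: copy one character and keep searching.
                out.append(source, from, pos + 1);
                from = pos + 1;
            }
        }
        out.append(source.substring(from));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("this is a myTESTcase for it", "TEST", "PRUEFUNG"));
        System.out.println(translate("run the TEST now", "TEST", "PRUEFUNG"));
    }
}
```

With this sketch the first call leaves the text untouched, while the second replaces the standalone word.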
reger
a6ba1faa80 introduce a translation edit servlet Translator_p.html for YaCy's UI text translation
This is the 1st rudimentary approach to support the translation utilities.
It allows currently to edit untranslated text and save it in a local translation file
in the DATA/LOCALE directory.
+ refactor Translator (less static's) to leverage on class overrides and support garbage collection for this 1 time routine
+ adjust TranslatorXliff to check for local translations in DATA/LOCALE,
  this includes storing manually downloaded translation files in DATA as well 
  (to keep default untouched)
+ on 1st call of Translator_p a master translation file is generated, checking
the supported languages for missing translation text (later this master file is planned to be part of the distribution, to harmonize translation key text between the languages)
Outlook: the local modifications (possibly as translation fragments instead of a complete file) are to be shared with the maintainer using xliff features.
2016-06-03 01:46:30 +02:00
reger
b74cddc49c upd to Jetty v9.2.16.v20160414
- exclude unused mime4j
- remove unused yacy-cora build
2016-05-16 20:34:19 +02:00
reger
24b0fa2a38 extend snapshot Html2Image.pdf2image to use PDFBox image export capability
if no external tool installed (and for Win)
Resulting jpg are not always perfect (if graphic included) but imho sufficient.
2016-05-16 02:13:33 +02:00
reger
902e79e261 Introduce a TranslatorXliff which can read/write xliff from/to the internal translation map.
This eases the initiatives suggested in http://mantis.tokeek.de/view.php?id=649
In the longer term it also allows storing translation maps for the htroot files
in the standardized/reusable xliff format ( http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html ).
+ added test case creating and comparing xliff file with internal custom prop file.
(currently the introduced class is not used in core code)
2016-03-28 23:26:30 +02:00
reger
ec24a0c85a add test case for optimized toTokens() 2016-03-24 19:26:38 +01:00
luc
26f1ead57c Created ViewFavicon class specialized in favicon viewing.
Main image processing is now in ImageViewer, used by both ViewImage and
ViewFavicon.

Fixed URIMetadataNode.getFavicon to use non-standard icons with no size
as fallback.
2016-02-09 20:46:44 +01:00
luc
07222b3e1a Added favicon url transmission in RWI chunks. 2016-02-05 17:05:36 +01:00
luc
53781299d8 Extracted intranet and filetype related rules from getFaviconURL func 2016-02-04 08:14:49 +01:00
luc
3cc5619d93 Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
2016-02-02 09:57:54 +01:00
luc
ef83e34b8a Merge branch 'master' of https://github.com/yacy/yacy_search_server 2016-01-19 08:06:49 +01:00
reger
84c970eaec move test classes to test/java (subdirectory as in Maven standard subdir layout)
because ViewImage*Test.java breaks test run
2016-01-16 19:22:27 +01:00
luc
cfdbc2b487 Improved URLLicence reliability for use by concurrent non-authorized
users.
Removed URLLicence generation when unnecessary (authorized users)
2016-01-08 20:42:57 +01:00
luc
571bc55937 Refactoring : use StandardCharsets constants instead of hard-coded
charset names.
2016-01-05 23:37:05 +01:00
reger
1af0e9ef74 remove workaround for Solr bug regarding multivalued date fields
fixed in 5.4.0
http://issues.apache.org/jira/browse/SOLR-8050
2016-01-03 01:11:27 +01:00
reger
4d2b934487 prevent mailto links getting into parser result document's in/outbound link collection
by checking mailto scheme early.
- fix upper case mailto protocol assignment
- add test case for getProtocol
2015-12-16 03:01:17 +01:00
reger
288acceac3 fix test htmlParserTest, charset parameter
+ upd maven templating-plugin version
2015-12-15 02:09:43 +01:00
luc
f01d49c37a Process large or local file images dealing directly with content
InputStream.
2015-11-18 10:15:38 +01:00
luc
0de6988604 Added links to more image test suites. 2015-11-12 08:21:37 +01:00
luc
745e97a575 Merge branch 'master' of https://github.com/yacy/yacy_search_server 2015-11-02 08:10:11 +01:00
luc
2895ab552a Made ViewImagePerfTest extend ViewImageTest to ease automated image
render tests
2015-10-30 04:19:56 +01:00
luc
4a03cf06e1 Corrected encoding extension arg parsing 2015-10-29 02:24:17 +01:00
reger
d223cf0ae4 adjust MediaWiki importer geo coordinate calculation
- allow lat/long 0.xxx
- south / west assignment
include test class
2015-10-26 21:19:35 +01:00
luc
8da20718aa Created a class to test ViewImage rendering against multiple image
files.
2015-10-23 15:49:07 +02:00
luc
ec04d27473 Corrected APNG test suite link name. 2015-10-23 14:12:00 +02:00
luc
cbb84ba073 Detailed javadoc. 2015-10-23 13:57:24 +02:00
luc
70111876d2 Filled ViewImageTest.html with all remaining IANA image file formats.
Added some links to test suites and specifications.
2015-10-23 12:27:52 +02:00
luc
e093fb228d Created a generic ViewImage performance render test. 2015-10-15 09:18:24 +02:00
luc
3ad564e2e4 Created a ViewImage rendering performance measurement test. 2015-10-14 10:17:09 +02:00
luc
b3f044072e Updated table headers and SVG file url for case sensitive OS. 2015-10-14 10:13:37 +02:00
luc
f5746b5490 Added ico and bmp sample pictures 2015-10-06 20:48:09 +02:00
luc
baede48161 Added JPEG 2000 and FITS samples 2015-10-06 09:53:09 +02:00
luc
7c9d80c5d0 Added image formats and informations for each format. 2015-10-06 09:51:47 +02:00
luc
0ae9297ca5 Created a html test page to check ViewImage rendering with different
file formats.
2015-10-02 12:41:30 +02:00
reger
bad34804fe optimize parseInt for <img> tag attribute parsing
Performs better than using NumberFormat.parse or parseInt(substring())
2015-09-26 15:42:23 +02:00
reger
d2cc11ea8f fix html parser taking <style> content as text.
Noticed that some result descriptions contain css content from the style tag.
Added <style> to the tag list so that its content is not scraped as text
+ test case included
2015-09-19 05:30:55 +02:00
reger
e594130aec add test case for partial update - to discover effect on YaCy for update of documents with multivalued date fields (like dates_in_content_dts)
current result: loss of fields/information in index document, see EmbeddedSolrConnectorTest.testUdate_withMultivaluedDateField()
2015-09-13 06:02:07 +02:00
reger
d5da9e5a38 fix test method (add throw for URIMetadataNode) 2015-09-12 20:07:43 +02:00
reger
4cf875336c complete TODO: getFileExtension handle dot in query part
+ testcase
2015-08-31 23:28:03 +02:00
reger
c37dda8849 fix NPE on MultiProtocolURL on url with parameter value and '='
in getAttribute
- added test case for it
2015-05-12 01:09:10 +02:00
reger
71bf95af8a upd parser calls in test cases 2015-04-25 03:24:28 +02:00
reger
f63fff9008 fix snippet containing number with comma as decimal point http://mantis.tokeek.de/view.php?id=344
to keep it as one word (by altering the split regex)
- added snippet test case with number
- regex for word split to match multiple split chars
2015-03-16 02:03:40 +01:00
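The idea behind f63fff9008 (keep a number such as 1,5 together and treat a run of separators as one split point) can be illustrated with a small regex sketch. The pattern below is an assumption written for this illustration and is not the regex used in YaCy; the class name SnippetSplitDemo is hypothetical.

```java
import java.util.regex.Pattern;

final class SnippetSplitDemo {

    // One or more separators in a row count as a single split point.
    // A comma or dot only separates when it is NOT enclosed by digits on both sides,
    // so "1,5" and "3.14" stay one word.
    private static final Pattern SPLIT = Pattern.compile(
            "(?:[\\s;:!?()]|[.,](?!\\d)|(?<!\\d)[.,])+");

    static String[] words(String text) {
        return SPLIT.split(text);
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("price is 1,5 euro, ok")));
    }
}
```

The lookahead and lookbehind keep the separator decision local to the neighbouring characters, so no second pass over the tokens is needed.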
reger
2ef8ffdb60 apply UTF-8 encoding
copied from escape()
2015-03-15 06:02:45 +01:00
reger
7120ea42f1 fix for path with char code > 255
(causing index out of bound exception)
+ test case for it
2015-03-15 03:37:32 +01:00
reger
1d81bd0687 fix url encoding for path see http://mantis.tokeek.de/view.php?id=559
So far we used same escape procedure for all parts of the url (which includes x-www-form-urlencoded for all url components)
Added capability to use different encoding rules for the different url components (through specific bitset for each component).
(this is inspired by org.apache.http.client and java.net.uri implementation).
- Added test case for  http://mantis.tokeek.de/view.php?id=559
2015-03-15 00:46:07 +01:00
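A per-component escape as described in 1d81bd0687 might look like the sketch below. The class name ComponentEscapeDemo and the exact contents of the bit sets are assumptions chosen for illustration (loosely following the character classes of RFC 3986); they are not taken from MultiProtocolURL. Working on UTF-8 bytes rather than on char values also avoids table lookups with char codes above 255.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

final class ComponentEscapeDemo {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static final BitSet UNRESERVED = new BitSet(128);
    static final BitSet PATH_SAFE = new BitSet(128);
    static final BitSet QUERY_SAFE = new BitSet(128);

    static {
        for (char c = 'a'; c <= 'z'; c++) UNRESERVED.set(c);
        for (char c = 'A'; c <= 'Z'; c++) UNRESERVED.set(c);
        for (char c = '0'; c <= '9'; c++) UNRESERVED.set(c);
        for (char c : "-._~".toCharArray()) UNRESERVED.set(c);

        // Assumed rule set: '+' and '&' are literal in a path ...
        PATH_SAFE.or(UNRESERVED);
        for (char c : "/!$&'()*+,;=:@".toCharArray()) PATH_SAFE.set(c);

        // ... but must be escaped inside a query value.
        QUERY_SAFE.or(UNRESERVED);
        for (char c : "!$'()*,;:@/?".toCharArray()) QUERY_SAFE.set(c);
    }

    /** Percent-encodes every UTF-8 byte of s that is not marked safe for the component. */
    static String escape(String s, BitSet safe) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 128 && safe.get(v)) {
                sb.append((char) v);
            } else {
                sb.append('%').append(HEX[v >> 4]).append(HEX[v & 0xF]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("/a b/\u00fc", PATH_SAFE));
        System.out.println(escape("a+b", QUERY_SAFE));
    }
}
```

Keeping one BitSet per component makes the differences between path and query rules explicit and cheap to look up.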
reger
f94e34058c fix url (path) %-decoding http://mantis.tokeek.de/view.php?id=519
- add test case for this
2015-03-11 01:05:14 +01:00
reger
16bc267a32 add test case for snippet html encoding check 2015-03-01 23:50:17 +01:00
reger
77851fa53c fix parser test cases
(Vocabulary parameter)
2015-02-11 01:43:02 +01:00
reger
df83fcc4fc disable optimistic GC assumption in StandardMemoryStrategy
After several tests found that eom is not prevented. The major reason found in testing was the assumption that a future GC will free the average of the last 5 GCs.
Disabling this check improved the eom exception behavior.

Added simplest testcase used for verification
2015-02-11 01:42:01 +01:00
Michael Peter Christen
68c605d637 replace with CommonPattern.SPACE for split 2015-01-29 02:28:03 +01:00
reger
9edc7308aa update to metadata-extractor-2.7.0.jar
add 2 simple JUnit test cases for jpeg and tif parsing
2014-12-15 20:45:05 +01:00
reger
5d67e165d9 remove redundant null check in ResponseHeader.lastModified
added a JUnit testcase for ResponseHeader dates (using age()),
adjusted age() to pass all tests
2014-12-09 00:58:08 +01:00
reger
ea633a794c including small junit test case for WordTokenizer 2014-11-29 22:13:24 +01:00
reger
aa2e15d846 allow url parameter in worktable apicall
allow url=wwwl?param=a&param=b (with ?, & encoded)
fix:  http://mantis.tokeek.de/view.php?id=100

fix double adding of  '&' in MultiProtocolURL.escape()
2014-10-05 20:05:03 +02:00
reger
e88537522d allow single quote " ' " in query
see http://mantis.tokeek.de/view.php?id=379
-add QueryGoal test case for this
2014-08-16 14:29:52 +02:00
reger
e50b2b4d04 fix test case MultiProtocolURL.toString()
(only allowed on AnchorURL)
2014-08-11 04:29:43 +02:00
reger
b510b182d8 - update Maven pom
- add ppt parser test case
2014-08-01 01:47:53 +02:00
Michael Peter Christen
2de159719b added an option to set 'obey nofollow' for links with rel="nofollow"
attribute in the <a> tag for each crawl. This introduces a lot of
changes because it extends the usage of the AnchorURL Object type which
now also has a different toString method than the underlying
DigestURL.toString. It is therefore not advised to use .toString at all
for urls, just use toNormalform(false) instead.
2014-07-18 12:43:01 +02:00
reger
1f2eba977d add test case for Records (used in HostBalancer)
- simulating seek error (http://mantis.tokeek.de/view.php?id=411)
2014-07-06 20:41:26 +02:00
reger
e94efd4d7c update to JUnit 4.11
- fix build.xml -> parserTest error on Windows due to javac encoding
2014-07-06 05:38:32 +02:00
reger
3b77e41f1a adding test for HostQueue crawl stack
- simulating problem with zero length stack file (but not fixing it)
- adding test data clean to maven pom
2014-07-06 00:38:16 +02:00
reger
431a5f9c4e added test case for TextSnippet,
removed obsolete/unused parameter and reference to MediaSnippet
2014-06-30 05:36:48 +02:00
reger
7847a93558 fix AbstractParser.singleList not adding null strings
- prevents null titles in oo... parser  (as detected by ParserTest)
- correct ParserTest dc_description check (dc_description allowed to return 0 length array)
2014-06-26 02:56:45 +02:00
reger
0b6db04e40 fix contentscraper img height/width parsing
prevent numberformat exception on common "100px" property

- include in test case
2014-04-28 04:59:47 +02:00
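A tolerant size parser for values such as "100px" could be sketched as below. This is an illustration only, not the contentscraper code: the class name ImgSizeDemo, the method parseLeadingInt and the choice of -1 as the "no number" result are assumptions.

```java
final class ImgSizeDemo {

    /**
     * Parses the leading decimal digits of an attribute value such as "100px".
     * Returns -1 if the value is null, has no leading digits or does not fit an int.
     */
    static int parseLeadingInt(String value) {
        if (value == null) return -1;
        int i = 0;
        int n = value.length();
        while (i < n && Character.isWhitespace(value.charAt(i))) i++;
        long result = 0;
        int digits = 0;
        while (i < n) {
            char c = value.charAt(i);
            if (c < '0' || c > '9') break; // stop at a unit suffix like "px" or "%"
            result = result * 10 + (c - '0');
            if (result > Integer.MAX_VALUE) return -1;
            digits++;
            i++;
        }
        return digits == 0 ? -1 : (int) result;
    }

    public static void main(String[] args) {
        System.out.println(parseLeadingInt("100px"));
        System.out.println(parseLeadingInt("auto"));
    }
}
```

Scanning the digits by hand avoids both the NumberFormatException and the temporary substring that Integer.parseInt(value.substring(...)) would need.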
reger
bb8181b2be fix: resolve url without path but searchpart
e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/"
fixes http://mantis.tokeek.de/view.php?id=47

added test case for getHost
2014-04-25 20:15:55 +02:00
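The behaviour fixed in bb8181b2be (host ends at the first '/', '?' or '#', and an empty path becomes "/") can be sketched as follows. The class name HostSplitDemo and the method split are hypothetical, and the sketch deliberately ignores ports and user info, which a real URL parser has to handle.

```java
final class HostSplitDemo {

    /** Returns {host, path, query} for a simple URL without port or user info. */
    static String[] split(String url) {
        int schemeEnd = url.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;

        // The host ends at the first path, query or fragment delimiter.
        int hostEnd = url.length();
        for (int i = start; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        String host = url.substring(start, hostEnd);

        String rest = url.substring(hostEnd);
        int frag = rest.indexOf('#');
        if (frag >= 0) rest = rest.substring(0, frag);
        int q = rest.indexOf('?');
        String path = q < 0 ? rest : rest.substring(0, q);
        String query = q < 0 ? "" : rest.substring(q + 1);
        if (path.isEmpty()) path = "/"; // no path given: default to the root path
        return new String[] {host, path, query};
    }

    public static void main(String[] args) {
        String[] parts = split("http://yacy.net?q=test");
        System.out.println("host=" + parts[0] + " path=" + parts[1] + " query=" + parts[2]);
    }
}
```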
reger
86f6975edc exclude html tags in in/outboundlinks_anchortext_txt parsed text
- some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags,
remove all tags for text property (inline img tags are still parsed)
- added test case for above (to htmlParserTest)
- fix solr test case
2014-04-23 00:55:16 +02:00
reger
71649bf22d add test case htmlParser.parse - getCharset
(which fails)
2014-04-01 02:55:22 +02:00
reger
6878c90f99 fix: IPv6 INTRANET_PATTERNS for local ip (see http://bugs.yacy.net/view.php?id=378)
requiring following ":" for fc and fd prefix and made pattern match case insensitive
- add some more ipv6 test cases to MultiProtocolURLTest.java
2014-03-02 06:13:21 +01:00
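Why a ":" is required after the fc/fd prefix can be seen in a small sketch: without it, ordinary host names that happen to start with "fc" or "fd" would be classified as local IPv6 addresses. The pattern and the names UniqueLocalDemo and looksUniqueLocal are assumptions for this illustration and are not the INTRANET_PATTERNS entry from YaCy.

```java
import java.util.regex.Pattern;

final class UniqueLocalDemo {

    // Assumed pattern: optional '[', then "fc" or "fd", up to two more hex digits
    // and a mandatory ':' that marks the end of the first IPv6 group.
    private static final Pattern UNIQUE_LOCAL = Pattern.compile(
            "\\[?f[cd][0-9a-f]{0,2}:.*", Pattern.CASE_INSENSITIVE);

    static boolean looksUniqueLocal(String host) {
        return UNIQUE_LOCAL.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksUniqueLocal("FD12:3456::1"));
        System.out.println(looksUniqueLocal("fcbarcelona.com"));
    }
}
```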
reger
c8d437b69a clean up test sources
rename to current package names and move to default location
2014-02-27 22:48:17 +01:00
reger
18a56446ce reorg URL test classes add isLocal test with some IPv6 examples
- putting in default location and clean old package names
- add some valid RFC IPv6 sample urls (which don't pass the isLocal test)
2014-02-26 02:01:40 +01:00
reger
10a6346056 clean-up test cases
to work with current source
2013-12-01 03:38:58 +01:00
reger
b4fdb8c887 cleanup test directory from Jetty 9 implementation samples
- current Jetty implementation advances so that it seems not beneficial to keep the code
as it makes the test unusable and use of Jetty 9 is due to Java 1.7 dependency not in sight.
2013-11-10 22:01:31 +01:00
reger
71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6
- introduce a YaCyHttp interface to modularize/separate http server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
     - putting the version specific code in classes starting with Jetty8xxxx
     - moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets

- adjust other test cases/classes
2013-10-09 00:40:48 +02:00
reger
f7f86d8a5d update to Jetty 9 jars
- include javax.servlet 3.0
2013-09-14 20:49:05 +02:00
reger
fe87fb638a adjust test/ParserTest to dc_description data type 2013-09-10 20:05:10 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
reger
97ab5b90e8 - odt & ooxml (office document) parser correction to add content to fulltext index
- adjust Junit yacyVersionTest & ParserTest 
- update yacyVersion.combined2prettyVersion to the default 4-digit minor ver.
2013-05-20 01:50:09 +02:00
reger
4fec35a665 adjust Test case EmbeddedSolrConnector 2013-05-03 03:55:14 +02:00
reger
160ce568b3 move testing SolrServlet.main to test, making include of jetty*.jar in distribution and classpath obsolete
- move jetty*.jar to test library 
- move SolrServlet.main as is to test, add also a junit test simulating main 
  - add build.xml cleanup for EmbeddedSolrConnectorTest created test/DATA
- adjust some test compile errors
2013-02-03 22:32:38 +01:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is default now), then all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored to a solr index, if this is also enabled.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:08:57 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' with default encoding (missing) or UTF-8 encoding with a String generation method that uses a pre-defined Charset constant for UTF-8. This avoids a cache-lookup for the Charset object using String hashing of the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
orbiter
cd19d0517e added dns resolve to HTTPClient POST using a dns cache to prevent the not-thread-safe built-in dns cache inside apache http client from being used
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-22 22:58:19 +00:00
f1ori
01cb3bbaec * fix patchCharsetEncoding-test (patchCharsetEncoding now returns null on input null)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:28:41 +00:00
f1ori
fd74bc388c * fix small bug in sessionid-removal
* add testcase for sessionid-removal

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-11-21 23:55:40 +00:00
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in header framework:
moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. Using this parser infrastructure it is not possible to parse document containers that contain individual files. An example is a rss file where the rss messages can be treated as individual documents with their own url reference. Another example is a surrogate file which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. In case of single documents the parser method returns a set of one document.
To be compliant with the new interface, the zip and tar parsers have also been completely redesigned. All parsers are now much simpler and cleaner in their structure. The switchboard operations have been extended to operate on sets of parsed files, not single parsed files.
additionally, parsing of jar manifest files had been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created LGPLed package cora: 'content retrieval api' which may be used externally by other applications without yacy core elements because it has no dependencies to other parts of yacy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
b68deb407a - moved test data from /bin to /test/words
- refactoring of stopYACY.sh by introduction of /bin/apicall which is able to call any api file with attached authorization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6691 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-02-22 20:14:16 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
f1ori
34c71b22e8 fix and enable parser unit tests (tested with eclipse)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 09:33:18 +00:00
orbiter
ce8dc575ca refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:12:19 +00:00
orbiter
bea3b99aff moved table and util classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:14:19 +00:00
orbiter
ce7924d712 better concurrency for rwi entry parsing during search processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-27 22:06:52 +00:00
orbiter
72ac5bd80f refactoring of search process.
this is the beginning of some architecture changes that will hopefully bring some more stability, speed and transparency to the search process.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6260 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-24 15:24:02 +00:00
f1ori
d515bc11e2 added ooxmlparser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-08 15:34:41 +00:00
f1ori
8c1b02af04 * fix warning in testcase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6255 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-08 15:18:02 +00:00
orbiter
65b1d51e70 added xml version of windows office test files
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-27 12:45:15 +00:00
f1ori
67da20647f * add new odf parser based on sax-xml-parser
* remove odf_utils-jar
* test metadata in ParserTest


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-18 15:04:34 +00:00
f1ori
06557485f5 * added parser unittest!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 22:03:34 +00:00
f1ori
69dfd03985 reactivate unittests
* fix old tests
* add buildtarget "ant test"


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 20:58:21 +00:00
orbiter
d553e4ff39 added visio test files and mime types
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6165 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 15:17:39 +00:00
lotus
bb570716e6 added more testfiles
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-15 09:00:24 +00:00
orbiter
84185baa81 added more test files for windows from lulabad
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-13 23:17:30 +00:00
orbiter
3246358485 mistake -> rename
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:10:52 +00:00
orbiter
55ec57d27f added linux umlaute test files from low012
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5335 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 20:02:19 +00:00
orbiter
e9262b3890 re-named old test files
added more mac test files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:41:48 +00:00
orbiter
ff2a54da68 added more umlaute test files: mac
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-12 19:33:48 +00:00
orbiter
204220ecd5 added test files for UTF-8 / Umlaute - Testing:
These 3 files contain the same text in different HTML encodings. We use these documents to test if the parser and indexer create the same set of word hashes for all three texts.

To use these files, run an indexing/crawling on them. To get the files inside the localhost-path, do the following:

cd <yacy-home>
rmdir DATA/HTDOCS/repository
ln -s test/parsertest DATA/HTDOCS/repository

you have then linked the test directory as repository directory which you can reach in yacy if you switch to intranet indexing mode. So the next step is to start yacy, then
- switch to intranet use case
- go to the crawl start page
- the repository directory should be the default path as crawl start
- start the crawl
- search for any word that appears in the demo texts
- search not only for words with umlaute but also for words without umlaute to ensure that you find _all_ three documents
- see how yacy presents the snippet with the text containing umlaute

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5293 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-22 11:07:14 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since it had been decided to give up some parts, they were removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
40b0547611 - documentation changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
theli
2399ed817c *) robots.txt parser now extracts the sitemap-URL (will be used later)
*) some javadoc added
*) junit testclass for robots.txt parser added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 15:42:38 +00:00
theli
1b7fda12ee *) SOAP: separate function to get the active/passive/potential peer list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-28 07:34:44 +00:00
auron_x
d451ad48d3 *) improved peerloadgraphic:
- unnecessary (0 %) pieces are removed
 - percent-values of each thread displayed in legend

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 19:08:17 +00:00
karlchenofhell
a1d68fe092 - use .class rather than Class.forName for classes in class-path
- added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt
- fixed minor bugs in Blog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 22:52:22 +00:00
orbiter
d25caa07bf redesigned some parts of http authentication
added another access check for peer hops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 19:46:50 +00:00
theli
eb20ec3837 *) soap-service: adding function to check if a specific url is blacklisted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-26 08:53:43 +00:00
theli
5c0669429e *) soap: adding function to query the peer list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-17 19:26:55 +00:00
theli
203f2bde9a *) adding function to query the pause/resume state of the crawling queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-13 06:25:15 +00:00
theli
6d3a130878 *) bugfix needed because of db refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-13 06:13:15 +00:00
theli
892b9f2fc4 *) additional soap function to query peer status
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 16:46:32 +00:00
theli
bd3710a974 *) new xml template to view peer profile as xml
*) bugfix for wrong profile display (some fields were displayed twice)
*) new soap functions to get and set peer profile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 16:26:25 +00:00
theli
d1afe1ce6b *) adding xml template to get the message list as xml
*) Bugfix in client stub jar generation (too many files were added)
*) new soap service to manage peer messages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 15:18:33 +00:00
theli
f37e2041e8 *) adding soap function to import yacy bookmarks from xml or html (transferred via soap attachments)
*) soapHandler: code cleanup for service deployment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2915 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 09:56:39 +00:00
theli
4a3ec63e34 *) new soap service to manage yacy bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-04 13:47:43 +00:00
theli
5e57e0814d *) new soap function to display log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-03 14:39:48 +00:00
theli
c7bea4addb *) soap api
- adding function to get and set message forwarding
   - adding new testclass 


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-29 08:40:48 +00:00
theli
8bdf22f325 *) adapting test class to new function name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2873 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 13:07:17 +00:00
theli
532c23b5c7 *) soap handler
- better errorhandling 
   - adding support for outgoing transfer- and content-encoding
   - avoid holding outgoing messages into memory before sending them

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 12:31:48 +00:00
theli
7299dc30e3 *) new soap service to manage the yacy file-share
- upload / download files (as soap attachment)
   - create directory
   - receive directory listing
   - delete files / directories
   - change file comment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 12:15:56 +00:00
theli
9e8942a064 *) adding method to implement blacklist from file
- file transfer is done via soap attachments (see BlaclistSerivceTest for details)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 06:18:19 +00:00
theli
d38ef0493d *) be more tolerant against missing ports in url
"http://yacy.net:/" is now interpreted as "http://yacy.net/"
   See: http://www.yacy-forum.de/viewtopic.php?p=27102

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 05:22:54 +00:00
theli
cfe54fedc7 *) Bugfix for resolveBackpath problem with trailing /..
*) Junit testclass for resolveBackpath testing 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 05:07:34 +00:00
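Resolving back paths, including a trailing /.., can be sketched with a stack of path segments. The code below follows the dot-segment removal rules of RFC 3986 and is only an illustration: the class BackpathDemo is hypothetical and the result for edge cases may differ from what YaCy's resolveBackpath returns.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BackpathDemo {

    /** Removes "." and ".." segments from an absolute path. */
    static String resolveBackpath(String path) {
        if (!path.startsWith("/")) return path; // the sketch only handles absolute paths
        String[] segments = path.split("/", -1);
        Deque<String> kept = new ArrayDeque<>();
        boolean endsWithDots = false;
        for (int i = 1; i < segments.length; i++) {
            String seg = segments[i];
            endsWithDots = seg.equals("..") || seg.equals(".");
            if (seg.equals("..")) {
                kept.pollLast(); // go one level up; no-op when already at the root
            } else if (!seg.equals(".")) {
                kept.addLast(seg);
            }
        }
        String joined = String.join("/", kept);
        // A trailing "/.." or "/." leaves a directory, so keep the closing slash.
        if (endsWithDots && !joined.isEmpty() && !joined.endsWith("/")) joined += "/";
        return "/" + joined;
    }

    public static void main(String[] args) {
        System.out.println(resolveBackpath("/a/b/../c"));
        System.out.println(resolveBackpath("/a/b/.."));
    }
}
```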
theli
ac13fa763a *) bugfix for blacklist remove (blacklist was not informed about remove)
*) adding new soap service class for blacklist management
*) new junit class to test soap blacklist service

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-22 08:32:55 +00:00
theli
3e0516446b *) new soap function to get the current queue status
*) new junit testclass to test soap statusService
*) refactoring of admin service (usage of constants instead of strings)
*) libraries upgraded to newer version + adding missing dependency

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-21 15:11:01 +00:00
theli
92f774edd1 *) Better charset encoding detection
*) New testclass for charset encoding detection tests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 07:02:18 +00:00
theli
eedb898c45 *) adding date parsing test routine to determine if we have a date-parsing bug
See: http://www.yacy-forum.de/viewtopic.php?t=3007

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 05:50:08 +00:00
theli
07d9309b95 *) Adding YaCy Version Parsing testclass by bost
See: http://www.yacy-forum.de/viewtopic.php?t=2717

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 05:36:46 +00:00