Commit Graph

51 Commits

Author SHA1 Message Date
reger
10a6346056 clean-up test cases to work with current source
2013-12-01 03:38:58 +01:00
reger
71d2655c02 downgrade to Jetty 8 to ensure support of JRE 1.6
- introduce a YaCyHttp interface to modularize/separate the HTTP server
- adjust the Jetty version specific implementation part (in package net.yacy.http)
     - putting the version specific code in classes starting with Jetty8xxxx
     - moved existing Jetty9xxx implementation into a test class (to keep the code)
- adjust build to the changed jars
- make use of the introduced YaCyHttpServer interface in related htroot servlets

- adjust other test cases/classes
2013-10-09 00:40:48 +02:00
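The separation described in this commit (version-neutral interface, version-specific implementation classes, servlets that depend only on the interface) can be sketched as follows. `HttpServerFacade`, `StubJettyServer` and `ServerStatus` are hypothetical names invented for illustration; they are not YaCy's actual `YaCyHttpServer` interface or its Jetty 8 classes, and the stub only tracks state instead of wrapping a real Jetty server.

```java
// Hypothetical version-neutral interface (not YaCy's actual YaCyHttpServer API)
interface HttpServerFacade {
    void start() throws Exception;
    void stop() throws Exception;
    boolean isRunning();
    int port();
}

// Stand-in for a version-specific implementation (e.g. one wrapping Jetty 8).
// A real implementation would delegate to the Jetty server object; this stub only tracks state.
final class StubJettyServer implements HttpServerFacade {
    private final int port;
    private boolean running = false;

    StubJettyServer(int port) {
        this.port = port;
    }

    public void start() { running = true; }
    public void stop() { running = false; }
    public boolean isRunning() { return running; }
    public int port() { return port; }
}

// Code such as a status servlet depends only on the interface,
// so switching the Jetty version does not require changes here.
final class ServerStatus {
    static String describe(HttpServerFacade server) {
        return server.isRunning() ? "running on port " + server.port() : "stopped";
    }
}
```

With this split, moving between Jetty versions means adding or swapping an implementation class, while callers like `ServerStatus` stay unchanged.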
reger
fe87fb638a adjust test/ParserTest to dc_description data type
2013-09-10 20:05:10 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
reger
97ab5b90e8 - odt & ooxml (office document) parser correction to add content to fulltext index
- adjust JUnit yacyVersionTest & ParserTest
- update yacyVersion.combined2prettyVersion to the default 4-digit minor version
2013-05-20 01:50:09 +02:00
reger
160ce568b3 move the testing method SolrServlet.main to test, making the inclusion of jetty*.jar in the distribution and classpath obsolete
- move jetty*.jar to the test library
- move SolrServlet.main as-is to test; also add a JUnit test simulating main
  - add build.xml cleanup for the test/DATA directory created by EmbeddedSolrConnectorTest
- adjust some test compile errors
2013-02-03 22:32:38 +01:00
orbiter
d2ea250d99 refactoring:
- moved many classes from de.anomic to net.yacy
- made more sub-packages for search classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-25 16:59:06 +00:00
orbiter
49e5ca579f added new configuration property "crawler.embedLinksAsDocuments". If this is switched on (this is the default now), all embedded image, audio and video links from all parsed documents are added to the search index as individual documents. This will increase the search index size dramatically but will also enable us to create a much faster image, audio and video search. If the flag is switched on, the index entries are also stored in a Solr index, if that is also enabled.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-09-07 10:08:57 +00:00
orbiter
cb1f49d0f2 replaced all 'new String' calls that used the default encoding (charset missing) or the UTF-8 encoding name with a String generation method that uses a predefined Charset constant for UTF-8. This avoids a cache lookup of the Charset object, which requires hashing the String 'UTF-8'.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-03-07 20:36:40 +00:00
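The technique in this commit, resolving the UTF-8 `Charset` once and passing the object instead of the name, can be sketched as follows. The class `UTF8Strings` and its method names are hypothetical; they only illustrate the pattern. `new String(byte[], Charset)` and `String.getBytes(Charset)` are standard since Java 6; on Java 7 and later, `java.nio.charset.StandardCharsets.UTF_8` provides the same constant.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name UTF8Strings is illustrative only
final class UTF8Strings {
    // Resolved once when the class is loaded; later conversions reuse this object
    // instead of looking the charset up by its name on every call.
    static final Charset UTF8_CHARSET = Charset.forName("UTF-8");

    // Replaces calls like new String(bytes) (platform default) or new String(bytes, "UTF-8")
    static String newUTF8String(byte[] bytes) {
        return new String(bytes, UTF8_CHARSET);
    }

    // The matching encode direction, also without a by-name lookup
    static byte[] getUTF8Bytes(String s) {
        return s.getBytes(UTF8_CHARSET);
    }
}
```

A side benefit of the `Charset` overloads is that they do not declare the checked `UnsupportedEncodingException` that the `String charsetName` overloads do.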
f1ori
01cb3bbaec * fix patchCharsetEncoding-test (patchCharsetEncoding now returns null for null input)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2011-02-02 15:28:41 +00:00
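A null-tolerant charset-name cleanup of the kind this test change implies might look like the sketch below. `CharsetNames.normalizeCharsetName` is hypothetical and is not YaCy's `patchCharsetEncoding`; the quote stripping and the use of Java's canonical names are assumptions chosen for illustration. Only the "null in, null out" contract comes from the commit message.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper for illustration; not YaCy's actual patchCharsetEncoding.
final class CharsetNames {
    // Returns null for null input instead of throwing (the contract the commit describes);
    // otherwise returns Java's canonical charset name when the charset is supported,
    // or the cleaned-up input when it is not.
    static String normalizeCharsetName(String encoding) {
        if (encoding == null) {
            return null;
        }
        String e = encoding.trim();
        // strip surrounding quotes, as in: charset="utf-8"
        if (e.length() >= 2 && e.startsWith("\"") && e.endsWith("\"")) {
            e = e.substring(1, e.length() - 1).trim();
        }
        try {
            if (Charset.isSupported(e)) {
                return Charset.forName(e).name(); // e.g. the alias "utf8" becomes "UTF-8"
            }
        } catch (IllegalCharsetNameException ex) {
            // syntactically invalid name: fall through and return it unchanged
        }
        return e;
    }
}
```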
orbiter
3197ca42ed preparations to move the HTCache into cora:
- move the header framework classes to cora
- move the ARC caching classes to cora
- refactoring of code to call these classes from cora

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 12:32:02 +00:00
orbiter
844f158686 - removed dependencies in the header framework:
  moved http date methods from DateFormatter to HeaderFramework
  changed logging to log4j
- added ftp load access to MultiProtocolURI
- ensured termination of RSS feed iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-23 11:41:12 +00:00
orbiter
b6fb239e74 redesign of parser interface:
some file types are containers for several files. These containers had been parsed in such a way that the set of resulting parsed content was merged into one single document before parsing. With this parser infrastructure it was not possible to parse document containers that contain individual files. One example is an RSS file, where the RSS messages can be treated as individual documents with their own URL reference. Another example is a surrogate file, which was treated with a special operation outside of the parser infrastructure.
This commit introduces a redesigned parser interface and a new abstract parser implementation. The new parser interface now has only one entry point and always returns a set of parsed documents. For single documents, the parser method returns a set containing one document.
To comply with the new interface, the zip and tar parsers have also been completely redesigned. All parsers are now much simpler and cleaner in their structure. The switchboard operations have been extended to operate on sets of parsed files instead of single parsed files.
Additionally, parsing of jar manifest files has been added.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-06-29 19:20:45 +00:00
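The redesigned contract described above (one entry point that always returns a set of documents, with single files yielding a one-element set) can be sketched in Java as follows. `ParsedDoc`, `SetParser`, `PlainTextParser` and the tab-separated `LineContainerParser` format are hypothetical stand-ins invented for illustration, not YaCy's actual parser API. The container parser plays the role an RSS parser would, emitting one document per item with its own URL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical document type; not YaCy's actual Document class
final class ParsedDoc {
    final String url;
    final String text;

    ParsedDoc(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

// Single entry point: every parser returns a collection of documents
interface SetParser {
    List<ParsedDoc> parse(String url, String content);
}

// A plain file yields exactly one document
final class PlainTextParser implements SetParser {
    public List<ParsedDoc> parse(String url, String content) {
        return Collections.singletonList(new ParsedDoc(url, content.trim()));
    }
}

// A toy container format: each line "url<TAB>text" is an individual document
// with its own URL, similar in spirit to the items of an RSS feed.
final class LineContainerParser implements SetParser {
    public List<ParsedDoc> parse(String url, String content) {
        List<ParsedDoc> docs = new ArrayList<ParsedDoc>();
        for (String line : content.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // skip empty or malformed lines
            }
            docs.add(new ParsedDoc(line.substring(0, tab), line.substring(tab + 1).trim()));
        }
        return docs;
    }
}
```

Because both parsers share one return type, calling code (the switchboard, in the commit's terms) can loop over the result without special-casing containers.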
orbiter
11639aef35 - added new protocol loader for 'file'-type URLs
- it is now possible to crawl the local file system with an intranet peer
- redesign of URL handling
- refactoring: created the LGPL-licensed package cora ('content retrieval API'), which may be used by external applications without YaCy core elements because it has no dependencies on other parts of YaCy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-05-25 12:54:57 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they now depend only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
f1ori
34c71b22e8 fix and enable parser unit tests (tested with Eclipse)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 09:33:18 +00:00
orbiter
ce8dc575ca refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-11 00:12:19 +00:00
orbiter
bea3b99aff moved table and util classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-10 01:14:19 +00:00
orbiter
ce7924d712 better concurrency for rwi entry parsing during search processing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6273 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-27 22:06:52 +00:00
orbiter
72ac5bd80f refactoring of the search process.
This is the beginning of some architecture changes that will hopefully bring more stability, speed and transparency to the search process.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6260 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-24 15:24:02 +00:00
f1ori
d515bc11e2 added ooxmlparser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-08 15:34:41 +00:00
f1ori
8c1b02af04 * fix warning in testcase
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6255 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-08-08 15:18:02 +00:00
f1ori
67da20647f * add new odf parser based on sax-xml-parser
* remove odf_utils-jar
* test metadata in ParserTest


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6231 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-18 15:04:34 +00:00
f1ori
06557485f5 * added parser unittest!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6229 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 22:03:34 +00:00
f1ori
69dfd03985 reactivate unit tests
* fix old tests
* add build target "ant test"


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6228 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-17 20:58:21 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a large amount of time was wasted computing URL hashes. The computation performs an intranet check, which requires a DNS lookup. As a result, each URL hash computation took 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution to this problem is to attach a URL hash to the URL data structure, so that the URL hash value can be filled in after the URL is retrieved from the database. The redesign of the URL/URL-hash management required a major redesign of many parts of the software. Some parts that had been abandoned were removed during this change to avoid maintaining unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
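A minimal sketch of attaching the hash to the URL object: the hash is either supplied directly when the URL is read back from a database, or computed lazily at most once, so the expensive path is skipped whenever possible. `HashedURL` is hypothetical, and the MD5-based value is only a stand-in for YaCy's real URL hash; the intranet check with its DNS lookup is not modeled. The class is also not thread-safe.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class for illustration; not YaCy's actual URL class
final class HashedURL {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    static int computations = 0; // counts expensive hash computations (for illustration)

    private final String url;
    private String hash; // null until computed or supplied

    HashedURL(String url) {
        this(url, null);
    }

    // When the URL is loaded from a database, the stored hash is attached directly
    HashedURL(String url, String storedHash) {
        this.url = url;
        this.hash = storedHash;
    }

    // Computes the hash only on first use; later calls return the cached value
    String hash() {
        if (hash == null) {
            hash = computeHash(url);
        }
        return hash;
    }

    // Stand-in for the expensive computation (12 hex chars of an MD5 digest)
    private static String computeHash(String url) {
        computations++;
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(url.getBytes(UTF8));
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 6; i++) {
                sb.append(String.format("%02x", digest[i] & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform is required to provide MD5
        }
    }
}
```

In this model, a search that reads thousands of URLs from the index pays nothing for their hashes, because each `HashedURL` is constructed with the stored value.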
theli
2399ed817c *) robots.txt parser now extracts the sitemap-URL (will be used later)
*) some javadoc added
*) junit testclass for robots.txt parser added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 15:42:38 +00:00
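How a robots.txt parser might pick out the sitemap URL can be sketched as follows. This assumes the `Sitemap:` directive of the sitemaps.org protocol (matched case-insensitively, one URL per line) and returns only the first one. `RobotsSitemapExtractor` is a hypothetical class, not YaCy's robots.txt parser.

```java
// Hypothetical helper for illustration; not YaCy's actual robots.txt parser
final class RobotsSitemapExtractor {
    // Returns the first "Sitemap:" URL in the robots.txt content, or null if there is none
    static String extractSitemap(String robotsTxt) {
        for (String raw : robotsTxt.split("\r?\n")) {
            String line = raw;
            int comment = line.indexOf('#');
            if (comment >= 0) {
                line = line.substring(0, comment); // drop comments (would also cut a URL fragment)
            }
            line = line.trim();
            // case-insensitive match of the 8-character directive name "sitemap:"
            if (line.regionMatches(true, 0, "sitemap:", 0, 8)) {
                String value = line.substring(8).trim();
                if (value.length() > 0) {
                    return value;
                }
            }
        }
        return null;
    }
}
```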
theli
1b7fda12ee *) SOAP: separate function to get the active/passive/potential peer list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3526 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-28 07:34:44 +00:00
karlchenofhell
a1d68fe092 - use .class rather than Class.forName for classes in class-path
- added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt
- fixed minor bugs in Blog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 22:52:22 +00:00
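The difference behind the first item can be shown in a few lines: a class literal is resolved and checked at compile time, while `Class.forName` looks the class up by name at runtime and declares the checked `ClassNotFoundException`. `ClassLookup` is a hypothetical demo class.

```java
// Hypothetical demo class contrasting the two lookup styles
final class ClassLookup {
    // Compile-time reference: a typo is a compile error and no exception handling is needed
    static Class<?> byLiteral() {
        return java.util.ArrayList.class;
    }

    // Runtime lookup by name: a typo only surfaces as ClassNotFoundException when executed
    static Class<?> byName(String name) throws ClassNotFoundException {
        return Class.forName(name);
    }
}
```

For classes that are known to be on the classpath, the literal form also keeps the reference visible to the compiler and to refactoring tools.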
orbiter
d25caa07bf redesigned some parts of http authentication
added another access check for peer hops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 19:46:50 +00:00
theli
eb20ec3837 *) soap-service: adding function to check if a specific url is blacklisted
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3014 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-26 08:53:43 +00:00
theli
5c0669429e *) soap: adding function to query the peer list
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-17 19:26:55 +00:00
theli
203f2bde9a *) adding function to query the pause/resume state of the crawling queues
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-13 06:25:15 +00:00
theli
6d3a130878 *) bugfix needed because of db refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-13 06:13:15 +00:00
theli
892b9f2fc4 *) additional soap function to query peer status
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 16:46:32 +00:00
theli
bd3710a974 *) new xml template to view peer profile as xml
*) bugfix for wrong profile display (some fields were displayed twice)
*) new soap functions to get and set peer profile

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 16:26:25 +00:00
theli
d1afe1ce6b *) adding xml template to get the message list as xml
*) Bugfix in client stub jar generation (too many files were added)
*) new soap service to manage peer messages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 15:18:33 +00:00
theli
f37e2041e8 *) adding soap function to import yacy bookmarks from xml or html (transferred via soap attachments)
*) soapHandler: code cleanup for service deployment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2915 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 09:56:39 +00:00
theli
4a3ec63e34 *) new soap service to manage yacy bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2906 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-04 13:47:43 +00:00
theli
5e57e0814d *) new soap function to display log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-03 14:39:48 +00:00
theli
c7bea4addb *) soap api
   - adding function to get and set message forwarding
   - adding new testclass


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2878 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-29 08:40:48 +00:00
theli
532c23b5c7 *) soap handler
   - better error handling
   - adding support for outgoing transfer- and content-encoding
   - avoid holding outgoing messages in memory before sending them

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2872 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 12:31:48 +00:00
theli
7299dc30e3 *) new soap service to manage the yacy file-share
   - upload / download files (as soap attachment)
   - create directory
   - receive directory listing
   - delete files / directories
   - change file comment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2857 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 12:15:56 +00:00
theli
9e8942a064 *) adding method to import a blacklist from a file
   - file transfer is done via soap attachments (see BlacklistServiceTest for details)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 06:18:19 +00:00
theli
d38ef0493d *) be more tolerant of missing ports in URLs
   "http://yacy.net:/" is now interpreted as "http://yacy.net/"
   See: http://www.yacy-forum.de/viewtopic.php?p=27102

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2852 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 05:22:54 +00:00
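The tolerance for an empty port can be illustrated with a small normalization step applied before the URL is parsed. `PortTolerance.dropEmptyPort` is a hypothetical helper, not YaCy's URL code. As a simplification it treats everything between `://` and the next `/` as the authority, so a query string without a path (such as `http://host:?q`) is not handled.

```java
// Hypothetical helper for illustration; not YaCy's actual URL parsing code
final class PortTolerance {
    // Removes an empty port such as the ':' in "http://yacy.net:/".
    // Non-empty ports like ":8090" are left untouched.
    static String dropEmptyPort(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            return url; // no authority part to inspect
        }
        int hostStart = schemeEnd + 3;
        int pathStart = url.indexOf('/', hostStart);
        if (pathStart < 0) {
            pathStart = url.length(); // no path: the authority runs to the end
        }
        String authority = url.substring(hostStart, pathStart);
        if (authority.endsWith(":")) {
            authority = authority.substring(0, authority.length() - 1);
        }
        return url.substring(0, hostStart) + authority + url.substring(pathStart);
    }
}
```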
theli
cfe54fedc7 *) Bugfix for resolveBackpath problem with trailing /..
*) Junit testclass for resolveBackpath testing 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 05:07:34 +00:00
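For illustration, here is a minimal sketch of dot-segment resolution that handles the trailing `/..` case, in the spirit of the remove_dot_segments algorithm of RFC 3986 (section 5.2.4). `BackpathResolver.resolveDotSegments` is a hypothetical stand-in, not YaCy's `resolveBackpath`. It assumes an absolute path starting with `/` and, as a simplification, collapses empty segments such as `//`, which RFC 3986 would keep.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration; not YaCy's actual resolveBackpath implementation
final class BackpathResolver {
    // Resolves "." and ".." segments of an absolute path. When the last segment is
    // ".", ".." or empty, the result keeps a trailing slash: "/a/b/.." -> "/a/".
    static String resolveDotSegments(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("absolute path expected: " + path);
        }
        Deque<String> stack = new ArrayDeque<String>();
        String[] segments = path.split("/", -1); // limit -1 keeps a trailing empty segment
        boolean trailingSlash = false;
        for (int i = 1; i < segments.length; i++) { // segments[0] is "" before the leading '/'
            String s = segments[i];
            boolean last = (i == segments.length - 1);
            if (s.equals("..")) {
                if (!stack.isEmpty()) {
                    stack.removeLast(); // ".." above the root is ignored
                }
                trailingSlash = last;
            } else if (s.equals(".") || s.isEmpty()) {
                trailingSlash = last; // simplification: empty inner segments are dropped
            } else {
                stack.addLast(s);
                trailingSlash = false;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String s : stack) {
            sb.append('/').append(s);
        }
        if (sb.length() == 0 || trailingSlash) {
            sb.append('/');
        }
        return sb.toString();
    }
}
```

For the inputs in the test below (`/a/b/..`, `/a/b/../c`, `/..`, `/a/./b/`), the results match what RFC 3986 dot-segment removal produces.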
theli
ac13fa763a *) bugfix for blacklist entry removal (the blacklist was not informed about the removal)
*) adding new soap service class for blacklist management
*) new junit class to test soap blacklist service

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-22 08:32:55 +00:00
theli
3e0516446b *) new soap function to get the current queue status
*) new junit testclass to test soap statusService
*) refactoring of admin service (usage of constants instead of strings)
*) libraries upgraded to newer versions + adding missing dependency

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-21 15:11:01 +00:00
theli
92f774edd1 *) Better charset encoding detection
*) New testclass for charset encoding detection tests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 07:02:18 +00:00