6128 Commits

Author SHA1 Message Date
orbiter
19f31bb043 - moved OAI-PMH source list file from SETTINGS to DICTIONARIES/harvesting
- added convenience method for loading of files from the web in LoaderDispatcher

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6455 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 16:15:28 +00:00
orbiter
2889b9426e missing code for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 12:03:19 +00:00
orbiter
b6a8887ff5 better handling of running sessions without explicit hashtable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:59:15 +00:00
orbiter
1dc7ea986a added a dynamic keep-alive time-out for http server sessions:
if there are many concurrent server sessions, the timeout is decreased.
This should avoid a situation where the clean-up thread is too
late to stop running http sessions that should be terminated
before the maximum number of server sessions is reached.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-04 11:01:09 +00:00
low012
e77c906673 *) minor changes mainly in comments
*) added svn:keyword settings for several files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 22:47:53 +00:00
low012
f1740edbf8 *) added script to change memory settings, password and port (experimental, don't blame me if it messes up your configuration)
*) minor change in Digest class, added option in main method, might not be optimal yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6450 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 22:28:29 +00:00
orbiter
11f7da06ed - fixes to csv parser
- automatic OAI-PMH import by just clicking on one link from the provided resource list

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6449 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 21:18:19 +00:00
orbiter
9b6762ec2e - added a csv "comma separated values" parser to parse OAI-PMH sources from
http://roar.eprints.org/index.php?action=csv
- integrated the csv parser into the crawler's parser list
- added an extension to the OAI-PMH import function to download and show the roar csv file using the csv parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6448 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 20:10:59 +00:00
orbiter
0f63de8236 - it is now possible to start several OAI imports concurrently
(still not possible to start them with one single request, that will be next)
- added a monitor for all running and finished OAI imports (with a little bit of animation..)



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6447 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-03 16:15:22 +00:00
orbiter
176e334aa4 fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-02 19:23:05 +00:00
orbiter
2fa6bf440b workflow update to OAI-PMH importer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6445 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-02 18:19:30 +00:00
orbiter
b0b7a4f9a5 - added function to OAI-PMH reader that can pull all records from a server using an evaluation of the resumption token to get URL to retrieve remaining records
- added monitoring for retrieved records

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6444 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-02 11:53:14 +00:00
orbiter
350d13e153 very first working version of oai-pmh importer: if given the right url, the importer can read and index listRecord xml files and calculate the right resumptionURL which is then given as next default start point for the importer url input.
no automatic harvesting by now, this will be done later

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6443 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-02 00:14:14 +00:00
lotus
58616d99e4 patch for yacy disk usage detection on lvm host
by Michael S.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-11-01 08:54:16 +00:00
lotus
79251e6f60 configurable disk space hardlimit for dht
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-31 19:12:53 +00:00
orbiter
a0e891c63d - some redesign in UI menu structure to make room for new 'Content Integration' main menu containing import servlets for Wikimedia Dumps, phpbb3 forum imports and OAI-PMH imports
- extended the OAI-PMH test applet and integrated it into the menu. It still does not import OAI-PMH records, but shows that it is able to read and parse this data
- some redesign in ZURL storage: refactoring of access methods, better concurrency, less synchronization
- added a limitation to the LURL metadata database table cache to 20 million entries: this cache was until now not limited and only limited by the available RAM which may have caused a memory-leak-like behavior.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-31 11:58:06 +00:00
orbiter
8a1046feaa less maximum file size, too many problems with larger size
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-30 20:21:45 +00:00
orbiter
4240785f20 added anti-alias function for line drawing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-30 15:58:36 +00:00
orbiter
30f108f97d added stub of oai-pmh importer (not working yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-30 15:58:04 +00:00
orbiter
77c99e500f added more control over memory allocation
should avoid some of the OOMs

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-27 15:25:48 +00:00
orbiter
52470d0de4 - fix for xls parser
- fix for image parser
- temporary integration of images as document types in the crawler and indexer for testing of the image parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6435 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-22 22:38:04 +00:00
orbiter
5e8038ac4d - refactoring of blacklists
- refactoring of event origin encoding


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 20:14:30 +00:00
orbiter
26fafd85a5 - more refactoring
- fixed problem with parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-21 15:12:34 +00:00
orbiter
e48f3dfb1e added documentation for new yacy package structure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-20 12:05:36 +00:00
orbiter
3528b970d6 - refactoring
- added new experimental (not-yet-working) image parser
- added new test image

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 22:34:44 +00:00
lotus
6414ac9ecf fix for debian init script
http://forum.yacy-websuche.de/viewtopic.php?t=2418

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-19 17:05:15 +00:00
lotus
63e489c5f7 removed win9x scripts because the latest jre available for these systems is v1.3
http://www.java.com/en/download/help/win95.xml

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 09:45:34 +00:00
orbiter
cde1611919 updated junit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 02:52:09 +00:00
orbiter
a8ce192f63 - shifted main classes to new package net.yacy
- fixed some bugs in last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 01:38:07 +00:00
orbiter
b79f4f062f refactoring of yacy documents and parsers: they depend now only on the kelondro classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-18 00:53:43 +00:00
hermens
0fd9540866 Configuration of HTTPDProxyHandler logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-17 14:04:18 +00:00
low012
519c3619ff *) minor changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6424 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-17 00:32:07 +00:00
low012
f5656b2ae1 *) Made sure that only files with appropriate file endings are listed as skin or language files.
*) Introduced protection against directory traversal attacks in configuration servlets for skin and language configuration. Files can only be deleted if they are contained in a list of files which has been read by the servlet first.


Until now it was possible to delete any data on a system YaCy is running on that can be deleted by the user whose account was used to start YaCy. Most of the time a user of YaCy is also the owner of the machine the peer is running on, but this might not always be the case, and not even the owner of the machine should be able to use YaCy as a replacement for "rm" or "del".

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-17 00:26:14 +00:00
low012
3434ca381f *) grrr
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6422 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 22:17:21 +00:00
low012
ae42c51cf7 *) Skin names and language names are displayed in alphabetical order in dropdown menu now.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6421 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 22:16:36 +00:00
suessthomas
56a5bd090d Small fixes to header.template for more XHTML compatibility.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 20:31:06 +00:00
f1ori
34c71b22e8 fix and enable parser unit tests (tested with eclipse)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-16 09:33:18 +00:00
orbiter
99683f5f11 small changes to green color and round corners
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 19:39:11 +00:00
orbiter
76bca8cffd show interactive search without menu
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 13:26:14 +00:00
orbiter
3d5eeb842a new default skin 'pdblue'
The old default skin named 'default' is renamed to 'classic-blue'.
All users will keep their current default skin named default, but YaCy will also copy classic-blue to the skin folder.
For all new peers, the new skin pdblue is used.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 12:59:44 +00:00
orbiter
cee7a05ff2 - de-serialized the pdf parser
- added fail callback for file indexer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 10:47:29 +00:00
orbiter
9db928ce53 replaced fontbox 0.7.3 with fontbox 0.8.0
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-15 09:51:16 +00:00
orbiter
c2272785c7 - fix for xlsx and pptx parsing
- less exception logging for swf parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6413 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 19:15:38 +00:00
suessthomas
afae2a0bee Small changes to the Yacy Skins.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 19:11:52 +00:00
lotus
0975b1b493 update for apache poi library
possibly solves http://forum.yacy-websuche.de/viewtopic.php?p=17736#p17736

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6411 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 15:24:53 +00:00
orbiter
c864901087 - moved httpd.mime to defaults path
- some documentation fixes
- adopted a default setting for the search window: moves css setting to base.css
- some enhancements for the DocumentIndex class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6410 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-14 13:29:09 +00:00
low012
8829ec5f18 *) made sure that &nbsp; is replaced with a space and not just deleted in CharacterCoding.java
*) added annotations and made minor changes to serverObjects.java
*) set subversion properties for several files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-13 20:57:56 +00:00
orbiter
6c347a37eb more options for DocumentIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-13 08:43:02 +00:00
orbiter
6192205533 more final modifier
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-12 21:59:39 +00:00
orbiter
0f6b011e1a fix for new index location and better way to use own classes by reflection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-10-12 21:12:42 +00:00