Commit Graph

218 Commits

Author SHA1 Message Date
lotus
7f868ca3c2 resource observer: support for yacyroot\DATA on an NTFS hardlink (Windows)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 10:02:20 +00:00
orbiter
1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:25:46 +00:00
orbiter
154bbc3364 code cleanup: call of static methods directly to the class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:01:35 +00:00
orbiter
222850414e simplification of the code: removed unused classes, methods and variables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 09:27:46 +00:00
apfelmaennchen
a10c8022d1 DidYouMean:
- limit the number of consumer threads to available CPUs
- added some javadoc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6144 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-27 07:23:34 +00:00
orbiter
fd31a3616a - more logging in server process
- fix for bas ascii in comment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-16 15:10:59 +00:00
orbiter
ce1adf9955 serialized all logging using concurrency:
high-performance search query situations as seen in yacy-metager integration showed deadlock situation caused by synchronization effects inside of sun.java code. It appears that the logger is not completely safe against deadlock situations in concurrent calls of the logger. One possible solution would be a outside-synchronization with 'synchronized' statements, but that would further apply blocking on all high-efficient methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off log entries with a concurrent log writer. This also disconnects IO operations from logging, which can also cause IO operation when a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also inserts the concurrency methods to realize non-blocking logging.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 21:19:54 +00:00
apfelmaennchen
39779e4796 DidYouMean: as I moved to only 8 consumer and 4 producer threads, I removed poison pills as it does not make sense anymore - threads are interrupted directly. Having a consumer thread per test case just didn't make sense either (see svn 6070) due to the massive overhead.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 16:31:31 +00:00
apfelmaennchen
c3c4dd0933 DidYouMean - changed to much simpler LinkedBlockingQueue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 15:25:57 +00:00
apfelmaennchen
01ac1b5d7e - blocking queue implementation of DidYouMean
- timeout ist set to 500ms

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 11:53:09 +00:00
orbiter
b8bb1bb364 join with a timeout does not cause that the corresponding thread is stopped after the time-out. It does only cause that the waiting is stopped. Here we need additionally a signal to the thread to stop after we finished waiting.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:54:52 +00:00
orbiter
b69f22e9ca mistake in last commit: computation of loops in ReversingTwoConsecutiveLetters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:37:51 +00:00
orbiter
3130334932 - start first with threads that run more loops
- join first with threads that run less loops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:34:16 +00:00
apfelmaennchen
6cde7ebf16 DidYouMean
- without I/O intensive sorting by count
- but with multiple threads

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:16:14 +00:00
orbiter
7c4d1d471c hand-over of more specific object
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 10:22:25 +00:00
apfelmaennchen
09acfa66d1 - improved "did you mean"
- added &meanCount= to query string
- &meanCount=0 ==> no suggestion, no performance loss
- sorting suggestions by sb.indexSegment.termIndex().count()

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 06:20:05 +00:00
apfelmaennchen
da6ce37f7b - fixed encoding problem
- added limit to 10 suggestions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-12 21:36:26 +00:00
apfelmaennchen
54a48b4184 - added "did you mean" to search page
- currently works for single word queries only!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-12 20:36:03 +00:00
orbiter
bead0006da replaced tmp file extensions by prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 18:09:58 +00:00
orbiter
89aeb318d3 enhanced the wikimedia dump import process
enhanced the wiki parser and condenser speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 10:36:13 +00:00
orbiter
5fb77116c6 added a submenu to index administration to import a wikimedia dump (i.e. a dump from wikipedia) into the YaCy index: see
http://localhost:8080/IndexImportWikimedia_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 07:54:10 +00:00
orbiter
c097531e3d added a catch Exception to all thread to check if any of them silently dies without any other notification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 06:31:35 +00:00
orbiter
9c6ac43f66 fixes for wiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-30 22:03:35 +00:00
orbiter
d079d6dfdb small changes in surrogate reader, wiki code and portal test
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 20:30:43 +00:00
orbiter
2e3186189b fix for mediawikiIndex surrogate producer + added concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:52:21 +00:00
orbiter
1b9e532c87 some concurrency for wikipedia dump reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 17:43:27 +00:00
orbiter
16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a wikipedia dump, i.e. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion; open a command shell, move to the yacy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

this generates a series of files to DATA/SURROGATES/in

if YaCy is running (it may run concurrently), it fetches all new dumps in the surrogate-in directory. The export process is transaction-save, that means YaCy will not start reading a dump while the dump is not completely finished.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
orbiter
0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 15:19:32 +00:00
f1ori
d93a2a6552 * ignore whitespaces so you can copy&paste signatures better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:52:42 +00:00
orbiter
fbcbcc5bdb export of yacy document objects as dublin core record in xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:20:12 +00:00
f1ori
44daec7936 * introduce signatures to autoupdate
as long as there aren't publickeys for the updatelocations set,
  no signatures are checked
* wiki-article follows...


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 09:58:06 +00:00
orbiter
8a24350036 - fix for join method with new generalized RWI data structure (caused by latest commit)
- added more functions to mediawiki parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 10:26:24 +00:00
orbiter
d4d87d90c4 - extended experimental wikipedia dump parser
- removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-09 14:55:20 +00:00
orbiter
c08f9b36a4 refactoring of wiki parser.
This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-08 15:28:45 +00:00
orbiter
9da69d6b68 - better selection of files to be merged
- fix for getChannel().close(), which works on windows but not on macs and linux

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 16:49:02 +00:00
orbiter
d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 12:42:12 +00:00
orbiter
96eaecda3e - added migration class to go from index collections to the index cell data structure.
- added better control over file deletion, because this sometimes fails, especially on windows

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-30 15:31:25 +00:00
f1ori
c545fcb9fa * add class to handle keys and signatures
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 13:29:50 +00:00
lotus
39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
* moved package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-28 14:36:23 +00:00
orbiter
c12bb8a6d0 - refactoring of the http client
- added a protection against memory leaks for the access tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-19 16:24:46 +00:00
orbiter
62505bb3cb more bugfixes as recommendet by findbugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-17 09:12:47 +00:00
lotus
4aad461100 added UPnP support
YaCy can now automatically forward ports on home routers
off by default

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-14 13:12:08 +00:00
lotus
e8ae2599fd * some refactoring/moves to consoleInterface
* added possibility to find maximum possible heap size
you can get it via getWin32MaxHeap.bat
this may cause high system load
moreover the found limit is no guarantee for stable startups since it depends on system configuration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-07 11:53:48 +00:00
f1ori
76cdc59789 * added some convertions to and from UTF-8
* this might fix problems on windows systems
  (like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 12:12:07 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- seperated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
orbiter
e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
- refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:22:13 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
b0f2003792 fast database initialization and fast start.up of yacy:
- applied knowledge about concurrent files stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader is now concurrent to the index generation
- changed also some initialization processes to avoid some pauses during initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
orbiter
0ca4bc7b79 - added reader and visualization for mediawiki-export files:
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To acces such a file, place it into
DATA/HTCACHE/mediawiki/
i.e. the export from german wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
if this is done the first time, an index file is created
(for this case: more than 4 million lines must be written, this takes about 15 minutes)
Then try the same url again.

- enhanced also the md5 computation speed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 18:31:52 +00:00
orbiter
6941bf42b1 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-20 14:07:09 +00:00
danielr
f095137238 - respecting httpdMaxBusySessions (refusing new connections if limit is hit)
- comments in serverBusyThread converted to JavaDoc
- better debug output for npe-case in diskUsage


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 10:53:32 +00:00
orbiter
8ba33f104e fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:59:53 +00:00
lotus
9d50bfd0b3 fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 09:09:53 +00:00
lotus
fe2792e9ce use accept-language header instead of user agent for language detection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 17:47:11 +00:00
orbiter
00c1535f84 added ranking and evaluation of language type in a search
the wanted language is taken from the browser user-agent string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 00:04:42 +00:00
orbiter
bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information
- metadata and results from statistical analysis are compared and result is printed out as debug lines
- added ranking profile for wanted language
- added class with ISO 639 table, a list of all valid country codes that will be used for the language identification

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 22:19:11 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to finals variables)
- removed some unused import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
50ef5c406f - refactoring of robots parser (removed opaque Objects[] result vector)
- added Allow-component to robots result object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 11:54:37 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
lotus
5488543b8f disabled disk usage logpoints
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 07:30:50 +00:00
orbiter
7052f2f61f - added copyright header of ResourceObserver
- commented/removed some code to eliminate code warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:40:45 +00:00
orbiter
1400cdc91e - refactoring of resourceObserver (moved it to crawler)
- partly redesign of diskUsage: little bit more functional behavior, less side effects, better error case handling
- the resourceObserver can now show a error message if the diskUsage is 'out of order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:03:37 +00:00
f1ori
b6301a54fa * added class ListDirs to provoid generic listing of directories in systemdirectories and jar-files
* yacy runs, when classes are in a jar-file (->build-jar ant-target)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 14:11:40 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
orbiter
69aac0d74c modified the diskUsage class regarding the following two aspects:
1. The usage and dependency of the plasmaSwitchboad was used many times in the past but this was
a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over.
2. the class is considered as a tightly connected to hardware resources. Classes which handle data that cannot be replicated because it would need to replicate hadware should not support dynamic object allocation, but should be coded as collection of private static methods. Therefore all class objects had been transformed into static private objects.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 21:47:53 +00:00
danielr
f7f9ceb967 diskUsage: replaced blocking sleep with semaphore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-26 12:05:12 +00:00
danielr
63eadfdf84 fixed unlimited FileSizeLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-24 19:11:27 +00:00
det
609aaf0df3 rework of the windows part
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 12:13:06 +00:00
det
1a4f26ba30 exclude HTDOCS from recursiv scan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 10:03:49 +00:00
det
6c07e894d9 add needed sleep
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 09:53:23 +00:00
danielr
6b7e873962 resourceObserver refactoring and some synchronisation for console output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4939 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-19 12:40:44 +00:00
danielr
68c38c2d34 - WatchCrawler shows status without JavaScript
- Performance can be scaled + DHT-profile
- names for pool-threads
- some small refactorings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 10:24:58 +00:00
det
c0dfe49743 also exclude collection.0028.commons and RANKING at startup check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-12 15:17:01 +00:00
det
11656741f1 exclude LOCALE and RELEASE at startup check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-12 11:25:25 +00:00
det
0727bb1e63 rework of console message handling; add of debugging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 18:43:12 +00:00
orbiter
f5ef7f222e - fixed a bug in parser (directory paths had not been recognized)
- no access check when a search is made only local without snippet fetch
- added comment and status message in resourceObserver (this takes very long at startup time!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 09:54:58 +00:00
lotus
ed24eab737 small fix for windows in resource observer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-10 19:19:54 +00:00
det
6afeb535cd another bugfix for the windows drive check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 12:51:07 +00:00
det
b416af7568 bugfix for the windows drive check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 20:38:09 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
det
f597185026 Initial import of the resource observer framework
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 13:10:21 +00:00
orbiter
03438ee977 added missing implementation of network-path reference
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-22 00:08:14 +00:00
lotus
4a48717017 * automatic update for windows
pleas disable before release because 2nd update fails at the moment
and commandline handling has to be improved for windows
* update via new unTar class
please review stream- and exceptionhandling because I'm fairly new to Java
maybe it can be done concurrent
* updated windows startscripts to values from yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-21 15:23:56 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
danielr
7a35126e91 http timeouts von alten httpc wieder gesetzt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 11:02:14 +00:00
danielr
d96e2badc7 - fixed POST in proxy
- prepared http connection tracking
- refactoring (mainly moving StreamTools to serverFileUtils)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:17:40 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
3e44293f07 - fixed a problem with thread pools in row collection
- added a line-viewing feature in threaddump	

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 14:21:58 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
orbiter
4fdf695064 - fixed a bug in remote search that prevented that any results had been generated (!)
- added a great number of printStackTrace and new exceptions that shall be used to find the cause
  for a bug in yacy client-server communication which causes the interruption of data transfer
  which then causes the parser bug for the seed strings.
- tried to fix the communication bug on server-side (copy functions)
Be aware that the log may be full of errors and bugs - there should not be more bugs but there is more to see


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 23:12:43 +00:00
orbiter
83860507c9 - added punycode class from gnu idn library
- added parser for international domains in yacyURL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 22:18:40 +00:00