Commit Graph

168 Commits

Author SHA1 Message Date
lotus
7f868ca3c2 resource observer: support for yacyroot\DATA on an NTFS hardlink (Windows)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-07-02 10:02:20 +00:00
orbiter
1f1399e5c5 extending visibility of objects and methods to avoid synthetic accessor methods and increase performance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6156 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:25:46 +00:00
orbiter
154bbc3364 code cleanup: call of static methods directly to the class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6155 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 13:01:35 +00:00
orbiter
222850414e simplification of the code: removed unused classes, methods and variables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6154 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-30 09:27:46 +00:00
apfelmaennchen
a10c8022d1 DidYouMean:
- limit the number of consumer threads to available CPUs
- added some javadoc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6144 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-27 07:23:34 +00:00
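
A minimal sketch of the idea behind this commit (capping consumer threads at the number of available CPUs). This is not the DidYouMean code itself: the class `ConsumerPoolSketch` and its methods are hypothetical names chosen for illustration, and the queue is filled before the consumers start so that an empty queue can serve as the stop condition.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not taken from the YaCy sources.
public class ConsumerPoolSketch {

    /** Number of consumer threads: one per available CPU, at least one. */
    public static int consumerCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    /** Queues `items` work items up front, drains them with consumerCount() threads. */
    public static int consumeAll(int items) {
        final BlockingQueue<Integer> queue = new LinkedBlockingQueue<Integer>();
        for (int i = 0; i < items; i++) {
            queue.add(i);
        }
        final AtomicInteger processed = new AtomicInteger();
        Thread[] consumers = new Thread[consumerCount()];
        for (int i = 0; i < consumers.length; i++) {
            consumers[i] = new Thread(() -> {
                // poll() returns null once the queue is empty, which ends this consumer
                while (queue.poll() != null) {
                    processed.incrementAndGet();
                }
            });
            consumers[i].start();
        }
        for (Thread t : consumers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for consumers", e);
            }
        }
        return processed.get();
    }
}
```

Each queued item is removed by exactly one consumer, so `consumeAll(n)` returns `n` regardless of how many CPUs the machine reports.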
orbiter
fd31a3616a - more logging in server process
- fix for bad ascii in comment

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6084 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-16 15:10:59 +00:00
orbiter
ce1adf9955 serialized all logging using concurrency:
High-performance search query situations, as seen in the yacy-metager integration, showed a deadlock caused by synchronization effects inside of sun.java code. It appears that the logger is not completely safe against deadlocks when it is called concurrently. One possible solution would be outside synchronization with 'synchronized' statements, but that would add further blocking to all high-efficiency methods that call the logger. It is much better to do a non-blocking hand-over of logging lines and work off the log entries with a concurrent log writer. This also decouples IO operations from the logging call, which can otherwise trigger IO when a log is written to a file. This commit not only moves the logger from kelondro to yacy.logging, it also adds the concurrency methods that realize non-blocking logging.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-15 21:19:54 +00:00
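
The non-blocking hand-over described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the yacy.logging implementation: the class `AsyncLogSketch`, the in-memory `sink` standing in for file IO, and the stop sentinel are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration, not taken from the YaCy sources.
public class AsyncLogSketch {
    // Sentinel object (compared by identity) that tells the writer to finish.
    private static final String STOP = new String("__STOP__");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final List<String> sink = new ArrayList<String>(); // stand-in for a log file
    private final Thread writer;

    public AsyncLogSketch() {
        writer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take(); // only the writer thread ever blocks
                    if (line == STOP) {
                        break;
                    }
                    sink.add(line); // the real IO would happen here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-writer");
        writer.setDaemon(true);
        writer.start();
    }

    /** Hand-over only: offer() on an unbounded queue returns without waiting. */
    public void log(String line) {
        queue.offer(line);
    }

    /** Flushes pending lines, stops the writer and returns what was written. */
    public List<String> close() {
        queue.offer(STOP);
        try {
            writer.join(); // join() makes the writer's additions to sink visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<String>(sink);
    }
}
```

Callers only touch the queue, so a slow disk delays the writer thread rather than the search threads that produce log lines.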
apfelmaennchen
39779e4796 DidYouMean: as I moved to only 8 consumer and 4 producer threads, I removed poison pills as it does not make sense anymore - threads are interrupted directly. Having a consumer thread per test case just didn't make sense either (see svn 6070) due to the massive overhead.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 16:31:31 +00:00
apfelmaennchen
c3c4dd0933 DidYouMean - changed to much simpler LinkedBlockingQueue
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 15:25:57 +00:00
apfelmaennchen
01ac1b5d7e - blocking queue implementation of DidYouMean
- timeout is set to 500ms

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-14 11:53:09 +00:00
orbiter
b8bb1bb364 join with a timeout does not stop the corresponding thread after the time-out; it only ends the waiting. We therefore additionally need to signal the thread to stop after we have finished waiting.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:54:52 +00:00
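
The behaviour this commit describes can be reproduced with a small sketch. The class `JoinTimeoutSketch` is hypothetical, and using `interrupt()` as the stop signal is an assumption; the commit does not say which signal the actual fix uses.

```java
// Hypothetical illustration, not taken from the YaCy sources.
public class JoinTimeoutSketch {

    /**
     * Returns two flags: whether the worker was still alive after the timed
     * join returned, and whether it was alive after the explicit stop signal.
     */
    public static boolean[] demo() {
        Thread worker = new Thread(() -> {
            try {
                // runs until somebody interrupts it
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                // interrupted while sleeping: fall through and terminate
            }
        });
        worker.start();
        try {
            worker.join(100); // stops waiting after 100 ms, does not stop the worker
            boolean aliveAfterTimedJoin = worker.isAlive();

            worker.interrupt(); // the additional stop signal
            worker.join();      // now wait until the worker has really ended
            boolean aliveAfterSignal = worker.isAlive();

            return new boolean[] { aliveAfterTimedJoin, aliveAfterSignal };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
    }
}
```

The first flag is true and the second false: the timed join gives up waiting while the worker keeps running, and only the interrupt followed by an untimed join ends it.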
orbiter
b69f22e9ca mistake in last commit: computation of loops in ReversingTwoConsecutiveLetters
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:37:51 +00:00
orbiter
3130334932 - start first with threads that run more loops
- join first with threads that run fewer loops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6067 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:34:16 +00:00
apfelmaennchen
6cde7ebf16 DidYouMean
- without I/O intensive sorting by count
- but with multiple threads

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6066 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 23:16:14 +00:00
orbiter
7c4d1d471c hand-over of more specific object
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 10:22:25 +00:00
apfelmaennchen
09acfa66d1 - improved "did you mean"
- added &meanCount= to query string
- &meanCount=0 ==> no suggestion, no performance loss
- sorting suggestions by sb.indexSegment.termIndex().count()

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6059 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-13 06:20:05 +00:00
apfelmaennchen
da6ce37f7b - fixed encoding problem
- added limit to 10 suggestions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-12 21:36:26 +00:00
apfelmaennchen
54a48b4184 - added "did you mean" to search page
- currently works for single word queries only!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-12 20:36:03 +00:00
orbiter
bead0006da replaced tmp file extensions by prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 18:09:58 +00:00
orbiter
89aeb318d3 enhanced the wikimedia dump import process
enhanced the wiki parser and condenser speed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5931 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 10:36:13 +00:00
orbiter
5fb77116c6 added a submenu to index administration to import a wikimedia dump (i.e. a dump from wikipedia) into the YaCy index: see
http://localhost:8080/IndexImportWikimedia_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 07:54:10 +00:00
orbiter
c097531e3d added a catch Exception to all threads to check if any of them silently dies without any other notification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 06:31:35 +00:00
orbiter
9c6ac43f66 fixes for wiki parser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5905 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-30 22:03:35 +00:00
orbiter
d079d6dfdb small changes in surrogate reader, wiki code and portal test
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5894 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 20:30:43 +00:00
orbiter
2e3186189b fix for mediawikiIndex surrogate producer + added concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:52:21 +00:00
orbiter
1b9e532c87 some concurrency for wikipedia dump reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 17:43:27 +00:00
orbiter
16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion; open a command shell, move to the yacy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

this generates a series of files to DATA/SURROGATES/in

if YaCy is running (it may run concurrently), it fetches all new dumps in the surrogate-in directory. The export process is transaction-safe, which means YaCy will not start reading a dump until the dump is completely finished.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
orbiter
0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 15:19:32 +00:00
f1ori
d93a2a6552 * ignore whitespaces so you can copy&paste signatures better
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:52:42 +00:00
orbiter
fbcbcc5bdb export of yacy document objects as dublin core record in xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:20:12 +00:00
f1ori
44daec7936 * introduce signatures to autoupdate
as long as there aren't publickeys for the updatelocations set,
  no signatures are checked
* wiki-article follows...


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 09:58:06 +00:00
orbiter
8a24350036 - fix for join method with new generalized RWI data structure (caused by latest commit)
- added more functions to mediawiki parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 10:26:24 +00:00
orbiter
d4d87d90c4 - extended experimental wikipedia dump parser
- removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-09 14:55:20 +00:00
orbiter
c08f9b36a4 refactoring of wiki parser.
This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-08 15:28:45 +00:00
orbiter
9da69d6b68 - better selection of files to be merged
- fix for getChannel().close(), which works on windows but not on macs and linux

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 16:49:02 +00:00
orbiter
d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 12:42:12 +00:00
orbiter
96eaecda3e - added migration class to go from index collections to the index cell data structure.
- added better control over file deletion, because this sometimes fails, especially on windows

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-30 15:31:25 +00:00
f1ori
c545fcb9fa * add class to handle keys and signatures
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 13:29:50 +00:00
lotus
39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
* moved package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-28 14:36:23 +00:00
orbiter
c12bb8a6d0 - refactoring of the http client
- added a protection against memory leaks for the access tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-19 16:24:46 +00:00
orbiter
62505bb3cb more bugfixes as recommended by findbugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-17 09:12:47 +00:00
lotus
4aad461100 added UPnP support
YaCy can now automatically forward ports on home routers
off by default

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-14 13:12:08 +00:00
lotus
e8ae2599fd * some refactoring/moves to consoleInterface
* added possibility to find maximum possible heap size
you can get it via getWin32MaxHeap.bat
this may cause high system load
moreover the found limit is no guarantee for stable startups since it depends on system configuration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-07 11:53:48 +00:00
f1ori
76cdc59789 * added some conversions to and from UTF-8
* this might fix problems on windows systems
  (like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 12:12:07 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- separated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00