Commit Graph

194 Commits

Author SHA1 Message Date
orbiter
2e3186189b fix for mediawikiIndex surrogate producer + added concurrency
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5880 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-25 21:52:21 +00:00
orbiter
1b9e532c87 some concurrency for wikipedia dump reader
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5855 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 17:43:27 +00:00
orbiter
16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format, do the following:
- download a wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion; open a command shell, move to the yacy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

This generates a series of files in DATA/SURROGATES/in.

If YaCy is running (it may run concurrently), it fetches all new dumps from the surrogate-in directory. The export process is transaction-safe, which means YaCy will not start reading a dump before it has been completely written.
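One common way to get this transaction-safe behavior is to write each surrogate file under a temporary name and rename it only once it is complete, so a concurrent reader never sees a half-written file. The sketch below is an assumption, not the exporter's actual code: the class name `AtomicSurrogateExport` and the `.tmp` suffix are hypothetical, and it presumes the reader ignores `.tmp` files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: publish a surrogate file so readers never see it half-written. */
final class AtomicSurrogateExport {
    private AtomicSurrogateExport() {}

    /**
     * Writes the content to "<name>.tmp" inside dir, then renames it to "<name>".
     * A reader that skips "*.tmp" files only ever sees complete files.
     */
    static Path writeAtomically(Path dir, String name, String content) {
        try {
            Path tmp = dir.resolve(name + ".tmp");
            Path target = dir.resolve(name);
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            // ATOMIC_MOVE makes the rename all-or-nothing (or fails if the file system cannot do that)
            return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Renaming within the same directory keeps source and target on one file system, which is what makes an atomic move possible.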


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
orbiter
0b2c98edc9 some more work on the wikipedia-dump exporter (not finished yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 15:19:32 +00:00
f1ori
d93a2a6552 * ignore whitespace so that signatures can be copied and pasted more easily
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5828 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:52:42 +00:00
orbiter
fbcbcc5bdb export of yacy document objects as dublin core record in xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5826 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 14:20:12 +00:00
f1ori
44daec7936 * introduce signatures to autoupdate
as long as no public keys are set for the update locations,
  no signatures are checked
* wiki-article follows...
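The two signature commits (skip the check when no public key is configured; ignore whitespace in pasted signatures) can be illustrated with the standard `java.security.Signature` API. This is a hedged sketch: the algorithm `SHA256withRSA` and the class name `UpdateSignatureCheck` are assumptions, since the commits do not say which algorithm or class YaCy uses.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Hypothetical sketch of optional signature checking for update downloads. */
final class UpdateSignatureCheck {
    private UpdateSignatureCheck() {}

    // Assumed algorithm; not taken from the commit
    private static final String ALGORITHM = "SHA256withRSA";

    /** Signs data and returns the signature as Base64 text. */
    static String sign(byte[] data, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance(ALGORITHM);
        s.initSign(key);
        s.update(data);
        return Base64.getEncoder().encodeToString(s.sign());
    }

    /** Returns true if no key is configured (nothing to check) or the signature matches. */
    static boolean verify(byte[] data, String base64Signature, PublicKey key)
            throws GeneralSecurityException {
        if (key == null) return true; // no public key for this update location: skip the check
        // strip all whitespace so signatures pasted with line breaks still decode
        String cleaned = base64Signature.replaceAll("\\s+", "");
        Signature s = Signature.getInstance(ALGORITHM);
        s.initVerify(key);
        s.update(data);
        return s.verify(Base64.getDecoder().decode(cleaned));
    }
}
```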


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5822 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-17 09:58:06 +00:00
orbiter
8a24350036 - fix for join method with new generalized RWI data structure (caused by latest commit)
- added more functions to mediawiki parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 10:26:24 +00:00
orbiter
d4d87d90c4 - extended experimental wikipedia dump parser
- removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5790 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-09 14:55:20 +00:00
orbiter
c08f9b36a4 refactoring of wiki parser.
This was done to prepare the wiki parser for use as a parser for wikipedia dumps, which will be used for performance tests (to avoid crawling)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5785 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-08 15:28:45 +00:00
orbiter
9da69d6b68 - better selection of files to be merged
- fix for getChannel().close(), which works on windows but not on macs and linux

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 16:49:02 +00:00
orbiter
d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards.
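On Windows a file cannot be deleted while any handle to it is still open. A minimal sketch of the usual Java remedy, assuming a try-with-resources block around the `RandomAccessFile`; closing the file also closes the channel returned by `getChannel()`. The class `ClosingFileAccess` is a hypothetical name, not a YaCy class.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: read from a file and always release the handle before returning. */
final class ClosingFileAccess {
    private ClosingFileAccess() {}

    /** Reads up to n bytes from the start of the file. */
    static byte[] readHead(Path file, int n) {
        // try-with-resources closes the RandomAccessFile, and with it the FileChannel
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            FileChannel ch = raf.getChannel();
            ByteBuffer buf = ByteBuffer.allocate((int) Math.min(n, ch.size()));
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the buffer is full or EOF is reached
            }
            return buf.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because no handle survives `readHead`, a following `Files.delete(file)` does not fail with a sharing violation on Windows.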
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-31 12:42:12 +00:00
orbiter
96eaecda3e - added migration class to go from index collections to the index cell data structure.
- added better control over file deletion, because this sometimes fails, especially on windows

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-30 15:31:25 +00:00
f1ori
c545fcb9fa * add class to handle keys and signatures
* fix bug in serverCharBuffer
* add build-target to sign tar.gz (run ant dist sign)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5665 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 13:29:50 +00:00
lotus
39a177649b * added upnp listener for devices that do not respond to discovery but advertise themselves
* moved package

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5659 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-28 14:36:23 +00:00
orbiter
c12bb8a6d0 - refactoring of the http client
- added a protection against memory leaks for the access tracker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5621 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-19 16:24:46 +00:00
orbiter
62505bb3cb more bugfixes as recommended by findbugs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5619 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-17 09:12:47 +00:00
lotus
4aad461100 added UPnP support
YaCy can now automatically forward ports on home routers
off by default

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5609 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-14 13:12:08 +00:00
lotus
e8ae2599fd * some refactoring/moves to consoleInterface
* added possibility to find maximum possible heap size
you can get it via getWin32MaxHeap.bat
this may cause high system load
moreover, the limit found does not guarantee stable startups, since it depends on the system configuration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-07 11:53:48 +00:00
f1ori
76cdc59789 * added some conversions to and from UTF-8
* this might fix problems on windows systems
  (like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 12:12:07 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- separated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
orbiter
e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
- refactoring
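The commit does not describe how the fast fingerprint is computed. One common approach, shown here only as an assumed sketch, hashes the file length plus the first and last blocks instead of the whole file; `FastFingerprint` and the 16 KiB sample size are hypothetical. This trades accuracy for speed: files that differ only in an unsampled middle region get the same fingerprint.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a sampled (not full-content) file fingerprint. */
final class FastFingerprint {
    private FastFingerprint() {}

    private static final int CHUNK = 16 * 1024; // assumed sample size per region

    /** MD5 over (file length, first CHUNK bytes, last CHUNK bytes), as 32 hex digits. */
    static String fingerprint(File file) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            long len = raf.length();
            md.update(ByteBuffer.allocate(Long.BYTES).putLong(len).array());
            byte[] buf = new byte[(int) Math.min(CHUNK, len)];
            raf.readFully(buf);            // head of the file
            md.update(buf);
            raf.seek(len - buf.length);    // tail (overlaps the head for small files)
            raf.readFully(buf);
            md.update(buf);
            return String.format("%032x", new BigInteger(1, md.digest()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```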

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:22:13 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
b0f2003792 fast database initialization and fast start-up of yacy:
- applied knowledge about concurrent file stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader is now concurrent to the index generation
- changed also some initialization processes to avoid some pauses during initialization
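Running the file reader concurrently with index generation is a classic producer/consumer arrangement. The following is an assumed sketch using a bounded `BlockingQueue` and an end-of-stream marker; `ConcurrentLoader` is a hypothetical name, and the actual EcoTable code may be structured quite differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch: one thread reads the file while the caller builds the index. */
final class ConcurrentLoader {
    private ConcurrentLoader() {}

    // End-of-stream marker compared by identity; readLine() never returns this instance
    private static final String EOF = new String("EOF");

    static void load(Reader source, Consumer<String> indexer) throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // bounded: caps memory use
        AtomicReference<IOException> failure = new AtomicReference<>();
        Thread reader = new Thread(() -> {
            try (BufferedReader br = new BufferedReader(source)) {
                for (String line; (line = br.readLine()) != null; ) queue.put(line);
            } catch (IOException e) {
                failure.set(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                putQuietly(queue, EOF); // always release the consumer loop below
            }
        }, "file-reader");
        reader.setDaemon(true); // do not keep the JVM alive if the indexer fails
        reader.start();
        // the calling thread does the index work while the reader thread keeps loading
        for (String line; (line = queue.take()) != EOF; ) indexer.accept(line);
        reader.join();
        if (failure.get() != null) throw failure.get();
    }

    private static void putQuietly(BlockingQueue<String> q, String s) {
        while (true) {
            try { q.put(s); return; } catch (InterruptedException ignored) { /* retry */ }
        }
    }
}
```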

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
orbiter
0ca4bc7b79 - added reader and visualization for mediawiki-export files:
files exported from mediawiki using the xml schema according to
http://www.mediawiki.org/xml/export-0.3/
can be processed to be viewed in a YaCy servlet.
To access such a file, place it into
DATA/HTCACHE/mediawiki/
e.g. the export from the German wikipedia would be:
DATA/HTCACHE/mediawiki/wikipedia.de.xml
This file can then be accessed using the URL
http://localhost:8080/mediawiki_p.html?dump=wikipedia.de.xml&title=YaCy
if this is done the first time, an index file is created
(for this case: more than 4 million lines must be written, this takes about 15 minutes)
Then try the same url again.

- also improved the md5 computation speed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-20 18:31:52 +00:00
orbiter
6941bf42b1 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-20 14:07:09 +00:00
danielr
f095137238 - respecting httpdMaxBusySessions (refusing new connections if limit is hit)
- comments in serverBusyThread converted to JavaDoc
- better debug output for npe-case in diskUsage
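Refusing new connections once `httpdMaxBusySessions` is reached can be sketched with a non-blocking `Semaphore.tryAcquire()`. This is an illustration under assumptions: the class `SessionLimiter` and its method are hypothetical, and only the configuration name comes from the commit.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: cap concurrent sessions and refuse (not queue) the excess. */
final class SessionLimiter {
    private final Semaphore slots;

    SessionLimiter(int maxBusySessions) {
        this.slots = new Semaphore(maxBusySessions);
    }

    /** Runs the session if a slot is free; returns false if the connection was refused. */
    boolean runSession(Runnable session) {
        if (!slots.tryAcquire()) return false; // limit hit: refuse immediately
        try {
            session.run();
            return true;
        } finally {
            slots.release(); // free the slot even if the session throws
        }
    }
}
```

Using `tryAcquire()` without a timeout means an overloaded server answers at once instead of piling up waiting threads.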


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 10:53:32 +00:00
orbiter
8ba33f104e fix for npe
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5269 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:59:53 +00:00
lotus
9d50bfd0b3 fix for npe: http://forum.yacy-websuche.de/viewtopic.php?p=10562
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 09:09:53 +00:00
lotus
fe2792e9ce use accept-language header instead of user agent for language detection
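Taking the wanted language from the `Accept-Language` header means choosing the entry with the highest quality value. A minimal sketch, assuming standard `q=` weights and validating the primary subtag against the JDK's ISO 639 list (`Locale.getISOLanguages()`); the class `AcceptLanguage` is hypothetical, not YaCy's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: pick the preferred primary language from an Accept-Language header. */
final class AcceptLanguage {
    private AcceptLanguage() {}

    // two-letter ISO 639 codes known to the JDK
    private static final Set<String> ISO_639 = new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /** Returns the valid language with the highest q-value, or the fallback. */
    static String preferred(String header, String fallback) {
        if (header == null) return fallback;
        String best = fallback;
        double bestQ = -1;
        for (String part : header.split(",")) {
            String[] fields = part.trim().split(";");
            String lang = fields[0].trim().toLowerCase(Locale.ROOT);
            int dash = lang.indexOf('-');
            if (dash > 0) lang = lang.substring(0, dash); // "de-DE" -> "de"
            double q = 1.0;                              // default weight per the header syntax
            for (int i = 1; i < fields.length; i++) {
                String f = fields[i].trim();
                if (f.startsWith("q=")) {
                    try { q = Double.parseDouble(f.substring(2)); } catch (NumberFormatException e) { q = 0; }
                }
            }
            // q=0 means "not acceptable"; "*" and unknown tags are skipped
            if (q > 0 && q > bestQ && ISO_639.contains(lang)) { best = lang; bestQ = q; }
        }
        return best;
    }
}
```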
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5235 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-01 17:47:11 +00:00
orbiter
00c1535f84 added ranking and evaluation of language type in a search
the wanted language is taken from the browser user-agent string

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5192 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-21 00:04:42 +00:00
orbiter
bfcf9b7aa3 - added language detection using metadata from documents: html and odt documents provide this information
- metadata and results from statistical analysis are compared and result is printed out as debug lines
- added ranking profile for wanted language
- added class with ISO 639 table, a list of all valid language codes that will be used for the language identification

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5187 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 22:19:11 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to final variables)
- removed some unused imports

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- slightly improved performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if they do not use the enclosing instance
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
50ef5c406f - refactoring of robots parser (removed opaque Objects[] result vector)
- added Allow-component to robots result object
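Replacing an opaque `Object[]` result with a typed object that also carries the Allow rules could look like the sketch below. The class `RobotsEntry` and its methods are hypothetical; the matching rule (longest prefix wins, Allow wins ties, empty Disallow matches nothing) follows RFC 9309 rather than anything stated in the commit, and wildcards are not handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical typed robots.txt result instead of an opaque Object[]. */
final class RobotsEntry {
    private final List<String> allowPaths;
    private final List<String> denyPaths;

    RobotsEntry(List<String> allowPaths, List<String> denyPaths) {
        this.allowPaths = Collections.unmodifiableList(new ArrayList<>(allowPaths));
        this.denyPaths = Collections.unmodifiableList(new ArrayList<>(denyPaths));
    }

    List<String> allowPaths() { return allowPaths; }
    List<String> denyPaths() { return denyPaths; }

    /** Longest matching prefix wins; Allow wins a tie; no match means allowed. */
    boolean isAllowed(String path) {
        return longestPrefix(allowPaths, path) >= longestPrefix(denyPaths, path);
    }

    private static int longestPrefix(List<String> prefixes, String path) {
        int best = -1;
        for (String p : prefixes) {
            // an empty rule ("Disallow:") matches nothing
            if (!p.isEmpty() && path.startsWith(p)) best = Math.max(best, p.length());
        }
        return best;
    }
}
```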

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5016 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 11:54:37 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
lotus
5488543b8f disabled disk usage logpoints
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4979 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 07:30:50 +00:00
orbiter
7052f2f61f - added copyright header to ResourceObserver
- commented/removed some code to eliminate code warnings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:40:45 +00:00
orbiter
1400cdc91e - refactoring of resourceObserver (moved it to crawler)
- partial redesign of diskUsage: somewhat more functional behavior, fewer side effects, better error handling
- the resourceObserver can now show an error message if the diskUsage is 'out of order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4973 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 00:03:37 +00:00
f1ori
b6301a54fa * added class ListDirs to provide generic listing of directories in system directories and jar-files
* yacy runs, when classes are in a jar-file (->build-jar ant-target)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4971 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-06 14:11:40 +00:00
orbiter
e81be7d4f2 added many missing user-agent declarations for yacy http client connections.
the most important fix was the addition of the yacybot user-agent for robots.txt loading,
because web masters look for that access to see if the crawler behaves correctly.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4968 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-04 11:03:03 +00:00
orbiter
69aac0d74c modified the diskUsage class regarding the following two aspects:
1. The usage of and dependency on the plasmaSwitchboard was common in the past, but this was
a bad mistake. The classes should be independent from the switchboard to support a better abstraction. Therefore the object was removed. The parameters from the switchboard are computed outside and then handed over.
2. The class is considered tightly connected to hardware resources. Classes that handle data which cannot be replicated, because replicating it would require replicating the hardware, should not support dynamic object allocation but should be coded as a collection of private static methods. Therefore all class objects have been transformed into private static objects.
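The two design rules above (no switchboard dependency, only static members for a hardware-bound resource) can be illustrated with a non-instantiable class whose caller passes in the configured limit. This is a sketch under assumptions: `DiskUsageSketch` and `hasEnoughSpace` are hypothetical names, while `File.getUsableSpace()` is a standard JDK method.

```java
import java.io.File;

/** Hypothetical non-instantiable utility: disk state is only reachable through static methods. */
final class DiskUsageSketch {
    private DiskUsageSketch() {
        throw new AssertionError("no instances");
    }

    /**
     * The caller reads the configured minimum (e.g. from the switchboard) and hands it over,
     * so this class does not depend on the switchboard at all.
     */
    static boolean hasEnoughSpace(File path, long minFreeBytes) {
        long usable = path.getUsableSpace(); // 0 if the path does not name a partition
        return usable >= minFreeBytes;
    }
}
```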

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-30 21:47:53 +00:00
danielr
f7f9ceb967 diskUsage: replaced blocking sleep with semaphore
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-26 12:05:12 +00:00
danielr
63eadfdf84 fixed unlimited FileSizeLimit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-24 19:11:27 +00:00
det
609aaf0df3 rework of the windows part
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4943 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 12:13:06 +00:00
det
1a4f26ba30 exclude HTDOCS from recursive scan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 10:03:49 +00:00
det
6c07e894d9 add needed sleep
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4941 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-20 09:53:23 +00:00
danielr
6b7e873962 resourceObserver refactoring and some synchronisation for console output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4939 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-19 12:40:44 +00:00
danielr
68c38c2d34 - WatchCrawler shows status without JavaScript
- Performance can be scaled + DHT-profile
- names for pool-threads
- some small refactorings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 10:24:58 +00:00
det
c0dfe49743 also exclude collection.0028.commons and RANKING at startup check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-12 15:17:01 +00:00
det
11656741f1 exclude LOCALE and RELEASE at startup check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4917 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-12 11:25:25 +00:00
det
0727bb1e63 rework of console message handling; added debugging output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4914 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 18:43:12 +00:00
orbiter
f5ef7f222e - fixed a bug in parser (directory paths had not been recognized)
- no access check when a search is local only, without snippet fetch
- added comment and status message in resourceObserver (this takes very long at startup time!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4911 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-11 09:54:58 +00:00
lotus
ed24eab737 small fix for windows in resource observer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4909 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-10 19:19:54 +00:00
det
6afeb535cd another bugfix for the windows drive check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 12:51:07 +00:00
det
b416af7568 bugfix for the windows drive check
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 20:38:09 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
det
f597185026 Initial import of the resource observer framework
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4892 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 13:10:21 +00:00
orbiter
03438ee977 added missing implementation of network-path reference
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4834 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-22 00:08:14 +00:00
lotus
4a48717017 * automatic update for windows
please disable before release because 2nd update fails at the moment
and commandline handling has to be improved for windows
* update via new unTar class
please review stream and exception handling because I'm fairly new to Java
maybe it can be done concurrently
* updated windows startscripts to values from yacy.init

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4832 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-21 15:23:56 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
danielr
7a35126e91 restored the http timeouts from the old httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4670 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-09 11:02:14 +00:00
danielr
d96e2badc7 - fixed POST in proxy
- prepared http connection tracking
- refactoring (mainly moving StreamTools to serverFileUtils)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4668 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 21:17:40 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
3e44293f07 - fixed a problem with thread pools in row collection
- added a line-viewing feature in threaddump

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 14:21:58 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
orbiter
4fdf695064 - fixed a bug in remote search that prevented any results from being generated (!)
- added a great number of printStackTrace and new exceptions that shall be used to find the cause
  for a bug in yacy client-server communication which causes the interruption of data transfer
  which then causes the parser bug for the seed strings.
- tried to fix the communication bug on server-side (copy functions)
Be aware that the log may be full of errors and bugs - there should not be more bugs but there is more to see


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 23:12:43 +00:00
orbiter
83860507c9 - added punycode class from gnu idn library
- added parser for international domains in yacyURL
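The commit imports a punycode class from the GNU IDN library. Since Java 6 the JDK ships the same conversion as `java.net.IDN`, which is used here instead as a swapped-in technique; `IdnHost` is a hypothetical wrapper, not YaCy's yacyURL code.

```java
import java.net.IDN;

/** Hypothetical wrapper converting host names between Unicode and punycode (ACE) form. */
final class IdnHost {
    private IdnHost() {}

    /** Unicode host -> ASCII form usable for DNS lookups. */
    static String toAscii(String host) {
        return IDN.toASCII(host);
    }

    /** ASCII (xn--...) host -> Unicode form for display. */
    static String toUnicode(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }
}
```

For example, `toAscii("b\u00fccher.de")` yields `xn--bcher-kva.de`, while plain ASCII hosts pass through unchanged.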

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4514 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-26 22:18:40 +00:00
low012
652086159a *) Replaced System.err.println() with a logging function. Left the System.err.println() calls as comments so the changes can be quickly reverted, since gzip is an application with its own main method and Orbiter may want to keep it this way.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4505 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-24 19:56:19 +00:00
orbiter
36b898ca7a - tested successfully z-presentation of yacy seed encoding
- added alternative switch that takes shortest representation as yacy seed string encoding
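Choosing the shortest representation means encoding the seed string both plainly and compressed and keeping whichever is shorter, with a marker so the decoder knows which one it got. The sketch assumes gzip plus Base64 for the compressed form; the prefixes `p|` and `z|` and the class `ShortestEncoding` are hypothetical, not YaCy's actual seed format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: emit whichever of plain or compressed text is shorter. */
final class ShortestEncoding {
    private ShortestEncoding() {}

    // Hypothetical markers telling the decoder which representation was chosen
    static final String PLAIN = "p|";
    static final String ZIPPED = "z|";

    static String encode(String s) {
        String plain = PLAIN + s;
        String zipped = ZIPPED + Base64.getEncoder().encodeToString(gzip(s.getBytes(StandardCharsets.UTF_8)));
        return zipped.length() < plain.length() ? zipped : plain;
    }

    private static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // read only after close() has flushed the gzip trailer
    }
}
```

Short strings stay plain because the gzip header alone outweighs any saving; long, repetitive seed strings usually come out compressed.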

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4491 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-17 12:36:43 +00:00
orbiter
4e70dff8cf more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 21:09:56 +00:00
borg-0300
3cab85158c update for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:41:45 +00:00
borg-0300
53367d941a more information (BASE64)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:24:24 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first step to resolve missing generics definitions (about 400 of more than 4100 'missing' warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to add number-type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step, which is unnecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
fuchsi
3352474dd8 Remove grouping separator in Network.xml (yacystats will work without it) and format a few more numbers.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4163 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 13:29:11 +00:00
fuchsi
06e6a1ff62 Add a generalized Formatter class yFormatter inspired by http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
At the current state it allows formatting of numbers (integer + decimal types) for output according to the Locale derived from the language setting in yacy. Network.(html|xml) and Status.html have been changed to use it for now (TODO: should be integrated into other servlets as well to reduce duplicate formatting code).
NOTE: For now the output format for Network.xml simulates the old behaviour which is wrong (it uses '.' as decimal and grouping separator), to make sure external scripts like the yacystats.de one won't break with this update.
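Locale-dependent number formatting of the kind yFormatter provides is available in the JDK through `java.text.NumberFormat`. A minimal sketch, assuming the locale has already been derived from the language setting; `NumberFormatting` is a hypothetical name and does not reproduce yFormatter's API.

```java
import java.text.NumberFormat;
import java.util.Locale;

/** Hypothetical sketch of locale-aware number output. */
final class NumberFormatting {
    private NumberFormatting() {}

    /** Formats with the grouping and decimal separators of the given locale. */
    static String format(Number n, Locale locale) {
        // NumberFormat instances are not thread-safe, so a fresh one is created per call
        return NumberFormat.getInstance(locale).format(n);
    }
}
```

With `Locale.US` the value 1234567 becomes `1,234,567`, with `Locale.GERMANY` it becomes `1.234.567`; machine-read outputs such as Network.xml should avoid grouping altogether, as the follow-up commit does.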

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-16 02:12:31 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed that a major amount of time is wasted computing url hashes. The computation does an intranet check, which needs a DNS lookup. As a result, each urlhash computation needed 100-200 milliseconds, which delayed remote searches by at least 1 second more than necessary. The solution is to attach a URL hash to the URL data structure, because then the url hash value can be filled in after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Parts that had been abandoned were removed during this change to avoid unnecessary maintenance of unused code.
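Attaching the hash to the URL object amounts to caching it: compute it at most once, or take it directly from the database record. In this sketch, `HashedURL`, the demo counter and the MD5/Base64 placeholder hash are all assumptions; YaCy's real url hash differs and, per the commit, involves a DNS-based intranet check.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical URL object that carries its own hash so it is computed at most once. */
final class HashedURL {
    static final AtomicInteger COMPUTATIONS = new AtomicInteger(); // demo counter only

    private final String url;
    private String hash; // null until computed, or pre-filled from the database

    HashedURL(String url) { this(url, null); }

    /** Used when the hash is already stored next to the URL in the database. */
    HashedURL(String url, String storedHash) {
        this.url = url;
        this.hash = storedHash;
    }

    synchronized String hash() {
        if (hash == null) hash = computeHash(url); // the expensive path runs only once
        return hash;
    }

    // Placeholder digest, NOT YaCy's url hash algorithm
    private static String computeHash(String url) {
        COMPUTATIONS.incrementAndGet();
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(url.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(d).substring(0, 12);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```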

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
c48c9735cb bugfix for property table loader (did not exclude comment lines)
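The fix concerns skipping comment lines while loading a property table. A minimal sketch, assuming `key=value` lines and `#` comments; `PropertyTableLoader` is a hypothetical name, and for standard `.properties` files `java.util.Properties.load` already handles comments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical key=value table loader that ignores blank and comment lines. */
final class PropertyTableLoader {
    private PropertyTableLoader() {}

    static Map<String, String> load(Reader source) {
        Map<String, String> table = new LinkedHashMap<>(); // keeps file order
        try (BufferedReader br = new BufferedReader(source)) {
            for (String line; (line = br.readLine()) != null; ) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue; // skip comments
                int eq = line.indexOf('=');
                if (eq <= 0) continue; // not a key=value line
                table.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }
}
```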
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4000 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-23 13:38:17 +00:00
low012
d95a06e2e4 *) cosmetics: fixed a typo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3964 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-14 12:49:11 +00:00
orbiter
36a37f758b fix for oom exception during release download
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=101&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3950 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-03 22:55:47 +00:00
orbiter
ad1e4aa88e added selection of audio, video, image and application resources
to the search procedure. This function currently cannot be used through the
search interface, only through remote search.

added accumulation of search attributes to enable the audio, video,
image and application selection.

fixed a problem with external URL representation generation


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-01 16:21:17 +00:00
orbiter
fb7902aa68 fix for http://www.yacy-forum.de/viewtopic.php?p=26142#26142
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-01 11:01:56 +00:00
orbiter
76fceb9997 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-09 16:32:34 +00:00
orbiter
b79e06615d - added new LURL.Entry class for next database migration
- refactoring of affected classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 22:25:07 +00:00
orbiter
bcf2b800b4 applied UTF-8 encoding parameter to yacy-internal protocol communication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 13:35:38 +00:00
orbiter
c40fca08a2 fixed bad handling of string separation
you can now use a new encoding attribute to create strings from byte arrays

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 10:21:14 +00:00
orbiter
5a40ea7866 refactoring of wget string list generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-02 09:59:20 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
borg-0300
42173462f5 rename cutUrlText to shortenURLString;
other little things;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2635 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-19 20:47:45 +00:00
borg-0300
af1d89e381 check url == null added;
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-19 20:12:26 +00:00
orbiter
6ad471ef96 * applied many compiler warning recommendations
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 19:49:31 +00:00