Commit Graph

96 Commits

Author SHA1 Message Date
orbiter
efcd95dc37 simplification of (internal) query process / refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5671 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-06 15:53:20 +00:00
orbiter
aa44d9bad9 more refactoring of kelondro.text / deleted de.anomic.index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5664 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-02 11:04:13 +00:00
orbiter
76ef5f0f14 refactoring of index package: better names for the classes (to be continued)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-03-01 23:58:14 +00:00
orbiter
c25c334b75 replaced old DHT transmission method with new method. Many things have changed! some of them:
- after an index selection is made, the index is split into its vertical components
- from different index selections the split components can be accumulated before they are placed into the transmission queue
- each split chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as interface. The freeworld network has now a configuration
of two vertical partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times.
- the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before.
  With two partitions, the fraction will be one half; with sixteen partitions, 1/16 of the previous number of URLs.
BE CAREFUL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 00:06:59 +00:00
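A minimal sketch of the splitting step described in the commit above, not the code from de.anomic.yacy.dht. It assumes that a selected index is a list of URL hashes and that the vertical partition is derived from each hash; the class name, `partitionOf` and `split` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class VerticalPartitionSketch {

    // Hypothetical rule: derive the partition number from the URL hash.
    // floorMod keeps the result in [0, partitions) even for negative hash codes.
    static int partitionOf(String urlHash, int partitions) {
        return Math.floorMod(urlHash.hashCode(), partitions);
    }

    // Split one index selection into its vertical components,
    // one list of URL hashes per partition.
    static List<List<String>> split(List<String> urlHashes, int partitions) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (String hash : urlHashes) {
            parts.get(partitionOf(hash, partitions)).add(hash);
        }
        return parts;
    }
}
```

Each resulting list could then be handed to its own transmission thread, as the commit message describes.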
apfelmaennchen
35a5116606 - another update to Bookmarks.html
- only calculate tags and folders if display = 0

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5556 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 16:26:11 +00:00
apfelmaennchen
ab1a09ab95 small addition to last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 16:13:35 +00:00
apfelmaennchen
416d16e026 - partial work around for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1815#p12526
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5554 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-01 15:51:35 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- separated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
apfelmaennchen
3dc208fad0 bugfix: bookmarks can now handle folder names like /news and /newspaper without getting confused...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-11 19:39:51 +00:00
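A sketch of the kind of check that keeps /news and /newspaper apart. It assumes the confusion came from plain prefix matching on folder paths, which the commit message does not confirm; `isInFolder` is a hypothetical helper, not a method from the bookmarks code.

```java
final class FolderMatchSketch {

    // A bookmark folder belongs to `folder` if it is the same path
    // or a sub-path of it. Comparing against folder + "/" prevents
    // "/newspaper" from matching "/news".
    static boolean isInFolder(String bookmarkFolder, String folder) {
        return bookmarkFolder.equals(folder)
            || bookmarkFolder.startsWith(folder + "/");
    }
}
```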
lotus
6569cbbec1 npe fix: http://forum.yacy-websuche.de/viewtopic.php?t=1646
(break to avoid bad side effects)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5394 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-16 20:53:31 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
lotus
029e16b653 replaced some put(String, String) by putHTML(String, String) on serverObjects respond
in htroot/ root
didn't touch htroot/xml/
this should solve potential xss issues

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-19 11:45:11 +00:00
orbiter
536e77e8b7 modifications towards a single database operation to read/write http header and cached file at once:
- removed distinction between header file types for http and ftp; ftp is simulated by using http properties
- removed all old resourceInfo classes that handled this distinction
- introduced a new distinction between http request and http response objects
- unified new response objects with two other object types that had been introduced elsewhere
- changed all servlet call methods to use the new http request header object type
- divided static object keys for http header properties into request and response types
- refactoring here and there (a large number of type changes and many methods merged/moved)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5079 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-25 18:11:47 +00:00
apfelmaennchen
58d7e6f1a6 - some small, rather optical changes to bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-22 18:19:21 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- slightly improved performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
apfelmaennchen
e1574fe02e - added autoReCrawl folders to bookmarks (DATA/SETTINGS/autoReCrawl.conf)
- the serverBusyThread checks folders every 60 min. (==> autoReCrawl_idlesleep in yacy.conf)
- added option to create bookmarks from CrawlStart URL

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-04 20:43:36 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
orbiter
3ca98fee42 removed superfluous copyright statement
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5010 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 00:21:07 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
d2ba1fd2ab major step forward to network switching (target is easy switch to intranet or other networks .. and back)
This change is inspired by the need to see a network connected to the index it creates in an indexing team.
It is not possible to divide the network and the index. Therefore all control files for the network were moved to the network within the INDEX/<network-name> subfolder.
The remaining YACYDB is superfluous and can be deleted.
The yacyDB and yacyNews data structures are now part of plasmaWordIndex. Therefore all methods using static access to yacySeedDB had to be rewritten. A particular problem was the port forwarding methods, which were tightly mixed with seed construction. It was not possible to move the port forwarding functions to the place, meaning and usage of plasmaWordIndex. Therefore the port forwarding was deleted (I guess nobody used it and it can be simulated by methods outside of YaCy).
The mySeed.txt is automatically moved to the current network position. As a side effect, every network will create a different local seed file, which is ok, since the seed identifies the peer only against the network (it is the purpose of the seed hash to give a peer a location within the DHT).
No other functional change has been made. The next steps to enable network switching are:
- shift of crawler tables from PLASMADB into the network (crawls are also network-specific)
- possibly shift of plasmaWordIndex code into yacy package (index management is network-specific)
- servlet to switch networks 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-05 23:13:47 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
e024e3b9cf added new default profiles to distinguish snippet fetch for local and global search
the difference is that a local search will not cause a re-indexing of loaded pages

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4731 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 08:42:08 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
7f9f639d20 - refactoring and abstraction of index reference (urls) handling: blacklisting is part of reference filtering
- refactoring of word/phrase handling: word abstraction from condenser becomes part of index element handling
- removed unused code parts from condenser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4603 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 15:37:49 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
a8a5df4a51 - more Dublin Core naming of page metadata
- better presentation of result counters in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 21:58:30 +00:00
orbiter
9d693ee635 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 16:41:09 +00:00
apfelmaennchen
d3ce9ebffe added tooltips to folder tree
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-26 11:35:23 +00:00
apfelmaennchen
cb91cc5c7c fixed import Netscape Bookmarks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 19:23:15 +00:00
apfelmaennchen
aa53a46937 adjusted code for getFolderList() and cleanTagsString()
added input for folders in add/edit bookmark

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 20:14:31 +00:00
low012
41a3ff8ccc *) removed unused imports
*) some generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 00:10:15 +00:00
low012
c0fbab9cca *) leading, trailing and double commas are removed since they are unnecessary
*) trailing and double slashes in paths are removed; they are not only ugly, but also caused infinite loops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 23:59:15 +00:00
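The cleanup described in the commit above could look roughly like this. It is a sketch using regular expressions, not the committed code; `cleanTags` and `cleanFolderPath` are hypothetical names.

```java
final class BookmarkCleanupSketch {

    // Collapse runs of commas, then drop a leading or trailing comma.
    static String cleanTags(String tags) {
        return tags.replaceAll(",+", ",").replaceAll("^,|,$", "");
    }

    // Collapse runs of slashes, then drop a trailing slash
    // (the root path "/" is left alone).
    static String cleanFolderPath(String path) {
        String collapsed = path.replaceAll("/+", "/");
        if (collapsed.length() > 1 && collapsed.endsWith("/")) {
            return collapsed.substring(0, collapsed.length() - 1);
        }
        return collapsed;
    }
}
```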
low012
089faf1a00 *) added login link at bottom of page
*) empty tags will not be displayed any longer

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4372 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 23:14:57 +00:00
orbiter
634430c48a - more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:44:12 +00:00
apfelmaennchen
d288987a93 replaced isEmpty() with equals("")
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:19:34 +00:00
apfelmaennchen
a870ac32b8 reorganized code and added folders; this update might have overwritten latest changes by orbiter - sorry!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 19:39:15 +00:00
orbiter
efd0b8371a - added parsing of Dublin Core - compliant metadata (see RFC 5013 and ISO 15836) to html parser
- refactoring of plasmaParserDocument to use Dublin Core - compatible property names
- redesign of url handling in parser and condenser (less String-to-yacyURL conversion)
- more generics

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 11:51:43 +00:00
apfelmaennchen
b9dd597e97 - spaces following a comma are removed when tags are entered
- adjustments for the select box option "selected"
- restriction of the tag cloud based on the tag selection
- rounding of font-size to two decimal places

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 16:02:31 +00:00
daburna
214e37b6e3 added changes made by apfelmaennchen
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4307 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-07 20:01:53 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
fuchsi
3c30c2da75 more cleanup and API consistency changes, more to come...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-19 13:29:50 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
fuchsi
0e1738899f * Complete number localization and provide a more reasonable interface to serverObjects:
- put(key, value) methods are now used if a value added to the map should be kept as it is. Numbers are transformed (but not formatted) to an equivalent String representation.
- putASIS(...) have been removed, now done with simple put(...) (see above).
- putNum(...) can be used for number values which should be stored in a formatted way, either depending on the current locale setting for yacy (default) or in a "none" locale (see javadocs and setLocalize()).
- putHTML(...) escapes special characters into corresponding HTML entities ('<' => '&lt;'), which was done with put(...) before and so was called too often, because it is necessary only in very few cases. Additionally there is a "forXML" mode which only replaces < > & ".
In short: Use put(...) for almost everything, use putXY(...) if you need some special transformation of the value.
A few bugs have been fixed as well, and there should be a small performance improvement for complex pages with a lot of values.

* added additional Sum/Avg rows to access tracker pages, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=456
* removed duplicate code (mostly related to the big changes above).

TODO:
- make sure number formats work as expected _everywhere_; report overlooked cases at http://forum.yacy-websuche.de/viewtopic.php?f=5&t=437
- probably a good idea to add special putDate() methods as they are used in many pages and create duplicated formatting code + maybe some centralized handling for memory value formatting.
- further improve the speed of page creation for the WatchCrawler.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4178 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-24 21:38:19 +00:00
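To illustrate the "forXML" mode mentioned in the commit above, here is a sketch of an escaper that replaces only < > & and ". It is not the serverObjects implementation; the class and the method name `escapeForXML` are hypothetical.

```java
final class EscapeSketch {

    // Replace only the four characters named in the commit message;
    // everything else is copied through unchanged.
    static String escapeForXML(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A value stored with plain put(...) would skip this step and reach the template unchanged, which is why escaping is only needed where the value may contain markup.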
low012
e2f3268c13 *) removed double encoding (http://forum.yacy-websuche.de/viewtopic.php?t=368)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4138 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-03 20:13:32 +00:00